Coalgebraic provenance tracking for AI compiler graph transformations
An arXiv preprint introduces a lightweight provenance-tracking approach for AI compilers that relies on observational semantics and a coalgebraic formalism instead of propagating identifiers through compiler passes. The method uses bisimulation to preserve provenance even when intermediate nodes are eliminated during normalization, lowering, and optimization. The authors implement the approach in a prototype compiler called COVAN and report that provenance remains stable across its compilation pipelines. Reliable provenance tracking matters in production AI compiler stacks for debugging, validating transformations, and attaching platform-specific postprocessing.
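To make bisimulation-based provenance concrete, the sketch below uses naive signature-based partition refinement to compute strong bisimilarity over the disjoint union of a graph before and after a transformation. It then reports, for every surviving node, the original nodes that are observationally equivalent to it. This is not COVAN's algorithm: the graph encoding (an observation label per node plus labelled edges to operands), the node names, and the common-subexpression-elimination example are all hypothetical.

```python
from collections import defaultdict

def bisim_classes(obs, edges):
    """Map each state to a block id so that two states share a block iff they
    are strongly bisimilar (naive signature refinement)."""
    # Start from the partition induced by the observation label alone.
    ids = {}
    block = {s: ids.setdefault(obs[s], len(ids)) for s in obs}
    while True:
        # A state's signature: its current block plus the set of
        # (edge label, successor block) pairs reachable in one step.
        sigs, new_block = {}, {}
        for s in obs:
            sig = (block[s], frozenset((lbl, block[t]) for lbl, t in edges.get(s, [])))
            new_block[s] = sigs.setdefault(sig, len(sigs))
        # The signature contains the old block, so the partition only gets finer;
        # an unchanged block count means the partition is stable.
        if len(sigs) == len(set(block.values())):
            return new_block
        block = new_block

def provenance(orig_obs, orig_edges, opt_obs, opt_edges):
    """For each node of the transformed graph, return the set of original
    nodes that are bisimilar to it."""
    # Tag nodes so the two graphs form one disjoint state space.
    obs = {("orig", s): o for s, o in orig_obs.items()}
    obs.update({("opt", s): o for s, o in opt_obs.items()})
    edges = {("orig", s): [(l, ("orig", t)) for l, t in es] for s, es in orig_edges.items()}
    edges.update({("opt", s): [(l, ("opt", t)) for l, t in es] for s, es in opt_edges.items()})
    block = bisim_classes(obs, edges)
    members = defaultdict(set)
    for (side, s), b in block.items():
        if side == "orig":
            members[b].add(s)
    return {s: members[block[("opt", s)]] for s in opt_obs}

# Hypothetical example: CSE merges two identical multiplications m1, m2 into m.
orig_obs = {"x": "input:x", "m1": "mul", "m2": "mul", "out": "add"}
orig_edges = {"m1": [("lhs", "x"), ("rhs", "x")], "m2": [("lhs", "x"), ("rhs", "x")],
              "out": [("lhs", "m1"), ("rhs", "m2")]}
opt_obs = {"x": "input:x", "m": "mul", "out": "add"}
opt_edges = {"m": [("lhs", "x"), ("rhs", "x")], "out": [("lhs", "m"), ("rhs", "m")]}

prov = provenance(orig_obs, orig_edges, opt_obs, opt_edges)
print(sorted(prov["m"]))  # ['m1', 'm2']
```

The eliminated duplicate `m2` keeps its provenance through the surviving node `m`, although no identifier was carried through the pass. Naive refinement repeats a full pass per round; a Paige-Tarjan style refinement would be the usual choice for large graphs.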
Related events (8)
ProvenanceGuard: Source-aware factuality verification for MCP-based LLM agents
Researchers introduce ProvenanceGuard, a verifier that checks each factual claim in an MCP-grounded LLM agent's answer against the specific source it is attributed to, rather than against pooled evidence. The system decomposes answers into atomic claims, routes each claim to its attributed source using MCP trace metadata, and applies NLI and token-alignment checks to detect 'cross-source conflation', where a claim is supported by some source but attributed to the wrong one. Evaluated on 281 medical-domain MCP-agent traces, it achieves a block F1 of 0.802 and a source accuracy of 0.858 on held-out data, and it detects all injected attribution swaps in 50 controlled clinical probes. The work establishes source attribution as a factuality axis independent of standard grounding checks.
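The routing step can be illustrated with a minimal sketch. It is not ProvenanceGuard's implementation: claims are assumed to arrive already decomposed and tagged with a source id (standing in for MCP trace metadata), and the paper's NLI and token-alignment checks are replaced by a hypothetical token-coverage test with an arbitrary 0.8 threshold. What it shows is the three-way outcome: supported by the attributed source, supported only by some other source (cross-source conflation), or unsupported.

```python
import re
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    source: str  # attributed source id (hypothetical stand-in for MCP trace metadata)

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def supported(claim_text, evidence, threshold=0.8):
    # Hypothetical stand-in for NLI + token alignment: the share of claim
    # tokens that also appear in the evidence must reach the threshold.
    ct = tokens(claim_text)
    return bool(ct) and len(ct & tokens(evidence)) / len(ct) >= threshold

def verify(claims, sources):
    verdicts = []
    for c in claims:
        if supported(c.text, sources.get(c.source, "")):
            verdicts.append("supported")
        elif any(supported(c.text, ev) for sid, ev in sources.items() if sid != c.source):
            # True somewhere, but not in the source it was attributed to.
            verdicts.append("cross-source conflation")
        else:
            verdicts.append("unsupported")
    return verdicts

sources = {"labs": "Potassium is 5.9 mmol/L.",
           "meds": "Patient takes lisinopril 10 mg daily."}
claims = [Claim("Potassium is 5.9 mmol/L", "labs"),
          Claim("Patient takes lisinopril 10 mg daily", "labs"),
          Claim("Patient takes warfarin", "meds")]
print(verify(claims, sources))
# ['supported', 'cross-source conflation', 'unsupported']
```

A pooled-evidence check that concatenated both sources would accept the second claim; routing by attribution is what exposes the mismatch.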
CoTrace: A Goal-Level Attribution Framework for Measuring AI Contributions in Human-AI Collaboration
Researchers introduce CoTrace, a framework that decomposes explicit goals into verifiable requirements and traces both direct and indirect AI contributions across dialogue turns in human-AI collaboration. Applied to 638 real-world collaboration logs, the study finds that LLMs account for 11-26% of the goal-shaping contribution, with disproportionate influence on lower-level, concrete requirements. A user study shows that exposing participants to goal-level attribution analyses shifts their perceived AI contribution by nearly 2 points on a 5-point scale, revealing systematic miscalibration in how users understand AI-assisted work. The work has implications for reliance calibration, the evaluation of AI-assisted work, and interaction design.
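One way to picture goal-level attribution is to give each requirement one unit of credit, share it among everyone who shaped it directly or indirectly, and aggregate the AI's share by requirement level. The sketch below is hypothetical throughout: the even-split credit rule, the `Requirement` fields, and the travel-planning example are illustrative assumptions, not CoTrace's scoring scheme.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Requirement:
    text: str
    level: int   # 0 = high-level facet of the goal; larger = more concrete (hypothetical)
    direct: str  # "human" or "ai": who stated the requirement
    indirect: set = field(default_factory=set)  # parties whose earlier turns shaped it

def ai_share_by_level(reqs):
    """Hypothetical rule: split each requirement's unit of credit evenly among
    its direct and indirect contributors, then average the AI's credit per level."""
    credit, total = defaultdict(float), defaultdict(int)
    for r in reqs:
        parties = {r.direct} | r.indirect
        if "ai" in parties:
            credit[r.level] += 1 / len(parties)
        total[r.level] += 1
    return {lvl: credit[lvl] / total[lvl] for lvl in sorted(total)}

reqs = [Requirement("trip under 2000 USD", 0, "human"),
        Requirement("visit Kyoto", 0, "human"),
        Requirement("buy a JR pass", 1, "ai"),
        Requirement("stay near Kyoto Station", 1, "human", {"ai"})]
print(ai_share_by_level(reqs))  # {0: 0.0, 1: 0.75}
```

In this toy log the AI shapes none of the high-level requirements but most of the concrete ones, the kind of level-dependent skew the study measures on real logs.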
OpenAI Advances Content Provenance with Content Credentials, SynthID, and Verification Tool
OpenAI is expanding its AI content provenance infrastructure by adopting Content Credentials (a C2PA standard) and integrating with Google's SynthID watermarking system. The initiative includes a new verification tool to help users identify and authenticate AI-generated media. This represents a cross-industry alignment on provenance standards aimed at improving transparency and trust in AI-generated content.
AI-Assisted Theorem Proving in Lean 4: Aristotle API Case Study on IMO 2009 Problem 6
This paper presents a case study of using the Aristotle API for AI-assisted formal theorem proving in Lean 4, targeting the Grasshopper problem (IMO 2009 Problem 6). The generated artifact verifies four helper lemmas but leaves the main theorem unproved behind a 'sorry' placeholder, exposing a key limitation: local proof search can succeed while the global combinatorial bookkeeping remains unsolved. The study provides a reproducible Lean artifact and a precise analysis separating verified from unverified proof content, offering a concrete benchmark for evaluating AI formalization capabilities.
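In Lean 4 the verified/unverified split can be checked mechanically: `#print axioms` reports whether a theorem depends on `sorryAx`, the axiom that a `sorry` introduces. The toy file below is a hypothetical illustration, not the Grasshopper artifact; the theorem names and statements are placeholders.

```lean
-- Hypothetical toy file, not the IMO 2009 Problem 6 artifact.

-- A fully verified helper lemma.
theorem helper (a b : Nat) : a + b = b + a := Nat.add_comm a b

-- A main theorem left open: Lean accepts the file but warns that it uses 'sorry'.
theorem main_claim (n : Nat) : n * 0 + n = n := by
  sorry

#print axioms helper      -- reports no axiom dependencies
#print axioms main_claim  -- reports a dependency on sorryAx
```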
OpenAI Introduces Content Provenance Technology and Joins C2PA Steering Committee
OpenAI is launching new technology to help researchers identify AI-generated content from its tools, including watermarking or metadata-based provenance signals. The company is also joining the Coalition for Content Provenance and Authenticity (C2PA) Steering Committee to help shape industry standards for content authentication. This move positions OpenAI as an active participant in cross-industry efforts to address AI-generated media attribution and authenticity.
Agentic Proving for Program Verification: Claude Code Achieves 98.1% on CLEVER Benchmark
Researchers evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation, and report 98.1% end-to-end success on program generation and verification, measured over the benchmark's self-consistent entries. The system generates valid specifications for 98.8% of problems and certifies implementations against ground-truth specifications for 87.5% of problems. The results reveal a growing mismatch between the difficulty of existing program verification benchmarks and the capabilities of modern agentic provers, motivating calls for more rigorous evaluation methodologies. The findings support compiler-in-the-loop agentic paradigms as the current state of the art for foundational program verification.
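The compiler-in-the-loop pattern reduces to a propose, check, repair loop in which checker errors are fed back to the generator. The sketch below is generic and hypothetical: `propose` stands in for the model and `check` for a Lean or compiler invocation, and both are toy functions rather than anything from the study.

```python
from typing import Callable, Optional

def prove_with_feedback(propose: Callable[[list], str],
                        check: Callable[[str], Optional[str]],
                        budget: int = 5):
    """Request candidates until the checker accepts one or the budget runs out.
    Returns (accepted candidate or None, attempts used)."""
    feedback = []
    for attempt in range(1, budget + 1):
        candidate = propose(feedback)
        error = check(candidate)   # None means the checker accepted the candidate
        if error is None:
            return candidate, attempt
        feedback.append(error)     # the error message becomes context for the next try
    return None, budget

# Toy stand-ins: the "checker" accepts a single tactic script, and the
# "model" switches strategy once it has seen an error.
def check(script):
    return None if script == "by omega" else f"tactic failed: {script}"

def propose(feedback):
    return "by omega" if feedback else "by simp"

print(prove_with_feedback(propose, check))  # ('by omega', 2)
```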
Goedel-Architect achieves state-of-the-art formal theorem proving with blueprint-based agentic framework
Goedel-Architect is an agentic framework for formal theorem proving in Lean 4 that uses blueprint generation — a dependency graph of definitions and lemmas — rather than recursive decomposition, enabling parallel lemma closure and global refinement. Built on DeepSeek-V4-Flash (284B-A13B), it achieves 99.2% pass@1 on MiniF2F-test and 75.6% on PutnamBench, scaling to 100% on MiniF2F, 88.8% on PutnamBench, and 4/6 on IMO 2025 when seeded with natural-language proofs. The authors claim state-of-the-art performance for an open-source pipeline at up to 500x lower cost than comparable systems.
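The benefit of a blueprint for parallel lemma closure becomes visible when its dependency graph is layered: every node whose dependencies are already closed can be attempted at the same time. The sketch below uses Python's standard `graphlib.TopologicalSorter` to compute these waves; the blueprint contents are hypothetical, and Goedel-Architect's actual scheduling and global refinement are not modeled.

```python
from graphlib import TopologicalSorter

def parallel_waves(blueprint):
    """Group blueprint nodes into waves; all nodes in one wave have their
    dependencies closed and could be attempted concurrently."""
    ts = TopologicalSorter(blueprint)  # maps node -> set of nodes it depends on
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)  # pretend every lemma in this wave was proved
    return waves

# Hypothetical blueprint: a main theorem resting on three lemmas and one definition.
blueprint = {
    "main": {"lem_a", "lem_b", "lem_c"},
    "lem_a": {"def_f"},
    "lem_b": {"def_f"},
    "lem_c": set(),
    "def_f": set(),
}
print(parallel_waves(blueprint))
# [['def_f', 'lem_c'], ['lem_a', 'lem_b'], ['main']]
```

Each inner list is a set of goals that could be dispatched to concurrent prover calls.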
Distributionally robust optimization framework for probabilistic runtime verification of AI agents
A new arXiv preprint introduces a sound and efficient framework for verifying probabilistic security policies for AI agents operating in complex digital environments, addressing limitations of prior Datalog-based approaches that assumed deterministic policies or predicate independence. The method uses distributionally robust optimization to compute sound upper bounds on policy violation probability without requiring independence assumptions between predicates. Evaluated on benchmarks for terminal and tool-calling agents, the approach outperforms prior art on the security-utility trade-off.
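The core difficulty, bounding a violation probability when predicates may be correlated, can be illustrated with the simplest distributionally robust bound: take the worst case over every joint distribution consistent with the predicates' marginal probabilities. For a conjunction this gives the Fréchet bounds max(0, p + q - 1) <= P(A and B) <= min(p, q) in place of the independence value p * q. The sketch below propagates such intervals through a policy formula. It illustrates the idea only and is not the paper's optimization framework; the predicate names and probabilities are hypothetical.

```python
def bounds(formula, p):
    """Return (lower, upper) bounds on P(formula) valid for every joint
    distribution whose marginals P(var) equal p[var]; no independence assumed.
    Formulas: ("var", name) | ("not", f) | ("and", f, g) | ("or", f, g)."""
    kind = formula[0]
    if kind == "var":
        q = p[formula[1]]
        return q, q
    if kind == "not":
        lo, hi = bounds(formula[1], p)
        return 1 - hi, 1 - lo
    (alo, ahi), (blo, bhi) = bounds(formula[1], p), bounds(formula[2], p)
    if kind == "and":   # Frechet bounds for a conjunction
        return max(0.0, alo + blo - 1), min(ahi, bhi)
    if kind == "or":    # Frechet bounds for a disjunction
        return max(alo, blo), min(1.0, ahi + bhi)
    raise ValueError(f"unknown connective: {kind}")

# Hypothetical policy: a violation is reading a secret AND opening network egress.
p = {"reads_secret": 0.4, "net_egress": 0.3}
violation = ("and", ("var", "reads_secret"), ("var", "net_egress"))
print(bounds(violation, p))  # (0.0, 0.3); independence would give 0.4 * 0.3 = 0.12
```

The upper bound 0.3 is what a sound verifier must report when nothing is known about how the two predicates co-occur; for a single conjunction it is attained by a joint distribution in which egress only ever happens together with a secret read.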
