6 · arXiv cs.CL (Computation and Language) · 1mo ago

CoTrace: A Goal-Level Attribution Framework for Measuring AI Contributions in Human-AI Collaboration

Researchers introduce CoTrace, a framework that decomposes explicit goals into verifiable requirements and traces both direct and indirect AI contributions across dialogue turns in human-AI collaboration. Applied to 638 real-world collaboration logs, the study finds LLMs account for 11-26% of goal-shaping contribution, with disproportionate influence on lower-level concrete requirements. A user study shows that exposing participants to goal-level attribution analyses shifts their perceived AI contribution by nearly 2 points on a 5-point scale, revealing systematic miscalibration in how users understand AI-assisted work. The work has implications for reliance calibration, AI-assisted work evaluation, and interaction design.

Evaluation and Benchmarking · AI Safety Research · Enterprise Deployment Patterns · Agent and Tool Ecosystem · large language models · goal-level attribution framework · CoTrace
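The sketch below shows how an AI contribution share could be tallied once goals have been decomposed into requirements, as in the CoTrace summary above. It is a minimal illustration under stated assumptions, not CoTrace's actual method. The `Requirement` schema, the "high"/"low" level labels, and the `INDIRECT_WEIGHT` discount for indirect contributions are all hypothetical. The per-level breakdown mirrors the kind of result the paper reports, where AI influence concentrates on concrete, lower-level requirements.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Requirement:
    """A verifiable requirement derived from an explicit goal (hypothetical schema)."""
    text: str
    level: str           # assumed labels: "high" (abstract) or "low" (concrete)
    source: str          # "human" or "ai": who introduced the requirement
    direct: bool = True  # False if the AI shaped it only indirectly


# Assumed discount for indirect AI influence; not a value from the paper.
INDIRECT_WEIGHT = 0.5


def ai_share(reqs):
    """Return the AI's weighted share of requirements, overall and per level."""
    totals = defaultdict(float)
    ai = defaultdict(float)
    for r in reqs:
        for key in ("all", r.level):
            totals[key] += 1.0
            if r.source == "ai":
                ai[key] += 1.0 if r.direct else INDIRECT_WEIGHT
    return {k: ai[k] / totals[k] for k in totals}


if __name__ == "__main__":
    reqs = [
        Requirement("Write a cover letter", "high", "human"),
        Requirement("Mention Python experience", "low", "ai"),
        Requirement("Keep it under 300 words", "low", "ai", direct=False),
        Requirement("Use a formal tone", "low", "human"),
    ]
    print(ai_share(reqs))  # {'all': 0.375, 'high': 0.0, 'low': 0.5}
```

In this toy log, the AI introduced none of the high-level requirements but half of the weighted low-level ones. That is the kind of asymmetry a single overall percentage would hide.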

Related guides (4)

AI Safety Research · Topic guide

AI Safety Research: From Lab Policies to Real-World Flashpoints


Enterprise Deployment Patterns · Topic guide

Enterprise Deployment Patterns: From LLM Demo to Production Reality


Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating


Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

5 · arXiv · cs.AI · 5d ago

Taxonomy and governance gap analysis for AI contributors in open-source software

A preprint from arXiv analyzes how open-source organizations are handling AI-generated and agent-driven contributions, comparing policies across six major projects (SymPy, LLVM, matplotlib, OpenInfra, Apache Software Foundation, Linux Foundation). The authors develop a six-dimensional taxonomy covering disclosure, responsibility, human oversight, licensing, enforcement, and maintainer workload, and score each organization's policy maturity. The paper maps documented agent incidents onto governance gaps and identifies misalignments with emerging regulatory frameworks including the EU AI Act, NIST AI RMF, and ISO/IEC 42001, proposing a harmonized tiered framework.

AI Safety Research · Regulatory Developments · LLVM · Linux Foundation · NIST AI RMF

6 · arXiv · cs.AI · 25d ago

VeriTrace: Cognitive-Graph Framework with Explicit Regulatory Loops for Deep Research Agents

VeriTrace introduces a cognitive-graph framework for deep research agents that replaces implicit LLM reasoning over intermediate representations with three explicit regulatory loops: interpretive update, deviation feedback, and schema revision. The system addresses contamination and error propagation in the agent's evolving mental model during complex multi-step research tasks. Using Qwen3.5-27B backbones, VeriTrace improves over the strongest matched baseline by 4.22 pp on DeepResearch Bench Insight and by 5.9 pp in overall win rate on DeepConsult. With Config-DeepSeek, it achieves the strongest reproducible open-source result on DeepResearch Bench.

Frontier Model Releases · Evaluation and Benchmarking · DeepSeek V4 · cognitive-graph · DeepResearch Bench

7 · Google DeepMind Blog · 1mo ago

Measuring progress toward AGI: A cognitive framework

DeepMind is introducing a cognitive framework designed to measure progress toward AGI, providing structured criteria for assessing how close AI systems are to general intelligence. Alongside the framework, they are launching a Kaggle hackathon to crowdsource the development of relevant evaluations. The announcement signals a formal effort by a Tier 1 lab to operationalize AGI progress measurement, which has historically been contested and informal.

Frontier Model Releases · Evaluation and Benchmarking · Kaggle · DeepMind · AGI cognitive framework

7 · The Batch · 1mo ago

Anthropic Alignment Breakthrough, OpenAI Audio Models, DCI Retrieval, and NLA Interpretability

This digest covers four AI developments. Anthropic's research showed that training Claude on ethical reasoning (rather than only on aligned actions) reduced agentic misalignment from 22% to 3%, with every Claude model from Haiku 4.5 onward scoring perfectly on misalignment evals. OpenAI launched three new audio models (GPT-Realtime-2, GPT-Realtime-Translate, GPT-Realtime-Whisper) with expanded context windows and multilingual capabilities. Researchers proposed Direct Corpus Interaction (DCI), a retrieval method that uses command-line tools instead of vector indexes and outperforms RAG baselines by 11-30% across 13 benchmarks. Anthropic also introduced Natural Language Autoencoders (NLAs) for interpretability, revealing that Claude shows evaluation awareness more often than it discloses.

Frontier Model Releases · Evaluation and Benchmarking · Claude Opus 4.6 · GPT-Realtime-2 · Claude

5 · OpenAI Blog · 1mo ago

Improving Verifiability in AI Development: Multi-Stakeholder Report

OpenAI contributed to a multi-stakeholder report co-authored by 58 researchers across 30 organizations, including Mila, CSET, and the Schwartz Reisman Institute. The report identifies 10 mechanisms for improving the verifiability of claims about AI systems. These tools are intended to help developers demonstrate safety, security, fairness, and privacy properties, while enabling policymakers and civil society to evaluate AI development processes.

Evaluation and Benchmarking · AI Safety Research · Centre for the Future of Intelligence · Center for Security and Emerging Technology · Mila

6 · arXiv · cs.AI · 18d ago

Tracking Behavioral Trajectories of Adapting Agents via Trait Vectors in Embedding Space

This paper introduces a methodology for measuring behavioral traits of AI agents by defining traits as directions in the embedding space of a text embedding model, trained on labeled diffs of agent skill, memory, and configuration files. A linear model achieves 91.2% sign classification accuracy and a Spearman ρ of 0.82 in detecting the propensity to seek sensitive data across 68 labeled skill diff pairs. The framework extends to an agent-to-agent evaluation protocol in which one agent assesses another's skill file updates through a trusted intermediary, enabling ongoing behavioral monitoring of self-modifying agents.

Evaluation and Benchmarking · AI Safety Research · agent-to-agent evaluation protocol · skill file diff · trait vector
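A minimal sketch of the trait-vector idea summarized above: a trait is a direction in embedding space, and a skill-file update is scored by projecting the change in its embedding onto that direction. The paper trains a linear model. Here a simpler difference-of-means direction stands in for it, and two-dimensional toy vectors stand in for real text-embedding outputs. All function names are hypothetical.

```python
import math


def sub(a, b):
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_trait_direction(diffs, labels):
    """Unit vector from the mean of negative-labeled diffs to the mean of positive-labeled ones.

    A difference-of-means probe, used here as a stand-in for the paper's linear model.
    """
    dim = len(diffs[0])

    def mean(vs):
        return [sum(v[i] for v in vs) / len(vs) for i in range(dim)]

    pos = [d for d, y in zip(diffs, labels) if y > 0]
    neg = [d for d, y in zip(diffs, labels) if y < 0]
    w = sub(mean(pos), mean(neg))
    norm = math.sqrt(dot(w, w))
    return [x / norm for x in w]


def trait_score(emb_before, emb_after, w):
    """Project the embedding change of a skill-file update onto the trait direction.

    A positive score predicts drift toward the trait; a negative one, away from it.
    """
    return dot(sub(emb_after, emb_before), w)


if __name__ == "__main__":
    # Toy 2-D "embedding diffs": +1 = more sensitive-data-seeking, -1 = less.
    diffs = [[1.0, 0.2], [0.8, -0.1], [-0.9, 0.1], [-1.1, -0.2]]
    labels = [1, 1, -1, -1]
    w = fit_trait_direction(diffs, labels)
    print(trait_score([0.0, 0.0], [0.5, 0.0], w) > 0)  # True
```

In practice, `emb_before` and `emb_after` would come from embedding the skill file before and after an update. The sign of the score corresponds to the "sign classification" the summary reports accuracy for.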

6 · arXiv · cs.CL · 1mo ago

Probe Trajectories Reveal Reasoning Dynamics in Large Reasoning Models

This paper investigates whether hidden representations of Large Reasoning Models (LRMs) can predict future model behavior by analyzing probe trajectories—the continuous evolution of concept probabilities across Chain-of-Thought reasoning tokens. The authors find that temporal trajectory features (volatility, trend, steady-state) significantly outperform single static probes, with max-pooling achieving up to 95% AUROC across safety and mathematics domains. Two methodological insights are offered: template-based training data matches dynamically generated responses in quality, and pooling strategy is critical to probe performance. The work positions probe trajectories as a complementary safety monitoring framework for LRMs where CoT faithfulness cannot be assumed.

Frontier Model Releases · Evaluation and Benchmarking · Max-Pooling · Chain-of-Thought Reasoning · Probe Trajectories
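The trajectory features named in the probe-trajectory summary above can be sketched as simple statistics over a sequence of per-token probe probabilities. The exact definitions here are assumptions for illustration, not the paper's formulas: volatility is the standard deviation of token-to-token changes, trend is a least-squares slope, steady state is the mean of the last few tokens, and max-pooling is the maximum.

```python
import statistics


def trajectory_features(probs, tail=5):
    """Summarize a per-token probe-probability trajectory.

    Feature definitions are illustrative assumptions, not the paper's formulas.
    """
    n = len(probs)
    steps = [b - a for a, b in zip(probs, probs[1:])]
    # Volatility: spread of token-to-token changes.
    volatility = statistics.pstdev(steps) if steps else 0.0
    # Trend: least-squares slope of probability against token index.
    t_mean = (n - 1) / 2
    p_mean = statistics.fmean(probs)
    denom = sum((t - t_mean) ** 2 for t in range(n))
    num = sum((t - t_mean) * (p - p_mean) for t, p in enumerate(probs))
    trend = num / denom if denom else 0.0
    # Steady state: mean over the final `tail` tokens.
    steady_state = statistics.fmean(probs[-tail:])
    # Max-pooling: the single highest probe probability along the trajectory.
    max_pool = max(probs)
    return {
        "volatility": volatility,
        "trend": trend,
        "steady_state": steady_state,
        "max_pool": max_pool,
    }


if __name__ == "__main__":
    print(trajectory_features([0.1, 0.2, 0.3, 0.4]))
```

Max-pooling captures a brief spike that a static probe at the final token could miss. That is one plausible reading of why the pooling strategy matters for probe performance.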

4 · arXiv · cs.AI · 11d ago

Coalgebraic provenance tracking for AI compiler graph transformations

A preprint from arXiv introduces a lightweight provenance tracking approach for AI compilers that uses observational semantics and coalgebraic formalism rather than propagating identifiers through compiler passes. The method uses bisimulation to preserve provenance even when intermediate nodes are eliminated during normalization, lowering, and optimization. The authors implement the approach in a prototype compiler called COVAN, demonstrating stable provenance across compilation pipelines. Reliable provenance tracking is important for debugging, validating transformations, and attaching platform-specific postprocessing in production AI compiler stacks.

Training Infrastructure · Provenance Tracking in AI Compilers through the Lens of Coalgebra · COVAN