5 · arXiv cs.LG (Machine Learning) · 9d ago

ATLAS: Active learning framework for automated discovery of interpretable behavioral models in cognitive science

ATLAS (Active Theory Learning for Automated Science) is a new active learning framework that iterates between generating mechanistic hypotheses as sparse neural network ensembles and designing maximally informative experiments to distinguish between them. The system is tested on recovering reinforcement learning agents from behavioral data in bandit tasks, achieving 5-10x sample efficiency improvements over random experimentation and matching expert-designed experiments from the literature. The work targets automated scientific discovery in cognitive science, with potential generalization to other domains requiring mechanistic modeling.
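The experiment-design step described above can be illustrated with a minimal sketch. It is not the ATLAS implementation: the summary does not say how informativeness is scored, so this example assumes a simple disagreement criterion (variance of predictions across candidate models) in a two-armed bandit. The candidate models, `softmax_choice_prob`, and the experiment list are all hypothetical.

```python
import math
from statistics import pvariance


def softmax_choice_prob(reward_probs, beta):
    """Probability of picking arm 0 under a two-arm softmax with inverse temperature beta."""
    return 1.0 / (1.0 + math.exp(-beta * (reward_probs[0] - reward_probs[1])))


# Hypothetical candidate models: each maps an experiment (the reward
# probabilities of two arms) to a predicted probability of choosing arm 0.
CANDIDATE_MODELS = [
    lambda exp: softmax_choice_prob(exp, beta=1.0),  # weakly value-sensitive agent
    lambda exp: softmax_choice_prob(exp, beta=5.0),  # strongly value-sensitive agent
    lambda exp: 0.5,                                 # random-choice agent
]


def disagreement(experiment, models):
    """Assumed informativeness score: variance of the models' predictions."""
    return pvariance([model(experiment) for model in models])


def pick_experiment(candidates, models):
    """Return the candidate experiment on which the models disagree most."""
    return max(candidates, key=lambda exp: disagreement(exp, models))


CANDIDATE_EXPERIMENTS = [(0.5, 0.5), (0.6, 0.4), (0.9, 0.1)]
best = pick_experiment(CANDIDATE_EXPERIMENTS, CANDIDATE_MODELS)
print(best)
```

With equal reward probabilities every model predicts 0.5, so that experiment cannot separate them; the sketch therefore prefers the experiment with the largest gap between arms.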

Evaluation and Benchmarking · ATLAS: Active Theory Learning for Automated Science · Disentangled RNNs · Atlas

Related guides (1)

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder


Related events (8)

6 · arXiv · cs.CL · 1mo ago

ATLAS: Unified Agentic and Latent Visual Reasoning via Functional Tokens

ATLAS proposes a framework where a single discrete 'functional token' serves dual roles as both an agentic operation trigger and a latent visual reasoning unit in multimodal models. This design avoids the computational cost of generating intermediate images while sidestepping the context-switching latency of external tool calls and the generalization limitations of pure latent methods. The framework is compatible with standard SFT and RL training pipelines without architectural changes, and introduces Latent-Anchored GRPO (LA-GRPO) to stabilize reinforcement learning when functional tokens are sparse. Experiments show strong performance on visual reasoning benchmarks with maintained interpretability.

Evaluation and Benchmarking · Agent and Tool Ecosystem · functional token · GRPO · Latent-Anchored GRPO

5 · OpenAI Blog · 1mo ago

Introducing Activation Atlases

OpenAI and Google researchers jointly developed activation atlases, a new neural network interpretability technique that visualizes what interactions between neurons represent. The method aims to improve understanding of internal decision-making processes in AI systems. This work is positioned as a tool for identifying weaknesses and investigating failures in deployed AI systems.

Evaluation and Benchmarking · AI Safety Research · Google · Activation Atlases · OpenAI

6 · arXiv · cs.CL · 25d ago

CausaLab: Scalable Benchmark for Interactive Causal Discovery by LLM Agents

CausaLab is a new evaluation environment that tests LLM agents on interactive causal discovery tasks, requiring them to recover both causal graphs and structural equations from synthetic laboratory episodes governed by randomly sampled structural causal models (SCMs). The benchmark separates predictive accuracy from genuine causal understanding, revealing a persistent gap: GPT-5.2-high achieves 92% task accuracy in a 6-node observational setting but only 0.471 all-edge F1 for mechanism recovery. Mixed observation-intervention strategies improve structural fidelity, while pure intervention strategies underperform on both metrics. Premature stopping is identified as a key agent weakness, partially mitigated by prompting models to verify hypothesis-data consistency.
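For readers unfamiliar with edge-level scoring, the sketch below shows one common way to compute an F1 score over directed edges. The summary does not define CausaLab's "all-edge F1", so treating it as precision/recall over exact directed-edge matches is an assumption, and the example graphs are made up for illustration.

```python
def edge_f1(true_edges, predicted_edges):
    """F1 over directed edges; an edge counts only if source and target both match."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    true_positives = len(true_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground-truth graph and an agent's recovered graph.
TRUE_EDGES = [("A", "B"), ("B", "C"), ("A", "C")]
PREDICTED_EDGES = [("A", "B"), ("C", "B")]  # one correct edge, one reversed edge

print(edge_f1(TRUE_EDGES, PREDICTED_EDGES))
```

Under this definition a reversed edge earns no credit, which is one way a model can predict outcomes well while still scoring poorly on mechanism recovery.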

Evaluation and Benchmarking · AI Safety Research · all-edge F1 · GPT-5.2-high · causal discovery

6 · arXiv · cs.CL · 1mo ago

STT-Arena: Benchmark for Adaptive Replanning Under Spatio-Temporal Dynamics in Tool-Using LLMs

STT-Arena is a new benchmark of 227 interactive tasks designed to evaluate LLMs' ability to detect mid-task disruptions and replan under spatio-temporal dynamics, covering nine conflict types and four solvability levels. Evaluation of frontier models including Claude-4.6-Opus shows less than 40% overall accuracy, revealing fundamental limitations in dynamic reasoning. The authors identify three recurring failure modes—Stale-State Execution, Misdiagnosis of Dynamic Triggers, and Missing Post-Adaptation Verification—and propose an iterative trajectory refinement technique combined with online RL to train STT-Agent-4B, a 4B-parameter model that outperforms frontier LLMs on the benchmark.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Claude Opus 4.6 · iterative trajectory refinement · spatio-temporal dynamic reasoning

5 · arXiv · cs.CL · 17d ago

ACTS: Agentic Chain-of-Thought Steering for efficient and controllable LLM reasoning

Researchers introduce Agentic Chain-of-Thought Steering (ACTS), a framework that formulates inference-time reasoning control as a Markov decision process, where a controller agent adaptively steers a frozen reasoner by issuing reasoning strategy directives and steering phrases at each step. The controller is initialized from synthetic steering trajectories with multi-budget augmentation and further optimized via reinforcement learning with budget-conditioned reward shaping. ACTS matches full-thinking performance with significant token savings and enables controllable accuracy-efficiency trade-offs across multiple benchmarks and reasoner models.

Inference Economics · Agent and Tool Ecosystem · ACTS · Agentic Chain-of-Thought Steering · Agentic Chain-of-Thought Steering for Efficient and Controllable LLM Reasoning

3 · OpenAI Blog · 1mo ago

Interpretable Machine Learning Through Teaching

OpenAI published a method in 2018 that trains AI systems to teach each other using examples that are also interpretable to humans. The approach automatically selects maximally informative examples to convey a concept, such as representative images for a category like 'dogs'. Experiments showed the method to be effective at teaching both AI systems and humans, bridging machine learning interpretability with pedagogical example selection.

AI Safety Research · machine teaching · interpretable machine learning · OpenAI

5 · arXiv · cs.CL · 15d ago

ALMANAC dataset provides action-level mental model annotations for studying human-agent collaboration

Researchers introduce ALMANAC, a dataset of 2,987 collaboration actions drawn from the Map Task dyadic routing paradigm, each annotated with theory-informed mental model labels covering self-reasoning, perceived partner intent, and perceived team goal. The dataset targets a gap in LLM agent training data: current agents are optimized for task completion but lack process-level collaborative competence grounded in mental model alignment. Six LLMs are benchmarked on predicting human next-turn behavior and mental model states. The work provides a resource for evaluating and potentially training agents toward more human-like collaborative reasoning.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Map Task · ALMANAC

6 · arXiv · cs.CL · 8d ago

LabVLA: Vision-Language-Action model and RoboGenesis data engine for scientific laboratory robotics

Researchers introduce LabVLA, a Vision-Language-Action model designed to bridge written scientific protocols and physical robot execution in laboratory settings. To address data scarcity, they build RoboGenesis, a simulation-based data engine that composes lab workflows from atomic skills and generates structured demonstrations across robot embodiments. LabVLA uses a two-stage training recipe: FAST action-token pretraining on a Qwen3-VL-4B-Instruct backbone, followed by flow-matching post-training via a DiT action expert. On the LabUtopia benchmark, LabVLA achieves the highest average success rate among evaluated baselines in both in-distribution and out-of-distribution settings.

Agent and Tool Ecosystem · Multimodal Progress · LabVLA · LabUtopia · Qwen3-4B-Instruct