ATLAS: Active learning framework for automated discovery of interpretable behavioral models in cognitive science
ATLAS (Active Theory Learning for Automated Science) is a new active learning framework that alternates between generating mechanistic hypotheses as sparse neural network ensembles and designing maximally informative experiments to distinguish between them. The system is tested on recovering reinforcement learning agents from behavioral data in bandit tasks, where it is 5-10x more sample-efficient than random experimentation and matches expert-designed experiments from the literature. The work targets automated scientific discovery in cognitive science, with potential generalization to other domains requiring mechanistic modeling.
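The idea of choosing the experiment that best separates competing hypotheses can be illustrated with a small sketch. This is not the ATLAS implementation: it swaps the sparse neural network ensembles for a handful of hand-specified Q-learning agents, uses a deterministic expected-value approximation of learning, and scores each candidate two-armed bandit by the variance of the agents' predicted choice probabilities. All function names and parameter values here are hypothetical.

```python
import math

def predicted_choice_prob(alpha, beta, reward_probs, n_trials=20):
    """Approximate P(choose arm 0) after n_trials for a softmax Q-learner.

    Assumption: uses expected (mean-field) updates instead of sampled
    rewards, so the prediction is deterministic.
    """
    q = [0.0, 0.0]
    for _ in range(n_trials):
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        # Each arm moves toward its reward probability, weighted by
        # how often the agent is expected to choose it.
        q[0] += p0 * alpha * (reward_probs[0] - q[0])
        q[1] += (1.0 - p0) * alpha * (reward_probs[1] - q[1])
    return 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))

def disagreement(hypotheses, reward_probs):
    """Variance of predicted choice probabilities across hypotheses."""
    preds = [predicted_choice_prob(a, b, reward_probs) for a, b in hypotheses]
    mean = sum(preds) / len(preds)
    return sum((p - mean) ** 2 for p in preds) / len(preds)

def pick_experiment(hypotheses, candidates):
    """Return the candidate bandit on which the hypotheses disagree most."""
    return max(candidates, key=lambda rp: disagreement(hypotheses, rp))

# Hypothetical candidate agents as (learning rate, inverse temperature).
hypotheses = [(0.1, 2.0), (0.5, 2.0), (0.3, 8.0)]
# Hypothetical candidate experiments as (P(reward | arm 0), P(reward | arm 1)).
candidates = [(0.5, 0.5), (0.9, 0.1)]
best = pick_experiment(hypotheses, candidates)
```

In this toy setup a bandit with identical arms carries no information, because every agent is predicted to choose at chance, so the selection rule prefers the bandit with unequal arms.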
Related events (8)
ATLAS: Unified Agentic and Latent Visual Reasoning via Functional Tokens
ATLAS proposes a framework where a single discrete 'functional token' serves dual roles as both an agentic operation trigger and a latent visual reasoning unit in multimodal models. This design avoids the computational cost of generating intermediate images while sidestepping the context-switching latency of external tool calls and the generalization limitations of pure latent methods. The framework is compatible with standard SFT and RL training pipelines without architectural changes, and introduces Latent-Anchored GRPO (LA-GRPO) to stabilize reinforcement learning when functional tokens are sparse. Experiments show strong performance on visual reasoning benchmarks with maintained interpretability.
Introducing Activation Atlases
OpenAI and Google researchers jointly developed activation atlases, a new neural network interpretability technique that visualizes what interactions between neurons represent. The method aims to improve understanding of internal decision-making processes in AI systems. This work is positioned as a tool for identifying weaknesses and investigating failures in deployed AI systems.
CausaLab: Scalable Benchmark for Interactive Causal Discovery by LLM Agents
CausaLab is a new evaluation environment that tests LLM agents on interactive causal discovery tasks, requiring them to recover both causal graphs and structural equations from synthetic laboratory episodes governed by randomly sampled structural causal models (SCMs). The benchmark separates predictive accuracy from genuine causal understanding, revealing a persistent gap: GPT-5.2-high achieves 92% task accuracy in a 6-node observational setting but only 0.471 all-edge F1 for mechanism recovery. Mixed observation-intervention strategies improve structural fidelity, while pure intervention strategies underperform on both metrics. Premature stopping is identified as a key agent weakness, partially mitigated by prompting models to verify hypothesis-data consistency.
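To make the mechanism-recovery metric concrete, here is a minimal sketch of an F1 score over directed edges of a causal graph. The summary does not define "all-edge F1" precisely, so this follows the standard precision/recall definition and should be read as an assumption rather than the benchmark's scoring code; the function name and example graphs are hypothetical.

```python
def edge_f1(true_edges, predicted_edges):
    """F1 over directed edges, each given as a (cause, effect) tuple."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    true_positives = len(true_set & pred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)

# Hypothetical 3-node example: one edge recovered, one reversed, one missed.
true_graph = [("A", "B"), ("B", "C"), ("A", "C")]
predicted_graph = [("A", "B"), ("C", "B")]
score = edge_f1(true_graph, predicted_graph)  # precision 1/2, recall 1/3
```

Because a reversed edge counts as both a false positive and a false negative under this definition, an agent can predict outcomes well while still scoring low on structure, which is the kind of gap the benchmark reports.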
STT-Arena: Benchmark for Adaptive Replanning Under Spatio-Temporal Dynamics in Tool-Using LLMs
STT-Arena is a new benchmark of 227 interactive tasks designed to evaluate LLMs' ability to detect mid-task disruptions and replan under spatio-temporal dynamics, covering nine conflict types and four solvability levels. Evaluation of frontier models including Claude-4.6-Opus shows less than 40% overall accuracy, revealing fundamental limitations in dynamic reasoning. The authors identify three recurring failure modes—Stale-State Execution, Misdiagnosis of Dynamic Triggers, and Missing Post-Adaptation Verification—and propose an iterative trajectory refinement technique combined with online RL to train STT-Agent-4B, a 4B-parameter model that outperforms frontier LLMs on the benchmark.
ACTS: Agentic Chain-of-Thought Steering for efficient and controllable LLM reasoning
Researchers introduce Agentic Chain-of-Thought Steering (ACTS), a framework that formulates inference-time reasoning control as a Markov decision process, where a controller agent adaptively steers a frozen reasoner by issuing reasoning strategy directives and steering phrases at each step. The controller is initialized from synthetic steering trajectories with multi-budget augmentation and further optimized via reinforcement learning with budget-conditioned reward shaping. ACTS matches full-thinking performance with significant token savings and enables controllable accuracy-efficiency trade-offs across multiple benchmarks and reasoner models.
Interpretable Machine Learning Through Teaching
OpenAI published a method in 2018 that trains AI systems to teach each other using examples that are also interpretable to humans. The approach automatically selects maximally informative examples to convey a concept, such as representative images for a category like 'dogs'. Experiments showed the method effective at teaching both AI systems and humans, bridging machine learning interpretability with pedagogical example selection.
ALMANAC dataset provides action-level mental model annotations for studying human-agent collaboration
Researchers introduce ALMANAC, a dataset of 2,987 collaboration actions drawn from the Map Task dyadic routing paradigm, each annotated with theory-informed mental model labels covering self-reasoning, perceived partner intent, and perceived team goal. The dataset targets a gap in LLM agent training data: current agents are optimized for task completion but lack process-level collaborative competence grounded in mental model alignment. Six LLMs are benchmarked on predicting human next-turn behavior and mental model states. The work provides a resource for evaluating and potentially training agents toward more human-like collaborative reasoning.
LabVLA: Vision-Language-Action model and RoboGenesis data engine for scientific laboratory robotics
Researchers introduce LabVLA, a Vision-Language-Action model designed to bridge written scientific protocols and physical robot execution in laboratory settings. To address the data scarcity problem, they build RoboGenesis, a simulation-based data engine that composes lab workflows from atomic skills and generates structured demonstrations across robot embodiments. LabVLA uses a two-stage training recipe combining FAST action token pretraining on a Qwen3-VL-4B-Instruct backbone with flow matching posttraining via a DiT action expert. On the LabUtopia benchmark, LabVLA achieves the highest average success rate among evaluated baselines in both in-distribution and out-of-distribution settings.
