arXiv cs.LG (Machine Learning) · 2d ago

AdaJEPA: Adaptive Latent World Model with Test-Time Self-Supervised Recalibration

AdaJEPA is a latent world model that performs test-time adaptation within model predictive control (MPC) loops, addressing the common failure mode of frozen world models under distribution shift. After training, the model uses observed state transitions as self-supervised adaptation signals to continuously recalibrate predictions during planning, requiring as few as one gradient step per replanning step. Evaluated on goal-reaching tasks, AdaJEPA substantially improves planning success rates compared to static world models.
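The adaptation loop described above can be illustrated with a deliberately tiny stand-in. This is a sketch under stated assumptions, not AdaJEPA's architecture or objective. It assumes a 1-D state, a world model whose only adaptable parameter is a hypothetical `bias` term, a constant drift of -0.3 in the deployment environment as the distribution shift, and a brute-force one-step planner over an action grid. What it does show is the pattern the summary names: predict, act, observe the real transition, then take a single gradient step on the prediction error before replanning.

```python
# Toy sketch of test-time self-supervised recalibration inside an MPC loop.
# Hypothetical setup: 1-D state, world model z' = z + u + bias, where `bias`
# is the only parameter adapted at test time. Not AdaJEPA's actual model.

ACTIONS = [i / 10 for i in range(-10, 11)]  # candidate actions in [-1, 1]


def true_env(z, u):
    # Deployment environment with an unmodelled drift (the distribution shift).
    return z + u - 0.3


def predict(z, u, bias):
    # Learned world model; bias starts at 0, i.e. it assumes no drift.
    return z + u + bias


def plan(z, bias, goal):
    # One-step MPC: exhaustive search over the action grid using the model.
    return min(ACTIONS, key=lambda u: (predict(z, u, bias) - goal) ** 2)


def run(goal=3.0, steps=10, lr=0.25, adapt=True):
    z, bias = 0.0, 0.0
    for _ in range(steps):
        u = plan(z, bias, goal)
        z_next = true_env(z, u)
        if adapt:
            # Self-supervised signal: the observed transition itself.
            err = predict(z, u, bias) - z_next
            bias -= lr * 2 * err  # one gradient step on err**2 per replan
        z = z_next
    return abs(goal - z), bias


frozen_gap, _ = run(adapt=False)
adaptive_gap, learned_bias = run(adapt=True)
print(f"frozen gap={frozen_gap:.3f}, adaptive gap={adaptive_gap:.3f}, bias={learned_bias:.3f}")
```

With `adapt=False` the frozen model settles short of the goal because it never accounts for the drift. With one gradient step per replan, the bias estimate moves toward -0.3 and the planner compensates.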

Evaluation and Benchmarking · Agent and Tool Ecosystem · AdaJEPA · AdaJEPA: An Adaptive Latent World Model

Related guides (2)

Evaluation and Benchmarking · Topic guide

AI Evaluation and Benchmarking: From Leaderboards to the Limits of Measurement


Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer


Related events (8)

arXiv · cs.CL · 3d ago

WorldEvolver: Self-Evolving World Models for LLM Agent Planning via Test-Time Memory Revision

Researchers introduce WorldEvolver, a framework that equips LLM agents with self-improving world models that revise their context at deployment time without updating model parameters. The system combines episodic memory (retrieval-based simulation), semantic memory (heuristic rule extraction from prediction errors), and selective foresight (confidence-based filtering). Evaluated on ALFWorld and ScienceWorld benchmarks, WorldEvolver achieves state-of-the-art world model prediction accuracy and improved downstream agent success rates across three backbone models. The work addresses a key challenge in long-horizon agent planning: unreliable foresight that can degrade rather than improve decision-making.
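As a rough illustration of two of the three components (episodic, retrieval-based simulation and confidence-filtered foresight), the sketch below keeps a store of past transitions and predicts by nearest-neighbour lookup. The set-of-facts state encoding, the Jaccard similarity, the 0.5 threshold, and every class and method name are assumptions for illustration. Semantic-memory rule extraction is not modelled. Adaptation happens only by appending to memory, matching the summary's point that no model parameters are updated.

```python
from dataclasses import dataclass, field


def jaccard(a, b):
    # Overlap between two sets of state facts, in [0, 1].
    return len(a & b) / len(a | b) if a | b else 1.0


@dataclass
class EpisodicWorldModel:  # hypothetical name
    threshold: float = 0.5
    memory: list = field(default_factory=list)  # (state, action, outcome) triples

    def record(self, state, action, outcome):
        # Test-time memory revision: nothing is trained, the store just grows.
        self.memory.append((frozenset(state), action, outcome))

    def foresee(self, state, action):
        # Retrieval-based simulation: reuse the most similar past transition.
        state = frozenset(state)
        candidates = [(jaccard(state, s), o) for s, a, o in self.memory if a == action]
        if not candidates:
            return None, 0.0
        conf, outcome = max(candidates, key=lambda c: c[0])
        # Selective foresight: withhold predictions below the confidence threshold.
        return (outcome if conf >= self.threshold else None), conf


wm = EpisodicWorldModel(threshold=0.5)
wm.record({"at:kitchen", "holding:nothing"}, "take apple", "holding:apple")
prediction, confidence = wm.foresee({"at:kitchen", "holding:nothing", "door:open"}, "take apple")
```

Returning `None` instead of a low-confidence guess lets the agent fall back to acting without simulated foresight, which is one way to keep unreliable predictions from degrading decisions.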

Evaluation and Benchmarking · Agent and Tool Ecosystem · ALFWorld · AgentBoard · Word2World

Berkeley AI Research (BAIR) Blog · 1mo ago

GRASP: Gradient-based Planning for World Models at Longer Horizons

Researchers from Berkeley, Meta, and collaborators introduce GRASP, a gradient-based planner designed to make long-horizon planning with learned world models more robust. The method addresses three core failure modes: ill-conditioned computation graphs from backpropagation through time, non-greedy loss landscapes with many local minima, and brittle gradients through high-dimensional vision models. GRASP lifts trajectory optimization into virtual states for parallel optimization across time, injects stochasticity into state iterates for exploration, and reshapes gradients to avoid problematic state-input gradient paths. The work is positioned in the context of scaling world models toward general-purpose simulators usable for control and planning.
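The "virtual states" idea resembles classic collocation. Instead of rolling actions through the model and backpropagating through time, intermediate states become free variables tied together by a dynamics penalty, so each time step's gradient can be computed independently. The toy below uses 1-D linear dynamics, hand-derived gradients, and Gaussian noise on the state iterates that decays over iterations. It is a hypothetical illustration of those two ingredients only. It is not GRASP's algorithm and omits its gradient-reshaping step.

```python
import random

# Hypothetical toy problem: x_{t+1} = x_t + u_t, start at 0, reach GOAL at t = HORIZON.
HORIZON, GOAL = 5, 5.0


def dynamics(x, u):
    return x + u


def optimize(iters=2000, lr=0.05, noise0=0.3, decay=0.99, seed=0):
    rng = random.Random(seed)
    u = [0.0] * HORIZON           # actions u_0 .. u_{T-1}
    x = [0.0] * (HORIZON + 1)     # x[0] fixed; x[1..T] are free "virtual" states
    for k in range(iters):
        # Loss = sum_t r_t**2 + (x_T - GOAL)**2, with residuals r_t computed for
        # all t at once instead of through a sequential rollout.
        r = [x[t + 1] - dynamics(x[t], u[t]) for t in range(HORIZON)]
        gu = [-2 * r[t] for t in range(HORIZON)]
        gx = [0.0] * (HORIZON + 1)
        for t in range(HORIZON):
            gx[t + 1] += 2 * r[t]  # x_{t+1} as the "next" state of residual t
            if t > 0:
                gx[t] -= 2 * r[t]  # x_t as the "current" state of residual t
        gx[HORIZON] += 2 * (x[HORIZON] - GOAL)  # terminal goal cost
        sigma = noise0 * decay ** k  # exploration noise on states, annealed to ~0
        for t in range(HORIZON):
            u[t] -= lr * gu[t]
        for t in range(1, HORIZON + 1):
            x[t] -= lr * gx[t] + rng.gauss(0.0, sigma)
    return u


def rollout(actions, x0=0.0):
    # Execute the optimized actions through the real dynamics.
    x = x0
    for a in actions:
        x = dynamics(x, a)
    return x


plan = optimize()
print(rollout(plan))
```

Because the dynamics only enter as per-step penalties, a feasible, goal-reaching plan emerges once the residuals are driven to zero, which is checked by rolling the actions through the true dynamics.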

Long Context Evolution · Frontier Model Releases · Mike Rabbat · backpropagation through time · Meta AI

arXiv · cs.CL · 4d ago

Paper argues LLMs are a degenerate special case of world models, maps continuous spectrum from NTP to JEPA

A new arXiv preprint reframes the LLM-vs-world-model debate by arguing that LLMs are a degenerate special case of world models rather than a fundamentally different paradigm, with the state space being token sequences and the only action being token appending. The paper maps a continuous spectrum from next-token prediction through multi-token prediction, future-summary prediction, and next-latent prediction up to JEPA-style architectures. It identifies two open research challenges in moving along this spectrum: the data cliff from self-supervised text to action-labeled environments, and whether transformers generalize to continuous-state prediction or require a new architectural primitive. The work directly engages with Yann LeCun's 2022 argument that general intelligence requires abandoning autoregressive prediction.
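The paper's framing can be made concrete in a few lines. Below, the "world model" transition for an LLM is simply appending a token to the sequence, and the prediction targets along the described spectrum are written as functions of a token sequence. The `summarize` and `encode` callables are hypothetical placeholders, since the summary does not say which summary or encoder each method uses. The sketch therefore only shows *what* is predicted at each point on the spectrum, not how any specific method computes it.

```python
from typing import Callable, Sequence, Tuple

Tokens = Tuple[int, ...]


def llm_world_step(state: Tokens, action: int) -> Tokens:
    # Degenerate world model: the state is the token sequence and the only
    # action is appending a token, so the transition is known and deterministic.
    return state + (action,)


# Prediction targets at position t, moving along the spectrum (sketch).
def ntp_target(seq: Sequence[int], t: int) -> int:
    return seq[t + 1]  # next-token prediction


def mtp_target(seq: Sequence[int], t: int, k: int) -> Sequence[int]:
    return seq[t + 1 : t + 1 + k]  # multi-token prediction


def future_summary_target(seq: Sequence[int], t: int, summarize: Callable):
    return summarize(seq[t + 1 :])  # hypothetical summary of the whole future


def next_latent_target(seq: Sequence[int], t: int, encode: Callable):
    return encode(seq[: t + 2])  # hypothetical encoding of the next state
```

Moving down the list, the target shifts from an observed symbol toward a learned representation, which is where the summary's open questions about action-labeled data and continuous-state prediction arise.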

From Tokens to States: LLMs as a Special Case of World Models and the Continuous Path Beyond · Yann LeCun · JEPA

arXiv · cs.AI · 16d ago

Looped World Models introduce iterative latent depth as a new scaling axis for world simulation

A new arXiv preprint introduces Looped World Models (LoopWM), a parameter-shared transformer architecture that iteratively refines latent environment states to achieve up to 100x parameter efficiency over conventional world models. The approach uses adaptive computation to scale depth dynamically per prediction step, addressing the tension between long-horizon simulation fidelity and deployment cost. The authors position iterative latent depth as a new scaling axis orthogonal to model size and training data.
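A parameter-shared, iteratively refined latent with adaptive depth can be sketched as a fixed-point loop. The same block, with the same weights, is applied repeatedly to the latent state, and the loop stops as soon as the update becomes small. The 2-D latent, the weight values, the tanh block, and the tolerance-based halting rule are all assumptions, chosen so the block is a contraction and the loop converges. LoopWM's actual block and halting criterion may differ.

```python
import math

# One shared set of weights, reused on every loop iteration (hypothetical values).
W = [[0.4, 0.1], [0.0, 0.3]]
B = [0.2, -0.1]


def shared_block(z, obs):
    # z' = tanh(W z + obs + B); tanh is 1-Lipschitz and W's max row sum is 0.5,
    # so each application at least halves the distance between latents.
    return [math.tanh(sum(W[i][j] * z[j] for j in range(2)) + obs[i] + B[i]) for i in range(2)]


def looped_predict(z, obs, tol=1e-4, max_loops=32):
    # Adaptive depth: loop until the refinement stops changing the latent.
    for loops in range(1, max_loops + 1):
        z_new = shared_block(z, obs)
        delta = max(abs(a - b) for a, b in zip(z_new, z))
        z = z_new
        if delta < tol:
            break
    return z, loops


obs = [1.0, -0.5]
z_star, n_cold = looped_predict([0.0, 0.0], obs)  # far from the fixed point: many loops
z_again, n_warm = looped_predict(z_star, obs)     # already converged: halts after one loop
```

Depth here scales with how far the latent is from convergence rather than with parameter count, which is the sense in which iterative depth acts as a separate scaling axis.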

Training Infrastructure · Frontier Model Releases · Looped World Models · LoopWM

arXiv · cs.LG · 1mo ago

ProtoAda: Prototype-Guided Adaptive Adapter Expansion for Multimodal Continual Instruction Tuning

ProtoAda is a new framework for Multimodal Continual Instruction Tuning (MCIT) that addresses a key failure mode in sparse Mixture-of-LoRA-Experts architectures: image-text similarity routing is format-blind and incorrectly merges tasks with similar semantics but different output structures (e.g., coordinate prediction vs. VQA). The method introduces format-aware task prototypes to guide both routing and adapter expansion, then consolidates compatible updates geometrically to reuse and refine existing parameters. Experiments across multiple benchmarks show improved performance, particularly on tasks whose answer formats are vulnerable to corruption by sequential fine-tuning.
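The routing failure and its fix can be shown with a toy router. Each prototype stores a semantic embedding, an answer format, and an adapter id. A format mismatch subtracts a penalty from the similarity score, and a request with no sufficiently close prototype allocates a new adapter. The penalty, threshold, 2-D embeddings, and format labels are hypothetical, and ProtoAda's geometric consolidation step is not modelled.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class PrototypeRouter:  # hypothetical name
    threshold: float = 0.8
    format_penalty: float = 1.0  # 0.0 reproduces format-blind routing
    prototypes: list = field(default_factory=list)  # (embedding, answer_format, adapter_id)

    def route(self, embedding, answer_format):
        best_id, best_score = None, -math.inf
        for proto_emb, proto_fmt, adapter_id in self.prototypes:
            score = cosine(embedding, proto_emb)
            if proto_fmt != answer_format:
                score -= self.format_penalty  # similar semantics, different output structure
            if score > best_score:
                best_id, best_score = adapter_id, score
        if best_id is None or best_score < self.threshold:
            # Expansion: no compatible prototype, so allocate a new adapter.
            best_id = len(self.prototypes)
            self.prototypes.append((embedding, answer_format, best_id))
        return best_id
```

In this toy, a grounding request whose embedding is close to an existing VQA prototype gets its own adapter when formats are checked, but is merged into the VQA adapter when `format_penalty=0.0`.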

Agent and Tool Ecosystem · Alignment and RLHF · Multimodal Large Language Models · ProtoAda · LoRA

arXiv · cs.CL · 10d ago

Adaptive Data Scheduling (ADS) improves LLM reinforcement learning post-training by 5.2% over GRPO

Researchers propose Adaptive Data Scheduling (ADS), a dual-level framework that replaces uniform sampling in RL post-training with an adaptive distribution over semantic clusters and policy-boundary sample selection. Evaluated across three LLMs and seven reasoning benchmarks, ADS improves average accuracy by 5.2% over GRPO and generalizes across RL objectives. The method addresses a structural limitation in standard RL post-training pipelines by accounting for the semantic structure of the data and the policy's evolving capability during training.
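One plausible reading of "policy-boundary" selection, and the assumption made here, is a preference for prompts the current policy solves about half the time: under GRPO's group-normalized advantages, a group whose rollouts all succeed or all fail has zero advantage for every sample. The sketch scores prompts by p(1 - p), turns per-cluster mean scores into a softmax sampling distribution, and samples within the chosen cluster in proportion to the same score. The pool format, temperature, and scoring rule are illustrative assumptions, not ADS's published rules.

```python
import math
import random
from collections import defaultdict


def boundary_score(pass_rate):
    # p * (1 - p): zero when every rollout succeeds or fails, maximal at p = 0.5.
    return pass_rate * (1.0 - pass_rate)


def cluster_weights(pool, temperature=0.1):
    # pool: list of (prompt_id, cluster, estimated_pass_rate) tuples (hypothetical format).
    scores = defaultdict(list)
    for _, cluster, p in pool:
        scores[cluster].append(boundary_score(p))
    logits = {c: sum(v) / len(v) / temperature for c, v in scores.items()}
    top = max(logits.values())
    exp = {c: math.exp(logit - top) for c, logit in logits.items()}  # stable softmax
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}


def schedule_batch(pool, batch_size, rng, temperature=0.1):
    weights = cluster_weights(pool, temperature)
    clusters = list(weights)
    batch = []
    for _ in range(batch_size):
        # Level 1: choose a semantic cluster from the adaptive distribution.
        cluster = rng.choices(clusters, weights=[weights[c] for c in clusters])[0]
        members = [item for item in pool if item[1] == cluster]
        # Level 2: favour prompts near the policy boundary; the small constant
        # keeps fully solved or fully failed clusters sampleable.
        pick = rng.choices(members, weights=[boundary_score(m[2]) + 1e-6 for m in members])[0]
        batch.append(pick[0])
    return batch
```

Pass rates would be re-estimated from fresh rollouts as training proceeds, so the weights track the policy's changing capability rather than staying fixed.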

Evaluation and Benchmarking · Alignment and RLHF · Adaptive Data Scheduling · GRPO (Group Relative Policy Optimization)

arXiv · cs.AI · 10d ago

RECALL: Active continual learning for Vision-Language-Action models via uncertainty-guided recovery data collection

Researchers propose RECALL, an active continual learning paradigm for Vision-Language-Action (VLA) robot models that uses uncertainty-guided data collection to target states where the policy struggles, rather than passively collecting demonstrations after failures. The paper demonstrates improved fine-tuning efficiency over passive imitation learning but identifies catastrophic forgetting as a key challenge when incorporating recovery data. The authors evaluate continual learning mitigations including replay-based data mixing and elastic weight consolidation, characterizing tradeoffs between plasticity and retention in large autoregressive robot policies.
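The two mechanisms in the summary can be sketched generically: choosing where to collect recovery data by uncertainty, and limiting forgetting with replay or EWC. Ensemble disagreement is used here as the uncertainty measure. That is an assumption, since the summary does not say how RECALL estimates uncertainty. The EWC penalty follows the standard quadratic form from Kirkpatrick et al. (2017), with a diagonal Fisher estimate supplied by the caller.

```python
import random
from statistics import pvariance


def ensemble_uncertainty(state, ensemble):
    # Disagreement (population variance) among ensemble members' predicted actions.
    return pvariance([policy(state) for policy in ensemble])


def select_states_for_recovery(states, ensemble, k):
    # Actively request recovery demonstrations where the policy is least certain.
    return sorted(states, key=lambda s: ensemble_uncertainty(s, ensemble), reverse=True)[:k]


def mix_with_replay(new_data, old_data, replay_fraction, rng):
    # Replay-based mixing: roughly `replay_fraction` of the result comes from old data.
    n_replay = round(len(new_data) * replay_fraction / (1.0 - replay_fraction))
    return new_data + rng.sample(old_data, min(n_replay, len(old_data)))


def ewc_penalty(params, anchor, fisher, lam):
    # Elastic weight consolidation: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.
    return 0.5 * lam * sum(f * (p - a) ** 2 for p, a, f in zip(params, anchor, fisher))
```

Raising `replay_fraction` or `lam` favours retention at the cost of plasticity on the new recovery data, which is the tradeoff the summary says the authors characterize.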

Agent and Tool Ecosystem · Elastic Weight Consolidation · RECALL: Recovery Experience Collection for Active Lifelong Learning in Vision-Language-Action Models

arXiv · cs.CL · 18d ago

RePro: Retrospective Progress-Aware Self-Refinement for LLM Agent Training

Researchers introduce RePro (Retrospective Progress-Aware Training), a framework addressing the gap between step-wise RL optimization and metacognitive task-progress awareness in LLM agents. The approach uses a forward-then-reflect rollout paradigm where agents execute actions online and then retrospectively assess step-wise progress given the completed trajectory and known outcome. Evaluated on WebShop, ALFWorld, and Sokoban, RePro achieves up to 12% absolute success rate gains over baseline Qwen-family models without requiring continuous external supervision.
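The forward-then-reflect structure can be sketched with a toy task. Phase 1 acts online with no progress labels. Phase 2 runs only after the episode ends and scores each step from the complete trajectory, whose final state also encodes the outcome. In the paper the agent itself performs the retrospective assessment; here a hand-written progress function on a number line stands in for it. Turning progress into per-step scores by differencing is also an assumption for illustration.

```python
def run_forward(policy, x0, goal, max_steps):
    # Phase 1 (forward): act online; no progress labels exist yet.
    states = [x0]
    x = x0
    for _ in range(max_steps):
        if x == goal:
            break
        x += policy(x, goal)
        states.append(x)
    return states, x == goal


def reflect(states, goal):
    # Phase 2 (reflect): with the whole trajectory known, score each visited
    # state in hindsight (toy progress measure standing in for the agent).
    start_gap = abs(goal - states[0]) or 1
    progress = [max(0.0, 1.0 - abs(goal - s) / start_gap) for s in states]
    # Step-wise signal: how much assessed progress each action added.
    return [after - before for before, after in zip(progress, progress[1:])]


def toward(x, goal):
    return 1 if goal > x else -1


states, success = run_forward(toward, x0=0, goal=3, max_steps=5)
step_scores = reflect(states, goal=3)
```

Because labels are produced after the fact, no external supervisor has to score steps while the agent is acting.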

Agent and Tool Ecosystem · Alignment and RLHF · ALFWorld · Sokoban · RePro