Entity · benchmark

LIBERO

benchmarkactivelibero-24b99650·5 events·first seen May 18, 2026

Aliases: LIBERO

Co-occurring entities

More like this (12)

Libero-long PRONTO RL² Merlini Verdi LibreChat LIME NLLB LEX-EC LAVE LexNeo-Bench RLVR

Recent events (5)

5arXiv · cs.AI·2d ago·source ↗

DLAM: Distributional latent action model with temporal constraints for robot policy learning

DLAM introduces a distributional latent-action model that represents robot action transitions as diagonal Gaussians, enabling structured extraction of action priors from action-free video data. The approach uses normalized composition and reversal over equal-gap triplets to constrain both mean and variance, addressing error propagation in recursive composition that affects prior deterministic methods. A flow-matching policy jointly generates mean transition sequences and robot actions, and the method outperforms latent-action baselines on MetaWorld MT50, LIBERO, and real-world manipulation tasks under a controlled π₀ transfer protocol.

Agent and Tool Ecosystem Multimodal Progress DLAM LIBERO π₀.₅ +1 more

5arXiv · cs.LG·Jun 10, 2026·source ↗

TREAD: VLM-based re-labelling framework improves robot policy generalization via dataset augmentation

TREAD (Task Robustness via Re-Labelling Vision-Action Robot Data) is a scalable framework that uses pretrained Vision-Language Models to augment existing robotics datasets without new data collection. The approach decomposes demonstrations into sub-tasks, segments videos accordingly, and generates linguistically diverse instruction labels, enriching language-action pair diversity. Evaluations on the LIBERO benchmark show improved generalization to novel tasks and goals, addressing a key limitation of current robot learning policies.

Agent and Tool Ecosystem Multimodal Progress TREAD LIBERO

7arXiv · cs.CL·May 29, 2026·source ↗

Qwen-VLA: Unified Vision-Language-Action Model Across Robot Tasks, Environments, and Embodiments

Alibaba's Qwen team presents Qwen-VLA, a unified embodied foundation model that extends the Qwen vision-language stack to continuous action and trajectory generation via a DiT-based action decoder. The model is jointly pretrained on diverse data spanning manipulation trajectories, egocentric demonstrations, synthetic simulation, and navigation data, with embodiment-aware prompt conditioning to support multiple robot platforms. A unified action-and-trajectory prediction framework covers manipulation, navigation, and trajectory prediction tasks. Benchmarks show strong results: 97.9% on LIBERO, 73.7% on Simpler-WidowX, 69.0% OSR on R2R navigation, and 76.9% average OOD success in real-world ALOHA experiments.

Frontier Model Releases Evaluation and Benchmarking Qwen-VLA DOMINO R2R +10 more

7arXiv · cs.LG·May 28, 2026·source ↗

Ω-QVLA: Training-Free W4A4 Quantization for Full Vision-Language-Action Models Including Diffusion Action Heads

Omega-QVLA is a post-training quantization framework that compresses both the LLM backbone and the diffusion-based action head of VLA models to uniform W4A4 precision without mixed-precision schemes or fine-tuning. It combines composite SVD-Hadamard rotation for weight energy equalization with per-step DiT activation scaling to handle dynamic-range drift across denoising steps. On the LIBERO benchmark, it achieves 98.0% and 87.8% task success on Pi 0.5 and GR00T N1.5 respectively—matching or exceeding FP16 baselines—while reducing static memory footprint by 71.3%. Real-world manipulation experiments confirm the approach generalizes beyond simulation.

Inference Economics Agent and Tool Ecosystem Pi 0.5 SVD-Hadamard rotation LIBERO +6 more

5The Batch·May 18, 2026·source ↗

Sony and University Researchers Train Robots To Learn Without Catastrophic Forgetting

Researchers from UT Austin, UCLA, Nanyang Technological University, and Sony developed a sequential fine-tuning recipe combining LoRA and on-policy reinforcement learning (GRPO) to reduce catastrophic forgetting in vision-language-action (VLA) models for robotics. Applied to the OpenVLA-OFT model on the LIBERO benchmark, the method achieved 81.2% success on libero-spatial tasks with near-zero forgetting (0.3 percentage point drop), outperforming established continual learning baselines including Dark Experience Replay and Elastic Weight Consolidation. The approach requires no replay of prior task data and also showed modest generalization to unseen tasks. The authors note the method has not yet been tested outside robotics simulation contexts.

Evaluation and Benchmarking Agent and Tool Ecosystem Elastic Weight Consolidation Dark Experience Replay University of California Los Angeles +11 more