Entity · benchmark

VLABench

benchmark · active · vlabench-790884b9 · 1 event · first seen Jun 11, 2026

Aliases: VLABench

Co-occurring entities

RoboMME · Franka · DROID · DIRECT

More like this (12)

LVBench · LabBench · LiveBench · LabVLA · VBench · LiTBench · MVBench · AdvBench · VerifierBench · Lambda Labs · VR-Bench · LawBench

Recent events (1)

5 · arXiv · cs.AI · Jun 11, 2026

DIRECT: Adaptive test-time compute routing for embodied VLM planners

Researchers introduce DIRECT, a routing framework that dynamically allocates test-time compute for Vision-Language Models acting as embodied planners, using multimodal scene context to decide, per prompt, how much compute to spend. Experiments on the VLABench and RoboMME benchmarks show that different scaling axes (chain-of-thought depth, model size, memory history) yield qualitatively distinct gains, and that naive uniform scaling is wasteful. On a physical Franka arm, DIRECT matches or exceeds a stronger model's success rate at up to 65% lower average latency, improving the success-cost Pareto frontier.
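
To make the idea of per-prompt compute routing concrete, here is a minimal Python sketch. It is not DIRECT's actual method: the configuration names, latencies, success estimates, and the `cost_weight` trade-off are all hypothetical, and a real router would predict success from multimodal scene context rather than take it as input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeConfig:
    """One point on the scaling axes (hypothetical names and latencies)."""
    name: str
    latency_s: float  # assumed average latency in seconds


def route(predicted_success, configs, cost_weight):
    """Pick the config with the best predicted success minus weighted latency.

    predicted_success maps config name -> estimated success probability.
    Here it is supplied by hand; a learned predictor is assumed in practice.
    """
    return max(
        configs,
        key=lambda c: predicted_success[c.name] - cost_weight * c.latency_s,
    )


CONFIGS = [
    ComputeConfig("small_no_cot", 1.0),
    ComputeConfig("small_long_cot", 3.0),
    ComputeConfig("large_no_cot", 4.0),
]

# Illustrative success estimates for an easy and a hard prompt.
easy = {"small_no_cot": 0.90, "small_long_cot": 0.92, "large_no_cot": 0.93}
hard = {"small_no_cot": 0.30, "small_long_cot": 0.70, "large_no_cot": 0.60}

easy_choice = route(easy, CONFIGS, cost_weight=0.05)
hard_choice = route(hard, CONFIGS, cost_weight=0.05)
print(easy_choice.name, hard_choice.name)
```

With these made-up numbers the easy prompt is routed to the cheapest configuration and the hard prompt to a longer chain of thought, which is the sense in which uniform scaling would spend compute where it buys little.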

Inference Economics · Agent and Tool Ecosystem · RoboMME · Franka · DROID · +2 more