Almanac
VLABench

benchmark · active · provisional · vlabench-790884b9 · 1 event · first seen 6d ago

Aliases: VLABench

Recent events (1)

5 · arXiv · cs.AI · 6d ago

DIRECT: Adaptive test-time compute routing for embodied VLM planners

Researchers introduce DIRECT, a routing framework that dynamically allocates test-time compute for Vision-Language Models acting as embodied planners, using multimodal scene context to decide per-prompt how much compute to spend. Experiments on VLABench and RoboMME benchmarks show that different scaling axes (chain-of-thought depth, model size, memory history) yield qualitatively distinct gains, and that naive uniform scaling is wasteful. On a physical Franka arm, DIRECT matches or exceeds a stronger model's success rate at up to 65% lower average latency, improving the success-cost Pareto frontier.