technique
spatio-temporal dynamic reasoning
techniqueactive
spatio-temporal-dynamic-reasoning-3a593be1·1 events·first seen 29d agoAliases: spatio-temporal dynamic reasoning
Co-occurring entities
More like this (12)
Recent events (1)
STT-Arena: Benchmark for Adaptive Replanning Under Spatio-Temporal Dynamics in Tool-Using LLMs
STT-Arena is a new benchmark of 227 interactive tasks designed to evaluate LLMs' ability to detect mid-task disruptions and replan under spatio-temporal dynamics, covering nine conflict types and four solvability levels. Evaluation of frontier models including Claude-4.6-Opus shows less than 40% overall accuracy, revealing fundamental limitations in dynamic reasoning. The authors identify three recurring failure modes—Stale-State Execution, Misdiagnosis of Dynamic Triggers, and Missing Post-Adaptation Verification—and propose an iterative trajectory refinement technique combined with online RL to train STT-Agent-4B, a 4B-parameter model that outperforms frontier LLMs on the benchmark.