Entity · model

STT-Agent-4B

model · active · stt-agent-4b-1550100d · 1 event · first seen May 19, 2026

Aliases: STT-Agent-4B

Co-occurring entities

Claude Opus 4.6 · iterative trajectory refinement · spatio-temporal dynamic reasoning · STT-Arena · Reinforcement Learning · Anthropic

More like this (12)

Agent-S · SST-2 · HTV-Agent · GPT-4V · BLEU-4 · RD-Agent · Gemma-4 E4B-it · Computer-Using Agent · GPT-4b micro · PatchTST · GS-Agent · ProtST

Recent events (1)

arXiv · cs.CL · May 19, 2026

STT-Arena: Benchmark for Adaptive Replanning Under Spatio-Temporal Dynamics in Tool-Using LLMs

STT-Arena is a new benchmark of 227 interactive tasks designed to evaluate whether LLMs can detect mid-task disruptions and replan under spatio-temporal dynamics. It covers nine conflict types and four solvability levels. Evaluated frontier models, including Claude Opus 4.6, reach less than 40% overall accuracy, revealing fundamental limitations in dynamic reasoning. The authors identify three recurring failure modes (Stale-State Execution, Misdiagnosis of Dynamic Triggers, and Missing Post-Adaptation Verification) and propose an iterative trajectory refinement technique combined with online RL to train STT-Agent-4B, a 4B-parameter model that outperforms frontier LLMs on the benchmark.

Evaluation and Benchmarking · Agent and Tool Ecosystem · Claude Opus 4.6 · iterative trajectory refinement · spatio-temporal dynamic reasoning