Almanac
product

AgenticSTS

product · active · provisional · agenticsts-a8017da9 · 1 event · first seen 16h ago

Aliases: AgenticSTS

Recent events (1)

5 · arXiv · cs.AI · 16h ago

AgenticSTS: Bounded-memory testbed for studying long-horizon LLM agent decisions in Slay the Spire 2

Researchers introduce AgenticSTS, a testbed for studying long-horizon LLM agents under a bounded-memory contract where each decision is assembled from typed retrieval rather than by appending to a raw cross-decision transcript. The system is instantiated in Slay the Spire 2, a stochastic deck-building game requiring hundreds of sequential decisions, chosen because frontier LLMs currently win zero games at the lowest difficulty against a 16% human baseline. Ablation experiments show enabling a strategic skill layer improves win rate from 3/10 to 6/10, though sample sizes are too small for statistical significance. The authors release 298 completed trajectories, memory snapshots, and analysis scripts as a reusable methodology for isolating how explicit memory layers affect agent performance.
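
To make the sample-size caveat concrete, the sketch below runs a two-sided Fisher's exact test on the reported 3/10 versus 6/10 win counts. The summary does not say which test, if any, the authors used; Fisher's exact test is chosen here only because it is a common default for small 2x2 count tables, and the function name `fisher_exact_two_sided` is hypothetical. It also assumes the two ablation arms are independent sets of 10 runs each.

```python
from math import comb


def fisher_exact_two_sided(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    """Two-sided Fisher's exact test for two independent win/loss samples.

    Illustrative helper, not taken from the AgenticSTS release.
    """
    total_wins = wins_a + wins_b
    n = n_a + n_b
    denom = comb(n, n_a)

    def prob(k: int) -> float:
        # Hypergeometric probability that arm A holds exactly k of the wins.
        return comb(total_wins, k) * comb(n - total_wins, n_a - k) / denom

    lo = max(0, n_a - (n - total_wins))
    hi = min(n_a, total_wins)
    p_obs = prob(wins_a)
    # Sum every table at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Win counts from the summary: 3/10 without the skill layer, 6/10 with it.
p_value = fisher_exact_two_sided(3, 10, 6, 10)
print(f"two-sided p = {p_value:.3f}")  # prints "two-sided p = 0.370"
```

Under these assumptions the p-value is about 0.37, far above the conventional 0.05 threshold, which is consistent with the statement that a doubling of the win rate at n = 10 per arm cannot be distinguished from chance.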