benchmark
ScienceWorld
benchmark · active · provisional
scienceworld-18ad90f9 · 1 event · first seen 10h ago
Aliases: ScienceWorld
Recent events (1)
WorldEvolver: Self-Evolving World Models for LLM Agent Planning via Test-Time Memory Revision
Researchers introduce WorldEvolver, a framework that equips LLM agents with self-improving world models that revise their context at deployment time without updating model parameters. The system combines three components: episodic memory (retrieval-based simulation), semantic memory (heuristic rule extraction from prediction errors), and selective foresight (confidence-based filtering). Evaluated on the ALFWorld and ScienceWorld benchmarks, WorldEvolver achieves state-of-the-art world model prediction accuracy and improves downstream agent success rates across three backbone models. The work addresses a key challenge in long-horizon agent planning: unreliable foresight that can degrade rather than improve decision-making.
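To make the three components concrete, here is a minimal toy sketch. It is not the WorldEvolver implementation. It only illustrates the general idea of retrieving past transitions, recording a rule when a prediction turns out wrong, and withholding low-confidence predictions. The class name `ToyWorldModel`, the token-overlap similarity, the rule format, and the 0.5 threshold are all assumptions made for illustration; the summary does not specify any of them.

```python
from dataclasses import dataclass, field


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class ToyWorldModel:
    """Hypothetical toy model; all names and defaults are illustrative."""
    # Episodic memory: (state, action, observed next state) triples.
    episodes: list = field(default_factory=list)
    # Semantic memory: action -> note extracted from a prediction error.
    rules: dict = field(default_factory=dict)
    # Assumed confidence cutoff for selective foresight.
    threshold: float = 0.5

    def predict(self, state: str, action: str):
        """Return (predicted next state, confidence) from the closest episode."""
        best, best_score = None, 0.0
        for past_state, past_action, next_state in self.episodes:
            if past_action != action:
                continue
            score = jaccard(past_state, state)
            if best is None or score > best_score:
                best, best_score = next_state, score
        return best, best_score

    def foresight(self, state: str, action: str):
        """Selective foresight: expose a prediction only when confident."""
        pred, conf = self.predict(state, action)
        if pred is None or conf < self.threshold:
            return None
        return pred

    def revise(self, state: str, action: str, observed: str) -> None:
        """Test-time revision: log the transition and note any error."""
        pred, _ = self.predict(state, action)
        if pred is not None and pred != observed:
            self.rules[action] = f"'{action}' may yield '{observed}', not '{pred}'"
        self.episodes.append((state, action, observed))


# Usage: memory changes at deployment time; no parameters are updated.
wm = ToyWorldModel()
wm.revise("kitchen with closed fridge", "open fridge", "fridge is open")
print(wm.foresight("kitchen with closed fridge", "open fridge"))  # similar state
print(wm.foresight("greenhouse with a plant", "open fridge"))     # dissimilar state
```

In this sketch the second query returns no prediction because the retrieved episode is too dissimilar. That mirrors the summary's point that unreliable foresight is better withheld than passed to the planner.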