Almanac
technique

EvoMem

technique · active · provisional · evomem-6bb9e6fd · 1 event · first seen 5d ago

Aliases: EvoMem

Recent events (1)

5 · arXiv · cs.CL · 5d ago

EvoArena benchmark and EvoMem memory paradigm for LLM agents in dynamic environments

Researchers introduce EvoArena, a benchmark suite that evaluates LLM agents in dynamic environments by modeling changes as progressive update sequences across terminal, software, and social domains. Alongside it, they propose EvoMem, a patch-based memory paradigm that records memory evolution as structured update histories to help agents reason about environmental change. Current agents score only 39.6% average accuracy on EvoArena, while EvoMem yields consistent gains on EvoArena and also improves performance on GAIA and LoCoMo benchmarks. The work highlights a significant gap between static-benchmark performance and real-world dynamic deployment requirements.
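
To make "patch-based memory" concrete, here is a minimal sketch of what recording memory evolution as a structured update history could look like. It is an illustration, not the authors' implementation: the names `Patch`, `PatchMemory`, `update`, `value_at`, and `render_history` are hypothetical, and the sketch assumes patches are appended in step order and that each memory entry is a simple key/value pair.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Patch:
    """One recorded change to a memory entry (hypothetical structure)."""
    step: int            # when the change was observed
    key: str             # which memory entry changed
    old: Optional[str]   # value before the change (None if the entry is new)
    new: Optional[str]   # value after the change (None if the entry was removed)
    note: str = ""       # free-text reason or source of the change


@dataclass
class PatchMemory:
    """Append-only log of patches; state is derived by replaying the log."""
    patches: List[Patch] = field(default_factory=list)

    def update(self, step: int, key: str, new: Optional[str], note: str = "") -> Patch:
        # Record the change instead of overwriting the previous value.
        patch = Patch(step=step, key=key, old=self.current(key), new=new, note=note)
        self.patches.append(patch)
        return patch

    def value_at(self, key: str, step: Optional[int]) -> Optional[str]:
        # Replay patches for `key` up to `step` (all patches if step is None).
        value = None
        for p in self.patches:
            if p.key == key and (step is None or p.step <= step):
                value = p.new
        return value

    def current(self, key: str) -> Optional[str]:
        return self.value_at(key, step=None)

    def history(self, key: str) -> List[Patch]:
        return [p for p in self.patches if p.key == key]

    def render_history(self, key: str) -> str:
        # Text form of the update history, e.g. for inclusion in an agent prompt.
        return "\n".join(
            f"[step {p.step}] {key}: {p.old!r} -> {p.new!r} ({p.note})"
            for p in self.history(key)
        )
```

Under these assumptions, an agent that calls `update(1, "api_endpoint", "/v1/users")` and later `update(4, "api_endpoint", "/v2/users")` can still ask what it believed at step 2 via `value_at("api_endpoint", 2)`, which an overwrite-in-place memory cannot answer.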