AgentCL

benchmark · active · provisional · agentcl-c4c6a1f4 · 1 event · first seen 15d ago

Aliases: AgentCL

Recent events (1)

6 · arXiv · cs.CL · 15d ago

AgentCL: A Rigorous Evaluation Framework for Continual Learning in Language Agents

AgentCL is a new benchmark and evaluation framework for rigorously assessing continual learning in language agents. It addresses gaps in existing benchmarks, which either focus on retrieval over long-context documents or use naive task streams with limited cross-task analysis. The framework constructs compositional task streams in which earlier sub-solutions, evidence, or workflows are intentionally reusable in later tasks, and contrasts them with naive streams to measure transfer gains. The authors also introduce MemProbe, a probing method that stores interactions, insights, and skills while filtering out unreliable experiences during consolidation. Empirical results across coding, deep research, and language understanding tasks show that the controlled streams distinguish memory design quality better than naive streams do, and that naive streams can mask memory-induced degradation.