AgentCL

benchmark · active · agentcl-c4c6a1f4 · 1 event · first seen Jun 2, 2026

Aliases: AgentCL

Co-occurring entities

MemProbe · Continual Learning · language agents · non-parametric memory

More like this (12)

BabyCL agents-cli FlashbackCL MCLASH agent cloud Agent 365 AgentMob CLI-Anything UCL ARC RealClawBench Claude Agent SDK page-agent

Recent events (1)

arXiv · cs.CL · Jun 2, 2026

AgentCL: A Rigorous Evaluation Framework for Continual Learning in Language Agents

AgentCL is a new benchmark and evaluation framework designed to rigorously assess continual learning in language agents, addressing gaps in existing benchmarks that focus on retrieval over long-context documents or use naive task streams with limited cross-task analysis. The framework constructs compositional task streams where earlier sub-solutions, evidence, or workflows are intentionally reusable in later tasks, contrasting them with naive streams to measure transfer gains. The authors also introduce MemProbe, a probing method that stores interactions, insights, and skills while filtering unreliable experiences during consolidation. Empirical results across coding, deep research, and language understanding tasks show that controlled streams better distinguish memory design quality, and that naive streams can mask memory-induced degradation.
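As a rough illustration of why contrasting the two stream types matters, the sketch below compares a memory-equipped agent against a memory-free baseline on a compositional stream and on a naive one. The summary does not state the paper's actual metric, so the `memory_gain` function, the `StreamResult` type, and all numbers here are hypothetical and chosen only to show the shape of the comparison.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class StreamResult:
    """Per-task success flags (1 = solved, 0 = failed) for one task stream."""
    with_memory: list[int]
    without_memory: list[int]


def memory_gain(result: StreamResult) -> float:
    """Success rate with memory minus success rate without it.

    Hypothetical metric for illustration; not taken from the AgentCL paper.
    """
    return mean(result.with_memory) - mean(result.without_memory)


# Made-up outcomes, not results from the paper.
# Compositional stream: later tasks can reuse earlier sub-solutions.
compositional = StreamResult(with_memory=[1, 1, 1, 0], without_memory=[1, 0, 0, 0])
# Naive stream: tasks are unrelated, so memory has little to transfer.
naive = StreamResult(with_memory=[1, 0, 1, 0], without_memory=[1, 0, 1, 0])

print(memory_gain(compositional))  # 0.5
print(memory_gain(naive))          # 0.0
```

Under these assumed numbers, the same memory design shows a clear gain on the compositional stream and none on the naive one, which is the kind of separation the summary attributes to controlled streams.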

Long Context Evolution Evaluation and Benchmarking AgentCL MemProbe Continual Learning