language agents

other · active · provisional · language-agents-0d18426e · 2 events · first seen 22d ago

Aliases: language agents

Recent events (2)

6 · arXiv · cs.CL · 15d ago

AgentCL: A Rigorous Evaluation Framework for Continual Learning in Language Agents

AgentCL is a new benchmark and evaluation framework designed to rigorously assess continual learning in language agents, addressing gaps in existing benchmarks that focus on retrieval over long-context documents or use naive task streams with limited cross-task analysis. The framework constructs compositional task streams where earlier sub-solutions, evidence, or workflows are intentionally reusable in later tasks, contrasting them with naive streams to measure transfer gains. The authors also introduce MemProbe, a probing method that stores interactions, insights, and skills while filtering unreliable experiences during consolidation. Empirical results across coding, deep research, and language understanding tasks show that controlled streams better distinguish memory design quality, and that naive streams can mask memory-induced degradation.
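
As a rough illustration of the stream comparison described above, the sketch below computes a transfer gain as the mean per-task score difference between a run with memory and a run without it, once for a compositional stream and once for a naive one. The function name, the score lists, and this definition of gain are assumptions made for illustration only; the summary does not specify AgentCL's actual metrics or data.

```python
from statistics import mean


def transfer_gain(with_memory, without_memory):
    """Mean per-task score difference (hypothetical metric, not AgentCL's own)."""
    if len(with_memory) != len(without_memory):
        raise ValueError("score lists must cover the same tasks")
    return mean(m - b for m, b in zip(with_memory, without_memory))


# Made-up per-task success scores (1 = solved, 0 = failed), for illustration only.
compositional = {"memory": [1, 1, 0, 1], "no_memory": [1, 0, 0, 0]}
naive = {"memory": [1, 0, 1, 0], "no_memory": [1, 0, 1, 0]}

gain_compositional = transfer_gain(compositional["memory"], compositional["no_memory"])
gain_naive = transfer_gain(naive["memory"], naive["no_memory"])

print(gain_compositional, gain_naive)
```

In this toy data the memory only helps where later tasks can reuse earlier work, so the compositional stream shows a gain while the naive stream shows none. A negative value would correspond to the memory-induced degradation the summary mentions.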

6 · arXiv · cs.AI · 22d ago

Systematic Study of Model-Generated Agent Skills Across the Full Skill Lifecycle

This paper presents a utility-grounded evaluation framework for model-generated agent skills, covering the full lifecycle of experience generation, skill extraction, and skill consumption across five agentic task domains. The authors find that while such skills are beneficial on average, they exhibit non-trivial negative transfer, and that skill utility is independent of model scale or baseline task strength. A key finding is that strong extractors are not necessarily strong consumers and vice versa. The work culminates in a 'meta-skill' that guides extraction toward utility-correlated features, consistently improving skill quality and reducing negative transfer.
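
The claim that skills can help on average while still causing negative transfer can be made concrete with a small sketch. The code below compares per-task scores with and without a skill and reports both the average change and the fraction of tasks that got worse. The function names, the scores, and both measures are hypothetical and chosen for illustration; they are not the paper's evaluation protocol.

```python
def average_utility(baseline, with_skill):
    """Mean score change from adding the skill (hypothetical measure)."""
    if not baseline or len(baseline) != len(with_skill):
        raise ValueError("need two non-empty score lists of equal length")
    return sum(s - b for b, s in zip(baseline, with_skill)) / len(baseline)


def negative_transfer_rate(baseline, with_skill):
    """Fraction of tasks where the skill lowered the score (hypothetical measure)."""
    if not baseline or len(baseline) != len(with_skill):
        raise ValueError("need two non-empty score lists of equal length")
    hurt = sum(1 for b, s in zip(baseline, with_skill) if s < b)
    return hurt / len(baseline)


# Made-up per-task scores in [0, 1], for illustration only.
baseline = [0.5, 0.75, 0.25, 1.0]
with_skill = [0.75, 0.5, 0.75, 1.0]

utility = average_utility(baseline, with_skill)
rate = negative_transfer_rate(baseline, with_skill)

print(utility, rate)
```

Reporting the two numbers side by side is the point of the sketch: a positive average can coexist with a non-zero share of tasks that the skill made worse.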