Almanac
benchmark

SkillsBench

benchmark · active · provisional · skillsbench-b1c6306e · 1 event · first seen 21d ago

Aliases: SkillsBench


Recent events (1)

5 · arXiv · cs.CL · 21d ago

MUSE-Autoskill: Self-Evolving LLM Agents via Skill Lifecycle Management

MUSE-Autoskill introduces a skill-centric agent framework where LLM agents continuously create, store, manage, evaluate, and refine reusable skills across tasks. The system adds skill-level memory that accumulates per-skill experience over time, enabling more effective reuse and cross-agent transfer. Experiments on SkillsBench show improvements in task success, efficiency, and reuse compared to static skill approaches.