SkillGenBench

benchmark · active · skillgenbench-ec4855aa · 1 event · first seen 29d ago

Aliases: SkillGenBench

Recent events (1)

arXiv · cs.AI · 29d ago

SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

SkillGenBench is a new benchmark that evaluates whether LLM agents can generate correct, reusable, and executable skills from raw repositories and documents, rather than merely use pre-provided skills. It covers two generation regimes (task-conditioned and task-agnostic) and two procedural sources (repository-grounded and document-grounded), with standardized execution-based evaluation protocols. Experiments across multiple skill-generation methods show substantial performance variation, with distinct failure modes depending on source type. The benchmark aims to establish skill generation as an independent research problem within agent systems.
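
The two regimes and two sources described above form a 2 x 2 grid of evaluation settings. The Python sketch below is only an illustration of that grid plus a toy execution-based check; it is not the benchmark's actual code or API. Every name in it (`Regime`, `Source`, `Setting`, `passes_execution_check`, the `slugify` skill) is hypothetical, and the check assumes a generated skill is a Python function that can be run against input/expected-output cases.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Regime(Enum):
    TASK_CONDITIONED = "task-conditioned"
    TASK_AGNOSTIC = "task-agnostic"


class Source(Enum):
    REPOSITORY = "repository-grounded"
    DOCUMENT = "document-grounded"


@dataclass(frozen=True)
class Setting:
    """One cell of the regime x source grid (hypothetical type)."""
    regime: Regime
    source: Source


def all_settings():
    """Enumerate every combination of generation regime and procedural source."""
    return [Setting(r, s) for r, s in product(Regime, Source)]


def passes_execution_check(skill_code, entry_point, cases):
    """Toy execution-based check, assumed for illustration only.

    Loads the generated skill, calls it on each case, and returns False if
    the skill fails to load, raises, or returns an unexpected value.
    """
    namespace = {}
    try:
        exec(skill_code, namespace)  # not sandboxed; a real harness would isolate this
        skill = namespace[entry_point]
        return all(skill(*args) == expected for args, expected in cases)
    except Exception:
        return False


settings = all_settings()

# A made-up generated skill and a made-up test case.
cases = [(("Hello World",), "hello-world")]
good_skill = "def slugify(title):\n    return '-'.join(title.lower().split())\n"
bad_skill = "def slugify(title):\n    return title\n"

ok = passes_execution_check(good_skill, "slugify", cases)
broken = passes_execution_check(bad_skill, "slugify", cases)
```

Scoring by execution rather than by text similarity means a skill only counts as correct if it actually runs and produces the expected result, which matches the summary's emphasis on executable skills.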