Entity · benchmark

SkillGenBench

benchmark · active · skillgenbench-ec4855aa · 1 event · first seen May 19, 2026

Aliases: SkillGenBench

Co-occurring entities

task-conditioned generation · task-agnostic generation · skill generation · LLM agents

More like this (12)

CompSkillBench · SkillsBench · SpecBench · BigCodeBench · SearchGen-Bench · SkillKit · CharacterBench · SpatialBench · FutureBench · skill generation · MedAgentBench · WildBench

Recent events (1)

5 · arXiv · cs.AI · May 19, 2026

SkillGenBench: Benchmarking Skill Generation Pipelines for LLM Agents

SkillGenBench is a benchmark that evaluates whether LLM agents can generate correct, reusable, and executable skills from raw repositories and documents, rather than merely use pre-provided skills. It covers two generation regimes (task-conditioned and task-agnostic) and two procedural sources (repository-grounded and document-grounded), with standardized execution-based evaluation protocols. Experiments across multiple skill-generation methods reveal substantial performance variation and distinct failure modes depending on source type. The benchmark aims to establish skill generation as an independent research problem within agent systems.
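
The two regimes and two sources described above imply four evaluation settings. The sketch below only enumerates that grid; the identifiers `REGIMES`, `SOURCES`, and `evaluation_settings` are illustrative assumptions and are not taken from the benchmark's code or paper.

```python
from itertools import product

# Labels come from the summary above; the variable and function names
# are hypothetical and do not reflect the benchmark's actual API.
REGIMES = ("task-conditioned", "task-agnostic")
SOURCES = ("repository-grounded", "document-grounded")


def evaluation_settings():
    """Return every (regime, source) pair as a list of tuples."""
    return list(product(REGIMES, SOURCES))


if __name__ == "__main__":
    for regime, source in evaluation_settings():
        print(f"{regime} / {source}")
```

Under this reading, a skill-generation method would be scored separately in each of the four settings, which is where the reported source-dependent failure modes would show up.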

Evaluation and Benchmarking · Agent and Tool Ecosystem · task-conditioned generation · task-agnostic generation · SkillGenBench