Entity · product

AutoSkillHarm

product · active · autoskillharm-30ae16bb · 1 event · first seen Jun 2, 2026

Aliases: AutoSkillHarm

Co-occurring entities

Self-Mutating Poisoning · Fixed-Payload Poisoning · skill-based attacks · SkillHarm

More like this (12)

SkillHarm · Skill Self-Play · OpenSkillRisk · OpenSkillRisk: Benchmarking Agent Safety When Using Real-World Risky Third-Party Skills · SkillFuzz · SkillWeaver · Skill-RM · SKILL.md · SkillOpt · HarmAmp · Trace2Skill · meta-skill

Recent events (1)

7 · arXiv · cs.CL · Jun 2, 2026

SkillHarm: Lifecycle-Aware Benchmark for Skill-Based Attacks on AI Agents

SkillHarm is a new benchmark that evaluates adversarial attacks on AI agent skills across the full lifecycle of skill use. It covers two attack scenarios: Fixed-Payload Poisoning (FPP) and Self-Mutating Poisoning (SMP). The benchmark contains 879 attack samples across 71 skills, organized under a 12-category risk taxonomy that targets data pipelines, system environments, and agent autonomy. Experiments show that current agents remain highly vulnerable, with attack success rates of up to 86.3% under FPP and 69.3% under SMP. The benchmark was generated at scale with AutoSkillHarm, an automated construction pipeline driven by coding agents.
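The headline figures are attack success rates (ASR): the share of attack samples in which the agent actually carried out the injected behavior, reported separately for FPP and SMP. A minimal Python sketch of that tally for a single evaluated agent follows. It assumes a hypothetical per-sample record (`AttackResult`, with made-up field names and placeholder values); it is not SkillHarm's real data format or scoring code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AttackResult:
    # Hypothetical record layout, for illustration only.
    skill: str          # which skill carried the poisoned content
    scenario: str       # "FPP" (Fixed-Payload Poisoning) or "SMP" (Self-Mutating Poisoning)
    risk_category: str  # placeholder label for a risk-taxonomy category
    succeeded: bool     # True if the agent performed the injected behavior


def attack_success_rate(results):
    """Return ASR per scenario as a percentage: successes / attempts * 100."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for r in results:
        attempts[r.scenario] += 1
        successes[r.scenario] += int(r.succeeded)
    # Only scenarios that appear in the results get an entry, so no division by zero.
    return {s: 100.0 * successes[s] / attempts[s] for s in attempts}


if __name__ == "__main__":
    # Toy data with placeholder names; not results from the paper.
    toy = [
        AttackResult("skill-a", "FPP", "category-1", True),
        AttackResult("skill-b", "FPP", "category-2", False),
        AttackResult("skill-a", "SMP", "category-1", True),
    ]
    print(attack_success_rate(toy))  # {'FPP': 50.0, 'SMP': 100.0}
```

Grouping by scenario mirrors how the summary reports FPP and SMP separately. The same loop could key on `risk_category` instead to break results down by taxonomy category.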

Evaluation and Benchmarking · AI Safety Research · Self-Mutating Poisoning · Fixed-Payload Poisoning · skill-based attacks