Entity · benchmark

ABC-Bench

benchmark · active · abc-bench-14bd3fd7 · 1 event · first seen Jun 10, 2026

Aliases: ABC-Bench

More like this (12)

ATE-Bench AdvBench APS-Bench ALE-Bench ITBench-AA Int-Bench SorryBench AGC-Bench AdversaBench TriggerBench SupraBench ACEBench-Agent

Recent events (1)

8 · arXiv · cs.AI · Jun 10, 2026

ABC-Bench: Agentic biosecurity benchmark finds LLM agents surpass median expert humans on dual-use biology tasks

Researchers introduce ABC-Bench, a benchmark evaluating LLM agents on biosecurity-relevant biology tasks including liquid-handling robot programming, DNA fragment design, and evasion of DNA synthesis screening. All tested agents outperformed the median expert human baseline across all three tasks. Wet-lab validation confirmed that OpenAI's o4-mini-high produced scripts that successfully assembled DNA on an OpenTrons robot. The results highlight a meaningful shift in the biosecurity risk landscape as AI agents acquire practical wet-lab-adjacent capabilities.

Frontier Model Releases · Evaluation and Benchmarking · ABC-Bench · OpenTrons · o4-mini-high · +2 more