model
o4-mini-high
model · active · provisional
o4-mini-high-61ca8851 · 1 event · first seen 6d ago
Aliases: o4-mini-high
Recent events (1)
ABC-Bench: Agentic biosecurity benchmark finds LLM agents surpass median expert humans on dual-use biology tasks
Researchers introduce ABC-Bench, a benchmark evaluating LLM agents on biosecurity-relevant biology tasks including liquid-handling robot programming, DNA fragment design, and evasion of DNA synthesis screening. All tested agents outperformed the median expert human baseline across all three tasks. Wet-lab validation confirmed that OpenAI's o4-mini-high produced scripts that successfully assembled DNA on an OpenTrons robot. The results highlight a meaningful shift in the biosecurity risk landscape as AI agents acquire practical wet-lab-adjacent capabilities.