Entity · benchmark

Intercode CTF

benchmark · active · intercode-ctf-4e5043bf · 1 event · first seen Jun 3, 2026

Aliases: Intercode CTF

Co-occurring entities

Carnegie Mellon University · LabBench · Cybench · SecureBio · VCT · Incalmo · Claude 3.7 Sonnet · Anthropic

More like this (12)

DECODEM · VS Code · CodePath · CodeAct · CodeSearchNet · SciCode · unclecode · CapCode · ultracode · TabICL · BigCode · RedCode

Recent events (1)

8 · Anthropic News · Jun 3, 2026

Anthropic Frontier Red Team reports early-warning signs of rapid AI progress in cybersecurity and biosecurity capabilities

Anthropic's Frontier Red Team published findings from a year of safety evaluations spanning four model releases, documenting rapid capability gains in dual-use domains. In cybersecurity, Claude 3.7 Sonnet now solves roughly a third of Cybench CTF challenges (up from about 5% a year earlier), and, using the Incalmo toolset, it replicated a large-scale network attack in realistic cyber range environments. In biosecurity, Claude went from underperforming virology experts to outperforming them on the VCT benchmark within one year, and it now exceeds human expert baselines on cloning workflows. Anthropic assesses that current models show "early warning" signs but have not yet crossed the thresholds for substantially elevated national security risk.

Frontier Model Releases · Evaluation and Benchmarking · Intercode CTF · Carnegie Mellon University · LabBench · +7 more