benchmark

ExploitBench

benchmark · active · provisional · exploitbench-870055b5 · 1 event · first seen 6h ago

Aliases: ExploitBench

Co-occurring entities

World-Class Bio · GPT-5.6 Terra · GPT-5.6 Sol · Claude Mythos · SecureBio · GPT-5.6 Luna · Cerebras · METR · OpenAI · Codex · Anthropic · Terminal-Bench

More like this (12)

SorryBench · TriggerBench · RepoBench · TokenBench · SkillsBench · ProgramBench · KernelBench · MalwareBench · SpecBench · EvoBench · LiveBench · TestEvo-Bench

Recent events (1)

8 · The Batch · 6h ago

OpenAI Previews GPT-5.6 Family (Sol, Terra, Luna) with Government-Only Access and Advanced Safety Guardrails

OpenAI announced a preview of three vision-language models — GPT-5.6 Sol, Terra, and Luna — descending in capability and price, currently available only to U.S. government-approved organizations via API and Codex. GPT-5.6 Sol, the flagship tier, features a new 'max reasoning' mode and 'ultra mode' that spawns multiple subagents for multi-step tasks, and achieved state-of-the-art results on Terminal-Bench 2.1 (91.9%) while approaching Claude Mythos 5 on ExploitBench. The models include layered biosecurity and cybersecurity guardrails, with independent evaluations from METR and SecureBio yielding mixed but notable findings — particularly a near-10-point biology knowledge jump over GPT-5.5 and ambiguous autonomous task-duration results from METR. Wider public release is planned within weeks.

Frontier Model Releases · AI Safety Research · World-Class Bio · GPT-5.6 Terra · GPT-5.6 Sol · +11 more