Agents' Last Exam
agents-last-exam-4acbbb0a · 1 event · first seen 2h ago
Aliases: Agents' Last Exam
Recent events (1)
DiffusionGemma hits 1,000+ tokens/sec; Claude Fable 5 export controls; Agents' Last Exam benchmark launch
Google introduced DiffusionGemma, an experimental 26B MoE model that uses diffusion-based text generation to produce 256-token blocks simultaneously. It achieves over 1,000 tokens/second on H100 hardware, at the cost of lower output quality than standard Gemma 4. Separately, the US government issued an export control directive forcing Anthropic to suspend Claude Fable 5 and Claude Mythos 5 globally; Anthropic also reversed a controversial silent-degradation safeguard on Fable 5 after researcher backlash. UC Berkeley's Center for RDI launched Agents' Last Exam (ALE), an agentic benchmark of 1,500+ tasks with deterministic grading. GPT-5.5 topped the leaderboard with a pass rate of only 24%, highlighting the gap between current models and professional-grade workflows.