Almanac
benchmark

GDPval-AA

benchmark · active · provisional · gdpval-aa-e791ef7d · 3 events · first seen 15d ago

Aliases: GDPval-AA

Recent events (3)

9 · Anthropic News · 15d ago

Claude Opus 4.6 Released with 1M Token Context, Agentic Coding Advances, and State-of-the-Art Benchmarks

Anthropic has released Claude Opus 4.6, which it describes as its most capable model to date. The release features a 1M token context window in beta, improved agentic coding and planning capabilities, and adaptive thinking with developer-controlled effort levels. Anthropic reports top scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp, including a 144 Elo point lead over OpenAI's GPT-5.2 on GDPval-AA. New product features include agent teams in Claude Code, context compaction for long-running tasks, and Claude in PowerPoint (research preview). Pricing remains unchanged at $5/$25 per million input/output tokens.
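To give a rough sense of what a 144-point Elo gap means, the sketch below converts a rating difference into an expected head-to-head score using the standard logistic Elo formula with a 400-point scale. Whether GDPval-AA uses exactly this scale is an assumption; the source does not describe the rating method, and the function name `elo_expected_score` is hypothetical.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected score for the higher-rated side under the standard Elo model.

    Assumption: the conventional base-10 logistic curve with a 400-point
    scale. The source does not confirm that GDPval-AA uses this scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


gap = 144  # Elo gap reported in the summary above
p = elo_expected_score(gap)
print(f"Expected head-to-head score at +{gap} Elo: {p:.3f}")
```

Under that assumption, a 144-point lead corresponds to an expected score of roughly 0.70, meaning the higher-rated model would be preferred in about seven of ten pairwise comparisons.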

9 · Anthropic News · 15d ago

Anthropic raises $30 billion Series G at $380 billion valuation

Anthropic has closed a $30 billion Series G funding round led by GIC and Coatue, valuing the company at $380 billion post-money. The company reports $14 billion in annualized run-rate revenue growing over 10x annually for three consecutive years, with Claude Code alone generating over $2.5 billion in run-rate revenue and accounting for an estimated 4% of all GitHub public commits worldwide. Eight of the Fortune 10 are now Claude customers, and over 500 businesses spend more than $1 million annually. The round will fund frontier research, infrastructure expansion, and product development, and coincides with a confidential S-1 filing with the SEC.

9 · The Batch · 4d ago

Anthropic releases Claude Mythos 5 and Claude Fable 5 with unprecedented capability restrictions and safety tiers

Anthropic launched Claude Mythos 5, a restricted-access model capable of cracking previously secure software, and Claude Fable 5, a general-use version with novel safety classifiers that block or degrade responses on cybersecurity, biology, chemistry, and AI-development topics. Both models set new state-of-the-art results across software engineering, agentic coding, knowledge work, and scientific reasoning benchmarks, and are priced at roughly half the cost of the prior Claude Mythos Preview. Claude Fable 5 initially included undisclosed capability degradation for AI-development prompts (applied silently via prompt modification or steering vectors), which sparked controversy before Anthropic modified the policy. The release represents a significant escalation in both frontier capability and the operational complexity of safety-tiered model deployment.