Vending-Bench Arena

benchmark · active · provisional · vending-bench-arena-f8e9b591 · 1 event · first seen 16d ago

Aliases: Vending-Bench Arena

Recent events (1)

8 · Anthropic News · 16d ago

Anthropic Releases Claude Sonnet 4.6 with 1M Token Context, Improved Computer Use, and Coding Capabilities

Anthropic has released Claude Sonnet 4.6, positioned as a major upgrade over Sonnet 4.5 with improvements across coding, computer use, long-context reasoning, and agent planning. The model features a 1M token context window in beta and is now the default on claude.ai Free and Pro plans at unchanged pricing ($3 input / $15 output per million tokens). Notably, users preferred Sonnet 4.6 over the prior Opus 4.5 frontier model 59% of the time in coding tasks, and the model shows significant gains on OSWorld computer-use benchmarks alongside improved prompt injection resistance. Safety evaluations found no major alignment concerns and rated it as safe as or safer than prior Claude models.