Entity · benchmark

Vending-Bench Arena

benchmark · active · vending-bench-arena-f8e9b591 · 1 event · first seen Jun 1, 2026

Aliases: Vending-Bench Arena

Co-occurring entities

claude.ai, Claude Sonnet 4, Claude Opus 4.6, Claude Sonnet 4.5, Claude Code, Claude Cowork, OSWorld-Verified, OSWorld, Anthropic

More like this (12)

VendingBench, Vision Arena, Video Arena, Judge Arena, Game Arena, AdvBench, TTS Arena, WildBench, EvoArena, VEHBench, DeliveryBench, StakeBench

Recent events (1)

8 · Anthropic News · Jun 1, 2026

Anthropic Releases Claude Sonnet 4.6 with 1M Token Context, Improved Computer Use, and Coding Capabilities

Anthropic has released Claude Sonnet 4.6, positioned as a major upgrade over Sonnet 4.5 with improvements across coding, computer use, long-context reasoning, and agent planning. The model features a 1M token context window in beta and is now the default on claude.ai Free and Pro plans at unchanged pricing ($3/$15 per million tokens). Notably, users preferred Sonnet 4.6 over the prior frontier model, Opus 4.5, 59% of the time in coding tasks, and the model shows significant gains on OSWorld computer-use benchmarks alongside improved prompt injection resistance. Safety evaluations found no major alignment concerns and rated it as safe as or safer than prior Claude models.

Long Context Evolution, Frontier Model Releases, claude.ai, Claude Sonnet 4, Claude Opus 4.6, and 11 more