Entity · benchmark

Codex HumanEval

benchmark · active · codex-humaneval-df837778 · 1 event · first seen Jun 4, 2026

Aliases: Codex HumanEval

Co-occurring entities

claude.ai · Claude · Sourcegraph · Claude 3.5 · Jasper · Cody · GSM8K · Anthropic

More like this (12)

HumanEval · Codex 5.3 · Codex App · Codex · Codex Mobile · Codex SDK · CruxEval · CharacterEval · ParaEval · Codex Chrome Extension · Codex Remote · HumanEvalFIM

Recent events (1)

7 · Anthropic News · Jun 4, 2026

Anthropic launches Claude 2 with 100K context window and improved coding, reasoning, and safety

Anthropic released Claude 2, featuring a 100K-token context window and improved performance on coding (71.2% on Codex HumanEval, up from 56.0%), math (88.0% on GSM8K), and legal reasoning (76.5% on the multiple-choice section of the Bar exam). The model is available via API at the same price as Claude 1.3 and through a new public beta at claude.ai for US and UK users. Safety improvements include a 2x reduction in harmful outputs on internal red-team evaluations compared to Claude 1.3. Early API partners include Jasper and Sourcegraph.

Long Context Evolution · Frontier Model Releases · claude.ai · Claude · Sourcegraph