Almanac
benchmark

Bias Benchmark for Question Answering

benchmark · active · provisional · bias-benchmark-for-question-answering-9089b397 · 1 event · first seen 14d ago

Aliases: Bias Benchmark for Question Answering

Recent events (1)

9 · Anthropic News · 14d ago

Anthropic launches Claude 3 model family: Haiku, Sonnet, and Opus

Anthropic announced the Claude 3 model family on March 4, 2024. It comprises three models, listed in ascending order of capability: Haiku, Sonnet, and Opus. Anthropic claims top performance for Claude 3 Opus on major benchmarks including MMLU, GPQA, and GSM8K, along with near-perfect recall on long-context evaluations (200K context window, 99%+ accuracy on the Needle In A Haystack (NIAH) test) and new multimodal vision capabilities. The release also highlights fewer unnecessary refusals, a twofold accuracy improvement over Claude 2.1, and Constitutional AI-based safety tuning. Opus and Sonnet launched immediately via claude.ai and the Claude API across 159 countries, with Haiku to follow.