Almanac

FrontierCode

benchmark · active · provisional · frontiercode-6e2ad47b · 4 events · first seen 8d ago

Aliases: FrontierCode

Recent events (4)

5 · Latent Space · 8d ago

Latent Space introduces FrontierCode benchmark for code quality evaluation

Latent Space's AINews newsletter covers FrontierCode, a new benchmark that assesses code quality rather than only whether generated code is functionally correct. The newsletter's framing around 'slop' suggests the benchmark is meant to separate genuinely high-quality code from output that looks plausible but is low in quality.

5 · Import AI · 37h ago

Import AI 461: Alignment concerns, FrontierCode benchmark, and synthetic research interns

Import AI issue 461, Jack Clark's weekly digest of AI/ML developments, covers three topics: an argument that AI alignment is not on track, the new FrontierCode benchmark, and work on synthetic research interns (likely LLM-based agents acting as research assistants).

7 · The Batch · 6d ago

The Batch: Claude Mythos 5 / Fable 5 debut, Apple AFM 3, Google Live Translate, OpenAI IPO filing, FrontierCode benchmark

Anthropic launched two models, both priced at $10/$50 per million tokens: Claude Fable 5, which ships with safety guardrails, and Claude Mythos 5, the same underlying model with those safeguards removed, offered to vetted cyberdefense and infrastructure users through Project Glasswing in collaboration with the US government. Apple released five new Apple Foundation Models (AFM 3) spanning on-device and cloud tiers, built with Google and Nvidia infrastructure. Other headlines cover Google's Gemini 3.5 Live Translate (real-time translation across 70+ languages), OpenAI's confidential IPO filing with the SEC, a NotebookLM upgrade to Gemini 3.5, and Cognition's FrontierCode benchmark for code-quality evaluation, on which Claude Opus 4.8 leads with a score of 34.3%.

8 · Hacker News · 7d ago

Anthropic releases Claude Fable 5

Anthropic has released Claude Fable 5, a new model in the Claude family, announced via their official news channel. The Hacker News discussion generated substantial engagement with 1,468 points and 1,156 comments, indicating significant community interest. No detailed capability claims or benchmark results are available from this item alone.