Entity · benchmark

FrontierCode

benchmarkactivefrontiercode-6e2ad47b·5 events·first seen Jun 9, 2026

Aliases: FrontierCode

Co-occurring entities

More like this (12)

FrontierCode 1.1 Main frontier coding agents Frontier frontier.security FrontierScience Frontier Red Team Frontier Compliance Framework FrontierMath Frontier-Bench FrontierSWE Frontier AI Framework Frontier Alliance Partners

Recent events (5)

7The Batch·Jul 2, 2026·source ↗

U.S. lifts export controls on Claude Fable 5 and Mythos 5; Anthropic launches Claude Sonnet 5 and Claude Science platform

The Trump administration lifted export restrictions on Anthropic's Claude Fable 5 and Claude Mythos 5 after Anthropic committed to stronger safeguards, resolving a dispute over jailbreak vulnerabilities. Separately, Anthropic launched Claude Sonnet 5, a mid-tier agentic model priced at $2/$10 per million tokens through August 2026, and Claude Science, a unified research workbench for life sciences integrating PubMed, Jupyter, and HPC cluster access. The newsletter also covers Google's Nano Banana 2 Lite image model and Gemini Omni Flash video model, and Cognition's Devin Fusion multi-model routing system claiming 35% cost reduction versus GPT-5.5 and Opus 4.8.

Frontier Model Releases Inference Economics Dario Amodei Claude Sonnet 3.5 Claude Mythos +20 more

5Import Ai·Jun 15, 2026·source ↗

Import AI 461: Alignment concerns, FrontierCode benchmark, and synthetic research interns

Import AI issue 461 covers three topics: a claim that AI alignment is not on track, a new benchmark or dataset called FrontierCode, and work on synthetic research interns (likely LLM-based agents simulating research assistants). The newsletter is a weekly digest by Jack Clark that synthesizes developments across the AI/ML landscape. The alignment framing and synthetic agent research angle are both substantive signals worth tracking.

Evaluation and Benchmarking AI Safety Research FrontierCode Jack Clark Import AI +1 more

7The Batch·Jun 10, 2026·source ↗

The Batch: Claude Mythos 5 / Fable 5 debut, Apple AFM 3, Google Live Translate, OpenAI IPO filing, FrontierCode benchmark

Anthropic launched Claude Fable 5 (a safety-guardrailed model) and Claude Mythos 5 (same underlying model with safeguards removed, for vetted cyberdefense/infrastructure users via Project Glasswing with US government collaboration), both priced at $10/$50 per million tokens. Apple released five new Apple Foundation Models (AFM 3) spanning on-device and cloud tiers, built with Google and Nvidia infrastructure. Additional headlines cover Google's Gemini 3.5 Live Translate (70+ languages, real-time), OpenAI's confidential SEC IPO filing, a NotebookLM upgrade to Gemini 3.5, and Cognition's FrontierCode benchmark for code-quality evaluation where Claude Opus 4.8 leads at 34.3%.

Frontier Model Releases Evaluation and Benchmarking Claude Mythos Claude Opus 4.6 Google +19 more

8Hacker News·Jun 9, 2026·source ↗

Anthropic releases Claude Fable 5

Anthropic has released Claude Fable 5, a new model in the Claude family, announced via their official news channel. The Hacker News discussion generated substantial engagement with 1,468 points and 1,156 comments, indicating significant community interest. No detailed capability claims or benchmark results are available from this item alone.

Frontier Model Releases AI Safety Research Claude Mythos IMC Claude Opus 4.6 +11 more

5Latent Space·Jun 9, 2026·source ↗

Latent Space introduces FrontierCode benchmark for code quality evaluation

Latent Space has announced FrontierCode, a new benchmark targeting code quality assessment rather than simple code generation correctness. The announcement comes from the AINews newsletter, suggesting this is positioned as a community-relevant evaluation tool. The framing around 'slop' implies the benchmark is designed to distinguish genuinely high-quality code outputs from superficially plausible but low-quality generations.

Frontier Model Releases Evaluation and Benchmarking FrontierCode Latent Space