benchmark

FrontierScience-Olympiad

benchmark · active · provisional · frontierscience-olympiad-19bbf08f · 1 event · first seen 13h ago

Aliases: FrontierScience-Olympiad

Co-occurring entities

IFBench, Kimi-K2, DeepSeek V4, BrowseComp, SEAL-0, HiPhO, Agents-A1, SciCode, HLE

More like this (12)

FrontierScience, FrontierMath, Frontier, FrontierCode, OpenAI Frontier, Frontier Red Team, FrontierSWE, Frontier Model Forum, frontier model evaluation, Frontier AI Framework, InternScience, OpenAI frontier models

Recent events (1)

7 · arXiv · cs.CL · 13h ago

Agents-A1: 35B MoE agent matches trillion-parameter models via horizon scaling

Researchers introduce Agents-A1, a 35B Mixture-of-Experts model that they claim matches or exceeds trillion-parameter models such as Kimi-K2 and DeepSeek V4 on long-horizon agentic benchmarks. Rather than scaling raw parameter count, the approach scales agent trajectory length (averaging 45K tokens) and heterogeneous agent abilities, using a three-stage training recipe that includes multi-teacher domain-routed distillation. On benchmarks including SEAL-0, IFBench, HiPhO, and FrontierScience-Olympiad, Agents-A1 is reported to achieve leading or competitive results against models with roughly 30x more parameters. The work proposes a practical efficiency path: scaling agentic capability without proportional compute scaling.

Frontier Model Releases, Inference Economics, IFBench, Kimi-K2, DeepSeek V4, +8 more