FrontierScience-Olympiad

benchmark · active · provisional · frontierscience-olympiad-19bbf08f · 1 event · first seen 13h ago

Aliases: FrontierScience-Olympiad

Recent events (1)

7 · arXiv · cs.CL · 13h ago

Agents-A1: 35B MoE agent matches trillion-parameter models via horizon scaling

Researchers introduce Agents-A1, a 35B Mixture-of-Experts model that they claim matches or exceeds trillion-parameter models such as Kimi-K2 and DeepSeek V4 on long-horizon agentic benchmarks. Rather than increasing raw parameter count, the approach scales agent trajectory length (averaging 45K tokens) and heterogeneous agent abilities, using a three-stage training recipe that includes multi-teacher domain-routed distillation. On benchmarks such as SEAL-0, IFBench, HiPhO, and FrontierScience-Olympiad, Agents-A1 reportedly achieves leading or competitive results against models with roughly 30x more parameters. The authors present this as a practical, efficient path to scaling agentic capability without a proportional increase in compute.