Almanac
benchmark

FrontierScience

benchmark · active · frontierscience-22280d0e · 2 events · first seen 28d ago

Aliases: FrontierScience, FrontierScience Research

Merged from

FrontierScience Research

Recent events (2)

7 · OpenAI Blog · 28d ago

OpenAI Introduces FrontierScience Benchmark for Scientific Research Tasks

OpenAI has released FrontierScience, a new benchmark designed to evaluate AI reasoning capabilities across physics, chemistry, and biology. The benchmark is intended to measure progress toward AI systems capable of performing real scientific research tasks. This represents OpenAI's effort to establish a rigorous evaluation framework for frontier-level scientific reasoning, going beyond standard academic problem sets.

9 · Meta AI Blog · 1mo ago

Meta Introduces Muse Spark: First Model from Meta Superintelligence Labs with Multimodal Reasoning and Multi-Agent Orchestration

Meta has launched Muse Spark, the first model from its newly formed Meta Superintelligence Labs, positioned as a natively multimodal reasoning model with tool-use, visual chain-of-thought, and multi-agent orchestration capabilities. The model introduces 'Contemplating mode,' which runs multiple agents in parallel to compete with frontier reasoning modes, achieving 58% on Humanity's Last Exam and 38% on FrontierScience Research. Meta claims a greater than 10x compute efficiency improvement over Llama 4 Maverick through a rebuilt pretraining stack, and describes predictable scaling across pretraining, RL, and test-time reasoning axes. Muse Spark is available at meta.ai with a private API preview, and is framed as the first step on a scaling ladder toward 'personal superintelligence.'