Entity · benchmark

CharXiv Reasoning

benchmark · active · charxiv-reasoning-39099e45 · 1 event · first seen Jun 1, 2026

Aliases: CharXiv Reasoning

Co-occurring entities

Scale AI · Artificial Analysis Intelligence Index · Claude Opus 4.6 · HealthBench · Alexandr Wang · Humanity's Last Exam · Meta Superintelligence Labs · MMMU-Pro · Llama-4-Maverick · Kimi K2.5 · Muse Spark · Gemini-3.1-Pro · thought compression · Meta · GPT-5.5

More like this (12)

arXiv:2602.05394 · ArXiv · Long-context Reasoning Benchmarks · ToxiREX: A Dataset on Toxic REasoning in ConteXt · EG-Reasoner · SciReasoner · SciReasoner · Show Me How You Reason and I'll Tell You Who You Are: Reasoning Graphs for Robust LLM Authorship Attribution · Chest X-ray Reasoning · Reasoning Enhancement · Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models · Reasoning Before Translation: Enhancing Legal Machine Translation with Structured Reasoning

Recent events (1)

8 · The Batch · Jun 1, 2026

Meta Introduces Muse Spark: First Closed-Weights Model from Superintelligence Labs

Meta released Muse Spark, its first AI model in roughly a year and the debut product of its Superintelligence Labs, marking a significant departure from its open-weights Llama strategy. The natively multimodal reasoning model supports tool use and multi-agent orchestration and ranks fourth on the Artificial Analysis Intelligence Index. Meta also claims notable token efficiency, saying the model matches Llama 4 Maverick with over 10x less training compute. Meta withheld the parameter count, architecture, and training details, positioning Muse Spark as a closed commercial product competing with OpenAI, Google, and Anthropic. The release introduces 'thought compression' via reinforcement learning (RL) and a parallel multi-agent 'contemplating' mode, while showing gaps on coding and agentic benchmarks.

Frontier Model Releases · Open Weights Progress · Scale AI · Artificial Analysis Intelligence Index · Claude Opus 4.6 · +18 more