7The Batch (DeepLearning.AI)·19d ago

Meta Pivots to Closed Weights with Muse Spark; The Batch Issue 349 Roundup

Meta introduced Muse Spark, its first AI model in roughly a year and the first product from its Superintelligence Labs, marking a pivot away from its open-weights strategy toward a closed model. Muse Spark is a natively multimodal reasoning model supporting tool use and multi-agent orchestration, with three reasoning modes and a novel 'thought compression' post-training technique using RL to penalize excessive reasoning tokens. The model ranks fourth on the Artificial Analysis Intelligence Index and matches Llama 4 Maverick's capabilities with over an order of magnitude less training compute, though it trails in coding and agentic benchmarks. The issue also covers broader industry themes including AI-native software engineering team structures, big pharma AI adoption, and regulatory developments.

Frontier Model Releases Open Weights Progress Inference Economics Agent and Tool Ecosystem Alignment and RLHF Multimodal Progress DeepLearning.AI Artificial Analysis Intelligence Index Meta Superintelligence Labs Llama-4-Maverick Muse Spark thought compression Meta Andrew Ng

Related guides (5)

Frontier Model ReleasesTopic guide

Frontier Model Releases: The Race From Language to Action

Read asBeginner In-depth

Open Weights ProgressTopic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Read asBeginner

Multimodal ProgressTopic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Read asBeginner

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Read asIn-depth

Alignment and RLHFTopic guide

Alignment and RLHF: From Human Feedback to Scalable Post-Training

Read asIn-depth

Related events (8)

8The Batch·19d ago·source ↗

Meta Introduces Muse Spark: First Closed-Weights Model from Superintelligence Labs

Meta released Muse Spark, its first AI model in roughly a year and the debut product of its Superintelligence Labs, marking a significant departure from its open-weights Llama strategy. The natively multimodal reasoning model supports tool use and multi-agent orchestration, achieves fourth place on the Artificial Analysis Intelligence Index, and claims notable token efficiency—matching Llama 4 Maverick with over 10x less training compute. Meta withheld parameter count, architecture, and training details, positioning Muse Spark as a closed commercial product competing with OpenAI, Google, and Anthropic. The release introduces 'thought compression' via RL and a parallel multi-agent 'contemplating' mode, while showing gaps in coding and agentic benchmarks.

Frontier Model Releases Open Weights Progress Scale AI Artificial Analysis Intelligence Index Claude Opus 4.6 +18 more

9Meta Ai Blog·1mo ago·source ↗

Meta Introduces Muse Spark: First Model from Meta Superintelligence Labs with Multimodal Reasoning and Multi-Agent Orchestration

Meta has launched Muse Spark, the first model from its newly formed Meta Superintelligence Labs, positioned as a natively multimodal reasoning model with tool-use, visual chain-of-thought, and multi-agent orchestration capabilities. The model introduces 'Contemplating mode,' which runs multiple agents in parallel to compete with frontier reasoning modes, achieving 58% on Humanity's Last Exam and 38% on FrontierScience Research. Meta claims a greater than 10x compute efficiency improvement over Llama 4 Maverick through a rebuilt pretraining stack, and describes predictable scaling across pretraining, RL, and test-time reasoning axes. Muse Spark is available at meta.ai with a private API preview, and is framed as the first step on a scaling ladder toward 'personal superintelligence.'

Training Infrastructure Long Context Evolution Hyperion Meta AI Gemini Deep Think +14 more

7Meta Ai Blog·1mo ago·source ↗

Meta Publishes Advanced AI Scaling Framework and Safety & Preparedness Report for Muse Spark

Meta has released an updated Advanced AI Scaling Framework that expands risk evaluation categories—including chemical/biological threats, cybersecurity, and loss-of-control risks—and introduces formal Safety & Preparedness Reports tied to specific model deployments. The first such report covers Muse Spark, Meta's advanced reasoning model, detailing pre- and post-safeguard evaluations across severe risk categories and ideological balance. Meta also describes a shift in safety methodology: rather than scenario-specific refusal training, Muse Spark is trained on the reasoning behind safety principles, enabling more generalizable behavior in novel situations. The framework applies across open, API, and closed deployments.

Frontier Model Releases Evaluation and Benchmarking Advanced AI Scaling Framework Meta AI Frontier AI Framework +6 more

6The Batch·19d ago·source ↗

Data Points: Nvidia Ising Models for Quantum Computing, Meta Muse Spark, GitHub Rubber Duck, Anthropic Claude Managed Agents, GPT-5.4-Cyber

Nvidia released Ising, a family of open AI models targeting quantum processor calibration and error correction, achieving 2.5x faster and 3x more accurate decoding than pyMatching, with adoption by Fermilab, Harvard, and others. Meta announced Muse Spark, a small multimodal model powering a new AI assistant series for its apps and glasses. GitHub introduced Rubber Duck, a cross-model review feature pairing Claude with GPT-5.4 for two-pass coding agent validation. Anthropic launched Claude Managed Agents, a managed infrastructure platform for enterprise autonomous AI deployment, while OpenAI expanded its Trusted Access for Cyber program with GPT-5.4-Cyber, a fine-tuned defensive cybersecurity model.

Frontier Model Releases Inference Economics Rubber Duck Notion GPT-5.5-Cyber +22 more

6The Batch·18d ago·source ↗

MiniMax M2.7 proprietary reasoning model competes with Gemini and Claude Opus; roundup covers Cursor Composer 2, MAI-Image-2, Claude Code Channels, and Anthropic defense dispute

MiniMax released M2.7, a proprietary reasoning model that achieved 66.6% on MLE Bench Lite (tying Gemini 3.1) and 56.22% on SWE-Pro, priced at $0.30/$1.20 per million tokens, with the shift to proprietary marking a potential strategic pivot among Chinese AI labs away from open weights. Cursor released Composer 2, an agentic coding model built on a fine-tuned Kimi 2.5 (via Moonshot partnership), priced 86% cheaper than its predecessor and scoring 73.7 on SWE-bench Multilingual. Anthropic released Claude Code Channels, routing Telegram and Discord messages into local Claude Code sessions via MCP plugins, and separately filed a court response denying it has any backdoor or kill switch into military deployments of Claude. Microsoft announced MAI-Image-2, a text-to-image model ranking third on Arena.ai among research labs.

Frontier Model Releases Open Weights Progress Stitch Claude Sonnet 4 SWE-Pro +17 more

8Mistral Ai News·19d ago·source ↗

Mistral AI Releases Magistral: First Reasoning Model in Open and Enterprise Variants

Mistral AI announces Magistral, its first reasoning model, released in two variants: Magistral Small (24B parameters, open-weight, Apache 2.0) and Magistral Medium (enterprise, closed). Magistral Medium scores 73.6% on AIME2024 (90% with majority voting @64), while Magistral Small scores 70.7% (83.3% respectively). Key differentiators include native multilingual chain-of-thought reasoning across eight major languages, transparent traceable reasoning steps, and up to 10x faster token throughput in Le Chat via Flash Answers. The release is accompanied by a research paper covering training infrastructure, reinforcement learning algorithm, and novel observations for training reasoning models.

Frontier Model Releases Evaluation and Benchmarking Mistral AI AIME2024 Amazon SageMaker +13 more

6The Batch·15d ago·source ↗

The Batch Issue 356: Qwen3.7-Max release, White House AI executive order, fine-tuning breaks copyright alignment

The Batch issue 356 covers several distinct AI developments: Alibaba's release of Qwen3.7-Max, a closed-weights flagship LLM targeting agentic coding and scientific tasks with a novel RL training approach that decouples task, harness, and verifier; a new White House executive order on frontier AI models focused on cybersecurity, including voluntary model-sharing with government; and a finding that fine-tuning breaks copyright alignment in LLMs. Andrew Ng's editorial commentary frames the executive order as a reasonable compromise, noting Anthropic's Mythos vulnerability-detection model as a key driver of the cybersecurity concerns behind the regulation.

Frontier Model Releases AI Safety Research Qwen3.7-Plus-Preview DeepLearning.AI Artificial Analysis Intelligence Index +9 more

7The Batch·19d ago·source ↗

Data Points: Qwen3.7-Max, OpenAI Math Proof, Gated DeltaNet-2, Trump AI Order, Microsoft Fara1.5

This edition of The Batch covers five significant AI developments: Alibaba's Qwen3.7-Max reasoning model with 1M token context and agentic capabilities ranking fifth on the Artificial Analysis Intelligence Index; an OpenAI reasoning model resolving the 80-year-old Erdős planar unit distance problem; Nvidia's Gated DeltaNet-2 outperforming Mamba-3 and other linear attention architectures; Trump pulling back a proposed AI regulation executive order; and Microsoft Research's Fara1.5 computer-use agent family beating OpenAI Operator and Google Gemini on the Online-Mind2Web benchmark.

Long Context Evolution Frontier Model Releases Paul Erdős Fara1.5 Mamba +25 more