Almanac
model

Llama-4-Maverick

model · active · llama-4-maverick-c089f52b · 7 events · first seen 1mo ago

Aliases: Llama-4-Maverick, Llama 4 Maverick

Recent events (7)

8 · Hugging Face Blog · 28d ago

Welcome Llama 4 Maverick & Scout on Hugging Face

Hugging Face announces the availability of Meta's Llama 4 Maverick and Scout models on its platform. These are the first models in Meta's new Llama 4 generation, representing a significant open-weights release. The post covers integration details, model access, and usage on the Hugging Face ecosystem.

9 · Meta AI Blog · 1mo ago

Meta Introduces Muse Spark: First Model from Meta Superintelligence Labs with Multimodal Reasoning and Multi-Agent Orchestration

Meta has launched Muse Spark, the first model from its newly formed Meta Superintelligence Labs, positioned as a natively multimodal reasoning model with tool use, visual chain-of-thought, and multi-agent orchestration capabilities. The model introduces 'Contemplating mode,' which runs multiple agents in parallel to compete with the extended reasoning modes of frontier models, achieving 58% on Humanity's Last Exam and 38% on FrontierScience Research. Meta claims a more than 10x compute-efficiency improvement over Llama 4 Maverick through a rebuilt pretraining stack, and describes predictable scaling along three axes: pretraining, RL, and test-time reasoning. Muse Spark is available at meta.ai with a private API preview and is framed as the first step on a scaling ladder toward 'personal superintelligence.'

7 · Mistral AI News · 16d ago

Mistral Medium 3: Frontier-Class Performance at 8x Lower Cost

Mistral AI has released Mistral Medium 3, a new enterprise-focused language model priced at $0.4/$2 per million input/output tokens. Mistral claims the model reaches 90%+ of Claude Sonnet 3.7's benchmark performance while undercutting cost leaders such as DeepSeek v3 on price, and that it outperforms open models including Llama 4 Maverick. It supports hybrid, on-premises, and in-VPC deployment on as few as four GPUs, and is available immediately on Mistral La Plateforme and Amazon SageMaker, with additional cloud platforms coming soon. The announcement also teases an upcoming large open-weights model release.
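
To make the quoted $0.4/$2 pricing concrete, here is a back-of-the-envelope cost estimate. The per-million-token prices come from the summary above; the workload (10,000 requests of 2,000 input and 500 output tokens each) and the helper name `request_cost_usd` are hypothetical, and real bills may differ with caching, discounts, or platform fees.

```python
# List prices quoted above for Mistral Medium 3, in USD per million tokens.
INPUT_PRICE_PER_M = 0.4
OUTPUT_PRICE_PER_M = 2.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single request from its token counts (hypothetical helper)."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical workload: 10,000 requests, each with 2,000 input and 500 output tokens.
total = 10_000 * request_cost_usd(2_000, 500)
print(f"${total:.2f}")
```

Under these assumptions the workload costs about $18, of which output tokens account for $10 even though they make up only a fifth of the tokens processed.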

7 · The Batch · 16d ago

Meta Pivots to Closed Weights with Muse Spark; The Batch Issue 349 Roundup

Meta introduced Muse Spark, its first AI model in roughly a year and the first product from its Superintelligence Labs, marking a pivot away from its open-weights strategy toward a closed model. Muse Spark is a natively multimodal reasoning model supporting tool use and multi-agent orchestration, with three reasoning modes and a novel 'thought compression' post-training technique using RL to penalize excessive reasoning tokens. The model ranks fourth on the Artificial Analysis Intelligence Index and matches Llama 4 Maverick's capabilities with over an order of magnitude less training compute, though it trails in coding and agentic benchmarks. The issue also covers broader industry themes including AI-native software engineering team structures, big pharma AI adoption, and regulatory developments.
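
Meta has not published how thought compression works, so the following is only a hypothetical illustration of the general idea of penalizing excessive reasoning tokens in an RL reward. The token budget, the linear penalty, its coefficient, and the function name `shaped_reward` are all assumptions chosen for readability, not Muse Spark's actual training objective.

```python
def shaped_reward(task_reward: float, reasoning_tokens: int,
                  budget: int = 1024, penalty_per_token: float = 1e-4) -> float:
    """Hypothetical length-penalized reward for RL post-training.

    Reasoning tokens beyond `budget` are penalized linearly, so among equally
    correct answers the shorter chain of thought earns the higher reward.
    """
    excess = max(0, reasoning_tokens - budget)
    return task_reward - penalty_per_token * excess


# Two equally correct answers (task_reward = 1.0) that differ only in length.
print(shaped_reward(1.0, 800))   # within budget, so no penalty applies
print(shaped_reward(1.0, 3048))  # 2,024 tokens over budget, so it scores lower
```

Because the penalty applies only beyond the budget, a policy trained on a reward like this is pushed to shorten long chains of thought without being discouraged from reasoning at all.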

8 · The Batch · 16d ago

Meta Introduces Muse Spark: First Closed-Weights Model from Superintelligence Labs

Meta released Muse Spark, its first AI model in roughly a year and the debut product of its Superintelligence Labs, marking a significant departure from its open-weights Llama strategy. The natively multimodal reasoning model supports tool use and multi-agent orchestration, ranks fourth on the Artificial Analysis Intelligence Index, and is claimed to be highly efficient: it matches Llama 4 Maverick with over 10x less training compute. Meta withheld parameter count, architecture, and training details, positioning Muse Spark as a closed commercial product competing with OpenAI, Google, and Anthropic. The release introduces 'thought compression' via RL and a parallel multi-agent 'contemplating' mode, while showing gaps in coding and agentic benchmarks.

6 · arXiv · cs.LG · 1mo ago

FORGE: Self-Evolving Agent Memory via Population Broadcast Without Weight Updates

FORGE (Failure-Optimized Reflective Graduation and Evolution) is a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents without any gradient updates. It wraps a Reflexion-style inner loop where a reflection agent converts failed trajectories into textual heuristics or few-shot demonstrations, then propagates the best-performing instance's memory across a population between stages. Evaluated on CybORG CAGE-2 (a stochastic network-defense POMDP), FORGE improves average return by 1.7–7.7× over zero-shot and 29–72% over Reflexion across all 12 model-representation conditions tested with four LLM families. Notably, weaker models benefit disproportionately, suggesting the method may help close capability gaps rather than amplify already-strong models.
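
The staged, population-broadcast structure described above can be sketched roughly as follows. This is a simplified toy, not the FORGE authors' implementation: `run_episode` and `reflect` are hypothetical stand-ins for the hierarchical ReAct agent acting in CybORG CAGE-2 and for the reflection agent, the success model is invented, and details such as few-shot demonstrations and any graduation criteria are omitted.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    """One population member; its only adaptable state is natural-language memory."""
    memory: list[str] = field(default_factory=list)
    returns: list[float] = field(default_factory=list)

    def mean_return(self) -> float:
        # Average return over the current stage's episodes.
        return sum(self.returns) / len(self.returns) if self.returns else float("-inf")


def run_episode(memory: list[str], rng: random.Random) -> tuple[float, str]:
    """Hypothetical stand-in for one agent episode in the environment.

    A real agent would inject `memory` into its prompt; here each stored
    heuristic simply raises the success probability (capped at 0.9).
    """
    p_success = min(0.9, 0.2 + 0.1 * len(memory))
    if rng.random() < p_success:
        return 1.0, "success"
    return 0.0, f"failed trajectory {rng.randint(0, 999)}"


def reflect(trajectory: str) -> str:
    """Hypothetical reflection agent: converts a failed trajectory into a textual heuristic."""
    return f"heuristic derived from {trajectory}"


def forge(pop_size: int = 4, stages: int = 3,
          episodes_per_stage: int = 5, seed: int = 0) -> list[Instance]:
    rng = random.Random(seed)
    population = [Instance() for _ in range(pop_size)]
    for _ in range(stages):
        # Inner loop (Reflexion-style): each instance acts and learns from its own failures.
        for inst in population:
            inst.returns.clear()
            for _ in range(episodes_per_stage):
                ret, trajectory = run_episode(inst.memory, rng)
                inst.returns.append(ret)
                if ret == 0.0:
                    inst.memory.append(reflect(trajectory))
        # Broadcast: copy the best instance's memory to every member between stages.
        best_memory = list(max(population, key=Instance.mean_return).memory)
        for inst in population:
            inst.memory = list(best_memory)
    return population


population = forge()
print(len(population[0].memory), [inst.mean_return() for inst in population])
```

No model weights appear anywhere in this loop: the only state carried from one stage to the next is the natural-language memory copied from the best-performing instance.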

4 · arXiv · cs.CL · 8d ago

Corpus-Grounded Feature Diffusion pipeline for automated IEP generation in Traditional Chinese

Researchers propose a low-resource fine-tuning pipeline called Corpus-Grounded Feature Diffusion (CGFD) to automate Individualized Education Program (IEP) drafting from Traditional Chinese parent-teacher interview transcripts. The approach fine-tunes Breeze-7B with QLoRA on 582 synthetically diffused samples and uses schema-constrained decoding at inference time, finding that Grammar-Constrained Decoding is counterproductive under Traditional Chinese token budgets. On a small formal hold-out (n=10), the system achieves BERTScore F1 of 0.779, outperforming zero-shot GPT-5.4, DeepSeek-V3.2, Gemini-3-Flash-Preview, and Llama-4-Maverick baselines while enabling fully local, air-gapped inference. The work addresses a gap in Traditional Chinese special-education NLP and demonstrates a privacy-preserving deployment pattern for sensitive document generation.