Entity · company

Meta

company · active · meta-e74be8b8 · 84 events · first seen May 18, 2026

Aliases: Meta

More like this (12)

Meta AI · Meta Llama · MetaSyn · Meta-World · meta-learning · Meta Superintelligence Labs · CoMet · Meta AI (FAIR) · Meta Model API · Meta Llama 3.1 405B · llm-meta-ai · PubMed

Guides (1)

Meta AI: From Open-Weights Pioneer to Frontier Contender

Recent events (50)

4arXiv · cs.CL·3d ago·source ↗

AIriskEval-edu: Platform for auditing pedagogical risks in AI-generated educational explanations

Researchers present AIriskEval-edu Demo, a platform that audits instructional explanations across five pedagogical risk dimensions including factual accuracy, ideological bias, and student-level appropriateness. The system integrates GPT-5.5 via API alongside a fine-tuned self-hosted Llama 3.1 8B evaluator, with the local model outperforming GPT-5.5 on most metrics. The platform targets K-12 educational contexts and supports both automated auditing of AI-generated explanations and real-time auditing of human-written content, offering institutions a privacy-preserving deployment option.

Enterprise Deployment Patterns OpenAI Meta GPT-5.5 +2 more

6arXiv · cs.CL·3d ago·source ↗

Input-only prompt optimization can suppress evaluation-awareness latents in LLMs, but activation readability ≠ behavioral control

Researchers study the input-side dual of activation steering: optimizing fluent prompts to drive a chosen internal latent toward zero without inference-time model access. The target is an 'evaluation-awareness' latent whose suppression would threaten safety evaluation validity if models behave differently when detecting they are being tested. Experiments on Llama-3.2-3B and Llama-3.1-8B across five latent constructions (CAA direction, subspace norm, SAE feature, MLP neuron, behavioral logit) find the latent is robustly suppressible, but a key cautionary result emerges: a placebo random direction is suppressed just as hard and shifts behavior just as far, and suppressing the eval-direction in context fails to reduce behavioral eval judgment. The paper concludes that activation-readability does not imply behavioral controllability, with implications for how safety evaluations should be designed and interpreted.

Evaluation and Benchmarking AI Safety Research Minimizing Targeted Activations: Input-Only Suppression of Evaluation-Awareness Latents in Large Language Models Llama Scope Fluent Dreaming +6 more

7Latent Space·3d ago·source ↗

OpenAI, Anthropic, GDM, Meta, and others co-sign letter calling for a measured pace of AI development amid RSI fears; HuggingFace details machine-speed cyberattack

A coalition of major AI labs including OpenAI, Anthropic, Google DeepMind, and Meta has co-signed a letter calling for a measured pace in AI development, apparently motivated by concerns about recursive self-improvement (RSI). Separately, HuggingFace has published details on a machine-speed offensive cyberattack. If confirmed, the convergence of frontier labs on a shared safety and pacing position would be a significant industry-wide signal.

AI Safety Research Regulatory Developments Google DeepMind OpenAI HuggingFace +2 more

5Meta AI Blog·4d ago·source ↗

Meta's SAM and DINOv3 deployed in ARPA-H-funded assistive robotics platform at University of Pittsburgh

The University of Pittsburgh's Human Engineering Research Laboratories (HERL), with up to $41.5M in ARPA-H funding, is building RAMMP, a robotic assistive mobility and manipulation platform for wheelchair users that integrates Meta's open-source vision models SAM and DINOv3. The project deploys these models on edge hardware for real-time object detection, navigation assistance, and natural language interaction, addressing engineering constraints like battery life, heat, and limited compute. This is a concrete production deployment case demonstrating open-weights vision models running on resource-constrained robotics hardware in a safety-critical assistive context.

Open Weights Progress Enterprise Deployment Patterns Segment Anything Model 2 University of Pittsburgh ATDev +5 more

7The Batch·Jul 24, 2026·source ↗

Meta launches Muse Spark 1.1, a low-cost agentic vision-language model with new paid API

Meta launched Muse Spark 1.1, a closed vision-language model optimized for agentic tasks including tool use, computer use, and multi-agent orchestration, alongside the Meta Model API — the company's first paid model access. The model ties GPT-5.6 Luna and GLM-5.2 on Artificial Analysis' Intelligence Index while offering substantially lower output token prices ($4.25/M vs. $25–$50/M for comparable closed models), and tops MCP Atlas and JobBench tool-use leaderboards. Meta's pricing strategy, subsidized by advertising revenue, is framed as a direct attack on competitors' API margins and could compress inference costs industry-wide.

Frontier Model Releases Inference Economics JobBench Scale AI Artificial Analysis Intelligence Index +15 more
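To make the price gap in the summary above concrete, here is a minimal sketch of the arithmetic. The per-million-token prices are the ones quoted in the summary; the workload size (200 million output tokens per month) and the function name `output_cost_usd` are illustrative assumptions, not figures or names from the source.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of generating `output_tokens` at a per-million-token output price."""
    return output_tokens / 1_000_000 * price_per_million_usd


# Prices quoted in the summary (USD per million output tokens).
MUSE_SPARK_PRICE = 4.25
COMPARABLE_LOW, COMPARABLE_HIGH = 25.0, 50.0

# Hypothetical workload, chosen only for illustration.
monthly_tokens = 200_000_000

muse_cost = output_cost_usd(monthly_tokens, MUSE_SPARK_PRICE)
low_cost = output_cost_usd(monthly_tokens, COMPARABLE_LOW)
high_cost = output_cost_usd(monthly_tokens, COMPARABLE_HIGH)

ratio_low = COMPARABLE_LOW / MUSE_SPARK_PRICE
ratio_high = COMPARABLE_HIGH / MUSE_SPARK_PRICE

print(f"Muse Spark 1.1: ${muse_cost:,.2f}")
print(f"Comparable closed models: ${low_cost:,.2f} to ${high_cost:,.2f}")
print(f"Price ratio: {ratio_low:.1f}x to {ratio_high:.1f}x")
```

This covers output tokens only; input-token pricing is not given in the summary and is left out.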

7Hacker News·Jul 23, 2026·source ↗

OpenAI and Anthropic jointly oppose open-weight AI models, citing national security and competitive concerns

OpenAI and Anthropic have reportedly aligned in opposition to open-weight AI models, framing the issue around risks to national security and, critics argue, their own competitive position. The story, published by Axios and surfacing on Hacker News with high engagement (240 points, 276 comments), touches on the Trump administration's China policy context. The move signals a potential lobbying or regulatory push by the two leading closed-model labs against open-weights competitors like Meta and DeepSeek.

Open Weights Progress Regulatory Developments Axios DeepSeek V4 OpenAI +2 more

6Meta AI Blog·Jul 21, 2026·source ↗

Meta's SAM 3 and DINOv3 power real-time scientific image segmentation at DOE national labs

Lawrence Berkeley National Laboratory's SYNAPS-I project, part of the White House Genesis Mission initiative, has deployed Meta's Segment Anything Model 3 (SAM 3) and DINOv3 on 300 A100 GPUs at national supercomputing facilities to automate segmentation of X-ray and neutron science imagery. The pipeline reduces expert annotation time from weeks or months to approximately 15 minutes per dataset, enabling real-time analysis during live experiments. A demonstration on grapevine drought resilience using micro-CT scans showed that month-long per-timestep annotation workflows now complete in 15 minutes. Meta's open-source release of both models was critical, as national labs require on-premises deployment on secure government infrastructure.

Open Weights Progress Enterprise Deployment Patterns Segment Anything Model 2 Oak Ridge National Laboratory NERSC +10 more

4arXiv · cs.AI·Jul 21, 2026·source ↗

OR Else: Output Reset as a smooth trust-region alternative to clipping in PPO and GRPO for LLM post-training

This arXiv paper proposes Output Reset (OR), a smooth one-sided saturation rule to replace the clipped surrogate objective in PPO and GRPO during LLM post-training. Experiments on Llama-3.2-1B-Instruct with the Anthropic hh-rlhf dataset show PPO-OR achieves a 0.305 higher mean reward-model score than PPO-clip under GAE, while GRPO-OR shows reduced variance but no reward gain at group size G=2. The work identifies a meaningful behavioral difference between the two optimization regimes but leaves open whether larger group sizes change GRPO-OR's effectiveness.

Evaluation and Benchmarking Alignment and RLHF Llama-3.2-1B-Instruct hh-rlhf OR Else: A Differentiable Trust Region for Policy Optimization +3 more
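For context, the clipped surrogate that Output Reset is proposed to replace is the standard PPO-clip objective. The sketch below shows only that standard baseline for a single sample; it does not implement the paper's OR rule, whose exact form is not given in the summary. The function name and the default `eps = 0.2` are assumptions for illustration.

```python
def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO-clip per-sample objective (to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s)
    advantage : advantage estimate for the sampled action
    eps       : clip range; 0.2 is a common default, not a value from the paper
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the minimum keeps the pessimistic (lower) of the two estimates.
    return min(ratio * advantage, clipped_ratio * advantage)


# Positive advantage: the objective stops growing once the ratio exceeds 1 + eps.
print(clipped_surrogate(1.1, 1.0))
print(clipped_surrogate(1.5, 1.0))
print(clipped_surrogate(2.0, 1.0))

# Negative advantage: the objective is capped once the ratio falls below 1 - eps,
# but it is not capped when the ratio moves the wrong way (above 1).
print(clipped_surrogate(0.5, -1.0))
print(clipped_surrogate(1.5, -1.0))
```

The hard cap is what makes the objective flat, and its gradient zero, outside the trust region. That non-smooth cutoff is the behavior a smooth saturation rule would aim to soften.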

5arXiv · cs.CL·Jul 21, 2026·source ↗

Benchmark study reveals how linguistic framing of user beliefs shifts LLM context-following behavior

A new arXiv paper introduces a typology of 17 linguistically motivated expression-of-belief (EoB) types—spanning form, evidentiality, epistemic stance, and tone—to evaluate how phrasing affects whether LLMs defer to user-stated beliefs or their own prior knowledge. The authors benchmark 16 LLMs across Llama 3, Qwen3, and Gemma3 families at scales from 1B to 30B parameters, finding that larger and instruction-tuned models are systematically less context-following than smaller or base models. Specific linguistic framings (e.g., presuppositions, certainty markers) are identified as statistically more persuasive, with implications for prompt robustness and sycophancy research.

Evaluation and Benchmarking AI Safety Research Gemma 3 Google Llama 3 +3 more

6The Batch·Jul 16, 2026·source ↗

Data Points: Apple sues OpenAI; Meta Muse Spark 1.1; ChatGPT Work; IBM CodeAlchemy; OpenAI Atlas shutdown

A multi-item digest covers five significant AI developments: Apple sued OpenAI alleging trade secret theft via former employees including hardware chief Tang Tan; Meta released Muse Spark 1.1, a multimodal agentic model with 1M-token context and strong tool-use capabilities; OpenAI launched ChatGPT Work, a cloud-based workplace agent competing with Anthropic's Claude Cowork; IBM released CodeAlchemy, a 500B+ token synthetic code dataset with execution traces showing smaller models trained on it outperform those trained on much larger real-code corpora; and OpenAI shut down its Atlas browser in favor of a Chrome extension and desktop integration. These items collectively reflect intensifying competition across agentic products, synthetic data strategies, and legal disputes between major AI players.

Training Infrastructure Frontier Model Releases CodeAlchemy IBM Fidji Simo +17 more

6arXiv · cs.CL·Jul 15, 2026·source ↗

Formal framework for valid extractable memorization claims in LLMs

A new arXiv preprint proposes a principled methodology for making valid extractable memorization claims about LLMs, addressing both over- and under-statement problems in prior work. The core contribution is a 'matched comparison' approach that measures generation probabilities of training sequences against comparable non-training sequences to establish a calibrated baseline for predictability. Two formalizations are offered: a conformal test for population-level claims and a census method for single-document claims. Applied to OLMo 2 32B on Wikipedia and Llama 3.1 70B on books, the framework reveals significant false-positive rates in naive extraction studies and supports memorization claims at probability thresholds as low as 1e-27.

Evaluation and Benchmarking AI Safety Research Llama 3.1 70B OLMo-3 Allen Institute for AI +2 more

4arXiv · cs.CL·Jul 13, 2026·source ↗

Super-Tuning: Pruning saliency signals repurposed for sparse parameter-efficient fine-tuning

Researchers propose Super and Supra, two sparse PEFT methods that reuse activation-weighted magnitude scores (Wanda-style) originally developed for pruning to select which parameters to update during fine-tuning. Supra combines this sparse update with LoRA under a fixed parameter budget via a budget-splitting rule. Experiments on Llama-3.2-1B and Llama-3-8B on a Math17K arithmetic task show the best Super/Supra variants outperform other tested adapter configurations. The work suggests pruning-inspired orderings are a useful, low-cost signal for identifying effective sparse fine-tuning supports.

Open Weights Progress Inference Economics Llama 3.2 Super-Tuning: From Activation-Aware Pruning to Sparse Fine-Tuning LoRA +4 more
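As a rough illustration of the idea in the summary above, a Wanda-style score multiplies each weight's magnitude by the L2 norm of the input feature it reads from. The sketch below computes that score and keeps the top-k weights per output row as the trainable set. The per-row top-k rule, the choice to train the highest-scoring weights, and the names `wanda_scores` and `top_k_mask` are assumptions made for illustration; the paper's actual selection and budget-splitting rules are not described here.

```python
import math


def wanda_scores(weights, activations):
    """Score[i][j] = |W[i][j]| * ||X[:, j]||_2.

    weights     : list of rows, shape (out_features, in_features)
    activations : list of rows, shape (num_tokens, in_features)
    """
    n_in = len(weights[0])
    # L2 norm of each input feature across all calibration tokens.
    col_norms = [
        math.sqrt(sum(row[j] ** 2 for row in activations)) for j in range(n_in)
    ]
    return [[abs(w) * col_norms[j] for j, w in enumerate(row)] for row in weights]


def top_k_mask(scores, k):
    """Per output row, mark the k highest-scoring weights as trainable (1)."""
    mask = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(order[:k])
        mask.append([1 if j in keep else 0 for j in range(len(row))])
    return mask


# Toy layer: 2 outputs, 3 inputs. The second input feature is never active.
W = [[0.5, -2.0, 0.1],
     [1.0, 0.2, -0.3]]
X = [[3.0, 0.0, 4.0],
     [4.0, 0.0, 3.0]]

scores = wanda_scores(W, X)
mask = top_k_mask(scores, k=1)
print(scores)
print(mask)
```

In the toy layer, the largest-magnitude weight (-2.0) reads from an input that is never active, so its score is zero and it is not selected. That is the difference between an activation-weighted score and plain magnitude ranking.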

6The Batch·Jul 10, 2026·source ↗

Brain2Qwerty v2 translates MEG brain waves to text with 39% word error rate

Researchers from Meta and several French and Spanish institutions released Brain2Qwerty v2, a non-invasive brain-computer interface system that decodes magnetoencephalography (MEG) signals into text using a CNN/conformer encoder, a word-aligner, and a fine-tuned Qwen3-4B language model with per-subject LoRA adapters. The system achieves a 39% word error rate on 9 subjects, down from 43% in v1, trained on 90 hours of MEG recordings. A notable finding is that cross-subject training substantially outperforms single-subject training, suggesting a data-scaling dynamic analogous to LLM pretraining. Training code and v1 data have been open-sourced.

Evaluation and Benchmarking Multimodal Progress French National Centre for Scientific Research Basque Center on Cognition, Brain, and Language Qwen3-4B +4 more
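For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between the decoded text and the reference, divided by the number of reference words. The sketch below implements that conventional definition; it is not the paper's evaluation code, and the function name `word_error_rate` and the example sentences are assumptions for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("the quick brown fox", "the quick red fox"))
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.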

4Simon Willison's Weblog·Jul 9, 2026·source ↗

Simon Willison releases llm-meta-ai 0.1 plugin for LLM CLI

Simon Willison published llm-meta-ai 0.1, a new plugin for his LLM command-line tool that adds support for Meta AI models. The release extends the LLM ecosystem to cover Meta's model offerings. This is a tooling addition relevant to practitioners using the LLM CLI for multi-provider access.

Agent and Tool Ecosystem LLM Simon Willison Meta +1 more

8Meta AI Blog·Jul 9, 2026·source ↗

Meta Superintelligence Labs releases Muse Spark 1.1, a multimodal agentic reasoning model with Meta Model API

Meta Superintelligence Labs has released Muse Spark 1.1, a significant upgrade to Muse Spark featuring a 1-million-token context window, strong agentic and computer-use capabilities, and major coding improvements on complex codebases. The model supports multi-agent orchestration, zero-shot generalization to MCP servers and custom tools, and multimodal reasoning including visual-to-code generation and video understanding. Alongside the model release, Meta is launching a public preview of the Meta Model API, giving developers programmatic access for the first time. Safety evaluations were conducted under Meta's Advanced AI Scaling Framework across frontier risk categories.

Frontier Model Releases AI Safety Research Meta Internal Coding Bench Muse Image Advanced AI Scaling Framework +9 more

8Meta AI Blog·Jul 7, 2026·source ↗

Meta Superintelligence Labs launches Muse Image and previews Muse Video with agentic generation capabilities

Meta Superintelligence Labs (MSL) has launched Muse Image, its most advanced image generation model, and previewed Muse Video, both representing the first media generation models from the newly formed lab. Muse Image operates as an agent with tool use (web search, code execution), emergent self-refinement, and test-time compute scaling, achieving a No. 2 Arena Elo ranking for text-to-image and editing tasks at launch. The model integrates with Muse Spark for joint agentic planning and is deploying across Meta AI, Instagram Stories, and WhatsApp. Muse Video, built on the same pretraining base, adds native audio support and is coming soon to creators.

Frontier Model Releases Agent and Tool Ecosystem Artificial Analysis Muse Image Meta Superintelligence Labs +4 more

6Latent Space·Jul 1, 2026·source ↗

Genesis Molecular AI: Diffusion models for drug discovery, with Llama lead Sergey Edunov and PEARL's zero-shot OpenBind win

Latent Space interviews Evan Feinberg and Sergey Edunov (formerly Meta's Llama lead) about Genesis Molecular AI, a startup applying diffusion models to drug discovery. The conversation covers PEARL's zero-shot performance on the OpenBind benchmark and the broader implications of co-folding models crossing accuracy thresholds for molecular design. The piece argues that the most interesting diffusion research is happening in scientific domains rather than language modeling.

Frontier Model Releases Sergey Edunov PEARL Genesis Molecular AI +4 more

8The Batch·Jun 29, 2026·source ↗

GPT-5.6 launches in gated release; U.S. government restricts frontier AI model access

OpenAI announced GPT-5.6 in three tiers (Sol, Terra, Luna) but restricted early access to government-vetted partners at the Trump administration's request, framing the move as temporary while expressing frustration with the emerging involuntary licensing regime. Separately, the U.S. Commerce Department partially lifted a two-week export block on Anthropic's Claude Mythos 5, clearing access for 100+ trusted U.S. institutions while maintaining broader export controls. The episode establishes a new regulatory pattern in which Washington exerts direct control over frontier AI model releases, affecting both OpenAI and Anthropic. Additional items in the roundup cover Google integrating computer use into Gemini 3.5 Flash, Meta releasing Brain2Qwerty v2 for non-invasive brain-to-text decoding, and IBM's 0.7nm transistor design.

Frontier Model Releases AI Safety Research Dean Ball IBM Claude Mythos +14 more

5arXiv · cs.CL·Jun 29, 2026·source ↗

Triadic Werewolf benchmark exposes multi-hop Theory of Mind failures in LLMs

Researchers introduce a Werewolf game variant with a Jester faction whose inverted utility function (winning by being voted out) requires models to reason across three opposing incentive structures simultaneously. Across 60 games, GPT-4.1, DeepSeek-V3.1, and Llama-3.3-70B all struggle: Werewolves never exceed 20% win rate and GPT-4.1 wolves vote out the Jester in 60-70% of games, a self-defeating action. Only DeepSeek-V3.1 learns the nuanced strategy of appearing suspicious without appearing intentionally suspicious, and benefits most from self-learning. The work argues dyadic social-deduction benchmarks systematically underestimate the difficulty of multi-agent Theory of Mind.

Evaluation and Benchmarking Agent and Tool Ecosystem Llama 3.1 70B Triadic Werewolf DeepSeek V4 +3 more

4arXiv · cs.CL·Jun 29, 2026·source ↗

Multi-stage explainability framework translates transformer speech models into clinical cognitive impairment narratives

A new arXiv preprint proposes a framework for making transformer-based speech cognitive impairment detection clinically interpretable by combining SHAP token attribution, linguistic feature analysis, and a four-stage LLM reasoning pipeline using LLaMA-3.1-70B-Instruct. The system is built on the SpeechCARE-Adaptive Gating Network multimodal model (F1=72.11% on NIA PREPARE) and maps outputs to four cognitive-linguistic dimensions. Physician evaluation on 70 samples showed strong alignment with clinical profiles and a System Usability Scale score of 82/100, suggesting practical clinical workflow integration potential.

Evaluation and Benchmarking AI Safety Research NIA PREPARE Llama 3.3 70B Instruct SpeechCARE-Adaptive Gating Network +3 more

5arXiv · cs.CL·Jun 24, 2026·source ↗

AdversaBench: Automated LLM red-teaming pipeline with multi-judge confirmation and cross-model transferability

AdversaBench is a new end-to-end red-teaming pipeline that mutates seed prompts using five structured operators and confirms failures via a three-judge panel with a meta-judge tiebreaker. Experiments on 45 seeds across reasoning, instruction-following, and tool-use categories produced confirmed failures for every seed. Key findings include sharp variation in operator effectiveness by category, misleading binary failure rates, judge agreement metrics distorted by label skew, and zero-shot transferability of adversarial prompts from Llama 3.1 8B to Llama 3.3 70B. Code and dataset are publicly released.

Evaluation and Benchmarking AI Safety Research Llama 3.1 70B AdversaBench Meta +1 more

6The Batch·Jun 19, 2026·source ↗

DeepSWE, ProgramBench, and ITBench-AA emerge as harder successors to SWE-bench for agent evaluation

Three new benchmarks, DeepSWE (by Datacurve), ProgramBench (Meta/Stanford/Harvard), and ITBench-AA (IBM/Artificial Analysis), are positioned as more rigorous replacements for the SWE-bench family, which models have largely saturated. DeepSWE tests feature implementation using private codebases and human-written problems; ProgramBench evaluates agents' ability to recreate functional programs from scratch; ITBench-AA measures root-cause diagnosis in real-world IT incident scenarios. Current top performers are GPT-5.5 (70% on DeepSWE) and Claude Opus 4.7 (46.7% on ITBench-AA and 3% on ProgramBench at the 95% pass threshold), which shows that even frontier models leave substantial headroom.

Evaluation and Benchmarking Agent and Tool Ecosystem Artificial Analysis Llama 3.1 70B Datacurve +13 more

5arXiv · cs.CL·Jun 17, 2026·source ↗

Study identifies 'synthetic lived experience paradox' in peer-like AI caregiver support

Researchers examine how LLMs prompted to sound peer-like generate language implying lived experience they cannot authentically possess, studying this in the context of family caregivers of Alzheimer's/ADRD patients. Using caregiver support exchanges from online communities and responses from LLaMA, GPT-4o-mini, and MedGemma, the study finds a 'narrative authenticity gap': AI captures emotional work of peer support but can fabricate experiential grounding. Psycholinguistic analysis shows human peers use significantly more first-person and past-focused language than AI. The authors argue caregiver-support AI needs mechanisms to distinguish supportive framing from fabricated lived experience.

AI Safety Research Alignment and RLHF GPT-4o mini Google Llama +4 more

6arXiv · cs.CL·Jun 17, 2026·source ↗

Location metadata causes systematic geographic bias leakage in LLMs, even with 'Unknown' placeholders

Researchers evaluate 'location leakage' — the phenomenon where LLMs generate geographically biased outputs when exposed to location metadata in user profiles, even when prompts are geographically neutral. Across creative writing and Q&A tasks, leakage spikes up to 793x above baseline for models including Llama 3.1-8B, Qwen3-8B, and Claude Sonnet 4.6. A novel structural finding shows that replacing location with 'Unknown' still elevates leakage by up to 72x, indicating the user profile frame itself acts as a conditioning signal independent of geographic content. This has direct implications for AI systems that use user metadata for localization.

Evaluation and Benchmarking AI Safety Research Claude Sonnet 4 Alibaba Qwen3-4B +4 more

6arXiv · cs.CL·Jun 10, 2026·source ↗

The Shibboleth Effect: Cross-lingual behavioral skew in frontier LLMs under adversarial geopolitical simulation

Researchers introduce the 'Shibboleth Effect' — systematic behavioral differences in LLMs when operating in different languages — and audit six frontier models (GPT-4o, Llama-4, Mistral-Large, Gemini-3.1-Pro, Qwen3.6-Plus, DeepSeek-R1) using a synthetic maritime territorial dispute wargame played in English versus Turkish. Results are heterogeneous: Llama-4 becomes significantly more coercive in Turkish while Gemini-3.1-Pro and DeepSeek-R1 become less so, and GPT-4o shows no detectable shift. The study identifies two candidate buffering mechanisms — chain-of-thought institutional anchoring and multilingual RLHF alignment — with direct implications for deploying LLMs in diplomatic or crisis-management contexts.

Evaluation and Benchmarking AI Safety Research DeepSeek V4 Mistral Large 2 GPT-4o +8 more

6Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 3.2-3B open-weights text generation model

Meta released Llama 3.2-3B, a 3-billion-parameter open-weights language model, on Hugging Face under the meta-llama organization. The model supports multiple languages including English, German, French, and Italian, and uses the standard transformers/safetensors format. With over 900K downloads and 800+ likes, it has seen substantial community adoption.

Frontier Model Releases Open Weights Progress Llama 3.2 Meta

7Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 3.2 11B Vision multimodal model on Hugging Face

Meta released Llama 3.2 11B Vision, an open-weights image-text-to-text model, on Hugging Face. The model is part of the Llama 3.2 family and supports multiple languages including English, German, and French. This represents Meta's entry into open-weights multimodal models at the 11B parameter scale.

Open Weights Progress Multimodal Progress Llama 3.2 11B Vision Hugging Face Meta

7Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 3.2 11B Vision Instruct multimodal model

Meta released Llama 3.2 11B Vision Instruct on Hugging Face, an open-weights multimodal model supporting image-text-to-text tasks. The model is part of the Llama 3.2 family and supports English and German. With over 157K downloads and 1,600 likes, it has seen substantial community adoption.

Open Weights Progress Multimodal Progress Hugging Face Meta Llama 3.2 90B Vision-Instruct

7Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 3.2 90B Vision multimodal model on Hugging Face

Meta released Llama 3.2 90B Vision, a large multimodal model supporting image-text-to-text tasks, published on Hugging Face under the meta-llama organization. The model is part of the Llama 3.2 family and supports English, German, and French. This is a significant open-weights multimodal release from Meta, extending the Llama 3 series with vision capabilities at the 90B parameter scale.

Frontier Model Releases Open Weights Progress Llama 3.2 90B Vision Hugging Face Meta +1 more

7Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 3.2 90B Vision-Instruct multimodal model

Meta released Llama 3.2 90B Vision-Instruct on Hugging Face, a large multimodal model supporting image-text-to-text tasks. The model is part of the Llama 3.2 family and supports English and German. With 858 downloads and 358 likes, it represents Meta's open-weights push into vision-language capabilities at the 90B parameter scale.

Frontier Model Releases Open Weights Progress Hugging Face Meta Llama 3.2 90B Vision-Instruct +1 more

5Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama Guard 3 11B Vision for multimodal content safety classification

Meta released Llama Guard 3 11B Vision on Hugging Face, a multimodal safety classifier supporting image-text-to-text inputs built on the Llama 3 architecture. The model extends the Llama Guard safety classification family to handle visual content alongside text. This is relevant to AI safety tooling for multimodal deployments.

Open Weights Progress AI Safety Research Llama Guard 3 11B Vision Llama 3 Hugging Face +2 more

5Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama Guard 3 1B safety classifier on Hugging Face

Meta released Llama Guard 3 1B, a compact 1-billion-parameter text-generation model designed for content safety classification, published on Hugging Face. The model is part of the Llama Guard 3 family and supports multiple languages including English, German, and French. Its small size makes it suitable for lightweight safety filtering in production deployments.

Open Weights Progress AI Safety Research Hugging Face Meta Llama Guard 3 1B

7Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 3.3 70B Instruct on Hugging Face

Meta released Llama 3.3 70B Instruct, a new instruction-tuned variant in the Llama 3 family, published on Hugging Face. The model supports English, French, and Italian and has accumulated over 691,000 downloads and 2,800 likes, indicating strong community uptake. This represents a meaningful open-weights release in the 70B parameter class.

Frontier Model Releases Open Weights Progress Llama 3.3 70B Instruct Hugging Face Meta

7Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 4 Maverick 17B-128E multimodal instruct model on Hugging Face

Meta released Llama 4 Maverick, a 17B active parameter model with 128 experts (MoE architecture), as an image-text-to-text instruct model on Hugging Face. The model supports multimodal inputs and multiple languages including Arabic, German, and English. With 28K+ downloads and 493 likes shortly after release, it is seeing significant early adoption.

Frontier Model Releases Open Weights Progress Llama 4 Maverick 17B-128E Hugging Face Meta +1 more

7Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 4 Scout 17B-16E instruct model on Hugging Face

Meta released Llama 4 Scout, a 17B active parameter / 16-expert mixture-of-experts instruct model with image-text-to-text (multimodal) capabilities, published on Hugging Face under the meta-llama organization. The model supports multiple languages including Arabic, German, and English. With over 420K downloads and 1,300 likes shortly after release, it is seeing significant community uptake.

Frontier Model Releases Open Weights Progress Hugging Face Meta Llama 4 Scout 17B-16E +1 more

7Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 4 Scout 17B-16E multimodal model on Hugging Face

Meta released Llama 4 Scout, a 17B active parameter model with 16 experts (mixture-of-experts architecture), on Hugging Face. The model supports image-text-to-text tasks, making it a multimodal open-weights release. With over 14,000 downloads and 249 likes shortly after release, it is seeing meaningful early adoption.

Frontier Model Releases Open Weights Progress Hugging Face Meta Llama 4 Scout 17B-16E +1 more

7Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama 4 Maverick 17B-128E multimodal model on Hugging Face

Meta released Llama 4 Maverick, a 17B active parameter model with 128 experts (mixture-of-experts architecture), on Hugging Face. The model supports image-text-to-text tasks, making it a multimodal open-weights release. This is part of the Llama 4 generation, representing Meta's latest open-weights frontier push with MoE architecture.

Frontier Model Releases Open Weights Progress Llama 4 Maverick 17B-128E Hugging Face Meta +1 more

6Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama Guard 4 12B multimodal safety classifier on Hugging Face

Meta released Llama Guard 4 12B, a multimodal (image-text-to-text) safety classification model built on the Llama 4 architecture, published to Hugging Face. The model is designed for conversational safety filtering and supports both text and image inputs. With 143K downloads and 102 likes shortly after release, it is seeing meaningful early adoption.

Open Weights Progress AI Safety Research Hugging Face Llama Llama Guard 4 +2 more

5Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama Prompt Guard 2 (22M) safety classifier on Hugging Face

Meta released Llama Prompt Guard 2-22M, a lightweight 22-million-parameter text classification model for prompt safety, published on Hugging Face under the meta-llama organization. The model is based on the DeBERTa-v2 architecture and is tagged for safety use cases including prompt injection and jailbreak detection. It is part of the Llama 4 safety tooling ecosystem and supports English and French.

Frontier Model Releases AI Safety Research Hugging Face Llama Prompt Guard 2-86M DeBERTa-v3 +1 more

5Meta Llama·Jun 10, 2026·source ↗

Meta releases Llama Prompt Guard 2 (86M) for prompt injection and jailbreak detection

Meta released Llama Prompt Guard 2-86M, a DeBERTa-v2-based text classification model on Hugging Face designed for safety filtering, specifically prompt injection and jailbreak detection. The model is tagged with llama4, suggesting it is part of the Llama 4 safety tooling ecosystem. With over 122K downloads, it has seen meaningful early adoption.

Frontier Model Releases AI Safety Research Hugging Face Llama Prompt Guard 2-86M DeBERTa-v3 +1 more

7arXiv · cs.CL·Jun 9, 2026·source ↗

RLHF produces shallow political neutrality by severing causal pathways, not erasing partisan structure

Researchers compare internal representations of Llama 3.1 8B before and after RLHF, finding that alignment training does not remove partisan political geometry from the model but instead compresses output variance to produce balanced responses. Sparse autoencoder decomposition shows that policy-encoding features active in the base model become completely inactive in the instruction-tuned version, while feature-level steering experiments confirm the causal disconnect is real. The underlying partisan structure remains intact and can be reactivated by inferring and amplifying a user's partisan identity, suggesting RLHF alignment is functionally fragile. The authors argue this 'disconnection rather than removal' pattern may generalize to other value domains beyond political orientation.

AI Safety Research Alignment and RLHF Reinforcement Learning from Human Feedback The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model Sparse Autoencoder +2 more

5arXiv · cs.CL·Jun 5, 2026·source ↗

LLMs fail to consistently simulate demographic perspective-taking in hate speech annotation

A new arXiv paper evaluates whether persona-conditioned LLMs can replicate how different demographic groups perceive hate speech, testing three dimensions: inter-group disagreement, in-group sensitivity, and vicarious prediction. No model consistently captures all three dimensions, and performance is highly model-dependent rather than emerging reliably from identity prompts alone. Vicarious prompting with Llama 3.1 provides the closest approximation to human disagreement patterns across demographic axes. The findings have implications for using LLMs as proxies for diverse human annotators in content moderation tasks.

Evaluation and Benchmarking AI Safety Research From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation Meta Llama-3.1-8B

7MIT Technology Review — AI·Jun 5, 2026·source ↗

Meta's AI customer support agent exploited to hijack Instagram accounts

Attackers exploited Meta's AI customer support agent by prompting it to link Instagram accounts to attacker-controlled email addresses, successfully hijacking accounts including the dormant Obama White House Instagram. The incident was reported by 404 Media on June 5, 2026. The attack illustrates a practical, real-world failure mode for deployed AI agents with account-management capabilities.

AI Safety Research Enterprise Deployment Patterns 404 Media MIT Technology Review Meta +1 more

6The Batch·Jun 3, 2026·source ↗

DeerFlow 2.0 launches as open-source agent harness; Anthropic sues Pentagon over AI blacklist; Google releases Gemini Embedding 2

ByteDance released DeerFlow 2.0, an open-source agent harness built on LangGraph/LangChain that orchestrates parallel sub-agents with sandboxed Docker environments, progressive skill-loading, and persistent memory for complex workflows. Anthropic filed two lawsuits against the U.S. Pentagon contesting a supply-chain risk blacklist tied to its refusal to remove guardrails preventing Claude's use in autonomous weapons and domestic surveillance, with potential multi-billion-dollar revenue impact. Google released Gemini Embedding 2, a multimodal embedding model unifying text, images, video, audio, and PDFs in a single vector space, succeeding the text-only predecessor. Meta acquired Moltbook, an agent-to-agent social platform built around the OpenClaw framework, while OpenAI hired OpenClaw's creator and acquired AI security testing platform Promptfoo.

Regulatory Developments Agent and Tool Ecosystem Ben Parr Jeff Dean Gemini Embedding 2 +17 more

5The Batch·Jun 3, 2026·source ↗

Andrew Ng proposes Stack Overflow-style knowledge sharing for AI coding agents via chub

Andrew Ng describes the vision for chub (Context Hub), a CLI tool providing up-to-date API documentation to coding agents, which reached over 5,000 GitHub stars in its first week. The piece argues for a Stack Overflow-like feedback loop where agents that discover bugs or better API usage patterns can contribute learnings back to shared documentation. Ng also references Moltbook, a Reddit-like social network for agents recently acquired by Meta, as inspiration for agent-to-agent knowledge sharing. The post outlines early-stage work on agentic deep research to expand chub's documentation collection from under 100 to nearly 1,000 documents.

Enterprise Deployment Patterns Agent and Tool Ecosystem DeepLearning.AI Xin Ye Rohit Prasad +4 more

6The Batch·Jun 3, 2026·source ↗

Meta, OpenAI, and other AI companies build private gas-fired power plants to bypass public utilities

Major AI companies including Meta, OpenAI, Oracle, and xAI are constructing private, off-grid power plants—primarily natural gas—to directly supply their data centers, bypassing public utility grid connections. A Cleanview study identified 46 such projects, 90% announced in 2025, accounting for 30% of all planned U.S. data-center capacity. Meta is building gas plants in Ohio and Texas, while OpenAI and Oracle's Stargate-linked Jupiter project is underway in New Mexico. The shift signals a structural change in AI infrastructure energy strategy, with climate implications as fossil fuels displace earlier renewable commitments.

Training Infrastructure Inference Economics Microsoft Cleanview Stargate +7 more

6The Batch·Jun 3, 2026·source ↗

Data Points: Perplexity Computer expands, Google Aletheia math agent, DeepSeek chip strategy, Nvidia retrieval pipeline, Stargate cancellation

The Batch's weekly data points roundup covers five significant AI developments: Perplexity expanded its Computer agentic platform to desktop, mobile, and enterprise with new APIs and financial data tools; Google released Aletheia, a Gemini-based math research agent achieving 95.1% on IMO-Proof Bench Advanced (up from 65.7%); DeepSeek withheld pre-release access to its V4 model from Nvidia and AMD while giving domestic Chinese chipmakers early access; Nvidia's NeMo Retriever topped the ViDoRe v3 leaderboard using a ReACT-based agentic retrieval loop; and OpenAI and Oracle cancelled plans to expand the Abilene Stargate campus from 1.2 GW to 2.0 GW due to financing and reliability issues.

Training Infrastructure Frontier Model Releases ViDoRe v3 Crusoe BRIGHT +19 more

7The Batch·Jun 2, 2026·source ↗

Data Points: OpenAI shuts down Sora, Anthropic multi-agent harness, EVA voice benchmark, Arm AGI CPU, White House AI preemption proposal

OpenAI is shutting down its Sora text-to-video platform without explanation, ending a major Disney licensing deal worth up to $1 billion and eliminating video capabilities from ChatGPT amid Hollywood copyright tensions. Anthropic published details on a multi-agent harness enabling Claude to build full-stack applications over multi-hour sessions using a planner-generator-evaluator architecture. ServiceNow AI Research released EVA, an open-source two-dimensional benchmark for voice agents measuring both task accuracy and conversational experience quality. Additional items cover Arm's first self-designed data center CPU (AGI CPU) co-developed with Meta, and the Trump Administration's legislative proposal for a federal AI framework that would preempt state AI laws.

Training Infrastructure Frontier Model Releases ServiceNow AI Research ClawBot Playwright +19 more

6The Batch·Jun 2, 2026·source ↗

The Batch Issue 346: Nvidia Nemotron Super 120B, OpenAI-Amazon Deal, Regulatory Commentary

The Batch's weekly digest covers Nvidia's release of Nemotron 3 Super 120B-A12B, an open-weights hybrid Mamba-2/transformer/MoE model with a 1M-token context trained on 25 trillion tokens, positioned as a speed leader in its size class for agentic applications. The issue also touches on OpenAI's Amazon deal and Grok video pricing cuts. Editor Andrew Ng's letter addresses the White House's proposed federal AI preemption framework and critiques what he characterizes as coordinated anti-AI messaging campaigns. Multiple significant industry developments are bundled in a single newsletter digest.

Frontier Model Releases Open Weights Progress Nemotron 3 Super 120B-A12B Nemotron 3 Ultra-500B-A50B DeepLearning.AI +9 more

7The Batch·Jun 2, 2026·source ↗

Nvidia releases Nemotron 3 Super 120B-A12B open-weights model with hybrid Mamba-2/MoE architecture

Nvidia released Nemotron 3 Super 120B-A12B, an open-weights LLM with a hybrid Mamba-2/transformer/MoE architecture that activates only 12B parameters per token and supports up to 1 million token context. The model claims the fastest inference speed in its size class at 442 tokens/second and leads open-weights models on PinchBench agentic task evaluation, outperforming larger models including Kimi K2.5 (1T parameters). Nvidia is releasing weights, training data, and recipes under a permissive commercial license, and plans a $26B five-year investment in open-weights models — framed partly as a strategic response to Chinese labs building capable open-weights models on non-Nvidia hardware.

Frontier Model Releases Open Weights Progress Nemotron 3 Super 120B-A12B Nemotron 3 Ultra-500B-A50B PivotRL +18 more
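
The Nemotron figures above can be put in proportion with a little arithmetic. This is a rough sketch, assuming the "120B" in the model name is the total parameter count (the summary states only the 12B active figure) and treating the claimed 442 tokens/second as a steady output rate; the function names are illustrative.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters used per token in a sparse mixture-of-experts model."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # 12B active of an assumed 120B total; 442 tokens/second as claimed.
    fraction = active_fraction(12, 120)
    seconds = generation_seconds(10_000, 442)
    print(f"{fraction:.0%} of parameters active per token")
    print(f"{seconds:.1f} s to generate 10,000 tokens at the claimed rate")
```

Under these assumptions roughly a tenth of the weights participate in each token, which is the usual explanation for why a model of this total size can post a high decoding rate.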