7 · Meta AI Blog · 1mo ago

SAM 3.1: Meta Releases Faster Real-Time Video Segmentation Model with Object Multiplexing

Meta has released SAM 3.1, an incremental update to Segment Anything Model 3, introducing object multiplexing that allows tracking up to 16 objects in a single forward pass. This doubles video processing throughput from 16 to 32 FPS on a single H100 GPU, reducing GPU resource requirements and enabling real-time tracking on smaller hardware. SAM 3.1 is a drop-in replacement for SAM 3 and is available via updated model checkpoints and codebase. The broader SAM 3 release also includes text and exemplar prompting, a new Segment Anything Playground, the SA-Co evaluation dataset, and SAM 3D for 3D reconstruction.
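The effect of multiplexing on per-frame work can be illustrated with a small arithmetic sketch in Python. It assumes, as a simplification not stated in this summary, that a non-multiplexed tracker runs one forward pass per tracked object per frame. The function names are hypothetical and are not part of the SAM 3 codebase; only the 16-object limit and the 16 and 32 FPS figures come from the announcement.

```python
import math

# From the announcement: up to 16 objects share a single forward pass.
MAX_OBJECTS_PER_PASS = 16


def passes_per_frame(num_objects: int, multiplexed: bool) -> int:
    """Forward passes needed for one frame (hypothetical cost model)."""
    if num_objects < 0:
        raise ValueError("num_objects must be non-negative")
    if multiplexed:
        # Objects are packed into groups of up to 16 per pass.
        return math.ceil(num_objects / MAX_OBJECTS_PER_PASS)
    # Assumption: without multiplexing, each object needs its own pass.
    return num_objects


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    for n in (1, 8, 16, 17):
        print(n, passes_per_frame(n, False), passes_per_frame(n, True))
    print(frame_budget_ms(16), frame_budget_ms(32))
```

Under this model, the reported throughput figures correspond to a per-frame budget of 62.5 ms at 16 FPS and 31.25 ms at 32 FPS. Real speedups depend on how much of the per-frame cost is shared across objects, which this sketch does not model.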

Evaluation and Benchmarking · Inference Economics · Agent and Tool Ecosystem · Multimodal Progress · SA-Co · Segment Anything Playground · Conservation X Labs · SAM 3.1 · NVIDIA H100 · object multiplexing · Osa Conservation · Meta · SAM 3D

Related guides (4)

Multimodal Progress · Topic guide

Multimodal Progress: How AI Learned to See, Hear, and Act

Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Inference Economics · Topic guide

Inference Economics: The Cost Structure of Running AI Models in Production

Evaluation and Benchmarking · Topic guide

Evaluation and Benchmarking: How We Measure AI — and Why It Keeps Getting Harder

Related events (8)

6 · GitHub Trending · 29d ago

Meta SAM 3 (Segment Anything Model 3) Released on GitHub

Meta / Facebook Research has released SAM 3, the third generation of its Segment Anything Model, with code for inference and finetuning, pretrained model checkpoints, and example notebooks. The repository has accumulated over 10,000 stars with strong daily momentum (+93). SAM 3 continues Meta's open-weights tradition in computer vision foundation models. No accompanying paper or technical blog post is referenced in this item.

Open Weights Progress · Multimodal Progress · Segment Anything Model 2 · Meta AI · Facebook AI Research

7 · Meta AI Blog · 1mo ago

Meta Introduces SAM Audio: Unified Multimodal Model for Audio Separation with PE-AV, Benchmark, and Judge Model

Meta has released SAM Audio, a unified multimodal audio separation model that accepts text, visual, and temporal span prompts to isolate sounds from complex audio mixtures. The system is powered by Perception Encoder Audiovisual (PE-AV), an extension of Meta's open-source Perception Encoder released earlier in 2025, and uses a flow-matching diffusion transformer architecture. Alongside the model, Meta is releasing SAM Audio-Bench (the first in-the-wild audio separation benchmark) and SAM Audio Judge (an automatic evaluation model for audio separation). All components are available today via the Segment Anything Playground.

Evaluation and Benchmarking · Agent and Tool Ecosystem · SAM Audio Judge · Segment Anything Model 2 · SAM Audio · +7 more

5 · arXiv · cs.AI · 4d ago

ActiveSAM: Training-free open-vocabulary segmentation via image-conditional class pruning on SAM 3

ActiveSAM is a training-free, zero-shot inference framework that wraps Segment Anything Model 3 (SAM 3) to perform open-vocabulary semantic segmentation more efficiently. It estimates an image-conditioned active class subset at low resolution before running full-resolution decoding only on retained classes, using bucketed prompt multiplexing and margin-aware background calibration. Across eight benchmarks, it outperforms the prior state-of-the-art SegEarth-OV3 by ~1.4 mIoU on average while running up to 5.5x faster on large-vocabulary datasets, with strong robustness to image corruption relevant to autonomous driving and embodied AI.
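The two-stage idea described above (prune classes cheaply, then decode only the survivors in grouped prompts) can be sketched in Python as follows. This is an illustration under stated assumptions, not ActiveSAM's implementation: the per-class scores are taken as given, and `keep_threshold`, `bucket_size`, and both function names are hypothetical. The paper's margin-aware background calibration is not modeled.

```python
from typing import Dict, List


def select_active_classes(
    low_res_scores: Dict[str, float], keep_threshold: float
) -> List[str]:
    """Keep classes whose low-resolution score reaches the threshold.

    Assumption: some cheap low-resolution pass has already produced one
    score per candidate class for this image.
    """
    return [
        name for name, score in low_res_scores.items() if score >= keep_threshold
    ]


def bucket_prompts(classes: List[str], bucket_size: int) -> List[List[str]]:
    """Group retained classes into fixed-size buckets for batched decoding."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    return [
        classes[i : i + bucket_size] for i in range(0, len(classes), bucket_size)
    ]


if __name__ == "__main__":
    # Hypothetical scores for a driving scene with a five-class vocabulary.
    scores = {"road": 0.9, "car": 0.7, "zebra": 0.05, "sky": 0.8, "boat": 0.1}
    active = select_active_classes(scores, keep_threshold=0.5)
    print(active)
    print(bucket_prompts(active, bucket_size=2))
```

The saving comes from the full-resolution stage only seeing the retained classes, so its cost scales with the number of classes likely present in the image rather than with the full vocabulary size.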

Evaluation and Benchmarking · Inference Economics · VILA-Lab · Segment Anything Model 2 · ActiveSAM · +1 more

4 · Meta AI Blog · 1mo ago

USRA Applies SAM 2 Fine-Tuning for Real-Time Flood and River Monitoring

The Universities Space Research Association (USRA) and Meta are collaborating with the U.S. Geological Survey (USGS) to apply a fine-tuned version of SAM 2 for automated water segmentation in drone and satellite imagery, targeting real-time flood detection and river extent mapping. The fine-tuned model replaces a labor-intensive manual digitization workflow that was a key bottleneck in rapid-response image analysis. The system integrates with PlanetScope satellite imagery and USGS 3D Hydrography data, with case studies in the Chesapeake Bay area showing promise for nationwide deployment. The collaboration also anticipates leveraging the recently released SAM 3 for unified detection, segmentation, and tracking.

Agent and Tool Ecosystem · Multimodal Progress · Segment Anything Model 2 · SAM 3.1 · Universities Space Research Association · +5 more

9 · DeepSeek News · 1mo ago

DeepSeek V4 Preview Release: 1.6T-param Pro and 284B Flash Models with 1M Context, Open-Sourced

DeepSeek has released DeepSeek-V4 as an open-weights preview, comprising two MoE variants: V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active parameters). Both models support a 1M-token context by default, enabled by a novel architecture that combines token-wise compression with DeepSeek Sparse Attention (DSA). V4-Pro claims open-source SOTA on agentic coding benchmarks and world-class math/STEM/coding performance rivaling top closed-source models, while V4-Flash offers near-parity reasoning at lower cost and latency. The API is live today with OpenAI and Anthropic compatibility, and legacy model endpoints will be retired in July 2026.
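The parameter counts above imply how sparse each MoE variant is per token. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the calculation uses only the total and active parameter figures quoted in the summary.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of a model's parameters that are active for each token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    if not 0 <= active_params <= total_params:
        raise ValueError("active_params must lie between 0 and total_params")
    return active_params / total_params


if __name__ == "__main__":
    # Figures quoted in the summary, in parameters.
    v4_pro = active_fraction(total_params=1.6e12, active_params=49e9)
    v4_flash = active_fraction(total_params=284e9, active_params=13e9)
    print(f"V4-Pro:   {v4_pro:.1%} of parameters active per token")
    print(f"V4-Flash: {v4_flash:.1%} of parameters active per token")
```

By these figures, roughly 3% of V4-Pro's parameters and under 5% of V4-Flash's are active for any given token. Active parameter count is only a rough proxy for per-token compute, so this does not by itself predict cost or latency.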

Long Context Evolution · Frontier Model Releases · DeepSeek V4 · DeepSeek-V4-Flash · Claude Code · +7 more

5 · Meta AI Blog · 1mo ago

UPenn PRONTO Team Uses Meta's SAM 2 and DINO for Autonomous Military Medical Triage in DARPA Challenge

The University of Pennsylvania's PRONTO team is applying Meta's Segment Anything Model 2 (SAM 2) and DINO/Grounding DINO models to autonomous robotic triage in DARPA's three-year mass casualty incident challenge. The multi-robot system uses drones and ground robots to locate victims, then runs parallel injury classification pipelines combining SAM, DINO, and pose estimation to assess heart rate, respiration, wounds, and amputations without requiring labeled training data. Results are surfaced to first responders via a mobile interface for real-time prioritization. Phase 2 concluded in October 2025, with Phase 3 expected to push toward deployment-ready performance.

Enterprise Deployment Patterns · Agent and Tool Ecosystem · Segment Anything Model 2 · Grounding DINO · Meta AI · +6 more

6 · arXiv · cs.CL · 2d ago

OmniAgent: POMDP-based active perception agent for long video understanding with test-time scaling

Researchers introduce OmniAgent, a multimodal agent that reformulates long video understanding as a POMDP-based iterative Observation-Thought-Action cycle, selectively distilling audio-visual cues into persistent textual memory rather than processing all frames uniformly. The system uses Agentic Supervised Fine-Tuning and a novel reinforcement learning method (TAURA) with turn-level entropy for credit assignment. OmniAgent demonstrates positive test-time scaling and achieves state-of-the-art open-source results across ten benchmarks, with its 7B model outperforming Qwen2.5-VL-72B on LVBench (50.5% vs. 47.3%).

Inference Economics · Agent and Tool Ecosystem · OmniAgent · Qwen2.5-VL-72B · LVBench · +4 more

6 · Google DeepMind Blog · 1mo ago

D4RT: DeepMind's Unified 4D Reconstruction and Tracking System, Up to 300x Faster

DeepMind has announced D4RT, a system for unified four-dimensional (spatial + temporal) scene reconstruction and tracking. The method claims up to 300x speed improvements over prior approaches. The announcement positions D4RT as a significant efficiency advance in dynamic 3D scene understanding, with potential applications in robotics, video understanding, and embodied AI.

Agent and Tool Ecosystem · Multimodal Progress · DeepMind · 4D reconstruction · D4RT · +1 more