USRA Applies SAM 2 Fine-Tuning for Real-Time Flood and River Monitoring
The Universities Space Research Association (USRA) and Meta are collaborating with the U.S. Geological Survey (USGS) to apply a fine-tuned version of SAM 2 for automated water segmentation in drone and satellite imagery, targeting real-time flood detection and river extent mapping. The fine-tuned model replaces a labor-intensive manual digitization workflow that was a key bottleneck in rapid-response image analysis. The system integrates with PlanetScope satellite imagery and USGS 3D Hydrography data, with case studies in the Chesapeake Bay area showing promise for nationwide deployment. The collaboration also anticipates leveraging the recently released SAM 3 for unified detection, segmentation, and tracking.
Related events (8)
SAM 3.1: Meta Releases Faster Real-Time Video Segmentation Model with Object Multiplexing
Meta has released SAM 3.1, an incremental update to Segment Anything Model 3, introducing object multiplexing that allows tracking up to 16 objects in a single forward pass. This doubles video processing throughput from 16 to 32 FPS on a single H100 GPU, reducing GPU resource requirements and enabling real-time tracking on smaller hardware. SAM 3.1 is a drop-in replacement for SAM 3 and is available via updated model checkpoints and codebase. The broader SAM 3 release also includes text and exemplar prompting, a new Segment Anything Playground, the SA-Co evaluation dataset, and SAM 3D for 3D reconstruction.
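The effect of object multiplexing on per-frame cost can be illustrated with simple arithmetic. The sketch below is not Meta's code: it assumes that without multiplexing each tracked object costs one forward pass, and the function name `forward_passes` and its parameters are hypothetical. Only the figure of 16 objects per pass comes from the summary above.

```python
import math


def forward_passes(num_objects: int, objects_per_pass: int = 16) -> int:
    """Forward passes needed per frame when up to `objects_per_pass`
    objects share a single pass (16 is the figure quoted for SAM 3.1)."""
    if num_objects < 0 or objects_per_pass < 1:
        raise ValueError("num_objects must be >= 0 and objects_per_pass >= 1")
    return math.ceil(num_objects / objects_per_pass)


# Assumption: objects_per_pass=1 models a tracker that handles one object per pass.
for n in (1, 16, 17, 40):
    print(n, forward_passes(n, objects_per_pass=1), forward_passes(n))
```

Under these assumptions, 40 tracked objects need 3 multiplexed passes per frame instead of 40. The measured end-to-end gain reported in the summary (16 to 32 FPS) is much smaller than this pass-count ratio, so the count should be read only as an illustration of the batching idea, not as a throughput prediction.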
UPenn PRONTO Team Uses Meta's SAM 2 and DINO for Autonomous Military Medical Triage in DARPA Challenge
The University of Pennsylvania's PRONTO team is applying Meta's Segment Anything Model 2 (SAM 2) and DINO/Grounding DINO models to autonomous robotic triage in DARPA's three-year mass casualty incident challenge. The multi-robot system uses drones and ground robots to locate victims, then runs parallel injury classification pipelines combining SAM, DINO, and pose estimation to assess heart rate, respiration, wounds, and amputations without requiring labeled training data. Results are surfaced to first responders via a mobile interface for real-time prioritization. Phase 2 concluded in October 2025, with Phase 3 expected to push toward deployment-ready performance.
ActiveSAM: Training-free open-vocabulary segmentation via image-conditional class pruning on SAM 3
ActiveSAM is a training-free, zero-shot inference framework that wraps Segment Anything Model 3 (SAM 3) to perform open-vocabulary semantic segmentation more efficiently. It estimates an image-conditioned active class subset at low resolution before running full-resolution decoding only on retained classes, using bucketed prompt multiplexing and margin-aware background calibration. Across eight benchmarks, it outperforms the prior state-of-the-art SegEarth-OV3 by ~1.4 mIoU on average while running up to 5.5x faster on large-vocabulary datasets, with strong robustness to image corruption relevant to autonomous driving and embodied AI.
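The two-stage idea described above (prune classes cheaply, then decode only the survivors in batches) can be sketched as follows. This is a minimal illustration, not ActiveSAM's implementation: the summary does not give the scoring rule, the margin, or the bucket size, so a plain threshold against a background score is swapped in here, and the names `select_active_classes` and `bucket_prompts` are hypothetical.

```python
from typing import Dict, List


def select_active_classes(
    coarse_scores: Dict[str, float],
    background_score: float,
    margin: float = 0.1,
) -> List[str]:
    """Keep classes whose low-resolution score exceeds the background
    score by at least `margin` (an assumed, simplified pruning rule)."""
    return [
        name
        for name, score in coarse_scores.items()
        if score - background_score >= margin
    ]


def bucket_prompts(classes: List[str], bucket_size: int) -> List[List[str]]:
    """Group retained class prompts into fixed-size buckets so each
    bucket can be sent through full-resolution decoding together."""
    if bucket_size < 1:
        raise ValueError("bucket_size must be >= 1")
    return [classes[i:i + bucket_size] for i in range(0, len(classes), bucket_size)]


# Hypothetical low-resolution scores for a four-class vocabulary.
scores = {"road": 0.9, "car": 0.6, "boat": 0.05, "sky": 0.4}
active = select_active_classes(scores, background_score=0.2, margin=0.1)
buckets = bucket_prompts(active, bucket_size=2)
print(active, buckets)
```

In this toy run "boat" is pruned before the expensive stage, so full-resolution decoding would run on three classes in two buckets rather than on all four. The saving grows with vocabulary size, which is consistent with the speedup being reported on large-vocabulary datasets.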
Forest Research Deploys DINOv2 for National-Scale Tree Canopy Monitoring in England
Forest Research, the UK Forestry Commission's research agency, is using Meta's DINOv2 computer vision model, trained on 18 million satellite images in collaboration with the World Resources Institute, to build enhanced tree canopy height maps at 1-meter resolution for England. The approach aims to replace expensive LiDAR and survey data with open-source AI-derived canopy height models applied to national aerial photography, enabling rolling three-year monitoring cycles. The deployment supports the UK government's Environmental Improvement Plan targets and the Natural Capital and Ecosystem Assessment program. Meta also announced DINOv3 as a successor to further improve visual intelligence for such applications.
Meta and World Resources Institute Release Canopy Height Maps v2 Using DINOv3 Self-Supervised Vision Model
Meta AI and the World Resources Institute have released Canopy Height Maps v2 (CHMv2), an open-source global forest mapping system powered by DINOv3, Meta's self-supervised vision model pre-trained on SAT-493M, a large satellite imagery dataset. The new model improves R² accuracy from 0.53 to 0.86 over the previous DINOv2-based version, with better performance on tall trees and greater geographic consistency. CHMv2 is already being adopted by the UK Forestry Commission, the European Commission's Joint Research Centre, and multiple US city planning initiatives. The model, maps, and dataset are publicly available.
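For readers unfamiliar with the metric, R² is commonly defined as the coefficient of determination, 1 minus the ratio of residual to total sum of squares. The sketch below implements that standard definition. The summary does not say exactly how R² was computed for CHMv2 (some reports use the squared correlation instead), so this is an assumption, and the sample heights are made up for illustration.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


# Hypothetical canopy heights in meters: reference vs. model estimates.
reference = [5.0, 12.0, 20.0, 31.0]
estimated = [6.0, 11.0, 22.0, 28.0]
print(round(r_squared(reference, estimated), 3))
```

A value of 1 means the estimates match the reference exactly, and a value of 0 means they do no better than always predicting the mean reference height.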
Mistral AI Demonstrates Pixtral-12B Fine-Tuning on Satellite Imagery via LoRA
Mistral AI published a technical case study showing how fine-tuning Pixtral-12B using LoRA on the Aerial Image Dataset (AID) significantly improves satellite image classification over the base model. The post details the fine-tuning workflow via Mistral's API and the La Plateforme UI, covering hyperparameter selection and structured output enforcement. Key improvements include better handling of ambiguous scene categories (e.g., Playground vs. Stadium) and reduced hallucination of invalid class labels. The article positions domain-specific fine-tuning as a practical bridge between general-purpose vision-language models and specialized geospatial applications.
Meta SAM 3 (Segment Anything Model 3) Released on GitHub
Meta / Facebook Research has released SAM 3, the third generation of their Segment Anything Model, with code for inference and finetuning, pretrained model checkpoints, and example notebooks. The repository has accumulated over 10,000 stars with strong daily momentum (+93). SAM 3 continues Meta's open-weights tradition in computer vision foundation models. No accompanying paper or technical blog post is referenced in this item.
Meta Introduces SAM Audio: Unified Multimodal Model for Audio Separation with PE-AV, Benchmark, and Judge Model
Meta has released SAM Audio, a unified multimodal audio separation model that accepts text, visual, and temporal span prompts to isolate sounds from complex audio mixtures. The system is powered by Perception Encoder Audiovisual (PE-AV), an extension of Meta's open-source Perception Encoder released earlier in 2025, and uses a flow-matching diffusion transformer architecture. Alongside the model, Meta is releasing SAM Audio-Bench (the first in-the-wild audio separation benchmark) and SAM Audio Judge (an automatic evaluation model for audio separation). All components are available today via the Segment Anything Playground.


