Almanac
6 · GitHub Trending (AI/LLM filtered) · 29d ago

Meta SAM 3 (Segment Anything Model 3) Released on GitHub

Meta (Facebook Research) has released SAM 3, the third generation of its Segment Anything Model, with code for inference and fine-tuning, pretrained model checkpoints, and example notebooks. The repository has accumulated over 10,000 stars and continues to gain them quickly (+93 in the latest daily count). SAM 3 continues Meta's open-weights tradition in computer vision foundation models. No accompanying paper or technical blog post is referenced in this item.

Related events (8)

7 · Meta Ai Blog · 1mo ago

SAM 3.1: Meta Releases Faster Real-Time Video Segmentation Model with Object Multiplexing

Meta has released SAM 3.1, an incremental update to Segment Anything Model 3, introducing object multiplexing that allows tracking up to 16 objects in a single forward pass. This doubles video processing throughput from 16 to 32 FPS on a single H100 GPU, reducing GPU resource requirements and enabling real-time tracking on smaller hardware. SAM 3.1 is a drop-in replacement for SAM 3 and is available via updated model checkpoints and codebase. The broader SAM 3 release also includes text and exemplar prompting, a new Segment Anything Playground, the SA-Co evaluation dataset, and SAM 3D for 3D reconstruction.
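The throughput claim above can be made concrete with a little arithmetic. The sketch below is not Meta's API and does not model how object multiplexing works internally. It only assumes what the summary states (up to 16 objects per forward pass, 16 versus 32 FPS) and uses hypothetical helper names to show how tracked objects would group into passes and what the frame-rate change means per frame.

```python
import math

# Assumption taken from the summary: one forward pass can carry up to 16 objects.
OBJECTS_PER_PASS = 16


def forward_passes(num_objects: int, objects_per_pass: int = OBJECTS_PER_PASS) -> int:
    """Number of forward passes needed per frame for num_objects tracked objects."""
    if num_objects < 0 or objects_per_pass <= 0:
        raise ValueError("num_objects must be >= 0 and objects_per_pass must be > 0")
    return math.ceil(num_objects / objects_per_pass)


def group_objects(object_ids, objects_per_pass: int = OBJECTS_PER_PASS):
    """Split object IDs into consecutive groups that each fit in one pass."""
    return [
        object_ids[i:i + objects_per_pass]
        for i in range(0, len(object_ids), objects_per_pass)
    ]


def frame_time_ms(fps: float) -> float:
    """Per-frame time budget in milliseconds at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    ids = list(range(40))  # hypothetical: 40 tracked objects
    print(forward_passes(len(ids)))            # passes per frame
    print([len(g) for g in group_objects(ids)])
    print(frame_time_ms(16), frame_time_ms(32))
```

Under these assumptions, going from 16 to 32 FPS halves the per-frame budget from 62.5 ms to 31.25 ms, which is the sense in which throughput doubles.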

5 · arXiv · cs.AI · 4d ago

ActiveSAM: Training-free open-vocabulary segmentation via image-conditional class pruning on SAM 3

ActiveSAM is a training-free, zero-shot inference framework that wraps Segment Anything Model 3 (SAM 3) to perform open-vocabulary semantic segmentation more efficiently. It estimates an image-conditioned active class subset at low resolution before running full-resolution decoding only on retained classes, using bucketed prompt multiplexing and margin-aware background calibration. Across eight benchmarks, it outperforms the prior state-of-the-art SegEarth-OV3 by ~1.4 mIoU on average while running up to 5.5x faster on large-vocabulary datasets, with strong robustness to image corruption relevant to autonomous driving and embodied AI.
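The two-stage idea described above (prune classes cheaply, then decode only the survivors in buckets) can be sketched as follows. This is an illustration of the general pattern, not the ActiveSAM implementation: the function names, the score threshold, the bucket size, and the `score_lowres` and `decode_fullres` callables are all hypothetical, and margin-aware background calibration is omitted because the summary does not describe it.

```python
def select_active_classes(lowres_scores, threshold=0.3, max_classes=None):
    """Keep classes whose cheap low-resolution score reaches the threshold.

    lowres_scores maps class name -> score in [0, 1]. Both the scoring scale
    and the threshold are assumptions made for this sketch.
    """
    ranked = sorted(lowres_scores.items(), key=lambda kv: kv[1], reverse=True)
    kept = [name for name, score in ranked if score >= threshold]
    return kept if max_classes is None else kept[:max_classes]


def bucket(items, bucket_size):
    """Group retained class prompts into fixed-size buckets."""
    if bucket_size <= 0:
        raise ValueError("bucket_size must be positive")
    return [items[i:i + bucket_size] for i in range(0, len(items), bucket_size)]


def segment(image, vocabulary, score_lowres, decode_fullres,
            threshold=0.3, bucket_size=4):
    """Run the expensive decoder only on classes that survive pruning.

    score_lowres(image, vocabulary) -> {class: score} stands in for a cheap pass.
    decode_fullres(image, classes) -> {class: mask} stands in for full decoding.
    """
    active = select_active_classes(score_lowres(image, vocabulary), threshold)
    masks = {}
    for group in bucket(active, bucket_size):
        masks.update(decode_fullres(image, group))
    return masks


if __name__ == "__main__":
    # Stub callables so the sketch runs without any model.
    fake_scores = {"road": 0.9, "car": 0.6, "boat": 0.05, "tree": 0.4}
    result = segment(
        image=None,
        vocabulary=list(fake_scores),
        score_lowres=lambda img, vocab: {c: fake_scores[c] for c in vocab},
        decode_fullres=lambda img, classes: {c: f"mask:{c}" for c in classes},
        bucket_size=2,
    )
    print(sorted(result))
```

The efficiency gain in such a scheme comes from the full-resolution decoder never seeing pruned classes, which matters most when the vocabulary is large and only a few classes appear in a given image.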

7 · Meta Ai Blog · 1mo ago

Meta Introduces SAM Audio: Unified Multimodal Model for Audio Separation with PE-AV, Benchmark, and Judge Model

Meta has released SAM Audio, a unified multimodal audio separation model that accepts text, visual, and temporal span prompts to isolate sounds from complex audio mixtures. The system is powered by Perception Encoder Audiovisual (PE-AV), an extension of Meta's open-source Perception Encoder released earlier in 2025, and uses a flow-matching diffusion transformer architecture. Alongside the model, Meta is releasing SAM Audio-Bench (the first in-the-wild audio separation benchmark) and SAM Audio Judge (an automatic evaluation model for audio separation). All components are available today via the Segment Anything Playground.

8 · Hugging Face Blog · 1mo ago

Welcome Llama 3 - Meta's new open LLM

Hugging Face published a blog post welcoming Meta's Llama 3 release, covering the new open-weights large language models. Llama 3 represents a significant update to Meta's open model family, with improved capabilities over Llama 2. The post covers integration and availability on the Hugging Face platform.

7 · Meta Llama · 11d ago

Meta releases Llama 3.2 90B Vision multimodal model on Hugging Face

Meta released Llama 3.2 90B Vision, a large multimodal model supporting image-text-to-text tasks, published on Hugging Face under the meta-llama organization. The model is part of the Llama 3.2 family and supports English, German, and French. This is a significant open-weights multimodal release from Meta, extending the Llama 3 series with vision capabilities at the 90B parameter scale.

7 · Meta Llama · 11d ago

Meta releases Llama 3.2 90B Vision-Instruct multimodal model

Meta released Llama 3.2 90B Vision-Instruct, a large multimodal model supporting image-text-to-text tasks, on Hugging Face. The model is part of the Llama 3.2 family and supports English and German. The listing reports 858 downloads and 358 likes. The release is part of Meta's open-weights push into vision-language capabilities at the 90B parameter scale.

6 · Meta Ai Blog · 1mo ago

Meta Introduces TRIBE v2: Predictive Foundation Model for Human Brain Activity

Meta AI has released TRIBE v2, a foundation model that predicts high-resolution fMRI brain activity in response to visual, auditory, and language stimuli. Trained on data from over 700 healthy volunteers, it achieves a 70x resolution increase over comparable models and supports zero-shot generalization to new subjects, languages, and tasks. The release includes model weights, codebase, a research paper, and an interactive demo under a CC BY-NC license. Meta positions the work as a bridge between neuroscience and AI development, enabling hypothesis testing without requiring human subjects in every experiment.

7 · Meta Llama · 11d ago

Meta releases Llama 3.2 11B Vision multimodal model on Hugging Face

Meta released Llama 3.2 11B Vision, an open-weights image-text-to-text model, on Hugging Face. The model is part of the Llama 3.2 family and supports multiple languages including English, German, and French. This represents Meta's entry into open-weights multimodal models at the 11B parameter scale.