mmBERT: ModernBERT goes Multilingual
Hugging Face introduces mmBERT, a multilingual extension of ModernBERT. The post describes how the ModernBERT architecture was adapted for multilingual text encoding, extending the ModernBERT family beyond English.
Related events (8)
Finally, a Replacement for BERT: Introducing ModernBERT
Hugging Face introduces ModernBERT, a modernized encoder-only transformer model designed as a successor to BERT. The model incorporates architectural improvements developed since BERT's 2018 release, targeting better performance on downstream NLP tasks. ModernBERT aims to fill the gap for efficient encoder models in retrieval, classification, and other discriminative tasks where decoder-only LLMs are often overkill.
Pre-Train BERT with Hugging Face Transformers and Habana Gaudi
This Hugging Face blog post from August 2022 describes how to pre-train a BERT model from scratch using the Hugging Face Transformers library on Habana Gaudi hardware accelerators. It covers the full pipeline including data preparation, tokenizer training, and masked language modeling pretraining. The post serves as both a technical tutorial and a demonstration of Habana Gaudi's viability as an alternative AI training accelerator.
Introducing BERTopic Integration with the Hugging Face Hub
Hugging Face has announced an integration between BERTopic, a topic modeling library, and the Hugging Face Hub. This allows users to push, share, and load BERTopic models directly from the Hub, enabling easier collaboration and deployment of topic modeling workflows. The integration leverages the Hub's model card and versioning infrastructure for NLP tooling beyond generative models.
Visual Document Retrieval Goes Multilingual
Hugging Face introduces VDR-2B-Multilingual, a 2-billion-parameter vision-language model designed for visual document retrieval across multiple languages. The model retrieves document images without OCR by embedding visual page representations directly. This extends prior visual document retrieval work to multilingual settings, broadening its applicability to enterprise document search.
Meta releases Llama 3.2 11B Vision multimodal model on Hugging Face
Meta released Llama 3.2 11B Vision, an open-weights image-text-to-text model, on Hugging Face. The model is part of the Llama 3.2 family and supports multiple languages including English, German, and French. This represents Meta's entry into open-weights multimodal models at the 11B parameter scale.
Meta releases Llama 4 Maverick 17B-128E multimodal instruct model on Hugging Face
Meta released Llama 4 Maverick, a mixture-of-experts (MoE) model with 17B active parameters and 128 experts, as an image-text-to-text instruct model on Hugging Face. The model supports multimodal inputs and multiple languages including Arabic, German, and English. It drew more than 28K downloads and 493 likes shortly after release, a sign of strong early adoption.
Multimodal Embedding & Reranker Models with Sentence Transformers
Hugging Face's Sentence Transformers library has added support for multimodal embedding and reranking models, enabling joint text-image (and potentially other modalities) representations within a unified framework. The update extends the library's existing text-focused embedding capabilities to cross-modal retrieval and reranking tasks. This lowers the barrier for practitioners building multimodal search and RAG pipelines with open-weights models.
SmolLM3: Hugging Face Releases Small Multilingual Long-Context Reasoning Model
Hugging Face has released SmolLM3, a compact language model designed for multilingual support, long-context processing, and reasoning. The model targets the small, efficient model segment while incorporating reasoning features typically associated with larger models. This release continues Hugging Face's SmolLM series of capable but deployable open-weights models.


