Hugging Face Transformers

product · active · hugging-face-transformers-e962d1e2 · 34 events · first seen 29d ago

Aliases: Hugging Face Transformers, Transformers (Hugging Face library), Hugging Face Transformers Trainer

Recent events (34)

5 · Hugging Face Blog · 28d ago

Timm ❤️ Transformers: Use any timm model with transformers

Hugging Face has announced native integration between the timm library and the Transformers library, allowing any timm vision model to be used directly within the Transformers ecosystem. This integration simplifies workflows for computer vision practitioners by enabling unified model loading, pipelines, and tooling across both libraries. The move consolidates Hugging Face's position as the central hub for model interoperability in the ML ecosystem.

5 · Hugging Face Blog · 28d ago

Transformers Backend Integration in SGLang

Hugging Face has announced an integration that allows SGLang, a high-performance LLM serving framework, to use the Transformers library as a backend. This enables models supported by Transformers to be served through SGLang's inference engine, combining SGLang's optimized serving capabilities with the broad model coverage of the Transformers ecosystem. The integration lowers the barrier for deploying a wide range of models with production-grade inference infrastructure.

5 · Hugging Face Blog · 28d ago

Overview of Natively Supported Quantization Schemes in 🤗 Transformers

This Hugging Face blog post surveys the quantization methods natively integrated into the Transformers library as of September 2023, covering schemes such as GPTQ, bitsandbytes (LLM.int8, NF4), and related techniques. It explains how each method works, their trade-offs in terms of memory reduction and inference speed, and how practitioners can apply them via the Transformers API. The post serves as a practical reference for deploying large language models under memory constraints.
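
To make the memory trade-off concrete, here is a minimal sketch of symmetric absmax rounding to 8-bit integers, the basic idea behind integer weight quantization. It is an illustration only: it is not the GPTQ or bitsandbytes algorithm and not the Transformers API, and the function names `quantize_absmax` and `dequantize` are hypothetical.

```python
def quantize_absmax(weights, bits=8):
    """Symmetric absmax quantization: map floats onto signed integers."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [v * scale for v in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)

# Rounding moves each weight by at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte plus a shared scale, instead of two or four bytes per weight. The cost is a rounding error bounded by half the scale.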

4 · Hugging Face Blog · 28d ago

Guiding Text Generation with Constrained Beam Search in 🤗 Transformers

This Hugging Face blog post introduces constrained beam search, a text generation technique that allows users to enforce hard constraints on model outputs, such as requiring specific tokens or phrases to appear in generated text. The method extends standard beam search by guiding the search process to satisfy user-defined constraints while still optimizing for fluency. The post covers the implementation available in the Hugging Face Transformers library, making the technique accessible to practitioners.

5 · Hugging Face Blog · 28d ago

Faster Assisted Generation with Dynamic Speculation

Hugging Face introduces dynamic speculation lookahead for assisted (speculative) decoding, a technique that adaptively adjusts the number of candidate tokens generated by a draft model before verification by the main model. This approach aims to improve throughput and reduce latency compared to fixed-lookahead speculative decoding by tuning the speculation depth at runtime. The blog post describes the method and its integration into the Hugging Face Transformers library.
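
One way to adapt the speculation depth at runtime is to stop drafting as soon as the draft model becomes unsure of its own proposal. The sketch below assumes that rule. The `propose_dynamic` helper, the `draft_step` callback, and the threshold value are hypothetical and are not taken from the post or the library.

```python
def propose_dynamic(draft_step, seq, max_lookahead=8, threshold=0.4):
    """Draft tokens until the draft model's confidence drops below threshold.

    draft_step(context) must return (token, confidence). Assumption: the
    low-confidence token is still included, then drafting stops.
    """
    proposed = []
    for _ in range(max_lookahead):
        token, confidence = draft_step(seq + proposed)
        proposed.append(token)
        if confidence < threshold:
            break
    return proposed


def toy_draft(context):
    """Hypothetical draft model: unsure whenever the context length is a multiple of 4."""
    confidence = 0.1 if len(context) % 4 == 0 else 0.9
    return len(context) % 5, confidence


short = propose_dynamic(toy_draft, [0, 0, 0])
full = propose_dynamic(lambda ctx: (0, 1.0), [0, 0, 0])
print(short, len(full))
```

With a fixed lookahead, the draft model would always produce the same number of candidates. Here the count shrinks when the draft is likely to be rejected and grows up to `max_lookahead` when it is confident.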

5 · Hugging Face Blog · 28d ago

Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers

A Hugging Face blog post discusses inference optimization techniques derived from OpenAI's gpt-oss codebase that can be applied within the Hugging Face Transformers library. The post appears to cover practical tricks for improving transformer inference speed or efficiency. It is a practitioner-oriented technical guide that carries the methods used for gpt-oss over to the open-source ecosystem.

5 · Hugging Face Blog · 28d ago

Assisted Generation: a new direction toward low-latency text generation

Hugging Face introduces assisted generation (speculative decoding) as a practical technique for reducing LLM inference latency. The approach uses a smaller draft model to propose token candidates that a larger model then verifies in parallel, enabling multiple tokens to be accepted per forward pass. The blog post explains the mechanism and demonstrates integration into the Hugging Face Transformers library.
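
The propose-then-verify loop can be sketched with toy models. The example assumes greedy decoding and two stand-in "models" that are plain Python functions. `target_next`, `draft_next`, and `assisted_decode` are hypothetical names, and the per-position verification loop stands in for what a real model would do in one batched forward pass.

```python
def target_next(seq):
    """Hypothetical large model: next token is the running sum modulo 7."""
    return sum(seq) % 7


def draft_next(seq):
    """Hypothetical small model: agrees with the target unless the last token is 3."""
    guess = sum(seq) % 7
    return (guess + 1) % 7 if seq[-1] == 3 else guess


def greedy_decode(prompt, n_new):
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq


def assisted_decode(prompt, n_new, lookahead=4):
    seq = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(seq) < goal:
        # 1. The draft model proposes `lookahead` candidate tokens.
        draft = []
        for _ in range(lookahead):
            draft.append(draft_next(seq + draft))
        # 2. The target model checks every candidate position (one pass).
        target_passes += 1
        accepted = []
        for token in draft:
            expected = target_next(seq + accepted)
            accepted.append(expected)
            if token != expected:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # Every candidate matched, so the target contributes one extra token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], target_passes


plain = greedy_decode([1, 2], 20)
assisted, passes = assisted_decode([1, 2], 20)
print(assisted == plain, passes)
```

Because every kept token is one the target model would have chosen itself, the output matches plain greedy decoding. The saving comes from needing fewer target passes than generated tokens.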

3 · Hugging Face Blog · 28d ago

Training a Language Model with Hugging Face Transformers Using TensorFlow and TPUs

This Hugging Face blog post provides a technical walkthrough for training a language model using TensorFlow and Google TPUs via the Transformers library. It covers the practical setup, data pipeline, and training configuration required to leverage TPU hardware with the TF ecosystem. The post serves as a tutorial bridging Hugging Face tooling with TPU-based infrastructure.

4 · Hugging Face Blog · 28d ago

Probabilistic Time Series Forecasting with Transformers

This Hugging Face blog post introduces probabilistic time series forecasting using Transformer-based models available in the Hugging Face ecosystem. It covers the application of attention-based architectures to sequential prediction tasks with uncertainty quantification. The post serves as a tutorial and capability demonstration for time series modeling within the Transformers library.

3 · Hugging Face Blog · 28d ago

Optimizing Bark Text-to-Speech Using Hugging Face Transformers

This Hugging Face blog post details optimization techniques applied to Bark, a text-to-speech model, using the Transformers library. The post likely covers inference speed improvements, memory reduction strategies, and deployment considerations for the Bark model. It provides implementation-level guidance for running Bark efficiently.

4 · Hugging Face Blog · 28d ago

Accelerating Hugging Face Transformers with AWS Inferentia2

Hugging Face published a blog post detailing how to accelerate Transformer model inference using AWS Inferentia2, Amazon's second-generation ML inference chip. The post covers integration patterns between the Hugging Face ecosystem and the Neuron SDK for deploying models on Inferentia2 hardware. This represents a practical guide for enterprise and cloud-based inference deployment using dedicated AI accelerators.

4 · Hugging Face Blog · 28d ago

Generating Human-level Text with Contrastive Search in Transformers

Hugging Face introduces contrastive search, a decoding strategy for autoregressive language models that aims to produce more coherent and human-like text compared to standard methods like beam search or nucleus sampling. The technique works by balancing a model's confidence in its next-token prediction against a contrastive penalty that discourages repetitive or degenerate outputs. The blog post describes integration of contrastive search into the Hugging Face Transformers library, making it accessible to practitioners.
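
The balance between confidence and penalty can be written as a score per candidate token: `(1 - alpha) * probability - alpha * max_similarity`, where the similarity is the cosine between the candidate's hidden state and the hidden states of tokens already in the context. The sketch below uses hand-made two-dimensional vectors in place of real hidden states. `contrastive_pick` and the example values are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def contrastive_pick(candidates, context_states, alpha=0.6):
    """Pick the candidate with the best confidence-minus-penalty score.

    candidates: list of (token, probability, hidden_state) tuples.
    context_states: hidden states of the tokens generated so far.
    """
    best_token, best_score = None, -math.inf
    for token, prob, state in candidates:
        penalty = max(cosine(state, prev) for prev in context_states)
        score = (1 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token


context_states = [[1.0, 0.0]]            # the context already "says" this
candidates = [
    ("the", 0.6, [1.0, 0.0]),            # likely, but identical to the context
    ("cat", 0.4, [0.0, 1.0]),            # less likely, but new information
]
with_penalty = contrastive_pick(candidates, context_states, alpha=0.6)
without_penalty = contrastive_pick(candidates, context_states, alpha=0.0)
print(with_penalty, without_penalty)
```

Setting `alpha` to 0 reduces the rule to picking the most probable token. Larger values push the choice away from states the context already contains.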

4 · Hugging Face Blog · 28d ago

Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers

This Hugging Face blog post provides a practical guide for fine-tuning OpenAI's Whisper model for multilingual automatic speech recognition using the Transformers library. It covers dataset preparation, training configuration, and evaluation using the Word Error Rate metric. The post targets practitioners seeking to adapt Whisper to low-resource or domain-specific languages.
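
Word Error Rate is the word-level edit distance between the reference transcript and the model's hypothesis, divided by the number of reference words. The following is a minimal standalone sketch, assuming whitespace tokenization and a non-empty reference. The function name `word_error_rate` is hypothetical, and this is not the evaluation code from the post.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dist[i][j]: edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


perfect = word_error_rate("the cat sat", "the cat sat")
noisy = word_error_rate("the cat sat on mat", "the cat sit on")
print(perfect, noisy)
```

In the second call, one word is substituted and one is dropped out of five reference words. Lower values are better, and the rate can exceed 1.0 when the hypothesis contains many insertions.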

3 · Hugging Face Blog · 28d ago

From PyTorch DDP to Accelerate to Trainer: Mastery of Distributed Training with Ease

This Hugging Face blog post walks through the progression from raw PyTorch DistributedDataParallel (DDP) to the Accelerate library to the Transformers Trainer API for distributed training. It explains the abstractions each layer provides and how they reduce boilerplate while maintaining flexibility. The post serves as a practical guide for ML practitioners scaling training across multiple GPUs or nodes.

3 · Hugging Face Blog · 28d ago

Pre-Train BERT with Hugging Face Transformers and Habana Gaudi

This Hugging Face blog post from August 2022 describes how to pre-train a BERT model from scratch using the Hugging Face Transformers library on Habana Gaudi hardware accelerators. It covers the full pipeline including data preparation, tokenizer training, and masked language modeling pretraining. The post serves as both a technical tutorial and a demonstration of Habana Gaudi's viability as an alternative AI training accelerator.

3 · Hugging Face Blog · 28d ago

Accelerate BERT inference with Hugging Face Transformers and AWS Inferentia

This Hugging Face blog post describes how to deploy BERT models on AWS Inferentia chips using the Hugging Face Transformers library and Amazon SageMaker. It covers the workflow for compiling models with AWS Neuron SDK and running optimized inference on Inferentia hardware. The post targets practitioners looking to reduce inference costs and latency for transformer-based NLP workloads.

5 · Hugging Face Blog · 28d ago

The Partnership: Amazon SageMaker and Hugging Face

Hugging Face and Amazon announced a partnership integrating Hugging Face models and tools natively into Amazon SageMaker. This collaboration enables developers to train and deploy Hugging Face Transformers models directly within SageMaker's managed ML infrastructure. The partnership represents an early major cloud-provider integration for Hugging Face, expanding enterprise access to open-source NLP models.

4 · Hugging Face Blog · 28d ago

Hugging Face on PyTorch / XLA TPUs

This Hugging Face blog post covers the integration of Hugging Face Transformers with PyTorch/XLA for training on Google TPUs. It describes how users can leverage TPU hardware through the XLA compiler backend to accelerate transformer model training. The post serves as a technical guide for the ecosystem connecting Hugging Face's model library with Google's TPU infrastructure.

4 · Hugging Face Blog · 29d ago

PaddleOCR 3.5: Running OCR and Document Parsing Tasks with a Transformers Backend

PaddleOCR 3.5 introduces support for running OCR and document parsing pipelines using a Hugging Face Transformers backend, enabling integration with the broader Transformers ecosystem. The update allows users to leverage transformer-based models for optical character recognition and structured document understanding tasks. This represents a convergence between the PaddlePaddle framework and the Transformers library for document AI workloads.

6 · Hugging Face Blog · 28d ago

Introducing SynthID Text

Hugging Face published a blog post introducing SynthID Text, Google DeepMind's watermarking technique for AI-generated text. The method embeds imperceptible signals into LLM outputs by modifying token sampling distributions, enabling detection of AI-generated content without degrading text quality. The post likely covers integration with the Hugging Face Transformers library, making the technique accessible to the broader ML community.

5 · Hugging Face Blog · 28d ago

Tool Use, Unified

Hugging Face published a blog post addressing the fragmented landscape of tool/function-calling interfaces across different LLMs and frameworks. The post likely introduces or advocates for a unified approach to tool use in the Hugging Face ecosystem, covering how different models expose tool-calling capabilities and how to standardize them. This is relevant to the agent and tooling ecosystem as interoperability between models and tool-calling conventions remains a key friction point.

4 · Hugging Face Blog · 28d ago

Patch Time Series Transformer in Hugging Face

Hugging Face has integrated PatchTST, a patch-based Transformer architecture for time series forecasting, into its ecosystem. PatchTST applies the patching concept from vision transformers to time series data, dividing sequences into subseries-level patches as input tokens. The blog post covers usage, fine-tuning, and zero-shot transfer capabilities of the model within the Hugging Face Transformers library.
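
The patching step itself is simple to illustrate: slide a window over the series and treat each window as one input token. This is a minimal sketch, assuming a univariate series and no end padding. `make_patches` and the patch length and stride values are hypothetical and are not the library's configuration.

```python
def make_patches(series, patch_length, stride):
    """Split a series into (possibly overlapping) subseries-level patches."""
    last_start = len(series) - patch_length
    return [
        series[start:start + patch_length]
        for start in range(0, last_start + 1, stride)
    ]


series = list(range(10))                 # stand-in for 10 time steps
patches = make_patches(series, patch_length=4, stride=2)
print(patches)
```

Ten time steps become four tokens here, so the attention layers see a much shorter sequence than they would with one token per time step.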

4 · Hugging Face Blog · 28d ago

PatchTSMixer in HuggingFace

Hugging Face introduces PatchTSMixer, a lightweight MLP-Mixer-based model for multivariate time-series forecasting, now available in the Transformers library. The model is designed for efficient patch-based mixing of temporal and channel information. This integration expands Hugging Face's time-series modeling capabilities alongside the previously added PatchTST model.

4 · Hugging Face Blog · 28d ago

Federated Learning using Hugging Face and Flower

This Hugging Face blog post describes how to combine the Hugging Face ecosystem with the Flower federated learning framework to train models across distributed, privacy-preserving data silos. It provides a practical walkthrough of integrating Transformers and Datasets libraries with Flower's federated training loop. The post targets practitioners looking to apply federated learning to NLP and other ML tasks without centralizing sensitive data.

3 · Hugging Face Blog · 28d ago

Multivariate Probabilistic Time Series Forecasting with Informer

A Hugging Face blog post introduces the Informer model for multivariate probabilistic time series forecasting. The post covers the architecture and usage of Informer, which uses a sparse attention mechanism (ProbSparse) to handle long sequences more efficiently than standard Transformers. It demonstrates how to use the model via the Hugging Face Transformers library for forecasting tasks.
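
The idea behind ProbSparse attention is to spend full attention only on the queries whose score distribution over the keys is far from uniform. A common way to measure this is the maximum scaled dot product minus the mean. The sketch below assumes that measure and scores every query against every key, without the key sampling an efficient implementation would use. `sparsity_scores` and `top_u_queries` are hypothetical names.

```python
import math


def sparsity_scores(queries, keys):
    """Per query: max minus mean of the scaled dot products with all keys."""
    dim = len(keys[0])
    scores = []
    for q in queries:
        dots = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        scores.append(max(dots) - sum(dots) / len(dots))
    return scores


def top_u_queries(queries, keys, u):
    """Indices of the u most 'active' queries, which would get full attention."""
    scores = sparsity_scores(queries, keys)
    ranked = sorted(range(len(queries)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:u])


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
queries = [
    [0.0, 0.0],   # attends uniformly: score 0
    [5.0, 0.0],   # strongly prefers one key: high score
    [0.1, 0.1],   # nearly uniform: low score
]
top_one = top_u_queries(queries, keys, u=1)
top_two = top_u_queries(queries, keys, u=2)
print(top_one, top_two)
```

Queries left out of the top set are close to uniform attention, so replacing their output with a cheap average loses little information.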

4 · Hugging Face Blog · 28d ago

Universal Image Segmentation with Mask2Former and OneFormer

Hugging Face published a blog post introducing Mask2Former and OneFormer, two universal image segmentation architectures now available in the Transformers library. These models unify panoptic, instance, and semantic segmentation tasks under a single framework. The post covers model capabilities, usage examples, and integration into the Hugging Face ecosystem.

4 · Hugging Face Blog · 28d ago

Habana Labs and Hugging Face Partner to Accelerate Transformer Model Training

Habana Labs and Hugging Face announced a partnership to accelerate transformer model training on Habana's Gaudi AI processors. The collaboration aims to integrate Hugging Face's Transformers library with Habana's hardware, offering an alternative to GPU-based training infrastructure. This represents an early effort to diversify the AI training hardware ecosystem beyond NVIDIA dominance.

3 · Hugging Face Blog · 28d ago

Deploy GPT-J 6B for Inference Using Hugging Face Transformers and Amazon SageMaker

This Hugging Face blog post provides a tutorial for deploying the GPT-J 6B open-weight language model on Amazon SageMaker using the Hugging Face Transformers library. It covers the infrastructure and tooling steps needed to serve a large language model in a managed cloud environment. The post reflects early 2022 patterns for productionizing open-weight models via cloud ML platforms.

4 · Hugging Face Blog · 28d ago

Fit More and Train Faster With ZeRO via DeepSpeed and FairScale

This Hugging Face blog post from January 2021 covers integration of ZeRO (Zero Redundancy Optimizer) memory optimization techniques via DeepSpeed and FairScale into the Transformers training ecosystem. ZeRO partitions optimizer states, gradients, and model parameters across GPUs to enable training of much larger models on the same hardware. The post serves as a practical guide for practitioners looking to scale model training without additional infrastructure investment.
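
The effect of partitioning can be estimated with simple arithmetic. The sketch below assumes the accounting commonly used for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state. It ignores activations and buffers. These byte counts and the `zero_bytes_per_gpu` helper are assumptions for illustration, not figures from the post.

```python
def zero_bytes_per_gpu(num_params, num_gpus, stage):
    """Approximate model-state memory per GPU for ZeRO stages 0 to 3."""
    params = 2 * num_params    # fp16 weights
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 weights copy + Adam momentum and variance
    if stage >= 1:
        optim /= num_gpus      # stage 1: partition optimizer states
    if stage >= 2:
        grads /= num_gpus      # stage 2: also partition gradients
    if stage >= 3:
        params /= num_gpus     # stage 3: also partition parameters
    return params + grads + optim


num_params = 1_000_000_000     # a hypothetical 1B-parameter model
per_stage = [zero_bytes_per_gpu(num_params, 8, stage) for stage in range(4)]
for stage, nbytes in enumerate(per_stage):
    print(f"stage {stage}: {nbytes / 1e9:.2f} GB per GPU")
```

Under these assumptions the per-GPU footprint on 8 GPUs drops from 16 GB with no partitioning to 2 GB at stage 3, which is why the same hardware can hold a much larger model.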

4 · Hugging Face Blog · 28d ago

Text-Generation Pipeline on Intel® Gaudi® 2 AI Accelerator

Hugging Face published a blog post detailing how to run text-generation pipelines on Intel's Gaudi 2 AI accelerator. The post covers integration between Hugging Face's text-generation tooling and Intel's Gaudi 2 hardware, positioning it as an alternative inference accelerator to NVIDIA GPUs. This is relevant to the growing ecosystem of non-NVIDIA AI inference hardware.

3 · Hugging Face Blog · 28d ago

Fine-Tune W2V2-Bert for Low-Resource ASR with Hugging Face Transformers

Hugging Face published a tutorial on fine-tuning the W2V2-Bert model for automatic speech recognition in low-resource language settings using the Transformers library. The post covers practical steps for adapting the Wav2Vec2-BERT architecture to languages with limited training data. This is a practitioner-oriented guide targeting the open-source ML community.

5 · Hugging Face Blog · 28d ago

Speculative Decoding for 2x Faster Whisper Inference

Hugging Face demonstrates applying speculative decoding to OpenAI's Whisper speech recognition model, achieving approximately 2x inference speedup. The technique uses a smaller draft model to propose token sequences that the larger target model then verifies, reducing the number of full forward passes required. This post covers implementation details using the Hugging Face Transformers library and benchmarks the approach across different hardware configurations.

4 · Hugging Face Blog · 28d ago

Speech Synthesis, Recognition, and More With SpeechT5

This Hugging Face blog post introduces SpeechT5, a unified pre-trained model for speech synthesis, recognition, and related tasks. The post covers the model's architecture and capabilities, and explains how to use it via the Hugging Face Transformers library. SpeechT5 is a Microsoft Research model that uses a shared encoder-decoder framework across multiple speech tasks.

5 · Hugging Face Blog · 28d ago

Zero-shot image-to-text generation with BLIP-2

Hugging Face published a blog post introducing BLIP-2, a multimodal model that enables zero-shot image-to-text generation by bridging frozen image encoders and large language models via a lightweight Querying Transformer (Q-Former). The post covers the model's architecture, capabilities, and how to use it via the Hugging Face Transformers library. BLIP-2 achieves strong performance on visual question answering and image captioning tasks without task-specific fine-tuning.