5 · Hugging Face Blog · 1mo ago

Transformers Backend Integration in SGLang

Hugging Face has announced an integration that allows SGLang, a high-performance LLM serving framework, to use the Transformers library as a backend. This enables models supported by Transformers to be served through SGLang's inference engine, combining SGLang's optimized serving capabilities with the broad model coverage of the Transformers ecosystem. The integration lowers the barrier for deploying a wide range of models with production-grade inference infrastructure.

Related events (8)

4 · Hugging Face Blog · 1mo ago

Getting Started with Transformers on Habana Gaudi

This Hugging Face blog post introduces the integration of the Transformers library with Habana Gaudi AI accelerators. It provides a practical guide for running transformer model training and inference on Gaudi hardware as an alternative to GPU-based infrastructure. The post signals growing ecosystem support for non-NVIDIA AI accelerator hardware.

7 · Hugging Face Blog · 1mo ago

Transformers v5: Simple model definitions powering the AI ecosystem

Hugging Face has announced Transformers v5, a major version update to its flagship open-source library. The release focuses on simplified model definitions and architectural improvements to the codebase. Because Transformers is one of the most widely used ML libraries in the ecosystem, the update has broad implications for researchers and practitioners who build on it.

6 · Hugging Face Blog · 1mo ago

Making LLMs lighter with AutoGPTQ and transformers

Hugging Face announced native integration of AutoGPTQ into the transformers library, enabling 4-bit quantized inference for large language models. The integration allows users to load and run GPTQ-quantized models directly through the standard transformers API with minimal code changes. This lowers the hardware barrier for deploying LLMs by significantly reducing VRAM requirements while maintaining competitive performance.

6 · Hugging Face Blog · 1mo ago

Transformers.js v3: WebGPU Support, New Models & Tasks, and More

Hugging Face released Transformers.js v3, a major update to its JavaScript inference library enabling on-device ML in browsers and Node.js. The release adds WebGPU backend support for hardware-accelerated inference, expands the supported model and task catalog, and improves overall performance. This brings browser-side AI inference closer to parity with native runtimes for a wider range of use cases.

4 · Hugging Face Blog · 1mo ago

Making ML-powered web games with Transformers.js

This Hugging Face blog post demonstrates how to build machine learning-powered web games using Transformers.js, enabling in-browser inference without a server backend. The post covers practical implementation patterns for running transformer models directly in the browser via WebAssembly and WebGL. It serves as both a tutorial and a showcase of client-side ML deployment capabilities.

4 · Hugging Face Blog · 1mo ago

Habana Labs and Hugging Face Partner to Accelerate Transformer Model Training

Habana Labs and Hugging Face announced a partnership to accelerate transformer model training on Habana's Gaudi AI processors. The collaboration aims to integrate Hugging Face's Transformers library with Habana's hardware, offering an alternative to GPU-based training infrastructure. This represents an early effort to diversify the AI training hardware ecosystem beyond NVIDIA's dominance.

4 · Hugging Face Blog · 1mo ago

Accelerating Hugging Face Transformers with AWS Inferentia2

Hugging Face published a blog post detailing how to accelerate Transformer model inference using AWS Inferentia2, Amazon's second-generation ML inference chip. The post covers integration patterns between the Hugging Face ecosystem and the Neuron SDK for deploying models on Inferentia2 hardware. It serves as a practical guide for enterprise and cloud-based inference deployment on dedicated AI accelerators.

5 · Hugging Face Blog · 1mo ago

Releasing Swift Transformers: Run On-Device LLMs in Apple Devices

Hugging Face released Swift Transformers, a Swift library enabling on-device LLM inference on Apple platforms (iOS, macOS) via Core ML. The library provides a pipeline abstraction for text generation and supports models converted to Core ML format. This extends the Hugging Face ecosystem to Apple's native development environment, lowering the barrier for deploying LLMs on Apple Silicon devices.