6 Hugging Face Blog·1mo ago

Transformers.js v3: WebGPU Support, New Models & Tasks, and More

Hugging Face released Transformers.js v3, a major update to its JavaScript inference library enabling on-device ML in browsers and Node.js. The release adds WebGPU backend support for hardware-accelerated inference, expands the supported model and task catalog, and improves overall performance. This brings browser-side AI inference closer to parity with native runtimes for a wider range of use cases.
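As a rough illustration of what opting into the WebGPU backend can look like in application code, the sketch below picks a device string and falls back to WebAssembly when no GPU adapter is available. The helper name `pickDevice` and the `GpuLike` interface are hypothetical and only stand in for the browser's `navigator.gpu` object; the values `"webgpu"` and `"wasm"` are the device names Transformers.js v3 accepts, but the fallback policy itself is an assumption, not something the release prescribes.

```typescript
// Device strings accepted by the Transformers.js v3 `device` option.
type Device = "webgpu" | "wasm";

// Hypothetical minimal view of `navigator.gpu`: only the one method used here.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

// Hypothetical helper: prefer WebGPU, fall back to WebAssembly.
// Returns "wasm" when the GPU object is missing, when no adapter is
// returned, or when requesting the adapter throws.
async function pickDevice(gpu: GpuLike | undefined): Promise<Device> {
  if (!gpu) {
    return "wasm";
  }
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm";
  }
}
```

In a browser, the result could be passed along the lines of `pipeline(task, model, { device: await pickDevice((navigator as any).gpu) })`, with `pipeline` imported from `@huggingface/transformers`; the task and model are left to the application.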

Inference Economics Agent and Tool Ecosystem Transformers Hugging Face WebGPU

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI

Agent and Tool Ecosystem·Topic guide

Agent and Tool Ecosystem: How AI Is Learning to Act, Not Just Answer

Inference Economics·Topic guide

Inference Economics: The Cost of Running AI in Production

Related events (8)

5 Hugging Face Blog·1mo ago

Transformers.js v4: Now Available on NPM

Hugging Face has released Transformers.js v4, a major version update to its JavaScript library for running transformer models in the browser and Node.js, now published on NPM. The release likely includes updated model support, performance improvements, and API changes. This continues the trend of bringing ML inference capabilities directly to JavaScript environments without requiring a Python backend.

Inference Economics Agent and Tool Ecosystem Transformers NPM Hugging Face

4 Hugging Face Blog·1mo ago

Making ML-powered web games with Transformers.js

This Hugging Face blog post demonstrates how to build machine learning-powered web games using Transformers.js, enabling in-browser inference without a server backend. The post covers practical implementation patterns for running transformer models directly in the browser via WebAssembly and WebGL. It serves as both a tutorial and a showcase of client-side ML deployment capabilities.

Inference Economics Agent and Tool Ecosystem Transformers WebGL WebAssembly +1 more

7 Hugging Face Blog·1mo ago

Transformers v5: Simple model definitions powering the AI ecosystem

Hugging Face has announced Transformers v5, a major version update to its flagship open-source library. The release focuses on simplified model definitions and architectural improvements to the codebase. As one of the most widely used ML libraries in the ecosystem, this update has broad implications for researchers and practitioners building on top of the Transformers framework.

Open Weights Progress Inference Economics Transformers Hugging Face +1 more

4 Hugging Face Blog·1mo ago

Accelerating Hugging Face Transformers with AWS Inferentia2

Hugging Face published a blog post detailing how to accelerate Transformer model inference using AWS Inferentia2, Amazon's second-generation ML inference chip. The post covers integration patterns between the Hugging Face ecosystem and the Neuron SDK for deploying models on Inferentia2 hardware. This represents a practical guide for enterprise and cloud-based inference deployment using dedicated AI accelerators.

Training Infrastructure Inference Economics AWS Inferentia2 Hugging Face Transformers Hugging Face +3 more

5 Hugging Face Blog·1mo ago

Transformers Backend Integration in SGLang

Hugging Face has announced an integration that allows SGLang, a high-performance LLM serving framework, to use the Transformers library as a backend. This enables models supported by Transformers to be served through SGLang's inference engine, combining SGLang's optimized serving capabilities with the broad model coverage of the Transformers ecosystem. The integration lowers the barrier for deploying a wide range of models with production-grade inference infrastructure.

Inference Economics Agent and Tool Ecosystem SGLang Hugging Face Transformers Hugging Face

4 Hugging Face Blog·1mo ago

Accelerated Inference with Optimum and Transformers Pipelines

Hugging Face announced integration between the Optimum library and the Transformers Pipelines API, enabling hardware-accelerated inference with minimal code changes. The integration targets deployment on specialized hardware backends such as ONNX Runtime, allowing users to swap in optimized inference engines transparently. This lowers the barrier to production-grade inference optimization for practitioners using the Hugging Face ecosystem.

Inference Economics Agent and Tool Ecosystem Optimum ONNX Transformers Pipelines +1 more

4 Hugging Face Blog·1mo ago

Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers

Graphcore and Hugging Face announced a collaboration to make transformer models compatible with Graphcore's Intelligence Processing Unit (IPU) hardware. The partnership expands the set of Hugging Face models that can run natively on IPU infrastructure. This represents an effort to broaden the hardware ecosystem available for transformer model inference and training beyond GPUs.

Training Infrastructure Inference Economics Transformers Graphcore Hugging Face +1 more

4 Hugging Face Blog·1mo ago

How Hugging Face Sped Up Transformer Inference 100x for API Customers

Hugging Face describes engineering optimizations that achieved up to 100x speedups in transformer inference for its hosted API customers. The post covers techniques applied to accelerate model serving at scale. The article dates from 2021 and documents early inference optimization work on Hugging Face's Inference API product.

Inference Economics Enterprise Deployment Patterns Transformers Hugging Face Inference API Hugging Face