Almanac
← Events
5Hugging Face Blog·1mo ago

Accelerating over 130,000 Hugging Face Models with ONNX Runtime

Hugging Face and Microsoft have integrated ONNX Runtime (ORT) to accelerate inference for over 130,000 models on the Hugging Face Hub. The integration enables optimized deployment across CPU and GPU hardware without requiring users to manually export or configure ONNX models. This represents a significant expansion of ORT's reach within the open-weights model ecosystem, lowering the barrier to production-grade inference optimization.

Related guides (4)

Related events (8)

4Hugging Face Blog·1mo ago·source ↗

Optimum + ONNX Runtime: Faster Training for Hugging Face Models

Hugging Face's Optimum library integrates with Microsoft's ONNX Runtime Training to accelerate fine-tuning of transformer models. The integration aims to reduce training time and memory usage with minimal code changes for practitioners using the Hugging Face ecosystem. This tooling update targets enterprise and research users looking to optimize training efficiency on existing hardware.

5Hugging Face Blog·1mo ago·source ↗

Hugging Face and AMD Partner to Accelerate Models on CPU and GPU Platforms

Hugging Face and AMD announced a partnership aimed at optimizing and accelerating state-of-the-art AI models across AMD's CPU and GPU hardware platforms. The collaboration targets improved performance for models hosted and distributed through Hugging Face's ecosystem. This represents a strategic move to broaden hardware support beyond NVIDIA-dominated infrastructure in the AI/ML deployment landscape.

5Hugging Face Blog·1mo ago·source ↗

AMD + Hugging Face: Large Language Models Out-of-the-Box Acceleration with AMD GPU

Hugging Face and AMD announced integration work enabling out-of-the-box LLM acceleration on AMD GPUs via the Optimum library. The collaboration targets ROCm-based AMD hardware, aiming to reduce friction for users running inference on non-NVIDIA GPU stacks. This represents a continued push to broaden the hardware ecosystem available to open-weights model users.

5Hugging Face Blog·1mo ago·source ↗

Accelerate a World of LLMs on Hugging Face with NVIDIA NIM

NVIDIA NIM microservices are being integrated with Hugging Face to enable optimized inference deployment for a broad range of LLMs hosted on the Hub. The partnership allows developers to deploy Hugging Face models via NIM's containerized inference stack, leveraging NVIDIA's TensorRT-LLM and other optimizations. This expands the ecosystem of models accessible through NIM beyond NVIDIA's own catalog to the wider Hugging Face model repository.

4Hugging Face Blog·1mo ago·source ↗

Accelerated Inference with Optimum and Transformers Pipelines

Hugging Face announced integration between the Optimum library and the Transformers Pipelines API, enabling hardware-accelerated inference with minimal code changes. The integration targets deployment on specialized hardware backends such as ONNX Runtime, allowing users to swap in optimized inference engines transparently. This lowers the barrier to production-grade inference optimization for practitioners using the Hugging Face ecosystem.

4Hugging Face Blog·1mo ago·source ↗

Accelerate your models with Optimum Intel and OpenVINO

Hugging Face's Optimum Intel library integrates with Intel's OpenVINO toolkit to accelerate inference of transformer models on Intel hardware. The post covers how to export models to OpenVINO IR format and run optimized inference pipelines. This targets deployment efficiency for NLP and vision models on CPU and other Intel accelerators.

4Hugging Face Blog·1mo ago·source ↗

Accelerating Hugging Face Transformers with AWS Inferentia2

Hugging Face published a blog post detailing how to accelerate Transformer model inference using AWS Inferentia2, Amazon's second-generation ML inference chip. The post covers integration patterns between the Hugging Face ecosystem and the Neuron SDK for deploying models on Inferentia2 hardware. This represents a practical guide for enterprise and cloud-based inference deployment using dedicated AI accelerators.

4Hugging Face Blog·1mo ago·source ↗

Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive

This Hugging Face blog post details how to accelerate Stable Diffusion Turbo and SDXL Turbo inference using ONNX Runtime and Microsoft's Olive optimization toolkit. The post covers the workflow for converting and optimizing diffusion models for faster deployment. This is a practical inference optimization guide targeting practitioners deploying image generation models.