4 · Hugging Face Blog · 1mo ago

Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive

This Hugging Face blog post details how to accelerate Stable Diffusion Turbo and SDXL Turbo inference using ONNX Runtime and Microsoft's Olive optimization toolkit. The post covers the workflow for converting and optimizing diffusion models for faster deployment. This is a practical inference optimization guide targeting practitioners deploying image generation models.

Related events (8)

4 · Hugging Face Blog · 1mo ago

Accelerating Stable Diffusion XL Inference with JAX on Cloud TPU v5e

Hugging Face published a technical blog post detailing how to accelerate Stable Diffusion XL inference using JAX on Google Cloud TPU v5e hardware. The post covers the integration of JAX-based diffusion pipelines with TPU v5e, demonstrating performance gains from hardware-software co-optimization. This represents a practical deployment pattern for large image generation models on non-GPU accelerators.

3 · Hugging Face Blog · 1mo ago

Exploring Simple Optimizations for SDXL

This Hugging Face blog post explores practical optimization techniques for Stable Diffusion XL (SDXL) inference. It covers methods to improve throughput and reduce memory usage when running SDXL, targeting practitioners deploying the model. The content is oriented toward applied inference efficiency rather than novel research.

4 · Hugging Face Blog · 1mo ago

Accelerate your models with Optimum Intel and OpenVINO

Hugging Face's Optimum Intel library integrates with Intel's OpenVINO toolkit to accelerate inference of transformer models on Intel hardware. The post covers how to export models to OpenVINO IR format and run optimized inference pipelines. This targets deployment efficiency for NLP and vision models on Intel CPUs and other Intel hardware.

4 · Hugging Face Blog · 1mo ago

Optimizing Stable Diffusion for Intel CPUs with NNCF and Hugging Face Optimum

This Hugging Face blog post details techniques for optimizing Stable Diffusion inference on Intel CPUs using the Neural Network Compression Framework (NNCF) and the Optimum library. The workflow covers quantization and other compression methods to reduce latency and memory footprint on CPU hardware. It is relevant to inference economics and enterprise deployment because it addresses running diffusion models without dedicated GPU hardware.

5 · Hugging Face Blog · 1mo ago

Accelerating over 130,000 Hugging Face Models with ONNX Runtime

Hugging Face and Microsoft have integrated ONNX Runtime (ORT) to accelerate inference for over 130,000 models on the Hugging Face Hub. The integration enables optimized deployment across CPU and GPU hardware without requiring users to manually export or configure ONNX models. This represents a significant expansion of ORT's reach within the open-weights model ecosystem, lowering the barrier to production-grade inference optimization.

4 · Hugging Face Blog · 1mo ago

Accelerating Stable Diffusion Inference on Intel CPUs

This Hugging Face blog post details techniques for optimizing Stable Diffusion inference on Intel CPUs, likely covering quantization, operator fusion, and Intel-specific hardware acceleration libraries. The post addresses the practical challenge of running diffusion models on CPU hardware without dedicated GPUs. This is relevant to inference economics and enterprise deployment patterns where GPU availability is constrained.

4 · Hugging Face Blog · 1mo ago

Optimum + ONNX Runtime: Faster Training for Hugging Face Models

Hugging Face's Optimum library integrates with Microsoft's ONNX Runtime Training to accelerate fine-tuning of transformer models. The integration aims to reduce training time and memory usage with minimal code changes for practitioners using the Hugging Face ecosystem. This tooling update targets enterprise and research users looking to optimize training efficiency on existing hardware.

4 · Hugging Face Blog · 1mo ago

Accelerated Inference with Optimum and Transformers Pipelines

Hugging Face announced an integration between the Optimum library and the Transformers Pipelines API, enabling accelerated inference with minimal code changes. The integration targets optimized inference runtimes such as ONNX Runtime, allowing users to swap in a faster inference engine transparently. This lowers the barrier to production-grade inference optimization for practitioners using the Hugging Face ecosystem.