5 · Hugging Face Blog · 1mo ago

AMD + Hugging Face: Large Language Models Out-of-the-Box Acceleration with AMD GPU

Hugging Face and AMD announced integration work enabling out-of-the-box LLM acceleration on AMD GPUs via the Optimum library. The collaboration targets ROCm-based AMD hardware, aiming to reduce friction for users running inference on non-NVIDIA GPU stacks. This represents a continued push to broaden the hardware ecosystem available to open-weights model users.

Training Infrastructure Open Weights Progress Inference Economics Optimum ROCm Hugging Face AMD

Related guides (3)

Hugging Face

Hugging Face: The Home of Open-Source AI

Open Weights Progress · Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Training Infrastructure · Topic guide

Training Infrastructure: The Compute Arms Race Powering Modern AI

Related events (8)

5 · Hugging Face Blog · 1mo ago

Hugging Face and AMD Partner to Accelerate Models on CPU and GPU Platforms

Hugging Face and AMD announced a partnership aimed at optimizing and accelerating state-of-the-art AI models across AMD's CPU and GPU hardware platforms. The collaboration targets improved performance for models hosted and distributed through Hugging Face's ecosystem. This represents a strategic move to broaden hardware support beyond NVIDIA-dominated infrastructure in the AI/ML deployment landscape.

Training Infrastructure Inference Economics Hugging Face AMD +1 more

5 · Hugging Face Blog · 1mo ago

Hugging Face on AMD Instinct MI300 GPU

Hugging Face announces support and optimization for AMD Instinct MI300 GPUs, expanding the ecosystem of hardware that can run Hugging Face models and tools. The post covers integration work enabling inference and training workloads on AMD's high-memory GPU accelerator. This represents a meaningful step in diversifying AI infrastructure beyond NVIDIA dominance.

Training Infrastructure Inference Economics AMD Instinct MI300 Hugging Face AMD

5 · Hugging Face Blog · 1mo ago

Accelerating over 130,000 Hugging Face Models with ONNX Runtime

Hugging Face and Microsoft have integrated ONNX Runtime (ORT) to accelerate inference for over 130,000 models on the Hugging Face Hub. The integration enables optimized deployment across CPU and GPU hardware without requiring users to manually export or configure ONNX models. This represents a significant expansion of ORT's reach within the open-weights model ecosystem, lowering the barrier to production-grade inference optimization.

Open Weights Progress Inference Economics Optimum Microsoft ONNX +2 more

4 · Hugging Face Blog · 1mo ago

Run a ChatGPT-like Chatbot on a Single GPU with ROCm

Hugging Face published a guide demonstrating how to run a large language model chatbot on a single AMD GPU using ROCm, AMD's open-source GPU compute stack. The post covers setup, model loading, and inference on AMD hardware as an alternative to NVIDIA CUDA-based workflows. This is relevant to the growing interest in democratizing LLM inference beyond NVIDIA's ecosystem.

Training Infrastructure Inference Economics ROCm Hugging Face CUDA +1 more

4 · Hugging Face Blog · 1mo ago

Intel and Hugging Face Partner to Democratize Machine Learning Hardware Acceleration

Intel and Hugging Face announced a partnership aimed at making hardware acceleration for machine learning more accessible. The collaboration focuses on optimizing Hugging Face models and tools to run efficiently on Intel hardware. This represents an early-stage industry alignment between a major chip manufacturer and the dominant open-source ML model hub.

Training Infrastructure Inference Economics Hugging Face Intel +1 more

4 · Hugging Face Blog · 1mo ago

Easily Build and Share ROCm Kernels with Hugging Face

Hugging Face has published a guide and tooling for building and sharing custom ROCm kernels on its platform, targeting AMD GPU users in the ML ecosystem. The post covers the workflow for packaging, uploading, and reusing ROCm-based GPGPU kernels via the Hub. This lowers the barrier for AMD GPU kernel development and sharing, complementing the existing CUDA-centric kernel ecosystem. The initiative is relevant to inference optimization and the broader push to diversify GPU hardware support in AI workloads.

Training Infrastructure Inference Economics ROCm Hugging Face AMD +1 more

5 · Hugging Face Blog · 1mo ago

Optimum-NVIDIA: One-Line LLM Inference Acceleration via TensorRT-LLM

Hugging Face's Optimum-NVIDIA integration wraps NVIDIA's TensorRT-LLM backend to enable high-performance LLM inference with minimal code changes. The library targets developers who want near-peak GPU throughput without manually configuring TensorRT-LLM pipelines. It is positioned as a bridge between the Hugging Face ecosystem and NVIDIA's optimized inference stack.

Inference Economics Enterprise Deployment Patterns NVIDIA TensorRT-LLM Optimum-NVIDIA +2 more

5 · Hugging Face Blog · 1mo ago

Accelerate a World of LLMs on Hugging Face with NVIDIA NIM

NVIDIA NIM microservices are being integrated with Hugging Face to enable optimized inference deployment for a broad range of LLMs hosted on the Hub. The partnership allows developers to deploy Hugging Face models via NIM's containerized inference stack, leveraging NVIDIA's TensorRT-LLM and other optimizations. This expands the ecosystem of models accessible through NIM beyond NVIDIA's own catalog to the wider Hugging Face model repository.

Inference Economics Enterprise Deployment Patterns NVIDIA NIM NVIDIA TensorRT-LLM +2 more