Hugging Face Blog · 1mo ago

Introduction to ggml

This Hugging Face blog post introduces ggml, a C-based tensor library that underpins popular inference runtimes like llama.cpp and whisper.cpp. It explains ggml's design philosophy, quantization support, and how it enables efficient on-device inference for large language models. The post serves as an educational overview for developers looking to understand or build on the ggml ecosystem.

Open Weights Progress · Inference Economics · Agent and Tool Ecosystem · whisper.cpp · llama.cpp · Hugging Face · GGML

Related guides (4)

Hugging Face

Hugging Face: The Home of Open-Source AI


Open Weights Progress · Topic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier


Agent and Tool Ecosystem · Topic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating


Inference Economics · Topic guide

Inference Economics: The Cost Structure of Running AI Models in Production


Related events (8)

Hugging Face Blog · 1mo ago

GGML and llama.cpp Join Hugging Face to Ensure Long-Term Progress of Local AI

GGML and llama.cpp, the foundational open-source libraries for efficient local inference of large language models, are joining Hugging Face. The move is intended to secure the long-term development and sustainability of projects that underpin much of the local and on-device AI ecosystem. Whether described as an acquisition or an integration, it consolidates key open-weights inference infrastructure under the Hugging Face umbrella.

Open Weights Progress · Inference Economics · Georgi Gerganov · llama.cpp · Hugging Face

Hugging Face Blog · 1mo ago

New in llama.cpp: Model Management

llama.cpp has added new model management capabilities, described in a Hugging Face blog post from ggml-org. The post covers changes to how models are handled within the llama.cpp inference framework, a tooling update relevant to the open-source local inference ecosystem.

Open Weights Progress · Inference Economics · ggml-org · llama.cpp · Hugging Face

Hugging Face Blog · 1mo ago

Accelerating LLM Inference with TGI on Intel Gaudi

Hugging Face's Text Generation Inference (TGI) framework has added a backend for Intel Gaudi accelerators, enabling LLM inference on Intel's AI hardware. The integration allows users to deploy large language models on Gaudi hardware using TGI's serving infrastructure. This expands the hardware ecosystem for LLM inference beyond NVIDIA GPUs, offering an alternative accelerator option for enterprise deployments.

Training Infrastructure · Inference Economics · Text Generation Inference · Intel Gaudi · Hugging Face

Hugging Face Blog · 1mo ago

Introducing the Hugging Face LLM Inference Container for Amazon SageMaker

Hugging Face and Amazon Web Services have launched a dedicated LLM inference container for Amazon SageMaker, enabling optimized deployment of large language models on managed cloud infrastructure. The container is built on Hugging Face's Text Generation Inference (TGI) toolkit, which supports features like continuous batching, tensor parallelism, and quantization. This integration lowers the barrier for enterprise teams to deploy open-weight LLMs at scale on AWS without managing custom serving infrastructure.

Open Weights Progress · Inference Economics · Text Generation Inference · Amazon SageMaker · tensor parallelism

Hugging Face Blog · 1mo ago

Welcome Gemma 2 - Google's new open LLM

Google has released Gemma 2, a new open-weights large language model, announced on the Hugging Face blog. The post covers integration with the Hugging Face ecosystem and highlights the model's capabilities. Gemma 2 continues Google's investment in open-weights model releases aimed at competing in the open-source LLM space.

Frontier Model Releases · Open Weights Progress · Google · Gemma 2 · Hugging Face

Hugging Face Blog · 1mo ago

Welcome Gemma - Google's new open LLM

Google released Gemma, a family of open-weight large language models, announced via the Hugging Face blog. The models are positioned as Google's entry into the open-weights LLM space, following the success of models like Llama 2. This release marks a significant strategic move by Google to compete in the open-source AI ecosystem.

Frontier Model Releases · Open Weights Progress · Gemma · Google · Hugging Face

Hugging Face Blog · 1mo ago

Welcome Gemma 3: Google's All-New Multimodal, Multilingual, Long-Context Open LLM

Google has released Gemma 3, a new family of open-weights large language models featuring multimodal capabilities, multilingual support, and extended context windows. The Hugging Face blog post introduces the model family and its key features. Gemma 3 represents a significant update to Google's open-weights model line, expanding beyond text-only capabilities to include vision and broader language coverage.

Long Context Evolution · Frontier Model Releases · Gemma 3 · Google · Hugging Face

Hugging Face Blog · 3d ago

GLM-5.2 announced as model built for long-horizon tasks

ZAI.org published a blog post on Hugging Face announcing GLM-5.2, a model positioned for long-horizon tasks. The post appears to announce a new release in the GLM (General Language Model) series. Little body content is available, but the framing suggests capabilities relevant to extended reasoning or agentic workflows.

Long Context Evolution · Frontier Model Releases · zai-org · Hugging Face · GLM-5.1