8Hugging Face Blog·1mo ago

GGML and llama.cpp Join Hugging Face to Ensure Long-Term Progress of Local AI

GGML and llama.cpp, the foundational open-source libraries enabling efficient local inference of large language models, are joining Hugging Face. This move is intended to secure long-term development and sustainability of the projects that underpin much of the local/on-device AI ecosystem. The acquisition or integration represents a significant consolidation of key open-weights inference infrastructure under the Hugging Face umbrella.

Open Weights Progress Inference Economics Agent and Tool Ecosystem Georgi Gerganov llama.cpp Hugging Face GGML

Related guides (4)

Hugging Face

Hugging Face: The Home of Open-Source AI

Read asBeginner In-depth

Open Weights ProgressTopic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Read asBeginner In-depth

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Read asIn-depth

Inference EconomicsTopic guide

Inference Economics: The Cost Structure of Running AI Models in Production

Read asIn-depth

Related events (8)

4Hugging Face Blog·1mo ago·source ↗

Introduction to ggml

This Hugging Face blog post introduces ggml, a C-based tensor library that underpins popular inference runtimes like llama.cpp and whisper.cpp. It explains ggml's design philosophy, quantization support, and how it enables efficient on-device inference for large language models. The post serves as an educational overview for developers looking to understand or build on the ggml ecosystem.

Open Weights Progress Inference Economics whisper.cpp llama.cpp Hugging Face +2 more

4Hugging Face Blog·1mo ago·source ↗

New in llama.cpp: Model Management

llama.cpp has introduced new model management capabilities, as described in a Hugging Face blog post from the ggml-org. The post covers updates to how models are handled within the llama.cpp inference framework. This is a tooling update relevant to the open-source local inference ecosystem.

Open Weights Progress Inference Economics ggml-org llama.cpp Hugging Face +1 more

6Hugging Face Blog·1mo ago·source ↗

Falcon LLM Integrated into Hugging Face Ecosystem

Hugging Face announced the integration of the Falcon language models (Falcon-7B and Falcon-40B) into its ecosystem, including model hosting, inference APIs, and tooling support. Falcon, developed by the Technology Innovation Institute (TII), had recently topped the Open LLM Leaderboard at the time of release. The post covers usage patterns, fine-tuning guidance, and deployment options within the Hugging Face stack.

Open Weights Progress Inference Economics Falcon-7B Open LLM Leaderboard Falcon-40B +3 more

5Hugging Face Blog·1mo ago·source ↗

From OpenAI to Open LLMs with Messages API on Hugging Face

Hugging Face's Text Generation Inference (TGI) now supports an OpenAI-compatible Messages API, enabling developers to switch from OpenAI models to open-weight LLMs with minimal code changes. The integration allows existing OpenAI SDK users to point their client at Hugging Face endpoints by changing only the base URL and model name. This lowers the migration barrier for teams wanting to self-host or use open models while retaining familiar tooling.

Open Weights Progress Inference Economics Text Generation Inference OpenAI Messages API Hugging Face +2 more

6Hugging Face Blog·1mo ago·source ↗

Hugging Face and Google Partner for Open AI Collaboration

Hugging Face and Google have announced a partnership focused on open AI collaboration, expanding access to Hugging Face models and tools on Google Cloud Platform. The deal deepens integration between Hugging Face's model hub and Google's cloud infrastructure, enabling easier deployment of open-source models via GCP services. This follows a pattern of major cloud providers forming strategic alliances with leading open-source AI platforms.

Open Weights Progress Inference Economics Google Hugging Face Google Cloud Platform +1 more

6Hugging Face Blog·1mo ago·source ↗

Making thousands of open LLMs bloom in the Vertex AI Model Garden

Hugging Face and Google Cloud announced an integration bringing thousands of open-source LLMs from the Hugging Face Hub into Vertex AI Model Garden. This partnership allows developers to deploy open-weight models directly through Google Cloud's managed infrastructure. The collaboration represents a significant expansion of enterprise-accessible open model deployment options on a major cloud platform.

Open Weights Progress Inference Economics Google Cloud Vertex AI Model Garden Hugging Face +1 more

8Hugging Face Blog·1mo ago·source ↗

Llama 2 is here - get it on Hugging Face

Meta released Llama 2, a new family of open-weights large language models, made available through Hugging Face. The release includes both base and fine-tuned chat variants across multiple parameter sizes. This represents a significant expansion of accessible open-weights frontier models, with Meta and Microsoft partnering on distribution.

Frontier Model Releases Open Weights Progress Microsoft Llama 2 Hugging Face +2 more

4Hugging Face Blog·1mo ago·source ↗

Deploy LLMs with Hugging Face Inference Endpoints

Hugging Face published a guide on deploying large language models using their Inference Endpoints service. The post covers how to set up scalable, production-ready LLM deployments with minimal infrastructure overhead. It targets developers looking to move from experimentation to hosted inference without managing raw compute.

Inference Economics Enterprise Deployment Patterns Hugging Face Inference Endpoints Hugging Face