5Hugging Face Blog·1mo ago

From OpenAI to Open LLMs with Messages API on Hugging Face

Hugging Face's Text Generation Inference (TGI) now supports an OpenAI-compatible Messages API, enabling developers to switch from OpenAI models to open-weight LLMs with minimal code changes. The integration allows existing OpenAI SDK users to point their client at Hugging Face endpoints by changing only the base URL and model name. This lowers the migration barrier for teams wanting to self-host or use open models while retaining familiar tooling.

Open Weights Progress Inference Economics Agent and Tool Ecosystem Text Generation Inference OpenAI Messages API Hugging Face OpenAI

Related guides (4)

OpenAI

OpenAI: The Lab That Made AI a Household Word

Read asBeginner In-depth

Hugging Face

Hugging Face: The Home of Open-Source AI

Read asBeginner In-depth

Open Weights ProgressTopic guide

Open Weights Progress: How Freely Available AI Models Caught Up to the Frontier

Read asBeginner

Agent and Tool EcosystemTopic guide

Agent and Tool Ecosystem: How the Infrastructure Layer Around LLMs Is Consolidating

Read asIn-depth

Related events (8)

4Hugging Face Blog·1mo ago·source ↗

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Hugging Face published a blog post surveying the open-source LLM ecosystem as of mid-2023, covering text generation models, tooling, and deployment patterns available on the platform. The post highlights the breadth of open-weight models and associated infrastructure for inference and fine-tuning. It serves as a reference overview of the state of open-source LLMs at that point in time.

Open Weights Progress Inference Economics Hugging Face +1 more

4Hugging Face Blog·1mo ago·source ↗

Deploy LLMs with Hugging Face Inference Endpoints

Hugging Face published a guide on deploying large language models using their Inference Endpoints service. The post covers how to set up scalable, production-ready LLM deployments with minimal infrastructure overhead. It targets developers looking to move from experimentation to hosted inference without managing raw compute.

Inference Economics Enterprise Deployment Patterns Hugging Face Inference Endpoints Hugging Face

8Hugging Face Blog·1mo ago·source ↗

GGML and llama.cpp Join Hugging Face to Ensure Long-Term Progress of Local AI

GGML and llama.cpp, the foundational open-source libraries enabling efficient local inference of large language models, are joining Hugging Face. This move is intended to secure long-term development and sustainability of the projects that underpin much of the local/on-device AI ecosystem. The acquisition or integration represents a significant consolidation of key open-weights inference infrastructure under the Hugging Face umbrella.

Open Weights Progress Inference Economics Georgi Gerganov llama.cpp Hugging Face +2 more

5Hugging Face Blog·1mo ago·source ↗

Introducing the Hugging Face LLM Inference Container for Amazon SageMaker

Hugging Face and Amazon Web Services have launched a dedicated LLM inference container for Amazon SageMaker, enabling optimized deployment of large language models on managed cloud infrastructure. The container is built on Hugging Face's Text Generation Inference (TGI) toolkit, which supports features like continuous batching, tensor parallelism, and quantization. This integration lowers the barrier for enterprise teams to deploy open-weight LLMs at scale on AWS without managing custom serving infrastructure.

Open Weights Progress Inference Economics Text Generation Inference Amazon SageMaker tensor parallelism +4 more

5Hugging Face Blog·1mo ago·source ↗

Hugging Face Text Generation Inference available for AWS Inferentia2

Hugging Face has announced that its Text Generation Inference (TGI) serving framework is now available for AWS Inferentia2 accelerators. This integration allows users to deploy large language models on AWS's custom AI chips using the TGI stack. The move extends TGI's hardware support beyond GPUs to specialized inference silicon, potentially offering cost and performance advantages for production LLM deployments.

Training Infrastructure Inference Economics Text Generation Inference AWS Inferentia2 Hugging Face +2 more

6Hugging Face Blog·1mo ago·source ↗

Making thousands of open LLMs bloom in the Vertex AI Model Garden

Hugging Face and Google Cloud announced an integration bringing thousands of open-source LLMs from the Hugging Face Hub into Vertex AI Model Garden. This partnership allows developers to deploy open-weight models directly through Google Cloud's managed infrastructure. The collaboration represents a significant expansion of enterprise-accessible open model deployment options on a major cloud platform.

Open Weights Progress Inference Economics Google Cloud Vertex AI Model Garden Hugging Face +1 more

5Hugging Face Blog·1mo ago·source ↗

Introducing HUGS - Scale your AI with Open Models

Hugging Face announced HUGS (Hugging Face Generative Services), a new product aimed at helping enterprises scale AI deployments using open models. The service appears to target production inference infrastructure for open-weight models, positioning Hugging Face as a managed deployment layer. This is a product launch in the enterprise AI infrastructure space, competing with managed inference offerings from other providers.

Open Weights Progress Inference Economics HUGS Hugging Face +1 more

5Hugging Face Blog·1mo ago·source ↗

Fine-tune Any LLM from the Hugging Face Hub with Together AI

Together AI has announced an integration with Hugging Face that enables fine-tuning of any model from the Hugging Face Hub directly through Together AI's platform. This partnership expands access to fine-tuning infrastructure for open-weight models without requiring users to manage their own compute. The integration targets developers and enterprises seeking managed fine-tuning workflows for a broad range of open-source LLMs.

Open Weights Progress Enterprise Deployment Patterns Together AI Hugging Face +1 more