Almanac
← Events
5Hugging Face Blog·1mo ago

From OpenAI to Open LLMs with Messages API on Hugging Face

Hugging Face's Text Generation Inference (TGI) now supports an OpenAI-compatible Messages API, enabling developers to switch from OpenAI models to open-weight LLMs with minimal code changes. The integration allows existing OpenAI SDK users to point their client at Hugging Face endpoints by changing only the base URL and model name. This lowers the migration barrier for teams wanting to self-host or use open models while retaining familiar tooling.

Related guides (4)

Related events (8)

4Hugging Face Blog·1mo ago·source ↗

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Hugging Face published a blog post surveying the open-source LLM ecosystem as of mid-2023, covering text generation models, tooling, and deployment patterns available on the platform. The post highlights the breadth of open-weight models and associated infrastructure for inference and fine-tuning. It serves as a reference overview of the state of open-source LLMs at that point in time.

4Hugging Face Blog·1mo ago·source ↗

Deploy LLMs with Hugging Face Inference Endpoints

Hugging Face published a guide on deploying large language models using their Inference Endpoints service. The post covers how to set up scalable, production-ready LLM deployments with minimal infrastructure overhead. It targets developers looking to move from experimentation to hosted inference without managing raw compute.

8Hugging Face Blog·1mo ago·source ↗

GGML and llama.cpp Join Hugging Face to Ensure Long-Term Progress of Local AI

GGML and llama.cpp, the foundational open-source libraries enabling efficient local inference of large language models, are joining Hugging Face. This move is intended to secure long-term development and sustainability of the projects that underpin much of the local/on-device AI ecosystem. The acquisition or integration represents a significant consolidation of key open-weights inference infrastructure under the Hugging Face umbrella.

5Hugging Face Blog·1mo ago·source ↗

Introducing the Hugging Face LLM Inference Container for Amazon SageMaker

Hugging Face and Amazon Web Services have launched a dedicated LLM inference container for Amazon SageMaker, enabling optimized deployment of large language models on managed cloud infrastructure. The container is built on Hugging Face's Text Generation Inference (TGI) toolkit, which supports features like continuous batching, tensor parallelism, and quantization. This integration lowers the barrier for enterprise teams to deploy open-weight LLMs at scale on AWS without managing custom serving infrastructure.

5Hugging Face Blog·1mo ago·source ↗

Hugging Face Text Generation Inference available for AWS Inferentia2

Hugging Face has announced that its Text Generation Inference (TGI) serving framework is now available for AWS Inferentia2 accelerators. This integration allows users to deploy large language models on AWS's custom AI chips using the TGI stack. The move extends TGI's hardware support beyond GPUs to specialized inference silicon, potentially offering cost and performance advantages for production LLM deployments.

6Hugging Face Blog·1mo ago·source ↗

Making thousands of open LLMs bloom in the Vertex AI Model Garden

Hugging Face and Google Cloud announced an integration bringing thousands of open-source LLMs from the Hugging Face Hub into Vertex AI Model Garden. This partnership allows developers to deploy open-weight models directly through Google Cloud's managed infrastructure. The collaboration represents a significant expansion of enterprise-accessible open model deployment options on a major cloud platform.

5Hugging Face Blog·1mo ago·source ↗

Introducing HUGS - Scale your AI with Open Models

Hugging Face announced HUGS (Hugging Face Generative Services), a new product aimed at helping enterprises scale AI deployments using open models. The service appears to target production inference infrastructure for open-weight models, positioning Hugging Face as a managed deployment layer. This is a product launch in the enterprise AI infrastructure space, competing with managed inference offerings from other providers.

5Hugging Face Blog·1mo ago·source ↗

Fine-tune Any LLM from the Hugging Face Hub with Together AI

Together AI has announced an integration with Hugging Face that enables fine-tuning of any model from the Hugging Face Hub directly through Together AI's platform. This partnership expands access to fine-tuning infrastructure for open-weight models without requiring users to manage their own compute. The integration targets developers and enterprises seeking managed fine-tuning workflows for a broad range of open-source LLMs.