Almanac
product

TensorRT-LLM

productactivetensorrt-llm-555d11d0·6 events·first seen 1mo ago

Aliases: TensorRT-LLM

Co-occurring entities

More like this (12)

Recent events (6)

6Hugging Face Blog·28d ago·source ↗

Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference

Hugging Face's Text Generation Inference (TGI) now supports multiple inference backends, including NVIDIA TensorRT-LLM and vLLM, in addition to its native backend. This allows users to select the most appropriate backend for their hardware and workload without leaving the TGI ecosystem. The announcement positions TGI as a unified serving layer that abstracts over competing inference runtimes, potentially simplifying enterprise deployment workflows.

5Hugging Face Blog·28d ago·source ↗

Optimum-NVIDIA: One-Line LLM Inference Acceleration via TensorRT-LLM

Hugging Face's Optimum-NVIDIA integration wraps NVIDIA's TensorRT-LLM backend to enable high-performance LLM inference with minimal code changes. The library targets developers who want near-peak GPU throughput without manually configuring TensorRT-LLM pipelines. It positions as a bridge between the Hugging Face ecosystem and NVIDIA's optimized inference stack.

5Hugging Face Blog·28d ago·source ↗

Accelerate a World of LLMs on Hugging Face with NVIDIA NIM

NVIDIA NIM microservices are being integrated with Hugging Face to enable optimized inference deployment for a broad range of LLMs hosted on the Hub. The partnership allows developers to deploy Hugging Face models via NIM's containerized inference stack, leveraging NVIDIA's TensorRT-LLM and other optimizations. This expands the ecosystem of models accessible through NIM beyond NVIDIA's own catalog to the wider Hugging Face model repository.

7Mistral Ai News·1mo ago·source ↗

Mistral AI Launches La Plateforme: First API Endpoints in Early Access

Mistral AI opened beta access to its first developer platform, La Plateforme, offering three generative text endpoints (mistral-tiny, mistral-small, mistral-medium) and an embedding endpoint. Mistral-tiny serves Mistral 7B Instruct v0.2, mistral-small serves Mixtral 8x7B, and mistral-medium serves an unreleased prototype model scoring 8.6 on MT-Bench. The platform also introduces Mistral-embed with a 1024-dimension embedding model achieving 55.26 on MTEB. The API follows OpenAI-compatible chat interface specifications and is ramping toward general availability.

8Mistral Ai News·1mo ago·source ↗

Mistral Releases Mistral 3 Family: Mistral Large 3 (675B MoE) and Ministral 3 Series (3B–14B), All Apache 2.0

Mistral AI has announced Mistral 3, a family of open-weight models including Mistral Large 3 (41B active / 675B total sparse MoE) and three dense Ministral 3 edge models (3B, 8B, 14B), all released under Apache 2.0. Mistral Large 3 debuts at #2 on LMArena's OSS non-reasoning leaderboard, supports image understanding, and was trained on 3,000 NVIDIA H200 GPUs; a reasoning variant is forthcoming. The Ministral 3 series includes base, instruct, and reasoning variants with multimodal and multilingual capabilities, with the 14B reasoning model achieving 85% on AIME '25. The release involves deep co-optimization with NVIDIA (Blackwell/Hopper kernels, NVFP4 format), vLLM, and Red Hat, and is available across major cloud and inference platforms.

7Mistral Ai News·15d ago·source ↗

Codestral Mamba: Mistral AI Releases Apache 2.0 Mamba-Architecture Code Model

Mistral AI has released Codestral Mamba, a 7.3B-parameter code-focused language model built on the Mamba state-space architecture rather than the Transformer architecture. The model offers linear-time inference and theoretically infinite sequence length, tested up to 256k tokens in-context retrieval. Developed with Mamba co-creators Albert Gu and Tri Dao, it is released under Apache 2.0 and available via HuggingFace, mistral-inference SDK, TensorRT-LLM, and Mistral's la Plateforme API. Mistral positions it as a local code assistant that performs on par with state-of-the-art transformer-based code models.