Entity · product

llama.cpp

product · active · llama-cpp-7e7dfac9 · 5 events · first seen May 18, 2026

Aliases: llama.cpp

Co-occurring entities

Hugging Face · Georgi Gerganov · GGML · Simon Willison · USF-MAE · FetalCLIP · Qwen3-4B · FADA · UltraFedFM · UltraSAM · whisper.cpp · ggml-org

More like this (12)

Llama · Llama 2 · Llama 3 · Code Llama · Llama-3.1-8B · Llama-3 · Llama 3.2 · Llama-3.2-1B-Instruct · TinyLlama-1.1B · TinyLlama · Meta Llama 3.1 405B · LlamaGuard

Recent events (5)

2 · Simon Willison's Weblog · Jun 16, 2026

Simon Willison quotes Georgi Gerganov

Simon Willison shares a quote from Georgi Gerganov, the creator of llama.cpp. The item body is empty, so the content of the quote is unavailable. Because Gerganov is a central figure in the open-weights inference ecosystem, a substantive statement from him may be relevant to tracking open-source LLM tooling trends.

Open Weights Progress · Georgi Gerganov · llama.cpp · Simon Willison

6 · arXiv · cs.AI · Jun 10, 2026

FADA: Unified vision-language model for fetal ultrasound interpretation deployable on consumer smartphones

FADA is a unified vision-language model built on Qwen3.5-VL that performs clinical interpretation, classification, detection, and segmentation of fetal ultrasound images through a single pipeline without requiring external labels at inference. The system distills knowledge from four domain-specific foundation models using selective distillation, achieving 0.8820 mean Dice for segmentation and 0.7671 mAP@0.50 for detection, with expert validation confirming clinically acceptable outputs. Notably, the compressed 0.8B model runs entirely offline on a commodity smartphone (Qualcomm Snapdragon 7 Gen 1) in approximately 60 seconds, targeting diagnostic access gaps in low- and middle-income countries where trained sonographers are scarce. Code, models, and data are publicly released.

Inference Economics · Multimodal Progress · USF-MAE · FetalCLIP · Qwen3-4B · +4 more

4 · Hugging Face Blog · May 19, 2026

Introduction to ggml

This Hugging Face blog post introduces ggml, a C-based tensor library that underpins popular inference runtimes like llama.cpp and whisper.cpp. It explains ggml's design philosophy, quantization support, and how it enables efficient on-device inference for large language models. The post serves as an educational overview for developers looking to understand or build on the ggml ecosystem.

Open Weights Progress · Inference Economics · whisper.cpp · llama.cpp · Hugging Face · +2 more

4 · Hugging Face Blog · May 19, 2026

New in llama.cpp: Model Management

llama.cpp has added model management capabilities, described in a Hugging Face blog post from ggml-org. The post covers changes to how models are handled within the llama.cpp inference framework, a tooling update relevant to the open-source local inference ecosystem.

Open Weights Progress · Inference Economics · ggml-org · llama.cpp · Hugging Face · +1 more

8 · Hugging Face Blog · May 18, 2026

GGML and llama.cpp Join Hugging Face to Ensure Long-Term Progress of Local AI

GGML and llama.cpp, foundational open-source libraries for efficient local inference of large language models, are joining Hugging Face. The move is intended to secure the long-term development and sustainability of projects that underpin much of the local and on-device AI ecosystem. It consolidates key open-weights inference infrastructure under the Hugging Face umbrella.

Open Weights Progress · Inference Economics · Georgi Gerganov · llama.cpp · Hugging Face · +2 more