llama.cpp

product·active·llama-cpp-7e7dfac9·5 events·first seen 1mo ago

Aliases: llama.cpp

Recent events (5)

8·Hugging Face Blog·1mo ago

GGML and llama.cpp Join Hugging Face to Ensure Long-Term Progress of Local AI

GGML and llama.cpp, the foundational open-source libraries enabling efficient local inference of large language models, are joining Hugging Face. The move is intended to secure the long-term development and sustainability of the projects that underpin much of the local/on-device AI ecosystem. Whether structured as an acquisition or an integration, it consolidates key open-weights inference infrastructure under the Hugging Face umbrella.

4·Hugging Face Blog·28d ago

New in llama.cpp: Model Management

llama.cpp has introduced new model management capabilities, described in a Hugging Face blog post from ggml-org. The post covers changes to how models are handled within the llama.cpp inference framework, making it a tooling update for the open-source local inference ecosystem.

4·Hugging Face Blog·28d ago

Introduction to ggml

This Hugging Face blog post introduces ggml, a C-based tensor library that underpins popular inference runtimes like llama.cpp and whisper.cpp. It explains ggml's design philosophy, quantization support, and how it enables efficient on-device inference for large language models. The post serves as an educational overview for developers looking to understand or build on the ggml ecosystem.

2·Simon Willison's Weblog·13h ago

Simon Willison quotes Georgi Gerganov

Simon Willison shares a quote from Georgi Gerganov, the creator of llama.cpp. The body of the item is empty, so the content of the quote is unavailable. Gerganov is a significant figure in the open-weights inference ecosystem, so any substantive statement from him is potentially relevant to tracking open-source LLM tooling trends.

6·arXiv · cs.AI·7d ago

FADA: Unified vision-language model for fetal ultrasound interpretation deployable on consumer smartphones

FADA is a unified vision-language model built on Qwen3.5-VL that performs clinical interpretation, classification, detection, and segmentation of fetal ultrasound images through a single pipeline without requiring external labels at inference. The system distills knowledge from four domain-specific foundation models using selective distillation, achieving 0.8820 mean Dice for segmentation and 0.7671 mAP@0.50 for detection, with expert validation confirming clinically acceptable outputs. Notably, the compressed 0.8B model runs entirely offline on a commodity smartphone (Qualcomm Snapdragon 7 Gen 1) in approximately 60 seconds, targeting diagnostic access gaps in low- and middle-income countries where trained sonographers are scarce. Code, models, and data are publicly released.