Falcon-Edge: 1.58-bit Quantized Language Model Series from TII
Technology Innovation Institute (TII) has released Falcon-Edge, a series of language models operating at 1.58-bit precision, targeting edge deployment scenarios. The models are designed to be fine-tunable despite extreme quantization, positioning them as practical options for resource-constrained environments. This release extends the Falcon model family into the ultra-low-bit regime, following broader industry interest in BitNet-style ternary weight models.
Related events (8)
Fine-tuning LLMs to 1.58bit: extreme quantization made easy
Hugging Face published a blog post describing a method for fine-tuning large language models down to 1.58-bit precision, referencing the BitNet b1.58 quantization scheme. The post covers tooling and workflows that make extreme quantization more accessible via the Hugging Face ecosystem. This represents a practical guide to applying ternary-weight quantization ({-1, 0, 1}) to existing models through fine-tuning rather than training from scratch.
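To make the ternary scheme concrete, here is a minimal sketch of per-tensor "absmean" quantization, the formulation commonly associated with BitNet b1.58. It is an illustration under that assumption, not the blog post's actual tooling; the function names and sample weights are hypothetical.

```python
import math

def ternary_quantize(weights, eps=1e-8):
    """Map a flat list of floats to values in {-1, 0, 1} plus one scale."""
    # Per-tensor scale: the mean absolute value of the weights (absmean).
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = []
    for w in weights:
        # Round the scaled weight to the nearest integer, then clip to [-1, 1].
        q = round(w / (scale + eps))
        quantized.append(max(-1, min(1, q)))
    return quantized, scale

def dequantize(quantized, scale):
    """Approximate the original weights from ternary codes and the scale."""
    return [q * scale for q in quantized]

# A symbol with three possible states carries log2(3) bits, hence "1.58-bit".
BITS_PER_TERNARY_WEIGHT = math.log2(3)

weights = [0.9, -0.05, -1.4, 0.3, 0.0, 2.1]  # hypothetical sample values
q, scale = ternary_quantize(weights)
approx = dequantize(q, scale)
```

Small weights collapse to 0 and large ones saturate at -1 or 1, which is why fine-tuning is typically needed to recover quality after quantizing this aggressively.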
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
TII UAE has released Falcon-H1, a new family of hybrid-head language models combining attention and state-space mechanisms to improve efficiency and performance. The models are published on Hugging Face and represent TII's latest iteration in the Falcon series. The hybrid architecture targets better inference economics and competitive benchmark results relative to model size.
Falcon 180B Released: New Open-Weights Frontier Model
Technology Innovation Institute (TII) has released Falcon 180B, a 180-billion-parameter open-weights language model announced via Hugging Face. Trained on 3.5 trillion tokens, it was positioned at release as the largest publicly available open-weights model. The model is available on Hugging Face Hub for research and commercial use under a custom license.
Falcon 2: 11B Parameter Pretrained LLM and VLM Trained on 5T+ Tokens Across 11 Languages
Technology Innovation Institute (TII) has released Falcon 2, an 11B-parameter language model pretrained on over 5 trillion tokens spanning 11 languages. The release includes both a base language model and a vision-language model (VLM) variant. It updates the Falcon model family with expanded multilingual and multimodal capabilities.
Falcon-Arabic: A Breakthrough in Arabic Language Models
TII UAE has released Falcon-Arabic, a language model specifically designed for Arabic. The announcement highlights it as a significant advancement in Arabic NLP capabilities. It gives no specific technical details about model size, training data, or benchmark performance.
Mistral AI Releases Ministral 3B and 8B Edge Models
Mistral AI has introduced two new small language models, Ministral 3B and Ministral 8B, targeting on-device and edge computing use cases. Both models support up to 128k context length and claim state-of-the-art performance in the sub-10B parameter category, outperforming comparable models from Google and Meta on internal benchmarks. Ministral 8B features an interleaved sliding-window attention mechanism for memory-efficient inference and is priced at $0.1/M tokens via API, while Ministral 3B is priced at $0.04/M tokens. Weights for Ministral 8B Instruct are available for research use, with commercial licensing available on request.
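As a rough illustration of why sliding-window attention saves memory, the sketch below builds a causal sliding-window mask. The window size and function name are hypothetical, and the summary does not describe how Ministral 8B interleaves such layers, so this shows only the general idea.

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        # Causal (j <= i) and local (at most `window` positions back, inclusive of i).
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = sliding_window_mask(seq_len=5, window=3)
```

Each row has at most `window` True entries, so a layer using this mask only needs to keep the most recent `window` keys and values rather than the full context.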
Falcon LLM Integrated into Hugging Face Ecosystem
Hugging Face announced the integration of the Falcon language models (Falcon-7B and Falcon-40B) into its ecosystem, including model hosting, inference APIs, and tooling support. Falcon, developed by the Technology Innovation Institute (TII), topped the Open LLM Leaderboard at the time of release. The post covers usage patterns, fine-tuning guidance, and deployment options within the Hugging Face stack.
Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
Hugging Face published a blog post detailing the integration of 4-bit quantization via bitsandbytes into the Transformers library, enabling large language models to run on consumer-grade hardware. The post covers the NF4 (NormalFloat4) data type and the double quantization technique from the QLoRA paper, which together reduce memory footprint significantly while preserving model quality. It demonstrates how users can load models like LLaMA in 4-bit precision and fine-tune them using QLoRA with minimal code changes.
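The memory savings can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation, assuming the block sizes reported in the QLoRA paper (64 weights per first-level block, 256 constants per second-level block); the function names and the 7B parameter count are illustrative, and the estimate covers weights only, not activations or the KV cache.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

def quant_constant_overhead_bits(block_size=64, double_quant=False, second_block=256):
    """Extra bits per parameter spent on quantization constants."""
    if not double_quant:
        # One 32-bit scaling constant per block of weights.
        return 32 / block_size
    # Double quantization: constants stored in 8 bits, with one 32-bit
    # second-level constant per `second_block` first-level constants.
    return 8 / block_size + 32 / (block_size * second_block)

params = 7e9  # hypothetical 7B-parameter model
fp16_gb = weight_memory_gb(params, 16)
nf4_gb = weight_memory_gb(params, 4 + quant_constant_overhead_bits())
nf4_dq_gb = weight_memory_gb(params, 4 + quant_constant_overhead_bits(double_quant=True))
```

Under these assumptions, double quantization trims the per-parameter overhead from 0.5 bits to roughly 0.127 bits, on top of the reduction from 16-bit to 4-bit weights.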


