Almanac

bitsandbytes

product·active·bitsandbytes-e64ef372·4 events·first seen 28d ago

Aliases: bitsandbytes

Recent events (4)

6·Hugging Face Blog·28d ago

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

Hugging Face published a blog post describing how 4-bit quantization via bitsandbytes is integrated into the Transformers library, which lets large language models run on consumer-grade hardware. The post covers the 4-bit NormalFloat (NF4) data type and double quantization, both introduced in the QLoRA paper. Together they cut weight memory to roughly a quarter of its 16-bit size while preserving model quality. It shows how users can load models such as LLaMA in 4-bit precision and fine-tune them with QLoRA using minimal code changes.
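To make NF4 and double quantization concrete, here is a minimal pure-Python sketch. It builds a 16-level codebook from standard-normal quantiles (8 positive levels, 7 negative levels and an exact zero, normalized to [-1, 1]), quantizes a block by its absmax, and computes the per-parameter overhead of storing the absmax constants with and without double quantization. The offset 0.9677083, the block sizes and all function names are assumptions chosen for illustration. This is not the bitsandbytes kernel or its API.

```python
from statistics import NormalDist
from typing import List, Optional, Tuple


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Evenly spaced values from start to stop, inclusive."""
    return [start + (stop - start) * i / (num - 1) for i in range(num)]


def nf4_levels(offset: float = 0.9677083) -> List[float]:
    """16 NF4-style levels: standard-normal quantiles normalized to [-1, 1].

    Asymmetric split: 8 positive levels, 7 negative levels, plus an exact zero.
    The offset (outermost probability) is an assumption of this sketch.
    """
    nd = NormalDist()
    pos = [nd.inv_cdf(p) for p in linspace(offset, 0.5, 9)[:-1]]   # 8 positive
    neg = [-nd.inv_cdf(p) for p in linspace(offset, 0.5, 8)[:-1]]  # 7 negative
    levels = sorted(pos + [0.0] + neg)
    scale = max(abs(v) for v in levels)
    return [v / scale for v in levels]


def quantize_block(block: List[float], levels: List[float]) -> Tuple[List[int], float]:
    """Absmax-normalize one block, then map each value to its nearest level index."""
    absmax = max(abs(x) for x in block) or 1.0  # avoid dividing by zero
    codes = []
    for x in block:
        y = x / absmax
        codes.append(min(range(len(levels)), key=lambda i: abs(y - levels[i])))
    return codes, absmax


def dequantize_block(codes: List[int], absmax: float, levels: List[float]) -> List[float]:
    """Recover approximate values from 4-bit codes and the block's absmax."""
    return [levels[c] * absmax for c in codes]


def constant_bits_per_param(blocksize: int, second_blocksize: Optional[int] = None) -> float:
    """Overhead of the per-block absmax constants, in bits per weight.

    Plain: one fp32 absmax per block. Double quantization: absmax values are
    stored in 8 bits, plus one fp32 constant per `second_blocksize` of them.
    """
    if second_blocksize is None:
        return 32 / blocksize
    return 8 / blocksize + 32 / (blocksize * second_blocksize)
```

With a block size of 64, the constants cost 32/64 = 0.5 bits per parameter. With double quantization and a second-level block size of 256, they cost 8/64 + 32/(64·256) ≈ 0.127 bits, a saving of about 0.373 bits per parameter. This matches the figure the QLoRA paper reports for double quantization.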

6·Hugging Face Blog·28d ago

A Gentle Introduction to 8-bit Matrix Multiplication for Transformers at Scale using Hugging Face and bitsandbytes

This Hugging Face blog post introduces 8-bit quantization for large transformer models through the integration of the bitsandbytes library with the transformers and accelerate libraries. It explains how LLM.int8() loads large models in 8-bit precision, roughly halving the GPU memory needed for weights compared with 16-bit precision, without major accuracy degradation. The post covers the mechanics of mixed-precision decomposition, in which outlier feature dimensions are multiplied in 16-bit precision while the remaining values are multiplied in int8. It also shows how practitioners can use the integration in practice.
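The mixed-precision decomposition can be sketched in plain Python. Any input feature dimension with a value at or above a magnitude threshold is multiplied in full precision. The other dimensions are vector-wise quantized to int8, with one absmax scale per row of X and one per column of W, multiplied with integer accumulation, and then rescaled. The threshold of 6.0 follows the LLM.int8() paper. The function names are hypothetical, and Python floats stand in for fp16. This is an illustration of the algorithm, not bitsandbytes' implementation.

```python
from typing import List

Matrix = List[List[float]]


def outlier_dims(X: Matrix, threshold: float = 6.0) -> List[int]:
    """Hidden dimensions (columns of X) holding any |value| >= threshold."""
    return [j for j in range(len(X[0])) if any(abs(row[j]) >= threshold for row in X)]


def absmax_scale(values) -> float:
    """Scale that maps the largest |value| to 127; 1.0 for an all-zero vector."""
    m = max((abs(v) for v in values), default=0.0)
    return m / 127 if m > 0 else 1.0


def int8_mixed_matmul(X: Matrix, W: Matrix, threshold: float = 6.0) -> Matrix:
    """Approximate X @ W with an int8 path plus a high-precision outlier path."""
    n, k, m = len(X), len(X[0]), len(W[0])
    outliers = set(outlier_dims(X, threshold))
    regular = [j for j in range(k) if j not in outliers]

    # Vector-wise scales over the regular dimensions only.
    sx = [absmax_scale(X[i][j] for j in regular) for i in range(n)]
    sw = [absmax_scale(W[j][c] for j in regular) for c in range(m)]

    # Quantize regular dimensions to integers in [-127, 127].
    Xq = [[round(X[i][j] / sx[i]) for j in regular] for i in range(n)]
    Wq = [[round(W[j][c] / sw[c]) for c in range(m)] for j in regular]

    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for c in range(m):
            # int8 path: integer dot product, dequantized with both scales.
            acc = sum(Xq[i][t] * Wq[t][c] for t in range(len(regular)))
            # High-precision path for the outlier dimensions.
            hp = sum(X[i][j] * W[j][c] for j in outliers)
            out[i][c] = acc * sx[i] * sw[c] + hp
    return out
```

Keeping the few outlier columns out of the int8 path stops a single large value from inflating a row's scale, which would otherwise crush the resolution of every other value in that row.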

5·Hugging Face Blog·28d ago

Overview of Natively Supported Quantization Schemes in 🤗 Transformers

This Hugging Face blog post surveys the quantization methods natively integrated into the Transformers library as of September 2023, covering schemes such as GPTQ and bitsandbytes (LLM.int8() and 4-bit NF4). It explains how each method works, how they trade off memory reduction against inference speed, and how practitioners can apply them through the Transformers API. The post serves as a practical reference for deploying large language models under memory constraints.
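The memory side of these trade-offs can be estimated for the weights alone as parameters × bits per parameter / 8 bytes. The sketch below assumes a hypothetical 7-billion-parameter model. It ignores activations, the KV cache, quantization constants and any layers left unquantized, so real footprints are higher. The function name is illustrative.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB (2**30 bytes), ignoring all overheads."""
    return n_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    n = 7e9  # hypothetical model size
    for name, bits in [
        ("fp16/bf16", 16),
        ("int8 (LLM.int8())", 8),
        ("4-bit (NF4 or 4-bit GPTQ)", 4),
    ]:
        print(f"{name:<28} {weight_memory_gib(n, bits):6.2f} GiB")
```

For 7e9 parameters this gives about 13.04 GiB at 16 bits, 6.52 GiB at 8 bits and 3.26 GiB at 4 bits. That is the halving and quartering the schemes aim for, before overheads.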

5·Hugging Face Blog·28d ago

Exploring Quantization Backends in Diffusers

Hugging Face published a technical overview of the quantization backends available in the Diffusers library for image and video generation models. The post covers integration with several quantization frameworks, such as bitsandbytes, GGUF and torchao, and compares their trade-offs for diffusion-model inference. It targets practitioners who want to reduce memory footprint and improve throughput when deploying diffusion models.