LLM.int8
llm-int8-5b13eb10·2 events·first seen 28d agoAliases: LLM.int8, LLM.int8()
Co-occurring entities
More like this (12)
Recent events (2)
A Gentle Introduction to 8-bit Matrix Multiplication for Transformers at Scale using Hugging Face and bitsandbytes
This Hugging Face blog post introduces 8-bit quantization for large transformer models via integration of the bitsandbytes library with the transformers and accelerate libraries. It explains how LLM.int8() enables loading large models in 8-bit precision, significantly reducing GPU memory requirements without major accuracy degradation. The post covers the technical mechanics of mixed-precision decomposition and how practitioners can use the integration in practice.
Overview of Natively Supported Quantization Schemes in 🤗 Transformers
This Hugging Face blog post surveys the quantization methods natively integrated into the Transformers library as of September 2023, covering schemes such as GPTQ, bitsandbytes (LLM.int8, NF4), and related techniques. It explains how each method works, their trade-offs in terms of memory reduction and inference speed, and how practitioners can apply them via the Transformers API. The post serves as a practical reference for deploying large language models under memory constraints.