Almanac
technique

GPTQ

technique·active·gptq-bc8cdf24·2 events·first seen 28d ago

Aliases: GPTQ

Recent events (2)

6·Hugging Face Blog·28d ago

Making LLMs lighter with AutoGPTQ and transformers

Hugging Face announces native integration of AutoGPTQ into the transformers library, enabling 4-bit quantized inference for large language models. Users can load and run GPTQ-quantized models directly through the standard transformers API with minimal code changes. Because quantized weights need far less VRAM, the integration lowers the hardware barrier for deploying LLMs while maintaining competitive performance.
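To put the VRAM claim in rough numbers, here is a minimal back-of-the-envelope sketch. It counts only the memory taken by the weights themselves and ignores quantization metadata (scales, zero points), activations, and the KV cache, so real footprints will be somewhat higher. The 7-billion-parameter size and the helper name `weight_memory_gib` are illustrative assumptions, not figures or functions from the blog post.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 7-billion-parameter model (illustrative size, not from the post).
params = 7_000_000_000

fp16 = weight_memory_gib(params, 16)  # 16-bit baseline
int4 = weight_memory_gib(params, 4)   # 4-bit quantized weights
ratio = fp16 / int4

print(f"16-bit: {fp16:.1f} GiB, 4-bit: {int4:.1f} GiB, ratio: {ratio:.0f}x")
```

Under these assumptions the weights shrink by a factor of four; the saving observed in practice depends on the group size and other per-model overheads.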

5·Hugging Face Blog·28d ago

Overview of Natively Supported Quantization Schemes in 🤗 Transformers

This Hugging Face blog post surveys the quantization methods natively integrated into the Transformers library as of September 2023, covering GPTQ, bitsandbytes (LLM.int8(), NF4), and related techniques. It explains how each method works, the trade-offs of each in memory reduction and inference speed, and how practitioners can apply them via the Transformers API. The post serves as a practical reference for deploying large language models under memory constraints.
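As a rough illustration of what low-bit weight quantization means, the sketch below applies plain round-to-nearest 4-bit quantization to one small group of weights, storing integer codes plus a scale and an offset. This is deliberately the simplest baseline: it is not GPTQ (which additionally compensates for rounding error using second-order information) and not NF4 (which uses non-uniform levels). The function names and sample weights are hypothetical.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of floats.

    Returns integer codes in [0, 2**bits - 1] plus the scale and offset
    needed to map them back to approximate float values.
    """
    levels = 2**bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


# Hypothetical group of eight weights, for illustration only.
group = [-0.42, -0.10, 0.03, 0.27, 0.55, -0.31, 0.12, 0.48]

codes, scale, lo = quantize_group(group)
restored = dequantize_group(codes, scale, lo)

# Round-to-nearest keeps each weight within half a quantization step.
max_err = max(abs(a - b) for a, b in zip(group, restored))
print(codes, f"max error {max_err:.4f}, half step {scale / 2:.4f}")
```

Each weight is stored as a 4-bit code, and the per-group scale and offset are the metadata that make real quantized checkpoints slightly larger than the raw bit count suggests.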