LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

Abstract

In this presentation, I review the paper LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. The discussion covers the quantization techniques the paper builds on, the emergence of outlier features in large language models, and the evaluation results reported in the study.

In addition to summarizing the original work, I correct an erratum related to zero-point quantization and demonstrate how the LLM.int8() workflow integrates in practice with Hugging Face's ecosystem.
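To make the two baseline schemes discussed in the paper concrete, the following is a minimal NumPy sketch of symmetric (absmax) and asymmetric (zero-point) int8 quantization. The function names and the example tensor are illustrative; the formulas are the standard textbook definitions, not code from the paper or from bitsandbytes.

```python
import numpy as np

def absmax_quantize(x):
    """Symmetric (absmax) quantization: scale by 127 / max|x|, round to int8."""
    scale = 127.0 / np.max(np.abs(x))
    x_q = np.round(scale * x).astype(np.int8)
    return x_q, scale

def zeropoint_quantize(x):
    """Asymmetric (zero-point) quantization: map [min(x), max(x)] onto [-128, 127]."""
    scale = 255.0 / (x.max() - x.min())
    zeropoint = np.round(-scale * x.min()) - 128
    x_q = np.clip(np.round(scale * x + zeropoint), -128, 127).astype(np.int8)
    return x_q, scale, zeropoint

# Illustrative input tensor (not from the paper).
x = np.array([-1.2, 0.0, 0.4, 2.1])

x_q, scale = absmax_quantize(x)
x_dequant = x_q / scale                     # dequantize: divide by the scale

x_q2, scale2, zp = zeropoint_quantize(x)
x_dequant2 = (x_q2 - zp) / scale2           # dequantize: subtract zero-point, divide
```

Absmax keeps zero exactly representable but wastes range on asymmetric distributions; zero-point quantization uses the full int8 range at the cost of an extra offset term, which is where the erratum mentioned above arises.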