2023

  1. AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
     Ji Lin*, Jiaming Tang*, Haotian Tang, Shang Yang, Xingyu Dang, and Song Han
     arXiv / Abstract / Code
  2. OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization
     Cong Guo*, Jiaming Tang*, Weiming Hu, Jingwen Leng, Chen Zhang, Fan Yang, Yunxin Liu, Minyi Guo, and Yuhao Zhu
     ISCA 2023 / Abstract / Code