2024

  1. AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
     Ji Lin*, Jiaming Tang*, Haotian Tang, Shang Yang, Xingyu Dang, and Song Han
     MLSys 2024 / Abstract / Code

2023

  1. OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization
     Cong Guo*, Jiaming Tang*, Weiming Hu, Jingwen Leng, Chen Zhang, Fan Yang, Yunxin Liu, Minyi Guo, and Yuhao Zhu
     ISCA 2023 / Abstract / Code