2024

  1. Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
     Jiaming Tang*, Yilong Zhao*, Kan Zhu, Guangxuan Xiao, Baris Kasikci, and Song Han
     ICML 2024 / Abstract / Code
  2. AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
     Ji Lin*, Jiaming Tang*, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han
     MLSys 2024 / Best Paper Award / Abstract / Code

2023

  1. OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization
     Cong Guo*, Jiaming Tang*, Weiming Hu, Jingwen Leng, Chen Zhang, Fan Yang, Yunxin Liu, Minyi Guo, and Yuhao Zhu
     ISCA 2023 / Abstract / Code