About
I am currently a researcher at the Language Model Lab, Huawei Hong Kong Research Center. I obtained my Ph.D. degree from The Chinese University of Hong Kong, supervised by Prof. Michael R. Lyu and Prof. Irwin King, and my B.Eng. degree from the Yingcai Honors College of the University of Electronic Science and Technology of China.
Our team works on large language models, with topics spanning pre-training, post-training, and agentic AI (e.g., deep research and coding agents). I am also an experienced researcher in LLM efficiency, e.g., the compression and acceleration of LLMs.
News
- 2026-1 🔥 We present SWE-Lego, a state-of-the-art supervised fine-tuning method for software issue resolution. All code, data, and models are now open-sourced. Project website.
- 2025-11 We will present the tutorial "Efficient Inference for Large Language Models – Algorithm, Model, and System" at EMNLP 2025. Tutorial website.
- 2025-11 I will give a talk on "Quantization and Pruning of Large Language Models: Challenges, Techniques and Opportunities" at LMG 2025.
- 2025-9 Our paper "A Simple Linear Patch Revives Layer-Pruned Large Language Models" has been accepted to NeurIPS 2025.
Selected Publications
*: Equal contribution; #: Corresponding author; +: Project lead
Invited Talks
- "Quantization and Pruning of Large Language Models: Challenges, Techniques and Opportunities" at SLAI, 2025. [Slide]
- "Efficient Inference for Large Language Models – Algorithm, Model, and System" at EMNLP Tutorial, 2025. [Tutorial website]
- "Quantization and Pruning of Large Language Models: Challenges, Techniques and Opportunities" at LMG, 2025.
Projects
PocketFlow automatically searches for optimal model compression strategies, such as network pruning, quantization, and knowledge distillation, with little human effort, and also supports TFLite deployment on Android devices. It has collected 2,600+ stars and 480+ forks.