About
I am currently a researcher at the Language Model Lab, Huawei Hong Kong Research Center. I obtained my Ph.D. degree from The Chinese University of Hong Kong, supervised by Prof. Michael R. Lyu and Prof. Irwin King, and my B.Eng. degree from the Yingcai Honors College of the University of Electronic Science and Technology of China.
Our team works on large language models, with topics spanning pre-training, post-training, and agentic AI (e.g., deep research and coding agents). I am also an experienced researcher in LLM efficiency, e.g., the compression and acceleration of LLMs.
News
- 2026-1 🔥 We present SWE-Lego, a state-of-the-art supervised fine-tuning method for software issue resolution. All code, data, and models are now open-sourced. Project website.
- 2025-11 We will present the tutorial "Efficient Inference for Large Language Models – Algorithm, Model, and System" at EMNLP 2025. Tutorial website.
- 2025-11 I will give a talk on "Quantization and Pruning of Large Language Models: Challenges, Techniques and Opportunities" at LMG 2025.
- 2025-9 Our paper "A Simple Linear Patch Revives Layer-Pruned Large Language Models" has been accepted to NeurIPS 2025.
Selected Publications
*: Equal contribution; #: Corresponding author; +: Project lead
Invited Talks
- "Quantization and Pruning of Large Language Models: Challenges, Techniques and Opportunities" at SLAI, 2025. [Slide]
- "Efficient Inference for Large Language Models – Algorithm, Model, and System" at EMNLP Tutorial, 2025. [Tutorial website]
- "Quantization and Pruning of Large Language Models: Challenges, Techniques and Opportunities" at LMG, 2025.
Projects
PocketFlow automatically searches for optimal model compression strategies, such as network pruning, quantization, and knowledge distillation, with little human effort, and also supports TFLite deployment on Android devices. It has collected 2,600+ stars and 480+ forks.