About Me
I am a PhD candidate in Computer Science at Rice University, advised by Prof. Anshumali Shrivastava. I work on making Large Language Models (LLMs) and foundation models more efficient, accurate, and accessible. My research has been published at top-tier conferences such as NeurIPS, ICML, ICLR, and EMNLP. My open-source contributions have been adopted by a growing community of users.
I pioneered a lossless LLM compression technique that reduces model size by 30% while preserving bit-for-bit identical outputs and enabling efficient GPU inference. This work reached #1 on Hacker News, and my compressed models on Hugging Face receive thousands of monthly downloads.
Before Rice, I earned my undergraduate degree with distinction in Computer Science from the University of Waterloo.
Research Interests
- Lossless and Lossy Model Compression
- Inference Optimizations
- Accurate and Efficient Fine-tuning
- GPU Kernel Design and Optimization
- Quantization
Selected Publications
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
Preprint
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid
ICLR 2025
KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization
NeurIPS 2024
NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention
EMNLP 2024
Education
Ph.D. in Computer Science (2021 - Expected 2025)
Rice University
Advisor: Prof. Anshumali Shrivastava
B.S. in Computer Science (2016 - 2021)
University of Waterloo