Tianyi "Tony" Zhang's Website
Bridging Algorithms and Hardware for Efficient AI
I am an AI researcher specializing in the design of efficient training and inference methods for foundation models. My research focuses on bridging the gap between algorithmic innovation and hardware acceleration, with an emphasis on model compression, quantization, quantization-aware training, and custom CUDA kernels.
Education
I received my Ph.D. in Computer Science from Rice University in August 2025, advised by Prof. Anshumali Shrivastava. Prior to that, I completed my undergraduate studies in Computer Science at the University of Waterloo in Canada, graduating with distinction in 2021.
Research Highlights
My recent work, DFloat11 [NeurIPS ’25], is a pioneering approach for losslessly compressing foundation models, including Large Language Models (LLMs) and Diffusion Transformers, to enable efficient GPU inference. Unlike quantization or pruning methods that degrade model quality, DFloat11 produces outputs that are bit-for-bit identical to those of the original BFloat16 models while reducing model size by 32%, making deployment on lower-tier GPUs practical (a toy sketch of the underlying idea follows the list below). The project has gained significant community attention and adoption:
- Trended at #2 on Hacker News (Discussion)
- 30K+ downloads of the open-source Python package on PyPI
- 200K+ downloads of the open-source models on Hugging Face
- Media coverage by MIT Technology Review (China), AI Era (新智元), and Synced (机器之心)
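For intuition, the sketch below illustrates the general principle behind this kind of lossless compression: in trained networks, the 8-bit exponent field of each BFloat16 weight is highly skewed, so entropy-coding the exponents (plain Huffman coding here) shrinks the weights while remaining exactly invertible. This is a minimal illustration on synthetic weights, not the DFloat11 implementation; the helper `huffman_codes` and the simulated weight distribution are my own, for demonstration only.

```python
import heapq
from collections import Counter

import numpy as np
import torch

def huffman_codes(freqs):
    """Build a prefix-free {symbol: bitstring} code from a frequency table."""
    # Heap entries: (total frequency, tie-breaker, symbols in this subtree).
    heap = [(f, i, (s,)) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    codes = {s: "" for s in freqs}
    tie = len(heap)
    while len(heap) > 1:
        f0, _, left = heapq.heappop(heap)
        f1, _, right = heapq.heappop(heap)
        for s in left:   # symbols in the lighter subtree get a leading 0,
            codes[s] = "0" + codes[s]
        for s in right:  # ... those in the heavier subtree a leading 1
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (f0 + f1, tie, left + right))
        tie += 1
    return codes

# Stand-in for trained weights: BFloat16 values whose magnitudes cluster,
# so the 8-bit exponent field carries far less than 8 bits of entropy.
w = torch.randn(1_000_000).to(torch.bfloat16)
raw = w.view(torch.int16).numpy().view(np.uint16)  # raw 16-bit patterns
exponents = (raw >> 7) & 0xFF                      # bits 7-14: the exponent

freqs = Counter(exponents.tolist())
codes = huffman_codes(freqs)
exponent_bits = sum(freqs[s] * len(codes[s]) for s in freqs)

# Sign (1 bit) and mantissa (7 bits) are stored verbatim; only the
# exponent stream is entropy-coded. Decoding the prefix-free stream
# recovers every original 16-bit pattern exactly, hence "lossless".
original_bits = w.numel() * 16
compressed_bits = w.numel() * 8 + exponent_bits
print(f"compressed size: {100 * compressed_bits / original_bits:.1f}% of BFloat16")
```

Even on this synthetic distribution, the ratio lands in the same ballpark as the ~32% reduction cited above; DFloat11 itself pairs the compressed format with custom GPU kernels that decompress on the fly during inference.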
Entrepreneurship
I co-founded the AI startup xMAD.ai in 2024, where I led research and product development. Our mission of building customized, cost-effective AI agents was supported by funding from Non-Sibi Ventures, AISprouts, and the Hopper-Dean Foundation.
In August 2025, xMAD.ai was acquired by Workato to launch its first AI research lab. Read the official press release here.
Industry Experience
I have held research internships at Visa Research, Amazon, and Intel. At Visa Research, I was a key contributor to TransactionGPT, developing a novel compression algorithm for transaction foundation models that reduced communication bottlenecks and significantly accelerated training on H100 clusters (paper).
Academic Service
- Top Reviewer Award, NeurIPS 2025
- Reviewer @ NeurIPS, ICML, ICLR, ACL, EMNLP, KDD, WWW