TinyChat 2.0: Accelerating Edge AI with Efficient LLM and VLM Deployment
Explore the latest advancements in TinyChat 2.0, which significantly improves the prefilling speed of edge LLMs and VLMs.
PhD Student @ Princeton CS
Efficient AI | Systems for ML | Algorithm-System Co-design
I am a passionate PhD student at Princeton CS, advised by Professor Kai Li and Professor Ravi Netravali. Before that, I obtained my B.E. degree from the Department of Electronic Engineering at Tsinghua University. My research focuses on efficient AI, spanning from algorithms to systems.
Previously, I had the privilege of working under the supervision of Professor Yu Wang and Xuefei Ning at Tsinghua on quantization algorithms.
I interned at the MIT HAN Lab, supervised by Professor Song Han, Shang Yang, and Haotian Tang, exploring efficient system design for quantized LLMs. I have also collaborated closely with Jason Lu from Nvidia Research.
Here I share some results and insights on managing long context in large language models through test-time training.
Co-led the development of TinyChat 2.0: Accelerating Edge AI with Efficient LLM and VLM Deployment.
Outside of research, I enjoy a variety of sports (badminton, basketball) and am passionate about movies and TV series.
Feel free to reach out via email.