I’m a Ph.D. candidate at Princeton University, advised by Prof. Prateek Mittal. I am also very fortunate to work closely with Prof. Ruoxi Jia at Virginia Tech. Before moving to Princeton, I received my master’s degree from Harvard in 2021, where I worked with Prof. Salil Vadhan. Before that, I received my bachelor’s degree in Computer Science and Statistics from the University of Waterloo, where I worked closely with Prof. Florian Kerschbaum.
I am interested in developing theoretical foundations and practical tools for trustworthy machine learning from a data-centric perspective. My current work centers on three main areas: data attribution, data curation, and data privacy. Most recently, I have been developing scalable, theoretically grounded data attribution and curation techniques for foundation models. I use tools from statistics and game theory to analyze the intricate connections between training data and model behaviors.
I am supported by Princeton’s Yan Huo *94 Graduate Fellowship and Gordon Y. S. Wu Fellowship. I was selected as a Rising Star in Data Science in 2024.
I gave a tutorial at NeurIPS 2024 with Ruoxi Jia and Ludwig Schmidt on Advancing Data Selection for Foundation Models: From Heuristics to Principled Methods. The slides are available here. I co-lead the organization of the ICLR 2025 workshop on Data Problems for Foundation Models (DATA-FM).
[02/2025] Two papers on data attribution for foundation models, In-Run Data Shapley and Data Value Embedding, have both been selected for oral presentation (top ~1.5% of submissions) at ICLR 2025. See you in Singapore!
tianhaowang[at]princeton.edu
Engineering Quad B307, Princeton, NJ
Princeton University, Sept 2021 - Present
Harvard University, Aug 2019 - May 2021
MEng in Computational Science and Engineering
University of Waterloo, Sept 2016 - May 2019
B.S. in Computer Science and Statistics