Jiachen (Tianhao) Wang

Ph.D. Student

Princeton University

About Me

I’m a 4th-year Ph.D. student at Princeton University, advised by Prof. Prateek Mittal. I am also very fortunate to work closely with Prof. Ruoxi Jia at Virginia Tech. Before moving to Princeton, I received my master’s degree from Harvard University, where I worked with Prof. Salil Vadhan. Before that, I received my bachelor’s degree in Computer Science and Statistics from the University of Waterloo, where I worked closely with Prof. Florian Kerschbaum.

I am interested in developing theoretical foundations and practical tools for trustworthy machine learning from a data-centric perspective. My current work centers on three main areas: data attribution, data curation, and data privacy. Most recently, I have been developing scalable, theoretically grounded data attribution and curation techniques for foundation models. I use tools from statistics and game theory to analyze the intricate connections between training data and model behaviors.

I am supported by Princeton’s Yan Huo *94 Graduate Fellowship and Gordon Y. S. Wu Fellowship. I was selected as a Rising Star in Data Science in 2024.

News

[12/2024] I gave a tutorial at NeurIPS 2024 with Ruoxi and Ludwig on Advancing Data Selection for Foundation Models: From Heuristics to Principled Methods. The slides are available here. It was a super rewarding process to prepare for this!
[12/2024] So happy that our workshop Data Problems for Foundation Models (DPFM) has been accepted at ICLR 2025. Stay tuned!

tianhaowang[at]princeton.edu
Engineering Quad B307, Princeton, NJ

Interests

  • Data Attribution
  • Data Curation
  • Data Privacy

Education

  • Princeton University, Sept 2021 - Present

  • Harvard University, Aug 2019 - May 2021

    MEng in Computational Science and Engineering

  • University of Waterloo, Sept 2016 - May 2019

    B.S. in Computer Science and Statistics

Selected Publications

GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration

Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

Efficient Data Valuation for Weighted Nearest Neighbor Algorithms

DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

Privacy-Preserving In-Context Learning for Large Language Models

Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation

A Randomized Approach for Tight Privacy Accounting

Data Banzhaf: A Robust Data Valuation Framework for Machine Learning

LAVA: Data Valuation without Pre-Specified Learning Algorithms

Rényi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning

Improving Cooperative Game Theory-based Data Valuation via Data Utility Learning

Concurrent Composition of Differential Privacy

DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing

RIGA: Covert and Robust White-Box Watermarking of Deep Neural Networks

Improving Robustness to Model Inversion Attacks via Mutual Information Regularization

A Principled Approach to Data Valuation for Federated Learning