Jiachen (Tianhao) Wang

Ph.D. Student

Princeton University

About Me

I’m a Ph.D. candidate at Princeton University, advised by Prof. Prateek Mittal. I am also very fortunate to work closely with Prof. Ruoxi Jia at Virginia Tech. Before moving to Princeton, I received my master’s degree from Harvard in 2021, where I worked with Prof. Salil Vadhan. Before that, I received my bachelor’s degree in Computer Science and Statistics from the University of Waterloo, where I worked closely with Prof. Florian Kerschbaum.

I am interested in developing theoretical foundations and practical tools for trustworthy machine learning from a data-centric perspective. My current work centers on three main areas: data attribution, data curation, and data privacy. Most recently, I have been developing scalable, theoretically grounded data attribution and curation techniques for foundation models. I use tools from statistics and game theory to analyze the intricate connections between training data and model behaviors.

I am supported by Princeton’s Yan Huo *94 Graduate Fellowship and Gordon Y. S. Wu Fellowship. I was selected as a Rising Star in Data Science in 2024.

I gave a tutorial at NeurIPS 2024 with Ruoxi Jia and Ludwig Schmidt, titled “Advancing Data Selection for Foundation Models: From Heuristics to Principled Methods.” The slides are available here. I co-lead the organization of the ICLR 2025 Workshop on Data Problems for Foundation Models (DATA-FM).

News

[02/2025] Two papers on data attribution for foundation models, In-Run Data Shapley and Data Value Embedding, have both been selected for oral presentation (top ~1.5% of submissions) at ICLR 2025. See you in Singapore!

tianhaowang[at]princeton.edu

Engineering Quad B307, Princeton, NJ

Interests

  • Data Attribution
  • Data Curation
  • Data Privacy

Education

  • Princeton University, Sept 2021 - Present

  • Harvard University, Aug 2019 - May 2021

    MEng in Computational Science and Engineering

  • University of Waterloo, Sept 2016 - May 2019

    B.S. in Computer Science and Statistics

Selected Publications

Capturing the Temporal Dependence of Training Data Influence

Data Shapley in One Training Run

GREATS: Online Selection of High-Quality Data for LLM Training in Every Iteration

Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

Efficient Data Valuation for Weighted Nearest Neighbor Algorithms

DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer

Privacy-Preserving In-Context Learning for Large Language Models

Threshold KNN-Shapley: A Linear-Time and Privacy-Friendly Approach to Data Valuation

A Randomized Approach for Tight Privacy Accounting

Data Banzhaf: A Robust Data Valuation Framework for Machine Learning

LAVA: Data Valuation without Pre-Specified Learning Algorithms

Renyi Differential Privacy of Propose-Test-Release and Applications to Private and Robust Machine Learning

Improving Cooperative Game Theory-based Data Valuation via Data Utility Learning

Concurrent Composition of Differential Privacy

DPlis: Boosting Utility of Differentially Private Deep Learning via Randomized Smoothing

RIGA: Covert and Robust White-Box Watermarking of Deep Neural Networks

Improving Robustness to Model Inversion Attacks via Mutual Information Regularization

A Principled Approach to Data Valuation for Federated Learning