Jaeseung Heo
jsheo12304@postech.ac.kr
Hi, I’m Jaeseung Heo, a Ph.D. student at the POSTECH ML Lab under the supervision of Prof. Dongwoo Kim. My research aims to identify which training data causes specific model behaviors, with a longer-term goal of extending this to problematic behaviors relevant to safety and alignment. I use training data attribution (TDA), particularly influence functions, to characterize how training examples drive model behavior, and I translate these signals into data-centric interventions such as augmentation, label smoothing, and selection. Methodologically, I develop influence functions that capture dependence between training examples, both explicit (as in graph neural networks) and implicit (arising from joint loss minimization). Going forward, I aim to connect TDA with mechanistic interpretability, and ultimately to trace behaviors such as subliminal learning back to their origins in training data.
News
| May, 2026 | |
|---|---|
| Nov, 2025 | |
| Sep, 2025 | |
| Jun, 2025 | |
| May, 2024 | |