Beyond reverse KL: Generalizing direct preference optimization with diverse divergence constraints | C Wang, Y Jiang, C Yang, H Liu, Y Chen | International Conference on Learning Representations (ICLR), Spotlight, 2023 | 59 | 2023 |
Invariant and transportable representations for anti-causal domain shifts | Y Jiang, V Veitch | Advances in Neural Information Processing Systems (NeurIPS) 35, 20782-20794, 2022 | 34 | 2022 |
Learning nonparametric latent causal graphs with unknown interventions | Y Jiang, B Aragam | Advances in Neural Information Processing Systems (NeurIPS) 36, 2023 | 28 | 2023 |
The Geometry of Categorical and Hierarchical Concepts in Large Language Models | K Park, YJ Choe, Y Jiang, V Veitch | International Conference on Learning Representations (ICLR), 2024 | 16 | 2024 |
On the origins of linear representations in large language models | Y Jiang, G Rajendran, P Ravikumar, B Aragam, V Veitch | International Conference on Machine Learning (ICML), 2024 | 15 | 2024 |
Associative memory in iterated overparameterized sigmoid autoencoders | Y Jiang, C Pehlevan | International Conference on Machine Learning (ICML), 4828-4838, 2020 | 15 | 2020 |
Meta-learning to cluster | Y Jiang, N Verma | arXiv preprint arXiv:1910.14134, 2019 | 13 | 2019 |
Uncovering meanings of embeddings via partial orthogonality | Y Jiang, B Aragam, V Veitch | Advances in Neural Information Processing Systems (NeurIPS) 36, 2023 | 9 | 2023 |
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers | Y Jiang, G Rajendran, P Ravikumar, B Aragam | Advances in Neural Information Processing Systems (NeurIPS) 37, 2024 | 5 | 2024 |
Direct Acquisition Optimization for Low-Budget Active Learning | Z Zhao, Y Jiang, Y Chen | arXiv preprint arXiv:2402.06045, 2024 | 4 | 2024 |
Model-agnostic meta-learning using Runge-Kutta methods | DJ Im, Y Jiang, N Verma | arXiv preprint arXiv:1910.07368, 2019 | 4 | 2019 |
Quantifying generalization complexity for large language models | Z Qi, H Luo, X Huang, Z Zhao, Y Jiang, X Fan, H Lakkaraju, J Glass | International Conference on Learning Representations (ICLR), 2024 | 3 | 2024 |
Beyond Reward Hacking: Causal Rewards for Large Language Model Alignment | C Wang, Z Zhao, Y Jiang, Z Chen, C Zhu, Y Chen, J Liu, L Zhang, X Fan, ... | arXiv preprint arXiv:2501.09620, 2025 | | 2025 |