What and how does in-context learning learn? Bayesian model averaging, parameterization, and generalization Y Zhang, F Zhang, Z Yang, Z Wang arXiv preprint arXiv:2305.19420, 2023 | 66 | 2023 |
Generative adversarial imitation learning with neural network parameterization: Global optimality and convergence rate Y Zhang, Q Cai, Z Yang, Z Wang International conference on machine learning, 11044-11054, 2020 | 35* | 2020 |
Learning from demonstration: Provably efficient adversarial policy imitation with linear function approximation Z Liu, Y Zhang, Z Fu, Z Yang, Z Wang International conference on machine learning, 14094-14138, 2022 | 26* | 2022 |
Provably Efficient Actor-Critic for Risk-Sensitive and Robust Adversarial RL: A Linear-Quadratic Case Y Zhang, Z Yang, Z Wang International Conference on Artificial Intelligence and Statistics, 2764-2772, 2021 | 20 | 2021 |
Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration R Wu, Y Zhang, Z Yang, Z Wang Advances in Neural Information Processing Systems 34, 2021 | 19 | 2021 |
Provably efficient offline reinforcement learning for partially observable Markov decision processes H Guo, Q Cai, Y Zhang, Z Yang, Z Wang International Conference on Machine Learning, 8016-8038, 2022 | 18 | 2022 |
Federated offline reinforcement learning D Zhou, Y Zhang, A Sonabend-W, Z Wang, J Lu, T Cai Journal of the American Statistical Association 119 (548), 3152-3163, 2024 | 14 | 2024 |
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory Y Zhang, Q Cai, Z Yang, Y Chen, Z Wang Advances in Neural Information Processing Systems 33, 19680-19692, 2020 | 13 | 2020 |
An analysis of attention via the lens of exchangeability and latent variable models Y Zhang, B Liu, Q Cai, L Wang, Z Wang arXiv preprint arXiv:2212.14852, 2022 | 10 | 2022 |
Infinite-dimensional optimization for zero-sum games via variational transport L Liu, Y Zhang, Z Yang, R Babanezhad, Z Wang International conference on machine learning, 7033-7044, 2021 | 10* | 2021 |
Can large language models play games? A case study of a self-play approach H Guo, Z Liu, Y Zhang, Z Wang arXiv preprint arXiv:2403.05632, 2024 | 8 | 2024 |
Variational transport: A convergent particle-based algorithm for distributional optimization Z Yang, Y Zhang, Y Chen, Z Wang arXiv preprint arXiv:2012.11554, 2020 | 7 | 2020 |
Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic Y Zhang, S Chen, Z Yang, M Jordan, Z Wang Advances in Neural Information Processing Systems 34, 2021 | 6 | 2021 |
Lobass: Gauging learnability in supervised fine-tuning data H Zhou, T Liu, Q Ma, J Yuan, P Liu, Y You, H Yang arXiv preprint arXiv:2310.13008, 2023 | 5 | 2023 |
Fullstack bench: Evaluating LLMs as full stack coder S Liu, H Zhu, J Liu, S Xin, A Li, R Long, L Chen, J Yang, J Xia, ZY Peng, ... arXiv preprint arXiv:2412.00535, 2024 | 4 | 2024 |
Seed-CTS: Unleashing the power of tree search for superior performance in competitive coding tasks H Wang, B Liu, Y Zhang, J Chen arXiv preprint arXiv:2412.12544, 2024 | 1 | 2024 |
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs S Zhang, Z Liu, B Liu, Y Zhang, Y Yang, Y Liu, L Chen, T Sun, Z Wang arXiv preprint arXiv:2410.08067, 2024 | 1 | 2024 |
BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data X Wang, Q Cui, Y Tao, Y Wang, Z Chai, X Han, B Liu, J Yuan, J Su, ... arXiv preprint arXiv:2410.00773, 2024 | | 2024 |
A Mean-Field Analysis of Neural Gradient Descent-Ascent: Applications to Functional Conditional Moment Equations Y Zhu, Y Zhang, Z Wang, Z Yang, X Chen arXiv preprint arXiv:2404.12312, 2024 | | 2024 |