| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| A near-optimal algorithm for stochastic bilevel optimization via double-momentum | P Khanduri, S Zeng, M Hong, HT Wai, Z Wang, Z Yang | Advances in Neural Information Processing Systems 34, 30271-30283 | 144 | 2021 |
| Maximum-likelihood inverse reinforcement learning with finite-time guarantees | S Zeng, C Li, A Garcia, M Hong | Advances in Neural Information Processing Systems 35, 10122-10135 | 38 | 2022 |
| A stochastic linearized augmented Lagrangian method for decentralized bilevel optimization | S Lu, S Zeng, X Cui, M Squillante, L Horesh, B Kingsbury, J Liu, M Hong | Advances in Neural Information Processing Systems 35, 30638-30650 | 17 | 2022 |
| Learning to Coordinate in Multi-Agent Systems: A Coordinated Actor-Critic Algorithm and Finite-Time Guarantees | S Zeng, T Chen, A Garcia, M Hong | 4th Annual Learning for Dynamics & Control Conference (L4DC 2022) | 15 | 2021 |
| When demonstrations meet generative world models: A maximum likelihood framework for offline inverse reinforcement learning | S Zeng, C Li, A Garcia, M Hong | Advances in Neural Information Processing Systems 36, 65531-65565 | 14 | 2023 |
| On the divergence of decentralized nonconvex optimization | M Hong, S Zeng, J Zhang, H Sun | SIAM Journal on Optimization 32 (4), 2879-2908 | 13 | 2022 |
| A momentum-assisted single-timescale stochastic approximation algorithm for bilevel optimization | P Khanduri, S Zeng, M Hong, HT Wai, Z Wang, Z Yang | arXiv preprint arXiv:2102.07367 | 13 | 2021 |
| Multi-Agent Reinforcement Learning for Adaptive Routing: A Hybrid Method using Eligibility Traces | S Zeng, X Xu, Y Chen | 2020 IEEE 16th International Conference on Control & Automation (ICCA), 1332… | 13 | 2020 |
| Getting more juice out of the SFT data: Reward learning from human demonstration improves SFT for LLM alignment | J Li, S Zeng, HT Wai, C Li, A Garcia, M Hong | Advances in Neural Information Processing Systems 37, 124292-124318 | 12 | 2025 |
| Structural estimation of Markov decision processes in high-dimensional state space with finite-time guarantees | S Zeng, M Hong, A Garcia | Operations Research | 10 | 2024 |
| A Bayesian approach to robust inverse reinforcement learning | R Wei, S Zeng, C Li, A Garcia, AD McDonald, M Hong | Conference on Robot Learning, 2304-2322 | 8* | 2023 |
| Joint Reward and Policy Learning with Demonstrations and Human Feedback Improves Alignment | C Li, S Zeng, Z Liao, J Li, D Kang, A Garcia, M Hong | The Thirteenth International Conference on Learning Representations | 3* | 2025 |
| Network-Level System Performance Prediction Using Deep Neural Networks with Cross-Layer Information | Q Cao, S Zeng, MO Pun, Y Chen | 2020 IEEE International Conference on Communications (ICC), 1-6 | 2 | 2020 |
| Bilevel decentralized multi-agent learning | S Zeng, S Lu, X Cui, MS Squillante, L Horesh, BED Kingsbury, M Hong | US Patent App. 18/217,081 | | 2025 |
| From Demonstrations to Rewards: Alignment Without Explicit Human Preferences | S Zeng, Y Liu, H Rangwala, G Karypis, M Hong, R Fakoor | | | 2025 |
| Understanding Inverse Reinforcement Learning under Overparameterization: Non-Asymptotic Analysis and Global Optimality | R Zhang, S Zeng, C Li, A Garcia, M Hong | The 28th International Conference on Artificial Intelligence and Statistics | | 2025 |
| Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | Z Cen, Y Liu, S Zeng, P Chaudhari, H Rangwala, G Karypis, R Fakoor | arXiv preprint arXiv:2410.14655 | | 2024 |
| LLM Alignment Through Successive Policy Re-weighting (SPR) | X Zhang, S Zeng, J Li, K Lin, M Hong | NeurIPS 2024 Workshop on Fine-Tuning in Modern Machine Learning: Principles… | | 2024 |