Title | Authors | Venue | Cited by | Year |
Score-based Diffusion Models via Stochastic Differential Equations--a Technical Tutorial | W Tang, H Zhao | arXiv preprint arXiv:2402.07487, 2024 | 28 | 2024 |
Policy optimization for continuous reinforcement learning | H Zhao, W Tang, D Yao | Advances in Neural Information Processing Systems 36, 2023 | 21 | 2023 |
Contractive diffusion probabilistic models | W Tang, H Zhao | arXiv preprint arXiv:2401.13115, 2024 | 17 | 2024 |
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey | GI Winata, H Zhao, A Das, W Tang, D Yao, SX Zhang, S Sahu | Journal of Artificial Intelligence Research, 2024 | 7 | 2024 |
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning | H Zhao, H Chen, J Zhang, DD Yao, W Tang | arXiv preprint arXiv:2502.01819, 2025 | 4* | 2025 |
MallowsPO: Fine-Tune Your LLM with Preference Dispersions | H Chen, H Zhao, H Lam, D Yao, W Tang | ICLR 2025, 2024 | 4 | 2024 |
Worldcuisines: A massive-scale benchmark for multilingual and multicultural visual question answering on global cuisines | GI Winata, F Hudi, PA Irawan, D Anugraha, RA Putri, Y Wang, A Nohejl, ... | NAACL 2025, 2024 | 3 | 2024 |
RainbowPO: A unified framework for combining improvements in preference optimization | H Zhao, GI Winata, A Das, SX Zhang, DD Yao, W Tang, S Sahu | ICLR 2025, 2024 | 2 | 2024 |