Hanyang Zhao
Title
Cited by
Year
Score-based Diffusion Models via Stochastic Differential Equations--a Technical Tutorial
W Tang, H Zhao
arXiv preprint arXiv:2402.07487, 2024
Cited by 28; year 2024
Policy optimization for continuous reinforcement learning
H Zhao, W Tang, D Yao
Advances in Neural Information Processing Systems 36, 2023
Cited by 21; year 2023
Contractive diffusion probabilistic models
W Tang, H Zhao
arXiv preprint arXiv:2401.13115, 2024
Cited by 17; year 2024
Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey
GI Winata, H Zhao, A Das, W Tang, D Yao, SX Zhang, S Sahu
Journal of Artificial Intelligence Research, 2024
Cited by 7; year 2024
Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
H Zhao, H Chen, J Zhang, DD Yao, W Tang
arXiv preprint arXiv:2502.01819, 2025
Cited by 4*; year 2025
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
H Chen, H Zhao, H Lam, D Yao, W Tang
ICLR 2025, 2024
Cited by 4; year 2024
Worldcuisines: A massive-scale benchmark for multilingual and multicultural visual question answering on global cuisines
GI Winata, F Hudi, PA Irawan, D Anugraha, RA Putri, Y Wang, A Nohejl, ...
NAACL 2025, 2024
Cited by 3; year 2024
RainbowPO: A unified framework for combining improvements in preference optimization
H Zhao, GI Winata, A Das, SX Zhang, DD Yao, W Tang, S Sahu
ICLR 2025, 2024
Cited by 2; year 2024
Articles 1–8