Zhihui Xie
University of Hong Kong, Shanghai Jiao Tong University
Verified email at connect.hku.hk
Title
Cited by
Year
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
L Li*, Z Xie*, M Li, S Chen, P Wang, L Chen, Y Yang, B Wang, L Kong, ...
EMNLP 2024, 2024
Cited by: 68*; Year: 2024
Comparison-based Conversational Recommender System with Relative Bandit Feedback
Z Xie, T Yu, C Zhao, S Li
SIGIR 2021, 1400-1409, 2021
Cited by: 46; Year: 2021
Pretraining in Deep Reinforcement Learning: A Survey
Z Xie, Z Lin, J Li, S Li, D Ye
arXiv preprint arXiv:2211.03959, 2022
Cited by: 28; Year: 2022
Knowledge-aware Conversational Preference Elicitation with Bandit Feedback
C Zhao, T Yu, Z Xie, S Li
WWW 2022, 483-492, 2022
Cited by: 27; Year: 2022
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
A Ormazabal, C Zheng, CM d'Autume, D Yogatama, D Fu, D Ong, E Chen, ...
arXiv preprint arXiv:2404.12387, 2024
Cited by: 22*; Year: 2024
Future-conditioned unsupervised pretraining for decision transformer
Z Xie, Z Lin, D Ye, Q Fu, Y Wei, S Li
ICML 2023, 38187-38203, 2023
Cited by: 22; Year: 2023
Dynamics-Aware Adaptation for Reinforcement Learning Based Cross-Domain Interactive Recommendation
J Wu*, Z Xie*, T Yu, H Zhao, R Zhang, S Li
SIGIR 2022, 290-300, 2022
Cited by: 18; Year: 2022
Discovering Low-rank Subspaces for Language-agnostic Multilingual Representations
Z Xie, H Zhao, T Yu, S Li
EMNLP 2022, 5617–5633, 2022
Cited by: 9; Year: 2022
Sim-to-Real Interactive Recommendation via Off-Dynamics Reinforcement Learning
J Wu, Z Xie, T Yu, Q Li, S Li
2nd Offline Reinforcement Learning Workshop at NeurIPS, 2021
Cited by: 5; Year: 2021
Layered Neighborhood Expansion for Incremental Multiple Graph Matching
Z Chen, Z Xie, J Yan, Y Zheng, X Yang
ECCV 2020, 251-267, 2020
Cited by: 5; Year: 2020
Jailbreaking as a Reward Misspecification Problem
Z Xie, J Gao, L Li, Z Li, Q Liu, L Kong
ICLR 2025, 2024
Cited by: 3; Year: 2024
Calibrating Reasoning in Language Models with Internal Consistency
Z Xie, J Guo, T Yu, S Li
NeurIPS 2024, 2024
Cited by: 3; Year: 2024
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
L Li*, Y Wei*, Z Xie*, X Yang*, Y Song, P Wang, C An, T Liu, S Li, BY Lin, ...
arXiv preprint arXiv:2411.17451, 2024
Cited by: 1; Year: 2024
Toward joint utilization of absolute and relative bandit feedback for conversational recommendation
Y Xia, Z Xie, T Yu, C Zhao, S Li
User Modeling and User-Adapted Interaction, 1-38, 2024
Cited by: 1; Year: 2024
Learning Versatile Skills with Curriculum Masking
Y Tang*, Z Xie*, Z Lin, D Ye, S Li
NeurIPS 2024, 2024
Year: 2024