Tianhao Wu

Cited by

	All	Since 2020
Citations	509	508
h-index	9	9
i10-index	9	9

[Chart: citations per year, 2021 to 2025]

Public access

3 articles available, 0 articles not available (based on funding mandates)

Co-authors

Jiantao Jiao: Assistant Professor of EECS and Statistics, University of California, Berkeley (verified email at berkeley.edu)
Banghua Zhu: University of California, Berkeley (verified email at berkeley.edu)
Evan Frick: UC Berkeley (verified email at berkeley.edu)
Liwei Wang: Professor, Peking University (verified email at cis.pku.edu.cn)
Hanlin Zhu: Ph.D. student, University of California, Berkeley (verified email at berkeley.edu)
Simon Shaolei Du: Assistant Professor, School of Computer Science and Engineering, University of Washington (verified email at cs.washington.edu)
Sainbayar Sukhbaatar: FAIR team, Meta AI (verified email at fb.com)
Jason Weston: Meta (verified email at fb.com)
Ruoyu Zhang: Peking University (verified email at pku.edu.cn)
Kannan Ramchandran: Professor of Electrical Engineering and Computer Science, UC Berkeley (verified email at eecs.berkeley.edu)
Han Zhong: Peking University (verified email at stu.pku.edu.cn)
Zhaojin Wen: University of California, Berkeley (verified email at berkeley.edu)


University of California, Berkeley

Verified email at berkeley.edu

Research interests: reinforcement learning, alignment, foundation models


Title	Cited by	Year
Starling-7B: Improving helpfulness and harmlessness with RLAIF. B Zhu, E Frick, T Wu, H Zhu, K Ganesan, WL Chiang, J Zhang, J Jiao. First Conference on Language Modeling, 2024	113*	2024
Sanity-checking pruning methods: Random tickets can win the jackpot. J Su, Y Chen, T Cai, T Wu, R Gao, L Wang, JD Lee. Advances in Neural Information Processing Systems 33, 20390-20401, 2020	96	2020
From crowdsourced data to high-quality benchmarks: Arena-Hard and BenchBuilder pipeline. T Li, WL Chiang, E Frick, L Dunlap, T Wu, B Zhu, JE Gonzalez, I Stoica. arXiv preprint arXiv:2406.11939, 2024	90	2024
Pairwise proximal policy optimization: Language model alignment with comparative RL. T Wu, B Zhu, R Zhang, Z Wen, K Ramchandran, J Jiao. First Conference on Language Modeling, 2024	43*	2024
Meta-rewarding language models: Self-improving alignment with LLM-as-a-meta-judge. T Wu, W Yuan, O Golovneva, J Xu, Y Tian, J Jiao, J Weston, S Sukhbaatar. arXiv preprint arXiv:2407.19594, 2024	42	2024
RouteLLM: Learning to Route LLMs from Preference Data. I Ong, A Almahairi, V Wu, WL Chiang, T Wu, JE Gonzalez, MW Kadous, ... The Thirteenth International Conference on Learning Representations, 2024	37	2024
From generation to judgment: Opportunities and challenges of LLM-as-a-judge. D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan, A Bhattacharjee, Y Jiang, ... arXiv preprint arXiv:2411.16594, 2024	25	2024
On reinforcement learning with adversarial corruption and its application to block MDP. T Wu, Y Yang, S Du, L Wang. International Conference on Machine Learning, 11296-11306, 2021	19	2021
Nearly optimal policy optimization with stable at any time guarantee. T Wu, Y Yang, H Zhong, L Wang, S Du, J Jiao. International Conference on Machine Learning, 24243-24265, 2022	15	2022
A reduction-based framework for conservative bandits and reinforcement learning. Y Yang, T Wu, H Zhong, E Garcelon, M Pirotta, A Lazaric, L Wang, SS Du. arXiv preprint arXiv:2106.11692, 2021	9*	2021
Thinking LLMs: General instruction following with thought generation. T Wu, J Lan, W Yuan, J Jiao, J Weston, S Sukhbaatar. arXiv preprint arXiv:2410.10630, 2024	8	2024
A reduction-based framework for sequential decision making with delayed feedback. Y Yang, H Zhong, T Wu, B Liu, L Wang, SS Du. Advances in Neural Information Processing Systems 36, 46362-46389, 2023	6	2023
Statistical inference on multi-armed bandits with delayed feedback. L Shi, J Wang, T Wu. International Conference on Machine Learning, 31328-31352, 2023	5	2023
DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure. Y Xiong, R Zhang, Y Li, T Wu, L Zou. arXiv preprint arXiv:2410.11744, 2024	1	2024
RIP: Better Models by Survival of the Fittest Prompts. P Yu, W Yuan, O Golovneva, T Wu, S Sukhbaatar, J Weston, J Xu. arXiv preprint arXiv:2501.18578, 2025		2025
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback. YT Lin, D Jin, T Xu, T Wu, S Sukhbaatar, C Zhu, Y He, YN Chen, J Weston, ... arXiv preprint arXiv:2501.10799, 2025		2025
EmbedLLM: Learning Compact Representations of Large Language Models. R Zhuang, T Wu, Z Wen, A Li, J Jiao, K Ramchandran. arXiv preprint arXiv:2410.02223, 2024		2024
Bench-O-Matic: Automating Benchmark Curation from Crowdsourced Data. T Li, WL Chiang, E Frick, L Dunlap, T Wu, B Zhu, JE Gonzalez, I Stoica


Articles 1–18


* Merged citations
