Tianhao Wu
Verified email at berkeley.edu - Homepage
Title
Cited by
Year
Starling-7b: Improving helpfulness and harmlessness with rlaif
B Zhu, E Frick, T Wu, H Zhu, K Ganesan, WL Chiang, J Zhang, J Jiao
First Conference on Language Modeling, 2024
113* 2024
Sanity-checking pruning methods: Random tickets can win the jackpot
J Su, Y Chen, T Cai, T Wu, R Gao, L Wang, JD Lee
Advances in neural information processing systems 33, 20390-20401, 2020
96 2020
From crowdsourced data to high-quality benchmarks: Arena-hard and benchbuilder pipeline
T Li, WL Chiang, E Frick, L Dunlap, T Wu, B Zhu, JE Gonzalez, I Stoica
arXiv preprint arXiv:2406.11939, 2024
90 2024
Pairwise proximal policy optimization: Language model alignment with comparative RL
T Wu, B Zhu, R Zhang, Z Wen, K Ramchandran, J Jiao
First Conference on Language Modeling, 2024
43* 2024
Meta-rewarding language models: Self-improving alignment with llm-as-a-meta-judge
T Wu, W Yuan, O Golovneva, J Xu, Y Tian, J Jiao, J Weston, S Sukhbaatar
arXiv preprint arXiv:2407.19594, 2024
42 2024
RouteLLM: Learning to Route LLMs from Preference Data
I Ong, A Almahairi, V Wu, WL Chiang, T Wu, JE Gonzalez, MW Kadous, ...
The Thirteenth International Conference on Learning Representations, 2024
37 2024
From generation to judgment: Opportunities and challenges of llm-as-a-judge
D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan, A Bhattacharjee, Y Jiang, ...
arXiv preprint arXiv:2411.16594, 2024
25 2024
On reinforcement learning with adversarial corruption and its application to block mdp
T Wu, Y Yang, S Du, L Wang
International Conference on Machine Learning, 11296-11306, 2021
19 2021
Nearly optimal policy optimization with stable at any time guarantee
T Wu, Y Yang, H Zhong, L Wang, S Du, J Jiao
International Conference on Machine Learning, 24243-24265, 2022
15 2022
A reduction-based framework for conservative bandits and reinforcement learning
Y Yang, T Wu, H Zhong, E Garcelon, M Pirotta, A Lazaric, L Wang, SS Du
arXiv preprint arXiv:2106.11692, 2021
9* 2021
Thinking llms: General instruction following with thought generation
T Wu, J Lan, W Yuan, J Jiao, J Weston, S Sukhbaatar
arXiv preprint arXiv:2410.10630, 2024
8 2024
A reduction-based framework for sequential decision making with delayed feedback
Y Yang, H Zhong, T Wu, B Liu, L Wang, SS Du
Advances in Neural Information Processing Systems 36, 46362-46389, 2023
6 2023
Statistical inference on multi-armed bandits with delayed feedback
L Shi, J Wang, T Wu
International Conference on Machine Learning, 31328-31352, 2023
5 2023
DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure
Y Xiong, R Zhang, Y Li, T Wu, L Zou
arXiv preprint arXiv:2410.11744, 2024
1 2024
RIP: Better Models by Survival of the Fittest Prompts
P Yu, W Yuan, O Golovneva, T Wu, S Sukhbaatar, J Weston, J Xu
arXiv preprint arXiv:2501.18578, 2025
2025
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
YT Lin, D Jin, T Xu, T Wu, S Sukhbaatar, C Zhu, Y He, YN Chen, J Weston, ...
arXiv preprint arXiv:2501.10799, 2025
2025
EmbedLLM: Learning Compact Representations of Large Language Models
R Zhuang, T Wu, Z Wen, A Li, J Jiao, K Ramchandran
arXiv preprint arXiv:2410.02223, 2024
2024
Bench-O-Matic: Automating Benchmark Curation from Crowdsourced Data
T Li, WL Chiang, E Frick, L Dunlap, T Wu, B Zhu, JE Gonzalez, I Stoica