Shengyi Huang

Cited by

	All	Since 2020
Citations	1894	1894
h-index	14	14
i10-index	15	15

Citations per year: 2021: 22, 2022: 98, 2023: 394, 2024: 1185, 2025: 190

Co-authors

Santiago Ontañón: Research Scientist, Google DeepMind (verified email at google.com)
Nathan Lambert: Research Scientist, Allen AI (verified email at allenai.org)
Edward Beeching: Research Scientist, Hugging Face (verified email at insa-lyon.fr)
Lewis Tunstall: Hugging Face (verified email at itp.unibe.ch)
Rousslan Fernand Julien Dossa: Kobe University (verified email at ai.cs.kobe-u.ac.jp)
Thomas Wolf: Co-founder at HuggingFace (verified email at polytechnique.edu)
Chang Ye: Google (verified email at google.com)
Anitha Kannan: Curai (verified email at curai.com)
Xavier (Xavi) Amatriain: VP of Product, ACE (AI and Compute Enablement). Google (verified email at amatriain.net)
Ilya Valmianski: Research scientist at Curai (verified email at curai.com)
David Grethlein: Computer Science PhD Candidate, Drexel University (verified email at drexel.edu)
Chris Bamford: Mistral AI
Namit Katariya: Tech Lead Manager, ML Platform at Faire

Shengyi Huang

Allen Institute for Artificial Intelligence

Verified email at allenai.org

Artificial Intelligence, Reinforcement Learning


Title	Cited by	Year
Zephyr: Direct distillation of LM alignment. L Tunstall, E Beeching, N Lambert, N Rajani, K Rasul, Y Belkada, ... arXiv preprint arXiv:2310.16944, 2023	505	2023
A closer look at invalid action masking in policy gradient algorithms. S Huang, S Ontañón. The International FLAIRS Conference 2022 35, 2022	420	2022
CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms. S Huang, RFJ Dossa, C Ye, J Braga, D Chakraborty, K Mehta, ... Journal of Machine Learning Research 23 (274), 1-18, 2022	323	2022
TRL: Transformer reinforcement learning. L von Werra, Y Belkada, L Tunstall, E Beeching, T Thrush, N Lambert, ...	200	2020
The 37 Implementation Details of Proximal Policy Optimization. S Huang, RFJ Dossa, A Raffin, A Kanervisto, W Wang. International Conference on Learning Representations Blog Track, 2022	130	2022
EnvPool: A highly parallel reinforcement learning environment execution engine. J Weng, M Lin, S Huang, B Liu, D Makoviichuk, V Makoviychuk, Z Liu, ... Advances in Neural Information Processing Systems 35, 22409-22421, 2022	55	2022
The alignment handbook. L Tunstall, E Beeching, N Lambert, N Rajani, S Huang, K Rasul, AM Rush, ... URL https://github.com/huggingface/alignment-handbook 6, 2023	52	2023
Gym-μRTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning. S Huang, S Ontañón, C Bamford, L Grela. Proceedings of the 3rd IEEE Conference on Games, 2021	50	2021
A2C is a special case of PPO. S Huang, A Kanervisto, A Raffin, W Wang, S Ontañón, RFJ Dossa. arXiv preprint arXiv:2205.09123, 2022	28	2022
Zephyr: Direct distillation of LM alignment, 2023. L Tunstall, E Beeching, N Lambert, N Rajani, K Rasul, Y Belkada, ... URL https://arxiv.org/abs/2310.16944 6, 2023	19	2023
The N+ implementation details of RLHF with PPO: A case study on TL;DR summarization. S Huang, M Noukhovitch, A Hosseini, K Rasul, W Wang, L Tunstall. arXiv preprint arXiv:2403.17031, 2024	18	2024
Tülu 3: Pushing Frontiers in Open Language Model Post-Training. N Lambert, J Morrison, V Pyatkin, S Huang, H Ivison, F Brahman, ... arXiv preprint arXiv:2411.15124, 2024	17	2024
NuminaMath: The largest public dataset in AI4Maths with 860k pairs of competition math problems and solutions. J Li, E Beeching, L Tunstall, B Lipkin, R Soletskyi, S Huang, K Rasul, L Yu, ... Hugging Face repository 13, 9, 2024	17	2024
An empirical investigation of early stopping optimizations in proximal policy optimization. RFJ Dossa, S Huang, S Ontañón, T Matsubara. IEEE Access 9, 117981-117992, 2021	16	2021
Action guidance: Getting the best of sparse rewards and shaped rewards for real-time strategy games. S Huang, S Ontañón. AIIDE-20 Workshop on Artificial Intelligence for Strategy Games, 2020	13	2020
Open RL Benchmark: Comprehensive tracked experiments for reinforcement learning. S Huang, Q Gallouédec, F Felten, A Raffin, RFJ Dossa, Y Zhao, ... arXiv preprint arXiv:2402.03046, 2024	9	2024
MEDCOD: A medically-accurate, emotive, diverse, and controllable dialog system. R Compton, I Valmianski, L Deng, C Huang, N Katariya, X Amatriain, ... Machine Learning for Health, 110-129, 2021	7	2021
Comparing Observation and Action Representations for Deep Reinforcement Learning in RTS. S Huang, S Ontañón. AIIDE-19 Workshop on Artificial Intelligence for Strategy Games, 2019	7*	2019
2 OLMo 2 Furious. T OLMo, P Walsh, L Soldaini, D Groeneveld, K Lo, S Arora, A Bhagia, ... arXiv preprint arXiv:2501.00656, 2024	3	2024
Reward scale robustness for proximal policy optimization via DreamerV3 tricks. R Sullivan, A Kumar, S Huang, J Dickerson, J Suarez. Advances in Neural Information Processing Systems 36, 1352-1362, 2023	3	2023
