The perfect blend: Redefining RLHF with mixture of judges | T Xu, E Helenowski, KA Sankararaman, D Jin, K Peng, E Han, S Nie, ... | arXiv preprint arXiv:2409.20370, 2024 | 7 | 2024 |
Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following | Y He, D Jin, C Wang, C Bi, K Mandyam, H Zhang, C Zhu, N Li, T Xu, H Lv, ... | arXiv preprint arXiv:2410.15553, 2024 | 1 | 2024 |
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization | Z Yu, T Xu, D Jin, KA Sankararaman, Y He, W Zhou, Z Zeng, ... | arXiv preprint arXiv:2501.17974, 2025 | | 2025 |