| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| P-Tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks | X Liu, K Ji, Y Fu, WL Tam, Z Du, Z Yang, J Tang | arXiv preprint arXiv:2110.07602 | 1514 | 2021 |
| Self-play fine-tuning converts weak language models to strong language models | Z Chen, Y Deng, H Yuan, K Ji, Q Gu | arXiv preprint arXiv:2401.01335 | 455 | 2024 |
| Self-play preference optimization for language model alignment | Y Wu, Z Sun, H Yuan, K Ji, Y Yang, Q Gu | arXiv preprint arXiv:2405.00675 | 76 | 2024 |
| Parameter-efficient prompt tuning makes generalized and calibrated neural text retrievers | WL Tam, X Liu, K Ji, L Xue, X Zhang, Y Dong, J Liu, M Hu, J Tang | arXiv preprint arXiv:2207.07087 | 35 | 2022 |
| Reinforcement learning from human feedback with active queries | K Ji, J He, Q Gu | arXiv preprint arXiv:2402.09401 | 19 | 2024 |
| Self-play fine-tuning of diffusion models for text-to-image generation | H Yuan, Z Chen, K Ji, Q Gu | Advances in Neural Information Processing Systems 37, 73366-73398 | 10 | 2025 |
| Enhancing multi-step reasoning abilities of language models through direct Q-function optimization | G Liu, K Ji, R Zheng, Z Wu, C Dun, Q Gu, L Yan | arXiv preprint arXiv:2410.09302 | 5 | 2024 |
| Mastering the task of open information extraction with large language models and consistent reasoning environment | J Qi, K Ji, X Wang, J Yu, K Zeng, L Hou, J Li, B Xu | arXiv preprint arXiv:2310.10590 | 5 | 2023 |
| Horizon-free reinforcement learning in adversarial linear mixture MDPs | K Ji, Q Zhao, J He, W Zhang, Q Gu | arXiv preprint arXiv:2305.08359 | 4 | 2023 |
| Nearly optimal sample complexity of offline KL-regularized contextual bandits under single-policy concentrability | Q Zhao, K Ji, H Zhao, T Zhang, Q Gu | arXiv preprint arXiv:2502.06051 | | 2025 |
| VidCoM: Fast video comprehension through large language models with multimodal tools | J Qi, K Ji, J Yu, D Wang, B Xu, L Hou, J Li | arXiv preprint arXiv:2310.10586 | | 2023 |