AI-assisted generation of difficult math questions V Shah, D Yu, K Lyu, S Park, J Yu, Y He, NR Ke, M Mozer, Y Bengio, ... arXiv preprint arXiv:2407.21009, 2024 | 5 | 2024 |
LongProc: Benchmarking Long-Context Language Models on Long Procedural Generation X Ye, F Yin, Y He, J Zhang, H Yen, T Gao, G Durrett, D Chen arXiv preprint arXiv:2501.05414, 2025 | | 2025 |