| Title, authors, venue | Cited by | Year |
|---|---|---|
| GPT-4 technical report. J Achiam, S Adler, S Agarwal, L Ahmad, I Akkaya, FL Aleman, D Almeida, ... arXiv preprint arXiv:2303.08774, 2023 | 8638 | 2023 |
| TruthfulQA: Measuring how models mimic human falsehoods. S Lin, J Hilton, O Evans. arXiv preprint arXiv:2109.07958, 2021 | 1549 | 2021 |
| Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... arXiv preprint arXiv:2206.04615, 2022 | 1359 | 2022 |
| Teaching models to express their uncertainty in words. S Lin, J Hilton, O Evans. arXiv preprint arXiv:2205.14334, 2022 | 289 | 2022 |
| Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. SU Toshniwal, S Debnath, S Shakeri, S Thormeyer, S Melzi, S Reddy, ... arXiv preprint arXiv:2206.04615, 2022 | 12 | 2022 |
| Trading inference-time compute for adversarial robustness. W Zaremba, E Nitishinskaya, B Barak, S Lin, S Toyer, Y Yu, R Dias, ... arXiv preprint arXiv:2501.18841, 2025 | 5 | 2025 |