Evaluating Language-Model Agents on Realistic Autonomous Tasks | M Kinniment, LJ Koba Sato, H Du, B Goodrich, M Hasin, L Chan, LH Miles, ... | | 84* | 2023 |
Localizing model behavior with path patching | N Goldowsky-Dill, C MacLeod, LJ Koba Sato, A Arora | arXiv preprint arXiv:2304.05969, 2023 | 56 | 2023 |
A few-shot semantic parser for Wizard-of-Oz dialogues with the precise ThingTalk representation | G Campagna, SJ Semnani, R Kearns, LJ Koba Sato, S Xu, MS Lam | Findings of ACL 2022, 2020 | 9 | 2020 |
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | H Wijk, T Lin, J Becker, S Jawhar, N Parikh, T Broadley, L Chan, M Chen, ... | arXiv preprint arXiv:2411.15114, 2024 | 1 | 2024 |