A StrongREJECT for empty jailbreaks A Souly, Q Lu, D Bowen, T Trinh, E Hsieh, S Pandey, P Abbeel, ... arXiv preprint arXiv:2402.10260, 2024 | 53* | 2024 |
Efficient game-theoretic planning with prediction heuristic for socially-compliant autonomous driving C Li, T Trinh, L Wang, C Liu, M Tomizuka, W Zhan IEEE Robotics and Automation Letters 7 (4), 10248-10255, 2022 | 27 | 2022 |
Softmax probabilities (mostly) predict large language model correctness on multiple-choice Q&A B Plaut, K Nguyen, T Trinh arXiv preprint arXiv:2402.13213, 2024 | 9 | 2024 |
Autonomous assessment of demonstration sufficiency via Bayesian inverse reinforcement learning T Trinh, H Chen, DS Brown Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot …, 2024 | 6 | 2024 |
Refusal-trained LLMs are easily jailbroken as browser agents P Kumar, E Lau, S Vijayakumar, T Trinh, SR Team, E Chang, V Robinson, ... arXiv preprint arXiv:2410.13886, 2024 | 1 | 2024 |
Getting by goal misgeneralization with a little help from a mentor T Trinh, MH Danesh, NX Khanh, B Plaut arXiv preprint arXiv:2410.21052, 2024 | | 2024 |
Practical alignment requires more than learning from human feedback T Trinh, KX Nguyen | | |