Tu Trinh
Title
Cited by
Year
A StrongREJECT for empty jailbreaks
A Souly, Q Lu, D Bowen, T Trinh, E Hsieh, S Pandey, P Abbeel, ...
arXiv preprint arXiv:2402.10260, 2024
Cited by 53*; year 2024
Efficient game-theoretic planning with prediction heuristic for socially-compliant autonomous driving
C Li, T Trinh, L Wang, C Liu, M Tomizuka, W Zhan
IEEE Robotics and Automation Letters 7 (4), 10248-10255, 2022
Cited by 27; year 2022
Softmax probabilities (mostly) predict large language model correctness on multiple-choice Q&A
B Plaut, K Nguyen, T Trinh
arXiv preprint arXiv:2402.13213, 2024
Cited by 9; year 2024
Autonomous assessment of demonstration sufficiency via Bayesian inverse reinforcement learning
T Trinh, H Chen, DS Brown
Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot …, 2024
Cited by 6; year 2024
Refusal-trained LLMs are easily jailbroken as browser agents
P Kumar, E Lau, S Vijayakumar, T Trinh, SR Team, E Chang, V Robinson, ...
arXiv preprint arXiv:2410.13886, 2024
Cited by 1; year 2024
Getting By Goal Misgeneralization With a Little Help From a Mentor
T Trinh, MH Danesh, NX Khanh, B Plaut
arXiv preprint arXiv:2410.21052, 2024
Year 2024
Practical alignment requires more than learning from human feedback
T Trinh, KX Nguyen