Jacob Steinhardt
Verified email at cs.stanford.edu - Homepage
Title
Cited by
Year
Measuring massive multitask language understanding
D Hendrycks, C Burns, S Basart, A Zou, M Mazeika, D Song, J Steinhardt
arXiv preprint arXiv:2009.03300, 2020
Cited by 3403 (2020)
Concrete problems in AI safety
D Amodei, C Olah, J Steinhardt, P Christiano, J Schulman, D Mané
arXiv preprint arXiv:1606.06565, 2016
Cited by 3168 (2016)
The many faces of robustness: A critical analysis of out-of-distribution generalization
D Hendrycks, S Basart, N Mu, S Kadavath, F Wang, E Dorundo, R Desai, ...
Proceedings of the IEEE/CVF international conference on computer vision …, 2021
Cited by 1859 (2021)
Natural adversarial examples
D Hendrycks, K Zhao, S Basart, J Steinhardt, D Song
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
Cited by 1671 (2021)
Measuring mathematical problem solving with the MATH dataset
D Hendrycks, C Burns, S Kadavath, A Arora, S Basart, E Tang, D Song, ...
arXiv preprint arXiv:2103.03874, 2021
Cited by 1425 (2021)
The malicious use of artificial intelligence: Forecasting, prevention, and mitigation
M Brundage, S Avin, J Clark, H Toner, P Eckersley, B Garfinkel, A Dafoe, ...
arXiv preprint arXiv:1802.07228, 2018
Cited by 1243 (2018)
Certified defenses against adversarial examples
A Raghunathan, J Steinhardt, P Liang
arXiv preprint arXiv:1801.09344, 2018
Cited by 1147 (2018)
Certified defenses for data poisoning attacks
J Steinhardt, PWW Koh, PS Liang
Advances in neural information processing systems 30, 2017
Cited by 957 (2017)
Jailbroken: How does LLM safety training fail?
A Wei, N Haghtalab, J Steinhardt
Advances in Neural Information Processing Systems 36, 80079-80110, 2023
Cited by 827 (2023)
Measuring coding challenge competence with APPS
D Hendrycks, S Basart, S Kadavath, M Mazeika, A Arora, E Guo, C Burns, ...
arXiv preprint arXiv:2105.09938, 2021
Cited by 574 (2021)
Semidefinite relaxations for certifying robustness to adversarial examples
A Raghunathan, J Steinhardt, PS Liang
Advances in neural information processing systems 31, 2018
Cited by 514 (2018)
Scaling out-of-distribution detection for real-world settings
D Hendrycks, S Basart, M Mazeika, A Zou, J Kwon, M Mostajabi, ...
arXiv preprint arXiv:1911.11132, 2019
Cited by 509 (2019)
Aligning AI with shared human values
D Hendrycks, C Burns, S Basart, A Critch, J Li, D Song, J Steinhardt
arXiv preprint arXiv:2008.02275, 2020
Cited by 505 (2020)
Interpretability in the wild: a circuit for indirect object identification in GPT-2 small
K Wang, A Variengien, A Conmy, B Shlegeris, J Steinhardt
arXiv preprint arXiv:2211.00593, 2022
Cited by 429 (2022)
Troubling Trends in Machine Learning Scholarship: Some ML papers suffer from flaws that could mislead the public and stymie future research.
ZC Lipton, J Steinhardt
Queue 17 (1), 45-77, 2019
Cited by 389 (2019)
Progress measures for grokking via mechanistic interpretability
N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt
arXiv preprint arXiv:2301.05217, 2023
Cited by 360 (2023)
Sever: A robust meta-algorithm for stochastic optimization
I Diakonikolas, G Kamath, D Kane, J Li, J Steinhardt, A Stewart
International Conference on Machine Learning, 1596-1606, 2019
Cited by 350 (2019)
Unsolved problems in ML safety
D Hendrycks, N Carlini, J Schulman, J Steinhardt
arXiv preprint arXiv:2109.13916, 2021
Cited by 348 (2021)
SONYC: A system for monitoring, analyzing, and mitigating urban noise pollution
JP Bello, C Silva, O Nov, RL Dubois, A Arora, J Salamon, C Mydlarz, ...
Communications of the ACM 62 (2), 68-77, 2019
Cited by 348 (2019)
Learning from untrusted data
M Charikar, J Steinhardt, G Valiant
Proceedings of the 49th annual ACM SIGACT symposium on theory of computing …, 2017
Cited by 347 (2017)