Jaehoon Lee
Anthropic
Verified email at anthropic.com - Homepage
Title · Cited by · Year
Deep Neural Networks as Gaussian Processes
J Lee*, Y Bahri*, R Novak, SS Schoenholz, J Pennington, ...
International Conference on Learning Representations (ICLR), 2018
Cited by 1386 · 2018
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ...
TMLR 2023, 2022
Cited by 1372 · 2022
Wide neural networks of any depth evolve as linear models under gradient descent
J Lee*, L Xiao*, SS Schoenholz, Y Bahri, J Sohl-Dickstein, J Pennington
Neural Information Processing Systems (NeurIPS), 2019
Cited by 1232 · 2019
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ...
arXiv preprint arXiv:2403.05530, 2024
Cited by 1211 · 2024
Measuring the effects of data parallelism on neural network training
CJ Shallue*, J Lee*, J Antognini, J Sohl-Dickstein, R Frostig, GE Dahl
Journal of Machine Learning Research 20, 1-49, 2019
Cited by 470 · 2019
On empirical comparisons of optimizers for deep learning
D Choi, CJ Shallue, Z Nado, J Lee, CJ Maddison, GE Dahl
arXiv preprint arXiv:1910.05446, 2019
Cited by 434* · 2019
Bayesian Deep Convolutional Neural Networks with Many Channels are Gaussian Processes
R Novak*, L Xiao*, J Lee, Y Bahri, G Yang, D Abolafia, J Pennington, ...
International Conference on Learning Representations (ICLR), 2019
Cited by 399* · 2019
Dataset Distillation with Infinitely Wide Convolutional Networks
T Nguyen, R Novak, L Xiao, J Lee
Neural Information Processing Systems (NeurIPS), 2021
Cited by 281 · 2021
Explaining neural scaling laws
Y Bahri*, E Dyer*, J Kaplan*, J Lee*, U Sharma*
arXiv preprint arXiv:2102.06701, 2021
Cited by 278 · 2021
Neural tangents: Fast and easy infinite neural networks in python
R Novak*, L Xiao*, J Hron, J Lee, AA Alemi, J Sohl-Dickstein, ...
International Conference on Learning Representations (ICLR), Spotlight, 2020
Cited by 278 · 2020
Dataset Meta-Learning from Kernel Ridge-Regression
T Nguyen, Z Chen, J Lee
International Conference on Learning Representations (ICLR), 2021
Cited by 275 · 2021
Scaling LLM test-time compute optimally can be more effective than scaling model parameters
C Snell, J Lee, K Xu, A Kumar
arXiv preprint arXiv:2408.03314, 2024
Cited by 248* · 2024
Finite versus infinite neural networks: an empirical study
J Lee, SS Schoenholz, J Pennington, B Adlam, L Xiao, R Novak, ...
Neural Information Processing Systems (NeurIPS), Spotlight, 2020
Cited by 240 · 2020
The superconformal bootstrap in three dimensions
SM Chester, J Lee, SS Pufu, R Yacoby
Journal of High Energy Physics 2014 (9), 1-59, 2014
Cited by 171 · 2014
Exact correlators of BPS operators from the 3d superconformal bootstrap
SM Chester, J Lee, SS Pufu, R Yacoby
Journal of High Energy Physics 2015 (3), 1-55, 2015
Cited by 153 · 2015
Beyond human data: Scaling self-training for problem-solving with language models
A Singh, JD Co-Reyes, R Agarwal, A Anand, P Patil, X Garcia, PJ Liu, ...
arXiv preprint arXiv:2312.06585, 2023
Cited by 98 · 2023
Small-scale proxies for large-scale transformer training instabilities
M Wortsman, PJ Liu, L Xiao, K Everett, A Alemi, B Adlam, JD Co-Reyes, ...
International Conference on Learning Representations (ICLR), Oral, 2024
Cited by 73 · 2024
On the infinite width limit of neural networks with a standard parameterization
J Sohl-Dickstein, R Novak, SS Schoenholz, J Lee
arXiv preprint arXiv:2001.07301, 2020
Cited by 62 · 2020
Algebra of Majorana doubling
J Lee, F Wilczek
Physical Review Letters 111 (22), 226402, 2013
Cited by 38 · 2013
Replacing softmax with ReLU in vision transformers
M Wortsman, J Lee, J Gilmer, S Kornblith
arXiv preprint arXiv:2309.08586, 2023
Cited by 35 · 2023
Articles 1–20