Etai Littwin
Research Scientist at Apple
Verified email at apple.com
Title
Cited by
Year
What algorithms can transformers learn? A study in length generalization
H Zhou, A Bradley, E Littwin, N Razin, O Saremi, J Susskind, S Bengio, ...
arXiv preprint arXiv:2310.16028, 2023
Cited by: 113; Year: 2023
Tensor programs IIb: Architectural universality of neural tangent kernel training dynamics
G Yang, E Littwin
International conference on machine learning, 11762-11772, 2021
Cited by: 73; Year: 2021
Stabilizing transformer training by preventing attention entropy collapse
S Zhai, T Likhomanenko, E Littwin, D Busbridge, J Ramapuram, Y Zhang, ...
International Conference on Machine Learning, 40770-40803, 2023
Cited by: 62; Year: 2023
The slingshot mechanism: An empirical study of adaptive optimizers and the grokking phenomenon
V Thilak, E Littwin, S Zhai, O Saremi, R Paiss, J Susskind
arXiv preprint arXiv:2206.04817, 2022
Cited by: 50; Year: 2022
Biometric authentication techniques
DS Prakash, LE Ballard, JV Hauck, F Tang, E Littwin, PKA Vasu, G Littwin, ...
US Patent 10,929,515, 2021
Cited by: 41; Year: 2021
Transformers learn through gradual rank increase
E Boix-Adsera, E Littwin, E Abbe, S Bengio, J Susskind
Advances in Neural Information Processing Systems 36, 24519-24551, 2023
Cited by: 38*; Year: 2023
The multiverse loss for robust transfer learning
E Littwin, L Wolf
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2016
Cited by: 34; Year: 2016
On infinite-width hypernetworks
E Littwin, T Galanti, L Wolf, G Yang
Advances in neural information processing systems 33, 13226-13237, 2020
Cited by: 30*; Year: 2020
Tensor programs IVb: Adaptive optimization in the infinite-width limit
G Yang, E Littwin
arXiv preprint arXiv:2308.01814, 2023
Cited by: 22; Year: 2023
The loss surface of residual networks: Ensembles and the role of batch normalization
E Littwin, L Wolf
arXiv preprint arXiv:1611.02525, 2016
Cited by: 14; Year: 2016
Regularizing by the variance of the activations' sample-variances
E Littwin, L Wolf
Advances in Neural Information Processing Systems 31, 2018
Cited by: 12; Year: 2018
When can transformers reason with abstract symbols?
E Boix-Adsera, O Saremi, E Abbe, S Bengio, E Littwin, J Susskind
arXiv preprint arXiv:2310.09753, 2023
Cited by: 10; Year: 2023
Collegial ensembles
E Littwin, B Myara, S Sabah, J Susskind, S Zhai, O Golan
Advances in Neural Information Processing Systems 33, 18738-18748, 2020
Cited by: 10; Year: 2020
Adaptive Optimization in the Infinite-Width Limit
E Littwin, G Yang
The Eleventh International Conference on Learning Representations, 2023
Cited by: 8; Year: 2023
LiDAR: Sensing linear probing performance in joint embedding SSL architectures
V Thilak, C Huang, O Saremi, L Dinh, H Goh, P Nakkiran, JM Susskind, ...
arXiv preprint arXiv:2312.04000, 2023
Cited by: 7; Year: 2023
On random kernels of residual architectures
E Littwin, T Galanti, L Wolf
Uncertainty in Artificial Intelligence, 897-907, 2021
Cited by: 7; Year: 2021
Biometric authentication techniques
DS Prakash, LE Ballard, JV Hauck, F Tang, E Littwin, PKA Vasu, G Littwin, ...
US Patent 11,151,235, 2021
Cited by: 7; Year: 2021
Spherical embedding of inlier silhouette dissimilarities
E Littwin, H Averbuch-Elor, D Cohen-Or
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2015
Cited by: 7; Year: 2015
Vanishing gradients in reinforcement finetuning of language models
N Razin, H Zhou, O Saremi, V Thilak, A Bradley, P Nakkiran, J Susskind, ...
arXiv preprint arXiv:2310.20703, 2023
Cited by: 6; Year: 2023
How JEPA avoids noisy features: The implicit bias of deep linear self distillation networks
E Littwin, O Saremi, M Advani, V Thilak, P Nakkiran, C Huang, J Susskind
Advances in Neural Information Processing Systems 37, 91300-91336, 2025
Cited by: 5; Year: 2025
Articles 1–20