Y Zhang, Z Zhang, T Luo, ZJ Xu. "Embedding principle of loss landscape of deep neural networks." Advances in Neural Information Processing Systems 34, 14848-14859, 2021. (Cited by 39)

Y Zhang, Y Li, Z Zhang, T Luo, ZQJ Xu. "Embedding principle: a hierarchical structure of loss landscape of deep neural networks." Journal of Machine Learning, 2021. (Cited by 31)

Z Zhang, ZQJ Xu. "Implicit regularization of dropout." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024. (Cited by 28*)

Z Zhang, ZQJ Xu. "Loss Spike in Training Neural Networks." arXiv preprint arXiv:2305.12133, 2023. (Cited by 10*)

Z Zhang, Y Li, T Luo, ZQJ Xu. "Stochastic modified equations and dynamics of dropout algorithm." ICLR 2024. (Cited by 8)

Y Zhang, Z Zhang, L Zhang, Z Bai, T Luo, ZQJ Xu. "Linear stability hypothesis and rank stratification for nonlinear models." arXiv preprint arXiv:2211.11623, 2022. (Cited by 8)

Z Wang, Y Wang, Z Zhang, Z Zhou, H Jin, T Hu, J Sun, Z Li, Y Zhang, et al. "Towards understanding how transformer perform multi-step reasoning with matching operation." arXiv preprint arXiv:2405.15302, 2024. (Cited by 5)

Z Zhang, Z Wang, J Yao, Z Zhou, X Li, ZQJ Xu. "Anchor function: a type of benchmark functions for studying language models." arXiv preprint arXiv:2401.08309, 2024. (Cited by 4)

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu. "Initialization is critical to whether transformers fit composite functions by reasoning or memorizing." The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), 2024. (Cited by 4*)

Y Zhang, Z Zhang, L Zhang, Z Bai, T Luo, ZQJ Xu. "Optimistic estimate uncovers the potential of nonlinear models." arXiv preprint arXiv:2307.08921, 2023. (Cited by 4)

Z Zhang, P Lin, Z Wang, Y Zhang, ZQJ Xu. "Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers." arXiv preprint arXiv:2501.08537, 2025. (Cited by 1)

Z Wang, L Zhang, Z Zhang, ZQJ Xu. "Loss Jump During Loss Switch in Solving PDEs with Neural Networks." arXiv preprint arXiv:2405.03095, 2024.