Title | Authors | Venue | Cited by | Year
Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks?---A Neural Tangent Kernel Perspective | K Huang, Y Wang, M Tao, T Zhao | Advances in Neural Information Processing Systems 33, 2698-2709, 2020 | 113 | 2020
Large learning rate tames homogeneity: Convergence and balancing effect | Y Wang, M Chen, T Zhao, M Tao | arXiv preprint arXiv:2110.03677, 2021 | 51 | 2021
Momentum Stiefel optimizer, with applications to suitably-orthogonal attention, and optimal transport | L Kong, Y Wang, M Tao | arXiv preprint arXiv:2205.14173, 2022 | 12 | 2022
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult | Y Wang, Z Xu, T Zhao, M Tao | arXiv preprint arXiv:2310.17087, 2023 | 8 | 2023
Evaluating the design space of diffusion-based generative models | Y Wang, Y He, M Tao | Advances in Neural Information Processing Systems 37, 19307-19352, 2025 | 5 | 2025
Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks | Z Xu, Y Wang, T Zhao, R Ward, M Tao | arXiv preprint arXiv:2410.09640, 2024 | 2 | 2024
Markov chain Monte Carlo for Gaussian: A linear control perspective | B Yuan, J Fan, Y Wang, M Tao, Y Chen | IEEE Control Systems Letters 7, 2173-2178, 2023 | 1 | 2023