| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? A Neural Tangent Kernel Perspective | K Huang, Y Wang, M Tao, T Zhao | Advances in Neural Information Processing Systems 33, 2698–2709 | 112 | 2020 |
| Large learning rate tames homogeneity: Convergence and balancing effect | Y Wang, M Chen, T Zhao, M Tao | arXiv preprint arXiv:2110.03677 | 47 | 2021 |
| Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport | L Kong, Y Wang, M Tao | arXiv preprint arXiv:2205.14173 | 11 | 2022 |
| Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult | Y Wang, Z Xu, T Zhao, M Tao | arXiv preprint arXiv:2310.17087 | 7 | 2023 |
| Evaluating the design space of diffusion-based generative models | Y Wang, Y He, M Tao | arXiv preprint arXiv:2406.12839 | 5 | 2024 |
| Markov Chain Monte Carlo for Gaussian: A Linear Control Perspective | B Yuan, J Fan, Y Wang, M Tao, Y Chen | IEEE Control Systems Letters 7, 2173–2178 | 1 | 2023 |
| Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks | Z Xu, Y Wang, T Zhao, R Ward, M Tao | arXiv preprint arXiv:2410.09640 |  | 2024 |