Title | Authors | Venue | Cited by | Year |
The Alignment Property of SGD Noise and How it Helps Select Flat Minima: A Stability Analysis | L Wu, M Wang, WJ Su | Advances in Neural Information Processing Systems (NeurIPS 2022), 1-25, 2022 | 43* | 2022 |
Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks | M Wang, C Ma | Advances in Neural Information Processing Systems (NeurIPS 2023, Spotlight …, 2023 | 14 | 2023 |
Generalization Error Bounds for Deep Neural Networks Trained by SGD | M Wang, C Ma | arXiv: 2206.03299, 1-32, 2022 | 14 | 2022 |
Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks | M Wang, C Ma | Advances in Neural Information Processing Systems (NeurIPS 2022), 1-73, 2022 | 7 | 2022 |
Are AI-Generated Text Detectors Robust to Adversarial Perturbations? | G Huang, Y Zhang, Z Li, Y You, M Wang, Z Yang | Annual Meeting of the Association for Computational Linguistics (ACL 2024), 1-20, 2024 | 5 | 2024 |
A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent | M Wang, L Wu | NeurIPS 2023 Workshop on M3L, 1-30, 2023 | 5* | 2023 |
Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling | M Wang, W E | Advances in Neural Information Processing Systems (NeurIPS 2024), 1-76, 2024 | 4 | 2024 |
Improving Generalization and Convergence by Enhancing Implicit Regularization | M Wang, J Wang, H He, Z Wang, G Huang, F Xiong, Z Li, W E, L Wu | Advances in Neural Information Processing Systems (NeurIPS 2024), 1-44, 2024 | 3 | 2024 |
Loss Symmetry and Noise Equilibrium of Stochastic Gradient Descent | L Ziyin, M Wang, H Li, L Wu | Advances in Neural Information Processing Systems (NeurIPS 2024), 1-26, 2024 | 3* | 2024 |
How Transformers Get Rich: Approximation and Dynamics Analysis | M Wang, R Yu, W E, L Wu | arXiv preprint arXiv:2410.11474, 1-46, 2024 | 2* | 2024 |
Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling | M Wang, Z Min, L Wu | International Conference on Machine Learning (ICML 2024), 1-38, 2023 | 2 | 2023 |
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training | Z Zhou*, M Wang*, Y Mao, B Li, J Yan | International Conference on Learning Representations (ICLR 2025, Spotlight …, 2024 | | 2024 |