Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models

X Xie, P Zhou, H Li, Z Lin, S Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In deep learning, different kinds of deep networks typically need different optimizers, which
have to be chosen after multiple trials, making the training process inefficient. To relieve this …

Piecewise linear neural networks and deep learning

Q Tao, L Li, X Huang, X Xi, S Wang… - Nature Reviews Methods …, 2022 - nature.com
As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven
successful in various fields, most recently in deep learning. To apply PWLNN methods, both …

Simple and deep graph convolutional networks

M Chen, Z Wei, Z Huang, B Ding… - … conference on machine …, 2020 - proceedings.mlr.press
Graph convolutional networks (GCNs) are a powerful deep learning approach for graph-
structured data. Recently, GCNs and subsequent variants have shown superior performance …

A geometric analysis of neural collapse with unconstrained features

Z Zhu, T Ding, J Zhou, X Li, C You… - Advances in Neural …, 2021 - proceedings.neurips.cc
We provide the first global optimization landscape analysis of Neural Collapse, an intriguing
empirical phenomenon that arises in the last-layer classifiers and features of neural …

A convergence theory for deep learning via over-parameterization

Z Allen-Zhu, Y Li, Z Song - International conference on …, 2019 - proceedings.mlr.press
Deep neural networks (DNNs) have demonstrated dominating performance in many fields;
since AlexNet, networks used in practice are going wider and deeper. On the theoretical …

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks

S Arora, S Du, W Hu, Z Li… - … conference on machine …, 2019 - proceedings.mlr.press
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …

Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … conference on machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …

Blind super-resolution kernel estimation using an internal-gan

S Bell-Kligler, A Shocher… - Advances in neural …, 2019 - proceedings.neurips.cc
Super resolution (SR) methods typically assume that the low-resolution (LR) image was
downscaled from the unknown high-resolution (HR) image by a fixed 'ideal' downscaling …

Cache me if you can: Accelerating diffusion models through block caching

F Wimbauer, B Wu, E Schoenfeld… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models have recently revolutionized the field of image synthesis due to their ability
to generate photorealistic images. However, one of the major drawbacks of diffusion models …

Rmp-snn: Residual membrane potential neuron for enabling deeper high-accuracy and low-latency spiking neural network

B Han, G Srinivasan, K Roy - Proceedings of the IEEE/CVF …, 2020 - openaccess.thecvf.com
Spiking Neural Networks (SNNs) have recently attracted significant research
interest as the third generation of artificial neural networks that can enable low-power event …