On the convergence and calibration of deep learning with differential privacy

Z Bu, H Wang, Z Dai, Q Long - Transactions on machine learning …, 2023 - ncbi.nlm.nih.gov
Differentially private (DP) training preserves data privacy, usually at the cost of slower
convergence (and thus lower accuracy), as well as more severe miscalibration than its non …
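
As a rough illustration of the DP training mechanism this paper analyzes, here is a minimal numpy sketch of one DP-SGD step (per-example gradient clipping plus calibrated Gaussian noise). The helper name and its default parameters are illustrative assumptions, and privacy accounting is omitted entirely:

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    """One DP-SGD step (hypothetical helper): clip each per-example gradient
    to L2 norm `clip_norm`, average, add Gaussian noise, then take a step."""
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    avg_grad = np.mean(clipped, axis=0)
    # Standard Gaussian-mechanism scaling: sigma * C / batch_size.
    noise = np.random.normal(0.0, noise_mult * clip_norm / len(clipped),
                             size=avg_grad.shape)
    return params - lr * (avg_grad + noise)
```

The clipping bounds each example's influence on the update, which is what makes the added noise sufficient for a DP guarantee; it is also the mechanism behind the slower convergence the abstract refers to.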

MetaNODE: Prototype optimization as a neural ODE for few-shot learning

B Zhang, X Li, S Feng, Y Ye, R Ye - … of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Few-Shot Learning (FSL) is a challenging task, i.e., how to recognize novel classes
with few examples? Pre-training based methods effectively tackle the problem by pre …
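
The title frames prototype refinement as solving a neural ODE; below is a minimal sketch of that idea, assuming a learned vector field `grad_flow` (a hypothetical stand-in for the paper's meta-learned network) integrated with a plain forward-Euler solver rather than an adaptive one:

```python
import numpy as np

def refine_prototype(p0, grad_flow, t1=1.0, steps=20):
    """Refine a class prototype by integrating dp/dt = grad_flow(p, t)
    from t=0 to t=t1 with forward Euler; `grad_flow` is hypothetical."""
    p, dt = np.asarray(p0, dtype=float).copy(), t1 / steps
    for k in range(steps):
        p = p + dt * grad_flow(p, k * dt)
    return p

# Usage with a toy flow that pulls the prototype toward a target mean:
target = np.ones(4)
refined = refine_prototype(np.zeros(4), lambda p, t: target - p)
```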

Sparse neural additive model: Interpretable deep learning with feature selection via group sparsity

S Xu, Z Bu, P Chaudhari, IJ Barnett - Joint European Conference on …, 2023 - Springer
Interpretable machine learning has demonstrated impressive performance while preserving
explainability. In particular, neural additive models (NAM) offer the interpretability to the …
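
A minimal PyTorch sketch of the group-sparsity idea in the title: a NAM keeps one subnetwork per input feature, and a group-lasso penalty with one group per subnetwork can zero out an entire feature at once. Here `feature_nets` and the weight `lam` are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

def group_lasso_penalty(feature_nets, lam=1e-3):
    """Group lasso with one group per feature subnetwork, so feature
    selection happens at the level of whole subnetworks."""
    penalty = torch.zeros(())
    for net in feature_nets:
        weights = torch.cat([p.flatten() for p in net.parameters()])
        penalty = penalty + torch.linalg.norm(weights)
    return lam * penalty

# Usage: one tiny subnetwork per feature, as in a NAM.
nets = [nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))
        for _ in range(5)]
reg = group_lasso_penalty(nets)  # add this term to the training loss
```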

Node-adapter: Neural ordinary differential equations for better vision-language reasoning

Y Zhang, CW Cheng, K Yu, Z He, CB Schönlieb… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we consider the problem of prototype-based vision-language reasoning.
We observe that existing methods encounter three major challenges: 1) escalating …

Provable convergence of Nesterov's accelerated gradient method for over-parameterized neural networks

X Liu, Z Pan, W Tao - Knowledge-Based Systems, 2022 - Elsevier
Momentum methods, such as the heavy ball method (HB) and Nesterov's accelerated gradient
method (NAG), have been widely used in training neural networks by incorporating the …
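
For reference, the standard updates of the two momentum methods named in this snippet, with step size $\alpha$ and momentum parameter $\beta$ (the paper's notation may differ):

```latex
\begin{align*}
\text{HB:}\quad  & x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta\,(x_k - x_{k-1}) \\
\text{NAG:}\quad & y_k = x_k + \beta\,(x_k - x_{k-1}), \qquad
                   x_{k+1} = y_k - \alpha \nabla f(y_k)
\end{align*}
```

The distinction is that NAG evaluates the gradient at the look-ahead point $y_k$ rather than at $x_k$, which is the source of its provable acceleration over HB in the convex setting.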

A convergence analysis of Nesterov's accelerated gradient method in training deep linear neural networks

X Liu, W Tao, Z Pan - Information Sciences, 2022 - Elsevier
As the training of deep neural networks incurs substantial computational cost,
speeding up convergence is of great importance. Nesterov's accelerated gradient (NAG) …

Covariate-balancing-aware interpretable deep learning models for treatment effect estimation

K Chen, Q Yin, Q Long - Statistics in Biosciences, 2023 - Springer
Estimating treatment effects is of great importance for many biomedical applications with
observational data. In particular, interpretability of the treatment effects is preferable for many …

A high-resolution dynamical view on momentum methods for over-parameterized neural networks

X Liu, W Tao, J Wang, Z Pan - arXiv preprint arXiv:2208.03941, 2022 - arxiv.org
Due to its simplicity and efficiency, the first-order gradient method has been widely
used in training neural networks. Although the optimization problem of the neural network is …

Provable Acceleration of Nesterov's Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks

X Liu, W Tao, W Li, D Zhan, J Wang, Z Pan - Proceedings of the Thirty-Third …, 2024 - ijcai.org
Due to its simplicity and efficiency, the first-order gradient method has been extensively
employed in training neural networks. Although the optimization problem of the neural …

Accelerated analysis on the triple momentum method for a two-layer ReLU neural network

X Li, X Liu - Journal of King Saud University-Computer and …, 2024 - Elsevier
The momentum method has become the workhorse of the deep learning community. To
theoretically understand its success, researchers have put effort into demystifying its convergence …
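
For context, the triple momentum method (Van Scoy, Freeman, and Lynch) couples three momentum-like parameters, hence the name. One common form of the iteration is sketched below, with $(\alpha, \beta, \gamma, \delta)$ tuned from the condition number of the objective; this is a sketch from the optimization literature, not necessarily the parameterization analyzed in this paper:

```latex
\begin{align*}
\xi_{k+1} &= (1+\beta)\,\xi_k - \beta\,\xi_{k-1} - \alpha \nabla f(y_k) \\
y_k &= (1+\gamma)\,\xi_k - \gamma\,\xi_{k-1} \\
x_k &= (1+\delta)\,\xi_k - \delta\,\xi_{k-1}
\end{align*}
```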