An overview on restricted Boltzmann machines

N Zhang, S Ding, J Zhang, Y Xue - Neurocomputing, 2018 - Elsevier
The Restricted Boltzmann Machine (RBM) has aroused wide interest in machine
learning fields during the past decade. This review aims to report the recent developments in …

Preconditioned stochastic gradient Langevin dynamics for deep neural networks

C Li, C Chen, D Carlson, L Carin - … of the AAAI conference on artificial …, 2016 - ojs.aaai.org
Effective training of deep neural networks suffers from two main issues. The first is that the
parameter space of these models exhibit pathological curvature. Recent methods address …

CNN and RNN based payload classification methods for attack detection

H Liu, B Lang, M Liu, H Yan - Knowledge-Based Systems, 2019 - Elsevier
In recent years, machine learning has been widely applied to problems in detecting network
attacks, particularly novel attacks. However, traditional machine learning methods depend …

Preconditioned stochastic gradient descent

XL Li - IEEE transactions on neural networks and learning …, 2017 - ieeexplore.ieee.org
Stochastic gradient descent (SGD) still is the workhorse for many practical problems.
However, it converges slow, and can be difficult to tune. It is possible to precondition SGD to …

Bridging the gap between stochastic gradient MCMC and stochastic optimization

C Chen, D Carlson, Z Gan, C Li… - Artificial Intelligence …, 2016 - proceedings.mlr.press
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian
analogs to popular stochastic optimization methods; however, this connection is not well …

Tool wear state recognition based on gradient boosting decision tree and hybrid classification RBM

G Li, Y Wang, J He, Q Hao, H Yang, J Wei - The International Journal of …, 2020 - Springer
Machined surface quality and dimensional accuracy are significantly affected by tool wear in
machining process. Tool wear state (TWS) recognition is highly desirable to realize …

Learning weight uncertainty with stochastic gradient MCMC for shape classification

C Li, A Stevens, C Chen, Y Pu… - Proceedings of the …, 2016 - openaccess.thecvf.com
Learning the representation of shape cues in 2D & 3D objects for recognition is a
fundamental task in computer vision. Deep neural networks (DNNs) have shown promising …

Old optimizer, new norm: An anthology

J Bernstein, L Newhouse - arXiv preprint arXiv:2409.20325, 2024 - arxiv.org
Deep learning optimizers are often motivated through a mix of convex and approximate
second-order theory. We select three such methods--Adam, Shampoo and Prodigy--and …

Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel's Spectrum

A Geifman, D Barzilai, R Basri, M Galun - arXiv preprint arXiv:2307.14531, 2023 - arxiv.org
Wide neural networks are biased towards learning certain functions, influencing both the
rate of convergence of gradient descent (GD) and the functions that are reachable with GD …

Modular Duality in Deep Learning

J Bernstein, L Newhouse - arXiv preprint arXiv:2410.21265, 2024 - arxiv.org
An old idea in optimization theory says that since the gradient is a dual vector it may not be
subtracted from the weights without first being mapped to the primal space where the …