On the information bottleneck theory of deep learning

AM Saxe, Y Bansal, J Dapello, M Advani… - Journal of Statistical …, 2019 - iopscience.iop.org
The practical successes of deep neural networks have not been matched by theoretical
progress that satisfyingly explains their behavior. In this work, we study the information …

Optimization methods for large-scale machine learning

L Bottou, FE Curtis, J Nocedal - SIAM review, 2018 - SIAM
This paper provides a review and commentary on the past, present, and future of numerical
optimization algorithms in the context of machine learning applications. Through case …

[BOOK][B] Targeted learning in data science

MJ Van der Laan, S Rose - 2018 - Springer
This book builds on and is a sequel to our book Targeted Learning: Causal Inference for
Observational and Experimental Studies (2011). Since the publication of this first book on …

New insights and perspectives on the natural gradient method

J Martens - Journal of Machine Learning Research, 2020 - jmlr.org
Natural gradient descent is an optimization method traditionally motivated from the
perspective of information geometry, and works well for many applications as an alternative …
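For reference, the update the title refers to, in its standard form (my paraphrase, not quoted from the paper): natural gradient descent preconditions the ordinary gradient with the inverse Fisher information matrix,

\[
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta \mathcal{L}(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top}\right],
\]

so steps are measured in the geometry of the model's output distribution rather than in raw parameter space.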

Stochastic gradient descent tricks

L Bottou - Neural networks: tricks of the trade: second edition, 2012 - Springer
Chapter 1 strongly advocates the stochastic back-propagation method to train neural
networks. This is in fact an instance of a more general technique called stochastic gradient …
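A minimal sketch of the stochastic gradient descent loop this chapter advocates, shown here on a least-squares loss with a fixed step size (the loss, step size, and function name are illustrative assumptions, not taken from the chapter):

import numpy as np

def sgd_least_squares(A, b, lr=0.01, epochs=20, seed=0):
    """Plain SGD on the loss 0.5 * ||A x - b||^2, visiting one example per step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):           # shuffle examples each pass
            residual = A[i] @ x - b[i]         # prediction error on example i
            x -= lr * residual * A[i]          # gradient of 0.5*(A[i] @ x - b[i])**2
    return x

The chapter's "tricks" concern practical choices around exactly this loop, such as step-size schedules and example ordering.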

Label consistent K-SVD: Learning a discriminative dictionary for recognition

Z Jiang, Z Lin, LS Davis - IEEE transactions on pattern analysis …, 2013 - ieeexplore.ieee.org
A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse
coding is presented. In addition to using class labels of training data, we also associate label …

Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm

D Needell, R Ward, N Srebro - Advances in neural …, 2014 - proceedings.neurips.cc
We improve a recent guarantee of Bach and Moulines on the linear convergence of SGD for
smooth and strongly convex objectives, reducing a quadratic dependence on the strong …
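For context, a minimal sketch of the randomized Kaczmarz iteration named in the title, with rows sampled in proportion to their squared norms (a standard presentation under my own naming and defaults, not code from the paper):

import numpy as np

def randomized_kaczmarz(A, b, iters=5000, seed=0):
    """Randomized Kaczmarz for a consistent linear system A x = b.

    Each step samples row i with probability ||A[i]||^2 / ||A||_F^2 and projects
    the current iterate onto the hyperplane {x : A[i] @ x = b[i]}.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    row_norms_sq = np.einsum("ij,ij->i", A, A)   # squared norm of each row
    probs = row_norms_sq / row_norms_sq.sum()
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.choice(n, p=probs)
        x += ((b[i] - A[i] @ x) / row_norms_sq[i]) * A[i]   # exact projection onto row i
    return x

Read as SGD on the least-squares objective, this amounts to importance-weighted sampling with a per-example step size of 1 / ||A[i]||^2, which is essentially the link between SGD, weighted sampling, and Kaczmarz announced in the title.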

Stochastic dual coordinate ascent methods for regularized loss

S Shalev-Shwartz, T Zhang - The Journal of Machine Learning …, 2013 - dl.acm.org
Stochastic Gradient Descent (SGD) has become popular for solving large-scale supervised
machine learning optimization problems such as SVM, due to its strong theoretical …

Large-scale machine learning with stochastic gradient descent

L Bottou - Proceedings of COMPSTAT'2010: 19th International …, 2010 - Springer
During the last decade, data sizes have grown faster than the speed of processors. In
this context, the capabilities of statistical machine learning methods are limited by the …

A stochastic quasi-Newton method for large-scale optimization

RH Byrd, SL Hansen, J Nocedal, Y Singer - SIAM Journal on Optimization, 2016 - SIAM
The question of how to incorporate curvature information into stochastic approximation
methods is challenging. The direct application of classical quasi-Newton updating …