The heavy-tail phenomenon in SGD

M Gürbüzbalaban, U Şimşekli… - … Conference on Machine …, 2021 - proceedings.mlr.press
In recent years, various notions of capacity and complexity have been proposed for
characterizing the generalization properties of stochastic gradient descent (SGD) in deep …

Stochastic models with power-law tails

D Buraczewski, E Damek, T Mikosch - The equation X = AX + B. Cham …, 2016 - Springer
Dariusz Buraczewski, Ewa Damek, Thomas Mikosch. The Equation X = AX + B.
Springer Series in Operations Research and Financial Engineering …

On multidimensional Mandelbrot cascades

D Buraczewski, E Damek, Y Guivarc'h… - Journal of Difference …, 2014 - Taylor & Francis
Let Z be a random variable with values in a proper closed convex cone, A a random
endomorphism of C and N a random integer. We assume that Z, A, N are independent …

The cluster index of regularly varying sequences with applications to limit theory for functions of multivariate Markov chains

T Mikosch, O Wintenberger - Probability Theory and Related Fields, 2014 - Springer
We introduce the cluster index of a multivariate stationary sequence and characterize the
index in terms of the spectral tail process. This index plays a major role in limit theory for …

Precise large deviation results for products of random matrices

D Buraczewski, S Mentemeier - 2016 - projecteuclid.org
The theorem of Furstenberg and Kesten provides a strong law of large numbers for the norm
of a product of random matrices. This can be extended under various assumptions, covering …

Large deviation estimates for exceedance times of perpetuity sequences and their dual processes

D Buraczewski, JF Collamore, E Damek… - The Annals of …, 2016 - JSTOR
In a variety of problems in pure and applied probability, it is relevant to study the large
exceedance probabilities of the perpetuity sequence Yₙ ≔ B₁ + A₁B₂ + ⋯ + (A₁⋯Aₙ₋₁)Bₙ …

Heavy-tail phenomenon in decentralized SGD

M Gürbüzbalaban, Y Hu, U Şimşekli, K Yuan… - IISE …, 2024 - Taylor & Francis
Recent theoretical studies have shown that heavy-tails can emerge in stochastic
optimization due to 'multiplicative noise', even under surprisingly simple settings, such as …

Cyclic and randomized stepsizes invoke heavier tails in SGD than constant stepsize

M Gürbüzbalaban, Y Hu, U Şimşekli, L Zhu - arXiv preprint arXiv …, 2023 - arxiv.org
Cyclic and randomized stepsizes are widely used in the deep learning practice and can
often outperform standard stepsize choices such as constant stepsize in SGD. Despite their …

Random difference equations with subexponential innovations

QH Tang, ZY Yuan - Science China Mathematics, 2016 - Springer
We consider the random difference equations S =ᵈ (X + S)Y and T =ᵈ X + TY, where =ᵈ
denotes equality in distribution, X and Y are two nonnegative random variables, and S and T …

Stochastic difference equation with diagonal matrices

E Damek - Annales de l'Institut Henri Poincaré (B) Probabilités et …, 2025 - projecteuclid.org
We consider the stochastic equation X =ᵈ AX + B where A is a random diagonal matrix and
X, B are random vectors, X, A are independent and the equation is meant in law. We prove …