The heavy-tail phenomenon in SGD
In recent years, various notions of capacity and complexity have been proposed for
characterizing the generalization properties of stochastic gradient descent (SGD) in deep …
Stochastic models with power-law tails
Dariusz Buraczewski, Ewa Damek, Thomas Mikosch - The Equation X = AX + B
Springer Series in Operations Research and Financial Engineering …
On multidimensional Mandelbrot cascades
D Buraczewski, E Damek, Y Guivarc'h… - Journal of Difference …, 2014 - Taylor & Francis
Let Z be a random variable with values in a proper closed convex cone C, A a random
endomorphism of C and N a random integer. We assume that Z, A, N are independent …
The cluster index of regularly varying sequences with applications to limit theory for functions of multivariate Markov chains
We introduce the cluster index of a multivariate stationary sequence and characterize the
index in terms of the spectral tail process. This index plays a major role in limit theory for …
Precise large deviation results for products of random matrices
D Buraczewski, S Mentemeier - 2016 - projecteuclid.org
The theorem of Furstenberg and Kesten provides a strong law of large numbers for the norm
of a product of random matrices. This can be extended under various assumptions, covering …
Large deviation estimates for exceedance times of perpetuity sequences and their dual processes
In a variety of problems in pure and applied probability, it is relevant to study the large
exceedance probabilities of the perpetuity sequence Y_n := B_1 + A_1 B_2 + ⋯ + (A_1 ⋯ A_{n-1}) B_n …
Heavy-tail phenomenon in decentralized SGD
Recent theoretical studies have shown that heavy-tails can emerge in stochastic
optimization due to 'multiplicative noise', even under surprisingly simple settings, such as …
Cyclic and randomized stepsizes invoke heavier tails in SGD than constant stepsize
Cyclic and randomized stepsizes are widely used in the deep learning practice and can
often outperform standard stepsize choices such as constant stepsize in SGD. Despite their …
Random difference equations with subexponential innovations
We consider the random difference equations S =_d (X + S)Y and T =_d X + TY, where =_d
denotes equality in distribution, X and Y are two nonnegative random variables, and S and T …
Stochastic difference equation with diagonal matrices
E Damek - Annales de l'Institut Henri Poincare (B) Probabilites et …, 2025 - projecteuclid.org
We consider the stochastic equation X =_d AX + B where A is a random diagonal matrix and
X, B are random vectors, X, A are independent and the equation is meant in law. We prove …