A high-bias, low-variance introduction to machine learning for physicists

P Mehta, M Bukov, CH Wang, AGR Day, C Richardson… - Physics Reports, 2019 - Elsevier
Machine Learning (ML) is one of the most exciting and dynamic areas of modern
research and application. The purpose of this review is to provide an introduction to the core …

Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
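
A minimal sketch of the nonconvex approach this line of work studies (the sizes, step size, and loss here are illustrative choices, not the survey's algorithms): parameterize the rank-$r$ matrix as $UV^\top$ with tall, thin factors and run plain gradient descent on the squared Frobenius loss, instead of solving a convex relaxation over the full matrix.

import numpy as np

rng = np.random.default_rng(0)
d, r = 30, 2
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))   # ground-truth rank-r matrix
M /= np.linalg.norm(M, 2)                                       # normalize its spectral norm to 1

# Nonconvex (factored) parameterization: approximate M by U @ V.T
U = 0.1 * rng.standard_normal((d, r))
V = 0.1 * rng.standard_normal((d, r))
eta = 0.2
for _ in range(500):
    R = U @ V.T - M                      # residual
    gU, gV = R @ V, R.T @ U              # gradients of 0.5 * ||U V^T - M||_F^2
    U, V = U - eta * gU, V - eta * gV

print(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))          # relative error: near zero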

Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … conference on machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …
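
Loosely, the style of guarantee proved in this line of work: write $\mathbf{u}(k)$ for the network's predictions on the $n$ training inputs after $k$ full-batch gradient steps and $\mathbf{H}^{\infty}$ for a Gram matrix determined by the data and the architecture. If the network is sufficiently over-parameterized and the step size $\eta$ is small enough, then

$$\|\mathbf{y}-\mathbf{u}(k+1)\|_2^2 \;\le\; \left(1-\frac{\eta\,\lambda_{\min}(\mathbf{H}^{\infty})}{2}\right)\|\mathbf{y}-\mathbf{u}(k)\|_2^2,$$

so the training loss decays geometrically to zero despite the non-convexity; the over-parameterization is what keeps the time-varying Gram matrix close to $\mathbf{H}^{\infty}$ and its smallest eigenvalue bounded away from zero.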

Gradient descent provably optimizes over-parameterized neural networks

SS Du, X Zhai, B Poczos, A Singh - arXiv preprint arXiv:1810.02054, 2018 - arxiv.org
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
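
A toy sketch of the setting (a two-layer ReLU network with fixed $\pm 1$ output weights, a common simplification in this literature; the width, step size, and data are illustrative choices): with enough hidden units, full-batch gradient descent from random initialization drives the training loss to essentially zero even for arbitrary labels.

import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10, 50, 1000                           # n points, input dimension d, hidden width m >> n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-norm inputs
y = rng.standard_normal(n)                       # arbitrary (even random) labels

W = rng.standard_normal((m, d))                  # trainable first layer, random initialization
a = rng.choice([-1.0, 1.0], size=m)              # fixed +/-1 output layer

eta = 1.0
for t in range(201):
    Z = X @ W.T                                  # pre-activations, shape (n, m)
    u = (np.maximum(Z, 0.0) @ a) / np.sqrt(m)    # network outputs
    r = u - y
    if t % 50 == 0:
        print(t, 0.5 * np.sum(r ** 2))           # training loss heading to ~0
    gW = ((r[:, None] * (Z > 0) * a).T @ X) / np.sqrt(m)   # gradient of 0.5 * ||u - y||^2 w.r.t. W
    W -= eta * gW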

Dying ReLU and initialization: Theory and numerical examples

L Lu, Y Shin, Y Su, GE Karniadakis - arXiv preprint arXiv:1903.06733, 2019 - arxiv.org
The dying ReLU refers to the problem when ReLU neurons become inactive and only output
0 for any input. There are many empirical and heuristic explanations of why ReLU neurons …
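
A self-contained numerical illustration of the definition (a toy example, not taken from the paper): a ReLU unit whose pre-activation is negative on every training input outputs 0 everywhere, receives zero gradient, and therefore cannot recover under gradient-based training.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))

# One ReLU unit h(x) = max(w.x + b, 0). A sufficiently negative bias (e.g. from a bad
# initialization or a large gradient step) makes the pre-activation negative on every input.
w = rng.standard_normal(10)
b = -20.0

pre = X @ w + b
out = np.maximum(pre, 0.0)
active = pre > 0                            # inputs on which the unit would pass a gradient

print("active inputs:", int(active.sum()))  # 0 -> the unit is "dead"
print("max output:", out.max())             # 0.0
# The output and its gradient are identically 0 on the data, so backpropagation
# sends no signal to w or b, and the unit stays dead for the rest of training.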

First-order methods almost always avoid strict saddle points

JD Lee, I Panageas, G Piliouras, M Simchowitz… - Mathematical …, 2019 - Springer
We establish that first-order methods avoid strict saddle points for almost all initializations.
Our results apply to a wide variety of first-order methods, including (manifold) gradient …
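
A toy illustration of the statement (the function below is an illustrative example, not from the paper): f(x, y) = x^2/2 - y^2/2 + y^4/4 has a strict saddle at the origin and minima at (0, +/-1); gradient descent started exactly on the saddle's stable manifold (y = 0) converges to the saddle, but that set has measure zero, so a generic random initialization escapes it and reaches a minimizer.

import numpy as np

# f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4
# (0, 0) is a strict saddle (the Hessian has a -1 eigenvalue); the minimizers are (0, +/-1).
def grad(p):
    x, y = p
    return np.array([x, -y + y**3])

def gd(p0, eta=0.1, steps=500):
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = p - eta * grad(p)
    return p

print(gd([0.5, 0.0]))                        # on the stable manifold -> stuck near the saddle (0, 0)
rng = np.random.default_rng(0)
print(gd(0.5 * rng.standard_normal(2)))      # generic random start -> ends near a minimizer (0, +/-1)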

Understanding the acceleration phenomenon via high-resolution differential equations

B Shi, SS Du, MI Jordan, WJ Su - Mathematical Programming, 2022 - Springer
Gradient-based optimization algorithms can be studied from the perspective of limiting
ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not …
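
For context (this is the standard "low-resolution" limit from the earlier literature that the paper refines): with step size $s \to 0$, Nesterov's accelerated gradient method for convex $f$ tracks the ODE

$$\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f(X(t)) = 0,$$

while plain gradient descent tracks the gradient flow $\dot{X}(t) = -\nabla f(X(t))$. The high-resolution ODEs studied here retain additional $O(\sqrt{s})$ gradient-correction terms that the $s \to 0$ limit discards, which is what lets them distinguish, for example, Nesterov's method from Polyak's heavy-ball method.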

Fixing by mixing: A recipe for optimal Byzantine ML under heterogeneity

Y Allouah, S Farhadkhani… - International …, 2023 - proceedings.mlr.press
Byzantine machine learning (ML) aims to ensure the resilience of distributed learning
algorithms to misbehaving (or Byzantine) machines. Although this problem received …
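
The basic primitive in this area is a robust aggregation rule that replaces plain averaging of the workers' gradients. A minimal sketch with one generic such rule, the coordinate-wise trimmed mean (used purely as an illustration; it is not the mixing-based recipe proposed in the paper):

import numpy as np

def coordinate_trimmed_mean(grads, f):
    """Per coordinate, drop the f largest and f smallest values and average the rest."""
    g = np.sort(np.stack(grads), axis=0)          # shape (num_workers, dim)
    return g[f:len(grads) - f].mean(axis=0)

rng = np.random.default_rng(0)
true_grad = np.ones(4)
honest = [true_grad + 0.1 * rng.standard_normal(4) for _ in range(8)]
byzantine = [np.full(4, 100.0) for _ in range(2)]          # two workers send arbitrary vectors

print(np.mean(honest + byzantine, axis=0))                 # plain averaging is ruined
print(coordinate_trimmed_mean(honest + byzantine, f=2))    # robust rule stays close to the truth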

Neural collapse with normalized features: A geometric analysis over the Riemannian manifold

C Yaras, P Wang, Z Zhu… - Advances in neural …, 2022 - proceedings.neurips.cc
When training overparameterized deep networks for classification tasks, it has been widely
observed that the learned features exhibit a so-called "neural collapse" phenomenon. More …
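
A quick numerical illustration of the geometry the snippet refers to (a self-contained check of the standard simplex-ETF description of collapse, not the paper's Riemannian analysis): at collapse, the K class-mean feature directions behave like unit vectors whose pairwise cosines all equal -1/(K-1).

import numpy as np

# Neural collapse, informally: late in training, the last-layer features of each class
# concentrate at their class mean, and the K class means spread out as a simplex
# equiangular tight frame (ETF): unit vectors with pairwise inner product -1/(K-1).
K = 4
M = np.eye(K) - np.ones((K, K)) / K               # center the standard basis ...
M /= np.linalg.norm(M, axis=1, keepdims=True)     # ... and renormalize: a simplex ETF in R^K

cos = M @ M.T
print(np.round(cos, 3))                           # off-diagonal entries are all -1/(K-1)
print(-1.0 / (K - 1))                             # = -0.333... for K = 4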

On the power of over-parametrization in neural networks with quadratic activation

S Du, J Lee - International Conference on Machine Learning, 2018 - proceedings.mlr.press
We provide new theoretical insights on why over-parametrization is effective in learning
neural networks. For a $k$-hidden-node shallow network with quadratic activation and $n$ …
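
A toy sketch of the setting (the sizes, step size, and teacher are illustrative choices): a shallow network f(x) = sum_j (w_j^T x)^2 with quadratic activation and k hidden units, deliberately over-parameterized, fitted by plain gradient descent from a small random initialization to data generated by a single quadratic unit.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10, 20                        # n samples, input dim d, k hidden units (over-parameterized)
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = (X @ w_star) ** 2                        # teacher: a single quadratic-activation unit

W = 0.01 * rng.standard_normal((k, d))       # student weights, small random initialization
eta = 0.003
for _ in range(2000):
    Z = X @ W.T                              # (n, k) pre-activations
    r = (Z ** 2).sum(axis=1) - y             # residuals of f(x) = sum_j (w_j . x)^2
    gW = 2.0 * (r[:, None] * Z).T @ X / n    # gradient of the mean squared error
    W -= eta * gW

pred = ((X @ W.T) ** 2).sum(axis=1)
print(np.mean((pred - y) ** 2) / np.mean(y ** 2))   # relative training error: small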