On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

Large-scale methods for distributionally robust optimization

D Levy, Y Carmon, JC Duchi… - Advances in Neural …, 2020 - proceedings.neurips.cc
We propose and analyze algorithms for distributionally robust optimization of convex losses
with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets. We prove …

Prioritized training on points that are learnable, worth learning, and not yet learnt

S Mindermann, JM Brauner… - International …, 2022 - proceedings.mlr.press
Training on web-scale data can take months. But much computation and time is wasted on
redundant and noisy points that are already learnt or not learnable. To accelerate training …

No train no gain: Revisiting efficient training algorithms for transformer-based language models

J Kaddour, O Key, P Nawrot… - Advances in Neural …, 2024 - proceedings.neurips.cc
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …

ACPL: Anti-curriculum pseudo-labelling for semi-supervised medical image classification

F Liu, Y Tian, Y Chen, Y Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Effective semi-supervised learning (SSL) in medical image analysis (MIA) must address two
challenges: 1) work effectively on both multi-class (e.g., lesion classification) and multi-label …

When do curricula work?

X Wu, E Dyer, B Neyshabur - arXiv preprint arXiv:2012.03107, 2020 - arxiv.org
Inspired by human learning, researchers have proposed ordering examples during training
based on their difficulty. Both curriculum learning, exposing a network to easier examples …

Chaos as an interpretable benchmark for forecasting and data-driven modelling

W Gilpin - arXiv preprint arXiv:2110.05266, 2021 - arxiv.org
The striking fractal geometry of strange attractors underscores the generative nature of
chaos: like probability distributions, chaotic systems can be repeatedly measured to produce …

On Efficient Training of Large-Scale Deep Learning Models

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - ACM Computing Surveys, 2024 - dl.acm.org
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …

Rank-based decomposable losses in machine learning: A survey

S Hu, X Wang, S Lyu - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Recent works have revealed an essential paradigm in designing loss functions that
differentiate individual losses versus aggregate losses. The individual loss measures the …

Calibrated selective classification

A Fisch, T Jaakkola, R Barzilay - arXiv preprint arXiv:2208.12084, 2022 - arxiv.org
Selective classification allows models to abstain from making predictions (e.g., say "I don't
know") when in doubt in order to obtain better effective accuracy. While typical selective …