On efficient training of large-scale deep learning models: A literature review
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …
Large-scale methods for distributionally robust optimization
We propose and analyze algorithms for distributionally robust optimization of convex losses
with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets. We prove …
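For reference, a generic min-max form of these objectives (a sketch using standard conventions, not quoted from the abstract; the level $\alpha$, radius $\rho$, and the exact $\chi^2$ normalization are assumptions):

$$
\min_{\theta}\ \sup_{q \in \mathcal{Q}} \sum_{i=1}^{n} q_i\, \ell(\theta; x_i),
\qquad
\mathcal{Q}_{\mathrm{CVaR}_\alpha} = \Big\{ q \in \Delta_n : q_i \le \tfrac{1}{\alpha n} \Big\},
\qquad
\mathcal{Q}_{\chi^2} = \Big\{ q \in \Delta_n : \tfrac{1}{n} \sum_{i=1}^{n} (n q_i - 1)^2 \le \rho \Big\}.
$$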
Prioritized training on points that are learnable, worth learning, and not yet learnt
Training on web-scale data can take months. But much computation and time is wasted on
redundant and noisy points that are already learnt or not learnable. To accelerate training …
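As a rough illustration of this kind of online batch selection (a minimal sketch only; the paper's actual criterion combines signals beyond the raw training loss, and the names below are made up for the example):

import torch

def select_subbatch(model, xb, yb, keep_fraction=0.1):
    # Score every point in a large candidate batch without building a graph,
    # then keep only the highest-loss points for the actual gradient step.
    with torch.no_grad():
        per_example_loss = torch.nn.functional.cross_entropy(
            model(xb), yb, reduction="none"
        )
    k = max(1, int(keep_fraction * len(xb)))
    idx = torch.topk(per_example_loss, k).indices
    return xb[idx], yb[idx]

# Usage: forward the big batch cheaply, then backpropagate only on the selection.
# x_sel, y_sel = select_subbatch(model, xb, yb)
# loss = torch.nn.functional.cross_entropy(model(x_sel), y_sel)
# loss.backward(); optimizer.step()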
No train no gain: Revisiting efficient training algorithms for transformer-based language models
The computation necessary for training Transformer-based language models has
skyrocketed in recent years. This trend has motivated research on efficient training …
ACPL: Anti-curriculum pseudo-labelling for semi-supervised medical image classification
Effective semi-supervised learning (SSL) in medical image analysis (MIA) must address two
challenges: 1) work effectively on both multi-class (e.g., lesion classification) and multi-label …
When do curricula work?
Inspired by human learning, researchers have proposed ordering examples during training
based on their difficulty. Both curriculum learning, exposing a network to easier examples …
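A minimal sketch of what such ordering looks like in code (illustrative only; the per-example difficulty scores are assumed to come from some external scoring procedure):

import numpy as np

def curriculum_order(difficulty, anti=False):
    # Ascending difficulty = curriculum (easy first); reversed = anti-curriculum (hard first).
    order = np.argsort(difficulty)
    return order[::-1] if anti else order

difficulty = np.random.rand(1000)          # placeholder difficulty scores, one per example
epoch_order = curriculum_order(difficulty) # present examples to the model in this order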
Chaos as an interpretable benchmark for forecasting and data-driven modelling
W. Gilpin, arXiv preprint arXiv:2110.05266, 2021, arxiv.org
The striking fractal geometry of strange attractors underscores the generative nature of
chaos: like probability distributions, chaotic systems can be repeatedly measured to produce …
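As one concrete instance of repeatedly "measuring" a chaotic system to produce data (a sketch using the Lorenz attractor; the benchmark itself spans many more systems):

import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

t_eval = np.linspace(0.0, 50.0, 5000)
sol = solve_ivp(lorenz, (0.0, 50.0), [1.0, 1.0, 1.0], t_eval=t_eval)
trajectory = sol.y.T   # (5000, 3) array: one sampled trajectory for forecasting tasks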
On Efficient Training of Large-Scale Deep Learning Models
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …
Rank-based decomposable losses in machine learning: A survey
Recent works have revealed an essential paradigm in designing loss functions that
differentiate individual losses versus aggregate losses. The individual loss measures the …
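For context, the distinction can be written with standard definitions (not quoted from the survey): with individual losses $\ell_i = \ell(f(x_i), y_i)$, common aggregate losses include the average, the maximum, and the rank-based average top-$k$:

$$
L_{\mathrm{avg}} = \frac{1}{n} \sum_{i=1}^{n} \ell_i,
\qquad
L_{\max} = \max_{1 \le i \le n} \ell_i,
\qquad
L_{\mathrm{ATk}} = \frac{1}{k} \sum_{j=1}^{k} \ell_{[j]},
$$

where $\ell_{[1]} \ge \cdots \ge \ell_{[n]}$ are the individual losses sorted in descending order.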
Calibrated selective classification
Selective classification allows models to abstain from making predictions (e.g., say "I don't
know") when in doubt in order to obtain better effective accuracy. While typical selective …
know") when in doubt in order to obtain better effective accuracy. While typical selective …