On hyperparameter optimization of machine learning algorithms: Theory and practice
Machine learning algorithms have been widely used in various applications and
areas. To fit a machine learning model into different problems, its hyper-parameters must be …
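The snippet above concerns tuning hyper-parameters per problem. As a minimal illustration of one standard approach, here is a random-search sketch (the objective and search space are hypothetical placeholders, not from the paper):

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Minimize `objective` by sampling hyper-parameters uniformly from `space`.

    space: dict mapping hyper-parameter name -> (low, high) range.
    Returns (best_score, best_params).
    """
    rng = random.Random(seed)
    best_score, best_params = float("inf"), None
    for _ in range(n_trials):
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_score, best_params = score, params
    return best_score, best_params

# Toy objective: pretend validation loss is minimized at learning rate 0.1.
score, params = random_search(lambda p: (p["lr"] - 0.1) ** 2,
                              {"lr": (1e-4, 1.0)})
```

Random search is only a baseline; the surveyed methods (Bayesian optimization, bandit-based search, etc.) replace the uniform sampling step with something smarter.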
Artificial neural networks-based machine learning for wireless networks: A tutorial
In order to effectively provide ultra-reliable, low-latency communications and pervasive
connectivity for Internet of Things (IoT) devices, next-generation wireless networks can …
Training compute-optimal large language models
We investigate the optimal model size and number of tokens for training a transformer
language model under a given compute budget. We find that current large language models …
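The headline result of this work (the "Chinchilla" analysis) is that model size and token count should be scaled roughly in equal proportion with compute; the commonly cited rule of thumb distilled from it is about 20 training tokens per parameter, with training compute approximated as C ≈ 6ND. A sketch under those assumptions:

```python
def train_flops(n_params, n_tokens):
    # Standard approximation: ~6 FLOPs per parameter per training token.
    return 6 * n_params * n_tokens

def compute_optimal_tokens(n_params):
    # Rule of thumb from the compute-optimal analysis:
    # roughly 20 tokens per parameter at the optimum.
    return 20 * n_params

# Chinchilla itself: 70B parameters trained on 1.4T tokens.
n = 70e9
d = compute_optimal_tokens(n)
```

The exact exponents in the paper's fitted scaling laws differ slightly from this rule of thumb; the 20:1 ratio is the widely quoted approximation.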
Transformers as statisticians: Provable in-context learning with in-context algorithm selection
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …
Adaptive federated learning in resource constrained edge computing systems
Emerging technologies and applications including Internet of Things, social networking, and
crowd-sourcing generate large amounts of data at the network edge. Machine learning …
[BOOK] High-dimensional probability: An introduction with applications in data science
R Vershynin - 2018 - books.google.com
High-dimensional probability offers insight into the behavior of random vectors, random
matrices, random subspaces, and objects used to quantify uncertainty in high dimensions …
Personalized federated learning with moreau envelopes
Federated learning (FL) is a decentralized and privacy-preserving machine learning
technique in which a group of clients collaborate with a server to learn a global model …
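The Moreau envelope in the title can be written out explicitly. In the pFedMe-style formulation (notation here is mine), each client $i$ smooths its local loss $F_i$, and the server minimizes the average of the smoothed losses:

```latex
% Per-client Moreau envelope of the local loss F_i:
f_i(w) = \min_{\theta} \Big\{ F_i(\theta) + \frac{\lambda}{2}\,\|\theta - w\|^2 \Big\}
% Global problem solved by the server:
\min_{w} \; \frac{1}{N} \sum_{i=1}^{N} f_i(w)
```

The inner minimizer serves as client $i$'s personalized model, $w$ as the shared global model, and $\lambda$ controls how far personalization may drift from the global model.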
An empirical analysis of compute-optimal large language model training
We investigate the optimal model size and number of tokens for training a transformer
language model under a given compute budget. We find that current large language models …
Byzantine-robust distributed learning: Towards optimal statistical rates
In this paper, we develop distributed optimization algorithms that are provably robust against
Byzantine failures—arbitrary and potentially adversarial behavior, in distributed computing …
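A central primitive in this line of work is robust gradient aggregation, such as the coordinate-wise median (the paper also analyzes trimmed means). A minimal sketch with hypothetical worker gradients:

```python
import numpy as np

def coordinate_median(grads):
    """Aggregate worker gradients by taking the median of each coordinate.

    A single Byzantine worker can move the mean arbitrarily far, but the
    coordinate-wise median stays within the range of the honest majority.
    """
    return np.median(np.stack(grads), axis=0)

# Four honest workers and one Byzantine worker sending a huge gradient.
honest = [np.array([1.0, 2.0]) for _ in range(4)]
byzantine = np.array([1e9, -1e9])
agg = coordinate_median(honest + [byzantine])  # -> [1.0, 2.0]
```

Replacing the mean with the median is what buys robustness; the paper's contribution is showing such rules achieve order-optimal statistical rates.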
QSGD: Communication-efficient SGD via gradient quantization and encoding
Parallel implementations of stochastic gradient descent (SGD) have received significant
research attention, thanks to its excellent scalability properties. A fundamental barrier when …
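QSGD's core primitive is an unbiased stochastic quantizer that rounds each gradient coordinate onto a grid of s levels of the gradient's norm. A minimal sketch (variable names are mine, and the paper's Elias coding step is omitted):

```python
import numpy as np

def qsgd_quantize(v, s, rng=np.random.default_rng(0)):
    """Stochastically quantize v onto multiples of ||v||/s per coordinate.

    Each |v_i| / ||v|| is rounded up or down to a multiple of 1/s with
    probabilities chosen so the quantizer is unbiased: E[Q(v)] = v.
    """
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / norm * s                    # in [0, s]
    lower = np.floor(scaled)
    lower += rng.random(v.shape) < (scaled - lower)  # round up w.p. frac part
    return np.sign(v) * norm * lower / s

g = np.array([0.3, -0.4, 0.5])
q = qsgd_quantize(g, s=4)
```

Each worker then transmits only the norm, the signs, and the small integer levels, which is what makes the scheme communication-efficient.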