Split computing and early exiting for deep learning applications: Survey and research challenges

Y Matsubara, M Levorato, F Restuccia - ACM Computing Surveys, 2022 - dl.acm.org
Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep
neural networks (DNNs) to execute complex inference tasks such as image classification …

Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …

Language models are Super Mario: Absorbing abilities from homologous models as a free lunch

L Yu, B Yu, H Yu, F Huang, Y Li - Forty-first International Conference on Machine Learning, 2024 - openreview.net
In this paper, we unveil that Language Models (LMs) can acquire new capabilities by
assimilating parameters from homologous models without retraining or GPUs. We first …

Finetuned language models are zero-shot learners

J Wei, M Bosma, VY Zhao, K Guu, AW Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores a simple method for improving the zero-shot learning abilities of
language models. We show that instruction tuning--finetuning language models on a …

True few-shot learning with language models

E Perez, D Kiela, K Cho - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
Pretrained language models (LMs) perform well on many tasks even when learning from a
few examples, but prior work uses many held-out examples to tune various aspects of …

Documenting large webtext corpora: A case study on the colossal clean crawled corpus

J Dodge, M Sap, A Marasović, W Agnew… - arXiv preprint arXiv …, 2021 - arxiv.org
Large language models have led to remarkable progress on many NLP tasks, and
researchers are turning to ever-larger text corpora to train them. Some of the largest corpora …

Time travel in LLMs: Tracing data contamination in large language models

S Golchin, M Surdeanu - arXiv preprint arXiv:2308.08493, 2023 - arxiv.org
Data contamination, i.e., the presence of test data from downstream tasks in the training data
of large language models (LLMs), is a potentially major issue in measuring LLMs' real …