Split computing and early exiting for deep learning applications: Survey and research challenges

Y Matsubara, M Levorato, F Restuccia - ACM Computing Surveys, 2022 - dl.acm.org
Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep
neural networks (DNNs) to execute complex inference tasks such as image classification …

The eras and trends of automatic short answer grading

S Burrows, I Gurevych, B Stein - International journal of artificial …, 2015 - Springer
Automatic short answer grading (ASAG) is the task of assessing short natural language
responses to objective questions using computational methods. The active research in this …

Deja Vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

Pretraining language models with human preferences

T Korbak, K Shi, A Chen, RV Bhalerao… - International …, 2023 - proceedings.mlr.press
Abstract Language models (LMs) are pretrained to imitate text from large and diverse
datasets that contain content that would violate human preferences if generated by an LM …

Fine-tuning language models with just forward passes

S Malladi, T Gao, E Nichani… - Advances in …, 2023 - proceedings.neurips.cc
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but
as LMs grow in size, backpropagation requires a prohibitively large amount of memory …

Unified-IO: A unified model for vision, language, and multi-modal tasks

J Lu, C Clark, R Zellers, R Mottaghi… - The Eleventh …, 2022 - openreview.net
We propose Unified-IO, a model that performs a large variety of AI tasks spanning classical
computer vision tasks, including pose estimation, object detection, depth estimation and …

Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

M Wortsman, G Ilharco, SY Gadre… - International …, 2022 - proceedings.mlr.press
The conventional recipe for maximizing model accuracy is to (1) train multiple models with
various hyperparameters and (2) pick the individual model which performs best on a held …

Rethinking the role of demonstrations: What makes in-context learning work?

S Min, X Lyu, A Holtzman, M Artetxe, M Lewis… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models (LMs) are able to in-context learn--perform a new task via inference
alone by conditioning on a few input-label pairs (demonstrations) and making predictions for …

Can ChatGPT understand too? A comparative study on ChatGPT and fine-tuned BERT

Q Zhong, L Ding, J Liu, B Du, D Tao - arXiv preprint arXiv:2302.10198, 2023 - arxiv.org
Recently, ChatGPT has attracted great attention, as it can generate fluent and high-quality
responses to human inquiries. Several prior studies have shown that ChatGPT attains …

PowerInfer: Fast large language model serving with a consumer-grade GPU

Y Song, Z Mi, H **e, H Chen - Proceedings of the ACM SIGOPS 30th …, 2024 - dl.acm.org
This paper introduces PowerInfer, a high-speed Large Language Model (LLM) inference
engine on a personal computer (PC) equipped with a single consumer-grade GPU. The key …