Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

A survey on model compression and acceleration for pretrained language models

C Xu, J McAuley - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and
long inference delay prevent Transformer-based pretrained language models (PLMs) from …

Extracting decision trees from medical texts: an overview of the Text2DT track in CHIP2022

W Zhu, W Li, X Wang, W Ji, Y Wu, J Chen… - China Health …, 2022 - Springer
This paper presents an overview of the Text2DT shared task held in the CHIP-2022 shared
tasks. The shared task addresses the challenging topic of automatically extracting the …

A survey on dynamic neural networks for natural language processing

C Xu, J McAuley - arXiv preprint arXiv:2202.07101, 2022 - arxiv.org
Effectively scaling large Transformer models is a main driver of recent advances in natural
language processing. Dynamic neural networks, as an emerging research direction, are …

Learned adapters are better than manually designed adapters

Y Zhang, P Wang, M Tan, W Zhu - Findings of the Association for …, 2023 - aclanthology.org
Recently, a series of works has looked into further improving adapter-based tuning by
manually designing better adapter architectures. Understandably, these manually designed …
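For orientation, here is a minimal sketch of the bottleneck adapter design that such works take as a starting point (down-projection, nonlinearity, up-projection, residual connection); the module name, sizes, and zero initialization are illustrative assumptions, not the architecture learned in the paper.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, nonlinearity, up-project, residual.
    Hypothetical sketch; learned-adapter methods search this structure rather than
    fixing it by hand."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_size)
        nn.init.zeros_(self.up.weight)   # near-identity behavior at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Usage: applied to the output of a frozen transformer sub-layer.
x = torch.randn(2, 16, 768)              # (batch, seq_len, hidden)
adapter = BottleneckAdapter()
print(adapter(x).shape)                  # torch.Size([2, 16, 768])
```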

BERT lost patience won't be robust to adversarial slowdown

Z Coalson, G Ritter, R Bobba… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we systematically evaluate the robustness of multi-exit language models
against adversarial slowdown. To audit their robustness, we design a slowdown attack that …
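As context, a minimal sketch of the confidence-threshold early-exit loop that multi-exit models use at inference and that a slowdown attack targets by keeping intermediate confidences low; the toy layers, heads, and threshold here are illustrative assumptions, not the paper's models or its attack.

```python
import torch
import torch.nn as nn

def early_exit_forward(layers, exit_heads, x, threshold=0.9):
    """Run layers sequentially and stop once an intermediate head is confident.
    A slowdown attack perturbs the input so every intermediate confidence stays
    below `threshold`, forcing computation through all layers."""
    exits_used = 0
    for layer, head in zip(layers, exit_heads):
        x = layer(x)
        exits_used += 1
        probs = torch.softmax(head(x.mean(dim=1)), dim=-1)   # pooled prediction
        if probs.max().item() >= threshold:
            break
    return probs, exits_used

# Toy stand-ins for transformer blocks and per-layer classifiers (illustrative).
hidden, num_labels, depth = 64, 3, 6
layers = nn.ModuleList([nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in range(depth)])
heads = nn.ModuleList([nn.Linear(hidden, num_labels) for _ in range(depth)])
probs, used = early_exit_forward(layers, heads, torch.randn(1, 8, hidden))
print(f"exited after {used}/{depth} layers")
```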

ALoRA: Allocating low-rank adaptation for fine-tuning large language models

Z Liu, J Lyn, W Zhu, X Tian, Y Graham - arXiv preprint arXiv:2403.16187, 2024 - arxiv.org
Parameter-efficient fine-tuning (PEFT) is widely studied for its effectiveness and efficiency in
the era of large language models. Low-rank adaptation (LoRA) has demonstrated …
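For reference, a minimal sketch of the standard LoRA reparameterization that ALoRA builds on: a frozen weight augmented with a trainable low-rank update scaled by alpha/r. The class name, shapes, and initialization follow common LoRA practice and do not reproduce ALoRA's rank-allocation mechanism itself.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))   # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(768, 768, r=8)
print(layer(torch.randn(2, 768)).shape)   # torch.Size([2, 768])
```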

MiLoRA: Efficient mixture of low-rank adaptation for large language models fine-tuning

J Zhang, Y Zhao, D Chen, X Tian, H Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Low-rank adaptation (LoRA) and its mixture-of-experts (MoE) variants are highly effective
parameter-efficient fine-tuning (PEFT) methods. However, they introduce significant latency …
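A schematic of the generic mixture-of-LoRA-experts layer that such methods extend: a frozen base projection, several low-rank experts, and a router that mixes their updates. The plain softmax gate shown here is an assumption for illustration and does not reproduce MiLoRA's latency-oriented routing.

```python
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    """Frozen base layer plus several LoRA experts; a router mixes their updates.
    Generic MoE-LoRA sketch, not MiLoRA's specific design."""
    def __init__(self, dim: int = 768, r: int = 4, num_experts: int = 4):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.base.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(num_experts, r, dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, dim, r))
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:           # x: (batch, dim)
        gates = torch.softmax(self.router(x), dim=-1)             # (batch, experts)
        # Per-expert low-rank updates B_e @ (A_e @ x): shape (experts, batch, dim)
        updates = torch.einsum("edr,erh,bh->ebd", self.B, self.A, x)
        mixed = torch.einsum("be,ebd->bd", gates, updates)        # gate-weighted sum
        return self.base(x) + mixed

layer = MoELoRALinear()
print(layer(torch.randn(2, 768)).shape)   # torch.Size([2, 768])
```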

DEED: Dynamic early exit on decoder for accelerating encoder-decoder transformer models

P Tang, P Zhu, T Li, S Appalaraju… - arXiv preprint arXiv …, 2023 - arxiv.org
Encoder-decoder transformer models have achieved great success on various vision-
language (VL) tasks, but they suffer from high inference latency. Typically, the decoder takes …
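A toy sketch of the general idea behind decoder early exit: at each decoding step, stop stacking decoder layers once the intermediate next-token prediction is confident. The stand-in layers, head, and threshold are assumptions for illustration, not DEED's trained multi-exit heads.

```python
import torch
import torch.nn as nn

def decode_step_with_early_exit(decoder_layers, lm_head, hidden, threshold=0.8):
    """One decoding step: run decoder layers bottom-up and emit the next token as
    soon as the intermediate prediction is confident, skipping remaining layers."""
    for depth, layer in enumerate(decoder_layers, start=1):
        hidden = layer(hidden)
        logits = lm_head(hidden[:, -1])                # predict from the last position
        probs = torch.softmax(logits, dim=-1)
        if probs.max().item() >= threshold or depth == len(decoder_layers):
            return probs.argmax(dim=-1), depth

# Toy stand-ins (illustrative sizes, not a real encoder-decoder model).
d_model, vocab, num_layers = 64, 100, 6
decoder_layers = nn.ModuleList([nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())
                                for _ in range(num_layers)])
lm_head = nn.Linear(d_model, vocab)
hidden = torch.randn(1, 5, d_model)                    # (batch, generated_len, d_model)
token, depth = decode_step_with_early_exit(decoder_layers, lm_head, hidden)
print(f"emitted token {token.item()} after {depth}/{num_layers} decoder layers")
```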

F-PABEE: flexible-patience-based early exiting for single-label and multi-label text classification tasks

X Gao, W Zhu, J Gao, C Yin - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Computational complexity and overthinking problems have become the bottlenecks for pre-
trained language models (PLMs) with millions or even trillions of parameters. A Flexible …
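A minimal sketch of the patience-based exit criterion (in the style of PABEE) that F-PABEE generalizes: inference stops once a fixed number of consecutive intermediate classifiers agree on the predicted label. The flexible similarity-based relaxation proposed in the paper is not reproduced, and the toy layers and heads are illustrative.

```python
import torch
import torch.nn as nn

def patience_based_exit(layers, heads, x, patience=2):
    """PABEE-style criterion: stop once `patience` consecutive intermediate
    classifiers agree on the predicted label."""
    prev_label, streak = None, 0
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        x = layer(x)
        label = head(x.mean(dim=1)).argmax(dim=-1)     # pooled per-layer prediction
        if prev_label is not None and torch.equal(label, prev_label):
            streak += 1                                # another consecutive agreement
        else:
            streak = 0                                 # disagreement resets the counter
        prev_label = label
        if streak >= patience:
            break
    return label, depth

# Toy stand-ins for a multi-exit classifier (illustrative sizes).
hidden, num_labels, num_layers = 64, 4, 8
layers = nn.ModuleList([nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in range(num_layers)])
heads = nn.ModuleList([nn.Linear(hidden, num_labels) for _ in range(num_layers)])
label, depth = patience_based_exit(layers, heads, torch.randn(1, 10, hidden))
print(f"prediction {label.tolist()} after {depth}/{num_layers} layers")
```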