Beyond efficiency: A systematic survey of resource-efficient large language models
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …
A survey on model compression and acceleration for pretrained language models
Although Transformer-based pretrained language models (PLMs) achieve state-of-the-art performance on many NLP tasks, their high energy cost and long inference delay prevent them from …
Extracting decision trees from medical texts: an overview of the Text2DT track in CHIP2022
This paper presents an overview of the Text2DT shared task held in the CHIP-2022 shared
tasks. The shared task addresses the challenging topic of automatically extracting the …
A survey on dynamic neural networks for natural language processing
Effectively scaling large Transformer models is a main driver of recent advances in natural
language processing. Dynamic neural networks, as an emerging research direction, are …
Learned adapters are better than manually designed adapters
Recently, a series of works has looked into further improving adapter-based tuning by
manually designing better adapter architectures. Understandably, these manually designed …
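For context on what "manually designed adapters" refers to, here is a minimal bottleneck adapter in PyTorch: a down-projection, nonlinearity, up-projection, and residual connection inserted into a frozen backbone. This is a generic sketch of the standard adapter design, not the learned architectures proposed in the paper above; the class name BottleneckAdapter and the sizes are illustrative.

```python
# Minimal bottleneck adapter, for illustration only. It shows the standard
# manually designed adapter (down-projection, nonlinearity, up-projection,
# residual connection) that adapter-tuning work starts from.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()
        # Near-identity initialization so the frozen backbone is unchanged at step 0.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: only the small bottleneck path is trained.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Usage: insert after a frozen Transformer sublayer's output.
x = torch.randn(2, 16, 768)   # (batch, sequence, hidden)
adapter = BottleneckAdapter(768)
y = adapter(x)                # same shape as x
```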
BERT lost patience won't be robust to adversarial slowdown
In this paper, we systematically evaluate the robustness of multi-exit language models
against adversarial slowdown. To audit their robustness, we design a slowdown attack that …
ALoRA: Allocating low-rank adaptation for fine-tuning large language models
Parameter-efficient fine-tuning (PEFT) is widely studied for its effectiveness and efficiency in
the era of large language models. Low-rank adaptation (LoRA) has demonstrated …
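As background for the LoRA-based entries above and below, here is a minimal sketch of a plain LoRA layer in PyTorch: a frozen linear weight plus a trainable low-rank update scaled by alpha / r. It does not implement ALoRA's rank-allocation mechanism; the class name and hyperparameters are illustrative.

```python
# Minimal LoRA layer, sketched from the generic LoRA formulation:
# y = x W^T + b + (alpha / r) * x A^T B^T, with W frozen and A, B trainable.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)   # frozen pretrained weight
        self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # zero init: no change at step 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_A.t() @ self.lora_B.t())

layer = LoRALinear(768, 768, r=8)
out = layer(torch.randn(2, 16, 768))   # only lora_A and lora_B receive gradients
```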
MiLoRA: Efficient mixture of low-rank adaptation for large language models fine-tuning
Low-rank adaptation (LoRA) and its mixture-of-experts (MOE) variants are highly effective
parameter-efficient fine-tuning (PEFT) methods. However, they introduce significant latency …
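The entry above concerns mixture-of-experts variants of LoRA. The sketch below illustrates only the general idea, assuming a token-level softmax router over several low-rank experts that share one frozen linear layer; it is not the MiLoRA design, and the class name MoLoRALinear is made up for this example.

```python
# Generic mixture-of-LoRA-experts layer, for illustration: several low-rank
# updates share one frozen linear layer, and a small router mixes them per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoLoRALinear(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        for p in self.base.parameters():
            p.requires_grad_(False)                      # frozen pretrained weight
        self.A = nn.Parameter(torch.randn(num_experts, r, dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, dim, r))
        self.router = nn.Linear(dim, num_experts)        # token-level gating
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = F.softmax(self.router(x), dim=-1)        # (batch, seq, experts)
        # Each expert's low-rank update, then a gate-weighted sum over experts.
        low = torch.einsum("bsd,erd->bser", x, self.A)   # (batch, seq, experts, r)
        up = torch.einsum("bser,edr->bsed", low, self.B) # (batch, seq, experts, dim)
        delta = torch.einsum("bse,bsed->bsd", gates, up)
        return self.base(x) + self.scaling * delta

layer = MoLoRALinear(768, num_experts=4, r=8)
out = layer(torch.randn(2, 16, 768))
```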
DEED: Dynamic early exit on decoder for accelerating encoder-decoder transformer models
Encoder-decoder transformer models have achieved great success on various vision-
language (VL) tasks, but they suffer from high inference latency. Typically, the decoder takes …
F-PABEE: flexible-patience-based early exiting for single-label and multi-label text classification tasks
X Gao, W Zhu, J Gao, C Yin - ICASSP, 2023 - ieeexplore.ieee.org
Computational complexity and overthinking problems have become the bottlenecks for pre-trained language models (PLMs) with millions or even trillions of parameters. A Flexible …
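Several entries above concern early exiting. As a generic illustration of the patience-based inference loop that PABEE-style methods (which F-PABEE extends) build on, the toy sketch below attaches a classifier to every layer and stops once a fixed number of consecutive layers agree on the prediction; it is not the F-PABEE criterion, and all module names are stand-ins.

```python
# Toy patience-based early-exit loop: exit as soon as `patience` consecutive
# layer-wise classifiers agree on the predicted class.
import torch
import torch.nn as nn

def patience_early_exit(hidden: torch.Tensor,
                        layers: nn.ModuleList,
                        exit_heads: nn.ModuleList,
                        patience: int = 2) -> tuple[int, int]:
    """Return (predicted class, index of the layer where inference stopped)."""
    last_pred, streak = None, 0
    for i, (layer, head) in enumerate(zip(layers, exit_heads)):
        hidden = layer(hidden)
        pred = head(hidden.mean(dim=1)).argmax(dim=-1).item()  # pooled per-layer logits
        streak = streak + 1 if pred == last_pred else 1
        last_pred = pred
        if streak >= patience:          # enough consecutive layers agree: exit early
            return last_pred, i
    return last_pred, len(layers) - 1   # fell through: behave like the full model

# Tiny stand-in "model": 6 feed-forward layers, each with its own classifier.
dim, num_classes = 32, 3
layers = nn.ModuleList(nn.Sequential(nn.Linear(dim, dim), nn.Tanh()) for _ in range(6))
heads = nn.ModuleList(nn.Linear(dim, num_classes) for _ in range(6))
pred, exit_layer = patience_early_exit(torch.randn(1, 8, dim), layers, heads, patience=2)
```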