Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

A survey on model compression and acceleration for pretrained language models

C Xu, J McAuley - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and
long inference delay prevent Transformer-based pretrained language models (PLMs) from …

Extracting decision trees from medical texts: an overview of the Text2DT track in CHIP2022

W Zhu, W Li, X Wang, W Ji, Y Wu, J Chen… - China Health …, 2022 - Springer
This paper presents an overview of the Text2DT shared task held in the CHIP-2022 shared
tasks. The shared task addresses the challenging topic of automatically extracting the …

A survey on dynamic neural networks for natural language processing

C Xu, J McAuley - arXiv preprint arXiv:2202.07101, 2022 - arxiv.org
Effectively scaling large Transformer models is a main driver of recent advances in natural
language processing. Dynamic neural networks, as an emerging research direction, are …

Learned adapters are better than manually designed adapters

Y Zhang, P Wang, M Tan, W Zhu - Findings of the Association for …, 2023 - aclanthology.org
Recently, a series of works has looked into further improving adapter-based tuning by
manually designing better adapter architectures. Understandably, these manually designed …
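For orientation, here is a minimal sketch of the bottleneck adapter design that such works take as a starting point (down-projection, nonlinearity, up-projection, residual connection); the module name, sizes, and zero initialization are illustrative assumptions, not the architecture learned in the paper.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, nonlinearity, up-project, residual.
    Hypothetical sketch; learned-adapter methods search this structure rather than
    fixing it by hand."""
    def __init__(self, hidden_size: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden_size)
        nn.init.zeros_(self.up.weight)   # near-identity behavior at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Usage: applied to the output of a frozen transformer sub-layer.
x = torch.randn(2, 16, 768)              # (batch, seq_len, hidden)
adapter = BottleneckAdapter()
print(adapter(x).shape)                  # torch.Size([2, 16, 768])
```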

BERT lost patience won't be robust to adversarial slowdown

Z Coalson, G Ritter, R Bobba… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we systematically evaluate the robustness of multi-exit language models
against adversarial slowdown. To audit their robustness, we design a slowdown attack that …
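As context, a minimal sketch of the confidence-threshold early-exit loop that multi-exit models use at inference and that a slowdown attack targets by keeping intermediate confidences low; the toy layers, heads, and threshold here are illustrative assumptions, not the paper's models or its attack.

```python
import torch
import torch.nn as nn

def early_exit_forward(layers, exit_heads, x, threshold=0.9):
    """Run layers sequentially and stop once an intermediate head is confident.
    A slowdown attack perturbs the input so every intermediate confidence stays
    below `threshold`, forcing computation through all layers."""
    exits_used = 0
    for layer, head in zip(layers, exit_heads):
        x = layer(x)
        exits_used += 1
        probs = torch.softmax(head(x.mean(dim=1)), dim=-1)   # pooled prediction
        if probs.max().item() >= threshold:
            break
    return probs, exits_used

# Toy stand-ins for transformer blocks and per-layer classifiers (illustrative).
hidden, num_labels, depth = 64, 3, 6
layers = nn.ModuleList([nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in range(depth)])
heads = nn.ModuleList([nn.Linear(hidden, num_labels) for _ in range(depth)])
probs, used = early_exit_forward(layers, heads, torch.randn(1, 8, hidden))
print(f"exited after {used}/{depth} layers")
```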

ALoRA: Allocating low-rank adaptation for fine-tuning large language models

Z Liu, J Lyn, W Zhu, X Tian, Y Graham - arXiv preprint arXiv:2403.16187, 2024 - arxiv.org
Parameter-efficient fine-tuning (PEFT) is widely studied for its effectiveness and efficiency in
the era of large language models. Low-rank adaptation (LoRA) has demonstrated …
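For reference, a minimal sketch of the standard LoRA reparameterization that ALoRA builds on: a frozen weight augmented with a trainable low-rank update scaled by alpha/r. The class name, shapes, and initialization follow common LoRA practice and do not reproduce ALoRA's rank-allocation mechanism itself.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))   # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(768, 768, r=8)
print(layer(torch.randn(2, 768)).shape)   # torch.Size([2, 768])
```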

MiLoRA: Efficient mixture of low-rank adaptation for large language models fine-tuning

J Zhang, Y Zhao, D Chen, X Tian, H Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Low-rank adaptation (LoRA) and its mixture-of-experts (MoE) variants are highly effective
parameter-efficient fine-tuning (PEFT) methods. However, they introduce significant latency …
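A schematic of the generic mixture-of-LoRA-experts layer that such methods extend: a frozen base projection, several low-rank experts, and a router that mixes their updates. The plain softmax gate shown here is an assumption for illustration and does not reproduce MiLoRA's latency-oriented routing.

```python
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    """Frozen base layer plus several LoRA experts; a router mixes their updates.
    Generic MoE-LoRA sketch, not MiLoRA's specific design."""
    def __init__(self, dim: int = 768, r: int = 4, num_experts: int = 4):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.base.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(num_experts, r, dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, dim, r))
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:           # x: (batch, dim)
        gates = torch.softmax(self.router(x), dim=-1)             # (batch, experts)
        # Per-expert low-rank updates B_e @ (A_e @ x): shape (experts, batch, dim)
        updates = torch.einsum("edr,erh,bh->ebd", self.B, self.A, x)
        mixed = torch.einsum("be,ebd->bd", gates, updates)        # gate-weighted sum
        return self.base(x) + mixed

layer = MoELoRALinear()
print(layer(torch.randn(2, 768)).shape)   # torch.Size([2, 768])
```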

DEED: Dynamic early exit on decoder for accelerating encoder-decoder transformer models

P Tang, P Zhu, T Li, S Appalaraju… - arXiv preprint arXiv …, 2023 - arxiv.org
Encoder-decoder transformer models have achieved great success on various vision-
language (VL) tasks, but they suffer from high inference latency. Typically, the decoder takes …
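A toy sketch of the general idea behind decoder early exit: at each decoding step, stop stacking decoder layers once the intermediate next-token prediction is confident. The stand-in layers, head, and threshold are assumptions for illustration, not DEED's trained multi-exit heads.

```python
import torch
import torch.nn as nn

def decode_step_with_early_exit(decoder_layers, lm_head, hidden, threshold=0.8):
    """One decoding step: run decoder layers bottom-up and emit the next token as
    soon as the intermediate prediction is confident, skipping remaining layers."""
    for depth, layer in enumerate(decoder_layers, start=1):
        hidden = layer(hidden)
        logits = lm_head(hidden[:, -1])                # predict from the last position
        probs = torch.softmax(logits, dim=-1)
        if probs.max().item() >= threshold or depth == len(decoder_layers):
            return probs.argmax(dim=-1), depth

# Toy stand-ins (illustrative sizes, not a real encoder-decoder model).
d_model, vocab, num_layers = 64, 100, 6
decoder_layers = nn.ModuleList([nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())
                                for _ in range(num_layers)])
lm_head = nn.Linear(d_model, vocab)
hidden = torch.randn(1, 5, d_model)                    # (batch, generated_len, d_model)
token, depth = decode_step_with_early_exit(decoder_layers, lm_head, hidden)
print(f"emitted token {token.item()} after {depth}/{num_layers} decoder layers")
```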

F-PABEE: flexible-patience-based early exiting for single-label and multi-label text classification tasks

X Gao, W Zhu, J Gao, C Yin - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Computational complexity and overthinking problems have become the bottlenecks for pre-
trained language models (PLMs) with millions or even trillions of parameters. A Flexible …
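A minimal sketch of the patience-based exit criterion (in the style of PABEE) that F-PABEE generalizes: inference stops once a fixed number of consecutive intermediate classifiers agree on the predicted label. The flexible similarity-based relaxation proposed in the paper is not reproduced, and the toy layers and heads are illustrative.

```python
import torch
import torch.nn as nn

def patience_based_exit(layers, heads, x, patience=2):
    """PABEE-style criterion: stop once `patience` consecutive intermediate
    classifiers agree on the predicted label."""
    prev_label, streak = None, 0
    for depth, (layer, head) in enumerate(zip(layers, heads), start=1):
        x = layer(x)
        label = head(x.mean(dim=1)).argmax(dim=-1)     # pooled per-layer prediction
        if prev_label is not None and torch.equal(label, prev_label):
            streak += 1                                # another consecutive agreement
        else:
            streak = 0                                 # disagreement resets the counter
        prev_label = label
        if streak >= patience:
            break
    return label, depth

# Toy stand-ins for a multi-exit classifier (illustrative sizes).
hidden, num_labels, num_layers = 64, 4, 8
layers = nn.ModuleList([nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU()) for _ in range(num_layers)])
heads = nn.ModuleList([nn.Linear(hidden, num_labels) for _ in range(num_layers)])
label, depth = patience_based_exit(layers, heads, torch.randn(1, 10, hidden))
print(f"prediction {label.tolist()} after {depth}/{num_layers} layers")
```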