AMMUS: A survey of transformer-based pretrained models in natural language processing

KS Kalyan, A Rajasekharan, S Sangeetha - arXiv preprint arXiv …, 2021 - arxiv.org
Transformer-based pretrained language models (T-PTLMs) have achieved great success in
almost every NLP task. The evolution of these models started with GPT and BERT. These …

A systematic review of transformer-based pre-trained language models through self-supervised learning

E Kotei, R Thirunavukarasu - Information, 2023 - mdpi.com
Transfer learning is a technique utilized in deep learning applications to transmit learned
inference to a different target domain. The approach is mainly to solve the problem of a few …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

On transferability of prompt tuning for natural language processing

Y Su, X Wang, Y Qin, CM Chan, Y Lin, H Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Prompt tuning (PT) is a promising parameter-efficient method to utilize extremely large pre-
trained language models (PLMs), which can achieve comparable performance to full …

CPM-2: Large-scale cost-effective pre-trained language models

Z Zhang, Y Gu, X Han, S Chen, C Xiao, Z Sun, Y Yao… - AI Open, 2021 - Elsevier
In recent years, the size of pre-trained language models (PLMs) has grown by leaps and
bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

bert2BERT: Towards reusable pretrained language models

C Chen, Y Yin, L Shang, X Jiang, Y Qin, F Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
In recent years, researchers tend to pre-train ever-larger language models to explore the
upper limit of deep models. However, large language model pre-training costs intensive …

ELLE: Efficient lifelong pre-training for emerging data

Y Qin, J Zhang, Y Lin, Z Liu, P Li, M Sun… - arXiv preprint arXiv …, 2022 - arxiv.org
Current pre-trained language models (PLMs) are typically trained with static data, ignoring
that in real-world scenarios, streaming data of various sources may continuously grow. This …

Learning to grow pretrained models for efficient transformer training

P Wang, R Panda, LT Hennigen, P Greengard… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling transformers has led to significant breakthroughs in many domains, leading to a
paradigm in which larger versions of existing models are trained and released on a periodic …

Cross-lingual consistency of factual knowledge in multilingual language models

J Qi, R Fernández, A Bisazza - arXiv preprint arXiv:2310.10378, 2023 - arxiv.org
Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store
considerable amounts of factual knowledge, but large variations are observed across …