Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

LLaMA Pro: Progressive LLaMA with block expansion

C Wu, Y Gan, Y Ge, Z Lu, J Wang, Y Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans generally acquire new skills without compromising the old; however, the opposite
holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we …
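
The snippet only names the technique, so here is a rough illustration of what identity-preserving block (depth) expansion can look like: a minimal PyTorch sketch in which newly inserted blocks are copies of existing ones with zero-initialized output projections, so the grown model computes the same function at initialization. The toy `Block`, `expand_blocks`, and the choice of which projection to zero are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of depth expansion: new blocks are copies of existing
# ones with zeroed output projections, so each added block initially acts as
# an identity mapping through its residual connection.
import copy
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy residual block standing in for a transformer layer."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)  # zeroing this makes the block an identity

    def forward(self, x):
        return x + self.out_proj(torch.relu(self.ff(x)))

def expand_blocks(blocks, num_new_per_group=1):
    """Interleave zero-initialized copies after each existing block."""
    expanded = []
    for block in blocks:
        expanded.append(block)
        for _ in range(num_new_per_group):
            new_block = copy.deepcopy(block)
            nn.init.zeros_(new_block.out_proj.weight)
            nn.init.zeros_(new_block.out_proj.bias)
            expanded.append(new_block)
    return nn.ModuleList(expanded)

def run(stack, x):
    for block in stack:
        x = block(x)
    return x

# The expanded stack matches the original stack's output at initialization.
dim = 16
original = nn.ModuleList([Block(dim) for _ in range(4)])
grown = expand_blocks(original)
x = torch.randn(2, dim)
assert torch.allclose(run(original, x), run(grown, x))
```

In such a scheme, continual training could update only the newly added blocks, which is what would make the expansion progressive rather than a full retraining of the model.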

ELLE: Efficient lifelong pre-training for emerging data

Y Qin, J Zhang, Y Lin, Z Liu, P Li, M Sun… - arXiv preprint arXiv …, 2022 - arxiv.org
Current pre-trained language models (PLMs) are typically trained with static data, ignoring
that in real-world scenarios, streaming data of various sources may continuously grow. This …

Learning to grow pretrained models for efficient transformer training

P Wang, R Panda, LT Hennigen, P Greengard… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling transformers has led to significant breakthroughs in many domains, leading to a
paradigm in which larger versions of existing models are trained and released on a periodic …

Reusing pretrained models by multi-linear operators for efficient training

Y Pan, Y Yuan, Y Yin, Z Xu, L Shang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Training large models from scratch usually costs a substantial amount of resources. To address
this problem, recent studies such as bert2BERT and LiGO have reused small pretrained …

Knowledge inheritance for pre-trained language models

Y Qin, Y Lin, J Yi, J Zhang, X Han, Z Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent explorations of large-scale pre-trained language models (PLMs) have revealed the
power of PLMs with huge amounts of parameters, setting off a wave of training ever-larger …

Initializing models with larger ones

Z Xu, Y Chen, K Vishniakov, Y Yin, Z Shen… - The Twelfth …, 2023 - openreview.net
Weight initialization plays an important role in neural network training. Widely used
initialization methods are proposed and evaluated for networks that are trained from scratch …
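
Since the snippet cuts off before the method, the following is a minimal sketch of one simple way to initialize a small model from a larger pretrained one: copy a leading slice of the larger layer's weights into the smaller layer. The function name `init_from_larger` and the leading-slice selection rule are assumptions for illustration; the paper's actual selection criterion may differ.

```python
# Hypothetical sketch: initialize a small linear layer from a larger pretrained
# one by copying the top-left slice of its weight matrix and the matching bias.
import torch
import torch.nn as nn

def init_from_larger(small: nn.Linear, large: nn.Linear) -> None:
    out_f, in_f = small.weight.shape
    with torch.no_grad():
        small.weight.copy_(large.weight[:out_f, :in_f])
        if small.bias is not None and large.bias is not None:
            small.bias.copy_(large.bias[:out_f])

large_layer = nn.Linear(1024, 1024)  # stands in for a layer from a larger pretrained model
small_layer = nn.Linear(256, 256)    # the smaller model being initialized
init_from_larger(small_layer, large_layer)
```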

FLM-101B: An open LLM and how to train it with $100K budget

X Li, Y Yao, X Jiang, X Fang, X Meng, S Fan… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have achieved remarkable success in NLP and multimodal
tasks. Despite these successes, their development faces two main challenges: (i) high …

Retraining-free model quantization via one-shot weight-coupling learning

C Tang, Y Meng, J Jiang, S Xie, R Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Quantization is of significance for compressing the over-parameterized deep neural models
and deploying them on resource-limited devices. Fixed-precision quantization suffers from …
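
The snippet contrasts the proposed method with fixed-precision quantization; for context, a minimal sketch of plain fixed-precision (symmetric, uniform, per-tensor) weight quantization is shown below. The bit-width, scale choice, and granularity are illustrative assumptions and say nothing about the paper's weight-coupling scheme.

```python
# Hypothetical sketch of symmetric uniform quantization at a fixed bit-width:
# weights are rounded onto a signed fixed-point grid, then mapped back to floats.
import torch

def quantize_dequantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 127 for 8-bit
    scale = w.abs().max().clamp(min=1e-8) / qmax  # per-tensor symmetric scale
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale

w = torch.randn(64, 64)
w4 = quantize_dequantize(w, num_bits=4)
print("max abs error at 4 bits:", (w - w4).abs().max().item())
```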

Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models

J Chen, J He, F Chen, Z Lv, J Tang, W Li, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Currently, most applications in the Industrial Internet of Things (IIoT) still rely on CNN-based
neural networks. Although Transformer-based large models (LMs), including language …