Efficient large language models: A survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …
LLaMA Pro: Progressive LLaMA with block expansion
Humans generally acquire new skills without compromising the old; however, the opposite
holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we …
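The "block expansion" in the title refers to deepening a pretrained model with extra transformer blocks that start out as identity maps, so existing capabilities are untouched until the new blocks are trained. Below is a minimal sketch of that idea, assuming blocks expose `attn.out_proj` and `mlp.down_proj` attributes (illustrative names, not the paper's code):

```python
import copy
import torch.nn as nn

def expand_with_blocks(blocks: nn.ModuleList, num_new: int) -> nn.ModuleList:
    """Interleave `num_new` identity-initialized block copies into a stack.

    Zeroing the projections that feed the residual stream makes each new
    block compute x + 0 = x at initialization, so the expanded model's
    outputs are unchanged (sketch only; attribute names are assumptions).
    """
    stride = max(len(blocks) // num_new, 1)
    expanded, added = [], 0
    for i, block in enumerate(blocks):
        expanded.append(block)
        if added < num_new and (i + 1) % stride == 0:
            new_block = copy.deepcopy(block)
            for proj in (new_block.attn.out_proj, new_block.mlp.down_proj):
                nn.init.zeros_(proj.weight)
                if proj.bias is not None:
                    nn.init.zeros_(proj.bias)
            expanded.append(new_block)
            added += 1
    return nn.ModuleList(expanded)
```

Because each inserted block's residual contribution starts at zero, the expanded model initially reproduces the original; training can then update only the new blocks while the copied ones stay frozen.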
ELLE: Efficient lifelong pre-training for emerging data
Current pre-trained language models (PLMs) are typically trained with static data, ignoring
that in real-world scenarios, streaming data of various sources may continuously grow. This …
Learning to grow pretrained models for efficient transformer training
Scaling transformers has led to significant breakthroughs in many domains, leading to a
paradigm in which larger versions of existing models are trained and released on a periodic …
Reusing pretrained models by multi-linear operators for efficient training
Training large models from scratch usually costs a substantial amount of resources. Towards
this problem, recent studies such as bert2BERT and LiGO have reused small pretrained …
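Both cited baselines rest on function-preserving expansion: a small model's weights are mapped to a larger architecture that computes the same function at initialization (by a fixed rule in bert2BERT's Net2Net-style variant, by a learned linear operator in LiGO). The sketch below shows the classic Net2Net width trick for one hidden layer, as a generic illustration rather than either paper's operator:

```python
import numpy as np

def widen_hidden_layer(W_in: np.ndarray, W_out: np.ndarray, new_width: int, seed=0):
    """Net2Net-style function-preserving width growth.

    For h = f(W_in @ x), y = W_out @ h: new hidden units replicate randomly
    chosen existing ones, and each unit's outgoing weights are divided by its
    replication count, so the widened layer computes the same y for any
    element-wise activation f.
    """
    rng = np.random.default_rng(seed)
    old_width = W_in.shape[0]
    assert new_width >= old_width
    g = np.concatenate([np.arange(old_width),
                        rng.integers(0, old_width, new_width - old_width)])
    counts = np.bincount(g, minlength=old_width)   # how often each old unit is copied
    return W_in[g], W_out[:, g] / counts[g]
```

Since every duplicated unit's outgoing weights are split by its replication count, the contributions of the copies sum back to the original unit's contribution, which is what makes the expansion exact.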
Knowledge inheritance for pre-trained language models
Recent explorations of large-scale pre-trained language models (PLMs) have revealed the
power of PLMs with huge amounts of parameters, setting off a wave of training ever-larger …
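"Inheritance" here means letting the larger model distill from an already-trained smaller one during its own pre-training instead of starting cold. A generic sketch of such a combined objective, assuming a standard KL distillation term and a fixed mixing weight (the paper decays the teacher's weight over training; the constant `alpha` below is a simplification):

```python
import torch.nn.functional as F

def inheritance_loss(student_logits, teacher_logits, labels, alpha=0.5, T=1.0):
    """Self-supervised LM loss mixed with distillation from a smaller teacher.

    `alpha` weights the teacher term and `T` is the distillation temperature;
    both are illustrative defaults, not values from the paper.
    """
    vocab = student_logits.size(-1)
    lm = F.cross_entropy(student_logits.reshape(-1, vocab), labels.reshape(-1))
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.log_softmax(teacher_logits / T, dim=-1),
                  log_target=True, reduction="batchmean") * (T * T)
    return (1 - alpha) * lm + alpha * kd
```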
Initializing models with larger ones
Weight initialization plays an important role in neural network training. Widely used
initialization methods are proposed and evaluated for networks that are trained from scratch …
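This is model growth in reverse: a small model is initialized with a subset of a larger pretrained model's weights. One simple selection rule, uniform layer selection plus first-k truncation of each tensor, is sketched below; the paper's exact rule may differ, so treat this as an assumption-level illustration:

```python
import numpy as np

def select_weights(large_layers, small_shapes):
    """Initialize a small model from a larger pretrained one by selection.

    large_layers: per-layer dicts of parameter name -> np.ndarray (big model).
    small_shapes: per-layer dicts of parameter name -> target shape (small model).
    Picks donor layers uniformly across the big model's depth, then keeps the
    first-k slice of each tensor along every dimension (one simple rule).
    """
    idx = np.linspace(0, len(large_layers) - 1, len(small_shapes)).round().astype(int)
    return [{name: large_layers[i][name][tuple(slice(0, d) for d in shape)].copy()
             for name, shape in layer.items()}
            for layer, i in zip(small_shapes, idx)]
```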
FLM-101B: An open LLM and how to train it with $100K budget
Large language models (LLMs) have achieved remarkable success in NLP and multimodal
tasks. Despite these successes, their development faces two main challenges: (i) high …
Retraining-free model quantization via one-shot weight-coupling learning
Quantization is essential for compressing over-parameterized deep neural models
and deploying them on resource-limited devices. Fixed-precision quantization suffers from …
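As background for the fixed-precision limitation the abstract points to, the snippet below is plain symmetric uniform quantization with a single bit-width shared by every weight; it is the generic baseline, not the paper's weight-coupling method, whose point is to let different layers keep different precisions:

```python
import numpy as np

def quantize_uniform(w: np.ndarray, bits: int):
    """Symmetric uniform quantization of a weight tensor to `bits` bits.

    Returns integer codes and the scale for dequantizing (w_hat = q * scale).
    Every weight shares one precision; mixed-precision methods relax exactly
    this constraint by assigning bit-widths per layer.
    """
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = max(np.abs(w).max() / qmax, 1e-12)  # guard against all-zero w
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q.astype(np.int32), scale
```

Dequantizing with `q * scale` bounds the per-weight error by `scale / 2`, and `scale` grows as `bits` shrinks, which is why a single low precision for all layers can be too coarse.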
Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models
Currently, most applications in the Industrial Internet of Things (IIoT) still rely on CNN-based
neural networks. Although Transformer-based large models (LMs), including language …