Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

LLaMA Pro: Progressive LLaMA with block expansion

C Wu, Y Gan, Y Ge, Z Lu, J Wang, Y Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Humans generally acquire new skills without compromising the old; however, the opposite
holds for Large Language Models (LLMs), e.g., from LLaMA to CodeLLaMA. To this end, we …
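
The snippet only names the technique, so here is a rough illustration of what identity-preserving block (depth) expansion can look like: a minimal PyTorch sketch in which newly inserted blocks are copies of existing ones with zero-initialized output projections, so the grown model computes the same function at initialization. The toy `Block`, `expand_blocks`, and the choice of which projection to zero are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of depth expansion: new blocks are copies of existing
# ones with zeroed output projections, so each added block initially acts as
# an identity mapping through its residual connection.
import copy
import torch
import torch.nn as nn

class Block(nn.Module):
    """Toy residual block standing in for a transformer layer."""
    def __init__(self, dim):
        super().__init__()
        self.ff = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)  # zeroing this makes the block an identity

    def forward(self, x):
        return x + self.out_proj(torch.relu(self.ff(x)))

def expand_blocks(blocks, num_new_per_group=1):
    """Interleave zero-initialized copies after each existing block."""
    expanded = []
    for block in blocks:
        expanded.append(block)
        for _ in range(num_new_per_group):
            new_block = copy.deepcopy(block)
            nn.init.zeros_(new_block.out_proj.weight)
            nn.init.zeros_(new_block.out_proj.bias)
            expanded.append(new_block)
    return nn.ModuleList(expanded)

def run(stack, x):
    for block in stack:
        x = block(x)
    return x

# The expanded stack matches the original stack's output at initialization.
dim = 16
original = nn.ModuleList([Block(dim) for _ in range(4)])
grown = expand_blocks(original)
x = torch.randn(2, dim)
assert torch.allclose(run(original, x), run(grown, x))
```

In such a scheme, continual training could update only the newly added blocks, which is what would make the expansion progressive rather than a full retraining of the model.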

ELLE: Efficient lifelong pre-training for emerging data

Y Qin, J Zhang, Y Lin, Z Liu, P Li, M Sun… - arXiv preprint arXiv …, 2022 - arxiv.org
Current pre-trained language models (PLMs) are typically trained with static data, ignoring
that in real-world scenarios, streaming data of various sources may continuously grow. This …

Learning to grow pretrained models for efficient transformer training

P Wang, R Panda, LT Hennigen, P Greengard… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling transformers has led to significant breakthroughs in many domains, leading to a
paradigm in which larger versions of existing models are trained and released on a periodic …

Reusing pretrained models by multi-linear operators for efficient training

Y Pan, Y Yuan, Y Yin, Z Xu, L Shang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Training large models from scratch usually costs a substantial amount of resources. To address
this problem, recent studies such as bert2BERT and LiGO have reused small pretrained …

Knowledge inheritance for pre-trained language models

Y Qin, Y Lin, J Yi, J Zhang, X Han, Z Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent explorations of large-scale pre-trained language models (PLMs) have revealed the
power of PLMs with huge amounts of parameters, setting off a wave of training ever-larger …

Initializing models with larger ones

Z Xu, Y Chen, K Vishniakov, Y Yin, Z Shen… - The Twelfth …, 2023 - openreview.net
Weight initialization plays an important role in neural network training. Widely used
initialization methods are proposed and evaluated for networks that are trained from scratch …
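
Since the snippet cuts off before the method, the following is a minimal sketch of one simple way to initialize a small model from a larger pretrained one: copy a leading slice of the larger layer's weights into the smaller layer. The function name `init_from_larger` and the leading-slice selection rule are assumptions for illustration; the paper's actual selection criterion may differ.

```python
# Hypothetical sketch: initialize a small linear layer from a larger pretrained
# one by copying the top-left slice of its weight matrix and the matching bias.
import torch
import torch.nn as nn

def init_from_larger(small: nn.Linear, large: nn.Linear) -> None:
    out_f, in_f = small.weight.shape
    with torch.no_grad():
        small.weight.copy_(large.weight[:out_f, :in_f])
        if small.bias is not None and large.bias is not None:
            small.bias.copy_(large.bias[:out_f])

large_layer = nn.Linear(1024, 1024)  # stands in for a layer from a larger pretrained model
small_layer = nn.Linear(256, 256)    # the smaller model being initialized
init_from_larger(small_layer, large_layer)
```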

FLM-101B: An open LLM and how to train it with $100K budget

X Li, Y Yao, X Jiang, X Fang, X Meng, S Fan… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have achieved remarkable success in NLP and multimodal
tasks. Despite these successes, their development faces two main challenges: (i) high …

Retraining-free model quantization via one-shot weight-coupling learning

C Tang, Y Meng, J Jiang, S Xie, R Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Quantization is of significance for compressing the over-parameterized deep neural models
and deploying them on resource-limited devices. Fixed-precision quantization suffers from …
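
The snippet contrasts the proposed method with fixed-precision quantization; for context, a minimal sketch of plain fixed-precision (symmetric, uniform, per-tensor) weight quantization is shown below. The bit-width, scale choice, and granularity are illustrative assumptions and say nothing about the paper's weight-coupling scheme.

```python
# Hypothetical sketch of symmetric uniform quantization at a fixed bit-width:
# weights are rounded onto a signed fixed-point grid, then mapped back to floats.
import torch

def quantize_dequantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1                # e.g. 127 for 8-bit
    scale = w.abs().max().clamp(min=1e-8) / qmax  # per-tensor symmetric scale
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q * scale

w = torch.randn(64, 64)
w4 = quantize_dequantize(w, num_bits=4)
print("max abs error at 4 bits:", (w - w4).abs().max().item())
```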

Towards General Industrial Intelligence: A Survey on IIoT-Enhanced Continual Large Models

J Chen, J He, F Chen, Z Lv, J Tang, W Li, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Currently, most applications in the Industrial Internet of Things (IIoT) still rely on CNN-based
neural networks. Although Transformer-based large models (LMs), including language …