A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Llamafactory: Unified efficient fine-tuning of 100+ language models

Y Zheng, R Zhang, J Zhang, Y Ye, Z Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial efforts to implement these methods on different models. We …

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation

Y Sun, S Wang, S Feng, S Ding, C Pang… - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-trained models have achieved state-of-the-art results in various Natural Language
Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up …

Factuality enhanced language models for open-ended text generation

N Lee, W Ping, P Xu, M Patwary… - Advances in …, 2022 - proceedings.neurips.cc
Pretrained language models (LMs) are susceptible to generating text with nonfactual
information. In this work, we measure and improve the factual accuracy of large-scale LMs …

Cpt: A pre-trained unbalanced transformer for both Chinese language understanding and generation

Y Shao, Z Geng, Y Liu, J Dai, H Yan, F Yang… - Science China …, 2024 - Springer
In this paper, we take advantage of previous pre-trained models (PTMs) and propose a
novel Chinese pre-trained unbalanced transformer (CPT). Different from previous Chinese …

Skeleton-of-thought: Large language models can do parallel decoding

X Ning, Z Lin, Z Zhou, Z Wang, H Yang… - Proceedings ENLSP …, 2023 - lirias.kuleuven.be
This work aims at decreasing the end-to-end generation latency of large language models
(LLMs). One of the major causes of the high generation latency is the sequential decoding …
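As an editorial illustration only: the snippet names the bottleneck (token-by-token sequential decoding), and the sketch below shows the generic skeleton-then-parallel-expansion idea the title alludes to. The `generate` helper, its prompts, and the thread-pool concurrency are placeholder assumptions standing in for a real LLM API, not the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def generate(prompt: str) -> str:
    # Stand-in for a real LLM call (e.g., a request to a serving endpoint);
    # returns canned text so the sketch runs end to end.
    if "skeleton" in prompt:
        return "- cause of latency\n- skeleton stage\n- parallel expansion stage"
    return f"(expanded answer for: {prompt.splitlines()[-1]})"


def skeleton_of_thought(question: str) -> str:
    # Stage 1: ask for a short, bullet-point skeleton of the answer (sequential, but short).
    skeleton = generate(
        f"Give a concise bullet-point skeleton (3-5 points) for answering:\n{question}"
    )
    points = [p.strip("-• ").strip() for p in skeleton.splitlines() if p.strip()]

    # Stage 2: expand every point concurrently, so wall-clock latency is roughly one
    # expansion rather than the sum of all expansions decoded one after another.
    with ThreadPoolExecutor(max_workers=max(1, len(points))) as pool:
        expansions = list(pool.map(
            lambda p: generate(f"Question: {question}\nExpand this point briefly: {p}"),
            points,
        ))

    # Stage 3: stitch the expanded points back together in skeleton order.
    return "\n\n".join(expansions)


if __name__ == "__main__":
    print(skeleton_of_thought("Why does sequential decoding dominate generation latency?"))
```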

A knowledge-enhanced pretraining model for commonsense story generation

J Guan, F Huang, Z Zhao, X Zhu… - Transactions of the …, 2020 - direct.mit.edu
Story generation, namely, generating a reasonable story from a leading context, is an
important but challenging task. In spite of the success in modeling fluency and local …

Fate-llm: An industrial grade federated learning framework for large language models

T Fan, Y Kang, G Ma, W Chen, W Wei, L Fan… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs), such as ChatGPT, LLaMA, GLM, and PaLM, have
exhibited remarkable performances across various tasks in recent years. However, LLMs …

MEGATRON-CNTRL: Controllable story generation with external knowledge using large-scale language models

P Xu, M Patwary, M Shoeybi, R Puri, P Fung… - arXiv preprint arXiv …, 2020 - arxiv.org
Existing pre-trained large language models have shown unparalleled generative
capabilities. However, they are not controllable. In this paper, we propose MEGATRON …