A comprehensive overview of large language models
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …
Llamafactory: Unified efficient fine-tuning of 100+ language models
Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks.
However, it requires non-trivial efforts to implement these methods on different models. We …
A review of deep learning for video captioning
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …
Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation
Pre-trained models have achieved state-of-the-art results in various Natural Language
Processing (NLP) tasks. Recent works such as T5 and GPT-3 have shown that scaling up …
Factuality enhanced language models for open-ended text generation
Pretrained language models (LMs) are susceptible to generating text with nonfactual
information. In this work, we measure and improve the factual accuracy of large-scale LMs …
Cpt: A pre-trained unbalanced transformer for both chinese language understanding and generation
In this paper, we take advantage of previous pre-trained models (PTMs) and propose a
novel Chinese pre-trained unbalanced transformer (CPT). Different from previous Chinese …
Skeleton-of-thought: Large language models can do parallel decoding
This work aims at decreasing the end-to-end generation latency of large language models
(LLMs). One of the major causes of the high generation latency is the sequential decoding …
A knowledge-enhanced pretraining model for commonsense story generation
Story generation, namely, generating a reasonable story from a leading context, is an
important but challenging task. In spite of the success in modeling fluency and local …
Fate-llm: An industrial-grade federated learning framework for large language models
Large Language Models (LLMs), such as ChatGPT, LLaMA, GLM, and PaLM, have
exhibited remarkable performance across various tasks in recent years. However, LLMs …
MEGATRON-CNTRL: Controllable story generation with external knowledge using large-scale language models
Existing pre-trained large language models have shown unparalleled generative
capabilities. However, they are not controllable. In this paper, we propose MEGATRON …