Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper explores Large Language Model (LLM) datasets, which play a crucial role in
the remarkable advancements of LLMs. The datasets serve as …

Using natural language processing to support peer‐feedback in the age of artificial intelligence: A cross‐disciplinary framework and a research agenda

E Bauer, M Greisel, I Kuznetsov… - British Journal of …, 2023 - Wiley Online Library
Advancements in artificial intelligence are rapidly increasing. The new‐generation large
language models, such as ChatGPT and GPT‐4, bear the potential to transform educational …

Efficient streaming language models with attention sinks

G Xiao, Y Tian, B Chen, S Han, M Lewis - arXiv preprint arXiv:2309.17453, 2023 - arxiv.org
Deploying Large Language Models (LLMs) in streaming applications such as multi-round
dialogue, where long interactions are expected, is urgently needed but poses two major …

GQA: Training generalized multi-query transformer models from multi-head checkpoints

J Ainslie, J Lee-Thorp, M De Jong… - arXiv preprint arXiv …, 2023 - arxiv.org
Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up
decoder inference. However, MQA can lead to quality degradation, and moreover it may not …
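
A minimal sketch of the idea this abstract describes, assuming PyTorch; the function name, shapes, and weight layout below are illustrative placeholders, not the paper's implementation. Query heads are split into groups that share one key/value head: multi-query attention is the case of a single shared head, and standard multi-head attention is the case of one key/value head per query head.

import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads):
    # x: (batch, seq, d_model); wq: (d_model, d_model); wk, wv: (d_model, n_kv_heads * head_dim)
    b, t, d = x.shape
    hd = d // n_heads
    q = (x @ wq).view(b, t, n_heads, hd).transpose(1, 2)      # (b, n_heads, t, hd)
    k = (x @ wk).view(b, t, n_kv_heads, hd).transpose(1, 2)   # (b, n_kv_heads, t, hd)
    v = (x @ wv).view(b, t, n_kv_heads, hd).transpose(1, 2)
    # Each group of n_heads // n_kv_heads query heads shares one key/value head.
    k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)     # (b, n_heads, t, hd)
    v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)
    attn = F.softmax(q @ k.transpose(-2, -1) / hd ** 0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, t, d)

# MQA is the special case n_kv_heads == 1; multi-head attention is n_kv_heads == n_heads.
d, n_heads, n_kv_heads = 64, 8, 2
x = torch.randn(2, 16, d)
wq = torch.randn(d, d)
wk = torch.randn(d, (d // n_heads) * n_kv_heads)
wv = torch.randn(d, (d // n_heads) * n_kv_heads)
print(grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads).shape)  # torch.Size([2, 16, 64])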

UL2: Unifying language learning paradigms

Y Tay, M Dehghani, VQ Tran, X Garcia, J Wei… - arXiv preprint arXiv …, 2022 - arxiv.org
Existing pre-trained models are generally geared towards a particular class of problems. To
date, there still seems to be no consensus on what the right architecture and pre-training …

LongBench: A bilingual, multitask benchmark for long context understanding

Y Bai, X Lv, J Zhang, H Lyu, J Tang, Z Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) demonstrate impressive performance for many
language tasks, most of them can only handle texts a few thousand tokens long, limiting their …

Finetuned language models are zero-shot learners

J Wei, M Bosma, VY Zhao, K Guu, AW Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores a simple method for improving the zero-shot learning abilities of
language models. We show that instruction tuning--finetuning language models on a …

LongT5: Efficient text-to-text transformer for long sequences

M Guo, J Ainslie, D Uthus, S Ontanon, J Ni… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent work has shown that either (1) increasing the input length or (2) increasing model
size can improve the performance of Transformer-based neural models. In this paper, we …

Graph neural networks for natural language processing: A survey

L Wu, Y Chen, K Shen, X Guo, H Gao… - … and Trends® in …, 2023 - nowpublishers.com
Deep learning has become the dominant approach in addressing various tasks in Natural
Language Processing (NLP). Although text inputs are typically represented as a sequence …

Delta tuning: A comprehensive study of parameter efficient methods for pre-trained language models

N Ding, Y Qin, G Yang, F Wei, Z Yang, Y Su… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite their success, fine-tuning large-scale pre-trained language models (PLMs) brings
prohibitive adaptation costs. In fact, fine-tuning all the parameters of a colossal model and retaining …