Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing

P Liu, W Yuan, J Fu, Z Jiang, H Hayashi… - ACM Computing …, 2023 - dl.acm.org
This article surveys and organizes research works in a new paradigm in natural language
processing, which we dub “prompt-based learning.” Unlike traditional supervised learning …

Recent advances in natural language processing via large pre-trained language models: A survey

B Min, H Ross, E Sulem, APB Veyseh… - ACM Computing …, 2023 - dl.acm.org
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically
changed the Natural Language Processing (NLP) field. For numerous NLP tasks …

QLoRA: Efficient finetuning of quantized LLMs

T Dettmers, A Pagnoni, A Holtzman… - Advances in Neural …, 2024 - proceedings.neurips.cc
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to
finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit …

Scaling instruction-finetuned language models

HW Chung, L Hou, S Longpre, B Zoph, Y Tay… - Journal of Machine …, 2024 - jmlr.org
Finetuning language models on a collection of datasets phrased as instructions has been
shown to improve model performance and generalization to unseen tasks. In this paper we …

Instruction tuning with GPT-4

B Peng, C Li, P He, M Galley, J Gao - arXiv preprint arXiv:2304.03277, 2023 - arxiv.org
Prior work has shown that finetuning large language models (LLMs) using machine-
generated instruction-following data enables such models to achieve remarkable zero-shot …

Finetuned language models are zero-shot learners

J Wei, M Bosma, VY Zhao, K Guu, AW Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores a simple method for improving the zero-shot learning abilities of
language models. We show that instruction tuning--finetuning language models on a …

GPT-NeoX-20B: An open-source autoregressive language model

S Black, S Biderman, E Hallahan, Q Anthony… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce GPT-NeoX-20B, a 20 billion parameter autoregressive language model
trained on the Pile, whose weights will be made freely and openly available to the public …

Crosslingual generalization through multitask finetuning

N Muennighoff, T Wang, L Sutawika, A Roberts… - arXiv preprint arXiv …, 2022 - arxiv.org
Multitask prompted finetuning (MTF) has been shown to help large language models
generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused …

Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks

Y Wang, S Mishra, P Alipoormolabashi, Y Kordi… - arXiv preprint arXiv …, 2022 - arxiv.org
How well can NLP models generalize to a variety of unseen tasks when provided with task
instructions? To address this question, we first introduce Super-NaturalInstructions, a …

Multitask prompted training enables zero-shot task generalization

V Sanh, A Webson, C Raffel, SH Bach… - arXiv preprint arXiv …, 2021 - arxiv.org
Large language models have recently been shown to attain reasonable zero-shot
generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that …