A comprehensive survey on pretrained foundation models: A history from bert to chatgpt

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (eg, BERT, ChatGPT, GPT-4) is …

GPTEval: A survey on assessments of ChatGPT and GPT-4

R Mao, G Chen, X Zhang, F Guerin… - arxiv preprint arxiv …, 2023 - arxiv.org
The emergence of ChatGPT has generated much speculation in the press about its potential
to disrupt social and economic systems. Its astonishing language ability has aroused strong …

Factscore: Fine-grained atomic evaluation of factual precision in long form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arxiv preprint arxiv …, 2023 - arxiv.org
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …

Is chatgpt a good nlg evaluator? a preliminary study

J Wang, Y Liang, F Meng, Z Sun, H Shi, Z Li… - arxiv preprint arxiv …, 2023 - arxiv.org
Recently, the emergence of ChatGPT has attracted wide attention from the computational
linguistics community. Many prior studies have shown that ChatGPT achieves remarkable …

Bartscore: Evaluating generated text as text generation

W Yuan, G Neubig, P Liu - Advances in neural information …, 2021 - proceedings.neurips.cc
A wide variety of NLP applications, such as machine translation, summarization, and dialog,
involve text generation. One major challenge for these applications is how to evaluate …

Generative judge for evaluating alignment

J Li, S Sun, W Yuan, RZ Fan, H Zhao, P Liu - arxiv preprint arxiv …, 2023 - arxiv.org
The rapid development of Large Language Models (LLMs) has substantially expanded the
range of tasks they can address. In the field of Natural Language Processing (NLP) …

An empirical survey on long document summarization: Datasets, models, and metrics

HY Koh, J Ju, M Liu, S Pan - ACM computing surveys, 2022 - dl.acm.org
Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …

Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Human-like summarization evaluation with chatgpt

M Gao, J Ruan, R Sun, X Yin, S Yang… - arxiv preprint arxiv …, 2023 - arxiv.org
Evaluating text summarization is a challenging problem, and existing evaluation metrics are
far from satisfactory. In this study, we explored ChatGPT's ability to perform human-like …

QAFactEval: Improved QA-based factual consistency evaluation for summarization

AR Fabbri, CS Wu, W Liu, C **ong - arxiv preprint arxiv:2112.08542, 2021 - arxiv.org
Factual consistency is an essential quality of text summarization models in practical settings.
Existing work in evaluating this dimension can be broadly categorized into two lines of …