Pre-trained language models for text generation: A survey
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …
Bridging the gap: A survey on integrating (human) feedback for natural language generation
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …
Holistic evaluation of language models
Language models (LMs) like GPT-3, PaLM, and ChatGPT are the foundation for
almost all major language technologies, but their capabilities, limitations, and risks are not …
NusaCrowd: Open source initiative for Indonesian NLP resources
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …
Prometheus 2: An open source language model specialized in evaluating other language models
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from
various LMs. However, concerns including transparency, controllability, and affordability …
Evaluating general-purpose AI with psychometrics
X Wang, L Jiang, J Hernandez-Orallo… - arXiv preprint arXiv …, 2023 - arxiv.org
Comprehensive and accurate evaluation of general-purpose AI systems such as large
language models allows for effective mitigation of their risks and deepened understanding of …
Chef: A comprehensive evaluation framework for standardized assessment of multimodal large language models
Multimodal Large Language Models (MLLMs) have shown impressive abilities in interacting
with visual content with myriad potential downstream tasks. However, even though a list of …
Dolphin: A Challenging and Diverse Benchmark for Arabic NLG
We present Dolphin, a novel benchmark that addresses the need for a natural language
generation (NLG) evaluation framework dedicated to the wide collection of Arabic …
Measuring the measuring tools: An automatic evaluation of semantic metrics for text corpora
The ability to compare the semantic similarity between text corpora is important in a variety
of natural language processing applications. However, standard methods for evaluating …
LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English
In the evolving NLP landscape, benchmarks serve as yardsticks for gauging progress.
However, existing Legal NLP benchmarks only focus on predictive tasks, overlooking …