Pre-trained language models for text generation: A survey

J Li, T Tang, WX Zhao, JY Nie, JR Wen - ACM Computing Surveys, 2024 - dl.acm.org
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …

Bridging the gap: A survey on integrating (human) feedback for natural language generation

P Fernandes, A Madaan, E Liu, A Farinhas… - Transactions of the …, 2023 - direct.mit.edu
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …

Holistic evaluation of language models

R Bommasani, P Liang, T Lee - … of the New York Academy of …, 2023 - Wiley Online Library
Language models (LMs) like GPT-3, PaLM, and ChatGPT are the foundation for
almost all major language technologies, but their capabilities, limitations, and risks are not …

NusaCrowd: Open source initiative for Indonesian NLP resources

S Cahyawijaya, H Lovenia, AF Aji… - Findings of the …, 2023 - aclanthology.org
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for
Indonesian languages, including opening access to previously non-public resources …

Prometheus 2: An open source language model specialized in evaluating other language models

S Kim, J Suk, S Longpre, BY Lin, J Shin… - arXiv preprint arXiv …, 2024 - arxiv.org
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from
various LMs. However, concerns including transparency, controllability, and affordability …

Evaluating general-purpose AI with psychometrics

X Wang, L Jiang, J Hernandez-Orallo… - arXiv preprint arXiv …, 2023 - arxiv.org
Comprehensive and accurate evaluation of general-purpose AI systems such as large
language models allows for effective mitigation of their risks and deepened understanding of …

ChEF: A comprehensive evaluation framework for standardized assessment of multimodal large language models

Z Shi, Z Wang, H Fan, Z Yin, L Sheng, Y Qiao… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal Large Language Models (MLLMs) have shown impressive abilities in interacting
with visual content with myriad potential downstream tasks. However, even though a list of …

Dolphin: A Challenging and Diverse Benchmark for Arabic NLG

A Elmadany, A El-Shangiti… - Findings of the …, 2023 - aclanthology.org
We present Dolphin, a novel benchmark that addresses the need for a natural language
generation (NLG) evaluation framework dedicated to the wide collection of Arabic …

Measuring the measuring tools: An automatic evaluation of semantic metrics for text corpora

G Kour, S Ackerman, O Raz, E Farchi, B Carmeli… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to compare the semantic similarity between text corpora is important in a variety
of natural language processing applications. However, standard methods for evaluating …

LexSumm and LexT5: Benchmarking and Modeling Legal Summarization Tasks in English

T Santosh, C Weiss, M Grabmair - arXiv preprint arXiv:2410.09527, 2024 - arxiv.org
In the evolving NLP landscape, benchmarks serve as yardsticks for gauging progress.
However, existing Legal NLP benchmarks only focus on predictive tasks, overlooking …