- Academic Search

Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org

Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Cited by 158

A survey of active learning for natural language processing

Z Zhang, E Strubell, E Hovy - arXiv preprint arXiv:2210.10109, 2022 - arxiv.org

In this work, we provide a survey of active learning (AL) for its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …

Cited by 108

Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling

J Ruan, X Pu, M Gao, X Wan, Y Zhu - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Human evaluation is viewed as a reliable evaluation method for NLG which is expensive
and time-consuming. In order to save labor and costs, researchers usually perform human …

Cited by 4

Label-efficient model selection for text generation

SA Tahan, A Gera, B Sznajder, L Choshen… - Proceedings of the …, 2024 - aclanthology.org

Model selection for a given target task can be costly, as it may entail extensive
annotation of the quality of outputs of different models. We introduce DiffUse, an efficient …

Cited by 1

Label-efficient model selection for text generation

S Ashury-Tahan, A Gera, B Sznajder… - arXiv preprint arXiv …, 2024 - arxiv.org

Model selection for a given target task can be costly, as it may entail extensive annotation of
the quality of outputs of different models. We introduce DiffUse, an efficient method to make …

Cited by 2

On the effectiveness of automated metrics for text generation systems

P Von Däniken, J Deriu, D Tuggener… - arXiv preprint arXiv …, 2022 - arxiv.org

A major challenge in the field of Text Generation is evaluation because we lack a sound
theory that can be leveraged to extract guidelines for evaluation campaigns. In this work, we …

Cited by 4

Baby Bear: Seeking a Just Right Rating Scale for Scalar Annotations

X Han, F Yu, J Sedoc, B Van Durme - arXiv preprint arXiv:2408.09765, 2024 - arxiv.org

Our goal is a mechanism for efficiently assigning scalar ratings to each of a large set of
elements. For example, "what percent positive or negative is this product review?" When …


Exploring Language Structured Prediction in Resource-limited Scenarios

Z Zhang - 2023 - lti.cmu.edu

In natural language processing (NLP), many tasks involve structured prediction: predicting
structured outputs consisting of a group of interdependent variables. This allows extracting …

Cited by 1

Active evaluation: Efficient NLG evaluation with few pairwise comparisons

Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text