Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
A survey of active learning for natural language processing
In this work, we provide a survey of active learning (AL) for its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
Human evaluation is viewed as a reliable evaluation method for NLG which is expensive
and time-consuming. In order to save labor and costs, researchers usually perform human …
Label-efficient model selection for text generation
Model selection for a given target task can be costly, as it may entail extensive
annotation of the quality of outputs of different models. We introduce DiffUse, an efficient …
Label-efficient model selection for text generation
Model selection for a given target task can be costly, as it may entail extensive annotation of
the quality of outputs of different models. We introduce DiffUse, an efficient method to make …
On the effectiveness of automated metrics for text generation systems
A major challenge in the field of Text Generation is evaluation because we lack a sound
theory that can be leveraged to extract guidelines for evaluation campaigns. In this work, we …
Baby Bear: Seeking a Just Right Rating Scale for Scalar Annotations
Our goal is a mechanism for efficiently assigning scalar ratings to each of a large set of
elements. For example, "what percent positive or negative is this product review?" When …
Exploring Language Structured Prediction in Resource-limited Scenarios
Z Zhang - 2023 - lti.cmu.edu
In natural language processing (NLP), many tasks involve structured prediction: predicting
structured outputs consisting of a group of interdependent variables. This allows extracting …