An empirical survey on long document summarization: Datasets, models, and metrics
Long documents such as academic articles and business reports have been the standard
format to detail out important issues and complicated subjects that require extra attention. An …
Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
Language model tokenizers introduce unfairness between languages
Recent language models have shown impressive multilingual performance, even when not
explicitly trained for it. Despite this, there are concerns about the quality of their outputs …
AlignScore: Evaluating factual consistency with a unified alignment function
Many text generation applications require the generated text to be factually consistent with
input information. Automatic evaluation of factual consistency is challenging. Previous work …
Learning to summarize with human feedback
As language models become more powerful, training and evaluation are increasingly
bottlenecked by the data and metrics used for a particular task. For example, summarization …
On faithfulness and factuality in abstractive summarization
It is well known that the standard likelihood training and approximate decoding objectives in
neural text generation models lead to less human-like responses for open-ended tasks such …
SummEval: Re-evaluating summarization evaluation
The scarcity of comprehensive up-to-date studies on evaluation metrics for text
summarization and the lack of consensus regarding evaluation protocols continue to inhibit …
Beyond goldfish memory: Long-term open-domain conversation
Despite recent improvements in open-domain dialogue models, state of the art models are
trained and evaluated on short conversations with little context. In contrast, the long-term …
Understanding factuality in abstractive summarization with FRANK: A benchmark for factuality metrics
Modern summarization models generate highly fluent but often factually unreliable outputs.
This motivated a surge of metrics attempting to measure the factuality of automatically …
PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization
Recent work pre-training Transformers with self-supervised objectives on large text corpora
has shown great success when fine-tuned on downstream NLP tasks including text …