Sentence-level Aggregation of Lexical Metrics Correlate Stronger with Human Judgements than Corpus-level Aggregation
In this paper, we show that corpus-level aggregation considerably hinders the ability of
lexical metrics to accurately evaluate machine translation (MT) systems. With empirical …