A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation

C Goutte, E Gaussier - European conference on information retrieval, 2005 - Springer
We address the problems of 1/assessing the confidence of the standard point estimates,
precision, recall and F-score, and 2/comparing the results, in terms of precision, recall and F …

[BOOK][B] Handbook of natural language processing

N Indurkhya, FJ Damerau - 2010 - taylorfrancis.com
The Handbook of Natural Language Processing, Second Edition presents practical tools
and techniques for implementing natural language processing in computer systems. Along …

A comparison of statistical significance tests for information retrieval evaluation

MD Smucker, J Allan, B Carterette - … of the sixteenth ACM conference on …, 2007 - dl.acm.org
Information retrieval (IR) researchers commonly use three tests of statistical significance: the
Student's paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers …

Test collection based evaluation of information retrieval systems

M Sanderson - Foundations and Trends® in Information …, 2010 - nowpublishers.com
Use of test collections and evaluation measures to assess the effectiveness of information
retrieval systems has its origins in work dating back to the early 1950s. Across the nearly 60 …

Statistical significance, power, and sample sizes: A systematic review of SIGIR and TOIS, 2006-2015

T Sakai - Proceedings of the 39th International ACM SIGIR …, 2016 - dl.acm.org
We conducted a systematic review of 840 SIGIR full papers and 215 TOIS papers published
between 2006 and 2015. The original objective of the study was to identify IR effectiveness …

How reliable are the results of large-scale information retrieval experiments?

J Zobel - Proceedings of the 21st annual international ACM …, 1998 - dl.acm.org
Two stages in measurement of techniques for information retrieval are gathering of
documents for relevance assessment and use of the assessments to numerically evaluate …

Information retrieval system evaluation: effort, sensitivity, and reliability

M Sanderson, J Zobel - Proceedings of the 28th annual international …, 2005 - dl.acm.org
The effectiveness of information retrieval systems is measured by comparing performance
on a common set of queries and documents. Significance tests are often used to evaluate …

Deep learning for identification of water deficits in sugarcane based on thermal images

LL de Melo, VGML de Melo, PAA Marques… - Agricultural Water …, 2022 - Elsevier
Thermal images of plants have been used as a way for monitoring water status since it does
provide a non-destructive method that allows its remote evaluation. Even with the facilities …

Evaluating evaluation metrics based on the bootstrap

T Sakai - Proceedings of the 29th annual international ACM …, 2006 - dl.acm.org
This paper describes how the Bootstrap approach to statistics can be applied to the
evaluation of IR effectiveness metrics. First, we argue that Bootstrap Hypothesis Tests …

Statistical significance testing in information retrieval: an empirical analysis of type I, type II and type III errors

J Urbano, H Lima, A Hanjalic - … of the 42nd International ACM SIGIR …, 2019 - dl.acm.org
Statistical significance testing is widely accepted as a means to assess how well a difference
in effectiveness reflects an actual difference between systems, as opposed to random noise …