A theoretical analysis of NDCG type ranking measures

Y Wang, L Wang, Y Li, D He… - Conference on learning …, 2013 - proceedings.mlr.press
Ranking has been extensively studied in information retrieval, machine learning and
statistics. A central problem in ranking is to design a ranking measure for evaluation of …

A comparison of statistical significance tests for information retrieval evaluation

MD Smucker, J Allan, B Carterette - … of the sixteenth ACM conference on …, 2007 - dl.acm.org
Information retrieval (IR) researchers commonly use three tests of statistical significance: the
Student's paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers …

Test collection based evaluation of information retrieval systems

M Sanderson - Foundations and Trends® in Information …, 2010 - nowpublishers.com
Use of test collections and evaluation measures to assess the effectiveness of information
retrieval systems has its origins in work dating back to the early 1950s. Across the nearly 60 …

Statistical significance, power, and sample sizes: A systematic review of SIGIR and TOIS, 2006-2015

T Sakai - Proceedings of the 39th International ACM SIGIR …, 2016 - dl.acm.org
We conducted a systematic review of 840 SIGIR full papers and 215 TOIS papers published
between 2006 and 2015. The original objective of the study was to identify IR effectiveness …

Search result diversification

RLT Santos, C Macdonald, I Ounis - Foundations and Trends® …, 2015 - nowpublishers.com
Ranking in information retrieval has been traditionally approached as a pursuit of relevant
information, under the assumption that the users' information needs are unambiguously …

Assessing ranking metrics in top-N recommendation

D Valcarce, A Bellogín, J Parapar, P Castells - Information Retrieval …, 2020 - Springer
The evaluation of recommender systems is an area with unsolved questions at several
levels. Choosing the appropriate evaluation metric is one of such important issues. Ranking …

Time-based calibration of effectiveness measures

MD Smucker, CLA Clarke - Proceedings of the 35th international ACM …, 2012 - dl.acm.org
Many current effectiveness measures incorporate simplifying assumptions about user
behavior. These assumptions prevent the measures from reflecting aspects of the search …

University of Wolverhampton at the TREC 2011 Microblog Track

G Paltoglou, M Thelwall - TREC, 2011 - trec.nist.gov
In this report we discuss the experiments we conducted at the University of Wolverhampton
for the Microblog Track at TREC-2011. As this was the first time we participated in TREC and …

Estimating the uncertainty of average F1 scores

D Zhang, J Wang, X Zhao - … of the 2015 International conference on the …, 2015 - dl.acm.org
In multi-class text classification, the performance (effectiveness) of a classifier is usually
measured by micro-averaged and macro-averaged F1 scores. However, the scores …

Statistical significance testing in information retrieval: an empirical analysis of type I, type II and type III errors

J Urbano, H Lima, A Hanjalic - … of the 42nd International ACM SIGIR …, 2019 - dl.acm.org
Statistical significance testing is widely accepted as a means to assess how well a difference
in effectiveness reflects an actual difference between systems, as opposed to random noise …