Some common mistakes in IR evaluation, and how they can be avoided
N Fuhr - Acm sigir forum, 2018 - dl.acm.org
This paper points out some mistakes that can be frequently found in IR publications: MRR
and ERR violate basic requirements for a metric, MAP is based on unrealistic assumptions …
On Fuhr's guideline for IR evaluation
T Sakai - ACM SIGIR Forum, 2021 - dl.acm.org
In the December 2017 issue of SIGIR Forum, Fuhr presented ten "Thou Shalt Not"s (i.e.,
warnings against bad practices) for IR experimenters. While his article provides a lot of good …
Topic difficulty: Collection and query formulation effects
Several recent studies have explored the interaction effects between topics, systems,
corpora, and components when measuring retrieval effectiveness. However, all of these …
Overview of the NTCIR-12 Short Text Conversation Task.
We give an overview of the NII Testbeds and Community for Information access Research
(NTCIR)-13 Short Text Conversation (STC) task, which was a core task of NTCIR-13. At …
Which Diversity Evaluation Measures Are "Good"?
This study evaluates 30 IR evaluation measures or their instances, of which nine are for
adhoc IR and 21 are for diversified IR, primarily from the viewpoint of whether their …
Overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) task
This is an overview of the NTCIR-15 We Want Web with CENTRE (WWW-3) task. The task
features the Chinese subtask (adhoc web search) and the English subtask (adhoc web …
Retrieval evaluation measures that agree with users' SERP preferences: Traditional, preference-based, and diversity measures
We examine the “goodness” of ranked retrieval evaluation measures in terms of how well
they align with users' Search Engine Result Page (SERP) preferences for web search. The …
Statistical reform in information retrieval?
T Sakai - ACM SIGIR Forum, 2014 - dl.acm.org
IR revolves around evaluation. Therefore, IR researchers should employ sound evaluation
practices. Nowadays many of us know that statistical significance testing is not enough, but …
A reference-dependent model for web search evaluation: Understanding and measuring the experience of boundedly rational users
Previous research demonstrates that users' actions in search interaction are associated
with relative gains and losses to reference points, known as the reference dependence …
User behavior modeling for web search evaluation
Search engines are widely used in our daily life. Batch evaluation of the performance of
search systems to their users has always been an essential issue in the field of information …