Fairness in information access systems

MD Ekstrand, A Das, R Burke… - Foundations and Trends …, 2022 - nowpublishers.com
Recommendation, information retrieval, and other information access systems pose unique
challenges for investigating and applying the fairness and non-discrimination concepts that …

Scaling laws do not scale

F Diaz, M Madaio - Proceedings of the AAAI/ACM Conference on AI …, 2024 - ojs.aaai.org
Recent work has advocated for training AI models on ever-larger datasets, arguing that as
the size of a dataset increases, the performance of a model trained on that dataset will …

Aligning offline metrics and human judgments of value for code generation models

V Dibia, A Fourney, G Bansal… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models have demonstrated great potential to assist programmers in
generating code. For such human-AI pair programming scenarios, we empirically …

Preference-based offline evaluation

CLA Clarke, F Diaz, N Arabzadeh - … on Web Search and Data Mining, 2023 - dl.acm.org
A core step in production model research and development involves the offline evaluation of
a system before production deployment. Traditional offline evaluation of search …

Measuring commonality in recommendation of cultural content to strengthen cultural citizenship

A Ferraro, G Ferreira, F Diaz, G Born - ACM Transactions on …, 2024 - dl.acm.org
Recommender systems have become the dominant means of curating cultural content,
significantly influencing the nature of individual cultural experience. While the majority of …

Recall, robustness, and lexicographic evaluation

F Diaz, B Mitra - arXiv preprint arXiv:2302.11370, 2023 - arxiv.org
Although originally developed to evaluate sets of items, recall is often used to evaluate
rankings of items, including those produced by recommender, retrieval, and other machine …

Best-Case Retrieval Evaluation: Improving the Sensitivity of Reciprocal Rank with Lexicographic Precision

F Diaz - arXiv preprint arXiv:2306.07908, 2023 - arxiv.org
Across a variety of ranking tasks, researchers use reciprocal rank to measure the
effectiveness for users interested in exactly one relevant item. Despite its widespread use …

Offline Evaluation of Set-Based Text-to-Image Generation

N Arabzadeh, F Diaz, J He - … of the 2024 Annual International ACM …, 2024 - dl.acm.org
Text-to-Image (TTI) systems often support people during ideation, the early stages of a
creative process when exposure to a broad set of relevant or partially relevant images can …

Unified browsing models for linear and grid layouts

A Raj, M Ekstrand - arXiv preprint arXiv:2310.12524, 2023 - arxiv.org
Many information access systems operationalize their results in terms of rankings, which are
then displayed to users in various ranking layouts such as linear lists or grids. User …

Mixed method development of evaluation metrics

B St. Thomas, P Chandar, C Hosey, F Diaz - Proceedings of the 27th …, 2021 - dl.acm.org
Designers of online search and recommendation services often need to develop metrics to
assess system performance. This tutorial focuses on mixed methods approaches to …