Google Scholar

Fairness in information access systems

MD Ekstrand, A Das, R Burke… - Foundations and Trends …, 2022 - nowpublishers.com

Recommendation, information retrieval, and other information access systems pose unique
challenges for investigating and applying the fairness and non-discrimination concepts that …

[PDF] aaai.org

Scaling laws do not scale

F Diaz, M Madaio - Proceedings of the AAAI/ACM Conference on AI …, 2024 - ojs.aaai.org

Recent work has advocated for training AI models on ever-larger datasets, arguing that as
the size of a dataset increases, the performance of a model trained on that dataset will …

Cited by 9

[PDF] arxiv.org

Aligning offline metrics and human judgments of value for code generation models

V Dibia, A Fourney, G Bansal… - arXiv preprint arXiv …, 2022 - arxiv.org

Large language models have demonstrated great potential to assist programmers in
generating code. For such human-AI pair programming scenarios, we empirically …

Cited by 11

Preference-based offline evaluation

CLA Clarke, F Diaz, N Arabzadeh - … on Web Search and Data Mining, 2023 - dl.acm.org

A core step in production model research and development involves the offline evaluation of
a system before production deployment. Traditional offline evaluation of search …

Cited by 7

[PDF] acm.org

Measuring commonality in recommendation of cultural content to strengthen cultural citizenship

A Ferraro, G Ferreira, F Diaz, G Born - ACM Transactions on …, 2024 - dl.acm.org

Recommender systems have become the dominant means of curating cultural content,
significantly influencing the nature of individual cultural experience. While the majority of …

Cited by 3

[PDF] arxiv.org

Recall, robustness, and lexicographic evaluation

F Diaz, B Mitra - arXiv preprint arXiv:2302.11370, 2023 - arxiv.org

Although originally developed to evaluate sets of items, recall is often used to evaluate
rankings of items, including those produced by recommender, retrieval, and other machine …

Cited by 3

[PDF] arxiv.org

Best-Case Retrieval Evaluation: Improving the Sensitivity of Reciprocal Rank with Lexicographic Precision

F Diaz - arXiv preprint arXiv:2306.07908, 2023 - arxiv.org

Across a variety of ranking tasks, researchers use reciprocal rank to measure the
effectiveness for users interested in exactly one relevant item. Despite its widespread use …

Cited by 3

[PDF] acm.org

Offline Evaluation of Set-Based Text-to-Image Generation

N Arabzadeh, F Diaz, J He - … of the 2024 Annual International ACM …, 2024 - dl.acm.org

Text-to-Image (TTI) systems often support people during ideation, the early stages of a
creative process when exposure to a broad set of relevant or partially relevant images can …

[PDF] arxiv.org

Unified browsing models for linear and grid layouts

A Raj, M Ekstrand - arXiv preprint arXiv:2310.12524, 2023 - arxiv.org

Many information access systems operationalize their results in terms of rankings, which are
then displayed to users in various ranking layouts such as linear lists or grids. User …

Cited by 1

Mixed method development of evaluation metrics

B St. Thomas, P Chandar, C Hosey, F Diaz - Proceedings of the 27th …, 2021 - dl.acm.org

Designers of online search and recommendation services often need to develop metrics to
assess system performance. This tutorial focuses on mixed methods approaches to …

Cited by 4
