Google Scholar

P Flach - Proceedings of the AAAI conference on artificial …, 2019 - aaai.org

This paper gives an overview of some ways in which our understanding of performance
evaluation measures for machine-learned classifiers has improved over the last twenty …

Cited by 198

[PDF] aclanthology.org

Evaluation examples are not equally informative: How should that change NLP leaderboards?

P Rodriguez, J Barrow, AM Hoyle… - Proceedings of the …, 2021 - aclanthology.org

Leaderboards are widely used in NLP and push the field forward. While leaderboards are a
straightforward ranking of NLP models, this simplicity can mask nuances in evaluation items …

Cited by 81

[PDF] mit.edu

Comparing Bayesian models of annotation

S Paun, B Carpenter, J Chamberlain, D Hovy… - Transactions of the …, 2018 - direct.mit.edu

The analysis of crowdsourced annotations in natural language processing is concerned with
identifying (1) gold standard labels, (2) annotator accuracies and biases, and (3) item …

Cited by 123

[PDF] sciencedirect.com

Item response theory in AI: Analysing machine learning classifiers at the instance level

F Martínez-Plumed, RBC Prudêncio, A Martínez-Usó… - Artificial intelligence, 2019 - Elsevier

AI systems are usually evaluated on a range of problem instances and compared to other AI
systems that use different strategies. These instances are rarely independent. Machine …

Cited by 129

[PDF] nature.com

The quest for the reliability of machine learning models in binary classification on tabular data

VC Araujo Santos, L Cardoso, R Alves - Scientific Reports, 2023 - nature.com

In this paper we explore the reliability of contexts of machine learning (ML) models. There
are several evaluation procedures commonly used to validate a model (precision, F1 Score …

Cited by 9

Content Modeling in Smart Learning Environments: A systematic literature review

A Jiménez-Macías, PJ Muñoz-Merino… - Journal of Universal …, 2024 - search.proquest.com

Educational content has become a key element for improving the quality and effectiveness
of teaching. Many studies have been conducted on user and knowledge modeling using …

Cited by 1

[PDF] arxiv.org

Item response theory based ensemble in machine learning

Z Chen, H Ahn - International Journal of Automation and Computing, 2020 - Springer

In this article, we propose a novel probabilistic framework to improve the accuracy of a
weighted majority voting algorithm. In order to assign higher weights to the classifiers which …

Cited by 49

[HTML] nih.gov

[HTML] Learning latent parameters without human response patterns: Item response theory with artificial crowds

JP Lalor, H Wu, H Yu - Proceedings of the Conference on Empirical …, 2019 - ncbi.nlm.nih.gov

Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable
information about model performance and behavior. Traditionally, IRT models are learned …

Cited by 53

[PDF] iop.org

Unveiling the robustness of machine learning families

R Fabra-Boluda, C Ferri… - Machine Learning …, 2024 - iopscience.iop.org

The evaluation of machine learning systems has typically been limited to performance
measures on clean and curated datasets, which may not accurately reflect their robustness …

Cited by 3

[PDF] upv.es

Dual indicators to analyze AI benchmarks: Difficulty, discrimination, ability, and generality

F Martinez-Plumed… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org

With the purpose of better analyzing the result of artificial intelligence (AI) benchmarks, we
present two indicators on the side of the AI problems, difficulty and discrimination, and two …

Cited by 42

Making sense of item response theory in machine learning

Performance evaluation in machine learning: the good, the bad, the ugly, and the way forward
