Big data preprocessing: methods and prospects

S García, S Ramírez-Gallego, J Luengo, JM Benítez… - Big Data Analytics, 2016 - Springer
A massive growth in the scale of data has been observed in recent years, being a key
factor of the Big Data scenario. Big Data can be defined as high volume, velocity and variety …

Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study

I Triguero, S García, F Herrera - Knowledge and Information Systems, 2015 - Springer
Semi-supervised classification methods are suitable tools to tackle training sets with large
amounts of unlabeled data and a small quantity of labeled data. This problem has been …

Research progress and prospects of the AdaBoost algorithm

Y Cao, QG Miao, JC Liu, L Gao - Acta Automatica Sinica, 2013 - aas.net.cn
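Abstract: AdaBoost is one of the best Boosting algorithms. It has a solid theoretical foundation and has been
widely adopted and applied in practice. The algorithm can boost weak classifiers that are only slightly better than random guessing into a strong classifier with high classification accuracy …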

Big data: New tricks for econometrics

HR Varian - Journal of Economic Perspectives, 2014 - aeaweb.org
Computers are now involved in many economic transactions and can capture data
associated with these transactions, which can then be manipulated and analyzed …

kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data

J Maillo, S Ramírez, I Triguero, F Herrera - Knowledge-Based Systems, 2017 - Elsevier
Abstract: The k-Nearest Neighbors classifier is a simple yet effective, widely renowned
method in data mining. The actual application of this model in the big data domain is not …
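The basic kNN decision rule that kNN-IS scales up can be sketched in a few lines. This is a minimal single-machine sketch in plain Python, assuming Euclidean distance and plain majority voting; it is not the Spark-based kNN-IS design, which distributes the distance computations, and the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Hypothetical helper: classify `query` by majority vote among its k
    nearest training points (Euclidean distance). Vote ties go to the label
    encountered first among the distance-sorted neighbours, because
    Counter.most_common keeps first-seen order for equal counts."""
    # Distance from the query to every training example.
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest examples (sorted() is stable).
    nearest = sorted(dists, key=lambda d: d[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (1.2, 0.8)]
y = ["a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.05, 0.1), k=3))  # a
print(knn_predict(X, y, (1.0, 0.9), k=3))   # b
```

Every query is compared against every training point, which is exactly the cost that becomes prohibitive on big data and motivates distributed designs.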

Web-scale k-means clustering

D Sculley - Proceedings of the 19th international conference on …, 2010 - dl.acm.org
We present two modifications to the popular k-means clustering algorithm to address the
extreme requirements for latency, scalability, and sparsity encountered in user-facing web …
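As we read the paper, one of the two modifications is mini-batch k-means: each iteration samples a small random batch, caches each sampled point's nearest centre, and then moves that centre toward the point with a per-centre step size of 1/count. The sketch below assumes that reading, omits the paper's sparsity-oriented step, and uses hypothetical names (`minibatch_kmeans`, `batch_size`, `iters`, `init`).

```python
import math
import random


def minibatch_kmeans(points, k, batch_size=10, iters=100, init=None, seed=0):
    """Mini-batch k-means sketch (assumed reading of Sculley, 2010).

    Centres start at `init` if given, otherwise at k randomly chosen points.
    Each iteration samples `batch_size` points, caches each point's nearest
    centre, then moves that centre toward the point with step 1/count, where
    count is how many points that centre has absorbed so far.
    """
    rng = random.Random(seed)
    if init is None:
        init = rng.sample(points, k)
    centres = [list(c) for c in init]
    counts = [0] * k

    def nearest(p):
        # Index of the centre closest to p (Euclidean distance).
        return min(range(k), key=lambda j: math.dist(centres[j], p))

    for _ in range(iters):
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Assign the whole batch against the centres as they stand now.
        assigned = [nearest(p) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centre learning rate
            centres[j] = [(1 - eta) * c + eta * x for c, x in zip(centres[j], p)]
    return centres


# Two well-separated toy clusters with means (0.5, 0.5) and (10.5, 10.5).
A = [(0, 0), (0, 1), (1, 0), (1, 1)]
B = [(10, 10), (10, 11), (11, 10), (11, 11)]
print(minibatch_kmeans(A + B, 2, init=[(0, 0), (10, 10)]))
```

Because the step size is 1/count, each centre ends up as the running mean of all sampled points assigned to it, so with these initial centres the output should lie close to the two cluster means while touching only a small batch per iteration.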

On tackling explanation redundancy in decision trees

Y Izza, A Ignatiev, J Marques-Silva - Journal of Artificial Intelligence …, 2022 - jair.org
Decision trees (DTs) epitomize the ideal of interpretability of machine learning (ML) models.
The interpretability of decision trees motivates explainability approaches by so-called …

Advance and prospects of AdaBoost algorithm

Y Cao, QG Miao, JC Liu, L Gao - Acta Automatica Sinica, 2013 - Elsevier
AdaBoost is one of the best Boosting algorithms. It has a solid theoretical basis
and has achieved great success in practical applications. AdaBoost can boost a weak learning …
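The weak-to-strong boosting idea this entry refers to can be illustrated with classic discrete AdaBoost for labels in {-1, +1}. This is a minimal sketch, assuming one-dimensional inputs and decision stumps (single-threshold classifiers) as the weak learners; it is not taken from the survey itself, and `fit_stump`, `stump_predict`, `adaboost` and `predict` are hypothetical names.

```python
import math


def stump_predict(t, pol, x):
    # Hypothetical weak learner: predict `pol` at or above threshold t, else -pol.
    return pol if x >= t else -pol


def fit_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump.
    Ties keep the first stump found."""
    best = None
    for t in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if stump_predict(t, pol, x) != y)
            if best is None or err < best[0]:
                best = (err, t, pol)
    return best


def adaboost(xs, ys, rounds=10):
    n = len(xs)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []         # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, t, pol = fit_stump(xs, ys, w)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # weak learner no better than chance; stop
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, pol))
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * y * stump_predict(t, pol, x))
             for x, y, wi in zip(xs, ys, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def predict(model, x):
    # Strong classifier: sign of the alpha-weighted vote of all stumps.
    score = sum(alpha * stump_predict(t, pol, x) for alpha, t, pol in model)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]  # positive outside [3, 4], negative inside
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])  # [1, 1, -1, -1, 1, 1]
```

On this toy interval problem no single stump classifies all six points correctly, while three rounds of boosting give a weighted vote that does; each stump's weight `alpha = 0.5 * ln((1 - err) / err)` grows as its weighted error shrinks.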

Data mining: practical machine learning tools and techniques with Java implementations

IH Witten, E Frank - ACM SIGMOD Record, 2002 - dl.acm.org
Witten and Frank's textbook was one of two books that I used for a data mining class in the
Fall of 2001. The book covers all major methods of data mining that produce a knowledge …

A benchmark study on time series clustering

A Javed, BS Lee, DM Rizzo - Machine Learning with Applications, 2020 - Elsevier
This paper presents the first time series clustering benchmark utilizing all time series
datasets currently available in the University of California Riverside (UCR) archive—the …