Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data

I Triguero, D García‐Gil, J Maillo… - … : Data Mining and …, 2019 - Wiley Online Library
The k‐nearest neighbors algorithm is characterized as a simple yet effective data mining
technique. The main drawback of this technique appears when massive amounts of data …

A tutorial on distance metric learning: Mathematical foundations, algorithms, experimental analysis, prospects and challenges

JL Suárez, S García, F Herrera - Neurocomputing, 2021 - Elsevier
Distance metric learning is a branch of machine learning that aims to learn distances from
the data, which enhances the performance of similarity-based algorithms. This tutorial …

A survey on semi-supervised learning

JE Van Engelen, HH Hoos - Machine learning, 2020 - Springer
Semi-supervised learning is the branch of machine learning concerned with using labelled
as well as unlabelled data to perform certain learning tasks. Conceptually situated between …

A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and …

S González, S García, J Del Ser, L Rokach, F Herrera - Information Fusion, 2020 - Elsevier
Ensembles, especially ensembles of decision trees, are one of the most popular and
successful techniques in machine learning. Recently, the number of ensemble-based …

Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey

G Nguyen, S Dlugolinsky, M Bobák, V Tran… - Artificial Intelligence …, 2019 - Springer
The combined impact of new computing resources and techniques with an increasing
avalanche of large datasets is transforming many research areas and may lead to …

SMOTE-RkNN: A hybrid re-sampling method based on SMOTE and reverse k-nearest neighbors

A Zhang, H Yu, Z Huan, X Yang, S Zheng, S Gao - Information Sciences, 2022 - Elsevier
In recent years, class imbalance learning (CIL) has become an important branch of machine
learning. The Synthetic Minority Oversampling TEchnique (SMOTE) is considered to be a …

Handling data irregularities in classification: Foundations, trends, and future challenges

S Das, S Datta, BB Chaudhuri - Pattern Recognition, 2018 - Elsevier
Most of the traditional pattern classifiers assume their input data to be well-behaved in terms
of similar underlying class distributions, balanced size of classes, the presence of a full set of …

A unifying view of class overlap and imbalance: Key concepts, multi-view panorama, and open avenues for research

MS Santos, PH Abreu, N Japkowicz, A Fernández… - Information …, 2023 - Elsevier
The combination of class imbalance and overlap is currently one of the most challenging
issues in machine learning. While seminal work focused on establishing class overlap as a …

Data sampling methods to deal with the big data multi-class imbalance problem

E Rendon, R Alejo, C Castorena, FJ Isidro-Ortega… - Applied Sciences, 2020 - mdpi.com
The class imbalance problem has been a hot topic in the machine learning community in
recent years. Nowadays, in the time of big data and deep learning, this problem remains in …

Dynamic ensemble selection for multi-class imbalanced datasets

S García, ZL Zhang, A Altalhi, S Alshomrani… - Information Sciences, 2018 - Elsevier
Many real-world classification tasks suffer from the class imbalance problem, in which
some classes are highly underrepresented as compared to other classes. In this paper, we …