Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data
The k‐nearest neighbors algorithm is characterized as a simple yet effective data mining
technique. The main drawback of this technique appears when massive amounts of data …
A tutorial on distance metric learning: Mathematical foundations, algorithms, experimental analysis, prospects and challenges
Distance metric learning is a branch of machine learning that aims to learn distances from
the data, which enhances the performance of similarity-based algorithms. This tutorial …
A survey on semi-supervised learning
Semi-supervised learning is the branch of machine learning concerned with using labelled
as well as unlabelled data to perform certain learning tasks. Conceptually situated between …
A practical tutorial on bagging and boosting based ensembles for machine learning: Algorithms, software tools, performance study, practical perspectives and …
Ensembles, especially ensembles of decision trees, are one of the most popular and
successful techniques in machine learning. Recently, the number of ensemble-based …
Machine learning and deep learning frameworks and libraries for large-scale data mining: a survey
The combined impact of new computing resources and techniques with an increasing
avalanche of large datasets, is transforming many research areas and may lead to …
SMOTE-RkNN: A hybrid re-sampling method based on SMOTE and reverse k-nearest neighbors
A Zhang, H Yu, Z Huan, X Yang, S Zheng, S Gao - Information Sciences, 2022 - Elsevier
In recent years, class imbalance learning (CIL) has become an important branch of machine
learning. The Synthetic Minority Oversampling TEchnique (SMOTE) is considered to be a …
Handling data irregularities in classification: Foundations, trends, and future challenges
Most of the traditional pattern classifiers assume their input data to be well-behaved in terms
of similar underlying class distributions, balanced size of classes, the presence of a full set of …
A unifying view of class overlap and imbalance: Key concepts, multi-view panorama, and open avenues for research
The combination of class imbalance and overlap is currently one of the most challenging
issues in machine learning. While seminal work focused on establishing class overlap as a …
Data sampling methods to deal with the big data multi-class imbalance problem
The class imbalance problem has been a hot topic in the machine learning community in
recent years. Nowadays, in the time of big data and deep learning, this problem remains in …
Dynamic ensemble selection for multi-class imbalanced datasets
Many real-world classification tasks suffer from the class imbalanced problem, in which
some classes are highly underrepresented as compared to other classes. In this paper, we …