On the joint-effect of class imbalance and overlap: a critical review

MS Santos, PH Abreu, N Japkowicz… - Artificial Intelligence …, 2022 - Springer
Current research on imbalanced data recognises that class imbalance is aggravated by
other data intrinsic characteristics, among which class overlap stands out as one of the most …

How complex is your classification problem? A survey on measuring classification complexity

AC Lorena, LPF Garcia, J Lehmann… - ACM Computing …, 2019 - dl.acm.org
Characteristics extracted from the training datasets of classification problems have proven to
be effective predictors in a number of meta-analyses. Among them, measures of …

A review of microarray datasets and applied feature selection methods

V Bolón-Canedo, N Sánchez-Maroño… - Information …, 2014 - Elsevier
Microarray data classification is a difficult challenge for machine learning researchers due to
its high number of features and the small sample sizes. Feature selection has been soon …

Microarray cancer feature selection: Review, challenges and research directions

MA Hambali, TO Oladele, KS Adewole - International Journal of Cognitive …, 2020 - Elsevier
Microarray technology has become an emerging trend in the domain of genetic research in
which many researchers employ to study and investigate the levels of genes' expression in a …

Impact of missing data imputation methods on gene expression clustering and classification

MCP De Souto, PA Jaskowiak, IG Costa - BMC Bioinformatics, 2015 - Springer
Background Several missing value imputation methods for gene expression data have been
proposed in the literature. In the past few years, researchers have been putting a great deal …

Predicting noise filtering efficacy with data complexity measures for nearest neighbor classification

JA Sáez, J Luengo, F Herrera - Pattern Recognition, 2013 - Elsevier
Classifier performance, particularly of instance-based learners such as k-nearest neighbors,
is affected by the presence of noisy data. Noise filters are traditionally employed to remove …

Centralized vs. distributed feature selection methods based on data complexity measures

L Morán-Fernández, V Bolón-Canedo… - Knowledge-Based …, 2017 - Elsevier
In the era of Big Data, many datasets have a common characteristic, the large number of
features. As a result, selecting the relevant features and ignoring the irrelevant and …

A framework model using multifilter feature selection to enhance colon cancer classification

M Al-Rajab, J Lu, Q Xu - PLOS ONE, 2021 - journals.plos.org
Gene expression profiles can be utilized in the diagnosis of critical diseases such as cancer.
The selection of biomarker genes from these profiles is significant and crucial for cancer …

Redundancy and complexity metrics for big data classification: towards smart data

J Maillo, I Triguero, F Herrera - IEEE Access, 2020 - ieeexplore.ieee.org
It is recognized the importance of knowing the descriptive properties of a dataset when
tackling a data science problem. Having information about the redundancy, complexity and …

A novel ECOC algorithm for multiclass microarray data classification based on data complexity analysis

MX Sun, KH Liu, QQ Wu, QQ Hong, BZ Wang… - Pattern Recognition, 2019 - Elsevier
Nowadays, a lot of new classification and clustering techniques have been proposed for
microarray data analysis. However, the multiclass microarray data classification is still …