Dealing with noise problem in machine learning data-sets: A systematic review

S Gupta, A Gupta - Procedia Computer Science, 2019 - Elsevier
The occurrence of noisy data in a data set can significantly impact the prediction of any
meaningful information. Many empirical studies have shown that noise in a data set …

How complex is your classification problem? A survey on measuring classification complexity

AC Lorena, LPF Garcia, J Lehmann… - ACM Computing …, 2019 - dl.acm.org
Characteristics extracted from the training datasets of classification problems have proven to
be effective predictors in a number of meta-analyses. Among them, measures of …

Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data

I Triguero, D García‐Gil, J Maillo… - … : Data Mining and …, 2019 - Wiley Online Library
The k‐nearest neighbors algorithm is characterized as a simple yet effective data mining
technique. The main drawback of this technique appears when massive amounts of data …

Dynamic selection of normalization techniques using data complexity measures

S Jain, S Shukla, R Wadhvani - Expert Systems with Applications, 2018 - Elsevier
Data preprocessing is an important step for designing a classification model. Normalization is
one of the preprocessing techniques used to handle the out-of-bounds attributes. This work …

Effect of training class label noise on classification performances for land cover mapping with satellite image time series

C Pelletier, S Valero, J Inglada, N Champion… - Remote Sensing, 2017 - mdpi.com
Supervised classification systems used for land cover mapping require accurate reference
databases. These reference data come generally from different sources such as field …

Machine learning algorithms for smart data analysis in internet of things environment: taxonomies and research trends

MH Alsharif, AH Kelechi, K Yahya, SA Chaudhry - Symmetry, 2020 - mdpi.com
Machine learning techniques will contribute towards making Internet of Things (IoT)
symmetric applications among the most significant sources of new data in the future. In this …

Combined cleaning and resampling algorithm for multi-class imbalanced data with label noise

M Koziarski, M Woźniak, B Krawczyk - Knowledge-Based Systems, 2020 - Elsevier
The imbalanced data classification is one of the most crucial tasks facing modern data
analysis. Especially when combined with other difficulty factors, such as the presence of …

Application of hybrid artificial neural networks for predicting rate of penetration (ROP): A case study from Marun oil field

SB Ashrafi, M Anemangely, M Sabah… - Journal of petroleum …, 2019 - Elsevier
Rate of Penetration (ROP) can be considered as a crucial factor in optimization and cost
minimization of drilling operations. In order to predict ROP with satisfactory precision, some …

The impact of inconsistent human annotations on AI driven clinical decision making

A Sylolypavan, D Sleeman, H Wu, M Sim - NPJ Digital Medicine, 2023 - nature.com
In supervised learning model development, domain experts are often used to provide the
class labels (annotations). Annotation inconsistencies commonly occur when even highly …

Active learning for network traffic classification: a technical study

A Shahraki, M Abbasi, A Taherkordi… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Network Traffic Classification (NTC) has become an important feature in various network
management operations, e.g., Quality of Service (QoS) provisioning and security services …