Big data preprocessing: methods and prospects

S García, S Ramírez-Gallego, J Luengo, JM Benítez… - Big Data Analytics, 2016 - Springer
The massive growth in the scale of data has been observed in recent years being a key
factor of the Big Data scenario. Big Data can be defined as high volume, velocity and variety …

Pattern classification with missing data: a review

PJ García-Laencina, JL Sancho-Gómez… - Neural Computing and …, 2010 - Springer
Pattern classification has been successfully applied in many problem domains, such as
biometric recognition, document classification or medical diagnosis. Missing or unknown …

PAPILA: Dataset with fundus images and clinical data of both eyes of the same patient for glaucoma assessment

O Kovalyk, J Morales-Sánchez, R Verdú-Monedero… - Scientific Data, 2022 - nature.com
Glaucoma is one of the ophthalmological diseases that frequently causes loss of vision in
today's society. Previous studies assess which anatomical parameters of the optic nerve can …

Handling data irregularities in classification: Foundations, trends, and future challenges

S Das, S Datta, BB Chaudhuri - Pattern Recognition, 2018 - Elsevier
Most of the traditional pattern classifiers assume their input data to be well-behaved in terms
of similar underlying class distributions, balanced size of classes, the presence of a full set of …

[BOOK] Dimensionality reduction with unsupervised nearest neighbors

O Kramer - 2013 - Springer
The growing information infrastructure in a variety of disciplines involves an increasing
requirement for efficient data mining techniques. Fast dimensionality reduction methods are …

Estimating conversion rate in display advertising from past performance data

K Lee, B Orten, A Dasdan, W Li - Proceedings of the 18th ACM SIGKDD …, 2012 - dl.acm.org
In targeted display advertising, the goal is to identify the best opportunities to display a
banner ad to an online user who is most likely to take a desired action such as purchasing a …

Hybrid prediction model with missing value imputation for medical data

A Purwar, SK Singh - Expert Systems with Applications, 2015 - Elsevier
Accurate prediction in the presence of large number of missing values in the data set has
always been a challenging problem. Most of hybrid models to address this challenge have …

Anchors bring ease: An embarrassingly simple approach to partial multi-view clustering

J Guo, J Ye - Proceedings of the AAAI conference on artificial …, 2019 - aaai.org
Clustering on multi-view data has attracted much more attention in the past decades. Most
previous studies assume that each instance appears in all views, or there is at least one …

Identifying depression in the National Health and Nutrition Examination Survey data using a deep learning algorithm

J Oh, K Yun, U Maoz, TS Kim, JH Chae - Journal of Affective Disorders, 2019 - Elsevier
Background: As depression is the leading cause of disability worldwide, large-scale surveys
have been conducted to establish the occurrence and risk factors of depression. However …

Network-based high level data classification

TC Silva, L Zhao - IEEE Transactions on Neural Networks and …, 2012 - ieeexplore.ieee.org
Traditional supervised data classification considers only physical features (e.g., distance or
similarity) of the input data. Here, this type of learning is called low level classification. On …