Enhancing K-nearest neighbor algorithm: a comprehensive review and performance analysis of modifications

RK Halder, MN Uddin, MA Uddin, S Aryal, A Khraisat - Journal of Big Data, 2024 - Springer
The k-Nearest Neighbors (kNN) method, established in 1951, has since evolved
into a pivotal tool in data mining, recommendation systems, and Internet of Things (IoT) …

Outlier detection: Methods, models, and classification

A Boukerche, L Zheng, O Alfandi - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Over the past decade, we have witnessed an enormous amount of research effort dedicated
to the design of efficient outlier detection techniques while taking into consideration …

Survey on exact kNN queries over high-dimensional data space

N Ukey, Z Yang, B Li, G Zhang, Y Hu, W Zhang - Sensors, 2023 - mdpi.com
k nearest neighbours (kNN) queries are fundamental in many applications, ranging from
data mining, recommendation system and Internet of Things, to Industry 4.0 framework …

kNN-IS: An Iterative Spark-based design of the k-Nearest Neighbors classifier for big data

J Maillo, S Ramírez, I Triguero, F Herrera - Knowledge-Based Systems, 2017 - Elsevier
The k-Nearest Neighbors classifier is a simple yet effective widely renowned
method in data mining. The actual application of this model in the big data domain is not …

SpatialHadoop: A MapReduce framework for spatial data

A Eldawy, MF Mokbel - 2015 IEEE 31st international …, 2015 - ieeexplore.ieee.org
This paper describes SpatialHadoop; a full-fledged MapReduce framework with native
support for spatial data. SpatialHadoop is a comprehensive extension to Hadoop that injects …

Big Data Management: Concepts, Techniques and Challenges

X Meng, X Ci - 2013 - idke.ruc.edu.cn
Big Data Management: Concepts, Techniques and Challenges. Meng Xiaofeng, Ci Xiang (School of
Information, Renmin University of China, Beijing 100872) …

Simba: Efficient in-memory spatial analytics

D Xie, F Li, B Yao, G Li, L Zhou, M Guo - Proceedings of the 2016 …, 2016 - dl.acm.org
Large spatial data becomes ubiquitous. As a result, it is critical to provide fast, scalable, and
high-throughput spatial queries and analytics for numerous applications in location-based …

LocationSpark: A distributed in-memory data management system for big spatial data

M Tang, Y Yu, QM Malluhi, M Ouzzani… - Proceedings of the VLDB …, 2016 - dl.acm.org
We present LocationSpark, a spatial data processing system built on top of Apache Spark, a
widely used distributed data processing system. LocationSpark offers a rich set of spatial …

A new K-nearest neighbors classifier for big data based on efficient data pruning

H Saadatfar, S Khosravi, JH Joloudari, A Mosavi… - Mathematics, 2020 - mdpi.com
The K-nearest neighbors (KNN) machine learning algorithm is a well-known non-parametric
classification method. However, like other traditional data mining methods, applying it on big …

Fast semistochastic heat-bath configuration interaction

J Li, M Otten, AA Holmes, S Sharma… - The Journal of chemical …, 2018 - pubs.aip.org
This paper presents in detail our fast semistochastic heat-bath configuration interaction
(SHCI) method for solving the many-body Schrödinger equation. We identify and eliminate …