A comprehensive performance analysis of Apache Hadoop and Apache Spark for large scale data sets using HiBench

N Ahmed, ALC Barczak, T Susnjak, MA Rashid - Journal of Big Data, 2020 - Springer
Big Data analytics for storing, processing, and analyzing large-scale datasets has become
an essential tool for the industry. The advent of distributed computing frameworks such as …

A parallel grid optimization of SVM hyperparameter for big data classification using spark Radoop

AH Ali, MZ Abdullah - Karbala International Journal of Modern Science, 2020 - iasj.net
The big data phenomenon is currently a challenge to the process of relevant knowledge
extraction using classical machine learning techniques. This is due to the need for efficient …

Compiling data-parallel datalog

T Gilray, S Kumar, K Micinski - Proceedings of the 30th ACM SIGPLAN …, 2021 - dl.acm.org
Datalog allows intuitive declarative specification of logical inference tasks while enjoying
efficient implementation via state-of-the-art engines such as LogicBlox and Soufflé. These …

The Parallel Fuzzy C-Median Clustering Algorithm Using Spark for the Big Data

MA Mallik, NF Zulkurnain, S Siddiqui, R Sarkar - IEEE Access, 2024 - ieeexplore.ieee.org
Big data for sustainable development is a global issue due to the explosive growth of data;
according to forecasts by the International Data Corporation (IDC), the amount of data …

An Earlier Experiences Towards Optimizing Apache Spark Over Frontera Supercomputer

S Bernardo, A Ruhela, J Cazes, SL Harrell… - … Conference on High …, 2023 - Springer
Apache Spark has become a very popular computing engine that allows computing tasks to be
distributed across a compute cluster. However, the current approaches lack necessary …

Containerization vs Bare Metal: distributed computing performance using Apache Spark

ΜΕ Τσαρμποπούλου - 2024 - dspace.lib.ntua.gr
This research explores the performance trade-offs between containerized and bare metal
environments for running Apache Spark applications, specifically focusing on incident …

Big Data and machine learning to improve medical monitoring and remote monitoring

AKG Escamilla - 2020 - theses.hal.science
In order to improve heart disease care, and heart failure care more specifically,
particularly for patients with NYHA (New York Heart Association) stage III/IV, the most …