High performance design for HDFS with byte-addressability of NVM and RDMA

NS Islam, M Wasi-ur-Rahman, X Lu… - Proceedings of the 2016 …, 2016 - dl.acm.org
Non-Volatile Memory (NVM) offers byte-addressability with DRAM like performance along
with persistence. Thus, NVMs provide the opportunity to build high-throughput storage …

A survey on data-driven performance tuning for big data analytics platforms

RLC Costa, J Moreira, P Pintor, V dos Santos… - Big Data Research, 2021 - Elsevier
Many research works deal with big data platforms looking forward to data science and
analytics. These are complex and usually distributed environments, composed of several …

On efficient hierarchical storage for big data processing

KR Krish, B Wadhwa, MS Iqbal… - 2016 16th IEEE/ACM …, 2016 - ieeexplore.ieee.org
A promising trend in storage management for big data frameworks, such as Hadoop and
Spark, is the emergence of heterogeneous and hybrid storage systems that employ different …

Storage-tag-aware scheduler for hadoop cluster

NMF Qureshi, DR Shin, IF Siddiqui… - IEEE Access, 2017 - ieeexplore.ieee.org
Big data analytics has simplified the processing complexity of extremely large data sets
through ecosystems, such as Hadoop, MapR, and Cloudera. Apache Hadoop is an open …

The research and analysis of efficiency of hardware usage base on HDFS

Y Liu, X Zhang, B Liu, X Zhao - Cluster Computing, 2022 - Springer
HDFS (Hadoop Distributed File System), as a part of data stored in the Hadoop
ecosystem, provides read and write interfaces for many upper-level applications. The …

Hadoop MapReduce performance on SSDs for analyzing social networks

M Bakratsas, P Basaras, D Katsaros, L Tassiulas - Big Data Research, 2018 - Elsevier
The advent of Solid State Drives (SSDs) stimulated a lot of research to investigate
and exploit to the extent possible the potentials of the new drive. The focus of this work is on …

Early experience with optimizing I/O performance using high-performance SSDs for in-memory cluster computing

IS Choi, W Yang, YS Kee - … Conference on Big Data (Big Data), 2015 - ieeexplore.ieee.org
This paper describes our experience with storage optimization that utilizes cost-effective
PCIe solid-state drives (SSDs) to improve the overall performance of a Spark framework. A …

Selective I/O bypass and load balancing method for write-through SSD caching in big data analytics

J Kim, H Roh, S Park - IEEE Transactions on Computers, 2017 - ieeexplore.ieee.org
Fast network quality analysis in the telecom industry is an important method used to provide
quality service. SK Telecom, based in South Korea, built a Hadoop-based analytical system …

Performance tuning analysis of spatial operations on Spatial Hadoop cluster with SSD

P Auradkar, T Prashanth, S Aralihalli, SP Kumar… - Procedia Computer …, 2020 - Elsevier
Solid State Drives have shown promising results in improving the performance of
MapReduce jobs. Previous work in the usage of SSD consisted of evaluation of its …

ConeSSD: a novel policy to optimize the performance of HDFS heterogeneous storage

X Zhang, L Wang, Z Huang, H Xie… - 2022 IEEE 24th Int …, 2022 - ieeexplore.ieee.org
HDFS (Hadoop distributed file system) is the core storage service of Hadoop, which stores
and processes large datasets efficiently. Therefore, the performance of storage services …