Big data analytics: a literature review

D Chong, H Shi - Journal of Management Analytics, 2015 - Taylor & Francis
With more and more data generated, it has become a big challenge for traditional
architectures and infrastructures to process large amounts of data within an acceptable time …

BlueDBM: An appliance for big data analytics

SW Jun, M Liu, S Lee, J Hicks, J Ankcorn… - ACM SIGARCH …, 2015 - dl.acm.org
Complex data queries, because of their need for random accesses, have proven to be slow
unless all the data can be accommodated in DRAM. There are many domains, such as …

Accelerating Spark with RDMA for big data processing: Early experiences

X Lu, MWU Rahman, N Islam… - 2014 IEEE 22nd …, 2014 - ieeexplore.ieee.org
Apache Hadoop MapReduce has been highly successful in processing large-scale, data-intensive
batch applications on commodity clusters. However, for low-latency interactive …

Toward modeling and optimization of features selection in Big Data based social Internet of Things

A Ahmad, M Khan, A Paul, S Din, MM Rathore… - Future Generation …, 2018 - Elsevier
The growing gap between users and the Big Data analytics requires innovative tools that
address the challenges faced by big data volume, variety, and velocity. Therefore, it …

High-performance design of Apache Spark with RDMA and its benefits on various workloads

X Lu, D Shankar, S Gugnani… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
The in-memory data processing framework, Apache Spark, has been stealing the limelight
for low-latency interactive applications, iterative and batch computations. Our early …

xCCL: A survey of industry-led collective communication libraries for deep learning

A Weingram, Y Li, H Qi, D Ng, L Dai, X Lu - Journal of Computer Science …, 2023 - Springer
Machine learning techniques have become ubiquitous both in industry and
academic applications. Increasing model sizes and training data volumes necessitate fast …

High-performance design of YARN MapReduce on modern HPC clusters with Lustre and RDMA

M Wasi-ur-Rahman, X Lu, NS Islam… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
The viability and benefits of running MapReduce over modern High Performance Computing
(HPC) clusters, with high performance interconnects and parallel file systems, have attracted …

BlueDBM: Distributed flash storage for big data analytics

SW Jun, M Liu, S Lee, J Hicks, J Ankcorn… - ACM Transactions on …, 2016 - dl.acm.org
Complex data queries, because of their need for random accesses, have proven to be slow
unless all the data can be accommodated in DRAM. There are many domains, such as …

Topic modeling and visualization for big data in social sciences

N Sukhija, M Tatineni, N Brown… - 2016 Intl IEEE …, 2016 - ieeexplore.ieee.org
Topic modeling is a widely used approach for analyzing large text collections. In particular,
Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling approaches to …

Leveraging adaptive I/O to optimize collective data shuffling patterns for big data analytics

B Nicolae, CHA Costa, C Misale… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Big data analytics is an indispensable tool in transforming science, engineering, medicine,
health-care, finance and ultimately business itself. With the explosion of data sizes and need …