Distributed data management using MapReduce

F Li, BC Ooi, MT Özsu, S Wu - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …

Query optimization for massively parallel data processing

S Wu, F Li, S Mehrotra, BC Ooi - … of the 2nd ACM Symposium on Cloud …, 2011 - dl.acm.org
MapReduce has been widely recognized as an efficient tool for large-scale data analysis. It
achieves high performance by exploiting parallelism among processing nodes while …

Only aggressive elephants are fast elephants

J Dittrich, JA Quiané-Ruiz, S Richter, S Schuh… - arXiv preprint arXiv …, 2012 - arxiv.org
Yellow elephants are slow. A major reason is that they consume their inputs entirely before
responding to an elephant rider's orders. Some clever riders have trained their yellow …

Differentiated secondary index maintenance in log structured NoSQL data stores

W Tan, S Tata - US Patent 9,218,383, 2015 - Google Patents
US9218383B2 - Differentiated secondary index maintenance in log structured NoSQL data
stores - Google Patents

TraceTracker: Hardware/software co-evaluation for large-scale I/O workload reconstruction

M Kwon, J Zhang, G Park, W Choi… - 2017 IEEE …, 2017 - ieeexplore.ieee.org
Block traces are widely used for system studies, model verifications, and design analyses in
both industry and academia. While such traces include detailed block access patterns …

A MapReduce-based scalable discovery and indexing of structured big data

H Singh, S Bawa - Future generation computer systems, 2017 - Elsevier
Various methods and techniques have been proposed in the past for improving performance of
queries on structured and unstructured data. The paper proposes a parallel B-Tree index in …

ScalaGiST: Scalable generalized search trees for MapReduce systems [innovative systems paper]

P Lu, G Chen, BC Ooi, HT Vo, S Wu - Proceedings of the VLDB …, 2014 - dl.acm.org
MapReduce has become the state-of-the-art for data parallel processing. Nevertheless,
Hadoop, an open-source equivalent of MapReduce, has been noted to have sub-optimal …

Differentiated secondary index maintenance in log structured NoSQL data stores

W Tan, S Tata - US Patent 9,218,385, 2015 - Google Patents
There are provided a system and a computer program product for operating multi-node data
stores. The system stores a data table in a first computing node and stores an index table in …

Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores

W Tan, S Tata, YR Tang, LL Fong - EDBT, 2014 - tristartom.github.io
Log-Structured-Merge (LSM) Tree gains much attention recently because of its
superior performance in write-intensive workloads. LSM Tree uses an append-only structure …

Differentiated secondary index maintenance in log structured NoSQL data stores

W Tan, S Tata - US Patent 10,078,682, 2018 - Google Patents
There are provided a system and a computer program product for operating multi-node data
stores. The system stores a data table in a first computing node and stores an index table in …