The Stratosphere platform for big data analytics

A Alexandrov, R Bergmann, S Ewen, JC Freytag… - The VLDB Journal, 2014 - Springer
We present Stratosphere, an open-source software stack for parallel data analysis.
Stratosphere brings together a unique set of features that allow the expressive, easy, and …

Spark versus Flink: Understanding performance in big data analytics frameworks

OC Marcu, A Costan, G Antoniu… - 2016 IEEE …, 2016 - ieeexplore.ieee.org
Big Data analytics has recently gained increasing popularity as a tool to process large
amounts of data on-demand. Spark and Flink are two Apache-hosted data analytics …

Partition based clustering of large datasets using MapReduce framework: An analysis of recent themes and directions

TH Sardar, Z Ansari - Future Computing and Informatics Journal, 2018 - Elsevier
Data clustering is one of the fundamental techniques in scientific analysis and data mining,
which describes a dataset according to similarities among its objects. Partition based …

Performance comparison of OpenMP, MPI, and MapReduce in practical problems

SJ Kang, SY Lee, KM Lee - Advances in Multimedia, 2015 - Wiley Online Library
With problem size and complexity increasing, several parallel and distributed programming
models and frameworks have been developed to efficiently handle such problems. This …

Big data analytics on modern hardware architectures: A technology survey

M Saecker, V Markl - … : Second European Summer School, eBISS 2012 …, 2013 - Springer
Big Data Analytics has the goal to analyze massive datasets, which increasingly
occur in web-scale business intelligence problems. The common strategy to handle these …

XML2HBase: Storing and querying large collections of XML documents using a NoSQL database system

L Bao, J Yang, CQ Wu, H Qi, X Zhang, S Cai - Journal of Parallel and …, 2022 - Elsevier
Many big data applications such as smart transportation, healthcare, and e-commerce need
to store and query large collections of small XML documents, which has become a …

Nephele streaming: stream processing under QoS constraints at scale

B Lohrmann, D Warneke, O Kao - Cluster Computing, 2014 - Springer
The ability to process large numbers of continuous data streams in a near-real-time fashion
has become a crucial prerequisite for many scientific and industrial use cases in recent …

Integrating open government data with Stratosphere for more transparency

A Heise, F Naumann - Journal of Web Semantics, 2012 - Elsevier
Governments are increasingly publishing their data to enable organizations and citizens to
browse and analyze the data. However, the heterogeneity of this Open Government Data …

AMADA: web data repositories in the Amazon cloud

A Aranda-Andújar, F Bugiotti… - Proceedings of the 21st …, 2012 - dl.acm.org
We present AMADA, a platform for storing Web data (in particular, XML documents and RDF
graphs) based on the Amazon Web Services (AWS) cloud infrastructure. AMADA operates in …

Towards an integrated platform for big data analysis

M Bohlouli, F Schulz, L Angelis, D Pahor… - Integration of practice …, 2013 - Springer
The amount of data in the world is expanding rapidly. Every day, huge amounts of data are
created by scientific experiments, companies, and end users' activities. These large data …