Parallelizing user-defined aggregations using symbolic execution

V Raychev, M Musuvathi, T Mytkowicz - Proceedings of the 25th …, 2015 - dl.acm.org
User-defined aggregations (UDAs) are integral to large-scale data-processing systems,
such as MapReduce and Hadoop, because they let programmers express application …

DiffStream: differential output testing for stream processing programs

K Kallas, F Niksic, C Stanford, R Alur - Proceedings of the ACM on …, 2020 - dl.acm.org
High performance architectures for processing distributed data streams, such as Flink, Spark
Streaming, and Storm, are increasingly deployed in emerging data-driven computing …

An executable sequential specification for Spark aggregation

YF Chen, CD Hong, O Lengál, SC Mu, N Sinha… - Networked Systems: 5th …, 2017 - Springer
Spark is a new promising platform for scalable data-parallel computation. It provides several
high-level application programming interfaces (APIs) to perform parallel data aggregation …

Safe Programming over Distributed Streams

C Stanford - 2022 - search.proquest.com
The sheer scale of today's data processing needs has led to a new paradigm of software
systems centered around requirements for high-throughput, distributed, low-latency …

A method of test case set generation in the commutativity test of reduce functions

X Mu, L Liu, P Zhang, J Li, H Li - Science of Computer Programming, 2024 - Elsevier
MapReduce framework has become one of the more popular big data processing
frameworks. In the MapReduce framework, the test of the commutativity problem of the …

Parallel Execution of Order Dependent Grouping Functions

M Peters - 2024 - edoc.hu-berlin.de
The exponential growth of electronically stored data requires powerful
systems for processing and analyzing large volumes of data. Parallel relational …

Symmetric and Asymmetric Aggregate Function in Massively Parallel Computing

C Zhang, F Toumani, E Gangler - 2017 - uca.hal.science
Applications of aggregation for information summary have great meanings in various fields.
In big data era, processing aggregate function in parallel is drawing researchers' attention …

Testing Non-Commutativity of Reduce Functions with Multi-Column Inputs

X Mu, X Zhang, C Zhu, N Li, P Zhang, L Liu - Available at SSRN 5046879 - papers.ssrn.com
With the continuous development of the MapReduce programming model, it is necessary to
ensure the reliability of MapReduce programs. In practice, the non-commutativity of Reduce …

Testing Non-Commutativity of Reduce Functions with Multi-Column Inputs

X Zhang, C Zhu, N Li, P Zhang, L Liu - Available at SSRN 4904755 - papers.ssrn.com
With the continuous development of the MapReduce programming model, it is necessary to
ensure the reliability of MapReduce programs. In practice, the non-commutativity of Reduce …

SDPA: An Optimizer for Program Analysis of Data-Parallel Applications

F Wang, X Shi, D Yu, Z Ke, H Jin… - 2018 IEEE 20th …, 2018 - ieeexplore.ieee.org
Data-parallel applications have become prevalent due to the fast development of big data
technologies. The performances of these applications are obviously one of the most crucial …