Distributed data management using MapReduce
MapReduce is a framework for processing and managing large-scale datasets in a
distributed cluster, which has been used for applications such as generating search indexes …
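To make the snippet's reference to generating search indexes concrete, here is a minimal single-process sketch of building an inverted index with MapReduce. It only simulates the map, shuffle and reduce phases in plain Python and is not Hadoop's API. The function names (map_doc, shuffle, reduce_postings, build_index) are hypothetical, and whitespace tokenization is an assumption made to keep the sketch short.

```python
from collections import defaultdict

def map_doc(doc_id, text):
    # Map: emit one (term, doc_id) pair per distinct term in the document.
    return [(term, doc_id) for term in set(text.lower().split())]

def shuffle(pairs):
    # Shuffle: group emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_postings(term, doc_ids):
    # Reduce: collapse all doc ids for one term into a sorted posting list.
    return term, sorted(doc_ids)

def build_index(docs):
    # Run every map task, then one reduce call per distinct term.
    pairs = [p for doc_id, text in docs.items() for p in map_doc(doc_id, text)]
    return dict(reduce_postings(t, ids) for t, ids in shuffle(pairs).items())

index = build_index({1: "big data tools", 2: "big clusters"})
print(index["big"])  # [1, 2]
```

In a real cluster the map calls run in parallel on separate input splits, and the shuffle moves each key's pairs to the single reducer responsible for that key.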
A comprehensive view of Hadoop research—A systematic literature review
Context: In recent years, the valuable knowledge that can be retrieved from petabyte scale
datasets–known as Big Data–led to the development of solutions to process information …
Making sense of performance in data analytics frameworks
There has been much research devoted to improving the performance of data analytics
frameworks, but comparatively little effort has been spent systematically identifying the …
Sprocket: A serverless video processing framework
Sprocket is a highly configurable, stage-based, scalable, serverless video processing
framework that exploits intra-video parallelism to achieve low latency. Sprocket enables …
[BOOK] Magellan: Toward building entity matching management systems
PV Konda - 2018 - search.proquest.com
Entity matching (EM) identifies data instances that refer to the same real-world entity, such
as (David Smith, UWMadison) and (DM Smith, UWM). This problem has been a long …
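The snippet's example pair can be made concrete with a toy rule-based matcher. This is a hypothetical sketch and not Magellan's method: it assumes that two records match when their names share a first initial and last name and their affiliations normalize to the same string. The alias table is a loudly labeled assumption, because deciding whether "UWM" really denotes UW-Madison is exactly the kind of ambiguity entity matching has to resolve.

```python
def name_key(name):
    # Reduce a name to (first initial, last name): "David Smith" -> ("d", "smith").
    parts = name.replace(".", " ").split()
    return parts[0][0].lower(), parts[-1].lower()

def normalize_affiliation(aff, aliases):
    # Lowercase, drop hyphens and spaces, then map through the alias table.
    key = aff.lower().replace("-", "").replace(" ", "")
    return aliases.get(key, key)

def is_match(a, b, aliases):
    # Hypothetical rule: names and normalized affiliations must both agree.
    (name_a, aff_a), (name_b, aff_b) = a, b
    return (name_key(name_a) == name_key(name_b)
            and normalize_affiliation(aff_a, aliases)
            == normalize_affiliation(aff_b, aliases))

# Assumption for illustration only: treat "UWM" as an alias of UW-Madison.
ALIASES = {"uwm": "uwmadison"}
print(is_match(("David Smith", "UWMadison"), ("DM Smith", "UWM"), ALIASES))  # True
```

Without the alias entry the same pair is rejected, which shows how much a hand-written rule depends on domain knowledge that a matching system would otherwise have to learn or curate.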
Neural acceleration for general-purpose approximate programs
This paper describes a learning-based approach to the acceleration of approximate
programs. We describe the Parrot transformation, a program transformation that selects and …
Shark: SQL and rich analytics at scale
Shark is a new data analysis system that marries query processing with complex analytics
on large clusters. It leverages a novel distributed memory abstraction to provide a unified …
Communication steps for parallel query processing
We study the problem of computing conjunctive queries over large databases on parallel
architectures without shared storage. Using the structure of such a query q and the skew in …
Locationspark: A distributed in-memory data management system for big spatial data
We present LocationSpark, a spatial data processing system built on top of Apache Spark, a
widely used distributed data processing system. LocationSpark offers a rich set of spatial …
Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows
Streaming computations are by nature long-running, and their workloads can change in
unpredictable ways. This in turn means that maintaining performance may require …