Spark SQL: Relational data processing in Spark
Spark SQL is a new module in Apache Spark that integrates relational processing with
Spark's functional programming API. Built on our experience with Shark, Spark SQL lets …
Approximate query processing: No silver bullet
In this paper, we reflect on the state of the art of Approximate Query Processing. Although
much technical progress has been made in this area of research, we are yet to see its impact …
Wander join: Online aggregation via random walks
Joins are expensive, and online aggregation over joins was proposed to mitigate the cost,
which offers users a nice and flexible tradeoff between query efficiency and accuracy in a …
Approximate query processing: What is new and where to go? A survey on approximate query processing
Online analytical processing (OLAP) is a core functionality in database systems. The
performance of OLAP is crucial to make online decisions in many applications. However, it is …
VerdictDB: Universalizing approximate query processing
Despite 25 years of research in academia, approximate query processing (AQP) has had
little industrial adoption. One of the major causes of this slow adoption is the reluctance of …
Quickr: Lazily approximating complex adhoc queries in bigdata clusters
We present a system that approximates the answer to complex ad-hoc queries in big-data
clusters by injecting samplers on-the-fly and without requiring pre-existing samples …
Random sampling over joins revisited
Joins are expensive, especially on large data and/or multiple relations. One promising
approach in mitigating their high costs is to just return a simple random sample of the full join …
Northstar: An interactive data science system
T Kraska - 2021 - dspace.mit.edu
© 2018 VLDB Endowment. In order to democratize data science, we need to fundamentally
rethink the current analytics stack, from the user interface to the “guts.” Most importantly …
Scaling spark in the real world: performance and usability
Apache Spark is one of the most widely used open source processing engines for big data,
with rich language-integrated APIs and a wide range of libraries. Over the past two years …
A structured and scalable mechanism for test access to embedded reusable cores
EJ Marinissen, R Arendsen, G Bos… - … 1998 (IEEE Cat. No …, 1998 - ieeexplore.ieee.org
The main objective of core-based IC design is improvement of design efficiency and time-to-
market. In order to prevent test development from becoming the bottleneck in the entire …