Parametric schema inference for massive JSON datasets
In recent years, JSON established itself as a very popular data format for representing
massive data collections. JSON data collections are usually schemaless. While this ensures …
Parsing gigabytes of JSON per second
JavaScript Object Notation or JSON is a ubiquitous data exchange format on the
web. Ingesting JSON documents can become a performance bottleneck due to the sheer …
Filter before you parse: Faster analytics on raw data with sparser
Exploratory big data applications often run on raw unstructured or semi-structured data
formats, such as JSON files or text logs. These applications can spend 80–90% of their …
A case study of Processing-in-Memory in off-the-Shelf systems
We evaluate a new processing-in-memory (PIM) architecture from UPMEM that was built
and deployed in an off-the-shelf server. Systems designed to perform computing in or near …
JSON tiles: Fast analytics on semi-structured data
Developers often prefer flexibility over upfront schema design, making semi-structured data
formats such as JSON increasingly popular. Large amounts of JSON data are therefore …
Jumpgate: In-Network Processing as a Service for Data Analytics
In-network processing, where data is processed by special-purpose devices as it passes
over the network, is showing great promise at improving application performance, in …
Using selective memoization to defeat regular expression denial of service (ReDoS)
Regular expressions (regexes) are a denial of service vector in most mainstream
programming languages. Recent empirical work has demonstrated that up to 10% of …
Speculative distributed CSV data parsing for big data analytics
There has been a recent flurry of interest in providing query capability on raw data in today's
big data systems. These raw data must be parsed before processing or use in analytics …
Predicate pushdown for data science pipelines
Predicate pushdown is a widely adopted query optimization. Existing systems and prior work
mostly use pattern-matching rules to decide when a predicate can be pushed through …
A survey of JSON-compatible binary serialization specifications
In this paper, we present the recent advances that highlight the characteristics of JSON-
compatible binary serialization specifications. We motivate the discussion by covering the …