Parametric schema inference for massive JSON datasets
In recent years, JSON established itself as a very popular data format for representing
massive data collections. JSON data collections are usually schemaless. While this ensures …
Parsing gigabytes of JSON per second
G Langdale, D Lemire - The VLDB Journal, 2019 - Springer
Abstract JavaScript Object Notation or JSON is a ubiquitous data exchange format on the
web. Ingesting JSON documents can become a performance bottleneck due to the sheer …
JSON tiles: Fast analytics on semi-structured data
Developers often prefer flexibility over upfront schema design, making semi-structured data
formats such as JSON increasingly popular. Large amounts of JSON data are therefore …
A survey of JSON-compatible binary serialization specifications
JC Viotti, M Kinderkhedia - arXiv preprint arXiv:2201.02089, 2022 - arxiv.org
In this paper, we present the recent advances that highlight the characteristics of JSON-
compatible binary serialization specifications. We motivate the discussion by covering the …
ReCG: Bottom-up JSON Schema Discovery Using a Repetitive Cluster-and-Generalize Framework
The schemalessness, one of the major advantages of JSON representation format, comes
with high penalties in querying and operations by denying various critical functions such as …
Adaptive code generation for data-intensive analytics
Modern database management systems employ sophisticated query optimization
techniques that enable the generation of efficient plans for queries over very large data sets …
Dynamic speculative optimizations for SQL compilation in Apache Spark
Big-data systems have gained significant momentum, and Apache Spark is becoming a
de-facto standard for modern data analytics. Spark relies on SQL query compilation to optimize …
Fishstore: Faster ingestion with subset hashing
The last decade has witnessed a huge increase in data being ingested into the cloud, in
forms such as JSON, CSV, and binary formats. Traditionally, data is either ingested into …
Language-agnostic integrated queries in a managed polyglot runtime
Language-integrated query (LINQ) frameworks offer a convenient programming abstraction
for processing in-memory collections of data, allowing developers to concisely express …
ParPaRaw: Massively parallel parsing of delimiter-separated raw data
Parsing is essential for a wide range of use cases, such as stream processing, bulk loading,
and in-situ querying of raw data. Yet, the compute-intense step often constitutes a major …