Parametric schema inference for massive JSON datasets

MA Baazizi, D Colazzo, G Ghelli, C Sartiani - The VLDB Journal, 2019 - Springer
In recent years, JSON established itself as a very popular data format for representing
massive data collections. JSON data collections are usually schemaless. While this ensures …

Parsing gigabytes of JSON per second

G Langdale, D Lemire - The VLDB Journal, 2019 - Springer
JavaScript Object Notation or JSON is a ubiquitous data exchange format on the
web. Ingesting JSON documents can become a performance bottleneck due to the sheer …

JSON tiles: Fast analytics on semi-structured data

D Durner, V Leis, T Neumann - … of the 2021 International Conference on …, 2021 - dl.acm.org
Developers often prefer flexibility over upfront schema design, making semi-structured data
formats such as JSON increasingly popular. Large amounts of JSON data are therefore …

A survey of JSON-compatible binary serialization specifications

JC Viotti, M Kinderkhedia - arXiv preprint arXiv:2201.02089, 2022 - arxiv.org
In this paper, we present the recent advances that highlight the characteristics of JSON-
compatible binary serialization specifications. We motivate the discussion by covering the …

ReCG: Bottom-up JSON Schema Discovery Using a Repetitive Cluster-and-Generalize Framework

J Yun, B Tak, WS Han - Proceedings of the VLDB Endowment, 2024 - dl.acm.org
The schemalessness, one of the major advantages of JSON representation format, comes
with high penalties in querying and operations by denying various critical functions such as …

Adaptive code generation for data-intensive analytics

W Zhang, J Kim, KA Ross, E Sedlar… - Proceedings of the VLDB …, 2021 - dl.acm.org
Modern database management systems employ sophisticated query optimization
techniques that enable the generation of efficient plans for queries over very large data sets …

Dynamic speculative optimizations for SQL compilation in Apache Spark

F Schiavio, D Bonetta, W Binder - Proceedings of the VLDB …, 2020 - folia.unifr.ch
Big-data systems have gained significant momentum, and Apache Spark is becoming a de-
facto standard for modern data analytics. Spark relies on SQL query compilation to optimize …

FishStore: Faster ingestion with subset hashing

D Xie, B Chandramouli, Y Li, D Kossmann - Proceedings of the 2019 …, 2019 - dl.acm.org
The last decade has witnessed a huge increase in data being ingested into the cloud, in
forms such as JSON, CSV, and binary formats. Traditionally, data is either ingested into …

Language-agnostic integrated queries in a managed polyglot runtime

F Schiavio, D Bonetta, W Binder - Proceedings of the VLDB …, 2021 - folia.unifr.ch
Language-integrated query (LINQ) frameworks offer a convenient programming abstraction
for processing in-memory collections of data, allowing developers to concisely express …

ParPaRaw: Massively parallel parsing of delimiter-separated raw data

E Stehle, HA Jacobsen - arXiv preprint arXiv:1905.13415, 2019 - arxiv.org
Parsing is essential for a wide range of use cases, such as stream processing, bulk loading,
and in-situ querying of raw data. Yet, the compute-intense step often constitutes a major …