Document spanners - a brief overview of concepts, results, and recent developments

ML Schmid, N Schweikardt - Proceedings of the 41st ACM SIGMOD …, 2022 - dl.acm.org
The information extraction framework of document spanners was introduced by Fagin,
Kimelfeld, Reiss, and Vansummeren (PODS 2013, J. ACM 2015) as a formalisation of the …

Spanner evaluation over SLP-compressed documents

ML Schmid, N Schweikardt - Proceedings of the 40th ACM SIGMOD …, 2021 - dl.acm.org
We consider the problem of evaluating regular spanners over compressed documents, i.e.,
we wish to solve evaluation tasks directly on the compressed data, without decompression …

Query evaluation over SLP-represented document databases with complex document editing

ML Schmid, N Schweikardt - Proceedings of the 41st ACM SIGMOD …, 2022 - dl.acm.org
It is known that the query result of a regular spanner over a single document D can be
enumerated after O(|D|) preprocessing and with constant delay in data complexity …

The smallest extraction problem

V Cetorelli, P Atzeni, V Crescenzi… - Proceedings of the VLDB …, 2021 - dl.acm.org
We introduce landmark grammars, a new family of context-free grammars aimed at
describing the HTML source code of pages published by large and templated websites and …

On the hardness of smallest RLSLPs and collage systems

A Kawamoto, T I, D Köppl… - 2024 Data …, 2024 - ieeexplore.ieee.org
Akiyoshi Kawamoto, Tomohiro I, Dominik Köppl …

Computing NP-hard repetitiveness measures via MAX-SAT

H Bannai, K Goto, M Ishihata, S Kanda, D Köppl… - arXiv preprint arXiv …, 2022 - arxiv.org
Repetitiveness measures reveal profound characteristics of datasets, and give rise to
compressed data structures and algorithms working in compressed space. Alas, the …

Enumeration for MSO-Queries on Compressed Trees

M Lohrey, ML Schmid - Proceedings of the ACM on Management of …, 2024 - dl.acm.org
We present a linear preprocessing and output-linear delay enumeration algorithm for MSO-
queries over trees that are compressed in the well-established grammar-based framework …

A Lower Bound on Unambiguous Context Free Grammars via Communication Complexity

S Mengel, H Vinall-Smeeth - arXiv preprint arXiv:2412.03199, 2024 - arxiv.org
Motivated by recent connections to factorised databases, we analyse the efficiency of
representations by context free grammars (CFGs). Concretely, we prove a recent conjecture …

RePair grammars are the smallest grammars for Fibonacci words

T Mieno, S Inenaga, T Horiyama - arXiv preprint arXiv:2202.08447, 2022 - arxiv.org
Grammar-based compression is a lossless data compression scheme that represents a
given string $w$ by a context-free grammar that generates only $w$. While computing the …

The Information Extraction Framework of Document Spanners - A Very Informal Survey

ML Schmid - International Conference on Current Trends in Theory …, 2024 - Springer
This document provides an intuitive and high-level survey of the information extraction
framework of document spanners (Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013 …