Document spanners - a brief overview of concepts, results, and recent developments
The information extraction framework of document spanners was introduced by Fagin,
Kimelfeld, Reiss, and Vansummeren (PODS 2013, J. ACM 2015) as a formalisation of the …
Spanner evaluation over SLP-compressed documents
We consider the problem of evaluating regular spanners over compressed documents, i.e.,
we wish to solve evaluation tasks directly on the compressed data, without decompression …
Query evaluation over SLP-represented document databases with complex document editing
It is known that the query result of a regular spanner over a single document D can be
enumerated after O(|D|) preprocessing and with constant delay in data complexity …
The smallest extraction problem
We introduce landmark grammars, a new family of context-free grammars aimed at
describing the HTML source code of pages published by large and templated websites and …
On the hardness of smallest RLSLPs and collage systems
A Kawamoto, I Tomohiro, D Köppl… - 2024 Data …, 2024 - ieeexplore.ieee.org
On the Hardness of Smallest RLSLPs and Collage Systems. Akiyoshi Kawamoto, Tomohiro I, Dominik Köppl, …
Computing NP-hard repetitiveness measures via MAX-SAT
Repetitiveness measures reveal profound characteristics of datasets, and give rise to
compressed data structures and algorithms working in compressed space. Alas, the …
Enumeration for MSO-Queries on Compressed Trees
We present a linear preprocessing and output-linear delay enumeration algorithm for MSO-
queries over trees that are compressed in the well-established grammar-based framework …
A Lower Bound on Unambiguous Context Free Grammars via Communication Complexity
S Mengel, H Vinall-Smeeth - arXiv preprint arXiv:2412.03199, 2024 - arxiv.org
Motivated by recent connections to factorised databases, we analyse the efficiency of
representations by context free grammars (CFGs). Concretely, we prove a recent conjecture …
RePair grammars are the smallest grammars for Fibonacci words
Grammar-based compression is a lossless data compression scheme that represents a
given string $w$ by a context-free grammar that generates only $w$. While computing the …
The Information Extraction Framework of Document Spanners - A Very Informal Survey
ML Schmid - International Conference on Current Trends in Theory …, 2024 - Springer
This document provides an intuitive and high-level survey of the information extraction
framework of document spanners (Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013 …