Data lake management: challenges and opportunities
The ubiquity of data lakes has created fascinating new challenges for data management
research. In this tutorial, we review the state-of-the-art in data management for data lakes …
Table understanding: Problem overview
A Shigarov - Wiley Interdisciplinary Reviews: Data Mining and …, 2023 - Wiley Online Library
Tables are probably the most natural way to represent relational data in various media and
formats. They store a large number of valuable facts that could be utilized for question …
Webformer: The web-page transformer for structure information extraction
Structure information extraction refers to the task of extracting structured text fields from web
pages, such as extracting a product offer from a shopping page including product title …
Santos: Relationship-based semantic table union search
Existing techniques for unionable table search define unionability using metadata (tables
must have the same or similar schemas) or column-based metrics (for example, the values …
Semantics-aware dataset discovery from data lakes with contextualized column-based representation learning
Dataset discovery from data lakes is essential in many real application scenarios. In this
paper, we propose Starmie, an end-to-end framework for dataset discovery from data lakes …
Table cell search for question answering
Tables are pervasive on the Web. Informative web tables range across a large variety of
topics, which can naturally serve as a significant resource to satisfy user information needs …
MUSTIE: Multimodal structural transformer for web information extraction
The task of web information extraction is to extract target fields of an object from web pages,
such as extracting the name, genre and actor from a movie page. Recent sequential …
Olio: A Semantic Search Interface for Data Repositories
Search and information retrieval systems are becoming more expressive in interpreting user
queries beyond the traditional weighted bag-of-words model of document retrieval. For …
Ground: A Data Context Service.
Ground is an open-source data context service, a system to manage all the information that
informs the use of data. Data usage has changed both philosophically and practically in the …
Automated extraction of unstructured tables and semantic information from arbitrary documents
N Duta - US Patent 10,878,195, 2020 - Google Patents
A “Table Extractor” provides various techniques for automatically delimiting and extracting
tables from arbitrary documents. In various implementations, the Table extractor also …