Data lake management: challenges and opportunities

F Nargesian, E Zhu, RJ Miller, KQ Pu… - Proceedings of the VLDB …, 2019 - dl.acm.org
The ubiquity of data lakes has created fascinating new challenges for data management
research. In this tutorial, we review the state-of-the-art in data management for data lakes …

Table understanding: Problem overview

A Shigarov - Wiley Interdisciplinary Reviews: Data Mining and …, 2023 - Wiley Online Library
Tables are probably the most natural way to represent relational data in various media and
formats. They store a large number of valuable facts that could be utilized for question …

Webformer: The web-page transformer for structure information extraction

Q Wang, Y Fang, A Ravula, F Feng, X Quan… - Proceedings of the ACM …, 2022 - dl.acm.org
Structure information extraction refers to the task of extracting structured text fields from web
pages, such as extracting a product offer from a shopping page including product title …

Santos: Relationship-based semantic table union search

A Khatiwada, G Fan, R Shraga, Z Chen… - Proceedings of the …, 2023 - dl.acm.org
Existing techniques for unionable table search define unionability using metadata (tables
must have the same or similar schemas) or column-based metrics (for example, the values …

Semantics-aware dataset discovery from data lakes with contextualized column-based representation learning

G Fan, J Wang, Y Li, D Zhang, R Miller - arXiv preprint arXiv:2210.01922, 2022 - arxiv.org
Dataset discovery from data lakes is essential in many real application scenarios. In this
paper, we propose Starmie, an end-to-end framework for dataset discovery from data lakes …

Table cell search for question answering

H Sun, H Ma, X He, W Yih, Y Su, X Yan - Proceedings of the 25th …, 2016 - dl.acm.org
Tables are pervasive on the Web. Informative web tables range across a large variety of
topics, which can naturally serve as a significant resource to satisfy user information needs …

MUSTIE: Multimodal structural transformer for web information extraction

Q Wang, J Wang, X Quan, F Feng, Z Xu… - Proceedings of the …, 2023 - aclanthology.org
The task of web information extraction is to extract target fields of an object from web pages,
such as extracting the name, genre and actor from a movie page. Recent sequential …

Olio: A Semantic Search Interface for Data Repositories

V Setlur, A Kanyuka, A Srinivasan - … of the 36th Annual ACM Symposium …, 2023 - dl.acm.org
Search and information retrieval systems are becoming more expressive in interpreting user
queries beyond the traditional weighted bag-of-words model of document retrieval. For …

Ground: A Data Context Service

JM Hellerstein, V Sreekanti, JE Gonzalez, J Dalton… - CIDR, 2017 - Citeseer
Ground is an open-source data context service, a system to manage all the information that
informs the use of data. Data usage has changed both philosophically and practically in the …

Automated extraction of unstructured tables and semantic information from arbitrary documents

N Duta - US Patent 10,878,195, 2020 - Google Patents
A “Table Extractor” provides various techniques for automatically delimiting and extracting
tables from arbitrary documents. In various implementations, the Table Extractor also …