Dataset discovery and exploration: A survey

NW Paton, J Chen, Z Wu - ACM Computing Surveys, 2023 - dl.acm.org
Data scientists are tasked with obtaining insights from data. However, suitable data is often
not immediately at hand, and there may be many potentially relevant datasets in a data lake …

Data wrangling for big data: Challenges and opportunities

T Furche, G Gottlob, L Libkin… - … : Proceedings of the …, 2016 - research.manchester.ac.uk
Data wrangling is the process by which the data required by an application is identified,
extracted, cleaned and integrated, to yield a data set that is suitable for exploration and …

ZeroShotCeres: Zero-shot relation extraction from semi-structured webpages

C Lockard, P Shiralkar, XL Dong… - arXiv preprint arXiv …, 2020 - arxiv.org
In many documents, such as semi-structured webpages, textual semantics are augmented
with additional information conveyed using visual elements including layout, font size, and …

Ceres: Distantly supervised relation extraction from the semi-structured web

C Lockard, XL Dong, A Einolghozati… - arXiv preprint arXiv …, 2018 - arxiv.org
The web contains countless semi-structured websites, which can be a rich source of
information for populating knowledge bases. Existing methods for extracting relations from …

Swift Logic for Big Data and Knowledge Graphs: Overview of Requirements, Language, and System

L Bellomarini, G Gottlob, A Pieris… - SOFSEM 2018: Theory and …, 2018 - Springer
Many modern companies wish to maintain knowledge in the form of a corporate knowledge
graph and to use and manage this knowledge via a knowledge graph management system …

OpenCeres: When open information extraction meets the semi-structured web

C Lockard, P Shiralkar, XL Dong - … of the 2019 Conference of the …, 2019 - aclanthology.org
Open Information Extraction (OpenIE), the problem of harvesting triples from natural
language text whose predicate relations are not aligned to any pre-defined ontology, has …

DEXTER: large-scale discovery and extraction of product specifications on the web

D Qiu, L Barbosa, XL Dong, Y Shen… - Proceedings of the VLDB …, 2015 - dl.acm.org
The web is a rich resource of structured data. There has been an increasing interest in using
web structured data for many applications such as data integration, web search and …

Deep Web crawling: a survey

I Hernández, CR Rivero, D Ruiz - World Wide Web, 2019 - Springer
Deep Web crawling refers to the problem of traversing the collection of pages in a deep Web
site, which are dynamically generated in response to a particular query that is submitted …

Datalog: concepts, history, and outlook

D Maier, KT Tekle, M Kifer, DS Warren - Declarative Logic Programming …, 2018 - dl.acm.org
This chapter is a survey of the history and the main concepts of Datalog. We begin with an
introduction to the language and its use for database definition and querying. We then look …

Data context informed data wrangling

M Koehler, A Bogatu, C Civili… - … Conference on Big …, 2017 - ieeexplore.ieee.org
The process of preparing potentially large and complex data sets for further analysis or
manual examination is often called data wrangling. In classical warehousing environments …