Dataset discovery and exploration: A survey
Data scientists are tasked with obtaining insights from data. However, suitable data is often
not immediately at hand, and there may be many potentially relevant datasets in a data lake …
Data wrangling for big data: Challenges and opportunities
Data wrangling is the process by which the data required by an application is identified,
extracted, cleaned and integrated, to yield a data set that is suitable for exploration and …
ZeroShotCeres: Zero-shot relation extraction from semi-structured webpages
In many documents, such as semi-structured webpages, textual semantics are augmented
with additional information conveyed using visual elements including layout, font size, and …
Ceres: Distantly supervised relation extraction from the semi-structured web
The web contains countless semi-structured websites, which can be a rich source of
information for populating knowledge bases. Existing methods for extracting relations from …
Swift Logic for Big Data and Knowledge Graphs: Overview of Requirements, Language, and System
Many modern companies wish to maintain knowledge in the form of a corporate knowledge
graph and to use and manage this knowledge via a knowledge graph management system …
OpenCeres: When open information extraction meets the semi-structured web
Open Information Extraction (OpenIE), the problem of harvesting triples from natural
language text whose predicate relations are not aligned to any pre-defined ontology, has …
Dexter: large-scale discovery and extraction of product specifications on the web
The web is a rich resource of structured data. There has been an increasing interest in using
web structured data for many applications such as data integration, web search and …
Deep Web crawling: a survey
Deep Web crawling refers to the problem of traversing the collection of pages in a deep Web
site, which are dynamically generated in response to a particular query that is submitted …
Datalog: concepts, history, and outlook
This chapter is a survey of the history and the main concepts of Datalog. We begin with an
introduction to the language and its use for database definition and querying. We then look …
Data context informed data wrangling
The process of preparing potentially large and complex data sets for further analysis or
manual examination is often called data wrangling. In classical warehousing environments …