Data lake management: challenges and opportunities
The ubiquity of data lakes has created fascinating new challenges for data management
research. In this tutorial, we review the state-of-the-art in data management for data lakes …
Data Integration: The Current Status and the Way Forward.
M Stonebraker, IF Ilyas - IEEE Data Eng. Bull., 2018 - cs.uwaterloo.ca
We discuss scalable data integration challenges in the enterprise inspired by our
experience at Tamr. We use multiple real customer examples to highlight the technical …
A survey on data collection for machine learning: a big data-ai integration perspective
Data collection is a major bottleneck in machine learning and an active research topic in
multiple communities. There are largely two reasons data collection has recently become a …
Magellan: Toward building entity matching management systems
PV Konda - 2018 - search.proquest.com
Entity matching (EM) identifies data instances that refer to the same real-world entity, such
as (David Smith, UWMadison) and (DM Smith, UWM). This problem has been a long …
Deepeye: Towards automatic data visualization
Data visualization is invaluable for explaining the significance of data to people who are
visually oriented. The central task of automatic data visualization is, given a dataset, to …
Annotating columns with pre-trained language models
Inferring meta information about tables, such as column headers or relationships between
columns, is an active research topic in data management as we find many tables are …
Josie: Overlap set similarity search for finding joinable tables in data lakes
We present a new solution for finding joinable tables in massive data lakes: given a table
and one join column, find tables that can be joined with the given table on the largest …
Data market platforms: Trading data assets to solve data problems
Data only generates value for a few organizations with expertise and resources to make
data shareable, discoverable, and easy to integrate. Sharing data that is easy to discover …
Finding related tables in data lakes for interactive data science
Many modern data science applications build on data lakes, schema-agnostic repositories
of data files and data products that offer limited organization and management capabilities …
Raha: A configuration-free error detection system
Detecting erroneous values is a key step in data cleaning. Error detection algorithms usually
require a user to provide input configurations in the form of rules or statistical parameters …