Data management in machine learning: Challenges, techniques, and systems

A Kumar, M Boehm, J Yang - Proceedings of the 2017 ACM International …, 2017 - dl.acm.org
Large-scale data analytics using statistical machine learning (ML), popularly called
advanced analytics, underpins many modern data-driven applications. The data …

Table understanding: Problem overview

A Shigarov - Wiley Interdisciplinary Reviews: Data Mining and …, 2023 - Wiley Online Library
Tables are probably the most natural way to represent relational data in various media and
formats. They store a large number of valuable facts that could be utilized for question …

Data lifecycle challenges in production machine learning: a survey

N Polyzotis, S Roy, SE Whang, M Zinkevich - ACM SIGMOD Record, 2018 - dl.acm.org
Machine learning has become an essential tool for gleaning knowledge from data and
tackling a diverse set of computationally hard tasks. However, the accuracy of a machine …

Network lasso: Clustering and optimization in large graphs

D Hallac, J Leskovec, S Boyd - Proceedings of the 21th ACM SIGKDD …, 2015 - dl.acm.org
Convex optimization is an essential tool for modern data analysis, as it provides a framework
to formulate and solve many problems in machine learning and data mining. However …

Incremental knowledge base construction using DeepDive

J Shin, S Wu, F Wang, C De Sa… - Proceedings of the …, 2015 - ncbi.nlm.nih.gov
Populating a database with unstructured information is a long-standing problem in industry
and research that encompasses problems of extraction, cleaning, and integration. Recent …

Model selection management systems: The next frontier of advanced analytics

A Kumar, R McCann, J Naughton, JM Patel - ACM SIGMOD Record, 2016 - dl.acm.org
John Boyd recognized in the 1960's the importance of situation awareness for military
operations and introduced the notion of the OODA loop (Observe, Orient, Decide, and Act) …

Fonduer: Knowledge base construction from richly formatted data

S Wu, L Hsiao, X Cheng, B Hancock… - Proceedings of the …, 2018 - dl.acm.org
We focus on knowledge base construction (KBC) from richly formatted data. In contrast to
KBC from text or tabular data, KBC from richly formatted data aims to extract relations …

To join or not to join? Thinking twice about joins before feature selection

A Kumar, J Naughton, JM Patel, X Zhu - Proceedings of the 2016 …, 2016 - dl.acm.org
Closer integration of machine learning (ML) with data processing is a booming area in both
the data management industry and academia. Almost all ML toolkits assume that the input is …

DeepDive: Declarative knowledge base construction

C De Sa, A Ratner, C Ré, J Shin, F Wang, S Wu… - ACM SIGMOD …, 2016 - dl.acm.org
The dark data extraction or knowledge base construction (KBC) problem is to populate a
SQL database with information from unstructured data sources including emails, webpages …

DeepDive: a data management system for automatic knowledge base construction

C Zhang - 2015 - search.proquest.com
Many pressing questions in science are macroscopic: they require scientists to consult
information expressed in a wide range of resources, many of which are not organized in a …