Concave penalized estimation of sparse Gaussian Bayesian networks
We develop a penalized likelihood estimation framework to learn the structure of Gaussian
Bayesian networks from observational data. In contrast to recent methods which accelerate …
Schema profiling of document-oriented databases
In document-oriented databases, schema is a soft concept and the documents in a collection
can be stored using different local schemata. This gives designers and implementers …
Curated databases
Curated databases are databases that are populated and updated with a great deal of
human effort. Most reference works that one traditionally found on the reference shelves of …
Inference of concise regular expressions and DTDs
We consider the problem of inferring a concise Document Type Definition (DTD) for a given
set of XML-documents, a problem that basically reduces to learning concise regular …
A universal approach for multi-model schema inference
The variety feature of Big Data, represented by multi-model data, has brought a new
dimension of complexity to all aspects of data management. The need to process a set of …
Learning join queries from user examples
We investigate the problem of learning join queries from user examples. The user is
presented with a set of candidate tuples and is asked to label them as positive or negative …
SemMT: a semantic-based testing approach for machine translation systems
Machine translation has wide applications in daily life. In mission-critical applications such
as translating official documents, incorrect translation can have unpleasant or sometimes …
Extracting structured information from Wikipedia articles to populate infoboxes
Roughly every third Wikipedia article contains an infobox, a table that displays important
facts about the subject in attribute-value form. The schema of an infobox, i.e., the attributes …
InfeRE: Step-by-Step Regex Generation via Chain of Inference
Automatically generating regular expressions (abbrev. regexes) from natural language
description (NL2RE) has been an emerging research area. Prior studies treat regex as a …
Making data platforms smarter with MOSES
The rise of data platforms has enabled the collection and processing of huge volumes of
data, but has also introduced the risk of losing control over them. Collecting proper metadata about raw …