XTRACT: A system for extracting document type descriptors from XML documents

M Garofalakis, A Gionis, R Rastogi, S Seshadri… - Proceedings of the …, 2000 - dl.acm.org
XML is rapidly emerging as the new standard for data representation and exchange on the
Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which …

GraSS: Graph structure summarization

K LeFevre, E Terzi - Proceedings of the 2010 SIAM International …, 2010 - SIAM
Large graph databases are commonly collected and analyzed in numerous domains. For
reasons related to either space efficiency or for privacy protection (e.g., in the case of social …

The minimum description length principle for pattern mining: a survey

E Galbrun - Data mining and knowledge discovery, 2022 - Springer
Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration,
the selection of patterns constitutes a major challenge. The Minimum Description Length …

Discovering patterns and subfamilies in biosequences

A Brazma, I Jonassen, E Ukkonen… - … Conference on Intelligent …, 1996 - europepmc.org
We consider the problem of automatic discovery of patterns and the corresponding
subfamilies in a set of biosequences. The sequences are unaligned and may contain noise …

XTRACT: learning document type descriptors from XML document collections

M Garofalakis, A Gionis, R Rastogi, S Seshadri… - Data mining and …, 2003 - Springer
XML is rapidly emerging as the new standard for data representation and exchange on the
Web. Unlike HTML, tags in XML documents describe the semantics of the data and not how …

Constructing comprehensive summaries of large event sequences

J Kiernan, E Terzi - ACM Transactions on Knowledge Discovery from …, 2009 - dl.acm.org
Event sequences capture system and user activity over time. Prior research on sequence
mining has mostly focused on discovering local patterns appearing in a sequence. While …

The generalized MDL approach for summarization

LVS Lakshmanan, RT Ng, CX Wang, X Zhou… - VLDB'02: Proceedings …, 2002 - Elsevier
This chapter presents a generalization of the Minimum Description
Length (MDL) principle, called GMDL, and shows that GMDL leads to fewer regions than …

RE-tree: an efficient index structure for regular expressions

CY Chan, M Garofalakis, R Rastogi - The VLDB Journal, 2003 - Springer
Due to their expressive power, regular expressions (REs) are quickly becoming an integral
part of language specifications for several important application scenarios. Many of these …

An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries

M Koivisto, M Perola, T Varilo, W Hennah… - Biocomputing …, 2002 - World Scientific
We describe a new method for finding haplotype blocks based on the use of the minimum
description length principle. We give a rigorous definition of the quality of a segmentation of …

Lange and Wiehagen's pattern language learning algorithm: An average-case analysis with respect to its total learning time

T Zeugmann - Annals of Mathematics and Artificial Intelligence, 1998 - Springer
The present paper deals with the best-case, worst-case and average-case behavior of
Lange and Wiehagen's (1991) pattern language learning algorithm with respect to its total …