Spaces, trees, and colors: The algorithmic landscape of document retrieval on sequences

G Navarro - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
Document retrieval is one of the best-established information retrieval activities since
the '60s, pervading all search engines. Its aim is to obtain, from a collection of text …

[BOOK][B] Algorithms and theory of computation handbook, volume 2: special topics and techniques

MJ Atallah, M Blanton - 2009 - books.google.com
This handbook provides an up-to-date compendium of fundamental computer science
topics, techniques, and applications. Along with updating and revising many of the existing …

[BOOK][B] Average case analysis of algorithms on sequences

W Szpankowski - 2011 - books.google.com
A timely book on a topic that has witnessed a surge of interest over the last decade, owing in
part to several novel applications, most notably in data compression and computational …

[BOOK][B] Handbook of computational molecular biology

S Aluru - 2005 - taylorfrancis.com
The enormous complexity of biological systems at the molecular level must be answered
with powerful computational methods. Computational biology is a young field, but has seen …

Fast text searching for regular expressions or automaton searching on tries

RA Baeza-Yates, GH Gonnet - Journal of the ACM (JACM), 1996 - dl.acm.org
We present algorithms for efficient searching of regular expressions on preprocessed text,
using a Patricia tree as a logical model for the index. We obtain searching algorithms that …

[HTML][HTML] Estimating the entropy of binary time series: Methodology, some theory and a simulation study

Y Gao, I Kontoyiannis, E Bienenstock - Entropy, 2008 - mdpi.com
Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and
extensive comparison between some of the most popular and effective entropy estimation …

Faster entropy-bounded compressed suffix trees

J Fischer, V Mäkinen, G Navarro - Theoretical Computer Science, 2009 - Elsevier
Suffix trees are among the most important data structures in stringology, with a number of
applications in flourishing areas like bioinformatics. Their main problem is space usage …

Asymptotic behavior of the Lempel-Ziv parsing scheme and digital search trees

P Jacquet, W Szpankowski - Theoretical Computer Science, 1995 - Elsevier
The Lempel-Ziv parsing scheme finds a wide range of applications, most notably in data
compression and algorithms on words. It partitions a sequence of length n into variable …

On the entropy of a hidden Markov process

P Jacquet, G Seroussi, W Szpankowski - Theoretical Computer Science, 2008 - Elsevier
We study the entropy rate of a hidden Markov process (HMP) defined by observing the
output of a binary symmetric channel whose input is a first-order binary Markov process …

A suboptimal lossy data compression based on approximate pattern matching

T Luczak, W Szpankowski - IEEE Transactions on Information …, 1997 - ieeexplore.ieee.org
A practical suboptimal (variable source coding) algorithm for lossy data compression is
presented. This scheme is based on approximate string matching, and it naturally extends …