- Academic Search

Handling Massive N-Gram Datasets Efficiently

GE Pibiri, R Venturini - ACM Transactions on Information Systems (TOIS), 2019 - dl.acm.org

Two fundamental problems concern the handling of large n-gram language models:
indexing, that is, compressing the n-grams and associated satellite values without …

Cited by 28

Show some love to your n-grams: A bit of progress and stronger n-gram language modeling baselines

E Shareghi, D Gerz, I Vulic - 2019 - repository.cam.ac.uk

In recent years neural language models (LMs) have set state-of-the-art performance for
several benchmarking datasets. While the reasons for their success and their computational …

Cited by 16

Automatic understanding of unwritten languages

O Adams - 2017 - minerva-access.unimelb.edu.au

Many of the world's languages are falling out of use without a written record and minimal
linguistic documentation. Language documentation is a slow process and there are an …

Cited by 17

Koala: An index for quantifying overlaps with pre-training corpora

TT Vu, X He, G Haffari, E Shareghi - arXiv preprint arXiv:2303.14770, 2023 - arxiv.org

In very recent years more attention has been placed on probing the role of pre-training data
in Large Language Models (LLMs) downstream behaviour. Despite the importance, there is …

Cited by 6

A framework for space-efficient variable-order Markov models

F Cunial, J Alanko, D Belazzougui - Bioinformatics, 2019 - academic.oup.com

Motivation: Markov models with contexts of variable length are widely used in bioinformatics
for representing sets of sequences with similar biological properties. When models contain …

Cited by 11

Compressed nonparametric language modelling

E Shareghi, G Haffari, T Cohn - International Joint Conference …, 2017 - research.monash.edu

Hierarchical Pitman-Yor Process priors are compelling for learning language
models, outperforming point-estimate based methods. However, these models remain …

Cited by 5

Succinct data structures for NLP-at-scale

M Petri, T Cohn - Proceedings of COLING 2016, the 26th …, 2016 - aclanthology.org

Succinct data structures involve the use of novel data structures, compression technologies,
and other mechanisms to allow data to be stored in extremely small memory or disk …

Cited by 2

Space-Efficient Algorithms for Strings and Prefix-Sortable Graphs

J Alanko - 2020 - helda.helsinki.fi

Space-efficient data structures are an active field of research that has found many
applications in combinatorial pattern matching and bioinformatics. The idea is to build data …

Cited by 1

Space and Time-Efficient Data Structures for Massive Datasets

GE Pibiri - 2019 - tesidottorato.depositolegale.it

This thesis concerns the design of compressed data structures for the efficient storage of
massive datasets of integer sequences and short strings. The studied problems arise in …

Cited by 3

GPU-accelerated k-mer counting

P Jylhä-Ollila - 2020 - core.ac.uk

A common task in bioinformatics algorithms is k-mer counting [MK11, KDD17]. Given a string
S, the problem is to count the frequency of each unique substring of length k in S. K-mer …

Fast, small and exact: Infinite-order language modelling with compressed suffix trees

Handling Massive N-Gram Datasets Efficiently
