Charagram: Embedding words and sentences via character n-grams

J Wieting, M Bansal, K Gimpel, K Livescu - arXiv preprint arXiv …, 2016 - arxiv.org
We present Charagram embeddings, a simple approach for learning character-based
compositional models to embed textual sequences. A word or sentence is represented using …

The Parallel Meaning Bank: Towards a multilingual corpus of translations annotated with compositional meaning representations

L Abzianidze, J Bjerva, K Evang, H Haagsma… - arXiv preprint arXiv …, 2017 - arxiv.org
The Parallel Meaning Bank is a corpus of translations annotated with shared, formal
meaning representations comprising over 11 million words divided over four languages …

The Groningen Meaning Bank

J Bos, V Basile, K Evang, NJ Venhuizen… - Handbook of linguistic …, 2017 - Springer
The goal of the Groningen Meaning Bank (GMB) is to obtain a large corpus of English texts
annotated with formal meaning representations. Since manually annotating a …

PySBD: Pragmatic sentence boundary disambiguation

N Sadvilkar, M Neumann - arXiv preprint arXiv:2010.09657, 2020 - arxiv.org
In this paper, we present a rule-based sentence boundary disambiguation Python package
that works out-of-the-box for 22 languages. We aim to provide a realistic segmenter which …

Exploring neural methods for parsing discourse representation structures

R van Noord, L Abzianidze, A Toral… - Transactions of the …, 2018 - direct.mit.edu
Neural methods have had several recent successes in semantic parsing, though they have
yet to face the challenge of producing meaning representations based on formal semantics …

Semantic tagging with deep residual networks

J Bjerva, B Plank, J Bos - arXiv preprint arXiv:1609.07053, 2016 - arxiv.org
We propose a novel semantic tagging task, sem-tagging, tailored for the purpose of
multilingual semantic parsing, and present the first tagger using deep residual networks …

Normalizing tweets with edit scripts and recurrent neural embeddings

G Chrupała - Proceedings of the 52nd Annual Meeting of the …, 2014 - aclanthology.org
Tweets often contain a large proportion of abbreviations, alternative spellings, novel words
and other non-canonical language. These features are problematic for standard language …

Character-level representations improve DRS-based semantic parsing even in the age of BERT

R van Noord, A Toral, J Bos - arXiv preprint arXiv:2011.04308, 2020 - arxiv.org
We combine character-level and contextual language model representations to improve
performance on Discourse Representation Structure parsing. Character representations can …

Evaluating scoped meaning representations

R van Noord, L Abzianidze, H Haagsma… - arXiv preprint arXiv …, 2018 - arxiv.org
Semantic parsing offers many opportunities to improve natural language understanding. We
present a semantically annotated parallel corpus for English, German, Italian, and Dutch …

Statistical learning for OCR error correction

J Mei, A Islam, A Moh'd, Y Wu, E Milios - Information Processing & …, 2018 - Elsevier
Modern OCR engines incorporate some form of error correction, typically based on
dictionaries. However, there are still residual errors that decrease performance of natural …