Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …

Studying the effect and treatment of misspelled queries in cross-language information retrieval

J Vilares, MA Alonso, Y Doval, M Vilares - Information Processing & …, 2016 - Elsevier
In contrast with their monolingual counterparts, little attention has been paid to the effects
that misspelled queries have on the performance of Cross-Language Information Retrieval …

Building a biomedical tokenizer using the token lattice design pattern and the adapted Viterbi algorithm

N Barrett, J Weber-Jahnke - BMC Bioinformatics, 2011 - Springer
Background Tokenization is an important component of language processing yet there is no
widely accepted tokenization method for English texts, including biomedical texts. Other than …

Managing misspelled queries in IR applications

J Vilares, M Vilares, J Otero - Information Processing & Management, 2011 - Elsevier
Our work concerns the design of robust information retrieval environments that can
successfully handle queries containing misspelled words. Our aim is to perform a …

A reconfigurable stochastic tagger for languages with complex tag structure

Ł Dębowski - Proceedings of the 2003 EACL Workshop on …, 2003 - aclanthology.org
We present a case study of a complex stochastic disambiguator of alternatives of
morphosyntactic tags which allows for using incomplete disambiguation, shorthand tag …

Extraction of complex index terms in non-English IR: A shallow parsing based approach

J Vilares, MA Alonso, M Vilares - Information Processing & Management, 2008 - Elsevier
The performance of information retrieval systems is limited by the linguistic variation present
in natural language texts. Word-level natural language processing techniques have been …

Contextual spelling correction

J Otero, J Graña, M Vilares - International Conference on Computer Aided …, 2007 - Springer
Spelling correction is commonly a critical task for a variety of NLP tools. Some systems assist
users by offering a set of possible corrections for a given misspelt word. An automatic …

Morphological and syntactic processing for text retrieval

J Vilares, MA Alonso, M Vilares - International Conference on Database …, 2004 - Springer
This article describes the application of lemmatization and shallow parsing as a linguistically-
based alternative to stemming in Text Retrieval, with the aim of managing linguistic variation …

Natural language processing techniques for the purpose of sentinel event information extraction

N Barrett - 2012 - dspace.library.uvic.ca
An approach to biomedical language processing is to apply existing natural language
processing (NLP) solutions to biomedical texts. Often, existing NLP solutions are less …

A Trainable Tokenizer, solution for multilingual texts and compound expression tokenization.

O Frunza - LREC, 2008 - pages.cs.brandeis.edu
Tokenization is one of the initial steps done for almost any text processing task. It is not
particularly recognized as a challenging task for English monolingual systems but it rapidly …