Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …

Automated phrase mining from massive text corpora

J Shang, J Liu, M Jiang, X Ren… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
As one of the fundamental tasks in text analysis, phrase mining aims at extracting quality
phrases from a text corpus and has various downstream applications including information …

Multiword expression processing: A survey

M Constant, G Eryiğit, J Monti, L Van Der Plas… - Computational …, 2017 - direct.mit.edu
Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word
boundaries that are both idiosyncratic and pervasive across different languages. The …

[BOOK][B] Natural language processing with Python: analyzing text with the natural language toolkit

S Bird, E Klein, E Loper - 2009 - books.google.com
This book offers a highly accessible introduction to natural language processing, the field
that supports a variety of language technologies, from predictive text and email filtering to …

Mining quality phrases from massive text corpora

J Liu, J Shang, C Wang, X Ren, J Han - Proceedings of the 2015 ACM …, 2015 - dl.acm.org
Text data are ubiquitous and play an essential role in big data applications. However, text
data are mostly unstructured. Transforming unstructured text into structured units (e.g. …

A dependency parser for tweets

L Kong, N Schneider, S Swayamdipta… - Proceedings of the …, 2014 - research.ed.ac.uk
We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser
builds on several contributions: new syntactic annotations for a corpus of tweets …

Recognizing complex entity mentions: A review and future directions

X Dai - Proceedings of ACL 2018, Student Research …, 2018 - aclanthology.org
Standard named entity recognizers can effectively recognize entity mentions that consist of
contiguous tokens and do not overlap with each other. However, in practice, there are many …

SemEval-2022 task 2: Multilingual idiomaticity detection and sentence embedding

HT Madabushi, E Gow-Smith, M Garcia… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence
Embedding, which consists of two subtasks: (a) a binary classification task aimed at …

[BOOK][B] Syntax-based collocation extraction

V Seretan - 2011 - direct.mit.edu
Collocation is a common language phenomenon which has attracted the interest of
researchers in many subfields of both theoretical and computational linguistics. Although …

Statistical metaphor processing

E Shutova, S Teufel, A Korhonen - Computational Linguistics, 2013 - direct.mit.edu
Metaphor is highly frequent in language, which makes its computational processing
indispensable for real-world NLP applications addressing semantic tasks. Previous …