Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …

Multiword expression processing: A survey

M Constant, G Eryiğit, J Monti, L Van Der Plas… - Computational …, 2017 - direct.mit.edu
Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word
boundaries that are both idiosyncratic and pervasive across different languages. The …

Multiword expression identification with tree substitution grammars: A parsing tour de force with French

S Green, MC De Marneffe, J Bauer… - Conference on Empirical …, 2011 - hal.science
Multiword expressions (MWE), a known nuisance for both linguistics and NLP, blur the lines
between syntax and semantics. Previous work on MWE identification has relied primarily on …

[PDF] A transition-based system for joint lexical and syntactic analysis

M Constant, J Nivre - Proceedings of the 54th Annual Meeting of …, 2016 - aclanthology.org
We present a transition-based system that jointly predicts the syntactic structure and lexical
units of a sentence by building two structures over the input words: a syntactic dependency …

Parsing models for identifying multiword expressions

S Green, MC de Marneffe, CD Manning - Computational Linguistics, 2013 - direct.mit.edu
Multiword expressions lie at the syntax/semantics interface and have motivated alternative
theories of syntax like Construction Grammar. Until now, however, syntactic analysis and …

Without lexicons, multiword expression identification will never fly: A position statement

A Savary, SR Cordeiro, C Ramisch - Joint Workshop on Multiword …, 2019 - hal.science
Because most multiword expressions (MWEs), especially verbal ones, are semantically non-
compositional, their automatic identification in running text is a prerequisite for semantically …

PARSEME–PARSing and Multiword Expressions within a European multilingual network

A Savary, M Sailer, Y Parmentier, M Rosner… - 7th Language & …, 2015 - hal.science
The aim of this paper is to present PARSEME, a COST Action devoted to the issue of
Multiword Expressions in parsing and in linguistic resources (corpora, lexicons). This is a …

Efficient continue training of temporal language model with structural information

Z Su, J Li, Z Zhang, Z Zhou… - Findings of the Association …, 2023 - aclanthology.org
Current language models are mainly trained on snapshots of data gathered at a particular
time, which decreases their capability to generalize over time and model language change …

Collocations of fictive motion verbs in adventure tourism: A corpus-based study of the English language

EL Jiménez-Navarro… - Revista Española de …, 2024 - jbe-platform.com
This paper investigates the collocations produced by a set of fictive motion verbs found in a
specialized corpus representing the language of adventure tourism. Since our ultimate aim …

[BOOK] Facets of prefabrication. Perspectives on modelling and detecting phraseological units

P Pęzik - 2018 - ceeol.com
Corpus-based studies have brought fresh insights into the role of collocability and lexico-
grammatical patterning as core aspects of language permeating its structure and use. Facets …