Comparison of text preprocessing methods
CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …
State-of-the-art generalisation research in NLP: a taxonomy and review
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …
Discrete prompt compression with reinforcement learning
Compressed prompts aid instruction-tuned language models (LMs) in overcoming context
window limitations and reducing computational costs. Existing methods, which are primarily …
Using punctuation as an adversarial attack on deep learning-based NLP systems: An empirical study
This work empirically investigates punctuation insertions as adversarial attacks on NLP
systems. Data from experiments on three tasks, five datasets, and six models with four …
Training a deep contextualized language model for international classification of diseases, 10th revision classification via federated learning: model …
Background: The automatic coding of clinical text documents by using the International
Classification of Diseases, 10th Revision (ICD-10) can be performed for statistical analyses …
Does Sentence Segmentation Matter for Machine Translation?
For the most part, NLP applications operate at the sentence level. Since sentences occur
most naturally in documents, they must be extracted and segmented via the use of a …
On the robustness of intent classification and slot labeling in goal-oriented dialog systems to real-world noise
Intent Classification (IC) and Slot Labeling (SL) models, which form the basis of dialogue
systems, often encounter noisy data in real-world environments. In this work, we investigate …
Linguistic-based data augmentation approach for offensive language detection
The massive amount of data generated by social media possess a great deal of toxic content
that lead to serious content filtering problems including hate speech, cyberbullying and …
Adversarial examples against a bert absa model–fooling bert with l33t, misspellign, and punctuation
The BERT model is de facto state-of-the-art for aspect-based sentiment analysis (ABSA), an
important task in natural language processing. Similar to every other model based on deep …
Clustering-Based Joint Topic-Sentiment Modeling of Social Media Data: A Neural Networks Approach
With the vast amount of social media posts available online, topic modeling and sentiment
analysis have become central methods to better understand and analyze online behavior …