Comparison of text preprocessing methods

CP Chai - Natural Language Engineering, 2023 - cambridge.org
Text preprocessing is not only an essential step to prepare the corpus for modeling but also
a key area that directly affects the natural language processing (NLP) application results. For …

State-of-the-art generalisation research in NLP: a taxonomy and review

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …

Discrete prompt compression with reinforcement learning

H Jung, KJ Kim - IEEE Access, 2024 - ieeexplore.ieee.org
Compressed prompts aid instruction-tuned language models (LMs) in overcoming context
window limitations and reducing computational costs. Existing methods, which are primarily …

Using punctuation as an adversarial attack on deep learning-based NLP systems: An empirical study

B Formento, CS Foo, LA Tuan… - Findings of the Association …, 2023 - aclanthology.org
This work empirically investigates punctuation insertions as adversarial attacks on NLP
systems. Data from experiments on three tasks, five datasets, and six models with four …

Training a deep contextualized language model for international classification of diseases, 10th revision classification via federated learning: model …

PF Chen, TL He, SC Lin, YC Chu, CT Kuo… - JMIR Medical …, 2022 - medinform.jmir.org
Background: The automatic coding of clinical text documents by using the International
Classification of Diseases, 10th Revision (ICD-10) can be performed for statistical analyses …

Does Sentence Segmentation Matter for Machine Translation?

R Wicks, M Post - Proceedings of the Seventh Conference on …, 2022 - aclanthology.org
For the most part, NLP applications operate at the sentence level. Since sentences occur
most naturally in documents, they must be extracted and segmented via the use of a …

On the robustness of intent classification and slot labeling in goal-oriented dialog systems to real-world noise

S Sengupta, J Krone, S Mansour - arXiv preprint arXiv:2104.07149, 2021 - arxiv.org
Intent Classification (IC) and Slot Labeling (SL) models, which form the basis of dialogue
systems, often encounter noisy data in real-world environments. In this work, we investigate …

Linguistic-based data augmentation approach for offensive language detection

T Tanyel, B Alkurdi, S Ayvaz - 2022 7th International …, 2022 - ieeexplore.ieee.org
The massive amount of data generated by social media possesses a great deal of toxic content
that leads to serious content filtering problems including hate speech, cyberbullying and …

Adversarial examples against a BERT ABSA model–fooling BERT with l33t, misspellign, and punctuation

N Hofer, P Schöttle, A Rietzler, S Stabinger - Proceedings of the 16th …, 2021 - dl.acm.org
The BERT model is de facto state-of-the-art for aspect-based sentiment analysis (ABSA), an
important task in natural language processing. Similar to every other model based on deep …

Clustering-Based Joint Topic-Sentiment Modeling of Social Media Data: A Neural Networks Approach

D Hanny, B Resch - Information, 2024 - mdpi.com
With the vast amount of social media posts available online, topic modeling and sentiment
analysis have become central methods to better understand and analyze online behavior …