A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …

Universal language model fine-tuning for text classification

J Howard, S Ruder - arXiv preprint arXiv:1801.06146, 2018 - arxiv.org
Inductive transfer learning has greatly impacted computer vision, but existing approaches in
NLP still require task-specific modifications and training from scratch. We propose Universal …

An efficient framework for learning sentence representations

L Logeswaran, H Lee - arXiv preprint arXiv:1803.02893, 2018 - arxiv.org
In this work we propose a simple and efficient framework for learning sentence
representations from unlabelled data. Drawing inspiration from the distributional hypothesis …

A brief overview of universal sentence representation methods: A linguistic view

R Li, X Zhao, MF Moens - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
How to transfer the semantic information in a sentence to a computable numerical
embedding form is a fundamental problem in natural language processing. An informative …

Semantically equivalent adversarial rules for debugging NLP models

MT Ribeiro, S Singh, C Guestrin - … of the 56th Annual Meeting of …, 2018 - aclanthology.org
Complex machine learning models for NLP are often brittle, making different predictions for
input instances that are extremely similar semantically. To automatically detect this behavior …

ParaNMT-50M: Pushing the limits of paraphrastic sentence embeddings with millions of machine translations

J Wieting, K Gimpel - arXiv preprint arXiv:1711.05732, 2017 - arxiv.org
We describe PARANMT-50M, a dataset of more than 50 million English-English sentential
paraphrase pairs. We generated the pairs automatically by using neural machine translation …

SBERT-WK: A sentence embedding method by dissecting BERT-based word models

B Wang, CCJ Kuo - IEEE/ACM Transactions on Audio, Speech …, 2020 - ieeexplore.ieee.org
Sentence embedding is an important research topic in natural language processing (NLP)
since it can transfer knowledge to downstream tasks. Meanwhile, a contextualized word …

[BOOK] Text data mining

C Zong, R Xia, J Zhang - 2021 - Springer
With the rapid development and popularization of Internet and mobile communication
technologies, text data mining has attracted much attention. In particular, with the wide use …

Beyond BLEU: training neural machine translation with semantic similarity

J Wieting, T Berg-Kirkpatrick, K Gimpel… - arXiv preprint arXiv …, 2019 - arxiv.org
While most neural machine translation (NMT) systems are still trained using maximum
likelihood estimation, recent work has demonstrated that optimizing systems to directly …

Content selection in deep learning models of summarization

C Kedzie, K McKeown, H Daumé III - arXiv preprint arXiv:1810.12343, 2018 - arxiv.org
We carry out experiments with deep learning models of summarization across the domains
of news, personal stories, meetings, and medical articles in order to understand how content …