Text data augmentation for deep learning

C Shorten, TM Khoshgoftaar, B Furht - Journal of Big Data, 2021 - Springer
Natural Language Processing (NLP) is one of the most captivating applications of
Deep Learning. In this survey, we consider how the Data Augmentation training strategy can …

A survey of text watermarking in the era of large language models

A Liu, L Pan, Y Lu, J Li, X Hu, X Zhang, L Wen… - ACM Computing Surveys, 2024 - dl.acm.org
Text watermarking algorithms are crucial for protecting the copyright of textual content.
Historically, their capabilities and application scenarios were limited. However, recent …

Bridgedata v2: A dataset for robot learning at scale

HR Walke, K Black, TZ Zhao, Q Vuong… - Conference on Robot Learning, 2023 - proceedings.mlr.press
We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors
designed to facilitate research in scalable robot learning. BridgeData V2 contains 53,896 …

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

M Grootendorst - arXiv preprint arXiv:2203.05794, 2022 - arxiv.org
Topic models can be useful tools to discover latent topics in collections of documents.
Recent studies have shown the feasibility of approaching topic modeling as a clustering …

DiffCSE: Difference-based contrastive learning for sentence embeddings

YS Chuang, R Dangovski, H Luo, Y Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence
embeddings. DiffCSE learns sentence embeddings that are sensitive to the difference …

On the sentence embeddings from pre-trained language models

B Li, H Zhou, J He, M Wang, Y Yang, L Li - arXiv preprint arXiv:2011.05864, 2020 - arxiv.org
Pre-trained contextual representations like BERT have achieved great success in natural
language processing. However, the sentence embeddings from the pre-trained language …

Large pre-trained language models contain human-like biases of what is right and wrong to do

P Schramowski, C Turan, N Andersen… - Nature Machine …, 2022 - nature.com
Artificial writing is permeating our lives due to recent advances in large-scale, transformer-
based language models (LMs) such as BERT, GPT-2 and GPT-3. Using them as pre-trained …

Whitening sentence representations for better semantics and faster retrieval

J Su, J Cao, W Liu, Y Ou - arXiv preprint arXiv:2103.15316, 2021 - arxiv.org
Pre-training models such as BERT have achieved great success in many natural language
processing tasks. However, how to obtain better sentence representation through these pre …