Attribution and obfuscation of neural text authorship: A data mining perspective

A Uchendu, T Le, D Lee - ACM SIGKDD Explorations Newsletter, 2023 - dl.acm.org
Two interlocking research questions of growing interest and importance in privacy research
are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact …

AugGPT: Leveraging ChatGPT for text data augmentation

H Dai, Z Liu, W Liao, X Huang, Y Cao… - … Transactions on Big …, 2025 - ieeexplore.ieee.org
Text data augmentation is an effective strategy for overcoming the challenge of limited
sample sizes in many natural language processing (NLP) tasks. This challenge is especially …

Beyond English-centric multilingual machine translation

A Fan, S Bhosale, H Schwenk, Z Ma, A El-Kishky… - Journal of Machine …, 2021 - jmlr.org
Existing work in translation demonstrated the potential of massively multilingual machine
translation by training a single model able to translate between any pair of languages …

FUDGE: Controlled text generation with future discriminators

K Yang, D Klein - arXiv preprint arXiv:2104.05218, 2021 - arxiv.org
We propose Future Discriminators for Generation (FUDGE), a flexible and modular method
for controlled text generation. Given a pre-existing model G for generating text from a …
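In decoding terms, FUDGE reweights the base generator's next-token distribution by a lightweight discriminator that scores whether the desired attribute will eventually hold for the completion. Below is a minimal sketch of that reweighting step, assuming hypothetical callables `lm_next_logprobs` (standing in for the pre-existing model G) and `attr_discriminator` (the future discriminator); neither name nor the toy usage comes from the paper.

    # Minimal, hedged sketch of FUDGE-style reweighting (not the authors' code).
    import math

    def fudge_step(prefix_tokens, lm_next_logprobs, attr_discriminator, top_k=50):
        """Return a renormalized next-token distribution.

        lm_next_logprobs(prefix) -> {token: log P_G(token | prefix)}
        attr_discriminator(prefix + [token]) -> P(attribute holds for the
            eventual completion | current partial sequence), in (0, 1).
        """
        base = lm_next_logprobs(prefix_tokens)
        # Rescore only the k most likely continuations to keep the step cheap.
        candidates = sorted(base, key=base.get, reverse=True)[:top_k]
        scores = {
            tok: base[tok] + math.log(attr_discriminator(prefix_tokens + [tok]))
            for tok in candidates
        }
        z = math.log(sum(math.exp(s) for s in scores.values()))
        return {tok: math.exp(s - z) for tok, s in scores.items()}

    # Toy usage with dummy stand-ins for G and the discriminator:
    dummy_lm = lambda prefix: {"happy": math.log(0.6), "sad": math.log(0.4)}
    dummy_disc = lambda seq: 0.9 if seq[-1] == "happy" else 0.2
    print(fudge_step([], dummy_lm, dummy_disc))

The key design point is that the discriminator is applied to each candidate prefix, so control is exerted token by token without modifying or fine-tuning the base generator.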

How can we know what language models know?

Z Jiang, FF Xu, J Araki, G Neubig - Transactions of the Association for …, 2020 - direct.mit.edu
Recent work has presented intriguing results examining the knowledge contained in
language models (LMs) by having the LM fill in the blanks of prompts such as “Obama is a …
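The cloze-style probing the snippet refers to can be reproduced with an off-the-shelf masked language model. A minimal sketch, assuming the Hugging Face `fill-mask` pipeline with `bert-base-cased` and an illustrative prompt (the model choice and prompt wording are assumptions for the example, not taken from the paper):

    # Hedged illustration of cloze-style knowledge probing with a masked LM.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-cased")
    for pred in unmasker("Obama is a [MASK] by profession.", top_k=5):
        # Each prediction carries the filled token and its probability.
        print(f"{pred['token_str']:>12}  {pred['score']:.3f}")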

How can we know when language models know? On the calibration of language models for question answering

Z Jiang, J Araki, H Ding, G Neubig - Transactions of the Association …, 2021 - direct.mit.edu
Recent work has shown that language models (LMs) capture different types of knowledge
regarding facts or common sense. However, because no model is perfect, they still fail to …

Plug and play language models: A simple approach to controlled text generation

S Dathathri, A Madotto, J Lan, J Hung, E Frank… - arXiv preprint arXiv …, 2019 - arxiv.org
Large transformer-based language models (LMs) trained on huge text corpora have shown
unparalleled generation capabilities. However, controlling attributes of the generated …

Data augmentation using pre-trained transformer models

V Kumar, A Choudhary, E Cho - arXiv preprint arXiv:2003.02245, 2020 - arxiv.org
Language model based pre-trained models such as BERT have provided significant gains
across different NLP tasks. In this paper, we study different types of transformer based pre …

Findings of the 2019 conference on machine translation (WMT19)

L Barrault, O Bojar, MR Costa-Jussa, C Federmann… - 2019 - zora.uzh.ch
This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …

Self-guided contrastive learning for BERT sentence representations

T Kim, KM Yoo, S Lee - arXiv preprint arXiv:2106.07345, 2021 - arxiv.org
Although BERT and its variants have reshaped the NLP landscape, it still remains unclear
how best to derive sentence embeddings from such pre-trained Transformers. In this work …