A simple but tough-to-beat data augmentation approach for natural language understanding and generation

D Shen, M Zheng, Y Shen, Y Qu, W Chen - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial training has been shown to be effective at endowing learned representations with
stronger generalization ability. However, it typically requires expensive computation to …
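As I read it, the alternative proposed in this paper is cutoff-style augmentation, which erases parts of the input rather than computing adversarial perturbations. Below is a minimal sketch of token-level cutoff on input embeddings in PyTorch; the function name and the default ratio are illustrative, not the paper's exact recipe.

```python
import torch

def token_cutoff(embeddings: torch.Tensor, cutoff_ratio: float = 0.15) -> torch.Tensor:
    """Zero out a random subset of token embeddings in each sequence.

    embeddings: (batch, seq_len, dim) input embeddings.
    cutoff_ratio: fraction of token positions to erase (illustrative default).
    """
    batch, seq_len, _ = embeddings.shape
    # Sample a Boolean mask per position; True means "keep this token".
    keep = torch.rand(batch, seq_len, device=embeddings.device) > cutoff_ratio
    return embeddings * keep.unsqueeze(-1)

# The augmented view can be trained with the usual task loss, typically plus a
# consistency term (e.g. symmetric KL) between predictions on the original and
# cutoff inputs.
```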

BERT, mBERT, or BiBERT? A study on contextualized embeddings for neural machine translation

H Xu, B Van Durme, K Murray - arXiv preprint arXiv:2109.04588, 2021 - arxiv.org
The success of bidirectional encoders using masked language models, such as BERT, on
numerous natural language processing tasks has prompted researchers to attempt to …
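One straightforward way to plug such encoders into NMT is to feed BERT's contextualized hidden states to the translation model. A minimal sketch, assuming a HuggingFace-style multilingual checkpoint; the model name and the choice of layer are illustrative, not the configuration studied in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased")

def contextual_embeddings(sentences):
    """Return BERT's final-layer hidden states for use as NMT encoder inputs."""
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = bert(**batch)
    # (batch, seq_len, hidden) contextualized token representations.
    return out.last_hidden_state, batch["attention_mask"]

embs, mask = contextual_embeddings(["Das ist ein Test.", "Ein weiterer Satz."])
```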

BERTTune: Fine-tuning neural machine translation with BERTScore

IJ Unanue, J Parnell, M Piccardi - arXiv preprint arXiv:2106.02208, 2021 - arxiv.org
Neural machine translation models are often biased toward the limited translation
references seen during training. To amend this form of overfitting, in this paper we propose …
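A sketch of how a BERTScore signal might be used during fine-tuning, assuming sampled hypotheses are scored with the bert_score package and treated as a sequence-level reward; this REINFORCE-style formulation is illustrative and not necessarily the paper's exact objective.

```python
import torch
from bert_score import score as bertscore

def sequence_reward_loss(sample_log_probs, hypotheses, references, lang="en"):
    """Weight sampled translations' log-likelihood by their BERTScore F1.

    sample_log_probs: (batch,) sum of token log-probs of each sampled hypothesis.
    hypotheses, references: lists of detokenized strings.
    """
    _, _, f1 = bertscore(hypotheses, references, lang=lang, verbose=False)
    reward = f1.to(sample_log_probs.device)
    # Push probability mass toward samples with high BERTScore.
    return -(reward * sample_log_probs).mean()
```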

CipherDAug: Ciphertext based data augmentation for neural machine translation

N Kambhatla, L Born, A Sarkar - arXiv preprint arXiv:2204.00665, 2022 - arxiv.org
We propose a novel data-augmentation technique for neural machine translation based on
ROT-k ciphertexts. ROT-k is a simple letter substitution cipher that replaces a letter in …

Learning multiscale transformer models for sequence generation

B Li, T Zheng, Y Jing, C Jiao… - … on Machine Learning, 2022 - proceedings.mlr.press
Multiscale feature hierarchies have proven successful in the computer vision area. This
further motivates researchers to design multiscale Transformers for natural language …

Bi-SimCut: A simple strategy for boosting neural machine translation

P Gao, Z He, H Wu, H Wang - arXiv preprint arXiv:2206.02368, 2022 - arxiv.org
We introduce Bi-SimCut: a simple but effective training strategy to boost neural machine
translation (NMT) performance. It consists of two procedures: bidirectional pretraining and …
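For the first of the two procedures, here is a minimal sketch of building a bidirectional pretraining set from a parallel corpus, assuming a shared vocabulary and a direction tag prepended to the source; the tag format is illustrative. The "SimCut" part is, as the name suggests, a cutoff-based consistency regularizer, which could reuse a cutoff function like the one sketched for the first entry in this list.

```python
def make_bidirectional(pairs, fwd_tag="<2tgt>", bwd_tag="<2src>"):
    """Build a bidirectional pretraining set from a parallel corpus.

    Each (src, tgt) pair contributes both translation directions; the
    direction tag (illustrative format) tells the model which way to translate.
    """
    data = []
    for src, tgt in pairs:
        data.append((f"{fwd_tag} {src}", tgt))   # source -> target
        data.append((f"{bwd_tag} {tgt}", src))   # target -> source
    return data

pairs = [("ein Haus", "a house")]
print(make_bidirectional(pairs))
# [('<2tgt> ein Haus', 'a house'), ('<2src> a house', 'ein Haus')]
```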

Neural hidden Markov model for machine translation

W Wang, D Zhu, T Alkhouli, Z Gan… - Proceedings of the 56th …, 2018 - aclanthology.org
Attention-based neural machine translation (NMT) models selectively focus on specific
source positions to produce a translation, which brings significant improvements over pure …
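The model here replaces soft attention with an explicit latent alignment that is marginalized out. A minimal sketch of that marginalization via the forward algorithm, with toy probability tables standing in for the paper's neural transition and lexicon (emission) networks.

```python
import numpy as np

def hmm_forward(emission, transition, initial):
    """Marginalize over latent alignments with the forward algorithm.

    emission:   (J, I) emission[j, i] = p(target word j | source position i)
    transition: (I, I) transition[i_prev, i] = p(align to i | previous alignment i_prev)
    initial:    (I,)   distribution over the first alignment position
    Returns p(target sentence | source sentence).
    """
    J, I = emission.shape
    alpha = initial * emission[0]              # forward probabilities at j = 0
    for j in range(1, J):
        alpha = (alpha @ transition) * emission[j]
    return alpha.sum()

# Toy 2-word source, 3-word target; the values are placeholders, not learned.
emission = np.array([[0.7, 0.1],
                     [0.2, 0.6],
                     [0.1, 0.3]])
transition = np.array([[0.8, 0.2],
                       [0.3, 0.7]])
initial = np.array([0.9, 0.1])
print(hmm_forward(emission, transition, initial))
```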

TranSFormer: Slow-fast transformer for machine translation

B Li, Y Jing, X Tan, Z Xing, T Xiao, J Zhu - arXiv preprint arXiv:2305.16982, 2023 - arxiv.org
Learning multiscale Transformer models has been shown to be a viable approach to
augmenting machine translation systems. Prior research has primarily focused on treating …
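The title suggests a fast branch over fine-grained units and a slow branch over coarser ones that exchange information. The sketch below shows only that generic two-branch wiring with cross-attention fusion; it is an assumption about the design, not the paper's actual architecture, and all dimensions and depths are placeholders.

```python
import torch
import torch.nn as nn

class TwoBranchEncoder(nn.Module):
    """Illustrative slow/fast encoder pair fused by cross-attention."""

    def __init__(self, fast_dim=128, slow_dim=512, nhead=8):
        super().__init__()
        # Fast branch: fine-grained tokens, small width, shallow stack.
        self.fast = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(fast_dim, nhead=4, batch_first=True), num_layers=2)
        # Slow branch: coarse tokens, large width, deeper stack.
        self.slow = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(slow_dim, nhead=nhead, batch_first=True), num_layers=6)
        self.project = nn.Linear(fast_dim, slow_dim)
        self.fuse = nn.MultiheadAttention(slow_dim, nhead, batch_first=True)

    def forward(self, fine_embeds, coarse_embeds):
        fast_out = self.fast(fine_embeds)        # (B, L_fine, fast_dim)
        slow_out = self.slow(coarse_embeds)      # (B, L_coarse, slow_dim)
        # Let the slow branch read information from the fast branch.
        fast_proj = self.project(fast_out)
        fused, _ = self.fuse(slow_out, fast_proj, fast_proj)
        return slow_out + fused
```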

EM-Network: Oracle guided self-distillation for sequence learning

JW Yoon, S Ahn, H Lee, M Kim… - … on Machine Learning, 2023 - proceedings.mlr.press
We introduce EM-Network, a novel self-distillation approach that effectively leverages target
information for supervised sequence-to-sequence (seq2seq) learning. In contrast to …
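A minimal sketch of one generic reading of this setup: a teacher that also conditions on target-side information produces soft labels, and the student is trained on a mix of cross-entropy against gold labels and KL toward the teacher. The temperature and mixing weight are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, gold_ids,
                           temperature=2.0, alpha=0.5, pad_id=0):
    """Combine cross-entropy on gold labels with KL toward a teacher.

    student_logits, teacher_logits: (batch, seq_len, vocab); the teacher is
    assumed to have access to extra (target-side) information.
    gold_ids: (batch, seq_len) reference token ids.
    """
    ce = F.cross_entropy(student_logits.transpose(1, 2), gold_ids, ignore_index=pad_id)
    t = temperature
    kl = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits.detach() / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    return alpha * ce + (1.0 - alpha) * kl
```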

I2R: Intra and inter-modal representation learning for code search

X Zhang, Y Xiang, Z Liu, X Hu… - Intelligent Data …, 2024 - journals.sagepub.com
Code search, which locates code snippets in large code repositories based on natural
language queries entered by developers, has become increasingly popular in the software …
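Code search of this kind is commonly served by a bi-encoder that embeds queries and code snippets in a shared space and ranks by cosine similarity. The sketch below shows that generic retrieval setup with an off-the-shelf CodeBERT checkpoint and mean pooling, as an assumption rather than the paper's I2R model.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(texts):
    """Mean-pool the final hidden states into one vector per input."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

query_vec = embed(["read a json file into a dict"])
code_vecs = embed([
    "def load(path):\n    import json\n    return json.load(open(path))",
    "def add(a, b):\n    return a + b",
])
scores = torch.nn.functional.cosine_similarity(query_vec, code_vecs)
print(scores)  # the JSON-loading snippet should score higher
```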