Self-training with noisy student improves ImageNet classification

Q Xie, MT Luong, E Hovy… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet,
which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled …
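The snippet points at the Noisy Student recipe: a teacher pseudo-labels unlabeled images, then an equal-or-larger student trained under noise on labeled plus pseudo-labeled data becomes the next teacher. A minimal sketch of that loop with toy tensors and placeholder models (nothing here is the authors' code; the confidence threshold, Gaussian input noise, and model sizes are stand-ins):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pseudo_label(teacher, unlabeled, threshold=0.1):
    """Teacher labels unlabeled data; keep soft labels whose max
    probability clears a (loose, placeholder) confidence bar."""
    teacher.eval()
    with torch.no_grad():
        probs = F.softmax(teacher(unlabeled), dim=-1)
    keep = probs.max(dim=-1).values > threshold
    return unlabeled[keep], probs[keep]

def train_student(student, images, targets, epochs=3, lr=1e-2):
    """Train with input noise (Gaussian jitter standing in for
    RandAugment) and model noise (dropout inside the student)."""
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    student.train()
    for _ in range(epochs):
        noisy = images + 0.1 * torch.randn_like(images)
        loss = F.cross_entropy(student(noisy), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student

# Toy setup: 32-dim "images", 10 classes, iterated teacher -> student.
labeled_x, labeled_y = torch.randn(64, 32), torch.randint(0, 10, (64,))
unlabeled_x = torch.randn(256, 32)
teacher = nn.Linear(32, 10)
for _ in range(3):  # a few self-training rounds
    ux, soft = pseudo_label(teacher, unlabeled_x)
    student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                            nn.Dropout(0.5), nn.Linear(64, 10))
    student = train_student(student, labeled_x, labeled_y)
    # one step on pseudo-labeled data with soft targets (KL to teacher)
    opt = torch.optim.SGD(student.parameters(), lr=1e-2)
    loss = F.kl_div(F.log_softmax(student(ux), dim=-1), soft,
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    teacher = student  # student becomes the next round's teacher
```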

Redistributing low-frequency words: Making the most of monolingual data in non-autoregressive translation

L Ding, L Wang, S Shi, D Tao, Z Tu - … of the 60th Annual Meeting of …, 2022 - aclanthology.org
Knowledge distillation (KD) is the preliminary step for training non-autoregressive
translation (NAT) models, which eases the training of NAT models at the cost of losing …
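Since the snippet leans on KD as a preprocessing step, a tiny sketch may help: before the NAT model is trained, the reference targets are swapped for an autoregressive teacher's outputs, which are simpler and more deterministic. `translate` below is a hypothetical stand-in for a trained teacher's beam search:

```python
def translate(src: str) -> str:
    # Placeholder teacher; a real pipeline would call a trained NMT model.
    return src[::-1]

def distill_corpus(pairs):
    """Replace each reference target with the teacher's output, so the
    NAT student trains on the distilled corpus rather than the original."""
    return [(src, translate(src)) for src, _ in pairs]

parallel = [("ein Beispiel", "an example"), ("noch eins", "one more")]
print(distill_corpus(parallel))
```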

Improving neural machine translation by bidirectional training

L Ding, D Wu, D Tao - arXiv preprint arXiv:2109.07780, 2021 - arxiv.org
We present a simple and effective pretraining strategy, bidirectional training (BiT), for
neural machine translation. Specifically, we bidirectionally update the model parameters at the …
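As far as the truncated abstract allows, BiT-style pretraining amounts to training on both translation directions of the same parallel data before finetuning on the original direction. A hedged sketch of the data construction step only (the example pair is invented for illustration):

```python
def bidirectional_corpus(pairs):
    """Augment src->tgt pairs with their reversed tgt->src copies,
    doubling the pretraining data without any extra corpus."""
    return pairs + [(tgt, src) for src, tgt in pairs]

data = [("guten Morgen", "good morning")]
print(bidirectional_corpus(data))
# [('guten Morgen', 'good morning'), ('good morning', 'guten Morgen')]
```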

Rejuvenating low-frequency words: Making the most of parallel data in non-autoregressive translation

L Ding, L Wang, X Liu, DF Wong, D Tao… - arXiv preprint arXiv …, 2021 - arxiv.org
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-
autoregressive translation (NAT) models. However, there exists a discrepancy on low …

Multi-task learning for multilingual neural machine translation

Y Wang, CX Zhai, HH Awadalla - arXiv preprint arXiv:2010.02523, 2020 - arxiv.org
While monolingual data has been shown to be useful in improving bilingual neural machine
translation (NMT), effectively and efficiently leveraging monolingual data for Multilingual …

Self-training sampling with monolingual data uncertainty for neural machine translation

W Jiao, X Wang, Z Tu, S Shi, MR Lyu, I King - arXiv preprint arXiv …, 2021 - arxiv.org
Self-training has proven effective for improving NMT performance by augmenting model
training with synthetic parallel data. The common practice is to construct synthetic data …
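The common practice the snippet refers to is forward-translating monolingual sentences with the current model to form synthetic parallel pairs; this paper biases which monolingual sentences get selected. A rough sketch, where `uncertainty` is a hypothetical scorer (the paper's actual measure differs; a random score is used here only to make the example run):

```python
import random

def uncertainty(sentence: str) -> float:
    # Placeholder; in practice e.g. average token entropy under the model.
    return random.random()

def sample_for_self_training(monolingual, k):
    """Prefer higher-uncertainty monolingual sentences when building
    synthetic parallel data, rather than sampling uniformly."""
    return sorted(monolingual, key=uncertainty, reverse=True)[:k]

mono = [f"sentence {i}" for i in range(100)]
selected = sample_for_self_training(mono, k=10)
# `selected` would then be forward-translated by the current model to
# form (source, synthetic target) pairs for another round of training.
```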

Improving simultaneous machine translation with monolingual data

H Deng, L Ding, X Liu, M Zhang, D Tao… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Simultaneous machine translation (SiMT) is usually done via sequence-level knowledge
distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model. However …

The USYD-JD Speech Translation System for IWSLT 2021

L Ding, D Wu, D Tao - arXiv preprint arXiv:2107.11572, 2021 - arxiv.org
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021
low-resource speech translation task. We participated in the Swahili-English direction and got …

HW-TSC's participation in the WMT 2021 news translation shared task

D Wei, Z Li, Z Wu, Z Yu, X Chen, H Shang… - Proceedings of the …, 2021 - aclanthology.org
This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the
WMT 2021 News Translation Shared Task. We participate in 7 language pairs, including …

Tencent's multilingual machine translation system for WMT22 large-scale African languages

W Jiao, Z Tu, J Li, W Wang, J Huang, S Shi - arXiv preprint arXiv …, 2022 - arxiv.org
This paper describes Tencent's multilingual machine translation systems for the WMT22
shared task on Large-Scale Machine Translation Evaluation for African Languages. We …