Self-training with Noisy Student improves ImageNet classification
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet,
which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled …
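The "simple self-training method" is the Noisy Student loop: a teacher pseudo-labels unlabeled images, and a student at least as large is retrained on labeled plus pseudo-labeled data with noise (augmentation, dropout, stochastic depth) active, then becomes the next teacher. A minimal PyTorch-style sketch under those assumptions, using hard confidence-filtered pseudo-labels for simplicity; all names (`teacher`, `student`, `unlabeled_loader`) are illustrative, not from the paper's code:

```python
import torch
import torch.nn.functional as F

def pseudo_label(teacher, unlabeled_loader, threshold=0.3, device="cpu"):
    """Teacher labels unlabeled images without noise (eval mode, clean inputs)."""
    teacher.eval()
    batches = []
    with torch.no_grad():
        for images in unlabeled_loader:
            images = images.to(device)
            probs = F.softmax(teacher(images), dim=-1)
            conf, labels = probs.max(dim=-1)
            keep = conf > threshold          # keep only confident pseudo-labels
            batches.append((images[keep].cpu(), labels[keep].cpu()))
    return batches

def train_student(student, labeled_batches, pseudo_batches, epochs=1, lr=0.1, device="cpu"):
    """Student trains on real + pseudo labels *with* noise active (the 'noisy' part)."""
    opt = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)
    student.train()                          # dropout / stochastic depth stay on
    for _ in range(epochs):
        for images, labels in labeled_batches + pseudo_batches:
            images, labels = images.to(device), labels.to(device)
            loss = F.cross_entropy(student(images), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student                           # becomes the teacher of the next round
```

Input noise on the student side (RandAugment in the paper) would live in the data pipeline and is omitted here for brevity.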
Redistributing low-frequency words: Making the most of monolingual data in non-autoregressive translation
Knowledge distillation (KD) is the preliminary step for training non-autoregressive
translation (NAT) models, which eases their training at the cost of losing …
Improving neural machine translation by bidirectional training
We present a simple and effective pretraining strategy, bidirectional training (BiT), for neural
machine translation. Specifically, we bidirectionally update the model parameters at the …
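The snippet is cut off, but the bidirectional update in BiT amounts to a data-level trick: pretrain on parallel data augmented with its swapped direction (target to source), then finetune on the original direction. A tiny sketch of that construction; the pair format is an assumption for illustration:

```python
def make_bidirectional(parallel_pairs):
    """Original (source, target) pairs plus their swapped (target, source) copies."""
    return parallel_pairs + [(tgt, src) for src, tgt in parallel_pairs]

# Pretrain on the doubled, bidirectional corpus; finetune on `pairs` alone.
pairs = [("Guten Morgen", "Good morning"), ("Hallo Welt", "Hello world")]
print(make_bidirectional(pairs))
```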
Rejuvenating low-frequency words: Making the most of parallel data in non-autoregressive translation
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-
autoregressive translation (NAT) models. However, there exists a discrepancy in low …
Multi-task learning for multilingual neural machine translation
While monolingual data has been shown to be useful in improving bilingual neural machine
translation (NMT), effectively and efficiently leveraging monolingual data for Multilingual …
Self-training sampling with monolingual data uncertainty for neural machine translation
Self-training has proven effective for improving NMT performance by augmenting model
training with synthetic parallel data. The common practice is to construct synthetic data …
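The "common practice" referred to is forward-translation: sample monolingual source sentences, translate them with the current model, and treat the outputs as synthetic references. A sketch with a hypothetical `model.translate` interface; the `uncertainty` argument is a placeholder for the paper's data-uncertainty sampling criterion, whose actual measure is not reproduced here:

```python
import random

def self_training_data(model, monolingual, sample_size, uncertainty=None):
    """Build synthetic parallel data from monolingual source sentences."""
    if uncertainty is not None:
        # Hypothetical hook: rank sentences by the paper's uncertainty measure
        # instead of sampling uniformly at random.
        pool = sorted(monolingual, key=uncertainty)[:sample_size]
    else:
        pool = random.sample(monolingual, sample_size)
    return [(src, model.translate(src)) for src in pool]  # (real src, synthetic tgt)
```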
Improving simultaneous machine translation with monolingual data
Simultaneous machine translation (SiMT) is usually done via sequence-level knowledge
distillation (Seq-KD) from a full-sentence neural machine translation (NMT) model. However …
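Seq-KD here is the standard construction from Kim and Rush (2016): the full-sentence teacher beam-decodes the training sources, and its hypotheses replace the gold references as the simultaneous student's targets. A minimal sketch; `teacher.translate` is a hypothetical stand-in for beam-search decoding:

```python
def build_seqkd_data(teacher, parallel_pairs, beam_size=5):
    """Teacher hypotheses replace gold references as the student's targets."""
    distilled = []
    for src, _ref in parallel_pairs:
        hyp = teacher.translate(src, beam=beam_size)  # full-sentence teacher output
        distilled.append((src, hyp))
    return distilled
```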
The USYD-JD Speech Translation System for IWSLT 2021
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021
low-resource speech translation task. We participated in the Swahili-English direction and got …
HW-TSC's participation in the WMT 2021 news translation shared task
This paper presents the submission of Huawei Translate Services Center (HW-TSC) to the
WMT 2021 News Translation Shared Task. We participate in 7 language pairs, including …
Tencent's multilingual machine translation system for WMT22 large-scale African languages
This paper describes Tencent's multilingual machine translation systems for the WMT22
shared task on Large-Scale Machine Translation Evaluation for African Languages. We …