Learning deep transformer models for machine translation
Transformer is the state-of-the-art model in recent machine translation evaluations. Two
strands of research show promise for improving models of this kind: the first uses wide …
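The snippet cuts off before the method, but deep-Transformer work in this line widely relies on pre-norm residual ordering (LayerNorm applied before each sublayer rather than after) to keep training stable at depth. A minimal PyTorch sketch of that ordering; the module and names are illustrative, not the paper's code:

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Pre-norm residual block: x + sublayer(LayerNorm(x)).
    Keeping the residual path free of normalization is widely reported to
    stabilize very deep Transformer stacks, in contrast to the original
    post-norm ordering LayerNorm(x + sublayer(x))."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.sublayer = sublayer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.sublayer(self.norm(x))

# Example: a feed-forward sublayer wrapped in pre-norm form.
block = PreNormBlock(512, nn.Sequential(
    nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512)))
y = block(torch.randn(8, 10, 512))  # (batch, seq, d_model)
```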
Leveraging pre-trained checkpoints for sequence generation tasks
Unsupervised pre-training of large neural models has recently revolutionized Natural
Language Processing. By warm-starting from the publicly released checkpoints, NLP …
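Warm-starting a sequence-to-sequence model from publicly released encoder checkpoints can be done with Hugging Face's `EncoderDecoderModel`; a minimal sketch, with the weights absent from the checkpoint (e.g., cross-attention) initialized randomly:

```python
from transformers import BertTokenizerFast, EncoderDecoderModel

# Initialize both encoder and decoder from a publicly released BERT
# checkpoint; parameters missing from the checkpoint, such as the
# decoder's cross-attention, start random and are learned in fine-tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "bert-base-uncased"
)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# The warm-started model is then fine-tuned on the generation task as usual.
```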
Sparse is enough in scaling transformers
Large Transformer models yield impressive results on many tasks, but are expensive to
train, or even fine-tune, and so slow at decoding that their use and study becomes out of …
Very deep transformers for neural machine translation
We explore the application of very deep Transformer models for Neural Machine Translation
(NMT). Using a simple yet effective initialization technique that stabilizes training, we show …
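The snippet names a stabilizing initialization without detail; one simple depth-aware scheme shrinks each residual branch's output projection as the stack deepens. A sketch of that idea, offered as an illustrative stand-in rather than the paper's exact technique:

```python
import math
import torch.nn as nn

def depth_scaled_init_(linear: nn.Linear, num_layers: int) -> None:
    """Shrink a residual branch's output projection by 1/sqrt(2*num_layers)
    so the variance injected into the residual stream stays bounded as
    depth grows. (Illustrative; not necessarily the paper's scheme.)"""
    nn.init.xavier_uniform_(linear.weight, gain=1.0 / math.sqrt(2 * num_layers))
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)

# Example: initialize the output projection of a feed-forward branch in a
# hypothetical 60-layer encoder.
ffn_out = nn.Linear(2048, 512)
depth_scaled_init_(ffn_out, num_layers=60)
```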
UniTE: Unified translation evaluation
Translation quality evaluation plays a crucial role in machine translation. Depending on the
input format, it is mainly divided into three tasks, i.e., reference-only, source-only and …
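A unified evaluator must accept whichever of source and reference is available; a hypothetical sketch of input construction covering the three modes (the separator token and ordering are assumptions, not UniTE's actual specification):

```python
from typing import Optional

def build_unite_style_input(hypothesis: str,
                            source: Optional[str] = None,
                            reference: Optional[str] = None,
                            sep: str = " </s> ") -> str:
    """Hypothetical helper: concatenate whichever inputs are present so a
    single model handles source-only, reference-only, and
    source+reference evaluation."""
    parts = [hypothesis]
    if source is not None:
        parts.append(source)
    if reference is not None:
        parts.append(reference)
    return sep.join(parts)

# Source-only mode:
print(build_unite_style_input("hyp text", source="src text"))
```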
Exploring versatile generative language model via parameter-efficient transfer learning
Fine-tuning pre-trained generative language models on downstream language generation
tasks has shown promising results. However, this comes with the cost of having a single …
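The snippet breaks off at the cost of keeping a separate fine-tuned copy per task; parameter-efficient transfer commonly avoids this with small trainable modules such as bottleneck adapters, leaving the pre-trained weights frozen. A minimal sketch of one such adapter (the paper's exact design may differ):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: a small residual MLP inserted into each frozen
    Transformer layer, so only the adapter parameters are trained per
    downstream task instead of a full model copy."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```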
Improving neural machine translation by bidirectional training
We present a simple and effective pretraining strategy, bidirectional training (BiT), for neural
machine translation. Specifically, we bidirectionally update the model parameters at the …
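One way to read "bidirectionally update the model parameters" is pretraining on both translation directions before standard unidirectional fine-tuning; a sketch of that data construction, as an illustration of the idea rather than the paper's exact pipeline:

```python
def make_bidirectional_corpus(pairs):
    """Illustrative bidirectional pretraining data: include every sentence
    pair in both src->tgt and tgt->src directions so parameter updates see
    both translation directions before fine-tuning on the target one."""
    bidirectional = []
    for src, tgt in pairs:
        bidirectional.append((src, tgt))
        bidirectional.append((tgt, src))  # swapped direction
    return bidirectional

corpus = make_bidirectional_corpus([("guten Tag", "good day")])
```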
Multilingual neural machine translation with language clustering
Multilingual neural machine translation (NMT), which translates multiple languages using a
single model, is of great practical importance due to its advantages in simplifying the training …
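A common recipe for language clustering is to embed each language (e.g., via learned language tags from a jointly trained model), group similar languages, and train one model per cluster; a hedged sketch with placeholder embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical language embeddings; in practice these might come from
# learned language tags of a universal multilingual model. Random
# placeholders here, purely for illustration.
lang_ids = ["de", "fr", "es", "zh", "ja", "ko"]
lang_embeddings = np.random.rand(len(lang_ids), 32)

# Cluster languages by embedding similarity; one NMT model is then
# trained per cluster rather than per language or for all languages.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(lang_embeddings)
for lang, c in zip(lang_ids, clusters):
    print(lang, "-> cluster", c)
```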
Explicit sparse transformer: Concentrated attention through explicit selection
The self-attention-based Transformer has demonstrated state-of-the-art performance in a
number of natural language processing tasks. Self-attention is able to model long-term …
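The "explicit selection" in the title suggests keeping only the largest attention scores per query and masking out the rest; a minimal PyTorch sketch of top-k masked attention as an illustration of concentrated attention (k and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk: int = 8):
    """Keep only the k largest attention logits per query and mask the
    rest to -inf before softmax, so each query attends to a small,
    explicitly selected subset of keys. q, k, v: (batch, len, d)."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # (B, Lq, Lk)
    kth = scores.topk(min(topk, scores.size(-1)), dim=-1).values[..., -1:]
    masked = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(masked, dim=-1) @ v

q = k = v = torch.randn(2, 6, 16)
out = topk_sparse_attention(q, k, v, topk=3)  # (2, 6, 16)
```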
Non-autoregressive neural machine translation with enhanced decoder input
Non-autoregressive translation (NAT) models, which remove the dependence on previous
target tokens from the inputs of the decoder, achieve significant inference speedup but at …
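Because the decoder no longer consumes previously generated target tokens, every position can be predicted in one parallel pass from an input derived from the source; a minimal sketch, with a uniform copy of encoder states standing in for the "enhanced" decoder input the title refers to:

```python
import torch
import torch.nn as nn

class TinyNATDecoderSketch(nn.Module):
    """Non-autoregressive decoding sketch: the decoder input is derived
    from the source (here, uniformly copied encoder states) rather than
    previous target tokens, so all positions decode in a single parallel
    pass. Dimensions and the copy rule are illustrative; the paper's
    'enhanced' decoder input is the part elided here."""
    def __init__(self, d_model=64, vocab=1000):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, enc_states: torch.Tensor, tgt_len: int) -> torch.Tensor:
        # Uniform copy: map each target position to a source position.
        src_len = enc_states.size(1)
        idx = torch.linspace(0, src_len - 1, tgt_len).round().long()
        dec_input = enc_states[:, idx, :]        # no target-token feedback
        h = self.decoder(dec_input, enc_states)  # one parallel pass
        return self.out(h).argmax(-1)            # all tokens at once

enc = torch.randn(2, 7, 64)
tokens = TinyNATDecoderSketch()(enc, tgt_len=9)  # (2, 9) token ids
```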