On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

Towards making the most of ChatGPT for machine translation

K Peng, L Ding, Q Zhong, L Shen, X Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies
have shown that it achieves comparable results to commercial systems for high-resource …

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which was first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …
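
As a quick illustration of the speedup this survey is about: an autoregressive decoder emits one token per forward pass, while a non-autoregressive decoder fills every target position in a single pass. The sketch below is a hypothetical toy (the score function, vocabulary size, and placeholder inputs are assumptions, not any surveyed model):

import torch

VOCAB, LEN = 32, 6
torch.manual_seed(0)
emb = torch.randn(VOCAB, VOCAB)  # toy "decoder" parameters

def score(src, tgt):
    # stand-in for a seq2seq decoder: per-position vocabulary logits
    return emb[tgt] + src.sum()

def autoregressive_decode(src):
    # one forward pass per output token: LEN sequential steps
    tgt = [0]  # BOS placeholder
    for _ in range(LEN):
        logits = score(src, torch.tensor(tgt))
        tgt.append(int(logits[-1].argmax()))
    return tgt[1:]

def non_autoregressive_decode(src):
    # a single forward pass over placeholder inputs: all positions in parallel
    placeholders = torch.zeros(LEN, dtype=torch.long)
    return score(src, placeholders).argmax(dim=-1).tolist()

src = torch.randn(4)
print("AR :", autoregressive_decode(src))
print("NAR:", non_autoregressive_decode(src))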

On Efficient Training of Large-Scale Deep Learning Models

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - ACM Computing Surveys, 2024 - dl.acm.org
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …

Improving sharpness-aware minimization with Fisher mask for better generalization on language models

Q Zhong, L Ding, L Shen, P Mi, J Liu, B Du… - arXiv preprint arXiv …, 2022 - arxiv.org
Fine-tuning large pretrained language models on a limited training corpus usually suffers
from poor generalization. Prior works show that the recently proposed sharpness-aware …
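
For context on the "sharpness-aware" optimizer the snippet refers to: vanilla SAM perturbs the weights along the normalized gradient, then updates with the gradient taken at that perturbed point. The sketch below shows only that vanilla step on a toy least-squares loss; the paper's Fisher-mask variant, which sparsifies the perturbation, is not implemented here, and all names are illustrative:

import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)
x, y = torch.randn(64, 10), torch.randn(64)
rho, lr = 0.05, 0.1

def loss_fn(weights):
    return ((x @ weights - y) ** 2).mean()

# step 1: gradient at w, used to find the "worst-case" nearby point
grad = torch.autograd.grad(loss_fn(w), w)[0]
eps = rho * grad / (grad.norm() + 1e-12)

# step 2: gradient at the perturbed weights w + eps
grad_sam = torch.autograd.grad(loss_fn(w + eps), w)[0]

# step 3: plain SGD update at the original weights with the SAM gradient
with torch.no_grad():
    w -= lr * grad_sam
print(float(loss_fn(w)))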

Token-level self-evolution training for sequence-to-sequence learning

K Peng, L Ding, Q Zhong, Y Ouyang… - Proceedings of the …, 2023 - aclanthology.org
Adaptive training approaches, widely used in sequence-to-sequence models, commonly
reweigh the losses of different target tokens based on priors, e.g., word frequency. However …
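
The prior-based reweighting this snippet describes can be sketched as scaling each target token's cross-entropy loss by a weight derived from corpus frequency, so that rare tokens contribute more. This is a hedged illustration of that baseline, not the paper's self-evolution scheme; the weighting formula and toy tensors are assumptions:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, seq_len = 100, 8
logits = torch.randn(seq_len, vocab)           # decoder outputs for one sentence
targets = torch.randint(0, vocab, (seq_len,))  # gold target tokens
corpus_freq = torch.rand(vocab) * 1e-3         # toy unigram frequencies

# per-token cross-entropy, no reduction so each token can be reweighted
token_loss = F.cross_entropy(logits, targets, reduction="none")

# one common prior: weight ~ -log p(token), so rare tokens are weighted up
weights = -torch.log(corpus_freq[targets] + 1e-12)
weights = weights / weights.mean()             # keep the overall loss scale comparable

loss = (weights * token_loss).mean()
print(float(loss))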

Where Does the Performance Improvement Come From? - A Reproducibility Concern about Image-Text Retrieval

J Rao, F Wang, L Ding, S Qi, Y Zhan, W Liu… - Proceedings of the 45th …, 2022 - dl.acm.org
This article aims to provide the information retrieval community with some reflections on
recent advances in retrieval learning by analyzing the reproducibility of image-text retrieval …

Revisiting catastrophic forgetting in large language model tuning

H Li, L Ding, M Fang, D Tao - arXiv preprint arXiv:2406.04836, 2024 - arxiv.org
Catastrophic Forgetting (CF) refers to models forgetting previously acquired knowledge when
learning new data. It compromises the effectiveness of large language models (LLMs) …
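
A common way to quantify the forgetting defined here is the drop in old-task accuracy after further tuning on new data only. The toy measurement below is illustrative (the two synthetic tasks and the small classifier are assumptions, not the paper's LLM setup):

import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(shift):
    # two synthetic binary-classification tasks with different input distributions
    x = torch.randn(512, 20) + shift
    y = (x[:, 0] > shift).long()
    return x, y

xa, ya = make_task(0.0)   # task A (learned first)
xb, yb = make_task(3.0)   # task B (learned later)

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def fit(x, y, steps=200):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def acc(x, y):
    return (model(x).argmax(-1) == y).float().mean().item()

fit(xa, ya)
acc_before = acc(xa, ya)  # task-A accuracy right after learning task A
fit(xb, yb)
acc_after = acc(xa, ya)   # task-A accuracy after tuning only on task B
print(f"forgetting on task A: {acc_before - acc_after:.3f}")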

SelectIT: Selective instruction tuning for large language models via uncertainty-aware self-reflection

L Liu, X Liu, DF Wong, D Li, Z Wang, B Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-
centric interactions. Recent advancements have shown that the careful selection of a small …
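
Selection here hinges on scoring instruction examples and keeping a small, informative subset. The sketch below only illustrates the generic idea of ranking examples by the model's own predictive uncertainty; the scoring rule, keep ratio, and the toy probability model are assumptions, not the paper's self-reflection procedure:

import math
import random

random.seed(0)

def token_entropy(probs):
    # entropy of one per-token probability distribution
    return -sum(p * math.log(p + 1e-12) for p in probs)

def uncertainty(example, predict_probs):
    # mean per-token entropy of the model's distribution over the response
    dists = predict_probs(example)
    return sum(token_entropy(d) for d in dists) / len(dists)

def select_instructions(dataset, predict_probs, keep_ratio=0.2):
    # keep the examples the model is most uncertain about (one possible heuristic)
    scored = sorted(dataset, key=lambda ex: uncertainty(ex, predict_probs), reverse=True)
    return scored[: max(1, int(len(scored) * keep_ratio))]

if __name__ == "__main__":
    # toy stand-in for an LLM: random 4-way distributions, 5 tokens per response
    def toy_probs(example):
        dists = []
        for _ in range(5):
            raw = [random.random() for _ in range(4)]
            s = sum(raw)
            dists.append([r / s for r in raw])
        return dists

    data = [f"instruction {i}" for i in range(10)]
    print(select_instructions(data, toy_probs))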

Redistributing low-frequency words: Making the most of monolingual data in non-autoregressive translation

L Ding, L Wang, S Shi, D Tao, Z Tu - … of the 60th Annual Meeting of …, 2022 - aclanthology.org
Knowledge distillation (KD) is the preliminary step for training non-autoregressive
translation (NAT) models, which eases the training of NAT models at the cost of losing …
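
The "preliminary step" mentioned here, sequence-level knowledge distillation, replaces the gold targets with an autoregressive teacher's translations before the NAT model is trained. A minimal sketch of that data-construction step follows; teacher_translate is a placeholder, and the paper's redistribution of low-frequency words via monolingual data is not shown:

from typing import Callable, List, Tuple

def build_distilled_corpus(
    sources: List[str],
    teacher_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    # Replace each gold target with the teacher's output: the NAT student
    # then learns from simpler, more deterministic targets.
    return [(src, teacher_translate(src)) for src in sources]

if __name__ == "__main__":
    # toy stand-in teacher: uppercases the source (a real one would be a trained AT NMT model)
    toy_teacher = lambda s: s.upper()
    raw_sources = ["guten morgen", "wie geht es dir"]
    distilled = build_distilled_corpus(raw_sources, toy_teacher)
    for src, tgt in distilled:
        print(src, "->", tgt)
    # the NAT model is then trained on `distilled` instead of the original pairs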