On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

Towards making the most of ChatGPT for machine translation

K Peng, L Ding, Q Zhong, L Shen, X Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies
have shown that it achieves comparable results to commercial systems for high-resource …

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which was first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …
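
As a quick illustration of the speedup this survey is about: an autoregressive decoder emits one token per forward pass, while a non-autoregressive decoder fills every target position in a single pass. The sketch below is a hypothetical toy (the score function, vocabulary size, and placeholder inputs are assumptions, not any surveyed model):

import torch

VOCAB, LEN = 32, 6
torch.manual_seed(0)
emb = torch.randn(VOCAB, VOCAB)  # toy "decoder" parameters

def score(src, tgt):
    # stand-in for a seq2seq decoder: per-position vocabulary logits
    return emb[tgt] + src.sum()

def autoregressive_decode(src):
    # one forward pass per output token: LEN sequential steps
    tgt = [0]  # BOS placeholder
    for _ in range(LEN):
        logits = score(src, torch.tensor(tgt))
        tgt.append(int(logits[-1].argmax()))
    return tgt[1:]

def non_autoregressive_decode(src):
    # a single forward pass over placeholder inputs: all positions in parallel
    placeholders = torch.zeros(LEN, dtype=torch.long)
    return score(src, placeholders).argmax(dim=-1).tolist()

src = torch.randn(4)
print("AR :", autoregressive_decode(src))
print("NAR:", non_autoregressive_decode(src))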

On Efficient Training of Large-Scale Deep Learning Models

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - ACM Computing Surveys, 2024 - dl.acm.org
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …

Improving sharpness-aware minimization with Fisher mask for better generalization on language models

Q Zhong, L Ding, L Shen, P Mi, J Liu, B Du… - arXiv preprint arXiv …, 2022 - arxiv.org
Fine-tuning large pretrained language models on a limited training corpus usually suffers
from poor generalization. Prior works show that the recently proposed sharpness-aware …
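
For context on the "sharpness-aware" optimizer the snippet refers to: vanilla SAM perturbs the weights along the normalized gradient, then updates with the gradient taken at that perturbed point. The sketch below shows only that vanilla step on a toy least-squares loss; the paper's Fisher-mask variant, which sparsifies the perturbation, is not implemented here, and all names are illustrative:

import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)
x, y = torch.randn(64, 10), torch.randn(64)
rho, lr = 0.05, 0.1

def loss_fn(weights):
    return ((x @ weights - y) ** 2).mean()

# step 1: gradient at w, used to find the "worst-case" nearby point
grad = torch.autograd.grad(loss_fn(w), w)[0]
eps = rho * grad / (grad.norm() + 1e-12)

# step 2: gradient at the perturbed weights w + eps
grad_sam = torch.autograd.grad(loss_fn(w + eps), w)[0]

# step 3: plain SGD update at the original weights with the SAM gradient
with torch.no_grad():
    w -= lr * grad_sam
print(float(loss_fn(w)))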

Token-level self-evolution training for sequence-to-sequence learning

K Peng, L Ding, Q Zhong, Y Ouyang… - Proceedings of the …, 2023 - aclanthology.org
Adaptive training approaches, widely used in sequence-to-sequence models, commonly
reweigh the losses of different target tokens based on priors, e.g., word frequency. However …
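
The prior-based reweighting this snippet describes can be sketched as scaling each target token's cross-entropy loss by a weight derived from corpus frequency, so that rare tokens contribute more. This is a hedged illustration of that baseline, not the paper's self-evolution scheme; the weighting formula and toy tensors are assumptions:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab, seq_len = 100, 8
logits = torch.randn(seq_len, vocab)           # decoder outputs for one sentence
targets = torch.randint(0, vocab, (seq_len,))  # gold target tokens
corpus_freq = torch.rand(vocab) * 1e-3         # toy unigram frequencies

# per-token cross-entropy, no reduction so each token can be reweighted
token_loss = F.cross_entropy(logits, targets, reduction="none")

# one common prior: weight ~ -log p(token), so rare tokens are weighted up
weights = -torch.log(corpus_freq[targets] + 1e-12)
weights = weights / weights.mean()             # keep the overall loss scale comparable

loss = (weights * token_loss).mean()
print(float(loss))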

Where Does the Performance Improvement Come From? - A Reproducibility Concern about Image-Text Retrieval

J Rao, F Wang, L Ding, S Qi, Y Zhan, W Liu… - Proceedings of the 45th …, 2022 - dl.acm.org
This article aims to provide the information retrieval community with some reflections on
recent advances in retrieval learning by analyzing the reproducibility of image-text retrieval …

Revisiting catastrophic forgetting in large language model tuning

H Li, L Ding, M Fang, D Tao - arXiv preprint arXiv:2406.04836, 2024 - arxiv.org
Catastrophic Forgetting (CF) refers to models forgetting previously acquired knowledge when
learning new data. It compromises the effectiveness of large language models (LLMs) …
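
A common way to quantify the forgetting defined here is the drop in old-task accuracy after further tuning on new data only. The toy measurement below is illustrative (the two synthetic tasks and the small classifier are assumptions, not the paper's LLM setup):

import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(shift):
    # two synthetic binary-classification tasks with different input distributions
    x = torch.randn(512, 20) + shift
    y = (x[:, 0] > shift).long()
    return x, y

xa, ya = make_task(0.0)   # task A (learned first)
xb, yb = make_task(3.0)   # task B (learned later)

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def fit(x, y, steps=200):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def acc(x, y):
    return (model(x).argmax(-1) == y).float().mean().item()

fit(xa, ya)
acc_before = acc(xa, ya)  # task-A accuracy right after learning task A
fit(xb, yb)
acc_after = acc(xa, ya)   # task-A accuracy after tuning only on task B
print(f"forgetting on task A: {acc_before - acc_after:.3f}")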

SelectIT: Selective instruction tuning for large language models via uncertainty-aware self-reflection

L Liu, X Liu, DF Wong, D Li, Z Wang, B Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-
centric interactions. Recent advancements have shown that the careful selection of a small …
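
Selection here hinges on scoring instruction examples and keeping a small, informative subset. The sketch below only illustrates the generic idea of ranking examples by the model's own predictive uncertainty; the scoring rule, keep ratio, and the toy probability model are assumptions, not the paper's self-reflection procedure:

import math
import random

random.seed(0)

def token_entropy(probs):
    # entropy of one per-token probability distribution
    return -sum(p * math.log(p + 1e-12) for p in probs)

def uncertainty(example, predict_probs):
    # mean per-token entropy of the model's distribution over the response
    dists = predict_probs(example)
    return sum(token_entropy(d) for d in dists) / len(dists)

def select_instructions(dataset, predict_probs, keep_ratio=0.2):
    # keep the examples the model is most uncertain about (one possible heuristic)
    scored = sorted(dataset, key=lambda ex: uncertainty(ex, predict_probs), reverse=True)
    return scored[: max(1, int(len(scored) * keep_ratio))]

if __name__ == "__main__":
    # toy stand-in for an LLM: random 4-way distributions, 5 tokens per response
    def toy_probs(example):
        dists = []
        for _ in range(5):
            raw = [random.random() for _ in range(4)]
            s = sum(raw)
            dists.append([r / s for r in raw])
        return dists

    data = [f"instruction {i}" for i in range(10)]
    print(select_instructions(data, toy_probs))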

Redistributing low-frequency words: Making the most of monolingual data in non-autoregressive translation

L Ding, L Wang, S Shi, D Tao, Z Tu - … of the 60th Annual Meeting of …, 2022 - aclanthology.org
Knowledge distillation (KD) is the preliminary step for training non-autoregressive
translation (NAT) models, which eases the training of NAT models at the cost of losing …
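
The "preliminary step" mentioned here, sequence-level knowledge distillation, replaces the gold targets with an autoregressive teacher's translations before the NAT model is trained. A minimal sketch of that data-construction step follows; teacher_translate is a placeholder, and the paper's redistribution of low-frequency words via monolingual data is not shown:

from typing import Callable, List, Tuple

def build_distilled_corpus(
    sources: List[str],
    teacher_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    # Replace each gold target with the teacher's output: the NAT student
    # then learns from simpler, more deterministic targets.
    return [(src, teacher_translate(src)) for src in sources]

if __name__ == "__main__":
    # toy stand-in teacher: uppercases the source (a real one would be a trained AT NMT model)
    toy_teacher = lambda s: s.upper()
    raw_sources = ["guten morgen", "wie geht es dir"]
    distilled = build_distilled_corpus(raw_sources, toy_teacher)
    for src, tgt in distilled:
        print(src, "->", tgt)
    # the NAT model is then trained on `distilled` instead of the original pairs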