Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Distillation from heterogeneous models for top-k recommendation

SK Kang, W Kweon, D Lee, J Lian, X Xie… - Proceedings of the ACM …, 2023 - dl.acm.org
Recent recommender systems have shown remarkable performance by using an ensemble
of heterogeneous models. However, it is exceedingly costly because it requires resources …

A preliminary study of the intrinsic relationship between complexity and alignment

Y Zhao, B Yu, B Hui, H Yu, F Huang, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Training large language models (LLMs) with open-domain instruction data has yielded
remarkable success in aligning to end tasks and user preferences. Extensive research has …

Unbiased, Effective, and Efficient Distillation from Heterogeneous Models for Recommender Systems

SK Kang, W Kweon, D Lee, J Lian, X Xie… - ACM Transactions on …, 2024 - dl.acm.org
In recent years, recommender systems have achieved remarkable performance by using
ensembles of heterogeneous models. However, this approach is costly due to the resources …

Multi-level curriculum learning for multi-turn dialogue generation

G Chen, R Zhan, DF Wong… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
Since deep learning is the dominant paradigm in the multi-turn dialogue generation task,
large-scale training data is the key factor affecting the model performance. To make full use …

Knowledge in attention assistant for improving generalization in deep teacher–student models

S Morabbi, H Soltanizadeh, S Mozaffari… - … Journal of Modelling …, 2024 - Taylor & Francis
Research on knowledge distillation has become active in deep neural networks. Knowledge
distillation involves training a low-capacity model from a high-capacity model. However …

Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models

K Luo, Z Ding, Z Weng, L Qiao, M Zhao, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
While Chain of Thought (CoT) prompting approaches have significantly consolidated the
reasoning capabilities of large language models (LLMs), they still face limitations that …

MedProm: Bridging Dialogue Gaps in Healthcare with Knowledge-Enhanced Generative Models

D Varshney, N Behera, P Katari, A Ekbal - ACM Transactions on …, 2025 - dl.acm.org
In medical dialogue systems, recent advancements underscore the critical role of
incorporating relevant medical knowledge to enhance performance. However, existing …

Paced-curriculum distillation with prediction and label uncertainty for image segmentation

M Islam, L Seenivasan, SP Sharan, VK Viekash… - International Journal of …, 2023 - Springer
Purpose: In curriculum learning, the idea is to train on easier samples first and gradually
increase the difficulty, while in self-paced learning, a pacing function defines the speed to …