Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Distillation from heterogeneous models for top-k recommendation

SK Kang, W Kweon, D Lee, J Lian, X Xie… - Proceedings of the ACM …, 2023 - dl.acm.org
Recent recommender systems have shown remarkable performance by using an ensemble
of heterogeneous models. However, it is exceedingly costly because it requires resources …

A preliminary study of the intrinsic relationship between complexity and alignment

Y Zhao, B Yu, B Hui, H Yu, F Huang, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Training large language models (LLMs) with open-domain instruction data has yielded
remarkable success in aligning to end tasks and user preferences. Extensive research has …

Unbiased, Effective, and Efficient Distillation from Heterogeneous Models for Recommender Systems

SK Kang, W Kweon, D Lee, J Lian, X Xie… - ACM Transactions on …, 2024 - dl.acm.org
In recent years, recommender systems have achieved remarkable performance by using
ensembles of heterogeneous models. However, this approach is costly due to the resources …

Multi-level curriculum learning for multi-turn dialogue generation

G Chen, R Zhan, DF Wong… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
Since deep learning is the dominant paradigm in the multi-turn dialogue generation task,
large-scale training data is the key factor affecting the model performance. To make full use …

Knowledge in attention assistant for improving generalization in deep teacher–student models

S Morabbi, H Soltanizadeh, S Mozaffari… - … Journal of Modelling …, 2024 - Taylor & Francis
Research on knowledge distillation has become active in deep neural networks. Knowledge
distillation involves training a low-capacity model from a high-capacity model. However …

Let's Be Self-generated via Step by Step: A Curriculum Learning Approach to Automated Reasoning with Large Language Models

K Luo, Z Ding, Z Weng, L Qiao, M Zhao, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
While Chain of Thought (CoT) prompting approaches have significantly consolidated the
reasoning capabilities of large language models (LLMs), they still face limitations that …

MedProm: Bridging Dialogue Gaps in Healthcare with Knowledge-Enhanced Generative Models

D Varshney, N Behera, P Katari, A Ekbal - ACM Transactions on …, 2025 - dl.acm.org
In medical dialogue systems, recent advancements underscore the critical role of
incorporating relevant medical knowledge to enhance performance. However, existing …

Paced-curriculum distillation with prediction and label uncertainty for image segmentation

M Islam, L Seenivasan, SP Sharan, VK Viekash… - International Journal of …, 2023 - Springer
Purpose: In curriculum learning, the idea is to train on easier samples first and gradually
increase the difficulty, while in self-paced learning, a pacing function defines the speed to …