Conditional adapters: Parameter-efficient transfer learning with fast inference

T Lei, J Bai, S Brahma, J Ainslie… - Advances in …, 2023 - proceedings.neurips.cc
We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning
method that also improves inference efficiency. CoDA generalizes beyond standard adapter …
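
Since the abstract refers to the "standard adapter" design that CoDA generalizes, a minimal sketch of such a bottleneck adapter may help. This is an illustrative baseline only, not the CoDA method; the class and parameter names are our own.

# Minimal sketch of a standard bottleneck adapter (the baseline CoDA is said to
# generalize); NOT the CoDA method itself.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: only the small down/up projections are trained,
        # while the surrounding transformer weights stay frozen.
        return x + self.up(self.act(self.down(x)))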

DistillSpec: Improving speculative decoding via knowledge distillation

Y Zhou, K Lyu, AS Rawat, AK Menon… - arXiv preprint arXiv …, 2023 - arxiv.org
Speculative decoding (SD) accelerates large language model inference by employing a
faster draft model for generating multiple tokens, which are then verified in parallel by the …
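
The abstract describes the basic speculative decoding loop (a small draft model proposes several tokens, the large target model verifies them in parallel). The sketch below illustrates that loop only, under assumed model interfaces (`draft_probs`, `target_probs` are illustrative stand-ins, not a real API); DistillSpec's distillation of the draft model is not shown.

# Hedged sketch of vanilla speculative decoding: draft proposes k tokens,
# target scores them in one parallel pass, tokens accepted with prob min(1, p/q).
import torch

def speculative_step(prefix, draft_probs, target_probs, k=4):
    # 1) Draft model proposes k tokens autoregressively.
    proposed, q = [], []
    ctx = list(prefix)
    for _ in range(k):
        dist = draft_probs(ctx)              # (vocab,) draft distribution
        tok = int(torch.multinomial(dist, 1))
        proposed.append(tok); q.append(dist); ctx.append(tok)

    # 2) Target model scores all k positions in a single parallel pass.
    p = target_probs(prefix, proposed)       # list of k (vocab,) target distributions

    # 3) Accept each proposal with prob min(1, p(tok)/q(tok)); on the first
    #    rejection, resample from the (normalized) residual and stop.
    out = list(prefix)
    for i, tok in enumerate(proposed):
        if torch.rand(()) < min(1.0, float(p[i][tok] / q[i][tok])):
            out.append(tok)
        else:
            residual = torch.clamp(p[i] - q[i], min=0)
            out.append(int(torch.multinomial(residual / residual.sum(), 1)))
            break
    return out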

GKD: Generalized knowledge distillation for auto-regressive sequence models

R Agarwal, N Vieillard, P Stanczyk, S Ramos… - arXiv preprint arXiv …, 2023 - arxiv.org
Knowledge distillation is commonly used for compressing neural networks to reduce their
inference cost and memory footprint. However, current distillation methods for auto …
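
For reference, the standard token-level distillation objective the abstract builds on is a softened KL divergence between teacher and student distributions. The sketch below shows only that baseline loss under assumed logit shapes; GKD's generalizations (on-policy student sequences, alternative divergences) are not reproduced here.

# Minimal sketch of the standard token-wise KD loss (forward KL, teacher -> student).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature: float = 2.0):
    # logits: (batch, seq_len, vocab), assumed shapes for illustration
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    # Forward KL, scaled by T^2 as in standard distillation.
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t * t)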

When attention meets fast recurrence: Training language models with reduced compute

T Lei - arXiv preprint arXiv:2102.12459, 2021 - arxiv.org
Large language models have become increasingly difficult to train because of the growing
computation time and cost. In this work, we present SRU++, a highly-efficient architecture …

Learning to generate better than your LLM

JD Chang, K Brantley, R Ramamurthy, D Misra… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning Large
Language Models (LLMs) for conditional text generation. In particular, recent LLMs such as …

Driver behavioral cloning for route following in autonomous vehicles using task knowledge distillation

G Li, Z Ji, S Li, X Luo, X Qu - IEEE Transactions on Intelligent …, 2022 - ieeexplore.ieee.org
Planning an appropriate driving trajectory for route following is an important function for
autonomous driving. Behavioral cloning, which allows automatic trajectory learning and …

DistiLLM: Towards streamlined distillation for large language models

J Ko, S Kim, T Chen, SY Yun - arXiv preprint arXiv:2402.03898, 2024 - arxiv.org
Knowledge distillation (KD) is widely used for compressing a teacher model to a smaller
student model, reducing its inference cost and memory footprint while preserving model …

Multi-teacher distillation with single model for neural machine translation

X Liang, L Wu, J Li, T Qin, M Zhang… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Knowledge distillation (KD) is an effective strategy for neural machine translation (NMT) to
improve the performance of a student model. Usually, the teacher can guide the student to …
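
One common way to combine guidance from several teachers, which the abstract alludes to, is to average their output distributions into a single soft target for the student. The sketch below shows only that generic idea, with uniform teacher weights assumed; the paper's specific single-model multi-teacher scheme is not reproduced.

# Hedged sketch of simple multi-teacher distillation via averaged soft targets.
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, temperature: float = 2.0):
    t = temperature
    # Average the teachers' softened distributions (uniform weights assumed).
    teacher_p = torch.stack(
        [F.softmax(logits / t, dim=-1) for logits in teacher_logits_list]
    ).mean(dim=0)
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_logp, teacher_p, reduction="batchmean") * (t * t)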

Teaching autoregressive language models complex tasks by demonstration

G Recchia - arXiv preprint arXiv:2109.02102, 2021 - arxiv.org
This paper demonstrates that by fine-tuning an autoregressive language model (GPT-Neo)
on appropriately structured step-by-step demonstrations, it is possible to teach it to execute a …

Target-side augmentation for document-level machine translation

G Bao, Z Teng, Y Zhang - arXiv preprint arXiv:2305.04505, 2023 - arxiv.org
Document-level machine translation faces the challenge of data sparsity due to its long input
length and a small amount of training data, increasing the risk of learning spurious patterns …