Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …
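Most of the entries below build on the same basic objective that this survey reviews. As a point of reference only, here is a minimal sketch of the standard soft-target distillation loss (hard cross-entropy plus a temperature-scaled KL term, in the style popularized by Hinton et al.); the PyTorch code, the function name kd_loss, and the default temperature and alpha values are illustrative assumptions, not taken from the survey.

    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
        # Hard-label term: ordinary cross-entropy against the gold labels.
        ce = F.cross_entropy(student_logits, labels)
        # Soft-label term: KL divergence between the temperature-softened
        # teacher and student distributions, scaled by T^2 so the gradient
        # magnitude stays comparable across temperatures.
        t = temperature
        soft_teacher = F.softmax(teacher_logits / t, dim=-1)
        log_soft_student = F.log_softmax(student_logits / t, dim=-1)
        kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)
        return alpha * ce + (1.0 - alpha) * kl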

Sequence-level knowledge distillation

Y Kim, AM Rush - arXiv preprint arXiv:1606.07947, 2016 - arxiv.org
Neural machine translation (NMT) offers a novel alternative formulation of translation that is
potentially simpler than statistical approaches. However, to reach competitive performance …
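Since the snippet cuts off before the method, it may help to state what "sequence-level" means here, in the usual notation (teacher distribution q, student p; summarized from the paper, with the beam-search approximation it describes). Word-level KD matches per-token distributions, while sequence-level KD trains the student on the teacher's own decoded output:

    \mathcal{L}_{\text{word-KD}} = -\sum_{t}\sum_{k \in \mathcal{V}} q(y_t = k \mid x, y_{<t}) \, \log p(y_t = k \mid x, y_{<t})

    \mathcal{L}_{\text{seq-KD}} \approx -\log p(\hat{y} \mid x), \qquad \hat{y} \approx \arg\max_{y} q(y \mid x) \ \text{(teacher beam-search output)}

In practice this amounts to re-decoding the training set with the teacher and training the student on those outputs with the ordinary sequence negative log-likelihood.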

BAM! Born-again multi-task networks for natural language understanding

K Clark, MT Luong, U Khandelwal, CD Manning… - arXiv preprint arXiv …, 2019 - arxiv.org
It can be challenging to train multi-task neural networks that outperform or even match their
single-task counterparts. To help address this, we propose using knowledge distillation …

Massively multilingual transfer for NER

A Rahimi, Y Li, T Cohn - arXiv preprint arXiv:1902.00193, 2019 - arxiv.org
In cross-lingual transfer, NLP models over one or more source languages are applied to a
low-resource target language. While most prior work has used a single source model or a …

Head-driven phrase structure grammar parsing on Penn Treebank

J Zhou, H Zhao - arXiv preprint arXiv:1907.02684, 2019 - arxiv.org
Head-driven phrase structure grammar (HPSG) enjoys a uniform formalism representing rich
contextual syntactic and even semantic meanings. This paper makes the first attempt to …

Graph-based dependency parsing with graph neural networks

T Ji, Y Wu, M Lan - Proceedings of the 57th Annual Meeting of the …, 2019 - aclanthology.org
We investigate the problem of efficiently incorporating high-order features into neural graph-
based dependency parsing. Instead of explicitly extracting high-order features from …

Rethinking self-attention: Towards interpretability in neural parsing

K Mrini, F Dernoncourt, Q Tran, T Bui, W Chang… - arXiv preprint arXiv …, 2019 - arxiv.org
Attention mechanisms have improved the performance of NLP tasks while allowing models
to remain explainable. Self-attention is currently widely used; however, interpretability is …

Model compression with two-stage multi-teacher knowledge distillation for web question answering system

Z Yang, L Shou, M Gong, W Lin, D Jiang - Proceedings of the 13th …, 2020 - dl.acm.org
Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) have
demonstrated excellent results in question answering areas. However, due to the sheer …
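The two-stage pipeline itself is not visible in the snippet, so the following is only a generic illustration of the multi-teacher ingredient of such a setup: average the teachers' temperature-softened distributions into a single soft target and reuse the same KL term as in the sketch above. The function name, the uniform teacher weighting, and the defaults are assumptions, not the authors' method.

    import torch
    import torch.nn.functional as F

    def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                              temperature=2.0, alpha=0.5):
        t = temperature
        # Uniformly average the teachers' softened output distributions.
        soft_target = torch.stack(
            [F.softmax(logits / t, dim=-1) for logits in teacher_logits_list]
        ).mean(dim=0)
        log_soft_student = F.log_softmax(student_logits / t, dim=-1)
        kl = F.kl_div(log_soft_student, soft_target, reduction="batchmean") * (t * t)
        ce = F.cross_entropy(student_logits, labels)
        return alpha * ce + (1.0 - alpha) * kl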

What do recurrent neural network grammars learn about syntax?

A Kuncoro, M Ballesteros, L Kong, C Dyer… - arXiv preprint arXiv …, 2016 - arxiv.org
Recurrent neural network grammars (RNNG) are a recently proposed probabilistic
generative modeling family for natural language. They show state-of-the-art language …

Deep multitask learning for semantic dependency parsing

H Peng, S Thomson, NA Smith - arXiv preprint arXiv:1704.06855, 2017 - arxiv.org
We present a deep neural architecture that parses sentences into three semantic
dependency graph formalisms. By using efficient, nearly arc-factored inference and a …