Knowledge distillation: A survey
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …
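The survey's subject is the classic teacher-student objective: a compact student is trained to match a larger teacher's temperature-softened output distribution in addition to the ground-truth labels. A minimal sketch in PyTorch, assuming generic `student_logits`, `teacher_logits`, and `labels` tensors (illustrative names, not code from the survey):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style knowledge distillation: soft-target KL term plus hard-label CE."""
    # Soften both distributions with temperature T; scale by T^2 so gradient
    # magnitudes stay comparable to the hard-label term.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)  # ordinary supervised term
    return alpha * soft + (1.0 - alpha) * hard
```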
Sequence-level knowledge distillation
Neural machine translation (NMT) offers a novel alternative formulation of translation that is
potentially simpler than statistical approaches. However, to reach competitive performance …
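Sequence-level distillation, as this line of work describes it, replaces per-token soft targets with the teacher's own decoded translations: the student is trained with ordinary cross-entropy on the teacher's outputs rather than on the reference translations. A rough sketch, assuming a hypothetical `teacher_translate` helper and a `student.nll` cross-entropy method (illustrative names, not APIs from the paper):

```python
def sequence_level_kd_step(student, teacher_translate, optimizer, src_batch):
    """Train the student on the teacher's beam-searched outputs (pseudo-targets)."""
    # The teacher decodes each source sentence; its best hypothesis becomes the target.
    pseudo_targets = [teacher_translate(src, beam_size=5) for src in src_batch]
    loss = student.nll(src_batch, pseudo_targets)  # standard token-level cross-entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```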
Bam! born-again multi-task networks for natural language understanding
It can be challenging to train multi-task neural networks that outperform or even match their
single-task counterparts. To help address this, we propose using knowledge distillation …
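In this born-again, multi-task setting, the natural reading is that each single-task teacher supplies soft targets for its own task while one shared student learns to match all of them. A hedged sketch under that assumption, with hypothetical teacher and student objects (not the authors' code):

```python
import torch
import torch.nn.functional as F

def multitask_distill_loss(student, teachers, task_batches, T=1.0):
    """Distill several single-task teachers into one multi-task student."""
    total = 0.0
    for task, inputs in task_batches.items():      # one mini-batch per task
        with torch.no_grad():
            t_logits = teachers[task](inputs)       # single-task teacher's predictions
        s_logits = student(inputs, task=task)       # shared encoder, task-specific head
        total = total + F.kl_div(
            F.log_softmax(s_logits / T, dim=-1),
            F.softmax(t_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
    return total
```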
Massively multilingual transfer for NER
In cross-lingual transfer, NLP models over one or more source languages are applied to a
low-resource target language. While most prior work has used a single source model or a …
Head-driven phrase structure grammar parsing on Penn treebank
Head-driven phrase structure grammar (HPSG) enjoys a uniform formalism representing rich
contextual syntactic and even semantic meanings. This paper makes the first attempt to …
Graph-based dependency parsing with graph neural networks
We investigate the problem of efficiently incorporating high-order features into neural graph-
based dependency parsing. Instead of explicitly extracting high-order features from …
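The snippet suggests that, instead of enumerating high-order features, token representations are refined by message passing over a (soft) dependency graph before arcs are re-scored. A generic sketch of one such graph-neural-network layer, illustrative of the general technique rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    """One round of message passing over a soft adjacency (arc-score) matrix."""
    def __init__(self, dim):
        super().__init__()
        self.self_proj = nn.Linear(dim, dim)
        self.neigh_proj = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h:   [batch, n_tokens, dim]        token representations
        # adj: [batch, n_tokens, n_tokens]   arc probabilities from a first-stage parser
        messages = torch.bmm(adj, self.neigh_proj(h))  # aggregate neighbor information
        return torch.relu(self.self_proj(h) + messages)
```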
Rethinking self-attention: Towards interpretability in neural parsing
Attention mechanisms have improved the performance of NLP tasks while allowing models
to remain explainable. Self-attention is currently widely used; however, interpretability is …
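For reference, the self-attention operation at issue is scaled dot-product attention, and the attention weight matrix is what interpretability analyses typically inspect. A minimal sketch:

```python
import math
import torch

def self_attention(q, k, v):
    """Scaled dot-product self-attention; returns outputs and the weight matrix."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)  # the weights interpretability work examines
    return weights @ v, weights
```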
Model compression with two-stage multi-teacher knowledge distillation for web question answering system
Deep pre-training and fine-tuning models (such as BERT and OpenAI GPT) have
demonstrated excellent results in question answering. However, due to the sheer …
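Multi-teacher distillation of the kind the title names typically combines several teachers' softened distributions into a single soft target for the compact student; the two-stage aspect (distilling during both pre-training and fine-tuning) is specific to the paper and not reproduced here. A hedged single-step sketch that averages the teachers:

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, T=2.0):
    """Match the student to the mean of several teachers' softened distributions."""
    with torch.no_grad():
        soft_target = torch.stack(
            [F.softmax(t / T, dim=-1) for t in teacher_logits_list]
        ).mean(dim=0)
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        soft_target,
        reduction="batchmean",
    ) * (T * T)
```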
What do recurrent neural network grammars learn about syntax?
Recurrent neural network grammars (RNNG) are a recently proposed probabilistic
generative modeling family for natural language. They show state-of-the-art language …
Deep multitask learning for semantic dependency parsing
We present a deep neural architecture that parses sentences into three semantic
dependency graph formalisms. By using efficient, nearly arc-factored inference and a …