Transformers aftermath: Current research and rising trends
ESD Reis, CAD Costa, DED Silveira… - Communications of the ACM, 2021 - dl.acm.org
Structure-level knowledge distillation for multilingual sequence labeling
Multilingual sequence labeling is a task of predicting label sequences using a single unified
model for multiple languages. Compared with relying on multiple monolingual models, using …
Self-attentive Biaffine Dependency Parsing
The current state-of-the-art dependency parsing approaches employ BiLSTMs to encode
input sentences. Motivated by the success of transformer-based machine translation, this …
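To make the arc-scoring idea behind biaffine dependency parsing concrete, the following is a minimal sketch, assuming a PyTorch-style setup, of a biaffine arc scorer fed by a generic transformer encoder instead of a BiLSTM. The class name, dimensions, and layer choices are illustrative assumptions and not the parser described in the paper.

# Illustrative sketch only (not the paper's code): a biaffine arc scorer of the
# kind used in graph-based dependency parsers, here fed by a generic transformer
# encoder rather than a BiLSTM. All names and dimensions are assumptions.
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    def __init__(self, enc_dim=256, arc_dim=128, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=enc_dim, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        # Biaffine weight U plus a bias term applied to the head representation.
        self.U = nn.Parameter(torch.zeros(arc_dim, arc_dim))
        self.b = nn.Parameter(torch.zeros(arc_dim))

    def forward(self, word_embeddings):            # (batch, seq, enc_dim)
        h = self.encoder(word_embeddings)
        head = self.head_mlp(h)                    # (batch, seq, arc_dim)
        dep = self.dep_mlp(h)                      # (batch, seq, arc_dim)
        # score[b, i, j] = dep_i^T U head_j + head_j . b
        scores = torch.einsum('bid,de,bje->bij', dep, self.U, head)
        scores = scores + torch.einsum('bje,e->bj', head, self.b).unsqueeze(1)
        return scores                              # rows: dependents, cols: candidate heads

x = torch.randn(2, 10, 256)                        # 2 sentences, 10 tokens each
print(BiaffineArcScorer()(x).shape)                # torch.Size([2, 10, 10])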
Scalable syntax-aware language models using knowledge distillation
Prior work has shown that, on small amounts of training data, syntactic neural language
models learn structurally sensitive generalisations more successfully than sequential …
Evaluating explanation methods for neural machine translation
Recently, many efforts have been devoted to interpreting black-box NMT models, but little
progress has been made on metrics to evaluate explanation methods. Word Alignment Error …
Distilling neural networks for greener and faster dependency parsing
M Anderson, C Gómez-Rodríguez - arXiv preprint arXiv:2006.00844, 2020 - arxiv.org
The carbon footprint of natural language processing research has been increasing in recent
years due to its reliance on large and inefficient neural network implementations. Distillation …
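As background for the distillation entries above, here is a minimal sketch, assuming a PyTorch-style setup, of the standard soft-target distillation loss in which a compact student is trained against a larger teacher's temperature-softened predictions. The function name, temperature, and mixing weight are illustrative assumptions rather than the authors' exact objective.

# Illustrative sketch only (not the authors' implementation): standard
# soft-target knowledge distillation, mixing cross-entropy on gold labels with
# a KL term toward the teacher's temperature-softened distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, gold_labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets from the teacher; no gradient flows back to the teacher.
    soft_targets = F.softmax(teacher_logits.detach() / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_targets, reduction='batchmean') * temperature ** 2
    ce = F.cross_entropy(student_logits, gold_labels)
    return alpha * kd + (1.0 - alpha) * ce

# Example: 8 tokens, each choosing one of 20 candidate heads/labels.
student = torch.randn(8, 20, requires_grad=True)
teacher = torch.randn(8, 20)
gold = torch.randint(0, 20, (8,))
print(distillation_loss(student, teacher, gold))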
Improved training of mixture-of-experts language GANs
Despite the dramatic success in image generation, Generative Adversarial Networks (GANs)
still face great challenges in text generation. The difficulty in generator training arises from …
Predicting events in MOBA games: Prediction, attribution, and evaluation
Multiplayer online battle arena (MOBA) games have become increasingly popular in
recent years. Consequently, many efforts have been devoted to providing pregame or in …
R-U-SURE? Uncertainty-aware code suggestions by maximizing utility across random user intents
Large language models show impressive results at predicting structured text such as code,
but also commonly introduce errors and hallucinations in their output. When used to assist …
Knowledge base embedding by cooperative knowledge distillation
Knowledge bases are increasingly exploited as gold standard data sources which benefit
various knowledge-driven NLP tasks. In this paper, we explore a new research direction to …