Pre-trained models for natural language processing: A survey

X Qiu, T Sun, Y Xu, Y Shao, N Dai, X Huang - Science China …, 2020 - Springer
Recently, the emergence of pre-trained models (PTMs) has brought natural language
processing (NLP) to a new era. In this survey, we provide a comprehensive review of PTMs …

A survey on deep learning for named entity recognition

J Li, A Sun, J Han, C Li - IEEE Transactions on Knowledge and …, 2020 - ieeexplore.ieee.org
Named entity recognition (NER) is the task of identifying mentions of rigid designators in text
belonging to predefined semantic types such as person, location, and organization. NER …

An empirical evaluation of generic convolutional and recurrent networks for sequence modeling

S Bai, JZ Kolter, V Koltun - arXiv preprint arXiv:1803.01271, 2018 - arxiv.org
For most deep learning practitioners, sequence modeling is synonymous with recurrent
networks. Yet recent results indicate that convolutional architectures can outperform …

On the variance of the adaptive learning rate and beyond

L Liu, H Jiang, P He, W Chen, X Liu, J Gao… - arXiv preprint arXiv …, 2019 - arxiv.org
The learning rate warmup heuristic achieves remarkable success in stabilizing training,
accelerating convergence and improving generalization for adaptive stochastic optimization …

Reducing transformer depth on demand with structured dropout

A Fan, E Grave, A Joulin - arXiv preprint arXiv:1909.11556, 2019 - arxiv.org
Overparameterized transformer networks have obtained state-of-the-art results in various
natural language processing tasks, such as machine translation, language modeling, and …

MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance

W Zhao, M Peyrard, F Liu, Y Gao, CM Meyer… - arXiv preprint arXiv …, 2019 - arxiv.org
A robust evaluation metric has a profound impact on the development of text generation
systems. A desirable metric compares system output against references based on their …

Dissecting contextual word embeddings: Architecture and representation

ME Peters, M Neumann, L Zettlemoyer… - arXiv preprint arXiv …, 2018 - arxiv.org
Contextual word representations derived from pre-trained bidirectional language models
(biLMs) have recently been shown to provide significant improvements to the state of the art …

Structured pruning of large language models

Z Wang, J Wohlwend, T Lei - arXiv preprint arXiv:1910.04732, 2019 - arxiv.org
Large language models have recently achieved state-of-the-art performance across a wide
variety of natural language tasks. Meanwhile, the size of these models and their latency …

Named entity extraction for knowledge graphs: A literature overview

T Al-Moslmi, MG Ocaña, AL Opdahl, C Veres - IEEE Access, 2020 - ieeexplore.ieee.org
An enormous amount of digital information is expressed as natural-language (NL) text that is
not easily processable by computers. Knowledge Graphs (KG) offer a widely used format for …

Framework for deep learning-based language models using multi-task learning in natural language understanding: A systematic literature review and future directions

RM Samant, MR Bachute, S Gite, K Kotecha - IEEE Access, 2022 - ieeexplore.ieee.org
Learning human languages is a difficult task for a computer. However, Deep Learning (DL)
techniques have significantly enhanced performance for almost all natural language …