Language modeling with gated convolutional networks

YN Dauphin, A Fan, M Auli… - … conference on machine …, 2017 - proceedings.mlr.press
The pre-dominant approach to language modeling to date is based on recurrent neural
networks. Their success on this task is often linked to their ability to capture unbounded …

[BOOK] Neural network methods in natural language processing

Y Goldberg - 2017 - books.google.com
Neural networks are a family of powerful machine learning models and this book focuses on
their application to natural language data. The first half of the book (Parts I and II) covers the …

FastText.zip: Compressing text classification models

A Joulin, E Grave, P Bojanowski, M Douze… - arXiv preprint arXiv …, 2016 - arxiv.org
We consider the problem of producing compact architectures for text classification, such that
the full model fits in a limited amount of memory. After considering different solutions …

Towards energy-efficient deep learning: An overview of energy-efficient approaches along the deep learning lifecycle

V Mehlin, S Schacht, C Lanquillon - arXiv preprint arXiv:2303.01980, 2023 - arxiv.org
Deep Learning has enabled many advances in machine learning applications in the last few
years. However, since current Deep Learning algorithms require much energy for …

Pre-training tasks for embedding-based large-scale retrieval

WC Chang, FX Yu, YW Chang, Y Yang… - arXiv preprint arXiv …, 2020 - arxiv.org
We consider the large-scale query-document retrieval problem: given a query (e.g., a
question), return the set of relevant documents (e.g., paragraphs containing the answer) from …

[PDF] Jurassic-1: Technical details and evaluation

O Lieber, O Sharir, B Lenz, Y Shoham - White Paper. AI21 Labs, 2021 - sharir.org
Jurassic-1 is a pair of auto-regressive language models recently released by AI21 Labs,
consisting of J1-Jumbo, a 178B-parameter model, and J1-Large, a 7B-parameter model. We …

An introduction to neural information retrieval

B Mitra, N Craswell - Foundations and Trends® in Information …, 2018 - nowpublishers.com
Neural ranking models for information retrieval (IR) use shallow or deep neural networks to
rank search results in response to a query. Traditional learning to rank models employ …

Nonparametric masked language modeling

S Min, W Shi, M Lewis, X Chen, W Yih… - arXiv preprint arXiv …, 2022 - arxiv.org
Existing language models (LMs) predict tokens with a softmax over a finite vocabulary,
which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first …

Learning visual features from large weakly supervised data

A Joulin, L van der Maaten, A Jabri… - Computer Vision–ECCV …, 2016 - Springer
Convolutional networks trained on large supervised datasets produce visual features which
form the basis for the state-of-the-art in many computer-vision problems. Further …

Exploring sparsity in recurrent neural networks

S Narang, E Elsen, G Diamos, S Sengupta - arXiv preprint arXiv …, 2017 - arxiv.org
Recurrent Neural Networks (RNNs) are widely used to solve a variety of problems and as the
quantity of data and the amount of available compute have increased, so have model sizes …