A survey of deep learning techniques for neural machine translation

S Yang, Y Wang, X Chu - arXiv preprint arXiv:2002.07526, 2020 - arxiv.org
In recent years, natural language processing (NLP) has advanced greatly with deep
learning techniques. In the sub-field of machine translation, a new approach named Neural …

Hierarchical Bayesian nonparametric models with applications

YW Teh, MI Jordan - Bayesian nonparametrics, 2010 - books.google.com
Hierarchical modeling is a fundamental concept in Bayesian statistics. The basic idea is that
parameters are endowed with distributions which may themselves introduce new …

Experience grounds language

Y Bisk, A Holtzman, J Thomason, J Andreas… - arXiv preprint arXiv …, 2020 - arxiv.org
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …

One billion word benchmark for measuring progress in statistical language modeling

C Chelba, T Mikolov, M Schuster, Q Ge… - arXiv preprint arXiv …, 2013 - arxiv.org
We propose a new benchmark corpus to be used for measuring progress in statistical
language modeling. With almost one billion words of training data, we hope this benchmark …

Unsupervised grouped axial data modeling via hierarchical Bayesian nonparametric models with Watson distributions

W Fan, L Yang, N Bouguila - IEEE Transactions on Pattern …, 2021 - ieeexplore.ieee.org
This paper aims at proposing an unsupervised hierarchical nonparametric Bayesian
framework for modeling axial data (i.e., observations are axes of direction) that can be …

Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP

SJ Mielke, Z Alyafeai, E Salesky, C Raffel… - arXiv preprint arXiv …, 2021 - arxiv.org
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …

Discriminative clustering by regularized information maximization

A Krause, P Perona, R Gomes - Advances in neural …, 2010 - proceedings.neurips.cc
Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled
data set? We present a framework that simultaneously clusters the data and trains a …

Dirichlet process

YW Teh - Encyclopedia of machine learning, 2010 - Citeseer
The Dirichlet process is a stochastic process used in Bayesian nonparametric models of data,
particularly in Dirichlet process mixture models (also known as infinite mixture models). It is …

Mondrian forests: Efficient online random forests

B Lakshminarayanan, DM Roy… - Advances in neural …, 2014 - proceedings.neurips.cc
Ensembles of randomized decision trees, usually referred to as random forests, are widely
used for classification and regression tasks in machine learning and statistics. Random …

Distance dependent Chinese restaurant processes

DM Blei, PI Frazier - Journal of Machine Learning Research, 2011 - jmlr.org
We develop the distance dependent Chinese restaurant process, a flexible class of
distributions over partitions that allows for dependencies between the elements. This class …