Probabilistic machine learning and artificial intelligence

Z Ghahramani - Nature, 2015 - nature.com
How can a machine learn from experience? Probabilistic modelling provides a framework
for understanding what learning is, and has therefore emerged as one of the principal …

Dive into deep learning

A Zhang, ZC Lipton, M Li, AJ Smola - arXiv preprint arXiv:2106.11342, 2021 - arxiv.org
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …

Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP

SJ Mielke, Z Alyafeai, E Salesky, C Raffel… - arXiv preprint arXiv …, 2021 - arxiv.org
What are the units of text that we want to model? From bytes to multi-word expressions, text
can be analyzed and generated at many granularities. Until recently, most natural language …

Distribution theory for hierarchical processes

F Camerlenghi, A Lijoi, P Orbanz, I Prünster - 2019 - projecteuclid.org
The Annals of Statistics 2019, Vol. 47, No. 1, 67–92.
https://doi.org/10.1214/17-AOS1678 © Institute of Mathematical Statistics, 2019 …

Temporal sequence modeling for video event detection

Y Cheng, Q Fan, S Pankanti… - Proceedings of the …, 2014 - openaccess.thecvf.com
We present a novel approach for event detection in video by temporal sequence modeling.
Exploiting temporal information has lain at the core of many approaches for video analysis …

Stream-based joint exploration-exploitation active learning

CC Loy, TM Hospedales, T Xiang… - 2012 IEEE Conference …, 2012 - ieeexplore.ieee.org
Learning from streams of evolving and unbounded data is an important problem, for
example in visual surveillance or internet scale data. For such large and evolving real-world …

Evaluating distributional distortion in neural language modeling

B LeBrun, A Sordoni, TJ O'Donnell - arXiv preprint arXiv:2203.12788, 2022 - arxiv.org
A fundamental characteristic of natural language is the high rate at which speakers produce
novel expressions. Because of this novelty, a heavy-tail of rare events accounts for a …

A subsequence interleaving model for sequential pattern mining

J Fowkes, C Sutton - Proceedings of the 22nd ACM SIGKDD …, 2016 - dl.acm.org
Recent sequential pattern mining methods have used the minimum description length (MDL)
principle to define an encoding scheme which describes an algorithm for mining the most …

Capturing structural locality in non-parametric language models

FF Xu, J He, G Neubig, VJ Hellendoorn - arXiv preprint arXiv:2110.02870, 2021 - arxiv.org
Structural locality is a ubiquitous feature of real-world datasets, wherein data points are
organized into local hierarchies. Some examples include topical clusters in text or project …

The importance of generation order in language modeling

N Ford, D Duckworth, M Norouzi, GE Dahl - arXiv preprint arXiv …, 2018 - arxiv.org
Neural language models are a critical component of state-of-the-art systems for machine
translation, summarization, audio transcription, and other tasks. These language models are …