Convergence guarantees for the Good-Turing estimator

A Painsky - Journal of Machine Learning Research, 2022 - jmlr.org
Consider a finite sample from an unknown distribution over a countable alphabet. The
occupancy probability (OP) refers to the total probability of symbols that appear exactly k …

Bayesian Nonparametric Inference for "Species-sampling" Problems

C Balocchi, S Favaro, Z Naulet - arXiv preprint arXiv:2203.06076, 2022 - arxiv.org
Given an observed sample from a population of individuals belonging to species,
"species-sampling" problems (SSPs) call for estimating some features of the unknown species …

Generalized Good-Turing improves missing mass estimation

A Painsky - Journal of the American Statistical Association, 2023 - Taylor & Francis
Consider a finite sample from an unknown distribution over a countable alphabet. The
missing mass refers to the probability of symbols that do not appear in the sample …

Confidence intervals for parameters of unobserved events

A Painsky - Journal of the American Statistical Association, 2024 - Taylor & Francis
Consider a finite sample from an unknown distribution over a countable alphabet.
Unobserved events are alphabet symbols which do not appear in the sample. Estimating the …

Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence

A Pananjady, V Muthukumar, A Thangaraj - Journal of Machine Learning …, 2024 - jmlr.org
We study the problem of estimating the stationary mass, also called the unigram mass,
that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem …

Bayesian nonparametric dependent model for partially replicated data: the influence of fuel spills on species diversity

J Arbel, K Mengersen, J Rousseau - The Annals of Applied Statistics, 2016 - JSTOR
We introduce a dependent Bayesian nonparametric model for the probabilistic modeling of
membership of subgroups in a community based on partially replicated data. The focus here …

Bayesian nonparametric inference for discovery probabilities: Credible intervals and large sample asymptotics

J Arbel, S Favaro, B Nipoti, YW Teh - Statistica Sinica, 2017 - JSTOR
Given a sample of size n from a population of individuals belonging to different species with
unknown proportions, a problem of practical interest consists in making inference on the …

Just wing it: optimal estimation of missing mass in a Markovian sequence

A Pananjady, V Muthukumar, A Thangaraj - stat, 2024 - dp-ai-application.oss-cn …
We study the problem of estimating the stationary mass—also called the unigram mass—
that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem …

Asymptotic properties of Turing's formula in relative error

M Grabchak, Z Zhang - Machine Learning, 2017 - Springer
Turing's formula allows one to estimate the total probability associated with letters from an
alphabet, which are not observed in a random sample. In this paper we give conditions for …

Bayesian calculus and predictive characterizations of extended feature allocation models

M Beraha, F Camerlenghi, L Ghilotti - arXiv preprint arXiv:2502.10257, 2025 - arxiv.org
We introduce and study a unified Bayesian framework for extended feature allocations
which flexibly captures interactions, such as repulsion or attraction, among features and …