Word segmentation on discovered phone units with dynamic programming and self-supervised scoring

H Kamper - IEEE/ACM Transactions on Audio, Speech, and …, 2022 - ieeexplore.ieee.org
Recent work on unsupervised speech segmentation has used self-supervised models with
phone and word segmentation modules that are trained jointly. This paper instead revisits …

Segmenting continuous motions with hidden semi-markov models and gaussian processes

T Nakamura, T Nagai, D Mochihashi… - Frontiers in …, 2017 - frontiersin.org
Humans divide perceived continuous information into segments to facilitate recognition. For
example, humans can segment speech waves into recognizable morphemes. Analogously …

HVGH: unsupervised segmentation for high-dimensional time series using deep neural compression and statistical generative model

M Nagano, T Nakamura, T Nagai… - Frontiers in Robotics …, 2019 - frontiersin.org
Humans perceive continuous high-dimensional information by dividing it into meaningful
segments, such as words and units of motion. We believe that such unsupervised …

Design and structure of the Juman++ morphological analyzer toolkit

A Tolmachev, D Kawahara… - Journal of Natural …, 2020 - jstage.jst.go.jp
An NLP tool is practical when it is fast in addition to having high accuracy. We describe the
architecture and the used methods to achieve 250× analysis speed improvement on the …

Unsupervised word segmentation with bi-directional neural language model

L Wang, X Zheng - ACM Transactions on Asian and Low-Resource …, 2022 - dl.acm.org
We propose an unsupervised word segmentation model, in which for each unlabelled
sentence sample, the learning objective is to maximize the generation probability of the …

A masked segmental language model for unsupervised natural language segmentation

CM Downey, F **a, GA Levow… - arxiv preprint arxiv …, 2021 - arxiv.org
Segmentation remains an important preprocessing step both in languages where" words" or
other important syntactic/semantic units (like morphemes) are not clearly delineated by white …

Sequence pattern extraction by segmenting time series data using GP-HSMM with hierarchical Dirichlet process

M Nagano, T Nakamura, T Nagai… - 2018 IEEE/RSJ …, 2018 - ieeexplore.ieee.org
Humans recognize perceived continuous information by dividing it into significant segments
such as words and unit motions. We believe that such unsupervised segmentation is also an …

Learning word meanings and grammar for verbalization of daily life activities using multilayered multimodal latent Dirichlet allocation and Bayesian hidden Markov …

M Attamimi, Y Ando, T Nakamura, T Nagai… - Advanced …, 2016 - Taylor & Francis
Intelligent systems need to understand and respond to human words to enable them to
interact with humans in a natural way. Several studies attempted to realize these abilities by …

Weakly supervised word segmentation for computational language documentation

S Okabe, L Besacier, F Yvon - Annual meeting of the Association for …, 2022 - hal.science
Word and morpheme segmentation are fundamental steps of language documentation as
they allow to discover lexical units in a language for which the lexicon is unknown. However …

Multilingual unsupervised sequence segmentation transfers to extremely low-resource languages

CM Downey, S Drizin, L Haroutunian… - arxiv preprint arxiv …, 2021 - arxiv.org
We show that unsupervised sequence-segmentation performance can be transferred to
extremely low-resource languages by pre-training a Masked Segmental Language Model …