ShapeFormer: Transformer-based shape completion via sparse representation

X Yan, L Lin, NJ Mitra, D Lischinski… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present ShapeFormer, a transformer-based network that produces a distribution of
object completions, conditioned on incomplete, and possibly noisy, point clouds. The …

Vector quantization for recommender systems: a review and outlook

Q Liu, X Dong, J Xiao, N Chen, H Hu, J Zhu… - arxiv preprint arxiv …, 2024 - arxiv.org
Vector quantization, renowned for its unparalleled feature compression capabilities, has
been a prominent topic in signal processing and machine learning research for several …

Generative spoken dialogue language modeling

TA Nguyen, E Kharitonov, J Copet, Y Adi… - Transactions of the …, 2023 - direct.mit.edu
We introduce dGSLM, the first “textless” model able to generate audio samples of naturalistic
spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with …

MT3: Multi-task multitrack music transcription

J Gardner, I Simon, E Manilow, C Hawthorne… - arxiv preprint arxiv …, 2021 - arxiv.org
Automatic Music Transcription (AMT), inferring musical notes from raw audio, is a
challenging task at the core of music understanding. Unlike Automatic Speech Recognition …

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

S Cuervo, A Lancucki, R Marxer… - Advances in …, 2022 - proceedings.neurips.cc
The success of deep learning comes from its ability to capture the hierarchical structure of
data by learning high-level representations defined in terms of low-level ones. In this paper …

Towards learning discrete representations via self-supervision for wearables-based human activity recognition

H Haresamudram, I Essa, T Ploetz - Sensors, 2024 - mdpi.com
Human activity recognition (HAR) in wearable and ubiquitous computing typically involves
translating sensor readings into feature representations, either derived through dedicated …

Aligned contrastive predictive coding

J Chorowski, G Ciesielski, J Dzikowski… - arxiv preprint arxiv …, 2021 - arxiv.org
We investigate the possibility of forcing a self-supervised model trained using a contrastive
predictive loss to extract slowly varying latent representations. Rather than producing …

Deep neural imputation: A framework for recovering incomplete brain recordings

S Talukder, JJ Sun, M Leonard, BW Brunton… - arxiv preprint arxiv …, 2022 - arxiv.org
Neuroscientists and neuroengineers have long relied on multielectrode neural recordings to
study the brain. However, in a typical experiment, many factors corrupt neural recordings …

On compressing sequences for self-supervised speech models

Y Meng, HJ Chen, J Shi, S Watanabe… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Compressing self-supervised models has become increasingly necessary, as self-
supervised models become larger. While previous approaches have primarily focused on …

Exploring the benefits of tokenization of discrete acoustic units

A Dekel, R Fernandez - arxiv preprint arxiv:2406.05547, 2024 - arxiv.org
Tokenization algorithms that merge the units of a base vocabulary into larger, variable-rate
units have become standard in natural language processing tasks. This idea, however, has …