WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Random cycle coding: Lossless compression of cluster assignments via bits-back coding

D Severo, A Khisti, A Makhzani - Advances in Neural …, 2025 - proceedings.neurips.cc
We present an optimal method for encoding cluster assignments of arbitrary data sets. Our
method, Random Cycle Coding (RCC), encodes data sequentially and sends assignment …

Machine learning and high dimensional vector search

M Douze - arXiv preprint arXiv:2502.16931, 2025 - arxiv.org
Machine learning and vector search are two research topics that developed in parallel in
nearby communities. However, unlike many other fields related to big data, machine …

RAQ-VAE: Rate-Adaptive Vector-Quantized Variational Autoencoder

J Seo, J Kang - arXiv preprint arXiv:2405.14222, 2024 - arxiv.org
Vector Quantized Variational AutoEncoder (VQ-VAE) is an established technique in
machine learning for learning discrete representations across various modalities. However …

Representation Collapsing Problems in Vector Quantization

W Zhao, Q Zou, R Shah, D Liu - arXiv preprint arXiv:2411.16550, 2024 - arxiv.org
Vector quantization is a technique in machine learning that discretizes continuous
representations into a set of discrete vectors. It is widely employed in tokenizing data …

Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search

D Severo, G Ottaviano, M Muckley, K Ullrich… - arXiv preprint arXiv …, 2025 - arxiv.org
Approximate nearest neighbor search for vectors relies on indexes that are most often
accessed from RAM. Therefore, storage is the factor limiting the size of the database that can …

Balance of number of embedding and their dimensions in vector quantization

H Chen, SS Reddy, Z Chen, D Liu - arXiv preprint arXiv:2407.04939, 2024 - arxiv.org
The dimensionality of the embedding and the number of available embeddings (also called
codebook size) are critical factors influencing the performance of Vector Quantization (VQ), a …

Qinco2: Vector Compression and Search with Improved Implicit Neural Codebooks

T Vallaeys, M Muckley, J Verbeek, M Douze - arXiv preprint arXiv …, 2025 - arxiv.org
Vector quantization is a fundamental technique for compression and large-scale nearest
neighbor search. For high-accuracy operating points, multi-codebook quantization …

Random Permutation Codes: Lossless Source Coding of Non-Sequential Data

D Severo - arXiv preprint arXiv:2411.14879, 2024 - arxiv.org
This thesis deals with the problem of communicating and storing non-sequential data. We
investigate this problem through the lens of lossless source coding, also sometimes referred …