NAST: Noise aware speech tokenization for speech language models

S Messica, Y Adi - arXiv preprint arXiv:2406.11037, 2024 - arxiv.org
Speech tokenization is the task of representing speech signals as a sequence of discrete
units. Such representations can be later used for various downstream tasks including …

Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction.

J Le Roux, N Ono, S Sagayama - SAPA@INTERSPEECH, 2008 - Citeseer
As many acoustic signal processing methods, for example for source separation or noise
canceling, operate in the magnitude spectrogram domain, the problem of reconstructing a …

Designing audio effect plugins in C++: for AAX, AU, and VST3 with DSP theory

W Pirkle - 2019 - taylorfrancis.com
Designing Audio Effect Plugins in C++ presents everything you need to know about digital
signal processing in an accessible way. Not just another theory-heavy digital signal …

Augmentation invariant discrete representation for generative spoken language modeling

I Gat, F Kreuk, TA Nguyen, A Lee, J Copet… - arXiv preprint arXiv …, 2022 - arxiv.org
Generative Spoken Language Modeling research focuses on optimizing speech Language
Models (LMs) using raw audio recordings without accessing any textual supervision. Such …

Audio pitch shifting using the constant-Q transform

C Schörkhuber, A Klapuri, A Sontacchi - Journal of the Audio Engineering …, 2013 - Citeseer
In this paper a frequency-domain pitch shifting approach based on the CQT is proposed.
The CQT is specifically attractive for pitch shifting because it can be implemented by …

Speech time-scale modification with GANs

E Cohen, F Kreuk, J Keshet - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org
While listening to spoken content, it is often desired to vary the speech rate while preserving
the speaker's timbre and pitch. To date, advanced signal processing techniques are used to …

Analysis of three pitch-shifting algorithms for different musical instruments

A Rai, BD Barkana - 2019 IEEE Long Island Systems …, 2019 - ieeexplore.ieee.org
Pitch-shifting is a process where the original pitch of the sound is increased or decreased
without affecting the length of the sound clip being recorded. Pitch shifters are being …

PVSOLA: A phase vocoder with synchronized overlap-add

A Moinet, T Dutoit - Proceedings of the International Conference on Digital …, 2011 - dafx.de
In this paper we present an original method mixing temporal and spectral processing to
reduce the phasiness in the phase vocoder. Phasiness is an inherent artifact of the phase …

Low latency audio pitch shifting in the frequency domain

N Juillerat, B Hirsbrunner - 2010 International Conference on …, 2010 - ieeexplore.ieee.org
This paper presents a low latency pitch shifting algorithm based on the Short-Time Fourier
Transform (STFT). Unlike existing STFT-based implementations of pitch shifting, the …

Neural ATSM: Fully Neural Network-based Adaptive Time-Scale Modification Using Sentence-Specific Dynamic Control

J Lee, S Jang, JH Chang - Proc. Interspeech 2024, 2024 - isca-archive.org
Adaptive time-scale modification (ATSM) adaptively adjusts audio speed and improves upon
previous systems by tailoring the scale for each phoneme in two steps: phoneme positioning …