- Academic Search

S Messica, Y Adi - arXiv preprint arXiv:2406.11037, 2024 - arxiv.org

Speech tokenization is the task of representing speech signals as a sequence of discrete
units. Such representations can be later used for various downstream tasks including …

Cited by 6

[PDF] psu.edu

[PDF] Explicit consistency constraints for STFT spectrograms and their application to phase reconstruction.

J Le Roux, N Ono, S Sagayama - SAPA@INTERSPEECH, 2008 - Citeseer

As many acoustic signal processing methods, for example for source separation or noise
canceling, operate in the magnitude spectrogram domain, the problem of reconstructing a …

Cited by 128

[BOOK] Designing audio effect plugins in C++: for AAX, AU, and VST3 with DSP theory

W Pirkle - 2019 - taylorfrancis.com

Designing Audio Effect Plugins in C++ presents everything you need to know about digital
signal processing in an accessible way. Not just another theory-heavy digital signal …

Cited by 89

[PDF] arxiv.org

Augmentation invariant discrete representation for generative spoken language modeling

I Gat, F Kreuk, TA Nguyen, A Lee, J Copet… - arXiv preprint arXiv …, 2022 - arxiv.org

Generative Spoken Language Modeling research focuses on optimizing speech Language
Models (LMs) using raw audio recordings without accessing any textual supervision. Such …

Cited by 10

[PDF] psu.edu

[PDF] Audio pitch shifting using the constant-Q transform

C Schörkhuber, A Klapuri, A Sontacchi - Journal of the Audio Engineering …, 2013 - Citeseer

In this paper a frequency-domain pitch shifting approach based on the CQT is proposed.
The CQT is specifically attractive for pitch shifting because it can be implemented by …

Cited by 58

Speech time-scale modification with GANs

E Cohen, F Kreuk, J Keshet - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org

While listening to spoken content, it is often desired to vary the speech rate while preserving
the speaker's timbre and pitch. To date, advanced signal processing techniques are used to …

Cited by 10

[PDF] liu.se

Analysis of three pitch-shifting algorithms for different musical instruments

A Rai, BD Barkana - 2019 IEEE Long Island Systems …, 2019 - ieeexplore.ieee.org

Pitch-shifting is a process where the original pitch of the sound is increased or decreased
without affecting the length of the sound clip being recorded. Pitch shifters are being …

Cited by 10

[PDF] dafx.de

[PDF] PVSOLA: A phase vocoder with synchronized overlap-add

A Moinet, T Dutoit - Proceedings of the International Conference on Digital …, 2011 - dafx.de

In this paper we present an original method mixing temporal and spectral processing to
reduce the phasiness in the phase vocoder. Phasiness is an inherent artifact of the phase …

Cited by 30

[PDF] academia.edu

Low latency audio pitch shifting in the frequency domain

N Juillerat, B Hirsbrunner - 2010 International Conference on …, 2010 - ieeexplore.ieee.org

This paper presents a low latency pitch shifting algorithm based on the Short-Time Fourier
Transform (STFT). Unlike existing STFT-based implementations of pitch shifting, the …

Cited by 15

[PDF] isca-archive.org

[PDF] Neural ATSM: Fully Neural Network-based Adaptive Time-Scale Modification Using Sentence-Specific Dynamic Control

J Lee, S Jang, JH Chang - Proc. Interspeech 2024, 2024 - isca-archive.org

Adaptive time-scale modification (ATSM) adaptively adjusts audio speed and improves upon
previous systems by tailoring the scale for each phoneme in two steps: phoneme positioning …

PhaVoRIT: A phase vocoder for real-time interactive time-stretching

Nast: Noise aware speech tokenization for speech language models
