End-to-end speech-to-text translation: A survey
N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …
Sequential contrastive audio-visual learning
Contrastive learning has emerged as a powerful technique in audio-visual representation
learning, leveraging the natural co-occurrence of audio and visual modalities in extensive …
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Following the remarkable success of Large Language Models (LLMs) in NLP tasks, there is
increasing interest in extending their capabilities to speech--the most common form in …
Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?
The prosody of a spoken utterance, including features like stress, intonation and rhythm, can
significantly affect the underlying semantics, and as a consequence can also affect its textual …
How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations
Multimodal foundation models aim to create a unified representation space that abstracts
away from surface features like language syntax or modality differences. To investigate this …
Contrastive Learning for Task-Independent SpeechLLM-Pretraining
Large language models (LLMs) excel in natural language processing but adapting these
LLMs to speech processing tasks efficiently is not straightforward. Direct task-specific fine …