Google Scholar

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier

Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Cited by 6

[PDF] arxiv.org

Sequential contrastive audio-visual learning

I Tsiamas, S Pascual, C Yeh, J Serrà - arXiv preprint arXiv:2407.05782, 2024 - arxiv.org

Contrastive learning has emerged as a powerful technique in audio-visual representation
learning, leveraging the natural co-occurrence of audio and visual modalities in extensive …

Cited by 3

[PDF] arxiv.org

Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison

TK Lam, M Gaido, S Papi, L Bentivogli… - arXiv preprint arXiv …, 2025 - arxiv.org

Following the remarkable success of Large Language Models (LLMs) in NLP tasks, there is
increasing interest in extending their capabilities to speech--the most common form in …

[PDF] arxiv.org

Speech is More Than Words: Do Speech-to-Text Translation Systems Leverage Prosody?

I Tsiamas, M Sperber, A Finch, S Garg - arXiv preprint arXiv:2410.24019, 2024 - arxiv.org

The prosody of a spoken utterance, including features like stress, intonation and rhythm, can
significantly affect the underlying semantics, and as a consequence can also affect its textual …

[PDF] arxiv.org

How do Multimodal Foundation Models Encode Text and Speech? An Analysis of Cross-Lingual and Cross-Modal Representations

H Lee, D Liu, S Sinhamahapatra, J Niehues - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal foundation models aim to create a unified representation space that abstracts
away from surface features like language syntax or modality differences. To investigate this …

[PDF] arxiv.org

Contrastive Learning for Task-Independent SpeechLLM-Pretraining

M Züfle, J Niehues - arXiv preprint arXiv:2412.15712, 2024 - arxiv.org

Large language models (LLMs) excel in natural language processing but adapting these
LLMs to speech processing tasks efficiently is not straightforward. Direct task-specific fine …

Pushing the Limits of Zero-shot End-to-End Speech Translation

End-to-end speech-to-text translation: A survey
