Text-free image-to-speech synthesis using learned segmental units

WN Hsu, D Harwath, C Song, J Glass - arXiv preprint arXiv:2012.15454, 2020 - arxiv.org

In this paper we present the first model for directly synthesizing fluent, natural-sounding
spoken audio captions for images that does not require natural language text as an …

Cited by 83

[PDF] arxiv.org

Unsupervised text-to-speech synthesis by unsupervised automatic speech recognition

J Ni, L Wang, H Gao, K Qian, Y Zhang, S Chang… - arXiv preprint arXiv …, 2022 - arxiv.org

An unsupervised text-to-speech synthesis (TTS) system learns to generate speech
waveforms corresponding to any written sentence in a language by observing: 1) a …

Cited by 37

SightSpeak Object detection and speech generation for visually challenged people.

P Likhitha, AR Naik, KN Chari, S Dessai… - 2024 15th …, 2024 - ieeexplore.ieee.org

This novel approach is to enhance accessibility for visually impaired individuals by
integrating object detection and speech generation using YOLOv5 model on the COCO …

Cross Lingual Style Transfer Using Multiscale Loss Function for Soliga: A Low Resource Tribal Language

A Dasare, BL Reddy, ASC Koushik, BS Raj… - … Conference on Speech …, 2023 - Springer

Voice conversion is the art of mimicking different speaker voices and styles. In this paper, we
present a cross-lingual speaker style adaptation based on a multi-scale loss function, using …

[PDF] illinois.edu

Unsupervised speech technology for low-resource languages

H Gao - 2024 - ideals.illinois.edu

Deep neural network based speech processing systems have found widespread
applications in daily life, being employed for tasks such as automatic speech recognition …

[PDF] apsipa.org

Direct speech-reply generation from text-dialogue context

K Fujita, Y Ijima, H Sugiyama - 2022 Asia-Pacific Signal and …, 2022 - ieeexplore.ieee.org

Natural speech-dialogue generation has been achieved with cascade systems combining
automatic speech recognition, text-dialogue, and text-to-speech models. However, it is still …

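이미지 묘사 기법에 대한 조사 [A Survey of Image Captioning Techniques]

옥수빈, 이대호 - Journal of KIISE, 2023 - dbpia.co.kr

Image captioning, which has drawn growing attention as deep learning has advanced, combines
computer vision, which identifies the content of an image, with natural language processing, which renders that content as sentences. This …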

[PDF] hal.science

Lexical emergence from context: exploring unsupervised learning approaches on large multimodal language corpora

WN Havard - 2021 - theses.hal.science

In recent years, deep learning methods allowed the creation of neural models that are able
to process several modalities at once. Neural models of Visually Grounded Speech (VGS) …

[PDF] afcp-parole.org

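[PDF] L'émergence du lexique en contexte: apport des méthodes non supervisées sur grands corpus de données multimodales [Lexical emergence from context: the contribution of unsupervised methods on large multimodal data corpora]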

MJL SCHWARTZ, MO SCHARENBORG, ML PRÉVOT… - afcp-parole.org

Abstract: In recent years, deep learning methods have made it possible to create
neural models capable of processing several modalities at once. The models …

Show and speak: Directly synthesize spoken description of images

Text-free image-to-speech synthesis using learned segmental units
