Google Scholar

Survey of post-OCR processing approaches

TTH Nguyen, A Jatowt, M Coustaty… - ACM Computing Surveys …, 2021 - dl.acm.org

Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …

Cited by 175

[PDF] arxiv.org

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 469

[PDF] mtak.hu

Multilingual processing of speech via web services

T Kisler, U Reichel, F Schiel - Computer Speech & Language, 2017 - Elsevier

A new software paradigm, "Software as a Service" based on web services, is proposed for
multilingual linguistic tools and exemplified with the BAS CLARIN web services. Instead of …

Cited by 672

[PDF] danielpovey.com

LibriSpeech: an ASR corpus based on public domain audio books

V Panayotov, G Chen, D Povey… - 2015 IEEE international …, 2015 - ieeexplore.ieee.org

This paper introduces a new corpus of read English speech, suitable for training and
evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks …

Cited by 7533

[PDF] github.io

[PDF] Speech Emotion Recognition Using Spectrogram & Phoneme Embedding

P Yenigalla, A Kumar, S Tripathi, C Singh, S Kar… - …, 2018 - abhayk1201.github.io

This paper proposes a speech emotion recognition method based on phoneme sequence
and spectrogram. Both phoneme sequence and spectrogram retain emotion contents of …

Cited by 220

[PDF] jair.org

Automatic language identification in texts: A survey

T Jauhiainen, M Lui, M Zampieri, T Baldwin… - Journal of Artificial …, 2019 - jair.org

Language identification ("LI") is the problem of determining the natural language that a
document or part thereof is written in. Automatic LI has been extensively researched for over …

[PDF] aclanthology.org

Report on the 11th IWSLT evaluation campaign

M Cettolo, J Niehues, S Stüker… - Proceedings of the …, 2014 - aclanthology.org

The paper overviews the 11th evaluation campaign organized by the IWSLT workshop. The
2014 evaluation offered multiple tracks on lecture transcription and translation based on the …

[PDF] google.com

Weakly supervised learning with multi-stream CNN-LSTM-HMMs to discover sequential parallelism in sign language videos

O Koller, NC Camgoz, H Ney… - IEEE transactions on …, 2019 - ieeexplore.ieee.org

In this work we present a new approach to the field of weakly supervised learning in the
video domain. Our method is relevant to sequence learning problems which can be split up …

Cited by 354

[PDF] google.com

Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks

K Rao, F Peng, H Sak… - 2015 IEEE International …, 2015 - ieeexplore.ieee.org

Grapheme-to-phoneme (G2P) models are key components in speech recognition and text-to-
speech systems as they describe how words are pronounced. We propose a G2P model …

Cited by 286

[PDF] googleapis.com

Data driven word pronunciation learning and scoring with crowd sourcing based on the word's phonemes pronunciation scores

F Peng, F Beaufays, B Strope, X Lei… - US Patent …, 2017 - Google Patents

Methods, systems, and apparatus, including computer programs encoded on a computer
storage medium, for determining pronunciations for particular terms. The methods, systems …

Cited by 267
