SeACo-Paraformer: A non-autoregressive ASR system with flexible and effective hotword customizati...

FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs

K An, Q Chen, C Deng, Z Du, C Gao, Z Gao… - arXiv preprint arXiv …, 2024 - arxiv.org

This report introduces FunAudioLLM, a model family designed to enhance natural voice
interactions between humans and large language models (LLMs). At its core are two …

Cited by 25 · 5 versions

GLM-4-Voice: Towards intelligent and human-like end-to-end spoken chatbot

A Zeng, Z Du, M Liu, K Wang, S Jiang, L Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce GLM-4-Voice, an intelligent and human-like end-to-end spoken chatbot. It
supports both Chinese and English, engages in real-time voice conversations, and varies …

Cited by 3 · 2 versions

MinMo: A multimodal large language model for seamless voice interaction

Q Chen, Y Chen, Y Chen, M Chen, Y Chen… - arXiv preprint arXiv …, 2025 - arxiv.org

Recent advancements in large language models (LLMs) and multimodal speech-text
models have laid the groundwork for seamless voice interactions, enabling real-time …

Cited by 2 · 2 versions

A multitask training approach to enhance Whisper with open-vocabulary keyword spotting

Y Li, M Zhang, C Su, Y Li, X Qiao, M Ren, M Ma… - Interspeech, 2024 - isca-archive.org

The recognition of rare named entities, such as personal names and terminologies, is
challenging for automatic speech recognition (ASR) systems, especially when they are not …

Cited by 3 · 2 versions

CTC-Assisted LLM-Based Contextual ASR

G Yang, Z Ma, Z Gao, S Zhang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Contextual ASR or hotword customization holds substantial practical value. Despite the
impressive performance of current end-to-end (E2E) automatic speech recognition (ASR) …

Cited by 1 · 3 versions

A multitask training approach to enhance Whisper with contextual biasing and open-vocabulary keyword spotting

Y Li, M Zhang, C Su, Y Li, X Qiao, M Ren, M Ma… - arXiv preprint arXiv …, 2023 - arxiv.org

The recognition of rare named entities, such as personal names and terminologies, is
challenging for automatic speech recognition (ASR) systems, especially when they are not …

Cited by 4 · 2 versions

VHASR: A Multimodal Speech Recognition System With Vision Hotwords

J Hu, Z Li, P Wang, H Ai, L Zhang, H Zhao - arXiv preprint arXiv …, 2024 - arxiv.org

The image-based multimodal automatic speech recognition (ASR) model enhances speech
recognition performance by incorporating audio-related image. However, some works …

Cited by 1 · 3 versions

An efficient text augmentation approach for contextualized Mandarin speech recognition

N Zheng, X Wan, K Liu, Z Du, Z Huan - arXiv preprint arXiv:2406.09950, 2024 - arxiv.org

Although contextualized automatic speech recognition (ASR) systems are commonly used to
improve the recognition of uncommon words, their effectiveness is hindered by the inherent …

Cited by 1 · 4 versions

CB-Whisper: Contextual biasing Whisper using open-vocabulary keyword-spotting

Y Li, Y Li, M Zhang, C Su, J Yu, M Piao… - Proceedings of the …, 2024 - aclanthology.org

End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare
name entities, such as personal names, organizations and terminologies that are not …

Cited by 3 · 2 versions

Contextual Biasing with Confidence-based Homophone Detector for Mandarin End-to-End Speech Recognition

C Yang, L Zheng, S Tian, G Cheng, S Xiao… - Proc. Interspeech …, 2024 - isca-archive.org

Deep biasing methods and shallow fusion methods have been demonstrated to improve the
performance of end-to-end ASR effectively. However, accurate recognition often becomes …
