FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs
This report introduces FunAudioLLM, a model family designed to enhance natural voice
interactions between humans and large language models (LLMs). At its core are two …
GLM-4-Voice: Towards intelligent and human-like end-to-end spoken chatbot
We introduce GLM-4-Voice, an intelligent and human-like end-to-end spoken chatbot. It
supports both Chinese and English, engages in real-time voice conversations, and varies …
MinMo: A multimodal large language model for seamless voice interaction
Recent advancements in large language models (LLMs) and multimodal speech-text
models have laid the groundwork for seamless voice interactions, enabling real-time …
A multitask training approach to enhance Whisper with open-vocabulary keyword spotting
The recognition of rare named entities, such as personal names and terminologies, is
challenging for automatic speech recognition (ASR) systems, especially when they are not …
CTC-Assisted LLM-Based Contextual ASR
Contextual ASR or hotword customization holds substantial practical value. Despite the
impressive performance of current end-to-end (E2E) automatic speech recognition (ASR) …
A multitask training approach to enhance Whisper with contextual biasing and open-vocabulary keyword spotting
The recognition of rare named entities, such as personal names and terminologies, is
challenging for automatic speech recognition (ASR) systems, especially when they are not …
VHASR: A Multimodal Speech Recognition System With Vision Hotwords
J Hu, Z Li, P Wang, H Ai, L Zhang, H Zhao - arXiv preprint arXiv …, 2024 - arxiv.org
The image-based multimodal automatic speech recognition (ASR) model enhances speech
recognition performance by incorporating audio-related image. However, some works …
An efficient text augmentation approach for contextualized Mandarin speech recognition
N Zheng, X Wan, K Liu, Z Du, Z Huan - arXiv preprint arXiv:2406.09950, 2024 - arxiv.org
Although contextualized automatic speech recognition (ASR) systems are commonly used to
improve the recognition of uncommon words, their effectiveness is hindered by the inherent …
CB-Whisper: Contextual biasing Whisper using open-vocabulary keyword-spotting
End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare
named entities, such as personal names, organizations and terminologies that are not …
Contextual Biasing with Confidence-based Homophone Detector for Mandarin End-to-End Speech Recognition
Deep biasing methods and shallow fusion methods have been demonstrated to improve the
performance of end-to-end ASR effectively. However, accurate recognition often becomes …