Audio-visual cross-attention network for robotic speaker tracking
Audio-visual signals can be used jointly for robotic perception as they complement each
other. Such multi-modal sensory fusion has a clear advantage, especially under noisy …
Transfer learning of wav2vec 2.0 for automatic lyric transcription
Automatic speech recognition (ASR) has progressed significantly in recent years due to the
emergence of large-scale datasets and the self-supervised learning (SSL) paradigm …
LyricWhiz: Robust multilingual zero-shot lyrics transcription by whispering to ChatGPT
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription
method achieving state-of-the-art performance on various lyrics transcription datasets, even …
Generate, discriminate and contrast: A semi-supervised sentence representation learning framework
Most sentence embedding techniques heavily rely on expensive human-annotated
sentence pairs as the supervised signals. Despite the use of large-scale unlabeled data, the …
Few-shot class-incremental audio classification via discriminative prototype learning
In real-world scenarios, new audio classes with insufficient samples usually emerge
continually, which motivates the study of few-shot class-incremental audio classification …
Predict-and-update network: Audio-visual speech recognition inspired by human speech perception
Audio and visual signals complement each other in human speech perception, and the
same applies to automatic speech recognition. The visual signal is less evident than the …
Dynamic transformers provide a false sense of efficiency
Despite much success in natural language processing (NLP), pre-trained language models
typically lead to a high computational cost during inference. Multi-exit is a mainstream …
Wagner Ring Dataset: A complex opera scenario for music processing and computational musicology
This paper introduces the Wagner Ring Dataset (WRD), a multi-modal and multi-version
resource on the large-scale opera cycle Der Ring des Nibelungen by Richard Wagner. The …
Polyscriber: Integrated fine-tuning of extractor and lyrics transcriber for polyphonic music
Lyrics transcription of polyphonic music is challenging as the background music affects lyrics
intelligibility. Typically, lyrics transcription can be performed by a two-step pipeline, i.e., a …
Elucidate gender fairness in singing voice transcription
It is widely known that males and females typically possess different sound characteristics
when singing, such as timbre and pitch, but it has never been explored whether these …