WavLLM: Towards robust and adaptive speech large language model

S Hu, L Zhou, S Liu, S Chen, L Meng, H Hao… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent advancements in large language models (LLMs) have revolutionized the field of
natural language processing, progressively broadening their scope to multimodal …

Generalizing across domains via cross-gradient training

S Shankar, V Piratla, S Chakrabarti… - arXiv preprint arXiv …, 2018 - arxiv.org
We present CROSSGRAD, a method to use multi-domain training data to learn a classifier
that generalizes to new domains. CROSSGRAD does not need an adaptation phase via …

Light gated recurrent units for speech recognition

M Ravanelli, P Brakel, M Omologo… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
A field that has directly benefited from the recent advances in deep learning is automatic
speech recognition (ASR). Despite the great achievements of the past decades, however, a …

An analysis of environment, microphone and data simulation mismatches in robust speech recognition

E Vincent, S Watanabe, AA Nugraha, J Barker… - Computer Speech & …, 2017 - Elsevier
Speech enhancement and automatic speech recognition (ASR) are most often evaluated in
matched (or multi-condition) settings where the acoustic conditions of the training data …

Speech and language processing

D Jurafsky - 2000 - academia.edu
"This book is an absolute necessity for instructors at all levels, as well as an indispensable
reference for researchers. Introducing NLP, computational linguistics, and speech …

Domain adaptation with structural correspondence learning

J Blitzer, R McDonald, F Pereira - Proceedings of the 2006 …, 2006 - aclanthology.org
Discriminative learning methods are widely used in natural language processing. These
methods work best when their training and test data are drawn from the same distribution …

Weakly supervised learning with multi-stream CNN-LSTM-HMMs to discover sequential parallelism in sign language videos

O Koller, NC Camgoz, H Ney… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
In this work we present a new approach to the field of weakly supervised learning in the
video domain. Our method is relevant to sequence learning problems which can be split up …

The 2005 music information retrieval evaluation exchange (MIREX 2005): Preliminary overview

JS Downie, K West, A Ehmann… - 6th Int. Conf. on Music …, 2005 - inria.hal.science
This paper is an extended abstract which provides a brief preliminary overview of the 2005
Music Information Retrieval Evaluation eXchange (MIREX 2005). The MIREX organizational …

Shallow parsing with conditional random fields

F Sha, F Pereira - Proceedings of the 2003 human language …, 2003 - aclanthology.org
Conditional random fields for sequence labeling offer advantages over both generative
models like HMMs and classifiers applied at each sequence position. Among sequence …

MCYT baseline corpus: a bimodal biometric database

J Ortega-Garcia, J Fierrez-Aguilar, D Simon… - IEE Proceedings-Vision …, 2003 - IET
The current need for large multimodal databases to evaluate automatic biometric recognition
systems has motivated the development of the MCYT bimodal database. The main purpose …