A survey of transformer-based multimodal pre-trained models
With the broad industrialization of Artificial Intelligence (AI), we observe a large fraction of
real-world AI applications are multimodal in nature in terms of relevant data and ways of …
Design, Implementation, and Practical Evaluation of a Voice Recognition Based IoT Home Automation System for Low-Resource Languages and Resource …
Systems with voice control are an attractive option for increasing technological integration,
not only for people with little knowledge on technology or constrained Internet access, but …
Mokey: Enabling narrow fixed-point inference for out-of-the-box floating-point transformer models
Increasingly larger and better Transformer models keep advancing state-of-the-art accuracy
and capability for Natural Language Processing applications. These models demand more …
Automatic speech recognition with BERT and CTC transformers: A review
This review paper provides a comprehensive analysis of recent advances in automatic
speech recognition (ASR) with bidirectional encoder representations from transformers …
CLFormer: A lightweight transformer based on convolutional embedding and linear self-attention with strong robustness for bearing fault diagnosis under limited …
As a rising star in the field of deep learning, Transformers have achieved remarkable
results in numerous tasks. Nonetheless, due to safety considerations, complex …
Knowledge distillation meets few-shot learning: An approach for few-shot intent classification within and across domains
A Sauer, S Asaadi, F Küch - Proceedings of the 4th Workshop on …, 2022 - aclanthology.org
Large Transformer-based natural language understanding models have achieved state-of-
the-art performance in dialogue systems. However, scarce labeled data for training, the …
Improving Spoken Language Understanding with Cross-Modal Contrastive Learning
Spoken language understanding (SLU) is conventionally based on pipeline architecture
with error propagation issues. To mitigate this problem, end-to-end (E2E) models are …
Prompt-driven target speech diarization
We introduce a novel task named 'target speech diarization', which seeks to determine
'when target event occurred' within an audio signal. We devise a neural architecture called …
New avenues for automated railway safety information processing in enterprise architecture: An NLP approach
Enterprise Architecture (EA) is crucial in any organisation as it defines the basic building
blocks of a business. It is typically presented as a set of documents that help all departments …
Cross-modal Transfer Learning via Multi-grained Alignment for End-to-End Spoken Language Understanding
End-to-end spoken language understanding (E2E-SLU) has witnessed impressive
improvements through cross-modal (text-to-audio) transfer learning. However, current …