A survey of transformer-based multimodal pre-trained models

X Han, YT Wang, JL Feng, C Deng, ZH Chen… - Neurocomputing, 2023 - Elsevier
With the broad industrialization of Artificial Intelligence (AI), we observe that a large fraction of
real-world AI applications are multimodal in nature in terms of relevant data and ways of …

Design, Implementation, and Practical Evaluation of a Voice Recognition Based IoT Home Automation System for Low-Resource Languages and Resource …

I Froiz-Míguez, P Fraga-Lamas… - IEEE …, 2023 - ieeexplore.ieee.org
Systems with voice control are an attractive option for increasing technological integration,
not only for people with little knowledge of technology or constrained Internet access, but …

Mokey: Enabling narrow fixed-point inference for out-of-the-box floating-point transformer models

AH Zadeh, M Mahmoud, A Abdelhadi… - Proceedings of the 49th …, 2022 - dl.acm.org
Increasingly larger and better Transformer models keep advancing state-of-the-art accuracy
and capability for Natural Language Processing applications. These models demand more …

Automatic speech recognition with BERT and CTC transformers: A review

N Djeffal, H Kheddar, D Addou… - 2023 2nd …, 2023 - ieeexplore.ieee.org
This review paper provides a comprehensive analysis of recent advances in automatic
speech recognition (ASR) with bidirectional encoder representations from transformers …

CLFormer: A lightweight transformer based on convolutional embedding and linear self-attention with strong robustness for bearing fault diagnosis under limited …

H Fang, J Deng, Y Bai, B Feng, S Li… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
As a rising star in the field of deep learning, Transformers have achieved remarkable
success in numerous tasks. Nonetheless, due to safety considerations, complex …

Knowledge distillation meets few-shot learning: An approach for few-shot intent classification within and across domains

A Sauer, S Asaadi, F Küch - Proceedings of the 4th Workshop on …, 2022 - aclanthology.org
Large Transformer-based natural language understanding models have achieved state-of-
the-art performance in dialogue systems. However, scarce labeled data for training, the …

Improving Spoken Language Understanding with Cross-Modal Contrastive Learning

J Dong, J Fu, P Zhou, H Li, X Wang - Interspeech, 2022 - researchgate.net
Spoken language understanding (SLU) is conventionally based on a pipeline architecture
with error propagation issues. To mitigate this problem, end-to-end (E2E) models are …

Prompt-driven target speech diarization

Y Jiang, Z Chen, R Tao, L Deng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We introduce a novel task named 'target speech diarization', which seeks to determine
'when target event occurred' within an audio signal. We devise a neural architecture called …

New avenues for automated railway safety information processing in enterprise architecture: An NLP approach

AW Qurashi, ZA Farhat, V Holmes, AP Johnson - IEEE Access, 2023 - ieeexplore.ieee.org
Enterprise Architecture (EA) is crucial in any organisation as it defines the basic building
blocks of a business. It is typically presented as a set of documents that help all departments …

Cross-modal Transfer Learning via Multi-grained Alignment for End-to-End Spoken Language Understanding

Y Zhu, Z Wang, H Liu, P Wang, M Feng, M Chen… - …, 2022 - isca-archive.org
End-to-end spoken language understanding (E2E-SLU) has witnessed impressive
improvements through cross-modal (text-to-audio) transfer learning. However, current …