TERA: Self-supervised learning of transformer encoder representation for speech

AT Liu, SW Li, H Lee - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
We introduce a self-supervised speech pre-training method called TERA, which stands for
Transformer Encoder Representations from Alteration. Recent approaches often learn by …

SLUE: New benchmark tasks for spoken language understanding evaluation on natural speech

S Shon, A Pasad, F Wu, P Brusco… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Progress in speech processing has been facilitated by shared datasets and benchmarks.
Historically these have focused on automatic speech recognition (ASR), speaker …

Expanding large pre-trained unimodal models with multimodal information injection for image-text multimodal classification

T Liang, G Lin, M Wan, T Li, G Ma… - Proceedings of the …, 2022 - openaccess.thecvf.com
Fine-tuning pre-trained models for downstream tasks is mainstream in deep learning.
However, the pre-trained models are limited to be fine-tuned by data from a specific …

Applications, risk, challenges, and future prospects of ChatGPT in electronic records management

D Lin, R Zou - Journal of Artificial Intelligence Research, 2024 - sub.ifspress.hk
The widespread application and rapid development of ChatGPT are disrupting traditional
models across industries, bringing revolutionary changes to electronic records …

End-to-end neural transformer based spoken language understanding

M Radfar, A Mouchtaris, S Kunzmann - arXiv preprint arXiv:2008.10984, 2020 - arxiv.org
Spoken language understanding (SLU) refers to the process of inferring the semantic
information from audio signals. While the neural transformers consistently deliver the best …

Semi-supervised spoken language understanding via self-supervised speech and language model pretraining

CI Lai, YS Chuang, HY Lee, SW Li… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Much recent work on Spoken Language Understanding (SLU) is limited in at least one of
three ways: models were trained on oracle text input and neglected ASR errors, models …

Speech-language pre-training for end-to-end spoken language understanding

Y Qian, X Bian, Y Shi, N Kanda, L Shen… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
End-to-end (E2E) spoken language understanding (SLU) can infer semantics directly from
speech signal without cascading an automatic speech recognizer (ASR) with a natural …

ST-BERT: Cross-modal language model pre-training for end-to-end spoken language understanding

M Kim, G Kim, SW Lee, JW Ha - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Language model pre-training has shown promising results in various downstream tasks. In
this context, we introduce a cross-modal pre-trained language model, called Speech-Text …

Understanding self-attention of self-supervised audio transformers

S Yang, AT Liu, H Lee - arXiv preprint arXiv:2006.03265, 2020 - arxiv.org
Self-supervised Audio Transformers (SAT) enable great success in many downstream
speech applications like ASR, but how they work has not been widely explored yet. In this …

Exploring transfer learning for end-to-end spoken language understanding

S Rongali, B Liu, L Cai, K Arkoudas, C Su… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Voice Assistants such as Alexa, Siri, and Google Assistant typically use a two-stage
Spoken Language Understanding pipeline; first, an Automatic Speech Recognition (ASR) …