Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Vision transformers for remote sensing image classification

Y Bazi, L Bashmal, MMA Rahhal, RA Dayil, NA Ajlan - Remote Sensing, 2021 - mdpi.com
In this paper, we propose a remote-sensing scene-classification method based on vision
transformers. These types of networks, which are now recognized as state-of-the-art models …

Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition

Z Gao, S Zhang, I McLoughlin, Z Yan - arXiv preprint arXiv:2206.08317, 2022 - arxiv.org
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

Joint entity and relation extraction with set prediction networks

D Sui, X Zeng, Y Chen, K Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Joint entity and relation extraction is an important task in natural language processing, which
aims to extract all relational triples mentioned in a given sentence. In essence, the relational …

Vision–language model for visual question answering in medical imagery

Y Bazi, MMA Rahhal, L Bashmal, M Zuair - Bioengineering, 2023 - mdpi.com
In the clinical and healthcare domains, medical images play a critical role. A mature medical
visual question answering system (VQA) can improve diagnosis by answering clinical …

Intermediate loss regularization for CTC-based speech recognition

J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …

Mask CTC: Non-autoregressive end-to-end ASR with CTC and mask predict

Y Higuchi, S Watanabe, N Chen, T Ogawa… - arXiv preprint arXiv …, 2020 - arxiv.org
We present Mask CTC, a novel non-autoregressive end-to-end automatic speech
recognition (ASR) framework, which generates a sequence by refining outputs of the …

Vision Transformer‐based recognition of diabetic retinopathy grade

J Wu, R Hu, Z Xiao, J Chen, J Liu - Medical Physics, 2021 - Wiley Online Library
Background In the domain of natural language processing, Transformers are recognized as
state‐of‐the‐art models, which opposing to typical convolutional neural networks (CNNs) do …

Imputer: Sequence modelling via imputation and dynamic programming

W Chan, C Saharia, G Hinton… - International …, 2020 - proceedings.mlr.press
This paper presents the Imputer, a neural sequence model that generates output sequences
iteratively via imputations. The Imputer is an iterative generation model, requiring only a …