AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

JH Yeo, M Kim, J Choi, DH Kim… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …

MelHuBERT: A simplified HuBERT on Mel spectrograms

TQ Lin, H Lee, H Tang - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
Self-supervised models have had great success in learning speech representations that can
generalize to various downstream tasks. However, most self-supervised models require a …