SyncTalkFace: Talking face generation with precise lip-syncing via audio-lip memory | SJ Park, M Kim, J Hong, J Choi, YM Ro | Proceedings of the AAAI Conference on Artificial Intelligence 36 (2), 2062-2070, 2022 | 80 | 2022 |
Multi-modality associative bridging through memory: Speech sound recollected from face video | M Kim, J Hong, SJ Park, YM Ro | Proceedings of the IEEE/CVF International Conference on Computer Vision, 296-306, 2021 | 51 | 2021 |
CroMM-VSR: Cross-modal memory augmented visual speech recognition | M Kim, J Hong, SJ Park, YM Ro | IEEE Transactions on Multimedia 24, 4342-4355, 2021 | 34 | 2021 |
Speech reconstruction with reminiscent sound via visual voice memory | J Hong, M Kim, SJ Park, YM Ro | IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 3654-3667, 2021 | 24 | 2021 |
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model | J Hong, SJ Park, YM Ro | arXiv preprint arXiv:2310.14946, 2023 | 7 | 2023 |
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation | SJ Park, CW Kim, H Rha, M Kim, J Hong, JH Yeo, YM Ro | arXiv preprint arXiv:2406.07867, 2024 | 6 | 2024 |
Test-time adaptation for real image denoising via meta-transfer learning | A Gunawan, MA Nugroho, SJ Park | arXiv preprint arXiv:2207.02066, 2022 | 5 | 2022 |
Exploring Phonetic Context-Aware Lip-Sync for Talking Face Generation | SJ Park, M Kim, J Choi, YM Ro | ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 4 | 2024 |
DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion | SJ Park, J Hong, M Kim, YM Ro | arXiv preprint arXiv:2310.05934, 2023 | 4 | 2023 |
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation | J Choi, SJ Park, M Kim, YM Ro | Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024 | 3 | 2024 |
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation | M Kim, J Yeo, SJ Park, H Rha, YM Ro | Proceedings of the 32nd ACM International Conference on Multimedia, 1311-1320, 2024 | 2 | 2024 |
Multilingual visual speech recognition with a single model by learning with discrete visual speech units | M Kim, JH Yeo, J Choi, SJ Park, YM Ro | arXiv preprint arXiv:2401.09802, 2024 | 2 | 2024 |
Reprogramming audio-driven talking face synthesis into text-driven | J Choi, M Kim, SJ Park, YM Ro | arXiv preprint arXiv:2306.16003, 2023 | 2 | 2023 |
Long-Form Speech Generation with Spoken Language Models | SJ Park, J Salazar, A Jansen, K Kinoshita, YM Ro, RJ Skerry-Ryan | arXiv preprint arXiv:2412.18603, 2024 | | 2024 |
Empathetic Response in Audio-Visual Conversations Using Emotion Preference Optimization and MambaCompressor | Y Kim, SJ Park, YM Ro | arXiv preprint arXiv:2412.17572, 2024 | | 2024 |
AV-EmoDialog: Chat with Audio-Visual Users Leveraging Emotional Cues | SJ Park, Y Kim, H Rha, B Godiva, YM Ro | arXiv preprint arXiv:2412.17292, 2024 | | 2024 |
Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models | J Choi, M Kim, SJ Park, YM Ro | ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | | 2024 |
Persona Extraction Through Semantic Similarity for Emotional Support Conversation Generation | S Han, SJ Park, CW Kim, YM Ro | ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | | 2024 |
Multilingual Visual Speech Recognition with a Single Model using Visual Speech Unit | M Kim, J Yeo, J Choi, SJ Park, YM Ro | | | |