Multi-modal dense video captioning | V Iashin, E Rahtu | Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020 | 216 | 2020 |
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer | V Iashin, E Rahtu | Proceedings of British Machine Vision Conference (BMVC), 2020 | 169 | 2020 |
Taming Visually Guided Sound Generation | V Iashin, E Rahtu | Proceedings of British Machine Vision Conference (BMVC), 2021 | 112 | 2021 |
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors | V Iashin, W Xie, E Rahtu, A Zisserman | Proceedings of British Machine Vision Conference (BMVC), 2022 | 20 | 2022 |
Top-1 CORSMAL challenge 2020 submission: Filling mass estimation using multi-modal observations of human-robot handovers | V Iashin, F Palermo, G Solak, C Coppola | Pattern Recognition. ICPR International Workshops and Challenges: Virtual …, 2021 | 15 | 2021 |
The CORSMAL benchmark for the prediction of the properties of containers | A Xompero, S Donaher, V Iashin, F Palermo, G Solak, C Coppola, ... | IEEE Access 10, 41388-41402, 2022 | 11 | 2022 |
Synchformer: Efficient synchronization from sparse cues | V Iashin, W Xie, E Rahtu, A Zisserman | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 8 | 2024 |
Temporally aligned audio for video with autoregression | I Viertola, V Iashin, E Rahtu | arXiv preprint arXiv:2409.13689, 2024 | 5 | 2024 |
Multi-modal Video Content Understanding | V Iashin | | | 2023 |