Automated video labelling: Identifying faces by corroborative evidence

A Brown, E Coto, A Zisserman - 2021 IEEE 4th International …, 2021 - ieeexplore.ieee.org
We present a method for automatically labelling all faces in video archives, such as TV
broadcasts, by combining multiple evidence sources and multiple modalities (visual and …

Deep cross-modal face naming for people news retrieval

Y Tian, L Zhou, Y Zhang, T Zhang… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
How to integrate multimodal information sources for face naming in multimodal news is a hot
and yet challenging problem. A novel deep cross-modal face naming scheme is developed …

Identifying Speakers in Dialogue Transcripts: A Text-based Approach Using Pretrained Language Models

M Nguyen, F Dernoncourt, S Yoon… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce an approach to identifying speaker names in dialogue transcripts, a crucial
task for enhancing content accessibility and searchability in digital media archives. Despite …

Self-contained entity discovery from captioned videos

M Ayoughi, P Mettes, P Groth - ACM Transactions on Multimedia …, 2023 - dl.acm.org
This article introduces the task of visual named entity discovery in videos without the need
for task-specific supervision or task-specific external knowledge sources. Assigning specific …

From face recognition to models of identity: A Bayesian approach to learning about unknown identities from unsupervised data

DC de Castro, S Nowozin - Proceedings of the European …, 2018 - openaccess.thecvf.com
Current face recognition systems robustly recognize identities across a wide variety of
imaging conditions. In these systems recognition is performed via classification into known …

Expertise detection in crowdsourcing forums using the composition of latent topics and joint syntactic–semantic cues

YD Woldemariam - SN Computer Science, 2021 - Springer
We develop an NLP method for inferring potential contributors among a multitude of users
within crowdsourcing forums (CSFs). The method basically provides a way to predict …

Hierarchical multi-label propagation using speaking face graphs for multimodal person discovery

GB da Fonseca, G Sargent, R Sicre… - Multimedia Tools and …, 2021 - Springer
TV archives are growing in size so fast that manually indexing becomes unfeasible.
Automatic indexing techniques can be applied to overcome this issue, and this work …

Adapting language specific components of cross-media analysis frameworks to less-resourced languages: the case of Amharic

Y Woldemariam, A Dahlgren - … of the 1st joint workshop on spoken …, 2020 - aclanthology.org
We present an ASR based pipeline for Amharic that orchestrates NLP components within a
cross media analysis framework (CMAF). One of the major challenges that are inherently …

NLP methods for improving user rating systems in crowdsourcing forums and speech recognition of less resourced languages

YD Woldemariam - 2024 - diva-portal.org
We develop NLP and ASR methods (e.g., algorithms, architectures) for solving these
problems: biases induced by user rating, ranking, recommendation and search engine …

UPC multimodal speaker diarization system for the 2018 Albayzin Challenge

MÀ India Massana, I Sagastiberri… - … 2018: program and …, 2018 - upcommons.upc.edu
This paper presents the UPC system proposed for the Multimodal Speaker Diarization task
of the 2018 Albayzin Challenge. This approach works by processing individually the speech …