- Academic Search

Multimodal machine learning: A survey and taxonomy

T Baltrušaitis, C Ahuja… - IEEE transactions on …, 2018 - ieeexplore.ieee.org

Our experience of the world is multimodal: we see objects, hear sounds, feel texture, smell
odors, and taste flavors. Modality refers to the way in which something happens or is …

Cited by 3886 - All 12 versions

A tutorial on multilabel learning

E Gibaja, S Ventura - ACM Computing Surveys (CSUR), 2015 - dl.acm.org

Multilabel learning has become a relevant learning paradigm in the past years due to the
increasing number of fields where it can be applied and also to the emerging number of …

Cited by 629 - All 4 versions

RegionCLIP: Region-based language-image pretraining

Y Zhong, J Yang, P Zhang, C Li… - Proceedings of the …, 2022 - openaccess.thecvf.com

Contrastive language-image pretraining (CLIP) using image-text pairs has achieved
impressive results on image classification in both zero-shot and transfer learning settings …

Cited by 596 - All 6 versions

AutoAD: Movie description in context

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …

Cited by 58 - All 7 versions

Objects that sound

R Arandjelovic, A Zisserman - Proceedings of the European …, 2018 - openaccess.thecvf.com

In this paper our objectives are, first, networks that can embed audio and visual inputs into a
common space that is suitable for cross-modal retrieval; and second, a network that can …

Cited by 645 - All 11 versions

Scene graph generation from objects, phrases and region captions

Y Li, W Ouyang, B Zhou, K Wang… - Proceedings of the …, 2017 - openaccess.thecvf.com

Object detection, scene graph generation and region captioning, which are three scene
understanding tasks at different semantic levels, are tied together: scene graphs are …

Cited by 618 - All 10 versions

DenseCap: Fully convolutional localization networks for dense captioning

J Johnson, A Karpathy… - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com

We introduce the dense captioning task, which requires a computer vision system to both
localize and describe salient regions in images in natural language. The dense captioning …

Cited by 1513 - All 12 versions

Deep collaborative embedding for social image understanding

Z Li, J Tang, T Mei - IEEE transactions on pattern analysis and …, 2018 - ieeexplore.ieee.org

In this work, we investigate the problem of learning knowledge from the massive community-
contributed images with rich weakly-supervised context information, which can benefit …

Cited by 379 - All 6 versions

Microsoft COCO captions: Data collection and evaluation server

X Chen, H Fang, TY Lin, R Vedantam, S Gupta… - arXiv preprint arXiv …, 2015 - arxiv.org

In this paper we describe the Microsoft COCO Caption dataset and evaluation server. When
completed, the dataset will contain over one and a half million captions describing over …

Cited by 2867 - All 5 versions

Deep visual-semantic alignments for generating image descriptions

A Karpathy, L Fei-Fei - … of the IEEE conference on computer …, 2015 - openaccess.thecvf.com

We present a model that generates natural language descriptions of images and their
regions. Our approach leverages datasets of images and their sentence descriptions to …

Cited by 7393 - All 38 versions
