- Academic Search

AutoAD: Movie description in context

T Han, M Bain, A Nagrani, G Varol… - Proceedings of the …, 2023 - openaccess.thecvf.com

The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …

Cited by 59 · All 7 versions

AAP-MIT: Attentive atrous pyramid network and memory incorporated transformer for multisentence video description

J Prudviraj, MI Reddy, C Vishnu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Generating multi-sentence descriptions for video is considered to be the most complex task
in computer vision and natural language understanding due to the intricate nature of video …

Cited by 90 · All 5 versions

[PDF] jair.org

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org

Abstract Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …

Cited by 161 · All 9 versions

[PDF] arxiv.org

LMEye: An interactive perception network for large language models

Y Li, B Hu, X Chen, L Ma, Y Xu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Current efficient approaches to building Multimodal Large Language Models (MLLMs)
mainly incorporate visual information into LLMs with a simple visual mapping network such …

Cited by 35 · All 8 versions

[PDF] acm.org

Compute to tell the tale: Goal-driven narrative generation

Y Wong, S Fan, Y Guo, Z Xu, K Stephen… - Proceedings of the 30th …, 2022 - dl.acm.org

Man is by nature a social animal. One important facet of human evolution is through
narrative imagination, be it fictional or factual, and to tell the tale to other individuals. The …

Cited by 15 · All 2 versions

Unified adaptive relevance distinguishable attention network for image-text matching

K Zhang, Z Mao, AA Liu, Y Zhang - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Image-text matching, as a fundamental cross-modal task, bridges the gap between vision
and language. The core is to accurately learn semantic alignment to find relevant shared …

Cited by 58 · All 2 versions

[PDF] arxiv.org

What makes a good story and how can we measure it? a comprehensive survey of story evaluation

D Yang, Q Jin - arXiv preprint arXiv:2408.14622, 2024 - arxiv.org

With the development of artificial intelligence, particularly the success of Large Language
Models (LLMs), the quantity and quality of automatically generated stories have significantly …

Cited by 3 · All 2 versions

[PDF] arxiv.org

Shot2Story20K: A new benchmark for comprehensive understanding of multi-shot videos

M Han, L Yang, X Chang, H Wang - arXiv preprint arXiv:2312.10300, 2023 - arxiv.org

A short clip of video may contain progression of multiple events and an interesting story line.
A human needs to capture both the event in every shot and associate them together to …

Cited by 15 · All 4 versions

[PDF] arxiv.org

Image retrieval from contextual descriptions

B Krojer, V Adlakha, V Vineet, Y Goyal, E Ponti… - arXiv preprint arXiv …, 2022 - arxiv.org

The ability to integrate context, including perceptual and temporal cues, plays a pivotal role
in grounding the meaning of a linguistic utterance. In order to measure to what extent current …

Cited by 38 · All 8 versions

Image difference captioning with instance-level fine-grained feature representation

Q Huang, Y Liang, J Wei, Y Cai, H Liang… - IEEE transactions on …, 2021 - ieeexplore.ieee.org

The task of image difference captioning aims at locating changed objects in similar image
pairs and describing the difference with natural language. The key challenges of this task …

Cited by 46 · All 2 versions
