Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Graph neural networks for visual question answering: a systematic review

AA Yusuf, C Feng, X Mao, R Ally Duma… - Multimedia Tools and …, 2024 - Springer
Recently, visual question answering (VQA) has gained considerable interest within the
computer vision and natural language processing (NLP) research areas. The VQA task …

Multimodal relation extraction with efficient graph alignment

C Zheng, J Feng, Z Fu, Y Cai, Q Li, T Wang - Proceedings of the 29th …, 2021 - dl.acm.org
Relation extraction (RE) is a fundamental process in constructing knowledge graphs.
However, previous methods on relation extraction suffer sharp performance decline in short …

Vlg-net: Video-language graph matching network for video grounding

M Soldan, M Xu, S Qu, J Tegner… - Proceedings of the …, 2021 - openaccess.thecvf.com
Grounding language queries in videos aims at identifying the time interval (or moment)
semantically relevant to a language query. The solution to this challenging task demands …

Sentiment interaction and multi-graph perception with graph convolutional networks for aspect-based sentiment analysis

Q Lu, X Sun, R Sutcliffe, Y Xing, H Zhang - Knowledge-Based Systems, 2022 - Elsevier
Graph convolutional networks have been successfully applied to aspect-based sentiment
analysis, due to their ability to flexibly capture syntactic information and word dependencies …

Multimodal dialogue response generation

Q Sun, Y Wang, C Xu, K Zheng, Y Yang, H Hu… - arXiv preprint arXiv …, 2021 - arxiv.org
Responding with images has been recognized as an important capability for an intelligent
conversational agent. Yet existing works only focus on exploring the multimodal dialogue …

Low-fidelity video encoder optimization for temporal action localization

M Xu, JM Perez Rua, X Zhu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Most existing temporal action localization (TAL) methods rely on a transfer learning pipeline:
by first optimizing a video encoder on a large action classification dataset (i.e., source …

Exploring sparse spatial relation in graph inference for text-based vqa

S Zhou, D Guo, J Li, X Yang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Text-based visual question answering (TextVQA) faces the significant challenge of avoiding
redundant relational inference. To be specific, a large number of detected objects and …

Visual question answering using deep learning: A survey and performance analysis

Y Srivastava, V Murali, SR Dubey… - Computer Vision and …, 2021 - Springer
The Visual Question Answering (VQA) task combines challenges for processing
data with both Visual and Linguistic processing, to answer basic 'common sense' questions …

Image difference captioning with instance-level fine-grained feature representation

Q Huang, Y Liang, J Wei, Y Cai, H Liang… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
The task of image difference captioning aims at locating changed objects in similar image
pairs and describing the difference with natural language. The key challenges of this task …