Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD Conference …, 2022 - dl.acm.org
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …

BoostMIS: Boosting medical image semi-supervised learning with adaptive pseudo labeling and informative active annotation

W Zhang, L Zhu, J Hallinan, S Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we propose a novel semi-supervised learning (SSL) framework named
BoostMIS that combines adaptive pseudo labeling and informative active annotation to …

Fine-tuning multimodal LLMs to follow zero-shot demonstrative instructions

J Li, K Pan, Z Ge, M Gao, W Ji, W Zhang… - The Twelfth …, 2023 - openreview.net
Recent advancements in Multimodal Large Language Models (MLLMs) have been utilizing
Visual Prompt Generators (VPGs) to convert visual features into tokens that LLMs can …

WINNER: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Revisiting the domain shift and sample uncertainty in multi-source active domain transfer

W Zhang, Z Lv, H Zhou, JW Liu, J Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a
new target domain by actively selecting a limited number of target data to annotate. This …

Hierarchical representation network with auxiliary tasks for video captioning and video question answering

L Gao, Y Lei, P Zeng, J Song, M Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recently, integrating vision and language for in-depth video understanding, e.g., video
captioning and video question answering, has become a promising direction for artificial …

DUET: A tuning-free device-cloud collaborative parameters generation framework for efficient device model generalization

Z Lv, W Zhang, S Zhang, K Kuang, F Wang… - Proceedings of the …, 2023 - dl.acm.org
Device Model Generalization (DMG) is a practical yet under-investigated research topic for
on-device machine learning applications. It aims to improve the generalization ability of pre …

Referring expression comprehension: A survey of methods and datasets

Y Qiao, C Deng, Q Wu - IEEE Transactions on Multimedia, 2020 - ieeexplore.ieee.org
Referring expression comprehension (REC) aims to localize a target object in an image
described by a referring expression phrased in natural language. Different from the object …

Unified adaptive relevance distinguishable attention network for image-text matching

K Zhang, Z Mao, AA Liu, Y Zhang - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Image-text matching, as a fundamental cross-modal task, bridges the gap between vision
and language. The core is to accurately learn semantic alignment to find relevant shared …

Gradient-regulated meta-prompt learning for generalizable vision-language models

J Li, M Gao, L Wei, S Tang, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Prompt tuning, a recently emerging paradigm, enables the powerful vision-language pre-training
models to adapt to downstream tasks in a parameter- and data-efficient way, by …