VLP: A survey on vision-language pre-training

FL Chen, DZ Zhang, ML Han, XY Chen, J Shi… - Machine Intelligence …, 2023 - Springer
In the past few years, the emergence of pre-training models has brought uni-modal fields
such as computer vision (CV) and natural language processing (NLP) to a new era …

Edge-cloud polarization and collaboration: A comprehensive survey for AI

J Yao, S Zhang, Y Yao, F Wang, J Ma… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Influenced by the great success of deep learning via cloud computing and the rapid
development of edge chips, research in artificial intelligence (AI) has shifted to both of the …

Towards out-of-distribution generalization: A survey

J Liu, Z Shen, Y He, X Zhang, R Xu, H Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
Traditional machine learning paradigms are based on the assumption that both training and
test data follow the same statistical pattern, which is mathematically referred to as …

Model-agnostic counterfactual reasoning for eliminating popularity bias in recommender system

T Wei, F Feng, J Chen, Z Wu, J Yi, X He - Proceedings of the 27th ACM …, 2021 - dl.acm.org
The general aim of the recommender system is to provide personalized suggestions to
users, which is opposed to suggesting popular items. However, the normal training …

Counterfactual VQA: A cause-effect look at language bias

Y Niu, K Tang, H Zhang, Z Lu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …

CauseRec: Counterfactual user sequence synthesis for sequential recommendation

S Zhang, D Yao, Z Zhao, TS Chua, F Wu - Proceedings of the 44th …, 2021 - dl.acm.org
Learning user representations based on historical behaviors lies at the core of modern
recommender systems. Recent advances in sequential recommenders have convincingly …

Causal representation learning for out-of-distribution recommendation

W Wang, X Lin, F Feng, X He, M Lin… - Proceedings of the ACM …, 2022 - dl.acm.org
Modern recommender systems learn user representations from historical interactions, which
suffer from the problem of user feature shifts, such as an income increase. Historical …

WINNER: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Compositional temporal grounding with structured variational cross-graph correspondence learning

J Li, J Xie, L Qian, L Zhu, S Tang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Temporal grounding in videos aims to localize one target video segment that semantically
corresponds to a given query sentence. Thanks to the semantic diversity of natural language …

Exposing and mitigating spurious correlations for cross-modal retrieval

JM Kim, A Koepke, C Schmid… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Cross-modal retrieval methods are the preferred tool to search databases for the text that
best matches a query image and vice versa. However, image-text retrieval models commonly …