VLP: A survey on vision-language pre-training
In the past few years, the emergence of pre-training models has brought uni-modal fields
such as computer vision (CV) and natural language processing (NLP) to a new era …
Edge-cloud polarization and collaboration: A comprehensive survey for AI
Influenced by the great success of deep learning via cloud computing and the rapid
development of edge chips, research in artificial intelligence (AI) has shifted to both of the …
Towards out-of-distribution generalization: A survey
Traditional machine learning paradigms are based on the assumption that both training and
test data follow the same statistical pattern, which is mathematically referred to as …
Model-agnostic counterfactual reasoning for eliminating popularity bias in recommender system
The general aim of the recommender system is to provide personalized suggestions to
users, which is opposed to suggesting popular items. However, the normal training …
Counterfactual VQA: A cause-effect look at language bias
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …
CauseRec: Counterfactual user sequence synthesis for sequential recommendation
Learning user representations based on historical behaviors lies at the core of modern
recommender systems. Recent advances in sequential recommenders have convincingly …
Causal representation learning for out-of-distribution recommendation
Modern recommender systems learn user representations from historical interactions, which
suffer from the problem of user feature shifts, such as an income increase. Historical …
WINNER: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …
Compositional temporal grounding with structured variational cross-graph correspondence learning
Temporal grounding in videos aims to localize one target video segment that semantically
corresponds to a given query sentence. Thanks to the semantic diversity of natural language …
Exposing and mitigating spurious correlations for cross-modal retrieval
Cross-modal retrieval methods are the preferred tool to search databases for the text that
best matches a query image and vice versa. However, image-text retrieval models commonly …