Video-text as game players: Hierarchical Banzhaf interaction for cross-modal representation learning

P **, J Huang, P **ong, S Tian, C Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive learning-based video-language representation learning approaches, e.g., CLIP,
have achieved outstanding performance, which pursue semantic interaction upon pre …

DiffusionRet: Generative text-video retrieval with diffusion model

P **, H Li, Z Cheng, K Li, X Ji, C Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Existing text-video retrieval solutions are, in essence, discriminant models focused on
maximizing the conditional likelihood, i.e., p(candidates | query). While straightforward, this …

M-Mix: Generating hard negatives via multi-sample mixing for contrastive learning

S Zhang, M Liu, J Yan, H Zhang, L Huang… - Proceedings of the 28th …, 2022 - dl.acm.org
Negative pairs, especially hard negatives combined with common negatives (easy to
discriminate), are essential in contrastive learning, where they play a role in avoiding …

Text-video retrieval with disentangled conceptualization and set-to-set alignment

P **, H Li, Z Cheng, J Huang, Z Wang, L Yuan… - arXiv preprint arXiv …, 2023 - arxiv.org
Text-video retrieval is a challenging cross-modal task, which aims to align visual entities with
natural language descriptions. Current methods either fail to leverage the local details or are …

Out-of-distributed semantic pruning for robust semi-supervised learning

Y Wang, P Qiao, C Liu, G Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in robust semi-supervised learning (SSL) typically filter out-of-distribution
(OOD) information at the sample level. We argue that an overlooked problem of robust SSL …

Patch-level contrastive learning via positional query for visual pre-training

S Zhang, Q Zhou, Z Wang, F Wang… - … on Machine Learning, 2023 - proceedings.mlr.press
Dense contrastive learning (DCL) has been recently explored for learning localized
information for dense prediction tasks (e.g., detection and segmentation). It still suffers the …

Know your self-supervised learning: A survey on image-based generative and discriminative training

U Ozbulak, HJ Lee, B Boga, ET Anzaku, H Park… - arXiv preprint arXiv …, 2023 - arxiv.org
Although supervised learning has been highly successful in improving the state-of-the-art in
the domain of image-based computer vision in the past, the margin of improvement has …

Invariant Graph Learning Meets Information Bottleneck for Out-of-Distribution Generalization

W Mao, J Wu, H Liu, Y Sui, X Wang - arXiv preprint arXiv:2408.01697, 2024 - arxiv.org
Graph out-of-distribution (OOD) generalization remains a major challenge in graph learning
since graph neural networks (GNNs) often suffer from severe performance degradation …

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

X Li, J Wang, X Xu, X Peng, R Singh… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audiovisual segmentation (AVS) is a challenging task that aims to segment visual objects in
videos according to their associated acoustic cues. With multiple sound sources and …

PCP-MAE: Learning to predict centers for point masked autoencoders

X Zhang, S Zhang, J Yan - arXiv preprint arXiv:2408.08753, 2024 - arxiv.org
Masked autoencoder has been widely explored in point cloud self-supervised learning,
whereby the point cloud is generally divided into visible and masked parts. These methods …