Large language models are visual reasoning coordinators

L Chen, B Li, S Shen, J Yang, C Li… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Visual reasoning requires multimodal perception and commonsense cognition of the world.
Recently, multiple vision-language models (VLMs) have been proposed with excellent …

Robust mixture-of-expert training for convolutional neural networks

Y Zhang, R Cai, T Chen, G Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Sparsely-gated Mixture-of-Experts (MoE), an emerging deep model architecture, has
demonstrated great promise for enabling high-accuracy and ultra-efficient model inference …
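
A minimal sketch of the sparsely-gated routing idea behind this line of work, in PyTorch: a learned gate scores the experts, only the top-k of them are evaluated per input, and their outputs are combined with renormalised gate weights. The layer sizes, the two-layer MLP experts, and all names here are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Sparsely-gated MoE: each input activates only k of the experts (illustrative sketch)."""

    def __init__(self, dim: int, num_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # learned router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, dim)
        logits = self.gate(x)                             # (batch, num_experts)
        topk_vals, topk_idx = logits.topk(self.k, dim=-1)
        weights = F.softmax(topk_vals, dim=-1)            # renormalise over the k chosen
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e             # inputs whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route a batch of 8 vectors through the layer.
layer = SparseMoELayer(dim=64)
y = layer(torch.randn(8, 64))   # (8, 64)
```

The efficiency claim rests on evaluating only k experts per input rather than all of them; production implementations batch the per-expert work instead of looping as above.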

ALOFT: A lightweight MLP-like architecture with dynamic low-frequency transform for domain generalization

J Guo, N Wang, L Qi, Y Shi - Proceedings of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Domain generalization (DG) aims to learn a model that generalizes well to unseen target
domains by utilizing multiple source domains, without re-training. Most existing DG works …

DGMamba: Domain generalization via generalized state space model

S Long, Q Zhou, X Li, X Lu, C Ying, Y Luo… - Proceedings of the …, 2024 - dl.acm.org
Domain generalization (DG) aims to solve distribution shift problems across various scenes.
Existing approaches are based on Convolutional Neural Networks (CNNs) or Vision …

MoE-FFD: Mixture of experts for generalized and parameter-efficient face forgery detection

C Kong, A Luo, P Bao, Y Yu, H Li, Z Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Deepfakes have recently raised significant trust issues and security concerns among the
public. Compared to CNN face forgery detectors, ViT-based methods take advantage of the …

Knowledge distillation-based domain-invariant representation learning for domain generalization

Z Niu, J Yuan, X Ma, Y Xu, J Liu… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Domain generalization (DG) aims to generalize the knowledge learned from multiple source
domains to unseen target domains. Existing DG techniques can be subsumed under two …

Graph mixture of experts: Learning on large-scale graphs with explicit diversity modeling

H Wang, Z Jiang, Y You, Y Han, G Liu… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Graph neural networks (GNNs) have found extensive applications in learning from graph
data. However, real-world graphs often possess diverse structures and comprise nodes and …

Statistical perspective of top-K sparse softmax gating mixture of experts

H Nguyen, P Akbarian, F Yan, N Ho - arXiv preprint arXiv:2309.13850, 2023 - arxiv.org
Top-K sparse softmax gating mixture of experts has been widely used for scaling up massive
deep-learning architectures without increasing the computational cost. Despite its popularity …
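
The gating function studied here keeps only the K largest routing logits before the softmax, so every other expert receives exactly zero weight. A minimal sketch of that gate, assuming the logits come from some learned router (the function name and example shapes are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def topk_softmax_gate(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Mask all but the k largest logits to -inf, then softmax.

    Non-selected experts get exactly zero gate weight, so only k experts
    per input contribute to (and need be evaluated in) the forward pass.
    """
    topk_vals, _ = logits.topk(k, dim=-1)
    kth_largest = topk_vals[..., -1:]                    # per-input threshold
    masked = logits.masked_fill(logits < kth_largest, float("-inf"))
    return F.softmax(masked, dim=-1)

# Example: 3 inputs routed over 5 experts, keeping the top 2.
gates = topk_softmax_gate(torch.randn(3, 5), k=2)
print(gates)  # each row has exactly two non-zero entries (barring ties)
```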

How well does GPT-4V(ision) adapt to distribution shifts? A preliminary investigation

Z Han, G Zhou, R He, J Wang, T Wu, Y Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
In machine learning, generalization under distribution shifts, where deployment conditions
diverge from the training scenarios, is crucial, particularly in fields like climate modeling …

Rethinking domain generalization: Discriminability and generalizability

S Long, Q Zhou, C Ying, L Ma… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Domain generalization (DG) endeavours to develop robust models that possess strong
generalizability while preserving excellent discriminability. Nonetheless, pivotal DG …