Large language models are visual reasoning coordinators

L Chen, B Li, S Shen, J Yang, C Li… - Advances in …, 2023 - proceedings.neurips.cc
Visual reasoning requires multimodal perception and commonsense cognition of the world.
Recently, multiple vision-language models (VLMs) have been proposed with excellent …
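
As a rough illustration of the coordination idea in the title (an LLM aggregating the outputs of several VLMs), here is a minimal Python sketch; the `vlms` and `llm` callables are hypothetical stand-ins, not the paper's actual interface.

```python
def coordinate(question: str, image, vlms, llm) -> str:
    """Hypothetical sketch: several VLMs answer independently,
    and an LLM reasons over the candidates to pick a final answer."""
    candidates = [vlm(image, question) for vlm in vlms]
    prompt = (
        f"Question: {question}\n"
        + "\n".join(f"Model {i} answers: {a}" for i, a in enumerate(candidates))
        + "\nGiven these candidate answers, state the most plausible final answer."
    )
    return llm(prompt)
```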

ALOFT: A lightweight MLP-like architecture with dynamic low-frequency transform for domain generalization

J Guo, N Wang, L Qi, Y Shi - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Domain generalization (DG) aims to learn a model that generalizes well to unseen
target domains utilizing multiple source domains without re-training. Most existing DG works …
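
As a reference point for the "low-frequency transform" in the title, a generic frequency-space augmentation can be sketched as below; the exact ALOFT transform (which models and dynamically samples a low-frequency distribution) differs, and `radius`/`noise_std` are illustrative parameters.

```python
import torch

def low_freq_perturb(x: torch.Tensor, radius: int = 4, noise_std: float = 0.1) -> torch.Tensor:
    # x: (B, C, H, W). Perturb only the centered low-frequency amplitudes,
    # leaving phase (which carries semantic structure) untouched.
    freq = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    amp, phase = freq.abs(), freq.angle()
    B, C, H, W = x.shape
    cy, cx = H // 2, W // 2
    noise = 1 + noise_std * torch.randn(B, C, 2 * radius, 2 * radius)
    amp[..., cy - radius:cy + radius, cx - radius:cx + radius] *= noise
    freq = torch.polar(amp, phase)
    return torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1))).real
```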

Robust mixture-of-expert training for convolutional neural networks

Y Zhang, R Cai, T Chen, G Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Sparsely-gated Mixture of Expert (MoE), an emerging deep model architecture, has
demonstrated great promise for enabling high-accuracy and ultra-efficient model inference …
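
For context, the sparsely-gated MoE design the snippet refers to routes each token to its top-k experts via a learned gate; a minimal generic layer (without the paper's robustness mechanisms) might look like this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, k: int = 1):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, dim)
        topv, topi = self.gate(x).topk(self.k, dim=-1)    # pick k experts per token
        weights = F.softmax(topv, dim=-1)                 # renormalize over chosen experts
        out = torch.zeros_like(x)
        for j in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, j] == e
                if mask.any():
                    out[mask] += weights[mask, j:j + 1] * expert(x[mask])
        return out
```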

FuseMoE: Mixture-of-experts transformers for fleximodal fusion

X Han, H Nguyen, C Harris, N Ho… - Advances in Neural …, 2025 - proceedings.neurips.cc
As machine learning models in critical fields increasingly grapple with multimodal data, they
face the dual challenges of handling a wide array of modalities, often incomplete due to …
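
A minimal sketch of the fusion pattern the abstract gestures at: MoE-style fusion over whatever modalities happen to be present, with missing modalities simply absent from the input dict. FuseMoE's actual gating and routing are more specialized; the module and modality names here are illustrative.

```python
import torch
import torch.nn as nn

class FlexiModalFusion(nn.Module):
    def __init__(self, dims: dict, hidden: int = 64, num_experts: int = 4):
        super().__init__()
        # One encoder per modality projects into a shared token space.
        self.encoders = nn.ModuleDict({m: nn.Linear(d, hidden) for m, d in dims.items()})
        self.experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_experts))
        self.gate = nn.Linear(hidden, num_experts)

    def forward(self, inputs: dict) -> torch.Tensor:
        # Encode only the modalities actually provided, one token each.
        tokens = torch.stack([self.encoders[m](x) for m, x in inputs.items()], dim=1)
        w = self.gate(tokens).softmax(dim=-1)                               # (B, M, E)
        expert_out = torch.stack([e(tokens) for e in self.experts], dim=-1) # (B, M, H, E)
        fused = (expert_out * w.unsqueeze(2)).sum(-1)                       # weight experts per token
        return fused.mean(dim=1)                                            # pool over modalities

# e.g. FlexiModalFusion({"vitals": 32, "notes": 128}) still runs when only
# {"vitals": ...} is supplied at inference time.
```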

Knowledge distillation-based domain-invariant representation learning for domain generalization

Z Niu, J Yuan, X Ma, Y Xu, J Liu… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Domain generalization (DG) aims to generalize the knowledge learned from multiple source
domains to unseen target domains. Existing DG techniques can be subsumed under two …
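
The standard distillation objective such work builds on can be written compactly; this generic version (softened teacher targets plus the supervised loss) is a sketch, not the paper's exact domain-invariance formulation.

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.5):
    # Student matches the teacher's temperature-softened distribution ...
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                  # T^2 keeps gradient scale comparable
    # ... while still fitting the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```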

DGMamba: Domain generalization via generalized state space model

S Long, Q Zhou, X Li, X Lu, C Ying, Y Luo… - Proceedings of the …, 2024 - dl.acm.org
Domain generalization (DG) aims at solving distribution shift problems in various scenes.
Existing approaches are based on Convolution Neural Networks (CNNs) or Vision …
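
For readers unfamiliar with state space models, the recurrence at their core is simple; below is a minimal diagonal SSM scan. DGMamba's contribution lies in the DG-specific components built around a Mamba-style selective variant, which this sketch omits.

```python
import torch

def ssm_scan(u: torch.Tensor, A: torch.Tensor, B: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    # u: (batch, length, dim); A, B, C: (dim,) diagonal parameters.
    # Recurrence: x_t = A * x_{t-1} + B * u_t ;  readout: y_t = C * x_t
    batch, length, dim = u.shape
    x = torch.zeros(batch, dim)
    ys = []
    for t in range(length):
        x = A * x + B * u[:, t]
        ys.append(C * x)
    return torch.stack(ys, dim=1)   # (batch, length, dim)
```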

Graph mixture of experts: Learning on large-scale graphs with explicit diversity modeling

H Wang, Z Jiang, Y You, Y Han, G Liu… - Advances in …, 2023 - proceedings.neurips.cc
Graph neural networks (GNNs) have found extensive applications in learning from graph
data. However, real-world graphs often possess diverse structures and comprise nodes and …
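
The core mechanism, routing each node's aggregated neighborhood through a mixture of experts so that structurally different nodes get different transforms, can be sketched generically (dense adjacency, soft routing; GMoE's hop-aware expert design is not reproduced here).

```python
import torch
import torch.nn as nn

class GraphMoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = adj @ x                                               # neighborhood aggregation
        w = self.gate(h).softmax(dim=-1)                          # (N, E) per-node routing
        out = torch.stack([e(h) for e in self.experts], dim=-1)   # (N, D, E)
        return (out * w.unsqueeze(1)).sum(-1)
```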

CA-MoEiT: Generalizable face anti-spoofing via dual cross-attention and semi-fixed mixture-of-expert

A Liu - International Journal of Computer Vision, 2024 - Springer
Although the generalization of face anti-spoofing (FAS) has drawn increasing attention,
solving it with Vision Transformer (ViT) backbones is still at an early stage. In this paper, we present a …
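
The "dual cross-attention" half of the title can be pictured as two token streams attending to each other; this is a generic bidirectional cross-attention sketch, with the paper's semi-fixed MoE component omitted.

```python
import torch
import torch.nn as nn

class DualCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.a2b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.b2a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        a_out, _ = self.a2b(a, b, b)   # stream a queries stream b
        b_out, _ = self.b2a(b, a, a)   # stream b queries stream a
        return a + a_out, b + b_out    # residual updates for both streams
```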

MoE-FFD: Mixture of experts for generalized and parameter-efficient face forgery detection

C Kong, A Luo, P Bao, Y Yu, H Li, Z Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Deepfakes have recently raised significant trust issues and security concerns among the
public. Compared to CNN face forgery detectors, ViT-based methods take advantage of the …
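
"Parameter-efficient" here typically means the ViT backbone stays frozen while small expert modules are learned; one common way to realize that is a gated mixture of LoRA-style low-rank adapters, sketched below. This is an assumption-laden illustration, not MoE-FFD's actual expert or gating design.

```python
import torch
import torch.nn as nn

class MoELoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # backbone weights stay frozen
        d_in, d_out = base.in_features, base.out_features
        self.down = nn.ModuleList(nn.Linear(d_in, rank, bias=False) for _ in range(num_experts))
        self.up = nn.ModuleList(nn.Linear(rank, d_out, bias=False) for _ in range(num_experts))
        self.gate = nn.Linear(d_in, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.gate(x).softmax(dim=-1)              # (..., E) expert weights
        delta = torch.stack([u(d(x)) for d, u in zip(self.down, self.up)], dim=-1)
        return self.base(x) + (delta * w.unsqueeze(-2)).sum(-1)
```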

On least square estimation in softmax gating mixture of experts

H Nguyen, N Ho, A Rinaldo - arXiv preprint arXiv:2402.02952, 2024 - arxiv.org
The mixture of experts (MoE) model is a statistical machine learning design that aggregates
multiple expert networks through a softmax gating function to form a more intricate and …
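
For reference, the object under study can be written out: a softmax-gated MoE regression function and its least squares estimator over n samples. The expert form h_k and the notation below are generic placeholders, not necessarily the paper's exact parameterization.

```latex
\[
  f_G(x) \;=\; \sum_{k=1}^{K}
    \frac{\exp(\beta_k^{\top} x + b_k)}{\sum_{j=1}^{K} \exp(\beta_j^{\top} x + b_j)}
    \, h_k(x; \eta_k),
  \qquad
  \widehat{G}_n \;=\; \arg\min_{G} \sum_{i=1}^{n} \bigl( y_i - f_G(x_i) \bigr)^2 .
\]
```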