A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

[PDF][PDF] Merging mixture of experts and retrieval augmented generation for enhanced information retrieval and reasoning

X Xiong, M Zheng - 2024 - assets-eu.researchsquare.com
This study investigates the integration of Retrieval Augmented Generation (RAG) into the
Mistral 8x7B Large Language Model (LLM), which already uses Mixture of Experts (MoE), to …

Scaling diffusion transformers to 16 billion parameters

Z Fei, M Fan, C Yu, D Li, J Huang - arXiv preprint arXiv:2407.11633, 2024 - arxiv.org
In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer, that is
scalable and competitive with dense networks while exhibiting highly optimized inference …

Interpretable cascading mixture-of-experts for urban traffic congestion prediction

W Jiang, J Han, H Liu, T Tao, N Tan… - Proceedings of the 30th …, 2024 - dl.acm.org
Rapid urbanization has significantly escalated traffic congestion, underscoring the need for
advanced congestion prediction services to bolster intelligent transportation systems. As one …

MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts

G Chen, X Zhao, T Chen, Y Cheng - arXiv preprint arXiv:2406.11353, 2024 - arxiv.org
Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for
scaling up large language models (LLMs). However, the reliability assessment of MoE lags …

A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts

X Zhang, N Ou, BD Basaran, M Visentin, M Qiao… - … Conference on Medical …, 2024 - Springer
Brain lesion segmentation plays an essential role in neurological research and diagnosis.
As brain lesions can be caused by various pathological alterations, different types of brain …

Diversifying the expert knowledge for task-agnostic pruning in sparse mixture-of-experts

Z Zhang, X Liu, H Cheng, C Xu, J Gao - arXiv preprint arXiv:2407.09590, 2024 - arxiv.org
By increasing model parameters but activating them sparsely when performing a task, the
use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large …
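
The snippet above captures the core MoE idea: total parameters grow with the number of experts, but each token only activates a few of them. Below is a minimal, illustrative sketch of top-k sparse routing in Python (PyTorch); it is not taken from any of the papers listed here, and names such as SparseMoE, router, and top_k are hypothetical choices for the example.

```python
# Illustrative top-k sparse Mixture-of-Experts layer (toy sketch, not any paper's method).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router produces one logit per expert for each token.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward block; only top_k experts run per token.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        gate_logits = self.router(x)                              # (tokens, n_experts)
        weights, indices = gate_logits.topk(self.top_k, dim=-1)   # keep the k best experts per token
        weights = F.softmax(weights, dim=-1)                      # renormalise over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoE(d_model=64, d_hidden=256)
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```

The design point the example is meant to show: per-token compute is bounded by top_k expert evaluations, while capacity scales with n_experts, which is why MoE layers can grow parameter counts without a proportional increase in inference cost.
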

Advancing the Robustness of Large Language Models through Self-Denoised Smoothing

J Ji, B Hou, Z Zhang, G Zhang, W Fan, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Although large language models (LLMs) have achieved significant success, their
vulnerability to adversarial perturbations, including recent jailbreak attacks, has raised …

Revisiting the Trade-Off Between Accuracy and Robustness Via Weight Distribution of Filters

X Wei, S Zhao, B Li - IEEE Transactions on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Adversarial attacks have been proven to be potential threats to Deep Neural Networks
(DNNs), and many methods are proposed to defend against adversarial attacks. However …

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

H Wang, Y Zhang, R Bai, Y Zhao, S Liu, Z Tu - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in diffusion models have made generative image editing more
accessible, enabling creative edits but raising ethical concerns, particularly regarding …