A survey on mixture of experts
Large language models (LLMs) have achieved unprecedented advances across diverse fields, ranging from natural language processing to computer vision and beyond …
Merging mixture of experts and retrieval augmented generation for enhanced information retrieval and reasoning
X **ong, M Zheng - 2024 - assets-eu.researchsquare.com
This study investigates the integration of Retrieval Augmented Generation (RAG) into the Mixtral 8x7B Large Language Model (LLM), which already uses Mixture of Experts (MoE), to …
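To make the combination concrete, here is a minimal sketch of how retrieval augmentation can wrap a generative MoE model: retrieve passages, prepend them to the prompt, then generate. The `retrieve` helper and its toy word-overlap scoring are illustrative assumptions, not the paper's method.

```python
# Minimal RAG wrapper sketch (hypothetical helpers; not the cited paper's code).
from typing import Callable, List

def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    # Toy lexical retriever: rank passages by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(q & set(p.lower().split())))
    return ranked[:k]

def rag_answer(query: str, corpus: List[str], llm: Callable[[str], str]) -> str:
    # Augment the prompt with retrieved context before calling the (MoE) LLM.
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return llm(prompt)
```

Any callable that maps a prompt string to a completion string (e.g., a wrapper around an MoE chat model) can be passed as `llm`.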
Scaling diffusion transformers to 16 billion parameters
In this paper, we present DiT-MoE, a sparse version of the diffusion Transformer that is scalable and competitive with dense networks while exhibiting highly optimized inference …
Interpretable cascading mixture-of-experts for urban traffic congestion prediction
Rapid urbanization has significantly escalated traffic congestion, underscoring the need for advanced congestion prediction services to bolster intelligent transportation systems. As one …
: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Mixture-of-Experts (MoE) has gained increasing popularity as a promising framework for scaling up large language models (LLMs). However, the reliability assessment of MoE lags …
A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts
Brain lesion segmentation plays an essential role in neurological research and diagnosis. As brain lesions can be caused by various pathological alterations, different types of brain …
Diversifying the expert knowledge for task-agnostic pruning in sparse mixture-of-experts
By increasing model parameters but activating them sparsely when performing a task, the use of the Mixture-of-Experts (MoE) architecture significantly improves the performance of Large …
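The core idea this snippet describes, growing parameter count while running only a few experts per token, can be sketched as a top-k routed layer. The layer sizes, softmax-over-top-k gating, and class name below are illustrative assumptions, not any cited paper's exact design.

```python
# Sketch of sparse top-k expert routing (illustrative, not a specific paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Router scores every token against every expert.
        self.gate = nn.Linear(d_model, n_experts)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = self.gate(x)                          # (n_tokens, n_experts)
        top_scores, top_idx = scores.topk(self.k, -1)  # keep k experts per token
        weights = F.softmax(top_scores, dim=-1)        # renormalize kept scores
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e           # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# usage: y = TopKMoE(d_model=64)(torch.randn(16, 64))
```

Only k expert blocks run per token, so compute per token stays roughly constant while total parameters grow with the number of experts.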
Advancing the Robustness of Large Language Models through Self-Denoised Smoothing
Although large language models (LLMs) have achieved significant success, their vulnerability to adversarial perturbations, including recent jailbreak attacks, has raised …
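The snippet truncates before describing the method, but the general self-denoised smoothing recipe the title suggests is: perturb the input, let the model itself denoise each copy, then majority-vote over predictions. The helper names, masking rate, and sample count below are assumptions for illustration, not the paper's settings.

```python
# Sketch of self-denoised smoothing (hypothetical helper names and defaults).
import random
from collections import Counter
from typing import Callable

def mask_words(text: str, rate: float = 0.3, mask: str = "<mask>") -> str:
    # Randomly replace a fraction of words with a mask token.
    return " ".join(mask if random.random() < rate else w for w in text.split())

def smoothed_predict(text: str,
                     denoise: Callable[[str], str],   # LLM fills the masked words
                     classify: Callable[[str], str],  # LLM labels the cleaned text
                     n_samples: int = 10) -> str:
    # Majority vote over denoised, randomly perturbed copies of the input.
    votes = Counter(classify(denoise(mask_words(text))) for _ in range(n_samples))
    return votes.most_common(1)[0][0]
```

Aggregating over many perturbed copies is what gives smoothing-style methods their robustness to small adversarial edits.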
Revisiting the Trade-Off Between Accuracy and Robustness Via Weight Distribution of Filters
Adversarial attacks have been proven to be potential threats to Deep Neural Networks (DNNs), and many methods have been proposed to defend against them. However …
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Recent advancements in diffusion models have made generative image editing more accessible, enabling creative edits but raising ethical concerns, particularly regarding …