Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts

H Nguyen, P Akbarian, T Pham, T Nguyen… - arXiv preprint arXiv …, 2024 - arxiv.org
The cosine router in sparse Mixture of Experts (MoE) has recently emerged as an attractive
alternative to the conventional linear router. Indeed, the cosine router demonstrates …

Understanding expert structures on minimax parameter estimation in contaminated mixture of experts

F Yan, H Nguyen, D Le, P Akbarian, N Ho - arXiv preprint arXiv …, 2024 - arxiv.org
We conduct a convergence analysis of parameter estimation in the contaminated mixture
of experts. This model is motivated by the prompt learning problem, where one utilizes …

A general theory for softmax gating multinomial logistic mixture of experts

H Nguyen, P Akbarian, TT Nguyen, N Ho - arXiv preprint arXiv:2310.14188, 2023 - arxiv.org
The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating
functions to achieve greater performance in numerous regression and classification …

DGPO: discovering multiple strategies with diversity-guided policy optimization

W Chen, S Huang, Y Chiang, T Pearce… - Proceedings of the …, 2024 - ojs.aaai.org
Most reinforcement learning algorithms seek a single optimal strategy that solves a given
task. However, it can often be valuable to learn a diverse set of solutions, for instance, to …