From Google Gemini to OpenAI Q* (Q-star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arxiv preprint arxiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Recent advances in generative AI and large language models: Current status, challenges, and perspectives

DH Hagos, R Battle, DB Rawat - IEEE Transactions on Artificial …, 2024 - ieeexplore.ieee.org
The emergence of generative artificial intelligence (AI) and large language models (LLMs)
has marked a new era of natural language processing (NLP), introducing unprecedented …

Language is not all you need: Aligning perception with language models

S Huang, L Dong, W Wang, Y Hao… - Advances in …, 2024 - proceedings.neurips.cc
A big convergence of language, multimodal perception, action, and world modeling is a key
step toward artificial general intelligence. In this work, we introduce KOSMOS-1, a …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arxiv preprint arxiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Retentive network: A successor to transformer for large language models

Y Sun, L Dong, S Huang, S Ma, Y Xia, J Xue… - arxiv preprint arxiv …, 2023 - arxiv.org
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large
language models, simultaneously achieving training parallelism, low-cost inference, and …

Language models are general-purpose interfaces

Y Hao, H Song, L Dong, S Huang, Z Chi… - arxiv preprint arxiv …, 2022 - arxiv.org
Foundation models have received much attention due to their effectiveness across a broad
range of downstream applications. Though there is a big convergence in terms of …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

LoRAMoE: Alleviating world knowledge forgetting in large language models via MoE-style plugin

S Dou, E Zhou, Y Liu, S Gao, W Shen… - Proceedings of the …, 2024 - aclanthology.org
Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling
them to align with human instructions and enhance their capabilities in downstream tasks …

Tutel: Adaptive mixture-of-experts at scale

C Hwang, W Cui, Y Xiong, Z Yang… - Proceedings of …, 2023 - proceedings.mlsys.org
Sparsely-gated mixture-of-experts (MoE) has been widely adopted to scale deep learning
models to trillion-plus parameters with fixed computational cost. The algorithmic …
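
The "fixed computational cost" claim above follows from top-k routing: each token is sent to only k of the E experts, so per-token FLOPs stay roughly constant even as the total parameter count grows with E. The sketch below is a minimal, illustrative top-k sparse MoE layer in PyTorch; the class name, expert width, and routing loop are assumptions for exposition and are not the Tutel implementation or API.

```python
# Minimal sketch of sparsely-gated top-k MoE routing (illustrative only;
# names and shapes are assumptions, not the Tutel API).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, num_experts)  # router producing expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to only k experts,
        # so per-token compute stays roughly constant as num_experts grows.
        scores = F.softmax(self.gate(x), dim=-1)             # (tokens, E)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)  # (tokens, k)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topk_idx[:, slot]
            w = topk_scores[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

# Usage: output has the same shape as the input token batch.
tokens = torch.randn(16, 512)
moe = TopKMoE(d_model=512)
y = moe(tokens)  # (16, 512)
```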

AdaMV-MoE: Adaptive multi-task vision mixture-of-experts

T Chen, X Chen, X Du, A Rashwan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Sparsely activated Mixture-of-Experts (MoE) is becoming a promising paradigm for
multi-task learning (MTL). Instead of compressing multiple tasks' knowledge into a single …