From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

MegaBlocks: Efficient sparse training with mixture-of-experts

T Gale, D Narayanan, C Young… - … of Machine Learning …, 2023 - proceedings.mlsys.org
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs.
Our system is motivated by the limitations of current frameworks, which restrict the dynamic …

Enhancing Simplified Chinese poetry comprehension in LLaMA-7B: A novel approach to mimic mixture of experts effect

Y Zhang, X Chen - 2023 - researchsquare.com
This study explored the potential of manual augmentation in enhancing the comprehension
and translation capabilities of large language models, specifically focusing on the LLaMA …

Accelerating distributed MoE training and inference with Lina

J Li, Y Jiang, Y Zhu, C Wang, H Xu - 2023 USENIX Annual Technical …, 2023 - usenix.org
Scaling model parameters improves model quality at the price of high computation
overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) …

Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference

R Hwang, J Wei, S Cao, C Hwang… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Large language models (LLMs) based on transformers have made significant strides in
recent years, the success of which is driven by scaling up their model size. Despite their high …

ScheMoE: An extensible mixture-of-experts distributed training system with tasks scheduling

S Shi, X Pan, Q Wang, C Liu, X Ren, Z Hu… - Proceedings of the …, 2024 - dl.acm.org
In recent years, large-scale models have been scaled to trillions of parameters with
sparsely activated mixture-of-experts (MoE), which significantly improves the model quality …

Janus: A unified distributed training framework for sparse mixture-of-experts models

J Liu, JH Wang, Y Jiang - Proceedings of the ACM SIGCOMM 2023 …, 2023 - dl.acm.org
Scaling models to large sizes to improve performance has become a trend in deep learning, and
sparsely activated Mixture-of-Expert (MoE) is a promising architecture to scale models …

A hybrid tensor-expert-data parallelism approach to optimize mixture-of-experts training

S Singh, O Ruwase, AA Awan, S Rajbhandari… - Proceedings of the 37th …, 2023 - dl.acm.org
Mixture-of-Experts (MoE) is a neural network architecture that adds sparsely activated expert
blocks to a base model, increasing the number of parameters without impacting …
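
Note: the entries above repeatedly refer to sparsely activated Mixture-of-Experts layers, in which a router sends each token to only a few expert blocks, so the parameter count grows with the number of experts while per-token compute grows only with the number of experts selected. The following is a minimal NumPy sketch of that idea with top-k gating; the dimensions, the two-layer ReLU experts, and the softmax-over-selected-experts gating are illustrative assumptions, not the design of any specific system surveyed here.

# Minimal sketch of a sparsely activated MoE layer with top-k gating (NumPy).
# All sizes and parameter initializations are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden = 8, 16   # toy dimensions
num_experts, top_k = 4, 2   # only top_k experts are evaluated per token

# Each expert is a small two-layer MLP with random toy weights.
experts = [
    (rng.standard_normal((d_model, d_hidden)) * 0.1,
     rng.standard_normal((d_hidden, d_model)) * 0.1)
    for _ in range(num_experts)
]
router_w = rng.standard_normal((d_model, num_experts)) * 0.1  # gating weights

def moe_forward(x):
    """Route each token to its top_k experts and mix their outputs."""
    logits = x @ router_w                        # (tokens, num_experts)
    top_idx = np.argsort(-logits, axis=-1)[:, :top_k]
    out = np.zeros_like(x)
    for t, token in enumerate(x):                # per-token dispatch (toy loop)
        chosen = top_idx[t]
        gate = np.exp(logits[t, chosen])
        gate /= gate.sum()                       # softmax over the selected experts
        for g, e in zip(gate, chosen):
            w1, w2 = experts[e]
            out[t] += g * (np.maximum(token @ w1, 0.0) @ w2)  # ReLU MLP expert
    return out

tokens = rng.standard_normal((3, d_model))       # batch of 3 toy tokens
print(moe_forward(tokens).shape)                 # (3, 8): params scale with num_experts,
                                                 # per-token compute only with top_k

In practice this per-token dispatch is batched and sharded across devices, which is the systems problem the MegaBlocks, Lina, ScheMoE, Janus, and hybrid tensor-expert-data parallelism entries above target.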