Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

A review of current trends, techniques, and challenges in large language models (LLMs)

R Patil, V Gudivada - Applied Sciences, 2024 - mdpi.com
Natural language processing (NLP) has been significantly transformed over the last decade,
especially in the field of language modeling. Large language models (LLMs) have achieved …

Qwen technical report

J Bai, S Bai, Y Chu, Z Cui, K Dang, X Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized the field of artificial intelligence,
enabling natural language processing tasks that were previously thought to be exclusive to …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored how machines
can master language intelligence. Language is essentially a complex, intricate system of …

Mixtral of experts

AQ Jiang, A Sablayrolles, A Roux, A Mensch… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has
the same architecture as Mistral 7B, with the difference that each layer is composed of 8 …
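The snippet cuts off at the layer description; per the paper, each layer holds 8 expert feed-forward blocks and a router activates 2 of them per token. Below is a minimal sketch of such a top-2 sparse MoE layer. All shapes are made up, and plain ReLU FFNs stand in for the real expert blocks; this illustrates the routing idea, not Mixtral's actual implementation.

```python
# Sketch of a sparse Mixture-of-Experts feed-forward layer: 8 experts,
# each token routed to its top-2. Illustrative only; shapes are invented.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 16, 64, 8, 2

# One weight pair per expert (a plain 2-layer FFN for illustration).
W_in = rng.normal(size=(n_experts, d_model, d_ff)) * 0.02
W_out = rng.normal(size=(n_experts, d_ff, d_model)) * 0.02
W_gate = rng.normal(size=(d_model, n_experts)) * 0.02  # router weights

def moe_ffn(x):
    """Route a single token x (d_model,) to its top-k experts."""
    logits = x @ W_gate                    # one router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k best experts
    weights = np.exp(logits[top])          # softmax over selected experts only
    weights /= weights.sum()
    # Only the selected experts run, so per-token compute stays near
    # top_k / n_experts of the full 8-expert layer.
    out = np.zeros(d_model)
    for w, e in zip(weights, top):
        h = np.maximum(x @ W_in[e], 0.0)   # ReLU FFN here; SwiGLU in practice
        out += w * (h @ W_out[e])
    return out

token = rng.normal(size=d_model)
print(moe_ffn(token).shape)  # (16,)
```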

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

Efficiently scaling transformer inference

R Pope, S Douglas, A Chowdhery… - Proceedings of …, 2023 - proceedings.mlsys.org
We study the problem of efficient generative inference for Transformer models, in one of its
most challenging settings: large deep models, with tight latency targets and long sequence …
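The abstract names the setting rather than a technique, so purely to illustrate what "efficient generative inference" involves, here is a hedged sketch of key/value caching, one standard ingredient in low-latency autoregressive decoding. The paper itself studies far more than this (partitioning strategies, batching, and so on); every name and shape below is invented for a toy single attention head.

```python
# KV caching: at each decode step only the new token's query is computed,
# while keys/values from earlier steps are reused instead of recomputed,
# making each step O(t) rather than reprocessing the whole O(t^2) prefix.
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

k_cache, v_cache = [], []  # grows by one entry per generated token

def decode_step(x):
    """Attend the new token x (d,) over all cached positions plus itself."""
    k_cache.append(x @ Wk)
    v_cache.append(x @ Wv)
    K, V = np.stack(k_cache), np.stack(v_cache)  # (t, d)
    q = x @ Wq
    scores = K @ q / np.sqrt(d)                  # one score per cached position
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    return attn @ V

for _ in range(4):                               # generate 4 steps
    out = decode_step(rng.normal(size=d))
print(out.shape, len(k_cache))                   # (8,) 4
```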

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models

D Dai, C Deng, C Zhao, RX Xu, H Gao, D Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for
managing computational costs when scaling up model parameters. However, conventional …
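To make the cost argument in the snippet concrete: in an MoE layer, total parameters grow with the number of experts, while per-token compute tracks only the activated experts. A back-of-the-envelope sketch follows; the expert count and sizes are hypothetical, not DeepSeekMoE's actual configuration.

```python
# Illustration of why MoE "manages computational costs when scaling up
# model parameters": capacity scales with n_experts, compute with top_k.
# All sizes below are made up for illustration.
d_model, d_ff = 4096, 14336
n_experts, top_k = 64, 6          # hypothetical fine-grained setup

params_per_expert = 2 * d_model * d_ff   # in + out projection of one FFN expert
total = n_experts * params_per_expert    # parameters the layer stores
active = top_k * params_per_expert       # parameters one token actually touches

print(f"total expert params: {total / 1e9:.1f}B")
print(f"active per token:    {active / 1e9:.2f}B "
      f"({100 * top_k / n_experts:.0f}% of the layer)")
```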

RAPHAEL: Text-to-image generation via large mixture of diffusion paths

Z Xue, G Song, Q Guo, B Liu, Z Zong… - Advances in Neural …, 2023 - proceedings.neurips.cc
Text-to-image generation has recently witnessed remarkable achievements. We introduce a
text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images …