A review of current trends, techniques, and challenges in large language models (LLMs)

R Patil, V Gudivada - Applied Sciences, 2024 - mdpi.com
Natural language processing (NLP) has undergone a significant transformation over the last decade,
especially in the field of language modeling. Large language models (LLMs) have achieved …

A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …
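
The survey covers standard inference-time optimizations for transformer networks. As a purely illustrative sketch (not code from the survey), the snippet below shows one such technique, a key/value cache for autoregressive decoding; all names and shapes are hypothetical stand-ins for a trained model's projections.

```python
# Minimal KV-cache sketch: keep past keys/values so each decode step attends
# over all previous tokens without recomputing their projections.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    def __init__(self, d_model):
        self.k = np.zeros((0, d_model))
        self.v = np.zeros((0, d_model))

    def step(self, q_t, k_t, v_t):
        # Append this step's key/value, then attend the new query over the cache.
        self.k = np.vstack([self.k, k_t])
        self.v = np.vstack([self.v, v_t])
        scores = q_t @ self.k.T / np.sqrt(q_t.shape[-1])  # (1, t)
        return softmax(scores) @ self.v                   # (1, d_model)

# One token at a time, with random projections standing in for a real model.
d = 64
cache = KVCache(d)
for _ in range(5):
    q, k, v = (np.random.randn(1, d) for _ in range(3))
    out = cache.step(q, k, v)
print(out.shape)  # (1, 64)
```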

The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only

G Penedo, Q Malartic, D Hesslow, R Cojocaru… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models are commonly trained on a mixture of filtered web data and curated
high-quality corpora, such as social media conversations, books, or technical papers. This …
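
The abstract argues that properly filtered and deduplicated web data alone can match curated corpora. As a hypothetical illustration of heuristic quality filtering in the spirit of such pipelines, the sketch below applies a few document-level rules; the thresholds and rules are my own assumptions, not the paper's actual filter set.

```python
# Hypothetical web-text quality filter: length bounds, symbol ratio, and a
# minimum fraction of alphabetic "words". Thresholds are illustrative only.
import re

def passes_quality_filters(doc: str,
                           min_words: int = 50,
                           max_words: int = 100_000,
                           max_symbol_ratio: float = 0.1,
                           min_alpha_word_ratio: float = 0.8) -> bool:
    words = doc.split()
    if not (min_words <= len(words) <= max_words):
        return False
    # Reject documents dominated by markup-like symbols.
    symbols = sum(1 for ch in doc if ch in "#{}[]<>|")
    if symbols / max(len(doc), 1) > max_symbol_ratio:
        return False
    # Require that most tokens contain at least one alphabetic character.
    alpha_words = sum(1 for w in words if re.search(r"[A-Za-z]", w))
    if alpha_words / len(words) < min_alpha_word_ratio:
        return False
    return True

docs = ["This is a short test.", "Plenty of ordinary English prose " * 20]
print([passes_quality_filters(d) for d in docs])  # [False, True]
```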

eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers

Y Balaji, S Nah, X Huang, A Vahdat, J Song… - arXiv preprint arXiv …, 2022 - arxiv.org
Large-scale diffusion-based generative models have led to breakthroughs in text-
conditioned high-resolution image synthesis. Starting from random noise, such text-to-image …
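
The snippet describes sampling from random noise with denoisers specialized to different stages of the reverse process. A toy sketch of that routing idea follows, assuming stand-in denoiser functions and a hypothetical two-expert split; it is not the paper's actual architecture, schedule, or number of experts.

```python
# Toy sketch: route each denoising step to an "expert" chosen by noise level,
# so high-noise and low-noise regimes are handled by specialized networks.
import numpy as np

def expert_high_noise(x, t):   # stand-in for a network trained on early, noisy steps
    return x * 0.9
def expert_low_noise(x, t):    # stand-in for a network trained on late, cleaner steps
    return x * 0.99

def select_expert(t, num_steps):
    # Hypothetical routing rule: first half of the trajectory -> high-noise expert.
    return expert_high_noise if t > num_steps // 2 else expert_low_noise

def sample(shape=(8, 8), num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)           # start from pure Gaussian noise
    for t in range(num_steps, 0, -1):
        denoiser = select_expert(t, num_steps)
        x = denoiser(x, t)                    # one reverse-diffusion step
    return x

print(sample().std())
```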

GLM-130B: An open bilingual pre-trained model

A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model
with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
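
The core building block described here is Int8 matrix multiplication with per-vector scaling. Below is a minimal NumPy sketch of vector-wise absmax quantization for a matmul; the paper's full method additionally handles emergent outlier features with a mixed-precision decomposition, which this illustration omits.

```python
# Vector-wise absmax Int8 quantization sketch: quantize rows of A and columns
# of B to int8, accumulate in int32, then rescale back to float.
import numpy as np

def quantize_rows(x):
    # Per-row absmax scaling into int8; returns quantized tensor and scales.
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0
    return np.round(x / scale).astype(np.int8), scale

def int8_matmul(a, b):
    a_q, a_s = quantize_rows(a)      # quantize rows of a: (m, k)
    b_q, b_s = quantize_rows(b.T)    # quantize columns of b via rows of b.T
    acc = a_q.astype(np.int32) @ b_q.T.astype(np.int32)   # int32 accumulation
    return acc * (a_s @ b_s.T)                            # dequantize

a = np.random.randn(4, 16).astype(np.float32)
b = np.random.randn(16, 8).astype(np.float32)
err = np.abs(int8_matmul(a, b) - a @ b).max()
print(f"max abs error vs fp32 matmul: {err:.4f}")
```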

LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

D Shah, B Osiński, S Levine - Conference on robot …, 2023 - proceedings.mlr.press
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …

Training compute-optimal large language models

J Hoffmann, S Borgeaud, A Mensch… - arXiv preprint arXiv …, 2022 - arxiv.org
We investigate the optimal model size and number of tokens for training a transformer
language model under a given compute budget. We find that current large language models …
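
The abstract is about choosing model size and token count under a fixed compute budget. As a back-of-the-envelope sketch, the snippet below applies the commonly cited approximation C ≈ 6·N·D (training FLOPs for N parameters and D tokens) together with the roughly 20-tokens-per-parameter reading of the paper's result; the exact constants come from fitted scaling laws, so treat the numbers as indicative only.

```python
# Compute-optimal sizing sketch: with C = 6*N*D and D = r*N (r tokens per
# parameter), the optimal N under a FLOP budget C is sqrt(C / (6*r)).
def compute_optimal(flops_budget, tokens_per_param=20.0):
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

for budget in (1e21, 1e23, 5.76e23):   # the last is roughly Chinchilla's budget
    n, d = compute_optimal(budget)
    print(f"C={budget:.2e} FLOPs -> ~{n/1e9:.1f}B params, ~{d/1e9:.0f}B tokens")
```

With the ~5.76e23 FLOP budget this recovers roughly 70B parameters and ~1.4T tokens, consistent with the model configuration reported in the paper.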

D4: Improving LLM pretraining via document de-duplication and diversification

K Tirumala, D Simig, A Aghajanyan… - Advances in Neural …, 2023 - proceedings.neurips.cc
Over recent years, an increasing amount of compute and data has been poured into training
large language models (LLMs), usually by doing one-pass learning on as many tokens as …
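
The de-duplication half of the title refers to pruning redundant documents from the pretraining pool. The sketch below is a simplified, assumption-laden illustration of near-duplicate removal by embedding cosine similarity; the actual D4 pipeline clusters document embeddings and additionally diversifies by dropping the most prototypical points, which this toy version does not do.

```python
# Greedy near-duplicate pruning over document embeddings: keep a document
# only if it is not too similar (cosine) to any document already kept.
import numpy as np

def dedup_by_cosine(embeddings, threshold=0.95):
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    kept = []
    for i, vec in enumerate(unit):
        if all(vec @ unit[j] < threshold for j in kept):
            kept.append(i)
    return kept

rng = np.random.default_rng(0)
docs = rng.standard_normal((100, 32))
docs[50:] = docs[:50] + 0.01 * rng.standard_normal((50, 32))  # inject near-duplicates
print(len(dedup_by_cosine(docs)))  # ~50 survivors out of 100
```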