Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Sparked by the advent of ChatGPT, Transformer-based Large Language Models (LLMs) have paved
a revolutionary path toward Artificial General Intelligence (AGI) and have been …

The impact of positional encoding on length generalization in transformers

A Kazemnejad, I Padhi… - Advances in …, 2024 - proceedings.neurips.cc
Length generalization, the ability to generalize from small training context sizes to larger
ones, is a critical challenge in the development of Transformer-based language models …
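
This paper compares how different positional-encoding schemes behave when the test context is longer than anything seen during training, including decoder-only models with no positional encoding at all (NoPE). A minimal sketch of how two commonly compared variants enter the attention logits, assuming a single attention head in PyTorch (names and shapes here are illustrative, not the paper's code):

```python
import torch

def attention_scores(q, k, scheme="nope", alibi_slope=0.5):
    """Causal attention logits under two positional-encoding schemes.

    q, k: (seq_len, head_dim) for a single head.
    - "nope":  no positional information is injected into the scores;
               the model relies on the causal mask alone.
    - "alibi": a linear bias -slope * (i - j) is added to each score,
               penalizing attention to distant past tokens.
    """
    seq_len = q.size(0)
    scores = q @ k.T / q.size(-1) ** 0.5            # (seq_len, seq_len)
    if scheme == "alibi":
        i = torch.arange(seq_len).unsqueeze(1)      # query positions
        j = torch.arange(seq_len).unsqueeze(0)      # key positions
        scores = scores - alibi_slope * (i - j).clamp(min=0)
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
    return scores.masked_fill(causal, float("-inf"))

q, k = torch.randn(6, 16), torch.randn(6, 16)
print(attention_scores(q, k, "nope")[-1])
print(attention_scores(q, k, "alibi")[-1])
```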

Lm-infinite: Simple on-the-fly length generalization for large language models

C Han, Q Wang, W Xiong, Y Chen, H Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, there have been remarkable advancements in the performance of
Transformer-based Large Language Models (LLMs) across various domains. As these LLMs …
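
LM-Infinite's on-the-fly fix combines a Λ-shaped attention mask (a small always-visible prefix plus a sliding window of recent tokens) with a ceiling on the relative distances fed to the positional encoding. A minimal sketch of the mask half of the idea, assuming causal single-head attention (parameter names are illustrative):

```python
import torch

def lambda_shaped_mask(seq_len, n_global=4, n_local=512):
    """Boolean mask (True = may attend) with a Lambda-shaped pattern:
    each query attends to the first `n_global` tokens and to the most
    recent `n_local` tokens, and never to future tokens (causal)."""
    i = torch.arange(seq_len).unsqueeze(1)   # query index
    j = torch.arange(seq_len).unsqueeze(0)   # key index
    causal = j <= i
    global_branch = j < n_global             # always-visible prefix
    local_branch = (i - j) < n_local         # sliding window of recent tokens
    return causal & (global_branch | local_branch)

mask = lambda_shaped_mask(seq_len=8, n_global=2, n_local=3)
print(mask.int())
```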

The What, Why, and How of Context Length Extension Techniques in Large Language Models--A Detailed Survey

S Pawar, SM Tonmoy, SM Zaman, V Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Large Language Models (LLMs) represents a notable breakthrough in Natural
Language Processing (NLP), contributing to substantial progress in both text …

Length generalization in arithmetic transformers

S Jelassi, S d'Ascoli, C Domingo-Enrich, Y Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
We examine how transformers cope with two challenges: learning basic integer arithmetic,
and generalizing to longer sequences than seen during training. We find that relative …
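
The length-generalization setup here trains on operands up to some digit count and evaluates on strictly longer ones. A small sketch of that data split for integer addition (the exact formatting and tokenization in the paper differ; this only illustrates the train/test length protocol):

```python
import random

def addition_examples(n_examples, max_digits):
    """Generate 'a+b=' prompts paired with the answer, with operands of up
    to `max_digits` digits, as plain text sequences."""
    examples = []
    for _ in range(n_examples):
        a = random.randint(0, 10 ** max_digits - 1)
        b = random.randint(0, 10 ** max_digits - 1)
        examples.append((f"{a}+{b}=", str(a + b)))
    return examples

train = addition_examples(1000, max_digits=5)         # lengths seen in training
test_longer = addition_examples(200, max_digits=10)   # length-generalization eval
print(train[0], test_longer[0])
```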

Learning to reason and memorize with self-notes

J Lanchantin, S Toshniwal, J Weston… - Advances in Neural …, 2024 - proceedings.neurips.cc
Large language models have been shown to struggle with multi-step reasoning, and do not
retain previous reasoning steps for future use. We propose a simple method for solving both …
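
The method lets the model deviate from the input at any point to write a "self-note", which is inserted into the context so later steps can condition on it. A minimal sketch of that inference loop, assuming a hypothetical `model.next_token(context)` interface and placeholder `<note>` delimiters (all names below are illustrative, not the authors' implementation):

```python
START_NOTE, END_NOTE = "<note>", "</note>"

def read_with_self_notes(model, input_tokens, max_note_len=32):
    """Process the input token by token; whenever the model emits START_NOTE,
    let it generate a note (up to max_note_len tokens) that is spliced into
    the context so later steps can reuse it."""
    context = []
    for tok in input_tokens:
        context.append(tok)
        # Hypothetical API: returns the most likely next token given `context`.
        if model.next_token(context) == START_NOTE:
            context.append(START_NOTE)
            for _ in range(max_note_len):
                note_tok = model.next_token(context)
                context.append(note_tok)
                if note_tok == END_NOTE:
                    break
    return context  # input interleaved with generated self-notes

class NeverNotes:
    """Stand-in model that never starts a note, so the sketch runs as-is."""
    def next_token(self, context):
        return "<pad>"

print(read_with_self_notes(NeverNotes(), ["The", "capital", "of", "France"]))
```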

Kerple: Kernelized relative positional embedding for length extrapolation

TC Chi, TH Fan, PJ Ramadge… - Advances in Neural …, 2022 - proceedings.neurips.cc
Relative positional embeddings (RPE) have received considerable attention since RPEs
effectively model the relative distance among tokens and enable length extrapolation. We …
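
KERPLE derives relative positional biases from conditionally positive definite kernels of the token distance; its logarithmic variant subtracts r1 * log(1 + r2 * |m - n|) from the attention logits, with learnable r1, r2 > 0. A minimal sketch of that variant (the formula only, not the authors' implementation):

```python
import torch
import torch.nn as nn

class KerpleLogBias(nn.Module):
    """Logarithmic KERPLE bias: -r1 * log(1 + r2 * |m - n|), with r1, r2 > 0."""
    def __init__(self):
        super().__init__()
        # Parametrize in log space so r1 and r2 stay positive during training.
        self.log_r1 = nn.Parameter(torch.zeros(()))
        self.log_r2 = nn.Parameter(torch.zeros(()))

    def forward(self, scores):                        # scores: (..., L, L)
        L = scores.size(-1)
        dist = (torch.arange(L).unsqueeze(1) - torch.arange(L).unsqueeze(0)).abs()
        r1, r2 = self.log_r1.exp(), self.log_r2.exp()
        return scores - r1 * torch.log1p(r2 * dist)

bias = KerpleLogBias()
print(bias(torch.zeros(1, 5, 5))[0, -1])  # bias decays with token distance
```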

Positional description matters for transformers arithmetic

R Shen, S Bubeck, R Eldan, YT Lee, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers, central to the successes in modern Natural Language Processing, often falter
on arithmetic tasks despite their vast capabilities--which paradoxically include remarkable …
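
Work in this line attributes much of the difficulty to how digit positions are exposed to the model, and remedies range from changing the positional encoding to changing the textual format of the numbers. As one generic illustration of the format side (not necessarily this paper's exact recipe), emitting the answer least-significant digit first aligns each output digit with the operand digits and carry it depends on:

```python
def format_addition(a: int, b: int, reverse_answer: bool = True) -> str:
    """Render 'a+b=answer', optionally with the answer digits reversed so the
    model can emit the least-significant digit (and its carry) first."""
    answer = str(a + b)
    if reverse_answer:
        answer = answer[::-1]
    return f"{a}+{b}={answer}"

print(format_addition(357, 88))         # '357+88=544' (445 written in reverse)
print(format_addition(357, 88, False))  # '357+88=445'
```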

AdaMCT: adaptive mixture of CNN-transformer for sequential recommendation

J Jiang, P Zhang, Y Luo, C Li, JB Kim, K Zhang… - Proceedings of the …, 2023 - dl.acm.org
Sequential recommendation (SR) aims to model users' dynamic preferences from a series of
interactions. A pivotal challenge in user modeling for SR lies in the inherent variability of …
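
The core idea here is to combine a convolutional branch that captures local, short-term patterns with a self-attention branch that captures long-range ones, and to let the model learn how to balance the two. A minimal sketch of such a gated local/global mixture, assuming a per-channel sigmoid gate (AdaMCT's exact gating and layer design may differ):

```python
import torch
import torch.nn as nn

class LocalGlobalMix(nn.Module):
    """Adaptively mix a depthwise-convolution branch (local) with a
    self-attention branch (global) via a learned per-channel gate."""
    def __init__(self, dim, n_heads=4, kernel_size=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2,
                              groups=dim)           # depthwise convolution
        self.gate = nn.Parameter(torch.zeros(dim))  # sigmoid(0) = 0.5 at init

    def forward(self, x):                           # x: (batch, seq, dim)
        global_out, _ = self.attn(x, x, x)
        local_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        g = torch.sigmoid(self.gate)                # per-channel mixing weight
        return g * global_out + (1 - g) * local_out

block = LocalGlobalMix(dim=32)
print(block(torch.randn(2, 10, 32)).shape)          # torch.Size([2, 10, 32])
```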

SMLP4Rec: An Efficient all-MLP Architecture for Sequential Recommendations

J Gao, X Zhao, M Li, M Zhao, R Wu, R Guo… - ACM Transactions on …, 2024 - dl.acm.org
Self-attention models have achieved state-of-the-art performance in sequential
recommender systems by capturing the sequential dependencies among user–item …
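
An all-MLP sequential recommender replaces self-attention with MLPs that mix information along different axes of the input. A minimal MLP-Mixer-style block in that spirit, assuming a (batch, sequence, embedding) input (SMLP4Rec additionally mixes along a feature axis and differs in detail):

```python
import torch
import torch.nn as nn

class MLPMixBlock(nn.Module):
    """Token-mixing MLP over the sequence axis followed by a channel-mixing
    MLP over the embedding axis, each with a residual connection."""
    def __init__(self, seq_len, dim, hidden=64):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(nn.Linear(seq_len, hidden), nn.GELU(),
                                       nn.Linear(hidden, seq_len))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                                         nn.Linear(hidden, dim))

    def forward(self, x):                            # x: (batch, seq_len, dim)
        # Mix across item positions (sequence axis).
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        # Mix across embedding channels.
        return x + self.channel_mlp(self.norm2(x))

block = MLPMixBlock(seq_len=20, dim=32)
print(block(torch.randn(4, 20, 32)).shape)           # torch.Size([4, 20, 32])
```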