Tensor attention training: Provably efficient learning of higher-order transformers

Y Liang, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2405.16411, 2024 - arxiv.org
Tensor Attention, a multi-view attention that is able to capture high-order correlations among
multiple modalities, can overcome the representational limitations of classical matrix …

Conv-basis: A new paradigm for efficient attention inference and gradient computation in transformers

Y Liang, H Liu, Z Shi, Z Song, Z Xu, J Yin - arXiv preprint arXiv:2405.05219, 2024 - arxiv.org
The self-attention mechanism is the key to the success of transformers in recent Large
Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the …
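
(Illustrative aside, not code from the cited paper: the $O(n^2)$ cost mentioned in this snippet comes from the $n \times n$ score matrix of standard softmax attention. A minimal NumPy sketch of that naive computation is below; the function name and the sizes n and d are purely illustrative assumptions.)

    import numpy as np

    def naive_softmax_attention(Q, K, V):
        # Q, K, V: arrays of shape (n, d) for a single attention head.
        # The score matrix S below has shape (n, n), so time and memory
        # scale quadratically with the sequence length n.
        d = Q.shape[1]
        S = Q @ K.T / np.sqrt(d)              # (n, n) score matrix: the quadratic bottleneck
        S = S - S.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        A = np.exp(S)
        A = A / A.sum(axis=1, keepdims=True)  # row-wise softmax
        return A @ V                          # (n, d) output

    # Illustrative sizes only: doubling n quadruples the work spent on S.
    n, d = 1024, 64
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
    out = naive_softmax_attention(Q, K, V)

(Efficient-attention approaches such as the conv-basis paradigm above aim to avoid materializing this $n \times n$ matrix.)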

Multi-layer transformers gradient can be approximated in almost linear time

Y Liang, Z Sha, Z Shi, Z Song, Y Zhou - arXiv preprint arXiv:2408.13233, 2024 - arxiv.org
The computational complexity of the self-attention mechanism in popular transformer
architectures poses significant challenges for training and inference, and becomes the …

HSR-enhanced sparse attention acceleration

B Chen, Y Liang, Z Sha, Z Shi, Z Song - arXiv preprint arXiv:2410.10165, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
applications, but their performance on long-context tasks is often limited by the …

Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent

B Chen, X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2410.11268, 2024 - arxiv.org
In-context learning has been recognized as a key factor in the success of Large Language
Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in …

Differentially private attention computation

Y Gao, Z Song, X Yang, Y Zhou - arXiv preprint arXiv:2305.04701, 2023 - arxiv.org
Large language models (LLMs) have had a profound impact on numerous aspects of daily
life including natural language processing, content generation, research methodologies and …

Advancing the understanding of fixed point iterations in deep neural networks: A detailed analytical study

Y Ke, X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2410.11279, 2024 - arxiv.org
Recent empirical studies have identified fixed point iteration phenomena in deep neural
networks, where the hidden state tends to stabilize after several layers, showing minimal …

The computational limits of state-space models and Mamba via the lens of circuit complexity

Y Chen, X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2412.06148, 2024 - arxiv.org
In this paper, we analyze the computational limitations of Mamba and State-space Models
(SSMs) by using the circuit complexity framework. Despite Mamba's stateful design and …

On the expressive power of modern Hopfield networks

X Li, Y Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2412.05562, 2024 - arxiv.org
Modern Hopfield networks (MHNs) have emerged as powerful tools in deep learning,
capable of replacing components such as pooling layers, LSTMs, and attention …

Fast Second-order Method for Neural Networks under Small Treewidth Setting

X Li, J Long, Z Song, T Zhou - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Training neural networks is a fundamental problem in theoretical machine learning. Second-
order methods are rarely used in practice due to their high computational cost, even though they …