Universal neurons in GPT2 language models

W Gurnee, T Horsley, ZC Guo, TR Kheirkhah… - arXiv preprint arXiv …, 2024 - arxiv.org
A basic question within the emerging field of mechanistic interpretability is the degree to
which neural networks learn the same underlying mechanisms. In other words, are neural …
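
One way to make that question concrete (an illustrative sketch only, not necessarily the paper's exact procedure) is to record the activations of two independently trained models on the same inputs and ask, for each neuron in one model, how strongly its activation pattern correlates with its best-matching neuron in the other. A minimal NumPy version, where the array shapes and names are assumptions:

    # Illustrative sketch: does a neuron in model A have a "twin" in model B?
    # Shapes and variable names are assumptions, not taken from the paper.
    import numpy as np

    def max_neuron_correlation(acts_a, acts_b):
        """acts_a: (n_tokens, n_neurons_a); acts_b: (n_tokens, n_neurons_b).
        Returns each A-neuron's highest |Pearson correlation| over B-neurons."""
        a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
        b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
        corr = a.T @ b / a.shape[0]        # (n_a, n_b) correlation matrix
        return np.abs(corr).max(axis=1)    # best match per neuron in A

    acts_a = np.random.randn(1000, 64)     # stand-ins for recorded MLP activations
    acts_b = np.random.randn(1000, 64)
    print(max_neuron_correlation(acts_a, acts_b).mean())

Neuron pairs that stay highly correlated across training seeds are natural candidates for the kind of universality the title refers to.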

A primer on the inner workings of transformer-based language models

J Ferrando, G Sarti, A Bisazza, M Costa-jussà - 2024 - research.rug.nl
The rapid progress of research aimed at interpreting the inner workings of advanced
language models has highlighted a need for contextualizing the insights gained from years …

Prompting a pretrained transformer can be a universal approximator

A Petrov, PHS Torr, A Bibi - arXiv preprint arXiv:2402.14753, 2024 - arxiv.org
Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of
transformer models, our theoretical understanding of these fine-tuning methods remains …
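
Of the methods named in this snippet, prefix-tuning is the easiest to pin down in code: a short block of trainable vectors is prepended in embedding space while every pretrained weight stays frozen. A minimal sketch, assuming a Hugging Face GPT-2 backbone and an illustrative prefix length:

    # Prefix-tuning sketch: learn a "virtual prompt" in embedding space.
    # The GPT-2 backbone and prefix length are assumptions for illustration.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    for p in model.parameters():                # freeze all pretrained weights
        p.requires_grad = False

    n_prefix, d_model = 10, model.config.n_embd
    prefix = torch.nn.Parameter(torch.randn(n_prefix, d_model) * 0.02)

    def forward_with_prefix(input_ids):
        tok_emb = model.transformer.wte(input_ids)               # (B, T, d)
        pre = prefix.unsqueeze(0).expand(input_ids.size(0), -1, -1)
        return model(inputs_embeds=torch.cat([pre, tok_emb], dim=1))

    ids = tokenizer("a frozen model can be steered", return_tensors="pt").input_ids
    logits = forward_with_prefix(ids).logits    # gradients flow only into `prefix`

The theoretical question the title raises is then: how expressive can the input-to-output map be when only `prefix` is allowed to vary?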

On the role of attention masks and layernorm in transformers

X Wu, A Ajorlou, Y Wang, S Jegelka… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-attention is the key mechanism of transformers, which are the essential building blocks
of modern foundation models. Recent studies have shown that pure self-attention suffers …
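
For concreteness, the two objects in the title appear as follows in a standard decoder block: the attention mask blocks attention to future positions by setting those scores to -inf before the softmax, and LayerNorm wraps the self-attention sublayer. A minimal pre-LN sketch in PyTorch, with all names and dimensions chosen for illustration:

    # Causal attention mask + LayerNorm in a pre-LN self-attention sublayer.
    # Names and dimensions are illustrative, not taken from the paper.
    import math
    import torch

    def causal_self_attention(x, Wq, Wk, Wv):
        B, T, d = x.shape
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.transpose(-2, -1) / math.sqrt(d)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))   # the attention mask
        return torch.softmax(scores, dim=-1) @ v

    d = 16
    ln = torch.nn.LayerNorm(d)                             # the LayerNorm
    Wq, Wk, Wv = (torch.randn(d, d) / math.sqrt(d) for _ in range(3))
    x = torch.randn(2, 5, d)
    y = x + causal_self_attention(ln(x), Wq, Wk, Wv)       # residual connection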

Multi-scale topology and position feature learning and relationship-aware graph reasoning for prediction of drug-related microbes

P Xuan, J Gu, H Cui, S Wang, N Toshiya, C Liu… - …, 2024 - academic.oup.com
Motivation: The human microbiome may impact the effectiveness of drugs by modulating their activities and toxicities. Predicting candidate microbes for drugs can facilitate the …

Counting like transformers: Compiling temporal counting logic into softmax transformers

A Yang, D Chiang - arXiv preprint arXiv:2404.04393, 2024 - arxiv.org
Deriving formal bounds on the expressivity of transformers, as well as studying transformers
that are constructed to implement known algorithms, are both effective methods for better …

Decoder-only transformers: the brains behind generative AI, large language models and large multimodal models

D Naik, I Naik, N Naik - … on Computing, Communication, Cybersecurity & AI, 2024 - Springer
The rise of creative machines is attributed to generative AI, which has enabled machines to create new content, wherein the introduction of the advanced neural network architecture …

ChatGPT Is All You Need: Untangling Its Underlying AI Models, Architecture, Training Procedure, Capabilities, Limitations and Applications

I Naik, D Naik, N Naik - Authorea Preprints, 2024 - techrxiv.org
ChatGPT has now become a global phenomenon that has revolutionized the manner in
which machines interact with humans. It is a noteworthy enhancement in the field of …

Sentiment analysis of social media comments based on multimodal attention fusion network

Z Liu, T Yang, W Chen, J Chen, Q Li, J Zhang - Applied Soft Computing, 2024 - Elsevier
Social media comments are no longer confined to a single textual modality but are heterogeneous data spanning multiple modalities, such as vision, sound, and text, which is why multimodal sentiment …

CTNet: convolutional transformer network for diabetic retinopathy classification

R Bala, A Sharma, N Goel - Neural Computing and Applications, 2024 - Springer
Currently, diabetic retinopathy diagnosis tools use deep learning and machine learning algorithms for fundus image classification. Deep learning techniques, especially convolution …