Universal neurons in GPT2 language models

W Gurnee, T Horsley, ZC Guo, TR Kheirkhah… - arXiv preprint arXiv …, 2024 - arxiv.org
A basic question within the emerging field of mechanistic interpretability is the degree to
which neural networks learn the same underlying mechanisms. In other words, are neural …
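
One way to make that question concrete (an illustrative sketch only, not necessarily the paper's exact procedure) is to record the activations of two independently trained models on the same inputs and ask, for each neuron in one model, how strongly its activation pattern correlates with its best-matching neuron in the other. A minimal NumPy version, where the array shapes and names are assumptions:

    # Illustrative sketch: does a neuron in model A have a "twin" in model B?
    # Shapes and variable names are assumptions, not taken from the paper.
    import numpy as np

    def max_neuron_correlation(acts_a, acts_b):
        """acts_a: (n_tokens, n_neurons_a); acts_b: (n_tokens, n_neurons_b).
        Returns each A-neuron's highest |Pearson correlation| over B-neurons."""
        a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
        b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
        corr = a.T @ b / a.shape[0]        # (n_a, n_b) correlation matrix
        return np.abs(corr).max(axis=1)    # best match per neuron in A

    acts_a = np.random.randn(1000, 64)     # stand-ins for recorded MLP activations
    acts_b = np.random.randn(1000, 64)
    print(max_neuron_correlation(acts_a, acts_b).mean())

Neuron pairs that stay highly correlated across training seeds are natural candidates for the kind of universality the title refers to.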

A primer on the inner workings of transformer-based language models

J Ferrando, G Sarti, A Bisazza, M Costa-jussà - 2024 - research.rug.nl
The rapid progress of research aimed at interpreting the inner workings of advanced
language models has highlighted a need for contextualizing the insights gained from years …

Prompting a pretrained transformer can be a universal approximator

A Petrov, PHS Torr, A Bibi - arXiv preprint arXiv:2402.14753, 2024 - arxiv.org
Despite the widespread adoption of prompting, prompt tuning and prefix-tuning of
transformer models, our theoretical understanding of these fine-tuning methods remains …
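
Of the methods named in this snippet, prefix-tuning is the easiest to pin down in code: a short block of trainable vectors is prepended in embedding space while every pretrained weight stays frozen. A minimal sketch, assuming a Hugging Face GPT-2 backbone and an illustrative prefix length:

    # Prefix-tuning sketch: learn a "virtual prompt" in embedding space.
    # The GPT-2 backbone and prefix length are assumptions for illustration.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    model = GPT2LMHeadModel.from_pretrained("gpt2")
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    for p in model.parameters():                # freeze all pretrained weights
        p.requires_grad = False

    n_prefix, d_model = 10, model.config.n_embd
    prefix = torch.nn.Parameter(torch.randn(n_prefix, d_model) * 0.02)

    def forward_with_prefix(input_ids):
        tok_emb = model.transformer.wte(input_ids)               # (B, T, d)
        pre = prefix.unsqueeze(0).expand(input_ids.size(0), -1, -1)
        return model(inputs_embeds=torch.cat([pre, tok_emb], dim=1))

    ids = tokenizer("a frozen model can be steered", return_tensors="pt").input_ids
    logits = forward_with_prefix(ids).logits    # gradients flow only into `prefix`

The theoretical question the title raises is then: how expressive can the input-to-output map be when only `prefix` is allowed to vary?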

On the role of attention masks and layernorm in transformers

X Wu, A Ajorlou, Y Wang, S Jegelka… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-attention is the key mechanism of transformers, which are the essential building blocks
of modern foundation models. Recent studies have shown that pure self-attention suffers …
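
For concreteness, the two objects in the title appear as follows in a standard decoder block: the attention mask blocks attention to future positions by setting those scores to -inf before the softmax, and LayerNorm wraps the self-attention sublayer. A minimal pre-LN sketch in PyTorch, with all names and dimensions chosen for illustration:

    # Causal attention mask + LayerNorm in a pre-LN self-attention sublayer.
    # Names and dimensions are illustrative, not taken from the paper.
    import math
    import torch

    def causal_self_attention(x, Wq, Wk, Wv):
        B, T, d = x.shape
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.transpose(-2, -1) / math.sqrt(d)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))   # the attention mask
        return torch.softmax(scores, dim=-1) @ v

    d = 16
    ln = torch.nn.LayerNorm(d)                             # the LayerNorm
    Wq, Wk, Wv = (torch.randn(d, d) / math.sqrt(d) for _ in range(3))
    x = torch.randn(2, 5, d)
    y = x + causal_self_attention(ln(x), Wq, Wk, Wv)       # residual connection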

Multi-scale topology and position feature learning and relationship-aware graph reasoning for prediction of drug-related microbes

P Xuan, J Gu, H Cui, S Wang, N Toshiya, C Liu… - …, 2024 - academic.oup.com
Motivation: The human microbiome may impact the effectiveness of drugs by modulating their activities and toxicities. Predicting candidate microbes for drugs can facilitate the …

Counting like transformers: Compiling temporal counting logic into softmax transformers

A Yang, D Chiang - arXiv preprint arXiv:2404.04393, 2024 - arxiv.org
Deriving formal bounds on the expressivity of transformers, as well as studying transformers
that are constructed to implement known algorithms, are both effective methods for better …

Decoder-only transformers: the brains behind generative AI, large language models and large multimodal models

D Naik, I Naik, N Naik - … on Computing, Communication, Cybersecurity & AI, 2024 - Springer
The rise of creative machines is attributed to generative AI, which has enabled machines to create new content, wherein the introduction of the advanced neural network architecture …

ChatGPT Is All You Need: Untangling Its Underlying AI Models, Architecture, Training Procedure, Capabilities, Limitations and Applications

I Naik, D Naik, N Naik - Authorea Preprints, 2024 - techrxiv.org
ChatGPT has now become a global phenomenon that has revolutionized the manner in
which machines interact with humans. It is a noteworthy enhancement in the field of …

Sentiment analysis of social media comments based on multimodal attention fusion network

Z Liu, T Yang, W Chen, J Chen, Q Li, J Zhang - Applied Soft Computing, 2024 - Elsevier
Social media comments are no longer confined to a single textual modality but are heterogeneous data spanning multiple modalities, such as vision, sound, and text, which is why multimodal sentiment …

CTNet: convolutional transformer network for diabetic retinopathy classification

R Bala, A Sharma, N Goel - Neural Computing and Applications, 2024 - Springer
Currently, diabetic retinopathy diagnosis tools use deep learning and machine learning algorithms for fundus image classification. Deep learning techniques, especially convolution …