Driving with LLMs: Fusing object-level vector modality for explainable autonomous driving

L Chen, O Sinavski, J Hünermann… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) have shown promise in the autonomous driving sector,
particularly in generalization and interpretability. We introduce a unique object-level …

Gradient-based visual explanation for transformer-based CLIP

C Zhao, K Wang, X Zeng, R Zhao… - … on Machine Learning, 2024 - proceedings.mlr.press
Significant progress has been achieved on the improvement and downstream usages of the
Contrastive Language-Image Pre-training (CLIP) vision-language model, while less …

ToxiSpanSE: An explainable toxicity detection in code review comments

J Sarker, S Sultana, SR Wilson… - 2023 ACM/IEEE …, 2023 - ieeexplore.ieee.org
Background: The existence of toxic conversations in open-source platforms can degrade
relationships among software developers and may negatively impact software product …

EATFormer: Improving vision transformer inspired by evolutionary algorithm

J Zhang, X Li, Y Wang, C Wang, Y Yang, Y Liu… - International Journal of …, 2024 - Springer
Motivated by biological evolution, this paper explains the rationality of Vision Transformer by
analogy with the proven practical evolutionary algorithm (EA) and derives that both have …

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

J Wu, B Duan, W Kang, H Tang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
While Transformers have rapidly gained popularity in various computer vision applications,
post-hoc explanations of their internal mechanisms remain largely unexplored. Vision …

A simple interpretable transformer for fine-grained image classification and analysis

D Paul, A Chowdhury, X Xiong, FJ Chang… - arxiv preprint arxiv …, 2023 - arxiv.org
We present a novel usage of Transformers to make image classification interpretable. Unlike
mainstream classifiers that wait until the last fully connected layer to incorporate class …

SkipPLUS: Skip the First Few Layers to Better Explain Vision Transformers

F Mehri, M Fayyaz, MS Baghshah… - Proceedings of the …, 2024 - openaccess.thecvf.com
Despite their remarkable performance, the explainability of Vision Transformers (ViTs)
remains a challenge. While forward attention-based token attribution techniques have …

Reduction of class activation uncertainty with background information

HM Kabir - arxiv preprint arxiv:2305.03238, 2023 - arxiv.org
Multitask learning is a popular approach to training high-performing neural networks with
improved generalization. In this paper, we propose a background class to achieve improved …

Sparse-Tuning: Adapting vision transformers with efficient fine-tuning and inference

T Liu, X Liu, S Huang, L Shi, Z Xu, Y Xin, Q Yin… - arxiv preprint arxiv …, 2024 - arxiv.org
Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-
trained Vision Transformer (ViT) models to downstream applications. While current PEFT …

On the Faithfulness of Vision Transformer Explanations

J Wu, W Kang, H Tang, Y Hong… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
To interpret Vision Transformers, post-hoc explanations assign salience scores to
input pixels, providing human-understandable heatmaps. However, whether these …