Driving with LLMs: Fusing object-level vector modality for explainable autonomous driving
Large Language Models (LLMs) have shown promise in the autonomous driving sector,
particularly in generalization and interpretability. We introduce a unique object-level …
Gradient-based visual explanation for transformer-based CLIP
Significant progress has been achieved on the improvement and downstream usages of the
Contrastive Language-Image Pre-training (CLIP) vision-language model, while less …
ToxiSpanSE: An explainable toxicity detection in code review comments
Background: The existence of toxic conversations in open-source platforms can degrade
relationships among software developers and may negatively impact software product …
Eatformer: Improving vision transformer inspired by evolutionary algorithm
Motivated by biological evolution, this paper explains the rationality of Vision Transformer by
analogy with the proven practical evolutionary algorithm (EA) and derives that both have …
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
While Transformers have rapidly gained popularity in various computer vision applications,
post-hoc explanations of their internal mechanisms remain largely unexplored. Vision …
A simple interpretable transformer for fine-grained image classification and analysis
We present a novel usage of Transformers to make image classification interpretable. Unlike
mainstream classifiers that wait until the last fully connected layer to incorporate class …
SkipPLUS: Skip the First Few Layers to Better Explain Vision Transformers
Despite their remarkable performance, the explainability of Vision Transformers (ViTs)
remains a challenge. While forward attention-based token attribution techniques have …
Reduction of class activation uncertainty with background information
HM Kabir - arXiv preprint arXiv:2305.03238, 2023 - arxiv.org
Multitask learning is a popular approach to training high-performing neural networks with
improved generalization. In this paper, we propose a background class to achieve improved …
Sparse-Tuning: Adapting vision transformers with efficient fine-tuning and inference
Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-
trained Vision Transformer (ViT) models to downstream applications. While current PEFT …
On the Faithfulness of Vision Transformer Explanations
To interpret Vision Transformers, post-hoc explanations assign salience scores to
input pixels, providing human-understandable heatmaps. However, whether these …