Scene adaptive sparse transformer for event-based object detection

Y Peng, H Li, Y Zhang, X Sun… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
While recent Transformer-based approaches have shown impressive performances on
event-based object detection tasks, their high computational costs still diminish the low …

CDAC: Cross-domain attention consistency in transformer for domain adaptive semantic segmentation

K Wang, D Kim, R Feris… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
While transformers have greatly boosted performance in semantic segmentation, domain
adaptive transformers are not yet well explored. We identify that the domain gap can cause …

Lightweight convolutional neural networks with context broadcast transformer for real-time semantic segmentation

K Hu, Z Xie, Q Hu - Image and Vision Computing, 2024 - Elsevier
With the increasing application of embedded mobile devices in various fields, lightweight
real-time semantic segmentation systems have attracted more and more attention. Many …

WorldAfford: Affordance grounding based on natural language instructions

C Chen, Y Cong, Z Kan - 2024 IEEE 36th International …, 2024 - ieeexplore.ieee.org
Affordance grounding aims to localize the interaction regions for the manipulated objects in
the scene image according to given instructions, which is essential for Embodied AI and …

UnitNorm: Rethinking normalization for transformers in time series

N Huang, C Kümmerle, X Zhang - arXiv preprint arXiv:2405.15903, 2024 - arxiv.org
Normalization techniques are crucial for enhancing Transformer models' performance and
stability in time series analysis tasks, yet traditional methods like batch and layer …

A cross-modal collaborative guiding network for sarcasm explanation in multi-modal multi-party dialogues

X Zhuang, Z Li, C Zhang, H Ma - Engineering Applications of Artificial …, 2025 - Elsevier
Indirect forms of language, such as sarcasm, are highly prevalent in contemporary human
daily communication. While the indirect nature of metaphorical language ensures that …

When multi-view meets multi-level: A novel spatio-temporal transformer for traffic prediction

J Lin, Q Ren, X Lv, H Xu, Y Liu - Information Fusion, 2025 - Elsevier
Traffic prediction is a vital aspect of Intelligent Transportation Systems with widespread
applications. The main challenge is accurately modeling the complex spatial and temporal …

Dual-encoder network for pavement concrete crack segmentation with multi-stage supervision

J Wang, H Yao, J Hu, Y Ma, J Wang - Automation in Construction, 2025 - Elsevier
Cracks are a prevalent disease on pavement concrete materials. Timely assessment and
repair of concrete materials can significantly extend their service life. However, accurate …

Bring Adaptive Binding Prototypes to Generalized Referring Expression Segmentation

W Li, Z Zhao, H Bai, F Su - arXiv preprint arXiv:2405.15169, 2024 - arxiv.org
Referring Expression Segmentation (RES) has attracted rising attention, aiming to identify
and segment objects based on natural language expressions. While substantial progress …

A Multi-Modal Unified Representation Learning Framework with Masked Image Modeling for Remote Sensing Images

D Du, T Liu, Y Gu - IEEE Transactions on Geoscience and …, 2024 - ieeexplore.ieee.org
The coordinated utilization of diverse types of satellite sensors provides a more
comprehensive view of the Earth's surface. However, due to the significant heterogeneity …