An empirical study of training end-to-end vision-and-language transformers

ZY Dou, Y Xu, Z Gan, J Wang, S Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Vision-and-language (VL) pre-training has proven to be highly effective on various
VL downstream tasks. While recent work has shown that fully transformer-based VL models …

Learning deep transformer models for machine translation

Q Wang, B Li, T Xiao, J Zhu, C Li, DF Wong… - arXiv preprint arXiv …, 2019 - arxiv.org
Transformer is the state-of-the-art model in recent machine translation evaluations. Two
strands of research are promising to improve models of this kind: the first uses wide …

Improving image captioning by leveraging intra- and inter-layer global representation in transformer network

J Ji, Y Luo, X Sun, F Chen, G Luo, Y Wu… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Transformer-based architectures have shown great success in image captioning, where
object regions are encoded and then attended into the vectorial representations to guide the …
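
The snippet only hints at how the global representation is built, so the following is a minimal NumPy sketch of the general idea suggested by the title: pool the region features inside each encoder layer (intra-layer global) and then fuse the per-layer pooled vectors across layers (inter-layer global). Mean pooling and the softmax mixing weights are stand-in assumptions, not the paper's actual aggregation module.

```python
import numpy as np

def intra_layer_global(region_feats):
    """Intra-layer global vector: pool the object-region features of one encoder layer."""
    return region_feats.mean(axis=0)                         # (d,)

def inter_layer_global(per_layer_globals, mix_logits):
    """Inter-layer global vector: softmax-weighted fusion of the per-layer global vectors."""
    w = np.exp(mix_logits) / np.exp(mix_logits).sum()        # weights over layers
    return (w[:, None] * per_layer_globals).sum(axis=0)      # (d,)

# toy usage: 3 encoder layers, 10 object regions, 16-dim features
layer_outputs = [np.random.randn(10, 16) for _ in range(3)]
intra = np.stack([intra_layer_global(h) for h in layer_outputs])   # (3, 16)
global_vec = inter_layer_global(intra, np.zeros(3))                # (16,)
print(global_vec.shape)
```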

Bridgetower: Building bridges between encoders in vision-language representation learning

X Xu, C Wu, S Rosenman, V Lal, W Che… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Vision-Language (VL) models with the Two-Tower architecture have dominated visual-
language representation learning in recent years. Current VL models either use lightweight …

Modeling localness for self-attention networks

B Yang, Z Tu, DF Wong, F Meng, LS Chao… - arXiv preprint arXiv …, 2018 - arxiv.org
Self-attention networks have proven to be of profound value for their strength in capturing
global dependencies. In this work, we propose to model localness for self-attention …
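
The localness idea can be illustrated by adding a Gaussian bias, centred on each query position, to the attention logits before the softmax so that nearby keys are favoured. The NumPy sketch below uses a fixed centre and a hand-set width `sigma` as simplifying assumptions; in the paper these quantities are predicted by the model rather than fixed.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_self_attention(q, k, v, sigma=2.0):
    """Scaled dot-product attention with a Gaussian locality bias.

    q, k, v: (seq_len, d) arrays. The bias penalises key positions far from
    the query position, so each token attends mostly to its neighbourhood
    while still seeing the whole sequence.
    """
    seq_len, d = q.shape
    logits = q @ k.T / np.sqrt(d)                       # (seq_len, seq_len)
    pos = np.arange(seq_len)
    dist = pos[None, :] - pos[:, None]                  # key index minus query index
    bias = -(dist ** 2) / (2.0 * sigma ** 2)            # 0 at the centre, negative elsewhere
    return softmax(logits + bias, axis=-1) @ v

# toy usage
x = np.random.randn(6, 8)
print(local_self_attention(x, x, x, sigma=1.5).shape)   # (6, 8)
```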

Rethinking skip connection with layer normalization in transformers and resnets

F Liu, X Ren, Z Zhang, X Sun, Y Zou - arXiv preprint arXiv:2105.07205, 2021 - arxiv.org
Skip connection is a widely-used technique to improve the performance and the
convergence of deep neural networks, which is believed to relieve the difficulty in …
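
Since the entry above is about how skip connections interact with layer normalization, a small sketch of the two standard residual orderings this line of work studies may help. It is a generic NumPy illustration of Post-LN versus Pre-LN blocks, not the specific variant proposed in the paper.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def post_ln_block(x, sublayer):
    # original Transformer ordering: normalise after the residual addition
    return layer_norm(x + sublayer(x))

def pre_ln_block(x, sublayer):
    # alternative ordering: normalise the sublayer input, keep the skip path as identity
    return x + sublayer(layer_norm(x))

# toy sublayer and usage
w = np.random.randn(8, 8) * 0.1
ffn = lambda h: np.maximum(h @ w, 0.0)        # a tiny ReLU feed-forward sublayer
x = np.random.randn(4, 8)
print(post_ln_block(x, ffn).shape, pre_ln_block(x, ffn).shape)
```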

Multi-head attention with disagreement regularization

J Li, Z Tu, B Yang, MR Lyu, T Zhang - arXiv preprint arXiv:1810.10183, 2018 - arxiv.org
Multi-head attention is appealing for the ability to jointly attend to information from different
representation subspaces at different positions. In this work, we introduce a disagreement …
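
As a rough illustration of the disagreement idea, the sketch below computes the average pairwise cosine similarity between per-head outputs; adding this term (with a small weight) to the training loss would push heads apart. The paper defines disagreement over subspaces, attended positions, and outputs; this simplified cosine form on outputs only is an assumption for illustration.

```python
import numpy as np

def head_disagreement(head_outputs):
    """Average pairwise cosine similarity between per-head outputs.

    head_outputs: (num_heads, seq_len, d_head). Adding this value (times a
    small weight) to the training loss penalises heads that produce similar
    outputs, encouraging them to attend to different information.
    """
    h = head_outputs.reshape(head_outputs.shape[0], -1)
    h = h / (np.linalg.norm(h, axis=-1, keepdims=True) + 1e-8)
    sim = h @ h.T                                # (num_heads, num_heads)
    n = h.shape[0]
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))

# toy usage: 4 heads, 6 tokens, 16 dims per head
print(head_disagreement(np.random.randn(4, 6, 16)))
```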

On the diversity of multi-head attention

J Li, X Wang, Z Tu, MR Lyu - Neurocomputing, 2021 - Elsevier
Multi-head attention is appealing for the ability to jointly attend to information from different
representation subspaces at different positions. In this work, we propose two approaches to …

Convolutional self-attention networks

B Yang, L Wang, D Wong, LS Chao, Z Tu - arXiv preprint arXiv …, 2019 - arxiv.org
Self-attention networks (SANs) have drawn increasing interest due to their high
parallelization in computation and flexibility in modeling dependencies. SANs can be further …
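
One concrete way to give self-attention a convolution-like receptive field, as the title suggests, is to mask out keys beyond a fixed distance from each query. The NumPy sketch below does exactly that; the hard window mask is a simplification standing in for the paper's convolutional attention variants.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def windowed_self_attention(q, k, v, window=2):
    """Self-attention where each query only sees keys within +/- `window`
    positions, analogous to a 1D convolution's receptive field."""
    seq_len, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    pos = np.arange(seq_len)
    mask = np.abs(pos[None, :] - pos[:, None]) > window
    logits = np.where(mask, -1e9, logits)        # block attention outside the window
    return softmax(logits, axis=-1) @ v

# toy usage
x = np.random.randn(7, 8)
print(windowed_self_attention(x, x, x, window=1).shape)   # (7, 8)
```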

Context-aware self-attention networks

B Yang, J Li, DF Wong, LS Chao, X Wang… - Proceedings of the AAAI …, 2019 - ojs.aaai.org
The self-attention model has shown its flexibility in parallel computation and its effectiveness in
modeling both long- and short-term dependencies. However, it calculates the dependencies …
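
The snippet breaks off at "it calculates the dependencies …", but the underlying point is that vanilla self-attention scores each query-key pair in isolation. Below is a minimal, assumption-laden NumPy sketch of one way to inject context: mix a global context vector (here just the mean of the layer input) into the queries and keys before scoring. The fixed mixing weight `lam` stands in for the paper's learned gating.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def context_aware_attention(x, wq, wk, wv, lam=0.5):
    """Self-attention whose queries and keys are mixed with a global context
    vector (here simply the mean of the layer input); `lam` is a fixed mixing
    weight standing in for a learned gate."""
    c = x.mean(axis=0, keepdims=True)            # (1, d) global context
    q = (x + lam * c) @ wq
    k = (x + lam * c) @ wk
    v = x @ wv
    logits = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(logits, axis=-1) @ v

# toy usage
d = 8
rng = np.random.default_rng(0)
x = rng.normal(size=(5, d))
proj = lambda: rng.normal(size=(d, d)) * 0.1
print(context_aware_attention(x, proj(), proj(), proj()).shape)   # (5, 8)
```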