An empirical study of training end-to-end vision-and-language transformers
Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks. While recent work has shown that fully transformer-based VL models …
Learning deep transformer models for machine translation
Transformer is the state-of-the-art model in recent machine translation evaluations. Two strands of research are promising to improve models of this kind: the first uses wide …
Improving image captioning by leveraging intra-and inter-layer global representation in transformer network
Transformer-based architectures have shown great success in image captioning, where object regions are encoded and then attended into the vectorial representations to guide the …
Bridgetower: Building bridges between encoders in vision-language representation learning
Vision-Language (VL) models with the Two-Tower architecture have dominated visual-language representation learning in recent years. Current VL models either use lightweight …
Modeling localness for self-attention networks
Self-attention networks have proven to be of profound value for their strength in capturing global dependencies. In this work, we propose to model localness for self-attention …
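As a rough illustration of the localness idea, the sketch below adds a distance-based Gaussian penalty to the attention logits so that nearby positions receive more weight. The fixed `sigma` window and single-head setup are simplifying assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch: bias scaled dot-product attention toward nearby positions
# with a fixed Gaussian penalty on the logits (illustrative only).
import torch
import torch.nn.functional as F

def local_attention(q, k, v, sigma=2.0):
    """q, k, v: (seq_len, d) tensors for a single attention head."""
    d = q.size(-1)
    scores = q @ k.transpose(0, 1) / d ** 0.5              # scaled dot-product logits, (seq, seq)
    pos = torch.arange(q.size(0), dtype=torch.float32)
    dist = (pos[:, None] - pos[None, :]) ** 2               # squared distance between positions
    scores = scores - dist / (2 * sigma ** 2)                # Gaussian penalty favours nearby keys
    return F.softmax(scores, dim=-1) @ v
```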
Rethinking skip connection with layer normalization in transformers and resnets
Skip connection is a widely-used technique to improve the performance and the convergence of deep neural networks, which is believed to relieve the difficulty in …
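For background, the two standard placements of layer normalization around a transformer skip connection look roughly as follows; this is a generic pre-LN vs. post-LN sketch, not the specific scheme this paper proposes.

```python
# Illustrative sketch of post-LN vs. pre-LN residual sub-layers (PyTorch).
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Original Transformer ordering: residual add, then layer normalization."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return self.norm(x + self.sublayer(x))

class PreLNBlock(nn.Module):
    """Pre-LN ordering: normalize first, keep the skip path as a clean identity."""
    def __init__(self, d_model, sublayer):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        return x + self.sublayer(self.norm(x))
```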
Multi-head attention with disagreement regularization
Multi-head attention is appealing for the ability to jointly attend to information from different representation subspaces at different positions. In this work, we introduce a disagreement …
On the diversity of multi-head attention
Multi-head attention is appealing for the ability to jointly attend to information from different representation subspaces at different positions. In this work, we propose two approaches to …
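A hedged sketch of the general idea behind such disagreement or diversity terms: penalize how similar the heads' outputs are to one another and add that penalty to the training loss. The pairwise-cosine formulation and the `head_outputs` shape below are illustrative assumptions; the papers define several specific variants.

```python
# Illustrative disagreement-style regularizer over attention heads (PyTorch).
import torch
import torch.nn.functional as F

def head_disagreement_penalty(head_outputs):
    """head_outputs: (num_heads, seq_len, d_head); lower values mean more diverse heads."""
    h = F.normalize(head_outputs.flatten(1), dim=-1)   # one unit vector per head
    sim = h @ h.transpose(0, 1)                         # pairwise cosine similarities
    n = h.size(0)
    off_diag = sim.sum() - sim.diagonal().sum()         # drop each head's similarity with itself
    return off_diag / (n * (n - 1))                     # mean over ordered head pairs

# During training, something like
#   loss = task_loss + lam * head_disagreement_penalty(heads)
# would nudge the heads toward attending to different information.
```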
Convolutional self-attention networks
Self-attention networks (SANs) have drawn increasing interest due to their high parallelization in computation and flexibility in modeling dependencies. SANs can be further …
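One simple way to picture convolution-like locality in self-attention is to mask each query so it attends only to keys within a fixed window, as in the sketch below; the `window` parameter and hard masking are assumptions for illustration rather than the paper's actual design.

```python
# Illustrative windowed self-attention: each query sees only a local neighborhood.
import torch
import torch.nn.functional as F

def windowed_attention(q, k, v, window=3):
    """q, k, v: (seq_len, d); each query attends only to keys within `window` positions."""
    d = q.size(-1)
    scores = q @ k.transpose(0, 1) / d ** 0.5
    pos = torch.arange(q.size(0))
    mask = (pos[:, None] - pos[None, :]).abs() > window   # True where a key is out of range
    scores = scores.masked_fill(mask, float('-inf'))
    return F.softmax(scores, dim=-1) @ v
```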
Context-aware self-attention networks
The self-attention model has shown its flexibility in parallel computation and its effectiveness in modeling both long- and short-term dependencies. However, it calculates the dependencies …