Multimodal research in vision and language: A review of current and emerging trends
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …
Evaluation of text generation: A survey
A Celikyilmaz, E Clark, J Gao - arXiv preprint arXiv:2006.14799, 2020 - arxiv.org
The paper surveys evaluation methods of natural language generation (NLG) systems that
have been developed in the last few years. We group NLG evaluation methods into three …
DIP: Dual incongruity perceiving network for sarcasm detection
C Wen, G Jia, J Yang - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Sarcasm indicates the literal meaning is contrary to the real attitude. Considering the
popularity and complementarity of image-text data, we investigate the task of multi-modal …
A systematic literature review on image captioning
R Staniūtė, D Šešok - Applied Sciences, 2019 - mdpi.com
Natural language problems have already been investigated for around five years. Recent
progress in artificial intelligence (AI) has greatly improved the performance of models …
Language models can see: Plugging visual controls in text generation
Generative language models (LMs) such as GPT-2/3 can be prompted to generate text with
remarkable quality. While they are designed for text-prompted generation, it remains an …
Zero-shot video object segmentation with co-attention siamese networks
We introduce a novel network, called CO-attention siamese network (COSNet), to address
the zero-shot video object segmentation task in a holistic fashion. We exploit the inherent …
GPT-4V(ision) as a social media analysis engine
Recent research has offered insights into the extraordinary capabilities of Large Multimodal
Models (LMMs) in various general vision and language tasks. There is growing interest in …
Emotional video captioning with vision-based emotion interpretation network
Effectively summarizing and re-expressing video content in natural language in a more
human-like fashion is one of the key topics in the field of multimedia content understanding …
Style-aware contrastive learning for multi-style image captioning
Existing multi-style image captioning methods show promising results in generating a
caption with accurate visual content and desired linguistic style. However, existing methods …
Human-like controllable image captioning with verb-specific semantic roles
Controllable Image Captioning (CIC), i.e., generating image descriptions following
designated control signals--has received unprecedented attention over the last few years …