AutoAD: Movie description in context
The objective of this paper is an automatic Audio Description (AD) model that ingests movies
and outputs AD in text form. Generating high-quality movie AD is challenging due to the …
AAP-MIT: Attentive atrous pyramid network and memory incorporated transformer for multisentence video description
Generating multi-sentence descriptions for video is considered to be the most complex task
in computer vision and natural language understanding due to the intricate nature of video …
Trends in integration of vision and language research: A survey of tasks, datasets, and methods
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …
LMEye: An interactive perception network for large language models
Y Li, B Hu, X Chen, L Ma, Y Xu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Current efficient approaches to building Multimodal Large Language Models (MLLMs)
mainly incorporate visual information into LLMs with a simple visual mapping network such …
Compute to tell the tale: Goal-driven narrative generation
Man is by nature a social animal. One important facet of human evolution is through
narrative imagination, be it fictional or factual, and to tell the tale to other individuals. The …
Unified adaptive relevance distinguishable attention network for image-text matching
Image-text matching, as a fundamental cross-modal task, bridges the gap between vision
and language. The core is to accurately learn semantic alignment to find relevant shared …
What makes a good story and how can we measure it? A comprehensive survey of story evaluation
With the development of artificial intelligence, particularly the success of Large Language
Models (LLMs), the quantity and quality of automatically generated stories have significantly …
Shot2Story20K: A new benchmark for comprehensive understanding of multi-shot videos
A short clip of video may contain progression of multiple events and an interesting story line.
A human needs to capture both the event in every shot and associate them together to …
Image retrieval from contextual descriptions
The ability to integrate context, including perceptual and temporal cues, plays a pivotal role
in grounding the meaning of a linguistic utterance. In order to measure to what extent current …
Image difference captioning with instance-level fine-grained feature representation
The task of image difference captioning aims at locating changed objects in similar image
pairs and describing the difference with natural language. The key challenges of this task …