Knowledge graphs meet multi-modal learning: A comprehensive survey
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …
Neural prompt search
The size of vision models has grown exponentially over the last few years, especially after
the emergence of Vision Transformer. This has motivated the development of parameter …
V3Det: Vast vocabulary visual detection dataset
Recent advances in detecting arbitrary objects in the real world are trained and evaluated
on object detection datasets with a relatively restricted vocabulary. To facilitate the …
T-Rex2: Towards generic object detection via text-visual prompt synergy
We present T-Rex2, a highly practical model for open-set object detection. Previous open-
set object detection methods relying on text prompts effectively encapsulate the abstract …
Vision-RWKV: Efficient and scalable visual perception with RWKV-like architectures
Transformers have revolutionized computer vision and natural language processing, but
their high computational complexity limits their application in high-resolution image …
Octavius: Mitigating task interference in MLLMs via LoRA-MoE
Recent studies have demonstrated Large Language Models (LLMs) can extend their zero-
shot generalization capabilities to multimodal learning through instruction tuning. As more …
RADAM: Texture recognition through randomized aggregated encoding of deep activation maps
Texture analysis is a classical yet challenging task in computer vision for which deep neural
networks are actively being applied. Most approaches are based on building feature …
Open long-tailed recognition in a dynamic world
Real-world data often exhibits a long-tailed and open-ended (i.e., with unseen classes)
distribution. A practical recognition system must balance between majority (head) and …
Benchmarking omni-vision representation through the lens of visual realms
Though impressive performance has been achieved in specific visual realms (e.g., faces,
dogs, and places), an omni-vision representation generalizing to many natural visual …
ChEF: A comprehensive evaluation framework for standardized assessment of multimodal large language models
Multimodal Large Language Models (MLLMs) have shown impressive abilities in interacting
with visual content with myriad potential downstream tasks. However, even though a list of …