Pre-trained language models for text generation: A survey
Text Generation aims to produce plausible and readable text in human language from input
data. The resurgence of deep learning has greatly advanced this field, in particular, with the …
Video description: A comprehensive survey of deep learning approaches
Video description refers to understanding visual content and transforming that acquired
understanding into automatic textual narration. It bridges the key AI fields of computer vision …
Panda-70M: Captioning 70M videos with multiple cross-modality teachers
The quality of the data and annotation upper-bounds the quality of a downstream model.
While there exist large text corpora and image-text pairs, high-quality video-text data is much …
Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning
In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …
TimeChat: A time-sensitive multimodal large language model for long video understanding
This work proposes TimeChat, a time-sensitive multimodal large language model specifically
designed for long video understanding. Our model incorporates two key architectural …
Multimodal learning with transformers: A survey
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
PaLI-X: On scaling up a multilingual vision and language model
We present the training recipe and results of scaling up PaLI-X, a multilingual vision and
language model, both in terms of size of the components and the breadth of its training task …
Streaming dense video captioning
An ideal model for dense video captioning--predicting captions localized temporally in a
video--should be able to handle long input videos, predict rich, detailed textual descriptions …
Learning interactive real-world simulators
Generative models trained on internet data have revolutionized how text, image, and video
content can be created. Perhaps the next milestone for generative models is to simulate …
Reflect: Summarizing robot experiences for failure explanation and correction
The ability to detect and analyze failed executions automatically is crucial for an explainable
and robust robotic system. Recently, Large Language Models (LLMs) have demonstrated …