Foundations & trends in multimodal machine learning: Principles, challenges, and open questions
PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …
Large-scale multi-modal pre-trained models: A comprehensive survey
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …
Subject-driven text-to-image generation via apprenticeship learning
W Chen, H Hu, Y Li, N Ruiz, X Jia… - Advances in …, 2023 - proceedings.neurips.cc
Recent text-to-image generation models like DreamBooth have made remarkable progress
in generating highly customized images of a target subject, by fine-tuning an "expert …
Multimodal learning with transformers: A survey
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …
Artificial intelligence for science in quantum, atomistic, and continuum systems
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural
sciences. Today, AI has started to advance natural sciences by improving, accelerating, and …
Seeing what you said: Talking face generation guided by a lip reading expert
Talking face generation, also known as speech-to-lip generation, reconstructs facial motions
concerning lips given coherent speech input. The previous studies revealed the importance …
Verbs in action: Improving verb understanding in video-language models
Understanding verbs is crucial to modelling how people and objects interact with each other
and the environment through space and time. Recently, state-of-the-art video-language …
Effective conditioned and composed image retrieval combining CLIP-based features
Conditioned and composed image retrieval extend CBIR systems by combining a query
image with an additional text that expresses the intent of the user, describing additional …
ActionCLIP: Adapting language-image pretrained models for video action recognition
The canonical approach to video action recognition dictates a neural network model to do a
classic and standard 1-of-N majority vote task. They are trained to predict a fixed set of …