Google Scholar

A survey on video diffusion models

Z Xing, Q Feng, H Chen, Q Dai, H Hu, H Xu… - ACM Computing …, 2024 - dl.acm.org

The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …

Cited by 94 · All 3 versions

[PDF] thecvf.com

Chat-UniVi: Unified visual representation empowers large language models with image and video understanding

P Jin, R Takanobu, W Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large language models have demonstrated impressive universal capabilities across a wide
range of open-ended tasks and have extended their utility to encompass multimodal …

Cited by 170 · All 4 versions

[PDF] neurips.cc

MomentDiff: Generative video moment retrieval from random to real

P Li, CW Xie, H Xie, L Zhao, L Zhang… - Advances in Neural …, 2023 - proceedings.neurips.cc

Video moment retrieval pursues an efficient and generalized solution to identify the specific
temporal segments within an untrimmed video that correspond to a given language …

Cited by 64 · All 6 versions

[PDF] arxiv.org

Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

Cited by 61 · All 2 versions

[PDF] neurips.cc

Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs

P Jin, Y Wu, Y Fan, Z Sun, W Yang… - Advances in Neural …, 2023 - proceedings.neurips.cc

Most text-driven human motion generation methods employ sequential modeling
approaches, e.g., transformer, to extract sentence-level text representations automatically and …

Cited by 26 · All 5 versions

[PDF] thecvf.com

Progressive spatio-temporal prototype matching for text-video retrieval

P Li, CW Xie, L Zhao, H Xie, J Ge… - Proceedings of the …, 2023 - openaccess.thecvf.com

The performance of text-video retrieval has been significantly improved by vision-language
cross-modal learning schemes. The typical solution is to directly align the global video-level …

Cited by 34 · All 3 versions

[PDF] thecvf.com

UATVR: Uncertainty-adaptive text-video retrieval

B Fang, W Wu, C Liu, Y Zhou, Y Song… - Proceedings of the …, 2023 - openaccess.thecvf.com

With the explosive growth of web videos and emerging large-scale vision-language
pre-training models, e.g., CLIP, retrieving videos of interest with text instructions has attracted …

Cited by 48 · All 7 versions

[PDF] arxiv.org

Text-video retrieval with disentangled conceptualization and set-to-set alignment

P Jin, H Li, Z Cheng, J Huang, Z Wang, L Yuan… - arXiv preprint arXiv …, 2023 - arxiv.org

Text-video retrieval is a challenging cross-modal task, which aims to align visual entities with
natural language descriptions. Current methods either fail to leverage the local details or are …

Cited by 35 · All 4 versions

[PDF] mlr.press

Patch-level contrastive learning via positional query for visual pre-training

S Zhang, Q Zhou, Z Wang, F Wang… - … on Machine Learning, 2023 - proceedings.mlr.press

Dense contrastive learning (DCL) has been recently explored for learning localized
information for dense prediction tasks (e.g., detection and segmentation). It still suffers the …

Cited by 13 · All 6 versions

[PDF] arxiv.org

FreestyleRet: Retrieving images from style-diversified queries

H Li, Y Jia, P Jin, Z Cheng, K Li, J Sui, C Liu… - European Conference on …, 2024 - Springer

Image Retrieval aims to retrieve corresponding images based on a given query. In
application scenarios, users intend to express their retrieval intent through various query …

Cited by 9 · All 2 versions

Saved in My library:

DiffusionRet: Generative text-video retrieval with diffusion model

A survey on video diffusion models

Chat-UniVi: Unified visual representation empowers large language models with image and video understanding

MomentDiff: Generative video moment retrieval from random to real

Video understanding with large language models: A survey

Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs

Progressive spatio-temporal prototype matching for text-video retrieval

UATVR: Uncertainty-adaptive text-video retrieval

Text-video retrieval with disentangled conceptualization and set-to-set alignment

Patch-level contrastive learning via positional query for visual pre-training

FreestyleRet: Retrieving images from style-diversified queries