DiffusionRet: Generative text-video retrieval with diffusion model

P **, H Li, Z Cheng, K Li, X Ji, C Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates | query). While straightforward, this …

Cited by 63 · All 9 versions
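The snippet above describes the discriminative formulation that DiffusionRet argues against: maximizing p(candidates | query). As a minimal sketch of that baseline only (not the paper's generative method), the Python below scores candidate embeddings against a query embedding by cosine similarity and normalizes the scores with a temperature-scaled softmax. The function names, toy vectors, and the temperature of 0.05 are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def p_candidates_given_query(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    temperature: float = 0.05,  # assumed value, for illustration only
) -> List[float]:
    """Softmax over query-candidate similarities, i.e. p(candidate | query)."""
    logits = [cosine(query, c) / temperature for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def retrieval_nll(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    positive_index: int,
    temperature: float = 0.05,
) -> float:
    """Negative log-likelihood of the matching candidate (the quantity minimized)."""
    probs = p_candidates_given_query(query, candidates, temperature)
    return -math.log(probs[positive_index])


if __name__ == "__main__":
    q = [1.0, 0.0]
    cands = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(p_candidates_given_query(q, cands))
    print(retrieval_nll(q, cands, positive_index=0))
```

Minimizing retrieval_nll for the matching candidate is one common way to maximize the conditional likelihood named in the abstract; the model only learns to rank the given candidates and never models the query or video distribution itself.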

Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs

P **, Y Wu, Y Fan, Z Sun, W Yang… - Advances in Neural …, 2023 - proceedings.neurips.cc

Most text-driven human motion generation methods employ sequential modeling approaches, e.g., transformer, to extract sentence-level text representations automatically and …

Cited by 28 · All 6 versions

UATVR: Uncertainty-adaptive text-video retrieval

B Fang, W Wu, C Liu, Y Zhou, Y Song… - Proceedings of the …, 2023 - openaccess.thecvf.com

With the explosive growth of web videos and emerging large-scale vision-language pre-training models, e.g., CLIP, retrieving videos of interest with text instructions has attracted …

Cited by 50 · All 8 versions

FreestyleRet: Retrieving images from style-diversified queries

H Li, Y Jia, P **, Z Cheng, K Li, J Sui, C Liu… - European Conference on …, 2024 - Springer

Image retrieval aims to retrieve corresponding images based on a given query. In application scenarios, users intend to express their retrieval intent through various query …

Cited by 10 · All 7 versions

Discover and align taxonomic context priors for open-world semi-supervised learning

Y Wang, Z Zhong, P Qiao, X Cheng… - Advances in …, 2023 - proceedings.neurips.cc

Open-world Semi-Supervised Learning (OSSL) is a realistic and challenging task, aiming to classify unlabeled samples from both seen and novel classes using partially …

Cited by 9 · All 8 versions

Hierarchical Banzhaf interaction for general video-language representation learning

P **, H Li, L Yuan, S Yan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Multimodal representation learning, with contrastive learning, plays an important role in the artificial intelligence domain. As an important subfield, video-language representation …

Cited by 1 · All 7 versions

Weakly supervised Gaussian contrastive grounding with large multimodal models for video question answering

H Wang, C Lai, Y Sun, W Ge - … of the 32nd ACM International Conference …, 2024 - dl.acm.org

Video Question Answering (VideoQA) aims to answer natural language questions based on the information observed in videos. Despite the recent success of Large Multimodal Models …

Cited by 5 · All 4 versions

Cooperative game modeling with weighted token-level alignment for audio-text retrieval

Y **n, B Wang, L Shang - IEEE Signal Processing Letters, 2023 - ieeexplore.ieee.org

Previous audio-text retrieval (ATR) methods primarily concentrate on constructing contrastive pairs between entire audio clips and full caption sentences, while neglecting fine …

Cited by 7 · All 3 versions
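The snippet above characterizes earlier ATR methods as contrasting entire audio clips with full caption sentences. A minimal sketch of such a clip-level objective is shown below, assuming a symmetric InfoNCE loss over a batch in which the i-th audio embedding matches the i-th caption embedding. This is the coarse baseline the abstract criticizes, not the paper's weighted token-level alignment; the embedding format, function names, and temperature of 0.07 are assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _log_softmax_at(row: Sequence[float], idx: int) -> float:
    """log(softmax(row)[idx]) computed with the log-sum-exp trick."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse


def clip_level_infonce(
    audio: Sequence[Vector],
    text: Sequence[Vector],
    temperature: float = 0.07,  # assumed value
) -> float:
    """Symmetric InfoNCE over whole-clip and whole-caption embeddings.

    audio[i] and text[i] are assumed to be the positive pair; every other
    pairing in the batch serves as a negative.
    """
    a = [_normalize(v) for v in audio]
    t = [_normalize(v) for v in text]
    n = len(a)
    # sim[i][j]: scaled cosine similarity of audio clip i and caption j
    sim = [
        [sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # audio -> text: each row should put its mass on the diagonal
    a2t = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # text -> audio: each column should put its mass on the diagonal
    t2a = -sum(
        _log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (a2t + t2a)


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_level_infonce(audio, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs
    print(clip_level_infonce(audio, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs
```

Because each clip and caption is reduced to a single vector before comparison, word-level and frame-level correspondences are invisible to this loss, which is the gap the abstract points to.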

Toward 3D spatial reasoning for human-like text-based visual question answering

H Li, J Huang, P **, G Song, Q Wu, J Chen - arXiv preprint arXiv …, 2022 - arxiv.org

Text-based Visual Question Answering (TextVQA) aims to produce correct answers for given questions about the images with multiple scene texts. In most cases, the texts naturally …

Cited by 14 · All 2 versions

Single-shot pruning and quantization for hardware-friendly neural network acceleration

B Jiang, J Chen, Y Liu - Engineering Applications of Artificial Intelligence, 2023 - Elsevier

Applying CNNs on embedded systems is challenging due to model size limitations. Pruning and quantization can help, but are time-consuming to apply separately. Our Single-Shot …

Cited by 5 · All 3 versions
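The abstract notes that pruning and quantization are time-consuming when applied separately. The sketch below illustrates that separate two-step pipeline under simple assumptions: global magnitude pruning of a flat list of weights, followed by symmetric uniform quantization with a single scale. It is not the paper's Single-Shot method, and every function name and parameter here is hypothetical.

```python
from typing import List, Sequence, Tuple


def magnitude_prune(weights: Sequence[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)  # number of weights to set to zero
    pruned = list(weights)
    if k == 0:
        return pruned
    # indices ordered from smallest to largest absolute value
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_symmetric(
    weights: Sequence[float], bits: int = 8
) -> Tuple[List[int], float]:
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from integer codes."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.1, -0.5, 0.02, 0.9, -0.03]
    p = magnitude_prune(w, sparsity=0.4)  # step 1: prune
    q, s = quantize_symmetric(p, bits=8)  # step 2: quantize separately
    print(p, q, s, dequantize(q, s))
```

Pruned weights stay exactly zero after quantization, since round(0 / scale) is 0. In a real network each step would usually be followed by its own fine-tuning pass, which is one plausible source of the cost the abstract mentions; the snippet is truncated before it describes how the Single-Shot method avoids it.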

Saved to My library

TG-VQA: Ternary game of video question answering

DiffusionRet: Generative text-video retrieval with diffusion model

Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs

UATVR: Uncertainty-adaptive text-video retrieval

FreestyleRet: Retrieving images from style-diversified queries

Discover and align taxonomic context priors for open-world semi-supervised learning

Hierarchical Banzhaf interaction for general video-language representation learning

Weakly supervised Gaussian contrastive grounding with large multimodal models for video question answering

Cooperative game modeling with weighted token-level alignment for audio-text retrieval

Toward 3D spatial reasoning for human-like text-based visual question answering

Single-shot pruning and quantization for hardware-friendly neural network acceleration