Google Scholar

Bridging the gap: A unified video comprehension framework for moment retrieval and highlight detection

Y Xiao, Z Luo, Y Liu, Y Ma, H Bian… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video Moment Retrieval (MR) and Highlight Detection (HD) have attracted
significant attention due to the growing demand for video analysis. Recent approaches treat …

Cited by 32 · All 7 versions

[PDF] neurips.cc

Soc: Semantic-assisted object cluster for referring video object segmentation

Z Luo, Y Xiao, Y Liu, S Li, Y Wang… - Advances in …, 2023 - proceedings.neurips.cc

This paper studies referring video object segmentation (RVOS) by boosting video-level
visual-linguistic alignment. Recent approaches model the RVOS task as a sequence …

Cited by 41 · All 7 versions

[PDF] google.com

Etdnet: Efficient transformer-based detection network for surface defect detection

H Zhou, R Yang, R Hu, C Shu… - IEEE transactions on …, 2023 - ieeexplore.ieee.org

Deep learning (DL)-based surface defect detectors play a crucial role in ensuring product
quality during inspection processes. However, accurately and efficiently detecting defects …

Cited by 36 · All 6 versions

[PDF] neurips.cc

MambaTree: Tree Topology is All You Need in State Space Model

Y Xiao, L Song, J Wang, S Song… - Advances in Neural …, 2025 - proceedings.neurips.cc

The state space models, employing recursively propagated features, demonstrate strong
representation capabilities comparable to Transformer models and superior efficiency …

Cited by 1 · All 2 versions

[PDF] arxiv.org

Efficient prompt tuning of large vision-language model for fine-grained ship classification

L Lan, F Wang, X Zheng, Z Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Remote-sensing fine-grained ship classification (RS-FGSC) poses a significant challenge
due to the high similarity between classes and the limited availability of labeled data, limiting …

Cited by 4 · All 3 versions

[PDF] arxiv.org

Audio-free prompt tuning for language-audio models

Y Li, X Wang, H Liu - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org

Contrastive Language-Audio Pretraining (CLAP) is pre-trained to associate audio features
with human language, making it a natural zero-shot classifier to recognize unseen sound …

Cited by 10 · All 3 versions

[PDF] arxiv.org

Video object segmentation with dynamic query modulation

H Zhou, R Hu, X Li - 2024 IEEE International Conference on …, 2024 - ieeexplore.ieee.org

Storing intermediate frame segmentations as memory for long-range context modeling,
spatial-temporal memory-based methods have recently showcased impressive results in …

Cited by 1 · All 7 versions

Multimodal Isotropic Neural Architecture with Patch Embedding

H Truchan, E Naumov, R Abedin, G Palmer… - … Conference on Neural …, 2023 - Springer

Patch embedding has been a significant advancement in Transformer-based models,
particularly the Vision Transformer (ViT), as it enables handling larger image sizes and …

Cited by 2 · All 2 versions

Semanticac: semantics-assisted framework for audio classification
