Google Scholar

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arxiv preprint arxiv …, 2024 - arxiv.org

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Cited by 48 · All 2 versions

Neural prompt search

Y Zhang, K Zhou, Z Liu - IEEE Transactions on Pattern Analysis …, 2024 - ieeexplore.ieee.org

The size of vision models has grown exponentially over the last few years, especially after
the emergence of Vision Transformer. This has motivated the development of parameter …

Cited by 343 · All 10 versions

V3Det: Vast vocabulary visual detection dataset

J Wang, P Zhang, T Chu, Y Cao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent advances in detecting arbitrary objects in the real world are trained and evaluated
on object detection datasets with a relatively restricted vocabulary. To facilitate the …

Cited by 56 · All 6 versions

T-Rex2: Towards generic object detection via text-visual prompt synergy

Q Jiang, F Li, Z Zeng, T Ren, S Liu, L Zhang - European Conference on …, 2024 - Springer

We present T-Rex2, a highly practical model for open-set object detection. Previous open-
set object detection methods relying on text prompts effectively encapsulate the abstract …

Cited by 24 · All 4 versions

Vision-RWKV: Efficient and scalable visual perception with RWKV-like architectures

Y Duan, W Wang, Z Chen, X Zhu, L Lu, T Lu… - arxiv preprint arxiv …, 2024 - arxiv.org

Transformers have revolutionized computer vision and natural language processing, but
their high computational complexity limits their application in high-resolution image …

Cited by 38 · All 3 versions

Octavius: Mitigating task interference in MLLMs via LoRA-MoE

Z Chen, Z Wang, Z Wang, H Liu, Z Yin, S Liu… - arxiv preprint arxiv …, 2023 - arxiv.org

Recent studies have demonstrated Large Language Models (LLMs) can extend their zero-
shot generalization capabilities to multimodal learning through instruction tuning. As more …

Cited by 28 · All 5 versions

RADAM: Texture recognition through randomized aggregated encoding of deep activation maps

L Scabini, KM Zielinski, LC Ribas, WN Gonçalves… - Pattern Recognition, 2023 - Elsevier

Texture analysis is a classical yet challenging task in computer vision for which deep neural
networks are actively being applied. Most approaches are based on building feature …

Cited by 23 · All 8 versions

Open long-tailed recognition in a dynamic world

Z Liu, Z Miao, X Zhan, J Wang, B Gong… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Real-world data often exhibits a long-tailed and open-ended (i.e., with unseen classes)
distribution. A practical recognition system must balance between majority (head) and …

Cited by 24 · All 10 versions

Benchmarking omni-vision representation through the lens of visual realms

Y Zhang, Z Yin, J Shao, Z Liu - European Conference on Computer Vision, 2022 - Springer

Though impressive performance has been achieved in specific visual realms (e.g., faces,
dogs, and places), an omni-vision representation generalizing to many natural visual …

Cited by 27 · All 6 versions

ChEF: A comprehensive evaluation framework for standardized assessment of multimodal large language models

Z Shi, Z Wang, H Fan, Z Yin, L Sheng, Y Qiao… - arxiv preprint arxiv …, 2023 - arxiv.org

Multimodal Large Language Models (MLLMs) have shown impressive abilities in interacting
with visual content with myriad potential downstream tasks. However, even though a list of …

Cited by 9 · All 4 versions
