- Academic Search

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org

Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Simpan Kutip Dirujuk 1015 kali Artikel terkait 8 versi

[Free GPT-4]
[DeepSeek]

[HTML] sciencedirect.com

[HTML][HTML] Adversarial text-to-image synthesis: A review

S Frolov, T Hinz, F Raue, J Hees, A Dengel - Neural Networks, 2021 - Elsevier

With the advent of generative adversarial networks, synthesizing images from text
descriptions has recently become an active research area. It is a flexible and intuitive way for …

Simpan Kutip Dirujuk 224 kali Artikel terkait 9 versi

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Diffusiondet: Diffusion model for object detection

S Chen, P Sun, Y Song, P Luo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …

Simpan Kutip Dirujuk 489 kali Artikel terkait 5 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Beyond transmitting bits: Context, semantics, and task-oriented communications

D Gündüz, Z Qin, IE Aguerri, HS Dhillon… - IEEE Journal on …, 2022 - ieeexplore.ieee.org

Communication systems to date primarily aim at reliably communicating bit sequences.
Such an approach provides efficient engineering designs that are agnostic to the meanings …

Simpan Kutip Dirujuk 441 kali Artikel terkait 6 versi

[Free GPT-4]
[DeepSeek]

[PDF] ieee.org

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Simpan Kutip Dirujuk 125 kali Artikel terkait 3 versi

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Semantic communications: Principles and challenges

Z Qin, X Tao, J Lu, W Tong, GY Li - arxiv preprint arxiv:2201.01389, 2021 - arxiv.org

Semantic communication, regarded as the breakthrough beyond the Shannon paradigm,
aims at the successful transmission of semantic information conveyed by the source rather …

Simpan Kutip Dirujuk 391 kali Artikel terkait 2 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[HTML] sciencedirect.com

[HTML][HTML] Cpt: Colorful prompt tuning for pre-trained vision-language models

Y Yao, A Zhang, Z Zhang, Z Liu, TS Chua, M Sun - AI Open, 2024 - Elsevier

Abstract Vision-Language Pre-training (VLP) models have shown promising capabilities in
grounding natural language in image data, facilitating a broad range of cross-modal tasks …

Simpan Kutip Dirujuk 276 kali Artikel terkait 4 versi

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Compositional chain-of-thought prompting for large multimodal models

C Mitra, B Huang, T Darrell… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

The combination of strong visual backbones and Large Language Model (LLM) reasoning
has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range …

Simpan Kutip Dirujuk 70 kali Artikel terkait 3 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Imagine that! abstract-to-intricate text-to-image synthesis with scene graph hallucination diffusion

S Wu, H Fei, H Zhang, TS Chua - Advances in Neural …, 2024 - proceedings.neurips.cc

In this work, we investigate the task of text-to-image (T2I) synthesis under the abstract-to-
intricate setting, ie, generating intricate visual content from simple abstract text prompts …

Simpan Kutip Dirujuk 47 kali Artikel terkait 4 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Unbiased scene graph generation from biased training

K Tang, Y Niu, J Huang, J Shi… - Proceedings of the …, 2020 - openaccess.thecvf.com

Today's scene graph generation (SGG) task is still far from practical, mainly due to the
severe training bias, eg, collapsing diverse" human walk on/sit on/lay on beach" into" human …

Simpan Kutip Dirujuk 818 kali Artikel terkait 10 versi Versi HTML

Buat notifikasi

Kutip

Penelusuran lanjutan

Disimpan ke Koleksi saya

Image retrieval using scene graphs

A comprehensive survey of deep learning for image captioning

[HTML][HTML] Adversarial text-to-image synthesis: A review

Diffusiondet: Diffusion model for object detection

Beyond transmitting bits: Context, semantics, and task-oriented communications

Transformer-based visual segmentation: A survey

Semantic communications: Principles and challenges

[HTML][HTML] Cpt: Colorful prompt tuning for pre-trained vision-language models

Compositional chain-of-thought prompting for large multimodal models

Imagine that! abstract-to-intricate text-to-image synthesis with scene graph hallucination diffusion

Unbiased scene graph generation from biased training