Google Scholar

TJ Fu, W Hu, X Du, WY Wang, Y Yang… - arxiv preprint arxiv …, 2023 - arxiv.org

Instruction-based image editing improves the controllability and flexibility of image
manipulation via natural commands without elaborate descriptions or regional masks …

Cited by 95; all 6 versions

Towards semantic equivalence of tokenization in multimodal LLM

S Wu, H Fei, X Li, J Ji, H Zhang, TS Chua… - arxiv preprint arxiv …, 2024 - arxiv.org

Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in
processing vision-language tasks. One of the crux of MLLMs lies in vision tokenization …

Cited by 48; all 3 versions

DCCF: Deep comprehensible color filter learning framework for high-resolution image harmonization

B Xue, S Ran, Q Chen, R Jia, B Zhao… - European conference on …, 2022 - Springer

Image color harmonization algorithm aims to automatically match the color distribution of
foreground and background images captured in different conditions. Previous deep learning …

Cited by 67; all 9 versions

Towards generic image manipulation detection with weakly-supervised self-consistency learning

Y Zhai, T Luan, D Doermann… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

As advanced image manipulation techniques emerge, detecting the manipulation becomes
increasingly important. Despite the success of recent learning-based approaches for image …

Cited by 24; all 8 versions

Text-to-image cross-modal generation: A systematic review

M Żelaszczyk, J Mańdziuk - arxiv preprint arxiv:2401.11631, 2024 - arxiv.org

We review research on generating visual data from text from the angle of "cross-modal
generation." This point of view allows us to draw parallels between various methods geared …

Cited by 9; all 2 versions

Auto-encoding morph-tokens for multimodal LLM

K Pan, S Tang, J Li, Z Fan, W Chow, S Yan… - arxiv preprint arxiv …, 2024 - arxiv.org

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation
(visual output) presents an ongoing challenge. This is due to a conflicting objective: for …

Cited by 14; all 7 versions

A review of multi-modal learning from the text-guided visual processing viewpoint

U Ullah, JS Lee, CH An, H Lee, SY Park, RH Baek… - Sensors, 2022 - mdpi.com

For decades, co-relating different data domains to attain the maximum potential of machines
has driven research, especially in neural networks. Similarly, text and visual data (images …

Cited by 8; all 7 versions

Language-guided global image editing via cross-modal cyclic mechanism

W Jiang, N Xu, J Wang, C Gao, J Shi… - Proceedings of the …, 2021 - openaccess.thecvf.com

Editing an image automatically via a linguistic request can significantly save laborious
manual work and is friendly to photography novice. In this paper, we focus on the task of …

Cited by 29; all 6 versions

A regionally indicated visual grounding network for remote sensing images

R Hang, S Xu, Q Liu - IEEE Transactions on Geoscience and …, 2024 - ieeexplore.ieee.org

Visual grounding (VG) is essential to promote the human-computer interaction in object
detection tasks. Most of the current VG methods mainly focus on grounding the target objects …

Cited by 3; all 2 versions

LS-GAN: iterative language-based image manipulation via long and short term consistency reasoning

G Cong, L Li, Z Liu, Y Tu, W Qin, S Zhang… - Proceedings of the 30th …, 2022 - dl.acm.org

Iterative language-based image manipulation aims to edit images step by step according to
user's linguistic instructions. The existing methods mostly focus on aligning the attributes …

Cited by 15

Learning by planning: Language-guided global image editing

Guiding instruction-based image editing via multimodal large language models
