Google Scholar

Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org

Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

Cited by 56

Prefix-diffusion: A lightweight diffusion model for diverse image captioning

G Liu, Y Li, Z Fei, H Fu, X Luo, Y Guo - arXiv preprint arXiv:2309.04965, 2023 - arxiv.org

While impressive performance has been achieved in image captioning, the limited diversity
of the generated captions and the large parameter scale remain major barriers to the real …

Cited by 11

Attractive storyteller: Stylized visual storytelling with unpaired text

D Yang, Q Jin - Proceedings of the 61st Annual Meeting of the …, 2023 - aclanthology.org

Most research on stylized image captioning aims to generate style-specific captions using
unpaired text, and has achieved impressive performance for simple styles like positive and …

Cited by 8

A Character-Centric Creative Story Generation via Imagination

K Park, M Kim, K Jung - arXiv preprint arXiv:2409.16667, 2024 - arxiv.org

Creative story generation has long been a goal of NLP research. While existing
methodologies have aimed to generate long and coherent stories, they fall significantly short …

Which one are you referring to? Multimodal object identification in situated dialogue

H Lovenia, S Cahyawijaya, P Fung - arXiv preprint arXiv:2302.14680, 2023 - arxiv.org

The demand for multimodal dialogue systems has been rising in various domains,
emphasizing the importance of interpreting multimodal inputs from conversational and …

Cited by 1

VScript: Controllable script generation with visual presentation

Z Ji, Y Xu, I Cheng, S Cahyawijaya, R Frieske… - arXiv preprint arXiv …, 2022 - arxiv.org

In order to offer a customized script tool and inspire professional scriptwriters, we present
VScript. It is a controllable pipeline that generates complete scripts, including dialogues and …

Cited by 4

Style-unaware meta-learning for generalizable person re-identification

J Shao, P Cai - Journal of Electronic Imaging, 2024 - spiedigitallibrary.org

Due to the influence of domain bias, domain generalization person re-identification models
are not capable of generalizing well on unseen domains. The style factor is a critical factor …

Visualizing the Unseen: Arabic Image-to-Story Generation Using Deep Learning Techniques

E Saleh, C Sabty - Pacific Rim International Conference on Artificial …, 2024 - Springer

Images are integral to our digital experiences, and combining visual elements with verbal
storytelling is crucial. While English image captioning has progressed significantly, Arabic …
