Gengyuan Zhang

Citations

	All	Since 2020
Citations	210	210
h-index	5	5
i10-index	4	4

Citations per year

Year	2022	2023	2024	2025
Citations	8	31	153	16

Public access

2 articles available
0 articles not available

Based on funding mandates

Co-authors

Volker Tresp, Ludwig-Maximilians-Universität München (LMU Munich). Verified email at dbs.ifi.lmu.de
Jindong Gu, Google Research & DeepMind, University of Oxford. Verified email at robots.ox.ac.uk
Zhen Han, Amazon Web Services. Verified email at amazon.com

Gengyuan Zhang

Ludwig-Maximilians-Universität München

Verified email at dbs.ifi.lmu.de

Multimodal learning, Video Understanding, Vision-Language Model


Title	Authors	Venue	Cited by	Year
A systematic survey of prompt engineering on vision-language foundation models	J Gu, Z Han, S Chen, A Beirami, B He, G Zhang, R Liao, Y Qin, V Tresp, ...	arXiv preprint arXiv:2307.12980, 2023	140	2023
Time-dependent entity embedding is not all you need: A re-evaluation of temporal knowledge graph completion models under a unified framework	Z Han, G Zhang, Y Ma, V Tresp	Proceedings of the 2021 Conference on Empirical Methods in Natural Language …, 2021	23	2021
Multi-event Video-Text Retrieval	G Zhang, J Ren, J Gu, V Tresp	Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023	12	2023
Cl-crossvqa: A continual learning benchmark for cross-domain visual question answering	Y Zhang, H Chen, A Frikha, Y Yang, D Krompass, G Zhang, J Gu, V Tresp	arXiv preprint arXiv:2211.10567, 2022	11	2022
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning	G Zhang, Y Zhang, K Zhang, V Tresp	Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024	9	2024
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs	R Liao, M Erler, H Wang, G Zhai, G Zhang, Y Ma, V Tresp	arXiv preprint arXiv:2409.20365, 2024	5	2024
A systematic survey of prompt engineering on vision-language foundation models. arXiv	J Gu, Z Han, S Chen, A Beirami, B He, G Zhang, P Torr	arXiv preprint arXiv:2307.12980, 2023	5	2023
Multimodal pragmatic jailbreak on text-to-image models	T Liu, Z Lai, G Zhang, P Torr, V Demberg, V Tresp, J Gu	arXiv preprint arXiv:2409.19149, 2024	3	2024
Localizing Events in Videos with Multimodal Queries	G Zhang, MLA Fok, Y Xia, Y Tang, D Cremers, P Torr, V Tresp, J Gu	arXiv preprint arXiv:2406.10079, 2024	1	2024
SPOT! Revisiting Video-Language Models for Event Understanding	G Zhang, J Bi, J Gu, Y Chen, V Tresp	arXiv preprint arXiv:2311.12919, 2023	1	2023
Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries	R Amoroso, G Zhang, R Koner, L Baraldi, R Cucchiara, V Tresp	Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2025		2025
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models	H Chen, H Li, Y Zhang, G Zhang, J Bi, P Torr, J Gu, D Krompass, V Tresp	arXiv preprint arXiv:2410.04810, 2024		2024
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning Supplementary Materials	G Zhang, Y Zhang, K Zhang, V Tresp	AD WikiTiLo Middle East 11, 16, 0		

Articles 1–13
