- Academic Search

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arxiv preprint arxiv:2403.14608, 2024 - arxiv.org

Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Simpan Kutip Dirujuk 254 kali Artikel terkait 2 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] acm.org

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Simpan Kutip Dirujuk 79 kali Artikel terkait

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Llama-adapter v2: Parameter-efficient visual instruction model

P Gao, J Han, R Zhang, Z Lin, S Geng, A Zhou… - arxiv preprint arxiv …, 2023 - arxiv.org

How to efficiently transform large language models (LLMs) into instruction followers is
recently a popular research direction, while training LLM for multi-modal reasoning remains …

Simpan Kutip Dirujuk 542 kali Artikel terkait 3 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Simpan Kutip Dirujuk 219 kali Artikel terkait 6 versi Pencarian Perpustakaan Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

One small step for generative ai, one giant leap for agi: A complete survey on chatgpt in aigc era

C Zhang, C Zhang, C Li, Y Qiao, S Zheng… - arxiv preprint arxiv …, 2023 - arxiv.org

OpenAI has recently released GPT-4 (aka ChatGPT plus), which is demonstrated to be one
small step for generative AI (GAI), but one giant leap for artificial general intelligence (AGI) …

Simpan Kutip Dirujuk 172 kali Artikel terkait 5 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Frozen in time: A joint video and image encoder for end-to-end retrieval

M Bain, A Nagrani, G Varol… - Proceedings of the …, 2021 - openaccess.thecvf.com

Our objective in this work is video-text retrieval-in particular a joint embedding that enables
efficient text-to-video retrieval. The challenges in this area include the design of the visual …

Simpan Kutip Dirujuk 1177 kali Artikel terkait 12 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

HallusionBench: an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models

T Guan, F Liu, X Wu, R **an, Z Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce" HallusionBench" a comprehensive benchmark designed for the evaluation of
image-context reasoning. This benchmark presents significant challenges to advanced large …

Simpan Kutip Dirujuk 149 kali Artikel terkait 3 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Large language models are visual reasoning coordinators

L Chen, B Li, S Shen, J Yang, C Li… - Advances in …, 2024 - proceedings.neurips.cc

Visual reasoning requires multimodal perception and commonsense cognition of the world.
Recently, multiple vision-language models (VLMs) have been proposed with excellent …

Simpan Kutip Dirujuk 36 kali Artikel terkait 5 versi Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] researchgate.net

[PDF][PDF] HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

T Guan, F Liu, X Wu, R **an, Z Li, X Liu… - arxiv preprint arxiv …, 2023 - researchgate.net

Large language models (LLMs), after being aligned with vision models and integrated into
vision-language models (VLMs), can bring impressive improvement in image reasoning …

Simpan Kutip Dirujuk 113 kali Artikel terkait Versi HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

PP Liang, A Zadeh, LP Morency - arxiv preprint arxiv:2209.03430, 2022 - arxiv.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Simpan Kutip Dirujuk 153 kali Artikel terkait 2 versi Versi HTML

Buat notifikasi

Kutip

Penelusuran lanjutan

Disimpan ke Koleksi saya

Show and tell: Lessons learned from the 2015 mscoco image captioning challenge

Parameter-efficient fine-tuning for large models: A comprehensive survey

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

Llama-adapter v2: Parameter-efficient visual instruction model

Multimodal foundation models: From specialists to general-purpose assistants

One small step for generative ai, one giant leap for agi: A complete survey on chatgpt in aigc era

Frozen in time: A joint video and image encoder for end-to-end retrieval

HallusionBench: an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models

Large language models are visual reasoning coordinators

[PDF][PDF] HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions