Google Наука

D Xu, W Chen, W Peng, C Zhang, T Xu, X Zhao… - Frontiers of Computer …, 2024 - Springer

Abstract Information Extraction (IE) aims to extract structural knowledge from plain natural
language texts. Recently, generative Large Language Models (LLMs) have demonstrated …

Запазване Позоваване С позовавания в 145 Сродни статии Всички 9 версии

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Phi-3 technical report: A highly capable language model locally on your phone

M Abdin, J Aneja, H Awadalla, A Awadallah… - arxiv preprint arxiv …, 2024 - arxiv.org

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion
tokens, whose overall performance, as measured by both academic benchmarks and …

Запазване Позоваване С позовавания в 878 Сродни статии Всички 3 версии Във вид на HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Llava-onevision: Easy visual task transfer

B Li, Y Zhang, D Guo, R Zhang, F Li, H Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed
by consolidating our insights into data, models, and visual representations in the LLaVA …

Запазване Позоваване С позовавания в 306 Сродни статии Всички 3 версии Във вид на HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Internlm-xcomposer-2.5: A versatile large vision language model supporting long-contextual input and output

P Zhang, X Dong, Y Zang, Y Cao, R Qian… - arxiv preprint arxiv …, 2024 - arxiv.org

We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that
supports long-contextual input and output. IXC-2.5 excels in various text-image …

Запазване Позоваване С позовавания в 83 Сродни статии Всички 3 версии Във вид на HTML

[Free GPT-4]
[DeepSeek]

[PDF] openreview.net

mplug-owl3: Towards long image-sequence understanding in multi-modal large language models

J Ye, H Xu, H Liu, A Hu, M Yan, Q Qian… - The Thirteenth …, 2024 - openreview.net

Multi-modal Large Language Models have demonstrated remarkable capabilities in
executing instructions for a variety of single-image tasks. Despite this progress, significant …

Запазване Позоваване С позовавания в 60 Сродни статии Всички 3 версии Във вид на HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Video instruction tuning with synthetic data

Y Zhang, J Wu, W Li, B Li, Z Ma, Z Liu, C Li - arxiv preprint arxiv …, 2024 - arxiv.org

The development of video large multimodal models (LMMs) has been hindered by the
difficulty of curating large amounts of high-quality raw data from the web. To address this, we …

Запазване Позоваване С позовавания в 47 Сродни статии Всички 4 версии Във вид на HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arxiv preprint arxiv …, 2023 - arxiv.org

With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

Запазване Позоваване С позовавания в 63 Сродни статии Всички 3 версии Във вид на HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Longvila: Scaling long-context visual language models for long videos

F Xue, Y Chen, D Li, Q Hu, L Zhu, X Li, Y Fang… - arxiv preprint arxiv …, 2024 - arxiv.org

Long-context capability is critical for multi-modal foundation models, especially for long
video understanding. We introduce LongVILA, a full-stack solution for long-context visual …

Запазване Позоваване С позовавания в 34 Сродни статии Всички 4 версии Във вид на HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Kangaroo: A powerful video-language model supporting long-context video input

J Liu, Y Wang, H Ma, X Wu, X Ma, X Wei, J Jiao… - arxiv preprint arxiv …, 2024 - arxiv.org

Rapid advancements have been made in extending Large Language Models (LLMs) to
Large Multi-modal Models (LMMs). However, extending input modality of LLMs to video data …

Запазване Позоваване С позовавания в 34 Сродни статии Всички 3 версии Във вид на HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

Vita: Towards open-source interactive omni multimodal llm

C Fu, H Lin, Z Long, Y Shen, M Zhao, Y Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org

The remarkable multimodal capabilities and interactive experience of GPT-4o underscore
their necessity in practical applications, yet open-source models rarely excel in both areas …

Запазване Позоваване С позовавания в 52 Сродни статии Всички 3 версии Във вид на HTML

Създаване на сигнал

Позоваване

Разширено търсене

Запазено в „Моята библиотека“

Video-mme: The first-ever comprehensive evaluation benchmark of multi-modal llms in video analysis

Large language models for generative information extraction: A survey

Phi-3 technical report: A highly capable language model locally on your phone

Llava-onevision: Easy visual task transfer

Internlm-xcomposer-2.5: A versatile large vision language model supporting long-contextual input and output

mplug-owl3: Towards long image-sequence understanding in multi-modal large language models

Video instruction tuning with synthetic data

Video understanding with large language models: A survey

Longvila: Scaling long-context visual language models for long videos

Kangaroo: A powerful video-language model supporting long-context video input

Vita: Towards open-source interactive omni multimodal llm