- Academic Search

C Chen, K Shu - AI Magazine, 2024‏ - Wiley Online Library‏

Misinformation such as fake news and rumors is a serious threat for information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …‏

Cited by 140 - All 7 versions

[PDF] mdpi.com

Human action recognition: A taxonomy-based survey, updates, and opportunities‏

MG Morshed, T Sultana, A Alam, YK Lee - Sensors, 2023‏ - mdpi.com‏

Human action recognition systems use data collected from a wide range of sensors to
accurately identify and interpret human actions. One of the most challenging issues for …‏

Cited by 93 - All 13 versions

[PDF] thecvf.com

Mvbench: A comprehensive multi-modal video understanding benchmark‏

K Li, Y Wang, Y He, Y Li, Y Wang… - Proceedings of the …, 2024‏ - openaccess.thecvf.com‏

With the rapid development of Multi-modal Large Language Models (MLLMs) a number of
diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities …‏

Cited by 280 - All 8 versions

[PDF] arxiv.org

Videochat: Chat-centric video understanding‏

KC Li, Y He, Y Wang, Y Li, W Wang, P Luo… - arXiv preprint arXiv:…, 2023 - arxiv.org

… developing an end-to-end chat-centric video
understanding system, coined as VideoChat. It integrates video foundation models and …‏

Cited by 596 - All 6 versions

[PDF] arxiv.org

Video-chatgpt: Towards detailed video understanding via large vision and language models‏

M Maaz, H Rasheed, S Khan, FS Khan - arXiv preprint arXiv:2306.05424, 2023 - arxiv.org

Conversation agents fueled by Large Language Models (LLMs) are providing a new way to
interact with visual data. While there have been initial attempts for image-based …‏

Cited by 602 - All 6 versions

[PDF] mlr.press

Scaling vision transformers to 22 billion parameters‏

M Dehghani, J Djolonga, B Mustafa… - International …, 2023‏ - proceedings.mlr.press‏

The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …‏

Cited by 589 - All 9 versions

[PDF] thecvf.com

Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks‏

Z Chen, J Wu, W Wang, W Su, G Chen… - Proceedings of the …, 2024‏ - openaccess.thecvf.com‏

The exponential growth of large language models (LLMs) has opened up numerous
possibilities for multi-modal AGI systems. However the progress in vision and vision …‏

Cited by 188 - All 8 versions

[PDF] thecvf.com

Panda-70m: Captioning 70m videos with multiple cross-modality teachers‏

TS Chen, A Siarohin, W Menapace… - Proceedings of the …, 2024‏ - openaccess.thecvf.com‏

The quality of the data and annotation upper-bounds the quality of a downstream model.
While there exist large text corpora and image-text pairs high-quality video-text data is much …‏

Cited by 144 - All 8 versions

[PDF] arxiv.org

Videomamba: State space model for efficient video understanding‏

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024‏ - Springer‏

Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …‏

Cited by 161 - All 7 versions

[PDF] neurips.cc

Egoschema: A diagnostic benchmark for very long-form video language understanding‏

K Mangalam, R Akshulakov… - Advances in Neural …, 2023‏ - proceedings.neurips.cc‏

We introduce EgoSchema, a very long-form video question-answering dataset, and
benchmark to evaluate long video understanding capabilities of modern vision and …‏

Cited by 186 - All 6 versions

Internvideo: General video foundation models via generative and discriminative learning

Combating misinformation in the age of llms: Opportunities and challenges‏
