Video understanding with large language models: A survey
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …
A survey on multimodal benchmarks: In the era of large AI models
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …
Progress-Aware Video Frame Captioning
While image captioning provides isolated descriptions for individual images, and video
captioning offers one single narrative for an entire video clip, our work explores an important …
Describe Now: User-Driven Audio Description for Blind and Low Vision Individuals
Audio descriptions (AD) make videos accessible for blind and low vision (BLV) users by
describing visual elements that cannot be understood from the main audio track. AD created …
SemiKong: Curating, Training, and Evaluating A Semiconductor Industry-Specific Large Language Model
Large Language Models (LLMs) have demonstrated the potential to address some issues
within the semiconductor industry. However, they are often general-purpose models that …
Coherent Physical Commonsense Reasoning in Foundational Language Models
S Storks - 2024 - deepblue.lib.umich.edu
Recent years in natural language processing (NLP) research have seen a paradigm shift
toward foundational language models (LMs), which are self-supervised, transformer-based …
User-Driven Automated Audio Description to Enhance Video Accessibility for Blind and Low Vision Users
MS Cheema - 2024 - search.proquest.com
Audio descriptions (AD) make videos accessible for blind and low vision (BLV) users by
describing visual elements that cannot be understood from the main audio track. AD created …
State-Aware Object Understanding
NW Nguyen - 2024 - nguyennm1024.github.io
The advent of potent multimodal large language models alongside expansive datasets has
markedly advanced visual understanding tasks. While the bulk of research in this domain …