A survey of video-based human action recognition in team sports
Over the past few decades, numerous studies have focused on identifying and recognizing
human actions using machine learning and computer vision techniques. Video-based …
Medical video generation for disease progression simulation
Modeling disease progression is crucial for improving the quality and efficacy of clinical
diagnosis and prognosis, but it is often hindered by a lack of longitudinal medical image …
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
Current video generation models excel at generating short clips but still struggle with
creating multi-shot, movie-like videos. Existing models trained on large-scale data on the …
Video-to-Audio Generation with Fine-grained Temporal Semantics
With recent advances in AIGC, video generation has gained a surge of research interest in
both academia and industry (e.g., Sora). However, it remains a challenge to produce …
A Survey of Sustainability in Large Language Models: Applications, Economics, and Challenges
Large Language Models (LLMs) have transformed numerous domains by providing
advanced capabilities in natural language understanding, generation, and reasoning …
Memory-enhanced hierarchical transformer for video paragraph captioning
B Zhang, J Gao, Y Yuan - Neurocomputing, 2025 - Elsevier
Video paragraph captioning aims to describe a video that contains multiple events with a
paragraph of generated coherent sentences. Such a captioning task is full of challenges …
Dual Variational Knowledge Attention for Class Incremental Vision Transformer
Class incremental learning (CIL) strives to emulate the human cognitive process of
continuously learning and adapting to new tasks while retaining knowledge from past …
Adversarial Attacks Against Shared Knowledge Interpretation in Semantic Communications
VT Hoang, VL Nguyen, RG Chang… - IEEE Transactions …, 2025 - ieeexplore.ieee.org
Semantic communications (SEMCOM) is a novel communication model that exploits neural
networks or deep learning techniques to convey the semantics of the data and contextual …
Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
Z Qian, A Sharifi, T Carroll, SN Lim - arXiv preprint arXiv:2411.18644, 2024 - arxiv.org
Video generation has achieved impressive quality, but it still suffers from artifacts such as
temporal inconsistency and violation of physical laws. Leveraging 3D scenes can …
Seeing World Dynamics in a Nutshell
We consider the problem of efficiently representing casually captured monocular videos in a
spatially- and temporally-coherent manner. While existing approaches predominantly rely on …