A review of deep learning for video captioning
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …
Video description: A comprehensive survey of deep learning approaches
Video description refers to understanding visual content and transforming that acquired
understanding into automatic textual narration. It bridges the key AI fields of computer vision …
Adapt: Action-aware driving caption transformer
End-to-end autonomous driving has great potential in the transportation industry. However,
the lack of transparency and interpretability of the automatic decision-making process …
Video captioning using global-local representation
Video captioning is a challenging task as it needs to accurately transform visual
understanding into natural language description. To date, state-of-the-art methods …
Exploring group video captioning with efficient relational approximation
Current video captioning efforts mostly focus on describing a single video, while the need for
captioning videos in groups has increased considerably. In this study, we propose a new …
Refined semantic enhancement towards frequency diffusion for video captioning
Video captioning aims to generate natural language sentences that describe the given video
accurately. Existing methods obtain favorable generation by exploring richer visual …
TAVT: Towards Transferable Audio-Visual Text Generation
Audio-visual text generation aims to understand multi-modality contents and translate them
into texts. Although various transfer learning techniques of text generation have been …
Dyadformer: A multi-modal transformer for long-range modeling of dyadic interactions
Personality computing has become an emerging topic in computer vision, due to the wide
range of applications it can be used for. However, most works on the topic have focused on …
Evolution of visual data captioning Methods, Datasets, and evaluation Metrics: A comprehensive survey
Automatic Visual Captioning (AVC) generates syntactically and semantically correct
sentences by describing important objects, attributes, and their relationships with each other …
Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods
Generating an image/video caption has always been a fundamental problem of Artificial
Intelligence, which is usually performed using the potential of Deep Learning Methods …