Unravelling the impact of generative artificial intelligence (GAI) in industrial applications: A review of scientific and grey literature
The scope of application of generative artificial intelligence (GAI) in industrial functions is
gaining high prominence in academic and industrial discourses. In this article, we explore …
Natural language processing (NLP) in management research: A literature review
Natural language processing (NLP) is gaining momentum in management research for its
ability to automatically analyze and comprehend human language. Yet, despite its extensive …
End-to-end generative pretraining for multimodal video captioning
Recent video and language pretraining frameworks lack the ability to generate sentences.
We present Multimodal Video Generative Pretraining (MV-GPT), a new pretraining …
End-to-end dense video captioning with parallel decoding
Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …
Spatio-temporal graph for video captioning with knowledge distillation
Video captioning is a challenging task that requires a deep understanding of visual scenes.
State-of-the-art methods generate captions using either scene-level or object-level …
Localizing moments in video with natural language
We consider retrieving a specific temporal segment, or moment, from a video given a natural
language text description. Methods designed to retrieve whole video clips with natural …
MSR-VTT: A large video description dataset for bridging video and language
While there has been increasing interest in the task of describing video with natural
language, current computer vision algorithms are still severely limited in terms of the …
Visual relationship detection with language priors
Visual relationships capture a wide variety of interactions between pairs of objects in images
(e.g., “man riding bicycle” and “man pushing bicycle”). Consequently, the set of possible …
CLIP4Caption: CLIP for video caption
Video captioning is a challenging task since it requires generating sentences describing
various diverse and complex videos. Existing video captioning models lack adequate visual …
Single-shot multi-person 3D pose estimation from monocular RGB
We propose a new single-shot method for multi-person 3D pose estimation in general
scenes from a monocular RGB camera. Our approach uses novel occlusion-robust pose …