A short review on supervised machine learning and deep learning techniques in computer vision
AA Nafea, SA Alameri, RR Majeed… - … Journal of Machine …, 2024 - mesopotamian.press
In recent years, computer vision has shown important advances, mainly through the application of
supervised machine learning (ML) and deep learning (DL) techniques. The objective of this …
Video description: A survey of methods, datasets, and evaluation metrics
Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …
Video recap: Recursive captioning of hour-long videos
Most video captioning models are designed to process short video clips of a few seconds and
output text describing low-level visual concepts (e.g., objects, scenes, atomic actions). However …
Howto100m: Learning a text-video embedding by watching hundred million narrated video clips
Learning text-video embeddings usually requires a dataset of video clips with manually
provided captions. However, such datasets are expensive and time consuming to create and …
Object relational graph with teacher-recommended learning for video captioning
Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …
Videos as space-time region graphs
How do humans recognize the action "opening a book"? We argue that there are two
important cues: modeling temporal shape dynamics and modeling functional relationships …
Hierarchical conditional relation networks for video question answering
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill
dynamic visual artifacts and distant relations and to associate them with linguistic concepts …
STAT: Spatial-temporal attention mechanism for video captioning
Video captioning refers to automatically generating natural language sentences that
summarize the video contents. Inspired by the visual attention mechanism of human beings …
Video question answering via gradually refined attention over appearance and motion
Recently image question answering (ImageQA) has gained lots of attention in the research
community. However, as its natural extension, video question answering (VideoQA) is less …
Video captioning with attention-based LSTM and semantic consistency
Recent progress in using long short-term memory (LSTM) for image captioning has
motivated the exploration of their applications for video captioning. By taking a video as a …