A short review on supervised machine learning and deep learning techniques in computer vision

AA Nafea, SA Alameri, RR Majeed… - … Journal of Machine …, 2024 - mesopotamian.press
In last years, computer vision has shown important advances, mainly using the application of
supervised machine learning (ML) and deep learning (DL) techniques. The objective of this …

Video description: A survey of methods, datasets, and evaluation metrics

N Aafaq, A Mian, W Liu, SZ Gilani, M Shah - ACM Computing Surveys …, 2019 - dl.acm.org
Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, hel** the …

Video recap: Recursive captioning of hour-long videos

MM Islam, N Ho, X Yang, T Nagarajan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Most video captioning models are designed to process short video clips of few seconds and
output text describing low-level visual concepts (eg objects scenes atomic actions). However …

Object relational graph with teacher-recommended learning for video captioning

Z Zhang, Y Shi, C Yuan, B Li, P Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …

Videos as space-time region graphs

X Wang, A Gupta - Proceedings of the European …, 2018 - openaccess.thecvf.com
How do humans recognize the action" opening a book"? We argue that there are two
important cues: modeling temporal shape dynamics and modeling functional relationships …

Hierarchical conditional relation networks for video question answering

TM Le, V Le, S Venkatesh… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill
dynamic visual artifacts and distant relations and to associate them with linguistic concepts …

STAT: Spatial-temporal attention mechanism for video captioning

C Yan, Y Tu, X Wang, Y Zhang, X Hao… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
Video captioning refers to automatic generate natural language sentences, which
summarize the video contents. Inspired by the visual attention mechanism of human beings …

Video question answering via gradually refined attention over appearance and motion

D Xu, Z Zhao, J **ao, F Wu, H Zhang, X He… - Proceedings of the 25th …, 2017 - dl.acm.org
Recently image question answering (ImageQA) has gained lots of attention in the research
community. However, as its natural extension, video question answering (VideoQA) is less …

Video captioning with attention-based LSTM and semantic consistency

L Gao, Z Guo, H Zhang, X Xu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Recent progress in using long short-term memory (LSTM) for image captioning has
motivated the exploration of their applications for video captioning. By taking a video as a …