FACT: Frame-action cross-attention temporal modeling for efficient action segmentation

Z Lu, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We study supervised action segmentation whose goal is to predict framewise action labels
of a video. To capture temporal dependencies over long horizons, prior works either improve …

Error detection in egocentric procedural task videos

SP Lee, Z Lu, Z Zhang, M Hoai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a new egocentric procedural error dataset containing videos with various types
of errors as well as normal videos and propose a new framework for procedural error …

Deep learning for surgical workflow analysis: a survey of progresses, limitations, and trends

Y Li, Z Zhao, R Li, F Li - Artificial Intelligence Review, 2024 - Springer
Automatic surgical workflow analysis, which aims to recognize the ongoing surgical events
in videos, is fundamental for developing context-aware computer-assisted systems. This …

Progress-aware online action segmentation for egocentric procedural task videos

Y Shen, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …

OphCLIP: Hierarchical retrieval-augmented learning for ophthalmic surgical video-language pretraining

M Hu, K Yuan, Y Shen, F Tang, X Xu, L Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Surgical practice involves complex visual interpretation, procedural skills, and advanced
medical knowledge, making surgical vision-language pretraining (VLP) particularly …

Surgformer: Surgical transformer with hierarchical temporal attention for surgical phase recognition

S Yang, L Luo, Q Wang, H Chen - International Conference on Medical …, 2024 - Springer
Existing state-of-the-art methods for surgical phase recognition either rely on the extraction
of spatial-temporal features at a short-range temporal resolution or adopt the sequential …

AI solutions for overcoming delays in telesurgery and telementoring to enhance surgical practice and education

Y Li, N Raison, S Ourselin, T Mahmoodi… - Journal of robotic …, 2024 - Springer
Artificial intelligence (AI) has emerged as a transformative tool in surgery, particularly in
telesurgery and telementoring. However, its potential to enhance data transmission …

On the pitfalls of Batch Normalization for end-to-end video learning: A study on surgical workflow analysis

D Rivoir, I Funke, S Speidel - Medical Image Analysis, 2024 - Elsevier
Batch Normalization's (BN) unique property of depending on other samples in a batch is
known to cause problems in several tasks, including sequential modeling. Yet, BN-related …

TUNeS: A temporal U-Net with self-attention for video-based surgical phase recognition

I Funke, D Rivoir, S Krell… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Objective: To enable context-aware computer assistance in the operating room of the future,
cognitive systems need to understand automatically which surgical phase is being …

MuST: Multi-scale Transformers for Surgical Phase Recognition

A Pérez, S Rodríguez, N Ayobi, N Aparicio… - … Conference on Medical …, 2024 - Springer
Phase recognition in surgical videos is crucial for enhancing computer-aided surgical
systems as it enables automated understanding of sequential procedural stages. Existing …