Encoding laparoscopic image to words using vision transformer for distortion classification and ranking in laparoscopic videos

Author:

AlDahoul Nouar, Karim Hezerul Abdul, Momo Mhd Adel, Tan Myles Joshua Toledo, Fermin Jamie Ledesma

Abstract

Laparoscopic videos guide surgeons during minimally invasive procedures, in which narrow tubes are inserted into the abdomen so that large skin incisions are avoided. The videos captured by the camera are prone to numerous distortions, such as uneven illumination, motion blur, defocus blur, smoke, and noise, which degrade visual quality. Automatic detection and identification of these distortions is important for enhancing the quality of laparoscopic videos and avoiding errors during surgery. Video quality assessment includes two stages: classification of the distortions affecting the video frames to identify their types, and ranking of the distortions to estimate their intensity levels. The dataset of laparoscopic videos generated for the ICIP2020 challenge was used for training, validating, and testing the proposed solution. The dataset is difficult because it has five categories of distortion and four levels of severity. Additionally, the presence of multiple distortion categories in one video is considered its most challenging aspect. The work presented in this paper contributes to solving the multi-label distortion classification and ranking problem and aims to improve on the performance of existing distortion classification solutions. A vision transformer, a deep learning model, was used to extract informative features by transferring learning and representation from the general domain to the medical domain (laparoscopic videos). Six parallel multilayer perceptron (MLP) classifiers were attached to the vision transformer for distortion classification and ranking. The experiments showed that the proposed solution outperforms existing distortion classification methods in terms of average accuracy (89.7%), average single-distortion F1 score (94.18%), and average F1 score over both single and multiple distortions (96.86%). Moreover, it can rank the distortions with an average accuracy of 79.22% and an average F1 score of 78.44%. Hence, the high performance of the proposed method opens the door to integrating the solution into an intelligent video enhancement system.
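
The arrangement described in the abstract, one shared feature vector feeding six parallel MLP classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation. The feature vector below is a random stand-in for a vision transformer embedding, the weights are untrained, and the layout of the heads (six heads named head_1 to head_6, each with four outputs to mirror the four severity levels) is a hypothetical assumption, since the abstract does not say how the six classifiers divide the classification and ranking tasks.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create one dense layer as (weights, bias) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_head(in_dim, hidden_dim, out_dim, rng):
    """Create one MLP head: a hidden layer followed by an output layer."""
    return make_layer(in_dim, hidden_dim, rng), make_layer(hidden_dim, out_dim, rng)


def linear(x, layer):
    """Apply a dense layer to the vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def head_forward(features, head):
    """Run one head: dense layer, ReLU, dense layer, softmax."""
    hidden_layer, output_layer = head
    hidden = [max(0.0, v) for v in linear(features, hidden_layer)]
    return softmax(linear(hidden, output_layer))


def predict(features, heads):
    """Feed the same shared features to every head; return the argmax per head."""
    labels = {}
    for name, head in heads.items():
        probs = head_forward(features, head)
        labels[name] = max(range(len(probs)), key=probs.__getitem__)
    return labels


rng = random.Random(0)
FEATURE_DIM = 16   # assumption: a real ViT embedding is much larger
HIDDEN_DIM = 8     # assumption: hidden width chosen only for illustration
NUM_HEADS = 6      # from the abstract: six parallel MLP classifiers
NUM_OUTPUTS = 4    # assumption: one output per severity level

# Hypothetical head names; the real task assigned to each head is not given.
heads = {f"head_{i}": make_head(FEATURE_DIM, HIDDEN_DIM, NUM_OUTPUTS, rng)
         for i in range(1, NUM_HEADS + 1)}

# Random stand-in for the feature vector a vision transformer would produce.
features = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]

predictions = predict(features, heads)
print(predictions)
```

The point of the sketch is only the structure: the backbone is run once per frame, and each head reads the same features independently, which is what makes the heads parallel. With untrained weights the printed labels carry no meaning.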

Funder

Multimedia University

Publisher

Springer Science and Business Media LLC
