Affiliations:
1. The Sverzhevskiy Otorhinolaryngology Healthcare Research Institute
2. The Russian National Research Medical University named after N.I. Pirogov
3. Rubedo LLC
Abstract
BACKGROUND: Timely and accurate diagnosis is the foundation of effective treatment. The authors show in their study that otolaryngologists are incorrect in approximately one-quarter of their diagnoses, while general practitioners (internists, pediatricians, and paramedics) are incorrect in approximately one-half. Such errors lead to complications, progression to chronic disease, longer treatment and rehabilitation, reduced work capacity in the population, and a decline in patient confidence [1].
Abroad, artificial intelligence tools are being actively introduced in otorhinolaryngology. The most prevalent application is computer vision, used as a tool for training and, subsequently, for the diagnosis and treatment of diseases of the ear, nose, and throat. According to the Ministry of Health of the Russian Federation, on average more than 6% of the national population consults an otorhinolaryngologist each year for pathology of the external and middle ear, which corresponds to approximately 9 million individuals requiring such a consultation annually. In otorhinolaryngology, images obtained from endoscopic examinations of patients (e.g., videolaryngoscopy) are used to train neural networks [2–4].
Introducing technologies based on artificial intelligence algorithms into clinical practice is a priority of medical technology development and requires a careful and balanced approach to building and training such systems.
AIM: The study aimed to develop and train a neural network (artificial intelligence algorithms) to detect ear pathology from digital endoscopic images.
MATERIALS AND METHODS: The first phase of the research was the creation of a digital database of endoscopic photographs. For this purpose, anonymized endoscopic videos of normal and pathologically altered tympanic membranes were collected during standard otosurgical appointments. The next step was to establish a system of criteria for evaluating the images for subsequent annotation. A diagnostic tree of ear diseases based on visual features was constructed to develop a reasoning algorithm for identifying the condition (normal/pathological) of the external auditory canal and tympanic membrane. The subjectivity of image evaluation was mitigated by collegial review in a consilium format.
To train the neural network, the research team acquired, uploaded, and labeled 5,750 digital endoscopic images in JPEG format. Of these, 750 images showed the external auditory canal with an unaltered tympanic membrane, and 5,000 showed pathological alterations. The images were labeled according to the established criteria for evaluating visual features, which were then used to assign each image a nosological diagnosis or normal status.
RESULTS: The main metrics evaluated were specificity, accuracy, and sensitivity. Across the 11 classes (normal and 10 different nosologies), the metrics varied considerably: specificity ranged from 0.846 to 0.982, accuracy from 0.422 to 0.950, and sensitivity from 0.433 to 0.900.
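The abstract does not state how the per-class metrics were derived. A common approach for a multi-class classifier is one-vs-rest evaluation from a confusion matrix, which the following minimal sketch assumes. The function name `one_vs_rest_metrics` is hypothetical, and the 3-class confusion matrix is invented toy data, not study results. Precision is computed alongside accuracy only for comparison; the abstract itself reports specificity, accuracy, and sensitivity.

```python
from typing import Dict, List


def one_vs_rest_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Per-class one-vs-rest metrics from a square confusion matrix.

    cm[i][j] is the number of images whose true class is i and whose
    predicted class is j. (Hypothetical helper for illustration only.)
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    nan = float("nan")
    results = []
    for k in range(n):
        tp = cm[k][k]                              # class k, predicted k
        fn = sum(cm[k]) - tp                       # class k, predicted other
        fp = sum(cm[i][k] for i in range(n)) - tp  # other class, predicted k
        tn = total - tp - fn - fp                  # everything else
        results.append({
            "sensitivity": tp / (tp + fn) if tp + fn else nan,
            "specificity": tn / (tn + fp) if tn + fp else nan,
            "precision": tp / (tp + fp) if tp + fp else nan,
            "accuracy": (tp + tn) / total if total else nan,
        })
    return results


# Toy 3-class confusion matrix (made-up numbers, NOT data from the study).
toy_cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]

metrics = one_vs_rest_metrics(toy_cm)
for k, m in enumerate(metrics):
    print(k, {name: round(value, 3) for name, value in m.items()})
```

With 11 classes the same function would take an 11 × 11 matrix and return one set of metrics per class, and the minimum and maximum of each metric across classes would give ranges of the kind reported above.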
CONCLUSIONS: This study demonstrates the feasibility of developing and training a neural network to assess the condition of the external auditory canal and tympanic membrane. Collecting high-quality images is not the only crucial component; competent data annotation and the creation of a “tree of diagnoses” based on visual features are equally important. Further improvement in the accuracy of recognizing the main ear diseases can serve as the basis for a medical decision support system and provide direct assistance in practical medicine.