Affiliation:
1. Russian Scientific Center of Roentgenoradiology
2. Expert Institute
3. United IT Space LLC
4. “Expert” Group of Companies
5. Burdenko Voronezh State Medical University
6. Remedy Logic
7. Decision Support Systems LLC
Abstract
Objective: to compare the output of a set of trained convolutional neural network (CNN) models with radiologists' interpretation of pathological changes in the lumbar spine on magnetic resonance imaging (MRI).
Material and methods. More than 12,000 anonymized study archives from patients over 18 years of age were collected to build the training and test datasets for the neural networks. Each archive contained series in two planes, including T2-TSE, T1-TSE and fat-suppressed T2 sequences. The selected studies were then labeled in two steps: manual labeling followed by validation by experts. CNN training was performed separately for analysis of normal findings, qualitative detection of individual pathological changes, and quantitative analysis. Model accuracy was verified in two stages by comparing the reports of five radiologists with the output of the CNN models. The first, intermediate stage evaluated the accuracy of the neural networks in detecting disc bulges, protrusions and extrusions, spinal canal stenosis, lateral stenosis, foraminal stenosis, spondylolisthesis and facet joint arthrosis. The final stage covered the same pathologies and additionally tested the accuracy of detecting degenerative changes of the endplates, synovitis of the intervertebral joints, intervertebral disc degeneration, osteophytes, transitional vertebrae, ligamentum flavum hypertrophy and Schmorl's nodes. The reference value for every pathological change considered in this paper was determined by majority vote and, in case of disagreement, by an external radiologist. The radiologists' interpretations were then compared with those of the trained models.
Results. 
For binary classification (presence/absence) of individual degenerative changes of the lumbosacral spine, the artificial intelligence (AI) models achieved sensitivity and specificity against the reference result comparable to those of a group of experienced radiologists. The sensitivity and specificity of the AI results were 0.88 and 0.97 for extrusions, 0.81 and 0.94 for protrusions, 0.87 and 0.98 for central stenosis, 0.83 and 0.85 for lateral stenosis, 0.92 and 0.84 for foraminal stenosis, 0.85 and 0.5 for osteoarthritis, 0.73 and 0.96 for endplate degeneration, 0.85 and 0.84 for intervertebral joint synovitis, 0.91 and 0.88 for osteophytes, 0.93 and 0.72 for intervertebral disc degeneration, 1.0 and 1.0 for transitional vertebrae, 0.8 and 1.0 for spondylolisthesis, 0.67 and 0.99 for ligamentum flavum hypertrophy, and 0.75 and 1.0 for Schmorl's nodes, respectively. Quantitative measurement of the size of lumbosacral protrusions and extrusions was not satisfactorily accurate; improving these measurements is planned for future work.
Conclusion. AI models performed comparably to expert radiologists in detecting degenerative changes of the lumbosacral spine. Iteratively refining the CNN models on the basis of comparative evaluation against radiologists improves the sensitivity and specificity of pathological change detection.
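For readers who want to reproduce this kind of evaluation, the following is a minimal sketch of how a majority-vote reference and per-finding sensitivity and specificity could be computed. It is not the authors' code. The function names, the toy data, and the rule that the external reader is consulted only when the votes are tied are all assumptions made for illustration; the abstract does not specify how disagreement was resolved.

```python
from typing import Optional, Sequence, Tuple


def reference_label(votes: Sequence[bool], external: Optional[bool] = None) -> bool:
    """Majority vote over reader labels for one finding in one study.

    Assumption: the external reader decides only when the votes are tied.
    """
    positives = sum(votes)
    negatives = len(votes) - positives
    if positives != negatives:
        return positives > negatives
    if external is None:
        raise ValueError("tied vote and no external reader supplied")
    return external


def sensitivity_specificity(
    reference: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(reference, predicted))
    tp = sum(r and p for r, p in pairs)
    fn = sum(r and not p for r, p in pairs)
    tn = sum(not r and not p for r, p in pairs)
    fp = sum(not r and p for r, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical toy data: reader votes for one finding across four studies.
reader_votes = [
    [True, True, True, False, False],
    [False, False, False, False, True],
    [True, True, True, True, False],
    [False, False, False, False, False],
]
reference = [reference_label(v) for v in reader_votes]
model_output = [True, False, False, True]  # hypothetical model predictions

sens, spec = sensitivity_specificity(reference, model_output)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With an odd number of readers, such as the five radiologists described above, a binary vote cannot tie, so the external-reader branch in this sketch would only be reached under a different disagreement rule than the one assumed here.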