Abstract
Background
Radiolucent bone lesions are encountered in all orthopedic specialties, and concise description is essential to inform evaluation and treatment. We studied the interobserver reliability and intra-observer reproducibility of three systems for classifying radiolucent lesions on radiographs: (1) the original Lodwick classification, (2) the modified Lodwick classification, and (3) the Enneking classification for benign tumors. We hypothesized that intra-observer reproducibility would be good but interobserver reliability would be poor, that reliability would improve with training level, and that agreement would be highest for the Enneking classification.
Methods
Forty-eight case sets of de-identified radiographs of radiolucent osseous lesions were selected from an orthopedic oncology practice. Each set included two orthogonal views of the lesion from initial presentation. Twenty participants (one third-year medical student, 18 residents, one orthopedic oncologist) classified each case twice, with a minimum two-week gap between sessions, according to the Lodwick classification, modified Lodwick classification, and Enneking classification. Interobserver reliability and intra-observer reproducibility were calculated using Fleiss’ kappa and Krippendorff’s alpha, treating the classifications as nominal and ordinal rankings, respectively. Linear regression models were used to determine the effect of training level on reproducibility. Contingency tables were used to assess the accuracy of correctly identifying benign versus malignant lesions against their known diagnoses.
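The interobserver statistic described above can be illustrated with a minimal sketch. This is not the authors' analysis code; the function and the example grade labels are hypothetical, but the formula is the standard Fleiss' kappa for multiple raters assigning nominal categories:

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for nominal categories.

    ratings: one list per case, containing one category label per rater.
    Every case must be rated by the same number of raters.
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for case in ratings for label in case})

    # n_ij: number of raters assigning case i to category j
    counts = [[Counter(case)[cat] for cat in categories] for case in ratings]

    # Per-case agreement P_i, then overall observed agreement P_bar
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_cases

    # Chance agreement P_e from the marginal category proportions
    p_j = [sum(row[j] for row in counts) / (n_cases * n_raters)
           for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: four raters grading three cases on Lodwick grades
ratings = [["IA", "IA", "IA", "IA"],
           ["IB", "IB", "IB", "IB"],
           ["IC", "IC", "IB", "IC"]]
kappa = fleiss_kappa(ratings)
```

In practice one would use a vetted implementation (e.g. `statsmodels.stats.inter_rater.fleiss_kappa`); Krippendorff's alpha additionally weights disagreements by the ordinal distance between grades, which is why the Methods report both statistics.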
Results
Interobserver reliability was poor: agreement was 39% (κ = 0.23; α = 0.54), 39% (κ = 0.25; α = 0.48), and 53% (κ = 0.28; α = 0.45) for the Lodwick, modified Lodwick, and Enneking classifications, respectively. Intra-observer reproducibility also lacked strong agreement (κ = 0.42–0.45), with Krippendorff’s alpha values of 0.72, 0.69, and 0.63 for the Lodwick, modified Lodwick, and Enneking classifications, respectively; individual self-agreement ranged from 39% to 78%. Training level had no effect on reproducibility (R2 < 0.2, p > 0.05 for all classifications). Lesions were correctly classified as malignant in 73.3%, 59.0%, and 62% of cases for the three classification systems, respectively.
Conclusions
Our data demonstrate that three common classification systems for osseous radiolucent lesions are neither reliable nor reproducible. Consistency of classification varied with lesion characteristics, and reproducibility was strongest for the highest and lowest grades of each system. There was no association between orthopedic experience and intra-observer reproducibility. These deficiencies might be mitigated by artificial intelligence applications.