How subjective CT image quality assessment becomes surprisingly reliable: pairwise comparisons instead of Likert scale

Author:

Hoeijmakers Eva J. I.ORCID,Martens Bibi,Hendriks Babs M. F.,Mihl Casper,Miclea Razvan L.,Backes Walter H.,Wildberger Joachim E.,Zijta Frank M.,Gietema Hester A.,Nelemans Patricia J.,Jeukens Cécile R. L. P. N.

Abstract

Abstract Objectives The aim of this study is to improve the reliability of subjective IQ assessment using a pairwise comparison (PC) method instead of a Likert scale method in abdominal CT scans. Methods Abdominal CT scans (single-center) were retrospectively selected between September 2019 and February 2020 in a prior study. Sample variance in IQ was obtained by adding artificial noise using dedicated reconstruction software, including reconstructions with filtered backprojection and varying iterative reconstruction strengths. Two datasets (each n = 50) were composed with either higher or lower IQ variation with the 25 original scans being part of both datasets. Using in-house developed software, six observers (five radiologists, one resident) rated both datasets via both the PC method (forcing observers to choose preferred scans out of pairs of scans resulting in a ranking) and a 5-point Likert scale. The PC method was optimized using a sorting algorithm to minimize necessary comparisons. The inter- and intraobserver agreements were assessed for both methods with the intraclass correlation coefficient (ICC). Results Twenty-five patients (mean age 61 years ± 15.5; 56% men) were evaluated. The ICC for interobserver agreement for the high-variation dataset increased from 0.665 (95%CI 0.396–0.814) to 0.785 (95%CI 0.676–0.867) when the PC method was used instead of a Likert scale. For the low-variation dataset, the ICC increased from 0.276 (95%CI 0.034–0.500) to 0.562 (95%CI 0.337–0.729). Intraobserver agreement increased for four out of six observers. Conclusion The PC method is more reliable for subjective IQ assessment indicated by improved inter- and intraobserver agreement. Clinical relevance statement This study shows that the pairwise comparison method is a more reliable method for subjective image quality assessment. Improved reliability is of key importance for optimization studies, validation of automatic image quality assessment algorithms, and training of AI algorithms. Key Points • Subjective assessment of diagnostic image quality via Likert scale has limited reliability. • A pairwise comparison method improves the inter- and intraobserver agreement. • The pairwise comparison method is more reliable for CT optimization studies. Graphical Abstract

Publisher

Springer Science and Business Media LLC

Reference28 articles.

1. Valentin J (2007) The 2007 Recommendations of the International Commission on Radiological Protection. Oxford: Elsevier 37(2-4):1-133

2. Valentin J (2007) International Commission on Radiation Protection. Managing patient dose in multi-detector computed tomography (MDCT). New York: Elsevier 1-79

3. Samei E, Bakalyar D, Boedeker KL et al (2019) Performance evaluation of computed tomography systems: summary of AAPM Task Group 233. Med Phys 46(11):e735–e756

4. Likert R (1932) A technique for the measurement of attitudes. Arch Psychol 22(140):5–55

5. Zhang Z, Zhau J, Liu N, Gu X, Zhang Y (2017) An improved pairwise comparison scaling method for subjective image quality assessment. IEEE Int Symp Broadb Multimed Syst Broadcast (BMSB) 1-6

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3