Predictive Classification of IBS-subtype: Performance of a 250-gene RNA expression panel vs. Complete Blood Count (CBC) profiles under a Random Forest model

Author:

Robinson Jeffrey

Abstract

In this experiment, an R script was developed to select the best-performing machine learning (ML) predictive classification algorithm for IBS-subtype, and to compare the performance of two datasets from the same clinical cohort: 1) the Complete Blood Count (CBC) results, and 2) a 250-gene Nanostring expression panel run on RNA from the “Buffy Coat” fraction. These publicly available data were compiled from open-source repositories and previously published supplementary data. Column labels were reformatted according to “tidy-data” standards. NA values were imputed with the mean of the corresponding data column. Subject groups included Control (i.e., healthy), IBS-D (diarrhea predominant), and IBS-C (constipation predominant) subtypes. These groups had unequal numbers in the original study, so random re-sampling was used to make the group numbers equal for downstream linear regression-based analyses. The data were randomly split into training and validation subsets, and 5 classification algorithms were tested. Random Forest was clearly the best-performing algorithm for both the CBC and the gene expression panel data, generally with >95% predictive accuracy, without additional tuning. The 250-gene RNA expression panel performed somewhat better than the CBC profile under a Random Forest model; however, the CBC profiles had only 13 predictor variables vs. the 250 of the RNA expression panel. Some artifacts may result from the duplication of IBS-D and IBS-C rows due to the group-size balancing method, so larger and more comprehensive datasets will be obtained for a follow-up analysis. The R script and reformatted data are published as supplementary material here, and as a component of the ‘AnalyzeBloodworkv1.2’ GitHub repository.
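The preprocessing steps described in the abstract (column-mean imputation, group-size balancing by random re-sampling, and a random training/validation split) can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the author's R script. The toy values, the function names, the fixed seed, and the 80/20 split fraction are all assumptions made for the example; the abstract does not state the split fraction or the exact re-sampling procedure.

```python
import random
from collections import Counter, defaultdict

def impute_column_means(rows):
    """Replace None in each column with the mean of that column's observed values.
    Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def balance_by_oversampling(labels, rng):
    """Return row indices in which every group is as large as the largest group.
    Smaller groups are topped up by sampling their own rows with replacement,
    so repeated indices correspond to duplicated rows."""
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)
    target = max(len(ix) for ix in by_label.values())
    chosen = []
    for ix in by_label.values():
        chosen.extend(ix)
        chosen.extend(rng.choice(ix) for _ in range(target - len(ix)))
    return chosen

def split_indices(indices, train_fraction, rng):
    """Shuffle the indices and cut them into training and validation parts."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Made-up toy values standing in for two CBC-like columns (not real study data).
rows = [
    [4.0, 13.0], [5.0, None], [None, 15.0], [4.5, 14.0], [5.5, 14.5], [4.8, 13.5],
    [5.2, 12.5], [4.6, None], [5.0, 13.0],
    [4.4, 14.0], [None, 13.5],
]
labels = ["Control"] * 6 + ["IBS-D"] * 3 + ["IBS-C"] * 2

rng = random.Random(0)  # fixed seed so the sketch is repeatable (an assumption)
imputed = impute_column_means(rows)
balanced = balance_by_oversampling(labels, rng)
group_sizes = Counter(labels[i] for i in balanced)
train_idx, valid_idx = split_indices(balanced, 0.8, rng)  # 0.8 is an assumed fraction

# Original rows that appear in both subsets because of duplication.
leaked = set(train_idx) & set(valid_idx)
print(group_sizes, len(train_idx), len(valid_idx), sorted(leaked))
```

Because balancing happens before the split in this sketch, copies of the same IBS-D or IBS-C row can land in both the training and validation subsets. The `leaked` set makes such overlap visible, and it is one plausible source of the duplication artifacts the abstract mentions. Splitting first and re-sampling only the training subset would avoid that overlap.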

Publisher

Cold Spring Harbor Laboratory

