Author:
Mamtani Manju R, Thakre Tushar P, Kalkonde Mrunal Y, Amin Manik A, Kalkonde Yogeshwar V, Amin Amit P, Kulkarni Hemant
Abstract
Background
Despite the recognized diagnostic potential of biomarkers, reducing noise and extracting more information from a given set of biomarkers remains an open problem. Here, we propose a statistical algorithm that, treating each molecular biomarker as a diagnostic test, improves the diagnostic performance of an optimized set of independent biomarkers using established statistical techniques. We validated the proposed algorithm using several simulated datasets and four publicly available real datasets that compared i) subjects with cancer and those without; ii) subjects with two different cancers; iii) subjects with two different types of one cancer; and iv) subjects with the same cancer but different times to metastasis.
Results
Our algorithm comprises three steps: estimating the area under the receiver operating characteristic curve for each biomarker, identifying a subset of biomarkers using linear regression, and combining the chosen biomarkers using linear discriminant function analysis. Using only these established statistical methods, which are available in most statistical packages, our approach achieved diagnostic accuracies of 100%, 99.94%, 96.67% and 93.92% on the four real datasets used in the study. These estimates were comparable to or better than those previously reported using alternative methods. In a synthetic dataset, we also observed that all the biomarkers chosen by our algorithm were truly differentially expressed.
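The three steps can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not say how linear regression is used to pick the subset, so step 2 is replaced here by a simple stand-in (keep the two biomarkers whose AUC is farthest from 0.5). The synthetic data, the function names, the choice of two markers and the equal-prior midpoint threshold are all assumptions made for illustration.

```python
import random

def auc(pos_values, neg_values):
    """AUC as the Mann-Whitney probability P(pos > neg); ties count 1/2."""
    wins = 0.0
    for p in pos_values:
        for n in neg_values:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_values) * len(neg_values))

def mean_vector(rows):
    return [sum(r[j] for r in rows) / len(rows) for j in range(len(rows[0]))]

def pooled_covariance(pos, neg):
    """Pooled within-class covariance matrix of the two groups."""
    k = len(pos[0])
    s = [[0.0] * k for _ in range(k)]
    for rows in (pos, neg):
        m = mean_vector(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(k)]
            for a in range(k):
                for b in range(k):
                    s[a][b] += d[a] * d[b]
    dof = len(pos) + len(neg) - 2
    return [[v / dof for v in row] for row in s]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_lda(pos, neg):
    """Fisher's linear discriminant: w = S^-1 (mean_pos - mean_neg)."""
    mp, mn = mean_vector(pos), mean_vector(neg)
    w = solve(pooled_covariance(pos, neg), [p - n for p, n in zip(mp, mn)])
    # Midpoint threshold, which assumes equal class priors.
    threshold = sum(wi * (p + n) / 2 for wi, p, n in zip(w, mp, mn))
    return w, threshold

def predict(w, threshold, row):
    return sum(wi * xi for wi, xi in zip(w, row)) > threshold

# Hypothetical synthetic data: 5 biomarkers, only markers 0 and 1 differ.
rng = random.Random(0)
N_MARKERS, INFORMATIVE, N_PER_CLASS = 5, (0, 1), 40

def sample(shift):
    return [rng.gauss(shift if j in INFORMATIVE else 0.0, 1.0)
            for j in range(N_MARKERS)]

pos = [sample(2.0) for _ in range(N_PER_CLASS)]
neg = [sample(0.0) for _ in range(N_PER_CLASS)]

# Step 1: AUC for each biomarker treated as a stand-alone diagnostic test.
aucs = [auc([r[j] for r in pos], [r[j] for r in neg]) for j in range(N_MARKERS)]

# Step 2 (stand-in, NOT the paper's regression-based selection).
ranked = sorted(range(N_MARKERS), key=lambda j: abs(aucs[j] - 0.5), reverse=True)
selected = sorted(ranked[:2])

# Step 3: combine the selected biomarkers with a linear discriminant.
pos_sel = [[r[j] for j in selected] for r in pos]
neg_sel = [[r[j] for j in selected] for r in neg]
w, threshold = fit_lda(pos_sel, neg_sel)
correct = sum(predict(w, threshold, r) for r in pos_sel)
correct += sum(not predict(w, threshold, r) for r in neg_sel)
accuracy = correct / (2 * N_PER_CLASS)  # resubstitution accuracy only
```

The accuracy computed here is measured on the same samples used to fit the discriminant, so it is optimistic. A held-out or cross-validated estimate would be needed before comparing it with the figures reported above.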
Conclusion
The proposed algorithm can be used for accurate diagnosis in the setting of dichotomous classification of disease states.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
References: 71 articles.
Cited by
16 articles.