Affiliation:
1. Department of Gastroenterology, Beijing Friendship Hospital, Capital Medical University, Beijing 100050, China
2. Beijing Key Laboratory for Precancerous Lesion of Digestive Diseases, Beijing 100050, China
3. National Clinical Research Center for Digestive Diseases, Beijing 100050, China
4. Beijing Digestive Disease Center, Beijing, Beijing 100050, China
Abstract
Colorectal cancer (CRC) is the second leading cause of cancer‐related death worldwide. Many molecular classification strategies have been proposed for CRC, but few studies incorporate survival data into their models. Herein, a prognosis‐oriented CRC classifier is constructed by feeding censored survival data, which are naturally partially labeled, into a customized semi‐supervised learning algorithm called the Monte‐Carlo K‐nearest neighbor voting (MC‐KV) classifier. Using data from The Cancer Genome Atlas (TCGA), this classifier identifies three CRC subtypes with distinct prognoses. Furthermore, a six‐gene risk model is constructed by combining weighted gene coexpression network analysis (WGCNA) and the least absolute shrinkage and selection operator (LASSO) for variable selection with four algorithms (random survival forest, support vector machine, AdaBoost, and logistic regression) for optimization. The optimized model distinguishes high‐risk from low‐risk patients well, with maximum areas under the curve (AUCs) of 0.869, 0.906, and 0.921 for 1‐, 3‐, and 5‐year survival, respectively. Additionally, the six‐gene signature identified by MC‐KV shows high predictive efficiency in other cancer types. Overall, MC‐KV provides a tool for the molecular subtyping of CRC and suggests that semi‐supervised algorithms and patient‐level survival data can contribute to cancer classification.
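The abstract does not describe how MC‐KV works internally. The following is a minimal, hypothetical Python sketch of one way Monte‐Carlo resampling and k‐nearest‐neighbor voting could be combined to label censored patients. It assumes a binary "good"/"poor" outcome at a fixed horizon (36 months here), Euclidean distance on expression features, and invented helper names (`label_at_horizon`, `mc_knn_vote`). It is not the authors' implementation and does not reproduce their three‐subtype result.

```python
import math
import random
from collections import Counter


def label_at_horizon(time, event, horizon):
    """Hypothetical rule turning one censored survival record into a label.

    "poor": event observed by the horizon; "good": followed past the horizon;
    None: censored before the horizon, so the outcome is unknown (unlabeled).
    """
    if event and time <= horizon:
        return "poor"
    if time > horizon:
        return "good"
    return None


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mc_knn_vote(features, labels, k=3, n_iter=200, frac=0.8, seed=0):
    """Assumed Monte-Carlo KNN voting: label unlabeled samples by pooled votes.

    Each iteration draws a random fraction of the labeled samples; every
    unlabeled sample gets one vote (the majority label among its k nearest
    neighbors in that subsample). The most frequent vote across iterations wins.
    """
    rng = random.Random(seed)
    labeled = [i for i, y in enumerate(labels) if y is not None]
    unlabeled = [i for i, y in enumerate(labels) if y is None]
    votes = {i: Counter() for i in unlabeled}
    m = min(len(labeled), max(k, int(frac * len(labeled))))
    for _ in range(n_iter):
        subset = rng.sample(labeled, m)
        for i in unlabeled:
            nearest = sorted(subset, key=lambda j: euclidean(features[i], features[j]))[:k]
            majority = Counter(labels[j] for j in nearest).most_common(1)[0][0]
            votes[i][majority] += 1
    result = list(labels)
    for i in unlabeled:
        result[i] = votes[i].most_common(1)[0][0]
    return result


if __name__ == "__main__":
    # Toy data invented for illustration: (features, follow-up months, event observed?)
    records = [
        ((0.0, 0.0), 10, True), ((0.0, 1.0), 20, True),
        ((1.0, 0.0), 30, True), ((1.0, 1.0), 15, True),
        ((10.0, 10.0), 60, False), ((10.0, 11.0), 50, True),
        ((11.0, 10.0), 70, False), ((11.0, 11.0), 48, False),
        ((0.5, 0.5), 12, False),    # censored early, so unlabeled
        ((10.5, 10.5), 24, False),  # censored early, so unlabeled
    ]
    X = [r[0] for r in records]
    y = [label_at_horizon(t, e, horizon=36) for _, t, e in records]
    print(mc_knn_vote(X, y, k=3))
```

Resampling the labeled set on every iteration makes each vote depend on a different reference panel, so a censored patient's final label reflects how consistently it sits near one outcome group rather than a single KNN query.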
Funder
National Natural Science Foundation of China
Subject
Multidisciplinary, Modeling and Simulation, Numerical Analysis, Statistics and Probability