Abstract
Systemic sclerosis (SSc) is a chronic autoimmune disease that remains poorly understood. Its cause is believed to be a combination of genetic and environmental factors, and its course varies greatly from patient to patient. A common complication, associated with higher mortality, is interstitial lung disease (ILD). In this paper we present a machine learning algorithm that identifies patients suffering from ILD-SSc with 92.2% accuracy, using gene expression data obtained from peripheral blood. The data were obtained from a public source (GEO accession GSE181228) and contain genetic data for 134 patients at an initial stage and, for 98 of these patients, at a follow-up date 12 months later. Additionally, there are 45 healthy control cases. The algorithm also identified 172 genes that might be involved in the illness. These 172 genes appeared in all of the 20 most accurate classification models, out of a total of half a million models estimated; this consistency suggests that they may be related to the illness to some degree. Besides differentiating between controls and patients, the proposed algorithm was also able to distinguish among different variants of the illness (diffuse variants). This may be significant for treatment, because the different variants have different associated prognoses.
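The gene-selection step described above (keeping genes that appear in every one of the most accurate models) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `genes_shared_by_top_models`, the `(accuracy, gene set)` representation of a model, and the gene labels are all hypothetical, chosen only to show the intersection idea on toy data.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation: each fitted model is summarized by its
# accuracy and the set of genes it used as features.
Model = Tuple[float, FrozenSet[str]]


def genes_shared_by_top_models(models: List[Model], top_k: int = 20) -> Set[str]:
    """Return the genes present in every one of the top_k most accurate models."""
    # Rank models from most to least accurate and keep the best top_k.
    ranked = sorted(models, key=lambda m: m[0], reverse=True)[:top_k]
    if not ranked:
        return set()
    # Intersect the gene sets of the retained models.
    shared = set(ranked[0][1])
    for _, genes in ranked[1:]:
        shared &= genes
    return shared


# Toy data with made-up gene labels (not taken from GSE181228).
toy_models: List[Model] = [
    (0.92, frozenset({"GENE_A", "GENE_B", "GENE_C"})),
    (0.90, frozenset({"GENE_A", "GENE_B", "GENE_D"})),
    (0.61, frozenset({"GENE_E"})),
]

shared = genes_shared_by_top_models(toy_models, top_k=2)
print(sorted(shared))  # ['GENE_A', 'GENE_B']
```

In this toy run only the two most accurate models are retained, so the low-accuracy model does not affect the result. In the setting described in the abstract, `top_k` would correspond to the 20 most accurate models.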
Subject
General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)
Cited by
3 articles.