Abstract
The impact of common and rare variants on COVID-19 host genetics has been widely studied in [16]. In that work, common and rare variants were used to define an interpretable machine learning model for predicting COVID-19 severity. First, variants were converted into sets of Boolean features, depending on the absence or presence of variants in each gene. An ensemble of LASSO logistic regression models was then used to identify the Boolean features most informative about the genetic basis of severity. The selected Boolean features were combined into an Integrated PolyGenic Score (IPGS), which offers a very simple description of the contribution of host genetics to COVID-19 severity. IPGS alone yields an accuracy of 55-60% on different cohorts; a logistic regression taking both IPGS and age as input reaches an accuracy of 75%. The goal of this paper is to improve on these results using information on the host organs involved in the disease. We generalized the IPGS by adding a statistical weight for each organ, transforming the Boolean features into “Boolean quantum features” inspired by quantum mechanics. The organ coefficients were set with the genetic algorithm library PyGAD, and we then defined two new Integrated PolyGenic Scores. A logistic regression with either of the two new scores and age as input reaches an accuracy of 84-86%, improving on the results previously reported in [16] by about 10 percentage points.
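As an illustration of the first step described above, the following minimal sketch converts per-sample variant calls into gene-level Boolean features, where a feature is 1 if the sample carries at least one variant in the gene and 0 otherwise. The function name, the gene labels, the variant identifiers, and the toy data are all hypothetical. The sketch also ignores the distinction between common and rare variants, so it should be read as a simplified picture of the idea rather than as the authors' pipeline.

```python
from typing import Dict, List, Set


def boolean_gene_features(
    variants_by_sample: Dict[str, Set[str]],
    variant_to_gene: Dict[str, str],
    genes: List[str],
) -> Dict[str, List[int]]:
    """Return one 0/1 vector per sample, ordered like `genes`.

    An entry is 1 when the sample carries at least one variant
    mapped to that gene, and 0 otherwise.
    """
    features: Dict[str, List[int]] = {}
    for sample, variants in variants_by_sample.items():
        # Genes hit by any variant observed in this sample
        hit_genes = {variant_to_gene[v] for v in variants if v in variant_to_gene}
        features[sample] = [1 if gene in hit_genes else 0 for gene in genes]
    return features


# Hypothetical toy data (placeholder gene and variant names)
genes = ["GENE_A", "GENE_B", "GENE_C"]
variant_to_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B", "v4": "GENE_C"}
variants_by_sample = {
    "patient_1": {"v1", "v3"},
    "patient_2": set(),
    "patient_3": {"v2", "v4"},
}

features = boolean_gene_features(variants_by_sample, variant_to_gene, genes)
for sample in sorted(features):
    print(sample, features[sample])
```

Vectors of this kind could then be passed to an L1-penalized (LASSO) logistic regression, for example scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")`, to select informative features. That choice of library is an assumption made here for illustration and is not stated in the abstract.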
Publisher
Cold Spring Harbor Laboratory