Abstract
Few data address human genetics as a risk factor for birth abnormalities associated with Zika virus (ZIKV) infection during pregnancy, even though sub-Saharan African populations appear to be more resistant to congenital Zika syndrome (CZS) than populations in the Americas. We hypothesized that single nucleotide variants (SNVs), especially in innate immune genes, could make some populations more susceptible to Zika congenital complications than others. Differences in SNV frequencies among continental populations make this question well suited to machine learning techniques. Using 297 SNVs and working with complex signatures, we explored a key immune genomic gradient between individuals from Africa, Asia and Latin America. We employed a two-step approach. In the first step, decision trees (DTs) were used to extract the SNVs that best discriminate among populations. In the second step, machine learning algorithms were used to evaluate how well the SNV pool identified in step one discriminates between individuals from sub-Saharan African and Latin American populations. Our results suggest that 10 SNVs from 10 genes (CLEC4M, CD58, OAS2, CD80, VEPH1, CTLA4, CD274, CD209, PLAAT4, CREB3L1) can discriminate sub-Saharan African from Latin American populations using only immune genome data, with an accuracy close to 100%. Moreover, we found that these SNVs form a genomic gradient across the three main continental populations. These SNVs lie in genes that are important elements of the innate immune system and of the response against viruses. Our data support the Human Immune Genome Complex Gradient hypothesis as a new theory that may help to explain the CZS catastrophe in Brazil.
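The two-step approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the genotypes are synthetic, step one uses depth-1 decision trees (stumps) scored by Gini impurity reduction in place of full decision trees, and step two uses a nearest-centroid classifier as a stand-in for the unspecified machine learning algorithms. All function names, the allele frequencies, and the sample sizes are assumptions made for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 population labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def stump_gain(column, labels):
    """Best impurity reduction from one split 'genotype <= t' (a depth-1 tree)."""
    parent, n, best = gini(labels), len(labels), 0.0
    for t in (0, 1):  # genotypes are coded 0, 1, 2 (allele counts)
        left = [y for g, y in zip(column, labels) if g <= t]
        right = [y for g, y in zip(column, labels) if g > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def select_snvs(X, y, k):
    """Step 1: rank SNV columns by stump gain and keep the top k indices."""
    gains = [stump_gain([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:k]

def centroid_accuracy(X_train, y_train, X_test, y_test, cols):
    """Step 2: held-out accuracy of a nearest-centroid classifier on the selected SNVs."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X_train, y_train) if label == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cents[c])) for c in cents}
        return min(dist, key=dist.get)

    return sum(predict(x) == label for x, label in zip(X_test, y_test)) / len(y_test)

def simulate(n_per_pop, n_snvs, informative, rng):
    """Synthetic genotypes: allele frequency differs by population only at informative SNVs."""
    X, y = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            row = []
            for j in range(n_snvs):
                freq = (0.1 if pop == 0 else 0.9) if j in informative else 0.5
                row.append(sum(rng.random() < freq for _ in range(2)))
            X.append(row)
            y.append(pop)
    return X, y

rng = random.Random(0)
informative = {3, 7, 11}  # hypothetical SNV indices with a population frequency gap
X_train, y_train = simulate(100, 30, informative, rng)
X_test, y_test = simulate(50, 30, informative, rng)

selected = select_snvs(X_train, y_train, k=3)  # selection uses training data only
accuracy = centroid_accuracy(X_train, y_train, X_test, y_test, selected)
print(sorted(selected), accuracy)
```

Selecting SNVs on the training individuals only and scoring on separate held-out individuals keeps the step-two accuracy from being inflated by the step-one selection.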
Publisher
Cold Spring Harbor Laboratory