Abstract
Knowing a patient’s genetic ancestry is crucial in clinical settings, providing benefits such as tailored genetic testing, targeted health screening based on ancestral disease-predisposition rates, and personalized medication dosages. However, self-reported ancestry can be subjective, making it difficult to apply consistently. Moreover, existing approaches use genome sequencing data to infer ancestry at the continental level, creating a need for methods optimized for individual ancestry assignment. We present SNVstory, a method built upon three independent machine learning models for accurately inferring the sub-continental ancestry of individuals. SNVstory includes a feature-importance scheme, unique among open-source ancestry tools, which allows the user to trace the ancestral signal contributed by a given gene or locus. We apply SNVstory to a clinical dataset, comparing self-reported ethnicity and race to our inferred genetic ancestry. SNVstory represents a significant advance in methods to assign genetic ancestry, predicting ancestry across 36 different populations with high accuracy.
Publisher
Cold Spring Harbor Laboratory