DBFE: Distribution-based feature extraction from copy number and structural variants in whole-genome data

Published: 2022-02-10

Author:

Piernik Maciej, Brzezinski Dariusz, Sztromwasser Pawel, Pacewicz Klaudia, Majer-Burman Weronika, Gniot Michal, Sielski Dawid, Wozna Alicja, Zawadzki Pawel

Abstract

Motivation: Whole-genome sequencing has revolutionized biosciences by providing tools for constructing complete DNA sequences of individuals. With entire genomes at hand, scientists can pinpoint DNA fragments responsible for different cancers and predict patient responses to cancer treatments. However, the sheer volume of whole-genome data makes it difficult to encode the characteristics of genomic variants as features for machine learning algorithms.

Results: We present three feature extraction methods that facilitate classifier learning from distributions of genomic variants. The proposed approaches use binning, clustering, and kernel density estimation to produce features that discriminate between two groups of patients. Experiments on genomes of 219 ovarian, 61 lung, and 929 breast cancer patients show that the proposed approaches automatically identify genomic biomarkers associated with cancer subtypes and clinical response to oncological treatment. Finally, we show that the extracted features can be used alongside unsupervised learning methods to analyze genomic samples.

Availability: The source code of the presented algorithms and reproducible experimental scripts are available on GitHub at https://github.com/MNMdiagnostics/dbfe

Contact: maciej.piernik@cs.put.poznan.pl
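To make the binning idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each patient is represented by a list of variant lengths in base pairs, pools the log10 lengths of all patients to derive shared equal-width bin edges, and counts each patient's variants per bin. The function names `equal_width_edges` and `bin_counts`, the choice of four bins, and the toy patient data are all hypothetical and do not reproduce the code in the repository linked in the abstract.

```python
import bisect
import math


def equal_width_edges(samples, n_bins):
    """Return n_bins + 1 equal-width edges over the pooled log10 variant lengths."""
    pooled = [math.log10(length) for lengths in samples for length in lengths]
    lo, hi = min(pooled), max(pooled)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_counts(lengths, edges):
    """Count how many of one patient's variants fall into each bin."""
    counts = [0] * (len(edges) - 1)
    for length in lengths:
        # bisect_right locates the bin; clamping keeps the extreme values
        # inside the first and last bins despite floating-point rounding.
        idx = bisect.bisect_right(edges, math.log10(length)) - 1
        idx = min(max(idx, 0), len(counts) - 1)
        counts[idx] += 1
    return counts


# Toy data (assumed): variant lengths in base pairs for two patients.
patients = {
    "patient_a": [120, 450, 900, 15000, 2000000],
    "patient_b": [100, 130, 160, 210, 1000000],
}

edges = equal_width_edges(patients.values(), n_bins=4)
features = {name: bin_counts(lengths, edges) for name, lengths in patients.items()}

for name, vector in features.items():
    print(name, vector)
```

Each patient ends up with a fixed-length count vector regardless of how many variants they carry, which is the property a downstream classifier needs. The clustering and kernel density estimation variants named in the abstract would replace the equal-width edges with data-driven regions and are not sketched here.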

Publisher

Cold Spring Harbor Laboratory
