Abstract
Adversarial attacks can drastically change the output of a model by making a small change to its input. While they provide a useful framework for analyzing worst-case robustness, they can also be exploited by malicious agents to damage machine learning-based applications. The proliferation of platforms that let users share their DNA sequences and phenotype information to enable association studies has produced increasingly large databases. Such open platforms are, however, vulnerable to malicious users uploading corrupted genetic sequence files that could compromise downstream studies. These studies commonly include steps that analyze the structure of genomic sequences using dimensionality reduction techniques and ancestry inference methods. In this paper we show how white-box gradient-based adversarial attacks can be used to corrupt the output of genomic analyses, and we explore different machine learning techniques to detect such manipulations.
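To make the white-box gradient-based attack concrete, the following is a minimal sketch of an FGSM-style perturbation against a toy logistic-regression "ancestry classifier" over a genotype vector. Everything here (the model, weights, and genotype encoding) is an illustrative assumption, not the paper's actual setup:

```python
import numpy as np

# Hypothetical toy setup: a logistic-regression classifier over a genotype
# vector x in {0,1,2}^d (allele counts). White-box access means the attacker
# knows the weights w and bias b.
rng = np.random.default_rng(0)
d = 100
w = rng.normal(size=d)  # fixed, known model weights
b = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    return sigmoid(w @ x + b)

def fgsm(x, y, eps):
    """FGSM-style step: move x in the sign of the loss gradient.

    For logistic regression with binary cross-entropy loss,
    d(loss)/dx = (p - y) * w, so no autodiff is needed.
    """
    p = predict(x)
    grad = (p - y) * w
    return x + eps * np.sign(grad)

x = rng.integers(0, 3, size=d).astype(float)  # clean genotype vector
y = 1.0                                       # true label
x_adv = fgsm(x, y, eps=0.5)
# The adversarial score is pushed away from the true label y.
print(predict(x), predict(x_adv))
```

In this sketch each coordinate moves by at most `eps`, so the perturbed file stays close to the original genotype vector while the classifier's output shifts; real attacks on categorical genomic data would additionally need to round perturbations back to valid allele counts.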
Publisher
Cold Spring Harbor Laboratory