Abstract
Patient interactions with health care providers result in entries in electronic health records (EHRs). EHRs were built for clinical and billing purposes but contain many data points about an individual. Mining these records provides opportunities to extract electronic phenotypes, which can be paired with genetic data to identify genes underlying common human diseases. This task remains challenging: high-quality phenotyping is costly and requires physician review; many fields in the records are sparsely filled; and our definitions of diseases continue to improve over time. Here we develop and evaluate a semi-supervised learning method for EHR phenotype extraction using denoising autoencoders for phenotype stratification. By combining denoising autoencoders with random forests, we find classification improvements across multiple simulation models and improved survival prediction in ALS clinical trial data. This is particularly evident in cases where only a small number of patients have high-quality phenotypes, a common scenario in EHR-based research. Denoising autoencoders perform dimensionality reduction, enabling visualization and clustering for the discovery of new subtypes of disease. This method represents a promising approach to clarify disease subtypes and improve genotype-phenotype association studies that leverage EHRs.
Highlights
Denoising autoencoders (DAs) can model electronic health records.
Semi-supervised learning with DAs improves ALS patient survival predictions.
DAs improve patient cluster visualization through dimensionality reduction.
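The unsupervised half of the semi-supervised approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `DenoisingAutoencoder`, the tied-weight single-layer architecture, the masking-noise rate, the learning rate, and the synthetic binary "patient" records are all assumptions chosen for illustration. The sketch trains on unlabeled records only and produces hidden-layer features. In the method the abstract describes, such features for the small labeled subset would then be passed to a random forest, a step omitted here to keep the example dependency-free.

```python
import math
import random


def corrupt(x, rate, rng):
    """Masking noise: zero each entry independently with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class DenoisingAutoencoder:
    """Hypothetical one-hidden-layer DA with tied weights and sigmoid units."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        # W[j][i] links visible unit i to hidden unit j (shared by encoder and decoder).
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_visible)]
                  for _ in range(n_hidden)]
        self.b_h = [0.0] * n_hidden
        self.b_v = [0.0] * n_visible

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W, self.b_h)]

    def decode(self, h):
        return [sigmoid(sum(self.W[j][i] * h[j] for j in range(len(h))) + b)
                for i, b in enumerate(self.b_v)]

    def train_step(self, x, rate, lr):
        """One SGD step: reconstruct the clean x from a corrupted copy."""
        x_tilde = corrupt(x, rate, self.rng)
        h = self.encode(x_tilde)
        z = self.decode(h)
        # Cross-entropy loss with sigmoid outputs: gradient at the output
        # pre-activation is (reconstruction - clean input).
        d_out = [zi - xi for zi, xi in zip(z, x)]
        # Backpropagate to the hidden pre-activations.
        d_hid = [h[j] * (1.0 - h[j])
                 * sum(self.W[j][i] * d_out[i] for i in range(len(x)))
                 for j in range(len(h))]
        for j in range(len(h)):
            for i in range(len(x)):
                # Tied weights receive decoder and encoder gradient terms.
                self.W[j][i] -= lr * (d_out[i] * h[j] + d_hid[j] * x_tilde[i])
            self.b_h[j] -= lr * d_hid[j]
        for i in range(len(x)):
            self.b_v[i] -= lr * d_out[i]

    def reconstruction_error(self, data):
        """Mean squared reconstruction error on uncorrupted inputs."""
        total = 0.0
        for x in data:
            z = self.decode(self.encode(x))
            total += sum((zi - xi) ** 2 for zi, xi in zip(z, x)) / len(x)
        return total / len(data)


# Synthetic stand-in for EHR data: two hidden subtypes, 10% of bits flipped.
rng = random.Random(1)
prototypes = [[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]
patients = []
for k in range(40):
    proto = prototypes[k % 2]
    patients.append([float(b) if rng.random() > 0.1 else 1.0 - b for b in proto])

da = DenoisingAutoencoder(n_visible=8, n_hidden=3)
err_before = da.reconstruction_error(patients)
for epoch in range(50):
    for x in patients:
        da.train_step(x, rate=0.2, lr=0.1)
err_after = da.reconstruction_error(patients)

# Low-dimensional features that a downstream classifier could consume.
features = [da.encode(x) for x in patients]
```

Because the autoencoder never sees labels, it can be trained on every available record, while the classifier needs labels only for the subset with reviewed phenotypes. The three-dimensional `features` also serve as the reduced representation that could be plotted or clustered.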
Publisher
Cold Spring Harbor Laboratory