Completion of the DrugMatrix Toxicogenomics Database using ToxCompl

Author:

Cong Guojing,Patton Robert M.ORCID,Chao Frank,Svoboda Daniel L.,Casey Warren M.,Schmitt Charles P.ORCID,Murphy Charles,Erickson Jeremy N.,Combs Parker,Auerbach Scott S.ORCID

Abstract

AbstractThe DrugMatrix Database contains systematically generated toxicogenomics data from short-term in vivo studies for over 600 chemicals. However, most of the potential endpoints in the database are missing due to a lack of experimental measurements. We present our study on leveraging matrix factorization and machine learning methods to predict the missing values in the DrugMatrix, which includes gene expression across eight tissues on two expression platforms along with paired clinical chemistry, hematology, and histopathology measurements. One major challenge we encounter is the skewed distribution of the available measured data, in terms of both tissue sources and values. We propose a method, ToxiCompl, that applies systematic hybrid sampling guided by Bayesian optimization in conjunction with low-rank matrix factorization to recover the missing values. ToxiCompl achieves good training and validation performance from a machine learning perspective.We further conduct an in-depth validation of the predicted data from biological and toxicological perspectives with a series of analyses. These include examining the connectivity pattern of predicted gene expression responses, characterizing molecular pathway-level responses from sets of differentially expressed genes, evaluating known transcriptional biomarkers of tissue toxicity, and characterizing pre-dicted apical endpoints. Our analysis shows that the predicted differential gene expression, broadly speaking, aligns with what would be anticipated. For example, in most instances, our predicted differentially expressed gene lists offer a connectivity level comparable to that of measured data in connectivity analysis. Using Havcr1, a known transcriptional biomarker of kidney injury, we identify treatments that, based on the predicted expression data, manifest kidney toxicity in a manner that is mechanistically plausible and supported by the literature. Characterization of the predicted clinical chemistry data suggests that strong effects are relatively reliably predicted, while more subtle effects pose a greater challenge. In the case of histopathological prediction, we find a significant overprediction due to positivity bias in the measured data. Developing methods to deal with this bias is one of the areas we plan to target for future improvement. The main advantage of the ToxiCompl approach is that, in the absence of additional experimental data, it drastically extends the toxicogenomic landscape into a number of data-poor tissues, thereby allowing researchers to formulate mechanistic hypotheses about effects in tissues that have been underrepresented in the literature. All measured and predicted DrugMatrix data (i.e., gene expression, clinical chemistry, hematology, and histopathology) are available to the public through an intuitive GUI interface that allows for data retrieval, gene set analysis and high dimensional visualization of gene expression similarity (https://rstudio.niehs.nih.gov/complete_drugmatrix/).

Publisher

Cold Spring Harbor Laboratory

Reference120 articles.

1. Affymetrix rat genome 230 2.0 array. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL1355.

2. Ge healthcare/amersham biosciences codelink uniset rat i bioarray, layout exp5280x2-613. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL5425.

3. Human gene set: Hallmark p53 pathway. https://www.gsea-msigdb.org/gsea/msigdb/human/geneset/HALLMARK_P53_PATHWAY.html.

4. The potential of lipocalin-2/NGAL as biomarker for inflammatory and metabolic diseases

5. Thioacetamide toxicity and the spleen: histological and biochemical analysis;Anatomia, histologia, embryologia,2000

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3