Scalable Causal Structure Learning: Scoping Review of Traditional and Deep Learning Algorithms and New Opportunities in Biomedicine

Published: 2023-01-17 Volume: 11 Page: e38266
ISSN: 2291-9694
Container-title: JMIR Medical Informatics
Language: en
Short-container-title: JMIR Med Inform

Author:

Upadhyaya Pulakesh, Zhang Kai, Li Can, Jiang Xiaoqian, Kim Yejin

Abstract

Background: Causal structure learning is the process of identifying causal structures from observational data, and it has multiple applications in biomedicine and health care.

Objective: This paper provides a practical review and tutorial on scalable causal structure learning models, with examples on real-world data, to help health care audiences understand and apply them.

Methods: We reviewed traditional (combinatorial and score-based) methods for causal structure discovery and machine learning–based schemes. Various traditional approaches have been studied to tackle this problem, the most important among these being the PC algorithm, named after Peter Spirtes and Clark Glymour. We then analyzed the literature on score-based methods, which are computationally faster. Because the acyclicity requirement can be expressed as a continuous constraint, new deep learning approaches to the problem have emerged alongside the traditional and score-based methods. Such methods can also offer scalability, particularly when there is a large amount of data involving multiple variables. Using our own evaluation metrics and experiments on linear, nonlinear, and benchmark Sachs data, we aimed to highlight the advantages and disadvantages of these methods for the health care community. We also highlighted recent developments in biomedicine where causal structure learning can be applied to discover structures such as gene networks, brain connectivity networks, and those in cancer epidemiology.

Results: We compared the performance of traditional and machine learning–based algorithms for causal discovery on benchmark data sets. On the Sachs data set, Directed Acyclic Graph-Graph Neural Network (DAG-GNN) had the lowest structural Hamming distance (19) and false positive rate (0.13), whereas Greedy Equivalence Search and Max-Min Hill Climbing had the best false discovery rate (0.68) and true positive rate (0.56), respectively.

Conclusions: Machine learning–based approaches, including deep learning, have many advantages over traditional approaches, such as scalability to a greater number of variables and potential application across a wide range of biomedical areas, such as genetics, if sufficient data are available. Furthermore, these models are more flexible than traditional models and are poised to positively affect many applications in the future.
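
The Results above rank algorithms by structural Hamming distance, false positive rate, false discovery rate, and true positive rate. As a rough illustration of what these numbers measure, the following Python sketch computes them from two directed graphs given as 0/1 adjacency matrices. It is not the authors' evaluation code: the abstract states that they used their own metrics, and definitions vary across the literature. The function name `edge_metrics` and the conventions used here (a reversed edge counts once toward the structural Hamming distance, and rates are computed over directed off-diagonal entries) are assumptions made for this example.

```python
import numpy as np


def edge_metrics(true_adj, est_adj):
    """Compare an estimated directed graph with a reference graph.

    Both inputs are square 0/1 adjacency matrices where A[i, j] = 1
    means an edge i -> j. Conventions here are illustrative assumptions.
    """
    t = np.asarray(true_adj, dtype=bool)
    e = np.asarray(est_adj, dtype=bool)
    if t.ndim != 2 or t.shape[0] != t.shape[1] or t.shape != e.shape:
        raise ValueError("expected two square matrices of the same shape")

    # Ignore the diagonal: self-loops are not valid in a DAG.
    off_diag = ~np.eye(t.shape[0], dtype=bool)

    tp = int(np.sum(t & e & off_diag))    # edges found correctly
    fp = int(np.sum(~t & e & off_diag))   # edges predicted but absent
    fn = int(np.sum(t & ~e & off_diag))   # edges present but missed
    tn = int(np.sum(~t & ~e & off_diag))  # non-edges left as non-edges

    # Structural Hamming distance: number of node pairs whose edge status
    # differs (missing, extra, or reversed), each pair counted once.
    diff = (t != e) & off_diag
    shd = int(np.sum(np.triu(diff | diff.T, k=1)))

    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    return {
        "shd": shd,
        "tpr": ratio(tp, tp + fn),  # true positive rate
        "fpr": ratio(fp, fp + tn),  # false positive rate
        "fdr": ratio(fp, tp + fp),  # false discovery rate
    }


# Toy example: reference graph 0 -> 1 -> 2; the estimate keeps 0 -> 1,
# reverses 1 -> 2, and adds a spurious 0 -> 2.
true_adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
est_adj = [[0, 1, 1], [0, 0, 0], [0, 1, 0]]
print(edge_metrics(true_adj, est_adj))
```

Lower is better for structural Hamming distance, false positive rate, and false discovery rate; higher is better for true positive rate. This is why different algorithms can each be "best" on different metrics for the same data set.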

Publisher

JMIR Publications Inc.

Subject

Health Information Management, Health Informatics


Cited by 5 articles.

1. Identifying common disease trajectories of Alzheimer’s disease with electronic health records;2024-07-27

2. Optimization of Data Structures and Trade-Offs with Concurrency Control in Multithread Software Structures Using Artificial Intelligence;2024 IEEE International Conference on Computing, Power and Communication Technologies (IC2PCT);2024-02-09

3. Graphical Learning and Causal Inference for Drug Repurposing;2023-08-02

4. Design of Autonomous Wireless Sensor Network Using Mobile Robots for Intrusion Detection and Border Surveillance;2023 International Wireless Communications and Mobile Computing (IWCMC);2023-06-19

5. Causal Discovery and Features Importance Analysis: What Can Be Inferred About At-Risk Students?;Business Intelligence;2023