Subgraph augmented non-negative tensor factorization (SANTF) for modeling clinical narrative text

Published:2015-04-09 Issue:5 Volume:22 Page:1009-1019
ISSN:1527-974X
Container-title:Journal of the American Medical Informatics Association
language:en

Author:

Luo Yuan¹, Xin Yu¹, Hochberg Ephraim², Joshi Rohit¹, Uzuner Ozlem³, Szolovits Peter¹

Affiliation:

1. Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology

2. Center for Lymphoma, Massachusetts General Hospital and Department of Medicine, Harvard Medical School

3. Department of Information Studies, State University of New York at Albany

Abstract

Objective: Extracting medical knowledge from electronic medical records requires automated approaches to combat scalability limitations and selection biases. However, existing machine learning approaches are often regarded by clinicians as black boxes. Moreover, training data for these automated approaches are often sparsely annotated at best. The authors target unsupervised learning for modeling clinical narrative text, aiming at improving both accuracy and interpretability.

Methods: The authors introduce a novel framework named subgraph augmented non-negative tensor factorization (SANTF). In addition to relying on atomic features (e.g., words in clinical narrative text), SANTF automatically mines higher-order features (e.g., relations of lymphoid cells expressing antigens) from clinical narrative text by converting sentences into a graph representation and identifying important subgraphs. The authors compose a tensor using patients, higher-order features, and atomic features as its respective modes. They then apply non-negative tensor factorization to cluster patients and simultaneously identify latent groups of higher-order features that link to patient clusters, as in clinical guidelines where a panel of immunophenotypic features and laboratory results is used to specify diagnostic criteria.

Results and Conclusion: SANTF demonstrated over 10% improvement in averaged F-measure on patient clustering compared to widely used non-negative matrix factorization (NMF) and k-means clustering methods. Multiple baselines were established by modeling patient data using patient-by-features matrices with different feature configurations and then performing NMF or k-means to cluster patients. Feature analysis identified latent groups of higher-order features that lead to medical insights. The authors also found that the latent groups of atomic features help to better correlate the latent groups of higher-order features.
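The tensor construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain non-negative CP (PARAFAC) model fitted with multiplicative updates, which is one common way to do non-negative tensor factorization, and it runs on a made-up toy tensor rather than clinical data. All names (`nonneg_cp`, `khatri_rao`, `unfold`, `rel_error`) and the toy dimensions are hypothetical. The three modes stand for patients, higher-order features, and atomic features, and each patient is assigned to the latent group with the largest weight in the patient factor.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (J x R), (K x R) -> (J*K x R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def unfold(X, mode):
    """Mode-n unfolding; the remaining axes keep their original order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def nonneg_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a 3-way tensor (assumed technique:
    Lee-Seung style multiplicative updates, one mode at a time)."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(others[0], others[1])
            numer = unfold(X, mode) @ kr
            denom = factors[mode] @ (kr.T @ kr) + eps
            # Ratio of non-negative terms, so the factor stays non-negative.
            factors[mode] *= numer / denom
    return factors

def rel_error(X, factors):
    """Relative Frobenius reconstruction error of the CP model."""
    A, B, C = factors
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Toy data (made up): 6 patients x 4 higher-order features x 5 atomic features,
# generated from two latent groups of three patients each.
rng = np.random.default_rng(1)
A_true = np.repeat(np.eye(2), 3, axis=0)
B_true = rng.random((4, 2))
C_true = rng.random((5, 2))
X = np.einsum('ir,jr,kr->ijk', A_true, B_true, C_true)

factors = nonneg_cp(X, rank=2)
labels = factors[0].argmax(axis=1)  # cluster assignment per patient
print("patient clusters:", labels)
print("relative error:", rel_error(X, factors))
```

In this sketch the columns of the second and third factor matrices play the role of the latent groups of higher-order and atomic features; inspecting the largest entries in a column is one simple way to see which features are tied to a patient cluster.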

Publisher

Oxford University Press (OUP)

Subject

Health Informatics

Link

http://academic.oup.com/jamia/article-pdf/22/5/1009/34146351/ocv016.pdf

References: 50 articles.

1. Computational medicine: translating models to clinical care;Winslow;Sci Transl Med.,2012

2. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning;Shipp;Nat Med.,2002

3. A simple algorithm for identifying negated findings and diseases in discharge summaries;Chapman;J Biomed Informat.,2001

4. Exploiting semantic relations for literature-based discovery;Hristovski;AMIA Ann Symp Proc.,2006

Cited by 35 articles.

1. Robust Tensor CUR Decompositions: Rapid Low-Tucker-Rank Tensor Recovery with Sparse Corruptions;SIAM Journal on Imaging Sciences;2024-01-25

2. Unsupervised EHR‐based phenotyping via matrix and tensor decompositions;WIREs Data Mining and Knowledge Discovery;2023-03-05

3. Machine Learning for Lung Cancer Diagnosis, Treatment, and Prognosis;Genomics, Proteomics & Bioinformatics;2022-10

4. The Application Research on Self-Health Management of People based on Mobile Internet Use;2022 IEEE 5th International Conference on Knowledge Innovation and Invention (ICKII);2022-07-22

5. Research and Application of Artificial Intelligence Based on Electronic Health Records of Patients With Cancer: Systematic Review;JMIR Medical Informatics;2022-04-20