Term Domain Distribution Analysis: a Data Mining Tool for Text Databases

Published:1999 Issue:02 Volume:38 Page:96-101
ISSN:0026-1270
Container-title:Methods of Information in Medicine
language:en
Short-container-title:Methods Inf Med

Author:

Chu W. W., Parker D. S., Goldman R. M., Goldman J. A.

Abstract

In this paper, we give a case history illustrating the real-world application of a useful technique for data mining of text databases. The technique, which we call Term Domain Distribution Analysis (TDDA), consists of keeping track of term frequencies for specific finite domains and announcing significant differences from standard frequency distributions over these domains as a hypothesis. TDDA is part of a larger framework, the Digital Filter Model, for data mining of text documents. In the case study presented, the domain of terms was the pair {right, left}, over which we expected a uniform distribution. In analyzing term frequencies in a thoracic lung cancer database, the TDDA technique led to the surprising discovery that primary thoracic lung cancer tumors appear in the right lung more often than the left lung, with a ratio of 3:2. Treating the text discovery as a hypothesis, we verified this relationship against the medical literature in which primary lung tumor sites were reported, using a standard χ² statistic. We subsequently developed a working theoretical model of lung cancer that may explain the discovery. This discovery and our model may change how oncologists view the mechanisms of primary lung tumor location.
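The idea summarized in the abstract (count the terms of a small finite domain, then compare the counts with an expected uniform distribution) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the tokenization rule, and the toy report strings are all assumptions made for illustration, and the p-value helper is valid only for a two-term domain such as {right, left}, where the χ² statistic has one degree of freedom.

```python
import math
import re


def term_domain_counts(documents, domain):
    """Count how often each term of a finite domain occurs across documents."""
    counts = {term: 0 for term in domain}
    for text in documents:
        # Assumed tokenization: lowercase alphabetic runs only.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in counts:
                counts[token] += 1
    return counts


def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against a uniform distribution."""
    total = sum(counts.values())
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts.values())


def p_value_two_terms(statistic):
    """Upper-tail p-value for 1 degree of freedom (two-term domains only)."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical toy reports, invented for illustration (not data from the paper).
reports = [
    "Mass in the right upper lobe.",
    "Lesion in left lower lobe.",
    "Right hilar mass; right lung nodule.",
]

counts = term_domain_counts(reports, ["right", "left"])
statistic = chi_square_uniform(counts)
print(counts, statistic, p_value_two_terms(statistic))
```

With a realistic corpus, a small p-value would flag the imbalance as a hypothesis worth checking against independent data, which is the role the abstract assigns to the χ² test.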

Publisher

Georg Thieme Verlag KG

Subject

Health Information Management, Advanced and Specialised Nursing, Health Informatics

Link

http://www.thieme-connect.de/products/ejournals/pdf/10.1055/s-0038-1634180.pdf

Cited by 16 articles.

1. Questionnaire Results of Remote Practical Hospital Pharmacy Training Using Online System Amid COVID-19 Pandemic;Iryo Yakugaku (Japanese Journal of Pharmaceutical Health Care and Sciences);2020-12-10

2. Anomaly Detection in Healthcare: Detecting Erroneous Treatment Plans in Time Series Radiotherapy Data;International Journal of Semantic Computing;2014-09

3. Auto-Selection of DPC Codes from Discharge Summaries by Text Mining in Several Hospitals and Analysis of Differences in Discharge Summaries;Journal of Advanced Computational Intelligence and Intelligent Informatics;2012-01-20

4. Structure and infrastructure of infectious agent research literature: SARS;Scientometrics;2010-05-30

5. Assessment of China's and India's science and technology literature — introduction, background, and approach;Technological Forecasting and Social Change;2007-11