Author:
Casanova Isidoro J., Campos Manuel, Juarez Jose M., Gomariz Antonio, Canovas-Segura Bernardo, Lorente-Ros Marta, Lorente Jose A.
Abstract
Background
Pattern mining techniques are helpful tools when extracting new knowledge in real practice, but the overwhelming number of patterns is still a limiting factor in the health-care domain. Current efforts concerning the definition of measures of interest for patterns are focused on reducing the number of patterns and quantifying their relevance (utility/usefulness). However, although the temporal dimension plays a key role in medical records, few efforts have been made to extract temporal knowledge about the patient’s evolution from multivariate sequential patterns.
Methods
In this paper, we propose a method to extract a new type of pattern in the clinical domain called Jumping Diagnostic Odds Ratio Sequential Patterns (JDORSP). The method uses the odds ratio to identify a concise set of sequential patterns that represent a patient’s state with a statistically significant protection factor (i.e., a pattern associated with patients that survive), together with those extensions whose evolution suddenly changes the patient’s clinical state and turn the sequential pattern into a statistically significant risk factor (i.e., a pattern associated with patients that do not survive), or vice versa.
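The idea of a pattern "jumping" from protection factor to risk factor can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which significance test is used, so the sketch assumes the standard log-based 95% confidence interval for the odds ratio, with a Haldane-Anscombe correction (adding 0.5 to every cell) when a cell is zero. The names `Counts`, `odds_ratio_ci`, `classify` and `is_jumping`, and all patient counts, are hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class Counts(NamedTuple):
    """2x2 contingency table for one sequential pattern (hypothetical layout)."""
    died_with: int         # non-survivors whose record contains the pattern
    survived_with: int     # survivors whose record contains the pattern
    died_without: int      # non-survivors without the pattern
    survived_without: int  # survivors without the pattern


def odds_ratio_ci(c: Counts, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound).

    Assumption: log-based Wald interval; z = 1.96 gives the conventional
    95% level. A 0.5 correction is applied to all cells if any cell is zero.
    """
    cells = [float(x) for x in c]
    if 0.0 in cells:
        cells = [x + 0.5 for x in cells]
    a, b, c_, d = cells
    odds_ratio = (a * d) / (b * c_)
    se = math.sqrt(sum(1.0 / x for x in cells))
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


def classify(c: Counts) -> str:
    """'risk' if the whole interval is above 1, 'protection' if below, else 'none'."""
    _, lower, upper = odds_ratio_ci(c)
    if lower > 1.0:
        return "risk"
    if upper < 1.0:
        return "protection"
    return "none"


def is_jumping(pattern: Counts, extension: Counts) -> bool:
    """True when the pattern and its extension are significant in opposite directions."""
    return {classify(pattern), classify(extension)} == {"risk", "protection"}


# Hypothetical cohort of 150 patients (50 non-survivors, 100 survivors).
parent = Counts(died_with=10, survived_with=60, died_without=40, survived_without=40)
extension = Counts(died_with=9, survived_with=2, died_without=41, survived_without=98)

print(classify(parent), classify(extension), is_jumping(parent, extension))
```

In this made-up cohort the parent pattern is mostly seen in survivors, while almost every patient who goes on to match the extension dies, so the pair is flagged as jumping. A pair in which either side is not statistically significant would not be reported.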
Results
The results of our experiments show that our method reduces the number of sequential patterns obtained with state-of-the-art pattern reduction methods by over 95%. Only with this drastic reduction can medical experts carry out a comprehensive clinical evaluation of the patterns that might be considered medical knowledge regarding the temporal evolution of the patients. We evaluated the surprisingness and relevance of the sequential patterns with clinicians. The most interesting finding is the high surprisingness of the pattern extensions that become a protection factor, that is, those describing patients who recover after several days at high risk of dying.
Conclusions
Our proposed method for extracting JDORSP generates a set of interpretable multivariate sequential patterns that provide new knowledge regarding the temporal evolution of the patients. The number of patterns is greatly reduced compared with those generated by other methods and measures of interest. An additional advantage of this method is that it does not require any parameters or thresholds, and the reduced number of patterns allows a manual evaluation.
Funder
Agencia Estatal de Investigación
Publisher
Springer Science and Business Media LLC