Abstract
In this article we propose using variants of the mutual information function as characteristic fingerprints of biomolecular sequences for classification analysis. In particular, we consider resolved mutual information functions based on the Shannon, Rényi, and Tsallis entropies. Combined with interpretable machine learning classifier models based on generalized learning vector quantization, this yields a powerful methodology for sequence classification that, owing to the model-inherent robustness, achieves high classification ability while also allowing substantial knowledge extraction. Any potentially (slightly) inferior performance of such a classifier is compensated by the additional knowledge provided by interpretable models, which can assist the user in analyzing and understanding the data and the task at hand. After a theoretical justification of the concepts, we demonstrate the approach on several example data sets covering different areas of biomolecular sequence analysis.
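To make the fingerprint idea concrete, the sketch below estimates a mutual information profile I(k) of a symbolic sequence: for each separation k, the joint distribution of symbol pairs (s_i, s_{i+k}) is compared against the product of its marginals. This is a minimal illustration, not the authors' exact "resolved" formulation; the function name, parameters, and the Rényi/Tsallis variants (implemented here via the corresponding alpha-divergences, one common generalization among several in the literature) are assumptions made for this example.

```python
import math
from collections import Counter

def mutual_information_profile(seq, k_max, alpha=1.0, kind="shannon"):
    """Estimate I(k) for separations k = 1..k_max of a symbol sequence.

    kind="shannon": classical Shannon mutual information (natural log).
    kind="renyi" / "tsallis": the Kullback-Leibler divergence between the
    joint distribution and the product of marginals is replaced by the
    corresponding alpha-divergence (alpha != 1); Shannon is the alpha -> 1 limit.
    """
    profile = []
    n = len(seq)
    for k in range(1, k_max + 1):
        # empirical joint distribution of symbol pairs at separation k
        pairs = Counter((seq[i], seq[i + k]) for i in range(n - k))
        total = sum(pairs.values())
        joint = {ab: c / total for ab, c in pairs.items()}
        # marginals of the first and second member of each pair
        p1, p2 = Counter(), Counter()
        for (a, b), p in joint.items():
            p1[a] += p
            p2[b] += p
        if kind == "shannon":
            mi = sum(p * math.log(p / (p1[a] * p2[b]))
                     for (a, b), p in joint.items())
        else:
            s = sum(p ** alpha * (p1[a] * p2[b]) ** (1.0 - alpha)
                    for (a, b), p in joint.items())
            if kind == "renyi":
                mi = math.log(s) / (alpha - 1.0)
            else:  # tsallis
                mi = (s - 1.0) / (alpha - 1.0)
        profile.append(mi)
    return profile

# Example: Rényi (alpha = 2) fingerprint of a short DNA fragment, k = 1..10
print(mutual_information_profile("ACGTACGTTAGCCGATACGT", 10, alpha=2.0, kind="renyi"))
```

Such a profile, computed for each sequence, could then serve as the feature vector fed to a prototype-based classifier such as generalized learning vector quantization.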
Subject
General Physics and Astronomy