Abstract
AbstractIn current era, the intelligent development of traditional Chinese medicine (TCM) has attracted more and more attention. As the main carrier of clinical medication, formulas use synergies of active substances to enhance efficacy and reduce side effects. Related studies show that there is a nonlinear relationship between the efficacy of formulas and herbs. Deep learning is an effective technique for fitting nonlinear relationships. However, it is not good for using deep learning model directly due to ignoring the characteristics of formulas. In this paper, we propose a detached feature extraction approach (TCM2Vec) based on deep learning for better feature extraction and efficacy prediction. We build two detached encoders, one of it uses cross-feature-based unsupervised pre-training model (FMh2v) to extract the relationship features of herbal medicines for initializing, while the other one simulates multi-dimensional characteristics of medicines by normal distribution. Then we integrate relationships and medicinal characteristics for deep feature extraction. We processed 31,114 unlabeled formulas for pre-training and two classification tasks in-domain for predicting and fine-tuning. One of tasks is multi-classed with 1036 formulas, other one is multi-labelled with 1,723 formulas. For labelled formulas, different feature extraction models based on detached encoder are trained to predict efficacy. Compared with the no pre-training, CBOW and BERT baseline models, FMh2v leads to performance gains. Moreover, the detached encoder offers large positive effects in different models which for efficacy prediction, where ACC increased by 5.80% on average and F1 increased by 12.06% on average. Overall, the proposed feature extraction is an effective method for obtaining characteristic representation of TCM formulas, and provides reference for the adaptability of artificial intelligence technology in the domain of TCM.
Funder
The Key Project of TCM Scientific Research Program in Hunan Province
Natural Science Foundation of Hunan Province
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications,Hardware and Architecture,Media Technology,Software
Reference48 articles.
1. Acharjya DP, Ahmed PK (2022) A hybridized rough set and bat-inspired algorithm for knowledge inferencing in the diagnosis of chronic liver disease. Multimed Tools Appl 81(10):13489–13512
2. Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155
3. Chowdhury GG (2003) Natural language processing. Annu Rev Inf Sci Technol 37(1):51–89
4. Clevert DA, Unterthiner T, Hochreiter S (2015) Fast and accurate deep network learning by exponential linear units (elus). Comput Sci 2015:1–14
5. Deng L, Chang C, Huang X, Liang L, Liang H (2020) Quantitative study on medicinal properties of traditional Chinese medicine based on BP neural network. Chin Tradit Herb Drugs 51(16):4277–4283