An Improved Deep Learning Model: S-TextBLCNN for Traditional Chinese Medicine Formula Classification

Published: 2021-12-22 Volume: 12
ISSN: 1664-8021
Container-title: Frontiers in Genetics
Short-container-title: Front. Genet.

Author:

Cheng Ning, Chen Yue, Gao Wanqing, Liu Jiajun, Huang Qunfu, Yan Cheng, Huang Xindi, Ding Changsong

Abstract

Purpose: This study proposes the S-TextBLCNN model for classifying the efficacy of traditional Chinese medicine (TCM) formulae. The model uses deep learning to analyze the relationship between herb efficacy and formula efficacy, which helps in further exploring the internal rules of formula combination.

Methods: First, natural language processing (NLP) is used to learn quantitative representations of the TCM herbs extracted from the Chinese Pharmacopoeia. Three features (herb name, herb properties, and herb efficacy) are selected to encode the herbs and to construct the herb-vector and formula-vector. Then, based on 2,664 formulae for stroke collected from the TCM literature and 19 formula efficacy categories extracted from Yifang Jijie, an improved deep learning model, TextBLCNN, is proposed; it consists of a bidirectional long short-term memory (Bi-LSTM) neural network and a convolutional neural network (CNN). One binary classifier is established for each of the 19 formula efficacy categories to classify the TCM formulae. Finally, to address the class imbalance in the formula data, the over-sampling method SMOTE is applied, yielding the S-TextBLCNN model.

Results: The formula-vector composed of herb efficacy gives the best classification performance, which suggests a strong relationship between herb efficacy and formula efficacy. The TextBLCNN model has an accuracy of 0.858 and an F1-score of 0.762, both higher than those of the logistic regression (acc = 0.561, F1-score = 0.567), SVM (acc = 0.703, F1-score = 0.591), LSTM (acc = 0.723, F1-score = 0.621), and TextCNN (acc = 0.745, F1-score = 0.644) models. In addition, when SMOTE is used to tackle data imbalance, the F1-score improves by an average of 47.1% across the 19 models.

Conclusion: The combination of the formula feature representation and the S-TextBLCNN model improves the accuracy of formula efficacy classification. It provides a new research idea for the study of TCM formula compatibility.
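The abstract names SMOTE as the over-sampling step but gives no implementation details. The following is a minimal sketch of SMOTE-style interpolation for a single binary efficacy classifier, not the authors' code. The function name `smote_like_oversample`, the neighbour count `k`, and the two-dimensional toy vectors are all assumptions made for illustration; the real formula-vectors are learned embeddings of much higher dimension.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base point among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 6 negative and 3 positive "formula vectors"
# for one of the binary efficacy classifiers.
majority = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.1]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

new_points = smote_like_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Each synthetic point lies on a line segment between two real minority samples, so the minority class grows without exact duplicates. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would normally be preferred, and over-sampling should be applied to the training split only.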

Publisher

Frontiers Media SA

Subject

Genetics (clinical), Genetics, Molecular Medicine

References: 39 articles.

1. Sentiment Analysis of Movie Reviews Based on Improved Word2vec and Ensemble Learning;Bao;J. Phys. Conf. Ser.,2020

2. On the Effects of Using Word2vec Representations in Neural Networks for Dialogue Act Recognition;Cerisara;Comput. Speech Lang.,2018

3. Analyzing Tongue Images Using a Conceptual Alignment Deep Autoencoder;Dai;IEEE Access,2018

4. Optimizing Semantic Deep Forest for Tweet Topic Classification;Daouadi;Inf. Syst.,2021

5. Boosting the Performance of Over-sampling Algorithms through Under-sampling the Minority Class;de Morais;Neurocomputing,2019

Cited by 10 articles.

1. HerbMet: Enhancing metabolomics data analysis for accurate identification of Chinese herbal medicines using deep learning;Phytochemical Analysis;2024-08-21

2. Biological Mechanism of Traditional Chinese Medicine Formula and Herbs in Treating Diseases from the Perspective of Cold and Hot;World Journal of Traditional Chinese Medicine;2024-02-29

3. Classification method of traditional Chinese medicine compound decoction duration based on multi-dimensional feature weighted fusion;Computer Methods in Biomechanics and Biomedical Engineering;2024-01-09

4. Application of Data Fusion in Traditional Chinese Medicine: A Review;Sensors;2023-12-25

5. Traditional Chinese Medicine Formula Classification Using Large Language Models;2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2023-12-05