Author:
Jiang Xiaomei,Wang Shuo,Liu Wenjian,Yang Yun
Abstract
PurposeTraditional Chinese medicine (TCM) prescriptions have always relied on the experience of TCM doctors, and machine learning(ML) provides a technical means for learning these experiences and intelligently assists in prescribing. However, in TCM prescription, there are the main (Jun) herb and the auxiliary (Chen, Zuo and Shi) herb collocations. In a prescription, the types of auxiliary herbs are often more than the main herb and the auxiliary herbs often appear in other prescriptions. This leads to different frequencies of different herbs in prescriptions, namely, imbalanced labels (herbs). As a result, the existing ML algorithms are biased, and it is difficult to predict the main herb with less frequency in the actual prediction and poor performance. In order to solve the impact of this problem, this paper proposes a framework for multi-label traditional Chinese medicine (ML-TCM) based on multi-label resampling.Design/methodology/approachIn this work, a multi-label learning framework is proposed that adopts and compares the multi-label random resampling (MLROS), multi-label synthesized resampling (MLSMOTE) and multi-label synthesized resampling based on local label imbalance (MLSOL), three multi-label oversampling techniques to rebalance the TCM data.FindingsThe experimental results show that after resampling, the less frequent but important herbs can be predicted more accurately. The MLSOL method is shown to be the best with over 10% improvements on average because it balances the data by considering both features and labels when resampling.Originality/valueThe authors first systematically analyzed the label imbalance problem of different sampling methods in the field of TCM and provide a solution. And through the experimental results analysis, the authors proved the feasibility of this method, which can improve the performance by 10%−30% compared with the state-of-the-art methods.