Affiliation:
1. School of Management, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, P. R. China
2. The First Affiliated Hospital, Zhejiang University School of Medicine, 79 Qingchun Road, Hangzhou 310003, P. R. China
Abstract
In chronic disease diagnosis with high-dimensional clinical features, feature selection (FS) algorithms are widely applied to avoid data sparsity. Current FS algorithms extract only population features, which are strongly relevant to the states of all patients, and ignore subspace features, which are weakly relevant to the states of all patients but strongly relevant for patients in a particular state. Discarding the relevant information carried by subspace features degrades the performance of current classification models. To alleviate this conflict in feature extraction from sparse data, we propose a two-phase classification model that considers the relevant information in both population and subspace features. For each patient, the probability of each state is estimated in Phase 1 in a space whose dimensions are the population features, and in Phase 2 in a space whose dimensions are the subspace features of that state. The final classification is based on the results of both phases. Because it considers both population and subspace features and estimates the probability of each state in a low-dimensional space, the two-phase classification model outperforms the benchmark models in both accuracy and mean absolute error in hepatic fibrosis diagnosis for patients with chronic hepatitis B.
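The two-phase idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how probabilities are estimated or how the two phases are combined, so everything concrete here is an assumption made for illustration: a one-vs-rest Gaussian naive Bayes estimate in each phase, multiplication of the two phase probabilities followed by normalization, and the hypothetical names `one_vs_rest_prob`, `two_phase_predict`, `population_feats`, and `subspace_feats`. The toy data are invented and are not clinical data; the feature sets are given by hand, since the feature selection step itself is not shown.

```python
import math

def gaussian_fit(rows, feats):
    """Per-feature mean and variance over the given rows (variance floored)."""
    stats = {}
    for f in feats:
        vals = [r[f] for r in rows]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats[f] = (mean, max(var, 1e-6))
    return stats

def log_lik(x, stats):
    """Log-likelihood of x under independent Gaussians on the fitted features."""
    total = 0.0
    for f, (mean, var) in stats.items():
        total += -0.5 * math.log(2 * math.pi * var) - (x[f] - mean) ** 2 / (2 * var)
    return total

def one_vs_rest_prob(x, rows, labels, state, feats):
    """Assumed estimator: P(state | x restricted to feats), state vs. all others.

    Requires at least two distinct states in `labels`.
    """
    inside = [r for r, y in zip(rows, labels) if y == state]
    outside = [r for r, y in zip(rows, labels) if y != state]
    prior = len(inside) / len(rows)
    a = math.log(prior) + log_lik(x, gaussian_fit(inside, feats))
    b = math.log(1 - prior) + log_lik(x, gaussian_fit(outside, feats))
    m = max(a, b)  # subtract the max before exponentiating, for stability
    return math.exp(a - m) / (math.exp(a - m) + math.exp(b - m))

def two_phase_predict(x, rows, labels, population_feats, subspace_feats):
    """Combine Phase 1 (population features) and Phase 2 (per-state subspace features)."""
    states = sorted(set(labels))
    scores = {}
    for s in states:
        p1 = one_vs_rest_prob(x, rows, labels, s, population_feats)   # Phase 1
        p2 = one_vs_rest_prob(x, rows, labels, s, subspace_feats[s])  # Phase 2
        scores[s] = p1 * p2  # assumed combination rule: product of both phases
    total = sum(scores.values())
    if total == 0:  # every score underflowed: fall back to a uniform answer
        probs = {s: 1 / len(states) for s in states}
    else:
        probs = {s: v / total for s, v in scores.items()}
    return max(probs, key=probs.get), probs

# Invented toy data: feature 0 separates the states for everyone (population),
# feature 1 is tight only within state 0, feature 2 only within state 1.
rows = [
    [0.0, 1.0, 5.0], [0.2, 1.1, 3.0], [0.1, 0.9, 4.0],  # state 0
    [1.0, 3.0, 2.0], [1.2, 0.0, 2.1], [0.9, 5.0, 1.9],  # state 1
]
labels = [0, 0, 0, 1, 1, 1]
population_feats = [0]
subspace_feats = {0: [1], 1: [2]}

pred, probs = two_phase_predict([1.1, 2.0, 2.0], rows, labels,
                                population_feats, subspace_feats)
print(pred, probs)
```

In this sketch each probability is estimated in a space with only a few dimensions (the population features, or the subspace features of one state), which is the property the abstract credits for the model's performance; the specific estimator and combination rule could be swapped without changing that structure.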
Funder
NSFC
Zheng Zhang is supported by the National Natural Science Foundation of China
Publisher
World Scientific Pub Co Pte Ltd
Subject
General Medicine, Computer Science (miscellaneous)