Abstract
Background and purpose
Mean pulmonary artery pressure (mPAP) is a key index for chronic thromboembolic pulmonary hypertension (CTEPH). We aimed to use machine learning to construct an accurate model for predicting mPAP in patients with CTEPH.
Methods
We included 136 patients diagnosed with CTEPH whose mPAP had been measured. The following patient data were used as explanatory variables: basic patient information (age and sex), blood tests (brain natriuretic peptide (BNP)), echocardiography (tricuspid regurgitation pressure gradient (TRPG)), and chest radiography (cardiothoracic ratio (CTR), right second arc ratio, and presence of an avascular area). Seven machine learning methods, including linear regression, were used to build multivariable prediction models, and additional prediction models were constructed with AutoML software. The 136 patients were divided into a training set (2/3) and a validation set (1/3), and R squared was averaged over 10 different splits of the data into training and validation sets.
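The evaluation protocol above (fit on a random 2/3 of patients, compute R squared on the remaining 1/3, repeat 10 times, and average) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes ordinary least-squares linear regression with an intercept and simple unstratified random splits. The function names are hypothetical, and the usage example runs on synthetic data because the abstract does not specify the actual tooling or split procedure.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination computed on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ols(X, y):
    """Ordinary least squares; column 0 of the design matrix is the intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]


def averaged_r_squared(X, y, n_splits=10, train_frac=2 / 3, seed=0):
    """Average validation R^2 over repeated random training/validation splits
    (assumed: plain random permutation, no stratification)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(n * train_frac))  # 91 training / 45 validation for n = 136
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        train, valid = idx[:n_train], idx[n_train:]
        coef = fit_ols(X[train], y[train])
        scores.append(r_squared(y[valid], predict_ols(coef, X[valid])))
    return float(np.mean(scores)), scores


if __name__ == "__main__":
    # Synthetic stand-in data only: 136 "patients", 4 numeric predictors.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(136, 4))
    y = X @ np.array([1.0, 0.5, 2.0, 0.8]) + rng.normal(scale=2.0, size=136)
    mean_r2, _ = averaged_r_squared(X, y)
    print(f"averaged validation R^2: {mean_r2:.3f}")
```

Because each split is random, the averaged R squared depends on the seed; averaging over 10 splits reduces, but does not remove, this variability in a sample of 136 patients.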
Results
Linear regression was the best-performing machine learning method (averaged R squared, 0.360). With linear regression, the best combination of explanatory variables was age, BNP level, TRPG, and CTR (averaged R squared, 0.388). The R squared of this optimal multivariable linear regression model was higher than that of a univariable linear regression model using TRPG alone.
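The abstract does not state how the optimal combination of explanatory variables was found. One plausible approach, shown here purely as an assumption, is an exhaustive search over all 2^7 - 1 = 127 non-empty subsets of the seven candidate variables. Each subset is scored by the averaged validation R squared of a linear regression, and the same 10 splits are reused for every subset so that comparisons are fair. The variable names, the 0/1 coding of sex and avascular area, and all helper functions are hypothetical, and the data in the usage example are synthetic.

```python
import itertools

import numpy as np

# Hypothetical column names; sex and avascular area assumed coded as 0/1.
FEATURES = ["age", "sex", "BNP", "TRPG", "CTR", "right_2nd_arc_ratio", "avascular_area"]


def make_splits(n, n_splits=10, train_frac=2 / 3, seed=0):
    """Draw the random training/validation splits once so every subset sees the same ones."""
    rng = np.random.default_rng(seed)
    n_train = int(round(n * train_frac))
    splits = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        splits.append((idx[:n_train], idx[n_train:]))
    return splits


def averaged_r2(X, y, splits):
    """Mean validation R^2 of OLS linear regression (with intercept) over the given splits."""
    scores = []
    for train, valid in splits:
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        y_hat = coef[0] + X[valid] @ coef[1:]
        y_v = y[valid]
        scores.append(1.0 - np.sum((y_v - y_hat) ** 2) / np.sum((y_v - y_v.mean()) ** 2))
    return float(np.mean(scores))


def best_subset(X, y, names, splits):
    """Score every non-empty subset of columns; return the best subset and all scores."""
    results = {}
    for k in range(1, len(names) + 1):
        for combo in itertools.combinations(range(len(names)), k):
            cols = list(combo)
            results[tuple(names[c] for c in cols)] = averaged_r2(X[:, cols], y, splits)
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Synthetic data only; not the study cohort.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(136, len(FEATURES)))
    y = 2 * X[:, 0] + X[:, 2] + 3 * X[:, 3] + X[:, 4] + rng.normal(size=136)
    best, results = best_subset(X, y, FEATURES, make_splits(len(y)))
    print("best subset:", best, round(results[best], 3))
    print("TRPG only:", round(results[("TRPG",)], 3))
```

Since the TRPG-only model is one of the 127 candidates, the selected subset can never score below it on the same splits. With only 136 patients, however, choosing the maximum over many subsets tends to overstate how well that subset generalizes.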
Conclusion
We constructed a model that predicts mPAP in patients with CTEPH more accurately than a model based on TRPG alone. Prediction performance was improved by selecting the optimal machine learning method and the optimal combination of explanatory variables.
Publisher
Public Library of Science (PLoS)