Abstract
<div class="section abstract"><div class="htmlview paragraph">Fuel cell vehicles have long attracted attention for their energy utilization and environmental benefits. The raw experimental data used to analyze fuel cell performance usually contain outliers that can significantly distort the results, so the data must be cleaned before analysis. The polarization curve is a crucial tool for describing the basic characteristics of a fuel cell and is typically expressed by semi-empirical formulas whose parameters are fitted to the raw experimental data. Identifying and removing outliers quickly, effectively, and automatically is therefore a crucial step in fitting polarization curve parameters. This article explores data-cleaning methods based on the Local Outlier Factor (LOF) algorithm and the Isolation Forest algorithm. For fuel cell experimental data, each algorithm assigns an outlier score to every data point, and a reasonable threshold is set to identify and remove outliers. The parameters in the semi-empirical formula of the polarization curve are then fitted, and the fit is evaluated with the coefficient of determination and the root mean square error. The results show that, after outliers are removed with either algorithm, the fit of the polarization curve improves greatly compared with the fit to the raw data. The article also compares the outlier removal performance of the Isolation Forest and LOF algorithms using the two evaluation indicators. The results show that the LOF algorithm detects outliers with higher accuracy and stability than the Isolation Forest algorithm.</div></div>
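The score-then-threshold step described in the abstract can be illustrated with a minimal pure-Python sketch of LOF. Everything specific here is an assumption made for illustration and is not taken from the article: the synthetic (current density, voltage) points, the linear stand-in `model` used in place of a semi-empirical polarization formula, the neighbour count `k=3`, and the `THRESHOLD` of 1.5. The sketch also uses exactly `k` neighbours per point, a common simplification of the original LOF definition, and it assumes no duplicate points.

```python
import math

def lof_scores(points, k=3):
    """Local Outlier Factor of each point (simplified: exactly k neighbours)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neigh = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
             for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    kdist = [dist[i][neigh[i][-1]] for i in range(n)]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(max(kdist[j], dist[i][j]) for j in neigh[i]) for i in range(n)]
    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def model(i):
    """Hypothetical linear stand-in for a polarization curve (assumption)."""
    return 0.95 - 0.3 * i

# Synthetic data (assumption): ten points on the stand-in curve plus one outlier.
points = [(0.1 * n, model(0.1 * n)) for n in range(1, 11)]
points.append((0.5, 0.30))  # injected outlier, index 10

THRESHOLD = 1.5  # assumed cut-off; scores near 1 indicate inliers
scores = lof_scores(points, k=3)
flagged = [idx for idx, s in enumerate(scores) if s > THRESHOLD]
cleaned = [p for idx, p in enumerate(points) if idx not in flagged]

def evaluate(data):
    """RMSE and R^2 of the stand-in curve against the given data."""
    observed = [v for _, v in data]
    predicted = [model(i) for i, _ in data]
    return rmse(observed, predicted), r_squared(observed, predicted)

rmse_raw, r2_raw = evaluate(points)
rmse_clean, r2_clean = evaluate(cleaned)
print("flagged indices:", flagged)
print("raw:     RMSE=%.4f R2=%.4f" % (rmse_raw, r2_raw))
print("cleaned: RMSE=%.4f R2=%.4f" % (rmse_clean, r2_clean))
```

In practice the threshold and neighbour count would need tuning to the data, and the two axes may need scaling so that neither current density nor voltage dominates the distance.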