Abstract
Mass spectrometric measurements commonly yield data on hundreds of variables over thousands of points in time. Refining and synthesising these “raw” data into chemical information requires advanced, statistics-based data analysis techniques. In analytical aerosol chemistry, statistical dimensionality reduction methods have become widespread in the last decade, yet comparable advanced chemometric techniques for data classification and identification remain marginal. Here we present an example of combining data dimensionality reduction (factorisation) with exploratory classification (clustering), and show that the results not only reproduce and corroborate earlier findings, but also complement and broaden our current perspectives on aerosol chemical classification. We find that applying positive matrix factorisation to extract spectral characteristics of the organic component of air pollution plumes, together with an unsupervised clustering algorithm, k-means++, for classification, reproduces classical organic aerosol speciation schemes. In addition to the typical classification driven by oxidation level and aerosol source, we were also able to classify and characterise outlier groups that would likely be disregarded in a more conventional analysis. Evaluating solution quality for the classification also provides a means to assess the performance of mass spectral similarity metrics and to optimise the weighting of mass spectral variables. This both improves algorithm-based classification and gives a human analyst important clues about the relative importance of variables and data structures.
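As a rough illustration of the clustering step only, the sketch below runs k-means with k-means++ seeding on a handful of spectra. It is not the authors' implementation: the factorisation step is not reproduced, the six four-channel "factor spectra" are made-up toy values, and the use of Euclidean distance on unit-normalised spectra is an assumption (the abstract indicates that several similarity metrics and variable weightings were evaluated). All function names are hypothetical.

```python
import math
import random


def normalise(spectrum):
    """Scale a spectrum to unit Euclidean length so that shape, not intensity, is compared."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability
    proportional to its squared distance from the nearest existing centre."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(sq_dist(p, c) for c in centres) for p in points]
        centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def kmeans(points, k, rng, n_iter=50):
    """Lloyd iterations started from k-means++ seeds; returns (labels, centres)."""
    centres = kmeanspp_seeds(points, k, rng)
    labels = []
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        # Recompute each centre as the mean of its members (keep it if empty).
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


# Toy input (assumption): rows stand in for factor spectra from a prior
# factorisation; the first three are dominated by channel 0, the last three
# by channel 3. The values are invented for illustration.
toy_spectra = [
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.72, 0.18, 0.06, 0.04],
    [0.05, 0.10, 0.25, 0.60],
    [0.04, 0.08, 0.28, 0.60],
    [0.06, 0.12, 0.22, 0.60],
]

points = [normalise(s) for s in toy_spectra]
labels, centres = kmeans(points, k=2, rng=random.Random(0))
print(labels)
```

Swapping `sq_dist` for another dissimilarity measure, or multiplying each channel by a weight before normalisation, would be one simple way to explore the metric and weighting choices the abstract mentions.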
Funder
European Commission
Academy of Finland
Luonnontieteiden ja Tekniikan Tutkimuksen Toimikunta
Cited by
2 articles.