Abstract
Technological advances have produced tools that collect vast amounts of data, usually recorded at time stamps or arriving over time, e.g. data from sensors. Such data are commonly analysed with supervised classification techniques; however, despite constant improvements in the literature, learning from high-dimensional data remains challenging because of issues such as the curse of dimensionality and the trade-off between complexity and accuracy. Research in functional data analysis (FDA) and statistical learning is currently very active in addressing these drawbacks. This study proposes a supervised classification strategy that combines FDA and tree-based procedures. Specifically, we introduce functional classification trees, functional bagging, and functional random forest, which exploit the functional principal components decomposition to extract new features and build functional classifiers. In addition, we introduce new tools to support the understanding of the classification rules, such as the functional empirical separation prototype, the functional predicted separation prototype, and the leaves' functional deviance. Furthermore, we suggest some possible solutions for choosing the number of functional principal components and of functional classification trees to be used in the supervised classification procedure. This research aims to provide an approach that improves the accuracy of the functional classifier, supports the interpretation of the functional classification rules, and overcomes the classical drawbacks due to the high dimensionality of the data. An application to a real dataset on daily electrical power demand illustrates how the proposed supervised classification works. A simulation study with nine scenarios highlights the performance of this approach and compares it with other functional classification methods.
The results show that this line of research is promising: in addition to the benefits of the suggested interpretative tools, we exceed the previously established accuracy records on a dataset available online.
Funder
Università degli Studi della Campania Luigi Vanvitelli
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics; Statistics, Probability and Uncertainty; Statistics and Probability
Cited by
9 articles.