Affiliation:
1. Departamento de Matemáticas Aplicadas y Sistemas, Universidad Autónoma Metropolitana-Cuajimalpa, Ciudad de México, México
2. Colegio de Ciencia y Tecnología, Universidad Autónoma de la Ciudad de México, Ciudad de México, México
Abstract
Proposing a useful model for a problem requires making sense of the data within its context. Such domain knowledge includes information that is not contained in the data but that helps us understand the data fed into a machine-learning algorithm and indicates which features might help the model. Nevertheless, domain knowledge may become insufficient as the number of input variables increases, making it necessary to try automated feature selection techniques. In this study, we investigate whether the joint use of 1) feature selection techniques, such as Chi-square, Tree-based Feature Selection, Pearson’s Correlation, LASSO, Low Variance, and Recursive Feature Elimination, 2) outlier detection methods, such as Isolation Forest, and 3) cross-validation techniques improves accuracy in multiclass classification in machine learning. Specifically, we address the classification of patterns representing the activation state of cell signaling components into classes that symbolize the different cellular processes triggered in cancer cells. The results presented in this work show an increase in accuracy with up to 80% fewer input features, using only 3 of the 16 original descriptors.
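As an illustration of two of the filter methods named in the abstract (Low Variance and Pearson's Correlation), the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy data, and the thresholds are hypothetical, and treating multiclass labels as numbers when computing Pearson's correlation is a simplifying assumption made only for this example.

```python
from math import sqrt
from statistics import mean, pvariance


def low_variance_filter(X, threshold):
    """Return indices of columns whose population variance exceeds threshold."""
    n_features = len(X[0])
    return [j for j in range(n_features)
            if pvariance([row[j] for row in X]) > threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_k_by_correlation(X, y, candidates, k):
    """Keep the k candidate columns with the largest |correlation| to y."""
    ranked = sorted(
        candidates,
        key=lambda j: abs(pearson([row[j] for row in X], y)),
        reverse=True,
    )
    return sorted(ranked[:k])


# Hypothetical toy data: 6 samples, 4 descriptors, 3 classes.
# Column 0 is constant, column 1 tracks the label, column 2 is
# uncorrelated with the label, column 3 decreases as the label grows.
X = [
    [1, 0, 1, 5],
    [1, 0, 0, 4],
    [1, 1, 1, 3],
    [1, 1, 0, 3],
    [1, 2, 1, 1],
    [1, 2, 0, 0],
]
y = [0, 0, 1, 1, 2, 2]  # class labels, treated as numbers (assumption)

kept = low_variance_filter(X, threshold=0.0)    # drops the constant column
selected = top_k_by_correlation(X, y, kept, k=2)
print(selected)
```

In this toy case the constant descriptor is discarded first, and the two descriptors most strongly related to the label are retained. In a fuller pipeline, the reduced feature set would then be passed to an outlier detection step and evaluated with cross-validation, as the abstract describes.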
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science