NNFSRR: Nearest Neighbor Feature Selection and Redundancy Removal Method for Nearest Neighbor Search in Microarray Gene Expression Data

Published:2023-09-19 Volume:9
ISSN:2411-7145
Container-title:EAI Endorsed Transactions on Pervasive Health and Technology
Short-container-title:EAI Endorsed Trans Perv Health Tech

Author:

Bhartiya Rupali, Prajapati Gend Lal

Abstract

INTRODUCTION: Gene expression data analysis is a critical aspect of disease prediction and classification, playing a pivotal role in the field of bioinformatics and biomedical research. High-dimensional gene expression datasets hold a wealth of information, but their effective utilization is hindered by the presence of irrelevant dimensions and noise. The challenge lies in extracting meaningful features from these datasets to enhance the accuracy of disease prediction and classification while maintaining computational efficiency. Feature selection is a crucial step in addressing these challenges, as it aims to identify and retain only the most informative characteristics from large high-dimensional microarray datasets. In the context of microarray gene expression data, characterized by its substantial dimensionality, selecting relevant features is essential for efficient nearest neighbor search, a fundamental component of various analytical tasks in bioinformatics and data mining. Existing feature selection methods in high-dimensional data often face issues related to the trade-off between search accuracy and computational efficiency. This paper introduces a novel approach, the Nearest Neighbor Feature Selection with Symmetrical Uncertainty-based Redundancy Removal (NNFSRR) method, designed to enhance the classification of microarray gene expression data through feature selection. The NNFSRR method focuses on reducing the dimensionality of the dataset by identifying and removing redundant features, allowing subsequent searches to operate solely on relevant dimensions.

OBJECTIVES: The primary goal is to evaluate the NNFSRR method's effectiveness in improving nearest neighbor search in microarray gene expression datasets by reducing dimensionality. This method utilizes Symmetrical Uncertainty-based correlation between dimensions for feature selection and aims to enhance accuracy and efficiency compared to existing methods.

METHODS: The NNFSRR method uses Symmetrical Uncertainty to identify and remove redundant features from microarray gene expression datasets. Reduced datasets are used for nearest neighbor search, improving accuracy and efficiency. Experiments are conducted using real-world datasets, and comparisons with existing methods are made based on search time and accuracy.

RESULTS: The NNFSRR method demonstrates improved nearest neighbor search performance, outperforming basic brute force methods and existing feature selection techniques. Selected feature sets exhibit strong class associations while minimizing feature correlations, enhancing classification precision.

CONCLUSION: The NNFSRR method presents a promising approach to address the challenges posed by high-dimensional gene expression data. It effectively reduces dimensionality, improves search accuracy, and enhances the efficiency of nearest neighbor search. Our experimental results demonstrate that this method outperforms existing techniques in terms of search time and accuracy, making it a valuable tool for applications in bioinformatics, data mining, pattern recognition, and biological information retrieval. The NNFSRR method holds the potential to advance our understanding of complex biological processes and support more accurate disease prediction and classification.
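
The abstract does not spell out the NNFSRR algorithm, so the following is only a minimal sketch of the general idea it describes: rank features by Symmetrical Uncertainty (SU) with the class label, drop features that are redundant with an already selected one, and run nearest neighbor search on the remaining dimensions. The greedy filter shown here follows the well-known FCBF-style rule and is an assumption, not the authors' procedure. The function names, the `relevance_threshold` parameter, and the toy data are hypothetical, and the features are assumed to be already discretized. SU is computed with its standard definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)).

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant: treat as uninformative
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(columns, labels, relevance_threshold=0.0):
    """Greedy FCBF-style filter (an assumed stand-in for NNFSRR).

    Keeps a feature only if its SU with the class label exceeds the
    threshold and no already selected feature is at least as strongly
    correlated with it as the class label is.
    """
    relevance = {j: symmetrical_uncertainty(col, labels)
                 for j, col in enumerate(columns)}
    ranked = sorted((j for j in relevance if relevance[j] > relevance_threshold),
                    key=lambda j: -relevance[j])
    selected = []
    for j in ranked:
        redundant = any(
            symmetrical_uncertainty(columns[j], columns[k]) >= relevance[j]
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected


def nearest_neighbor(rows, query, feature_idx):
    """Brute-force 1-NN (squared Euclidean) restricted to feature_idx."""
    def dist(row):
        return sum((row[j] - query[j]) ** 2 for j in feature_idx)
    return min(range(len(rows)), key=lambda i: dist(rows[i]))


# Toy data (hypothetical): 4 samples, 3 discretized "genes".
labels = [0, 0, 1, 1]
columns = [
    [0, 0, 1, 1],  # gene 0: perfectly associated with the class
    [0, 0, 1, 1],  # gene 1: duplicate of gene 0, hence redundant
    [0, 1, 0, 1],  # gene 2: independent of the class
]
rows = [list(sample) for sample in zip(*columns)]

selected = select_features(columns, labels)
match = nearest_neighbor(rows, [1, 0, 1], selected)
print(selected, match)
```

On this toy input only gene 0 survives the filter, so the search compares a single dimension instead of three. On real microarray data the expression values would first need to be discretized, and the threshold would need tuning.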

Publisher

European Alliance for Innovation n.o.

Subject

Health Informatics, Computer Science (miscellaneous)

References: 23 articles.

1. Journal article: Koul, Nimrita, and Sunilkumar S. Manvi. "Feature Selection from Gene Expression Data Using Simulated Annealing and Partial Least Squares Regression Coefficients." Global Transitions Proceedings (2022).

2. Journal article: Hambali, Moshood A., Tinuke O. Oladele, and Kayode S. Adewole. "Microarray cancer feature selection: review, challenges and research directions." International Journal of Cognitive Computing in Engineering 1 (2020): 78-97.

3. Journal article: P. E. Kafrawy, H. Fathi, M. Qaraad, A. K. Kelany and X. Chen, "An Efficient SVM-Based Feature Selection Model for Cancer Classification Using High-Dimensional Microarray Data," in IEEE Access, vol. 9, pp. 155353-155369, 2021, doi: 10.1109/ACCESS.2021.3123090.

4. Journal article: Gumaei, Abdu, Rachid Sammouda, Mabrook Al-Rakhami, Hussain AlSalman, and Ali El-Zaart. "Feature selection with ensemble learning for prostate cancer diagnosis from microarray gene expression." Health Informatics Journal 27, no. 1 (2021): 1460458221989402.

5. Journal article: Tripathy, Jogeswar, Rasmita Dash, Binod Kumar Pattanayak, Sambit Kumar Mishra, Tapas Kumar Mishra, and Deepak Puthal. "Combination of Reduction Detection Using TOPSIS for Gene Expression Data Analysis." Big Data and Cognitive Computing 6, no. 1 (2022): 24.