Affiliation:
1. School of Science, Jiangnan University, Wuxi 214122, Jiangsu, China
2. Wuxi Vocational Institute of Commerce, Wuxi 214122, Jiangsu, China
Abstract
Background. Cervical cancer is the most common gynecological malignancy, and in recent years it has increasingly affected younger women. Identifying key genes, through the analysis of high-throughput expression data from cancer patients and healthy individuals, to serve as predictors of cervical cancer is of great significance for its early detection and treatment. Method. Granular computing is a computing paradigm that approaches problems through information granulation, and granulation can be realized by clustering. On this basis, this paper proposes the AB method for obtaining representative elements in a multiattribute data system. First, an evaluation index for clustering structures, FHEI, is introduced, and Algorithm 1 is designed to obtain the optimal clustering structure for each attribute of the data system; these structures serve as the base clusters. Second, drawing on the clustering ensemble technology of granular computing and the concept of information entropy, Algorithm 2 is designed. It takes the base clusters as input and outputs the optimal ensemble clustering structure. Finally, the representative element of each class in the optimal ensemble clustering structure is obtained by the nearest center principle. Results. Differentially expressed genes (DEGs) are screened from cervical cancer gene expression data, and the scores of four interaction relationships among the DEGs form the multiattribute data system that is input into the AB method. The five representative elements obtained, RTTN, SAMD10, ZNF207, WAC, and METTL14, serve as predictors of cervical cancer. The classification accuracy of these predictors is as high as 98.82%. The AB method is also compared with other classical methods on six independent gene expression datasets. The results show that the AB method yields a small number of predictors that nevertheless achieve high accuracy in classifying patient samples.
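The final step of the method, selecting representatives by the nearest center principle, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that the center of a class is the mean of its members and that closeness is measured by Euclidean distance, neither of which is specified in the abstract. The function name, the toy scores, and the gene labels g1 to g6 are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_center_representatives(points, labels, names):
    """Pick, for each class, the element closest to the class center.

    Assumption: the center is the coordinate-wise mean of the class members.
    """
    # Group element indices by class label.
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    reps = {}
    for lab, members in clusters.items():
        dim = len(points[members[0]])
        # Coordinate-wise mean of the members of this class.
        center = [sum(points[i][d] for i in members) / len(members)
                  for d in range(dim)]
        # Representative = member with the smallest distance to the center.
        best = min(members, key=lambda i: dist(points[i], center))
        reps[lab] = names[best]
    return reps


# Toy data (hypothetical): six elements, two attributes, two classes.
points = [(0, 0), (1, 0), (5, 0), (10, 10), (10, 12), (10, 11)]
labels = [0, 0, 0, 1, 1, 1]
names = ["g1", "g2", "g3", "g4", "g5", "g6"]

reps = nearest_center_representatives(points, labels, names)
print(reps)
```

In this toy example the class labels are given directly; in the method described above they would come from the optimal ensemble clustering structure produced by Algorithm 2.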
Funder
National Natural Science Foundation of China
Subject
General Engineering, General Mathematics