Prediction of Cancer in DNA Sequences Using Unsupervised Learning Methods

Author:

DOĞRU Şeyma1, ALTUNTAŞ Volkan1

Affiliation:

1. BURSA TEKNİK ÜNİVERSİTESİ

Abstract

As technology has developed, the decision-making capabilities of machines have increased. With their strong analytical abilities, computers can readily detect patterns and relationships that may escape the human eye, and they are therefore widely used in health care. For example, many machine learning techniques for cancer prediction have been applied successfully. Early detection of cancer is crucial to survival. When cancer is diagnosed early, the amount of drug treatment, chemotherapy, or radiotherapy the patient is exposed to is significantly reduced, and the patient gets through treatment with the least physical strain. This study used the Gene Expression Cancer RNA-Seq Dataset, which contains gene expression values for 5 cancer types (BRCA, KIRC, LUAD, LUSC, UCEC). DNA sequences in the dataset were analyzed using k-means and hierarchical clustering, two unsupervised machine learning methods. The aim of the study is to develop a usable machine learning model for early detection of cancer at the gene level. The Adjusted Rand Index (ARI), Silhouette Score, and Accuracy metrics were used to evaluate the results. The Rand Index measures the similarity between two clusterings by counting the pairs of data points that are assigned to the same or to different clusters in both. The Adjusted Rand Index is the chance-corrected version of the Rand Index. The Silhouette Score indicates how well a data point fits within its own cluster compared with the other clusters. Accuracy is the number of correctly clustered data points divided by the total number of predictions. Four linkage methods were used in the hierarchical clustering algorithm: 'complete', 'ward', 'average', and 'single'. The k-means algorithm achieved an accuracy of 0.990, an Adjusted Rand Index of 0.79, and a Silhouette Score of 0.14. For hierarchical clustering, ward performed best of the four linkage methods, with an ARI of 0.76 and a Silhouette Score of 0.13, and the accuracy of the hierarchical clustering algorithm was 0.999.
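The pair-counting metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names and the toy labels are hypothetical. The abstract does not say how cluster ids were matched to cancer types when computing accuracy, so the best one-to-one mapping used here is an assumption.

```python
from collections import Counter
from itertools import permutations
from math import comb


def rand_index(labels_true, labels_pred):
    """Fraction of sample pairs on which the two labelings agree."""
    n = len(labels_true)
    agree = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_true = labels_true[i] == labels_true[j]
            same_pred = labels_pred[i] == labels_pred[j]
            agree += same_true == same_pred
    return agree / comb(n, 2)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand Index corrected for the agreement expected by chance."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    pairs_both = sum(comb(c, 2) for c in joint.values())
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    maximum = (pairs_true + pairs_pred) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


def clustering_accuracy(labels_true, labels_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to classes.

    Assumption: the paper's matching procedure is not described; this tries
    every mapping, which is only practical for a handful of clusters.
    """
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    # Pad with None so that surplus clusters can stay unmatched.
    candidates = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for assignment in permutations(candidates, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        correct = sum(mapping[p] == t for t, p in zip(labels_true, labels_pred))
        best = max(best, correct)
    return best / len(labels_true)


# Hypothetical toy labels: one BRCA sample lands in the KIRC cluster.
truth = ["BRCA", "BRCA", "BRCA", "KIRC", "KIRC", "LUAD", "LUAD", "LUAD"]
pred = [1, 1, 0, 0, 0, 2, 2, 2]

print(round(rand_index(truth, pred), 3))
print(round(adjusted_rand_index(truth, pred), 3))
print(clustering_accuracy(truth, pred))
```

On these toy labels a single misplaced sample lowers the ARI much more than the accuracy, which is one way scores such as those reported above can differ between the two metrics.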

Publisher

Bursa Technical University

Subject

Materials Chemistry

Cited by 1 article.

1. The smart analysis of cell damage and cancerous prediction using information clustering model; 2023 Second International Conference On Smart Technologies For Smart Nation (SmartTechCon); 2023-08-18
