Affiliations:
1. Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India
2. Computer Science & Applications Department, GuruShree ShanthiVijai Jain College, Chennai, Tamil Nadu, India
Abstract
Cancer prediction and detection from Next Generation Sequencing (NGS) data using artificial intelligence are receiving considerable attention in current medical research. NGS sequences were retrieved from the NCBI (National Center for Biotechnology Information) gene repository. Sequences of normal Homo sapiens (Class 1), BRCA1 (Class 2) and BRCA2 (Class 3) were extracted for machine learning (ML). The extracted data totalled 1580 sequences, grouped into four dataset categories of 50, 100, 150 and 200 sequences. Breast cancer prediction was carried out in three main steps: feature extraction, machine learning classification and performance evaluation. Features were extracted with the sequences as input. Ten DNA sequence features were extracted from the three sequence types for classification: ORF (Open Reading Frame) count; the average count of each individual nucleobase (A, T, C and G); AT content and GC content; AT/GC composition; G-quadruplex occurrence; and MR (Mutation Rate). The sequence type was added to the feature set as the target variable, with values 0, 1 and 2 for Classes 1, 2 and 3 respectively. Nine supervised machine learning techniques were applied to the four dataset categories: LR (Logistic Regression), LDA (Linear Discriminant Analysis), k-NN (k-Nearest Neighbours), DT (Decision Tree), NB (Naive Bayes), SVM (Support Vector Machine), RF (Random Forest), AdaBoost (AB) and Gradient Boosting (GB). Of all the supervised models, the decision tree performed best, with a maximum classification accuracy of 94.03%. Model performance was also evaluated using precision, recall, F1-score and support; of these, the F1-score agreed most closely with the classification accuracy.
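The feature-extraction step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' code. ORFs are counted as ATG-to-stop stretches on the forward strand in all three reading frames. Nucleobase "average counts" are taken as per-base frequencies. AT/GC composition is taken as the AT:GC ratio. G-quadruplex occurrences are counted with a common Quadparser-style motif (four runs of three or more G separated by loops of 1 to 7 nucleotides). The helper names (count_orfs, extract_features, build_row) and the label keys are hypothetical. Mutation rate (MR) is omitted because the abstract does not say how it was computed, so the sketch yields nine of the ten features plus the 0/1/2 target label.

```python
import re

# Assumed G-quadruplex motif (Quadparser-style): four runs of >= 3 G
# separated by loops of 1-7 nucleotides. Matches are non-overlapping.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Target encoding from the abstract: Class 1 -> 0, Class 2 -> 1, Class 3 -> 2.
# The string keys are hypothetical.
LABELS = {"normal": 0, "BRCA1": 1, "BRCA2": 2}


def count_orfs(seq: str) -> int:
    """Count ATG...stop ORFs on the forward strand in all three frames (assumed definition)."""
    count = 0
    for frame in range(3):
        in_orf = False
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if not in_orf and codon == "ATG":
                in_orf = True          # start codon opens an ORF
            elif in_orf and codon in STOP_CODONS:
                count += 1             # first in-frame stop closes it
                in_orf = False
    return count


def extract_features(seq: str) -> dict:
    """Nine hypothetical features; MR is omitted because its definition is not given."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    a, t, c, g = (seq.count(base) for base in "ATCG")
    at, gc = a + t, g + c
    return {
        "orf_count": count_orfs(seq),
        # Per-base frequencies stand in for "average count" (assumption).
        "A": a / n, "T": t / n, "C": c / n, "G": g / n,
        "AT_content": at / n,
        "GC_content": gc / n,
        # AT/GC composition taken as the AT:GC ratio (assumption).
        "AT_GC_ratio": at / gc if gc else float("inf"),
        "g4_count": len(G4_PATTERN.findall(seq)),
    }


def build_row(seq: str, seq_type: str) -> dict:
    """Feature row plus the integer target label."""
    row = extract_features(seq)
    row["label"] = LABELS[seq_type]
    return row


if __name__ == "__main__":
    print(build_row("ATGAAATAGGGGTGGGTGGGTGGG", "BRCA1"))
```

For example, build_row("ATGAAATAGGGGTGGGTGGGTGGG", "BRCA1") reports one ORF, one G-quadruplex match and label 1; a table of such rows would then be the input to the nine classifiers.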
Subject
Computer Science Applications, General Engineering, Modelling and Simulation
Cited by
15 articles.
1. An Early Diagnosis of Breast Cancer through Integrated Model of Random Forest and Catboost;2024 IEEE International Conference on Computing, Power and Communication Technologies (IC2PCT);2024-02-09
2. Using an innovative method for breast cancer diagnosis based on Extreme Gradient Boost optimized by Simplified Memory Bounded A*;Biomedical Signal Processing and Control;2024-01
3. Machine Learning Approaches for Investigating Breast Cancer;Biosciences Biotechnology Research Asia;2023-12-31
4. Breast Cancer Prediction Using Histogram Gradient Boosting Classifier;2023 3rd International Conference on Advancement in Electronics & Communication Engineering (AECE);2023-11-23
5. Breast Cancer Detection Technique using Machine Learning Classifiers;2023 Second International Conference On Smart Technologies For Smart Nation (SmartTechCon);2023-08-18