Optimizing classification efficiency with machine learning techniques for pattern matching

Published:2023-07-25 Issue:1 Volume:10
ISSN:2196-1115
Container-title:Journal of Big Data
language:en
Short-container-title:J Big Data

Author:

Hamed Belal A., Ibrahim Osman Ali Sadek, Abd El-Hafeez Tarek

Abstract

The study proposes a novel model for DNA sequence classification that combines machine learning methods with a pattern-matching algorithm. The model aims to categorize DNA sequences based on their features and to improve both the accuracy and the efficiency of classification. Its performance is evaluated using various machine learning algorithms; among those tested, SVM Linear achieves the highest accuracy (0.963) and F1 score (0.97), indicating that it provides the best overall performance in DNA sequence classification. Naive Bayes also performs well, with an accuracy of 0.838 and an F1 score of 0.94. The proposed model is also compared to two suggested algorithms, FLPM and PAPM, and outperforms them in terms of accuracy and efficiency. The study further explores the impact of pattern length on the accuracy and time complexity of each algorithm. Execution time varies with pattern length: for a pattern length of 5, SVM Linear and EFLPM have the lowest execution time of 0.0035 s, whereas at a pattern length of 25, SVM Linear has the lowest execution time of 0.0012 s. The proposed model contributes to the field of DNA sequence analysis by providing a novel approach to pre-processing and feature extraction. Potential applications include drug discovery, personalized medicine, and disease diagnosis. The study's findings highlight the importance of considering the impact of pattern length on the accuracy and time complexity of DNA sequence classification algorithms.
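
The abstract gives no implementation details, so the following is only a rough, dependency-free Python sketch of the kinds of building blocks it mentions: matching a fixed-length pattern in a DNA sequence, turning a sequence into pattern-count features, and scoring predictions with accuracy and F1. The function names (`count_pattern`, `kmer_features`, `accuracy`, `f1_score`) and the toy labels are hypothetical and are not taken from the paper; this is not the authors' model, nor FLPM, PAPM, or EFLPM.

```python
from collections import Counter


def count_pattern(sequence: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `sequence`, allowing overlaps."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted too.
        start = sequence.find(pattern, start + 1)
    return count


def kmer_features(sequence: str, k: int) -> Counter:
    """Map each length-k substring (pattern length k) to its frequency."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    seq = "ACGTACGTAC"  # toy sequence, not data from the study
    print(count_pattern(seq, "ACG"))
    print(kmer_features(seq, 2).most_common(3))
    # Toy labels and predictions, purely illustrative.
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
    print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a sketch like this, the feature vectors produced by `kmer_features` would be passed to a classifier of the reader's choice; the pattern length `k` is the parameter whose effect on accuracy and run time the abstract discusses.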

Funder

Minia University

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management,Computer Networks and Communications,Hardware and Architecture,Information Systems

Link

https://link.springer.com/content/pdf/10.1186/s40537-023-00804-6.pdf

Cited by 21 articles.

1. TMSC-m7G: A transformer architecture based on multi-sense-scaled embedding features and convolutional neural network to identify RNA N7-methylguanosine sites;Computational and Structural Biotechnology Journal;2024-12

2. Predicting compressed earth blocks compressive strength by means of machine learning models;Construction and Building Materials;2024-10

3. Optimal reconfiguration of distribution systems considering reliability: Introducing long-term memory component AEO algorithm;Expert Systems with Applications;2024-09

4. MobileNet-V2 /IFHO model for Accurate Detection of early-stage diabetic retinopathy;Heliyon;2024-09

5. A comprehensive learning based swarm optimization approach for feature selection in gene expression data;Heliyon;2024-09