
DATA MINING TOOLS FOR BIOLOGICAL SEQUENCES

Published:2003-04 Issue:01 Volume:01 Page:139-167
ISSN:0219-7200
Container-title:Journal of Bioinformatics and Computational Biology
language:en
Short-container-title:J. Bioinform. Comput. Biol.

Author:

LIU HUIQING¹, WONG LIMSOON¹

Affiliation:

1. Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore

Abstract

We describe a methodology, as well as some related data mining tools, for analyzing sequence data. The methodology comprises three steps: (a) generating candidate features from the sequences, (b) selecting relevant features from the candidates, and (c) integrating the selected features to build a system to recognize specific properties in sequence data. We also give relevant techniques for each of these three steps. For generating candidate features, we present various types of features based on the idea of k-grams. For selecting relevant features, we discuss signal-to-noise, t-statistics, and entropy measures, as well as a correlation-based feature selection method. For integrating selected features, we use machine learning methods, including C4.5, SVM, and Naive Bayes. We illustrate this methodology on the problem of recognizing translation initiation sites. We discuss how to generate and select features that are useful for understanding the distinction between ATG sites that are translation initiation sites and those that are not. We also discuss how to use such features to build reliable systems for recognizing translation initiation sites in DNA sequences.
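
The first two steps of the methodology (generating k-gram features, then ranking them) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, the features are plain overlapping k-gram counts over a DNA alphabet, and the signal-to-noise score is assumed to be the commonly used form |mean_pos - mean_neg| / (std_pos + std_neg) with population standard deviations. The paper itself may define its features and measures differently.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev


def kgram_counts(seq, k):
    """Count every overlapping k-gram (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_vector(seq, k, alphabet="ACGT"):
    """Map each possible k-gram over the alphabet to its count in seq."""
    counts = kgram_counts(seq, k)
    grams = ("".join(g) for g in product(alphabet, repeat=k))
    return {g: counts.get(g, 0) for g in grams}


def signal_to_noise(pos_values, neg_values):
    """Assumed score: |mean_pos - mean_neg| / (std_pos + std_neg)."""
    diff = abs(mean(pos_values) - mean(neg_values))
    denom = pstdev(pos_values) + pstdev(neg_values)
    if denom == 0:
        # Both classes are constant: perfectly separating or uninformative.
        return float("inf") if diff > 0 else 0.0
    return diff / denom


def rank_features(pos_seqs, neg_seqs, k):
    """Rank k-gram features by signal-to-noise, highest score first."""
    pos = [feature_vector(s, k) for s in pos_seqs]
    neg = [feature_vector(s, k) for s in neg_seqs]
    scores = {
        g: signal_to_noise([v[g] for v in pos], [v[g] for v in neg])
        for g in pos[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this sketch, pos_seqs and neg_seqs would be sequence windows around ATG sites that are and are not translation initiation sites. The top-ranked k-grams could then be passed as input features to a classifier such as C4.5, SVM, or Naive Bayes for the third step.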

Publisher

World Scientific Pub Co Pte Ltd

Subject

Computer Science Applications, Molecular Biology, Biochemistry

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0219720003000216

Reference: 91 articles.

1. Use of conditional probabilities for determining relationships between amino acid sequence and protein secondary structure

2. Dragon Promoter Finder: recognition of vertebrate RNA polymerase II promoters

Cited by 62 articles.

1. Research on Spacecraft Fault Diagnosis and Recovery Architecture;Journal of Physics: Conference Series;2024-05-01

2. Genetic landscape of homologous recombination repair genes in early‐onset/familial prostate cancer patients;Genes, Chromosomes and Cancer;2023-07-12

3. Using a Dual CRISPR/Cas9 Approach to Gain Insight into the Role of LRP1B in Glioblastoma;International Journal of Molecular Sciences;2023-07-10

4. Predictive analysis for pathogenicity classification of H5Nx avian influenza strains using machine learning techniques;Preventive Veterinary Medicine;2023-07

5. Genetic landscape of homologous recombination repair genes in early-onset/familial prostate cancer patients;2023-01-12