PyFeat: a Python-based effective feature generation tool for DNA, RNA and protein sequences

Published:2019-03-08 Issue:19 Volume:35 Page:3831-3833
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Muhammod Rafsanjani¹, Ahmed Sajid¹, Md Farid Dewan¹, Shatabda Swakkhar¹, Sharma Alok²³⁴, Dehzangi Abdollah⁵

Affiliation:

1. Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh

2. School of Engineering and Physics, University of the South Pacific, Private Mail Bag, Laucala Campus, Suva, Fiji

3. RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan

4. Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Queensland, Australia

5. Department of Computer Science, Morgan State University, Baltimore, MD, USA

Abstract

Motivation: Extracting a useful feature set that contains significant discriminatory information is a critical step in effectively presenting sequence data to predict the structure, function, interaction and expression of proteins, DNAs and RNAs. Also, being able to filter features with significant information and avoid sparsity in the extracted features requires the employment of efficient feature selection techniques. Here we present PyFeat as a practical and easy-to-use toolkit implemented in Python for extracting various features from proteins, DNAs and RNAs. To build PyFeat we mainly focused on extracting features that capture information about the interaction of neighboring residues to be able to provide more local information. We then employ the AdaBoost technique to select features with maximum discriminatory information. In this way, we can significantly reduce the number of extracted features and enable PyFeat to represent the combination of effective features from large neighboring residues. As a result, PyFeat is able to extract features from 13 different techniques and represent a context-free combination of effective features. The source code for the PyFeat standalone toolkit and the employed benchmarks, with a comprehensive user manual explaining its system and workflow in a step-by-step manner, are publicly available.

Results: https://github.com/mrzResearchArena/PyFeat/blob/master/RESULTS.md

Availability and implementation: Toolkit, source code and manual to use PyFeat: https://github.com/mrzResearchArena/PyFeat/

Supplementary information: Supplementary data are available at Bioinformatics online.
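As an illustration of what a feature capturing "the interaction of neighboring residues" can look like, the following minimal sketch counts residue pairs separated by a fixed gap in a DNA sequence. It is not taken from PyFeat: the function name `gapped_pair_features`, the `max_gap` parameter and the feature naming scheme are hypothetical, and the 13 techniques PyFeat actually implements are documented in its manual. The AdaBoost-based selection step is not shown here.

```python
from collections import Counter
from itertools import product

DNA = "ACGT"  # assumed alphabet; swap in amino-acid letters for proteins


def gapped_pair_features(seq, max_gap=2, alphabet=DNA):
    """Normalized frequencies of residue pairs separated by 0..max_gap positions.

    Hypothetical helper for illustration only; not PyFeat's API.
    """
    seq = seq.upper()
    features = {}
    for gap in range(max_gap + 1):
        step = gap + 1
        # Count every (left, right) pair whose members are `step` positions apart.
        counts = Counter((seq[i], seq[i + step]) for i in range(len(seq) - step))
        # Pairs containing symbols outside the alphabet (e.g. N) are ignored.
        total = sum(counts[pair] for pair in product(alphabet, repeat=2))
        for a, b in product(alphabet, repeat=2):
            name = f"{a}{'-' * gap}{b}"  # e.g. "AC" for gap 0, "A-G" for gap 1
            features[name] = counts[(a, b)] / total if total else 0.0
    return features


feats = gapped_pair_features("ACGTACGT", max_gap=1)
```

Each sequence becomes a fixed-length vector (16 values per gap size for DNA), so vectors from many sequences can be stacked into a matrix and passed to a feature-selection step of the kind the abstract describes.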

Funder

National Institute of General Medical Sciences

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Link

http://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btz165/28531732/btz165.pdf

Reference: 12 articles.

1. ProPy: a tool to generate various modes of Chou’s PseAAC;Cao;Bioinformatics,2013

2. iDNAProt-ES: identification of DNA-binding proteins using evolutionary and structural features;Chowdhury;Sci. Rep,2017

3. iFeature: a python package and web server for features extraction and selection from protein and peptide sequences;Chen;Bioinformatics,2018

4. Predicting adenosine to inosine editing sites by using pseudo nucleotide compositions;Chen;Sci. Rep,2016

5. Enhanced regulatory sequence prediction using gapped k-mer features;Ghandi;PLoS Comput. Biol,2014

Cited by 88 articles.

1. A predictive approach for host-pathogen interactions using deep learning and protein sequences;VirusDisease;2024-07-16

2. mRCat: A Novel CatBoost Predictor for the Binary Classification of mRNA Subcellular Localization by Fusing Large Language Model Representation and Sequence Features;Biomolecules;2024-06-27

3. Inference of gene regulatory networks based on directed graph convolutional networks;Briefings in Bioinformatics;2024-05-23

4. A multimodal dynamical variational autoencoder for audiovisual speech representation learning;Neural Networks;2024-04

5. PreSubLncR: Predicting Subcellular Localization of Long Non-Coding RNA Based on Multi-Scale Attention Convolutional Network and Bidirectional Long Short-Term Memory Network;Processes;2024-03-26