Discovery of significant porcine SNPs for swine breed identification by a hybrid of information gain, genetic algorithm, and frequency feature selection technique

Published:2020-05-26 Issue:1 Volume:21 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Pasupa Kitsuchart, Rathasamuth Wanthanee, Tongsima Sissades

Abstract

Background: The number of porcine single nucleotide polymorphisms (SNPs) used in genetic association studies is very large, which suits statistical testing. Breed classification, however, calls for a much smaller set of porcine-classifying SNPs (PCSNPs) that can still accurately classify pigs into different breeds. This study attempted to find such PCSNPs by using several combinations of feature selection and classification methods. We experimented with combinations of feature selection methods, including information gain, conventional and modified genetic algorithms, and our own frequency feature selection method, each paired with a common classifier, the Support Vector Machine, to evaluate performance. Experiments were conducted on a comprehensive data set containing SNPs from native pigs from America, Europe, Africa, and Asia, including Chinese breeds, Vietnamese breeds, and hybrid breeds from Thailand.

Results: The best combination of feature selection methods (a hybrid of information gain, modified genetic algorithm, and frequency feature selection) reduced the number of possible PCSNPs to only 1.62% (164 PCSNPs) of the total number of SNPs (10,210 SNPs) while maintaining a high classification accuracy (95.12%). This PCSNPs set also performed nearly identically to larger SNP sets and even to the entire data set. In addition, most PCSNPs were well matched to a set of 94 genes in the PANTHER pathway, conforming to a suggestion by the Porcine Genomic Sequencing Initiative.

Conclusions: The best hybrid method provided a sufficiently small number of porcine SNPs that accurately classified swine breeds.
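As a rough illustration of the information gain filter named in the abstract, the following minimal sketch ranks SNPs by how much knowing a genotype reduces uncertainty about the breed. It is not the authors' implementation: the genotype coding (0/1/2), the toy data, and the names `entropy`, `information_gain`, and `top_k_snps` are assumptions made for this example, and the genetic algorithm, frequency feature selection, and Support Vector Machine stages are not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(genotypes, breeds):
    """IG(breed; SNP) = H(breed) - H(breed | SNP genotype)."""
    n = len(breeds)
    groups = {}
    for g, b in zip(genotypes, breeds):
        groups.setdefault(g, []).append(b)
    conditional = sum(len(bs) / n * entropy(bs) for bs in groups.values())
    return entropy(breeds) - conditional

def top_k_snps(snp_matrix, breeds, k):
    """Return the names of the k SNPs with the highest information gain."""
    scores = {name: information_gain(col, breeds) for name, col in snp_matrix.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (invented for illustration): six pigs, two breeds,
# genotypes coded as 0/1/2 copies of the minor allele.
breeds = ["A", "A", "A", "B", "B", "B"]
snps = {
    "snp_1": [0, 0, 0, 2, 2, 2],  # separates the two breeds perfectly
    "snp_2": [0, 1, 2, 0, 1, 2],  # carries no breed information
    "snp_3": [0, 0, 1, 1, 2, 2],  # partially informative
}
selected = top_k_snps(snps, breeds, k=2)
print(selected)
```

In a pipeline like the one the abstract describes, a filter of this kind would typically serve as a cheap first pass that shrinks the candidate set before a more expensive search and classifier evaluation are applied.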

Funder

Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/s12859-020-3471-4.pdf

Reference: 38 articles.

1. Larrañaga P, Calvo B, Santana R, Bielza C, Galdiano J, Inza I, et al. Machine learning in bioinformatics. Brief Bioinformatics. 2006; 7(1):86–112. https://doi.org/10.1093/bib/bbk007.

2. Tang J, Alelyani S, Liu H. Feature selection for classification: A review. In: Data Classification: Algorithms and Applications. CRC Press: 2014. p. 37–64. https://doi.org/10.1201/b17320.

3. Kwak N, Choi CH. Input feature selection for classification problems. IEEE Trans Neural Netw. 2002; 13(1):143–59. https://doi.org/10.1109/72.977291.

4. Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics. Bioinformatics. 2007; 23(19):2507–17. https://doi.org/10.1093/bioinformatics/btm344.

5. Lazar C, Taminau J, Meganck S, Steenhoff D, Coletta A, Molter C, et al. A Survey on Filter Techniques for Feature Selection in Gene Expression Microarray Analysis. IEEE/ACM Trans Comput Biol Bioinformatics. 2012; 9(4):1106–19. https://doi.org/10.1109/TCBB.2012.33.

Cited by 10 articles.

1. Identification of population‐informative markers from high‐density genotyping data through combined feature selection and machine learning algorithms: Application to European autochthonous and cosmopolitan pig breeds;Animal Genetics;2024-01-08

2. Breed identification using breed-informative SNPs and machine learning based on whole genome sequence data and SNP chip data;Journal of Animal Science and Biotechnology;2023-06-01

3. The use of a genomic relationship matrix for breed assignment of cattle breeds: comparison and combination with a machine learning method;Journal of Animal Science;2023-01-01

4. Evaluation of six machine learning classification algorithms in pig breed identification using SNPs array data;Animal Genetics;2022-12-02

5. Elucidating breed-specific variants of native pigs in Korea: insights into pig breeds’ genomic characteristics;Animal Cells and Systems;2022-11-02