Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants

Published:2021-12-17 Volume:12
ISSN:1664-8021
Container-title:Frontiers in Genetics
Short-container-title:Front. Genet.

Author:

Chien Ching-Hsuan,Huang Lan-Ying,Lo Shuen-Fang,Chen Liang-Jwu,Liao Chi-Chou,Chen Jia-Jyun,Chu Yen-Wei

Abstract

Inserting T-DNA into the genome to change the expression of flanking genes is a common approach in rice functional gene research. However, whether the expression of a gene of interest is actually enhanced must be validated experimentally. To improve the efficiency of screening for activated genes, we established a machine learning model that predicts gene expression in T-DNA mutants. We gathered experimental datasets of gene expression in T-DNA mutants and extracted the PROMOTER and MIDDLE sequences for encoding. In the first layer, support vector machine (SVM) models were constructed with nine features covering biological function and local and global sequence information. Feature encoding based on the PROMOTER sequence was weighted by logistic regression. The second layer integrated 16 first-layer models using minimum redundancy maximum relevance (mRMR) feature selection and the LADTree algorithm, which were selected from nine feature selection methods and 65 classification methods, respectively. The accuracy of the final two-layer machine learning model, referred to as TIMgo, was 99.3% in fivefold cross-validation and 85.6% in independent testing. We found that local sequence information contributed more to classification than global sequence information. TIMgo showed good predictive ability for target genes within 20 kb of the 35S enhancer. Based on the analysis of significant sequences, the G-box regulatory sequence may also play an important role in the activation mechanism of the 35S enhancer.
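The abstract does not spell out how the PROMOTER and MIDDLE sequences were encoded, so the following is only a minimal sketch of what a local sequence encoding could look like, not the TIMgo feature set. It assumes a k-mer frequency vector plus a count of the canonical G-box core motif `CACGTG`; the function names `kmer_frequencies`, `count_motif`, and `encode`, the choice of `k=2`, and the example sequence are all hypothetical.

```python
from collections import Counter
from itertools import product

# Canonical G-box core motif. It is palindromic (its reverse complement is
# itself), so scanning one strand is enough.
G_BOX = "CACGTG"


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies over the ACGT alphabet in a fixed order.

    k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def count_motif(seq, motif=G_BOX):
    """Count occurrences of motif in seq, allowing overlaps."""
    seq = seq.upper()
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def encode(seq, k=2):
    """Assumed encoding: k-mer frequency vector followed by the G-box count."""
    return kmer_frequencies(seq, k) + [float(count_motif(seq))]


if __name__ == "__main__":
    promoter = "TTCACGTGAACACGTGGC"  # made-up sequence for illustration
    vector = encode(promoter)
    print(len(vector), count_motif(promoter))
```

A vector like this could be fed to any first-layer classifier; the nine features and their weighting in the published model are described only at the level of detail given in the abstract.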

Funder

Ministry of Science and Technology, Taiwan

Publisher

Frontiers Media SA

Subject

Genetics (clinical),Genetics,Molecular Medicine

Cited by 3 articles.

1. Promoter Prediction in DNA Classification Using Machine Learning Algorithms;2024 3rd International Conference on Sentiment Analysis and Deep Learning (ICSADL);2024-03-13

2. Integrated transcriptomic meta-analysis and comparative artificial intelligence models in maize under biotic stress;Scientific Reports;2023-09-23

3. Designing artificial synthetic promoters for accurate, smart, and versatile gene expression in plants;Plant Communications;2023-07