Affiliation:
1. Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
2. Department of Computer Science, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
3. School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
4. Department of Biological Science, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
Abstract
Background:
Chemically modified therapeutic mRNAs have gained momentum recently.
In addition to commonly used modifications (e.g., pseudouridine), 5-methoxyuridine (5moU) is
considered a promising substitution for uridine in therapeutic mRNAs. Accurate identification of
5moU is crucial for the study and quality control of the relevant in vitro-transcribed (IVT)
mRNAs. However, current methods fall short in quantifying this modification. In this study, we
use Oxford Nanopore direct RNA sequencing to build NanoML-5moU, a machine-learning framework
designed specifically for the read-level detection and quantification of 5moU modification in
IVT data.
Materials and Methods:
Nanopore direct RNA sequencing data were collected from both 5moU-modified and
unmodified control samples. Signal event characteristics (mean and median current intensities,
standard deviations, and dwell times) were then analyzed and modeled. Classical machine learning
algorithms, namely Support Vector Machine (SVM), Random Forest (RF), and XGBoost, were used to
detect 5moU modifications within NNUNN 5-mers (where N represents A, C, U, or G).
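The feature construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the NanoML-5moU implementation: the function names, the one-event-per-base layout, and the default sampling rate are all hypothetical, and the resulting 20-value vector is only one plausible input for a classifier such as XGBoost.

```python
from itertools import product
from statistics import mean, median, pstdev

BASES = "ACGU"  # RNA alphabet; reference 5-mers may be written with T instead of U


def nnunn_kmers():
    """Enumerate every 5-mer with a central U (the NNUNN pattern)."""
    return [a + b + "U" + c + d for a, b, c, d in product(BASES, repeat=4)]


def event_features(samples, sampling_rate_hz):
    """Summarize one event's raw current samples into four features:
    mean, median, standard deviation, and dwell time in seconds."""
    return [
        mean(samples),
        median(samples),
        pstdev(samples),
        len(samples) / sampling_rate_hz,
    ]


def kmer_feature_vector(events, sampling_rate_hz=3000.0):
    """Concatenate the four features of each of the five events covering a
    5-mer into one 20-value vector. The default sampling rate is a
    placeholder assumption, not a value taken from the study."""
    if len(events) != 5:
        raise ValueError("expected one event per base of the 5-mer")
    vector = []
    for samples in events:
        vector.extend(event_features(samples, sampling_rate_hz))
    return vector
```

With four choices at each of the four N positions, the pattern covers 256 distinct 5-mers, so a per-5-mer modeling scheme of this kind would involve up to 256 separate datasets.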
Results:
Combining the signal event features of every base in the NNUNN 5-mers with the XGBoost
algorithm yielded strong performance (a maximum AUROC of 0.9567 on the "AGTTC" reference 5-mer
dataset and a minimum AUROC of 0.8113 on the "TGTGC" reference 5-mer dataset). This markedly
exceeded the prevailing background error comparison model (ELIGOS, AUC of 0.751 for site-level
prediction). The model was further validated on a series of curated datasets with customized
modification ratios designed to emulate broader data patterns, demonstrating its general
applicability to quality control of IVT mRNA vaccines. The NanoML-5moU framework
is publicly available on GitHub (https://github.com/JiayiLi21/NanoML-5moU).
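As a rough illustration of validating on datasets with customized modification ratios, the sketch below mixes reads from a modified and an unmodified pool at a chosen ratio and then estimates the ratio back from read-level probabilities. The function names, the 0.5 threshold, and the simple fraction-of-reads estimator are assumptions made for this example and are not taken from the NanoML-5moU code.

```python
import random


def mix_reads(modified, unmodified, ratio, total, seed=0):
    """Draw `total` reads so that roughly `ratio` of them come from the
    modified pool. Returns shuffled (read, label) pairs, label 1 = modified."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    n_mod = round(ratio * total)
    n_unmod = total - n_mod
    rng = random.Random(seed)
    picked = [(read, 1) for read in rng.sample(modified, n_mod)]
    picked += [(read, 0) for read in rng.sample(unmodified, n_unmod)]
    rng.shuffle(picked)
    return picked


def estimated_modification_ratio(read_probs, threshold=0.5):
    """Estimate the modification ratio as the fraction of reads whose
    predicted probability of being modified reaches the threshold."""
    if not read_probs:
        raise ValueError("no reads to summarize")
    called_modified = sum(1 for p in read_probs if p >= threshold)
    return called_modified / len(read_probs)
```

Comparing the estimated ratio with the ratio used to build each mixture gives a direct check of how well read-level calls translate into quantification.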
Conclusion:
NanoML-5moU enables accurate read-level profiling of 5moU modification with nanopore
direct RNA sequencing, providing a powerful tool specialized in unveiling signal patterns in
IVT mRNAs.
Funder
National Natural Science Foundation of China
XJTLU Key Program Special Fund
AI University Research Centre through the XJTLU Key Programme Special Fund
Publisher
Bentham Science Publishers Ltd.