Abstract
Background
Developing a systematic pipeline for phenotypic data analysis, creating enhanced visualizations, and interpreting the results are crucial for extracting meaningful insights from data and making better breeding decisions. Here, we provide an overview of how the Rainfed Rice Breeding (RRB) program at IRRI has leveraged the computational power of R, together with open-source tools such as R Markdown, plotly, LaTeX, and HTML, to develop an open-source, end-to-end data analysis workflow and pipeline, and has redesigned it as a reproducible document for better interpretation, visualization, and easy sharing with collaborators.
Results
We report a state-of-the-art implementation of a phenotypic data analysis pipeline and workflow embedded in a descriptive, well-documented document. The pipeline is open-source and demonstrates, with step-by-step instructions, how to analyze phenotypic data in crop breeding programs. It shows how to pre-process phenotypic data and check its quality, perform robust data analysis using modern statistical tools and approaches, and convert the analysis into a reproducible document. Explanatory text, R code, outputs (text, tables, or graphics), and interpretation of results are integrated into a single document. The analysis is highly reproducible and can be regenerated at any time. The pipeline source code and demo data are available at https://github.com/whussain2/Analysis-pipeline.
Conclusion
The analysis workflow and document presented are not limited to IRRI’s RRB program but are applicable to any organization or institute with a full-fledged breeding program. We believe this is an important step toward modernizing data analysis in IRRI’s RRB program. Further, the pipeline can be easily implemented by plant breeders and researchers, guiding them toward sound analysis of breeding trial data.
Publisher
Springer Science and Business Media LLC
Subject
Plant Science, Genetics, Biotechnology