Abstract
Software documentation is often neglected, which hampers maintenance and reuse and leads to technical issues. For scientific software in particular, such documentation issues put the reliability of scientific results at risk, because they may cause the software to be used improperly or incorrectly. R is a popular programming language for scientific software with a prolific package-based ecosystem, where users contribute packages (i.e., libraries). R packages are intended to be reused, and their users rely extensively on the available documentation. Thus, understanding what information developers provide in their packages’ documentation (generally through a system known as Roxygen, based on Javadoc) is essential to contributing to it. This study mined 379 GitHub repositories of R packages and analysed a sample to develop a taxonomy of the natural language descriptions used in Roxygen documentation. The taxonomy was developed through hybrid card sorting, which involved two experienced R developers. The resulting taxonomy covers parameters, returns, and descriptions, providing a baseline for further studies. Our taxonomy is the first of its kind for R. Based on previous studies in pure object-oriented languages, our taxonomy could be extended to other dynamically-typed languages used in scientific programming.
Funder
National Research Council Canada
Publisher
Springer Science and Business Media LLC