GrimoireLab: A toolset for software development analytics

Author:

Dueñas Santiago (1), Cosentino Valerio (1), Gonzalez-Barahona Jesus M. (2), del Castillo San Felix Alvaro (1), Izquierdo-Cortazar Daniel (1), Cañas-Díaz Luis (1), Pérez García-Plaza Alberto (1)

Affiliation:

1. Bitergia, Leganes, Madrid, Spain

2. Escuela Superior de Ingeniería de Telecomunicación, Universidad Rey Juan Carlos, Fuenlabrada, Madrid, Spain

Abstract

Background: After many years of research on software repositories, the knowledge for building mature, reusable tools that perform data retrieval, storage and basic analytics is readily available. However, there is still room for improvement in the area of reusable tools implementing this knowledge.

Goal: To produce a reusable toolset supporting the most common tasks when retrieving, curating and visualizing data from software repositories, allowing for the easy reproduction of data sets ready for more complex analytics, and sparing the researcher or analyst most of the tasks that can be automated.

Method: Use our experience in building tools in this domain to identify a collection of scenarios where a reusable toolset would be convenient, and the main components of such a toolset. Then build those components, and refine them incrementally using feedback from their use in commercial, community-based, and academic environments.

Results: GrimoireLab, an efficient toolset composed of five main components, supporting about 30 different kinds of data sources related to software development. It has been tested in many environments, for performing different kinds of studies, and providing different kinds of services. It features a common API for accessing the retrieved data, facilities for relating items from different data sources, semi-structured storage for easing later analysis and reproduction, and basic facilities for visualization, preliminary analysis and drill-down in the data. It is also modular, making it easy to support new kinds of data sources and analysis.

Conclusions: We present a mature toolset, widely tested in the field, that can help to improve the situation in the area of reusable tools for mining software repositories. We show some scenarios where it has already been used. We expect it will help to reduce the effort for doing studies or providing services in this area, leading to advances in reproducibility and comparison of results.

Funder

Ministerio de Ciencia y Tecnología of Spain

Ministerio de Economia y Competitividad of Spain

Publisher

PeerJ

Subject

General Computer Science
