Analyzing FOSS license usage in publicly available software at scale via the SWH-analytics framework

Authors

Antelmi Alessia, Torquati Massimo, Corridori Giacomo, Gregori Daniele, Polzella Francesco, Spinatelli Gianmarco, Aldinucci Marco

Abstract

The Software Heritage (SWH) dataset is an invaluable source of open-source code: it aims to collect, preserve, and share all publicly available software ever produced by humankind, in source code form. The archive is designed to store deduplicated small files and uses a Merkle tree as its underlying data structure. Querying it is nevertheless challenging, because a Merkle tree organizes content by hash value rather than by any locality principle. The size of the repository, coupled with the resource-intensive download process, calls for specialized infrastructure and computational resources to handle and study the dataset effectively. Currently, no infrastructure is specifically tailored to running analytics on the SWH dataset, so users must handle these issues manually. To address these challenges, we implemented the SWH-Analytics (SWHA) framework, a development environment that transparently runs custom analytic applications on the publicly available software data preserved over time by SWH. Specifically, this work shows how SWHA can be used to study usage patterns of free and open-source software (FOSS) licenses, highlighting the need to improve license literacy among developers.
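The point about hash-based organization can be illustrated with a minimal sketch. It is not SWH's actual identifier scheme and not the SWHA API: `blob_id`, `ContentStore`, and `directory_id` are hypothetical names introduced here, and SHA-256 is an assumed stand-in for whatever hash the archive uses. The sketch shows two properties the abstract relies on: identical files are stored once, and identifiers carry no information about which project a file came from.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Content-derived identifier: identical bytes always yield the same id."""
    return hashlib.sha256(data).hexdigest()


class ContentStore:
    """Toy content-addressed store (illustrative only)."""

    def __init__(self):
        self.blobs = {}

    def add(self, data: bytes) -> str:
        key = blob_id(data)
        # A file already present under this key is not stored again.
        self.blobs.setdefault(key, data)
        return key


def directory_id(entries: dict) -> str:
    """Merkle-style node id: a hash over sorted (name, child id) pairs."""
    h = hashlib.sha256()
    for name in sorted(entries):
        h.update(name.encode() + b"\0" + entries[name].encode() + b"\n")
    return h.hexdigest()


store = ContentStore()
license_text = b"Example license text shared by both projects\n"

# Two hypothetical projects that ship the same license file.
repo_a = {"LICENSE": store.add(license_text),
          "main.py": store.add(b"print('a')\n")}
repo_b = {"LICENSE": store.add(license_text),
          "main.py": store.add(b"print('b')\n")}

print(len(store.blobs))                    # number of distinct blobs stored
print(repo_a["LICENSE"] == repo_b["LICENSE"])
print(directory_id(repo_a) == directory_id(repo_b))
```

Because keys are hashes, the files of one project end up scattered across the key space. Collecting every license file of a given project therefore means walking the tree from its root, not reading a contiguous range, which is the kind of access pattern that makes analytics on such a dataset expensive.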

Funder

Centro Nazionale di Ricerca in High-Performance Computing, Big Data and Quantum Computing

European Union - EuroHPC JU ADMIRE project

Università degli Studi di Torino

Publisher

Springer Science and Business Media LLC

