Affiliation:
1. Delft University of Technology, Delft, Netherlands
2. Delft University of Technology, Amsterdam, Netherlands
Abstract
Modern software engineering establishes software supply chains and relies on tools and libraries to improve productivity. However, reusing external software in a project presents a security risk when the source of a component is unknown or its consistency cannot be verified. The SolarWinds attack is a well-known example in which malicious code injected into a library affected thousands of customers and caused billions of dollars in losses. Reproducible builds offer a mitigation strategy, as they can confirm the origin and consistency of reused components. A large reproducibility community has formed for Debian, but the reproducibility of the Maven ecosystem, the backbone of the Java supply chain, remains understudied in comparison. Reproducible Central is an initiative that curates a list of reproducible Maven libraries, but the list is limited and challenging to maintain because it is curated manually. Our research aims to support these efforts in the Maven ecosystem through automation. We investigate the feasibility of automatically finding the source code of a library from its Maven release and recovering information about the original release environment. Our tool, AROMA, obtains this critical information from the artifact and the source repository through several heuristics, and we use the results to attempt reproduction of Maven packages. Overall, our approach achieves an accuracy of up to 99.5% when compared field-by-field to the existing manual approach. In some instances, we even detected flaws in the manually maintained list, such as broken repository links. We find that automatic reproducibility is feasible for 23.4% of the Maven packages using AROMA, and that 8% of these packages are fully reproducible. We demonstrate that new packages can be reproduced successfully and have contributed some of them to the Reproducible Central repository. Additionally, we highlight actionable insights, outline future work in this area, and make our dataset and tools available to the public.
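As a rough illustration of the kind of information an artifact can carry about its origin and release environment, the sketch below reads two hints from a JAR: the JDK version recorded in the manifest (`Build-Jdk-Spec`, falling back to the older `Build-Jdk` entry) and the `<scm><url>` field of the POM embedded under `META-INF/maven/`. This is not AROMA's implementation; the function names `parse_manifest` and `extract_build_hints` are hypothetical, and the sketch assumes the artifact was packaged with these entries present, which is not guaranteed for every release.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Maven POM 4.0.0 documents.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def parse_manifest(text):
    """Parse the main section of a MANIFEST.MF into a dict (hypothetical helper)."""
    headers = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and key:
            headers[key] += line[1:]  # continuation of the previous header
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            headers[key] = value.strip()
    return headers


def extract_build_hints(jar):
    """Return the recorded JDK version and SCM URL, or None where absent."""
    hints = {"jdk": None, "scm_url": None}
    with zipfile.ZipFile(jar) as zf:
        names = zf.namelist()
        if "META-INF/MANIFEST.MF" in names:
            raw = zf.read("META-INF/MANIFEST.MF").decode("utf-8")
            manifest = parse_manifest(raw)
            # Prefer the newer entry, fall back to the older one.
            hints["jdk"] = manifest.get("Build-Jdk-Spec") or manifest.get("Build-Jdk")
        for name in names:
            if name.startswith("META-INF/maven/") and name.endswith("/pom.xml"):
                root = ET.fromstring(zf.read(name))
                node = root.find("m:scm/m:url", POM_NS)
                if node is not None and node.text:
                    hints["scm_url"] = node.text.strip()
                break  # only the first embedded POM is inspected
    return hints
```

Either field may be missing or stale in practice, so a result like this is better treated as a starting point for further checks against the source repository than as ground truth.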
Funder
This study is funded by a European H2020 project, FASTEN
Publisher
Association for Computing Machinery (ACM)