Towards an Automated Classification of Software Libraries

Published: 2024-03-27
Volume: 5
Issue: 4
ISSN: 2661-8907
Journal: SN Computer Science
Language: en
Journal abbreviation: SN COMPUT. SCI.

Authors:

Maximilian Auch, Maximilian Balluff, Peter Mandl, Christian Wolff

Abstract

Nowadays, the use of third-party libraries in software is common, and the number of published libraries continues to grow. An automated classification can help to maintain an overview and to identify similar software libraries. This paper investigates whether new approaches can use machine learning to classify all software libraries crawled from Apache Maven repositories into defined classes. Besides tags, which are not always available and are sometimes of poor quality, we examine one feature that is always available: the id. Consisting of the group-id and the artifact-id, the id of an Apache Maven software library contains valuable information that can help in classification. With a custom preprocessing step and an optimized recurrent neural network (RNN), the tokenised ids are intended to allow the classification of most libraries. Furthermore, we present an optimized approach that uses id tokens and tags in combination (a hybrid approach). Based on a dataset of 28,600 labeled entries, we compared various approaches. The RNN trained on tokenised ids achieved a balanced accuracy of 71.36%. A model trained on tags achieved a balanced accuracy of 92%. The new hybrid approach, which combines tags and ids, improves the result to 94.12%. While classification on tags achieves a better result than the more general id-based approach, it is applicable only to software libraries that are tagged. The hybrid approach, by contrast, takes advantage of the tag-based classification results when tags are available, but also includes valuable information from the always available ids.
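To make the id-based and hybrid ideas more concrete, here is a minimal Python sketch. It is not the authors' preprocessing or model: the splitting rules (on `.`, `-`, `_` and camelCase boundaries), the stopword set, the class labels `web` and `db`, and the late-fusion rule with weight `w_tags` in the hypothetical `hybrid_predict` function are all illustrative assumptions. The `balanced_accuracy` helper follows the standard definition of balanced accuracy as the unweighted mean of per-class recall.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical stopwords: reverse-domain prefixes that carry little class signal.
STOPWORDS = {"org", "com", "net", "io"}


def tokenise_maven_id(group_id: str, artifact_id: str) -> list[str]:
    """Split a Maven group-id and artifact-id into lowercase word tokens.

    Assumed rules (illustrative only): split on '.', '-' and '_', split
    camelCase and acronym boundaries, lowercase, drop numbers and stopwords.
    """
    tokens = []
    for part in re.split(r"[.\-_]+", f"{group_id}.{artifact_id}"):
        # findall: "HTTPClient" -> ["HTTP", "Client"], "lang3" -> ["lang", "3"]
        for word in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part):
            word = word.lower()
            if not word.isdigit() and word not in STOPWORDS:
                tokens.append(word)
    return tokens


def balanced_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class recall over the classes present in y_true."""
    total: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for true, pred in zip(y_true, y_pred):
        total[true] += 1
        if true == pred:
            correct[true] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def hybrid_predict(
    id_probs: dict[str, float],
    tag_probs: dict[str, float] | None,
    w_tags: float = 0.7,
) -> str:
    """Hypothetical late fusion of an id model and a tag model.

    Untagged library: use the id model alone. Tagged library: blend both
    class-probability maps, weighting the tag model by w_tags.
    """
    if not tag_probs:
        return max(id_probs, key=id_probs.get)
    classes = set(id_probs) | set(tag_probs)
    scores = {
        c: w_tags * tag_probs.get(c, 0.0) + (1 - w_tags) * id_probs.get(c, 0.0)
        for c in classes
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(tokenise_maven_id("org.apache.commons", "commons-lang3"))
    # ['apache', 'commons', 'commons', 'lang']
    print(balanced_accuracy(["web", "web", "web", "db"], ["web"] * 4))
    # 0.5 (plain accuracy would be 0.75)
    print(hybrid_predict({"web": 0.6, "db": 0.4}, None))
    # web
    print(hybrid_predict({"web": 0.6, "db": 0.4}, {"db": 0.9, "web": 0.1}))
    # db
```

The toy labels also show why balanced accuracy is a sensible metric for an unevenly labeled dataset: a model that always predicts the majority class reaches 0.75 plain accuracy here, but only 0.5 balanced accuracy, because the minority class has a recall of zero.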

Funder

Hochschule für angewandte Wissenschaften München

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s42979-024-02654-2.pdf
