Clustering Classes in Packages for Program Comprehension

Author:

Sun Xiaobing (1, 2), Liu Xiangyue (1), Li Bin (1), Li Bixin (3), Lo David (4), Liao Lingzhi (5)

Affiliation:

1. School of Information Engineering, Yangzhou University, Yangzhou, China

2. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

3. School of Computer Science and Engineering, Southeast University, Nanjing, China

4. School of Information Systems, Singapore Management University, Singapore

5. Nanjing University of Information Science & Technology, Nanjing, China

Abstract

During software maintenance and evolution, developers must understand a system quickly and accurately. As an evolving system grows in size and complexity, program comprehension becomes increasingly difficult. Given a target system, developers may first focus on comprehending its packages, which vary in size. Small packages are easy to comprehend, but large packages are difficult to understand. In this article, we propose a novel program comprehension approach for large packages that uses the Latent Dirichlet Allocation (LDA) model to cluster the classes within them. Each large package is thereby separated into smaller clusters, which are easier for developers to comprehend. Empirical studies on four real-world software projects demonstrate the effectiveness of our approach. The results show that our approach is more effective than clustering approaches based on Latent Semantic Indexing (LSI) and Probabilistic Latent Semantic Analysis (PLSA). In addition, we find that the topic that labels each cluster is useful for program comprehension.
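The idea of splitting a large package into topic-labelled clusters can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the procedure, so the sketch assumes that each class is reduced to a list of word tokens (for example, split identifiers), fits LDA with a plain collapsed Gibbs sampler, and assigns each class to its dominant topic. The function names `lda_gibbs` and `cluster_classes`, the hyperparameters, and the toy class names are all hypothetical.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative, not optimized)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                     # tokens per topic
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        row = []
        for w in doc:
            z = rng.randrange(num_topics)
            row.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(row)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, z = word_id[w], assignments[d][i]
                # Remove the token, then resample its topic from the conditional.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, vocab

def cluster_classes(class_tokens, num_topics, top_n=3, **lda_kwargs):
    """Assumed clustering rule: each class joins the cluster of its dominant topic."""
    names = list(class_tokens)
    docs = [class_tokens[n] for n in names]
    doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics, **lda_kwargs)
    clusters = {}
    for name, counts in zip(names, doc_topic):
        dominant = max(range(num_topics), key=lambda k: counts[k])
        clusters.setdefault(dominant, []).append(name)
    # Label each topic with its most frequent words.
    labels = {
        k: [vocab[i] for i in
            sorted(range(len(vocab)), key=lambda i: -topic_word[k][i])[:top_n]]
        for k in range(num_topics)
    }
    return clusters, labels

# Toy input: hypothetical classes from one large package, already tokenized.
example_classes = {
    "HttpRequest": ["http", "request", "header", "url", "header"],
    "HttpResponse": ["http", "response", "header", "status"],
    "XmlParser": ["xml", "parse", "node", "element", "node"],
    "XmlWriter": ["xml", "write", "node", "element"],
}
clusters, labels = cluster_classes(example_classes, num_topics=2)
for topic, members in clusters.items():
    print(topic, labels[topic], members)
```

The printed topic words play the role of the cluster labels mentioned in the abstract. The number of topics is a free parameter here; how the article chooses it is not stated in this section.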

Funder

National Natural Science Foundation of China

Publisher

Hindawi Limited

Subject

Computer Science Applications, Software

Cited by 7 articles.

1. Multi-granular software annotation using file-level weak labelling; Empirical Software Engineering; 2023-11-30

2. Research on Unmanned Intelligent Combat Support Technology; Proceedings of 2022 International Conference on Autonomous Unmanned Systems (ICAUS 2022); 2023

3. Software Architecture Recovery Using Integrated Dependencies Based on Structural, Semantic, and Directory Information; International Journal of Information System Modeling and Design; 2022-02-03

4. Ensemble clustering based approach for software architecture recovery; International Journal of Information Technology; 2022-01-25

5. Identification of microservices from monolithic applications through topic modelling; Proceedings of the 36th Annual ACM Symposium on Applied Computing; 2021-03-22
