Affiliation:
1. Bowling Green State University, Bowling Green, OH
2. Iowa State University, Ames, IA
Abstract
In today's software-centric world, ultra-large-scale software repositories, such as SourceForge, GitHub, and Google Code, are the new library of Alexandria. They contain an enormous corpus of software and related information. Scientists and engineers alike are interested in analyzing this wealth of information. However, systematic extraction and analysis of relevant data from these repositories for testing hypotheses is hard, and best left for mining software repository (MSR) experts! Specifically, mining source code yields significant insights into software development artifacts and processes. Unfortunately, mining source code at a large scale remains a difficult task. Previous approaches had to either limit the scope of the projects studied, limit the scope of the mining task to be more coarse grained, or sacrifice studying the history of the code. In this article we address mining source code: (a) at a very large scale; (b) at a fine-grained level of detail; and (c) with full history information. To address these challenges, we present domain-specific language features for source-code mining in our language and infrastructure called Boa. The goal of Boa is to ease testing MSR-related hypotheses. Our evaluation demonstrates that Boa substantially reduces programming efforts, thus lowering the barrier to entry. We also show drastic improvements in scalability.
Funder
US National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Cited by
69 articles.