A large-scale study of programming languages and code quality in GitHub

Published: 2017-09-25 Issue: 10 Volume: 60 Page: 91-100
ISSN: 0001-0782
Container-title: Communications of the ACM
Language: en
Short-container-title: Commun. ACM

Author:

Ray Baishakhi¹, Posnett Daryl², Devanbu Premkumar², Filkov Vladimir²

Affiliation:

1. University of Virginia

2. University of California

Abstract

What is the effect of programming languages on software quality? This question has been a topic of much debate for a very long time. In this study, we gather a very large data set from GitHub (728 projects, 63 million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) in an attempt to shed some empirical light on this question. This reasonably large sample size allows us to use a mixed-methods approach, combining multiple regression modeling with visualization and text analytics, to study the effect of language features such as static versus dynamic typing and allowing versus disallowing type confusion on software quality. By triangulating findings from different methods, and controlling for confounding effects such as team size, project size, and project history, we report that language design does have a significant, but modest effect on software quality. Most notably, it does appear that disallowing type confusion is modestly better than allowing it, and among functional languages, static typing is also somewhat better than dynamic typing. We also find that functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size. However, we caution the reader that even these modest effects might quite possibly be due to other, intangible process factors, for example, the preference of certain personality types for functional, static languages that disallow type confusion.

Funder

NSF

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3126905

References: 24 articles.

1. Assessing programming language impact on development and maintenance

2. Don't touch my code!

3. Probabilistic topic models

4. Selecting Empirical Methods for Software Engineering Research

Cited by 43 articles.

1. More equal than others? Parity in developer interaction and its relation to bug resolution time;Innovations in Systems and Software Engineering;2024-09-04

2. Indentation and reading time: a randomized control trial on the differences between generated indented and non-indented if-statements;Empirical Software Engineering;2024-08-09

3. Learning to Detect and Localize Multilingual Bugs;Proceedings of the ACM on Software Engineering;2024-07-12

4. Enhancing embedded systems development with TS⁻;Automated Software Engineering;2023-12-06

5. RUSPATCH: Towards Timely and Effectively Patching Rust Applications;2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security (QRS);2023-10-22