Empirically-Grounded Construction of Bug Prediction and Detection Tools

Osman, Haidar (2017). Empirically-Grounded Construction of Bug Prediction and Detection Tools. (Dissertation, Universität Bern, Philosophisch-naturwissenschaftliche Fakultät)

17osman_h.pdf - Published Version
Available under License Creative Commons: Attribution-Share Alike (CC-BY-SA).

Download (5MB) | Preview

There is an increasing demand on high-quality software as software bugs have an economic impact not only on software projects, but also on national economies in general. Software quality is achieved via the main quality assurance activities of testing and code reviewing. However, these activities are expensive, thus they need to be carried out efficiently.

Auxiliary software quality tools such as bug detection and bug prediction tools help developers focus their testing and reviewing activities on the parts of software that more likely contain bugs. However, these tools are far from adoption as mainstream development tools. Previous research points to their inability to adapt to the peculiarities of projects and their high rate of false positives as the main obstacles of their adoption.

We propose empirically-grounded analysis to improve the adaptability and efficiency of bug detection and prediction tools. For a bug detector to be efficient, it needs to detect bugs that are conspicuous, frequent, and specific to a software project. We empirically show that the null-related bugs fulfill these criteria and are worth building detectors for. We analyze the null dereferencing problem and find that its root cause lies in methods that return null. We propose an empirical solution to this problem that depends on the wisdom of the crowd. For each API method, we extract the nullability measure that expresses how often the return value of this method is checked against null in the ecosystem of the API. We use nullability to annotate API methods with nullness annotation and warn developers about missing and excessive null checks.

For a bug predictor to be efficient, it needs to be optimized as both a machine learning model and a software quality tool. We empirically show how feature selection and hyperparameter optimizations improve prediction accuracy. Then we optimize bug prediction to locate the maximum number of bugs in the minimum amount of code by finding the most cost-effective combination of bug prediction configurations, i.e., dependent variables, machine learning model, and response variable. We show that using both source code and change metrics as dependent variables, applying feature selection on them, then using an optimized Random Forest to predict the number of bugs results in the most cost-effective bug predictor.

Throughout this thesis, we show how empirically-grounded analysis helps us achieve efficient bug prediction and detection tools and adapt them to the characteristics of each software project.

Item Type:

Thesis (Dissertation)


08 Faculty of Science > Institute of Computer Science (INF)
08 Faculty of Science > Institute of Computer Science (INF) > Software Composition Group (SCG)

UniBE Contributor:

Osman, Haidar and Nierstrasz, Oscar Marius


000 Computer science, knowledge & systems
500 Science > 510 Mathematics




Igor Peter Hammer

Date Deposited:

22 Jan 2018 12:22

Last Modified:

03 Nov 2019 06:17



Additional Information:

e-Dissertation (edbe)





Actions (login required)

Edit item Edit item
Provide Feedback