Effective Clone Detection Without Language Barriers

Rieger, Matthias (2005). Effective Clone Detection Without Language Barriers. (Dissertation, University of Bern, Philosophisch-naturwissenschaftliche Fakultät)

rieger-phd.pdf - Published Version
Restricted to registered users only
Available under License Publisher holds Copyright.


Duplication is detected by comparing features of source fragments. The main difficulty is that source code is rarely copied exactly, so the detection process must ignore superficial differences and concentrate on fundamental similarities in order to find relevant duplication. The high-level information yielded by syntactic and semantic code analysis can be put to effective use, but the most important drawback of these deep analysis techniques is their reduced adaptability to different programming languages. Because duplication is a ubiquitous problem, however, support for duplication detection and management is needed for every programming language in use. In this thesis we investigate how the premises of simplicity and adaptability influence all phases of the clone detection process. We analyze how line-based string matching, as the basic feature comparison technique, can be augmented by minimal parsing to improve detection sensitivity. We investigate which code normalization techniques remove the superficial differences and reveal the similarities. We show how clone candidates are retrieved from the results of the basic comparison. We propose measures to select the relevant clones from the set of all retrieved candidates. Finally, we develop a collection of quantitative visualizations that enable the assessment of the copied code in the context of the entire system. We experimentally validate the proposed code normalization technique in terms of precision and recall, show how the proposed relevancy measures improve on simple size metrics, and discuss scalability issues. We also validate the line-based granularity and compare our technique with related string matching detectors.
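
To make the idea of line-based string matching with code normalization concrete, the following is a minimal Python sketch. It is not the implementation from the thesis: the function names `normalize` and `find_clones`, the particular normalization rules (replacing string and numeric literals, dropping `//` and `#` comments, removing whitespace), and the `min_len` threshold are all assumptions chosen for illustration. It compares normalized lines for exact equality and reports runs of consecutive matching line pairs as clone candidates.

```python
import re
from collections import defaultdict


def normalize(line):
    """Remove superficial differences from one source line (illustrative rules only)."""
    line = re.sub(r"\"[^\"]*\"|'[^']*'", "S", line)  # string literals -> placeholder S
    line = re.sub(r"(//|#).*$", "", line)            # drop trailing line comments (assumed markers)
    line = re.sub(r"\b\d+\b", "N", line)             # numeric literals -> placeholder N
    return re.sub(r"\s+", "", line)                  # remove all whitespace


def find_clones(lines, min_len=3):
    """Return (first_start, second_start, length) for runs of matching normalized lines."""
    norm = [normalize(l) for l in lines]

    # Index every non-empty normalized line by its text.
    index = defaultdict(list)
    for pos, text in enumerate(norm):
        if text:
            index[text].append(pos)

    # Every pair of positions (i < j) holding the same normalized text is a match.
    matches = {
        (i, j)
        for positions in index.values()
        for a, i in enumerate(positions)
        for j in positions[a + 1:]
    }

    # Follow each diagonal of matches from its first pair and keep the long runs.
    clones = []
    for i, j in sorted(matches):
        if (i - 1, j - 1) in matches:
            continue  # this pair continues a run that started earlier
        length = 1
        while (i + length, j + length) in matches:
            length += 1
        if length >= min_len:
            clones.append((i, j, length))
    return clones


if __name__ == "__main__":
    sample = [
        "int a = 1;",
        "a = a + 10;  // bump",
        "print(a);",
        "return;",
        "int a = 2;",
        "a = a + 20;",
        "print(a);",
        "done();",
    ]
    print(find_clones(sample))  # [(0, 4, 3)]
```

In the sample, lines 0 to 2 and lines 4 to 6 differ only in literal values and a comment, so they match after normalization even though the raw text differs. Because the sketch relies only on regular expressions and not on a parser, adapting it to another language would mostly mean adjusting the assumed comment and literal patterns. Note that it does not filter self-overlapping runs or rank candidates by relevance.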

Item Type: Thesis (Dissertation)

08 Faculty of Science > Institute of Computer Science (INF) > Software Composition Group (SCG)

Manuela Bamert

Date Deposited: 29 Jan 2018 16:35

Last Modified: 21 Nov 2019 02:40




