Scaleable Code Clone Detection

Schwarz, Niko (2014). Scaleable Code Clone Detection. (Dissertation, Universität Bern, Philosophisch-naturwissenschaftlichen Fakultät)

Full text: Published Version (PDF, 1 MB)
Restricted to registered users only
Available under license: Publisher holds copyright.


Code clone detection can help connect developers across projects, provided it is done on a large scale. Three cornerstones allow clone detection to work at that scale: (1) bad hashing, (2) lightweight parsing using regular expressions, and (3) MapReduce pipelines. Bad hashing determines whether two artifacts are similar by checking whether their hashes are identical. We present a bad hashing scheme that works well on source code. Lightweight parsing using regular expressions is our technique for obtaining entire parse trees from regular expressions, robustly and efficiently. We detail the algorithm and implementation of one such regular expression engine. MapReduce pipelines are a way of expressing a computation so that it can be parallelized automatically and simply. We detail the design and implementation of one such MapReduce pipeline that is efficient and debuggable. Finally, we present a clone detector that combines these cornerstones to detect code clones across all projects and across all versions of each project.
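As a rough illustration of bad hashing, the sketch below normalizes a code fragment before hashing it, so that fragments differing only in comments, whitespace, identifier names, or integer literals collide on purpose. The normalization rules (the `KEYWORDS` set, the comment and identifier patterns) and the helper names `normalize`, `bad_hash`, and `similar` are hypothetical assumptions for this example, not the scheme developed in the dissertation.

```python
import hashlib
import re

# Hypothetical keyword list; a real scheme would follow the language grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void"}

def normalize(code: str) -> str:
    """Naively erase details that should not distinguish clones."""
    # Remove // line comments and /* block */ comments
    # (naive: does not account for // inside string literals).
    code = re.sub(r"//[^\n]*|/\*.*?\*/", " ", code, flags=re.DOTALL)
    # Replace every non-keyword identifier with the placeholder ID.
    code = re.sub(r"[A-Za-z_]\w*",
                  lambda m: m.group(0) if m.group(0) in KEYWORDS else "ID",
                  code)
    # Replace integer literals with N, then drop all whitespace.
    code = re.sub(r"\d+", "N", code)
    return re.sub(r"\s+", "", code)

def bad_hash(code: str) -> str:
    """Hash the normalized fragment; similar fragments collide by design."""
    return hashlib.sha1(normalize(code).encode("utf-8")).hexdigest()

def similar(a: str, b: str) -> bool:
    """Two fragments count as clones exactly when their bad hashes match."""
    return bad_hash(a) == bad_hash(b)

print(similar("int sum(int a, int b) { return a + b; } // add",
              "int total(int x,int y){\n  return x+y;\n}"))  # True
```

Because hash equality is the only test, candidate clones can be found by grouping fragments by hash value rather than comparing every pair of fragments.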
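To make "entire parse trees" concrete: a conventional backtracking engine such as Python's `re` module reports a single substring per capture group, and for a repeated group that is the last iteration. The sketch below shows this loss of structure. The `all_iterations` helper is a hypothetical workaround that only works for this flat, one-level pattern; it is not the regular expression engine described in the dissertation.

```python
import re

# A comma-terminated list of words; group 2 is the word in each repetition.
LIST = re.compile(r"((\w+),)*")

m = LIST.fullmatch("a,b,c,")
# Python's re keeps only the last iteration of a repeated group,
# so the earlier items "a" and "b" are not available from the match.
print(m.group(2))  # c

def all_iterations(text: str) -> list:
    """Hypothetical workaround: rescan the text to recover each repetition.

    This only works because the pattern is flat; for nested repetitions
    an engine that builds the whole parse tree is needed.
    """
    return [item.group(1) for item in re.finditer(r"(\w+),", text)]

print(all_iterations("a,b,c,"))  # ['a', 'b', 'c']
```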
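The map, shuffle, and reduce shape of a clone detection pipeline can be sketched in a single Python process. This is not the dissertation's pipeline: the function names are hypothetical, nothing is distributed, and the grouping key (a SHA-1 hash of the whitespace-normalized fragment) is an assumed stand-in for a real bad hashing scheme. The mapper emits `(hash, location)` pairs, the shuffle groups them by hash, and the reducer keeps every hash shared by two or more locations as a clone class.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(fragments: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Emit (key, location) for every (location, source fragment) pair."""
    for location, fragment in fragments:
        # Assumed stand-in for a bad hash: collapse whitespace, then SHA-1.
        normalized = " ".join(fragment.split())
        key = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        yield key, location

def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group values by key, as a framework would between map and reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, location in pairs:
        groups[key].append(location)
    return groups

def reduce_phase(groups: Dict[str, List[str]]) -> List[List[str]]:
    """Keep keys shared by at least two locations: these are clone classes."""
    return [sorted(locs) for locs in groups.values() if len(locs) > 1]

fragments = [
    ("p1/A.java", "return a + b;"),
    ("p2/B.java", "return  a +\n    b;"),
    ("p3/C.java", "return a - b;"),
]
clone_classes = reduce_phase(shuffle(map_phase(fragments)))
print(clone_classes)  # [['p1/A.java', 'p2/B.java']]
```

Since each map call looks at one fragment and each reduce call looks at one key, both phases can in principle run in parallel across many machines; only the shuffle needs to move data between them.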

Item Type:

Thesis (Dissertation)

Division/Institute:

08 Faculty of Science > Institute of Computer Science (INF)
08 Faculty of Science > Institute of Computer Science (INF) > Software Composition Group (SCG)

UniBE Contributor:

Schwarz, Niko

Subjects:

000 Computer science, knowledge & systems
500 Science > 510 Mathematics




Oscar Nierstrasz

Date Deposited:

23 Apr 2015 10:54

Last Modified:

16 Dec 2019 23:04

Uncontrolled Keywords:

scg-phd snf-none scg14 jb14