Scaleable Code Clone Detection

Schwarz, Niko (2014). Scaleable Code Clone Detection. (Dissertation, Universität Bern, Philosophisch-naturwissenschaftlichen Fakultät)

Full text not available from this repository.

Code clone detection helps connect developers across projects when it is done on a large scale. The cornerstones that allow clone detection to work on a large scale are (1) bad hashing, (2) lightweight parsing using regular expressions, and (3) MapReduce pipelines. Bad hashing means determining whether two artifacts are similar by checking whether their hashes are identical. We show a bad hashing scheme that works well on source code. Lightweight parsing using regular expressions is our technique for obtaining entire parse trees from regular expressions, robustly and efficiently. We detail the algorithm and implementation of one such regular expression engine. MapReduce pipelines are a way of expressing a computation such that it can be parallelized automatically and simply. We detail the design and implementation of one such MapReduce pipeline that is efficient and debuggable. We show a clone detector that combines these cornerstones to detect code clones across all projects and across all versions of each project.
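
To make the idea in the abstract concrete, here is a minimal Python sketch of hash-based similarity combined with a map and a reduce step. It is an illustration only and is not the scheme or the pipeline from the dissertation, whose full text is not available here. The normalization rules, the keyword list, and the names normalize, bad_hash, map_phase, reduce_phase and find_clones are all assumptions made for this example.

```python
import hashlib
import re
from collections import defaultdict

# Assumed normalization rules (not taken from the dissertation):
# drop '#' comments, keep a few keywords, and replace every other
# identifier with ID and every number with NUM.
_COMMENT = re.compile(r"#.*")
_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
_KEYWORDS = {"def", "return", "if", "else", "for", "while", "in"}


def normalize(snippet):
    """Reduce a snippet to a token string that ignores names and literals."""
    tokens = []
    for line in snippet.splitlines():
        line = _COMMENT.sub("", line)
        for tok in _TOKEN.findall(line):
            if tok in _KEYWORDS:
                tokens.append(tok)
            elif tok[0].isalpha() or tok[0] == "_":
                tokens.append("ID")
            elif tok[0].isdigit():
                tokens.append("NUM")
            else:
                tokens.append(tok)
    return " ".join(tokens)


def bad_hash(snippet):
    """Hash the normalized form, so similar snippets collide on purpose."""
    return hashlib.sha1(normalize(snippet).encode("utf-8")).hexdigest()


def map_phase(records):
    """Map step: emit one (hash, location) pair per snippet."""
    for location, snippet in records:
        yield bad_hash(snippet), location


def reduce_phase(pairs):
    """Reduce step: group locations by hash, keep groups of two or more."""
    groups = defaultdict(list)
    for key, location in pairs:
        groups[key].append(location)
    return [sorted(locs) for locs in groups.values() if len(locs) > 1]


def find_clones(records):
    """Run both steps in-process; records is a list of (location, snippet)."""
    return reduce_phase(map_phase(records))
```

Under these assumptions, two functions that differ only in identifier names and comments receive the same hash and land in the same group, while a structurally different function does not. Because the map step treats each snippet independently, it could be distributed across workers, with the grouping done per hash key.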

Item Type: Thesis (Dissertation)
Division/Institute: 08 Faculty of Science > Institute of Computer Science (INF)
08 Faculty of Science > Institute of Computer Science (INF) > Software Composition Group (SCG)
UniBE Contributor: Schwarz, Niko
Subjects: 000 Computer science, knowledge & systems
500 Science > 510 Mathematics
Language: English
Submitter: Oscar Marius Nierstrasz
Date Deposited: 23 Apr 2015 10:54
Last Modified: 08 Feb 2017 11:43
Uncontrolled Keywords: scg-phd snf-none scg14 jb14
URI: http://boris.unibe.ch/id/eprint/67052
