Schwarz, Niko (2014). Scaleable Code Clone Detection. (Dissertation, Universität Bern, Philosophisch-naturwissenschaftlichen Fakultät)
Full text: schwarz-phd (1).pdf (1MB), Published Version. Restricted to registered users only. Available under License: Publisher holds Copyright.
Code clone detection helps connect developers across projects, provided it is done on a large scale. Three cornerstones allow clone detection to work at that scale: (1) bad hashing, (2) lightweight parsing using regular expressions, and (3) MapReduce pipelines. Bad hashing means determining whether two artifacts are similar by checking whether their hashes are identical. We show a bad hashing scheme that works well on source code. Lightweight parsing using regular expressions is our technique for obtaining entire parse trees from regular expressions, robustly and efficiently. We detail the algorithm and implementation of one such regular expression engine. MapReduce pipelines are a way of expressing a computation so that it can be parallelized automatically and simply. We detail the design and implementation of one such MapReduce pipeline that is efficient and debuggable. We show a clone detector that combines these cornerstones to detect code clones across all projects and across all versions of each project.
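To make the idea of comparing artifacts by hash identity concrete, the following is a minimal sketch in Python. It is not the scheme from the dissertation: the normalization step (dropping `//` line comments and all whitespace), the use of SHA-1, and the names `normalize`, `bad_hash`, `map_phase`, `reduce_phase`, and `find_clones` are all assumptions made for illustration. The map and reduce steps run in a single process here; a real MapReduce pipeline would distribute them.

```python
import hashlib
import re
from collections import defaultdict


def normalize(source: str) -> str:
    """Hypothetical normalization (an assumption, not the thesis's scheme):
    drop // line comments, then drop all whitespace."""
    without_comments = re.sub(r"//[^\n]*", "", source)
    return re.sub(r"\s+", "", without_comments)


def bad_hash(source: str) -> str:
    """Hash the normalized text, so that artifacts differing only in
    comments or layout receive identical hashes."""
    return hashlib.sha1(normalize(source).encode("utf-8")).hexdigest()


def map_phase(artifacts):
    """Map step: emit one (hash, artifact name) pair per artifact."""
    for name, source in artifacts.items():
        yield bad_hash(source), name


def reduce_phase(pairs):
    """Reduce step: group names by hash and keep groups with more than
    one member, which are the clone candidates."""
    groups = defaultdict(list)
    for key, name in pairs:
        groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def find_clones(artifacts):
    """Run the map step and the reduce step in sequence."""
    return reduce_phase(map_phase(artifacts))
```

Under these assumptions, calling `find_clones` on a dictionary that maps file names to source text returns lists of file names whose normalized contents hash identically. The naive comment pattern would also strip `//` inside string literals, which is one reason a practical scheme needs more careful parsing.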
| Item Type: | Thesis (Dissertation) |
|---|---|
| Division/Institute: | 08 Faculty of Science > Institute of Computer Science (INF); 08 Faculty of Science > Institute of Computer Science (INF) > Software Composition Group (SCG) [discontinued] |
| UniBE Contributor: | Schwarz, Niko |
| Subjects: | 000 Computer science, knowledge & systems; 500 Science > 510 Mathematics |
| Language: | English |
| Submitter: | Oscar Nierstrasz |
| Date Deposited: | 23 Apr 2015 10:54 |
| Last Modified: | 05 Dec 2022 14:45 |
| Uncontrolled Keywords: | scg-phd, snf-none, scg14, jb14 |
| BORIS DOI: | 10.7892/boris.67052 |
| URI: | https://boris.unibe.ch/id/eprint/67052 |