Ducasse, Stéphane; Nierstrasz, Oscar; Rieger, Matthias (2004). Lightweight Detection of Duplicated Code. A Language-Independent Approach. Universität Bern
Text: Rieg04a-IAM-04-002.pdf - Published Version (274kB). Restricted to registered users only. Available under License Publisher holds Copyright.
Duplicated code can have a severe, negative impact on the maintainability of large software systems. Techniques for detecting duplicated code exist, but they rely mostly on parsers, a technology that is often fragile in the face of different languages and dialects. In this paper we show that a lightweight approach based on simple string matching can be effectively used to detect a significant amount of code duplication. The approach scales well and can be easily adapted to different languages and contexts. We validate our approach by applying it to a number of industrial and open-source case studies, involving five different implementation languages and ranging from 256KB to 13MB of source code. Finally, we compare our approach to a more sophisticated one employing parameterized matching, and demonstrate that little if anything is gained by adopting a more heavyweight approach.
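To make the idea of string-based duplicate detection concrete, the following is a minimal sketch in Python. It is not the authors' implementation. It assumes one plausible reading of "simple string matching": each line is normalized by removing all whitespace, empty lines are dropped, and runs of consecutive identical lines are reported. The names `normalize`, `find_duplicates`, and `min_length` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def normalize(line: str) -> str:
    # Assumption: removing all whitespace makes layout differences irrelevant.
    return "".join(line.split())


def find_duplicates(lines, min_length=3):
    """Return (start_a, start_b, length) triples for runs of identical lines.

    start_a and start_b are 0-based line numbers in the original input;
    length counts non-empty lines in the run.
    """
    # Keep only non-empty normalized lines, remembering the original line number.
    kept = [(i, normalize(l)) for i, l in enumerate(lines)]
    kept = [(i, text) for i, text in kept if text]

    # Index positions in the condensed sequence by line content.
    positions = defaultdict(list)
    for pos, (_, text) in enumerate(kept):
        positions[text].append(pos)

    # Every pair of equal lines is a match (a dot in a comparison matrix).
    matches = set()
    for idxs in positions.values():
        for a in range(len(idxs)):
            for b in range(a + 1, len(idxs)):
                matches.add((idxs[a], idxs[b]))

    # Follow each diagonal of matches from its first element to its end.
    clones = []
    for i, j in sorted(matches):
        if (i - 1, j - 1) in matches:
            continue  # this pair continues a run that started earlier
        length = 0
        while (i + length, j + length) in matches:
            length += 1
        if length >= min_length:
            clones.append((kept[i][0], kept[j][0], length))
    return clones
```

Because the sketch only compares normalized strings, it needs no parser and would apply unchanged to any line-oriented language. It compares all pairs of equal lines, so a real tool working on megabytes of source would need a more economical matching step.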
| Item Type: | Report (Report) |
|---|---|
| Division/Institute: | 08 Faculty of Science > Institute of Computer Science (INF); 08 Faculty of Science > Institute of Computer Science (INF) > Software Composition Group (SCG) [discontinued] |
| UniBE Contributor: | Ducasse, Stephane; Nierstrasz, Oscar |
| Subjects: | 000 Computer science, knowledge & systems; 500 Science > 510 Mathematics |
| Publisher: | Universität Bern |
| Language: | English |
| Submitter: | Anja Ebeling |
| Date Deposited: | 22 Nov 2017 09:54 |
| Last Modified: | 11 Apr 2024 16:11 |
| BORIS DOI: | 10.7892/boris.104732 |
| URI: | https://boris.unibe.ch/id/eprint/104732 |