Lightweight Detection of Duplicated Code. A Language-Independent Approach

Ducasse, Stéphane; Nierstrasz, Oscar; Rieger, Matthias (2004). Lightweight Detection of Duplicated Code. A Language-Independent Approach. Universität Bern.

Rieg04a-IAM-04-002.pdf - Published Version
Restricted to registered users only
Available under License Publisher holds Copyright.

Duplicated code can have a severe, negative impact on the maintainability of large software systems. Techniques for detecting duplicated code exist but they rely mostly on parsers, technology that is often fragile in the face of different languages and dialects. In this paper we show that a lightweight approach based on simple string-matching can be effectively used to detect a significant amount of code duplication. The approach scales well, and can be easily adapted to different languages and contexts. We validate our approach by applying it to a number of industrial and open source case studies, involving five different implementation languages and ranging from 256KB to 13MB of source code. Finally, we compare our approach to a more sophisticated one employing parameterized matching, and demonstrate that little if anything is gained by adopting a more heavyweight approach.
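
The abstract does not spell out the matching procedure, so the following is only a minimal sketch of what a line-based, string-matching detector could look like, not the authors' implementation. It assumes that each line is normalized by removing whitespace, that blank lines are ignored, and that a clone is a run of at least `min_lines` consecutive identical lines. The names `normalize`, `find_duplicates`, and `min_lines` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def normalize(line):
    # Assumption: dropping all whitespace is enough to hide layout differences.
    return "".join(line.split())


def find_duplicates(text, min_lines=3):
    """Return sorted (first_line, second_line, length) tuples for repeated runs.

    Line numbers are 1-based and refer to the original text; length counts
    non-blank lines in the run.
    """
    # Keep (line number, normalized text) for every non-blank line.
    lines = [(no, normalize(raw))
             for no, raw in enumerate(text.splitlines(), start=1)]
    lines = [(no, s) for no, s in lines if s]

    # Index positions by content so only identical lines are ever compared.
    positions = defaultdict(list)
    for idx, (_, s) in enumerate(lines):
        positions[s].append(idx)

    clones = []
    for occurrences in positions.values():
        for a in range(len(occurrences)):
            for b in range(a + 1, len(occurrences)):
                i, j = occurrences[a], occurrences[b]
                # Skip pairs that continue a run starting one line earlier.
                if i > 0 and lines[i - 1][1] == lines[j - 1][1]:
                    continue
                # Extend while lines agree and the two copies do not overlap.
                k = 0
                while (j + k < len(lines) and i + k < j
                       and lines[i + k][1] == lines[j + k][1]):
                    k += 1
                if k >= min_lines:
                    clones.append((lines[i][0], lines[j][0], k))
    return sorted(clones)
```

Because the sketch compares plain strings and needs no parser, adapting it to another language would only mean changing `normalize`, for example to strip that language's comments. The pairwise loop is quadratic in the number of occurrences of a line, so very frequent lines (such as a lone closing brace) would need to be filtered out before applying this to large systems.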

Item Type:

Report

Division/Institute:

08 Faculty of Science > Institute of Computer Science (INF)
08 Faculty of Science > Institute of Computer Science (INF) > Software Composition Group (SCG) [discontinued]

UniBE Contributor:

Ducasse, Stéphane; Nierstrasz, Oscar

Subjects:

000 Computer science, knowledge & systems
500 Science > 510 Mathematics

Publisher:

Universität Bern

Language:

English

Submitter:

Anja Ebeling

Date Deposited:

22 Nov 2017 09:54

Last Modified:

11 Apr 2024 16:11

BORIS DOI:

10.7892/boris.104732

URI:

https://boris.unibe.ch/id/eprint/104732
