Automated identification of diagnostic labelling errors in medicine

Hautz, Wolf E.; Kündig, Moritz M.; Tschanz, Roger; Birrenbach, Tanja; Schuster, Alexander; Bürkle, Thomas; Hautz, Stefanie C.; Sauter, Thomas C.; Krummrey, Gert (2021). Automated identification of diagnostic labelling errors in medicine. Diagnosis, 9(2), pp. 241-249. De Gruyter. doi:10.1515/dx-2021-0039

Published Version, available under License Creative Commons: Attribution (CC-BY).

Objectives: Identifying diagnostic errors is complex and mostly relies on expert ratings, a severely limited procedure. We developed a system that automatically identifies diagnostic labelling errors from diagnoses coded according to the International Classification of Diseases (ICD), which are often available as routine health care data.

Methods: The system developed (index test) was validated against rater-based classifications taken from three previous studies of diagnostic labelling error (reference standard). The system compares pairs of diagnoses by calculating their distance within the ICD taxonomy, using four different algorithms. To assess the concordance between index test and reference standard, we calculated the area under the receiver operating characteristic curve (AUROC) and corresponding confidence intervals. Analyses were conducted overall and separately per algorithm and per type of available dataset.
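The abstract does not describe the four distance algorithms. The following is a minimal sketch of one possible taxonomy distance, not the authors' method: assuming WHO ICD-10 codes written like `I21.4` and a simplified hierarchy of chapter > three-character category > subcategory, it counts the edges between two codes through their lowest common ancestor. The names `icd_path`, `taxonomy_distance` and `is_flagged`, and the threshold of 2, are hypothetical and chosen only for illustration; marker characters used in national variants such as ICD-10-GM are not handled.

```python
import re

# Hypothetical illustration, not the published algorithms.
# WHO ICD-10 chapter ranges (first and last three-character category).
CHAPTERS = [
    ("I", "A00", "B99"), ("II", "C00", "D48"), ("III", "D50", "D89"),
    ("IV", "E00", "E90"), ("V", "F00", "F99"), ("VI", "G00", "G99"),
    ("VII", "H00", "H59"), ("VIII", "H60", "H95"), ("IX", "I00", "I99"),
    ("X", "J00", "J99"), ("XI", "K00", "K93"), ("XII", "L00", "L99"),
    ("XIII", "M00", "M99"), ("XIV", "N00", "N99"), ("XV", "O00", "O99"),
    ("XVI", "P00", "P96"), ("XVII", "Q00", "Q99"), ("XVIII", "R00", "R99"),
    ("XIX", "S00", "T98"), ("XX", "V01", "Y98"), ("XXI", "Z00", "Z99"),
    ("XXII", "U00", "U99"),
]


def icd_path(code: str) -> list[str]:
    """Return the assumed taxonomy path [chapter, category, (subcategory)]."""
    c = code.strip().upper().replace(".", "")
    if not re.fullmatch(r"[A-Z]\d{2}[0-9A-Z]*", c):
        raise ValueError(f"not an ICD-10 code: {code!r}")
    category = c[:3]
    # Categories compare correctly as strings: one letter, then two digits.
    chapter = next((ch for ch, lo, hi in CHAPTERS if lo <= category <= hi), None)
    if chapter is None:
        raise ValueError(f"category {category} is outside the known chapters")
    path = [chapter, category]
    if len(c) > 3:
        path.append(f"{category}.{c[3:]}")  # e.g. "I21.4"
    return path


def taxonomy_distance(a: str, b: str) -> int:
    """Edges from a to b via their lowest common ancestor (implicit root)."""
    pa, pb = icd_path(a), icd_path(b)
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return (len(pa) - shared) + (len(pb) - shared)


def is_flagged(initial: str, final: str, threshold: int = 2) -> bool:
    """Hypothetical trigger: flag a pair whose codes lie further apart than threshold."""
    return taxonomy_distance(initial, final) > threshold


print(taxonomy_distance("I21.4", "I21.0"))  # same category -> 2
print(taxonomy_distance("I21.4", "K35.8"))  # different chapters -> 6
print(is_flagged("I21.4", "K35.8"))         # True
```

Kept as a continuous score, such a distance can be evaluated against rater labels with an AUROC; the threshold in `is_flagged` only matters once a single cut-off is chosen for flagging cases.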

Results: Diagnoses of 1,127 cases were analyzed. Raters had previously classified 24.58% of cases as diagnostic labelling errors (ranging from 12.3 to 87.2% across the three datasets). Overall, AUROC ranged between 0.821 and 0.837, depending on the algorithm used to calculate the index test (95% CIs ranging from 0.8 to 0.86). When analyzed separately per type of dataset, the highest AUROC was 0.924 (95% CI 0.887-0.962).
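The AUROC summarizes how well a continuous score, here the distance produced by the index test, separates cases the raters classified as labelling errors from those they did not. A minimal sketch, assuming one distance score and one binary rater label per case, computes it as the probability that a randomly chosen error case scores higher than a randomly chosen non-error case (ties count one half) and attaches a percentile bootstrap 95% confidence interval. The abstract does not state how the reported CIs were obtained, so the bootstrap is an illustrative assumption, and the toy data are invented rather than taken from the study.

```python
import random


def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUROC, resampling cases with replacement.

    Assumption: the abstract does not state the CI method used in the study.
    """
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Invented toy data: taxonomy distances and rater labels (1 = labelling error).
distances = [0, 1, 2, 2, 4, 6, 0, 4, 6, 5]
rater_err = [0, 0, 0, 1, 1, 1, 0, 0, 1, 1]
print(round(auroc(distances, rater_err), 3))  # 0.92
print(bootstrap_ci(distances, rater_err))
```

The pairwise comparison grows with the product of positive and negative case counts, which is still manageable for around a thousand cases; for larger data, scikit-learn's `roc_auc_score` computes the same quantity.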

Conclusions: The trigger system to automatically identify diagnostic labelling errors from routine health care data performs excellently and is unaffected by the limitations of the reference standards. It is, however, only applicable to cases with pairs of diagnoses in which one diagnosis is more accurate than, or otherwise superior to, the other, reflecting a prevalent definition of diagnostic labelling error.

Item Type: Journal Article (Original Article)
Division/Institute: 04 Faculty of Medicine > Department of Intensive Care, Emergency Medicine and Anaesthesiology (DINA) > University Emergency Center
UniBE Contributor: Hautz, Wolf; Birrenbach, Tanja Nicole; Hautz, Stefanie Carola; Sauter, Thomas Christian; Krummrey, Gert
Subjects: 600 Technology > 610 Medicine & health
Publisher: De Gruyter
Romana Saredi
Date Deposited: 07 Dec 2021 19:02
Last Modified: 05 Dec 2022 15:55
Publisher DOI: 10.1515/dx-2021-0039