Peters, Alan A.; Decasper, Amanda; Munz, Jaro; Klaus, Jeremias; Loebelenz, Laura I.; Hoffner, Maximilian Korbinian Michael; Hourscht, Cynthia; Heverhagen, Johannes T.; Christe, Andreas; Ebner, Lukas (2021). Performance of an AI based CAD system in solid lung nodule detection on chest phantom radiographs compared to radiology residents and fellow radiologists. Journal of thoracic disease, 13(5), pp. 2728-2737. AME Publishing Company 10.21037/jtd-20-3522
Text
Peters_Performance.pdf - Published Version Restricted to registered users only Available under License Publisher holds Copyright. Download (706kB) |
Background: Despite the decreasing relevance of chest radiography in lung cancer screening, chest radiography is still frequently applied to assess for lung nodules. The aim of the current study was to determine the accuracy of a commercial AI based CAD system for the detection of artificial lung nodules on chest radiograph phantoms and compare the performance to radiologists in training.
Methods: Sixty-one anthropomorphic lung phantoms were equipped with 140 randomly deployed artificial lung nodules (5, 8, 10, 12 mm). A random generator chose nodule size and distribution before a two-plane chest X-ray (CXR) of each phantom was performed. Seven blinded radiologists in training (2 fellows, 5 residents) with 2 to 5 years of experience in chest imaging read the CXRs on a PACS-workstation independently.
Results of the software were recorded separately. McNemar test was used to compare each radiologist’s results to the AI-computer-aided-diagnostic (CAD) software in a per-nodule and a per-phantom approach and Fleiss-Kappa was applied for inter-rater and intra-observer agreements.
Results: Five out of seven readers showed a significantly higher accuracy than the AI algorithm. The pooled accuracies of the radiologists in a nodule-based and a phantom-based approach were 0.59 and 0.82 respectively, whereas the AI-CAD showed accuracies of 0.47 and 0.67, respectively. Radiologists’ average sensitivity for 10 and 12 mm nodules was 0.80 and dropped to 0.66 for 8 mm (P=0.04) and 0.14 for 5 mm nodules (P<0.001). The radiologists and the algorithm both demonstrated a significant higher sensitivity for peripheral compared to central nodules (0.66 vs. 0.48; P=0.004 and 0.64 vs. 0.094; P=0.025, respectively). Inter-rater agreements were moderate among the radiologists and between radiologists and AI-CAD software (K’=0.58±0.13 and 0.51±0.1). Intra-observer agreement was calculated for two readers and was almost perfect for the phantom-based (K’=0.85±0.05; K’=0.80±0.02); and substantial to almost perfect for the nodule-based approach (K’=0.83±0.02; K’=0.78±0.02).
Conclusions: The AI based CAD system as a primary reader acts inferior to radiologists regarding lung nodule detection in chest phantoms. Chest radiography has reasonable accuracy in lung nodule detection if read by a radiologist alone and may be further optimized by an AI based CAD system as a second reader.