Rizzo, Rudy; Dziadosz, Martyna; Kyathanahally, Sreenath P.; Reyes, Mauricio; Kreis, Roland (2022). Reliability of Quantification Estimates in MR Spectroscopy: CNNs vs Traditional Model Fitting. Lecture notes in computer science, 13438, pp. 715-724. Springer 10.1007/978-3-031-16452-1_68
Magnetic Resonance Spectroscopy (MRS) and Spectroscopic Imaging (MRSI) are non-invasive techniques to map tissue contents of many metabolites in situ in humans. Quantification is traditionally done via model fitting (MF), and Cramér-Rao Lower Bounds (CRLBs) are used as a measure of fitting uncertainties. The signal-to-noise ratio (SNR) is limited due to clinical time constraints, and MF can be very time-consuming in MRSI with thousands of spectra. Deep Learning (DL) has introduced the possibility to speed up quantitation while reportedly preserving accuracy and precision. However, questions arise about how to assess quantification uncertainties in the case of DL. In this work, an optimal-performance DL architecture that uses spectrograms as input and maps absolute concentrations of metabolites referenced to water content as output was used to investigate this in detail. Distributions of predictions and Monte-Carlo dropout were used to investigate data- and model-related uncertainties, exploiting ground truth knowledge in a synthetic setup mimicking realistic brain spectra with metabolic composition that varies uniformly from healthy to pathological cases. Bias and CRLBs from MF are then compared to DL-related uncertainties. It is confirmed that DL is a dataset-biased technique: accuracy and precision of predictions scale with metabolite SNR, but the results hint at bias and increased uncertainty at the edges of the explored parameter space (i.e., for very high and very low concentrations), even at infinite SNR (noiseless training and testing). Moreover, training with uniform datasets, or with datasets augmented with critical cases, proved insufficient to prevent biases. This is dangerous in a clinical context that requires the algorithm to be unbiased also for concentrations far from the norm, which may well be the focus of the investigation since these correspond to pathology, the target of the diagnostic investigation.
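As an illustration of how CRLBs act as a measure of fitting uncertainty, the following minimal sketch computes them for a toy model in which a spectrum is a linear combination of two known lineshapes with unknown amplitudes and white Gaussian noise. This is not the fitting model used in the paper: the Gaussian lineshapes, the frequency axis, the noise levels and the function names `gaussian_peak` and `amplitude_crlbs` are all assumptions chosen for illustration. Under these assumptions the Fisher matrix is F = BᵀB / σ², and the CRLBs are the square roots of the diagonal of F⁻¹.

```python
import math

def gaussian_peak(freqs, center, width):
    """Unit-amplitude Gaussian lineshape sampled on the frequency grid (toy model)."""
    return [math.exp(-0.5 * ((f - center) / width) ** 2) for f in freqs]

def amplitude_crlbs(basis, sigma):
    """CRLBs (as standard deviations) for two amplitudes in y = a1*b1 + a2*b2 + noise.

    Assumes white Gaussian noise with standard deviation sigma, so the Fisher
    matrix is F = B^T B / sigma^2 and the CRLBs are sqrt(diag(F^-1)).
    """
    b1, b2 = basis
    f11 = sum(x * x for x in b1) / sigma**2
    f22 = sum(x * x for x in b2) / sigma**2
    f12 = sum(x * y for x, y in zip(b1, b2)) / sigma**2
    det = f11 * f22 - f12 * f12
    # Diagonal of the inverse of a symmetric 2x2 matrix: f22/det and f11/det.
    return math.sqrt(f22 / det), math.sqrt(f11 / det)

# Hypothetical frequency axis in arbitrary units.
freqs = [i * 0.01 for i in range(400)]

# Two scenarios: strongly overlapping peaks and well-separated peaks.
overlapping = [gaussian_peak(freqs, 1.9, 0.1), gaussian_peak(freqs, 2.1, 0.1)]
separated = [gaussian_peak(freqs, 1.0, 0.1), gaussian_peak(freqs, 3.0, 0.1)]

crlb_overlap = amplitude_crlbs(overlapping, sigma=0.1)
crlb_separated = amplitude_crlbs(separated, sigma=0.1)
crlb_noisier = amplitude_crlbs(separated, sigma=0.2)

print("overlapping peaks:", crlb_overlap)
print("separated peaks:  ", crlb_separated)
print("separated, 2x noise:", crlb_noisier)
```

In this toy setting the bounds grow in proportion to the noise level (doubling σ doubles the CRLBs) and are larger for overlapping peaks than for separated ones, which is the sense in which CRLBs reflect both SNR and spectral overlap in model fitting.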