Adaptive data-driven selection of sequences of biological and cognitive markers in clinical diagnosis of dementia

Combining the right markers, which may be invasive and expensive, at the appropriate time is critical to obtain reliable yet economically sustainable decisions in the preclinical diagnosis of dementia. We propose a data-driven analytical framework to individualize the selection of prognostic biomarkers that balances accuracy, opportunity costs due to delaying the decision, and costs of acquisition according to prescribed cost parameters. We compared sequential and non-sequential decision strategies based on a linear mixed-effects classification model that integrates irregular, multivariate longitudinal data. The framework was applied to separate participants who progress to Alzheimer's disease from those who do not within a time interval of three years. As expected, the highest accuracy was obtained by combining all available data, on average 20.9 measurements per subject acquired over 4.8 years. The proposed sequential algorithm empirically outperformed alternative methods by having the lowest costs for a range of tested cost parameters. With the default cost parameters, the sequential algorithm reached an accuracy of 0.84, specificity of 0.86, and sensitivity of 0.82 (0.89, 0.91, and 0.88 with all available data, respectively) while requiring only 2.9 measurements on average (86 percent fewer observations than all available data) and a time interval of half a year on average (89 percent shorter than all time points). Our sequential algorithms established the decision based on individualized sequences of measurements with reduced process costs compared to non-sequential classification strategies while maintaining competitive accuracy.

Clinical diagnosis and prognosis of dementia due to Alzheimer's disease (AD), a progressive neurodegenerative disorder, is based on a panel of cognitive assessments, invasive molecular markers, and non-invasive neuroimaging signatures. Invasive markers such as Aβ1−42 cerebrospinal fluid (CSF) markers (1) impose a high burden on patients and high monetary costs. These shortcomings are balanced by the high sensitivity of the diagnosis. Conversely, cognitive assessments such as the Mini-Mental State Examination (MMSE) (2) or the Rey Auditory Verbal Learning Test (RAVLT) (3) have a lower economic cost and patient burden, but also a lower sensitivity in early stages of the disease. Non-invasive magnetic resonance imaging (MRI) provides machine-learning derived measures of AD-like atrophy (SPARE-AD). To date, a typical diagnostic decision of dementia is based on a panel of cross-sectional markers or a sequence of repeatedly measured (longitudinal) markers from multiple modalities such as MRI and cognitive testing (5-7). The temporal trajectory of the biomarkers is also an indicator of disease state and progression. There is currently, however, no consensus or systematic approach to individualize the selection of panels and sequences of markers to acquire. Accuracy is expected to increase with more variables and/or repeated measurements over time. However, the benefit of higher accuracy that more data provides is accompanied by added costs of acquisition, a delay of the definitive diagnosis, and a higher patient burden. The goal of the process for clinical diagnosis of dementia is to reach a conclusion under an acceptable level of uncertainty for an individual while globally optimizing the use of resources and patient burden.
Conceptually, sequential algorithms based on neutral zone classifiers (8)(9)(10) address the use case in which a definitive decision is taken after considering a sequence of measurements, adding measurements only if an objective criterion for a confident decision is not met. Different algorithms were presented in recent years and successfully applied to clinical data (8,9,11). While some only depend on fixed decision boundaries such as the posterior class probabilities (11), other approaches consider the expected cost reductions of additional measurements in the decisions (8). However, to our knowledge, the sequential neutral zone classifiers published so far do not allow selecting variables or skipping time points.
We propose an algorithm that uses knowledge about an underlying distribution of markers to prospectively decide whether more observations are needed for a definitive classification and which variable should be acquired next. To estimate the distributions, we embedded mixed-effects models in a classification task to allow for more flexible and more parsimonious modelling of longitudinal data than other generative approaches such as traditional discriminant models (12)(13)(14)(15)(16). The proposed framework integrates irregularly sampled repeated (longitudinal) measures with a varying number of observations and derives an adaptive sequential expansion of the panel of markers for classification (see Fig. 1). The experimental evaluation of a flexible clinical decision process is reflected in multiple aspects including accuracy, resource usage, and delay of diagnosis. The overall performance is a trade-off resulting from the model design and the associated cost parameters specified a priori. We applied the framework to predicting conversion from the sub-clinical to the clinical stage of AD.

This preprint is made available under a CC-BY-NC-ND 4.0 International license. The author/funder has granted medRxiv a license to display the preprint in perpetuity.

Costs of decision processes as objective evaluation criterion
A decision process $d$ incorporates a subset of all available data for its decision according to a decision rule and, in the case of sequential classifiers, also a sequence selection strategy. The cost of the decision process is an empirical metric of performance (lower is better) that is the sum of multiple costs, each reflecting a different desirable quality. We implement binary decision processes that predict whether a participant $i \in \{1, \ldots, N\}$ will progress from a sub-clinical stage of mild cognitive impairment (MCI) to the manifest stage of dementia due to Alzheimer's disease (AD) within a time interval of approximately three years. The total costs $c_{i,d}$ of a process are defined as the sum of misclassification costs $c_{i,d}^{\mathcal{C}}$ and measurement costs $c_{i,d}^{\mathcal{M}}$, the latter consisting of the costs of delaying the decision $c_{i,d}^{\mathcal{T}}$ and the costs of acquisition $c_{i,d}^{\mathcal{A}}$, such that $c_{i,d} = c_{i,d}^{\mathcal{C}} + c_{i,d}^{\mathcal{M}} = c_{i,d}^{\mathcal{C}} + c_{i,d}^{\mathcal{T}} + c_{i,d}^{\mathcal{A}}$.

See the Supplementary Materials for details about the composition of the costs.
If the sequence $S_{i,d}$ of a process $d$ is fixed, it includes all pre-defined assessments independently of the evidence given by the previously assessed measurements. The adaptive strategies sequentially weigh the expected accuracy from acquired evidence against the expected gain in accuracy and the costs of acquiring new data and delaying the decision.

Sequence selection strategies
We considered different processes implementing either fixed or sequential decision strategies to choose the set $S_{i,d}$. The implemented fixed decision strategies were categorized as (a) univariate (measuring only one marker) or multivariate (measuring multiple markers) and (b) cross-sectional (using only one measurement per marker at baseline) or longitudinal (using all repeated measurements of markers over time). We implemented two different sequential strategies that include one additional observation at each step until a class is chosen. The greedy sequential strategy always chooses the earliest observation for which a cost reduction is expected, while the exhaustive sequential strategy selects the observation for which the expected cost reduction is the highest. Since the resulting sequences differ between decision strategies in terms of number, type, and time of assessment, we compared the processes using a diverse set of performance metrics characterizing different facets.
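The difference between the two sequential selection rules can be sketched as follows. This is a minimal illustration only: the `Candidate` container, the function names, and the example numbers are hypothetical, and the expected cost reductions are taken as given rather than computed from the mixed-effects model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A not-yet-acquired observation (hypothetical container for illustration)."""
    marker: str
    time_years: float               # visit time since baseline
    expected_cost_reduction: float  # expected drop in total cost if acquired


def greedy_choice(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Earliest candidate for which a cost reduction is expected."""
    useful = [c for c in candidates if c.expected_cost_reduction > 0]
    return min(useful, key=lambda c: c.time_years, default=None)


def exhaustive_choice(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Candidate with the highest expected cost reduction."""
    useful = [c for c in candidates if c.expected_cost_reduction > 0]
    return max(useful, key=lambda c: c.expected_cost_reduction, default=None)


# Made-up example values, not taken from the study.
candidates = [
    Candidate("MRI", 0.0, -0.2),
    Candidate("CSF", 0.0, 0.4),
    Candidate("MMSE", 0.5, 1.0),
]
print(greedy_choice(candidates).marker)      # CSF
print(exhaustive_choice(candidates).marker)  # MMSE
```

In this sketch, a return value of `None` means that no further observation is expected to reduce costs, so a class would be chosen from the evidence already acquired.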

Multi-objective evaluation of decision processes
The prognostic decision processes were compared using multiple performance metrics including the area under the receiver operating characteristic curve (AUC), specificity, sensitivity, and accuracy. Of note, among these metrics only the AUC is cost independent. The misclassification costs calibrate the severity of false positive and false negative cases and hence influence sensitivity, specificity, and accuracy (see Eq. S5 in the Supplementary Materials for details). Since a correct diagnosis of an MCI-converter can be made either in the pre-clinical MCI state or after the conversion to AD has already occurred, we also considered a metric to assess the suitability for disease prognosis that counts prognoses after the conversion to AD as misclassifications. We defined the pre-conversion sensitivity as the proportion of MCI-converters that were correctly classified before the conversion occurred. Moreover, given the arbitrariness in the assignment of cost parameters, multiple sets of cost parameters were prescribed to assess the robustness of the performance.
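As a hedged illustration of the difference between sensitivity and pre-conversion sensitivity, the sketch below computes both for a small set of true converters. The `ConverterCase` record and the strict "decision before conversion" comparison are assumptions made for this example; the example values are invented.

```python
from typing import Iterable, NamedTuple, Tuple


class ConverterCase(NamedTuple):
    """One true MCI-converter (hypothetical record layout)."""
    predicted_converter: bool
    decision_time: float    # years since baseline at which the class was assigned
    conversion_time: float  # years since baseline at which conversion occurred


def sensitivities(cases: Iterable[ConverterCase]) -> Tuple[float, float]:
    """Return (sensitivity, pre-conversion sensitivity) among true converters."""
    cases = list(cases)
    if not cases:
        raise ValueError("at least one converter is required")
    hits = [c for c in cases if c.predicted_converter]
    # Assumption: a prognosis counts only if made strictly before conversion.
    early_hits = [c for c in hits if c.decision_time < c.conversion_time]
    return len(hits) / len(cases), len(early_hits) / len(cases)


# Invented example: three correct classifications, one of them after conversion.
example = [
    ConverterCase(True, 0.5, 1.0),
    ConverterCase(True, 2.0, 1.5),
    ConverterCase(False, 0.0, 1.0),
    ConverterCase(True, 0.0, 2.0),
]
print(sensitivities(example))  # (0.75, 0.5)
```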

Performance of decision processes
The sequential algorithms were designed to select a sequence of observations with low expected total costs given a set of cost parameters. AUC and accuracy were less sensitive to the prescribed cost parameters, whereas other performance measures (such as specificity and especially pre-conversion sensitivity) were highly dependent on them (see Fig. 3 and Tab. S1 and Tab. S2 in the Supplementary Material). For a large range of cost parameters, both sequential algorithms (greedy and exhaustive) had lower mean total costs than all the fixed strategies (see Tab. 1 and Tab. S1 and Tab. S2 in the Supplementary Materials). Sequential strategies were competitive in terms of misclassification costs, accuracy, and sensitivity while saving measurement costs compared to the longitudinal or the multivariate cross-sectional strategies (see Tab. 1, Fig. 2, Fig. 3 and Tab. S1 and Tab. S2 in the Supplementary Materials). Moreover, the sequential strategies adapted to the cost parameters, as shown by the varying mean number of observations, mean follow-up time, and accuracy. The specification of the cost parameters, especially of the costs of delaying, influenced the performance metrics, in particular specificity and pre-conversion sensitivity (see Fig. 3 and Tab. S1 in the Supplementary Material). If not specified otherwise, the following results are based on these default cost parameters: cost of misclassification of MCI-stable = 100; cost of misclassification of MCI-converter = 100; cost of MRI acquisition = 2; cost of acquisition of Aβ1−42-CSF = 4; cost of acquisition of a cognitive test = 1; cost of waiting one year = 2. We chose small costs of delaying and acquisition compared to misclassification costs to favor accuracy over measurement costs, aiming to assess the reductions in delay and number of measurements achieved by sequences that approach the accuracy or AUC of classifications based on all available observations.
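To make the default cost parameters concrete, the following sketch adds up the total cost of one decision process for one participant. It is an illustration under stated assumptions: the dictionary layout and function name are hypothetical, and the delay cost is assumed to grow linearly with the time of the last acquired observation (the exact composition of the costs is described in the Supplementary Materials and may differ).

```python
# Default cost parameters as listed in the text; the layout is hypothetical.
DEFAULT_COSTS = {
    "misclassify_stable": 100,
    "misclassify_converter": 100,
    "acquire": {"MRI": 2, "CSF": 4, "MMSE": 1, "RAVLT": 1},  # cognitive tests cost 1
    "wait_per_year": 2,
}


def total_cost(observations, true_label, predicted_label, costs=DEFAULT_COSTS):
    """Total cost = misclassification + acquisition + delay.

    observations: list of (marker, time_in_years) pairs that were acquired.
    Labels are "stable" or "converter".
    """
    acquisition = sum(costs["acquire"][marker] for marker, _ in observations)
    # Assumption: delay cost is linear in the time of the last observation.
    last_time = max((t for _, t in observations), default=0.0)
    delay = costs["wait_per_year"] * last_time
    if predicted_label == true_label:
        misclassification = 0
    elif true_label == "stable":
        misclassification = costs["misclassify_stable"]
    else:
        misclassification = costs["misclassify_converter"]
    return misclassification + acquisition + delay


sequence = [("MMSE", 0.0), ("MRI", 0.0), ("MMSE", 0.5)]
print(total_cost(sequence, "converter", "converter"))  # 5.0
print(total_cost(sequence, "converter", "stable"))     # 105.0
```

With these defaults, a single misclassification outweighs the measurement costs of many visits, which is the intended emphasis on accuracy described above.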
The multi-objective evaluation revealed that both sequential strategies were competitive in terms of the number of observations. The greedy strategy was also competitive in follow-up time (mean time to decision of around 6 months) and in the resulting pre-conversion sensitivity when compared with other strategies that had lower accuracy and AUC (see Fig. 2 and Tab. 1).
The exhaustive sequential strategy had the lowest mean total costs, whereas the multivariate longitudinal strategy had the highest mean total costs, driven by the highest measurement costs (see Tab. 1). The highest accuracy, AUC, and sensitivity were achieved with the multivariate longitudinal strategy, while the longitudinal strategy using all MMSE measures was the most specific. The lowest accuracy, AUC, and sensitivity were obtained when using the first MMSE measure, and the lowest specificity when using the first Aβ1−42-CSF measure for classification. For the multivariate cross-sectional strategy, the pre-conversion sensitivity was the highest, whereas longitudinal strategies showed the lowest pre-conversion sensitivity. Integrating the repeated measures over time led to higher specificity in the case of multivariate or (univariate) cognitive markers and higher sensitivity in the case of the structural marker, as compared to the corresponding cross-sectional strategy. As expected, the pre-conversion sensitivity was the same as the sensitivity for cross-sectional strategies and low for longitudinal strategies (between 0.01 and 0.03) because the decision was made after conversion. The multivariate cross-sectional strategy showed improvements in accuracy, AUC, and sensitivity compared to univariate cross-sectional strategies. The exhaustive sequential strategy showed smaller mean total costs than the greedy strategy, driven by its higher accuracy. The exhaustive strategy also showed a higher AUC, specificity, and sensitivity. The pre-conversion sensitivity was higher for the greedy than for the exhaustive strategy.

medRxiv preprint, doi: https://doi.org/10.1101/2021.10.26.21265515; this version posted October 26, 2021 (not certified by peer review).
For the greedy strategy, 12 percent of MCI-converters were correctly classified after conversion, while this percentage was 43 for the exhaustive strategy. Both sequential strategies outperformed all univariate cross-sectional strategies in terms of accuracy, AUC, sensitivity, and specificity, while all these measures were lower than those achieved by some longitudinal strategies. The multivariate cross-sectional strategy was outperformed by both sequential strategies in terms of accuracy and specificity, while its sensitivity was higher (its AUC was in between: lower than for the exhaustive and higher than for the greedy strategy). Both sequential strategies, and the greedy strategy in particular, led to more frequent pre-conversion diagnosis of AD than longitudinal strategies, but the level of pre-conversion sensitivity of most cross-sectional strategies could not be retained when using the default cost parameters. In summary, there were strategies that outperformed others in multiple criteria (e.g., total costs as well as accuracy measures), but no strategy dominated all others in all considered criteria. Most fixed strategies were either specific (longitudinal) or prognostic (univariate cross-sectional), while the sequential strategies made a trade-off in between.
In Fig. 2(a), decision processes were evaluated relative to the performance of using all available information. While the univariate cross-sectional strategy showed reductions for all classification-based performances compared to the multivariate longitudinal strategy, the multivariate cross-sectional strategy approached its level of sensitivity and the sequential strategies approached its accuracy and specificity. Longitudinal MMSE measures had similar AUC and accuracy with higher specificity but lower sensitivity. All cross-sectional and sequential strategies had a lower number of observations and a shorter follow-up time. Over the whole sample, 70 percent of subjects had at least two measurements, while the invasive Aβ1−42-CSF was included in 30 percent of cases (Tab. 2). Besides the MMSE (first measurement after around 9 months and last after around 1.5 years on average), all other markers were assessed early (last observation before 4 months on average). Descriptive measures of other sequential strategies (with exhaustive selection criteria or other cost parameters) can be found in the Supplementary Material (see Tab. S3 to Tab. S27). An evaluation of decision strategies in terms of total, misclassification (which is equal to the percentage of false classifications for the considered cost parameters), and measurement costs is presented in Fig. 2(b). Univariate cross-sectional strategies had higher mean misclassification costs than other strategies. All univariate longitudinal strategies, except the one including Aβ1−42-CSF measures, were characterized by lower mean misclassification costs and higher mean measurement costs than all other non-sequential strategies. Sequential strategies had the lowest mean total costs on average and had lower misclassification costs than all cross-sectional and some longitudinal strategies. There was no strategy that consistently dominated all other strategies in all objectives.
The greedy sequential strategy dominated the multivariate cross-sectional strategy and the univariate longitudinal strategies involving either all MRI, Aβ1−42-CSF, or RAVLT measures. The exhaustive strategy additionally showed misclassification costs similar to those of the longitudinal strategy using all MMSE measures in all three objectives and was dominated in terms of misclassification costs only by the multivariate longitudinal strategy. Both sequential strategies had higher mean measurement costs than the univariate cross-sectional strategies. Moreover, the sequential strategies were outperformed by some longitudinal strategies in terms of the cost-independent AUC, by univariate cross-sectional strategies in terms of both lower costs of acquisition and delaying, and by the multivariate cross-sectional strategy in terms of costs of delaying only (the sequential strategies had lower costs of acquisition) (Fig. 2 (c)-(e)).
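The notion of one strategy dominating another across several cost objectives can be sketched as a Pareto comparison. The code below is a minimal illustration, assuming that lower is better in every objective and that domination requires being no worse in all objectives and strictly better in at least one. The strategy labels and cost triples are invented and do not reproduce Tab. 1.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if cost vector a is no worse than b everywhere and better somewhere."""
    if len(a) != len(b):
        raise ValueError("cost vectors must have the same length")
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(strategies: Dict[str, Sequence[float]]) -> List[str]:
    """Names of strategies that no other strategy dominates."""
    return [
        name
        for name, costs in strategies.items()
        if not any(
            dominates(other, costs)
            for other_name, other in strategies.items()
            if other_name != name
        )
    ]


# Invented (misclassification, acquisition, delay) costs for illustration only.
example_strategies = {
    "A": (16, 4, 1),
    "B": (18, 8, 0),
    "C": (11, 40, 9),
    "D": (25, 1, 0),
    "E": (20, 9, 2),
}
print(non_dominated(example_strategies))  # ['A', 'B', 'C', 'D']
```

Here strategy "E" is dominated by "A", while "A" through "D" each win in at least one objective, which mirrors the situation in which no single strategy dominates all others.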
Survival curves estimating the proportion of subjects that did not convert as a function of the conversion time are shown in Fig. 2 (f). Observations of participants labelled as MCI-stable were considered right-censored, and the time until the last observation was used for the analysis. The multivariate cross-sectional strategy showed more distinct survival curves of positively and negatively labelled cases than the strategy using only the first MRI measure. Adding all repeated measures of the markers (multivariate longitudinal strategy) further increased the difference between the curves. Sequential strategies based on the default cost structure scored in between the two fixed multivariate strategies. When using all MMSE measures (the most specific strategy, as shown in Tab. 1) for classification, the survival curves of positively labelled cases were the steepest of all considered decision strategies, but the curves of cases with predicted negative labels were also steeper than for the multivariate cross-sectional, multivariate longitudinal, and sequential strategies.
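One common way to estimate such survival curves with right-censored observations is the Kaplan-Meier estimator. The text does not name the estimator that was used, so the sketch below is an assumption made for illustration; the function name and the example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of the proportion that has not yet converted.

    times:  conversion time for converters, time of last observation otherwise.
    events: 1 if conversion was observed, 0 if right-censored (MCI-stable).
    Returns (time, survival) pairs at each time with at least one conversion.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        conversions = 0
        leaving = 0
        # Group all subjects that share the same time point.
        while i < len(data) and data[i][0] == t:
            conversions += data[i][1]
            leaving += 1
            i += 1
        if conversions > 0:
            survival *= 1.0 - conversions / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # converters and censored subjects leave the risk set
    return curve


# Invented example: five subjects, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 0, 1, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Computing one such curve for the cases labelled positive and one for the cases labelled negative by a given strategy gives the pair of curves that is compared between strategies.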
All cost parameters presented here assumed the same misclassification costs so that accuracy, specificity, and sensitivity can be compared (see Tab. S1 and Tab. S2 in the Supplementary Material for results with other misclassification costs and additional variations of the costs of acquisition and time). Costs differed from the default cost structure in terms of the costs of waiting, the costs of assessments, or both. In Fig. 3 (a)-(b), AUCs of strategies are contrasted with the resources of time or number of considered measurements, showing gains in performance when taking up more resources. All sequential strategies had lower AUCs than the strategy using all MMSE measures or the multivariate longitudinal strategy and higher AUCs than univariate cross-sectional strategies. Depending on the specified cost parameters, sequential strategies showed higher or lower AUCs than the multivariate cross-sectional strategy while always using a lower number of observations but longer follow-up times on average. As shown in Fig. 3 (c), both specificity and sensitivity of sequential strategies were positioned between those of the fixed strategies: they were always less than or equally sensitive compared to the multivariate cross-sectional strategy, always less sensitive than the multivariate longitudinal strategy or the longitudinal strategy involving the RAVLT measures only, and less specific than the multivariate longitudinal strategy and the longitudinal strategy involving MMSE measures only. Both sequential strategies showed higher sensitivities and higher or equal specificities than all univariate cross-sectional and some univariate longitudinal strategies. The trade-off between specificity and pre-conversion sensitivity of different sequential strategies is illustrated in Fig. 3 (d)-(f). Sequential strategies were bounded by the specificity of the multivariate longitudinal strategy and the pre-conversion sensitivity of the multivariate cross-sectional strategy (Fig. 3 (d)). Greedy strategies were less specific but more prognostic than exhaustive strategies, while increasing the costs of waiting led to shorter follow-up times and consequently higher pre-conversion sensitivities at the price of lower specificities (Fig. 3 (c)-(d)).

Discussion
Personalized medical care tailors the diagnosis and treatment to individual patients based on circumstantial evidence and options. Motivated by a use case of pre-clinical diagnosis of AD, the most frequent cause of severe cognitive decline at old age, we developed a general statistical framework to objectify the selection of the sequence and timing of diagnostic markers to include in a decision process. As previous studies indicated the benefit of combining multiple, diverse markers (17), cognitive, brain anatomical, and molecular markers were included here. The performance of the non-sequential strategies consistently improved when multiple cross-sectional and multiple longitudinal measurements were combined. This confirmed that the implemented longitudinal discriminant model was able to integrate multivariate and repeated measurements. This was further demonstrated in the time-to-event analyses, in which the models with the most data consistently had the highest odds ratio between the classes. The complementary information of different diagnostic markers has been shown and explored in the literature previously (17), and the performance achieved with all available data marks the maximum accuracy that can be expected from a sequential algorithm with fewer variables or time points. However, under certain conditions, a single or very few markers may be sufficient for a definitive diagnosis, and it is the objective of sequential neutral zone classifiers to balance misclassification and acquisition costs. The results of the sequential classification strategies showed the expected benefits of reaching similar accuracy (2 to 9 percent reduction) with fewer acquisitions and a shorter follow-up interval. With the default cost parameters, the waiting time until a decision was around half a year on average, 30.1 percent of the participants were definitively classified based on a single measurement, and in about 69.7 percent of the cases no Aβ1−42-CSF measure was included in the panel of biomarkers, considerably reducing the administration of invasive procedures.
The cost parameters influenced the performance: the cost-independent AUC ranged between 0.85 and 0.92 and the accuracy between 0.80 and 0.87. The influence of the prescribed parameter for the costs of time on AUC and accuracy was stronger than that of the acquisition costs because delayed decisions (when specifying low costs of time) had higher specificity. When increasing the costs of time, more converters were definitively diagnosed before the actual conversion occurred, making the setting better suited for prognosis. Interestingly, higher costs of acquisition did not result in a drop in AUC or accuracy. The implemented greedy sequential strategies based on high acquisition costs considerably reduced the number of observations, especially for Aβ1−42-CSF (between 0 and 40.4 percent of subjects with an Aβ1−42-CSF assessment for varying costs of acquisition), and instead chose to assess the cognitive MMSE scale after a longer follow-up time, when the gains in accuracy pay off against the high acquisition costs. All these considerations are relevant when aiming to prescribe cost parameters in clinical diagnosis.
Academic research on biomarkers is often focussed on methods to build good predictive models with a single or a few general-purpose modalities (18). While such an approach is adequate to maximize population-level classification accuracy, it is not useful for designing a decision process. Acquisition of a complete set of markers for all participants with high accuracy at the population level might still be insufficient in clinical diagnosis since it neither considers the individual uncertainty of a prediction nor optimizes the use of resources. An individualized sequence of markers and their measurement time points that is chosen sequentially follows the precision diagnostic principle of "use it only when and in whom it is most needed" while optimizing resources and imposing patient burden only when it is offset by prognostic value.

[...] application to derive more flexible approaches for discriminant analyses for clinical data (12)(13)(14)(15)(16)(31)(32)(33)(34)(35). Also, in recent years there have been studies that applied mixed-effects models to infer underlying structures of multivariate change or to derive flexible or even dynamic classification approaches (11,16,22,24,27), also in the diagnosis of neurodegeneration (22,32,35). The approaches for sequential classification with mixed-effects models in recent studies (11,16) relied purely on acquired evidence without considering the potential benefits of further measurements. In the field of neutral zone prediction, multiple approaches were presented in the last years (8)(9)(10)(36). Applications based on such neutral zone classifiers that enable sequential classification were derived and applied to clinical data. Existing sequential neutral zone classifiers (8,9,11) are designed for multi-stage classification. Our algorithm can skip observations and can choose which biomarker to acquire next. Due to the flexibility of mixed-effects models to adapt model structures to the data, task, and sample size, our framework of non-sequential and sequential classification based on mixed-effects models may be of interest in different applications and research fields. The specific implementation here was limited to a linear model and fixed prevalence, but the concept is applicable to non-linear mixed models and conditional prevalence.
The classification task was defined by clinically motivated, yet arbitrary, thresholds of follow-up and conversion time. Moreover, in the two studies, patients with significant neurological disorders and most psychiatric disorders were excluded, limiting the validity for a prospective clinical population, as shown previously in applications of machine-learning methods to data from clinical routine (37,38). Here we evaluated fixed cost parameters. In a clinical application, the costs could depend on the visit or on other factors. For instance, as already implemented in many clinical workups, an initial suspicion of dementia due to AD requires a confirmatory MRI to exclude other neurological disorders. In our framework, this would lead to a cost penalty of zero in a suspected case of AD or, worded differently, "no definitive diagnosis of AD without structural MRI". The data included in this study were inadequate to evaluate more complex cost structures that better cater to the needs of the clinical routine. Furthermore, the simplified modelling of the distribution underlying the predictive model may limit the performance of the model and its application to clinical populations. While effective and computationally light, the approach we implemented selects a single measurement at each step of a sequence, which is not guaranteed to find the globally optimal next step. A globally optimal algorithm would need to consider all possible future sequences, which is computationally intensive and intractable for a sufficiently high number of markers. A better approach might be derived by adapting the multi-stage approach presented in (8) in a way that allows choosing which observation should be taken next.
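The growth of the search space that makes a globally optimal search intractable can be illustrated with a short count. The sketch assumes that a candidate future sequence is either any subset of the remaining observations (when the order is fixed by visit time) or any ordered selection without repetition; which of these applies depends on the setting and is not specified here. The function names are hypothetical.

```python
import math


def count_subsets(m: int) -> int:
    """Number of subsets of m remaining candidate observations."""
    return 2 ** m


def count_ordered_sequences(m: int) -> int:
    """Number of ordered selections (any length, no repetition) from m candidates."""
    return sum(math.perm(m, k) for k in range(m + 1))


for m in (3, 10, 20):
    print(m, count_subsets(m), count_ordered_sequences(m))
```

A stepwise rule only evaluates the remaining candidates at each step, which is why it stays computationally light as the number of markers and visits grows.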
While the proposed framework does not alleviate the necessity of choosing cost parameters, it demonstrated the ability to adapt in out-of-sample classification and to consistently lower costs compared to cross-sectional predictions or predictions using all available information of one or multiple markers for a subject. Sequential strategies that were designed to lower overall decision costs also proved competitive in individual performance measures. The proposed statistical framework constitutes a promising element for precision diagnostics that makes the panel of diagnostic tools conditional on past and potential future evidence, thus specifically objectifying the acquisition of the panel of markers after each visit.

Decision rules for classification
The decision rule governs the mapping from a test statistic derived from observations to a binary decision. Suppose the class label $G \in \{1, 2\}$ is a random binary variable following a Bernoulli distribution with unconditional success probability $\pi_0$ (base rate), and $X \in \mathbb{R}^{p}$ is a random vector taking continuous values with density $f^{(1)}$ in the population with class $G = 1$ and density $f^{(2)}$ in the population with class $G = 2$. Furthermore, we assume that $\pi_0$, $f^{(1)}$, and $f^{(2)}$ are known and that $f^{(1)}$ and $f^{(2)}$ are densities of multivariate normal distributions with mean vectors $\mu^{(k)} = E(X \mid G = k)$ ($k \in \{1, 2\}$, $\mu^{(1)} \neq \mu^{(2)}$) and a common covariance matrix $\Sigma = \Sigma^{(1)} = \Sigma^{(2)} = \mathrm{Var}(X \mid G)$. The posterior probability of $G = 2$ conditional on $X$ is given by:

$$\pi(X) = P(G = 2 \mid X) = \frac{\pi_0 f^{(2)}(X)}{\pi_0 f^{(2)}(X) + (1 - \pi_0) f^{(1)}(X)}. \qquad (2)$$

Suppose $x \in \mathbb{R}^{p_x}$ are past observations and $y \in \mathbb{R}^{p_y}$ are optional future observations and, without loss of generality, let $X = (x, y)$. The posterior probability $\pi_x = P(G = 2 \mid x)$ is obtained by plugging the observed values $x$ and their assumed density functions for both populations (1) and (2) into Eq. 2 and is called the current evidence. Based on the current evidence, a forced-choice classifier $\delta_{FC,x}$ definitively assigns one of the classes in a binary classification. Let $c_1(2)$ denote the cost of predicting class 1 when the true class is 2, and $c_2(1)$ the cost of predicting class 2 when the true class is 1. The decision rule that minimizes the expected misclassification costs of the forced-choice classifier $\delta_{FC,x}$ is

$$\delta_{FC,x} = \begin{cases} 1 & \text{if } \pi_x\, c_1(2) < (1 - \pi_x)\, c_2(1), \\ 2 & \text{otherwise.} \end{cases} \qquad (3)$$

Neutral zone classifiers add a no-decision class $N$ to the set of possible predicted outcomes, with associated costs $c_N$. The no-decision class is picked when the test statistic is within a neutral zone. For the descriptive neutral zone classifier, denoted by $\delta_{N,x}$ and based on the assessed $x$, the classification boundaries for the current evidence depend only on the pre-specified cost parameters. The classifier $\delta_{N,x}$ is given by

$$\delta_{N,x} = \begin{cases} 1 & \text{if } \pi_x \le b_1, \\ N & \text{if } b_1 < \pi_x < b_2, \\ 2 & \text{if } \pi_x \ge b_2, \end{cases} \qquad b_1 = \frac{c_N}{c_1(2)}, \quad b_2 = 1 - \frac{c_N}{c_2(1)}. \qquad (4)$$

In case $b_1 < b_2$, the classifier in Eq. 4 minimizes the costs; otherwise, the forced-choice classifier in Eq. 3 is the minimum-cost classifier (adapted from (11)). Since the expected misclassification cost of predicting class 1 is $\pi_x\, c_1(2)$ and that of predicting class 2 is $(1 - \pi_x)\, c_2(1)$, the descriptive neutral zone classifier predicts the label $N$ in case the expected misclassification costs for both classes are higher than the neutral costs $c_N$.
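The forced-choice and descriptive neutral zone rules described above can be sketched in a few lines. This is an illustration under stated assumptions: the posterior is computed for a single normally distributed marker with a common standard deviation (a univariate special case of the multivariate model), `pi0` is taken to be the base rate of class 2, and all function and parameter names are hypothetical.

```python
import math


def posterior_class2(x: float, mu1: float, mu2: float, sigma: float, pi0: float) -> float:
    """Bayes posterior P(G = 2 | x) for one normal marker with common sd."""
    def density(value: float, mean: float) -> float:
        z = (value - mean) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    numerator = pi0 * density(x, mu2)
    return numerator / (numerator + (1.0 - pi0) * density(x, mu1))


def forced_choice(pi_x: float, cost_pred1_true2: float, cost_pred2_true1: float) -> int:
    """Pick the class with the lower expected misclassification cost."""
    expected_cost_1 = pi_x * cost_pred1_true2
    expected_cost_2 = (1.0 - pi_x) * cost_pred2_true1
    return 1 if expected_cost_1 < expected_cost_2 else 2


def descriptive_neutral_zone(pi_x, cost_pred1_true2, cost_pred2_true1, neutral_cost):
    """Return "N" if both expected misclassification costs exceed the neutral cost."""
    expected_cost_1 = pi_x * cost_pred1_true2
    expected_cost_2 = (1.0 - pi_x) * cost_pred2_true1
    if expected_cost_1 > neutral_cost and expected_cost_2 > neutral_cost:
        return "N"
    return forced_choice(pi_x, cost_pred1_true2, cost_pred2_true1)


pi_x = posterior_class2(x=0.3, mu1=-1.0, mu2=1.0, sigma=1.0, pi0=0.5)
print(descriptive_neutral_zone(pi_x, 100, 100, neutral_cost=10))
```

With equal misclassification costs of 100 and a neutral cost of 10, the neutral zone in this sketch covers posterior probabilities between 0.1 and 0.9; a neutral cost of 50 or more leaves no neutral zone, and the rule reduces to the forced choice.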
The descriptive neutral zone classifier does not consider the potential value of future observations, that is, how much future observations may contribute to reducing the process costs. Its decision process may be improved by postponing the decision only if the expected future costs given by $y_b$ conditional on the value of the already assessed $y_a$, denoted by $c_{b.a}$, are lower than the costs expected from a forced-choice decision based on $y_a$, denoted by $c_a$. To this end, the measurement costs $\mathcal{M}$ associated with postponing the decision and assessing the remaining markers are considered. In this case, the neutral costs are $c_N = c_{b.a}$, given by the measurement costs plus the expected future misclassification costs (for the descriptive neutral zone classifier, we can think of neutral costs that are given by the measurement costs alone, i.e. $c_N = \mathcal{M}$). Consequently, computing $c_{b.a}$ requires the false positive and false negative rates of a forced-choice classification based on the joint distribution of $y_a$ and $y_b$ conditional on $y_a$. In linear discriminant models, these misclassification rates depend solely on the misclassification costs $c_2^{(1)}$ and $c_1^{(2)}$, the current evidence $\pi(y_a)$ and the prospective discriminability, denoted by $\Delta_{b.a}$ (see Fig. 4 for an illustration and the Supplementary Material for the derivation of Eq. S10). The prospective discriminability is defined in Eq. 5. The exact closed-form expressions for the expected misclassification rates as functions of the current evidence and $\Delta_{b.a}$ can be found in the Supplementary Material (see Eq. S10).

Given the expected costs, we can derive a modified neutral zone classifier that assigns the label $N$ only if $c_a > c_{b.a}$, i.e., when the expected costs of a forced-choice classification that includes the future measurements $y_b$ given the already assessed $y_a$ are smaller than the expected costs of a forced-choice classification based solely on the already assessed $y_a$. The prospective neutral zone classifier is given by Eq. 6. The second part of Eq. 6 only holds if $\pi_1(\Delta_{b.a}) < \pi_2(\Delta_{b.a})$, and the boundaries $\pi_1(\Delta_{b.a})$ and $\pi_2(\Delta_{b.a})$ cannot be computed in closed form (see Eq. S11 and Eq. S12 and their description in the Supplementary Material for more information). Of note, we did not compute the boundaries and used the first part of Eq. 6 for classification. Fig. 4 illustrates the forced-choice, descriptive neutral zone, and prospective neutral zone classifiers.
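The comparison between deciding now and postponing the decision can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the closed form of Eq. S10, which is not reproduced here. It assumes that the prospective discriminability is the Mahalanobis distance between the two class-conditional distributions of the future observations given the past ones, so that the log-likelihood ratio of the future observations is normal with mean plus or minus half the squared distance and variance equal to the squared distance (a standard property of linear discriminant models). All names are hypothetical.

```python
import math

def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logit(p):
    return math.log(p / (1.0 - p))

def forced_choice_cost(post2, c12, c21):
    """Expected misclassification cost of deciding now from the current evidence.
    c12: cost of predicting 1 when the truth is 2; c21: the reverse (assumed naming)."""
    return min(post2 * c12, (1.0 - post2) * c21)

def expected_future_cost(post2, delta, c12, c21):
    """Expected misclassification cost of a forced choice made after also
    observing the future measurements; post2 must lie strictly in (0, 1)."""
    if delta <= 0.0:
        # Future measurements carry no information about the class.
        return forced_choice_cost(post2, c12, c21)
    threshold = c21 / (c12 + c21)
    # Class 2 is predicted when the future log-likelihood ratio exceeds k.
    k = logit(threshold) - logit(post2)
    rate_2_given_1 = 1.0 - std_normal_cdf((k + 0.5 * delta ** 2) / delta)
    rate_1_given_2 = std_normal_cdf((k - 0.5 * delta ** 2) / delta)
    return (1.0 - post2) * c21 * rate_2_given_1 + post2 * c12 * rate_1_given_2

def prospective_decision(post2, delta, c12, c21, measurement_cost):
    """Return "N" when postponing is expected to be cheaper, else a forced choice."""
    cost_now = forced_choice_cost(post2, c12, c21)
    cost_later = measurement_cost + expected_future_cost(post2, delta, c12, c21)
    if cost_later < cost_now:
        return "N"
    return 1 if post2 * c12 <= (1.0 - post2) * c21 else 2
```

With made-up values (current evidence 0.5, unit misclassification costs, discriminability 2), postponing pays off for a measurement cost of 0.1 but not for a measurement cost of 0.4, because the expected future misclassification cost is about 0.16 compared with 0.5 for an immediate decision.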
In this study, we implemented a classification framework that can deal with irregularly acquired, repeated and multi-variate measurements. Mixed-effects model-based estimation was embedded into linear (or quadratic, see Supplementary Material) discriminant models to account for inter-subject differences (12-15, 34, 35, 39). The vector $y_i$ from participant $i \in \{1, 2, \ldots, n\}$ consists of $n_i$ longitudinal measurements from multiple time points $t_{i,j}$, each acquiring one of four response variables. For a subject $i$ with measurements $y_i$ and unknown label $D_i$, we assumed that $y_i$ conditional on $D_i = d$ follows a multivariate normal distribution with mean vector $\mu_i^{(d)}$ and common covariance matrix $\Sigma_i$ (Eq. 7). The prevalence (unconditional probability) $\pi_0 = P(D = 2)$ was estimated by the relative frequency $\hat{\pi}_0$ of subjects with diagnosis 2 in the training set. The population-level estimators of the mean vector $\hat{\mu}_i^{(d)}$ and covariance matrix $\hat{\Sigma}_i$ were derived using linear mixed-effects models (LMMs) trained on labelled measurements, denoted by $y_{i,j}^{(1)}$ and $y_{i,j}^{(2)}$, respectively. The LMM included age at baseline $\mathrm{age}_i$, the time $t_{i,j}$ since baseline of observation $j$, and four dummy variables identifying the response variable, denoted by $x_{h,i,j}$ ($h \in H$).

We used a model adapted from (19, 29, 30) (Eq. 8), in which $\beta_{h,1}^{(d)}$, $\beta_{h,2}^{(d)}$ and $\beta_{h,3}^{(d)}$ ($d \in \{1, 2\}$) are the diagnosis- and response-variable-specific fixed effects, $\beta^{(d)}$ is the vector containing all diagnosis-specific fixed effects (population level), $b_{h,i,1}$ and $b_{h,i,2}$ are the response-specific random effects (subject level, same for both labels), and $\varepsilon_{i,j}$ are the (scaled) residuals, which are scaled by the response-specific intra-subject variance components $\sigma_h$ (same for both labels). The vector $b_i$ containing all random effects (for the intercept and time, for all variables) follows a multivariate normal distribution with mean zero and covariance matrix $\Psi$. The scaled residuals were assumed to be independent of each other and of the random intercepts and slopes, and to follow a standard normal distribution, i.e., $\varepsilon_{i,j} \sim N(0, 1)$. The distribution of the unscaled residuals $e_{i,j}$ varies between response variables and is given by $e_{i,j} \sim N\left(0, \sum_{h \in H} x_{h,i,j}\, \sigma_h\right)$.

We denote by $\sigma = (\sigma_h)_{h \in H}$ the vector containing all response-type-specific intra-subject variances. All model parameters $\theta = [\pi_0; \beta^{(1)}; \beta^{(2)}; \Psi; \sigma]$ (the prevalence of diagnosis 2 and the parameters of the LMM) for the discriminant models in Eq. 7 were estimated on training data with labelled observations within a 20-fold cross-validation framework. The parameters of the LMMs were estimated by restricted maximum likelihood using the package nlme (see (40) for a detailed description) in the statistical programming language R 4.1.1 (41).
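To make the link between the LMM parameters and the discriminant model concrete, the following Python sketch assembles the marginal mean vector and covariance matrix of one subject's measurements. It is an illustration under assumptions rather than the authors' implementation: the fixed effects per marker are assumed to be an intercept, an age effect and a time effect (the order is a guess), each marker is assumed to have a random intercept and a random slope over time, and the residual variance is assumed to be marker specific. All names and numbers are hypothetical.

```python
def psi_entry(psi, a, b):
    """Covariance between two random effects; psi stores each pair once."""
    return psi.get((a, b), psi.get((b, a), 0.0))

def marginal_moments(observations, age, beta, psi, sigma2):
    """Marginal mean vector and covariance matrix for one subject and one diagnosis.

    observations: list of (marker, time_since_baseline) pairs
    beta[marker]: (intercept, age_effect, time_effect) for the diagnosis
    psi: dict mapping ((marker, k), (marker, l)) to a covariance,
         with k, l = 0 for the random intercept and 1 for the random slope
    sigma2[marker]: residual (intra-subject) variance of the marker
    """
    mean = [beta[h][0] + beta[h][1] * age + beta[h][2] * t for h, t in observations]
    n = len(observations)
    cov = [[0.0] * n for _ in range(n)]
    for r, (h_r, t_r) in enumerate(observations):
        z_r = (1.0, t_r)  # random-effects design: intercept and time
        for s, (h_s, t_s) in enumerate(observations):
            z_s = (1.0, t_s)
            total = 0.0
            for k in range(2):
                for l in range(2):
                    total += z_r[k] * z_s[l] * psi_entry(psi, (h_r, k), (h_s, l))
            if r == s:
                total += sigma2[h_r]  # residual variance only on the diagonal
            cov[r][s] = total
    return mean, cov

# Made-up parameters for a single marker measured at baseline and after one year.
obs = [("MMSE", 0.0), ("MMSE", 1.0)]
beta = {"MMSE": (30.0, -0.05, -1.0)}
psi = {
    (("MMSE", 0), ("MMSE", 0)): 4.0,
    (("MMSE", 1), ("MMSE", 1)): 1.0,
    (("MMSE", 0), ("MMSE", 1)): 0.5,
}
sigma2 = {"MMSE": 2.0}
mean, cov = marginal_moments(obs, 70.0, beta, psi, sigma2)
```

Calling the function once per diagnosis with the respective fixed effects would yield the two mean vectors that, together with the shared covariance matrix, enter the posterior probability.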
Given the 20 trained models, cross-validated out-of-sample predictions of $\hat{\mu}_i^{(d)}$ and $\hat{\Sigma}_i$ were computed from all observations of subject $i$. With the estimated prevalence, mean vectors and covariance matrix, we computed all previously derived quantities (posterior probabilities, expected misclassification rates, expected costs and classifiers) by plugging the estimates $\hat{\pi}_0$, $\hat{\mu}_i^{(d)}$ and $\hat{\Sigma}_i$ of subject $i$ into the equations in place of the true population values.

Moreover, we derived a sequential algorithm that adds new observations stepwise, based on decision and selection rules. At step $k$ ($1 \le k \le n_i$), the sequential classifier either predicts one of the possible diagnoses for subject $i$ or stays in the neutral zone (decision rule), and selects which future measurement should be included next (selection rule). The sequential classifier (Eq. S17) applies the prospective neutral zone classifier of Eq. 6 to individual future observations: the prospective neutral zone classifier is applied to every remaining observation separately, and the label $N$ is assigned if it returns $N$ for at least one of these observations (see the Supplementary Material for more information). If the label $N$ is chosen, a selection rule determines which single observation is included next for the prediction. We used two different selection rules: the greedy rule, which chooses the earliest observation with an expected cost reduction, and the exhaustive rule, which chooses the observation with the highest expected cost reduction. In every step, all quantities needed to compute the decision and selection rules are given by the estimated means and covariances contained in $\hat{\mu}_i^{(1)}$, $\hat{\mu}_i^{(2)}$ and $\hat{\Sigma}_i$. The code for PrOspective SEquentIal DiagnOsis with Neutral zones (POSEIDON) is available at https://git.upd.unibe.ch/openscience/POSEIDON.
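The two selection rules can be sketched in a few lines of Python. The sketch assumes that an expected cost reduction has already been computed for every remaining observation (for instance as the difference between the expected cost of deciding now and the expected cost of deciding after that observation). The record layout and names are hypothetical and are not taken from the POSEIDON repository.

```python
def select_next(candidates, rule):
    """Choose the next observation to acquire, or None if no candidate pays off.

    candidates: list of dicts with keys "marker", "time" and
                "expected_cost_reduction" (positive means postponing is cheaper)
    rule: "greedy" or "exhaustive"
    """
    beneficial = [c for c in candidates if c["expected_cost_reduction"] > 0.0]
    if not beneficial:
        # No remaining observation is expected to reduce the costs,
        # so the classifier leaves the neutral zone and decides now.
        return None
    if rule == "greedy":
        # Earliest observation with an expected cost reduction.
        return min(beneficial, key=lambda c: c["time"])
    if rule == "exhaustive":
        # Observation with the highest expected cost reduction.
        return max(beneficial, key=lambda c: c["expected_cost_reduction"])
    raise ValueError("rule must be 'greedy' or 'exhaustive'")

# Made-up candidates for one subject at one step.
remaining = [
    {"marker": "MMSE", "time": 0.5, "expected_cost_reduction": 0.02},
    {"marker": "CSF", "time": 1.0, "expected_cost_reduction": 0.10},
    {"marker": "RAVLT", "time": 0.0, "expected_cost_reduction": -0.01},
]
greedy_pick = select_next(remaining, "greedy")
exhaustive_pick = select_next(remaining, "exhaustive")
```

In the made-up example, the greedy rule picks the MMSE measurement at 0.5 years, whereas the exhaustive rule picks the CSF measurement at 1.0 years. After each acquisition, the expected cost reductions would have to be recomputed from the updated evidence before the next step.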

Study data and empirical evaluation
Longitudinal data from individuals in the Alzheimer's Disease Neuroimaging Initiative (ADNI) (42) and the Australian Imaging Biomarkers and Lifestyle flagship study of ageing (AIBL) (43) were included. The AIBL study methodology has been reported previously (43). ADNI was launched in 2003. Its primary goal has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information and data access, see https://www.adni-info.org and https://adni.loni.usc.edu.
We included biological as well as cognitive markers to separate patients with MCI who do not convert to AD over a follow-up time of at least 2.75 years (MCI-stable, label $D = 2$) from those who convert to manifest AD within 3.25 years of study entry (MCI-converter, $D = 1$). As a structural biomarker, we used the SPARE-AD score (4), which captures how "AD-like" the structure of a subject's brain is. It was computed from regional brain volumes obtained from standard structural MRI with a multi-atlas segmentation algorithm (44). The SPARE-AD score is the average decision value of an ensemble of linear support vector machine classifiers trained to discriminate cognitively healthy subjects from patients suffering from AD based on regional anatomical brain volumes. The ensemble was trained on multi-centric data from the iSTAGING Consortium. We included Aβ1−42 levels in the CSF (1) as an invasive, AD-specific marker. Cognitive markers were scores from either the MMSE or the RAVLT. All four markers were measured irregularly, with time points and numbers of observations differing between subjects.

We implemented the described linear mixed-effects model-based classification approach to perform non-sequential and sequential classification. We fitted linear mixed-effects models to the longitudinal measurements of all variables and included only subjects who matched the definition of either MCI-stable or MCI-converter and had at least 8 observations, irrespective of the variable. Eventually, a sample of 336 MCI-stable and 269 MCI-converter subjects was used to fit the 20 models. Fixed cross-sectional (using only the measurements at baseline), fixed longitudinal (using measurements over time) and sequential decision strategies were applied to perform the classification task.
The strategies were evaluated from a multi-objective perspective using a set of performance metrics, such as the mean misclassification costs and the mean measurement costs.
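A multi-objective evaluation of this kind can be sketched in Python as follows. The sketch assumes that sensitivity refers to MCI-converters (label 1) and specificity to MCI-stable subjects (label 2), that every subject ends with a definitive prediction, and that both classes are present in the evaluated sample. The function name, record layout and cost values are hypothetical.

```python
def evaluate(records, c12, c21, positive=1):
    """Summarise a decision strategy over subjects.

    records: list of (truth, prediction, measurement_cost) with labels in {1, 2}
    c12: cost of predicting 1 when the truth is 2 (assumed naming)
    c21: cost of predicting 2 when the truth is 1 (assumed naming)
    positive: label treated as the positive class (assumption: converters, 1)
    """
    n = len(records)
    tp = sum(1 for t, p, _ in records if t == positive and p == positive)
    fn = sum(1 for t, p, _ in records if t == positive and p != positive)
    tn = sum(1 for t, p, _ in records if t != positive and p != positive)
    fp = sum(1 for t, p, _ in records if t != positive and p == positive)

    def misclassification_cost(truth, prediction):
        if truth == prediction:
            return 0.0
        return c12 if prediction == 1 else c21

    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mean_misclassification_cost":
            sum(misclassification_cost(t, p) for t, p, _ in records) / n,
        "mean_measurement_cost": sum(m for _, _, m in records) / n,
    }

# Made-up outcomes for four subjects.
summary = evaluate([(1, 1, 2.0), (1, 2, 1.0), (2, 2, 3.0), (2, 2, 0.0)],
                   c12=1.0, c21=2.0)
```

Reporting the mean misclassification and measurement costs next to accuracy, sensitivity and specificity makes the trade-off between the decision strategies visible in a single summary.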

[Figure caption, fragment] [...] that is needed to make a final decision without assessing $y_b$ ($\pi(y_a) < \pi_1(\Delta_{b.a})$ or $\pi(y_a) > \pi_2(\Delta_{b.a})$; see Eq. S11 and Eq. S12 in the Supplementary Material for more information). For the prospective neutral zone classifier, the prediction outcome for a given value of $\Delta_{b.a}$ is displayed by the coloured areas [...].

Tab. 1: Performances of non-sequential and sequential decision strategies based on the default cost structure.
1-4: Accuracy, specificity, sensitivity and pre-conversion sensitivity, with decision boundaries depending on the estimated unconditional class probabilities and the specified misclassification costs.
5: Depend on misclassification costs, marker-specific costs of acquisition and costs of waiting.
6: Depend on marker-specific costs of acquisition and costs of waiting.
7: Chosen sequence of measurements depends on the specified misclassification costs, marker-specific costs of acquisition and costs of waiting.