Does my high blood pressure improve your survival? Overall and subgroup learning curves in health

Learning curves in health are of interest for a wide range of medical disciplines, healthcare providers, and policy makers. In this paper, we distinguish between three types of learning when identifying overall learning curves: economies of scale, learning from cumulative experience, and human capital depreciation. In addition, we approach the question of how treating more patients with specific characteristics predicts provider performance. To soften collinearity problems, we explore the use of least absolute shrinkage and selection operator regression as a variable selection method and Theil-Goldberger mixed estimation to augment the available information. We use data from the Belgian Transcatheter Aorta Valve Implantation (TAVI) registry, containing information on the first 860 TAVI procedures in Belgium. We find that treating an additional TAVI patient is associated with an increase in the probability of 2-year survival by about 0.16%-points. For adverse events like renal failure and stroke, we find that an extra day between procedures is associated with an increase in the probability for these events by 0.12%-points and 0.07%-points, respectively. Furthermore, we find evidence for positive learning effects from physicians' experience with defibrillation, treating patients with hypertension, and the use of certain types of replacement valves during the TAVI procedure.


Introduction
The idea of a link between the volume of treated patients and healthcare provider performance is well-known, but uncertainty remains about the source of this relationship.
In this respect, the issue of learning by doing or cumulative learning is most frequently encountered in empirical research. Economic research additionally considers economies of scale, human capital depreciation, reverse causality, level of specialization and social learning or "learning by watching" as factors explaining volume-outcome relationships (Ho, 2002;Gaynor et al., 2005;Huesch, 2009;Hockenberry and Helmchen, 2014;Lee et al., 2015;Mesman et al., 2015). Data collinearities however hamper inference so that theoretical arguments lead most studies to include only a subset of these effects. In this paper, we analyze multiple factors simultaneously and we emphasize the potential role of patient subgroups in the learning process. Common policies are volume thresholds for hospitals, report cards and team/provider training (Huesch and Sakakibara, 2009). The approach followed in this paper provides more information on where improvements may be made by healthcare providers and this might improve the quality of team and provider training.
In this paper, we identify three types of learning: economies of scale, cumulative experience and human capital depreciation. Economies of scale refer to total volume effects captured by annual hospital patient volumes. The rationale here is that hospitals with more patients are likely to be better equipped and to have more standardized procedures, potentially mapping into better provider performance. Economies of scale have been found to be important for both CABG (coronary artery bypass graft) and PTCA (percutaneous transluminal coronary angioplasty) in the literature (Ho, 2002; Gaynor et al., 2005). Cumulative experience 1 refers to the number of patients that have been treated in the past, capturing classical learning by doing effects. Finally, human capital depreciation evaluates the role of time since the last procedure. For CABG and PTCA, cumulative learning has been found to only play a minor role (Ho, 2002; Gaynor et al., 2005). In contrast, human capital depreciation seems to be important for CABG (Hockenberry and Helmchen, 2014).
1 In what follows, we use the terms "cumulative experience" and "cumulative learning" interchangeably.
In line with the existing literature, we disentangle these overall learning effects in a first step. Subsequently, we follow a data-driven approach to detect learning mechanisms in patient subgroups. Given that performance is predicted by economies of scale, cumulative experience and/or human capital depreciation, these effects may well be driven by particular patient subgroups. More specifically, we assess how treating more patients with certain subgroup characteristics predicts overall health outcomes. Quantifying such information goes beyond the typical volume-outcome relationship and potentially imparts new insights on how to improve health outcomes for new, under-performing or lower volume providers and possibly allows to better identify the sources of learning. For typical outcomes like in-hospital mortality, physicians may have a general feeling on how to improve performance. This is however much less the case for other outcomes like long-term mortality or procedural characteristics.
Existing studies mostly explore patient subgroups from a different point of view, namely how overall volumes influence specific patient subgroups. In contrast, in this paper we assess how overall learning effects can be attributed to certain subgroups, thereby providing information for policy makers on how to concentrate procedures in hospitals. An example related to subpopulation effects in trauma care is Matsushima et al. (2014), where larger volumes of geriatric patients were associated with lower mortality and complications among geriatric patients. Larger non-geriatric volumes on the other hand were associated with higher odds of major complications. In Pasquale et al. (2001), higher-(overall) volume centers were more successful in treating patients in seven out of nine injury types (Caputo et al., 2014).
Our data covers all patients in Belgium that underwent a transcatheter aorta valve implantation (TAVI) between 2007 and 2012 2. For TAVI, experience has been shown to have an impact on mortality (30-day and 1-year), the duration of procedures, contrast
2 The very first patient underwent TAVI in Belgium in 2007.
In this study, we go beyond simple univariate/descriptive analysis as we isolate the learning effects at hand by controlling for a broad range of patient- and procedure-specific characteristics, as well as hospital fixed-effects 3.
Overall we find that different learning processes apply for different health-related outcomes: While cumulative experience significantly predicts 24-month survival, human capital depreciation has a significant effect on several Major Adverse Cardiac and Cerebrovascular Events (MACE). These events occur as a result of procedural complications.
In particular, our results suggest an increase in 24-month survival of 0.16%-points for every extra TAVI patient treated. Furthermore, the likelihood of having a renal failure or a cerebrovascular stroke is increased by 0.12%-points and 0.07%-points respectively for every additional day since the last TAVI procedure.
While multicollinearity is an issue when estimating overall learning curves, it is even more problematic in the estimation of subgroup effects: treating more patients overall by definition also means treating more patients within specific subgroups. Therefore, an important contribution of this paper to the learning curves literature is that we propose two methodological approaches to disentangle these (subgroup) learning effects.
Firstly, we single out relevant predictors for two-year mortality using Lasso regression (Tibshirani, 1996). Secondly, in a Bayesian spirit, we apply Theil-Goldberger mixed estimation to add objective information to the model to soften multicollinearity problems (Theil and Goldberger, 1961). Theil-Goldberger estimation allows the inclusion of prior information on a sum of coefficients which improves the information to identify subgroup learning effects. Subgroup learning effects for 24-month survival are found for patients with aortic aneurysm, atrial fibrillation, carotis disease, hypertension, porcelain aorta, NYHA category three and transfemoral access. That is, treating more patients with these characteristics positively or negatively influences 24-month survival. These effects may be attributed to improved knowledge, knowledge transfer or to selection effects in these subgroups.
3 We do not claim causality regarding our estimated learning effects as the data does not allow us to completely rule out reverse causality and potential patient or provider self-selection.
The remainder of the paper is organized as follows: In section 2, we discuss the data at hand (the Belgian TAVI registry) and the variables used in the analysis. In section 3, we discuss the identification strategy and substantive questions of the paper. In section 4, we present our main results for the overall and subgroup learning effects. Section 5 continues with robustness checks before we draw final conclusions in section 6.

Data
This study uses the Belgian TAVI registry which contains detailed information on the first 860 patients undergoing TAVI in Belgium in 23 different centers between 2007 and 2012. The data has been collected at the participating centers and has been approved by the institutional ethics committees. The registry holds a wide range of control variables on patient- and procedure-specific characteristics, as well as hospital identifiers. In each hospital, the TAVI procedure is only executed by one specialized team, but we do not have information on physician or hospital characteristics 4. Although the TAVI workload steadily increased over time, it remained relatively limited during our sample period. We have detailed information on the demographic background of patients, comorbidities, indicators for the severity of the cardiac problem and procedural characteristics (see section 3.1 for more details). The patient outcomes we study are 24-month survival, as well as indicators for major adverse cardiac events (MACE) including renal failure, pacemaker implantation and stroke. Renal failure is known to be related to the use of contrast volumes during the TAVI procedure. Furthermore, stroke and pacemaker implantation are both typical complications in cardiac procedures and surgeries.
Pacemaker implantation is known to be strongly related to the type of valve that is used.
4 Physician and hospital characteristics have been shown to be significant predictors of health outcomes (see e.g. Ho (2002)) and would therefore ideally be included in the analysis to further address patient and provider selection issues.

Two brands of valves are used in Belgian hospitals: CoreValve and SAPIEN and each Belgian hospital only uses one of the two brands. More detailed descriptive statistics are shown in the appendix. Figure 1 below shows a positive relationship between the number of TAVI patients treated and 2-year survival providing first evidence for learning from cumulative experience (see top left graph). The raw data is overlaid with linear and quadratic prediction plots (dashed and solid dark blue lines) and the gray areas show the 95% confidence intervals for the quadratic fits. Note also that the quadratic and linear fits are nearly identical pointing toward a linear relationship between 2-year survival and cumulative experience.
In addition, there is clearly a positive association between pacemaker implantations and cumulative experience (top right graph). This suggests that as more patients are treated, they are more likely to receive a pacemaker during the TAVI procedure. This finding is most likely driven by the use of CoreValve valves in the larger centers because this type of valve is known to be associated with pacemaker implantation. In contrast to that, the rate of adverse events for stroke and renal failure is roughly constant across all experience levels suggesting no volume-outcome relationship.
- Insert Figure 1 here -
In line with the findings of Hockenberry and Helmchen (2014), it is expected that the probability of mortality or major adverse cardiac events increases with temporal distance to the last procedure. Physicians' skills may suffer from a spell without practice which makes them more likely to make suboptimal decisions during procedures. We find evidence for such negative effects of human capital depreciation for renal failure and stroke as depicted in Figure 2 below. In both cases, we observe a slightly positive relationship between the number of days since the last procedure and the likelihood of having a stroke or suffering from renal failure (see bottom left and right graphs). As for 2-year survival and pacemaker implantations, the linear and quadratic fits diverge and thus the presence of human capital depreciation effects is unclear in this context 5.

Methodology and substantive questions
In a first step, we focus on overall learning curves in long-term patient survival and major adverse cardiac events (MACE) including renal failure, stroke and pacemaker implantations. Subsequently, we broaden our scope to learning curves for specific patient subgroups. This evokes two substantive questions: First, if more patients are treated overall, are specific subgroups healthier on average? Second, when treating more patients in specific subgroups, do providers get better in their overall care provision? Unlike the existing literature [e.g. Matsushima et al. (2014)], this paper focuses on the second question as it provides useful information to transfer knowledge between practitioners and possibly imparts new knowledge on how to improve health outcomes for under-performing or lower volume providers.

Overall Learning Curves
We estimate linear probability models (LPM) to identify overall learning effects. Following Huesch and Sakakibara (2009), we distinguish between three types of learning: economies of scale (Huckman and Pisano, 2006; Gaynor et al., 2005; Ho, 2002), learning from cumulative experience (Ho, 2002; Karamanoukian et al., 2000) and human capital depreciation (Hockenberry and Helmchen, 2014; Ramanarayanan, 2008; Huckman and Pisano, 2006). Using patient-level data, we estimate models of the following form:
5 See appendices A, B and C for more descriptive statistics of the data at hand.

Our outcome variables are binary indicators for 24-month survival and MACE indicators for pacemaker implantation, renal failure and stroke for patient i treated in hospital h in year t. $\text{Economies of Scale}_{h,t}$ measures the annual number of procedures in hospital h in year t, picking up static scale effects. The rationale here is that high-volume hospitals are more likely to be better equipped and possibly have improved processes of care and more standardized procedures (Gaynor et al., 2005; Ho, 2002). $\text{Cumulative Experience}_{i,h,t}$ is the patient number for individual i in hospital h in year t, reflecting learning from cumulative experience (Ho, 2002). This variable indicates whether the treatment of additional patients predicts provider performance. $\text{Human Capital Depreciation}_{i,h,t}$ is the number of days that have passed since the last TAVI procedure for patient i in hospital h and year t and captures the above-mentioned human capital depreciation effect. It is plausible that the longer the time between procedures, the more skills suffer from the absence of practice (Hockenberry and Helmchen, 2014).
Besides our three main volume and time indicators, we control for a vector of patient- and procedure-specific characteristics $X_{i,h,t}$ which includes information on the demographic background of a patient (age, gender), comorbidities (indicators for various heart diseases, diabetes, renal failure, angina and existing pacemaker), the severity of the cardiac problem (NYHA 6 categories, ejection fraction, aortic valve area, peak and mean gradient), as well as procedure-specific characteristics (type of valve and size of valve implanted). We control for these observable characteristics as they have been identified in the literature to be key determinants of mortality (Holt et al., 2007). Conditioning on all these factors therefore allows us to isolate the different types of learning effects outlined above. In addition, we include a vector of hospital fixed-effects $H_h$ to account for time-invariant unobserved factors such as quality of care and hospital management quality that potentially differ across hospitals and affect the health-related outcomes of interest. Finally, $\varepsilon_{i,h,t}$ is a classical error term capturing all unobserved factors, such as genetic endowment and health behaviors of patients, that also explain our outcomes of interest besides the included explanatory variables.
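A specification consistent with the variable definitions above can be written as follows. This is only a sketch: the coefficient symbols $\beta_0$ to $\beta_3$ and $\delta$ are assumptions introduced here for illustration and are not taken from the source.

```latex
\begin{equation*}
y_{i,h,t} = \beta_0
  + \beta_1\,\text{Economies of Scale}_{h,t}
  + \beta_2\,\text{Cumulative Experience}_{i,h,t}
  + \beta_3\,\text{Human Capital Depreciation}_{i,h,t}
  + X_{i,h,t}'\,\delta + H_h + \varepsilon_{i,h,t}
\end{equation*}
```

Here $y_{i,h,t}$ stands for any of the binary outcomes (24-month survival, pacemaker implantation, renal failure or stroke).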
Time fixed-effects can also be added to the empirical specification to capture "learning-from-watching" and technological improvements. As in Ho (2002), however, adding year fixed-effects likely results in highly collinear effects. Leaving the time fixed-effects out of our models then requires interpreting the other learning effects as upper bounds on the true effects because they may pick up part of the positive effect of technological improvements over time.
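The linear probability model described in this subsection can be sketched in a few lines of NumPy. The function name `lpm_hospital_fe`, the HC1 variant of the robust covariance and all simulated numbers are assumptions made for illustration; nothing here reproduces the registry data or the authors' code.

```python
import numpy as np


def lpm_hospital_fe(y, X, hospital):
    """Linear probability model with hospital dummies and HC1 robust errors.

    y        : binary outcome (e.g. 24-month survival), length n
    X        : n x k array of regressors (learning measures and controls)
    hospital : length-n array of hospital identifiers
    Returns (coefficients, robust standard errors) for the k columns of X.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    hospital = np.asarray(hospital)
    # One dummy per hospital; no separate intercept, so the dummies absorb it.
    dummies = (hospital[:, None] == np.unique(hospital)[None, :]).astype(float)
    Z = np.hstack([X, dummies])
    n, p = Z.shape
    bread = np.linalg.inv(Z.T @ Z)
    coef = bread @ Z.T @ y
    resid = y - Z @ coef
    # Heteroskedasticity-robust (HC1) sandwich covariance.
    meat = Z.T @ (Z * resid[:, None] ** 2)
    cov = bread @ meat @ bread * n / (n - p)
    k = X.shape[1]
    return coef[:k], np.sqrt(np.diag(cov))[:k]


# Synthetic illustration only; none of these numbers come from the registry.
rng = np.random.default_rng(0)
n = 600
hospital = rng.integers(0, 5, size=n)
scale = rng.integers(5, 40, size=n).astype(float)      # annual volume
cum_exp = rng.integers(1, 100, size=n).astype(float)   # patient number
days = rng.integers(0, 30, size=n).astype(float)       # days since last procedure
p_surv = 0.6 + 0.0016 * cum_exp - 0.001 * days
survival = (rng.random(n) < p_surv).astype(float)

coef, se = lpm_hospital_fe(survival, np.column_stack([scale, cum_exp, days]), hospital)
print(np.round(coef, 4), np.round(se, 4))
```

Robust standard errors are used because the error variance of a linear probability model depends on the regressors by construction.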

Multicollinearity and Subgroup Learning Curves
In the subgroup analysis, all variables from the overall analysis are retained and experience variables are introduced for all background characteristics. That is, if for example the 30th patient for a provider (hospital) is the 15th patient with hypertension for the same provider, the patient gets patient number 30 (cumulative experience) and experience for hypertension 15. Statistically these variables are likely to be strongly correlated 7. By treating more patients overall, also more patients with renal failure, porcelain aorta, etc. will be treated.
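The construction of the experience variables described above can be sketched as running counters per hospital. The function name `add_experience` and the field names (`hospital`, `hypertension`) are hypothetical, and the sketch assumes the records are already sorted by procedure date within each hospital.

```python
from collections import defaultdict


def add_experience(patients, characteristics):
    """Attach cumulative and subgroup experience counters to each record.

    patients        : list of dicts, sorted by procedure date within hospital
    characteristics : names of 0/1 fields for which experience is tracked
    The current patient is included in the counts, so the 30th patient who is
    also the 15th patient with hypertension gets the values 30 and 15.
    """
    overall = defaultdict(int)
    subgroup = defaultdict(int)
    rows = []
    for patient in patients:
        hospital = patient["hospital"]
        overall[hospital] += 1
        row = dict(patient)
        row["cumulative_experience"] = overall[hospital]
        for name in characteristics:
            if patient.get(name):
                subgroup[(hospital, name)] += 1
            # The counter stays flat until the next patient with this characteristic.
            row["experience_" + name] = subgroup[(hospital, name)]
        rows.append(row)
    return rows


# Hypothetical toy records, not registry data.
patients = [
    {"hospital": "A", "hypertension": 1},
    {"hospital": "A", "hypertension": 0},
    {"hospital": "B", "hypertension": 1},
    {"hospital": "A", "hypertension": 1},
]
rows = add_experience(patients, ["hypertension"])
for row in rows:
    print(row["hospital"], row["cumulative_experience"], row["experience_hypertension"])
```

Because every subgroup counter can only rise when the overall counter rises, the two series move together almost mechanically, which is the source of the collinearity discussed in the text.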
To deal with multicollinearity, two general solutions are often proposed: Firstly, the selection of a subset of variables remedies the consequences of multicollinearity by removing the collinearities. Therefore, we explore the use of the Least absolute shrinkage and selection operator (Lasso) to obtain an optimal subset of experience variables predicting our health-related outcomes of interest.
Secondly, increasing information provides more evidence to disentangle even collinear effects. This extra information may come from an increase of the sample size or from a restriction on regression coefficients. In this paper, we apply the Theil-Goldberger mixed estimation method to introduce uncertain information on a sum of coefficients.
Although variable selection methods and the use of extra information have very different motivations, they are both applications of constraints in a regression analysis. This is discussed in more detail throughout the next sections.
7 Correlations larger than 0.9 are no exception in our sample.

The Least absolute shrinkage and selection operator (Lasso)
To select an optimal subset of regressors, multiple statistical approaches can be considered: subset selection techniques such as forward- and backward-stepwise selection and forward-stagewise regression, or shrinkage methods including Ridge regression and the Lasso. These methods improve prediction accuracy and interpretation but come at the cost of biased estimates (Hastie et al., 2009). Shrinkage methods are based on the idea of shrinking single coefficients or sets of coefficients towards zero, which trades increased bias for lower variance. Among these methods, Lasso regression, introduced by Tibshirani (1996), is favored in this paper because it is more subtle compared to forward- and backward-selection, while at the same time it provides sparser results compared to Ridge regression and Elastic Net regression. Technically, the Lasso minimizes the residual sum of squares subject to the constraint that the sum of all absolute values of coefficients is below some constant. Following Hastie et al. (2009) we have:

$$\hat{\beta}^{\text{lasso}} = \underset{\beta}{\arg\min} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t \tag{2}$$

where the first part of equation (2) simply finds the β's for which the sum of squared residuals is lowest. The second part states that the minimization is subject to the condition that the sum of the absolute values of the β's does not exceed a constant t. Whereas the approach is similar to Ridge regression, where a similar constraint is placed on the sum of all squared coefficients, the geometric properties of the Lasso set more coefficients exactly equal to zero (Tibshirani, 1996). Additionally, because it is not a discrete process in which variables are added one by one, the Lasso is less greedy than forward- or backward-variable selection (Efron et al., 2004). Lasso estimates are obtained from the Least Angle Regression Selection (LARS) algorithm in which the optimal set of coefficients is the one where Mallows' Cp reaches a minimum.
The Lasso singles out the most important variables that predict health outcomes. Next to the standard Lasso, we also employ some modifications and extensions of the Lasso as a robustness exercise. Firstly, we use the Lasso to select a subset of optimal predictors and use them in standard OLS regressions. This approach is suggested in Efron et al. (2004), Meinshausen (2007) and Hastie et al. (2009) to reduce the bias and to allow for a simpler interpretation of the coefficients. Secondly, we also restrict the Lasso by adding the "main effects" first. Thirdly, we run logistic Lasso regressions as our outcomes of interest are binary. Similarly to the Least Squares Lasso, an L1 penalty 8 on the absolute values of coefficients can be introduced to logistic regression (Genkin et al., 2007). Fourthly, we also apply Elastic Net regression which is a hybrid between Lasso and Ridge regression and uses weighted L1 and L2 penalties. The benefit of Elastic Net is that it copes better with highly correlated regressors, i.e., among groups of highly correlated variables it singles out multiple variables whereas the Lasso only includes one variable.
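The first modification above, Lasso selection followed by OLS on the selected variables, can be illustrated with a small NumPy sketch. Two simplifications are assumptions of this sketch and differ from the procedure in the text: the Lasso is solved in its equivalent penalized (Lagrangian) form by coordinate descent rather than by LARS, and the penalty `lam` is fixed by hand rather than chosen by Mallows' Cp. The function names and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    """Shrink z towards zero by lam and set it to zero if |z| <= lam."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """Minimise (1/2n)||y - b0 - Xb||^2 + lam * sum|b_j| by coordinate descent."""
    n, k = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, resid = X - x_mean, y - y_mean      # centring removes the intercept
    col_ss = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(k)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(k):
            if col_ss[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = Xc[:, j] @ resid / n + col_ss[j] * b[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            if new != b[j]:
                resid -= Xc[:, j] * (new - b[j])
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    return y_mean - x_mean @ b, b


def post_lasso_ols(X, y, lam):
    """Select variables with the Lasso, then refit them by unpenalised OLS."""
    _, b = lasso_cd(X, y, lam)
    selected = np.flatnonzero(b != 0.0)
    Z = np.column_stack([np.ones(len(y)), X[:, selected]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return selected, coef  # coef[0] is the intercept


# Synthetic illustration: only columns 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=200)
selected, coef = post_lasso_ols(X, y, lam=0.3)
print(selected, np.round(coef, 2))
```

The refit step removes the shrinkage from the retained coefficients, which is the bias reduction the text refers to; the selection itself is still driven by the penalized fit.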

Including prior information with Theil-Goldberger mixed estimation
Multicollinearity can be interpreted as the occurrence of "undominated uncertain prior information" (Leamer, 1973). This definition points out that including extra prior information might soften the multicollinearity problem. In this study, prior information from within the data can be used to estimate subgroup effects. Intuitively it is clear that the overall cumulative learning effect is the sum of all underlying subgroup effects.
Interesting in this regard is the Theil and Goldberger (1961) mixed estimation method which uses GLS on an augmented dataset. In this augmented dataset, the data is supplemented by a dummy observation with information on the mean and variance of a (sum of) coefficient(s).
The Theil-Goldberger coefficients and variances, where prior and data information are efficiently weighted in a GLS framework, are given by (Theil and Goldberger, 1961):

$$\hat{\beta}_{TG} = \left( X'\Omega^{-1}X + R'\Psi^{-1}R \right)^{-1} \left( X'\Omega^{-1}y + R'\Psi^{-1}r \right) \tag{3}$$

$$\operatorname{Var}(\hat{\beta}_{TG}) = \left( X'\Omega^{-1}X + R'\Psi^{-1}R \right)^{-1} \tag{4}$$

where y is the n × 1 vector of outcomes and X is an n × k matrix of observations on independent variables; Ω is the n × n variance-covariance matrix of error terms and Ψ is the variance-covariance matrix of the prior information. For prior information on a sum of coefficients, the 1 × k vector R and the scalar r have to be specified. For example, imposing a constraint on the sum of $\beta_1$ and $\beta_2$ could be achieved by specifying

$$R = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \end{pmatrix} \tag{5}$$

together with the scalar r set to the prior value of that sum (equation (6)). As such, equations (3) and (4) are the result of applying GLS to the following two equations:

$$y = X\beta + \varepsilon \tag{7}$$

$$r = R\beta + v \tag{8}$$

Equation (7) holds the relationship for the "real" data. Next, the real data is augmented with an extra observation in the form of the constraint in equation (8), whose error term v has variance Ψ. The main difference with applying exact constraints is that there is some uncertainty about the prior information (hence the Ψ matrix).

To see how the internal information can be used here as prior information, first consider a consistent (and linear) estimate of the learning curve, equation (9), which relates $s_{ij}$ to $\text{Cumulative Experience}_{ij}$. Here $s_{ij}$ stands for two-year survival for individual i in hospital j and Cumulative Experience is the patient number of an individual (e.g. patient number 1 in hospital 20). In equation (9), the coefficient on Cumulative Experience, $\beta_1$, is consistently estimated using standard regression techniques and measures the increase in the health outcome every time an extra patient is treated in a hospital.

Now consider a second model, equation (10), which additionally includes the variable $\text{Exper characteristic 1}_{ij}$. This variable takes the value zero for the first patients until there has been one person with a certain characteristic (say characteristic 1). From then onwards, it takes on the value one and is not increased until another patient with the same characteristic is treated.

In equation (10), the same increase in health outcome for every extra patient is given by $\gamma_1$ and $\gamma_3$, the coefficients on Cumulative Experience and Exper characteristic 1. That is, every time an extra patient is treated, the outcome increases by $\gamma_1$ and also by approximately $\gamma_3 \times \text{avg(characteristic 1)}$. The increase by $\gamma_1$ is obvious, while the second part is an increase of $\gamma_3$ for every patient with characteristic one, and on average only a share avg(characteristic 1) of the patient population has the characteristic. As such, on average, every time an extra patient is treated, the outcome increases by an additional $\gamma_3 \times \text{avg(characteristic 1)}$. $\beta_1$ in equation (9) can therefore be seen as the sum of $\gamma_1$ and $\gamma_3 \times \text{avg(characteristic 1)}$. Translated to the matrices that define the constraint on the sum of coefficients, this prior knowledge means that R contains a one in the position of $\gamma_1$, avg(characteristic 1) in the position of $\gamma_3$ and zeros elsewhere, while r is set to the estimate of $\beta_1$ from equation (9).
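Under the definitions above, the two models and the implied stochastic constraint can be sketched as follows. The intercepts, the error symbols, the $\gamma_2$ term on the characteristic indicator, the ordering of the coefficients and the choice of $\Psi$ as the estimated variance of $\hat{\beta}_1$ are assumptions made here for illustration; only the roles of $\beta_1$, $\gamma_1$ and $\gamma_3$ follow from the text.

```latex
\begin{align*}
s_{ij} &= \beta_0 + \beta_1\,\text{Cumulative Experience}_{ij} + \varepsilon_{ij} \\
s_{ij} &= \gamma_0 + \gamma_1\,\text{Cumulative Experience}_{ij}
          + \gamma_2\,\text{characteristic 1}_{ij}
          + \gamma_3\,\text{Exper characteristic 1}_{ij} + u_{ij} \\
\beta_1 &\approx \gamma_1 + \gamma_3 \times \text{avg(characteristic 1)} \\
R &= \begin{pmatrix} 0 & 1 & 0 & \text{avg(characteristic 1)} \end{pmatrix},
\qquad r = \hat{\beta}_1,
\qquad \Psi = \widehat{\operatorname{Var}}(\hat{\beta}_1)
\end{align*}
```

With several characteristics, R would carry one avg(·) entry per subgroup experience coefficient, so that the weighted subgroup effects and $\gamma_1$ together add up to the overall effect.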

Endogeneity
In our context, endogeneity issues regarding the estimated learning effects may typically arise because of (i) selective referral or (ii) reverse causality. In principle, more experienced providers might be able to select desirable patients with a higher likelihood of survival and refer others to their colleagues thus creating a classical omitted variable bias problem (patient selection). In addition, if overall health outcomes for different health care providers are publicly known, then this might cause certain patients to select into specific hospitals (provider selection). We cannot perfectly address these selection issues in our analysis due to the lack of detailed physician and hospital characteristics ruling out a causal interpretation of the estimated learning effects. However, by splitting up the learning effect in subgroups, as illustrated in section 3.2, we capture part of the patient selection (and selective referral effect) in the subgroup analyses. Subgroup experience effects represent either a true learning effect or a selection effect regarding a specific subgroup, e.g., over time patients with a less severe degree of a characteristic could be more likely to be treated than others. In interpreting the results, it is important to stress that both effects are interesting in their own right, but that they cannot be empirically disentangled. Furthermore, because subgroup experiences are included, the overall cumulative experience effect is measured more accurately because the selection in these subgroups is no longer captured by the overall effect.
Regarding reverse causality, the existing literature provides mixed evidence on the direction of causation: Gaynor et al. (2005) and Ho (2002) find that causality mainly runs from volume to outcome whereas Ramanarayanan (2008) finds that sicker patients may select higher volume providers. In the Belgian setting, there is very little public information on hospital quality, and even less on procedure-related hospital quality. Moreover, mortality rates across hospitals are only available to practitioners and are anonymized.

Overall Learning Curves
We estimate the overall learning effects using linear probability models (LPM) for 24-month survival, as well as several Major Adverse Cardiac Events (MACE) including pacemaker implantation, renal failure and stroke. Model one shows the plain overall learning effects for the three learning measures economies of scale, learning from cumulative experience and human capital depreciation; model two adds patient- and procedure-specific characteristics as described above in section 3.1; finally, in model three we also include hospital fixed-effects. In addition, we include a binary indicator for "Zero Days Since Last Procedure" in all model specifications because there are regularly two TAVI procedures scheduled on one day. We also replicated our findings using probit/logit specifications to relax the implicit linearity assumption in the marginal effects 9. The estimated overall learning effects on survival can be found in table I.
- Insert Table I here -
In addition to the effects on patient survival, we also analyze adverse cardiac events.
The results are summarized in table I above. While cumulative learning proved significant for survival, human capital depreciation is significant for several MACE. The likelihood of suffering a renal failure or a stroke during the procedure is significantly higher when more days have passed since the last TAVI procedure. The estimates suggest that the likelihood of renal failure after TAVI is 0.12%-points higher for every additional day since the last procedure. For stroke, an additional day since the last procedure is associated with an increase in the probability of suffering a stroke of about 0.07%-points.
Again these skill depreciation effects can be considered sizeable as the average number of days between procedures is more than 10 days across all hospitals and time periods.
Regarding stroke, we also find that patients treated on the same day ("Zero Days Since Last Procedure") have a higher probability of getting a stroke which may point out that the team loses concentration during the course of a given day. As can be seen in the robustness section, the results on MACE should be interpreted with care as the results are sometimes driven by only a few extreme observations.
Summing up, we find that different types of learning apply for different outcomes: Learning from cumulative experience is relevant for 24-month survival and more frequent practice plays a key role for adverse events. Skills required for preventing these events may depreciate over time as illustrated for renal failure and stroke.

Subgroup Learning Curves
Knowing that different types of learning apply for different outcomes, it is interesting to investigate to what extent patient subgroups account for these overall learning effects.
, where $R_k^2$ is the R-squared from a regression of $x_k$ on all other explanatory variables
11 A porcelain aorta is a heavily calcified ascending thoracic aorta which may obviate usual aortic valve replacement through that approach.
higher probability of being alive after two years. Whereas the CoreValve indicator is the most robust experience variable across all specifications, valve types are constant within hospitals and might therefore pick up part of the learning differences between hospitals.
One possible explanation for this effect is the practice of clinical proctoring, which is more extensive for CoreValve. In practice this means that CoreValve users receive longer guidance from an experienced physician. Besides the subgroup experience variables, the Lasso also identifies the annual TAVI patient volumes (economies of scale) as a key negative predictor of 2-year survival. In column (2), the Elastic Net singles out more subgroup experience variables. Nevertheless, the original variables from Lasso are retained and they practically have the highest effects on 2-year survival 12. Finally, the results in column (3) show the selection of optimal subsets of predictors for a logistic Lasso specification. The results again provide evidence for a positive subgroup learning effect of using CoreValve replacement valves. However, the logistic Lasso includes substantially fewer variables overall but also adds several that the non-logistic Lasso does not select. Whereas without shrinking, results between logistic models and LPMs are mostly similar, this does not seem to hold when an L1 penalty is applied. The divergence may be due to different penalties in the ordinary and logistic Lasso combined with the Lasso adjustment to the LARS algorithm (when a variable is changing signs, the coefficient is temporarily set to 0).
-Insert Table III here -

Theil-Goldberger Mixed Estimation
Whereas computation of the Lasso is relatively straightforward, the Theil-Goldberger (TG) method is not standardly available in statistical software 13. Computation requires the calculation of the formulas in equations (4) and (5) (see section 3.2.2 above). Results of this computation with a stochastic constraint are provided in specifications (3)-(4) in table IV below, next to OLS, where specification (1) shows LPM estimates using heteroscedasticity-robust standard errors and specification (2) shows generalized least squares (GLS) estimates. For the Theil-Goldberger estimation we implement robust standard errors (specification (3)) and a GLS form (specification (4)) for the non-augmented part. The latter refers to the use of weighted least squares on the non-augmented data, i.e., the matrix $\Omega^{-1}$ is estimated to obtain a feasible GLS estimate.
12 Results from including quadratic terms to capture non-linearities are qualitatively similar to results in table III.
13 In Stata, the tgmixed command implements a limited version of the Theil-Goldberger mixed estimation method. There is no option to include robust standard errors for the "real" data (which is a priori essential for a Linear Probability Model) and there is no possibility to insert prior information on the sum of coefficients. Mata program code is available upon request; the tgmixed command ado file was used as a guide.
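As an illustration of the mechanics of mixed estimation, the sketch below computes the textbook Theil-Goldberger estimator $(X'X/\sigma^2 + R'R/\tau^2)^{-1}(X'y/\sigma^2 + R'r/\tau^2)$ for a stochastic restriction $r = R\beta + v$. It is not the Mata code mentioned above: it assumes homoscedastic sample errors with known variance `sigma2` and independent prior errors with common variance `tau2`, and it omits the robust and GLS variants. All names and the toy data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for c in reversed(range(n)):
        tail = sum(M[c][k] * x[k] for k in range(c + 1, n))
        x[c] = (M[c][n] - tail) / M[c][c]
    return x


def tg_mixed(X, y, R, r, sigma2, tau2):
    """Mixed estimate combining sample rows (X, y) with prior rows (R, r).

    Each block is weighted by the inverse of its (assumed known) error variance.
    """
    p = len(X[0])
    A = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    # Accumulate X'X/sigma2 + R'R/tau2 and X'y/sigma2 + R'r/tau2.
    for rows, targets, var in ((X, y, sigma2), (R, r, tau2)):
        for row, t in zip(rows, targets):
            for j in range(p):
                b[j] += row[j] * t / var
                for k in range(p):
                    A[j][k] += row[j] * row[k] / var
    return solve(A, b)


# Synthetic toy data: OLS alone would give beta = [1, 3].
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 3.0]
# Stochastic prior on the sum of coefficients: beta_1 + beta_2 = 2 + noise.
R = [[1.0, 1.0]]
r = [2.0]
beta_mixed = tg_mixed(X, y, R, r, sigma2=1.0, tau2=1.0)
beta_loose = tg_mixed(X, y, R, r, sigma2=1.0, tau2=1e9)
```

With equal variances the estimate moves part of the way from the OLS solution toward the restriction on the sum of coefficients; as `tau2` grows the prior carries less weight and the estimate approaches OLS.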
Overall, the results show that the change from OLS to GLS contributes more in efficiency terms than TG estimation: implementing the stochastic constraint on the coefficients has only a limited impact on standard errors and significance.
Intuitively, the additional information that is added in the TG estimation seems rather limited. Nevertheless, although there are no qualitative differences between the OLS and TG specifications without GLS, we clearly observe differences in significance between the second and fourth specifications (see grey shaded bars in table IV below): on top of experience with aortic aneurysm ("Experience aortic aneurism"), carotis disease ("Experience carotis disease"), hypertension ("Experience hypertension"), porcelain aorta ("Experience porcelain aorta") and transfemoral access ("Experience transfemoral access"), we find additional significant effects for atrial fibrillation ("Experience atrial fibrillation") and New York Heart Association category 3 ("Experience NYHAcat3") when using TG mixed estimation.
These findings should be closely scrutinized to find out how experience translates into better outcomes. In particular, we find evidence for positive learning effects on 2-year survival from treating more patients overall (learning from cumulative experience), as well as from treating more patients with hypertension. On the other hand, treating additional patients with an aortic aneurysm, atrial fibrillation, carotis disease or porcelain aorta, or using the transfemoral access route, is associated with lower patient survival, thus indicating negative subgroup learning effects. This evidence for negative subgroup learning strongly suggests the presence of selection effects: for patients with these characteristics, the severity of the characteristic changes over time such that they are more likely to die within two years.
-Insert Table IV here -
Summing up, the comparison of the TG mixed estimation with the Lasso results in Table III provides mixed evidence. While both methods single out subgroup effects as important factors for survival, there is no agreement on which subgroups are more relevant.
This finding may arise because, in the Lasso, some of the variables pick up effects from others that are explicitly controlled for in the Theil-Goldberger method, or because the summation constraint on the subgroups is inappropriate. As a consequence, we suggest comparing the Lasso and Theil-Goldberger mixed estimation results and interpreting them with care. These results should then be further discussed and investigated by policy makers and practitioners to improve survival.

Robustness checks
While we have already assessed the robustness of our findings to different model specifications and estimation techniques, in this subsection we check their robustness to changes in the sample. In table V below, we remove the four largest hospitals one by one from the regressions with 2-year survival as the dependent variable. The results from these removals are almost identical to the original regressions shown above in table I. In particular, we find a positive and significant effect of cumulative experience on 2-year survival 14. As above, there is no evidence for an effect of economies of scale or human capital depreciation on 24-month survival, as essentially none of the coefficients is statistically significantly different from zero.
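The leave-one-hospital-out logic of this check can be sketched as follows. This is a deliberately simplified illustration that assumes a single regressor (cumulative experience) and a simple OLS slope, whereas the regressions above include patient and procedure characteristics and hospital indicators. The function names and the synthetic records are hypothetical.

```python
from collections import Counter


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def leave_one_hospital_out(records, n_largest):
    """Re-estimate the slope after dropping each of the largest hospitals in turn.

    records: list of (hospital_id, experience, outcome) tuples.
    Returns {dropped_hospital_id: slope estimated on the remaining records}.
    """
    sizes = Counter(h for h, _, _ in records)
    largest = [h for h, _ in sizes.most_common(n_largest)]
    slopes = {}
    for dropped in largest:
        kept = [(e, o) for h, e, o in records if h != dropped]
        slopes[dropped] = ols_slope([e for e, _ in kept], [o for _, o in kept])
    return slopes


# Synthetic records in which the outcome rises by 0.01 per unit of experience.
records = [
    ("A", 1, 0.51), ("A", 2, 0.52), ("A", 3, 0.53), ("A", 4, 0.54),
    ("B", 5, 0.55), ("B", 6, 0.56), ("B", 7, 0.57),
    ("C", 8, 0.58), ("C", 9, 0.59),
]
slopes = leave_one_hospital_out(records, n_largest=2)
```

If the estimated slope is stable across the dropped hospitals, as it is by construction in this toy example, the finding is not driven by any single large centre.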
For the adverse cardiac events displayed in table VI, we find that the significant effect of human capital depreciation on renal failure disappears when cases that were more than 100 days apart are removed, mainly because of a sizeable drop in the coefficient size.
However, the effect reappears if the sample is extended to procedures with fewer than 150 days in between (see specification 2). The results for renal failure are therefore mainly driven by observations for which more than 100 days have passed since the team last performed the TAVI procedure, indicating that the negative effects of human capital depreciation only manifest themselves for relatively long time periods between procedures.
Interestingly, the same pattern emerges when analyzing the effects on having a stroke: the coefficient on human capital depreciation becomes insignificant for observations with fewer than 100 days between procedures but turns significant once all observations with fewer than 150 days between procedures are included. Here, the insignificance can be attributed to a drop in degrees of freedom, since the coefficient increases in size. In line with the results above, when the sample is restricted, we find a highly significant and positive coefficient on the "Zero days since last procedure" indicator in all subsamples. This suggests that the cardiology team loses concentration during the course of a day and is therefore more likely to make medical errors when performing more than one procedure on a given day.
-Insert Table VI here -
Whereas a broad range of results is provided, several concerns remain. First, while in the Belgian case there is little evidence for a causal relationship from outcome to volume, it would have been better to explicitly address this issue in the analysis. To remove the endogeneity bias caused by selective referrals in the overall effect, the literature usually employs instrumental variable methods. However, we did not have any sensible instrument at our disposal. Because our focus lies on the subgroup analysis, selection is also informative for obtaining knowledge to improve performance. The drawback of our method is that we are unable to disentangle true learning from selection effects. Second, assuming effects to be linear imposes a heavy strain on the analysis. Including squared terms in the Lasso regressions did not qualitatively influence the results. For the Theil-Goldberger method, if the overall learning curve were in fact non-linear, the specific structure of the experience variables may pick up these non-linearities. Graphical intuition on this argument is provided in Appendix C. The combination of the Theil-Goldberger mixed estimation method and logistic models is practically infeasible, and therefore this limitation remains. Nevertheless, a linear approach provides a useful first insight into the decomposition of learning curves.

Conclusion
In the last decades, a whole strand of literature has studied learning, volume and scale effects in healthcare provision. In this paper we explore both overall and subgroup learning curves using information on the first 860 Transcatheter Aorta Valve Implantations (TAVI) in Belgium. Considering overall learning, we distinguish between economies of scale, learning from cumulative experience and human capital depreciation, and assess their role for patient survival and adverse cardiac events during the TAVI procedure. Overall, our analysis shows that different types of learning apply for different outcomes: while cumulative experience is of great importance for 24-month survival, more frequent practice plays a key role for adverse events like renal failure and stroke.
In addition, we extend the existing literature by exploring subgroup learning effects, which provide an extra instrument to potentially improve and explain provider performance. Knowing that certain groups of patients contribute to the learning process gives more detailed information for both policy makers and healthcare providers to improve clinical practice. We apply both Lasso regression as a variable selection method and Theil-Goldberger mixed estimation to augment the data. Underlying the overall effects of treating more patients are subgroup learning effects for experience with CoreValve replacement valves, hypertension, aortic aneurysm and physicians' experience with defibrillators, to name a few. To improve processes of care, these groups and techniques should be closely investigated by both practitioners and policy makers.
Notes: Columns (1) and (2) show the OLS coefficient estimates on the Lasso and Elastic Net selection of optimal predictors for 2-year survival. Column (3) displays the Logistic Lasso selection of predictors. *** p < 0.01, ** p < 0.05, * p < 0.1.
Notes: OLS and Theil-Goldberger Mixed Estimation estimates of the overall and subgroup learning curves. For the sake of brevity, the coefficient estimates on the patient- and procedure-specific characteristics and the hospital indicators are not shown in the table above. Heteroscedasticity-robust standard errors in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1.
Notes: For the sake of brevity, the coefficient estimates on the patient- and procedure-specific characteristics and the hospital indicators are not shown in the table above. Heteroscedasticity-robust standard errors in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1.
Notes: For the sake of brevity, the coefficient estimates on the patient- and procedure-specific characteristics and the hospital indicators are not shown in the table above. Heteroscedasticity-robust standard errors in parentheses. *** p < 0.01, ** p < 0.05, * p < 0.1.
The number of days since last performing TAVI ranges from a minimum of zero days to a maximum of 408 days with a mean value of roughly 26 days (median 14 days) in between procedures. The distribution of the temporal distance to the last TAVI procedure in the overall sample is shown in Figure 4 below.