Does Full Insurance Increase the Demand for Health Care?

We estimate the causal effect of having full health insurance on health care expenditures. We take advantage of a unique quasi-experimental setup in which deductibles and co-payments were zero in a managed care plan, and non-zero in regular insurance, until a policy change forced all individuals with an active plan to cover a minimum amount of their expenses. Using panel data and a non-linear difference-in-differences strategy, we find a demand elasticity of about -0.14 comparing full insurance with the cost-sharing model, and a significant upward shift in the likelihood to generate costs.


Introduction
Health insurance plans usually contain some form of patient cost-sharing such as deductibles or co-payments in order to mitigate moral hazard effects. The common measure of moral hazard effects in health insurance is the price sensitivity of health care demand, conditional on health status (Pauly, 1968; Cutler and Zeckhauser, 2000). The first estimates of this price sensitivity were obtained in the RAND Health Insurance Experiment (HIE), which was run during the 1970s. Aron-Dine et al. (2013) provide an account of the HIE and a re-analysis of the experimental data within the modern causal analysis framework. Their estimates of the demand elasticity comparing full insurance with several plans containing different degrees of cost-sharing are in the range of -0.1 to -0.2. This is in line with the well-known benchmark estimate of roughly -0.2 reported in Keeler and Rolph (1988).
A major challenge in identifying the price elasticity is the definition of the price. Demand is usually measured by expenditures for health care, not units of health care services. The price of an additional dollar spent on health care is then the share of that dollar which the patient has to pay out-of-pocket. Given the non-linear cost-sharing schedules in most health insurance plans caused by deductibles, co-payments and stop-losses, this marginal price changes as health care expenditures accumulate during the year. Keeler et al. (1977) derived a model of medical consumption decisions in the presence of deductibles using a dynamic programming approach. They demonstrated that the correct price for consumers to use when making health consumption decisions is the shadow (or effective) price. They also showed that using the "wrong" price (e.g., the marginal instead of the effective price) leads to biased estimates of price responsiveness. Ellis (1986) provides evidence that the expected end-of-year price is a good proxy for the shadow price. However, due to the complexity of the model, the effective price has not been used in applied work until recently (Ellis (1986) is a notable exception). For example, the RAND elasticity of -0.2 is calculated assuming individuals respond only to the spot price. More recent studies have assumed that individuals respond to the actual (realized) end-of-year price (Eichner, 1998; Kowalski, 2009; Marsh, 2012). Aron-Dine et al. (2012) provide evidence that the insured are not myopic and indeed respond to the expected end-of-year price.
In this paper, we sidestep the problem of defining the correct price by taking advantage of panel data, provided by a large health insurance company in Switzerland, combined with a unique quasi-experimental setup. For a group of individuals (those in the HMO plan), there was a limited period of time without any cost-sharing. We estimate the behavioral changes induced by going from some (unknown) positive effective price to a price of zero, implying a price reduction of 100%. As a control group, we use individuals within the regular health care plan of the same insurance company living in the same cities.
The analysis is carried out within the potential outcomes framework. We look at the average causal effect of having full health insurance, as opposed to cost-sharing. This effect is defined as the difference between the actual average outcomes (health care expenditures or incidence of zero expenditures) with full health insurance and the counterfactual expected outcomes in absence of full insurance. In addition, we estimate the causal effects at different quantiles of the health care expenditures distribution. The econometric analysis uses the standard difference-in-differences (DID) estimator and the more recent changes-in-changes (CIC) estimator proposed by Athey and Imbens (2006). Note that in our case, treatment (full insurance) occurred in the first observation period (2002), and the no-treatment periods took place in the following years 2003 and 2004. We interpret this as a "reverse" DID model.
Having two no-treatment periods allows us to examine our identification strategy because we should not find any effect in the comparison of the years 2003 and 2004.
Our results suggest a small but significant positive effect of full insurance on expected expenditures within the HMO group. The estimated effect translates into an elasticity of -0.14. The probability of having positive health care expenditures is increased by about 7 percentage points (from 0.76 to 0.83). This corresponds to an increase of roughly 10%.
Methodologically, the paper most related to ours is Borah et al. (2011). They analyze a setup in which one employer in the US changed its policy such that only a high deductible plan with markedly larger deductibles than before was available for the employees. There was no such change in the plan offered by the control employer. Both employers operate in similar communities in close geographic proximity. They estimate both average and quantile treatment effects of having the high deductible plan on total health care expenditures using the DID and the CIC estimators. Their estimated causal effect is roughly -10%, but the estimate is only significant when they exclude the top percentile. From the information given in the paper it is not possible to compute a price elasticity.
The remainder of the paper is organized as follows. Section 2 briefly gives some institutional background, Section 3 describes the data, Section 4 provides the econometric framework, and Section 5 discusses the empirical results. Section 6 concludes.

Institutional context
Since the reform of the Swiss health insurance law in 1996, a basic health insurance is mandatory for all people living or working in Switzerland (with few exceptions, e.g., staff of international organizations or diplomats). Coverage is for a rather comprehensive set of medical services and pharmaceuticals, offered by about 80 private, not-for-profit insurers competing in a regulated market. Free consumer choice of plan is a distinctive feature of the system.
There is no pre-selection of plans by employers or government agencies. Insurers are obliged to accept all applicants during annual open enrollment periods. The contracts have a duration of one year. Premiums are community-rated, not risk-rated. In the baseline contract (referred to as the regular health care plan), insured individuals enjoy unlimited access to all licensed physicians and most hospitals in their region of residence. In managed care type plans, access to health care is through a gatekeeper.
In the regular plan, there was a minimum annual deductible of CHF 230 in the years 2002 and 2003, which was raised to CHF 300 from 2004 onward, and a co-payment rate of 10 percent up to a stop-loss at CHF 700 (plus the deductible) per year. The insured can opt for higher deductibles, which were regulated at 400, 600, 1200 or 1500 CHF during the analysis period.
These are offered with premium reductions, which are fixed percentages of the base premium or 80 percent of the additional financial risk taken by the consumer (deductible minus 230), whichever is less. The same deductible levels are typically applied to managed-care plans, although the health insurance law allows insurers to set zero cost-sharing in those plans. We will further explore this exemption below. Since all contracts are on an individual basis, there are no family-related shared deductibles.
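The cost-sharing schedule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the default parameters are the 2002/03 minimum deductible (CHF 230), the 10 percent co-payment rate and the CHF 700 cap on co-payments from the text, while the function names `out_of_pocket` and `marginal_price` are hypothetical and the "marginal price" computed here is the spot price, not the effective (shadow) price discussed in the introduction.

```python
def out_of_pocket(expenditure, deductible=230.0, copay_rate=0.10, copay_cap=700.0):
    """Annual out-of-pocket cost (CHF) under deductible, co-payment and stop-loss."""
    below = min(expenditure, deductible)          # paid in full by the insured
    above = max(expenditure - deductible, 0.0)    # part subject to the co-payment rate
    copay = min(copay_rate * above, copay_cap)    # co-payments end at the stop-loss
    return below + copay


def marginal_price(expenditure, deductible=230.0, copay_rate=0.10, copay_cap=700.0):
    """Spot price: the insured's share of the next franc, given spending so far."""
    if expenditure < deductible:
        return 1.0                                # still inside the deductible
    if copay_rate * (expenditure - deductible) < copay_cap:
        return copay_rate                         # co-payment range
    return 0.0                                    # stop-loss reached


if __name__ == "__main__":
    for spending in (100, 1000, 5000, 10000):
        print(spending, out_of_pocket(spending), marginal_price(spending))
    # Full insurance corresponds to a zero deductible and a zero co-payment rate.
    print(out_of_pocket(5000, deductible=0.0, copay_rate=0.0))
```

The sketch makes the non-linearity explicit: the spot price falls from 1 to 0.1 to 0 as spending accumulates over the year, whereas under full insurance it is zero throughout.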
Physicians in independent practice are reimbursed fee-for-service according to an administered fee schedule that is collectively bargained between the providers' and the insurers' associations. Hospitals receive per diems for patients treated (a nation-wide DRG system was introduced in 2012). In addition to the mandatory basic insurance, there are several voluntary supplementary insurance types available. By contrast, these are risk-rated, and we do not consider them in our analysis.

Health insurance data
To make the HMO plan more attractive, the health insurance company CSS offered it without deductibles and co-payments from 1996 until 2002. In other words, the HMO plan provided full health insurance. In 2003, after complaints by the HMO physicians regarding above average health costs generated by CSS-insured patients (HMO practices also treated patients from other insurance companies), CSS introduced the same cost-sharing instruments as in their other plans. This introduction of cost-sharing provides a possibility to estimate the causal effect of full health insurance on health care demand.
CSS provided data for four large Swiss cities (Zurich, Basel, Berne, and Lucerne) where the HMO plan was on offer. The data cover the years 2002 to 2004. Prior to 2002 the data were not available in comparable quality due to changes in the electronic billing system. Hence, we confine ourselves to the last year in which the HMO plan provided full insurance and the two following years in which all insured faced the same cost-sharing options. In addition, we have data for all insured in regular contracts living in the same cities during the same time period.
These insured will serve as our control group.
The data are very reliable in the outcome variable effective health cost (which for simplicity we treat synonymously to health care expenditures) per year and the chosen insurance contract.
In addition to the total cost, we also use a discretized version indicating the extensive margin, i.e., the incidence of zero cost. As is often the case with administrative data, there are only a few control variables available. In the present case, we have age, gender and city of residence. Table 1 shows descriptive statistics of the variables. The sample consists of all insured in the regular or the HMO plan observed over the full three years. This yields a total of 85,626 observations (28,542 insured times 3 years). The fraction of HMO is about 12.5 percent. As one would expect, the HMO insured generate significantly smaller health costs with about CHF 1700 per year (fraction of 22% with no cost) as opposed to about CHF 4300 on average (16% with no cost) for the regularly insured.
-Insert Table 1 about here-
The background characteristics also indicate that the HMO group differs from the regularly insured in terms of age, gender, and regional composition. Nevertheless, we will argue below that the latter group can serve as a suitable control when looking at trends, not levels. Table 2 gives an overview of these trends in mean outcomes by group (HMO vs. regular) and by year for the observation period. Between 2002 and 2003, average health care expenditures for the HMO group slightly decreased (from CHF 1678 to CHF 1563) whereas those for the regularly insured increased by almost CHF 400. The fraction of zero cost increased by about 9 percentage points for HMO, and it stayed about constant for the regularly insured.
-Insert Table 2 about here-
The 2002/03 changes for HMO possibly reflect the switch from full insurance to cost-sharing. On the other hand, the 2003/04 comparison can help us to evaluate our identification strategy, because all insured faced the same cost-sharing instruments. We observe that the fraction of zero cost is higher for HMO as opposed to regular in these two years, as one would expect, but the fraction stays about constant for both groups from 2003 to 2004. And even though the cost increase for the regularly insured is almost twice as large as for the HMO insured, in terms of growth rates we observe similar changes (+23.8% for HMO as opposed to +22.5% for regular).

Econometrics
To formalize the comparisons (and underlying assumptions) of the previous section in a potential outcomes framework, we now introduce our econometric model. It is based on variants of the difference-in-differences (DID) estimator. The basic idea of DID is that the effect of full health insurance for HMO can be estimated by a comparison of the 2002 and 2003 outcomes for that group, net of the time trend in a suitable control group (see Imbens and Wooldridge (2009) for an overview of DID methods). We have to deal with two complications, however. First, in reaction to the changes in cost-sharing for HMO, some insured may either have changed their plan, or left CSS for another insurance company. This behavior may affect the composition of the treatment and control groups, which would invalidate the DID identification assumptions.
We address this potential selection problem in Section 5.1.
Second, there is the issue of the distribution of health cost, which has a large probability mass at zero. The traditional approach in health economics is to use a two-part model in which the discrete and the continuous part of the distribution are analyzed separately (Mullahy, 1998; Buntin and Zaslavsky, 2004). However, Angrist (2001) argues that in the second part (conditional on positive cost) the causal effect is not well-defined, essentially because conditioning on an outcome is not valid in the potential outcomes framework. Furthermore, in a difference-in-differences context, it is unclear how to apply the two-part model because both treated and control observations may switch between the two parts across time periods.
For this reason we model the extensive margin (incidence of zero expenditures) as well as the mean and quantiles of the entire outcome distribution.

The standard DID model
We consider the standard model in which there are two time periods $T \in \{0, 1\}$ and two groups $G \in \{0, 1\}$. The potential outcome without treatment is denoted as $Y^0$, the potential outcome with treatment as $Y^1$. The observed outcome for individual $i$ is given by
$$Y_i = (1 - D_i)\, Y_i^0 + D_i\, Y_i^1, \tag{1}$$
where $D_i = G_i T_i$ indicates treatment. In the standard DID model, the potential outcome without treatment is given by
$$Y_i^0 = \alpha + \beta T_i + \gamma G_i + \varepsilon_i, \tag{2}$$
where $\beta$ measures the time effect, $\gamma$ the group (or selection) effect, and $\varepsilon_i$ represents unobserved characteristics independent of $G_i$ and $T_i$. The standard DID estimand, $\tau_{DID}$, is
$$\tau_{DID} = \left[ E(Y_i \mid G_i = 1, T_i = 1) - E(Y_i \mid G_i = 1, T_i = 0) \right] - \left[ E(Y_i \mid G_i = 0, T_i = 1) - E(Y_i \mid G_i = 0, T_i = 0) \right], \tag{3}$$
which identifies the average effect of the treatment on the treated (ATT), formally
$$\tau_{DID} = E(Y_i^1 - Y_i^0 \mid G_i = 1, T_i = 1).$$
The four expectations are easily estimated by their sample analogues or by regressing the observed outcome $Y_i$ on $T_i$, $G_i$ and $D_i$. The main identifying assumption is that in absence of treatment the time trend of $Y$ is the same for the two groups defined by $G$. It is important to note that this assumption is not invariant to nonlinear transformations of $Y$. In the present context this means that the common trend assumption may either hold in levels or in logs (growth), but not in both. We come back to this point in Section 4.3.

The CIC model
Athey and Imbens (2006) propose to generalize the standard DID model. First, they write the outcome in absence of treatment as
$$Y_i^0 = h(U_i, T_i), \tag{4}$$
where $h(u, t)$ is an increasing function in $u$. The random variable $U$ represents unobservable characteristics (e.g., risk aversion, health status). The distribution of $U$ is allowed to vary across groups, but not over time, i.e., $U_i \perp T_i \mid G_i$. Hence, in absence of treatment, an individual with $U_i = u$ will have the same outcome in a given time period $t$ irrespective of her group membership. The standard DID model is nested in eq. (4) under linearity assumptions imposed on $h$ and $U$; for example, $h(u, t) = u + \beta t$ with $U_i = \alpha + \gamma G_i + \varepsilon_i$ reproduces eq. (2).
To derive the CIC estimator, denote the four observed outcomes of $Y$ as $Y_{gt}$. For example, $Y_{11}$ is the observed outcome of $Y$ for individuals with $g = 1$ and $t = 1$. Assuming these are the observations receiving the treatment, we have $Y_{11} = Y_{11}^1$, i.e., the observed outcome corresponds to the potential outcome with treatment. The other three observed outcomes, $Y_{00}$, $Y_{01}$ and $Y_{10}$, are used to estimate the counterfactual outcome $Y_{11}^0$. The cumulative distribution functions of the observed outcomes $Y_{gt}$ are denoted by $F_{gt}$.
The assumptions required for the identification of the CIC model for continuous outcomes include eq. (4) combined with the time invariance assumption $U_i \perp T_i \mid G_i$. To illustrate the idea of the CIC model, consider an individual in the treatment group ($g = 1$) prior to the treatment ($t = 0$), whose observed outcome is $y$. Let $F_{10}(y) = q$. Given the CIC assumptions, this individual would have the same outcome $y$ if she were in the control group instead. Hence, the rank of this individual in the distribution of $Y_{00}$ is given by $F_{00}(y) = q'$. Because the distribution of $U$ is independent of time, the second period outcome of an individual with $U_i = u$ in absence of treatment is the $q'$-quantile of $F_{01}$. Formally,
$$y' = F_{01}^{-1}\left(F_{00}(y)\right), \tag{5}$$
where $y'$ is the estimate of the counterfactual outcome without treatment for a treatment group observation with pre-treatment outcome $y$. The counterfactual distribution of $Y_{11}^0$, denoted by $F_{11}^0(y)$, is given by
$$F_{11}^0(y) = F_{10}\left(F_{00}^{-1}\left(F_{01}(y)\right)\right). \tag{6}$$
Now, the CIC estimand for the average treatment effect $\tau_{CIC}$ can be written as
$$\tau_{CIC} = E(Y_{11}) - E\left(F_{01}^{-1}\left(F_{00}(Y_{10})\right)\right). \tag{7}$$
The corresponding quantile treatment effects at quantile $q$ are given by
$$\tau_{CIC}(q) = F_{11}^{-1}(q) - F_{01}^{-1}\left(F_{00}\left(F_{10}^{-1}(q)\right)\right). \tag{8}$$
For additional details see Athey and Imbens (2006). Note that the described estimator is for continuous outcomes and requires that $h(u, t)$ is strictly increasing in $u$. Treatment effects for discrete outcomes are not point identified in the CIC framework. Athey and Imbens (2006) derive bounds for these cases. They show that the estimator described above corresponds to the lower bound of the treatment effect in the case of discrete outcomes.
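The rank-matching logic of the CIC estimator can be sketched as follows. This is a minimal illustration with empirical distribution functions, not the implementation used for the results in this paper: the function names and the toy data are hypothetical, the toy outcomes are continuous (no mass at zero), and the period-specific functions `h0` and `h1` are arbitrary increasing functions chosen so that the time change is nonlinear in the unobservable.

```python
from bisect import bisect_right


def cic_counterfactual(y, y00_sorted, y01_sorted):
    """Map a pre-period treated outcome y to F01^{-1}(F00(y)) with empirical CDFs."""
    n00, n01 = len(y00_sorted), len(y01_sorted)
    rank = bisect_right(y00_sorted, y)   # n00 * F00(y): control outcomes <= y in t = 0
    k = -(-rank * n01 // n00)            # ceil(n01 * F00(y)) in integer arithmetic
    k = min(max(k, 1), n01)              # clamp to the support of the control group
    return y01_sorted[k - 1]             # k-th order statistic of Y01


def mean(values):
    return sum(values) / len(values)


def cic_att(y00, y01, y10, y11):
    """Average effect on the treated: mean(Y11) - mean(F01^{-1}(F00(Y10)))."""
    s00, s01 = sorted(y00), sorted(y01)
    counterfactual = [cic_counterfactual(y, s00, s01) for y in y10]
    return mean(y11) - mean(counterfactual)


def did_att(y00, y01, y10, y11):
    """Standard DID from the four sample means, for comparison."""
    return (mean(y11) - mean(y10)) - (mean(y01) - mean(y00))


def toy_samples(effect=100.0):
    """Hypothetical data: same h(u, t) in both groups, different distributions of U."""
    control_u = [j / 1000 for j in range(1, 1001)]
    treated_u = control_u[:500]              # treated group has lower values of U
    h0 = lambda u: 1000.0 * u                # outcome function in period 0
    h1 = lambda u: 1500.0 * u * u            # period 1: nonlinear change over time
    y00 = [h0(u) for u in control_u]
    y01 = [h1(u) for u in control_u]
    y10 = [h0(u) for u in treated_u]
    y11 = [h1(u) + effect for u in treated_u]
    return y00, y01, y10, y11


if __name__ == "__main__":
    samples = toy_samples()
    print("CIC:", cic_att(*samples))
    print("DID:", did_att(*samples))
```

In this toy example the CIC estimate recovers the built-in effect, while the DID estimate does not, because the time change differs across the distribution of $U$ and the two groups occupy different parts of that distribution.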

Estimation model
In our baseline model, we focus on individuals who remain in their insurance plan throughout the observation period. For this reason, we do not need to condition on individual characteristics like age, gender and region of residence. Their effects are absorbed in the group effect.
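The DID estimates are obtained from the following general specification
$$E(Y_i \mid G_i, T_{02,i}, T_{04,i}) = m\left(\alpha + \beta_1 T_{02,i} + \beta_2 T_{04,i} + \gamma G_i + \delta_1 G_i T_{02,i} + \delta_2 G_i T_{04,i}\right), \tag{9}$$
where $G_i = 1$ denotes membership in the HMO plan, $T_{02}$ and $T_{04}$ are dummies for 2002 and 2004, and 2003 is used as the base year. Note that this setup forms a "reversed" difference-in-differences design in which the treatment (full insurance) takes place in the first period (2002).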
In periods 2 and 3, treated and control face identical cost-sharing incentives.
The average effect of the treatment on the treated (ATT) is calculated as
$$ATT = m(\alpha + \beta_1 + \gamma + \delta_1) - m(\alpha + \beta_1 + \gamma).$$
This gives the average causal effect of having full health insurance on $Y_i$ (health expenditures and incidence of zero cost) for the HMO group. The placebo effect (PE) is calculated as
$$PE = m(\alpha + \beta_2 + \gamma + \delta_2) - m(\alpha + \beta_2 + \gamma).$$
The placebo effect serves as a test of our identification strategy. Since both the treatment and the control group face identical cost-sharing options in 2003 and 2004, the placebo effect should be zero, which can be formally tested.
The specification of m(·) depends on the outcome considered. For the zero cost indicator, the linear index is fully saturated, and hence we estimate the model by OLS with m(·) the identity function. The left-hand side of eq. (9) corresponds to the probability of having zero health costs in this case (denoted by P (0) below), and the coefficients on the interaction terms, δ 1 and δ 2 , are the ATT of having full insurance and the placebo effect, respectively.
For the effect of full health insurance on total expenditures (including the zeros) the specification of m(·) is more subtle. Given that the common trend assumption cannot hold in both levels and growth rates at the same time, the specification of m(·) is critical for the validity of the assumption. In a first step, we use the traditional specification for m(·), the identity function, which yields the common linear DID model. In the second specification, and to avoid the log transformation of the dependent variable (which would require some arbitrary assumptions for the observations with zero expenditures), we choose a generalized linear model (GLM), in which m(·) = exp(·). In this case, we estimate eq. (9) by quasi maximum likelihood assuming a Poisson distribution. Gourieroux et al. (1984) have shown that this estimator is consistent if the mean function is correctly specified (see also Wooldridge (2010)). We use the placebo effect as a device to decide upon the preferred specification of m(·).
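To see why the choice of m(·) matters, the following sketch computes the ATT and the placebo effect directly from group-by-year means. It relies on the fact that in a fully saturated specification both OLS and Poisson quasi maximum likelihood reproduce the cell means, so the identity link builds the counterfactual from the control group's change in levels and the exponential link from its growth factor. The function name `reverse_did` and all numbers in `means` are hypothetical round figures chosen for illustration; they are not the values in Table 2 or Table 4.

```python
def reverse_did(treated_year, treated_base, control_year, control_base, link="identity"):
    """Effect for the treated group in a given year, relative to the base year.

    identity: common trend in levels       -> add the control group's change
    exp:      common trend in growth rates -> multiply by the control group's growth factor
    """
    if link == "identity":
        counterfactual = treated_base + (control_year - control_base)
    elif link == "exp":
        counterfactual = treated_base * (control_year / control_base)
    else:
        raise ValueError("link must be 'identity' or 'exp'")
    return treated_year - counterfactual


# Hypothetical mean expenditures in CHF by (group, year); 2003 is the base year.
means = {
    ("hmo", 2002): 1700.0, ("hmo", 2003): 1550.0, ("hmo", 2004): 1919.0,
    ("regular", 2002): 4000.0, ("regular", 2003): 4400.0, ("regular", 2004): 5390.0,
}

if __name__ == "__main__":
    for link in ("identity", "exp"):
        att = reverse_did(means[("hmo", 2002)], means[("hmo", 2003)],
                          means[("regular", 2002)], means[("regular", 2003)], link)
        placebo = reverse_did(means[("hmo", 2004)], means[("hmo", 2003)],
                              means[("regular", 2004)], means[("regular", 2003)], link)
        print(link, round(att, 1), round(placebo, 1))
```

With these illustrative numbers the two groups grow at similar rates between 2003 and 2004 but from very different levels, so the placebo effect is close to zero under the exponential link and far from zero under the identity link. This is the sense in which the placebo effect can discriminate between the two specifications.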
Estimation of the CIC model does not require the specification of m(·), and the quantities in eqs. (7) and (8) can be estimated fully nonparametrically. The CIC model can therefore be used as an additional check for the specification of m(·) in the parametric alternatives.

Selection out of HMO
Restricting the baseline sample to stayers in their plan over the full three years implies that the group effect accounts for all individual time-constant background. Common changes over time are captured by the time effect. Moreover, within the stayer sample the independence assumption U ⊥ T | G in the CIC model seems very plausible due to the short panel structure of the data, i.e., the distribution of unobserved factors like health status or preferences likely remains constant within group (HMO or regular insurance) over the three years. The restriction to stayers affected 1.96% of the total sample, or 1,686 out of the 85,626 observations. Even though this fraction is relatively low, we evaluate whether selection out of HMO may pose a problem to the external validity of our results, i.e., whether the selection is in any way related to the introduction of cost-sharing in HMO. Table 3 shows the marginal/discrete probability effects for the three outcomes. The effects in Table 3 do not significantly differ between the 2002/03 and 2003/04 samples.
-Insert Table 3 about here-
Hence, we conclude that overall there is little evidence that would suggest that the introduction of cost-sharing in HMO caused different selection mechanisms than we would observe otherwise. For our DID/CIC analysis this implies that restricting the sample to stayers within their plans is a reasonable choice for the baseline sample. Any impact that we find in the 2002/03 comparison can then be attributed to having full insurance in HMO.

Impact of full insurance on health cost
-Insert Table 4 about here-
Ultimately, this depends on whether the common trend assumption seems more plausible for one or the other specification. In judging that we use the "control"
The results in Table 4 focus on the average effects of having full health insurance on mean expenditures and on the extensive margin of the HMO group. In addition, the CIC model allows us to estimate the effects of full health insurance on the entire outcome distribution by looking at quantiles. Figure 2 shows the CIC effects for the 0.05 to 0.85 quantiles for the impact of full insurance on the HMO group (left graph), and the placebo effect (right graph). The placebo effect is zero over the entire distribution, as we would expect. There is no effect of having full insurance below the 0.2 quantile (because that part of the outcome distribution does not change). The effect is positive and almost constant in absolute terms at around CHF 100 to 150 above the 0.2 quantile. In relative terms, the CIC effects are decreasing over the outcome distribution. The CIC effects become insignificant above the 0.7 quantile and are rather noisy for the upper tail of the distribution.
-Insert Figure 2 about here-
To put the effects of having full insurance into perspective, we calculate price elasticities from the numbers obtained above. The usual formula for the coefficient of price elasticity relates the change in demand (here measured by health expenditures) to a marginal change in the price. Since having full insurance implies a zero price, the traditional formula is not applicable. Instead, we use the concept of an arc elasticity where the percentage change in demand is related to the percentage change in the price:
$$\epsilon = \frac{(Q_{full} - Q_{cost}) / Q_{cost}}{(P_{full} - P_{cost}) / P_{cost}}.$$
$Q_{full}$ denotes health care expenditures under full insurance, $Q_{cost}$ denotes expenditures under the cost-sharing regime (analogous for prices $P_{full}$ and $P_{cost}$ in the denominator). Numbers in the numerator can be extracted from our estimates in Table 4, e.g., for the DID effect in the GLM model we can calculate 232.6/1563.9 ≈ 0.148, which is relative to the 2003 average in the HMO group (see Table 2). The idea then is to compare the zero price under full insurance to the price $P_{cost}$ under cost-sharing. This yields a denominator of -1, independent of the value of $P_{cost}$, i.e., the price elasticity is -0.148. Regarding the mean effect in the CIC model, we obtain a price elasticity of about -0.124, close to the elasticity in the GLM model.
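The elasticity calculation can be checked with a short sketch. The effect (CHF 232.6) and the 2003 HMO average (CHF 1563.9) are the figures quoted in the text; the function name `zero_price_elasticity` and the two values tried for the cost-sharing price are hypothetical and serve only to show that the result does not depend on that price.

```python
def zero_price_elasticity(q_full, q_cost, p_cost, p_full=0.0):
    """Relative change in expenditures divided by relative change in price."""
    relative_demand_change = (q_full - q_cost) / q_cost
    relative_price_change = (p_full - p_cost) / p_cost   # equals -1 when p_full is zero
    return relative_demand_change / relative_price_change


if __name__ == "__main__":
    q_cost = 1563.9              # 2003 HMO average under cost-sharing (from the text)
    q_full = q_cost + 232.6      # adding the GLM DID effect of full insurance
    for p_cost in (0.1, 0.5):    # arbitrary positive prices, for illustration only
        print(p_cost, round(zero_price_elasticity(q_full, q_cost, p_cost), 3))
```

Both calls return the same value of roughly -0.149, in line with the figure of about -0.148 quoted above, because the price change from any positive price to zero is always -100%.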
For the quantile CIC effects, the elasticities range from an almost unit elasticity for the 0.3 quantile of the cost distribution to an elasticity of about -0.1 for the 0.7 quantile. Intuitively, this is what we would expect because introducing a cost-sharing element to basic health insurance will most likely affect individuals in the lower ranges of the cost distribution, who possibly forgo an optional check-up, but will hardly influence those in the upper range.
In a secondary analysis (detailed results are available upon request), we estimated GLM models conditional on low (CHF 230) and high (CHF 1200 or 1500) deductible choices in 2003.
The level of the chosen deductible may be interpreted as an indicator for health status. This yields an elasticity of -0.11 for the low deductible and -0.57 for the high deductible, consistent with the CIC results. Since deductible choice is likely endogenous to health care utilization, however, we do not want to place too much emphasis on these conditional GLM results.

Sensitivity analysis
The results of the previous section are remarkably robust. A first concern could be the restriction of the sample to stayers in their health plan over the full observation period. Table 5 shows that the effects remain stable when the switchers are included in the analysis. Since we have panel data, we include individual fixed effects in the DID models to control for time-constant characteristics beyond those available in the data (gender, age, and city of residence).
Unfortunately, allowing for fixed effects is not straightforward in the CIC model, but still the estimates in the full sample are very close to those in the stayer sample. This holds for the mean as well as the quantile CIC effects (see Figure A1 in the appendix).
-Insert Table 5 about here-
A second concern could be that the mean effects are driven by outliers, i.e., those with large health costs, or severe health problems. We have information about whether a hospital stay occurred in the past year. Excluding these individuals from the sample hardly affects our results. In the same way, excluding the top 1% of the cost distribution has little impact.
A last concern arising from the results in Table 3 could be regional heterogeneity. In particular, we find some weakly significant differences (p-value 0.074) between the 2002/03 and 2003/04 samples in the MNL models that are driven by the switching out of HMO in Berne. When we exclude those insured from the sample, the estimated effects are a bit smaller, but still of the same order as those in Table 4. Overall, we conclude that our results are not very sensitive to these changes in the baseline sample, and that our DID/CIC estimates provide robust evidence of the effect of full insurance on health expenditures.

Discussion and conclusion
Ellis (2012)
Our results are informative regarding optimal insurance designs. As discussed in Cutler and Zeckhauser (2000), health insurance design involves an inherent trade-off between risk-sharing and moral hazard. Our results can be interpreted in this light. Permitting consumers access to health services without any cost will increase utilization significantly, especially at the extensive margin. Hence, we provide further evidence that cost-sharing instruments are an effective tool to reduce overconsumption of medical care (e.g., Newhouse et al., 1993). However, given that the cost-saving effects are concentrated at the bottom of the health cost distribution, the overall impact on cost containment may be limited.
Figure note: Shaded bars show 95% confidence intervals (based on bootstrapped standard errors, 500 replications).