Generic Inference on Quantile and Quantile Effect Functions for Discrete Outcomes

Quantile and quantile effect functions are important tools for descriptive and causal analyses due to their natural and intuitive interpretation. Existing inference methods for these functions do not apply to discrete random variables. This paper offers a simple, practical construction of simultaneous confidence bands for quantile and quantile effect functions of possibly discrete random variables. It is based on a natural transformation of simultaneous confidence bands for distribution functions, which are readily available for many problems. The construction is generic and does not depend on the nature of the underlying problem. It works in conjunction with parametric, semiparametric, and nonparametric modeling methods for observed and counterfactual distributions, and does not depend on the sampling scheme. We apply our method to characterize the distributional impact of insurance coverage on health care utilization and obtain the distributional decomposition of the racial test score gap. We find that universal insurance coverage increases the number of doctor visits across the entire distribution, and that the racial test score gap is small at early ages but grows with age due to socio economic factors affecting child development especially at the top of the distribution. These are new, interesting empirical findings that complement previous analyses that focused on mean effects only. In both applications, the outcomes of interest are discrete rendering existing inference methods invalid for obtaining uniform confidence bands for observed and counterfactual quantile functions and for their difference -- the quantile effects functions.


Introduction
The quantile function (QF), introduced by Galton (1874), has become a standard tool for descriptive and inferential analysis due to its straightforward and intuitive interpretation. Doksum (1974) suggested to report the quantile effect (QE) function-the difference between two QFs-to compare the entire distribution of a random variable in two different populations. For example, in Section 4.4, we analyze the racial test score gap by taking the difference of the QFs of the IQ test scores between white and black children. Looking at this QE function allows us to describe the gap in the test scores not only at the center, but also at the various percentiles of the test scores. In randomized control trials and natural experiments, QEs have a causal interpretation, and are usually referred to as quantile treatment effects (QTEs). In Section 4.3, we estimate the treatment effect of insurance coverage on health care utilization based on a conditionally randomized experiment.
In this paper we propose a generic procedure to obtain confidence bands for QFs and QE functions. Plotting these bands allows us to visualize the sampling uncertainty associated with the estimates of these functions. The bands have a straightforward interpretation: they cover the true functions with a pre-specified probability, e.g. 95%, such that any function that lies outside of the band even at a single quantile can be rejected at the corresponding level, e.g. 5%. In addition, they are versatile: the same confidence band can be used for testing different null hypotheses. The researcher does not even need to know the hypothesis that will be considered by the reader. For instance, the hypothesis that a treatment has no effect on an outcome can be rejected if the confidence band for the QE function does not cover the zero line. First-order stochastic dominance implies that some non-negative values are covered at all quantiles. The location-shift hypothesis that the QE function is constant implies that there is at least one value covered by the band at all quantiles.
The proposed method relies on a natural transformation of simultaneous confidence bands for distribution functions (DFs) into simultaneous confidence bands for QFs and QE functions. We invert confidence bands for DFs (DF-bands) into confidence bands for QFs (QF-bands) and impose shape and support restrictions. We then take the Minkowski difference of the QF-bands, viewed as sets, to construct confidence bands for the QE functions (QE-bands). This method is generic and applies to a wide collection of model-based estimators of conditional and marginal DFs of discrete, continuous and mixed continuousdiscrete outcomes with and without covariates. The only requirement is the existence of a valid method for obtaining simultaneous DF-bands, which is readily available under general sampling conditions for cross-section, time series and panel data. This includes the classical settings of the empirical DFs as a special case, but is much more general than that. For instance, in our empirical applications, we analyze QFs and QE functions that are obtained by inverting counterfactual DFs constructed from regression models with covariates. 1 Our method can be used to construct three types of bands. First, we show how to invert a DF-band into a QF-band. We prove that there is no loss of coverage by the inversion in that the resulting QF-band cover the entire QF with the same probability as the source DF-band covers the entire DF. Second, we iterate the method to construct simultaneous QF-bands for multiple QFs. Here simultaneity not only means that the bands are uniformin that they cover the whole function-but also that all the functions are covered by the corresponding bands jointly with the prescribed probability. These bands can be used to test any comparison between the QFs such as that differences or ratios of them are constant. By construction, our simultaneous bands are not conservative whenever the source DF-bands are not conservative. Third, for the leading case of differences of QFs, we construct QE-bands as differences of QF-bands. Our QE-bands can be conservative due to the projection implicit when we take differences of the QF-bands. However, as we discuss below in the literature review, we are not aware of any generic method to construct valid QE-bands for discrete outcomes. We also show how to make the the QF-bands and QE-bands more informative by imposing support restrictions when the outcome is discrete. To implement all types of bands, we provide explicit algorithms based on bootstrap.
One important application of our method concerns the estimation of QFs and QE functions from models with covariates. When the outcome is continuous, quantile regression (QR), introduced by Koenker and Bassett (1978), is a convenient model to incorporate covariates. In many interesting applications, however, the outcome is not continuously distributed. This is naturally the case with count data, ordinal data, and discrete duration data, but it also concerns test scores that are functions of a finite number of questions, censored variables, and other mixed discrete-continuous variables. Examples include the number of doctor visits in our first application (see Panel A in Figure 1), IQ test scores for children in our second application (see Panels B and C in Figure 1), and wages that have mass points at round values and at the minimum wage. For discrete outcomes, however, the existing (uniform) inference methods for QR (e.g., Gutenbrunner and Jureckova (1992), Koenker and Xiao (2002), Angrist et al. (2006b), Qu and Yoon (2015), Belloni et al. (2017a)) break down. In addition, the linearity assumption for the conditional quantiles underlying QR is highly implausible in that case. For instance, the Poisson regression model does not have linear conditional quantiles.
The classical models for discrete outcomes such as Poisson, Cox proportional hazard or ordered response models are highly parametrized. These models have the advantage of being parsimonious in terms of parameters and easy to interpret. However, they impose strong homogeneity restrictions on the effects of the covariates. For instance, if a covariate increases the average outcome, then it must increase all the quantiles of the outcome distribution. Moreover, Poisson models imply a restrictive single crossing property on the sign of the estimated probability effects (Winkelmann, 2006). These limitations are avoided by the distribution regression (DR) method (e.g., Williams and Grizzle (1972), Foresi and Peracchi (1995), Rothe and Wied (2013), and Chernozhukov et al. (2013), which we employ in our paper. DR is a comprehensive tool for modeling and estimating the entire conditional distribution of any type of outcome (discrete, continuous, or mixed). DR allows the covariates to affect differently the outcome at different points of the distribution and encompasses the classical parametric models as special cases. The cost of this flexibility is that the DR parameters can be hard to interpret because they do not correspond to QEs. To overcome this problem, we propose to report QEs computed as differences between the QFs of counterfactual distributions estimated by DR in conjunction with confidence bands constructed using our projection method. These one-dimensional functions provide an intuitive summary of the effects of the covariates. We argue that this combination of our generic procedure with the DR model provides a comprehensive and practical approach for estimation QFs and QE functions with discrete data. While we focus on DR in this paper, we emphasize that our method also combines well with classical parametric models such as Poisson, Cox proportional hazard and ordered response regression models. To the best of out knowledge, there is no inference method available to construct valid QF-bands and QE-bands even in these simple models. In addition, our method works in conjunction with more recent inference approaches for DFs with potentially discrete data. Examples include Frandsen et al. (2012), Donald and Hsu (2014), Hsu et al. (2015), and Belloni et al. (2017b).
We apply our approach to two data sets, corresponding to two common types of discrete outcomes. In the first application, we exploit a large-scale randomized control trial in Oregon to estimate the distributional impact of insurance coverage on health care utilization measured by the number of doctor visits. Since this outcome is a count, we estimate the conditional DFs using both Poisson and distribution regressions. Poisson regression clearly underestimates the probability of having zero visits as well as that of having a large number of visits. The more flexible DR finds a positive effect, especially at the upper tail of the distribution. This is an interesting empirical finding in its own right; it complements the mean regression analysis results reported in Finkelstein et al. (2012a). In the second application, we reanalyze the racial test score gap of young children. As shown in Figure  1, test scores are discrete. We find that while there is very little gap at eight months, a large gap arises at seven years. In addition, looking at the whole distribution, we uncover that the observed racial gap is widening in the upper tail of the distribution of test scores. The increase in the gap can be mostly explained by differences in observed characteristics. These results complement and expand the findings of Fryer and Levitt (2013) for the mean racial test score gap, revealing what happens to the entire distribution.
Literature Review. To the best of our knowledge, this is the first paper that provides asymptotically similar simultaneous QF-bands for discrete outcomes. Scheffe and Tukey (1945) were the first to consider empirical quantiles for discrete data. They showed that pointwise confidence intervals obtained by inverting pointwise confidence intervals for the DF based on the empirical DF are still valid but conservative when the outcome is discrete. 2 Frydman and Simon (2008) and Larocque and Randles (2008) suggested methods to obtain the exact coverage rate of these confidence intervals. In contrast, our confidence bands for the QFs are uniform in the probability index and not conservative, and can be based on more general estimators of the DF.
Another strand of the literature tried to overcome the discreteness in the data by adding a small random noise to the outcome (also called jittering), see for instance Machado and Silva (2005) and the applications in Koenker and Xiao (2002) and Chernozhukov et al. (2013). Ma et al. (2011) considered an alternative definition of quantiles based on linearly interpolated DFs. These strategies restore asymptotic Gaussianity of the empirical QFs and QE functions, at the price of changing the estimand. One might argue that this change is not a serious issue when the number of points in the support of the outcome is large, but we find it more transparent to work directly with the observed discrete outcome. Thus, we keep the focus on the original QE function at the price that our QE-bands might be conservative. However, we find that they lead to informative inferences in two empirical applications and in extensive numerical simulations, where they are not conservative in most cases. Since we are not aware of any alternative generic method to construct asymptotically similar QEbands of discrete outcomes, we believe this is a useful addition to the statistical toolkit.
Our method can also be applied to obtain QF-bands and QE-bands of continuous outcomes, complementing the existing well-established methods for this type of outcomes. For example, Kiatsupaibul and Hayter (2015) provided a method to construct exact DFbands and QF-bands based on the empirical distribution of independent and identically distributed data. 3 Our method has the more modest goal of constructing asymptotic confidence bands, but applies more generally to any estimator of the distribution that obeys a functional central limit theorem. This includes Poisson, Cox proportional hazard and distribution regression-based estimators of the distribution from weakly dependent data. Chernozhukov et al. (2013) also used DR as the basis to construct QF-bands and QE-bands of continuous outcomes. Their construction consists of two steps. First, obtain estimators of the QFs and QE functions from estimators of DFs by inversion. Second, construct QFbands and QE-bands from the limit distributions of the estimators of the QFs and QE functions derived from the limit distribution of the estimators of the distribution functions via delta method. This construction has the advantage of producing asymptotically similar QE-bands, but breaks down for discrete outcomes because the quantile (left-inverse) mapping is not smooth (Hadamard differentiable), which precludes the application of the delta method in the second step. Our construction of the confidence bands consists of similar steps but the order of the steps is different. First, we construct DF-bands using the limit distribution of the estimators of the DFs. Second, we construct the QF-bands by inversion the DF-bands and take differences to construct the QE-bands. This difference in the order of the steps completely avoids the delta method and is the key to apply our method to discrete outcomes. Hence, we see our method as complementary to Chernozhukov et al. (2013).
Finally, we would like to emphasize that none of the existing methods that we are aware of can be applied to construct valid QF-bands and QE-bands in our two empirical applications where the outcomes are discrete and the estimators of the distribution are model-based.
Outline. The rest of the paper is organized as follows. Section 2 introduces our generic method to construct QF-bands and QE-bands. Section 3 provides an explicit algorithm based on bootstrappable estimators for the DFs. Section 4 presents the two empirical applications. Appendix A shows how to improve the finite sample properties of DF-bands by imposing logical monotonicity or range restrictions. Appendix B provides an additional algorithm to construct confidence bands for single QFs. Appendix C reports the results of a simulation study.

Generic Confidence Bands for Quantile and Quantile Effect Functions
This section contains the main theoretical results of the paper. Our only assumption is the availability of simultaneous confidence bands for DFs. Since the seminal work of Kolmogoroff (1933), various methods to obtain simultaneous confidence bands have been developed. 4 In Section 3 we provide an algorithm to construct simultaneous DF-bands that can be applied when the estimators of the DFs are known to be bootstrappable, which is often the case.
2.1. Confidence Bands for Distribution Functions. Let Y be a closed subinterval in the extended real number line R = R ∪ {−∞, +∞}. Let D denote the set of nondecreasing functions, mapping Y to [0, 1]. A function F is nondecreasing if for all x, y ∈ Y such that x ≤ y, one has F (x) ≤ F (y). We will call the elements of the set D "distribution functions", albeit some of them need not be proper DFs. In what follows, we let F denote some target DF. F could be a conditional DF, a marginal DF, or a counterfactual DF.
Definition 1 (DF-Band of Level p). Given two functions y → U (y) and y → L(y) in the set D such that L ≤ U , pointwise, we define a band I = [L, U ] as the collection of intervals We say that I covers F if F ∈ I pointwise, namely F (y) ∈ I(y) for all y ∈ Y. If U and L are some data-dependent bands, we say that I = [L, U ] is a DF-band of level p, if I covers F with probability at least p.
In many applications the point estimatesF and confidence bands [L , U ] for the target distribution F do not satisfy logical monotonicity or range restrictions, namely they do not take values in the set D. Appendix A shows that given such an ordered triple L ≤F ≤ U , we can always transform it into another ordered triple L ≤F ≤ U that obeys the logical monotonicity and range restrictions. Such a transformation will generally improve the finite sample properties of the point estimates and confidence bands.
2.2. Confidence Bands for a Single Quantile Function. Here we discuss the construction of confidence bands for the left-inverse function of F , F ← , which we call "quantile function" of F .
The following theorem provides a confidence band I ← for the QF F ← based on a generic confidence band I for F .
Theorem 1 (Generic QR-Band). Consider a distribution function F and band functions L and U in the class D. Suppose that the distribution function F is covered by I with probability p. Then, the quantile function F ← is covered by I ← with probability p, where Proof. Here we adopt the convention inf{∅} = +∞ so that the left-inverse function can be defined as G ← (a) := inf{y ∈ Y : G(y) ≥ a} ∧ sup{y ∈ Y}, which avoids distinguishing the two cases of Definition 2.
It suffices to show that U ← ≤ F ← ≤ L ← if and only if L ≤ F ≤ U . We first show that F ← ≤ L ← if and only if F ≥ L. For the "if" part, note that for any a ∈ [0, 1], since L(y) ≤ F (y) for each y ∈ Y and F, L ∈ D, For the "only if" part, we use that for any G ∈ D, where we use the convention sup{∅} = −∞. Then, for any y ∈ Y, since F ← (a) ≤ L ← (a) for each a ∈ [0, 1] and a → F ← (a) and a → L ← (a) are nondecreasing, Analogously, we can conclude that F ← ≥ U ← if and only if F ≤ U .
Remark 1 (Similarity). The QF-band I ← is constructed by applying the left-inverse transformation to the DF-band I. Theorem 1 shows that there is no loss of coverage in the inversion of the band. Hence, our generic method carries over the similarity (nonconservativeness) of the DF-band to the QF-band.
We can narrow I ← without affecting its coverage by exploiting the support restriction that the quantiles can only take the values of the underlying random variable. This is relevant when the variable of interest is discrete as in the applications presented in Section 4. Suppose that T is the support of the random variable with DF F . Then it makes sense to exploit the support restriction that F ← (a) ∈ T by intersecting the confidence band for F ← with T . Clearly, this will not affect the coverage properties of the bands.
The band I ← is easy to obtain by rotating and flipping I, but does not exploit the fact that the support of the variable with distribution F in this example is the set T = {0, 1, . . . , 10}. By intersecting I ← with T we obtain in the right panel the QF-band I ← which reflects the support restrictions.

Generic Confidence Bands for Multiple Quantile Functions and Quantile
Effects. The quantile effect (QE) function a → ∆ j,m (a) is the difference between the QFs of two random variables with DFs F j and F m and support sets T j and T m , i.e., , a ∈ [0, 1]. Our next goal is to construct simultaneous confidence bands that jointly cover the DFs, (F k ) k∈K , the corresponding QFs, (F ← k ) k∈K , and the QE functions, (∆ j,m ) (j,m)∈K 2 , where K is a finite set. For example, K = {0, 1} (treated and control outcome distributions) in our first application and K = {W, B, C} (white, black and counterfactual test score distributions) in our second application.
Specifically, suppose we have the DF-bands (I k ) k∈K , which jointly cover the DFs (F k ) k∈K with probability at least p. For example, we can construct these bands using Theorem 1 in conjunction with the Bonferroni inequality. 5 Alternatively, the generic Algorithm 1 presented in Section 3 provides a construction of a joint confidence band that is not conservative. First we construct the QF-bands (I ← k ) k∈K , which jointly cover the QFs (F ← k ) k∈K with probability at least p by Theorem 1. Then we convert these bands to confidence bands for (∆ j,m ) (j,k)∈K 2 by taking the pointwise Minkowski difference of each of the pairs of the two bands, viewed as sets. Recall that the Minkowski difference between two subsets V and U of a vector space is In words, the upper-end of the interval for the difference v − u is the difference between the upper-end of the interval for v and the lower-end of the interval for u. Symmetrically, the lower-end of the interval for the difference v − u is the difference between the lower-end of the interval for v and the upper-end of the interval for u. This greatly simplifies the practical computation of the bands.
Theorem 2 (Generic Simultaneous QF-Bands and QE-Bands). Consider the distribution functions (F k ) k∈K and the band functions (I k := [L k , U k ]) k∈K in the class D. Suppose that the distribution functions (F k ) k∈K are jointly covered by (I k ) k∈K with probability p. Then: with probability at least p, where the minus operator is defined by a pointwise Minkowski difference: (3) The confidence bands have the following joint coverage property: Proof. The results follows from the definition of the Minkowski difference and because the event Theorem 2 shows that simultaneous QF-bands can be obtained by inverting simultaneous DF-bands, and QE-bands by taking the Minkowski difference between the two simulataneous QF-bands for the corresponding QFs. As in Theorem 1, we can narrow the band I ← ∆ without affecting coverage by imposing support restrictions as demonstrated in Corollary 2.
Remark 2 (Joint Support Restrictions). The bandĨ ← ∆(j,m) can be further narrowed if the two random variables with distributions F j and F m have restrictions in their joint support Remark 3 (Similarity). Part 1 of Theorem 2 provides a construction of simultaneous QFbands. These bands can be used to test any comparison between two or more QFs. These include that the difference between each pair of functions is zero or constant, or that all the ratios between each pair of functions is one or constant (see Remark 4).
Part 3 shows that our generic method of constructing bands carries over the similarity (non-conservativeness) of the DF-bands to the simultaneous QF-bands and QE-bands. Moreover, our construction is optimal in the sense that if we want to simultaneously cover all the DFs, QFs and QE functions of interest, it is not possible to construct uniformly shorter bands while preserving the joint coverage rate once all the joint support restrictions are imposed. It is common to report at the same time several QFs and QE functions. For instance, Figures 5 and 6 provide three different QFs (two observed and one counterfactual) and the differences between these functions, which are all of interest. Theorem 2 (together with Corollary 3 for the asymptotic similarity of the bands for the DFs) shows that our bands jointly cover asymptotically all these functions with probability p. This allows for a transparent and honest assessment of hypotheses about these functions.
On the other hand, when the goal is to cover only a single QE function independently from the other functions, then our QE-band can be marginally conservative (Part 2 of Theorem 2). This is due to the projection implicit in the application of the Minkowski difference and is the price to pay for the joint uniform coverage property. However, our empirical results in Section 4 and numerical simulations in Appendix C clearly demonstrate the usefulness of these bands that allow for testing hypotheses that could not be considered using existing methods. We are not aware of any generic method to construct nonconservative QE-bands of discrete outcomes.  Figure 3. Construction of the QE-bands using Theorem 2 and Corollary 2. Left: QFs F ← 0 and F ← 1 and QF-bands I ← 0 and I ← 1 . Middle: the QE function ∆ and the QE-bandĪ ∆ without support restrictions. Right: the QE function ∆ and the QE-bandĪ ∆ with support restrictions. Figure 3 illustrates the construction of QE-bands using Theorem 2 and Corollary 2. The left panel shows the bands I ← 0 and I ← 1 for the QFs F ← 0 and F ← 1 . The middle panel shows the band I ∆(1,0) for the QE function ∆ 1,0 = F ← 1 − F ← 0 , obtained by taking the Minkowski difference of I ← 1 and I ← 0 . The right panel shows the confidence bandĨ ∆(1,0) for the QE function ∆ 1,0 resulting from imposing the support restrictions. As the Theorem 2 proves, the QE function ∆ 1,0 is covered by the QE-band I ∆(1,0) .
Remark 4 (Confidence Bands for Ratios of QFs). Theorem 2 provides an explicit construction of bands for differences of QFs, the leading example of comparisons between QFs. Similar simple bands can be constructed for other comparisons of QFs. For example, a confidence band for the ratio of QFs, ρ j,m (a) := F ← j (a)/F ← m (a), can be formed as where the division operator is defined pointwise by:

Computation of Simultaneous Confidence Bands for Distribution Functions
In Section 2 we assumed the existence of simultaneous DF-bands. Here we describe an algorithm that is shown to provide asymptotically valid simultaneous bands for any bootstrappable estimator of the DFs. Many commonly used estimators of the DF are bootstrappable under suitable conditions. For example, Chernozhukov et al. (2013) give conditions for bootstrap consistency for the DR-based estimators that we use in the empirical applications. Maximum likelihood estimators, such as the Poisson regression that we use as a benchmark in the first application, are also bootstrappable under weak differentiability conditions, see Arcones and Giné (1992). We note that if the data are discrete, these existing results still yield validity of bootstrapping the DF estimator to construct DF-bands, but they do not justify the validity of bootstrapping the QF and QE estimators to construct QF-bands and QE-bands. The reason is that the delta-method breaks down because the left-inverse mapping is no longer (Hadamard) differentiable.
Algorithm 1 provides simultaneous confidence bands that asymptotically jointly cover the DFs (F k ) k∈K , the corresponding QFs (F ← k ) k∈K , and the QE functions F ← j − F ← k for all (j, k) ∈ K 2 , with probability p. In practice, we estimate the DFs on a grid of points. Let T be a finite subset of Y. For the DF of a discrete random variable Y with finite support, we can choose T as the support of Y . Otherwise, we can set T as a grid of values covering the region of interest of the support of Y .
(1) Obtain many bootstrap draws of the estimator (F k ) k∈K , where the index j enumerates the bootstrap draws and B is the number of bootstrap draws (e.g., B = 1, 000).
(2) For each y ∈ T and k ∈ K, compute the robust standard error ofF k (y): whereQ k (α, y) denotes the empirical α-quantile of the bootstrap sample (F * (j) k (y)) B j=1 , and Φ −1 denotes the inverse of the standard normal distribution.
(3) Compute the critical value For each k ∈ K impose the shape restrictions onF k , L k and U k as described in Appendix A.
In step (1) we bootstrap jointly all the estimators of the DFs. In our applications it is important to obtain jointly the bootstrap draws of these estimators because they are not independent. There are multiple ways to obtain the bootstrap draws ofF . A generic resampling procedure is the exchangeable bootstrap (Praestgaard and Wellner, 1993;van der Vaart and Wellner, 1996), which recomputesF using sampling weights drawn independently from the data. This procedure incorporates many popular bootstrap schemes as special cases by a suitable choice of the distribution of the weights. For example, the empirical bootstrap corresponds to multinomial weights, and the weighted or Bayesian bootstrap corresponds to standard exponential weights. Exchangeable bootstrap can also accommodate dependences or clustering in the data by drawing the same weight for all the observations that belong to the same cluster (Sherman and Cessie, 1997;Cheng et al., 2013). For example, in the application of Section 4.3 we draw the same weights for all the individuals of the same household.
In the second step we estimate pointwise standard errors. We use the bootstrap rescaled interquartile range because it is more robust than the bootstrap standard deviation in that it requires weaker conditions for consistency (Chernozhukov et al., 2013). In the third step, we compute, for each bootstrap draw, the weighted recentered Kolmogorov-Smirnov maximal t-statistic over all distributions F k (y) with k ∈ K and y ∈ T . The maximum over k ∈ K ensures joint coverage of all the DFs. Then we take the p-quantile of the bootstrap Kolmogorov-Smirnov statistics. This allows us, in the fourth step, to construct preliminary DF-bands that jointly cover all the DFs (F k ) k∈K with probability p. We improve these bands by imposing the shape restrictions.
In the fifth step we invert the DF-bands to obtain QF-bands, as justified by Theorem 1. In the last step we obtain the QE-bands by taking Minkowski differences of the QF-bands, as justified by Theorem 2. If needed, we can impose the support conditions in the last two steps.
The following corollary of Theorem 2 provides theoretical justification for Algorithm 1. To state the result, let ∞ (Y) denote the metric space of bounded functions from Y to R equipped with the sup-norm and |K| denote the cardinality of the set K.
Corollary 3 (Validity of Algorithm 1). Suppose that the rescaled DF estimators {a n (F k − F k )} k∈K converge in law in ∞ (Y) |K| to a Gaussian process (G k ) k∈K , having zero mean and a non-degenerate variance function, for some sequence of constants a n → ∞ as n → ∞, where n is some index (typically the sample size). Suppose that a bootstrap method can consistently approximate the limit law of {a n (F k − F k )} k∈K , namely the distance between the law of {a n (F * k −F k )} k∈K conditional on data, and that of (G k ) k∈K , converges to zero in probability as n → ∞. The distance is the bounded Lipschitz metric that metrizes weak convergence. Then, the confidence bands constructed by Algorithm 1 have the following covering property: Proof. Lemma SA.1 of Chernozhukov et al. (2013) implies that lim n→∞ P(∩ k∈K {F k ∈ [L k , U k ]}) = p. The result then follows from Lemma 1, Theorems 1 and 2, and Corollaries 1 and 2.
Algorithm 1 provides confidence bands that jointly cover the DFs, the QFs, and the QE functions. If one is only interested in one single QF, say F ← 1 , the corresponding QF-band obtained from Algorithm 1 can be conservative. This is because we compute the maximal t-statistic over all distributions (F k ) k∈K to ensure joint coverage, which is not required if one is only interested in F ← 1 . Appendix B provides a bootstrap algorithm that yields an asymptotically similar QF-band for a single QF.
Remark 5 (Validity of High-Level Conditions in Corollary 3 for DR-based Estimators). The high-level conditions in Corollary 3 are satisfied by many estimators. In particular, Theorem 5.2 in Chernozhukov et al. (2013) verifies these assumptions for the DR-based estimators that we use in the empirical applications in Section 4.

Applications to Distribution Regression Analysis of Discrete Data
In this section we apply our approach to two data sets, corresponding to two common types of discrete outcomes. In both cases we use the distribution regression model and obtain QE as differences between counterfactual distributions. For this reason, we first introduce the specific methods and then present both empirical illustrations. 4.1. Distribution Regression. In the absence of covariates, the empirical DF is a minimal sufficient statistic for a non-parametric marginal DF. Distribution regression (DR) generalizes this concept to a conditional DF like OLS generalizes the univariate mean to the conditional mean function. The key, simple observation underlying DR is that the conditional distribution of the outcome Y given the covariates X at a point y can be expressed as F Y |X (y | x) = E[1{Y ≤ y} | X = x]. Accordingly, we can construct a collection of binary response variables, which record the events that the outcome Y falls bellow a set of thresholds T , i.e., 1{Y ≤ y}, y ∈ T, and use a binary regression model for each variable in this collection. This yields the DR model: where Λ y (·) is a known link function which is allowed to change with the threshold level y; B(x) is a vector of transformations of x with good approximating properties such as polynomials, B-splines, and interactions; and β (y) is an unknown vector of parameters.
Knowledge of the function y → β(y) implies knowledge of the distribution of Y conditional on X. The DR model is flexible in the sense that, for any given link function, we can approximate the conditional DF arbitrarily well by using a rich enough set of transformations of the original covariates B(x). In the extreme case when X is discrete and B(x) is fully saturated, the estimated conditional distribution is numerically equal to the empirical DF in each cell of X for any monotonic link function. When B(x) is not fully saturated, one can choose a DF such as the normal or logistic as the link function to guarantee that the model probabilities lie between 0 and 1.
DR nests a variety of classical models such as the Normal regression, the Cox proportional hazard, ordered logit, ordered probit, Poisson regression, as well as other generalized linear models. Example 1 shows the inclusion of the Poisson regression model which we use as a benchmark in our first empirical application. In what follows we set B(x) = x to lighten the notation without loss of generality.
Example 1. Let Y be a nonnegative integer-valued outcome and X a vector of covariates. The Poisson regression model assumes that the probability mass function of Y conditional on X is The corresponding conditional distribution is: where Q is the incomplete gamma function. Thus, the Poisson regression can be seen as a special case of a DR model with exponentiated incomplete gamma link function, and parameter function y → β(y) that does not vary with y, i.e. β(y) = β. The Poisson regression model therefore imposes strong homogeneity restrictions on the effect of the covariates at different parts of the distribution that are often rejected by the data (see, e.g., Section 4.3).
Assume that we have a sample {(Y i , X i ) : i = 1, ..., n} of (Y, X). The DR estimator of the conditional distribution iŝ Williams and Grizzle (1972) introduced DR in the context of ordered outcomes. Foresi and Peracchi (1995) applied this method to estimate the conditional distribution of excess return evaluated at a finite number of points. Chernozhukov et al. (2013) extended Williams and Grizzle (1972)'s definition to arbitrary outcomes and established functional central limit theorems and bootstrap validity results for DR as an estimator of the whole conditional distribution. One of the main advantages of DR is that it not only accommodates continuous but also discrete and mixed discrete continuous outcomes very naturally.

Marginal and Counterfactual Distributions.
In the two applications that we present below there are two groups: the treated and control units in the first application, and the black and white children in the second application. We use DR to model and estimate the conditional distribution of the outcome in each group at each value of the covariates, that we denote by F Y 0 |X 0 (y | x) and F Y 1 |X 1 (y | x). The difference between these two high-dimensional DFs is, however, difficult to convey. Instead, we integrate these conditional distributions with respect to observed covariate distributions and compare the resulting marginal distributions.
For instance, in the first application, the marginal distribution where F X is the distribution of X in the entire population including the treated and control units, represents the distribution of a potential outcome. When k = 1 is the outcome distribution that would be observed if every units were treated, and when k = 0 is the outcome distribution if every units were not treated. These two distributions are called counterfactual, since they do not arise as distributions from any observable population. They nevertheless have a causal interpretation as distributions of potential outcomes when the treatment is randomized conditionally on the control variables X.
LetF Y k |X k denote the DR estimator of F Y k |X k , k ∈ {0, 1}. We estimate F k by the plugging-in rule, namely integratingF Y k |X k with respect to the empirical distribution of X for treated and control units. For k ∈ {0, 1}, We then report the empirical QE function: Chernozhukov et al. (2013) derived joint functional central limit theorems for (F 0 ,F 1 ) and established bootstrap validity. We can thus use the algorithms in Section 3 to construct asymptotically valid simultaneous confidence bands for the counterfactual QFs (F ← 1 , F ← 0 ) and the QE function ∆ = F ← 1 − F ← 0 . Remark 6 (Continuous covariates). The proposed approach can also be used to analyze the effect of continuous covariates. For instance, we can compare the status quo QF with the QF that we would observe if everyone received ∆d additional units of the continuous covariate of interest D, e.g. ∆d = 1 for a unitary increase. Formally, assume that we are interested in the effect of a continuous variable D on the outcome Y while controlling for a vector of covariates X. We can define the counterfactual distribution and the QE function F ← ∆d (a)−F ← 0 (a), where F 0 is the marginal (status quo) distribution of Y . This experiment can be interpreted as an unconditional quantile regression. Also in this case, our methods provide valid confidence bands for the counterfactual quantile and QE functions. 4.3. Insurance coverage and health care utilization. Our first application illustrates the construction of confidence bands using data from the Oregon health insurance experiment. In 2008, the state of Oregon initiated a limited expansion of its Medicaid program for uninsured low-income adults by offering insurance coverage to the lottery winners from a waiting list of 90,000 people (see www.nber.org/oregon for details). This experiment constitutes a unique opportunity to study the impact of insurance by means of a largescale randomized controlled trial (e.g., Finkelstein et al., 2012a;Baicker et al., 2013Baicker et al., , 2014Taubman et al., 2014).
We investigate the impact of insurance coverage on health care utilization as analyzed in Finkelstein et al. (2012a, Section V) using a publicly available dataset (Finkelstein et al., 2012b). The data are available via: http://www.nber.org/oregon/4.data.html. Detailed information about the dataset and descriptive statistics are available in Finkelstein et al. (2012a) and the corresponding online appendix. We focus on one count outcome Y : the number of outpatient visits in the last six months, which was elicited via a large mail survey. After excluding individuals with missing information in any of the variables used in the analysis, the resulting sample consists of 23,441 observations. The top histogram in Figure 1 illustrates the discrete nature of our dependent variable. Almost 40% of the outcomes are zeros, more than 90% of the mass is concentrated between zero and five, but a few people have a greater number of visits. Finkelstein et al. (2012a) find a positive effect of winning the lottery on the number of outpatient visits. 6 Their results are based on ordinary least squares (OLS) regressions, where the covariates X include household size, indicators for the survey wave, and interactions of the household size indicators and the survey wave. Although individuals were chosen randomly, these covariates are included as controls because the entire household for any selected individual became eligible to apply for insurance and the fraction of treated individuals varies across survey waves. We complement their findings by looking at the whole distribution of the number outpatient visits. We first estimate the conditional outcome distributions separately for the lottery winners and losers via Poisson regression and DR. For DR, we use the exponentiated incomplete gamma link in (4.2) such that DR nests the Poisson regression as an exact special case. As explained in Section 4.2, we integrate the conditional outcome distributions with respect to the covariate distribution for both lottery winners and losers to obtain estimates of the counterfactual distributions F 1 and F 0 .
The top panels of Figure 4 displays the DFsF 1 andF 0 estimated by the Poisson regression and DR. The corresponding QFsF ← 1 and F ← 0 are displayed in both middle panels. Finally, the estimated QE functions,F ← 1 −F ← 0 , are plotted in the bottom panels. In all cases, the figure also shows 95% simultaneous confidence bands, constructed using Algorithm 1 with B = 1, 000 Bayesian bootstrap draws that take into account the possible clustering of the observations at the household level. Reflecting the discrete nature of our outcome variables, we impose the support restrictions T 0 = T 1 = {0, 1, . . .}.
A comparison between the Poisson and DR results reveals striking differences. The Poisson model predicts a much lower mass at zero and a much thinner upper tail of the distribution for both groups. Indeed, these differences are statistically significant as the Poisson and DR simultaneous DF-bands and QF-bands do not overlap for a large part of the support. A formal test rejects the equality of these distributions with a p-value below 0.001. Since the DR model with exponentiated incomplete gamma link nests the Poisson model, we conclude that the Poisson model is rejected by the data. For this reason, we focus the discussion on the DR results.
The QE-band do not fully cover the zero-line such that we can reject the null hypothesis that winning the lottery has no effect on the number of outpatient visits. We can also reject the hypothesis that F 0 first-order stochastically dominates F 1 because the band 6 They label these effects intention-to-treat (ITT) effects and also report local average treatment effects (LATE) estimated using IV regressions. In this section, we focus on ITT effects.
for F ← 0 is strictly below the band for F ← 1 at some probability indexes, but we cannot reject the opposite hypothesis. In other words, at no quantile index the confidence band contains strictly negative effects while at some probability indexes it contains strictly positive effects.
Health economists distinguish between the treatment effect on the extensive (whether to see a doctor) and intensive (the number of visits given at least one) margins. The first effect is easy to estimate: the probability of not seeing a doctor decreased significantly from 43% to 37% with the treatment. The effect on the intensive margin is more difficult to gauge because we do not observe both potential outcomes for any individual. If we assume that the individuals induced to see a doctor by the insurance coverage are not seriously sick and visit the doctor only once, then the effect on the intensive margin can also be seen in Figure 4: the effect from 0 to 1 visit represents the effect on the extensive margin and the effect on the rest of the distribution represent the effect on the intensive margin. Both effects are statistically significant. We note in particular that the quantile differences do not vanish at the top of the distribution.
The assumption made to justify this interpretation may be too strong and lead to an overestimation of the effect on the intensive margin. For instance, the doctor may find a serious problem and schedule other visits. Following Zhang and Rubin (2003) and Angrist et al. (2006a), we can bound the effect on the intensive margin from below by assuming that patients who see a doctor anyway visit their doctor at least as often as patients who see a doctor only if insured. Under this weaker assumption, the effect on the intensive margin is bounded from below by the QE function obtained by keeping only observations with at least one visit. We also find a positive treatment effect with this method, which re-inforce the evidence of a positive effect among the existing users. 4.4. Racial differences in mental ability of young children. As a second application, we reanalyze the racial IQ test score gap examined in Fryer and Levitt (2013). We use data from the US Collaborative Perinatal Project (CPP). These data contain information on children from 30,002 women who gave birth in 12 medical centers between 1959 and 1965. Our main outcomes of interest are the standardized test scores at the ages of eight months (Bayley Scale of Infant Development) and seven years (both Stanford-Binet and Wechsler Intelligence Test). In addition to the test score measures, the dataset contains a rich set of background characteristics for the children, X, including information on age, gender, region, socioeconomic status, home environment, prenatal conditions, and interviewer fixed effects. Fryer and Levitt (2013) provide a comprehensive description of the dataset and extensive descriptive statistics.
A key feature of the test scores is the discrete nature of their distribution. We observe only 76 and 128 different values for the standardized test scores at the ages of eight months and seven years, respectively. The middle and bottom panels of Figure 1 present the corresponding histograms. Note that each bar corresponds to exactly one value. For

Quantile treatment effects − Distribution Regression
Probability Difference in the number of outpatient visits Figure 4. Effect of insurance coverage on the number of outpatient visits in the last six months. DFs, QFs, and QTE estimated by Poisson regression and DR including support restricted 95% confidence bands. The lines of the QF for the control group are slightly shifted upward to avoid overlapping with the QF for the treatment group.
instance, at eight months, almost 12% of the observations have exactly the same score and 60% of the observations have one of the most frequent six values. This is a common feature of test scores, which are necessarily discrete because they are based on a finite number of questions.
To gain a better understanding of the causes of the observed black-white test score gap, we provide a distributional decomposition into explained and unexplained parts by observable background characteristics. Let F W |W and F B|B represent the observed test score DFs for white and black children, and F W |B represent the counterfactual DF of test scores that would have prevailed for white children had they had the distribution of background characteristics of black children, F X B , namely, With this counterfactual test score distribution it is possible to decompose the quantiles of the observed black-white test score gap into where the first term in brackets corresponds is the composition effect due to differences in observable background characteristics and the second term is the unexplained difference.
We estimate F W |W and F B|B by the empirical test score distributions for white and black children, respectively. We estimate the counterfactual distribution F W |B by the sample analog of (4.3) replacing F Y W |X W by the DR estimator for white children, and F X B by the empirical distribution of X for black children. We use the logistic link function for the DR, but the results using the linear link function or the normal link function are similar.
Figures 5 and 6 report the results for the eight months and seven years outcomes, respectively. The first panels show the observed and counterfactual QFs, F ← W |W , F ← B|B and F ← W |B . The second panels show the difference between the observed QFs, F ← W |W −F ← B|B . The third and fourth panels decompose these observed differences into the composition ef- The point estimates are shown with their respective 95% simultaneous confidence bands constructed using Algorithm 1 with B = 1, 000 Bayesian bootstrap draws. The bands impose the restrictions that the supports of the test scores correspond to the observed values in the sample.
For eight months old children, we find very small differences between the test score distributions of black and white children. The black-white gap is positive at the lower tail and is mainly due to unobserved characteristics. While these effects are statistically significant, they are so small in magnitude that they should not worry any policy maker.

Unexplained difference
Probability Difference in test scores Figure 5. Decomposition of observed racial differences in mental ability of young children; results for eight months old children. QFs, raw difference, composition effect, and unexplained difference including support restricted 95% confidence bands. The QF lines have been slightly shifted vertically to avoid overlap.
The composition effect is very small, probably simply because there was no difference to explain to begin with.
The results are completely different for seven years old children. We find a large and statistically significant positive raw black-white gap. A formal test based on the uniform bands rejects the null hypothesis of a zero or a negative racial test score gap at all quantiles. The estimated QE function is increasing in the probability index ranging from below 0.6 standard deviation units at the lower tail up to over one standard deviation unit at the

Unexplained difference
Probability Difference in test scores Figure 6. Decomposition of observed racial differences in mental ability of young children; results for seven year old children. QFs, raw difference, composition effect, and unexplained difference including support restricted 95% confidence bands.
upper tail of the distribution. The quantile differences at the tails substantially differ from the mean difference of 0.85 standard deviation units reported in Fryer and Levitt (2013). In fact, we can formally reject the null hypothesis of a constant raw test score gap across the distribution because we can not draw a horizontal line at any value of the difference of test scores, which is covered by the confidence band of the QE function at all probability indexes.
Our decomposition analysis shows that about two third of this gap can be explained by differences in the distribution of observable characteristics. Nevertheless, the remaining unexplained difference is significant, both in economic and in statistical terms. Looking at the QE function, we can see that there is substantial effect heterogeneneity along the distribution. Interestingly, the increase in the test score gap at the upper quantiles can be fully explained by differences in background characteristics between black and white children. The resulting unexplained difference is maximized in the center of the distribution. Finally, our simultaneous confidence bands allow for testing several interesting hypothesis' about the whole QE function. For instance, we can reject the null hypothesis that the composition effect and the unexplained difference are zero, negative, or constant at all quantiles but we cannot reject that they are positive everywhere.

Appendix A. Imposing Monotonicity and Range Restrictions on Estimates and Confidence Bands for Distribution Functions
In many applications the point estimatesF and interval estimates [L , U ] for the target distribution F do not satisfy logical monotonicity or range restrictions, namely they do not take values in the set D defined in Section 2. Given such an ordered triple L ≤F ≤ U , we can always transform it into another ordered triple L ≤F ≤ U that obeys the logical monotonicity and shape restrictions. For example, we can seť where S is the shaping operator that given a function y → f (y) yields a mapping y → S(f )(y) ∈ D with S(f ) = M(0 ∨ f ∧ 1), where the maximum and minimum are taken pointwise, and M is the rearrangement operator that given a function f : Y → [0, 1] yields a map y → M(f )(y) ∈ D. Other monotonization operators, such as the projection on the set of weakly increasing functions, can also be used, as we remark further below.
The rearrangement operator is defined as follows. Let T be a countable subset of Y. In our leading case where f is the distribution function of a discrete random variable Y , we can choose T as the support of Y and extend f to Y by constant interpolation, yielding a step function as the distribution of Y on Y. If f is a distribution function of a continuous or mixed random variable Y , we can set T as a grid of values covering the support of Y where we evaluate f and extend f to Y by linear interpolation. Given a f : T → [0, 1], we first consider Mf as a vector of sorted values of the set {f (t) : t ∈ T }, where the sorting is done in a non-decreasing order. Since T is an ordered set of the same cardinality as Mf , we can assign the elements of Mf to T in one-to-one manner: to the k-th smallest element of T we assign the k-th smallest element of Mf . The resulting mapping t → Mf (t) is the rearrangement operator. We can extend the rearranged function Mf to Y by constant or linear interpolation as we describe above.
The following lemma shows that shape restrictions improve the finite-sample properties of the estimators and confidence bands.

Consequently,
(1) the re-shaped point estimate constructed via (A.1) is weakly closer to F than the initial estimate under the max distance: (2) the re-shaped confidence band constructed via (A.1) has weakly greater coverage than the initial confidence band: (3) and the re-shaped confidence band is weakly shorter than the original confidence bands under the max distance, Proof. The result follows from Chernozhukov et al. (2009).
The band [L, U ] is therefore weakly better than the original band [L , U ], in the sense that coverage is preserved while the width of the confidence band is weakly shorter.
Remark 7 (Isotonization is Another Option). An alternative to the rearrangement is the isotonization, which projects a given function on the set of weakly increasing functions that map T to [0,1]. This also has the improving properties stated in Lemma 1. In fact any convex combination between isotonization and rearrangement has the improving properties stated in Lemma 1.

Remark 8 (Shape Restrictions on Confidence Bands by Intersection
). An alternative way of imposing shape restrictions on the confidence band, is to intersect the initial band [L , U ] with D. That is, we simply set Thus, U I is the greatest nondecreasing minorant of 0 ∨ U ∧ 1 and L I is the smallest nodecreasing majorant of 0 ∨ L ∧ 1. This approach gives the tightest confidence bands, in particular However, this construction might be less robust to misspecification than the rearrangement. For example, imagine that the target function F is not monotone, i.e. F ∈ D. This situation might arise when F is the probability limit of some estimatorF that is inconsistent for the DF due to misspecification. If the confidence band [L , U ] is sufficiently tight, then we can end up with an empty intersection band, [L I , U I ] = ∅. By contrast [L, U ] is non-empty and covers the reshaped target function F * = S(F ) ∈ D.

Appendix B. Bootstrap Algorithms for Confidence Bands for Single Quantile Functions
If one is only interested in a single QF F ← , the QF-band constructed based on Algorithm 1 will generally be conservative. Here, we provide an algorithm that provides asymptotically similar (non-conservative) uniform confidence bands that jointly cover the DF, F , and the corresponding QF, F ← .
(1) Obtain many bootstrap draws of the estimatorF , Corollary 4 (Validity of Algorithm 2). Suppose that the rescaled DF estimator a n (F − F ) converges in law in ∞ (Y) to a Gaussian process G, having zero mean and a nondegenerate variance function, for some sequence of constants a n → ∞ as n → ∞, where n is some index (typically the sample size). Suppose that a bootstrap method can consistently approximate the limit law of a n (F − F ), namely the distance between the law of a n (F * −F ) conditional on data, and that of G, converges to zero in probability as n → ∞. The distance is the bounded Lipschitz metric that metrizes weak convergence. Then, Proof. Lemma SA.1 of Chernozhukov et al. (2013) implies that lim n→∞ P(F ∈ [L , U ]) = p. The result then follows from Lemma 1, Theorem 1 and Corollary 1.

Appendix C. Simulation Study
This section presents simulation evidence on the finite sample performance of our bands. To keep the simulations computationally tractable we analyze a setup without covariates. We generate two independent random samples {Y 1i } n i=1 and {Y 0i } n i=1 for the treated and control outcomes, respectively. The estimators of the DFsF Y 0 andF Y 1 are simply the empirical distribution functions in the respective samples. We perform 5000 simulations and let the sample size n ∈ {400, 1, 600, 6, 400} vary in order to examine the convergence of the coverage rates with respect to the sample size. We consider the problem of constructing uniform confidence bands that cover (i) a single QF: either F ← Y 1 or F ← Y 0 , (ii) simultaneously both DFs, both QFs and the QE function: The confidence bands for a coverage of type (i) are constructed based on Algorithm 2 while the bands for a coverage of type (ii) or (iii) are constructed based on Algorithm 1. We consider three confidence levels p ∈ {0.9, 0.95, 0.99}.
We consider two families of distributions: a count variable similar to the outcome in the first application and an ordered variable similar to the outcome in the second application. In the first case, Y 1i is distributed Poisson with parameter λ = 3 and Y 0i is distributed Poisson with λ ∈ {3, 2.75, 2.5}. Since the support of the Poisson distribution is unbounded, we estimate the QFs and QE functions for a ∈ [0.1, 0.9] and invert the bands for the DF over the part of the support that is relevant for the range of quantiles considered. Table 1 displays the empirical coverage rate of the true functions. We report the coverage rate of the DFs and QFs in a single column because they are numerically equal by construction. In addition, we also provide the empirical probability to reject the null hypothesis that This allows us to measure the empirical size in the first panel (where this hypothesis is satisfied) and the empirical power in the other panels.
The empirical coverage rates of the bands for a single QF (in the third and fourth columns of Table 1) as well as the coverage rate for both DFs, both QFs and the QE function (in the fifth column) confirms the theoretical results in corollaries 4 and 3. The empirical coverage rates are very close to the intended confidence levels p. The bands for these parameters are not conservative. We know from Theorem 2 that the bands for the QE function are valid but may be conservative when the goal is to cover only the QE function independently from the other functions. One of the objective of the simulations is to assess if our QEbands are narrow enough to be informative. The results in the sixth column of  (3)  show that the coverage rate of the bands is indeed larger than the theoretical coverage rate p when the true QE function is uniformly 0 (design 1) but is very close to p when the distributions of the treated and control outcomes are different. The reason for this result is that the Minkowski difference of two non-conservative confidence sets for two QF is not conservative for the difference in the parameters when (at least) one of the confidence set is a singleton. While this case is irrelevant for continuous outcomes, it often happens for discrete outcomes. As it can be seen for instance in Figures 4 or 5, the confidence bands for the QFs contains a single value at many probability indices. Asymptotically, the bands for the QF of a discrete outcome will contain a single value at all quantiles except in the neighborhoods of the quantiles at which the QF jumps. Thus, asymptotically our bands for the QE function are not conservative except for the case when the QFs of Y 1 and Y 0 are identical, i.e. when F ← 1 = F ← 0 uniformly. The second and third panels of the last column in Table 1 provide the empirical power of our bands to reject the null hypothesis that F ← 1 = F ← 0 . Even quite small deviations from the null hypothesis are detected with relatively moderate sample sizes. As expected, the power increases with the sample size and with the deviation from the null hypothesis. Table 2 presents the results for an ordered outcome. Y 0 and Y 1 are both discretized random Gaussian variables that can take the values {0, 1, ..., 5}. Y 1 is based on a latent standard Gaussian random variable while we consider three different latent variables for Y 0 : N (0, 1), N (0.2, 1) and N (0.4, 1). The cut-off values are the same for both outcomes. They are chosen such that Y 1 takes the values {0, 1, ..., 5} with probability {0.1, 0.16, 0.24, 0.24, 0.16, 0.1} respectively. The results are extremely similar to the results in Table 1: (i) the coverage rates for a single QF are very close to the intended coverage rate, (ii) the coverage rate for all QFs, DFs and the QE function is also very close to the intended rate, (iii) the coverage rate for the QE function is higher than the intended rate only when the true QE function is uniformly 0, (iv) the power of our bands to reject an incorrect null hypothesis is substantial and increases in the sample size and the deviation from the null hypothesis.
While our bands are-to the best of our knowledge-the only ones that have been proven to cover uniformly the QFs and the QE functions of discrete outcomes, applied researchers may be tempted to use alternative heuristic approaches. For this reason, we compare the performance of our bands for the QE function with four alternative methods. 7 We first experiment with directly bootstrapping the QE function and calculating sup-t bands. However, the pointwise standard errors obtained via bootstrap are numerically equal to zero at many quantiles such that the t-statistic cannot be computed. We tried putting a lower bound on the pointwise standard errors to be able to calculate the t statistics but this resulted in extremely wide bands that always covered the true function. For this reason we do not report these results in the following tables. The second approach that we consider consists in bootstrapping the QE function and calculate constant width bands. This method avoids the need to divide by the estimated pointwise standard errors and could therefore be implemented. The last two approaching are based on jittering (adding random noise) as suggested by Machado and Silva (2005) for count outcomes. We bootstrap the QE function of the smoothed outcomes and construct sup-t bands centered either around the smoothed QE function or around the original, unsmoothed QE function. Machado and Silva (2005) show that standard methods can be used to make inference about the smoothed quantile function. On the contrary, we are interested in covering the original, unsmoothed QE function.
The results for the count outcomes are provided in Table 3, which compares the coverage probability of our new bands with that of the competing bands as well as the average length of the bands. 8 The constant width bands obtained by bootstrapping directly the QE function are very conservative in all cases. Their average length is two to four times higher than the average length of the bands that we have suggested. This bad behavior of the bootstrap for the QF of a discrete outcome comes at no surprise because it is known to be inconsistent for the estimation of the pointwise variance. Huang (1991) finds in simulations that the bootstrap grossly overestimate the variance of the sample median of a discrete outcome, except when the QF jumps exactly at the median. The estimators based on jittering have the opposite problem: their coverage rate is below the intended rate and is even equal to zero for many distributions. The reason is simple: adding noise smoothes the differences over the whole range of quantiles such that the variance is underestimated where the QF jumps but is overestimated where the QF is flat. Note that these results do not contradict the results in Machado and Silva (2005), which consider the smoothed QF as the true function, but show that adding noise to the outcome cannot help covering the unsmoothed QF. Table 4 presents the results of the simulations for the ordered outcomes. The conclusion are similar: bootstrapping the QE function directly leads to very wide bands while bootstrapping the jittered QE function leads to extreme undercoverage of the true function.
To summarize, for both types of discrete outcomes we come to the conclusion that the alternative methods either do not cover the true QE function with at least the chosen coverage rate or are much longer than the suggested bands. As an interesting by-product of these simulations, we note that the average length of our bands converges to zero at the √ n-rate.