Johannes Giesecke and Ben Jann, GESIS Training Course, January 29 – February 1, 2024
Required packages: estout, oaxaca, smithwelch, xtoaxaca
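Before starting, it can be useful to confirm that the required user-written commands are available. The loop below is a minimal sketch using Stata's built-in which command; it only reports missing commands and does not install anything. The findit hint follows the suggestion given for xtoaxaca later in this exercise and is an assumption for the other packages.

```stata
// Check that each required user-written command can be found on the adopath
foreach cmd in estout oaxaca smithwelch xtoaxaca {
    capture which `cmd'
    if _rc {
        display as error "`cmd' not found; try: findit `cmd'"
    }
    else {
        display as text "`cmd' is installed"
    }
}
```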
Set the seed of the random number generator for the sake of reproducibility:
. set seed 432987
Repeat the Smith-Welch example analysis from the slides (5-did.pdf) and evaluate how changing the reference and “benchmark” estimates (as they are called in the help file) affects the results. Try to provide clear interpretations of the different elements of the output and explain how the interpretations change depending on the choice of the reference and benchmark estimates.
Data preparation as on slides:
. use gsoep-extract, clear
(Example data based on the German Socio-Economic Panel)

. keep if inlist(wave,1995,2015)
(23,792 observations deleted)

. keep if inrange(age, 25, 55)
(8,147 observations deleted)

. generate lnwage = ln(wage)
(2,734 missing values generated)

. generate expft2 = expft^2
(56 missing values generated)

. generate byte t = wave==2015 // 0 = 1995, 1 = 2015

. generate byte female = sex==2 // 0 = male, 1 = female

. summarize lnwage yeduc expft expft2 t female

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      lnwage |      8,277     2.71872     .483595   1.108563    4.86638
       yeduc |     10,735    12.04332    2.700811          7         18
       expft |     10,955    12.21353    9.640926          0       40.5
      expft2 |     10,955    242.1094    305.9972          0    1640.25
           t |     11,011    .6637908    .4724329          0          1
-------------+---------------------------------------------------------
      female |     11,011    .5466352     .497843          0          1

. drop if missing(lnwage,yeduc,expft) // remove unused observation
(2,940 observations deleted)

. svyset psu [pw=weight], strata(strata)

Sampling weights: weight
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>
The benchmark() option selects the sample from which the reference coefficients should be taken, i.e. either from the 1995 sample or from the 2015 sample. In the example on the slides, benchmark() was set such that the reference coefficients are taken from the 1995 sample. Furthermore, the reference() option sets whether the reference coefficients are taken from males or from females. In the example on the slides, reference() was set such that male coefficients are used.
Results will depend on these settings, just like the results of the standard OB decomposition depend on the choice of the reference coefficients. We will now compute results for four different combinations (t=1995 vs. t=2015 and male vs. female).
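To make the roles of the two options explicit, the following sketch writes out the two-fold decomposition for one combination: male reference coefficients and 1995 as benchmark. The notation is an assumption of this sketch and may differ from the slides: $\bar{x}$ denotes mean vectors (including the constant), $\beta$ coefficient vectors, subscripts $m$ and $f$ the two genders, and 0 and 1 the years 1995 and 2015. The assignment of terms was inferred from how the reported two-fold and three-fold results add up, not taken from the smithwelch help file.

```latex
\begin{align*}
D_t &= \bar{x}_{mt}'\beta_{mt} - \bar{x}_{ft}'\beta_{ft}
     = \underbrace{(\bar{x}_{mt}-\bar{x}_{ft})'\beta_{mt}}_{E_t}
     + \underbrace{\bar{x}_{ft}'(\beta_{mt}-\beta_{ft})}_{C_t},
     \qquad t \in \{0, 1\} \\
dE &= E_1 - E_0
    = \underbrace{(\Delta\bar{x}_1 - \Delta\bar{x}_0)'\beta_{m0}}_{\text{dE:E}}
    + \underbrace{\Delta\bar{x}_1'(\beta_{m1}-\beta_{m0})}_{\text{dE:C}},
    \qquad \Delta\bar{x}_t = \bar{x}_{mt}-\bar{x}_{ft} \\
dC &= C_1 - C_0
    = \underbrace{(\bar{x}_{f1}-\bar{x}_{f0})'\Delta\beta_0}_{\text{dC:E}}
    + \underbrace{\bar{x}_{f1}'(\Delta\beta_1-\Delta\beta_0)}_{\text{dC:C}},
    \qquad \Delta\beta_t = \beta_{mt}-\beta_{ft}
\end{align*}
```

With 2015 as benchmark, the first term of each line is evaluated at the 2015 coefficients (or coefficient differences) and the second term at the 1995 means or mean differences. With female reference coefficients, $\beta_{ft}$ replaces $\beta_{mt}$ in $E_t$ and $\bar{x}_{mt}$ replaces $\bar{x}_{ft}$ in $C_t$.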
First we need to estimate the four regression models:
. // t=1995
. svy: regress lnwage yeduc expft expft2 if female==0 & t==0
(running regress on estimation sample)

Survey: Linear regression

Number of strata =   4                            Number of obs   =      1,486
Number of PSUs   = 711                            Population size =  6,811,820
                                                  Design df       =        707
                                                  F(3, 705)       =      39.19
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.1454

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .0566528    .005958     9.51   0.000     .0449553    .0683502
       expft |   .0317598   .0056468     5.62   0.000     .0206733    .0428464
      expft2 |  -.0006142   .0001503    -4.09   0.000    -.0009093    -.000319
       _cons |   1.855365   .0898357    20.65   0.000     1.678989    2.031742
------------------------------------------------------------------------------

. estimates store male_t0

. svy: regress lnwage yeduc expft expft2 if female==1 & t==0
(running regress on estimation sample)

Survey: Linear regression

Number of strata =   4                            Number of obs   =      1,123
Number of PSUs   = 618                            Population size =  4,895,550
                                                  Design df       =        614
                                                  F(3, 612)       =      39.54
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.1227

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .0585973   .0060868     9.63   0.000     .0466438    .0705509
       expft |   .0206192   .0066341     3.11   0.002      .007591    .0336475
      expft2 |  -.0003731   .0001929    -1.93   0.053    -.0007519    5.64e-06
       _cons |   1.768793    .085835    20.61   0.000     1.600227    1.937359
------------------------------------------------------------------------------

. estimates store female_t0

. // t=2015
. svy: regress lnwage yeduc expft expft2 if female==0 & t==1
(running regress on estimation sample)

Survey: Linear regression

Number of strata =    15                          Number of obs   =      2,642
Number of PSUs   = 1,536                          Population size =  6,310,318
                                                  Design df       =      1,521
                                                  F(3, 1519)      =      80.48
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.2803

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .0829495    .005573    14.88   0.000      .072018     .093881
       expft |    .035725   .0059186     6.04   0.000     .0241155    .0473345
      expft2 |  -.0005933   .0001565    -3.79   0.000    -.0009002   -.0002863
       _cons |   1.429661   .0955876    14.96   0.000     1.242164    1.617159
------------------------------------------------------------------------------

. estimates store male_t1

. svy: regress lnwage yeduc expft expft2 if female==1 & t==1
(running regress on estimation sample)

Survey: Linear regression

Number of strata =    15                          Number of obs   =      2,820
Number of PSUs   = 1,658                          Population size =  5,841,900
                                                  Design df       =      1,643
                                                  F(3, 1641)      =     110.35
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.2772

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .0789256   .0052929    14.91   0.000     .0685439    .0893072
       expft |   .0313377   .0049286     6.36   0.000     .0216707    .0410046
      expft2 |  -.0005405   .0001472    -3.67   0.000    -.0008292   -.0002518
       _cons |   1.403916   .0741614    18.93   0.000     1.258456    1.549377
------------------------------------------------------------------------------

. estimates store female_t1
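As a quick plausibility check, the four subsample sizes reported above should add up to the number of observations left after the data preparation (11,011 minus the 2,940 dropped cases). The following lines only redo this arithmetic with numbers copied from the output; the scalar names are chosen for illustration.

```stata
// Observations per model, copied from the regression output above
scalar n_models = 1486 + 1123 + 2642 + 2820
// Observations remaining after -drop if missing(...)-
scalar n_data   = 11011 - 2940
display "models: " n_models "   data: " n_data
```

Both numbers are 8,071, so every remaining observation enters exactly one of the four regressions.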
We can now run the four variants of the decomposition. For easy comparison, we also store the results in a matrix.
. capture matrix drop results

. // benchmark: t=1995; reference: male
. smithwelch male_t0 female_t0 male_t1 female_t1, benchmark(1) reference(1)

Decompositions of individual differentials:
-----------------------------------------------
             |         D          E          C
-------------+---------------------------------
    Sample 1 |  .2207358   .0758792   .1448567
    Sample 2 |  .2053074   .0904872   .1148202
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |        dD         dE         dC
-------------+---------------------------------
             | -.0154284    .014608  -.0300365
-----------------------------------------------

Decomposition of difference in differentials:
-----------------------------------------------
             |         D          E          C
-------------+---------------------------------
          dE |   .014608  -.0087573   .0233653
          dC | -.0300365  -.0105462  -.0194903
-----------------------------------------------
D = differential / difference in component of differential
E = part of D due to differences in endowments
C = part of D due to differences in coefficients

. matrix results = r(DD)'

. // benchmark: t=1995; reference: female
. smithwelch male_t0 female_t0 male_t1 female_t1, benchmark(1) reference(2)

Decompositions of individual differentials:
-----------------------------------------------
             |         D          E          C
-------------+---------------------------------
    Sample 1 |  .2207358   .0590409   .1616949
    Sample 2 |  .2053074   .0739469   .1313605
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |        dD         dE         dC
-------------+---------------------------------
             | -.0154284    .014906  -.0303344
-----------------------------------------------

Decomposition of difference in differentials:
-----------------------------------------------
             |         D          E          C
-------------+---------------------------------
          dE |   .014906  -.0154103   .0303163
          dC | -.0303344  -.0038931  -.0264413
-----------------------------------------------
D = differential / difference in component of differential
E = part of D due to differences in endowments
C = part of D due to differences in coefficients

. matrix results = results, r(DD)'

. // benchmark: t=2015; reference: male
. smithwelch male_t0 female_t0 male_t1 female_t1, benchmark(2) reference(1)

Decompositions of individual differentials:
-----------------------------------------------
             |         D          E          C
-------------+---------------------------------
    Sample 1 |  .2207358   .0758792   .1448567
    Sample 2 |  .2053074   .0904872   .1148202
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |        dD         dE         dC
-------------+---------------------------------
             | -.0154284    .014608  -.0300365
-----------------------------------------------

Decomposition of difference in differentials:
-----------------------------------------------
             |         D          E          C
-------------+---------------------------------
          dE |   .014608  -.0157749   .0303829
          dC | -.0300365  -.0003061  -.0297303
-----------------------------------------------
D = differential / difference in component of differential
E = part of D due to differences in endowments
C = part of D due to differences in coefficients

. matrix results = results, r(DD)'

. // benchmark: t=2015; reference: female
. smithwelch male_t0 female_t0 male_t1 female_t1, benchmark(2) reference(2)

Decompositions of individual differentials:
-----------------------------------------------
             |         D          E          C
-------------+---------------------------------
    Sample 1 |  .2207358   .0590409   .1616949
    Sample 2 |  .2053074   .0739469   .1313605
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |        dD         dE         dC
-------------+---------------------------------
             | -.0154284    .014906  -.0303344
-----------------------------------------------

Decomposition of difference in differentials:
-----------------------------------------------
             |         D          E          C
-------------+---------------------------------
          dE |   .014906  -.0175483   .0324543
          dC | -.0303344   .0014673  -.0318018
-----------------------------------------------
D = differential / difference in component of differential
E = part of D due to differences in endowments
C = part of D due to differences in coefficients

. matrix results = results, r(DD)'
First, note that the gender-wage gap is somewhat smaller in 2015 than in 1995 (0.205 vs. 0.221; difference of 0.015). Furthermore, have a look at the table labeled "Decompositions of individual differentials". The choice of the benchmark sample has no effect on these results because they are separate Oaxaca-Blinder decompositions by time point (1995 vs. 2015). The results, however, depend on whether the male coefficients or the female coefficients are used as reference coefficients. At both time points, the explained part is larger with the male coefficients than with the female coefficients (0.076 vs. 0.059 in 1995; 0.090 vs. 0.074 in 2015). This is mostly due to the fact that men have steeper wage profiles across work experience at both time points.
Now have a look at the other results. The overview is as follows (the matrix also contains results labeled as "EC" and "dEC", which are all zero; these components are interaction terms that are only relevant in three-fold decompositions, i.e. when the reference() and/or benchmark() options are omitted):
. matrix colnames results = "1995:male" "1995:female" "2015:male" "2015:female"

. matlist results

             |         1995         |         2015
             |      male     female |      male     female
-------------+----------------------+---------------------
dE           |                      |
           D |   .014608    .014906 |   .014608    .014906
           E | -.0087573  -.0154103 | -.0157749  -.0175483
           C |  .0233653   .0303163 |  .0303829   .0324543
          EC |         0          0 |         0          0
-------------+----------------------+---------------------
dC           |                      |
           D | -.0300365  -.0303344 | -.0300365  -.0303344
           E | -.0105462  -.0038931 | -.0003061   .0014673
           C | -.0194903  -.0264413 | -.0297303  -.0318018
          EC |         0          0 |         0          0
-------------+----------------------+---------------------
dEC          |                      |
           D |         0          0 |         0          0
           E |         0          0 |         0          0
           C |         0          0 |         0          0
          EC |         0          0 |         0          0
The rows "dE:D" and "dC:D" show the time differences in the explained part and in the unexplained part. A positive value means that the corresponding component is larger in 2015 than in 1995. We see that the "dE:D" component is positive, that is, the explained wage gap is larger in 2015 than in 1995. At the same time, the unexplained wage gap has become smaller in 2015 when compared to 1995 (negative "dC:D" component). Moreover, in terms of magnitude, the change in the unexplained part is more pronounced than the change in the explained part.
The exact breakdown depends on whether the male coefficients or the female coefficients are used as the reference coefficients, but the differences are very small (0.0146 vs. 0.0149 for the explained part; -0.0300 vs. -0.0303 for the unexplained part; the choice of the benchmark sample is irrelevant for these results). That is, although the choice of the reference coefficients is relevant for the division between the explained part and the unexplained part of the wage gap (see above), the time differences in these components remain stable. This is because the change from male to female reference coefficients has a similar effect at both time points.
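The adding-up structure of the overview table can be verified directly. The sketch below copies the point estimates from the table into two matrices (rows D, E, C; columns in the same order as in the table) and checks that E and C sum to D in each variant. The matrix names are chosen for illustration, and small deviations in the last digit are due to rounding in the printed output.

```stata
// Point estimates copied from the overview table (rows: D, E, C)
matrix chk_dE = ( .014608,   .014906,   .014608,   .014906  \ ///
                 -.0087573, -.0154103, -.0157749, -.0175483 \ ///
                  .0233653,  .0303163,  .0303829,  .0324543)
matrix chk_dC = (-.0300365, -.0303344, -.0300365, -.0303344 \ ///
                 -.0105462, -.0038931, -.0003061,  .0014673 \ ///
                 -.0194903, -.0264413, -.0297303, -.0318018)
forvalues j = 1/4 {
    // E + C - D should be zero up to rounding error
    display "variant `j': " %10.7f chk_dE[2,`j'] + chk_dE[3,`j'] - chk_dE[1,`j'] ///
        "  " %10.7f chk_dC[2,`j'] + chk_dC[3,`j'] - chk_dC[1,`j']
}
```

The columns also show that dE:D and dC:D do not depend on the benchmark, whereas their split into E and C does.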
The main terms of interest of the Smith-Welch decomposition are the terms "dE:E" and "dC:C". Term "dE:E" indicates how much of the difference in the explained part between 2015 and 1995 arises because men and women are more similar or more dissimilar to each other in terms of their wage predictors in 2015 than in 1995. We see that the value is negative, meaning that, in total, men and women became more similar over time. (But: As we know, women became "more similar" to men by surpassing them with respect to schooling, while at the same time losing ground with respect to full-time experience.) Term "dE:E" is a bit more pronounced if we use female coefficients as the reference coefficients. Since this component has a negative sign, we conclude that changes in the gender gap in wage predictors per se have led to a narrowing gender wage gap.
The other component of the difference in the explained part (term "dE:C") is positive because a given male–female gap in predictors is more relevant for the gender wage gap in 2015 than in 1995 (i.e., the gap in predictors in 2015 or 1995 is multiplied by larger coefficients). This is due to the fact that, for both men and women, returns to education increased and wage profiles across work experience got steeper between 1995 and 2015. Term "dE:C" is a bit less pronounced if we use male coefficients as the reference coefficients. Likewise, the term is a bit less pronounced when using 1995 as "benchmark". Term "dE:C" is larger in absolute value than "dE:E", leading to an overall increase in the explained part of the decomposition over time. That is, even though the gender gap in predictors decreased (negative "dE:E"), the explained part of the decomposition increased because the gap in predictors became more relevant due to increasing effects of the predictors.
Term "dC:C" indicates how much of the difference in the unexplained part between 2015 and 1995 is because the male–female difference in coefficients is more/less pronounced in 2015 than in 1995. As the term is negative, we conclude that between 1995 and 2015 men and women became more similar with respect to the gender-specific wage structure (i.e. the regression coefficients). We see that most of the decline in the unexplained part over time is due to this.
The other component of the difference in the unexplained part (term "dC:E") is also (mostly) negative. This implies that changes in male or female wage predictors would have narrowed the gender wage gap if the male–female differences in coefficients had stayed at their 1995 or 2015 level (depending on the benchmark). Choosing male coefficients as the reference as well as choosing 1995 as "benchmark" results in somewhat stronger effects for this component.
Compute bootstrap standard errors for the different decompositions.
The whole process (the estimation of the regression models and the subsequent decomposition) needs to be bootstrapped. That is, we need to write a little program that estimates all models, then applies smithwelch, and then posts the results in such a way that bootstrap can easily collect them. The program could look roughly as follows.
capt prog drop mysmithwelch
program mysmithwelch, eclass
    // weighted wage regressions by gender and year, stored under fixed names
    regress lnwage yeduc expft expft2 [pw=weight] if female==0 & t==0
    estimates store male_t0
    regress lnwage yeduc expft expft2 [pw=weight] if female==1 & t==0
    estimates store female_t0
    regress lnwage yeduc expft expft2 [pw=weight] if female==0 & t==1
    estimates store male_t1
    regress lnwage yeduc expft expft2 [pw=weight] if female==1 & t==1
    estimates store female_t1
    // decomposition; `0' passes the options given to mysmithwelch through
    smithwelch male_t0 female_t0 male_t1 female_t1 `0'
    // post the decomposition of the difference in differentials as e(b)
    matrix b = r(DD)
    ereturn post b
end
Remarks:
smithwelch stores its results in r(); see the help file for details. For bootstrap, however, it is easiest if the results to be bootstrapped are returned in vector e(b). Posting results in e() can be accomplished using command ereturn post; a prerequisite for using ereturn post is that the program is declared as an "eclass" program.
Only the results from the last table of the output (returned by smithwelch in matrix r(DD)) are included in the bootstrap, as these results are the main results of interest in the Smith-Welch decomposition. Of course, other results could be included by modifying the program.
Options specified with mysmithwelch will be passed through to smithwelch; this allows us to specify the benchmark() and reference() options.
We can now use the program to obtain bootstrap estimates for the different variants of the decomposition:
. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mysmithwelch, benchmark(1) reference(1)
(running mysmithwelch on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
dE           |
           D |    .014608   .0211848     0.69   0.490    -.0269134    .0561295
           E |  -.0087573   .0101853    -0.86   0.390    -.0287202    .0112056
           C |   .0233653   .0169194     1.38   0.167     -.009796    .0565267
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dC           |
           D |  -.0300365   .0330158    -0.91   0.363    -.0947463    .0346733
           E |  -.0105462   .0118447    -0.89   0.373    -.0337614    .0126691
           C |  -.0194903   .0355744    -0.55   0.584    -.0892148    .0502342
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dEC          |
           D |          0  (omitted)
           E |          0  (omitted)
           C |          0  (omitted)
          EC |          0  (omitted)
------------------------------------------------------------------------------

. estimates store b1995m

. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mysmithwelch, benchmark(1) reference(2)
(running mysmithwelch on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
dE           |
           D |    .014906   .0179688     0.83   0.407    -.0203122    .0501242
           E |  -.0154103   .0109453    -1.41   0.159    -.0368628    .0060421
           C |   .0303163   .0159724     1.90   0.058     -.000989    .0616217
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dC           |
           D |  -.0303344   .0284484    -1.07   0.286    -.0860923    .0254235
           E |  -.0038931   .0063029    -0.62   0.537    -.0162465    .0084602
           C |  -.0264413    .030142    -0.88   0.380    -.0855186     .032636
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dEC          |
           D |          0  (omitted)
           E |          0  (omitted)
           C |          0  (omitted)
          EC |          0  (omitted)
------------------------------------------------------------------------------

. estimates store b1995f

. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mysmithwelch, benchmark(2) reference(1)
(running mysmithwelch on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
dE           |
           D |    .014608   .0171974     0.85   0.396    -.0190983    .0483143
           E |  -.0157749   .0143642    -1.10   0.272    -.0439282    .0123784
           C |   .0303829   .0122791     2.47   0.013     .0063164    .0544494
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dC           |
           D |  -.0300365   .0265342    -1.13   0.258    -.0820425    .0219695
           E |  -.0003061   .0085659    -0.04   0.971     -.017095    .0164827
           C |  -.0297303   .0283896    -1.05   0.295     -.085373    .0259123
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dEC          |
           D |          0  (omitted)
           E |          0  (omitted)
           C |          0  (omitted)
          EC |          0  (omitted)
------------------------------------------------------------------------------

. estimates store b2015m

. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mysmithwelch, benchmark(2) reference(2)
(running mysmithwelch on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
dE           |
           D |    .014906   .0149276     1.00   0.318    -.0143515    .0441635
           E |  -.0175483   .0141851    -1.24   0.216    -.0453507     .010254
           C |   .0324543   .0133635     2.43   0.015     .0062623    .0586464
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dC           |
           D |  -.0303344   .0277915    -1.09   0.275    -.0848048    .0241359
           E |   .0014673   .0053881     0.27   0.785    -.0090932    .0120278
           C |  -.0318018   .0275465    -1.15   0.248    -.0857919    .0221884
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dEC          |
           D |          0  (omitted)
           E |          0  (omitted)
           C |          0  (omitted)
          EC |          0  (omitted)
------------------------------------------------------------------------------

. estimates store b2015f
Important remark: We applied option nowarn to bootstrap to suppress a warning message about the estimation sample not being marked. When using the bootstrap, one typically wants to restrict the resampling to observations that are actually used by the bootstrapped command (the "estimation sample"). This is why bootstrap checks whether the estimation sample has been marked (by looking at whether e(sample) is defined). We did not bother constructing the e(sample) in the example above because we know that all observations in the dataset will be used by mysmithwelch (remember that we dropped all observations with missing values at the beginning of the exercise). If there were observations in the dataset that are not used by mysmithwelch, it would be important either to drop these observations before applying the bootstrap or to modify mysmithwelch such that it identifies the estimation sample.
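A variant of the program that marks the estimation sample could look roughly as follows. This is only a sketch under assumptions: the program name mysmithwelch2 is hypothetical, and the sample is defined as all observations with nonmissing values on the variables used. It relies on the standard programming commands mark and markout and on the esample() option of ereturn post.

```stata
capture program drop mysmithwelch2   // hypothetical name, not part of the course materials
program mysmithwelch2, eclass
    tempvar touse
    tempname b
    // flag observations with complete data on all variables used below
    mark `touse'
    markout `touse' lnwage yeduc expft expft2 weight female t
    regress lnwage yeduc expft expft2 [pw=weight] if female==0 & t==0 & `touse'
    estimates store male_t0
    regress lnwage yeduc expft expft2 [pw=weight] if female==1 & t==0 & `touse'
    estimates store female_t0
    regress lnwage yeduc expft expft2 [pw=weight] if female==0 & t==1 & `touse'
    estimates store male_t1
    regress lnwage yeduc expft expft2 [pw=weight] if female==1 & t==1 & `touse'
    estimates store female_t1
    smithwelch male_t0 female_t0 male_t1 female_t1 `0'
    matrix `b' = r(DD)
    // post the results together with the marked estimation sample
    ereturn post `b', esample(`touse')
end
```

With e(sample) defined in this way, the nowarn option should no longer be needed when calling bootstrap.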
Overview table:
. esttab b1995m b1995f b2015m b2015f, mtitle nonum drop(dEC: :EC)

----------------------------------------------------------------------------
                   b1995m          b1995f          b2015m          b2015f
----------------------------------------------------------------------------
dE
D                  0.0146          0.0149          0.0146          0.0149
                   (0.69)          (0.83)          (0.85)          (1.00)

E                -0.00876         -0.0154         -0.0158         -0.0175
                  (-0.86)         (-1.41)         (-1.10)         (-1.24)

C                  0.0234          0.0303          0.0304*         0.0325*
                   (1.38)          (1.90)          (2.47)          (2.43)
----------------------------------------------------------------------------
dC
D                 -0.0300         -0.0303         -0.0300         -0.0303
                  (-0.91)         (-1.07)         (-1.13)         (-1.09)

E                 -0.0105        -0.00389       -0.000306         0.00147
                  (-0.89)         (-0.62)         (-0.04)          (0.27)

C                 -0.0195         -0.0264         -0.0297         -0.0318
                  (-0.55)         (-0.88)         (-1.05)         (-1.15)
----------------------------------------------------------------------------
N                    8071            8071            8071            8071
----------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
The overall change in the gender wage gap can be obtained from the bootstrap results as follows (we can use any of the variants as the point estimate for the overall change will be the same; the standard error will be slightly different depending on variant due to the random error of bootstrap estimation):
. estimates restore b1995m
(results b1995m are active now)

. lincom _b[dE:D] + _b[dC:D] + _b[dEC:D]

 ( 1)  [dE]D + [dC]D + [dEC]D = 0

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -.0154284   .0297243    -0.52   0.604    -.0736871    .0428302
------------------------------------------------------------------------------
Interpretation: Hardly any of the results are statistically significant. This is not very surprising, since even the small overall decrease in the gender wage gap over time is far from significant (although a non-significant overall change does not imply that the decomposition components cannot be significant; the different components may offset each other). If there were in fact systematic changes over time, they were likely small, and the sample at hand does not seem to have sufficient statistical power to detect them. Only for "dE:C" do we find some weak evidence of a systematic effect.
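The test statistic for the overall change can be retraced by hand from the lincom output, which makes explicit that the reported p-value is based on the normal approximation. The lines below are a small sketch using the point estimate and bootstrap standard error copied from the output; the scalar names are chosen for illustration.

```stata
// Overall change and its bootstrap standard error, copied from the lincom output
scalar est_total = -.0154284
scalar se_total  =  .0297243
// z statistic and two-sided p-value from the standard normal distribution
scalar z_total = est_total/se_total
scalar p_total = 2*normal(-abs(z_total))
display "z = " %5.2f z_total "   p = " %5.3f p_total
```

This reproduces the z statistic of -0.52 and the p-value of 0.604 shown above.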
Compute the “interventionist” decomposition proposed by Kröger and Hartmann (2021) (see the slides) using 1995 as the starting point and interpret the results. Optionally provide bootstrap standard errors.
Hint: You can try to use the xtoaxaca command provided by Kröger and Hartmann (type findit xtoaxaca). However, we find it more transparent to do this decomposition “by hand” using the formulas on the slides. A further alternative is to compute the “interventionist” decomposition from the results returned by smithwelch.
We put together a small program that computes all required regressions and also collects the means, and then computes the results using matrix multiplication. This way it is easy to apply bootstrap.
capt prog drop mydecomp
program mydecomp, eclass
    local y lnwage
    local x yeduc expft expft2
    local wgt [pw=weight]
    tempname bm0 bf0 bm1 bf1 Xm0 Xf0 Xm1 Xf1
    // males in 1995
    regress `y' `x' `wgt' if t==0 & female==0
    matrix `bm0' = e(b)'
    mean `x' `wgt' if e(sample)
    matrix `Xm0' = e(b), 1
    // females in 1995
    regress `y' `x' `wgt' if t==0 & female==1
    matrix `bf0' = e(b)'
    mean `x' `wgt' if e(sample)
    matrix `Xf0' = e(b), 1
    // males in 2015
    regress `y' `x' `wgt' if t==1 & female==0
    matrix `bm1' = e(b)'
    mean `x' `wgt' if e(sample)
    matrix `Xm1' = e(b), 1
    // females in 2015
    regress `y' `x' `wgt' if t==1 & female==1
    matrix `bf1' = e(b)'
    mean `x' `wgt' if e(sample)
    matrix `Xf1' = e(b), 1
    // decomposition
    tempname b
    matrix `b' = ///
        (`Xm1'-`Xm0')*`bm0' - (`Xf1'-`Xf0')*`bf0', ///
        `Xm0'*(`bm1'-`bm0') - `Xf0'*(`bf1'-`bf0'), ///
        (`Xm1'-`Xm0')*(`bm1'-`bm0') - (`Xf1'-`Xf0')*(`bf1'-`bf0')
    matrix `b' = `b', `b'[1,1] + `b'[1,2] + `b'[1,3]
    matrix colnames `b' = "Endowments" "Coefficients" "Interactions" "Total"
    ereturn post `b'
end
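Written out as a formula, the three matrix expressions in mydecomp correspond to the following decomposition of the change in the gap. The notation is an assumption of this sketch and may differ from the slides: $\bar{x}$ denotes the row vector of means including the constant (Xm0 etc. in the program), $\beta$ the coefficient vector (bm0 etc.), subscripts $m$ and $f$ the two genders, and 0 and 1 the years 1995 and 2015.

```latex
\begin{align*}
\Delta D &= \bigl(\bar{x}_{m1}\beta_{m1} - \bar{x}_{f1}\beta_{f1}\bigr)
          - \bigl(\bar{x}_{m0}\beta_{m0} - \bar{x}_{f0}\beta_{f0}\bigr) \\
&= \underbrace{(\bar{x}_{m1}-\bar{x}_{m0})\beta_{m0}
   - (\bar{x}_{f1}-\bar{x}_{f0})\beta_{f0}}_{\text{Endowments}} \\
&\quad + \underbrace{\bar{x}_{m0}(\beta_{m1}-\beta_{m0})
   - \bar{x}_{f0}(\beta_{f1}-\beta_{f0})}_{\text{Coefficients}} \\
&\quad + \underbrace{(\bar{x}_{m1}-\bar{x}_{m0})(\beta_{m1}-\beta_{m0})
   - (\bar{x}_{f1}-\bar{x}_{f0})(\beta_{f1}-\beta_{f0})}_{\text{Interactions}}
\end{align*}
```

Expanding the three terms shows that all cross products cancel except the four group-specific predictions, so the components add up to the total change by construction.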
We now apply bootstrap to the program (see above for remarks on the nowarn option).
. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mydecomp
(running mydecomp on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  Endowments |  -.0193035   .0138118    -1.40   0.162    -.0463741    .0077672
Coefficients |   .0006526   .0264038     0.02   0.980     -.051098    .0524031
Interactions |   .0032225   .0109917     0.29   0.769    -.0183208    .0247657
       Total |  -.0154284   .0292767    -0.53   0.598    -.0728097    .0419528
------------------------------------------------------------------------------
The results indicate that the gender wage gap would decrease if the gender-specific levels of schooling and work experience found in 1995 were adjusted to those found in 2015. Conversely, if we adjusted the gender-specific wage structure found in 1995 to the one found in 2015, the gender wage gap would slightly increase. However, none of these results is statistically significant.
Overall, the results seem to be at odds with the results from the Smith-Welch decomposition. However, note that the “interventionist” decomposition has a somewhat different perspective than the Smith-Welch decomposition. The starting point of the Smith-Welch decomposition is a pair of cross-sectional Oaxaca-Blinder decompositions, and the goal is to explain how the differences in the explained part and in the unexplained part of these decompositions come about (i.e. the difference in the explained part and the difference in the unexplained part are separately decomposed into a part due to differences in endowments and a part due to differences in coefficients). The starting point of the “interventionist” decomposition is the pair of overall gaps in the two cross-sections, and the goal is to show how much of the difference in the gaps is due to a different configuration of endowments and how much is due to a different configuration of coefficients.
Despite the different perspectives, looking at the formulas we see that the (threefold) Smith-Welch decomposition computes all components required for the “interventionist” decomposition. The sums of the columns in the last table of the Smith-Welch output are equal to the components of the “interventionist” decomposition. So we can simply use smithwelch to compute the “interventionist” decomposition:
. quietly svy: regress lnwage yeduc expft expft2 if female==0 & t==0

. estimates store male_t0

. quietly svy: regress lnwage yeduc expft expft2 if female==1 & t==0

. estimates store female_t0

. quietly svy: regress lnwage yeduc expft expft2 if female==0 & t==1

. estimates store male_t1

. quietly svy: regress lnwage yeduc expft expft2 if female==1 & t==1

. estimates store female_t1

. smithwelch male_t0 female_t0 male_t1 female_t1

Decompositions of individual differentials:
----------------------------------------------------------
             |         D          E          C         EC
-------------+--------------------------------------------
    Sample 1 |  .2207358   .0590409   .1448567   .0168382
    Sample 2 |  .2053074   .0739469   .1148202   .0165403
----------------------------------------------------------

Difference in (components of) differentials:
----------------------------------------------------------
             |        dD         dE         dC        dEC
-------------+--------------------------------------------
             | -.0154284    .014906  -.0300365   -.000298
----------------------------------------------------------

Decomposition of difference in differentials:
----------------------------------------------------------
             |         D          E          C         EC
-------------+--------------------------------------------
          dE |   .014906  -.0154103   .0324543   -.002138
          dC | -.0300365  -.0105462  -.0297303     .01024
         dEC |  -.000298    .006653  -.0020714  -.0048796
----------------------------------------------------------
D = differential / difference in component of differential
E = part of D due to differences in endowments
C = part of D due to differences in coefficients
EC = interaction E x C

. matrix DD = r(DD)

. display "Endowments = " DD[1,"dE:E"] + DD[1,"dC:E"] + DD[1,"dEC:E"]
Endowments = -.01930346

. display "Coefficients = " DD[1,"dE:C"] + DD[1,"dC:C"] + DD[1,"dEC:C"]
Coefficients = .00065257

. display "Interactions = " DD[1,"dE:EC"] + DD[1,"dC:EC"] + DD[1,"dEC:EC"]
Interactions = .00322246

. display "Total = " DD[1,"dE:D"] + DD[1,"dC:D"] + DD[1,"dEC:D"]
Total = -.01542843
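Why the column sums coincide with the “interventionist” components can be seen for the E column as follows. The sketch assumes that the three-fold terms are defined from the female viewpoint, i.e. $E_t = \Delta\bar{x}_t\beta_{ft}$, $C_t = \bar{x}_{ft}\Delta\beta_t$ and $EC_t = \Delta\bar{x}_t\Delta\beta_t$, with $\Delta\bar{x}_t = \bar{x}_{mt}-\bar{x}_{ft}$, $\Delta\beta_t = \beta_{mt}-\beta_{ft}$ and subscripts 0 and 1 for 1995 and 2015. This reading is inferred from the reported numbers rather than taken from the help file, and the notation may differ from the slides.

```latex
\begin{align*}
\text{dE:E} + \text{dC:E} + \text{dEC:E}
&= (\Delta\bar{x}_1-\Delta\bar{x}_0)\beta_{f0}
 + (\bar{x}_{f1}-\bar{x}_{f0})\Delta\beta_0
 + (\Delta\bar{x}_1-\Delta\bar{x}_0)\Delta\beta_0 \\
&= (\Delta\bar{x}_1-\Delta\bar{x}_0)\beta_{m0}
 + (\bar{x}_{f1}-\bar{x}_{f0})(\beta_{m0}-\beta_{f0}) \\
&= (\bar{x}_{m1}-\bar{x}_{m0})\beta_{m0}
 - (\bar{x}_{f1}-\bar{x}_{f0})\beta_{f0}
\end{align*}
```

The last line is the Endowments component of the “interventionist” decomposition; analogous steps apply to the C and EC columns.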
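We could also specify option reference(1) or reference(2) to reduce the number of terms that have to be added up. With reference(1), the interaction term of the individual decompositions is absorbed into E; with reference(2), it is absorbed into C. In both cases the dEC row disappears. Individual cells of the tables change, but the aggregated results stay the same: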
. smithwelch male_t0 female_t0 male_t1 female_t1, reference(1)

Decompositions of individual differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
    Sample 1 |   .2207358   .0758792   .1448567
    Sample 2 |   .2053074   .0904872   .1148202
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |         dD         dE         dC
-------------+---------------------------------
             |  -.0154284    .014608  -.0300365
-----------------------------------------------

Decomposition of difference in differentials:
----------------------------------------------------------
             |          D          E          C         EC
-------------+--------------------------------------------
          dE |    .014608  -.0087573   .0303829  -.0070176
          dC |  -.0300365  -.0105462  -.0297303     .01024
----------------------------------------------------------
D  = differential / difference in component of differential
E  = part of D due to differences in endowments
C  = part of D due to differences in coefficients
EC = interaction E x C

. matrix DD = r(DD)
. display "Endowments = " DD[1,"dE:E"] + DD[1,"dC:E"]
Endowments = -.01930346
. display "Coefficients = " DD[1,"dE:C"] + DD[1,"dC:C"]
Coefficients = .00065257
. display "Interactions = " DD[1,"dE:EC"] + DD[1,"dC:EC"]
Interactions = .00322246
. display "Total = " DD[1,"dE:D"] + DD[1,"dC:D"]
Total = -.01542843

. smithwelch male_t0 female_t0 male_t1 female_t1, reference(2)

Decompositions of individual differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
    Sample 1 |   .2207358   .0590409   .1616949
    Sample 2 |   .2053074   .0739469   .1313605
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |         dD         dE         dC
-------------+---------------------------------
             |  -.0154284    .014906  -.0303344
-----------------------------------------------

Decomposition of difference in differentials:
----------------------------------------------------------
             |          D          E          C         EC
-------------+--------------------------------------------
          dE |    .014906  -.0154103   .0324543   -.002138
          dC |  -.0303344  -.0038931  -.0318018   .0053605
----------------------------------------------------------
D  = differential / difference in component of differential
E  = part of D due to differences in endowments
C  = part of D due to differences in coefficients
EC = interaction E x C

. matrix DD = r(DD)
. display "Endowments = " DD[1,"dE:E"] + DD[1,"dC:E"]
Endowments = -.01930346
. display "Coefficients = " DD[1,"dE:C"] + DD[1,"dC:C"]
Coefficients = .00065257
. display "Interactions = " DD[1,"dE:EC"] + DD[1,"dC:EC"]
Interactions = .00322246
. display "Total = " DD[1,"dE:D"] + DD[1,"dC:D"]
Total = -.01542843
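The invariance of the aggregated terms can also be checked by hand from the displayed cells. The sketch below sums the E and C columns of the "Decomposition of difference in differentials" table for the default run, reference(1), and reference(2). All numbers are copied from the three logs above; the scalar names are hypothetical and serve only this illustration.

```stata
* Cells copied from the three smithwelch logs above (not re-estimated).
* Scalar names (chk_E_*, chk_C_*) are made up for this check.

* Endowment column (E): default = dE + dC + dEC, otherwise dE + dC
scalar chk_E_def  = -.0154103 - .0105462 + .006653
scalar chk_E_ref1 = -.0087573 - .0105462
scalar chk_E_ref2 = -.0154103 - .0038931

* Coefficient column (C), same logic
scalar chk_C_def  = .0324543 - .0297303 - .0020714
scalar chk_C_ref1 = .0303829 - .0297303
scalar chk_C_ref2 = .0324543 - .0318018

display "Endowments:   " chk_E_def "  " chk_E_ref1 "  " chk_E_ref2
display "Coefficients: " chk_C_def "  " chk_C_ref1 "  " chk_C_ref2
```

Within each line the three values should coincide up to rounding in the last displayed digit, even though the individual dE and dC cells differ across the three runs.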
Using xtoaxaca: Kröger and Hartmann (2021) provide the command xtoaxaca, which can be used to compute the “interventionist” decomposition (as well as other types of difference-in-differences decompositions). The procedure is to first estimate a single model on all data that includes all interactions with the two grouping variables (in our case, this is equivalent to estimating four separate models). The model is then stored, and xtoaxaca is applied to the stored model. The results from above can be reproduced using xtoaxaca as follows:
. quietly regress lnwage c.(yeduc expft expft2)##i.female##i.t [pw=weight]
. estimates store m1
. xtoaxaca yeduc expft expft2, groupvar(female) groupcat(0 1) ///
>     timevar(t) times(0 1) timeref(0) change(interventionist) model(m1) ///
>     weights(weight)

WARNING: This is a beta version. Please check the results carefully
and report bugs and suggestions to hkroeger@diw.de

Decomposition of Levels

Summary of level decomposition
--------------------------------------
                            t
                        0          1
--------------------------------------
Level
  non-parame~c      0.214      0.232
--------------------------------------
Decomp
  Endowments        0.059      0.074
  Coefficients      0.145      0.115
  Interaction       0.017      0.017
  Total             0.221      0.205
--------------------------------------
Decomp %
  Endowments       26.747     36.018
  Coefficients     65.624     55.926
  Interaction       7.628      8.056
  Total           100.000    100.000
--------------------------------------

Decomposition of Change

Summary of changes in the outcome
--------------------------------------
                            t
                        0          1
--------------------------------------
Change
  non-parame~c      0.000      0.017
--------------------------------------
Decomp
  Endowments        0.000     -0.019
  Coefficients      0.000      0.001
  Interactions      0.000      0.003
  Total             0.000     -0.015
--------------------------------------
Decomp %
  Endowments            .    125.116
  Coefficients          .     -4.230
  Interactions          .    -20.887
  Total                 .    100.000
--------------------------------------

For an explanation of this change decomposition, please see:
Kröger, H., & Hartmann, J. (2020). xtoaxaca - Extending the Kitagawa-Oaxaca-Blinder Decomposition Approach to Panel Data. https://doi.org/10.31235/osf.io/egj79
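To link the two commands, the "Decomp %" rows of the change table can be recomputed from the aggregated smithwelch terms. The sketch below assumes that xtoaxaca reports each percentage as the component divided by the total change, times 100; that is an assumption inferred from the numbers, not taken from the help file. The inputs are the sums displayed in the smithwelch logs above, and the scalar names are hypothetical.

```stata
* Aggregated terms copied from the smithwelch logs above (not re-estimated).
* Scalar names (sw_*, pct_*) are made up for this check.
scalar sw_endow = -.01930346
scalar sw_coef  =  .00065257
scalar sw_inter =  .00322246
scalar sw_total = -.01542843

* Assumed definition: share of each component in the total change, in percent
scalar pct_endow = 100 * sw_endow / sw_total
scalar pct_coef  = 100 * sw_coef  / sw_total
scalar pct_inter = 100 * sw_inter / sw_total

display "Endowments   % = " %9.3f pct_endow
display "Coefficients % = " %9.3f pct_coef
display "Interactions % = " %9.3f pct_inter
```

Under that assumption, the three values should be close to the 125.116, -4.230, and -20.887 reported by xtoaxaca. Because the total change is negative and the coefficient and interaction terms are positive, those two terms enter with negative percentages, while the endowment term exceeds 100 percent.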