Decomposition Methods in the Social Sciences

Solutions to Exercise 5: Difference-in-differences decomposition

Johannes Giesecke and Ben Jann, GESIS Training Course, January 29 – February 1, 2024

Required packages: estout, oaxaca, smithwelch, xtoaxaca

Set the seed of the random number generator for the sake of reproducibility:

. set seed 432987

Task 1: Smith-Welch decomposition

Repeat the Smith-Welch example analysis from the slides (5-did.pdf) and evaluate how changing the reference and “benchmark” estimates (as they are called in the help file) affects the results. Try to provide clear interpretations of the different elements of the output and explain how the interpretations change depending on the choice of the reference and benchmark estimates.

Data preparation as on the slides:

. use gsoep-extract, clear
(Example data based on the German Socio-Economic Panel)

. keep if inlist(wave,1995,2015)
(23,792 observations deleted)

. keep if inrange(age, 25, 55)
(8,147 observations deleted)

. generate lnwage = ln(wage)
(2,734 missing values generated)

. generate expft2 = expft^2
(56 missing values generated)

. generate byte t = wave==2015   // 0 = 1995, 1 = 2015

. generate byte female = sex==2  // 0 = male, 1 = female

. summarize lnwage yeduc expft expft2 t female

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      lnwage |      8,277     2.71872     .483595   1.108563    4.86638
       yeduc |     10,735    12.04332    2.700811          7         18
       expft |     10,955    12.21353    9.640926          0       40.5
      expft2 |     10,955    242.1094    305.9972          0    1640.25
           t |     11,011    .6637908    .4724329          0          1
-------------+---------------------------------------------------------
      female |     11,011    .5466352     .497843          0          1

. drop if missing(lnwage,yeduc,expft) // remove unused observations
(2,940 observations deleted)

. svyset psu [pw=weight], strata(strata)

Sampling weights: weight
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

The benchmark() option selects the sample from which the reference coefficients should be taken, i.e. either from the 1995 sample or from the 2015 sample. In the example on the slides, benchmark() was set such that the reference coefficients are taken from the 1995 sample. Furthermore, the reference() option sets whether the reference coefficients are taken from males or from females. In the example on the slides, reference() was set such that male coefficients are used.

Results depend on these settings, just as the results of the standard Oaxaca-Blinder (OB) decomposition depend on the choice of the reference coefficients. We will now compute results for all four combinations (benchmark: 1995 vs. 2015; reference: male vs. female).
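To make the role of the reference coefficients concrete, here is a minimal Python sketch of a generic two-fold OB decomposition. It is not the smithwelch implementation. The coefficients are the 1995 male and female estimates from the regressions below, but the group means are hypothetical values made up for illustration, so the numbers do not match the Stata output.

```python
# Generic two-fold Oaxaca-Blinder decomposition (illustrative sketch only).
# Variable order: yeduc, expft, expft2, _cons


def dot(x, b):
    """Inner product of a vector of means and a coefficient vector."""
    return sum(xi * bi for xi, bi in zip(x, b))


def twofold(x_m, x_f, b_m, b_f, b_ref):
    """Split the gap D into an explained part E and an unexplained part C.

    E = (x_m - x_f)' b_ref   (differences in endowments valued at b_ref)
    C = D - E                (part due to differences in coefficients)
    """
    d = dot(x_m, b_m) - dot(x_f, b_f)
    e = dot([xm - xf for xm, xf in zip(x_m, x_f)], b_ref)
    return d, e, d - e


# 1995 coefficients from the male and female regressions
b_male = [0.0566528, 0.0317598, -0.0006142, 1.855365]
b_female = [0.0585973, 0.0206192, -0.0003731, 1.768793]

# HYPOTHETICAL group means (not taken from the data); last entry = constant
x_male = [12.0, 18.0, 450.0, 1.0]
x_female = [11.5, 10.0, 180.0, 1.0]

for label, b_ref in [("male", b_male), ("female", b_female)]:
    d, e, c = twofold(x_male, x_female, b_male, b_female, b_ref)
    print(f"reference={label}: D={d:.4f} E={e:.4f} C={c:.4f}")
```

The gap D is the same in both runs; only its division into E and C moves with the reference coefficients.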

First we need to estimate the four regression models:

. // t=1995
. svy: regress lnwage yeduc expft expft2 if female==0 & t==0
(running regress on estimation sample)

Survey: Linear regression

Number of strata =   4                             Number of obs   =     1,486
Number of PSUs   = 711                             Population size = 6,811,820
                                                   Design df       =       707
                                                   F(3, 705)       =     39.19
                                                   Prob > F        =    0.0000
                                                   R-squared       =    0.1454

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .0566528    .005958     9.51   0.000     .0449553    .0683502
       expft |   .0317598   .0056468     5.62   0.000     .0206733    .0428464
      expft2 |  -.0006142   .0001503    -4.09   0.000    -.0009093    -.000319
       _cons |   1.855365   .0898357    20.65   0.000     1.678989    2.031742
------------------------------------------------------------------------------

. estimates store male_t0

. svy: regress lnwage yeduc expft expft2 if female==1 & t==0
(running regress on estimation sample)

Survey: Linear regression

Number of strata =   4                             Number of obs   =     1,123
Number of PSUs   = 618                             Population size = 4,895,550
                                                   Design df       =       614
                                                   F(3, 612)       =     39.54
                                                   Prob > F        =    0.0000
                                                   R-squared       =    0.1227

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .0585973   .0060868     9.63   0.000     .0466438    .0705509
       expft |   .0206192   .0066341     3.11   0.002      .007591    .0336475
      expft2 |  -.0003731   .0001929    -1.93   0.053    -.0007519    5.64e-06
       _cons |   1.768793    .085835    20.61   0.000     1.600227    1.937359
------------------------------------------------------------------------------

. estimates store female_t0

. // t=2015
. svy: regress lnwage yeduc expft expft2 if female==0 & t==1
(running regress on estimation sample)

Survey: Linear regression

Number of strata =    15                           Number of obs   =     2,642
Number of PSUs   = 1,536                           Population size = 6,310,318
                                                   Design df       =     1,521
                                                   F(3, 1519)      =     80.48
                                                   Prob > F        =    0.0000
                                                   R-squared       =    0.2803

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .0829495    .005573    14.88   0.000      .072018     .093881
       expft |    .035725   .0059186     6.04   0.000     .0241155    .0473345
      expft2 |  -.0005933   .0001565    -3.79   0.000    -.0009002   -.0002863
       _cons |   1.429661   .0955876    14.96   0.000     1.242164    1.617159
------------------------------------------------------------------------------

. estimates store male_t1

. svy: regress lnwage yeduc expft expft2 if female==1 & t==1
(running regress on estimation sample)

Survey: Linear regression

Number of strata =    15                           Number of obs   =     2,820
Number of PSUs   = 1,658                           Population size = 5,841,900
                                                   Design df       =     1,643
                                                   F(3, 1641)      =    110.35
                                                   Prob > F        =    0.0000
                                                   R-squared       =    0.2772

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .0789256   .0052929    14.91   0.000     .0685439    .0893072
       expft |   .0313377   .0049286     6.36   0.000     .0216707    .0410046
      expft2 |  -.0005405   .0001472    -3.67   0.000    -.0008292   -.0002518
       _cons |   1.403916   .0741614    18.93   0.000     1.258456    1.549377
------------------------------------------------------------------------------

. estimates store female_t1

We can now run the four variants of the decomposition. For easy comparison, we also store the results in a matrix.

. capture matrix drop results

. // benchmark: t=1995; reference: male
. smithwelch male_t0 female_t0 male_t1 female_t1, benchmark(1) reference(1)

Decompositions of individual differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
    Sample 1 |   .2207358   .0758792   .1448567
    Sample 2 |   .2053074   .0904872   .1148202
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |         dD         dE         dC
-------------+---------------------------------
             |  -.0154284    .014608  -.0300365
-----------------------------------------------

Decomposition of difference in differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
          dE |    .014608  -.0087573   .0233653
          dC |  -.0300365  -.0105462  -.0194903
-----------------------------------------------

D  = differential / difference in component of differential
E  = part of D due to differences in endowments
C  = part of D due to differences in coefficients

. matrix results = r(DD)'

. // benchmark: t=1995; reference: female
. smithwelch male_t0 female_t0 male_t1 female_t1, benchmark(1) reference(2)

Decompositions of individual differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
    Sample 1 |   .2207358   .0590409   .1616949
    Sample 2 |   .2053074   .0739469   .1313605
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |         dD         dE         dC
-------------+---------------------------------
             |  -.0154284    .014906  -.0303344
-----------------------------------------------

Decomposition of difference in differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
          dE |    .014906  -.0154103   .0303163
          dC |  -.0303344  -.0038931  -.0264413
-----------------------------------------------

D  = differential / difference in component of differential
E  = part of D due to differences in endowments
C  = part of D due to differences in coefficients

. matrix results = results, r(DD)'

. // benchmark: t=2015; reference: male
. smithwelch male_t0 female_t0 male_t1 female_t1, benchmark(2) reference(1)

Decompositions of individual differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
    Sample 1 |   .2207358   .0758792   .1448567
    Sample 2 |   .2053074   .0904872   .1148202
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |         dD         dE         dC
-------------+---------------------------------
             |  -.0154284    .014608  -.0300365
-----------------------------------------------

Decomposition of difference in differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
          dE |    .014608  -.0157749   .0303829
          dC |  -.0300365  -.0003061  -.0297303
-----------------------------------------------

D  = differential / difference in component of differential
E  = part of D due to differences in endowments
C  = part of D due to differences in coefficients

. matrix results = results, r(DD)'

. // benchmark: t=2015; reference: female
. smithwelch male_t0 female_t0 male_t1 female_t1, benchmark(2) reference(2)

Decompositions of individual differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
    Sample 1 |   .2207358   .0590409   .1616949
    Sample 2 |   .2053074   .0739469   .1313605
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |         dD         dE         dC
-------------+---------------------------------
             |  -.0154284    .014906  -.0303344
-----------------------------------------------

Decomposition of difference in differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
          dE |    .014906  -.0175483   .0324543
          dC |  -.0303344   .0014673  -.0318018
-----------------------------------------------

D  = differential / difference in component of differential
E  = part of D due to differences in endowments
C  = part of D due to differences in coefficients

. matrix results = results, r(DD)'

First, note that the gender wage gap is somewhat smaller in 2015 than in 1995 (0.205 vs. 0.221, a difference of 0.015). Next, look at the table labeled "Decompositions of individual differentials". The choice of the benchmark sample has no effect on these results because they are separate Oaxaca-Blinder decompositions for each time point (1995 and 2015). The results do, however, depend on whether the male or the female coefficients are used as reference coefficients. At both time points, the explained part is larger with the male coefficients than with the female coefficients (0.076 vs. 0.059 in 1995; 0.090 vs. 0.074 in 2015). This is mostly because men have steeper wage profiles across work experience at both time points.
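A quick Python check, with the rounded values copied from the "Decompositions of individual differentials" tables above, confirms that at each time point E + C reproduces D for either reference. The reference therefore only moves the boundary between the explained and the unexplained part; it does not change the gap itself.

```python
# (D, E, C) from the smithwelch output: "Sample 1" = 1995, "Sample 2" = 2015
individual = {
    ("1995", "male"): (0.2207358, 0.0758792, 0.1448567),
    ("1995", "female"): (0.2207358, 0.0590409, 0.1616949),
    ("2015", "male"): (0.2053074, 0.0904872, 0.1148202),
    ("2015", "female"): (0.2053074, 0.0739469, 0.1313605),
}

for (year, ref), (d, e, c) in individual.items():
    # explained + unexplained = gap, up to rounding of the displayed output
    assert abs(d - (e + c)) < 1e-6
    print(f"{year}, reference={ref}: explained share = {e / d:.1%}")
```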

Now have a look at the other results. The overview is as follows (the matrix also contains results labeled as "EC" and "dEC", which are all zero; these components are interaction terms that are only relevant in three-fold decompositions, i.e. when the reference() and/or benchmark() options are omitted):

. matrix colnames results = "1995:male" "1995:female" "2015:male" "2015:female"

. matlist results

             | 1995                 | 2015                
             |      male     female |      male     female 
-------------+----------------------+---------------------
dE           |                      |                     
           D |   .014608    .014906 |   .014608    .014906 
           E | -.0087573  -.0154103 | -.0157749  -.0175483 
           C |  .0233653   .0303163 |  .0303829   .0324543 
          EC |         0          0 |         0          0 
-------------+----------------------+---------------------
dC           |                      |                     
           D | -.0300365  -.0303344 | -.0300365  -.0303344 
           E | -.0105462  -.0038931 | -.0003061   .0014673 
           C | -.0194903  -.0264413 | -.0297303  -.0318018 
          EC |         0          0 |         0          0 
-------------+----------------------+---------------------
dEC          |                      |                     
           D |         0          0 |         0          0 
           E |         0          0 |         0          0 
           C |         0          0 |         0          0 
          EC |         0          0 |         0          0 

The rows "dE:D" and "dC:D" show the time differences in the explained part and in the unexplained part. A positive value means that the corresponding component is larger in 2015 than in 1995. We see that the "dE:D" component is positive, that is, the explained wage gap is larger in 2015 than in 1995. At the same time, the unexplained wage gap has become smaller in 2015 when compared to 1995 (negative "dC:D" component). Moreover, in terms of magnitude, the change in the unexplained part is more pronounced than the change in the explained part.

The exact breakdown depends on whether the male or the female coefficients are used as the reference coefficients, but the differences are very small (0.0146 vs. 0.0149 for the explained part; -0.0300 vs. -0.0303 for the unexplained part; the choice of the benchmark sample is irrelevant for these results). That is, although the choice of the reference coefficients matters for how the wage gap is divided into an explained and an unexplained part (see above), the changes over time in these components are stable. This is because switching from male to female reference coefficients has a similar effect at both time points.
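The following Python lines reproduce the "dE" and "dC" differences from the rounded individual decompositions above. They also show why the reference barely matters for these differences: switching to female reference coefficients lowers the explained part by about 0.017 in both years, so the shift largely cancels out in the change over time.

```python
# (E, C) by year and reference, copied from the smithwelch output above
E = {("1995", "male"): 0.0758792, ("1995", "female"): 0.0590409,
     ("2015", "male"): 0.0904872, ("2015", "female"): 0.0739469}
C = {("1995", "male"): 0.1448567, ("1995", "female"): 0.1616949,
     ("2015", "male"): 0.1148202, ("2015", "female"): 0.1313605}

for ref in ("male", "female"):
    dE = E["2015", ref] - E["1995", ref]  # change in the explained part
    dC = C["2015", ref] - C["1995", ref]  # change in the unexplained part
    print(f"reference={ref}: dE={dE:.7f} dC={dC:.7f}")

# Drop in E when switching from male to female reference coefficients
shift_1995 = E["1995", "male"] - E["1995", "female"]
shift_2015 = E["2015", "male"] - E["2015", "female"]
print(f"shift in E: 1995={shift_1995:.4f}, 2015={shift_2015:.4f}")
```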

The main terms of interest of the Smith-Welch decomposition are "dE:E" and "dC:C". Term "dE:E" indicates how much of the change in the explained part between 1995 and 2015 is due to men and women becoming more similar or more dissimilar to each other in terms of their wage predictors. The value is negative, meaning that, overall, men and women became more similar over time. (But, as we know, women became "more similar" to men by surpassing them in schooling while at the same time losing ground in full-time experience.) Term "dE:E" is somewhat more pronounced if we use the female coefficients as the reference coefficients, and also if we use 2015 as the benchmark. Since this component is negative, we conclude that the change in the gender gap in wage predictors per se has narrowed the gender wage gap.

The other component of the change in the explained part (term "dE:C") is positive because a given male–female gap in predictors matters more for the gender wage gap in 2015 than in 1995 (i.e., the gap in predictors, whether measured in 1995 or 2015, is multiplied by larger coefficients). This is because, for both men and women, the returns to education increased and wage profiles across work experience became steeper between 1995 and 2015. Term "dE:C" is somewhat less pronounced if we use the male coefficients as the reference coefficients, and likewise if we use 1995 as the benchmark. In absolute value, "dE:C" is larger than "dE:E", so the explained part increases overall. That is, even though the gender gap in predictors decreased (negative "dE:E"), the explained part of the wage gap increased because the gap in predictors became more consequential as the effects of the predictors grew.
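Assuming the standard two-fold setup, both "dE" and "dC" are changes over time in a product of two vectors. For "dE", this is the male–female gap in predictors times the reference coefficients. For "dC", it is the predictor means times the male–female gap in coefficients. Any such change can be split into a part due to the change in the first factor (the "E" column) and a part due to the change in the second factor (the "C" column), with the benchmark deciding which year's values are held fixed. The Python sketch below shows this generic identity with hypothetical numbers. That it mirrors the exact formulas used by smithwelch is an assumption (see help smithwelch). The sketch then checks, using the matlist output above, that "E" and "C" add up to "D" in all four variants.

```python
def dot(x, b):
    """Inner product of two equally long lists."""
    return sum(xi * bi for xi, bi in zip(x, b))


def split_change(a0, a1, b0, b1, benchmark):
    """Split a1'b1 - a0'b0 into an "E" part (change in a) and a "C" part
    (change in b).

    benchmark=0: change in a valued at b0, change in b weighted by a1
    benchmark=1: change in a valued at b1, change in b weighted by a0
    """
    da = [x1 - x0 for x0, x1 in zip(a0, a1)]
    db = [c1 - c0 for c0, c1 in zip(b0, b1)]
    if benchmark == 0:
        return dot(da, b0), dot(a1, db)
    return dot(da, b1), dot(a0, db)


# HYPOTHETICAL gaps in two predictors and coefficients at t=0 and t=1
a0, a1 = [2.0, 8.0], [-0.5, 9.0]
b0, b1 = [0.05, 0.03], [0.08, 0.035]
total = dot(a1, b1) - dot(a0, b0)
for bm in (0, 1):
    e_part, c_part = split_change(a0, a1, b0, b1, bm)
    print(f"benchmark={bm}: E={e_part:.3f}, C={c_part:.3f}, D={total:.3f}")

# (D, E, C) for the rows "dE" and "dC" of the matlist output above
variants = {
    "1995:male": [(0.014608, -0.0087573, 0.0233653),
                  (-0.0300365, -0.0105462, -0.0194903)],
    "1995:female": [(0.014906, -0.0154103, 0.0303163),
                    (-0.0303344, -0.0038931, -0.0264413)],
    "2015:male": [(0.014608, -0.0157749, 0.0303829),
                  (-0.0300365, -0.0003061, -0.0297303)],
    "2015:female": [(0.014906, -0.0175483, 0.0324543),
                    (-0.0303344, 0.0014673, -0.0318018)],
}
for name, rows in variants.items():
    for d, e, c in rows:
        assert abs(d - (e + c)) < 1e-6, name  # equal up to rounding
```

Both benchmarks give the same total; only its split between the "E" and "C" parts changes.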

Term "dC:C" indicates how much of the change in the unexplained part between 1995 and 2015 is due to the male–female difference in coefficients being more or less pronounced in 2015 than in 1995. As the term is negative, we conclude that between 1995 and 2015 men and women became more similar with respect to their wage structure (i.e., the regression coefficients). Most of the decline in the unexplained part over time is due to this component.

The other component of the change in the unexplained part (term "dC:E") is also negative in three of the four variants. This implies that the changes in male and female wage predictors would have narrowed the gender wage gap if the male–female differences in coefficients had stayed at their 1995 (or 2015) level. Choosing the male coefficients as the reference, as well as choosing 1995 as the benchmark, results in somewhat stronger effects for this component.

Task 2: bootstrap standard errors

Compute bootstrap standard errors for the different decompositions.

The whole process (estimating the regression models and then computing the decomposition) needs to be bootstrapped. That is, we need to write a small program that estimates all models, applies smithwelch, and then posts the results in e(b) so that bootstrap can collect them. The program could look roughly as follows:

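capt prog drop mysmithwelch
program mysmithwelch, eclass
    // group-specific wage regressions; pweights give the point estimates,
    // while clustering and stratification are handled by bootstrap itself
    regress lnwage yeduc expft expft2 [pw=weight] if female==0 & t==0
    estimates store male_t0
    regress lnwage yeduc expft expft2 [pw=weight] if female==1 & t==0
    estimates store female_t0
    regress lnwage yeduc expft expft2 [pw=weight] if female==0 & t==1
    estimates store male_t1
    regress lnwage yeduc expft expft2 [pw=weight] if female==1 & t==1
    estimates store female_t1
    // Smith-Welch decomposition; the options typed after the program name
    // (stored in local macro 0) are passed on to smithwelch
    smithwelch male_t0 female_t0 male_t1 female_t1 `0'
    // post the results vector in e(b) so that bootstrap can collect it
    matrix b = r(DD)
    ereturn post b
end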
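For intuition about what bootstrap, cluster(psu) strata(strata) does with the data, here is a minimal Python sketch of one resampling step. PSUs are drawn with replacement independently within each stratum, and every drawn PSU enters with all its observations. The row layout, the function name and the boot_cluster tag are hypothetical, and details of Stata's own implementation may differ.

```python
import random
from collections import defaultdict


def stratified_cluster_resample(rows, rng):
    """Draw one bootstrap sample of PSUs, independently within each stratum.

    rows: list of dicts with keys "strata" and "psu" (plus any data).
    In each stratum, as many PSUs are drawn with replacement as the stratum
    contains; every drawn PSU contributes all of its rows.
    """
    psus_by_stratum = defaultdict(list)
    rows_by_psu = defaultdict(list)
    for row in rows:
        key = (row["strata"], row["psu"])
        if key not in rows_by_psu:  # first time this PSU is seen
            psus_by_stratum[row["strata"]].append(row["psu"])
        rows_by_psu[key].append(row)

    sample = []
    for stratum, psus in psus_by_stratum.items():
        drawn = rng.choices(psus, k=len(psus))  # with replacement
        for draw_id, psu in enumerate(drawn):
            for row in rows_by_psu[(stratum, psu)]:
                # a PSU drawn twice enters as two distinct clusters
                sample.append({**row, "boot_cluster": (stratum, draw_id)})
    return sample


# HYPOTHETICAL toy data: stratum 1 has 2 PSUs, stratum 2 has 3 PSUs
layout = [(1, "a"), (1, "a"), (1, "b"), (2, "c"), (2, "d"), (2, "e")]
data = [{"strata": s, "psu": p, "y": i} for i, (s, p) in enumerate(layout)]
rng = random.Random(432987)  # Python's RNG, not Stata's
boot = stratified_cluster_resample(data, rng)
print(len(boot), sorted({r["boot_cluster"] for r in boot}))
```

mysmithwelch re-estimates all four regressions and the decomposition on each such sample, and the spread of the results across replications yields the bootstrap standard errors. In Stata, bootstrap's idcluster() option creates distinct cluster identifiers like boot_cluster when the bootstrapped command needs them. mysmithwelch does not need them because it only computes point estimates.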

Remarks:

We can now use the program to obtain bootstrap estimates for the different variants of the decomposition:

. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mysmithwelch, benchmark(1) reference(1)
(running mysmithwelch on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
dE           |
           D |    .014608   .0211848     0.69   0.490    -.0269134    .0561295
           E |  -.0087573   .0101853    -0.86   0.390    -.0287202    .0112056
           C |   .0233653   .0169194     1.38   0.167     -.009796    .0565267
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dC           |
           D |  -.0300365   .0330158    -0.91   0.363    -.0947463    .0346733
           E |  -.0105462   .0118447    -0.89   0.373    -.0337614    .0126691
           C |  -.0194903   .0355744    -0.55   0.584    -.0892148    .0502342
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dEC          |
           D |          0  (omitted)
           E |          0  (omitted)
           C |          0  (omitted)
          EC |          0  (omitted)
------------------------------------------------------------------------------

. estimates store b1995m

. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mysmithwelch, benchmark(1) reference(2)
(running mysmithwelch on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
dE           |
           D |    .014906   .0179688     0.83   0.407    -.0203122    .0501242
           E |  -.0154103   .0109453    -1.41   0.159    -.0368628    .0060421
           C |   .0303163   .0159724     1.90   0.058     -.000989    .0616217
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dC           |
           D |  -.0303344   .0284484    -1.07   0.286    -.0860923    .0254235
           E |  -.0038931   .0063029    -0.62   0.537    -.0162465    .0084602
           C |  -.0264413    .030142    -0.88   0.380    -.0855186     .032636
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dEC          |
           D |          0  (omitted)
           E |          0  (omitted)
           C |          0  (omitted)
          EC |          0  (omitted)
------------------------------------------------------------------------------

. estimates store b1995f

. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mysmithwelch, benchmark(2) reference(1)
(running mysmithwelch on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
dE           |
           D |    .014608   .0171974     0.85   0.396    -.0190983    .0483143
           E |  -.0157749   .0143642    -1.10   0.272    -.0439282    .0123784
           C |   .0303829   .0122791     2.47   0.013     .0063164    .0544494
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dC           |
           D |  -.0300365   .0265342    -1.13   0.258    -.0820425    .0219695
           E |  -.0003061   .0085659    -0.04   0.971     -.017095    .0164827
           C |  -.0297303   .0283896    -1.05   0.295     -.085373    .0259123
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dEC          |
           D |          0  (omitted)
           E |          0  (omitted)
           C |          0  (omitted)
          EC |          0  (omitted)
------------------------------------------------------------------------------

. estimates store b2015m

. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mysmithwelch, benchmark(2) reference(2)
(running mysmithwelch on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
dE           |
           D |    .014906   .0149276     1.00   0.318    -.0143515    .0441635
           E |  -.0175483   .0141851    -1.24   0.216    -.0453507     .010254
           C |   .0324543   .0133635     2.43   0.015     .0062623    .0586464
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dC           |
           D |  -.0303344   .0277915    -1.09   0.275    -.0848048    .0241359
           E |   .0014673   .0053881     0.27   0.785    -.0090932    .0120278
           C |  -.0318018   .0275465    -1.15   0.248    -.0857919    .0221884
          EC |          0  (omitted)
-------------+----------------------------------------------------------------
dEC          |
           D |          0  (omitted)
           E |          0  (omitted)
           C |          0  (omitted)
          EC |          0  (omitted)
------------------------------------------------------------------------------

. estimates store b2015f

Important remark: We applied option nowarn to bootstrap to suppress a warning message about the estimation sample not being marked. When using the bootstrap, one typically wants to restrict the resampling to observations that are actually used by the bootstrapped command (the "estimation sample"). This is why bootstrap checks whether the estimation sample has been marked (by looking at whether e(sample) is defined). We did not bother constructing the e(sample) in the example above because we know that all observations in the dataset will be used by mysmithwelch (remember that we dropped all observations with missing values at the beginning of the exercise). If there were observations in the dataset that are not used by mysmithwelch it would be important either to drop these observations before applying the bootstrap or to modify mysmithwelch such that it identifies the estimation sample.

Overview table:

. esttab b1995m b1995f b2015m b2015f, mtitle nonum drop(dEC: :EC)

----------------------------------------------------------------------------
                   b1995m          b1995f          b2015m          b2015f   
----------------------------------------------------------------------------
dE                                                                          
D                  0.0146          0.0149          0.0146          0.0149   
                   (0.69)          (0.83)          (0.85)          (1.00)   

E                -0.00876         -0.0154         -0.0158         -0.0175   
                  (-0.86)         (-1.41)         (-1.10)         (-1.24)   

C                  0.0234          0.0303          0.0304*         0.0325*  
                   (1.38)          (1.90)          (2.47)          (2.43)   
----------------------------------------------------------------------------
dC                                                                          
D                 -0.0300         -0.0303         -0.0300         -0.0303   
                  (-0.91)         (-1.07)         (-1.13)         (-1.09)   

E                 -0.0105        -0.00389       -0.000306         0.00147   
                  (-0.89)         (-0.62)         (-0.04)          (0.27)   

C                 -0.0195         -0.0264         -0.0297         -0.0318   
                  (-0.55)         (-0.88)         (-1.05)         (-1.15)   
----------------------------------------------------------------------------
N                    8071            8071            8071            8071   
----------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

The overall change in the gender wage gap can be obtained from the bootstrap results as follows. Any of the variants can be used, as the point estimate of the overall change is the same in all of them; the standard error differs slightly across variants because of the random error of bootstrap estimation:

. estimates restore b1995m
(results b1995m are active now)

. lincom _b[dE:D] + _b[dC:D] + _b[dEC:D]

 ( 1)  [dE]D + [dC]D + [dEC]D = 0

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         (1) |  -.0154284   .0297243    -0.52   0.604    -.0736871    .0428302
------------------------------------------------------------------------------
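The z statistic, p-value and normal-based confidence interval in the lincom output follow from the estimate and its bootstrap standard error alone. The Python lines below reproduce them, up to rounding of the displayed inputs. As an aside, they also back out the covariance between dE:D and dC:D implied by the three standard errors (taken from the b1995m output), using Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b). The covariance is negative, which is why the standard error of the total is smaller than it would be if the two components were independent.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


b, se = -0.0154284, 0.0297243   # lincom estimate and bootstrap std. err.
z = b / se
p = 2.0 * (1.0 - normal_cdf(abs(z)))   # two-sided p-value
crit = 1.959963984540054              # 97.5% quantile of N(0, 1)
lo, hi = b - crit * se, b + crit * se
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{lo:.7f}, {hi:.7f}]")

# Implied covariance of dE:D and dC:D (dEC:D is a constant 0 and drops out)
se_dE, se_dC = 0.0211848, 0.0330158
cov = (se**2 - se_dE**2 - se_dC**2) / 2.0
corr = cov / (se_dE * se_dC)
se_if_independent = math.sqrt(se_dE**2 + se_dC**2)
print(f"corr = {corr:.2f}, SE if independent = {se_if_independent:.4f}")
```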

Interpretation: Hardly any of the results are statistically significant. This is not very surprising, since even the small overall decrease in the gender wage gap over time is far from significant (although a non-significant overall change does not imply that the decomposition components cannot be significant, because the components may offset each other). If there were in fact systematic changes over time, they were likely small, and the sample at hand does not seem to have sufficient statistical power to detect them. Only for "dE:C" do we find some weak evidence of a systematic effect: it is significant at the 5% level when 2015 is used as the benchmark (p = 0.013 and p = 0.015) and borderline with 1995 as the benchmark and female reference coefficients (p = 0.058).

Task 3: interventionist decomposition

Compute the “interventionist” decomposition proposed by Kröger and Hartmann (2021) (see the slides) using 1995 as the starting point and interpret the results. Optionally provide bootstrap standard errors.

Hint: You can try to use the xtoaxaca command provided by Kröger and Hartmann (type findit xtoaxaca). However, we find it more transparent to do this decomposition “by hand” using the formulas on the slides. A further alternative is to compute the “interventionist” decomposition from the results returned by smithwelch.

We put together a small program that runs all required regressions, collects the covariate means, and then computes the decomposition components by matrix multiplication. Wrapping the computations in a program makes it easy to apply the bootstrap.

capt prog drop mydecomp
program mydecomp, eclass
    local y lnwage
    local x yeduc expft expft2
    local wgt [pw=weight]
    tempname bm0 bf0 bm1 bf1 Xm0 Xf0 Xm1 Xf1
    // males in 1995
    regress `y' `x' `wgt' if t==0 & female==0 
    matrix `bm0' = e(b)'
    mean `x' `wgt' if e(sample)
    matrix `Xm0' = e(b), 1
    // females in 1995
    regress `y' `x' `wgt' if t==0 & female==1
    matrix `bf0' = e(b)'
    mean `x' `wgt' if e(sample)
    matrix `Xf0' = e(b), 1
    // males in 2015
    regress `y' `x' `wgt' if t==1 & female==0 
    matrix `bm1' = e(b)'
    mean `x' `wgt' if e(sample)
    matrix `Xm1' = e(b), 1 
    // females in 2015
    regress `y' `x' `wgt' if t==1 & female==1 
    matrix `bf1' = e(b)'
    mean `x' `wgt' if e(sample)
    matrix `Xf1' = e(b), 1
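    // decomposition of the change in the gap, 1995 (t==0) to 2015 (t==1);
    // X* are row vectors of means (plus 1 for the constant), b* column
    // vectors of coefficients (m = males, f = females, 0/1 = period):
    //   Endowments   = (Xm1-Xm0)*bm0 - (Xf1-Xf0)*bf0
    //   Coefficients = Xm0*(bm1-bm0) - Xf0*(bf1-bf0)
    //   Interactions = (Xm1-Xm0)*(bm1-bm0) - (Xf1-Xf0)*(bf1-bf0)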
    tempname b
    matrix `b' = ///
        (`Xm1'-`Xm0')*`bm0' - (`Xf1'-`Xf0')*`bf0', ///
        `Xm0'*(`bm1'-`bm0') - `Xf0'*(`bf1'-`bf0'), ///
        (`Xm1'-`Xm0')*(`bm1'-`bm0') - (`Xf1'-`Xf0')*(`bf1'-`bf0')
    matrix `b' = `b', `b'[1,1] + `b'[1,2] + `b'[1,3]
    matrix colnames `b' = "Endowments" "Coefficients" "Interactions" "Total"
    ereturn post `b'
end
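In formula terms, the matrix expressions in mydecomp compute the components listed below.

- $\bar X_{gt}$ is the row vector of covariate means, with a trailing 1 for the constant.
- $\beta_{gt}$ is the coefficient vector for group $g$ ($m$ = men, $f$ = women) in period $t$ (0 = 1995, 1 = 2015).

For each group, the three terms add up to $\bar X_{g1}\beta_{g1} - \bar X_{g0}\beta_{g0}$. Therefore the Total column equals the change in the raw gap:

```latex
\begin{aligned}
E  &= (\bar X_{m1}-\bar X_{m0})\beta_{m0} - (\bar X_{f1}-\bar X_{f0})\beta_{f0} \\
C  &= \bar X_{m0}(\beta_{m1}-\beta_{m0}) - \bar X_{f0}(\beta_{f1}-\beta_{f0}) \\
EC &= (\bar X_{m1}-\bar X_{m0})(\beta_{m1}-\beta_{m0}) - (\bar X_{f1}-\bar X_{f0})(\beta_{f1}-\beta_{f0}) \\
E + C + EC &= (\bar X_{m1}\beta_{m1} - \bar X_{f1}\beta_{f1}) - (\bar X_{m0}\beta_{m0} - \bar X_{f0}\beta_{f0})
\end{aligned}
```

Because each regression includes a constant, $\bar X_{gt}\beta_{gt}$ equals the (weighted) mean log wage of group $g$ in period $t$.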

We now apply the bootstrap to the program (see above for remarks on the nowarn option).

. bootstrap, cluster(psu) strata(strata) reps(100) nowarn: mydecomp
(running mydecomp on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.........50.........60....
> .....70.........80.........90.........100 done

Bootstrap results

Number of strata = 15                                    Number of obs = 8,071
                                                         Replications  =   100

                                 (Replications based on 2,459 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  Endowments |  -.0193035   .0138118    -1.40   0.162    -.0463741    .0077672
Coefficients |   .0006526   .0264038     0.02   0.980     -.051098    .0524031
Interactions |   .0032225   .0109917     0.29   0.769    -.0183208    .0247657
       Total |  -.0154284   .0292767    -0.53   0.598    -.0728097    .0419528
------------------------------------------------------------------------------

The results indicate that the gender wage gap would decrease if the gender-specific levels of schooling and work experience found in 1995 were adjusted to those found in 2015. Conversely, if the gender-specific wage structure found in 1995 were adjusted to the one found in 2015, the gender wage gap would slightly increase. However, none of these results is statistically significant.
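Since the total change is small, it can help to express each component as a percentage of it. The sketch below makes three assumptions:

- The bootstrap results from mydecomp are still the active estimation results.
- Their coefficients are named Endowments, Coefficients, Interactions and Total, as in the output above.
- The matrix name B is arbitrary.

```stata
// point estimates of the bootstrapped mydecomp results
// (assumes these are the active estimation results)
matrix B = e(b)

// each component as a percentage of the total change in the gap
foreach comp in Endowments Coefficients Interactions {
    display "`comp'" _col(15) "= " %8.3f 100*B[1,"`comp'"]/B[1,"Total"]
}
```

These shares correspond to the Decomp % panel that xtoaxaca reports further down. Because the total is close to zero, the shares are unstable: the endowment share exceeds 100 percent because the other two components work in the opposite direction.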

Overall, the results seem to be at odds with those of the Smith-Welch decomposition. In the Smith-Welch decomposition, the explained part of the gap increases and the decline of the gap is driven by the unexplained part. The “interventionist” decomposition, in contrast, attributes the decline to endowments. Note, however, that the two decompositions take different perspectives:

- The Smith-Welch decomposition starts from two cross-sectional Oaxaca-Blinder decompositions. Its goal is to explain how the differences in the explained part and in the unexplained part of these decompositions come about. That is, the difference in the explained part and the difference in the unexplained part are each decomposed into a part due to differences in endowments and a part due to differences in coefficients.
- The “interventionist” decomposition starts from the overall gaps in the two cross-sections. Its goal is to show how much of the change in the gap is due to a different configuration of endowments and how much is due to a different configuration of coefficients.

Despite the different perspectives, the formulas show that the (threefold) Smith-Welch decomposition computes all components required for the “interventionist” decomposition. The column sums of the last table in the Smith-Welch output (decomposition of difference in differentials) equal the components of the “interventionist” decomposition. We can therefore simply use smithwelch to compute the “interventionist” decomposition:

. quietly svy: regress lnwage yeduc expft expft2 if female==0 & t==0

. estimates store male_t0

. quietly svy: regress lnwage yeduc expft expft2 if female==1 & t==0

. estimates store female_t0

. quietly svy: regress lnwage yeduc expft expft2 if female==0 & t==1

. estimates store male_t1

. quietly svy: regress lnwage yeduc expft expft2 if female==1 & t==1

. estimates store female_t1

. smithwelch male_t0 female_t0 male_t1 female_t1 

Decompositions of individual differentials:
----------------------------------------------------------
             |          D          E          C         EC
-------------+--------------------------------------------
    Sample 1 |   .2207358   .0590409   .1448567   .0168382
    Sample 2 |   .2053074   .0739469   .1148202   .0165403
----------------------------------------------------------

Difference in (components of) differentials:
----------------------------------------------------------
             |         dD         dE         dC        dEC
-------------+--------------------------------------------
             |  -.0154284    .014906  -.0300365   -.000298
----------------------------------------------------------

Decomposition of difference in differentials:
----------------------------------------------------------
             |          D          E          C         EC
-------------+--------------------------------------------
          dE |    .014906  -.0154103   .0324543   -.002138
          dC |  -.0300365  -.0105462  -.0297303     .01024
         dEC |   -.000298    .006653  -.0020714  -.0048796
----------------------------------------------------------

D  = differential / difference in component of differential
E  = part of D due to differences in endowments
C  = part of D due to differences in coefficients
EC = interaction E x C

. matrix DD = r(DD)

. display "Endowments   = " DD[1,"dE:E"]  + DD[1,"dC:E"]  + DD[1,"dEC:E"]
Endowments   = -.01930346

. display "Coefficients = " DD[1,"dE:C"]  + DD[1,"dC:C"]  + DD[1,"dEC:C"]
Coefficients = .00065257

. display "Interactions = " DD[1,"dE:EC"] + DD[1,"dC:EC"] + DD[1,"dEC:EC"]
Interactions = .00322246

. display "Total        = " DD[1,"dE:D"]  + DD[1,"dC:D"]  + DD[1,"dEC:D"]
Total        = -.01542843
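The four display commands can be condensed into a loop over the columns of r(DD). This is a minimal sketch. It assumes that matrix DD holds r(DD) from the threefold smithwelch call above. The scalar name colsum is hypothetical.

```stata
// column sums of DD over the rows dE, dC, dEC
// (assumes DD = r(DD) from the threefold smithwelch call above;
//  E = Endowments, C = Coefficients, EC = Interactions, D = Total)
foreach col in E C EC D {
    scalar colsum = 0
    foreach row in dE dC dEC {
        scalar colsum = scalar(colsum) + DD[1,"`row':`col'"]
    }
    display "`col'" _col(4) "= " scalar(colsum)
}
```

After reference(1) or reference(2), shorten the inner list to dE dC, since the decomposition of the difference then has only these two rows.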

Alternatively, we can specify option reference(1) or reference(2). This reduces the number of terms to be added up, since the decomposition of the difference then has only the rows dE and dC. The results stay the same:

. smithwelch male_t0 female_t0 male_t1 female_t1, reference(1)

Decompositions of individual differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
    Sample 1 |   .2207358   .0758792   .1448567
    Sample 2 |   .2053074   .0904872   .1148202
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |         dD         dE         dC
-------------+---------------------------------
             |  -.0154284    .014608  -.0300365
-----------------------------------------------

Decomposition of difference in differentials:
----------------------------------------------------------
             |          D          E          C         EC
-------------+--------------------------------------------
          dE |    .014608  -.0087573   .0303829  -.0070176
          dC |  -.0300365  -.0105462  -.0297303     .01024
----------------------------------------------------------

D  = differential / difference in component of differential
E  = part of D due to differences in endowments
C  = part of D due to differences in coefficients
EC = interaction E x C

. matrix DD = r(DD)

. display "Endowments   = " DD[1,"dE:E"]  + DD[1,"dC:E"]
Endowments   = -.01930346

. display "Coefficients = " DD[1,"dE:C"]  + DD[1,"dC:C"]
Coefficients = .00065257

. display "Interactions = " DD[1,"dE:EC"] + DD[1,"dC:EC"]
Interactions = .00322246

. display "Total        = " DD[1,"dE:D"]  + DD[1,"dC:D"]
Total        = -.01542843

. smithwelch male_t0 female_t0 male_t1 female_t1, reference(2)

Decompositions of individual differentials:
-----------------------------------------------
             |          D          E          C
-------------+---------------------------------
    Sample 1 |   .2207358   .0590409   .1616949
    Sample 2 |   .2053074   .0739469   .1313605
-----------------------------------------------

Difference in (components of) differentials:
-----------------------------------------------
             |         dD         dE         dC
-------------+---------------------------------
             |  -.0154284    .014906  -.0303344
-----------------------------------------------

Decomposition of difference in differentials:
----------------------------------------------------------
             |          D          E          C         EC
-------------+--------------------------------------------
          dE |    .014906  -.0154103   .0324543   -.002138
          dC |  -.0303344  -.0038931  -.0318018   .0053605
----------------------------------------------------------

D  = differential / difference in component of differential
E  = part of D due to differences in endowments
C  = part of D due to differences in coefficients
EC = interaction E x C

. matrix DD = r(DD)

. display "Endowments   = " DD[1,"dE:E"]  + DD[1,"dC:E"]
Endowments   = -.01930346

. display "Coefficients = " DD[1,"dE:C"]  + DD[1,"dC:C"]
Coefficients = .00065257

. display "Interactions = " DD[1,"dE:EC"] + DD[1,"dC:EC"]
Interactions = .00322246

. display "Total        = " DD[1,"dE:D"]  + DD[1,"dC:D"]
Total        = -.01542843

Using xtoaxaca: Kröger and Hartmann (2021) provide the command xtoaxaca, which can compute the “interventionist” decomposition as well as other types of difference-in-differences decompositions. First, a single model is estimated on all data, including all interactions with the two grouping variables; in our case, this is equivalent to estimating four separate models. The model is then stored, and xtoaxaca is applied to the stored model. The results from above can be reproduced using xtoaxaca as follows:

. quietly regress lnwage c.(yeduc expft expft2)##i.female##i.t [pw=weight]

. estimates store m1

. xtoaxaca yeduc expft expft2, groupvar(female) groupcat(0 1) ///
>     timevar(t) times(0 1) timeref(0) change(interventionist) model(m1) ///
>     weights(weight)
WARNING: This is a beta version. Please check the results carefully
         and report bugs and suggestions to hkroeger@diw.de

Decomposition of Levels

Summary of level decomposition
--------------------------------------
                        t             
                        0            1
--------------------------------------
Level                                 
non-parame~c        0.214        0.232
--------------------------------------
Decomp                                
Endowments          0.059        0.074
Coefficients        0.145        0.115
Interaction         0.017        0.017
Total               0.221        0.205
--------------------------------------
Decomp %                              
Endowments         26.747       36.018
Coefficients       65.624       55.926
Interaction         7.628        8.056
Total             100.000      100.000
--------------------------------------


Decomposition of Change

Summary of changes in the outcome
--------------------------------------
                        t             
                        0            1
--------------------------------------
Change                                
non-parame~c        0.000        0.017
--------------------------------------
Decomp                                
Endowments          0.000       -0.019
Coefficients        0.000        0.001
Interactions        0.000        0.003
Total               0.000       -0.015
--------------------------------------
Decomp %                              
Endowments              .      125.116
Coefficients            .       -4.230
Interactions            .      -20.887
Total                   .      100.000
--------------------------------------
For an explanation of this change decomposition, please see:
Kröger, H., & Hartmann, J. (2020). xtoaxaca - Extending the Kitagawa-Oaxaca-Blinder Decomposition Approach to Panel Data. https://doi.org/10.31235/osf.io/egj79
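The claim that the fully interacted model is equivalent to four separate models can be checked directly. In model m1, men (female==0) and 1995 (t==0) are the base categories. The main-effect coefficient of yeduc is therefore the education slope for men in 1995. It should match the slope from a separate regression for that group. This is a minimal sketch; the scalar name b_pooled is hypothetical. Only the point estimates coincide; the standard errors can differ slightly.

```stata
// education slope for men in 1995 from the fully interacted model m1
estimates restore m1
scalar b_pooled = _b[yeduc]

// same slope from a separate regression for men in 1995
quietly regress lnwage yeduc expft expft2 [pw=weight] if female==0 & t==0
display "pooled model:   " scalar(b_pooled)
display "separate model: " _b[yeduc]
display "relative diff.: " reldif(scalar(b_pooled), _b[yeduc])
```

The slopes for the other three groups can be compared in the same way, although in m1 they are sums of the main effect and the relevant interaction terms.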