Decomposition Methods in the Social Sciences

Solutions to Exercise 7: RIF decomposition

Johannes Giesecke and Ben Jann, GESIS Training Course, January 29 – February 1, 2024

Required packages: rifreg (from N. Fortin's website), rif, oaxaca, dstat, kmatch, moremata

Task 1: RIF decomposition

Repeat the example analysis of the private–public gap in wage inequality. This time, use the Gini coefficient as well as the D9/D1 ratio, the D5/D1 ratio, and the D9/D5 ratio as inequality measures. If possible, use rifreg, rifvar, and oaxaca_rif to calculate these results.

Data preparation as on the slides:

. use gsoep-extract, clear
(Example data based on the German Socio-Economic Panel)

. keep if wave==2015
(29,970 observations deleted)

. keep if inrange(age, 25, 55)
(5,671 observations deleted)

. generate lnwage = ln(wage)
(1,709 missing values generated)

. generate expft2 = expft^2
(35 missing values generated)

. svyset psu [pw=weight], strata(strata)

Sampling weights: weight
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

. summarize wage lnwage yeduc expft expft2 public

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      5,600    17.57278    9.858855       3.03     121.42
      lnwage |      5,600    2.736721    .5062968   1.108563   4.799255
       yeduc |      7,121    12.28823    2.783974          7         18
       expft |      7,274    11.63359    9.556508          0       39.5
      expft2 |      7,274    226.6548    293.3739          0    1560.25
-------------+---------------------------------------------------------
      public |      5,770    .2353553    .4242574          0          1

. drop if missing(lnwage, yeduc, expft, public)
(1,851 observations deleted)

We first do the decomposition of the Gini coefficient (using variable wage, not lnwage).

Interpretation: The Gini coefficient differs by 0.057 between the two sectors (wage inequality is higher in the private sector). This difference cannot be explained by compositional differences with respect to years of education and full-time employment experience. There is some weak evidence that the gap would be even larger if the composition were the same in the two sectors. Thus, wage inequality is higher in the private sector because wage-setting mechanisms there are more inequality-enhancing than in the public sector.
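A minimal sketch of how this decomposition could be requested, assuming the data prepared above are in memory. The oaxaca_rif call mirrors the syntax used in Task 2, only without the reweighting option. The rifvar call and its option names (gini, by(), weight()) follow our reading of the rif package and should be checked against help rifvar; the variable name rifgini is hypothetical.

```stata
// variant 1: oaxaca_rif computes the RIF and the decomposition in one call
oaxaca_rif wage yeduc (experience: expft expft2) ///
    [pw=weight], by(public) cluster(psu) wgt(1) rif(gini)

// variant 2: first generate the sector-specific RIF of the Gini ...
// (option names are assumptions; see help rifvar)
egen double rifgini = rifvar(wage), gini by(public) weight(weight)

// ... then apply a standard Blinder-Oaxaca decomposition to the RIF
oaxaca rifgini yeduc (experience: expft expft2) [pw=weight], ///
    by(public) weight(1) cluster(psu)
```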

We now look at D9/D1, D5/D1, and D9/D5. These measures are not supported by rifreg, so we only use the approach based on rifvar followed by oaxaca. Furthermore, we analyze the inter-quantile range of lnwage (rather than the inter-quantile ratio of wage).
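Working with lnwage changes the scale but not the substance: the logarithm is a monotone transformation, so the quantiles of lnwage are the logs of the quantiles of wage (up to interpolation between order statistics). As a short derivation, writing Q_p for the p-th percentile of wage (notation introduced here for illustration only):

```latex
\begin{aligned}
\ln\!\left(\frac{Q_{90}}{Q_{10}}\right)
  &= \ln Q_{90} - \ln Q_{10}
   = Q_{90}(\ln w) - Q_{10}(\ln w) \\
\ln\!\left(\frac{Q_{90}}{Q_{10}}\right)
  &= \ln\!\left(\frac{Q_{90}}{Q_{50}}\right) + \ln\!\left(\frac{Q_{50}}{Q_{10}}\right)
\end{aligned}
```

The second line shows that, on the log scale, the D9/D1 measure splits additively into an upper-half (D9/D5) and a lower-half (D5/D1) part, which is what allows the gap to be attributed to the two halves of the distribution.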

Overall, the difference in wage inequality between the public and the private sector cannot be explained by compositional differences with respect to education and work experience. If anything, the gap would be even larger if the two sectors had the same composition. This result is mostly related to education: if the average level of education in the private sector were as high as in the public sector, wage inequality in the private sector would be even higher. We also see that the gap in wage inequality is mostly driven by the upper half of the distribution; yet compositional differences also seem to have a suppressing effect on the inequality gap in the lower part of the distribution.

Task 2: reweighted RIF decomposition

Combine your RIF decomposition for the Gini coefficient with reweighting (analogous to the reweighted OB decomposition) and calculate the specification error. Use oaxaca_rif for this exercise.

. oaxaca_rif wage yeduc (experience: expft expft2) ///
>     [pw=weight], by(public) cluster(psu) wgt(1) rif(gini) ///
>     rwlogit(c.yeduc##c.expft##c.expft)
Estimating Reweighted RIF-OAXACA using RIF:gini
Model  : Blinder-Oaxaca RIF-decomposition
Type   : Reweighted
RIF    : gini
Scale  : 1
Group 1: public = 0 x1*b1                        N of obs 1      = 4184
Group c: X1~>rw~>X2 or x2*b1                     N of obs C      = 4184
Group 2: public = 1 x2*b2                        N of obs 2      = 1274

                                 (Std. err. adjusted for 2,036 clusters in psu)
-------------------------------------------------------------------------------
              |               Robust
         wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
Overall       |
      group_1 |   .2783233   .0056211    49.51   0.000     .2673062    .2893404
      group_c |   .2840161    .003174    89.48   0.000     .2777951     .290237
      group_2 |   .2213006   .0083934    26.37   0.000     .2048497    .2377514
  tdifference |   .0570227   .0101002     5.65   0.000     .0372266    .0768188
  t_explained |  -.0056928   .0063401    -0.90   0.369    -.0181192    .0067336
t_unexplained |   .0627155   .0175209     3.58   0.000     .0283752    .0970559
--------------+----------------------------------------------------------------
explained     |
        total |  -.0056928   .0063401    -0.90   0.369    -.0181192    .0067336
  p_explained |  -.0093923   .0069699    -1.35   0.178    -.0230531    .0042685
   specif_err |   .0036995   .0090898     0.41   0.684    -.0141163    .0215152
--------------+----------------------------------------------------------------
p_explained   |
        yeduc |   -.007792   .0074926    -1.04   0.298    -.0224772    .0068931
   experience |  -.0016002   .0009587    -1.67   0.095    -.0034792    .0002787
--------------+----------------------------------------------------------------
specif_err    |
        yeduc |   .0788211   .0473546     1.66   0.096    -.0139922    .1716344
   experience |  -.0154299   .0196131    -0.79   0.431    -.0538709    .0230111
        _cons |  -.0596917   .0564535    -1.06   0.290    -.1703386    .0509552
--------------+----------------------------------------------------------------
unexplained   |
        total |   .0627155   .0175209     3.58   0.000     .0283752    .0970559
    rwg_error |  -.0000639   .0006264    -0.10   0.919    -.0012916    .0011638
p_unexplained |   .0627794   .0174699     3.59   0.000      .028539    .0970199
--------------+----------------------------------------------------------------
p_unexplained |
        yeduc |   -.021478   .0936426    -0.23   0.819     -.205014    .1620581
   experience |   .0521253   .0431672     1.21   0.227    -.0324808    .1367314
        _cons |   .0321321   .1088606     0.30   0.768    -.1812308     .245495
--------------+----------------------------------------------------------------
rwg_error     |
        yeduc |  -.0000266   .0001822    -0.15   0.884    -.0003836    .0003304
   experience |  -.0000373   .0006552    -0.06   0.955    -.0013214    .0012468
-------------------------------------------------------------------------------
experience:
expft
expft2

Interpretation: Overall, the specification error is not statistically significant, but there is some weak evidence of a misspecified education effect (p = 0.096). The reweighting error is very small.
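As a quick plausibility check, the four components reported by oaxaca_rif should add up to the total difference. The sketch below only does arithmetic on point estimates copied from the output above; the scalar names are hypothetical and no data are required.

```stata
// point estimates copied from the oaxaca_rif output above
scalar tdiff    =  .0570227   // tdifference
scalar p_expl   = -.0093923   // pure explained part
scalar spec_err =  .0036995   // specification error
scalar rw_err   = -.0000639   // reweighting error
scalar p_unexpl =  .0627794   // pure unexplained part

// explained part = pure explained part + specification error
display p_expl + spec_err

// unexplained part = reweighting error + pure unexplained part
display rw_err + p_unexpl

// the four components add up to the total difference
assert abs(tdiff - (p_expl + spec_err + rw_err + p_unexpl)) < 1e-6
```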

Task 3: manual reweighted RIF decomposition

Try to replicate the results for the reweighted Gini decomposition manually by first computing the RIF and then applying oaxaca to the RIF taking reweighting into account. You will need two calls to oaxaca to compute all results.
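The two oaxaca calls correspond to the two halves of the following identity. The notation is introduced here for illustration: ν_1, ν_C, and ν_2 are the Gini coefficients of the private sector, the reweighted private sector, and the public sector; X̄_g are the covariate means (including the constant) and β̂_g the coefficients of the regression of the RIF on the covariates in group g.

```latex
\begin{aligned}
\nu_1 - \nu_2 &= (\nu_1 - \nu_C) + (\nu_C - \nu_2) \\[4pt]
\nu_1 - \nu_C &= \underbrace{(\bar X_1 - \bar X_C)\,\hat\beta_1}_{\text{pure explained part}}
               + \underbrace{\bar X_C\,(\hat\beta_1 - \hat\beta_C)}_{\text{specification error}} \\[4pt]
\nu_C - \nu_2 &= \underbrace{(\bar X_C - \bar X_2)\,\hat\beta_C}_{\text{reweighting error}}
               + \underbrace{\bar X_2\,(\hat\beta_C - \hat\beta_2)}_{\text{pure unexplained part}}
\end{aligned}
```

Each of the last two lines has the form (X1 - X2) * b1 + X2 * (b1 - b2), which is the decomposition that oaxaca reports with option weight(1).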

Step 1: Generate the sector-specific RIF of the Gini using command dstat with option rif().

. dstat (gini) wage [pw=weight], over(public) rif(RIF, compact)

gini                              Number of obs   =      5,458

--------------------------------------------------------------
        wage | Coefficient  Std. err.     [95% conf. interval]
-------------+------------------------------------------------
      public |
         no  |   .2783233   .0056617      .2672241    .2894224
        yes  |   .2213006   .0080971       .205427    .2371741
--------------------------------------------------------------

Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------------------------------
RIF             double  %10.0g                RIF of _b[#]

. svy: mean RIF, over(public)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =    15          Number of obs   =      5,458
Number of PSUs   = 2,036          Population size = 12,146,771
                                  Design df       =      2,021

--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
c.RIF@public |
         no  |   .2783233   .0056664      .2672108    .2894358
        yes  |   .2213006   .0081404      .2053362    .2372649
--------------------------------------------------------------

Step 2: Use kmatch to generate balancing weights that adjust the distribution of covariates among people in the private sector to the distribution observed among people in the public sector (i.e., the private sector is reweighted).

. kmatch ipw public c.yeduc##c.expft##c.expft [pw=weight], att wgen(ipw)

Inverse probability weighting                            Number of obs = 5,458

Treatment   : public = 1
Covariates  : yeduc expft c.yeduc#c.expft c.expft#c.expft c.yeduc#c.expft#c.expft
PS model    : logit (pr)

Matching statistics
------------------------------------------------------------------------------
           |             Matched             |            Controls           
           |       Yes         No      Total |      Used     Unused      Total
-----------+---------------------------------+--------------------------------
   Treated |      1274          0       1274 |      4184          0       4184
------------------------------------------------------------------------------

Stored variables
Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------------------------------
ipw             double  %10.0g                Matching weights for ATT

. bysort public: summarize ipw 

----------------------------------------------------------------------------------------------------
-> public = no

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         ipw |      4,184    696.6616    1297.235   .6081152   20413.38

----------------------------------------------------------------------------------------------------
-> public = yes

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         ipw |      1,274    2287.937    3061.095       28.6    32681.6
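The summary statistics suggest that the balancing weights of the private sector are scaled to the same total as the weights of the public sector. A small arithmetic sketch, using only the means and observation counts displayed above (the scalar names are hypothetical; the two totals agree only up to the rounding of the displayed means):

```stata
// weight totals implied by the summarize output (mean * number of obs)
scalar tot_private = 696.6616 * 4184
scalar tot_public  = 2287.937 * 1274

display %14.1f tot_private
display %14.1f tot_public

// relative difference between the two totals
display reldif(tot_private, tot_public)
```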

Step 3: Compute the reweighted RIF of the Gini for the private sector using dstat.

. dstat (gini) wage if public==0 [pw=ipw], rif(RIFipw)

Summary statistics                Number of obs   =      4,184

--------------------------------------------------------------
        wage | Coefficient  Std. err.     [95% conf. interval]
-------------+------------------------------------------------
        gini |   .2840161   .0063869      .2714944    .2965378
--------------------------------------------------------------

Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------------------------------
RIFipw          double  %10.0g                RIF of _b[gini]

Step 4: Apply oaxaca to the raw RIF of private sector and the reweighted RIF of the private sector to obtain the “pure explained part” and the “specification error” (the “explained part” of this decomposition quantifies the “pure explained part”; the “unexplained part” quantifies the “specification error”).

. // preserve the data so that they can be restored later
. preserve

. // add a unique ID for each observation
. generate ID = _n

. // duplicate each private sector observation
. expand 2 if public==0
(4,184 observations created)

. // generate a 0/1 variable that tags the duplicates; we can then use this as
. // the group variable in oaxaca
. bysort ID: generate byte G = (_n==2) if public==0
(1,274 missing values generated)

. // generate weights containing 1 (or the sampling weights if survey weights
. // are applied) for G==0 and the balancing weights for G==1
. replace ipw = weight if G==0 
(4,184 real changes made)

. // generate a RIF variable containing the raw RIF for G==0 and the reweighted
. // RIF for G==1
. replace RIF = RIFipw if G==1
(4,184 real changes made)

. // apply oaxaca to the RIF variable while applying the weights; G is the group
. // variable
. oaxaca RIF yeduc (experience: expft expft2) [pw=ipw], by(G) weight(1) cluster(ID)

Blinder-Oaxaca decomposition                    Number of obs     =      8,368
                                                Model             =     linear
Group 1: G = 0                                  N of obs 1        =      4,184
Group 2: G = 1                                  N of obs 2        =      4,184

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

                                 (Std. err. adjusted for 4,184 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
         RIF | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2783233   .0056825    48.98   0.000     .2671858    .2894608
     group_2 |   .2840161   .0064432    44.08   0.000     .2713877    .2966444
  difference |  -.0056928   .0038987    -1.46   0.144     -.013334    .0019484
   explained |  -.0093923   .0045321    -2.07   0.038     -.018275   -.0005095
 unexplained |   .0036995   .0030621     1.21   0.227    -.0023021    .0097011
-------------+----------------------------------------------------------------
explained    |
       yeduc |   -.007792   .0048499    -1.61   0.108    -.0172977    .0017137
  experience |  -.0016002   .0006765    -2.37   0.018    -.0029262   -.0002743
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0788211   .0191598     4.11   0.000     .0412686    .1163736
  experience |  -.0154299    .010939    -1.41   0.158      -.03687    .0060101
       _cons |  -.0596917   .0246583    -2.42   0.015    -.1080211   -.0113623
------------------------------------------------------------------------------
experience: expft expft2

. estimates store IPW1

. // restore original data
. restore

Step 5: Apply oaxaca to the reweighted RIF of the private sector and the raw RIF of public sector to obtain the “pure unexplained part” and the “reweighting error” (the “explained part” of this decomposition quantifies the “reweighting error”; the “unexplained part” quantifies the “pure unexplained part”).

. replace RIFipw = RIF if public==1 // fill in RIFipw for public sector
(1,274 real changes made)

. oaxaca RIFipw yeduc (experience: expft expft2) [pw=ipw], by(public) weight(1)

Blinder-Oaxaca decomposition                    Number of obs     =      5,458
                                                Model             =     linear
Group 1: public = 0                             N of obs 1        =      4,184
Group 2: public = 1                             N of obs 2        =      1,274

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
      RIFipw | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2840161   .0064455    44.06   0.000     .2713832     .296649
     group_2 |   .2213006     .00814    27.19   0.000     .2053464    .2372547
  difference |   .0627155   .0103829     6.04   0.000     .0423654    .0830656
   explained |  -.0000639   .0006038    -0.11   0.916    -.0012474    .0011196
 unexplained |   .0627794   .0102786     6.11   0.000     .0426338     .082925
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0000266   .0001785    -0.15   0.882    -.0003765    .0003233
  experience |  -.0000373   .0006216    -0.06   0.952    -.0012557    .0011811
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   -.021478   .0569927    -0.38   0.706    -.1331816    .0902256
  experience |   .0521253   .0246197     2.12   0.034     .0038715     .100379
       _cons |   .0321321   .0646791     0.50   0.619    -.0946366    .1589008
------------------------------------------------------------------------------
experience: expft expft2

. estimates store IPW2

Overview of results:

. esttab IPW1 IPW2, nogap mti

--------------------------------------------
                      (1)             (2)   
                     IPW1            IPW2   
--------------------------------------------
overall                                     
group_1             0.278***        0.284***
                  (48.98)         (44.06)   
group_2             0.284***        0.221***
                  (44.08)         (27.19)   
difference       -0.00569          0.0627***
                  (-1.46)          (6.04)   
explained        -0.00939*     -0.0000639   
                  (-2.07)         (-0.11)   
unexplained       0.00370          0.0628***
                   (1.21)          (6.11)   
--------------------------------------------
explained                                   
yeduc            -0.00779      -0.0000266   
                  (-1.61)         (-0.15)   
experience       -0.00160*     -0.0000373   
                  (-2.37)         (-0.06)   
--------------------------------------------
unexplained                                 
yeduc              0.0788***      -0.0215   
                   (4.11)         (-0.38)   
experience        -0.0154          0.0521*  
                  (-1.41)          (2.12)   
_cons             -0.0597*         0.0321   
                  (-2.42)          (0.50)   
--------------------------------------------
N                    8368            5458   
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

The point estimates are the same as the ones computed by oaxaca_rif above. The standard errors are not reliable in either case: a quick comparison against the bootstrap indicates that some of the standard errors reported by oaxaca_rif are completely off, while those obtained by the manual procedure seem to be more accurate but are still biased.

Task 4: using entropy balancing

Repeat the analysis using entropy balancing for the reweighting. How do the results change?

Step 1 can remain the same. In Step 2 we need to use entropy balancing to generate the weights. Then use these weights in the subsequent steps.

. // Step 2
. kmatch eb public c.yeduc##c.expft##c.expft [pw=weight], att wgen(eb)
(fitting balancing weights ... done)

Entropy balancing                                        Number of obs = 5,458
                                                Balance tolerance =     .00001
Treatment   : public = 1
Targets     : 1
Covariates  : yeduc expft c.yeduc#c.expft c.expft#c.expft c.yeduc#c.expft#c.expft

Matching statistics
------------------------------------------------------------------------------------------
           |             Matched             |             Controls            |  Balance 
           |       Yes         No      Total |      Used     Unused      Total |     loss 
-----------+---------------------------------+---------------------------------+----------
   Treated |      1274          0       1274 |      4184          0       4184 |  1.22e-15
------------------------------------------------------------------------------------------

Stored variables
Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------------------------------
eb              double  %10.0g                Matching weights for ATT

. bysort public: summarize eb

----------------------------------------------------------------------------------------------------
-> public = no

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          eb |      4,184    696.6616    1290.758   .6088637   20498.96

----------------------------------------------------------------------------------------------------
-> public = yes

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          eb |      1,274    2287.937    3061.095       28.6    32681.6


. // Step 3
. dstat (gini) wage if public==0 [pw=eb], rif(RIFeb)

Summary statistics                Number of obs   =      4,184

--------------------------------------------------------------
        wage | Coefficient  Std. err.     [95% conf. interval]
-------------+------------------------------------------------
        gini |   .2834279   .0063278       .271022    .2958338
--------------------------------------------------------------

Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------------------------------
RIFeb           double  %10.0g                RIF of _b[gini]

. // Step 4
. // preserve the data so that they can be restored later
. preserve

. // add a unique ID for each observation
. generate ID = _n

. // duplicate each private sector observation
. expand 2 if public==0
(4,184 observations created)

. // generate a 0/1 variable that tags the duplicates; we can then use this as
. // the group variable in oaxaca
. bysort ID: generate byte G = (_n==2) if public==0
(1,274 missing values generated)

. // generate weights containing 1 (or the sampling weights if survey weights
. // are applied) for G==0 and the balancing weights for G==1
. replace eb = weight if G==0 
(4,184 real changes made)

. // generate a RIF variable containing the raw RIF for G==0 and the reweighted
. // RIF for G==1
. replace RIF = RIFeb if G==1
(4,184 real changes made)

. // apply oaxaca to the RIF variable while applying the weights; G is the group
. // variable
. oaxaca RIF yeduc (experience: expft expft2) [pw=eb], by(G) weight(1) cluster(ID)

Blinder-Oaxaca decomposition                    Number of obs     =      8,368
                                                Model             =     linear
Group 1: G = 0                                  N of obs 1        =      4,184
Group 2: G = 1                                  N of obs 2        =      4,184

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

                                 (Std. err. adjusted for 4,184 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
         RIF | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2783233   .0056825    48.98   0.000     .2671858    .2894608
     group_2 |   .2834279   .0063834    44.40   0.000     .2709167    .2959391
  difference |  -.0051046   .0038161    -1.34   0.181     -.012584    .0023747
   explained |  -.0093274   .0044369    -2.10   0.036    -.0180235   -.0006313
 unexplained |   .0042227   .0029837     1.42   0.157    -.0016251    .0100706
-------------+----------------------------------------------------------------
explained    |
       yeduc |   -.007659   .0047672    -1.61   0.108    -.0170026    .0016846
  experience |  -.0016684   .0006952    -2.40   0.016     -.003031   -.0003057
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0776307   .0187366     4.14   0.000     .0409076    .1143537
  experience |  -.0155733   .0107915    -1.44   0.149    -.0367243    .0055777
       _cons |  -.0578346   .0242516    -2.38   0.017    -.1053669   -.0103024
------------------------------------------------------------------------------
experience: expft expft2

. estimates store EB1

. // restore original data
. restore

. // Step 5
. replace RIFeb = RIF if public==1
(1,274 real changes made)

. oaxaca RIFeb yeduc (experience: expft expft2) [pw=eb], by(public) weight(1)

Blinder-Oaxaca decomposition                    Number of obs     =      5,458
                                                Model             =     linear
Group 1: public = 0                             N of obs 1        =      4,184
Group 2: public = 1                             N of obs 2        =      1,274

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
       RIFeb | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2834279   .0063857    44.38   0.000     .2709122    .2959436
     group_2 |   .2213006     .00814    27.19   0.000     .2053464    .2372547
  difference |   .0621274   .0103459     6.01   0.000     .0418498    .0824049
   explained |          0   .0006017     0.00   1.000    -.0011792    .0011792
 unexplained |   .0621274   .0102771     6.05   0.000     .0419846    .0822702
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -9.15e-18   .0001493    -0.00   1.000    -.0002927    .0002927
  experience |   1.12e-17    .000628     0.00   1.000    -.0012309    .0012309
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |  -.0204472   .0569986    -0.36   0.720    -.1321624    .0912681
  experience |   .0522995   .0245928     2.13   0.033     .0040984    .1005005
       _cons |   .0302751   .0646503     0.47   0.640    -.0964372    .1569873
------------------------------------------------------------------------------
experience: expft expft2

. estimates store EB2

Overview of results:

. esttab IPW1 IPW2 EB1 EB2, nogap mti

----------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)   
                     IPW1            IPW2             EB1             EB2   
----------------------------------------------------------------------------
overall                                                                     
group_1             0.278***        0.284***        0.278***        0.283***
                  (48.98)         (44.06)         (48.98)         (44.38)   
group_2             0.284***        0.221***        0.283***        0.221***
                  (44.08)         (27.19)         (44.40)         (27.19)   
difference       -0.00569          0.0627***     -0.00510          0.0621***
                  (-1.46)          (6.04)         (-1.34)          (6.01)   
explained        -0.00939*     -0.0000639        -0.00933*              0   
                  (-2.07)         (-0.11)         (-2.10)          (0.00)   
unexplained       0.00370          0.0628***      0.00422          0.0621***
                   (1.21)          (6.11)          (1.42)          (6.05)   
----------------------------------------------------------------------------
explained                                                                   
yeduc            -0.00779      -0.0000266        -0.00766       -9.15e-18   
                  (-1.61)         (-0.15)         (-1.61)         (-0.00)   
experience       -0.00160*     -0.0000373        -0.00167*       1.12e-17   
                  (-2.37)         (-0.06)         (-2.40)          (0.00)   
----------------------------------------------------------------------------
unexplained                                                                 
yeduc              0.0788***      -0.0215          0.0776***      -0.0204   
                   (4.11)         (-0.38)          (4.14)         (-0.36)   
experience        -0.0154          0.0521*        -0.0156          0.0523*  
                  (-1.41)          (2.12)         (-1.44)          (2.13)   
_cons             -0.0597*         0.0321         -0.0578*         0.0303   
                  (-2.42)          (0.50)         (-2.38)          (0.47)   
----------------------------------------------------------------------------
N                    8368            5458            8368            5458   
----------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

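We see that the reweighting error is now zero (apart from roundoff error). Entropy balancing matches the covariate means of the reweighted private sector exactly to those of the public sector, so the explained part of the second decomposition vanishes; with IPW, the balance is only approximate. The other components change only slightly.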
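One way to see where the zero comes from is to compare the weighted covariate means of the two sectors under both sets of weights. This is a minimal sketch, assuming the variables ipw and eb from Step 2 are still in memory in their original form (the changes made in Step 4 are undone by restore).

```stata
// weighted covariate means by sector using the IPW weights;
// the two sectors should be close but not identical
mean yeduc expft expft2 [pw=ipw], over(public)

// same using the entropy balancing weights; here the means of the
// two sectors should coincide up to the balance tolerance
mean yeduc expft expft2 [pw=eb], over(public)
```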