Johannes Giesecke and Ben Jann, GESIS Training Course, January 29 – February 1, 2024
Required packages: rifreg (from N. Fortin's website), rif, oaxaca, dstat, kmatch, moremata
Repeat the example analysis of the private–public gap in wage inequality.
This time, use the Gini coefficient as well as the D9/D1 ratio, the D5/D1
ratio, and the D9/D5 ratio as inequality measures. If possible, use
rifreg, rifvar, and oaxaca_rif to calculate these results.
Data preparation as on slides:
. use gsoep-extract, clear
(Example data based on the German Socio-Economic Panel)

. keep if wave==2015
(29,970 observations deleted)

. keep if inrange(age, 25, 55)
(5,671 observations deleted)

. generate lnwage = ln(wage)
(1,709 missing values generated)

. generate expft2 = expft^2
(35 missing values generated)

. svyset psu [pw=weight], strata(strata)

Sampling weights: weight
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

. summarize wage lnwage yeduc expft expft2 public

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      5,600    17.57278    9.858855       3.03     121.42
      lnwage |      5,600    2.736721    .5062968   1.108563   4.799255
       yeduc |      7,121    12.28823    2.783974          7         18
       expft |      7,274    11.63359    9.556508          0       39.5
      expft2 |      7,274    226.6548    293.3739          0    1560.25
-------------+---------------------------------------------------------
      public |      5,770    .2353553    .4242574          0          1

. drop if missing(lnwage, yeduc, expft, public)
(1,851 observations deleted)
We first do the decomposition of the Gini coefficient (using variable wage, not lnwage).
rifreg and oaxaca:
. rifreg wage [aw=weight] if public==0, gini retain(RIFprivate)

      Source |       SS       df       MS              Number of obs =    4184
-------------+------------------------------           F(  0,  4183) =    0.00
       Model |           0     0           .           Prob > F      =       .
    Residual |  215.765865  4183  .051581608           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |  215.765865  4183  .051581608           Root MSE      =  .22712

------------------------------------------------------------------------------
  RIFprivate | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   .2783231   .0035112    79.27   0.000     .2714394    .2852069
------------------------------------------------------------------------------

. rifreg wage [aw=weight] if public==1, gini retain(RIFpublic)

      Source |       SS       df       MS              Number of obs =    1274
-------------+------------------------------           F(  0,  1273) =    0.00
       Model |           0     0           .           Prob > F      =       .
    Residual |   41.781098  1273  .032820972           R-squared     =  0.0000
-------------+------------------------------           Adj R-squared =  0.0000
       Total |   41.781098  1273  .032820972           Root MSE      =  .18117

------------------------------------------------------------------------------
   RIFpublic | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   .2212987   .0050756    43.60   0.000     .2113412    .2312563
------------------------------------------------------------------------------

. generate double RIF = cond(public==1, RIFpublic, RIFprivate)

. oaxaca RIF yeduc (experience: expft expft2), by(public) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs   =      5,458
Number of PSUs   = 2,036                        Population size = 12,146,771
                                                Design df       =      2,021
                                                Model           =     linear
Group 1: public = 0                             N of obs 1      =      4,184
Group 2: public = 1                             N of obs 2      =      1,274

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
         RIF | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2783231   .0056681    49.10   0.000     .2672073     .289439
     group_2 |   .2212987    .008183    27.04   0.000     .2052507    .2373468
  difference |   .0570244    .009956     5.73   0.000     .0374993    .0765495
   explained |  -.0093274   .0047177    -1.98   0.048    -.0185794   -.0000754
 unexplained |   .0663518    .010933     6.07   0.000     .0449107    .0877929
-------------+----------------------------------------------------------------
explained    |
       yeduc |   -.007659   .0048939    -1.56   0.118    -.0172567    .0019387
  experience |  -.0016684   .0012616    -1.32   0.186    -.0041426    .0008058
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |    .057185   .0600618     0.95   0.341    -.0606045    .1749745
  experience |   .0367267   .0264099     1.39   0.164    -.0150668    .0885202
       _cons |  -.0275599   .0718063    -0.38   0.701    -.1683821    .1132622
------------------------------------------------------------------------------
experience: expft expft2

. drop RIF*
rifvar and oaxaca:
. egen double RIF = rifvar(wage), gini by(public) weight(weight)

. oaxaca RIF yeduc (experience: expft expft2), by(public) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs   =      5,458
Number of PSUs   = 2,036                        Population size = 12,146,771
                                                Design df       =      2,021
                                                Model           =     linear
Group 1: public = 0                             N of obs 1      =      4,184
Group 2: public = 1                             N of obs 2      =      1,274

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
         RIF | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2783233   .0056681    49.10   0.000     .2672074    .2894391
     group_2 |   .2213006    .008183    27.04   0.000     .2052525    .2373486
  difference |   .0570227    .009956     5.73   0.000     .0374976    .0765479
   explained |  -.0093274   .0047177    -1.98   0.048    -.0185794   -.0000754
 unexplained |   .0663501    .010933     6.07   0.000      .044909    .0877912
-------------+----------------------------------------------------------------
explained    |
       yeduc |   -.007659   .0048939    -1.56   0.118    -.0172567    .0019387
  experience |  -.0016684   .0012616    -1.32   0.186    -.0041426    .0008058
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0571835   .0600618     0.95   0.341    -.0606061    .1749731
  experience |   .0367262   .0264099     1.39   0.164    -.0150673    .0885197
       _cons |  -.0275596   .0718063    -0.38   0.701    -.1683817    .1132626
------------------------------------------------------------------------------
experience: expft expft2

. drop RIF
oaxaca_rif (this is the same as applying rifvar followed by oaxaca; however,
option svy is not supported by oaxaca_rif, so we only take account of
weights and PSUs):
. oaxaca_rif wage yeduc (experience: expft expft2) [pw=weight], by(public) ///
>     wgt(1) rif(gini) cluster(psu)
No Reweighted Strategy Choosen
Estimating Standard RIF-OAXACA using RIF:gini
Model : Blinder-Oaxaca RIF-decomposition
Type  : Standard
RIF   : gini
Scale : 1
Group 1: public = 0    x1*b1                    N of obs 1      =       4184
Group c:               x2*b1                    N of obs C      =          .
Group 2: public = 1    x2*b2                    N of obs 2      =       1274

                                 (Std. err. adjusted for 2,036 clusters in psu)
------------------------------------------------------------------------------
             |               Robust
        wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2783233   .0056687    49.10   0.000     .2672128    .2894338
     group_2 |   .2213006   .0081746    27.07   0.000     .2052786    .2373225
  difference |   .0570227   .0099462     5.73   0.000     .0375285    .0765169
   explained |  -.0093274   .0047289    -1.97   0.049    -.0185959   -.0000589
 unexplained |   .0663501   .0109418     6.06   0.000     .0449046    .0877956
-------------+----------------------------------------------------------------
explained    |
       yeduc |   -.007659   .0049092    -1.56   0.119    -.0172808    .0019628
  experience |  -.0016684   .0012623    -1.32   0.186    -.0041424    .0008056
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0571835   .0601235     0.95   0.342    -.0606564    .1750234
  experience |   .0367262   .0263916     1.39   0.164    -.0150005    .0884529
       _cons |  -.0275596   .0718729    -0.38   0.701    -.1684278    .1133086
------------------------------------------------------------------------------
experience: expft expft2
Interpretation: There is a difference of 0.057 in the Gini coefficient (higher wage inequality in the private sector). This difference cannot be explained by compositional differences with respect to years of education and full-time employment experience. There is some weak evidence that the gap would be even larger if the composition were the same in the two sectors. Thus, wage inequality is higher in the private sector because wage-setting mechanisms in the private sector are more inequality-enhancing than they are in the public sector.
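As a reading aid, the twofold decomposition behind these tables can be written out using the formulas printed in the oaxaca output header. The notation is introduced here for illustration only: \hat\nu_g is the Gini coefficient in group g (1 = private, 2 = public), \bar X_g the vector of covariate means (including the constant), and \hat\beta_g the RIF regression coefficients. The numbers are taken from the rifvar-based table.

```latex
\hat\nu_1 - \hat\nu_2
  = \underbrace{(\bar X_1 - \bar X_2)'\hat\beta_1}_{\text{explained}}
  + \underbrace{\bar X_2'(\hat\beta_1 - \hat\beta_2)}_{\text{unexplained}}

% Gini, private vs. public sector:
0.0570227 = -0.0093274 + 0.0663501

% implied counterfactual (public-sector composition, private-sector coefficients):
\bar X_2'\hat\beta_1 = \hat\nu_1 - \text{explained}
                     = 0.2783233 + 0.0093274 = 0.2876507
```

The counterfactual Gini of about 0.288 exceeds the observed private-sector Gini of 0.278, which is the sense in which the gap would be larger under equal composition.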
We now look at D9/D1, D5/D1, and D9/D5. These measures are not supported by
rifreg, so we only use the approach based on rifvar followed by oaxaca.
Furthermore, we analyze the inter-quantile range of lnwage (rather than the
inter-quantile ratio of wage).
. egen double RIF = rifvar(lnwage), iqr(10 90) by(public) weight(weight)

. oaxaca RIF yeduc (experience: expft expft2), by(public) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs   =      5,458
Number of PSUs   = 2,036                        Population size = 12,146,771
                                                Design df       =      2,021
                                                Model           =     linear
Group 1: public = 0                             N of obs 1      =      4,184
Group 2: public = 1                             N of obs 2      =      1,274

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
         RIF | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   1.241953   .0273998    45.33   0.000     1.188218    1.295688
     group_2 |   1.050144   .0580852    18.08   0.000     .9362306    1.164057
  difference |   .1918092    .064148     2.99   0.003     .0660061    .3176123
   explained |  -.0867787   .0218341    -3.97   0.000    -.1295984    -.043959
 unexplained |   .2785879   .0669526     4.16   0.000     .1472846    .4098912
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0837546   .0222836    -3.76   0.000    -.1274558   -.0400535
  experience |  -.0030241   .0034399    -0.88   0.379    -.0097702     .003722
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .3766515   .3289786     1.14   0.252    -.2685211    1.021824
  experience |   .3232709   .1623586     1.99   0.047     .0048631    .6416786
       _cons |  -.4213345   .3958018    -1.06   0.287    -1.197557    .3548878
------------------------------------------------------------------------------
experience: expft expft2

. drop RIF
Interpretation: D9/D1 is about 0.19 log points larger in the private sector (i.e., higher wage inequality in the private sector). This difference cannot be explained by compositional differences w.r.t. years of education and full-time employment experience. Quite the opposite: If workers in the public sector were paid like workers in the private sector, we would expect a higher wage inequality in the public sector than in the private sector.
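Because the decomposition is run on the inter-quantile range of lnwage, the group values are logs of the corresponding wage ratios. The short conversion below is added for illustration and uses only the group_1 and group_2 estimates from the iqr(10 90) table; Q_p denotes the p-th percentile of wage, and the exponentiated values are rounded.

```latex
\ln Q_{90} - \ln Q_{10} = \ln\!\left(\frac{Q_{90}}{Q_{10}}\right)

% private sector:  \exp(1.241953) \approx 3.46
% public sector:   \exp(1.050144) \approx 2.86
```

In other words, the ninth decile of wages is roughly 3.5 times the first decile in the private sector and roughly 2.9 times in the public sector.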
. egen double RIF = rifvar(lnwage), iqr(10 50) by(public) weight(weight)

. oaxaca RIF yeduc (experience: expft expft2), by(public) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs   =      5,458
Number of PSUs   = 2,036                        Population size = 12,146,771
                                                Design df       =      2,021
                                                Model           =     linear
Group 1: public = 0                             N of obs 1      =      4,184
Group 2: public = 1                             N of obs 2      =      1,274

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
         RIF | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .5880972   .0199184    29.53   0.000     .5490346    .6271599
     group_2 |   .5869841   .0489037    12.00   0.000     .4910772     .682891
  difference |   .0011131   .0534977     0.02   0.983    -.1038034    .1060296
   explained |  -.0521251   .0127427    -4.09   0.000    -.0771152   -.0271349
 unexplained |   .0532382   .0524387     1.02   0.310    -.0496013    .1560776
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0514701   .0131737    -3.91   0.000    -.0773056   -.0256347
  experience |   -.000655   .0025296    -0.26   0.796    -.0056158    .0043059
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |    .096698   .2429009     0.40   0.691    -.3796643    .5730603
  experience |   .3139347   .1391728     2.26   0.024     .0409977    .5868718
       _cons |  -.3573945   .3165408    -1.13   0.259    -.9781749    .2633859
------------------------------------------------------------------------------
experience: expft expft2

. drop RIF
Interpretation: There is almost no difference in D5/D1 of the private sector and the public sector (i.e., wage inequality in the lower half of the wage distributions in the private sector and in the public sector is very similar). However, we see that if workers in the public sector were paid like workers in the private sector, we would expect a higher wage inequality in the lower half of the wage distribution in the public than in the private sector.
. egen double RIF = rifvar(lnwage), iqr(50 90) by(public) weight(weight)

. oaxaca RIF yeduc (experience: expft expft2), by(public) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs   =      5,458
Number of PSUs   = 2,036                        Population size = 12,146,771
                                                Design df       =      2,021
                                                Model           =     linear
Group 1: public = 0                             N of obs 1      =      4,184
Group 2: public = 1                             N of obs 2      =      1,274

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
         RIF | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .6538557   .0228574    28.61   0.000     .6090292    .6986822
     group_2 |   .4631595   .0321735    14.40   0.000     .4000629    .5262562
  difference |   .1906961   .0386247     4.94   0.000     .1149478    .2664445
   explained |  -.0346536   .0174382    -1.99   0.047    -.0688523   -.0004549
 unexplained |   .2253498   .0453288     4.97   0.000     .1364536    .3142459
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0322845   .0174944    -1.85   0.065    -.0665934    .0020244
  experience |  -.0023691   .0025353    -0.93   0.350    -.0073412     .002603
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .2799535   .2250151     1.24   0.214    -.1613322    .7212393
  experience |   .0093361   .0795222     0.12   0.907    -.1466179    .1652902
       _cons |  -.0639399    .230135    -0.28   0.781    -.5152665    .3873867
------------------------------------------------------------------------------
experience: expft expft2

. drop RIF
Interpretation: D9/D5 is about 0.19 log points larger in the private sector (i.e., higher wage inequality in the private sector than in the public sector if we look at the upper half of the wage distribution). This difference cannot be explained by compositional differences w.r.t. years of education and full-time employment experience. There is a small and marginally significant effect of education, suggesting that if workers in the public sector were paid like workers in the private sector, we would expect a (slightly) higher wage inequality in the upper half of the wage distribution in the public sector than in the private sector.
Overall, the difference in wage inequality between the public and the private sector cannot be explained by compositional differences with respect to education and work experience. If anything, the gap would be even larger if the two sectors had the same composition (this result is mostly related to education; if the average level of education in the private sector were as high as in the public sector, wage inequality in the private sector would be even higher). We also see that the gap in wage inequality is mostly driven by the upper half of the distribution; yet compositional differences also seem to have a suppressing effect on the inequality gap in the lower part of the distribution.
Combine your RIF decomposition for the Gini coefficient with reweighting
(analogous to the reweighted OB decomposition) and calculate the
specification error. Use oaxaca_rif for this exercise.
. oaxaca_rif wage yeduc (experience: expft expft2) ///
>     [pw=weight], by(public) cluster(psu) wgt(1) rif(gini) ///
>     rwlogit(c.yeduc##c.expft##c.expft)
Estimating Reweighted RIF-OAXACA using RIF:gini
Model : Blinder-Oaxaca RIF-decomposition
Type  : Reweighted
RIF   : gini
Scale : 1
Group 1: public = 0    x1*b1                    N of obs 1      =       4184
Group c: X1~>rw~>X2 or x2*b1                    N of obs C      =       4184
Group 2: public = 1    x2*b2                    N of obs 2      =       1274

                                  (Std. err. adjusted for 2,036 clusters in psu)
-------------------------------------------------------------------------------
              |               Robust
         wage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
Overall       |
      group_1 |   .2783233   .0056211    49.51   0.000     .2673062    .2893404
      group_c |   .2840161    .003174    89.48   0.000     .2777951     .290237
      group_2 |   .2213006   .0083934    26.37   0.000     .2048497    .2377514
  tdifference |   .0570227   .0101002     5.65   0.000     .0372266    .0768188
  t_explained |  -.0056928   .0063401    -0.90   0.369    -.0181192    .0067336
t_unexplained |   .0627155   .0175209     3.58   0.000     .0283752    .0970559
--------------+----------------------------------------------------------------
explained     |
        total |  -.0056928   .0063401    -0.90   0.369    -.0181192    .0067336
  p_explained |  -.0093923   .0069699    -1.35   0.178    -.0230531    .0042685
   specif_err |   .0036995   .0090898     0.41   0.684    -.0141163    .0215152
--------------+----------------------------------------------------------------
p_explained   |
        yeduc |   -.007792   .0074926    -1.04   0.298    -.0224772    .0068931
   experience |  -.0016002   .0009587    -1.67   0.095    -.0034792    .0002787
--------------+----------------------------------------------------------------
specif_err    |
        yeduc |   .0788211   .0473546     1.66   0.096    -.0139922    .1716344
   experience |  -.0154299   .0196131    -0.79   0.431    -.0538709    .0230111
        _cons |  -.0596917   .0564535    -1.06   0.290    -.1703386    .0509552
--------------+----------------------------------------------------------------
unexplained   |
        total |   .0627155   .0175209     3.58   0.000     .0283752    .0970559
    rwg_error |  -.0000639   .0006264    -0.10   0.919    -.0012916    .0011638
p_unexplained |   .0627794   .0174699     3.59   0.000      .028539    .0970199
--------------+----------------------------------------------------------------
p_unexplained |
        yeduc |    -.021478   .0936426    -0.23   0.819     -.205014    .1620581
   experience |   .0521253   .0431672     1.21   0.227    -.0324808    .1367314
        _cons |   .0321321   .1088606     0.30   0.768    -.1812308     .245495
--------------+----------------------------------------------------------------
rwg_error     |
        yeduc |  -.0000266   .0001822    -0.15   0.884    -.0003836    .0003304
   experience |  -.0000373   .0006552    -0.06   0.955    -.0013214    .0012468
-------------------------------------------------------------------------------
experience: expft expft2
Interpretation: In total, the specification error is not significant, but there is some weak evidence for a misspecified effect regarding education. The reweighting error is very small.
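To make the structure of the reweighted table explicit, the identities below show how its components add up. The notation is introduced here for illustration: \hat\nu_1, \hat\nu_C, and \hat\nu_2 are the Gini coefficients labelled group_1, group_c (private sector reweighted to the public-sector covariate distribution), and group_2. The numbers are the point estimates reported by oaxaca_rif.

```latex
\hat\nu_1 - \hat\nu_2
  = \underbrace{(\hat\nu_1 - \hat\nu_C)}_{\text{t\_explained}}
  + \underbrace{(\hat\nu_C - \hat\nu_2)}_{\text{t\_unexplained}}

% explained part = pure explained + specification error
-0.0056928 = -0.0093923 + 0.0036995

% unexplained part = pure unexplained + reweighting error
 0.0627155 =  0.0627794 + (-0.0000639)

% total difference
 0.0570227 = -0.0056928 + 0.0627155
```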
Try to replicate the results for the reweighted Gini decomposition manually
by first computing the RIF and then applying oaxaca to the RIF taking
reweighting into account. You will need two calls to oaxaca to compute all
results.
Step 1: Generate the sector-specific RIF of the Gini using command dstat
with option rif().
. dstat (gini) wage [pw=weight], over(public) rif(RIF, compact)

gini                                            Number of obs   =      5,458

--------------------------------------------------------------
        wage | Coefficient  Std. err.     [95% conf. interval]
-------------+------------------------------------------------
public       |
          no |   .2783233   .0056617      .2672241    .2894224
         yes |   .2213006   .0080971       .205427    .2371741
--------------------------------------------------------------

Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------
RIF             double  %10.0g                RIF of _b[#]

. svy: mean RIF, over(public)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =    15          Number of obs   =      5,458
Number of PSUs   = 2,036          Population size = 12,146,771
                                  Design df       =      2,021

--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
c.RIF@public |
          no |   .2783233   .0056664      .2672108    .2894358
         yes |   .2213006   .0081404      .2053362    .2372649
--------------------------------------------------------------
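The check with svy: mean works because of a general property of recentered influence functions, sketched below in notation introduced here for illustration (\nu is the statistic, here the Gini coefficient, and IF is its influence function). Since the influence function has mean zero, the mean of the RIF reproduces the statistic. Under a linear RIF regression, the statistic then equals the covariate means times the coefficients, which is what allows a Blinder-Oaxaca decomposition to be applied to the RIF.

```latex
\mathrm{RIF}(y;\nu) = \nu + \mathrm{IF}(y;\nu),
\qquad
E[\mathrm{IF}(y;\nu)] = 0
\;\Rightarrow\;
E[\mathrm{RIF}(y;\nu)] = \nu

% with a linear RIF regression  E[\mathrm{RIF}(y;\nu) \mid X] = X'\beta :
\nu = E[X]'\beta
```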
Step 2: Generate balancing weights that adjust the distribution of
covariates among people in the private sector to the distribution observed
among people in the public sector (i.e., the private sector is reweighted)
using kmatch.
. kmatch ipw public c.yeduc##c.expft##c.expft [pw=weight], att wgen(ipw)

Inverse probability weighting                   Number of obs     =      5,458

Treatment   : public = 1
Covariates  : yeduc expft c.yeduc#c.expft c.expft#c.expft c.yeduc#c.expft#c.expft
PS model    : logit (pr)

Matching statistics
------------------------------------------------------------------------------
           |             Matched             |            Controls
           |       Yes        No      Total  |      Used    Unused     Total
-----------+---------------------------------+--------------------------------
   Treated |      1274         0       1274  |      4184         0      4184
------------------------------------------------------------------------------

Stored variables

Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------
ipw             double  %10.0g                Matching weights for ATT

. bysort public: summarize ipw

----------------------------------------------------------------------------
-> public = no

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         ipw |      4,184    696.6616    1297.235   .6081152   20413.38

----------------------------------------------------------------------------
-> public = yes

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         ipw |      1,274    2287.937    3061.095       28.6    32681.6
Step 3: Compute the reweighted RIF of the Gini for the private sector
using dstat.
. dstat (gini) wage if public==0 [pw=ipw], rif(RIFipw)

Summary statistics                              Number of obs   =      4,184

--------------------------------------------------------------
        wage | Coefficient  Std. err.     [95% conf. interval]
-------------+------------------------------------------------
        gini |   .2840161   .0063869      .2714944    .2965378
--------------------------------------------------------------

Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------
RIFipw          double  %10.0g                RIF of _b[gini]
Step 4: Apply oaxaca to the raw RIF of the private sector and the
reweighted RIF of the private sector to obtain the “pure explained part”
and the “specification error” (the “explained part” of this decomposition
quantifies the “pure explained part”; the “unexplained part” quantifies the
“specification error”).
. // preserve the data so that they can be restored later
. preserve

. // add a unique ID for each observation
. generate ID = _n

. // duplicate each private sector observation
. expand 2 if public==0
(4,184 observations created)

. // generate a 0/1 variable that tags the duplicates; we can then use this as
. // the group variable in oaxaca
. bysort ID: generate byte G = (_n==2) if public==0
(1,274 missing values generated)

. // generate weights containing 1 (or weights if survey weights are applied)
. // for G==0 and the balancing weights for G==1
. replace ipw = weight if G==0
(4,184 real changes made)

. // generate a RIF variable containing the raw RIF for G==0 and the reweighted
. // RIF for G==1
. replace RIF = RIFipw if G==1
(4,184 real changes made)

. // apply oaxaca to the RIF variable while applying the weights; G is the group
. // variable
. oaxaca RIF yeduc (experience: expft expft2) [pw=ipw], by(G) weight(1) cluster(ID)

Blinder-Oaxaca decomposition                    Number of obs     =      8,368
                                                Model             =     linear
Group 1: G = 0                                  N of obs 1        =      4,184
Group 2: G = 1                                  N of obs 2        =      4,184

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

                                  (Std. err. adjusted for 4,184 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
         RIF | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2783233   .0056825    48.98   0.000     .2671858    .2894608
     group_2 |   .2840161   .0064432    44.08   0.000     .2713877    .2966444
  difference |  -.0056928   .0038987    -1.46   0.144     -.013334    .0019484
   explained |  -.0093923   .0045321    -2.07   0.038     -.018275   -.0005095
 unexplained |   .0036995   .0030621     1.21   0.227    -.0023021    .0097011
-------------+----------------------------------------------------------------
explained    |
       yeduc |   -.007792   .0048499    -1.61   0.108    -.0172977    .0017137
  experience |  -.0016002   .0006765    -2.37   0.018    -.0029262   -.0002743
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0788211   .0191598     4.11   0.000     .0412686    .1163736
  experience |  -.0154299    .010939    -1.41   0.158      -.03687    .0060101
       _cons |  -.0596917   .0246583    -2.42   0.015    -.1080211   -.0113623
------------------------------------------------------------------------------
experience: expft expft2

. estimates store IPW1

. // restore original data
. restore
Step 5: Apply oaxaca to the reweighted RIF of the private sector and the
raw RIF of the public sector to obtain the “pure unexplained part” and the
“reweighting error” (the “explained part” of this decomposition quantifies
the “reweighting error”; the “unexplained part” quantifies the “pure
unexplained part”).
. replace RIFipw = RIF if public==1 // fill in RIFipw for public sector
(1,274 real changes made)

. oaxaca RIFipw yeduc (experience: expft expft2) [pw=ipw], by(public) weight(1)

Blinder-Oaxaca decomposition                    Number of obs     =      5,458
                                                Model             =     linear
Group 1: public = 0                             N of obs 1        =      4,184
Group 2: public = 1                             N of obs 2        =      1,274

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
      RIFipw | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2840161   .0064455    44.06   0.000     .2713832     .296649
     group_2 |   .2213006     .00814    27.19   0.000     .2053464    .2372547
  difference |   .0627155   .0103829     6.04   0.000     .0423654    .0830656
   explained |  -.0000639   .0006038    -0.11   0.916    -.0012474    .0011196
 unexplained |   .0627794   .0102786     6.11   0.000     .0426338     .082925
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0000266   .0001785    -0.15   0.882    -.0003765    .0003233
  experience |  -.0000373   .0006216    -0.06   0.952    -.0012557    .0011811
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |    -.021478   .0569927    -0.38   0.706    -.1331816    .0902256
  experience |   .0521253   .0246197     2.12   0.034     .0038715     .100379
       _cons |   .0321321   .0646791     0.50   0.619    -.0946366    .1589008
------------------------------------------------------------------------------
experience: expft expft2

. estimates store IPW2
Overview of results:
. esttab IPW1 IPW2, nogap mti

--------------------------------------------
                      (1)             (2)
                     IPW1            IPW2
--------------------------------------------
overall
group_1             0.278***        0.284***
                  (48.98)         (44.06)
group_2             0.284***        0.221***
                  (44.08)         (27.19)
difference       -0.00569          0.0627***
                  (-1.46)          (6.04)
explained        -0.00939*     -0.0000639
                  (-2.07)         (-0.11)
unexplained       0.00370          0.0628***
                   (1.21)          (6.11)
--------------------------------------------
explained
yeduc            -0.00779      -0.0000266
                  (-1.61)         (-0.15)
experience       -0.00160*     -0.0000373
                  (-2.37)         (-0.06)
--------------------------------------------
unexplained
yeduc              0.0788***      -0.0215
                   (4.11)         (-0.38)
experience        -0.0154          0.0521*
                  (-1.41)          (2.12)
_cons             -0.0597*         0.0321
                  (-2.42)          (0.50)
--------------------------------------------
N                    8368            5458
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
The point estimates are the same as the ones computed by oaxaca_rif above
(standard errors are not reliable in either case; a quick comparison against
the bootstrap indicates that some of the standard errors reported by
oaxaca_rif are completely off; the standard errors obtained by the manual
procedure seem to be more accurate, but still biased).
Repeat the analysis using entropy balancing for the reweighting. How do the results change?
Step 1 can remain the same. In Step 2 we need to use entropy balancing to generate the weights. Then use these weights in the subsequent steps.
. // Step 2
. kmatch eb public c.yeduc##c.expft##c.expft [pw=weight], att wgen(eb)
(fitting balancing weights ... done)

Entropy balancing                               Number of obs     =      5,458
                                                Balance tolerance =     .00001
Treatment   : public = 1
Targets     : 1
Covariates  : yeduc expft c.yeduc#c.expft c.expft#c.expft c.yeduc#c.expft#c.expft

Matching statistics
------------------------------------------------------------------------------------------
           |             Matched             |             Controls            |  Balance
           |       Yes        No      Total  |      Used    Unused      Total  |     loss
-----------+---------------------------------+---------------------------------+----------
   Treated |      1274         0       1274  |      4184         0       4184  | 1.22e-15
------------------------------------------------------------------------------------------

Stored variables

Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------
eb              double  %10.0g                Matching weights for ATT

. bysort public: summarize eb

----------------------------------------------------------------------------
-> public = no

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          eb |      4,184    696.6616    1290.758   .6088637   20498.96

----------------------------------------------------------------------------
-> public = yes

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          eb |      1,274    2287.937    3061.095       28.6    32681.6

. // Step 3
. dstat (gini) wage if public==0 [pw=eb], rif(RIFeb)

Summary statistics                              Number of obs   =      4,184

--------------------------------------------------------------
        wage | Coefficient  Std. err.     [95% conf. interval]
-------------+------------------------------------------------
        gini |   .2834279   .0063278       .271022    .2958338
--------------------------------------------------------------

Variable      Storage   Display    Value
    name         type    format    label      Variable label
----------------------------------------------------------------------------
RIFeb           double  %10.0g                RIF of _b[gini]

. // Step 4
. // preserve the data so that they can be restored later
. preserve

. // add a unique ID for each observation
. generate ID = _n

. // duplicate each private sector observation
. expand 2 if public==0
(4,184 observations created)

. // generate a 0/1 variable that tags the duplicates; we can then use this as
. // the group variable in oaxaca
. bysort ID: generate byte G = (_n==2) if public==0
(1,274 missing values generated)

. // generate weights containing 1 (or weights if survey weights are applied)
. // for G==0 and the balancing weights for G==1
. replace eb = weight if G==0
(4,184 real changes made)

. // generate a RIF variable containing the raw RIF for G==0 and the reweighted
. // RIF for G==1
. replace RIF = RIFeb if G==1
(4,184 real changes made)

. // apply oaxaca to the RIF variable while applying the weights; G is the group
. // variable
. oaxaca RIF yeduc (experience: expft expft2) [pw=eb], by(G) weight(1) cluster(ID)

Blinder-Oaxaca decomposition                    Number of obs     =      8,368
                                                Model             =     linear
Group 1: G = 0                                  N of obs 1        =      4,184
Group 2: G = 1                                  N of obs 2        =      4,184

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

                                  (Std. err. adjusted for 4,184 clusters in ID)
------------------------------------------------------------------------------
             |               Robust
         RIF | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2783233   .0056825    48.98   0.000     .2671858    .2894608
     group_2 |   .2834279   .0063834    44.40   0.000     .2709167    .2959391
  difference |  -.0051046   .0038161    -1.34   0.181     -.012584    .0023747
   explained |  -.0093274   .0044369    -2.10   0.036    -.0180235   -.0006313
 unexplained |   .0042227   .0029837     1.42   0.157    -.0016251    .0100706
-------------+----------------------------------------------------------------
explained    |
       yeduc |   -.007659   .0047672    -1.61   0.108    -.0170026    .0016846
  experience |  -.0016684   .0006952    -2.40   0.016     -.003031   -.0003057
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0776307   .0187366     4.14   0.000     .0409076    .1143537
  experience |  -.0155733   .0107915    -1.44   0.149    -.0367243    .0055777
       _cons |  -.0578346   .0242516    -2.38   0.017    -.1053669   -.0103024
------------------------------------------------------------------------------
experience: expft expft2

. estimates store EB1

. // restore original data
. restore

. // Step 5
. replace RIFeb = RIF if public==1
(1,274 real changes made)

. oaxaca RIFeb yeduc (experience: expft expft2) [pw=eb], by(public) weight(1)

Blinder-Oaxaca decomposition                    Number of obs     =      5,458
                                                Model             =     linear
Group 1: public = 0                             N of obs 1        =      4,184
Group 2: public = 1                             N of obs 2        =      1,274

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
       RIFeb | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   .2834279   .0063857    44.38   0.000     .2709122    .2959436
     group_2 |   .2213006     .00814    27.19   0.000     .2053464    .2372547
  difference |   .0621274   .0103459     6.01   0.000     .0418498    .0824049
   explained |          0   .0006017     0.00   1.000    -.0011792    .0011792
 unexplained |   .0621274   .0102771     6.05   0.000     .0419846    .0822702
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -9.15e-18   .0001493    -0.00   1.000    -.0002927    .0002927
  experience |   1.12e-17    .000628     0.00   1.000    -.0012309    .0012309
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |  -.0204472   .0569986    -0.36   0.720    -.1321624    .0912681
  experience |   .0522995   .0245928     2.13   0.033     .0040984    .1005005
       _cons |   .0302751   .0646503     0.47   0.640    -.0964372    .1569873
------------------------------------------------------------------------------
experience: expft expft2

. estimates store EB2
Overview of results:
. esttab IPW1 IPW2 EB1 EB2, nogap mti

----------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)
                     IPW1            IPW2             EB1             EB2
----------------------------------------------------------------------------
overall
group_1             0.278***        0.284***        0.278***        0.283***
                  (48.98)         (44.06)         (48.98)         (44.38)
group_2             0.284***        0.221***        0.283***        0.221***
                  (44.08)         (27.19)         (44.40)         (27.19)
difference       -0.00569          0.0627***     -0.00510          0.0621***
                  (-1.46)          (6.04)         (-1.34)          (6.01)
explained        -0.00939*     -0.0000639        -0.00933*              0
                  (-2.07)         (-0.11)         (-2.10)          (0.00)
unexplained       0.00370          0.0628***      0.00422          0.0621***
                   (1.21)          (6.11)          (1.42)          (6.05)
----------------------------------------------------------------------------
explained
yeduc            -0.00779      -0.0000266        -0.00766       -9.15e-18
                  (-1.61)         (-0.15)         (-1.61)         (-0.00)
experience       -0.00160*     -0.0000373        -0.00167*       1.12e-17
                  (-2.37)         (-0.06)         (-2.40)          (0.00)
----------------------------------------------------------------------------
unexplained
yeduc              0.0788***      -0.0215          0.0776***      -0.0204
                   (4.11)         (-0.38)          (4.14)         (-0.36)
experience        -0.0154          0.0521*        -0.0156          0.0523*
                  (-1.41)          (2.12)         (-1.44)          (2.13)
_cons             -0.0597*         0.0321         -0.0578*         0.0303
                  (-2.42)          (0.50)         (-2.38)          (0.47)
----------------------------------------------------------------------------
N                    8368            5458            8368            5458
----------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
We see that the reweighting error is now zero (apart from roundoff error).
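A brief sketch of why this happens, in notation introduced here for illustration: in Step 5 the “explained part” is (X1 - X2) * b1, where group 1 is the reweighted private sector (subscript C) and group 2 is the public sector. Entropy balancing constrains the reweighted means of the balancing terms to match the public-sector means, and the regressors yeduc, expft, and expft2 (which equals c.expft#c.expft) are all among those terms, so the mean difference vanishes up to the balance tolerance.

```latex
\text{reweighting error}
  = (\bar X_C - \bar X_2)'\hat\beta_C

% entropy balancing imposes, for each balanced term x_k:
\bar x_{k,C} = \bar x_{k,2}
\;\Rightarrow\;
(\bar X_C - \bar X_2)'\hat\beta_C = 0
```

With inverse probability weighting the covariate means are matched only approximately, which is consistent with the small nonzero reweighting error (-0.0000639) in the IPW2 column.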