Johannes Giesecke and Ben Jann, GESIS Training Course, January 29 – February 1, 2024
Required packages: cdist, dstat, estout, jmpierce, moremata, grstyle, palettes, colrspace
Set the seed of the random number generator for sake of reproducibility:
. set seed 439028
Extend the model from the session on distribution decompositions. Include the international socio-economic index (isei) as well as the number of children in the household (children). Decompose the private–public gap in the D9/D1, the D9/D5 and the D5/D1 ratio. Use the approaches based on JMP, conditional quantiles and distribution regressions and compare the results. The decompositions should be such that the covariate distribution of the private sector is adjusted to the covariate distribution of the public sector (i.e. use the wage structure from the private sector as the reference wage structure).
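Because the outcome analyzed here is the log wage, each decile ratio of wages corresponds to a difference between two quantiles of log wages, which is why the commands request statistics such as d9010 or iqr(10 90) rather than ratios. As a brief sketch, writing $Q_p(\cdot)$ for the $p$-th percentile (notation introduced here for illustration only, and exact for population quantiles):

```latex
\begin{align*}
\ln\frac{D9}{D1} &= \ln Q_{90}(w) - \ln Q_{10}(w) = Q_{90}(\ln w) - Q_{10}(\ln w) \\
\ln\frac{D9}{D1} &= \ln\frac{D9}{D5} + \ln\frac{D5}{D1}
\end{align*}
```

The second line implies that, for each method, the D9/D5 and D5/D1 results should add up to the D9/D1 result, which is a convenient consistency check on the tables.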
Data preparation including additional predictors:
. use gsoep-extract, clear
(Example data based on the German Socio-Economic Panel)

. keep if wave==2015
(29,970 observations deleted)

. keep if inrange(age, 25, 55)
(5,671 observations deleted)

. generate lnwage = ln(wage)
(1,709 missing values generated)

. generate expft2 = expft^2
(35 missing values generated)

. summarize lnwage yeduc expft expft2 public isei children

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      lnwage |      5,600    2.736721    .5062968   1.108563   4.799255
       yeduc |      7,121    12.28823    2.783974          7         18
       expft |      7,274    11.63359    9.556508          0       39.5
      expft2 |      7,274    226.6548    293.3739          0    1560.25
      public |      5,770    .2353553    .4242574          0          1
-------------+---------------------------------------------------------
        isei |      6,451    45.07115    17.00982         16         90
    children |      7,309    1.090163    1.174416          0          4

. drop if missing(lnwage, yeduc, expft, public, isei, children) // remove unused observation
(1,879 observations deleted)
Overview of characteristics:
. tabstat yeduc expft isei children [aw=weight], by(public)

Summary statistics: Mean
Group variable: public (public service)

public |     yeduc     expft      isei  children
-------+----------------------------------------
    no |   12.4304  14.35697  45.18642  .5606184
   yes |   14.0728  13.57464  53.27206   .553369
-------+----------------------------------------
 Total |   12.8238  14.16958  47.12315   .558882
------------------------------------------------
People in the public sector are on average more highly educated, have less full-time labor market experience, higher occupational status and about the same number of children as people in the private sector.
JMP decomposition:
. regress lnwage yeduc expft expft2 isei children [pw=weight] if public==0
(sum of wgt is 9,175,995.0951793)

Linear regression                               Number of obs     =      4,163
                                                F(5, 4157)        =     160.97
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3580
                                                Root MSE          =     .40198

------------------------------------------------------------------------------
             |               Robust
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .0482309   .0058667     8.22   0.000     .0367291    .0597328
       expft |   .0263459   .0041235     6.39   0.000     .0182617    .0344301
      expft2 |  -.0003025   .0001168    -2.59   0.010    -.0005315   -.0000736
        isei |   .0103354   .0007699    13.42   0.000      .008826    .0118447
    children |   .0460693   .0101806     4.53   0.000     .0261099    .0660288
       _cons |   1.355772   .0686932    19.74   0.000     1.221097    1.490448
------------------------------------------------------------------------------

. estimates store private

. regress lnwage yeduc expft expft2 isei children [pw=weight] if public==1
(sum of wgt is 2,890,165.7029972)

Linear regression                               Number of obs     =      1,267
                                                F(5, 1261)        =      60.73
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3723
                                                Root MSE          =      .3497

------------------------------------------------------------------------------
             |               Robust
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |    .036099   .0080594     4.48   0.000     .0202877    .0519104
       expft |   .0415204   .0071014     5.85   0.000     .0275885    .0554523
      expft2 |  -.0007342   .0001928    -3.81   0.000    -.0011124   -.0003559
        isei |   .0079597    .001322     6.02   0.000     .0053661    .0105532
    children |   .0524704   .0134812     3.89   0.000     .0260223    .0789184
       _cons |   1.554496   .0948874    16.38   0.000     1.368342    1.740651
------------------------------------------------------------------------------

. estimates store public

. jmpierce private public, reference(1) statistics(mean d9010 d9050 d5010 variance)

Juhn-Murphy-Pierce decomposition (reference estimates: private)

                       T           Q           P           U
        mean  -.13884181  -.15301408   .01080046   .00337181
       d9010   .19181561  -.06719494   .13387465    .1251359
       d9050   .19879556   .03245497   .09367847   .07266212
       d5010  -.00697994  -.09964991   .04019618   .05247378
    variance   .05731634   .00172229   .02227498   .03331907

T = Total difference (private-public)
Q = Contribution of differences in observable quantities
P = Contribution of differences in observable prices
U = Contribution of differences in unobservable quantities and prices
We see that the difference in average log wages between the private and the public sector can be fully explained by compositional differences between workers in the two sectors (Q-component). However, if the private sector had the public sector's covariate distribution, overall wage inequality (as measured by D9/D1) in the private sector would increase (i.e. the inequality gap would be larger). Interestingly, at the same time, the sector gap in the dispersion of wages in the upper half of the wage distribution (D9/D5) is predicted to be a little lower than it actually is, whereas for the lower half of the wage distribution (D5/D1) it is predicted to be much larger. Finally, with respect to the sector gap in the variance of log wages, compositional differences cannot explain the observed gap.
Looking at the P- and the U-components, we see that (with the exception of the mean) large parts of the inequality gaps are due to a less "inequality-prone" wage structure in the public sector (component P) and to less "inequality-prone" unobservable characteristics and wage returns to these characteristics (component U).
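The four columns of the jmpierce table are linked by an additive identity, which makes it easy to see how much of each gap remains once composition is accounted for. As a worked check using the numbers printed above, rounded to four decimals (T, Q, P and U follow the table legend):

```latex
\begin{align*}
T &= Q + P + U \\
\text{mean:}  \quad -0.1388 &= -0.1530 + 0.0108 + 0.0034 \\
\text{d9010:} \quad \phantom{-}0.1918 &= -0.0672 + 0.1339 + 0.1251 \\
\text{d9010:} \quad P + U &= 0.2590 > T
\end{align*}
```

Since Q is negative for D9/D1, prices and unobservables together account for more than the total gap in that statistic.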
Decomposition based on conditional quantiles:
. // Estimate counterfactuals
. cdist lnwage yeduc c.expft##c.expft2 isei children [pw=weight], by(public) method(qr) ///
>     statistics(mean iqr(10 90) iqr(50 90) iqr(10 50) variance) ///
>     vce(bootstrap, cluster(psu))
(running cdist on estimation sample)

Bootstrap replications (50): .........10.........20.........30.........40.........50 done

Counterfactual distribution estimation          Number of obs     =      5,430
                                                Replications      =         50
                                                Pooled            =         no
      Group 0: public = 0                       N of obs 0        =      4,163
      Group 1: public = 1                       N of obs 1        =      1,267
                                                Estimation method =         qr
                                                Grid size         =        100

                               (Replications based on 2,034 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
      lnwage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
obs0         |
        mean |   2.733252   .0126214   216.56   0.000     2.708514    2.757989
  iqr(10,90) |   1.244804      .0308    40.42   0.000     1.184437    1.305171
  iqr(50,90) |   .6547899   .0257502    25.43   0.000     .6043205    .7052594
  iqr(10,50) |   .5900142   .0164323    35.91   0.000     .5578074     .622221
    variance |   .2513125   .0105011    23.93   0.000     .2307306    .2718943
-------------+----------------------------------------------------------------
fit0         |
        mean |   2.732335   .0126247   216.43   0.000     2.707591    2.757079
  iqr(10,90) |   1.259786   .0286964    43.90   0.000     1.203542     1.31603
  iqr(50,90) |   .6420845   .0159623    40.23   0.000      .610799      .67337
  iqr(10,50) |   .6177015   .0215533    28.66   0.000     .5754579    .6599451
    variance |   .2466493   .0107218    23.00   0.000      .225635    .2676636
-------------+----------------------------------------------------------------
adj0         |
        mean |   2.881914   .0181275   158.98   0.000     2.846385    2.917444
  iqr(10,90) |   1.280105   .0283041    45.23   0.000      1.22463     1.33558
  iqr(50,90) |   .6258292   .0234882    26.64   0.000     .5797932    .6718652
  iqr(10,50) |    .654276   .0209768    31.19   0.000     .6131623    .6953897
    variance |   .2561258   .0109183    23.46   0.000     .2347262    .2775254
-------------+----------------------------------------------------------------
obs1         |
        mean |   2.872093   .0219387   130.91   0.000     2.829094    2.915092
  iqr(10,90) |   1.052989   .0580101    18.15   0.000     .9392908    1.166686
  iqr(50,90) |   .4559944   .0402445    11.33   0.000     .3771166    .5348722
  iqr(10,50) |   .5969942   .0499454    11.95   0.000      .499103    .6948853
    variance |   .1939033   .0164413    11.79   0.000     .1616789    .2261278
-------------+----------------------------------------------------------------
fit1         |
        mean |   2.872006   .0218841   131.24   0.000     2.829114    2.914898
  iqr(10,90) |   1.003765    .051603    19.45   0.000     .9026249    1.104905
  iqr(50,90) |   .4307655   .0191634    22.48   0.000     .3932059    .4683252
  iqr(10,50) |   .5729995   .0404368    14.17   0.000     .4937449     .652254
    variance |   .1908335   .0161043    11.85   0.000     .1592697    .2223973
-------------+----------------------------------------------------------------
adj1         |
        mean |    2.76083   .0207492   133.06   0.000     2.720163    2.801498
  iqr(10,90) |   .9915735   .0544667    18.21   0.000     .8848208    1.098326
  iqr(50,90) |   .4348294   .0189262    22.98   0.000     .3977348     .471924
  iqr(10,50) |   .5567441   .0429097    12.97   0.000     .4726427    .6408456
    variance |   .1924975   .0170734    11.27   0.000     .1590343    .2259607
------------------------------------------------------------------------------
covariates: yeduc expft expft2 c.expft#c.expft2 isei children
. // Decomposition with private sector wage structure as reference
. cdist decomp

      Delta: fit0 - fit1
      Chars: fit0 - adj0
      Coefs: adj0 - fit1

                               (Replications based on 2,034 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
      lnwage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Delta        |
        mean |  -.1396714    .026973    -5.18   0.000    -.1925376   -.0868053
  iqr(10,90) |    .256021     .05867     4.36   0.000     .1410299    .3710121
  iqr(50,90) |   .2113189   .0271698     7.78   0.000     .1580671    .2645708
  iqr(10,50) |   .0447021   .0434172     1.03   0.303    -.0403941    .1297983
    variance |   .0558158   .0185365     3.01   0.003      .019485    .0921466
-------------+----------------------------------------------------------------
Chars        |
        mean |  -.1495793   .0208423    -7.18   0.000    -.1904295   -.1087291
  iqr(10,90) |  -.0203191   .0244797    -0.83   0.407    -.0682985    .0276602
  iqr(50,90) |   .0162553   .0170899     0.95   0.342    -.0172403    .0497509
  iqr(10,50) |  -.0365744   .0162786    -2.25   0.025    -.0684798    -.004669
    variance |  -.0094765   .0077148    -1.23   0.219    -.0245972    .0056442
-------------+----------------------------------------------------------------
Coefs        |
        mean |   .0099078   .0202989     0.49   0.625    -.0298772    .0496929
  iqr(10,90) |   .2763402   .0616817     4.48   0.000     .1554462    .3972341
  iqr(50,90) |   .1950636   .0323579     6.03   0.000     .1316433     .258484
  iqr(10,50) |   .0812765   .0452189     1.80   0.072    -.0073509    .1699039
    variance |   .0652923   .0199136     3.28   0.001     .0262624    .1043222
------------------------------------------------------------------------------
covariates: yeduc expft expft2 c.expft#c.expft2 isei children

. estimates store qr_priv
Decomposition based on distribution regression:
. // Estimate counterfactuals
. cdist lnwage yeduc c.expft##c.expft2 isei children [pw=weight], by(public) ///
>     statistics(mean iqr(10 90) iqr(50 90) iqr(10 50) variance) ///
>     vce(bootstrap, cluster(psu))
(running cdist on estimation sample)

Bootstrap replications (50): .........10.........20.........30.........40.........50 done

Counterfactual distribution estimation          Number of obs     =      5,430
                                                Replications      =         50
                                                Pooled            =         no
      Group 0: public = 0                       N of obs 0        =      4,163
      Group 1: public = 1                       N of obs 1        =      1,267
                                                Estimation method =      logit
                                                Grid size         =        100

                               (Replications based on 2,034 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
      lnwage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
obs0         |
        mean |   2.733252   .0134363   203.42   0.000     2.706917    2.759586
  iqr(10,90) |   1.244804   .0290057    42.92   0.000     1.187954    1.301654
  iqr(50,90) |   .6547899   .0228014    28.72   0.000        .6101    .6994798
  iqr(10,50) |   .5900142    .016284    36.23   0.000     .5580981    .6219303
    variance |   .2513125   .0085851    29.27   0.000      .234486    .2681389
-------------+----------------------------------------------------------------
fit0         |
        mean |   2.748031   .0134993   203.57   0.000     2.721573    2.774489
  iqr(10,90) |   1.251093   .0275141    45.47   0.000     1.197166    1.305019
  iqr(50,90) |   .6690338   .0231009    28.96   0.000     .6237568    .7143107
  iqr(10,50) |   .5820589    .014857    39.18   0.000     .5529396    .6111782
    variance |   .2591116   .0085388    30.35   0.000     .2423758    .2758474
-------------+----------------------------------------------------------------
adj0         |
        mean |   2.902411   .0222176   130.64   0.000     2.858866    2.945957
  iqr(10,90) |    1.31026   .0291286    44.98   0.000     1.253168    1.367351
  iqr(50,90) |   .6639874    .033464    19.84   0.000     .5983991    .7295757
  iqr(10,50) |   .6462722    .029791    21.69   0.000     .5878828    .7046615
    variance |   .2789182   .0101444    27.49   0.000     .2590356    .2988008
-------------+----------------------------------------------------------------
obs1         |
        mean |   2.872093    .021535   133.37   0.000     2.829886    2.914301
  iqr(10,90) |   1.052989   .0624575    16.86   0.000      .930574    1.175403
  iqr(50,90) |   .4559944   .0404873    11.26   0.000     .3766407     .535348
  iqr(10,50) |   .5969942   .0498412    11.98   0.000     .4993072    .6946811
    variance |   .1939033   .0172408    11.25   0.000      .160112    .2276947
-------------+----------------------------------------------------------------
fit1         |
        mean |   2.883538   .0217245   132.73   0.000     2.840959    2.926118
  iqr(10,90) |   1.050459   .0669272    15.70   0.000     .9192838    1.181634
  iqr(50,90) |   .4620192   .0418941    11.03   0.000     .3799083    .5441301
  iqr(10,50) |   .5884395    .053075    11.09   0.000     .4844144    .6924645
    variance |   .1910696   .0177449    10.77   0.000     .1562902    .2258489
-------------+----------------------------------------------------------------
adj1         |
        mean |   2.776849   .0169242   164.08   0.000     2.743678    2.810019
  iqr(10,90) |   .9898577   .0511921    19.34   0.000      .889523    1.090192
  iqr(50,90) |   .4412684   .0261397    16.88   0.000     .3900356    .4925013
  iqr(10,50) |   .5485892   .0434873    12.61   0.000     .4633557    .6338227
    variance |   .1725809   .0167969    10.27   0.000     .1396596    .2055021
------------------------------------------------------------------------------
covariates: yeduc expft expft2 c.expft#c.expft2 isei children
. // Decomposition with private sector wage structure as reference
. cdist decomp

      Delta: fit0 - fit1
      Chars: fit0 - adj0
      Coefs: adj0 - fit1

                               (Replications based on 2,034 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
      lnwage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Delta        |
        mean |  -.1355075   .0256462    -5.28   0.000    -.1857732   -.0852418
  iqr(10,90) |    .200634   .0747203     2.69   0.007     .0541848    .3470832
  iqr(50,90) |   .2070146   .0476355     4.35   0.000     .1136507    .3003784
  iqr(10,50) |  -.0063806   .0581234    -0.11   0.913    -.1203003    .1075391
    variance |    .068042   .0195721     3.48   0.001     .0296814    .1064026
-------------+----------------------------------------------------------------
Chars        |
        mean |  -.1543804   .0178541    -8.65   0.000    -.1893737   -.1193871
  iqr(10,90) |  -.0591669   .0286742    -2.06   0.039    -.1153673   -.0029665
  iqr(50,90) |   .0050464     .03985     0.13   0.899    -.0730581    .0831509
  iqr(10,50) |  -.0642133   .0266276    -2.41   0.016    -.1164025   -.0120241
    variance |  -.0198066   .0080578    -2.46   0.014    -.0355995   -.0040137
-------------+----------------------------------------------------------------
Coefs        |
        mean |   .0188729   .0228139     0.83   0.408    -.0258414    .0635873
  iqr(10,90) |   .2598009   .0681203     3.81   0.000     .1262876    .3933142
  iqr(50,90) |   .2019682   .0512275     3.94   0.000     .1015641    .3023723
  iqr(10,50) |   .0578327   .0567116     1.02   0.308    -.0533201    .1689855
    variance |   .0878486   .0191852     4.58   0.000     .0502462     .125451
------------------------------------------------------------------------------
covariates: yeduc expft expft2 c.expft#c.expft2 isei children

. estimates store dr_priv
We again see that the sector gap in mean log wages can be fully explained by the compositional differences between workers in the private and in the public sector. In contrast, sector differences in wage inequality cannot be accounted for by compositional differences. The results again suggest that wage inequality in the private sector would be even larger than it actually is if this sector's covariate distribution were the same as the covariate distribution in the public sector.
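The Delta, Chars and Coefs rows reported by cdist decomp are differences between the fitted and counterfactual statistics listed in the estimation tables. A sketch of the accounting, where $\nu(\cdot)$ stands for any of the requested statistics (this notation is introduced here for illustration and does not appear in the cdist output):

```latex
\begin{align*}
\underbrace{\nu(\text{fit0}) - \nu(\text{fit1})}_{\text{Delta}}
  &= \underbrace{\nu(\text{fit0}) - \nu(\text{adj0})}_{\text{Chars}}
   + \underbrace{\nu(\text{adj0}) - \nu(\text{fit1})}_{\text{Coefs}} \\
\text{iqr(10,90), distribution regression:} \quad
  0.2006 &= -0.0592 + 0.2598
\end{align*}
```

Here adj0 is the private-sector distribution evaluated at the public-sector covariate distribution. The negative Chars term for iqr(10,90) is what lies behind the statement that private-sector inequality would be even larger with public-sector characteristics.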
How do results change if you adjust the covariates of people in the public to those of people in the private sector (i.e. if you use the wage structure from the public sector as the reference wage structure)?
Decomposition based on conditional quantiles:
. // Decomposition with public sector wage structure as reference
. estimates restore qr_priv
(results qr_priv are active now)

. cdist decomp, reverse

      Delta: fit0 - fit1
      Chars: adj1 - fit1
      Coefs: fit0 - adj1

                               (Replications based on 2,034 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
      lnwage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Delta        |
        mean |  -.1396714    .026973    -5.18   0.000    -.1925376   -.0868053
  iqr(10,90) |    .256021     .05867     4.36   0.000     .1410299    .3710121
  iqr(50,90) |   .2113189   .0271698     7.78   0.000     .1580671    .2645708
  iqr(10,50) |   .0447021   .0434172     1.03   0.303    -.0403941    .1297983
    variance |   .0558158   .0185365     3.01   0.003      .019485    .0921466
-------------+----------------------------------------------------------------
Chars        |
        mean |  -.1111762   .0179451    -6.20   0.000     -.146348   -.0760044
  iqr(10,90) |  -.0121915   .0219515    -0.56   0.579    -.0552157    .0308328
  iqr(50,90) |   .0040638   .0148798     0.27   0.785    -.0251001    .0332278
  iqr(10,50) |  -.0162553   .0156615    -1.04   0.299    -.0469512    .0144406
    variance |    .001664   .0071956     0.23   0.817    -.0124391    .0157671
-------------+----------------------------------------------------------------
Coefs        |
        mean |  -.0284953   .0208803    -1.36   0.172    -.0694199    .0124294
  iqr(10,90) |   .2682125   .0556574     4.82   0.000     .1591259    .3772991
  iqr(50,90) |   .2072551   .0256171     8.09   0.000     .1570466    .2574636
  iqr(10,50) |   .0609574   .0422411     1.44   0.149    -.0218337    .1437484
    variance |   .0541518   .0176212     3.07   0.002     .0196148    .0886888
------------------------------------------------------------------------------
covariates: yeduc expft expft2 c.expft#c.expft2 isei children

. estimates store qr_publ
. // Comparison
. esttab qr_priv qr_publ, b(3) not nonum mti

--------------------------------------------
                  qr_priv         qr_publ
--------------------------------------------
Delta
mean               -0.140***       -0.140***
iqr(10,90)          0.256***        0.256***
iqr(50,90)          0.211***        0.211***
iqr(10,50)          0.045           0.045
variance            0.056**         0.056**
--------------------------------------------
Chars
mean               -0.150***       -0.111***
iqr(10,90)         -0.020          -0.012
iqr(50,90)          0.016           0.004
iqr(10,50)         -0.037*         -0.016
variance           -0.009           0.002
--------------------------------------------
Coefs
mean                0.010          -0.028
iqr(10,90)          0.276***        0.268***
iqr(50,90)          0.195***        0.207***
iqr(10,50)          0.081           0.061
variance            0.065**         0.054**
--------------------------------------------
N                    5430            5430
--------------------------------------------
* p<0.05, ** p<0.01, *** p<0.001
Results in the explained part are a little less pronounced if we use the public sector wage structure as reference (i.e. adjusting the covariate distribution in the public sector to that of the private sector). However, the overall interpretation stays the same: The gap in mean log wages can at least to a large extent be explained by compositional differences, whereas the inequality-gaps cannot.
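The two columns compared above split the same total gap in different ways. As a minimal sketch, the mean rows of both decompositions can be recomputed by hand from the fitted and counterfactual means printed in the quantile-regression cdist table. The scalar names below are hypothetical and the values are copied from that output, so agreement holds only up to rounding of the printed digits.

```stata
* Means copied from the quantile-regression cdist output (rounded as printed)
scalar fit0 = 2.732335   // private sector, fitted
scalar adj0 = 2.881914   // private wage structure, public covariates
scalar fit1 = 2.872006   // public sector, fitted
scalar adj1 = 2.76083    // public wage structure, private covariates

* Default decomposition: private-sector wage structure as reference
scalar chars_priv = fit0 - adj0
scalar coefs_priv = adj0 - fit1

* Reverse decomposition: public-sector wage structure as reference
scalar chars_publ = adj1 - fit1
scalar coefs_publ = fit0 - adj1

display "private ref.: Chars = " %9.6f chars_priv "  Coefs = " %9.6f coefs_priv
display "public  ref.: Chars = " %9.6f chars_publ "  Coefs = " %9.6f coefs_publ

* Both splits add up to the same total gap, fit0 - fit1
assert reldif(chars_priv + coefs_priv, fit0 - fit1) < 1e-12
assert reldif(chars_publ + coefs_publ, fit0 - fit1) < 1e-12
```

The displayed values should match the mean rows of qr_priv and qr_publ to about six decimals. If the dataset in memory happened to contain variables whose names start like these scalars, wrapping the names in scalar() would avoid any ambiguity.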
Decomposition based on distribution regression:
. // Decomposition with public sector wage structure as reference
. estimates restore dr_priv
(results dr_priv are active now)

. cdist decomp, reverse

      Delta: fit0 - fit1
      Chars: adj1 - fit1
      Coefs: fit0 - adj1

                               (Replications based on 2,034 clusters in psu)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
      lnwage | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Delta        |
        mean |  -.1355075   .0256462    -5.28   0.000    -.1857732   -.0852418
  iqr(10,90) |    .200634   .0747203     2.69   0.007     .0541848    .3470832
  iqr(50,90) |   .2070146   .0476355     4.35   0.000     .1136507    .3003784
  iqr(10,50) |  -.0063806   .0581234    -0.11   0.913    -.1203003    .1075391
    variance |    .068042   .0195721     3.48   0.001     .0296814    .1064026
-------------+----------------------------------------------------------------
Chars        |
        mean |  -.1066897   .0185163    -5.76   0.000     -.142981   -.0703984
  iqr(10,90) |   -.060601   .0457185    -1.33   0.185    -.1502077    .0290057
  iqr(50,90) |  -.0207508   .0396793    -0.52   0.601    -.0985207    .0570192
  iqr(10,50) |  -.0398502   .0413679    -0.96   0.335    -.1209298    .0412294
    variance |  -.0184887   .0099965    -1.85   0.064    -.0380814    .0011041
-------------+----------------------------------------------------------------
Coefs        |
        mean |  -.0288178   .0196911    -1.46   0.143    -.0674116     .009776
  iqr(10,90) |    .261235   .0593779     4.40   0.000     .1448564    .3776136
  iqr(50,90) |   .2277653   .0345312     6.60   0.000     .1600855    .2954451
  iqr(10,50) |   .0334697   .0492617     0.68   0.497    -.0630815    .1300209
    variance |   .0865307   .0195185     4.43   0.000     .0482752    .1247862
------------------------------------------------------------------------------
covariates: yeduc expft expft2 c.expft#c.expft2 isei children

. estimates store dr_publ
. // Comparison
. esttab dr_priv dr_publ, b(3) not nonum mti

--------------------------------------------
                  dr_priv         dr_publ
--------------------------------------------
Delta
mean               -0.136***       -0.136***
iqr(10,90)          0.201**         0.201**
iqr(50,90)          0.207***        0.207***
iqr(10,50)         -0.006          -0.006
variance            0.068***        0.068***
--------------------------------------------
Chars
mean               -0.154***       -0.107***
iqr(10,90)         -0.059*         -0.061
iqr(50,90)          0.005          -0.021
iqr(10,50)         -0.064*         -0.040
variance           -0.020*         -0.018
--------------------------------------------
Coefs
mean                0.019          -0.029
iqr(10,90)          0.260***        0.261***
iqr(50,90)          0.202***        0.228***
iqr(10,50)          0.058           0.033
variance            0.088***        0.087***
--------------------------------------------
N                    5430            5430
--------------------------------------------
* p<0.05, ** p<0.01, *** p<0.001
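The pattern is similar to the one found for the conditional quantile approach: the explained part of the gap in mean log wages is somewhat smaller with the public sector wage structure as reference, but the overall interpretation does not change.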
Optional: Compare your results to results from analogous decompositions using reweighting (e.g. compare the results from the distribution regression decomposition to results from decompositions based on IPW or entropy balancing). Can you reduce the difference between results by fine-tuning the models used in the various decompositions?
To make things more convenient, let's write a small program to run reweighting decompositions:
capt prog drop ipwdecomp
program ipwdecomp, eclass
    // syntax
    syntax varlist(fv min=2) [if] [in] [fw pw iw], by(varname) [ s(str) eb ]
    gettoken depvar controls : varlist
    if `"`eb'"'=="" local method ipw
    else            local method eb
    // estimation sample
    marksample touse
    markout `touse' `by'
    // counterfactual (groups are balanced to the group with by==1)
    tempname adj0
    qui dstat (`s') `depvar' if `touse' [`weight'`exp'], over(`by') nose ///
        balance(`method':`controls', reference(1))
    local grps `"`e(over_namelist)'"'
    if `:list sizeof grps'!=2 {
        di as err "by() must be dichotomous (two groups)"
        exit 498
    }
    local g0: word 1 of `grps'
    local g1: word 2 of `grps'
    matrix `adj0' = e(b)[1,`"`g0':"']
    // observed
    tempname obs0 obs1
    qui dstat (`s') `depvar' if `touse' [`weight'`exp'], over(`by') nose
    matrix `obs0' = e(b)[1,`"`g0':"']
    matrix `obs1' = e(b)[1,`"`g1':"']
    // decomposition
    tempname b tmp
    matrix `b' = `obs0'-`obs1'
    matrix coleq `b' = "Difference"
    matrix `tmp' = `obs0' - `adj0'
    matrix coleq `tmp' = "Explained"
    matrix `b' = `b', `tmp'
    matrix `tmp' = `adj0' - `obs1'
    matrix coleq `tmp' = "Unexplained"
    matrix `b' = `b', `tmp'
    // post results
    eret post `b' [`weight'`exp'], depname(`depvar') esample(`touse') obs(`e(N)')
    eret local cmd "ipwdecomp"
    eret local eb "`eb'"
    // display
    eret display, vsquish
end
The syntax is
ipwdecomp depvar indepvars [if] [in] [weight], by(groupvar) [ s(statistics) eb ]
where statistics is a list of target statistics (any statistic supported by dstat is allowed) and option eb requests using entropy balancing rather than logit-based IPW.
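In terms of the matrices built inside the program, the three reported blocks follow the same accounting as the cdist decompositions, except that observed rather than fitted statistics enter the raw difference. A sketch, with $\nu(\cdot)$ as illustrative notation for any target statistic (it is not part of the program):

```latex
\underbrace{\nu(\text{obs0}) - \nu(\text{obs1})}_{\text{Difference}}
  = \underbrace{\nu(\text{obs0}) - \nu(\text{adj0})}_{\text{Explained}}
  + \underbrace{\nu(\text{adj0}) - \nu(\text{obs1})}_{\text{Unexplained}}
```

Here adj0 is the statistic for group 0 computed with the balancing weights. Because the Difference block uses only observed statistics, it does not depend on whether logit IPW or entropy balancing is chosen.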
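We now estimate several variants of the decomposition using essentially the same specification for the covariates as above (linear terms for education, ISEI, and number of children, quadratic term for work experience, no interactions) and plot the results in a graph. Note that work experience enters here as c.expft##c.expft, whereas the cdist calls above used c.expft##c.expft2, so the point estimates for the explained and unexplained parts differ slightly from those reported earlier.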
. local lhs yeduc c.expft##c.expft isei children
. local stats mean iqr(10,90) iqr(50,90) iqr(10,50) variance

. ipwdecomp lnwage `lhs' [pw=weight], by(public) s(`stats')
------------------------------------------------------------------------------
      lnwage | Coefficient
-------------+----------------------------------------------------------------
Difference   |
        mean |  -.1388418
  iqr(10,90) |   .1918156
  iqr(50,90) |   .1987956
  iqr(10,50) |  -.0069799
    variance |   .0573163
-------------+----------------------------------------------------------------
Explained    |
        mean |  -.1501454
  iqr(10,90) |  -.0707943
  iqr(50,90) |  -.0119915
  iqr(10,50) |  -.0588028
    variance |  -.0135576
-------------+----------------------------------------------------------------
Unexplained  |
        mean |   .0113036
  iqr(10,90) |     .26261
  iqr(50,90) |   .2107871
  iqr(10,50) |   .0518229
    variance |    .070874
------------------------------------------------------------------------------

. estimates store ipw

. ipwdecomp lnwage `lhs' [pw=weight], by(public) s(`stats') eb
------------------------------------------------------------------------------
      lnwage | Coefficient
-------------+----------------------------------------------------------------
Difference   |
        mean |  -.1388418
  iqr(10,90) |   .1918156
  iqr(50,90) |   .1987956
  iqr(10,50) |  -.0069799
    variance |   .0573163
-------------+----------------------------------------------------------------
Explained    |
        mean |  -.1495155
  iqr(10,90) |  -.0707943
  iqr(50,90) |  -.0142324
  iqr(10,50) |  -.0565619
    variance |  -.0136304
-------------+----------------------------------------------------------------
Unexplained  |
        mean |   .0106737
  iqr(10,90) |     .26261
  iqr(50,90) |    .213028
  iqr(10,50) |    .049582
    variance |   .0709467
------------------------------------------------------------------------------

. estimates store eb

. cdist lnwage `lhs' [pw=weight], by(public) s(`stats') decomp
group 0: fitting models 0%....20%....40%....60%....80%....100%
         enumerating predictions ... done
group 1: fitting models 0%....20%....40%....60%....80%....100%
         enumerating predictions ... done

Counterfactual distribution estimation          Number of obs     =      5,430
                                                Pooled            =         no
      Group 0: public = 0                       N of obs 0        =      4,163
      Group 1: public = 1                       N of obs 1        =      1,267
                                                Estimation method =      logit
                                                Grid size         =        100

      Delta: fit0 - fit1
      Chars: fit0 - adj0
      Coefs: adj0 - fit1

------------------------------------------------------------------------------
      lnwage | Coefficient
-------------+----------------------------------------------------------------
Delta        |
        mean |  -.1355075
  iqr(10,90) |    .200634
  iqr(50,90) |   .2070146
  iqr(10,50) |  -.0063806
    variance |    .068042
-------------+----------------------------------------------------------------
Chars        |
        mean |  -.1544988
  iqr(10,90) |  -.0591669
  iqr(50,90) |   .0205505
  iqr(10,50) |  -.0797174
    variance |  -.0183628
-------------+----------------------------------------------------------------
Coefs        |
        mean |   .0189913
  iqr(10,90) |   .2598009
  iqr(50,90) |   .1864641
  iqr(10,50) |   .0733368
    variance |   .0864048
------------------------------------------------------------------------------
covariates: yeduc expft c.expft#c.expft isei children

. estimates store dr

. cdist lnwage `lhs' [pw=weight], by(public) s(`stats') decomp method(qr)
group 0: fitting models 0%....20%....40%....60%....80%....100%
         enumerating predictions ... done
group 1: fitting models 0%....20%....40%....60%....80%....100%
         enumerating predictions ... done

Counterfactual distribution estimation          Number of obs     =      5,430
                                                Pooled            =         no
      Group 0: public = 0                       N of obs 0        =      4,163
      Group 1: public = 1                       N of obs 1        =      1,267
                                                Estimation method =         qr
                                                Grid size         =        100

      Delta: fit0 - fit1
      Chars: fit0 - adj0
      Coefs: adj0 - fit1

------------------------------------------------------------------------------
      lnwage | Coefficient
-------------+----------------------------------------------------------------
Delta        |
        mean |  -.1374013
  iqr(10,90) |    .256021
  iqr(50,90) |   .2113189
  iqr(10,50) |   .0447021
    variance |   .0605626
-------------+----------------------------------------------------------------
Chars        |
        mean |  -.1497873
  iqr(10,90) |   -.024383
  iqr(50,90) |   .0121915
  iqr(10,50) |  -.0365744
    variance |  -.0111118
-------------+----------------------------------------------------------------
Coefs        |
        mean |    .012386
  iqr(10,90) |    .280404
  iqr(50,90) |   .1991275
  iqr(10,50) |   .0812765
    variance |   .0716744
------------------------------------------------------------------------------
covariates: yeduc expft c.expft#c.expft isei children

. estimates store qr
. grstyle init
. grstyle set plain, grid
. grstyle set color sb
. grstyle set legend 3, inside nobox
. coefplot ipw eb dr qr, noci keep(*:*) ///
>     eqrename(Delta = Difference Chars = Explained Coefs = Unexplained) ///
>     recast(bar) barwidth(.15) xline(0) ///
>     plotlabels("logit IPW" "entropy balancing IPW" ///
>                "distribution regression" "quantile regression")
In the top panel the raw differences are reported. In principle, these should be the same for all methods. However, in the distribution and quantile regression approaches, fitted raw differences are computed which are affected by approximation error. For the mean, the approximation seems to be good, but for some of the inequality measures there are substantial deviations, particularly when using the quantile regression approach.
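The size of this approximation error can be read off the cdist estimation tables shown earlier by comparing obs0 − obs1 with fit0 − fit1. As a worked example for iqr(10,90), using the printed values (the $\Delta$ symbols are illustrative shorthand, not cdist output):

```latex
\begin{align*}
\Delta^{\text{obs}} &= 1.244804 - 1.052989 = 0.191815 \\
\Delta^{\text{DR}}  &= 1.251093 - 1.050459 = 0.200634 \\
\Delta^{\text{QR}}  &= 1.259786 - 1.003765 = 0.256021
\end{align*}
```

The quantile regression fit thus overstates the raw gap in this statistic by about 0.064 log points, compared with about 0.009 for the distribution regression fit.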
With respect to the breakup into an explained and an unexplained component, logit IPW and entropy balancing pretty much agree. Results from distribution regression and quantile regression deviate here and there, but the overall picture is similar.
We now check whether differences between the results can be reduced by using a more flexible specification for the covariates. Here are the results we obtain if we fully interact the four covariates (the ## operator expands to all two-way and higher-order interactions) and add a squared term for each variable.
. local lhs c.yeduc##c.expft##c.isei##c.children/*
>         */ c.yeduc#c.yeduc c.expft#c.expft/*
>         */ c.isei#c.isei c.children#c.children
. local stats mean iqr(10,90) iqr(50,90) iqr(10,50) variance

. ipwdecomp lnwage `lhs' [pw=weight], by(public) s(`stats')
------------------------------------------------------------------------------
      lnwage | Coefficient
-------------+----------------------------------------------------------------
Difference   |
        mean |  -.1388418
  iqr(10,90) |   .1918156
  iqr(50,90) |   .1987956
  iqr(10,50) |  -.0069799
    variance |   .0573163
-------------+----------------------------------------------------------------
Explained    |
        mean |  -.1489135
  iqr(10,90) |  -.0463133
  iqr(50,90) |   .0071507
  iqr(10,50) |  -.0534639
    variance |  -.0097855
-------------+----------------------------------------------------------------
Unexplained  |
        mean |   .0100717
  iqr(10,90) |   .2381289
  iqr(50,90) |   .1916449
  iqr(10,50) |    .046484
    variance |   .0671018
------------------------------------------------------------------------------

. estimates store ipw

. ipwdecomp lnwage `lhs' [pw=weight], by(public) s(`stats') eb
------------------------------------------------------------------------------
      lnwage | Coefficient
-------------+----------------------------------------------------------------
Difference   |
        mean |  -.1388418
  iqr(10,90) |   .1918156
  iqr(50,90) |   .1987956
  iqr(10,50) |  -.0069799
    variance |   .0573163
-------------+----------------------------------------------------------------
Explained    |
        mean |  -.1448618
  iqr(10,90) |  -.0470619
  iqr(50,90) |   .0067458
  iqr(10,50) |  -.0538077
    variance |  -.0089903
-------------+----------------------------------------------------------------
Unexplained  |
        mean |     .00602
  iqr(10,90) |   .2388775
  iqr(50,90) |   .1920497
  iqr(10,50) |   .0468278
    variance |   .0663066
------------------------------------------------------------------------------

. estimates store eb

. cdist lnwage `lhs' [pw=weight], by(public) s(`stats') ///
>     lincom((Difference:fit0-fit1) (Explained:fit0-adj0) (Unexplained:adj0-fit1))
group 0: fitting models 0%....20%....40%....60%....80%....100%
         enumerating predictions ... done
group 1: fitting models 0%....20%....40%....60%....80%....100%
         enumerating predictions ... done

Counterfactual distribution estimation          Number of obs     =      5,430
                                                Pooled            =         no
      Group 0: public = 0                       N of obs 0        =      4,163
      Group 1: public = 1                       N of obs 1        =      1,267
                                                Estimation method =      logit
                                                Grid size         =        100

 Difference: fit0-fit1
  Explained: fit0-adj0
Unexplained: adj0-fit1

------------------------------------------------------------------------------
      lnwage | Coefficient
-------------+----------------------------------------------------------------
Difference   |
        mean |  -.1355075
  iqr(10,90) |    .200634
  iqr(50,90) |   .2070146
  iqr(10,50) |  -.0063806
    variance |    .068042
-------------+----------------------------------------------------------------
Explained    |
        mean |   -.145334
  iqr(10,90) |  -.0591669
  iqr(50,90) |   .0050464
  iqr(10,50) |  -.0642133
    variance |  -.0169221
-------------+----------------------------------------------------------------
Unexplained  |
        mean |   .0098265
  iqr(10,90) |   .2598009
  iqr(50,90) |   .2019682
  iqr(10,50) |   .0578327
    variance |   .0849642
------------------------------------------------------------------------------
covariates: yeduc expft c.yeduc#c.expft isei c.yeduc#c.isei c.expft#c.isei ...

. estimates store dr

. cdist lnwage `lhs' [pw=weight], by(public) s(`stats') method(qr) ///
>     lincom((Difference:fit0-fit1) (Explained:fit0-adj0) (Unexplained:adj0-fit1))
group 0: fitting models 0%....20%....40%....60%....80%....100%
         enumerating predictions ... done
group 1: fitting models 0%....20%....40%....60%....80%....100%
         enumerating predictions ... done

Counterfactual distribution estimation          Number of obs     =      5,430
                                                Pooled            =         no
      Group 0: public = 0                       N of obs 0        =      4,163
      Group 1: public = 1                       N of obs 1        =      1,267
                                                Estimation method =         qr
                                                Grid size         =        100

 Difference: fit0-fit1
  Explained: fit0-adj0
Unexplained: adj0-fit1

------------------------------------------------------------------------------
      lnwage | Coefficient
-------------+----------------------------------------------------------------
Difference   |
        mean |  -.1397298
  iqr(10,90) |   .2275742
  iqr(50,90) |   .1991275
  iqr(10,50) |   .0284468
    variance |   .0622992
-------------+----------------------------------------------------------------
Explained    |
        mean |  -.1467292
  iqr(10,90) |   -.024383
  iqr(50,90) |   .0203191
  iqr(10,50) |  -.0447021
    variance |  -.0124196
-------------+----------------------------------------------------------------
Unexplained  |
        mean |   .0069995
  iqr(10,90) |   .2519572
  iqr(50,90) |   .1788083
  iqr(10,50) |   .0731489
    variance |   .0747188
------------------------------------------------------------------------------
covariates: yeduc expft c.yeduc#c.expft isei c.yeduc#c.isei c.expft#c.isei ...

. estimates store qr

. coefplot ipw eb dr qr, noci keep(*:*) ///
>     eqrename(Delta = Difference Chars = Explained Coefs = Unexplained) ///
>     recast(bar) barwidth(.15) xline(0) ///
>     plotlabels("logit IPW" "entropy balancing IPW" ///
>                "distribution regression" "quantile regression")
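The agreement of results across methods has improved somewhat, but the changes are not dramatic. Some approximation error remains for the distribution regression and quantile regression approaches: the fitted overall difference in iqr(10,50), for example, is 0.028 with quantile regression, compared with -0.007 in the raw data used by the reweighting approaches and -0.006 with distribution regression. The split into explained and unexplained parts also still differs across methods; the explained part of iqr(10,90) ranges from -0.024 (quantile regression) to -0.059 (distribution regression).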
Which results are more appropriate is hard to say. We are not aware of any research that systematically compares the approaches (e.g., using simulations) to find out whether some of them outperform others, either generally or under specific conditions. One might expect the approaches based on distribution regression and quantile regression to perform better than reweighting because they flexibly model the counterfactual distribution. However, the fact that the results from these two approaches do not agree very well makes us skeptical that this is indeed the case.
Yet, despite these differences, the results from the various methods are qualitatively similar. Essentially all of the private–public gap in average wages is accounted for by differences in the distribution of characteristics, but the gap in wage inequality within the sectors remains largely unexplained. If anything, wage inequality in the private sector would be even larger if the private sector had a distribution of characteristics like that of the public sector, and it is mostly the bottom half of the distribution that would be affected.
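To make this reading of the tables more concrete, the following minimal sketch recomputes two summary quantities from the logit IPW results. The values are typed in by hand from the output above; the scalar names are hypothetical and not part of the course code. The sketch relies on the fact that the interquantile range of lnwage between the 10th and the 90th percentile equals ln(D9/D1), so that exponentiating a difference in log points gives the factor by which the private-sector D9/D1 ratio differs from its comparison value.

```stata
* Sketch only: numbers copied by hand from the logit IPW table above
* (private minus public, in log points); scalar names are hypothetical.
scalar d_mean = -.1388418   // Difference,  mean
scalar e_mean = -.1489135   // Explained,   mean
scalar d_9010 =  .1918156   // Difference,  iqr(10,90)
scalar e_9010 = -.0463133   // Explained,   iqr(10,90)
scalar u_9010 =  .2381289   // Unexplained, iqr(10,90)

* Share of the gap in mean log wages accounted for by characteristics
display "explained share (mean):    " %6.3f e_mean/d_mean

* iqr(10,90) of lnwage is ln(D9/D1); exp() turns log points into a
* multiplicative factor between two D9/D1 ratios
display "D9/D1 factor, total:       " %6.3f exp(d_9010)
display "D9/D1 factor, explained:   " %6.3f exp(e_9010)
display "D9/D1 factor, unexplained: " %6.3f exp(u_9010)
```

A share slightly above one for the mean and an explained factor below one for the D9/D1 ratio correspond to the statements in the paragraph above: characteristics account for the whole mean gap, while they work against, rather than explain, the higher inequality in the private sector. The same calculation can be repeated with the values from the other three methods.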