Johannes Giesecke and Ben Jann, GESIS Training Course, January 29 – February 1, 2024
Required packages: fre, estout, oaxaca, nldecompose, fairlie
Extend the example analysis from the slides (4-nonlinear.pdf) by adding the X variables “locus of control” (LoC) and “willingness to take risk” (risk). Compute the aggregate and detailed decompositions using the Fairlie, Yun and LPM decompositions for non-linear models and interpret the results.
Set the seed of the random number generator for the sake of reproducibility:
. set seed 5432334
Data preparation as on slides:
. use gsoep-extract, clear
(Example data based on the German Socio-Economic Panel)

. keep if wave==2015
(29,970 observations deleted)

. keep if inrange(age, 25, 55)
(5,671 observations deleted)

. generate byte male = sex==1

. generate byte female = 1 - male

. summarize supvis yeduc expft exppt male

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      supvis |      5,757    .2749696    .4465377          0          1
       yeduc |      7,121    12.28823    2.783974          7         18
       expft |      7,274    11.63359    9.556508          0       39.5
       exppt |      7,274    3.271481    5.052598          0      35.25
        male |      7,309    .4338487    .4956386          0          1
Additional predictors:
. // locus of control
. fre LoC, t(5)

LoC -- locus of control (1 int - 7 ext)
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   1     |         28       0.38       0.40       0.40
        1.1   |         21       0.29       0.30       0.70
        1.3   |         33       0.45       0.47       1.17
        1.4   |         66       0.90       0.94       2.11
        1.6   |         93       1.27       1.33       3.44
        :     |          :          :          :          :
        6.1   |          9       0.12       0.13      99.84
        6.3   |          7       0.10       0.10      99.94
        6.4   |          2       0.03       0.03      99.97
        6.6   |          1       0.01       0.01      99.99
        6.7   |          1       0.01       0.01     100.00
        Total |       7006      95.85     100.00
Missing .     |        303       4.15
Total         |       7309     100.00
-----------------------------------------------------------

. // willingness to take risks
. fre risk

risk -- willingness to take risks (0-10)
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   0     |        302       4.13       4.14       4.14
        1     |        334       4.57       4.58       8.72
        2     |        735      10.06      10.08      18.80
        3     |        901      12.33      12.35      31.15
        4     |        697       9.54       9.56      40.70
        5     |       1357      18.57      18.60      59.31
        6     |        874      11.96      11.98      71.29
        7     |       1018      13.93      13.96      85.25
        8     |        686       9.39       9.40      94.65
        9     |        226       3.09       3.10      97.75
        10    |        164       2.24       2.25     100.00
        Total |       7294      99.79     100.00
Missing .     |         15       0.21
Total         |       7309     100.00
-----------------------------------------------------------

. // summarize
. summarize LoC risk

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         LoC |      7,006    3.227362    .9334673          1        6.7
        risk |      7,294    4.882369    2.415302          0         10
Drop observations with missing values and set survey design:
. drop if missing(supvis, yeduc, expft, exppt, LoC, risk)
(1,898 observations deleted)

. svyset psu [pw=weight], strata(strata)

Sampling weights: weight
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>
Run a logistic regression to check whether the added variables are relevant for the distinction between supervising and not supervising:
. svy: logit supvis yeduc expft exppt LoC risk
(running logit on estimation sample)

Survey: Logistic regression

Number of strata =    15                        Number of obs   =      5,411
Number of PSUs   = 2,043                        Population size = 12,155,049
                                                Design df       =      2,028
                                                F(5, 2024)      =      17.15
                                                Prob > F        =     0.0000

------------------------------------------------------------------------------
             |             Linearized
      supvis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |    .122624   .0190426     6.44   0.000     .0852789    .1599691
       expft |   .0253453   .0056269     4.50   0.000     .0143101    .0363805
       exppt |  -.0272412   .0127392    -2.14   0.033    -.0522244    -.002258
         LoC |   -.143153   .0580278    -2.47   0.014    -.2569533   -.0293527
        risk |   .0956502   .0237726     4.02   0.000      .049029    .1422714
       _cons |  -2.779065   .3890668    -7.14   0.000    -3.542077   -2.016053
------------------------------------------------------------------------------
Both variables seem to be relevant: As may have been expected, people working as supervisors/in leadership positions seem to be more willing to take risks. Moreover, their locus of control is less likely to be externally oriented when compared to people not working as supervisors.
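To make the size of these effects easier to read, the logit coefficients can be exponentiated into odds ratios. The lines below are a minimal sketch that only reuses the two coefficients printed in the table above; they are not part of the original solution.

```stata
* Odds ratios implied by the pooled logit coefficients reported above
display exp(.0956502)    // risk: factor change in the odds of supervising per scale point
display exp(-.143153)    // LoC:  factor change in the odds of supervising per scale point
```

Alternatively, adding the `or` option to the `logit` call should report all coefficients as odds ratios directly.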
Now have a look at gender differences:
. svy: mean LoC risk, over(sex)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =    15         Number of obs   =      5,411
Number of PSUs   = 2,043         Population size = 12,155,049
                                 Design df       =      2,028

--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
   c.LoC@sex |
       male  |   3.100191   .0306054       3.04017    3.160213
     female  |   3.159952   .0291557      3.102774     3.21713
             |
  c.risk@sex |
       male  |   5.382067   .0699371      5.244911    5.519223
     female  |   4.696685   .0735806      4.552384    4.840987
--------------------------------------------------------------
Women seem to be slightly more externally oriented and are considerably more risk averse than men. We thus might expect that these variables will "explain" at least some of the gender gap in supervision.
We can also look at gender differences in coefficients:
. svy, subpop(if sex==1): ///
>     logit supvis yeduc expft exppt LoC risk, nolog
(running logit on estimation sample)

Survey: Logistic regression

Number of strata =    15                        Number of obs   =      5,411
Number of PSUs   = 2,043                        Population size = 12,155,049
                                                Subpop. no. obs =      2,599
                                                Subpop. size    = 6,322,622.6
                                                Design df       =      2,028
                                                F(5, 2024)      =       8.41
                                                Prob > F        =     0.0000

------------------------------------------------------------------------------
             |             Linearized
      supvis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .1225298   .0264803     4.63   0.000     .0705983    .1744613
       expft |   .0182957   .0075255     2.43   0.015     .0035372    .0330541
       exppt |  -.0338933   .0278222    -1.22   0.223    -.0884563    .0206698
         LoC |  -.2378459   .0820053    -2.90   0.004    -.3986693   -.0770225
        risk |   .0820778   .0339389     2.42   0.016      .015519    .1486366
       _cons |  -2.097989   .5461519    -3.84   0.000    -3.169066   -1.026912
------------------------------------------------------------------------------

. svy, subpop(if sex==2): ///
>     logit supvis yeduc expft exppt LoC risk, nolog
(running logit on estimation sample)

Survey: Logistic regression

Number of strata =    15                        Number of obs   =      5,411
Number of PSUs   = 2,043                        Population size = 12,155,049
                                                Subpop. no. obs =      2,812
                                                Subpop. size    = 5,832,426.8
                                                Design df       =      2,028
                                                F(5, 2024)      =       5.93
                                                Prob > F        =     0.0000

------------------------------------------------------------------------------
             |             Linearized
      supvis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |   .1328677   .0289446     4.59   0.000     .0761034     .189632
       expft |   .0228878   .0088796     2.58   0.010     .0054737    .0403019
       exppt |  -.0008128   .0147863    -0.05   0.956    -.0298107     .028185
         LoC |   .0032806   .0822922     0.04   0.968    -.1581055    .1646667
        risk |   .0802607   .0320892     2.50   0.012     .0173294    .1431919
       _cons |  -3.639839   .5707868    -6.38   0.000    -4.759228   -2.520449
------------------------------------------------------------------------------
Locus of control seems to have an effect only among men but not among women. The effect of risk taking is very similar for men and women.
We now turn to the decompositions. First we run the aggregate decomposition with and without the added variables, using the nldecompose command:
. // reduced model
. nldecompose, by(male): svy: logit supvis yeduc expft exppt

                                                  Number of obs (A) =     2599
                                                  Number of obs (B) =     2812
------------------------------------------------------------------------------
      Results |      Coef.  Percentage
--------------+---------------------------------------------------------------
Omega = 1     |
         Char |    .049239   33.68346%
         Coef |   .0969426   66.31654%
--------------+---------------------------------------------------------------
Omega = 0     |
         Char |   .0220991   15.11756%
         Coef |   .1240826   84.88244%
--------------+---------------------------------------------------------------
          Raw |   .1461817        100%
------------------------------------------------------------------------------

. // full model
. nldecompose, by(male): svy: logit supvis yeduc expft exppt LoC risk

                                                  Number of obs (A) =     2599
                                                  Number of obs (B) =     2812
------------------------------------------------------------------------------
      Results |      Coef.  Percentage
--------------+---------------------------------------------------------------
Omega = 1     |
         Char |   .0613367   41.95923%
         Coef |    .084845   58.04077%
--------------+---------------------------------------------------------------
Omega = 0     |
         Char |   .0308127   21.07834%
         Coef |    .115369   78.92166%
--------------+---------------------------------------------------------------
          Raw |   .1461817        100%
------------------------------------------------------------------------------
The overall gender gap is about 15 percentage points (i.e. for males, the proportion working as supervisors/in leadership positions is about 15 percentage points higher than for females). Gender differences in schooling as well as in full-time and part-time experience partly explain this difference (34% or 15% of the gap, depending on whether the male or the female coefficients are used as reference). Interestingly, the explained part is larger if we use the male coefficients as reference; the effects of the predictors thus seem to be stronger in the male sample.
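The percentages quoted here are simply the "Char" component divided by the raw gap. As a minimal sketch, reusing the reduced-model numbers from the output above (and following the text in treating Omega = 1 as the male-coefficient reference), the shares can be recomputed by hand:

```stata
* Explained share = Char / Raw, using the reduced-model values printed above
display %5.1f 100 * .049239  / .1461817    // male coefficients as reference
display %5.1f 100 * .0220991 / .1461817    // female coefficients as reference
```

The two results should match the 33.7% and 15.1% shown in the Percentage column.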
Adding "locus of control" and "risk taking" increases the explained part to 21–42% of the overall gap. It is still the case that the explained part is larger if we use the male coefficients as reference.
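For orientation, the aggregate split into "Char" (explained) and "Coef" (unexplained) can be written as follows. This is a sketch in notation introduced here, not taken from the slides: $m$ and $f$ index males and females, $F$ is the logistic function, and an overline denotes the (weighted) sample mean of the predicted probabilities.

```latex
% Male coefficients as reference
\bar{Y}_m - \bar{Y}_f
  = \underbrace{\Bigl[\,\overline{F(X_m\hat\beta_m)} - \overline{F(X_f\hat\beta_m)}\,\Bigr]}_{\text{explained (Char)}}
  + \underbrace{\Bigl[\,\overline{F(X_f\hat\beta_m)} - \overline{F(X_f\hat\beta_f)}\,\Bigr]}_{\text{unexplained (Coef)}}

% Female coefficients as reference
\bar{Y}_m - \bar{Y}_f
  = \underbrace{\Bigl[\,\overline{F(X_m\hat\beta_f)} - \overline{F(X_f\hat\beta_f)}\,\Bigr]}_{\text{explained (Char)}}
  + \underbrace{\Bigl[\,\overline{F(X_m\hat\beta_m)} - \overline{F(X_m\hat\beta_f)}\,\Bigr]}_{\text{unexplained (Coef)}}
```

In both versions the two brackets add up to the raw gap; only the counterfactual used to separate them changes, which is why the explained share depends on the reference group.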
To obtain a detailed decomposition we now run Fairlie decompositions (with random ordering), Yun decompositions, as well as decompositions based on the linear probability model (LPM):
. // Fairlie
. // - male coefficients as reference
. fairlie supvis yeduc expft exppt LoC risk [pw=weight], by(female) noest ///
>     ro reps(1000) nodots

Non-linear decomposition by female (G)          Number of obs     =      5,411
                                                N of obs G=0      =       2599
                                                N of obs G=1      =       2812
                                                Pr(Y!=0|G=0)      =  .36917804
                                                Pr(Y!=0|G=1)      =  .22299637
                                                Difference        =  .14618167
                                                Total explained   =   .0613367

------------------------------------------------------------------------------
      supvis | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |  -.0050414   .0016288    -3.10   0.002    -.0082339    -.001849
       expft |   .0243854   .0098696     2.47   0.013     .0050414    .0437295
       exppt |   .0260367   .0198629     1.31   0.190    -.0128939    .0649672
         LoC |   .0037098   .0014612     2.54   0.011     .0008459    .0065738
        risk |   .0121009   .0048034     2.52   0.012     .0026864    .0215154
------------------------------------------------------------------------------

. est sto fairlie_m

. // - female coefficients as reference
. fairlie supvis yeduc expft exppt LoC risk [pw=weight], by(female) noest ///
>     ro reps(1000) nodots reference(1)

Non-linear decomposition by female (G)          Number of obs     =      5,411
                                                N of obs G=0      =       2599
                                                N of obs G=1      =       2812
                                                Pr(Y!=0|G=0)      =  .36917804
                                                Pr(Y!=0|G=1)      =  .22299637
                                                Difference        =  .14618167
                                                Total explained   =  .03081266

------------------------------------------------------------------------------
      supvis | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       yeduc |  -.0043725   .0017766    -2.46   0.014    -.0078546   -.0008904
       expft |    .025314   .0101165     2.50   0.012      .005486    .0451421
       exppt |   .0005391   .0097917     0.06   0.956    -.0186522    .0197305
         LoC |  -.0000368   .0010098    -0.04   0.971     -.002016    .0019424
        risk |   .0093716   .0038792     2.42   0.016     .0017684    .0169747
------------------------------------------------------------------------------

. est sto fairlie_f

. // Yun
. // - male coefficients as reference
. oaxaca supvis yeduc expft exppt LoC risk, by(female) weight(1) logit svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs   =      5,411
Number of PSUs   = 2,043                        Population size = 12,155,049
                                                Design df       =      2,028
                                                Model           =      logit
Group 1: female = 0                             N of obs 1      =      2,599
Group 2: female = 1                             N of obs 2      =      2,812

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      supvis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |    .369178   .0160281    23.03   0.000     .3377449    .4006112
     group_2 |   .2229964   .0134008    16.64   0.000     .1967155    .2492772
  difference |   .1461817   .0209953     6.96   0.000      .105007    .1873564
   explained |   .0613367   .0215271     2.85   0.004     .0191192    .1035542
 unexplained |    .084845   .0293953     2.89   0.004     .0271968    .1424932
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0049599   .0030737    -1.61   0.107    -.0109878     .001068
       expft |   .0236865   .0098568     2.40   0.016     .0043561     .043017
       exppt |   .0283093   .0215572     1.31   0.189    -.0139673    .0705858
         LoC |   .0028845   .0022062     1.31   0.191    -.0014421    .0072112
        risk |   .0114162   .0049661     2.30   0.022      .001677    .0211555
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |  -.0264444   .1010975    -0.26   0.794    -.2247102    .1718214
       expft |  -.0096922   .0245371    -0.40   0.693    -.0578127    .0384283
       exppt |  -.0353219   .0352082    -1.00   0.316    -.1043699     .033726
         LoC |  -.1510524   .0758564    -1.99   0.047    -.2998169   -.0022879
        risk |   .0016919   .0433828     0.04   0.969    -.0833875    .0867714
       _cons |    .305664   .1489065     2.05   0.040     .0136384    .5976896
------------------------------------------------------------------------------

. est sto yun_m

. // - female coefficients as reference
. oaxaca supvis yeduc expft exppt LoC risk, by(female) weight(0) logit svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs   =      5,411
Number of PSUs   = 2,043                        Population size = 12,155,049
                                                Design df       =      2,028
                                                Model           =      logit
Group 1: female = 0                             N of obs 1      =      2,599
Group 2: female = 1                             N of obs 2      =      2,812

    explained: (X1 - X2) * b2
  unexplained: X1 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      supvis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |    .369178   .0160281    23.03   0.000     .3377449    .4006112
     group_2 |   .2229964   .0134008    16.64   0.000     .1967155    .2492772
  difference |   .1461817   .0209953     6.96   0.000      .105007    .1873564
   explained |   .0308127   .0129802     2.37   0.018     .0053567    .0562686
 unexplained |    .115369   .0247499     4.66   0.000     .0668311    .1639069
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0045962    .002864    -1.60   0.109     -.010213    .0010205
       expft |   .0253226   .0101994     2.48   0.013     .0053203    .0453249
       exppt |   .0005802   .0105492     0.05   0.956    -.0201082    .0212685
         LoC |   -.000034   .0008531    -0.04   0.968     -.001707     .001639
        risk |   .0095401   .0040993     2.33   0.020     .0015007    .0175794
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |  -.0274208   .1053496    -0.26   0.795    -.2340255    .1791839
       expft |  -.0163246   .0415358    -0.39   0.694    -.0977819    .0651326
       exppt |  -.0087738   .0083879    -1.05   0.296    -.0252237     .007676
         LoC |  -.1560804   .0772106    -2.02   0.043    -.3075009     -.00466
        risk |    .002042   .0523371     0.04   0.969     -.100598      .104682
       _cons |   .3219267    .160614     2.00   0.045     .0069409    .6369124
------------------------------------------------------------------------------

. est sto yun_f

. // LPM
. // - male coefficients as reference
. oaxaca supvis yeduc expft exppt LoC risk, by(female) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs   =      5,411
Number of PSUs   = 2,043                        Population size = 12,155,049
                                                Design df       =      2,028
                                                Model           =     linear
Group 1: female = 0                             N of obs 1      =      2,599
Group 2: female = 1                             N of obs 2      =      2,812

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      supvis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |    .369178   .0159972    23.08   0.000     .3378054    .4005507
     group_2 |   .2229964   .0134079    16.63   0.000     .1967017    .2492911
  difference |   .1461817   .0209677     6.97   0.000     .1050612    .1873022
   explained |   .0602719   .0224017     2.69   0.007     .0163391    .1042047
 unexplained |   .0859098   .0300496     2.86   0.004     .0269784    .1448411
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0054351   .0033939    -1.60   0.109    -.0120909    .0012208
       expft |   .0249603   .0104546     2.39   0.017     .0044574    .0454632
       exppt |   .0255638    .020584     1.24   0.214    -.0148041    .0659318
         LoC |   .0030198    .002288     1.32   0.187    -.0014672    .0075068
        risk |   .0121631   .0052187     2.33   0.020     .0019285    .0223977
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0573095   .1019952     0.56   0.574    -.1427168    .2573358
       expft |   .0012023   .0238214     0.05   0.960    -.0455146    .0479192
       exppt |  -.0324088   .0295013    -1.10   0.272    -.0902649    .0254472
         LoC |  -.1619925    .069732    -2.32   0.020    -.2987463   -.0252387
        risk |   .0218128   .0414527     0.53   0.599    -.0594815    .1031071
       _cons |   .1999864   .1496528     1.34   0.182    -.0935029    .4934758
------------------------------------------------------------------------------

. est sto LPM_m

. // - female coefficients as reference
. oaxaca supvis yeduc expft exppt LoC risk, by(female) weight(0) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs   =      5,411
Number of PSUs   = 2,043                        Population size = 12,155,049
                                                Design df       =      2,028
                                                Model           =     linear
Group 1: female = 0                             N of obs 1      =      2,599
Group 2: female = 1                             N of obs 2      =      2,812

    explained: (X1 - X2) * b2
  unexplained: X1 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      supvis | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |    .369178   .0159972    23.08   0.000     .3378054    .4005507
     group_2 |   .2229964   .0134079    16.63   0.000     .1967017    .2492911
  difference |   .1461817   .0209677     6.97   0.000     .1050612    .1873022
   explained |   .0294254   .0125396     2.35   0.019     .0048335    .0540172
 unexplained |   .1167563   .0245189     4.76   0.000     .0686715    .1648411
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0045492    .002856    -1.59   0.111    -.0101501    .0010518
       expft |   .0242398   .0098033     2.47   0.013     .0050142    .0434654
       exppt |   .0007985   .0093824     0.09   0.932    -.0176017    .0191987
         LoC |  -.0000438    .000801    -0.05   0.956    -.0016146     .001527
        risk |     .00898   .0037982     2.36   0.018     .0015312    .0164288
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0564236   .1004186     0.56   0.574    -.1405108    .2533581
       expft |   .0019228   .0380955     0.05   0.960    -.0727876    .0766332
       exppt |  -.0076435     .00698    -1.10   0.274    -.0213322    .0060452
         LoC |  -.1589289   .0684155    -2.32   0.020    -.2931009   -.0247569
        risk |   .0249959   .0475013     0.53   0.599    -.0681605    .1181524
       _cons |   .1999864   .1496528     1.34   0.182    -.0935029    .4934758
------------------------------------------------------------------------------

. est sto LPM_f
Overview of the results:
. esttab fairlie_m yun_m LPM_m fairlie_f yun_f LPM_f, ///
>     compress varw(12) equations(Explained=1:2:2:1:2:2) mtitle nonumber ///
>     keep(Explained: overall:difference overall:explained overall:unexplained) ///
>     mgroup("Male coefficients as reference" "Female coefficients as reference", ///
>     pattern(1 0 0 1 0 0) span)

------------------------------------------------------------------------------------------
                 Male coefficients as reference        Female coefficients as reference
                fairlie_m        yun_m        LPM_m    fairlie_f        yun_f        LPM_f
------------------------------------------------------------------------------------------
Explained
yeduc          -0.00504**     -0.00496     -0.00544    -0.00437*     -0.00460     -0.00455
                  (-3.10)      (-1.61)      (-1.60)      (-2.46)      (-1.60)      (-1.59)

expft             0.0244*      0.0237*      0.0250*      0.0253*      0.0253*      0.0242*
                   (2.47)       (2.40)       (2.39)       (2.50)       (2.48)       (2.47)

exppt              0.0260       0.0283       0.0256     0.000539     0.000580     0.000799
                   (1.31)       (1.31)       (1.24)       (0.06)       (0.05)       (0.09)

LoC              0.00371*      0.00288      0.00302   -0.0000368   -0.0000340   -0.0000438
                   (2.54)       (1.31)       (1.32)      (-0.04)      (-0.04)      (-0.05)

risk              0.0121*      0.0114*      0.0122*     0.00937*     0.00954*     0.00898*
                   (2.52)       (2.30)       (2.33)       (2.42)       (2.33)       (2.36)
------------------------------------------------------------------------------------------
overall
difference                    0.146***     0.146***                  0.146***     0.146***
                                (6.96)       (6.97)                    (6.96)       (6.97)

explained                     0.0613**     0.0603**                   0.0308*      0.0294*
                                (2.85)       (2.69)                    (2.37)       (2.35)

unexplained                   0.0848**     0.0859**                  0.115***     0.117***
                                (2.89)       (2.86)                    (4.66)       (4.76)
------------------------------------------------------------------------------------------
N                    5411         5411         5411         5411         5411         5411
------------------------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
The results from the different methods are very similar. When using the male coefficients as reference, locus of control and risk taking explain a larger part of the gap than when using the female coefficients as reference (together roughly 1.4–1.6 versus 0.9 percentage points); with the female coefficients, the contribution of locus of control is essentially zero. Given the total gap of about 15 percentage points, however, these contributions still seem rather moderate.
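As a quick plausibility check, the detailed contributions can be added up and compared with the total explained part. The sketch below reuses only the Yun results with male coefficients as reference from the output above; it is an illustration, not part of the original solution.

```stata
* Yun decomposition, male coefficients as reference:
* the detailed contributions should add up (up to rounding) to the
* total explained part of .0613367
display -.0049599 + .0236865 + .0283093 + .0028845 + .0114162

* joint contribution of LoC and risk as a percentage of the raw gap
display %5.1f 100 * (.0028845 + .0114162) / .1461817
```

The same arithmetic can be repeated for the other columns of the overview table. For the Fairlie results the sum need not match the total exactly, because the detailed contributions are based on random ordering.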