Johannes Giesecke and Ben Jann, GESIS Training Course, January 29 – February 1, 2024
Required packages (install using command ssc install): oaxaca, estout, coefplot
Use the same setup as in Exercise 1 and do the following.
Compute the percentage contribution of each variable to the explained part of the decomposition (including standard errors/confidence intervals) for the extended decomposition in Exercise 1.
Load data and estimate the decomposition from Exercise 1.
. use gsoep-extract, clear
(Example data based on the German Socio-Economic Panel)

. keep if wave==2015
(29,970 observations deleted)

. keep if inrange(age, 25, 55)
(5,671 observations deleted)

. generate lnwage = ln(wage)
(1,709 missing values generated)

. generate expft2 = expft^2
(35 missing values generated)

. drop if missing(sex, lnwage, yeduc, expft, isei, children)
(1,875 observations deleted)

. svyset psu [pw=weight], strata(strata)

Sampling weights: weight
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                     Number of obs   =      5,434
Number of PSUs   = 2,035                     Population size = 12,071,607
                                             Design df       =      2,020
                                             Model           =     linear
Group 1: sex = 1                             N of obs 1      =      2,624
Group 2: sex = 2                             N of obs 2      =      2,810

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865592   .0162802   176.02   0.000     2.833665     2.89752
     group_2 |   2.659247   .0151807   175.17   0.000     2.629476    2.689019
  difference |    .206345   .0205365    10.05   0.000     .1660701    .2466199
   explained |   .1036131    .016076     6.45   0.000     .0720858    .1351404
 unexplained |   .1027319   .0204223     5.03   0.000      .062681    .1427829
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0092558   .0052244    -1.77   0.077    -.0195016      .00099
  experience |   .1076302   .0126921     8.48   0.000     .0827391    .1325212
        isei |   .0033122   .0075391     0.44   0.660     -.011473    .0180973
    children |   .0019265   .0012233     1.57   0.115    -.0004725    .0043256
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |  -.0533151     .12035    -0.44   0.658    -.2893382     .182708
  experience |   .0274948   .0429278     0.64   0.522    -.0566927    .1116822
        isei |   .0863664   .0618819     1.40   0.163    -.0349926    .2077254
    children |   .0012029   .0089191     0.13   0.893    -.0162887    .0186946
       _cons |   .0409829   .1149839     0.36   0.722    -.1845165    .2664824
------------------------------------------------------------------------------
experience: expft expft2
Now compute the percentage contribution using command nlcom; this is very similar to the example on the slides.
. nlcom (schooling: (_b[explained:yeduc] / _b[overall:explained]) * 100) ///
>     (experience: (_b[explained:experience] / _b[overall:explained]) * 100) ///
>     (isei: (_b[explained:isei] / _b[overall:explained]) * 100) ///
>     (children: (_b[explained:children] / _b[overall:explained]) * 100)

   schooling: (_b[explained:yeduc] / _b[overall:explained]) * 100
  experience: (_b[explained:experience] / _b[overall:explained]) * 100
        isei: (_b[explained:isei] / _b[overall:explained]) * 100
    children: (_b[explained:children] / _b[overall:explained]) * 100

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   schooling |  -8.933042   5.819703    -1.53   0.125    -20.33945    2.473367
  experience |    103.877    11.4546     9.07   0.000     81.42641    126.3276
        isei |   3.196667   6.979454     0.46   0.647    -10.48281    16.87615
    children |   1.859366   1.213792     1.53   0.126     -.519623    4.238355
------------------------------------------------------------------------------
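As a plausibility check, the point estimates reported by nlcom can be reproduced by hand from the coefficients displayed in the decomposition table. The following is a minimal sketch, not part of the original solution: the scalar names are made up for this illustration, and the values are copied from the rounded output above, so the results may differ from nlcom in the last digits. It covers the point estimates only; nlcom additionally derives the standard errors using the delta method.

```stata
* Coefficients copied from the svy decomposition output above (rounded values;
* scalar names are hypothetical and chosen for this illustration only)
scalar b_explained  =  .1036131
scalar b_yeduc      = -.0092558
scalar b_experience =  .1076302
scalar b_isei       =  .0033122
scalar b_children   =  .0019265

* Percentage contribution = detailed component / total explained part * 100
scalar p_schooling  = 100 * b_yeduc      / b_explained
scalar p_experience = 100 * b_experience / b_explained
scalar p_isei       = 100 * b_isei       / b_explained
scalar p_children   = 100 * b_children   / b_explained

* The detailed components add up to the explained part, so the
* percentages should sum to 100 (up to rounding of the copied values)
scalar p_total = p_schooling + p_experience + p_isei + p_children

display "schooling:  " %9.3f p_schooling
display "experience: " %9.3f p_experience
display "isei:       " %9.3f p_isei
display "children:   " %9.3f p_children
display "total:      " %9.3f p_total
```

The displayed values should agree with the Coefficient column of the nlcom table up to rounding. A negative percentage, as for schooling, indicates that the variable narrows rather than widens the explained part of the gap.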
To illustrate the effect of the survey design, draw a graph that displays the above results with and without taking the survey design into account.
To draw the graph, we use command coefplot, a general-purpose command to graph arbitrary estimation results; see repec.sowi.unibe.ch/stata/coefplot for more information. coefplot requires that the results are stored as estimation sets; to post the results from nlcom as an estimation set, apply option post.
. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1) svy
  (output omitted)

. nlcom (schooling: (_b[explained:yeduc] / _b[overall:explained]) * 100) ///
>     (experience: (_b[explained:experience] / _b[overall:explained]) * 100) ///
>     (isei: (_b[explained:isei] / _b[overall:explained]) * 100) ///
>     (children: (_b[explained:children] / _b[overall:explained]) * 100), post

   schooling: (_b[explained:yeduc] / _b[overall:explained]) * 100
  experience: (_b[explained:experience] / _b[overall:explained]) * 100
        isei: (_b[explained:isei] / _b[overall:explained]) * 100
    children: (_b[explained:children] / _b[overall:explained]) * 100

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   schooling |  -8.933042   5.819703    -1.53   0.125    -20.33945    2.473367
  experience |    103.877    11.4546     9.07   0.000     81.42641    126.3276
        isei |   3.196667   6.979454     0.46   0.647    -10.48281    16.87615
    children |   1.859366   1.213792     1.53   0.126     -.519623    4.238355
------------------------------------------------------------------------------

. estimates store svy

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1)
  (output omitted)

. nlcom (schooling: (_b[explained:yeduc] / _b[overall:explained]) * 100) ///
>     (experience: (_b[explained:experience] / _b[overall:explained]) * 100) ///
>     (isei: (_b[explained:isei] / _b[overall:explained]) * 100) ///
>     (children: (_b[explained:children] / _b[overall:explained]) * 100), post

   schooling: (_b[explained:yeduc] / _b[overall:explained]) * 100
  experience: (_b[explained:experience] / _b[overall:explained]) * 100
        isei: (_b[explained:isei] / _b[overall:explained]) * 100
    children: (_b[explained:children] / _b[overall:explained]) * 100

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   schooling |  -7.667882   2.709313    -2.83   0.005    -12.97804   -2.357726
  experience |   102.6241    5.67097    18.10   0.000      91.5092     113.739
        isei |   2.090444   3.574878     0.58   0.559    -4.916189    9.097076
    children |   2.953344   .9456072     3.12   0.002     1.099988      4.8067
------------------------------------------------------------------------------

. estimates store nosvy

. coefplot svy nosvy, recast(bar) barwidth(0.3) citop ciopts(recast(rcap)) xline(0) xlab(#10)
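To put a number on the comparison, the standard errors from the two nlcom tables can be set side by side. The sketch below is an illustration under stated assumptions rather than part of the original solution: it copies the (rounded) standard errors from the output above into a matrix and computes their ratio. The matrix names SE and R and the column labels are hypothetical, and the code is meant to be run from a do-file because it uses /// line continuation.

```stata
* Standard errors of the percentage contributions, copied from the
* two nlcom tables above (column 1: with svy, column 2: without svy)
matrix SE = (5.819703, 2.709313 \ ///
             11.4546,  5.67097  \ ///
             6.979454, 3.574878 \ ///
             1.213792, .9456072)
matrix rownames SE = schooling experience isei children
matrix colnames SE = svy nosvy

* Ratio of the design-based to the naive standard error for each term
matrix R = J(4, 1, .)
forvalues i = 1/4 {
    matrix R[`i', 1] = SE[`i', 1] / SE[`i', 2]
}
matrix rownames R = schooling experience isei children
matrix colnames R = ratio
matrix list R, format(%5.2f)
```

With these numbers, the standard errors that take the survey design into account are roughly twice as large as the naive ones for schooling, experience, and isei, and about 1.3 times as large for children. Note that the point estimates differ as well, because the run without the svy option also ignores the sampling weights.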
Create a Word table (rtf file type in esttab) of the underlying decomposition results (with and without survey design) that could be included in the appendix of your paper.
Command esttab can be used to compile a table from multiple estimation results and export the table to different formats; see repec.sowi.unibe.ch/stata/estout for more information.
. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1) svy
  (output omitted)

. estimates store svy

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1)
  (output omitted)

. estimates store nosvy

. esttab svy nosvy using Exercise-2-table.rtf, replace ///
>     nostar b(3) se wide nonumber drop(group_*) ///
>     mtitle("With survey design" "Without survey design") ///
>     eqlabels(, prefix("{\i ") suffix("}"))
(output written to Exercise-2-table.rtf)
Opened in Word, the table looks roughly as follows.
Stata 17 introduced the collect command as an official tool to compile and export tables from arbitrary results. The command looks very promising but has a rather steep learning curve. Here is an example producing a table similar to the one above.
. collect clear

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1) svy
  (output omitted)

. collect get _r_b _r_se, tag(model["With survey design"])

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1)
  (output omitted)

. collect get _r_b _r_se, tag(model["Without survey design"])
Note: collect is ignoring label "z" for level _r_z of dimension result.
Note: collect is ignoring label "|z|" for level _r_z_abs of dimension result.

. collect style column, dups(center)

. collect style cell, nformat(%6.3f)

. collect style cell result[_r_se], sformat("(%s)")

. collect style header result, level(hide)

. collect style header colname, level(value)

. collect style cell border_block, border(right, pattern(nil))

. collect layout (coleq#colname) (model#result)

Collection: default
      Rows: coleq#colname
   Columns: model#result
   Table 1: 17 x 4

------------------------------------------------------
              With survey design Without survey design
------------------------------------------------------
overall
  group_1         2.866  (0.016)      2.861    (0.010)
  group_2         2.659  (0.015)      2.628    (0.009)
  difference      0.206  (0.021)      0.233    (0.013)
  explained       0.104  (0.016)      0.147    (0.012)
  unexplained     0.103  (0.020)      0.086    (0.013)
explained
  yeduc          -0.009  (0.005)     -0.011    (0.003)
  experience      0.108  (0.013)      0.151    (0.009)
  isei            0.003  (0.008)      0.003    (0.005)
  children        0.002  (0.001)      0.004    (0.001)
unexplained
  yeduc          -0.053  (0.120)      0.007    (0.063)
  experience      0.027  (0.043)      0.092    (0.022)
  isei            0.086  (0.062)      0.066    (0.037)
  children        0.001  (0.009)      0.006    (0.009)
  _cons           0.041  (0.115)     -0.085    (0.058)
------------------------------------------------------

. collect export Exercise-2-table-2.docx, replace
(collection default exported to file Exercise-2-table-2.docx)
The table looks as follows in Word.
Alternatively, you could use command etable, which is a wrapper for collect.