Decomposition Methods in the Social Sciences

Solutions to Exercise 2: Postestimation

Johannes Giesecke and Ben Jann, GESIS Training Course, January 29 – February 1, 2024

Required packages (install using command ssc install): oaxaca, estout, coefplot
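All three packages are community-contributed and distributed through the SSC archive. A minimal sketch of the installation, assuming an internet connection; the replace option is an addition here and simply updates a package if an older copy is already installed:

```stata
* Install the community-contributed packages from the SSC archive
ssc install oaxaca, replace
ssc install estout, replace
ssc install coefplot, replace
```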

Use the same setup as in Exercise 1 and do the following.

Task 1: compute contributions to explained part in percent

Compute the percentage contribution of each variable to the explained part of the decomposition (including standard errors/confidence intervals) for the extended decomposition in Exercise 1.

Load data and estimate the decomposition from Exercise 1.

. use gsoep-extract, clear
(Example data based on the German Socio-Economic Panel)

. keep if wave==2015
(29,970 observations deleted)

. keep if inrange(age, 25, 55)
(5,671 observations deleted)

. generate lnwage = ln(wage)
(1,709 missing values generated)

. generate expft2 = expft^2
(35 missing values generated)

. drop if missing(sex, lnwage, yeduc, expft, isei, children)
(1,875 observations deleted)

. svyset psu [pw=weight], strata(strata)

Sampling weights: weight
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,434
Number of PSUs   = 2,035                        Population size   = 12,071,607
                                                Design df         =      2,020
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,624
Group 2: sex = 2                                N of obs 2        =      2,810

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865592   .0162802   176.02   0.000     2.833665     2.89752
     group_2 |   2.659247   .0151807   175.17   0.000     2.629476    2.689019
  difference |    .206345   .0205365    10.05   0.000     .1660701    .2466199
   explained |   .1036131    .016076     6.45   0.000     .0720858    .1351404
 unexplained |   .1027319   .0204223     5.03   0.000      .062681    .1427829
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0092558   .0052244    -1.77   0.077    -.0195016      .00099
  experience |   .1076302   .0126921     8.48   0.000     .0827391    .1325212
        isei |   .0033122   .0075391     0.44   0.660     -.011473    .0180973
    children |   .0019265   .0012233     1.57   0.115    -.0004725    .0043256
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |  -.0533151     .12035    -0.44   0.658    -.2893382     .182708
  experience |   .0274948   .0429278     0.64   0.522    -.0566927    .1116822
        isei |   .0863664   .0618819     1.40   0.163    -.0349926    .2077254
    children |   .0012029   .0089191     0.13   0.893    -.0162887    .0186946
       _cons |   .0409829   .1149839     0.36   0.722    -.1845165    .2664824
------------------------------------------------------------------------------
experience: expft expft2

Now compute the percentage contribution using command nlcom; this is very similar to the example on the slides.

. nlcom (schooling:  (_b[explained:yeduc]      / _b[overall:explained]) * 100) ///
>       (experience: (_b[explained:experience] / _b[overall:explained]) * 100) ///
>       (isei:       (_b[explained:isei]       / _b[overall:explained]) * 100) ///
>       (children:   (_b[explained:children]   / _b[overall:explained]) * 100)

   schooling: (_b[explained:yeduc]      / _b[overall:explained]) * 100
  experience: (_b[explained:experience] / _b[overall:explained]) * 100
        isei: (_b[explained:isei]       / _b[overall:explained]) * 100
    children: (_b[explained:children]   / _b[overall:explained]) * 100

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   schooling |  -8.933042   5.819703    -1.53   0.125    -20.33945    2.473367
  experience |    103.877    11.4546     9.07   0.000     81.42641    126.3276
        isei |   3.196667   6.979454     0.46   0.647    -10.48281    16.87615
    children |   1.859366   1.213792     1.53   0.126     -.519623    4.238355
------------------------------------------------------------------------------
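The point estimates reported by nlcom can be checked by hand. The sketch below uses the coefficients copied from the oaxaca output above (survey design); the scalar names are chosen for illustration only. It reproduces the shares but not the standard errors, which nlcom derives with the delta method from the full covariance matrix of the estimates.

```stata
* Point estimates copied from the "explained" block of the oaxaca output above
scalar expl_total = .1036131
scalar pct_yeduc  = -.0092558 / expl_total * 100
scalar pct_exper  =  .1076302 / expl_total * 100
scalar pct_isei   =  .0033122 / expl_total * 100
scalar pct_child  =  .0019265 / expl_total * 100

* Percentage contribution of each variable to the explained part
display "schooling:  " %9.3f pct_yeduc
display "experience: " %9.3f pct_exper
display "isei:       " %9.3f pct_isei
display "children:   " %9.3f pct_child

* The detailed contributions add up to the total explained part,
* so the four shares sum to 100
display "sum:        " %9.3f (pct_yeduc + pct_exper + pct_isei + pct_child)
```

A share above 100 percent (experience) together with a negative share (schooling) is not an error: the contributions have opposite signs and offset each other.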

Task 2: draw graph of results

To illustrate the effect of the survey design, draw a graph that displays the above results with and without taking the survey design into account.

To draw the graph, we use command coefplot, a general-purpose command to graph arbitrary estimation results; see repec.sowi.unibe.ch/stata/coefplot for more information. coefplot requires that the results are stored as estimation sets; to post the results from nlcom as an estimation set, apply option post.

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1) svy
(output omitted)

. nlcom (schooling:  (_b[explained:yeduc]      / _b[overall:explained]) * 100) ///
>       (experience: (_b[explained:experience] / _b[overall:explained]) * 100) ///
>       (isei:       (_b[explained:isei]       / _b[overall:explained]) * 100) ///
>       (children:   (_b[explained:children]   / _b[overall:explained]) * 100), post

   schooling: (_b[explained:yeduc]      / _b[overall:explained]) * 100
  experience: (_b[explained:experience] / _b[overall:explained]) * 100
        isei: (_b[explained:isei]       / _b[overall:explained]) * 100
    children: (_b[explained:children]   / _b[overall:explained]) * 100

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   schooling |  -8.933042   5.819703    -1.53   0.125    -20.33945    2.473367
  experience |    103.877    11.4546     9.07   0.000     81.42641    126.3276
        isei |   3.196667   6.979454     0.46   0.647    -10.48281    16.87615
    children |   1.859366   1.213792     1.53   0.126     -.519623    4.238355
------------------------------------------------------------------------------

. estimates store svy

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1)
(output omitted)

. nlcom (schooling:  (_b[explained:yeduc]      / _b[overall:explained]) * 100) ///
>       (experience: (_b[explained:experience] / _b[overall:explained]) * 100) ///
>       (isei:       (_b[explained:isei]       / _b[overall:explained]) * 100) ///
>       (children:   (_b[explained:children]   / _b[overall:explained]) * 100), post

   schooling: (_b[explained:yeduc]      / _b[overall:explained]) * 100
  experience: (_b[explained:experience] / _b[overall:explained]) * 100
        isei: (_b[explained:isei]       / _b[overall:explained]) * 100
    children: (_b[explained:children]   / _b[overall:explained]) * 100

------------------------------------------------------------------------------
      lnwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   schooling |  -7.667882   2.709313    -2.83   0.005    -12.97804   -2.357726
  experience |   102.6241    5.67097    18.10   0.000      91.5092     113.739
        isei |   2.090444   3.574878     0.58   0.559    -4.916189    9.097076
    children |   2.953344   .9456072     3.12   0.002     1.099988      4.8067
------------------------------------------------------------------------------

. estimates store nosvy

. coefplot svy nosvy, recast(bar) barwidth(0.3) citop ciopts(recast(rcap)) xline(0) xlab(#10)
[Graph: horizontal bars with confidence intervals for schooling, experience, international socio-economic index, and number of children (age<16) in HH; x-axis from -20 to 140; series: svy, nosvy]
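The difference in precision that the graph shows can also be put into numbers. The sketch below divides the standard errors from the nlcom output with survey design by those without; the values are copied from the two tables above, and the scalar names are chosen for illustration only. Note that the second model omits the sampling weights as well as the strata and PSUs, so the point estimates differ too and the ratios are a rough comparison, not a formal design effect.

```stata
* Standard errors copied from the two nlcom tables above: svy / nosvy
scalar r_schooling  = 5.819703 / 2.709313
scalar r_experience = 11.4546  / 5.67097
scalar r_isei       = 6.979454 / 3.574878
scalar r_children   = 1.213792 / .9456072

* A ratio above 1 means the survey-design standard error is larger
display "schooling:  " %6.2f r_schooling
display "experience: " %6.2f r_experience
display "isei:       " %6.2f r_isei
display "children:   " %6.2f r_children
```

All four ratios are above 1, which matches the wider confidence intervals of the svy series in the graph.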

Task 3: create Word table of results

Create a Word table (rtf file type in esttab) of the underlying decomposition results (with and without survey design) that could be included in the appendix of your paper.

Command esttab can be used to compile a table from multiple estimation results and export the table to different formats; see repec.sowi.unibe.ch/stata/estout for more information.

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1) svy
(output omitted)

. estimates store svy

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1)
(output omitted)

. estimates store nosvy

. esttab svy nosvy using Exercise-2-table.rtf, replace ///
>     nostar b(3) se wide nonumber drop(group_*) ///
>     mtitle("With survey design" "Without survey design") ///
>     eqlabels(, prefix("{\i ") suffix("}"))
(output written to Exercise-2-table.rtf)

Opened in Word, the table looks roughly as follows.

[Image: Exercise-2-table]

The collect command was introduced in Stata 17 as an official tool to compile and export tables from arbitrary results. The command looks very promising, but has a rather steep learning curve. Here is an example producing a table similar to the one above.

. collect clear

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1) svy
(output omitted)

. collect get _r_b _r_se, tag(model["With survey design"])

. oaxaca lnwage yeduc (experience: expft expft2) isei children, by(sex) weight(1)
(output omitted)

. collect get _r_b _r_se, tag(model["Without survey design"])
Note: collect is ignoring label "z" for level _r_z of dimension result.
Note: collect is ignoring label "|z|" for level _r_z_abs of dimension result.

. collect style column, dups(center)

. collect style cell, nformat(%6.3f)

. collect style cell result[_r_se], sformat("(%s)")

. collect style header result, level(hide)

. collect style header colname, level(value)

. collect style cell border_block, border(right, pattern(nil))

. collect layout (coleq#colname) (model#result)

Collection: default
      Rows: coleq#colname
   Columns: model#result
   Table 1: 17 x 4

------------------------------------------------------
              With survey design Without survey design
------------------------------------------------------
overall                                               
  group_1        2.866   (0.016)      2.861    (0.010)
  group_2        2.659   (0.015)      2.628    (0.009)
  difference     0.206   (0.021)      0.233    (0.013)
  explained      0.104   (0.016)      0.147    (0.012)
  unexplained    0.103   (0.020)      0.086    (0.013)
explained                                             
  yeduc         -0.009   (0.005)     -0.011    (0.003)
  experience     0.108   (0.013)      0.151    (0.009)
  isei           0.003   (0.008)      0.003    (0.005)
  children       0.002   (0.001)      0.004    (0.001)
unexplained                                           
  yeduc         -0.053   (0.120)      0.007    (0.063)
  experience     0.027   (0.043)      0.092    (0.022)
  isei           0.086   (0.062)      0.066    (0.037)
  children       0.001   (0.009)      0.006    (0.009)
  _cons          0.041   (0.115)     -0.085    (0.058)
------------------------------------------------------

. collect export Exercise-2-table-2.docx, replace
(collection default exported to file Exercise-2-table-2.docx)

The table looks as follows in Word.

[Image: Exercise-2-table-2]

Alternatively, you could use command etable, which is a wrapper for collect.
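A minimal sketch of the etable workflow, assuming Stata 17 or newer. To keep it self-contained it uses the auto data shipped with Stata and two regress models instead of the course data; the model names small and large are chosen for illustration only. Whether etable lays out the multi-equation oaxaca results as neatly has not been checked here.

```stata
* Assumption: Stata 17 or newer; the shipped auto data stand in for the course data
sysuse auto, clear

regress price mpg
estimates store small

regress price mpg weight
estimates store large

* One column per stored model; coefficients with standard errors in parentheses
etable, estimates(small large) column(estimates) ///
    cstat(_r_b, nformat(%9.3f)) ///
    cstat(_r_se, nformat(%9.3f) sformat("(%s)")) ///
    mstat(N)
```

Run the block from a do-file, because /// line continuation is not available at the interactive prompt. Adding option export() with a .docx file name should write the table to Word, in the same way as collect export above.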