Decomposition Methods in the Social Sciences

Solutions to Exercise 3: Index and transformation problem

Johannes Giesecke and Ben Jann, GESIS Training Course, January 29 – February 1, 2024

Required packages (install using command ssc install): fre, oaxaca, estout
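
A minimal sketch of the installation, assuming an internet connection and that none of the packages is installed yet (add option replace to update an existing installation):

. ssc install fre

. ssc install oaxaca

. ssc install estout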

Data preparation

. // get data as in Exercise 1
. use gsoep-extract, clear
(Example data based on the German Socio-Economic Panel)

. keep if wave==2015
(29,970 observations deleted)

. keep if inrange(age, 25, 55)
(5,671 observations deleted)

. generate lnwage = ln(wage)
(1,709 missing values generated)

. generate expft2 = expft^2
(35 missing values generated)

. drop if missing(sex, lnwage, yeduc, expft, isei, children)
(1,875 observations deleted)

. svyset psu [pw=weight], strata(strata)

Sampling weights: weight
             VCE: linearized
     Single unit: missing
        Strata 1: strata
 Sampling unit 1: psu
           FPC 1: <zero>

Part I: The index problem

Task 1: choice of reference coefficients

Using the extended decomposition from Exercise 1 (i.e. schooling, full-time experience, ISEI, number of children), evaluate how the results change depending on how you handle the index problem. Compute the following variants:

Using male coefficients as "nondiscriminatory" coefficients: set option weight() to 1; this gives the coefficients from the first group (males in this case) a weight of one and the coefficients from the second group (females) a weight of zero.

. oaxaca lnwage yeduc (exp: expft expft2) isei children, by(sex) svy weight(1)

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,434
Number of PSUs   = 2,035                        Population size   = 12,071,607
                                                Design df         =      2,020
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,624
Group 2: sex = 2                                N of obs 2        =      2,810

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865592   .0162802   176.02   0.000     2.833665     2.89752
     group_2 |   2.659247   .0151807   175.17   0.000     2.629476    2.689019
  difference |    .206345   .0205365    10.05   0.000     .1660701    .2466199
   explained |   .1036131    .016076     6.45   0.000     .0720858    .1351404
 unexplained |   .1027319   .0204223     5.03   0.000      .062681    .1427829
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0092558   .0052244    -1.77   0.077    -.0195016      .00099
         exp |   .1076302   .0126921     8.48   0.000     .0827391    .1325212
        isei |   .0033122   .0075391     0.44   0.660     -.011473    .0180973
    children |   .0019265   .0012233     1.57   0.115    -.0004725    .0043256
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |  -.0533151     .12035    -0.44   0.658    -.2893382     .182708
         exp |   .0274948   .0429278     0.64   0.522    -.0566927    .1116822
        isei |   .0863664   .0618819     1.40   0.163    -.0349926    .2077254
    children |   .0012029   .0089191     0.13   0.893    -.0162887    .0186946
       _cons |   .0409829   .1149839     0.36   0.722    -.1845165    .2664824
------------------------------------------------------------------------------
exp: expft expft2

. estimates store male

Using female coefficients as "nondiscriminatory" coefficients: set option weight() to 0.

. oaxaca lnwage yeduc (exp: expft expft2) isei children, by(sex) svy weight(0)

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,434
Number of PSUs   = 2,035                        Population size   = 12,071,607
                                                Design df         =      2,020
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,624
Group 2: sex = 2                                N of obs 2        =      2,810

    explained: (X1 - X2) * b2
  unexplained: X1 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865592   .0162802   176.02   0.000     2.833665     2.89752
     group_2 |   2.659247   .0151807   175.17   0.000     2.629476    2.689019
  difference |    .206345   .0205365    10.05   0.000     .1660701    .2466199
   explained |    .085632   .0145757     5.87   0.000      .057047    .1142169
 unexplained |   .1207131    .017753     6.80   0.000      .085897    .1555291
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0101468    .005649    -1.80   0.073    -.0212252    .0009315
         exp |   .0912236   .0108276     8.43   0.000     .0699892     .112458
        isei |   .0027324   .0062227     0.44   0.661    -.0094711     .014936
    children |   .0018227   .0011951     1.53   0.127     -.000521    .0041664
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |  -.0524241   .1183388    -0.44   0.658    -.2845029    .1796547
         exp |   .0439013   .0536963     0.82   0.414    -.0614046    .1492073
        isei |   .0869461    .062297     1.40   0.163    -.0352269    .2091192
    children |   .0013067    .009689     0.13   0.893    -.0176946    .0203081
       _cons |   .0409829   .1149839     0.36   0.722    -.1845165    .2664824
------------------------------------------------------------------------------
exp: expft expft2

. estimates store female

Pooled model: apply option pooled instead of the weight() option. The pooled model automatically includes a group dummy; to use a pooled model without group dummy, apply option omega instead of pooled.

. oaxaca lnwage yeduc (exp: expft expft2) isei children, by(sex) svy pooled

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,434
Number of PSUs   = 2,035                        Population size   = 12,071,607
                                                Design df         =      2,020
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,624
Group 2: sex = 2                                N of obs 2        =      2,810

    explained: (X1 - X2) * b
  unexplained: X1 * (b1 - b) + X2 * (b - b2)
               with b from pooled model (including group dummy)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865592   .0162802   176.02   0.000     2.833665     2.89752
     group_2 |   2.659247   .0151807   175.17   0.000     2.629476    2.689019
  difference |    .206345   .0205365    10.05   0.000     .1660701    .2466199
   explained |     .09528   .0136868     6.96   0.000     .0684383    .1221218
 unexplained |    .111065   .0181452     6.12   0.000     .0754797    .1466503
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0096868    .005335    -1.82   0.070    -.0201494    .0007759
         exp |   .0999682   .0095154    10.51   0.000     .0813071    .1186293
        isei |   .0030201   .0068725     0.44   0.660    -.0104579     .016498
    children |   .0019785   .0011979     1.65   0.099    -.0003707    .0043278
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |  -.0528841   .1193906    -0.44   0.658    -.2870257    .1812574
         exp |   .0351568   .0490598     0.72   0.474    -.0610562    .1313698
        isei |   .0866585   .0620786     1.40   0.163    -.0350863    .2084033
    children |   .0011509   .0092786     0.12   0.901    -.0170457    .0193475
       _cons |   .0409829   .1149839     0.36   0.722    -.1845165    .2664824
------------------------------------------------------------------------------
exp: expft expft2

. estimates store pooled
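
The pooled variant without a group dummy mentioned above could be computed as follows (a sketch only; output is not shown and the result is not used in the overview table below):

. // pooled reference coefficients without group dummy
. oaxaca lnwage yeduc (exp: expft expft2) isei children, by(sex) svy omega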

Threefold decomposition (from the viewpoint of women): omit weight() and pooled (the threefold decomposition is the default).

. oaxaca lnwage yeduc (exp: expft expft2) isei children, by(sex) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,434
Number of PSUs   = 2,035                        Population size   = 12,071,607
                                                Design df         =      2,020
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,624
Group 2: sex = 2                                N of obs 2        =      2,810

   endowments: (X1 - X2) * b2
 coefficients: X2 * (b1 - b2)
  interaction: (X1 - X2) * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865592   .0162802   176.02   0.000     2.833665     2.89752
     group_2 |   2.659247   .0151807   175.17   0.000     2.629476    2.689019
  difference |    .206345   .0205365    10.05   0.000     .1660701    .2466199
  endowments |    .085632   .0145757     5.87   0.000      .057047    .1142169
coefficients |   .1027319   .0204223     5.03   0.000      .062681    .1427829
 interaction |   .0179811   .0130391     1.38   0.168    -.0075903    .0435526
-------------+----------------------------------------------------------------
endowments   |
       yeduc |  -.0101468    .005649    -1.80   0.073    -.0212252    .0009315
         exp |   .0912236   .0108276     8.43   0.000     .0699892     .112458
        isei |   .0027324   .0062227     0.44   0.661    -.0094711     .014936
    children |   .0018227   .0011951     1.53   0.127     -.000521    .0041664
-------------+----------------------------------------------------------------
coefficients |
       yeduc |  -.0533151     .12035    -0.44   0.658    -.2893382     .182708
         exp |   .0274948   .0429278     0.64   0.522    -.0566927    .1116822
        isei |   .0863664   .0618819     1.40   0.163    -.0349926    .2077254
    children |   .0012029   .0089191     0.13   0.893    -.0162887    .0186946
       _cons |   .0409829   .1149839     0.36   0.722    -.1845165    .2664824
-------------+----------------------------------------------------------------
interaction  |
       yeduc |    .000891   .0020681     0.43   0.667    -.0031647    .0049468
         exp |   .0164066   .0133163     1.23   0.218    -.0097086    .0425217
        isei |   .0005797   .0013825     0.42   0.675    -.0021316     .003291
    children |   .0001038   .0007721     0.13   0.893    -.0014104     .001618
------------------------------------------------------------------------------
exp: expft expft2

. estimates store tf_female

Threefold decomposition (from the viewpoint of men): add threefold(reverse).

. oaxaca lnwage yeduc (exp: expft expft2) isei children, by(sex) svy threefold(reverse)

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,434
Number of PSUs   = 2,035                        Population size   = 12,071,607
                                                Design df         =      2,020
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,624
Group 2: sex = 2                                N of obs 2        =      2,810

   endowments: (X1 - X2) * b1
 coefficients: X1 * (b1 - b2)
  interaction: (X1 - X2) * (b2 - b1)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865592   .0162802   176.02   0.000     2.833665     2.89752
     group_2 |   2.659247   .0151807   175.17   0.000     2.629476    2.689019
  difference |    .206345   .0205365    10.05   0.000     .1660701    .2466199
  endowments |   .1036131    .016076     6.45   0.000     .0720858    .1351404
coefficients |   .1207131    .017753     6.80   0.000      .085897    .1555291
 interaction |  -.0179811   .0130391    -1.38   0.168    -.0435526    .0075903
-------------+----------------------------------------------------------------
endowments   |
       yeduc |  -.0092558   .0052244    -1.77   0.077    -.0195016      .00099
         exp |   .1076302   .0126921     8.48   0.000     .0827391    .1325212
        isei |   .0033122   .0075391     0.44   0.660     -.011473    .0180973
    children |   .0019265   .0012233     1.57   0.115    -.0004725    .0043256
-------------+----------------------------------------------------------------
coefficients |
       yeduc |  -.0524241   .1183388    -0.44   0.658    -.2845029    .1796547
         exp |   .0439013   .0536963     0.82   0.414    -.0614046    .1492073
        isei |   .0869461    .062297     1.40   0.163    -.0352269    .2091192
    children |   .0013067    .009689     0.13   0.893    -.0176946    .0203081
       _cons |   .0409829   .1149839     0.36   0.722    -.1845165    .2664824
-------------+----------------------------------------------------------------
interaction  |
       yeduc |   -.000891   .0020681    -0.43   0.667    -.0049468    .0031647
         exp |  -.0164066   .0133163    -1.23   0.218    -.0425217    .0097086
        isei |  -.0005797   .0013825    -0.42   0.675     -.003291    .0021316
    children |  -.0001038   .0007721    -0.13   0.893     -.001618    .0014104
------------------------------------------------------------------------------
exp: expft expft2

. estimates store tf_male

Task 2: overview table

Generate an overview table and try to make sense of the results. What is the correct interpretation of the various results? How can the differences be explained?

. esttab male female pooled tf_female tf_male, se  nonumber mtitles ///
>     equations(Overall=1, Explained=2, Unexplained=3) ///
>     rename(endowments explained coefficients unexplained)

--------------------------------------------------------------------------------------------
                     male          female          pooled       tf_female         tf_male   
--------------------------------------------------------------------------------------------
Overall                                                                                     
group_1             2.866***        2.866***        2.866***        2.866***        2.866***
                 (0.0163)        (0.0163)        (0.0163)        (0.0163)        (0.0163)   

group_2             2.659***        2.659***        2.659***        2.659***        2.659***
                 (0.0152)        (0.0152)        (0.0152)        (0.0152)        (0.0152)   

difference          0.206***        0.206***        0.206***        0.206***        0.206***
                 (0.0205)        (0.0205)        (0.0205)        (0.0205)        (0.0205)   

explained           0.104***       0.0856***       0.0953***       0.0856***        0.104***
                 (0.0161)        (0.0146)        (0.0137)        (0.0146)        (0.0161)   

unexplained         0.103***        0.121***        0.111***        0.103***        0.121***
                 (0.0204)        (0.0178)        (0.0181)        (0.0204)        (0.0178)   

interaction                                                        0.0180         -0.0180   
                                                                 (0.0130)        (0.0130)   
--------------------------------------------------------------------------------------------
Explained                                                                                   
yeduc            -0.00926         -0.0101        -0.00969         -0.0101        -0.00926   
                (0.00522)       (0.00565)       (0.00534)       (0.00565)       (0.00522)   

exp                 0.108***       0.0912***       0.1000***       0.0912***        0.108***
                 (0.0127)        (0.0108)       (0.00952)        (0.0108)        (0.0127)   

isei              0.00331         0.00273         0.00302         0.00273         0.00331   
                (0.00754)       (0.00622)       (0.00687)       (0.00622)       (0.00754)   

children          0.00193         0.00182         0.00198         0.00182         0.00193   
                (0.00122)       (0.00120)       (0.00120)       (0.00120)       (0.00122)   
--------------------------------------------------------------------------------------------
Unexplained                                                                                 
yeduc             -0.0533         -0.0524         -0.0529         -0.0533         -0.0524   
                  (0.120)         (0.118)         (0.119)         (0.120)         (0.118)   

exp                0.0275          0.0439          0.0352          0.0275          0.0439   
                 (0.0429)        (0.0537)        (0.0491)        (0.0429)        (0.0537)   

isei               0.0864          0.0869          0.0867          0.0864          0.0869   
                 (0.0619)        (0.0623)        (0.0621)        (0.0619)        (0.0623)   

children          0.00120         0.00131         0.00115         0.00120         0.00131   
                (0.00892)       (0.00969)       (0.00928)       (0.00892)       (0.00969)   

_cons              0.0410          0.0410          0.0410          0.0410          0.0410   
                  (0.115)         (0.115)         (0.115)         (0.115)         (0.115)   
--------------------------------------------------------------------------------------------
interaction                                                                                 
yeduc                                                            0.000891       -0.000891   
                                                                (0.00207)       (0.00207)   

exp                                                                0.0164         -0.0164   
                                                                 (0.0133)        (0.0133)   

isei                                                             0.000580       -0.000580   
                                                                (0.00138)       (0.00138)   

children                                                         0.000104       -0.000104   
                                                               (0.000772)      (0.000772)   
--------------------------------------------------------------------------------------------
N                    5434            5434            5434            5434            5434   
--------------------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Explanation of options equations() and rename(): A complication when compiling an overview table is that the components are labeled differently across the decompositions. In the twofold decomposition, the label "explained" is used for ΔX and "unexplained" for ΔS. In the threefold decomposition, the corresponding labels are "endowments" and "coefficients". By default, esttab places differently named elements in different rows, which results in a messy table in the current case. To tidy up the table, option equations() specifies how the equations are to be matched across models, and option rename() renames "endowments" to "explained" and "coefficients" to "unexplained".
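
To see the problem these options address, one could tabulate a twofold and a threefold result without them (a sketch; output not shown here):

. // without equations() and rename(), differently labeled parts end up in separate rows
. esttab male tf_female, se nonumber mtitles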

Interpretation: The choice of reference coefficients changes the results somewhat. When using the male coefficients, the explained part of the gender wage gap is larger than when using the female coefficients (using male coefficients, 0.104/0.206 = 50% of the wage gap is explained; using female coefficients, only 0.0856/0.206 = 42% is explained). The difference is mostly due to the steeper earnings profile of men across work experience. Because of that, the gender difference in work experience explains more of the overall wage gap if the male coefficients are used as reference coefficients. This can also be seen in the "interaction" equation of the threefold decomposition, which quantifies how much the contributions to the explained part differ depending on whether the male or the female coefficients are used as reference (the difference is substantial only for experience). Using the pooled model leads to a compromise between the two extremes.

Fundamentally, the difference between using the male coefficients and the female coefficients is a change in perspective, in the sense that different counterfactual exercises are performed. When using the male coefficients, we essentially ask how much men would lose if their characteristics (most importantly, their work experience) were reduced to those of women. When using the female coefficients, we ask how much women would gain if their characteristics were raised to those of men. In both cases we assume that everything else stays the same.
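
The shares quoted in the interpretation can be recomputed from the stored results. The following is a sketch that assumes the estimates male and female stored above are still in memory; the scalar names ex_m and ex_f are introduced here for illustration. The last line should reproduce the overall interaction term of tf_female (.0179811 in the output above).

. // share of the gap explained when using male coefficients
. quietly estimates restore male

. display _b[overall:explained] / _b[overall:difference]

. scalar ex_m = _b[overall:explained]

. // share of the gap explained when using female coefficients
. quietly estimates restore female

. display _b[overall:explained] / _b[overall:difference]

. scalar ex_f = _b[overall:explained]

. // difference between the two explained parts
. display ex_m - ex_f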

Task 3: average treatment effect

Optional: Compute a decomposition that is defined in a way such that the unexplained component can be interpreted as an "average treatment effect" (see slides for details).

Giving the male coefficients a weight equal to the (survey-weighted) proportion of females, and the female coefficients a weight equal to the proportion of males, leads to an unexplained part that is equal in size to the average treatment effect obtained by a regression-adjustment estimator:

. svy: proportion sex
(running proportion on estimation sample)

Survey: Proportion estimation

Number of strata =    15          Number of obs   =      5,434
Number of PSUs   = 2,035          Population size = 12,071,607
                                  Design df       =      2,020

--------------------------------------------------------------
             |             Linearized            Logit
             | Proportion   std. err.     [95% conf. interval]
-------------+------------------------------------------------
         sex |
       male  |   .5187475   .0094003       .500295     .537149
     female  |   .4812525   .0094003       .462851     .499705
--------------------------------------------------------------

. local p_female = _b[2.sex]

. oaxaca lnwage yeduc (exp: expft expft2) isei children, by(sex) svy weight(`p_female') nodetail

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,434
Number of PSUs   = 2,035                        Population size   = 12,071,607
                                                Design df         =      2,020
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,624
Group 2: sex = 2                                N of obs 2        =      2,810

    explained: (X1 - X2) * b
  unexplained: X1 * (b1 - b) + X2 * (b - b2)
               with b = .481253 * b1  + (1 - .481253) * b2

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865592   .0162802   176.02   0.000     2.833665     2.89752
     group_2 |   2.659247   .0151807   175.17   0.000     2.629476    2.689019
  difference |    .206345   .0205365    10.05   0.000     .1660701    .2466199
   explained |   .0942854   .0138614     6.80   0.000     .0671013    .1214695
 unexplained |   .1120596   .0179378     6.25   0.000     .0768811    .1472381
------------------------------------------------------------------------------
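
Why this weighting yields the average treatment effect can be sketched in the notation of the output header. Here p_f denotes the weighted proportion of females and p_m = 1 - p_f that of males (both symbols are shorthand introduced here). With b = p_f * b1 + p_m * b2:

    b1 - b = p_m * (b1 - b2)
    b - b2 = p_f * (b1 - b2)

    unexplained = X1 * (b1 - b) + X2 * (b - b2)
                = (p_m * X1 + p_f * X2) * (b1 - b2)

Since p_m * X1 + p_f * X2 is the overall (weighted) mean of the covariates, the unexplained part equals the difference between the predictions of the two group-specific models, averaged over the whole population. This is the quantity that a regression-adjustment estimator with separate linear outcome models computes.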

Confirm using teffects ra. The command does not support survey estimation, but we can still take account of sampling weights and clustering, so the only element of the survey design that is ignored is stratification. This seems acceptable because we are only interested in the point estimate here, which is not affected by clustering or stratification. Note that teffects contrasts females with males, so the ATE has the opposite sign of the unexplained part:

. teffects ra (lnwage yeduc expft expft2 isei children) (sex) [pw=weight], vce(cluster psu)

Iteration 0:  EE criterion =  5.263e-30  
Iteration 1:  EE criterion =  4.851e-32  

Treatment-effects estimation                    Number of obs     =      5,434
Estimator      : regression adjustment
Outcome model  : linear
Treatment model: none
                                     (Std. err. adjusted for 2,035 clusters in psu)
-----------------------------------------------------------------------------------
                  |               Robust
           lnwage | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
------------------+----------------------------------------------------------------
ATE               |
              sex |
(female vs male)  |  -.1120596   .0179062    -6.26   0.000    -.1471552    -.076964
------------------+----------------------------------------------------------------
POmean            |
              sex |
            male  |   2.815728   .0167039   168.57   0.000     2.782989    2.848467
-----------------------------------------------------------------------------------

Part II: The transformation problem

Task 1: decomposition with categorical predictor

1. Replace ISEI in the extended model of Exercise 1 by the (categorical) EGP variable (egp). Before you do that, inspect the variable egp carefully and drop categories with a very low number of observations. Only report the aggregate contribution of EGP. Illustrate how the results change if you switch the base level.

Create the dummies by hand. We do not use tabulate, generate() here because we want to name the dummies after the corresponding EGP values; tabulate, generate() would number them consecutively:

. fre egp

egp -- EGP class
--------------------------------------------------------------------------------------------------
                                                     |      Freq.    Percent      Valid       Cum.
-----------------------------------------------------+--------------------------------------------
Valid   1 higher managerial and professional workers |        833      15.33      15.33      15.33
          (I)                                        |                                            
        2 lower managerial and professional workers  |       1411      25.97      25.97      41.30
          (II)                                       |                                            
        3 higher routine service workers (IIIa)      |        810      14.91      14.91      56.20
        4 lower routine service workers (IIIb)       |        732      13.47      13.47      69.67
        5 small self-employed and farmers (IV)       |          2       0.04       0.04      69.71
        6 skilled manual workers (V, VI)             |        757      13.93      13.93      83.64
        7 semi- and unskilled manual workers (VIIa)  |        827      15.22      15.22      98.86
        8 agricultural labourers (VIIb)              |         62       1.14       1.14     100.00
        Total                                        |       5434     100.00     100.00           
--------------------------------------------------------------------------------------------------

. drop if egp==5 // only two observations; self-employed are not part of gsoep-extract.dta
(2 observations deleted)

. quietly levelsof egp

. foreach l in `r(levels)' {
  2.     quietly generate byte egp_`l' = egp==`l' if egp<.
  3. }

. summarize egp_*

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       egp_1 |      5,432    .1533505    .3603582          0          1
       egp_2 |      5,432     .259757    .4385416          0          1
       egp_3 |      5,432    .1491163    .3562359          0          1
       egp_4 |      5,432     .134757    .3414953          0          1
       egp_6 |      5,432    .1393594     .346353          0          1
-------------+---------------------------------------------------------
       egp_7 |      5,432    .1522459    .3592922          0          1
       egp_8 |      5,432    .0114138    .1062339          0          1
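
For comparison, the tabulate approach would look as follows (a sketch; tmp_egp is a hypothetical prefix chosen here, and the variables are dropped again right away). Because category 5 has been dropped, the seven dummies would be named tmp_egp1 to tmp_egp7, so that, for example, tmp_egp5 would refer to EGP value 6.

. // for comparison only: tabulate, generate() numbers the dummies consecutively
. quietly tabulate egp, generate(tmp_egp)

. describe tmp_egp*

. drop tmp_egp*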

Results using class I (upper service class) as reference category:

. oaxaca lnwage yeduc (exp: expft expft2) children (EGP: egp_2-egp_8), by(sex) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,432
Number of PSUs   = 2,034                        Population size   = 12,070,291
                                                Design df         =      2,019
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,622
Group 2: sex = 2                                N of obs 2        =      2,810

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865539   .0161887   177.01   0.000     2.833791    2.897287
     group_2 |   2.659247   .0151161   175.92   0.000     2.629603    2.688892
  difference |   .2062915   .0205458    10.04   0.000     .1659983    .2465847
   explained |   .1233931    .017833     6.92   0.000     .0884201    .1583662
 unexplained |   .0828984   .0226395     3.66   0.000     .0384992    .1272975
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0108239   .0060271    -1.80   0.073    -.0226439     .000996
         exp |   .1078319   .0126765     8.51   0.000     .0829715    .1326922
    children |   .0017906   .0011574     1.55   0.122    -.0004793    .0040605
         EGP |   .0245946   .0136235     1.81   0.071     -.002123    .0513121
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0111098   .1151472     0.10   0.923    -.2147099    .2369295
         exp |   .0314076   .0428362     0.73   0.464    -.0526003    .1154154
    children |   .0018348   .0088282     0.21   0.835    -.0154786    .0191481
         EGP |  -.0341147    .051724    -0.66   0.510    -.1355527    .0673234
       _cons |   .0726609   .1581958     0.46   0.646    -.2375831    .3829049
------------------------------------------------------------------------------
exp: expft expft2
EGP: egp_2 egp_3 egp_4 egp_6 egp_7 egp_8

Results using class V+VI (skilled manual workers) as reference category:

. oaxaca lnwage yeduc (exp: expft expft2) children (EGP: egp_1-egp_4 egp_7 egp_8), ///
>     by(sex) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,432
Number of PSUs   = 2,034                        Population size   = 12,070,291
                                                Design df         =      2,019
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,622
Group 2: sex = 2                                N of obs 2        =      2,810

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865539   .0161887   177.01   0.000     2.833791    2.897287
     group_2 |   2.659247   .0151161   175.92   0.000     2.629603    2.688892
  difference |   .2062915   .0205458    10.04   0.000     .1659983    .2465847
   explained |   .1233931    .017833     6.92   0.000     .0884201    .1583662
 unexplained |   .0828984   .0226395     3.66   0.000     .0384992    .1272975
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0108239   .0060271    -1.80   0.073    -.0226439     .000996
         exp |   .1078319   .0126765     8.51   0.000     .0829715    .1326922
    children |   .0017906   .0011574     1.55   0.122    -.0004793    .0040605
         EGP |   .0245946   .0136235     1.81   0.071     -.002123    .0513121
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0111098   .1151472     0.10   0.923    -.2147099    .2369295
         exp |   .0314076   .0428362     0.73   0.464    -.0526003    .1154154
    children |   .0018348   .0088282     0.21   0.835    -.0154786    .0191481
         EGP |  -.1490758   .0651441    -2.29   0.022    -.2768326   -.0213191
       _cons |   .1876221   .1343036     1.40   0.163    -.0757661    .4510102
------------------------------------------------------------------------------
exp: expft expft2
EGP: egp_1 egp_2 egp_3 egp_4 egp_7 egp_8

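The contribution of EGP to the explained part does not change (.0246 in both runs). The contribution to the unexplained part, however, depends strongly on the choice of the reference category: it is -.034 (not significant) with class I as base level and -.149 (significant at the 5% level) with class V+VI as base level. The constant absorbs the difference, so the total unexplained part stays at .083.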
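
Why only the unexplained part moves can be seen from a short derivation. The notation is introduced here for illustration and is not taken from the course slides: \bar D_{gk} is the share of group g in class k, \beta_{gk} is the group-g coefficient of class k with class r as base level (so \beta_{gr} = 0), \alpha_g is the intercept, and \delta_k = \beta_{1k} - \beta_{2k}. All sums run over all classes, including the base level.

```latex
% Changing the base level from r to s shifts every class coefficient of
% group g by -\beta_{gs} and the intercept by +\beta_{gs}.
\begin{align*}
\beta^{(s)}_{gk} &= \beta_{gk} - \beta_{gs}, \qquad
\alpha^{(s)}_{g} = \alpha_{g} + \beta_{gs} \\[4pt]
\text{explained:}\quad
\sum_{k} (\bar D_{1k} - \bar D_{2k})\,\beta^{(s)}_{1k}
  &= \sum_{k} (\bar D_{1k} - \bar D_{2k})\,\beta_{1k}
  \quad\text{because } \sum_{k} (\bar D_{1k} - \bar D_{2k}) = 0 \\[4pt]
\text{unexplained (EGP):}\quad
\sum_{k} \bar D_{2k}\,(\delta_{k} - \delta_{s})
  &= \sum_{k} \bar D_{2k}\,\delta_{k} - \delta_{s}
  \quad\text{because } \sum_{k} \bar D_{2k} = 1 \\[4pt]
\text{unexplained (constant):}\quad
\alpha^{(s)}_{1} - \alpha^{(s)}_{2} &= (\alpha_{1} - \alpha_{2}) + \delta_{s}
\end{align*}
```

In the two outputs above, the EGP contribution moves from -.0341147 to -.1490758 and the constant from .0726609 to .1876221. Both shifts are .115 in absolute value, which corresponds to \delta_s for class V+VI when class I is the base level.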

Task 2: normalized decomposition

Normalize the effects of EGP to make its contribution independent of the choice of the base level (unweighted normalization using oaxaca).

. oaxaca lnwage yeduc (exp: expft expft2) children (EGP: normalize(egp_*)), ///
>     by(sex) weight(1) svy
(normalized: egp_1 egp_2 egp_3 egp_4 egp_6 egp_7 egp_8)

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,432
Number of PSUs   = 2,034                        Population size   = 12,070,291
                                                Design df         =      2,019
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,622
Group 2: sex = 2                                N of obs 2        =      2,810

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865539   .0161887   177.01   0.000     2.833791    2.897287
     group_2 |   2.659247   .0151161   175.92   0.000     2.629603    2.688892
  difference |   .2062915   .0205458    10.04   0.000     .1659983    .2465847
   explained |   .1233931    .017833     6.92   0.000     .0884201    .1583662
 unexplained |   .0828984   .0226395     3.66   0.000     .0384992    .1272975
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0108239   .0060271    -1.80   0.073    -.0226439     .000996
         exp |   .1078319   .0126765     8.51   0.000     .0829715    .1326922
    children |   .0017906   .0011574     1.55   0.122    -.0004793    .0040605
         EGP |   .0245946   .0136235     1.81   0.071     -.002123    .0513121
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0111098   .1151472     0.10   0.923    -.2147099    .2369295
         exp |   .0314076   .0428362     0.73   0.464    -.0526003    .1154154
    children |   .0018348   .0088282     0.21   0.835    -.0154786    .0191481
         EGP |  -.0462237    .018013    -2.57   0.010    -.0815498   -.0108976
       _cons |   .0847699   .1280491     0.66   0.508    -.1663522     .335892
------------------------------------------------------------------------------
exp: expft expft2
EGP: egp_1 egp_2 egp_3 egp_4 egp_6 egp_7 egp_8
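
As a formula, the unweighted normalization expresses each class coefficient as a deviation from the simple average over all K classes, including the base level, whose coefficient is zero. A sketch in notation introduced here for illustration: \beta_{gk} is the group-g coefficient of class k under some base level, \delta_k = \beta_{1k} - \beta_{2k}, and \bar D_{2k} is the share of women in class k.

```latex
\begin{align*}
\tilde\beta_{gk} &= \beta_{gk} - \frac{1}{K}\sum_{j=1}^{K} \beta_{gj} \\[4pt]
U_{\mathrm{EGP}}
  &= \sum_{k=1}^{K} \bar D_{2k}\,(\tilde\beta_{1k} - \tilde\beta_{2k})
   = \sum_{k=1}^{K} \bar D_{2k}\,\delta_{k} - \bar\delta,
  \qquad \bar\delta = \frac{1}{K}\sum_{j=1}^{K} \delta_{j}
\end{align*}
% A different base level shifts every \delta_k by the same constant c, so both
% \sum_k \bar D_{2k}\delta_k and \bar\delta shift by c and U_EGP is unchanged.
```

Because \bar\delta gives every class the same weight 1/K regardless of its size, the normalized result is independent of the base level but not of how the classes are defined.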

Task 3: how collapsing categories changes results

Now simplify the EGP variable by combining classes VIIa and VIIb (codes 7 and 8) into one bigger class. How do the decomposition results change?

. generate byte EGP = egp

. replace EGP = 7 if EGP==8
(62 real changes made)

. fre EGP

EGP
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   1     |        833      15.34      15.34      15.34
        2     |       1411      25.98      25.98      41.31
        3     |        810      14.91      14.91      56.22
        4     |        732      13.48      13.48      69.70
        6     |        757      13.94      13.94      83.63
        7     |        889      16.37      16.37     100.00
        Total |       5432     100.00     100.00           
-----------------------------------------------------------

. quietly levelsof EGP

. foreach l in `r(levels)' {
  2.     quietly generate byte EGP_`l' = EGP==`l' if EGP<.
  3. }

. summarize EGP_*

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       EGP_1 |      5,432    .1533505    .3603582          0          1
       EGP_2 |      5,432     .259757    .4385416          0          1
       EGP_3 |      5,432    .1491163    .3562359          0          1
       EGP_4 |      5,432     .134757    .3414953          0          1
       EGP_6 |      5,432    .1393594     .346353          0          1
-------------+---------------------------------------------------------
       EGP_7 |      5,432    .1636598    .3700006          0          1

. oaxaca lnwage yeduc (exp: expft expft2) children (EGP: normalize(EGP_*)), ///
>     by(sex) weight(1) svy
(normalized: EGP_1 EGP_2 EGP_3 EGP_4 EGP_6 EGP_7)

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,432
Number of PSUs   = 2,034                        Population size   = 12,070,291
                                                Design df         =      2,019
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,622
Group 2: sex = 2                                N of obs 2        =      2,810

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   2.865539     .01622   176.67   0.000     2.833729    2.897349
     group_2 |   2.659247   .0150841   176.30   0.000     2.629665    2.688829
  difference |   .2062915   .0204962    10.06   0.000     .1660955    .2464875
   explained |   .1243395    .017779     6.99   0.000     .0894723    .1592067
 unexplained |    .081952   .0226796     3.61   0.000     .0374742    .1264298
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0108035   .0060164    -1.80   0.073    -.0226026    .0009956
         exp |   .1086303   .0127124     8.55   0.000     .0836995    .1335612
    children |   .0017697   .0011475     1.54   0.123    -.0004807      .00402
         EGP |    .024743   .0135759     1.82   0.069    -.0018812    .0513672
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0189711    .114963     0.17   0.869    -.2064874    .2444295
         exp |   .0322705    .043009     0.75   0.453     -.052076    .1166171
    children |   .0012317   .0088633     0.14   0.889    -.0161504    .0186138
         EGP |  -.0270174   .0116508    -2.32   0.020    -.0498662   -.0041686
       _cons |   .0564961   .1292078     0.44   0.662    -.1968985    .3098906
------------------------------------------------------------------------------
exp: expft expft2
EGP: EGP_1 EGP_2 EGP_3 EGP_4 EGP_6 EGP_7

The contribution to the explained part hardly changes (.0246 vs. .0247). The contribution of EGP to the unexplained part, however, shrinks from -0.046 to -0.027, even though only 62 observations were recoded. This is a general problem of the (unweighted) normalization: for the contribution to the unexplained part, results can change quite a bit depending on minor changes in the categories.

Task 4: weighted normalization

Optional: Compute the contribution of EGP and the simplified EGP to the unexplained part using a weighted normalization. You need to do this manually (hint: you can use command contrast to obtain normalized coefficients after running a regression). Compare the results to the results from the unweighted normalization.

Weighted normalization is currently not implemented in oaxaca and has to be done manually. The approach is to first use contrast to compute the transformed coefficients in each group and then use matrix multiplication with the female class shares to obtain the contribution to the unexplained part. The gw. operator computes the deviation contrasts from the weighted mean that we need.

. svy, subpop(if sex==1): ///
>     regress lnwage yeduc expft expft2 children i.egp, nofvlab
(output omitted)

. contrast gw.egp, nofvlab nowald

Contrasts of marginal linear predictions

                                                             Design df = 2,019

Margins: asbalanced

--------------------------------------------------------------
             |   Contrast   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
         egp |
(1 vs mean)  |    .267891   .0258126      .2172688    .3185131
(2 vs mean)  |   .0564703   .0256409       .006185    .1067556
(3 vs mean)  |   -.099255   .0447632     -.1870419   -.0114681
(4 vs mean)  |  -.1058109   .0495935     -.2030707   -.0085512
(6 vs mean)  |  -.0800621   .0258872     -.1308305   -.0292937
(7 vs mean)  |   -.211495   .0293221     -.2689998   -.1539902
(8 vs mean)  |  -.4512152   .0551535      -.559379   -.3430515
--------------------------------------------------------------

. matrix b_m = r(b)

. svy, subpop(if sex==2): ///
>     regress lnwage yeduc expft expft2 children i.egp, nofvlab
(output omitted)

. contrast gw.egp, nofvlab nowald

Contrasts of marginal linear predictions

                                                             Design df = 2,019

Margins: asbalanced

--------------------------------------------------------------
             |   Contrast   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
         egp |
(1 vs mean)  |   .2583709   .0408441      .1782699    .3384719
(2 vs mean)  |   .0703231   .0191034      .0328586    .1077876
(3 vs mean)  |   .0544614   .0203194      .0146123    .0943106
(4 vs mean)  |  -.1128703   .0252037     -.1622982   -.0634424
(6 vs mean)  |  -.2045433   .0547189     -.3118548   -.0972319
(7 vs mean)  |  -.2452683   .0359918     -.3158532   -.1746834
(8 vs mean)  |  -.5953543   .0841274       -.76034   -.4303687
--------------------------------------------------------------

. matrix b_f = r(b)

. svy, subpop(if sex==2): proportion egp if e(sample)
(running proportion on estimation sample)

Survey: Proportion estimation

Number of strata =    15                                            Number of obs   =       5,432
Number of PSUs   = 2,034                                            Population size =  12,070,291
                                                                    Subpop. no. obs =       2,810
                                                                    Subpop. size    = 5,809,491.4
                                                                    Design df       =       2,019

-------------------------------------------------------------------------------------------------
                                                |             Linearized            Logit
                                                | Proportion   std. err.     [95% conf. interval]
------------------------------------------------+------------------------------------------------
                                            egp |
higher managerial and professional workers (I)  |   .1083481   .0096877      .0907605    .1288609
lower managerial and professional workers (II)  |   .3140206   .0155473      .2843604    .3452818
         higher routine service workers (IIIa)  |   .2113439   .0127826      .1873618    .2374986
          lower routine service workers (IIIb)  |   .2183959    .013002      .1939701    .2449628
                skilled manual workers (V, VI)  |   .0431241   .0055396      .0334762    .0553931
     semi- and unskilled manual workers (VIIa)  |   .0978556   .0086602      .0821363    .1162024
                 agricultural labourers (VIIb)  |   .0069118   .0031766      .0028004    .0169566
-------------------------------------------------------------------------------------------------

. matrix X_f = e(b)

. matrix U   = X_f * (b_m - b_f)'

. matrix list U

symmetric U[1,1]
            r1
y1  -.02459457
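
A quick plausibility check on the weighted contrasts: by construction they should average to zero when weighted by the class shares of the group they were estimated in. The sketch below retypes the female shares and female contrasts from the output above. The matrix names Xf_chk, bf_chk, and chk are hypothetical and chosen so that nothing in memory is overwritten.

```stata
* Female class shares and female gw. contrasts, typed in from the output above
* (Xf_chk, bf_chk, and chk are hypothetical names used only for this check)
matrix Xf_chk = (.1083481, .3140206, .2113439, .2183959, .0431241, .0978556, .0069118)
matrix bf_chk = (.2583709, .0703231, .0544614, -.1128703, -.2045433, -.2452683, -.5953543)

* Share-weighted mean of the contrasts; expected to be zero up to the
* rounding of the printed digits
matrix chk = Xf_chk * bf_chk'
display "weighted mean of female contrasts = " chk[1,1]
```

In the live session the same check is matrix chk = X_f * b_f'. If the result is zero, the second term of X_f * (b_m - b_f)' drops out, and U is simply the female-share-weighted mean of the male contrasts.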

We now pack this into a small program so we can re-use it and so that it also supports the unweighted normalization (operator g. instead of gw.).

. capture program drop egpdecomp

. program egpdecomp, rclass
  1.     args op egp
  2.     quietly {
  3.         svy, subpop(if sex==1): ///
>             regress lnwage yeduc expft expft2 children i.`egp', nofvlab
  4.         contrast `op'.`egp', nofvlab nowald
  5.         matrix b_m = r(b)
  6.         svy, subpop(if sex==2): ///
>             regress lnwage yeduc expft expft2 children i.`egp', nofvlab
  7.         contrast `op'.`egp', nofvlab nowald
  8.         matrix b_f = r(b)
  9.         svy, subpop(if sex==2): proportion `egp' if e(sample)
 10.         matrix X_f = e(b)
 11.         matrix U   = X_f * (b_m - b_f)'
 12.     }
 13.     return scalar U = U[1,1]
 14.     display as txt "Contribution to unexplained part = " as res return(U)
 15. end

Using this program, results for different situations can be computed without much effort:

. egpdecomp g egp
Contribution to unexplained part = -.0462237

. egpdecomp g EGP
Contribution to unexplained part = -.02701741

. egpdecomp gw egp
Contribution to unexplained part = -.02459457

. egpdecomp gw EGP
Contribution to unexplained part = -.024743

The first two results are the same as above using the unweighted normalization. The latter two are the results using the weighted normalization. As can be seen, the change in the categorization (egp vs. EGP) only has a minor effect on the results using weighted normalization.

Looking at the results, we also see that with the weighted normalization the aggregate contribution of a categorical predictor to the unexplained part is exactly -1 times its contribution to the explained part (-.0246 vs. .0246 for egp, -.0247 vs. .0247 for EGP)! This is a formal property of the weighted normalization (at least in a linear decomposition)!

This means that we do not really need to compute the weighted normalization; we can just read it off the standard output from oaxaca (i.e. -1 * the contribution to the explained part). However, it also highlights once more that the detailed decomposition of the unexplained part is problematic (what is the point of computing the contribution to the unexplained part if we know that mechanically it will just be -1 times the contribution to the explained part?).
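
The property can be derived in two lines for the decomposition with weight(1) used here. A sketch, in notation introduced here for illustration: \bar D_{gk} is the (weighted) share of group g in class k, and \tilde\beta_{gk} are the class coefficients of group g after weighted normalization within that group.

```latex
\begin{align*}
\sum_{k} \bar D_{gk}\,\tilde\beta_{gk} &= 0 \qquad (g = 1, 2)
  && \text{(definition of the weighted normalization)} \\[4pt]
U &= \sum_{k} \bar D_{2k}\,(\tilde\beta_{1k} - \tilde\beta_{2k})
   = \sum_{k} \bar D_{2k}\,\tilde\beta_{1k}
  && \text{(second term is zero for } g = 2\text{)} \\[4pt]
E &= \sum_{k} (\bar D_{1k} - \bar D_{2k})\,\tilde\beta_{1k}
   = -\sum_{k} \bar D_{2k}\,\tilde\beta_{1k} = -U
  && \text{(first term is zero for } g = 1\text{)}
\end{align*}
```

E is the same number that oaxaca reports for the explained part, because the explained contribution of a set of dummies does not depend on the normalization: the differences \bar D_{1k} - \bar D_{2k} sum to zero, so adding a constant to all class coefficients leaves it unchanged.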

Note that Kim (2013) suggests a slightly different weighted normalization. Using contrast with the gw. operator is equivalent to normalizing the coefficients of each model using the distribution of categories in the (sub)sample that has been used to estimate the model. What Kim (2013) suggests is to use the distribution of categories in the overall sample across both groups to normalize the coefficients of each model. This leads to slightly different results such that the relation above (i.e., that the contribution to the unexplained part is equal to -1 times the contribution to the explained part) only holds approximately.

Task 5: industry decomposition

Optional: Compute the "industry decomposition" described on the slides by economic sector (variable industry).

Define dummy variables for the three sectors:

. fre industry

industry -- economic sector
------------------------------------------------------------------------
                           |      Freq.    Percent      Valid       Cum.
---------------------------+--------------------------------------------
Valid   1 primary sector   |         80       1.47       1.49       1.49
        2 secondary sector |       1622      29.86      30.16      31.65
        3 tertiary sector  |       3676      67.67      68.35     100.00
        Total              |       5378      99.01     100.00           
Missing .                  |         54       0.99                      
Total                      |       5432     100.00                      
------------------------------------------------------------------------

. generate byte primary   = industry==1 if industry<.
(54 missing values generated)

. generate byte secondary = industry==2 if industry<.
(54 missing values generated)

. generate byte tertiary  = industry==3 if industry<.
(54 missing values generated)

Run the decomposition using the secondary sector as base level:

. oaxaca lnwage yeduc (exp: expft expft2) children primary tertiary, ///
>     by(sex) weight(1) svy

Blinder-Oaxaca decomposition

Number of strata =    15                        Number of obs     =      5,378
Number of PSUs   = 2,018                        Population size   = 11,939,847
                                                Design df         =      2,003
                                                Model             =     linear
Group 1: sex = 1                                N of obs 1        =      2,592
Group 2: sex = 2                                N of obs 2        =      2,786

    explained: (X1 - X2) * b1
  unexplained: X2 * (b1 - b2)

------------------------------------------------------------------------------
             |             Linearized
      lnwage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |    2.86764   .0160951   178.17   0.000     2.836076    2.899205
     group_2 |   2.660935   .0151013   176.21   0.000     2.631319    2.690551
  difference |   .2067055   .0205496    10.06   0.000     .1664048    .2470063
   explained |   .1323815   .0168501     7.86   0.000      .099336    .1654271
 unexplained |    .074324   .0218455     3.40   0.001     .0314818    .1171662
-------------+----------------------------------------------------------------
explained    |
       yeduc |  -.0199175   .0105231    -1.89   0.059    -.0405548    .0007198
         exp |   .1012204   .0126907     7.98   0.000     .0763321    .1261088
    children |   .0020019   .0012916     1.55   0.121    -.0005311     .004535
     primary |  -.0028996   .0024324    -1.19   0.233    -.0076699    .0018708
    tertiary |   .0519763   .0090816     5.72   0.000     .0341658    .0697867
-------------+----------------------------------------------------------------
unexplained  |
       yeduc |   .0992443   .0960004     1.03   0.301    -.0890268    .2875153
         exp |   -.003026   .0435204    -0.07   0.945     -.088376     .082324
    children |   .0020287   .0091417     0.22   0.824    -.0158995    .0199569
     primary |   .0007511   .0017191     0.44   0.662    -.0026203    .0041226
    tertiary |  -.0314402   .0358418    -0.88   0.380    -.1017313     .038851
       _cons |    .006766     .11693     0.06   0.954    -.2225512    .2360832
------------------------------------------------------------------------------
exp: expft expft2

Compute the industry decomposition by Horrace and Oaxaca (2001):

. matrix coefs = e(b0)

. matrix I = J(3,1,.)

. matrix rownames I = Primary Secondary Tertiary

. matrix I[2,1] = _b[overall:unexplained] - _b[unexplained:primary] - _b[unexplained:tertiary]

. matrix I[1,1] = I[2,1] + (coefs[1,"b1:primary"]  - coefs[1,"b2:primary"])

. matrix I[3,1] = I[2,1] + (coefs[1,"b1:tertiary"] - coefs[1,"b2:tertiary"])

. matrix list I

I[3,1]
                  c1
  Primary  .20048314
Secondary  .10501302
 Tertiary  .06756791
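
Written out, the matrix commands above compute the following quantity for each sector j. The notation is introduced here for illustration and is meant as a description of these commands, not as a quotation from Horrace and Oaxaca (2001): \alpha_g is the intercept, \beta_{gv} are the coefficients of the non-sector covariates v, \bar x_{2v} are the female means, and \theta_{gj} is the coefficient of sector j, with \theta_{g,\mathrm{secondary}} = 0 for the base level.

```latex
\begin{align*}
\gamma_{j} &= (\alpha_{1} - \alpha_{2})
  + \sum_{v} \bar x_{2v}\,(\beta_{1v} - \beta_{2v})
  + (\theta_{1j} - \theta_{2j}) \\[4pt]
\gamma_{\mathrm{secondary}}
  &= U - \sum_{j \neq \mathrm{secondary}} \bar D_{2j}\,(\theta_{1j} - \theta_{2j})
\end{align*}
% U is the total unexplained part and \bar D_{2j} the female share in sector j;
% the second line is how I[2,1] is obtained from the oaxaca results.
```

With the numbers from the output above, \gamma_{\mathrm{secondary}} = .074324 - .0007511 - (-.0314402) = .1050131, which matches the second row of I.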

Compute the contributions of the sectors to the unexplained wage gap according to Fortin et al. (2011):

. svy, subpop(if sex==2): mean primary secondary tertiary if lnwage<.
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =    15          Number of obs   =      5,408
Number of PSUs   = 2,028          Population size = 12,021,927
                                  Subpop. no. obs =      2,786
                                  Subpop. size    =  5,761,127
                                  Design df       =      2,013

--------------------------------------------------------------
             |             Linearized
             |       Mean   std. err.     [95% conf. interval]
-------------+------------------------------------------------
     primary |   .0078677    .003227       .001539    .0141964
   secondary |   .1524993   .0110819      .1307662    .1742324
    tertiary |    .839633   .0113178      .8174372    .8618289
--------------------------------------------------------------

. matrix p = e(b)

. local S = p[1,1]*I[1,1] + p[1,2]*I[2,1] + p[1,3]*I[3,1]

. display `S'    // equal to total unexplained
.074324

. matrix I[1,1] = p[1,1]*I[1,1] / `S'

. matrix I[2,1] = p[1,2]*I[2,1] / `S'

. matrix I[3,1] = p[1,3]*I[3,1] / `S'

. matrix list I

I[3,1]
                  c1
  Primary   .0212224
Secondary  .21546758
 Tertiary  .76331002
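
The three element-wise assignments can also be written as one matrix expression. The sketch below retypes the sector-specific gaps and the female sector shares from the output above, so it runs on its own. The names g_chk, p_chk, S_chk, tot_chk, and share_chk are hypothetical.

```stata
* Sector-specific unexplained gaps (3x1) and female sector shares (1x3),
* typed in from the output above; all names ending in _chk are hypothetical
matrix g_chk = (.20048314 \ .10501302 \ .06756791)
matrix p_chk = (.0078677, .1524993, .839633)

* Share-weighted sum of the gaps (equal to the total unexplained part)
matrix S_chk = p_chk * g_chk
scalar tot_chk = S_chk[1,1]

* Rescaled contributions: p_j * gamma_j / sum, computed for all sectors at once
matrix share_chk = diag(p_chk) * g_chk / tot_chk
matrix rownames share_chk = Primary Secondary Tertiary
matrix list share_chk
```

The resulting shares sum to one by construction, so they can be read as the proportion of the unexplained gap that is attributed to each sector.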

We can also use nlcom to compute these decompositions. The advantage is that in this way we will also get standard errors and confidence intervals. The procedure goes as follows. Apart from the decomposition results, oaxaca also returns the underlying regression coefficients and means as well as their joint variance matrix in e(b0) and e(V0).

. oaxaca lnwage yeduc (exp: expft expft2) children primary tertiary, ///
>     by(sex) weight(1) svy
(output omitted)

. matrix list e(b0)

e(b0)[1,35]
            b1:         b1:         b1:         b1:         b1:         b1:         b1:         b2:
         yeduc       expft      expft2    children     primary    tertiary       _cons       yeduc
r1   .08898211   .02782255  -.00038685    .0444333  -.21373931  -.16996766   1.4778099   .08131684

            b2:         b2:         b2:         b2:         b2:         b2:      b_ref:      b_ref:
         expft      expft2    children     primary    tertiary       _cons       yeduc       expft
r1   .02953905  -.00046565   .04064076  -.30920943  -.13252255   1.4710439   .08898211   .02782255

         b_ref:      b_ref:      b_ref:      b_ref:      b_ref:         x1:         x1:         x1:
        expft2    children     primary    tertiary       _cons       yeduc       expft      expft2
r1  -.00038685    .0444333  -.21373931  -.16996766   1.4778099   12.723427   17.343701   401.51787

            x1:         x1:         x1:         x1:         x2:         x2:         x2:         x2:
      children     primary    tertiary       _cons       yeduc       expft      expft2    children
r1   .57997534    .0214336   .53383213           1   12.947265   10.885952   198.72516   .53492034

            x2:         x2:         x2:
       primary    tertiary       _cons
r1   .00786766   .83963304           1

To be able to apply nlcom to these results, we first need to post the results as a new estimation set using ereturn post:

. matrix b = e(b0)

. matrix V = e(V0)

. ereturn post b V

We can now piece together the expressions to be submitted to nlcom. To reduce writing, we use a loop over the variables to collect the elements that are part of each expression:

. local ref (_b[b1:_cons]-_b[b2:_cons])

. foreach v in yeduc expft expft2 children {
  2.     local ref `ref' + (_b[b1:`v']-_b[b2:`v'])*_b[x2:`v']
  3. }

. nlcom (Primary:   `ref' + (_b[b1:primary]-_b[b2:primary]))   ///
>       (Secondary: `ref')                                     ///
>       (Tertiary:  `ref' + (_b[b1:tertiary]-_b[b2:tertiary])) ///
>       , noheader

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     Primary |   .2004831   .2123707     0.94   0.345    -.2157559    .6167221
   Secondary |    .105013   .0382853     2.74   0.006     .0299751    .1800509
    Tertiary |   .0675679   .0240717     2.81   0.005     .0203883    .1147475
------------------------------------------------------------------------------

Likewise, the rescaled variant as suggested by Fortin et al. (2011) can be computed as follows:

. local ref (_b[b1:_cons]-_b[b2:_cons])

. foreach v in yeduc expft expft2 children {
  2.     local ref `ref' + (_b[b1:`v']-_b[b2:`v'])*_b[x2:`v']
  3. }

. local p1        (_b[x2:primary])

. local p2        (1 - _b[x2:primary] - _b[x2:tertiary])

. local p3        (_b[x2:tertiary])

. local primary   (`ref' + (_b[b1:primary]-_b[b2:primary])) * (`p1')

. local secondary (`ref') * (`p2')

. local tertiary  (`ref' + (_b[b1:tertiary]-_b[b2:tertiary])) * (`p3')

. local sum       (`primary' + `secondary' + `tertiary')

. nlcom (Primary:     `primary'/`sum') ///
>       (Secondary: `secondary'/`sum') ///
>       (Tertiary:   `tertiary'/`sum') ///
>       , noheader

------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     Primary |   .0212224    .024191     0.88   0.380     -.026191    .0686358
   Secondary |   .2154676   .0812562     2.65   0.008     .0562083    .3747268
    Tertiary |     .76331   .0854742     8.93   0.000     .5957837    .9308363
------------------------------------------------------------------------------