Work-Related Training and Wages: An Empirical Analysis for Male Workers in Switzerland

Work-related training is considered to be very important for providing the workforce with the necessary skills for maintaining and enhancing the competitiveness of the firms and the economy. On the individual level, the primary effect of training should be an increased productivity of the trained workers. This paper provides estimates of the effects of training on wages which can be seen as a lower bound for the effects on productivity. Based on panel data from the Swiss Labour Force Survey (SLFS) I estimate these effects using nonparametric matching methods. Training is measured either as firm-sponsored training or as any work-related training. The data show that multiple participation in work-related training is not a rare event. This complicates the analysis considerably because the evaluation of dynamic treatments is not yet fully developed. As a solution to this problem a heuristic difference-in-differences approach to estimate the incremental effect of further training events is used. The results indicate that it is important to account for multiple training events. Taken together, there are significant effects of work-related training on wages of roughly 2% for each training event. There is some evidence that workers who already have high earnings profit more from continuous work-related training.


Introduction
Work-related training is considered to be very important for providing the workforce with the necessary skills for maintaining and enhancing the competitiveness of the firms and the economy (see e.g. OECD, 1995). On the individual level, the primary effect of training should be an increased productivity of the trained workers. However, it is difficult to measure individual productivity. The best proxy for productivity is usually the worker's wage, which theoretically should be equal to the worker's marginal product. In the case of training this proxy is less reliable, at least for general training. Becker (1964) has shown that the costs of general training will be paid by workers, whereas the costs of firm-specific training will be shared by firm and worker. In both cases it is likely that workers pay for the costs with reduced wages. Hence, at least for some time, work-related training will lead to a wedge between productivity and wage, and analysing the effects of training on wages will provide a lower bound for the effects on productivity. Empirical evidence based on data containing information on productivity indicates that the effects on productivity can be much larger than the effects on wages (Barron, Berger, and Black, 1999, and Goux and Maurin, 2000).
The major econometric problem in analysing the effects of work-related training on wages is that training participation is not a random event. In order to control for nonrandom selection into training I apply a difference-in-differences matching estimator. This estimator has been proposed by Heckman et al. (1997) and has been recently used by Eichler and Lechner (1999) and Bergemann et al. (2001). The difference-in-differences matching estimator combines the advantages of both difference-in-differences and matching. Matching removes all observable differences between the group of participants and the control group by appropriate econometric methods. Hence matching will yield unbiased estimates of the treatment effect when selection is only on observable factors. The major criticism against matching is that it may be hard to justify that there is no selection on unobservable factors like ability or motivation. As long as these unobservable factors are constant over time they can be eliminated by differencing over time. In this sense difference-in-differences matching corrects for both selection on observables and on unobservables.
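The differencing argument can be made explicit with a stylised earnings equation. The additive form and the symbols $\alpha_i$ (time-constant individual effect such as ability or motivation), $\delta_t$ (common time effect), $\theta$ (training effect) and $\varepsilon_{it}$ are illustrative assumptions, not notation taken from the estimation equations of this paper:

```latex
\begin{align*}
y_{it} &= \alpha_i + \delta_t + \theta S_{it} + \varepsilon_{it}, \\
y_{it} - y_{it'} &= (\delta_t - \delta_{t'}) + \theta S_{it} + (\varepsilon_{it} - \varepsilon_{it'})
  \qquad \text{with } S_{it'} = 0 .
\end{align*}
```

Under these assumptions the individual effect $\alpha_i$ drops out of the earnings change between a pre-training date $t'$ and a post-training date $t$, and comparing the changes of participants with those of matched nonparticipants removes the common term $\delta_t - \delta_{t'}$ as well.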
Using data from the Swiss Labour Force Survey I estimate the effect of work-related training on individual earnings in the first and second year after training. Contrary to previous results for Switzerland I find only small and often insignificant effects. This finding suggests that if training increases productivity, workers are not able to benefit from this increase, at least in the first two years after training.

Theory of work-related training
In his seminal work on human capital Becker (1964) made the crucial distinction between general and specific training. If the skills a worker acquires through on-the-job training are purely general, the wage on the external labour market will reflect the full marginal product from this training.
Thus, the worker captures the entire return from their general human capital in a competitive labour market. On the other hand, training in perfectly specific skills has no effect on the worker's productivity in other firms, i.e. the wage he can get elsewhere will be independent of the amount of training he received. As a consequence, the return to specific human capital will be shared between employer and employee. Becker concluded that workers must bear all costs of their general training whereas the costs of specific training are shared between workers and firms.
This prediction, however, is at odds with empirical work on firm-sponsored formal training which is general in nature. Recent research has suggested several reasons why and under which circumstances firms may be willing to contribute to the costs of general training. One prominent explanation is based on informational asymmetries between training firm and potential future employers. If the outside market is not as well informed as the current employer about a worker's level of training or other relevant characteristics, the worker's general skills are no longer perfectly marketable and in essence become specific skills (Katz and Ziderman, 1990, Acemoglu and Pischke, 1998). An analogous argument applies if there are labour market frictions created by search and hiring costs (Acemoglu, 1997). In both cases, workers receive less than their marginal product from general training, which improves firms' investment incentives. Acemoglu and Pischke (1999) note further labour market imperfections where wages are below marginal product and rise less steeply than productivity so that the wedge between marginal product and (outside) wage is higher the more trained a worker is. They refer to this situation as a compressed wage structure. Kessler and Lülfesmann (2000) present a model based on the assumption that general and specific training are complements. They show that in this case employer and employee will share the costs and returns of general training even without market imperfections.

Empirical evidence
There is a large and growing literature on estimating the effect of work-related training on wages and job turnover. Methodologically, the papers vary between cross-section OLS regressions with and without selection correction, fixed effect estimators, and nonparametric matching approaches. Since it is rather unlikely that training is allocated randomly across workers, estimates that do not take account of selection are to be interpreted with caution. These studies often find returns to training that are larger than the returns to education (see Pfeiffer, 2000, for a recent survey). However, controlling for selectivity is difficult in the training context because it is hard to find variables that affect training decisions but do not affect earnings. This may explain the very high estimates of over 20% for the Netherlands in Groot (1995) and for Germany in Pfeiffer and Reize (2000). An alternative way to control for selection is estimation by fixed effects, assuming that the unobserved variable determining training decisions and earnings can be eliminated by differencing over time. Examples of this approach are Pischke (2000) and Blundell et al. (1999). Pischke uses data from the German Socioeconomic Panel and finds hardly any significant effect of training on wage levels or wage growth. Blundell et al. use data from the British National Child Development Survey, which is a unique panel data set following a birth cohort (born between March 3 and 9, 1958) over time. They analyse the effect of training between 1981 and 1991 on wage growth in this period. In addition to controlling for permanent unobserved heterogeneity by first differencing, they also control for transitory fluctuations between the determinants of training and wages by a selection term. They find significant effects of roughly 8% for employer-provided training on wage growth over 10 years, i.e. less than 1% per year.
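The annualised figure quoted for Blundell et al. can be checked with a short calculation. Assuming, purely for illustration, that the 8% accrues as a constant compound annual rate $g$ over the ten years:

```latex
\begin{align*}
(1 + g)^{10} &= 1.08, \\
g &= 1.08^{1/10} - 1 \approx 0.0077 \quad (\text{about } 0.8\% \text{ per year}).
\end{align*}
```

A simple division, $8\% / 10 = 0.8\%$, gives nearly the same value, so either way the implied effect is below 1% per year.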
Lechner (1999) estimates the effect of enterprise-related training in East Germany in the early 1990s using matching methods. He finds significant effects in the second year after the training of about 350 DM (more than 10% of participants' mean earnings prior to training).
Two interesting recent studies are Barron, Berger, and Black (1999) and Goux and Maurin (2000). Both studies are based on data for workers and firms. Barron et al. find only small effects of training on wages (based on fixed effect estimation), but large effects on productivity. Their results imply that firms bear most costs of training, but also get most of the returns to training.
Goux and Maurin find an effect of about 5% for training when not controlling for selectivity. However, when they control for selectivity using firm information the effect vanishes, indicating that the returns are taken up by firms.
Bänziger estimates the returns to training by uncorrected OLS using cross section data from the Swiss Labour Force Survey 1996 and finds effects between 4 and 6% for men. These numbers appear to be quite large, given that average labour productivity growth in Switzerland was 0.7% per year during the 1990s. Gerfin et al. employ fixed effects estimators using data from the 1998-2000 waves of the SLFS (which are also used in this paper). Their estimates for men are around 1.5%.

Econometrics
Estimating the effect of training is a classical treatment effect problem. To estimate a treatment effect we compare the value of some outcome variable (e.g. wages) for the treated individuals with the value the outcome variable would have taken in case of nontreatment. This hypothetical value is usually called the counterfactual. It must be estimated using the group of the nontreated since we never observe anyone both as treated and nontreated. In order to get an unbiased estimate there must be no systematic differences between the treatment group and the control group selected from the nontreatment group, i.e. selection into treatment must be random.
However, in the case of work-related training workers are selected or select themselves based on observable and unobservable characteristics. If we do not control for this selection the estimates of the treatment effect are likely to be biased.
The framework for the empirical analysis in this paper is the potential-outcome approach to causality suggested by Roy (1951) and Rubin (1974). Let $Y^t$ and $Y^n$ denote the potential outcomes with and without treatment, and let $S = 1$ indicate participation. The average treatment effect on the treated is $\theta = E(Y^t - Y^n \mid S = 1) = E(Y^t \mid S = 1) - E(Y^n \mid S = 1)$. The first term, $E(Y^t \mid S = 1)$, can consistently be estimated by the sample mean of $y_i$ in the subsample of participants. The problem is the term $E(Y^n \mid S = 1)$. A central issue in the literature on causal models in statistics and selectivity models in econometrics is finding useful identifying assumptions to predict the unobserved expected non-treatment outcomes of the treated population using the observable non-treatment population. The most common approach is the standard selection model in which identification is achieved by parametric assumptions about the joint distribution of the error terms in the selection and in the outcome equation. It is well known that the selection model in most cases requires a variable that influences the selection, but not the outcome, in order to be fully identified (the model is in principle identified by its nonlinearity, but in practice results often are volatile when no such variable exists). In the context of work-related training such a variable is hard to find, especially in typical labour force surveys. For this reason I use another approach outlined below.
One possible assumption to solve the identification problem is the conditional independence assumption (CIA) proposed by Rubin (1977). CIA can be stated as follows: $Y^n \perp S \mid X = x, \ \forall x \in \chi$. In words CIA means that participation is independent ($\perp$) of the non-treatment outcome conditional on the values of the attributes $x$ in the space $\chi$. Thus $E(Y^n \mid S = 1, X = x) = E(Y^n \mid S = 0, X = x)$, and $\theta$ is identified. As opposed to model-based econometric approaches CIA allows treatment effects to be estimated directly without imposing the functional form or parametric assumptions necessary to estimate structural models.
A technical problem arises when $X$ has a high dimension. A solution to this problem is the propensity score or the balancing score, respectively. Let $P(x) = P(S = 1 \mid X = x)$ denote the propensity score, defined as the probability of participating in the treatment, with $0 < P(x) < 1$.
If CIA holds, Rosenbaum and Rubin (1983) show that $Y^n \perp S \mid P(X) = P(x), \ \forall x \in \chi$. In words, this implies that when the outcomes are independent of participation conditional on $X$, they are also independent of participation conditional on the propensity score. The major advantage of this property is the reduction of the dimension of the estimation problem. The disadvantage is that the probability of assignment is unknown and has to be estimated.
CIA and the propensity score property are the basis for the increasingly popular matching estimator of the treatment effect on the treated. A typical matching estimator takes the form $\hat{\theta} = \frac{1}{N_1} \sum_{i \in \{S = 1\}} \left[ y_i - \sum_{j \in \{S = 0\}} W(i, j) \, y_j \right]$, where $N_1$ is the number of participants and the weights $W(i, j)$ depend on the distance between $P_i$ and $P_j$. Matching estimators differ in the weights they attach to members of the comparison group. The most common matching estimator, the nearest neighbour (or one-to-one) matching estimator, sets $W$ equal to one for the matched nearest neighbour and zero for all other members of the control group. Alternatives are kernel or local linear regression approaches for $W$.
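The nearest neighbour variant of the matching estimator can be sketched in a few lines. This is a minimal illustration and not the implementation used in the paper: the function name `nn_matching_att`, the toy data, and the choice to take the propensity scores as given (rather than estimating them) are assumptions made for the example.

```python
import numpy as np

def nn_matching_att(y, s, p):
    """Effect on the treated by one-to-one nearest neighbour matching
    on the propensity score, with replacement (hypothetical helper)."""
    y, s, p = (np.asarray(a) for a in (y, s, p))
    treated = np.flatnonzero(s == 1)
    controls = np.flatnonzero(s == 0)
    # |P_i - P_j| for every participant i and nonparticipant j
    dist = np.abs(p[treated][:, None] - p[controls][None, :])
    # W(i, j) = 1 for the closest nonparticipant, 0 otherwise
    nearest = controls[dist.argmin(axis=1)]
    return float(np.mean(y[treated] - y[nearest]))

# Toy data (invented for illustration): outcome, participation, propensity score
y = np.array([5200.0, 6100.0, 4800.0, 5900.0, 5000.0])
s = np.array([1, 1, 0, 0, 0])
p = np.array([0.30, 0.70, 0.25, 0.65, 0.50])
att = nn_matching_att(y, s, p)
```

In the toy data each participant is paired with the nonparticipant whose score is closest; because matching is with replacement, the same nonparticipant could serve as the match for several participants.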
In order to justify CIA it is necessary to identify and observe all variables that are mutually correlated with assignment and potential non-treatment outcomes. This implies that there is no important variable missing that influences non-treatment outcomes and assignment given a value of the relevant variable. It is unlikely that the SLFS data are sufficiently informative to justify CIA in the context of work-related training.
As a possible solution to this problem Heckman et al. (1997) proposed a generalisation of CIA. It is applicable when there is at least one observation of the outcome before the treatment and one after the treatment. The idea is that although CIA may not hold, it may be reasonable to assume that the resulting bias is the same for at least one date before training and for one date after training. If the true effect of the treatment is zero before the treatment takes place, the estimated treatment effect before treatment will be an estimate of the bias. This bias estimate can be used to correct the estimate of the treatment effect after treatment. This idea is of course very similar to a difference-in-differences estimator. For panel data the conditional difference-in-differences estimator is defined as
$$\hat{\theta}^{DiD} = \frac{1}{N_1} \sum_{i \in \{S = 1\}} \left[ (y_{i,t} - y_{i,t'}) - \sum_{j \in \{S = 0\}} W(i, j) \, (y_{j,t} - y_{j,t'}) \right] \qquad (7)$$
where $N_1$ is the number of participants, $W(i, j)$ are the matching weights, $t'$ is a date before training and $t$ a date after training. The empirical evidence presented below is based on estimating equation (7) using a balanced panel.
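The conditional difference-in-differences estimator can be illustrated along the same lines. The sketch below assumes nearest neighbour matching with replacement and a Mahalanobis distance on the matching variables, which is the combination described in the Results section; the function name `did_matching` and the toy data are hypothetical, and the common support restriction is omitted for brevity.

```python
import numpy as np

def did_matching(y_pre, y_post, s, z):
    """Conditional difference-in-differences with nearest neighbour
    matching (with replacement) on the matching variables z, using the
    Mahalanobis distance. Sketch only: no common support check."""
    y_pre = np.asarray(y_pre, dtype=float)
    y_post = np.asarray(y_post, dtype=float)
    s = np.asarray(s)
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    treated = np.flatnonzero(s == 1)
    controls = np.flatnonzero(s == 0)
    # Inverse covariance matrix of the matching variables (full sample)
    vi = np.linalg.pinv(np.atleast_2d(np.cov(z, rowvar=False)))
    diff = z[treated][:, None, :] - z[controls][None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, vi, diff)  # squared distances
    nearest = controls[d2.argmin(axis=1)]
    delta = y_post - y_pre  # individual earnings change
    return float(np.mean(delta[treated] - delta[nearest]))

# Toy data (invented): columns of z are propensity score and pre-training earnings
y_pre = np.array([6000.0, 5000.0, 6000.0, 5000.0, 5600.0])
y_post = np.array([6300.0, 5150.0, 6200.0, 5100.0, 5700.0])
s = np.array([1, 1, 0, 0, 0])
z = np.column_stack([[0.60, 0.30, 0.60, 0.30, 0.45], y_pre])
effect = did_matching(y_pre, y_post, s, z)
```

The toy data are built so that each participant has an exact match among the nonparticipants, which keeps the result easy to verify by hand: the participants' earnings changes are 300 and 150, those of their matches 200 and 100, so the estimate is the mean of 100 and 50.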

Data
I employ data from the Swiss Labour Force Survey (SLFS). Detailed information on work-related training is only available in the 1999 wave. In the other waves the training information is much less detailed and does not allow a distinction according to who paid for the training. But using this reduced information it is possible to analyse the dynamics of training participation in Switzerland. This information proves to be very important for controlling for selection effects.
Unfortunately, there was a significant change in the questionnaire regarding income between 1995 and 1996. Since the estimation method is based on the difference between the income before and after training it is impossible to use the 1995/1996 waves for the analysis. Only full-time working men are included in the sample. Work-related training is defined as training in the past 12 months that is either employer-financed or that takes place during work time. Training duration must be at least a week, and only completed training spells are considered. Table 1 displays descriptive statistics of some important variables for participants and nonparticipants in work-related training. There are significant differences between participants and control group 1 with respect to education, skill level, job position and firm size. Control group 2 appears to be more similar to the participant group but there are still some systematic differences. This will be reflected in the estimated propensity score in the next section. As a second treatment indicator I use participation in any work-related training in the past twelve months. This is the training information available in each wave, whereas the more refined training indicator discussed above is only available in the 1999 wave. Of course, both indicators are highly correlated, and the difference should be workers who finance their training themselves.
This is the case for 20% of the workers reporting to have participated in work-related training (hence the overlap of the two indicators is 80%).
The final three rows display real monthly earnings by treatment status. The treatment group had much larger earnings in 1998, i.e. before the training that is being analysed had started. Using these numbers it is possible to compute simple difference-in-differences estimates without control variables. The effect of training using control group 1 is 63 CHF after one year and 113 CHF after two years. This amounts to an increase of roughly 2%. Using control group 2 the effects are 5 and 58 CHF, respectively. None of these estimates is significant (all t-values are smaller than one).
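The unconditional difference-in-differences calculation is simple arithmetic on four group means. The sketch below shows the computation; the earnings figures in it are hypothetical placeholders chosen for readability and are not the values reported in Table 1.

```python
def simple_did(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in mean earnings of participants minus the change in
    mean earnings of the control group (no control variables)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean real monthly earnings in CHF (NOT taken from Table 1)
effect = simple_did(treat_pre=6000.0, treat_post=6250.0,
                    ctrl_pre=5500.0, ctrl_post=5700.0)
relative_effect = effect / 6000.0  # relative to participants' pre-training mean
```

With these placeholder numbers participants gain 250 CHF and the control group 200 CHF, so the estimate is 50 CHF. The level difference between the groups before training (500 CHF here) does not enter the estimate, which is why the higher pre-training earnings of participants do not by themselves bias it.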
An interesting question concerns training dynamics. Using the training variable contained in all waves ("did you receive work-related training in the past twelve months?") Table 2  The evaluation of the effects of repeated training participation is not yet fully developed (see Miquel, 2003, for a recent analysis). In addition, the SLFS only provides detailed training information in the 1999 spell. In order to analyse the potential effects of repeated training I focus on the simple training indicator described used for  (2003).

Results
Table 3 displays the estimation results of the training participation probit. Note that all control variables refer to the 1998 wave because the training variables from the 1999 wave refer to the past 12 months. Thus the situation in 1998 is relevant for training participation. Training participation is more likely for highly educated workers and workers with jobs requiring high skill levels. Training is more likely in large firms and in some sectors such as banking and insurance, and public administration. The most important determinant of training participation is previous training, indicating that training participation is highly correlated over time. Table 4 shows the results of the nonparametric difference-in-difference estimation of equation (7). The results are based on nearest neighbour matching with replacement, imposing the common support restriction. Matching was performed using the Mahalanobis weighting matrix, with the estimated propensity score and real income in 1998 as matching variables. The latter variable was included because analysing the balancing properties of matching on the propensity score alone showed that earnings in 1998 were not balanced well at all. This is documented in Appendix  Gerfin et al (2003) and Gerfin (2003) Table 4. Both estimated incremental effects are relatively large and significant. In the case of control group 1 the estimated effect of two training events is somewhat larger than the effect of the training event in 1998, but the difference is not significant. The same is the case when control group 2 is used. While the estimated incremental effect of the second training event is very large it is not significantly different from the effect for the first training event. It is also not significantly different from the effect estimated using control group 1. Overall, these results indicate that it is important to take account of repeated training events. 
In other words, the estimated effects on ∆Income2 using only the first training event appear to be quite misleading.
Not reported are estimates of the treatment effects by population subgroups. In all cases the remaining sample sizes were too small to estimate treatment effects with any precision. The considered subgroups were private sector, workers in large firms, and workers with higher education. 5 Separating the sample by earnings in 1998, however, provides one significant insight. 6 For workers with 1998 earnings above the median the incremental effect of a second training event is estimated to be CHF 415, which corresponds to an increase in earnings by 5% (see Table 5). For lower income workers this effect is much smaller and insignificant. This finding suggests that workers who already have high earnings profit more from continuous work-related training.

Conclusions
The aim of this paper was to estimate the effects of work-related training on earnings. Given the theoretical literature these estimated effects are only lower bounds for the effects of work-related training on productivity. International evidence suggests that these effects are much larger than the effects on wages. Based on panel data from the Swiss Labour Force Survey (SLFS) covering the years 1998-2000 I estimate these effects using nonparametric matching methods. Specifically, in order to control for permanent observable differences between training participants and nonparticipants I employ difference-in-differences matching. Training is measured either as firm-sponsored training or as any work-related training. Only the latter measure is available in each wave of the SLFS. Analysing the dynamics of this indicator clearly shows that multiple participation in work-related training is not a rare event. This complicates the analysis considerably because the evaluation of dynamic treatments is not yet fully developed. As a solution to this problem a heuristic difference-in-differences approach to estimate the incremental effect of further training events is used. The results clearly indicate that it is important to account for multiple training events. Taken together, the main results are that there are significant effects of work-related training on wages of roughly 2% for each training event. Focussing on firm-sponsored training the estimated effect is somewhat smaller but the difference is not significant.
As argued above these estimates are a lower bound for the effects of training on productivity.
From a methodological point of view the results emphasise the importance of accounting for multiple treatment participation. The approach used in this paper is heuristic. More work is necessary to develop estimators that fully account for the dynamic nature of sequences of treatments.
5 These results are available on request.
6 Earnings in 1998 are reported before training takes place. Hence it is possible to treat these earnings as exogenous.
Source: Swiss Labour Force Survey, own calculations. All estimations included a constant term. Coefficients in italic are significant at the 10% level, coefficients in bold at the 5% level, and coefficients in bold italic at the 1% level.
Training is any work-related training in 1998 (firm-sponsored or privately financed)