The Trade Effects of Skilled versus Unskilled Migration

In this paper, we assess the role of skilled versus unskilled migration for bilateral trade using a flexible reduced-form model where the stocks of skilled and unskilled migrants at the country-pair level are determined as endogenous continuous treatments. The impact of different levels of skilled and unskilled migration on the volume and structure of bilateral trade is identified in a quasi-experimental design. This is accomplished through a generalization of propensity score estimation procedures for a case of multivariate, multi-valued treatments whereof the bivariate continuous treatment model is a special case. We find evidence of a polarized impact of skill-specific migration on trade: highly concentrated skilled or unskilled migrants induce higher trade volumes than a balanced composition of the immigrant base. Regarding the structure of trade, we observe a polarization specifically for differentiated goods and for north-south trade. Both bits of evidence are consistent with a segregation of skill-specific immigrant networks and corresponding consumption patterns and effects on trade.


Introduction
Globalization is a complex, multidimensional phenomenon. Even if we abstract from the political, cultural, and epidemiological forces that are, in some sense, the most problematic, economic globalization is quite complex. International flows of people, goods, assets, and money have increased dramatically since the late 1970s. Unruly public politics swirl around all of these, making them a source of both academic and public interest. Not surprisingly, the greatest effort has been focused on the link between such flows and labor market outcomes. However, much of that work has a vaguely partial equilibrium flavor (even when conducted in the context of a general equilibrium theoretical framework). That is, it tends to focus on one or another of these flows by abstracting from the others. Unfortunately, interactions between these flows may affect inference about the effect of any individual flow. Thus, it behooves us to try to understand those interactions theoretically and empirically. In this paper, we look at the interaction between international migration and international trade.
Specifically, this paper aims at identifying causal effects of skilled and unskilled immigrants on bilateral trade using a quasi-experimental design. In this paper, we view bilateral stocks of skilled and unskilled immigrants as endogenous continuous treatment variables and bilateral trade as a continuous outcome variable. This allows us to develop our empirical analysis using generalized propensity scores (GPS) for causal inference with continuous treatments. We generalize this design for a case of multivariate, multi-valued treatments of which the bivariate continuous treatment model is a special case. The advantage of this approach vis-à-vis traditional models is its flexibility with respect to the functional form of the relationship between treatment (say, skilled and unskilled migration) and outcome (say, bilateral trade).
Exploiting this flexibility we are able to provide novel evidence on the effect of migration on bilateral trade, using two large cross-sections of bilateral stocks of skilled and unskilled immigrants and bilateral trade flows for 1991 and 2001, respectively. Specifically, we find evidence that a concentration on skilled or unskilled immigrants leads to a bigger response of bilateral trade than a mixed composition of the same level of total immigration. That is, there is a polarized effect of migration on trade. We explain this phenomenon in terms of the interaction between what network theorists call brokerage and closure (Burt, 2005). In particular, the latter is seen to play a significant role in determining the efficiency of a given network in providing a bridge between two natural markets with otherwise high costs of trade. This evidence points to the presence of (at least partially) segmented skilled and unskilled migration networks and their effectiveness for trade in different domains of goods.

The remainder of the paper is organized as follows. Section 2 provides a review of theoretical and empirical research on the nexus between migration and trade. Section 3 introduces the concept of generalized propensity score estimation with multiple (or multivariate) treatments for inference of the impact of skilled versus unskilled migration on bilateral trade. Section 4 introduces the data used for inferences and associated descriptive statistics. Section 5 summarizes the results regarding the causal impact of bilateral skilled versus unskilled migration on bilateral international trade, including a sensitivity analysis. The last section concludes with a summary of the most important findings.

Literature and hypotheses
This paper is far from the first to examine the relationship between trade and migration. The references in a recent survey of the literature on this relationship run to 16 pages (Gaston and Nelson, 2011)! Whereas the great majority of the theoretical work on this relationship proceeds in terms of standard neoclassical trade models, the empirical work tends to be based on looser formulations that emphasize mobility costs (for goods and people) and the role of social relations in overcoming those costs. The overwhelming majority of the empirical research on this question views migration as a factor potentially reducing the cost of trade between countries. Not surprisingly, the gravity model as applied to trade has been seen as a natural econometric framework for such empirical work (for surveys, see Anderson, 2011; Anderson and van Wincoop, 2004; Bergstrand and Egger, 2011). 1 It is well known that the gravity model can be rationalized in a variety of ways, but the various social relations added to distance to capture reductions in cost (e.g., language, common border, common free trade area or currency union, etc.) are essentially ad hoc additions. More recent theoretical and empirical work has identified market size, unemployment rates, and the income distribution as relevant determinants of bilateral trade outcomes. Migration flows enter in the same way. The discussion in the following paragraphs provides an overview of the sorts of argument that have been made for such inclusion. As with most of the existing literature on the empirical link between trade and migration, we draw these literatures together by viewing migration, and particularly the networks that link migrant communities, as reducing trade costs (especially those related to information problems) between countries.
There is a growing body of work on the role of networks in trade (Rauch, 2001). However, this research tends to discuss two distinct aspects of those networks without recognizing their distinctiveness: one is the role of networks in mediating the economic relationship between two dense networks; and the other is the internal structure of the networks that do the spanning. Following Ronald Burt (2005, 2009), network research often refers to these two dimensions as brokerage and closure. In this paper we argue that these two aspects of spanning networks interact in a significant way in the case of the relationship between international migration and trade. The literature on migration and trade emphasizes the first link. Starting with Gould's (1994) paper that initiated the massive empirical literature using gravity models to evaluate the link between trade and migration, most of the papers in this area have seen that link in terms of spanning between dense networks of consumers, very much in line with Granovetter's (1973) "strength of weak ties" analysis. 2 The basic notion is that, in addition to their own demand for products of their home market, immigrants carry information about those products that is useful to native consumers. Both of these will tend to increase demand for products of the immigrant home market in their new host country. Thus, both of these channels involve market creation.
In addition to seeking an accurate measure of the effect of immigration on trade, most of the empirical literature also seeks to explicitly distinguish the brokerage effect from a pure demand effect. Since, in the absence of data that distinguish imports/exports by immigrants from imports/exports by natives, there is no straightforward structural way to make this distinction, all of these efforts involve attempts to infer which channel is relevant based on information about type of commodity, type of immigrant, or type of country involved. For example, Gould approaches this problem primarily by arguing for an asymmetry between effects on imports and exports. Specifically, he argues that pure demand effects should not have any effect on exports from the host country to the immigrant home, but should affect imports. 3 Thus, Gould's inference is that: if immigration positively affects imports, but not exports, then the demand channel is revealed to dominate; but if immigration positively affects exports, but not imports, then the brokerage effect dominates. In the event, for the case of the US 1970-1986, he finds that both are significant, but that exports are influenced more than imports by immigrant flows. 4

In addition to the US (Hutchinson, 2002, Jansen and Piermartini, 2009, Mundra, 2005, Tadesse and White, 2010, White, 2007b, White and Tadesse, 2008a), similar studies have been done for Australia (White and Tadesse, 2007), Bolivia (Canavire Bacarreza and Ehrlich, 2006), Canada (Head and Ries, 1998), Denmark (White, 2007a), Greece (Piperakis, Milner, and Wright, 2003), Italy (Murat and Pistoresi, 2009), Malaysia (Hong and Santhapparaj, 2006), Spain (Blanes-Cristóbal, 2004), Sweden (Hatzigeorgiou, 2010a), Switzerland (Kandogan, 2009, Tai, 2009), and the UK (Girma and Yu, 2002). 5 All of these papers find a statistically significant, positive link between immigration and both imports and exports; however, there does not seem to be any particular pattern in the relative magnitude of the import versus the export link. Similarly, a large number of studies disaggregate the host country to the level of US states (Co, Euzent, and Martin, 2004, Herander and Saavedra, 2005, Dunlevy, 2006, Bandyopadhyay, Coughlin, and Wall, 2008, Coughlin and Wall, 2011), Canadian provinces (Wagner, Head, and Ries, 2002, Partridge and Furtan, 2008), French départements (Combes, Lafourcade, and Mayer, 2005), and Spanish provinces (Peri and Requena-Silvente, 2010). 6 Again, the results show significant effects of migration on trade, but no particular pattern in the effect on exports relative to imports. 7

More recently, the development of multicountry datasets has permitted the analysis of multiple host and multiple home countries. The great majority of these focus on (some subset of) OECD host countries and a large number of home countries (Lewer, 2006, Konečný, 2007, Moenius, Rauch, and Trindade, 2007, Dolman, 2008, Morgenroth and O'Brien, 2008, Lewer and Van den Berg, 2009, Bettin and Turco, 2010, Egger, von Ehrlich, and Nelson, 2012, Felbermayr and Toubal, 2012), while some use a matched sample of countries that trade and exchange immigrants (Hatzigeorgiou, 2010b, Parsons, 2011). The results here are in line with those of the previous bodies of work. As with this last group of papers, our data set will include 29 OECD countries and 98 source countries.

2 This body of research really is "massive". Starting from Gould's original paper until the time of writing of this paper, we count over seventy published and unpublished papers on the matter. For an extensive review of this literature, see Part I of White (2010), chapter 2 of White and Tadesse (2011), or the meta-analysis in Genc, Gheasi, Nijkamp, and Poot (2011).
3 This makes sense in a partial equilibrium way. However, if the scale of either return migration or emigration of host country natives is correlated with the scale of immigration, this inference may run into trouble.
4 As in most of the empirical literature, we use the language of immigrant flows (because that matches the theory used to interpret the results), but it should be understood that the variable in question is invariably a stock.
5 This is a sample of the published papers which, like Gould, work with a single host country and multiple home countries.
6 An interesting variant examines the effect of migration on firm-level export data: Hatzigeorgiou and Lodefalk (2011) use Swedish data; and Bastos and Silva (2012) use Portuguese data.
7 Closely related papers by Combes, Lafourcade, and Mayer (2005) for France and Millimet and Osang (2007) for the US examine the subnational relationship between migration and trade.
One possibility for trying to distinguish preference from information effects is to determine whether the effect of migration on trade decays above some threshold. We have already noted that both of these should increase trade. However, one might argue that demand effects should simply be linear in the immigrant population, while information effects should be subject to decay. 8 Gould (1994) did this with a non-linear functional form developed for the purpose and a handful of other papers followed suit (Head and Ries, 1998, Wagner, Head, and Ries, 2002, Bryant, Genç, and Law, 2004, Morgenroth and O'Brien, 2008, Egger, von Ehrlich and Nelson, 2012). The general result here is that the effect of migration on trade is subject to diminishing returns. Furthermore, this effect sets in at quite low levels of migration. Unless this effect is driven by the effect of migration on diffusion of preference for migrant source goods to the native population, this would seem to be strong evidence for the information link. To further unpack this result, Gould considers consumer and producer goods separately, under the assumption that the former is more differentiated than the latter and that, as a result, the demand by immigrants will be greater. Imports and exports of both types of goods are positively affected by immigration, but imports of consumer goods have the largest effect found in Gould's analysis. He takes this to suggest that there is evidence of a demand effect for consumer goods imports, but that brokerage plays the dominant role in the other cases. A number of studies follow Gould in trying to find a disaggregation that provides additional leverage in distinguishing demand from brokerage effects. A variety of disaggregations are used, including: Gould's choice of consumer v. producer goods (Herander and Saavedra, 2005, Mundra, 2005, Blanes-Cristóbal, 2008, Kandogan, 2009); finished v. intermediate goods (Mundra, 2005); and cultural v. non-cultural goods (Tadesse and White, 2008, White and Tadesse, 2008b). In the cases of all three disaggregations, the expectation is that demand effects will show up as a positive relationship between immigration and imports of goods that are in some sense differentiated (i.e., consumer/manufactured/cultural), so that a preference for home varieties makes sense, and there is a fairly consistent pattern of immigration strongly affecting these imports.

8 It should be noted that the econometric implications of own demand and demonstration to host natives are rather different. We would expect own demand to vary more-or-less linearly with immigration, but demonstration effects are likely to be more complex. For example, if there is a uniform propensity of natives to consume new varieties of foreign goods, and information diffuses immediately, the first immigrants provide all the relevant information, leading to an initial jump in demand, but no subsequent change other than the linear increase deriving from own demand. However, if the diffusion of information follows some specific process (or willingness to adopt does) then that process will interact with the linear immigrant process to produce some combination of the two. Most work, either implicitly or explicitly, presumes that there is a positive linear relationship between immigration and demand for imports from host countries running through the preference channel. To the extent that the information bridge runs both ways, immigrants will provide information to their home countries about host country goods, thus increasing exports and, while there is no reason that the process of learning/adoption should be the same in home and host, neither is there any reason to assume that either takes any particular form. From a welfare point of view, both of these channels should increase welfare in the context of a Krugman (1981) monopolistic competition model of the sort that underlies the Anderson and van Wincoop (2003) framework central to much of the gravity modeling used to study the empirical relationship between trade and migration. Romer (1994) makes a similar argument in his discussion of the welfare cost of trade restrictions where imports may be new goods.
Instead of sorting goods by some end-use category, an alternative is to sort the goods by the type of market on which they are traded. Starting from an explicitly network theoretic basis, Rauch (1999) argued that markets could be distinguished into three types: those with an organized exchange; those with quoted reference prices; and all other markets. Rauch argued that the first two types of market require that the goods traded on them be quite standardized (with organized exchanges requiring a higher degree of standardization than reference price goods) and that the goods traded on the residual markets, since these cannot support organized exchanges or reference prices, must be more highly differentiated. In the context of a basic gravity model, that paper found that trade costs (distance) and trade cost reduction (common language, common colonial ties) played a more significant role for differentiated goods than for the more standardized goods. In an important later paper, Rauch and Trindade (2002) argued that coethnic networks can play a significant role in reducing trade costs and, since Rauch (1999) showed that these costs are more important for differentiated goods, they should be particularly important in a gravity model as a factor increasing trade between a home and host country. Rauch and Trindade focus specifically on the Chinese diaspora (widely believed to play a major role in trade) by using Chinese population share as a variable in an otherwise standard gravity regression, finding that this variable is always significant across all types of markets, but has the greatest impact in differentiated goods, as is consistent with the hypothesis that ethnic networks of traders reduce trade costs.
9 Since the publication of Rauch and Trindade, a number of studies have used Rauch's classification in more standard setups where migrants from any country might reduce trade costs (White, 2007a, White and Tadesse, 2007, Dolman, 2008, Briant, Combes, and Lafourcade, 2009, Bettin and Turco, 2010, Hatzigeorgiou, 2010a,b, Felbermayr and Toubal, 2012). 10 While immigrant stock tends to be a significant, positive predictor of trade (both imports and exports) in all three categories, there is no particular pattern in the magnitudes of effect (though the meta-analysis in Genc, Gheasi, Nijkamp, and Poot, 2011, is consistent with a smaller effect of immigration on standardized goods). In this paper, we will also apply our analysis to the Rauch product categories.
An alternative approach reasons that emigration of host natives to the source country cannot affect imports via the preference channel, so a positive coefficient on the emigration variable should be seen as evidence for the presence of an information channel. A number of papers have checked this relationship, finding that the coefficient on emigration is consistently positive and significant (Canavire Bacarreza and Ehrlich, 2006, Konečný, 2007, Dolman, 2008, Hatzigeorgiou, 2010b, Parsons, 2011, Felbermayr and Toubal, 2012).

While the brokerage role that is emphasized by the literature is of obvious importance in understanding the link between trade and migration, closure plays a particularly central role in dealing with institutional failures and asymmetric information problems. As with brokerage, there is also a sizable literature on this second link. This work starts from the problems of contracting in certain types of goods or environments. The idea is that, in the absence of relatively complete contracts and/or effective legal environments, the risk of loss due to opportunistic behavior is sufficiently high that many mutually beneficial contracts would not be made in the absence of some alternative source of assurance. Anthropologists, sociologists and historians have long emphasized these factors in explaining the role of ethnic networks and diasporas in the organization of trade across political jurisdictions or, more generally, in the absence of effective protection of contractual/property rights (Polanyi, 1957, 1968, Geertz, 1963, 1978, Cohen, 1969, 1971, Bonacich, 1973, Curtin, 1984).
Recent theoretical and empirical work strongly suggests that factors such as institutional quality, business conditions, and political order constitute significant trade costs (Anderson and Marcouiller, 2002, Anderson and Bandiera, 2006, Anderson and Young, 2006, Berkowitz, Moenius, and Pistor, 2006, Levchenko, 2007, Ranjan and Lee, 2007, Ranjan and Tobias, 2007, Turrini and van Ypersele, 2010, Araujo, Mion, and Ornelas, 2012). The earlier work by historians, anthropologists and sociologists suggests that diasporas can play a significant role in reducing these costs. Where this work identifies institutional sources of trading cost, or trading relationships that might be expected to be characterized by the presence of such costs (e.g., south-south relations, or north-south relations), it seems reasonable to expect that migration would have a particularly large trade supporting role in the presence of such costs. Thus, a number of papers have focused specifically on developing countries as a source of both trade and migration (Co, Euzent, and Martin, 2004, White, 2007a, Felbermayr and Jung, 2009, Bettin and Turco, 2010), finding that south to north migration affects overall trade, but particularly exports of differentiated products. 11 A particularly interesting example of this sort of analysis is presented in White and Tadesse (2011, chapters 10 and 11), where they consider four main classes of immigration corridors (north to north, north to south, south to north, and south to south), finding the strongest effect in the south to south case, no effect in the north to north case, and intermediate effects in the two north/south links. Consistent with this, analyses that include some measure of institutional quality in the source country tend to find that the trade creating effects of migrants are greater when one of the countries has poor institutional quality (Dunlevy, 2006, Konečný, 2007, Briant, Combes and Lafourcade, 2009).
An interesting alternative approach to identifying this effect reasons that members of the same diaspora in both sides of a trading dyad that does not contain the source country of the diaspora would constitute evidence for the presence of market-creating responses to poor institutions or asymmetric information problems. Rauch and Trindade's (2002) widely cited paper, in its focus on the trade creating effect of the Chinese diaspora, is obviously an example of the application of this inference. Felbermayr, Jung, and Toubal (2010) develop the logic of this inference in detail and implement it in a multi-source/multi-host environment.
In an interesting recent approach, a number of studies have built on Chaney's (2008) firm heterogeneity extension of Krugman's (1980) model to evaluate the effect of migration on the extensive versus the intensive margin of trade (Jiang, 2007, Peri and Requena-Silvente, 2010, Coughlin and Wall, 2011). Jiang (for Canada) and Peri and Requena-Silvente (for Spain) find evidence of an immigrant effect only on the extensive margin, while Coughlin and Wall (using US state data) find evidence of effects only on the intensive margin. Peri and Requena-Silvente also include a product categorization based on the Broda and Weinstein elasticities and argue that their finding that immigrants affect mainly the extensive margin of exports for highly differentiated goods implies that migration reduces fixed costs, not variable costs, of exporting those goods. To the extent that institutional failures and information asymmetries are interpreted as fixed costs, this interpretation of the Spanish and Canadian evidence would seem to constitute evidence for the importance of closure effects.
The work we have just discussed suggests that diasporas can play a particularly strong role in mediating trade links in the context of institutional problems and asymmetric information problems. However, much of the earlier work suggests that the internal structure of groups plays an important role in dealing with contracting problems. This is where Burt's notion of closure plays an essential role. The role of ethnic homogeneity (or, more broadly, social proximity) is also emphasized in the general literature on "middlemen minorities" and trading diasporas generally (e.g., Blalock, 1967, Bonacich, 1973, Iyer and Shapiro, 1999). 12 Landa (1981, 1994) emphasizes social proximity in a transaction cost framework, and, while not central to the formalisms he develops, it certainly plays an essential role in the historical analysis of Greif (1989, 1991, 2006). For example, Landa's study of the role of Chinese traders in Malaysia suggests that transaction costs rise as one crosses every categorical level implying greater social distance. Greater density of members of a given degree of proximity permits more extensive division of labor within the network and, thus, a more efficient network. In this paper, our data do not permit us the sort of fine-grained distinctions that Landa derived based on surveys, but we do have a different measure of social proximity. In addition to country of origin, we have data on the skill level of migrants. Although this measure might be viewed as rather blunt, we hypothesize that, by comparison to migrant communities with a wide range of skills, migrations characterized by a strong concentration of a given skill group will form more effective networks, generate "better" bridges, and thus produce a stronger link between migration and trade. This is, however, less blunt than it seems prima facie. After all, networks play an essential role in the location choices of emigrants.
Because networks reduce the costs of migration, it is well-known that immigrants drawn from well-defined sending regions tend to go to equally well-defined locations in the receiving country. 13 Thus, when we consider the pre-existing social bonds between any group of migrants, the claim that similarity of education creates closer bonds gains considerable plausibility. Thus, we hypothesize that, other things equal, the effect of migration on trade will be stronger for migration flows made up of people with relatively homogeneous skills. 14 We can refine this further by using data on type of product traded. Extending the logic of Rauch and Trindade (2002), we suppose that, not only is type of product relevant to the degree to which networks matter for trade, but that it is relevant to which type of migrant community matters for trade. Rather than looking at the effect of immigration from a home country on the trade of a host country, Rauch and Trindade ask if the stock of Chinese in each country affects trade in a dyad. Chinese communities are well-known in the ethnographic and business literatures as actively involved in trade (e.g., Landa, 1981, Weidenbaum and Hughes, 1996, Liu, 2000). Thus, Rauch and Trindade's interest in the presence of Chinese immigrant communities in explaining trade in a gravity framework makes a lot of sense. In addition, the authors use Rauch's (1999) distinction between standardized goods (goods with reference prices), goods traded on organized exchanges, and differentiated goods, finding that the effects of Chinese communities on the latter two categories are economically and statistically more significant than on standardized goods. Given the other controls, the authors take this as evidence that Chinese communities provide both market-making and market-replacing services to the countries in which they reside.
We extend this logic by supposing that: 1) differentiated goods are more exposed to contracting problems than standardized goods (this is an essential element of the Rauch and Trindade analysis); and 2) in addition to providing knowledge useful in providing basic bridging/information services for differentiated goods, workers within the same skill reference group are seen as more reliable/able to fill in for incomplete contracts than workers belonging to networks of other skill groups. To the extent that more homogeneous skill levels involve the acquisition of specialized knowledge that creates tighter bonds on which trust may be based, this would seem to be a plausible channel. Thus, our second hypothesis is that more homogeneous immigrant groups will generate greater trade flows than heterogeneous groups.
We now turn to a development of our methodology.
3 Nonparametric estimation of the trade effect of skilled and unskilled migration by generalized propensity scores

Let us denote the cross-sectional units of observation (here, country-pairs) by $i = 1, \ldots, N$ and economic outcome (in the present context, the value of bilateral goods imports) by $Y_i$. The goal of this paper is to determine the impact of skilled and unskilled migration on $Y_i$, considering that migration of any kind is potentially endogenous and the functional form of its impact on bilateral trade is unknown.

For econometric identification, let us think of the levels of skilled and unskilled migration as potentially endogenous treatments. Yet, unlike in most of the evaluation literature in econometrics, these treatments are not binary but continuous. 15 Let us denote specific potential treatment levels of skilled and unskilled migrants by $s$ and $u$, respectively. Those potential treatment levels are elements of the sets of potential treatment levels $\mathcal{S}$ and $\mathcal{U}$. While potential treatment levels of migrants are denoted by lower-case letters, corresponding realized treatment levels of country-pair $i$ will be denoted by upper-case letters, $S_i$ and $U_i$.

For each country-pair $i$, we may now define the set of potential outcomes in terms of a unit-level dose-response function for $\ell = (s, u)$ as $Y_i(\ell) \equiv Y_i(s, u)$ for $s \in \mathcal{S}$, $u \in \mathcal{U}$ and the corresponding average dose-response function as $\mu(\ell) \equiv E[Y_i(s, u)]$. We observe the vector of covariates $X_i$, $S_i$, $U_i$, and $Y_i(S_i, U_i)$ and assume that $\{Y_i(s, u)\}_{s \in \mathcal{S}, u \in \mathcal{U}}$, $S_i$, $U_i$, and $X_i$ are defined on a common probability space. $S_i$ and $U_i$ are continuously distributed w.r.t. Lebesgue measures on $\mathcal{S}$ and $\mathcal{U}$, respectively. $Y_i = Y_i(S_i, U_i)$ is a well-defined, suitably measurable, random variable.

For identification, we have to assume weak unconfoundedness as stated in Rosenbaum and Rubin (1983) for the binary propensity score and in Hirano and Imbens (2004) and Imai and van Dyk (2004) for the generalized propensity score with a univariate, multi-valued (continuous) treatment.

In our setup, the latter means that, conditional on the vector of covariates $X_i$, the potential outcomes are independent of the treatment status in the two treatment dimensions $S_i$, $U_i$. The generalized propensity score in the two-dimensional continuous treatment is specified as follows.

Definition (Generalized Propensity Score)
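Denote any possible vector of covariates determining treatment by $x$ and define the bivariate conditional joint density of the realized treatments $S_i$, $U_i$ given the covariates $X_i = x$ as $r(s, u, x) \equiv f_{S,U|X}(s, u \mid x)$. The generalized propensity score (GPS) of country-pair $i$ is then $R_i \equiv r(S_i, U_i, X_i)$.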

The GPS has a balancing property: the probability of the observed treatments being equal to some potential treatment combination of skilled and unskilled migration is independent of the covariates once we condition on the GPS. Accordingly, the treatment status is independent of the potential outcomes conditional on the GPS once weak unconfoundedness holds. For our identification strategy, this implies that under weak unconfoundedness we need to condition on only one scalar, namely the GPS of unit $i$ evaluated at its realized treatments, rather than on all covariates in the covariate vector, in order to remove the selection bias in the unconditional impact of skilled and unskilled migration on trade. This allows a maximum of flexibility regarding the functional form of the trade response to skilled and unskilled migration.
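To make the role of the scalar GPS concrete, the following minimal sketch computes a bivariate GPS on synthetic data. It rests on assumptions that are not taken from the text: the two (log) treatments are taken to be jointly normal conditional on the covariates, with conditional means that are linear in the covariates, and the data-generating process, the function names (`solve`, `ols_residuals`, `bivariate_normal_gps`), and all parameter values are hypothetical. It illustrates the general idea of a two-dimensional GPS and is not meant to reproduce the estimator used in this paper.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_residuals(X, y):
    """Regress y on the columns of X (which must include a constant) and return the residuals."""
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(bj * xj for bj, xj in zip(beta, row)) for row, yi in zip(X, y)]


def bivariate_normal_gps(X, s, u):
    """Evaluate an assumed bivariate normal treatment density at each unit's realized (s, u)."""
    es, eu = ols_residuals(X, s), ols_residuals(X, u)
    n = len(s)
    var_s = sum(e * e for e in es) / n
    var_u = sum(e * e for e in eu) / n
    cov = sum(a * b for a, b in zip(es, eu)) / n
    det = var_s * var_u - cov ** 2
    scores = []
    for a, b in zip(es, eu):
        # Quadratic form e' Sigma^{-1} e for the 2x2 residual covariance matrix Sigma.
        q = (var_u * a * a - 2.0 * cov * a * b + var_s * b * b) / det
        scores.append(math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det)))
    return scores


# Synthetic country-pair data (hypothetical): one covariate plus a constant,
# and two correlated log treatments (skilled and unskilled migrant stocks).
random.seed(0)
n_pairs = 200
x = [random.gauss(0.0, 1.0) for _ in range(n_pairs)]
common = [random.gauss(0.0, 1.0) for _ in range(n_pairs)]
s = [1.0 + 0.5 * xi + ci for xi, ci in zip(x, common)]
u = [0.5 - 0.3 * xi + 0.6 * ci + random.gauss(0.0, 1.0) for xi, ci in zip(x, common)]
X = [[1.0, xi] for xi in x]

gps = bivariate_normal_gps(X, s, u)
print(min(gps), max(gps))
```

In a full analysis, the outcome would then be related flexibly to the two treatments and this single score, for instance through a polynomial in the three variables, and averaged over units at each treatment combination of interest to trace out a dose-response surface; that second step is omitted here.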

Data and descriptive statistics
The variables entering our analysis encompass direct drivers of bilateral trade as well as direct drivers of skilled and unskilled bilateral migration and may be broadly grouped into dependent variables and independent variables.

Dependent variables
The dependent variables are bilateral import flows (which we also refer to as the outcome) and bilateral stocks of skilled and unskilled migrants (which we refer to as treatments). The goal is to determine the impact of the (endogenous) bilateral skilled and unskilled migration treatments on the bilateral trade outcome. Bilateral skill-specific migration data are available from the Database on Immigrants in OECD Countries (DIOC), as published by the Organization for Economic Co-operation and Development (OECD), for the years 1991 and 2001. This dictates the sample coverage of the data used in the empirical analysis. As the outcome, we use data on bilateral imports from the United Nations' Comtrade Database for the average year within the periods 1991-95 and 2001-05, respectively. For the sector-specific analysis, we aggregate the bilateral imports according to the Standard International Trade Classification (SITC) one-digit categorization. For the classification into homogeneous and differentiated goods, we follow two approaches, by Rauch (1999) and by Broda and Weinstein (2006). For both approaches, we assign sectors to the respective classifications using SITC three-digit data. This gives us four additional dependent variables: homogeneous and differentiated imports according to Rauch, as well as low and high elasticity of substitution imports building on Broda and Weinstein. 17

- Table 1 here -

Altogether, we cover 98 countries of origin and 29 (OECD) countries of residence of migrants, and trade between those countries, in our analysis (see the Appendix for a list). Table 1 provides some descriptive features of the data on the dependent variables covered in our analysis. We measure migration as well as trade in logarithmic terms. In addition to moments of each dependent variable, the table provides simple correlations of these variables at the bottom.

Independent variables
As described in the previous section, we have to employ independent variables (determinants of bilateral trade flows as well as of skilled and unskilled migration stocks, which form the elements of the covariate vector) in order to remove the selection bias in an assessment of the impact of the two types of migration on the bilateral trade of a country-pair. Bilateral imports of homogeneous and differentiated goods are denoted by and , respectively. Variables and refer to the Rauch classification while and refer to the classification by Broda and Weinstein. In the following, we define skilled immigrants as those with at least a secondary level of education. Note that our results are invariant to an alternative aggregation with skilled immigrants characterized by at least tertiary education.
Generally, the covariate vector includes both continuous and multi-valued discrete variables for both exporters/countries of origin and importers/countries of residence in a flexible 4th-order polynomial functional form and binary variables in a linear functional form. 18 Specifically, we include a parametric (polynomial) function of exporter/origin- and importer/residence-specific log GDP (GDPO, GDPR), log GDP per capita (GDPPCO, GDPPCR), and log population (POPO, POPR) to account for effects of economic market size, per-capita income, and population size in a fairly flexible way. These variables are taken from the World Bank's World Development Indicators 2009. Moreover, we control for origin- and residence-country GINI coefficients (GINIO, GINIR), unemployment rates (UNEMPO, UNEMPR), life expectancy (LIFEEXPO, LIFEEXPR), fertility (FERTILO, FERTILR), literacy rates (LITO, LITR), and real exchange rates between residence and origin countries (REALEXCH) as measures of unemployment risk, inequality, and economic well-being beyond per-capita income. These variables come from the World Bank's World Development Indicators 2009, the CIA World Factbook, and the United Nations Educational, Scientific, and Cultural Organization (UNESCO). Furthermore, we control for bilateral distance (DIST) between residence and origin countries as a continuous pair-specific geographical determinant of migration and trade, and for common language (COMLANG), colonial relationship ( Finally, we account for third-country effects in all of the origin- and residence-country-specific covariates on treatment by including the average values of all the covariates across neighboring countries (those with a common land border). For convenience, we use a leading letter and number, A3 (for average and 3rd country), and otherwise the same acronyms as introduced before to denote these third-country variables.
The inclusion of the latter serves the purpose of accounting for interdependence of origin and residence countries in supplying and attracting migrants (see Anderson, 2011).
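To illustrate how third-country covariates of this kind could be constructed, the following minimal Python sketch averages a covariate over each country's land-border neighbors. The function name, the toy country codes and values, and the treatment of countries without neighbors are assumptions made for illustration; they are not taken from the data description above.

```python
from statistics import mean


def third_country_average(values, neighbors):
    """Average a covariate over each country's land-border neighbors.

    values:    dict mapping country -> covariate value (e.g. log GDP)
    neighbors: dict mapping country -> list of bordering countries
    Returns a dict mapping country -> neighbor average, or None when no
    neighbor with data exists (an assumption; the text does not say how
    such cases are handled).
    """
    averages = {}
    for country in values:
        observed = [values[n] for n in neighbors.get(country, []) if n in values]
        averages[country] = mean(observed) if observed else None
    return averages


# Hypothetical toy data: four countries, one of them without land borders.
log_gdp = {"A": 10.0, "B": 12.0, "C": 14.0, "D": 9.0}
borders = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": []}
a3_gdp = third_country_average(log_gdp, borders)
```

In this toy example, the A3 value for country A is the mean of the values for B and C, while country D, which has no land neighbor, receives no value.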
- Table 2 here -

Table 2 provides information on moments of the data for all (first-order) independent variables akin to the upper bloc of Table 1.

Multivariate GPS estimation and the balancing property
We include all main effects of the covariates listed in Table 2 together with quadratic, cubic, and quartic terms of all non-binary regressors plus time dummies for each of the years 1991 and 2001 in the regressions explaining the two migration treatments. 19 Altogether, there are 129 explanatory variables in the two equations.
In Table 3, we report parameter estimates, standard errors, and a few statistics for the 4th-order polynomial reduced-form model specification for the two treatments based on 129 regressors. These regressions feature decent predictive power, with $R^2$ values of about 0.81 and 0.74 for the two treatments, respectively. Models based on 1st-order, 2nd-order, or 3rd-order polynomials would have achieved lower pairs of $R^2$ values of (0.78, 0.71), (0.79, 0.72), and (0.80, 0.73), respectively. Figure 1 indicates the predictive power of our preferred first-stage model and suggests that the two reduced-form regressions for the treatments perform quite well in terms of explanatory power.
- Table 3 and Figure 1 here -

Based on the estimates in Table 3, one may compute the GPS explicitly, assuming bivariate normality of the disturbances, as the bivariate normal density in (1). This density is parameterized by the variances of the disturbances in the two treatment equations, the covariance between those disturbances, and the first moments of the two treatments (see, e.g., Greene, 2011, for a general treatment of bivariate normals). Because of the normality assumption, we apply a power transformation to the original data before we estimate the first stages and compute the GPS. Our transformation to bivariate normality follows the procedure suggested by Andrews et al. (1971) and Velilla (1993) and essentially corresponds to a treatment-specific Box-Cox transformation. In order to test whether the transformed data satisfy the normality assumption, we run the omnibus test for normality introduced by Doornik and Hansen (2008). The test statistic shows that we cannot reject the null hypothesis of bivariate normality at conventional levels of statistical significance. To facilitate interpretation of our results, we retransform the treatments and always report dose-response as well as treatment effect functions in terms of the logarithm of skilled and unskilled migration.
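The GPS computation under bivariate normality can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the estimation code behind Table 3: the function and argument names are hypothetical, the first moments are taken to be the fitted values of the two first-stage regressions, and the disturbance moments are computed from residuals that are assumed to have mean zero.

```python
import math


def disturbance_moments(res1, res2):
    """Variances and covariance of two first-stage residual series.

    Assumes mean-zero residuals (as with OLS including an intercept).
    """
    n = len(res1)
    var1 = sum(r * r for r in res1) / n
    var2 = sum(r * r for r in res2) / n
    cov12 = sum(a * b for a, b in zip(res1, res2)) / n
    return var1, var2, cov12


def bivariate_normal_gps(t1, t2, mu1, mu2, var1, var2, cov12):
    """Bivariate normal density of treatments (t1, t2).

    mu1, mu2 are the (assumed) fitted first-stage means for one unit;
    var1, var2, cov12 are the disturbance variances and covariance.
    """
    det = var1 * var2 - cov12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    e1, e2 = t1 - mu1, t2 - mu2
    # Quadratic form e' Sigma^{-1} e written out for the 2x2 case.
    quad = (var2 * e1 * e1 - 2.0 * cov12 * e1 * e2 + var1 * e2 * e2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Hypothetical toy residuals and one unit's treatments and fitted means.
v1, v2, c12 = disturbance_moments([1.0, -1.0, 1.0, -1.0], [1.0, 1.0, -1.0, -1.0])
gps_at_mean = bivariate_normal_gps(2.0, 3.0, 2.0, 3.0, v1, v2, c12)
```

The density is largest when both treatments equal their fitted means and declines as the realized treatment pair moves away from them, which is what makes it informative about selection into treatment levels.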
The concern with an unconditional regression of bilateral imports on both migration treatments is that the estimated impact of the treatments might be confounded by the omission of relevant variables as listed in Table 2. This risk would be particularly pertinent if the treatments varied strongly with those variables. However, to invoke the key underlying Assumption stated above, it should hold that differences in the two treatments are reflected in the estimated GPS in (1), such that the elements of the covariate vector should not matter for the treatments beyond the estimated GPS. Otherwise, the GPS would not be a suitable compact scalar function in the sense of a balancing score to eliminate differences in the data in the covariate domain. As a consequence, differences in outcome could not be ascribed to differences in the data in treatment space alone.
We assess the balancing property of the estimated GPS by grouping and by splitting the sample along two lines. First, we split the data into a number of groups in treatment space. As a benchmark, we use nine (i.e., three by three) groups so that there are nine sub-samples of about the same size in terms of numbers of observations in treatment space. In each one of those groups there are about 357 observations. Moreover, we split the sample of observations within each group into blocs in the space of the estimated GPS. In the benchmark specification, we use eight blocs. With an identical number of blocs per group, there is approximately the same number of observations per bloc and group, namely 45 on average. However, we do not utilize all of the 3,213 observations for estimation, but focus on units which lie inside the so-called common support region. The latter ensures that units with a certain treatment level in the two dimensions are compared to ones with other treatment levels in the same support region. We then conduct t-tests about the equivalence of the averages in the data of each covariate. With nine groups and an unconditional comparison, this leads to 129 ⋅ 9 = 1,161 t-values. In the conditional comparison, we estimate 1,161 t-values per bloc and then calculate the average thereof. Of course, the sample sizes are different and test statistics should be adjusted. This is done by weighting the data properly by the number of observations used. Then, we compare the distributions of t-values between the conditional and the unconditional comparisons. The results are presented in Panel A of Figure 2 for the unconditional comparisons and in Panel B of Figure 2 for the conditional comparisons.
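The logic of the balancing test can be illustrated with a small Python sketch for a single covariate and a single treatment group. It is one possible reading of the procedure described above, not the code used for Figure 2: the Welch form of the t-statistic, the equal-sized blocs, the weighting of bloc-level t-values by bloc size, and all names and toy data are assumptions.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for equality of the means of samples a and b."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def blocked_t(x, in_group, gps, n_blocs):
    """Size-weighted average of within-bloc t-statistics.

    Units are sorted by GPS and cut into n_blocs blocs of (almost) equal
    size; within each bloc, group members are compared with non-members.
    Blocs with fewer than two units on either side are skipped.
    """
    order = sorted(range(len(x)), key=lambda i: gps[i])
    n = len(order)
    cuts = [round(k * n / n_blocs) for k in range(n_blocs + 1)]
    weighted_sum, weight = 0.0, 0
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        bloc = order[lo:hi]
        inside = [x[i] for i in bloc if in_group[i]]
        outside = [x[i] for i in bloc if not in_group[i]]
        if len(inside) < 2 or len(outside) < 2:
            continue
        weighted_sum += len(bloc) * welch_t(inside, outside)
        weight += len(bloc)
    if weight == 0:
        raise ValueError("no bloc contains enough units on both sides")
    return weighted_sum / weight


# Hypothetical toy data: the covariate moves one-for-one with the GPS, and
# group membership is more frequent at high GPS values.
gps = [float(i) for i in range(40)]
x = list(gps)
in_group = [(i % 4 == 0) if i < 20 else (i % 4 != 0) for i in range(40)]

t_uncond = welch_t([v for v, g in zip(x, in_group) if g],
                   [v for v, g in zip(x, in_group) if not g])
t_cond = blocked_t(x, in_group, gps, n_blocs=4)
```

In the toy data the unconditional comparison signals a clear imbalance in the covariate, whereas the comparison within GPS blocs does not, which is the pattern the balancing test looks for.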
- Figure 2 here -

Two bits of information are particularly interesting. First, the histogram plots illustrate that a large mass (namely 636) of the 1,161 t-values of unconditional comparisons lies outside of the [−2.576, +2.576] interval, which (approximately) indicates significance levels of less than one percent. When taking t-statistics in absolute terms, the interquartile range amounts to [1.15, 5.16]. By way of contrast, the mass of the distribution of t-values of conditional (on blocs of the GPS) comparisons outside of that interval is minuscule (1.1 percent of the t-values are bigger than 2.576 in absolute terms). The interquartile range of absolute t-values for the conditional comparisons amounts to [0.20, 0.92]. This is evident from the much more narrow-waisted distribution of conditional-comparison t-statistics around zero relative to the unconditional-comparison t-statistics in Panels A and B of Figure 2.
Second, this feature materializes in corresponding average or median absolute t-statistics, which we report at the bottom of the two panels. For instance, among the unconditional-comparison t-statistics, the average absolute value is 3.67, the median value is 2.90, and 55% of the absolute t-statistics imply statistical significance at one percent. Among the conditional-comparison t-statistics, the average absolute value is 0.65, the median value is 0.47, and only 1.1% of the absolute t-statistics imply statistical significance at one percent. This illustrates that conditioning on the GPS is extremely powerful in the data. Hence, we hypothesize that the role of confounding variables in the empirical model explaining the impact of skilled and unskilled migration on bilateral imports is drastically reduced by conditioning on the included observables. This conclusion is also supported by Figure 1.

Estimating the multivariate dose-response and treatment effect functions
In a next step, we estimate the so-called dose-response function. Utilizing the GPS as a compact (scalar-function) balancing score to drastically reduce the endogeneity bias of the two migration treatments in determining bilateral imports by invoking the underlying assumptions, we propose the regression in (3). This regression serves to predict the dose-response function (see Kluve, Schneider, Uhlendorff, and Zhao, 2012, for an example with a univariate multi-valued treatment). The parameters are estimated by ordinary least squares, and the standard errors are estimated by a bloc-bootstrap procedure (with 200 replications) in order to respect two features: first, that each unit is observed in two years (1991 and 2001) and, second, that (3) involves estimated rather than true GPS values. The regression results corresponding to (3) are summarized in Table 4.
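The resampling scheme behind such a bloc bootstrap can be sketched as follows. This is a minimal Python illustration, assuming that country pairs are the resampling units and that both yearly observations of a drawn pair enter the bootstrap sample together; the function name, the data layout, and the toy statistic are hypothetical. To account for the estimated GPS, the supplied statistic would have to rerun the first stage, recompute the GPS, and re-estimate the second stage on every bootstrap sample.

```python
import math
import random


def pair_bloc_bootstrap_se(panel, statistic, n_reps=200, seed=0):
    """Bootstrap standard error with country pairs as resampling units.

    panel:     dict mapping pair id -> list of that pair's observations
               (here, one per year), which are always drawn together
    statistic: function mapping a list of observations to a number; in an
               application it would re-estimate both stages
    """
    rng = random.Random(seed)
    ids = sorted(panel)
    draws = []
    for _ in range(n_reps):
        sample = []
        for pair_id in rng.choices(ids, k=len(ids)):  # with replacement
            sample.extend(panel[pair_id])
        draws.append(statistic(sample))
    center = sum(draws) / n_reps
    return math.sqrt(sum((d - center) ** 2 for d in draws) / (n_reps - 1))


def sample_mean(observations):
    return sum(observations) / len(observations)


# Hypothetical toy panel: three pairs, each observed in two years.
toy_panel = {"AB": [1.0, 1.5], "AC": [5.0, 5.5], "BC": [9.0, 9.5]}
se_toy = pair_bloc_bootstrap_se(toy_panel, sample_mean)
se_constant = pair_bloc_bootstrap_se({"AB": [2.0, 2.0], "AC": [2.0, 2.0]}, sample_mean)
```

Drawing whole pairs keeps the dependence between the 1991 and 2001 observations of a unit intact in every replication.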
- Table 4 here -

Since the main effects and interactive terms involving the estimated GPS are jointly highly significant, there is a strong indication of selectivity across different levels of the two treatments. Hence, the GPS is relevant and helps reduce the bias of the estimated response of (log) bilateral imports to changes in (log) bilateral migration of the skilled and the unskilled. The parameters summarized in Table 4 lead to the plot of the dose-response function in Figure 3.
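How estimated second-stage parameters translate into a dose-response surface can be sketched in Python as follows. The sketch assumes a second-order polynomial in the two treatments and the GPS, in the spirit of Hirano and Imbens (2004); the exact set of terms in regression (3), the coefficient values, and all names are assumptions for illustration. For each treatment pair on a grid, the GPS is evaluated for every unit at that pair, the fitted regression is predicted, and the predictions are averaged over units.

```python
import math


def gps_density(t1, t2, mu1, mu2, var1, var2, cov12):
    """Bivariate normal density of (t1, t2) around one unit's fitted means."""
    det = var1 * var2 - cov12 ** 2
    e1, e2 = t1 - mu1, t2 - mu2
    quad = (var2 * e1 * e1 - 2.0 * cov12 * e1 * e2 + var1 * e2 * e2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def design_row(t1, t2, g):
    """Assumed regressors: quadratic in treatments and GPS plus interactions."""
    return [1.0, t1, t2, t1 * t1, t2 * t2, t1 * t2, g, g * g, t1 * g, t2 * g]


def dose_response(t1, t2, fitted_means, var1, var2, cov12, beta):
    """Average predicted outcome at treatment pair (t1, t2).

    fitted_means: list of (mu1, mu2) first-stage fitted values, one per unit
    beta:         second-stage coefficients, ordered as in design_row
    """
    total = 0.0
    for mu1, mu2 in fitted_means:
        g = gps_density(t1, t2, mu1, mu2, var1, var2, cov12)
        total += sum(b * v for b, v in zip(beta, design_row(t1, t2, g)))
    return total / len(fitted_means)


# Hypothetical inputs: two units and a coefficient vector chosen for illustration.
units = [(0.0, 0.0), (1.0, 2.0)]
beta_linear = [1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mu_linear = dose_response(3.0, 0.0, units, 1.0, 1.0, 0.0, beta_linear)
```

The key point is that the GPS is re-evaluated at each hypothetical treatment pair for every unit, rather than held at its observed value, before averaging.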
- Figure 3 here -

The plot in Figure 3 contains four areas which indicate positive (in blue) versus negative (in red) responses on the one hand and responses that are statistically significant at 5% (dark) versus insignificant (light) on the other hand.
A key insight from Figure 3 is that trade is not maximized at the diagonal, where skilled and unskilled migration reach similar levels, but at the edges, where the migration stock is dominated either by skilled or by unskilled individuals. Accordingly, our results suggest that bilateral trade flows are stimulated most by homogeneous migrant communities, while a heterogeneous mix of skilled and unskilled migrants yields, ceteris paribus, a lower trade volume. According to the results, there is a statistically significant (at 5%) positive impact of any form of bilateral migration on bilateral trade. However, from Figure 3, we conclude that the trade-maximizing treatment corresponds to a polarization of migration types, irrespective of whether unskilled or skilled migration dominates. Note that the range of observed skilled-unskilled migration combinations, which we mark by black edges in the surfaces in Figures 3 and 4, does not support all of the cells in the figures. However, even though the polarization result is more obvious when predicting out of the support region sample (in cells marked by white edges in the figures), it is also found in the treatment subspace that is supported by the data.
- Figure 4 here -

The polarization result becomes even more evident from inspecting the so-called treatment effect functions plotted in Figure 4. Again, blue areas correspond to positive and red to negative predictions, while dark and light colors mark significant and insignificant predictions, respectively. The treatment effect functions represent the partial derivatives of the dose-response function with respect to the two types of treatment; that is, Panels A and B in Figure 4 each plot the derivative of the conditional expectation of the outcome with respect to one of the two treatments. Starting from approximately the diagonal in the figure, a marginal increase in yields a significantly positive trade effect. In contrast, the marginal effect of unskilled migration is insignificant or even negative if the country pair's point of origin is skewed towards relatively more skilled than unskilled immigrants. The reverse holds true for skilled immigration: a marginal increase in skilled immigration induces trade only for country pairs with relatively more skilled than unskilled immigrants.
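Treatment effect functions of this kind can be approximated numerically from any estimated dose-response surface. The Python sketch below uses central finite differences; the function names, the step size, and the toy surface are assumptions for illustration and do not correspond to the estimates in Table 4.

```python
def treatment_effects(mu, t1, t2, step=1e-4):
    """Central-difference partial derivatives of a dose-response function.

    mu: function of two treatment levels returning the expected outcome
    Returns (d mu / d t1, d mu / d t2) evaluated at (t1, t2).
    """
    d_t1 = (mu(t1 + step, t2) - mu(t1 - step, t2)) / (2.0 * step)
    d_t2 = (mu(t1, t2 + step) - mu(t1, t2 - step)) / (2.0 * step)
    return d_t1, d_t2


def toy_surface(t1, t2):
    """Hypothetical surface that is lowest along the diagonal t1 == t2."""
    return 1.0 + (t1 - t2) ** 2


# Off the diagonal, raising the dominant treatment raises the outcome while
# raising the other treatment lowers it, mimicking a polarization pattern.
effect_t1, effect_t2 = treatment_effects(toy_surface, 3.0, 1.0)
```

For the toy surface, the two derivatives have opposite signs away from the diagonal and both vanish on it.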
Hence, we find evidence for a polarized impact of skill-specific migration on trade at the diagonal of the skilled-unskilled migration space. Moreover, we can reject the null hypothesis of a positive marginal effect of skilled migration in areas where unskilled migration dominates while we can reject it for unskilled migration in areas with predominantly skilled migration.

Balancing property: changing the number of groups and blocs
For an assessment of the validity of the results in the previous subsections, the balancing property of the GPS procedure is of particular relevance. For the problem at stake, it is worthwhile to compare the reduction in unbalancedness between the treatment and control units in terms of the distribution of t-statistics of pairwise comparisons of variables' average values within groups at different numbers of groups and blocs. In the previous subsections, we used a design with 9 groups and 8 blocs.
Here, we compare this design with one of 4 groups and 15 blocs on the one hand and one with 16 groups and 5 blocs on the other hand. The corresponding comparisons can be summarized by way of histograms of t-statistics in Figures 5 and 6.

- Figures 5 and 6 here -

The choice of the number of groups may also affect the results about the dose-response function because the sample of observations satisfying the common support criterion varies with the number of groups. Note that the choice of the number of blocs has no impact on the common support but only determines the precision of the balancing property test. In the benchmark, our common support sample consists of 2,212 observations, while 2,956 and 2,060 observations remain in the sample when choosing 4 and 16 groups, respectively. For this reason, we check whether our results are sensitive to the grouping. Figures 7-10 illustrate the dose-response and treatment effect functions for the 4 and 9 group cases, respectively. The polarized impact of skill-specific migration on trade turns out to be very robust, as is evident from the shape of the functions and the difference in the marginal effects of skilled and unskilled migrants on both sides of the diagonal of the treatment effect functions.

Results for different categories of goods trade
As outlined in Section 2, one question of interest is whether the results for aggregate bilateral trade flows are driven by specific product groups.
Since an investigation at the very disaggregated product level is not feasible for reasons of presentation, we decided to resort to an analysis at the level of aggregates of products. Two widely accepted ways of grouping products relevant to our analysis are the ones proposed by Rauch (1999; the so-called Rauch classification) and by Broda and Weinstein (2006). 20 Lastly, we also show evidence for the effects on imports at the one-digit aggregation of the SITC which discerns ten categories. 21 In either case, we utilize a 9-group and 8-bloc structure as with the benchmark results for aggregate bilateral imports.
- Figures 11 and 12 here -

Rauch (1999) offers two classification schemes, one dubbed liberal and one conservative. Since the results turn out virtually identical for the two schemes, let us focus on the liberal classification here. We consider the level of differentiated and homogeneous bilateral imports as the outcome variables of interest from that classification. From the upper panel in Figure 11, we observe a strong polarization tendency, as the treatment effect function for unskilled migration is unambiguously positive where unskilled migration is relatively higher than skilled migration, and vice versa for the treatment effect function of skilled migration. In contrast, for homogeneous goods the overall relationship is generally weaker and, in particular, there is no significant difference in the marginal treatment effect when crossing the diagonal for most treatment combinations.

20 Rauch's classification distinguishes between differentiated products, homogeneous products, and an intermediate category. Hence, each single observation on aggregate bilateral trade flows may be split into the corresponding three sub-aggregates. Broda and Weinstein distinguish different groups of imports according to their elasticity of substitution.

21 Revision 3 of the Standard International Trade Classification (United Nations) distinguishes between ten groups of products (the so-called one-digit level of disaggregation): Food and live animals (0); Beverages and tobacco (1); Crude materials, inedible, except fuels (2); Mineral fuels, lubricants, and related materials (3); Animal and vegetable oils, fats, and waxes (4); Chemicals and related products, not elsewhere specified (5); Manufactured goods classified chiefly by material (6); Machinery and transport equipment (7); Miscellaneous manufactured articles (8); Commodities and transactions not classified elsewhere in the Standard International Trade Classification (9).
Results for the classification of Broda and Weinstein (2006) confirm these findings. In Figure 12, we report the effects of skilled and unskilled immigration on low and high elasticity of substitution imports. Again, we can see a strong polarization pattern for low elasticity of substitution (differentiated) goods, while the pattern for high elasticity of substitution (homogeneous) goods is weaker.
- Figure 13 here -

We gain further confidence in the above results from an analysis of the effects of skilled and unskilled immigration on the volume of imports in the ten one-digit sectors of the SITC (see Figure 13). Interestingly, there is relatively little evidence of a polarized effect of migration in the sectors with the lower one-digit numbers up to sector 5 (Chemicals and related products, not elsewhere specified) and in sector 9 (Commodities and transactions not classified elsewhere). Notice that those are the types of goods that we might think of as simpler goods from a bird's eye view. However, there is evidence of polarization in sectors 5-8 which, on average, contain more sophisticated and differentiated manufactures. Hence, the polarized effect of migration on aggregate trade is not driven by different qualitative effects of skilled migration on differentiated goods versus unskilled migration on homogeneous goods. Rather, migration displays a polarized effect on aggregate trade mainly through differentiated goods as such, and there is relatively more differentiated goods trade with a polarization of migrants among either the high-skilled or the low-skilled immigrants. All of the above results are robust to using the alternative group-bloc structures as considered above (see the long working paper version of this manuscript for evidence).

Conclusions
This paper assesses the role of skilled versus unskilled migration for bilateral trade in a large data-set of country pairs. A flexible reduced-form model is postulated where stocks of skilled and unskilled migrants at the country-pair level are determined as endogenous continuous treatments. By invoking the conditional mean independence assumption and weak unconfoundedness, the impact of different levels of skilled and unskilled migration on the volume and structure of bilateral trade is assessed in a quasi-experimental design. This is accomplished through a generalization of existing estimation procedures for an assessment of causal effects of univariate continuous treatments on outcome.
In view of theoretical and earlier empirical work on the impact of migration on trade, we specified two hypotheses: that immigrants form stronger networks within the same skill group than across skill groups, so that polarized (predominantly low-skilled or predominantly high-skilled) immigrant networks generate more trade than mixed ones; and that the knowledge-creation-related effect of (polarized) immigrant networks is more important for differentiated goods trade than for homogeneous goods trade. The empirical analysis in this paper provides support for both of those hypotheses. First, we find evidence of a polarized impact of skill-specific migration on trade: highly concentrated skilled or unskilled migrants induce higher trade volumes than a balanced composition of the immigrant base. Second, a polarization of migrants, regardless of whether they are skilled or unskilled, tends to induce more trade in differentiated goods relative to non-differentiated goods. Both bits of evidence are consistent with our interpretation in terms of the role of closure in supporting the operation of immigrant communities in bridging trading communities for goods likely to face contracting problems. That is, as in Burt's work, we find that it is the interaction between brokerage and closure that explains network success.

Notice that equation (5) implies that treatment and outcome are independent of each other conditional on the GPS for multivariate treatments.
The following proves that the GPS estimated on the basis of the observed treatment and covariate combinations can be used to eliminate the bias in the average dose-response function which is due to differences in the covariates across observations. To do so, we estimate the outcome as a function of the (observed) GPS and the treatments , which yields ( , , ) = [ | = , = , = ] where is a vector with elements and . In a second step, we average this conditional expectation over the predicted GPS for a specific level of treatment. That is, we do not average over the GPS = (

Proof 2 We denote by ( , ) ( | , , ) the conditional density of ( , ) being equal to conditional on = , = and ( , , ) = . Then using Bayes' rule and the theorem above (proof 1 allows for the reformulation in the second line below), Accordingly,

Rauch (1999) and Broda and Weinstein (2006).

Notes: We summarize information for those observations that have non-missing, positive levels of , , and as well as for all covariates. When estimating the effects for subgroups of bilateral trade, the respective dependent variable determines the number of observations (see Table 1).

Notes: ***, **, * denote significance levels at 1, 5, and 10%, respectively.

Notes: ***, **, * denote significance levels at 1, 5, and 10%, respectively. and refer to the logarithm of the stock of unskilled and skilled immigrants, respectively, who reside in the importer country and originate from the exporter country.ˆ refers to the generalized propensity score calculated according to equation (1) using the coefficients from the first-stage regression in Table 3. We estimate the standard errors of the dose-response function by bootstrapping with 200 iterations that take into account that the second-stage estimates involve imprecision from first-stage estimates.