The Carbon Kuznets Curve: A Cloudy Picture Emitted by Bad Econometrics?

In recent years many empirical studies of environmental Kuznets curves employing unit root and cointegration techniques have been conducted for both time series and panel data. When using such methods several issues arise: the effects of a short time dimension, in a panel context the effects of cross-sectional dependence, and the presence of nonlinear transformations of integrated variables. We discuss and illustrate how ignoring these problems and applying standard methods leads to questionable results. Using an estimation approach that addresses the second and third problems, we find no evidence for an inverse U-shaped relationship between GDP and CO2 emissions.


Introduction
Besides nuclear energy, hydrocarbon deposits like petroleum, coal and natural gas are currently the only available large scale primary energy sources. Their utilization as fossil fuels leads to the emission of, amongst other pollutants, CO2, which is considered the principal anthropogenic greenhouse gas. Since most economic activities require the use of energy, a link between economic activity and CO2 emissions appears plausible.
Increased atmospheric CO2 concentration can persist for up to thousands of years. It exerts a warming influence on the lower atmosphere and the surface, i.e. it initiates climate change, see Peixoto and Ort (1992) or Ramanathan, Cicerone, Singh, and Kiehl (1985). Rational and efficient climate policy requires reliable understanding and accurate quantification of the link between economic activity and CO2 emissions.
In this paper we are concerned with the econometric analysis of the relationship between GDP and emissions. The core of the econometric approach to studying the link between GDP and CO2 emissions usually consists of estimating a reduced form relationship on cross-section, time series or panel data sets. Estimation techniques as well as the variables chosen vary substantially across studies. Most of the studies focus on a specific conjecture, the so-called 'Environmental Kuznets Curve' (EKC) hypothesis. This hypothesis claims an inverse U-shaped relation between (the logarithm of per capita) GDP and pollutants. In the specific case of CO2 emissions we speak of the 'Carbon Kuznets Curve' (CKC). 1 The EKC hypothesis was initiated by the seminal work of Gene Grossman and Alan Krueger (1991, 1993, 1995). They postulate, estimate and ascertain an inverse U-shaped relationship between measures of several pollutants and per capita GDP. 2 Summary discussions of this empirical literature are contained in Stern (2004) or Yandle, Bjattarai, and Vijayaraghavan (2004), who find more than 100 refereed publications of this type. 3
1 Note that specifications in levels instead of logarithms are also used in the literature.
2 To be precise, Grossman and Krueger actually use a third order polynomial in GDP whereas the quadratic specification seems to have been initiated by Holtz-Eakin and Selden (1995).
3 A prominent alternative approach to study the links between economic activity and environmental damages in general or emissions in particular is given by 'Integrated Assessment Models', pioneered with DICE of Nordhaus (1992) or MERGE by Manne, Mendelsohn, and Richels (1995). This approach consists of specifying and calibrating a general equilibrium model of the world economy. The economic model is then linked with a climate model to integrate the effects of climate change feedbacks into the economic analysis. To a certain extent the econometric and the integrated assessment model approaches can be seen as complements. Unfortunately, only a few authors have tried to combine the two approaches, see McKibbin, Ross, Shackleton, and Wilcoxen (1999) for one example. Müller-Fürstenberger and Wagner (2006) contains a discussion of the relation, or lack thereof, between reduced form econometric findings and relationships derived with structural models.
In the empirical EKC literature there is an ongoing discussion on appropriate specification and estimation strategies; see Dijkgraaf and Vollebergh (2005) for a comparative discussion of econometric techniques applied in the literature. It is the aim of this study to contribute to this discussion by addressing several serious econometric problems that have up to now been either handled inappropriately or largely ignored. We focus on parametric approaches only. For non-parametric EKC approaches see e.g. Millimet, List, and Stengos (2003), for semi-parametric approaches e.g. Bertinelli and Strobl (2005), and for versions based on spline interpolation e.g. Schmalensee, Stoker, and Judson (1998). To illustrate our arguments, we present computations for a panel data set for the Carbon Kuznets Curve comprising 107 countries (see Table 7 in Appendix A) over the period 1986-1998. The discussion is on two related levels. The first level is a fundamental discussion on whether the time series and panel EKC literature is applying the appropriate tools. The second level is the issue whether the tools applied (abstracting from the first-level issue of appropriateness) are applied correctly or with enough care. Of course, those two issues are related and there will be substantial overlap in the two levels of discussion. We turn to both issues below, but can already state the main observation here: the answer is rather negative on both levels.
When using time series or panel data, the issue of stationarity of the variables is of prime importance for econometric analysis, because the properties of many statistical procedures depend crucially upon stationarity or unit root nonstationarity, i.e. integratedness, of the variables used. Related to this issue is the question of spurious regression (see e.g. Phillips, 1986) versus cointegration; see the discussion below.
One part of the literature, in particular the early literature, completely ignores this issue, see e.g. Grossman and Krueger (1991), Grossman and Krueger (1995), Holtz-Eakin and Selden (1995) or Martinez-Zarzoso and Bengochea-Morancho (2004), to name just a few. 4 Another part of the literature mentions the stationarity versus unit root nonstationarity issue; examples include inter alia Perman and Stern (2003) and Stern (2004), and, when also allowing for breaks, Heil and Selden (1999) or Lanne and Liski (2004) (the latter in a time series context). The problem is, however, that three important issues, on both levels of our discussion, have been ignored thus far. On the first level there are two, given that the variables are indeed unit root nonstationary. First, the usual formulation of the EKC involves squares or even third powers of (log) per capita GDP. If (log) per capita GDP is integrated, then nonlinear transformations of it, as well as regressions involving such transformed variables, necessitate a different type of asymptotic theory and also lead to different properties of estimators. Regression theory with nonlinear transformations of integrated variables has only recently been studied in Chang, Park, and Phillips (2001) and Park and Phillips (2001). Currently no extension of these methods to the panel case is available, which poses a fundamental challenge to the empirical EKC literature. 5 To our knowledge this nonlinearity issue has not been discussed at all in the EKC literature. One study avoiding the above problems is Bradford, Fender, Shore, and Wagner (2005). Using the Grossman and Krueger (1995) data, these authors base their results on an alternative specification that comprises, instead of the path of income over time, only the average level and the average growth rate of income. Thus, this study circumvents the problems arising in regressions containing nonlinear transformations of nonstationary regressors.
Second, in the case of nonstationary panel analysis, all the methods used so far in the EKC literature rely upon the assumption of cross-sectional independence. That is, these so-called 'first-generation' methods assume that the individual countries' GDP and emissions series are independent across countries. This rather implausible assumption is required for the first-generation methods so that simple limit arguments (along the cross-section dimension) can be applied. In this respect progress has been made in the theoretical literature, and several panel unit root tests that allow for cross-sectional dependence are available. Several such tests are applied in this study, which seems to be the first application of such 'second-generation' methods in the EKC context. Third, on the second level of discussion the major issue is the following: the 'first-generation' methods used for nonstationary panels are known to perform very poorly for short panels. This stems from the fact that the properties of the panel unit root and cointegration tests crucially depend on the properties of the methods used at the individual country level.
If the panel method is based on pooling, then the very poor properties of time series unit root tests for short time series feed directly into poor properties of pooled panel unit root tests; see Hlouskova and Wagner (2006a) for ample simulation evidence. We show in this paper that by applying bootstrap methods (ignoring, as mentioned above, the more fundamental question of the applicability of such first-generation methods at this point) quite different results can be obtained than with asymptotic critical values. We have implemented three different bootstrap algorithms that are briefly described in Appendix B. These are the so-called parametric, the non-parametric and the residual based block (RBB) bootstrap. The RBB bootstrap has been developed for non-stationary time series by Paparoditis and Politis (2003). The first two methods obtain white noise bootstrap replications of residuals by pre-whitening, whereas the last one is based on re-sampling blocks of residuals to preserve the serial correlation structure.
5 To be precise: we do not claim that, e.g., estimation of a quadratic CKC with integrated regressors by some panel cointegration estimator is inconsistent. We just want to highlight that the (linear cointegration) methods are not designed for such problems and that nonlinear transformations of integrated variables have fundamentally different asymptotic behavior than integrated processes. These two aspects imply that it is up to now unclear what such results could mean, or which properties such results have.
The difference between the parametric and the non-parametric bootstrap is essentially that in the former the residuals are drawn from a normal distribution while in the latter they are re-sampled from the residuals.
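The three residual resampling schemes can be illustrated with a small sketch in Python's standard library. It assumes the residuals have already been obtained (after pre-whitening for the first two schemes); the function names are hypothetical, the block length is a free tuning parameter, and the block scheme is a simplified moving-block resampler rather than the full RBB algorithm of Paparoditis and Politis (2003) described in Appendix B.

```python
import random
import statistics

def parametric_draws(resid, rng):
    """Parametric scheme: i.i.d. normal draws with the residuals' standard deviation."""
    sd = statistics.pstdev(resid)
    return [rng.gauss(0.0, sd) for _ in resid]

def nonparametric_draws(resid, rng):
    """Non-parametric scheme: resample the centered residuals with replacement."""
    m = statistics.fmean(resid)
    centered = [r - m for r in resid]
    return [rng.choice(centered) for _ in resid]

def block_draws(resid, block_len, rng):
    """Simplified moving-block scheme: glue together randomly chosen blocks of
    consecutive centered residuals, so serial correlation within a block is kept."""
    m = statistics.fmean(resid)
    centered = [r - m for r in resid]
    n = len(centered)
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(centered[start:start + block_len])
    return out[:n]

rng = random.Random(0)
resid = [0.3, -0.1, 0.4, -0.6, 0.2, 0.1, -0.3, 0.0, 0.5, -0.5, 0.2, -0.2]
print(parametric_draws(resid, rng)[:3])
print(nonparametric_draws(resid, rng)[:3])
print(block_draws(resid, 4, rng)[:4])
```

Only the block scheme keeps neighbouring residuals together; the other two destroy any remaining serial correlation, which is why they rely on a preceding pre-whitening step.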
It seems that the uncritical use of asymptotic critical values might be a main problem at the second level of discussion we intend to initiate with this paper. Even stronger, we find that one can support any desired result concerning unit root and cointegration behavior by choosing the test (and to a certain extent the bootstrap algorithm) 'strategically'. Furthermore and related to the above, standard panel cointegration estimation results of the CKC differ widely across methods. These findings cast serious doubt on the results reported so far in the literature -even when ignoring the two first level problems (nonlinear transformations, cross-sectional correlations). We include this type of discussion to show that, even when ignoring the first level problems and staying within the standard framework applied up to now, the empirical (panel and time series) EKC literature is an area where best econometric practice is generally not observed.
The paper is organized as follows: In Section 2 we briefly discuss the specification of the CKC and set the stage for the subsequent econometric analysis. In Section 3 we discuss first- and second-generation panel unit root test results, and in Section 4 we discuss panel cointegration test results. Section 5 presents the results of CKC estimates based on panel cointegration methods and based on de-factorized data. Section 6 briefly summarizes and concludes. Two appendices follow the main text. In Appendix A we describe the data and their sources. Appendix B briefly describes the implemented bootstrap procedures.

The Carbon Kuznets Curve
In our parametric CKC specification we focus on the logarithms of both per capita GDP, denoted by $y_{it}$, and per capita CO2 emissions, denoted by $e_{it}$. 6 Here and throughout the paper $i = 1, \ldots, N$ indicates the country and $t = 1, \ldots, T$ is the time index. Qualitatively similar results have also been obtained when using levels instead of logarithms.
Our sample encompasses 107 countries, listed in Table 7 in Appendix A, over the years 1986-1998. The major region omitted is the former Soviet Union and some other formerly centrally planned economies. We also exclude countries with implausibly huge jumps in emissions or GDP, as is the case, for example, for Kuwait. 7 The basic formulation of the CKC in logarithms that we focus on is given by
$$e_{it} = \alpha_i + \gamma_i t + \beta_1 y_{it} + \beta_2 y_{it}^2 + u_{it}, \qquad (1)$$
with $u_{it}$ denoting the stochastic error term, for which, depending upon the test or estimation method applied, different assumptions concerning serial correlation have to be made. In this formulation we include in general both fixed effects, $\alpha_i$, and country specific linear trends, $\gamma_i t$. These linear trends are included to allow for exogenous decarbonization of GDP due to technical progress and structural change. We have also experimented with specifications that include time specific fixed effects, but these do not qualitatively change the results. Thus, we focus in this paper on specifications including fixed effects or fixed effects and trends, since these are the two common specifications of deterministic components in unit root and cointegration analysis. The above formulation of the CKC imposes a strong homogeneity assumption.
The functional form is assumed to be identical across countries, since the coefficients $\beta_1$ and $\beta_2$ are restricted to be identical across countries. Heterogeneity across countries is only allowed via the fixed effects and linear trends. Different $\alpha_i$ shift the overall level of the relationship, and different trend slopes $\gamma_i$ shift the quadratic relationship differently across countries over time. This, of course, might be too restrictive for a large panel with very heterogeneous countries. See e.g. Dijkgraaf and Vollebergh (2005) for a discussion (and rejection) of homogeneity for a panel of 24 OECD countries.
Equation (1) allows us to discuss one major overlooked problem related to the potential nonstationarity of emissions and/or GDP, namely that of nonlinear transformations of integrated regressors. The macro-econometric literature has gathered a lot of evidence that GDP series in particular are very likely integrated. A stochastic process, $x_t$ say, is called integrated if its first difference, $\Delta x_t = x_t - x_{t-1}$, is stationary, but $x_t$ is not. Let $\varepsilon_t$ denote a white noise process.
Then the simplest integrated process is the random walk, i.e. accumulated white noise, $x_t = \sum_{j=1}^{t} \varepsilon_j$. 8 For its square we obtain
$$\Delta x_t^2 = x_t^2 - x_{t-1}^2 = (x_{t-1} + \varepsilon_t)^2 - x_{t-1}^2 = 2 x_{t-1} \varepsilon_t + \varepsilon_t^2 .$$
If $\varepsilon_t$ is i.i.d. with variance $\sigma^2$, the term $2 x_{t-1} \varepsilon_t$ has variance $4(t-1)\sigma^4$, which grows with $t$. Thus, as expected, the first difference of the square of an integrated process is not stationary. The relationship to the CKC is clear: both the logarithm of per capita GDP and its square are contained as regressors. However, at most one of them can be an integrated process. This fact has been overlooked in the CKC literature up to now. 9 The above problem is fundamental, and no estimation techniques for panel regressions with nonlinear transformations of integrated processes are available. Only recently has there been a series of papers by Peter Phillips and coauthors that addresses this problem for time series observations. This literature shows that the asymptotic theory required, as well as the asymptotic properties obtained, generally differ fundamentally from the standard integrated case. 10 Nevertheless, we present in the sequel unit root and cointegration tests with the quadratic specification (1), to show that the cointegration techniques have probably not been applied with enough care. We perform bootstrap inference for unit root and cointegration tests to show that the asymptotic critical values are bad approximations to the finite sample critical values. Thus, we argue that even a researcher unaware of the first-level problems, but applying the standard toolkit more critically and in good faith, would be led to be more cautious about the results.
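The growing variance of the differenced square of a random walk can be checked with a small Monte Carlo experiment. The sketch below assumes Gaussian white noise with unit variance and a zero starting value; under these assumptions the theoretical variance of $x_t^2 - x_{t-1}^2$ is $4(t-1) + 2$ (the $+2$ is the variance of $\varepsilon_t^2$), so estimates at $t = 10$ and $t = 200$ should differ by roughly a factor of twenty. The function name is hypothetical.

```python
import random
import statistics

def var_of_diff_square(t_points, reps=4000, seed=0):
    """Monte Carlo variance of x_t^2 - x_{t-1}^2 at the dates in t_points,
    for a Gaussian random walk x_t = x_{t-1} + eps_t with x_0 = 0."""
    rng = random.Random(seed)
    draws = {t: [] for t in t_points}
    for _ in range(reps):
        x = 0.0
        for t in range(1, max(t_points) + 1):
            prev = x
            x += rng.gauss(0.0, 1.0)
            if t in draws:
                draws[t].append(x * x - prev * prev)
    return {t: statistics.variance(v) for t, v in draws.items()}

# Theory (eps_t ~ N(0, 1)): Var(x_t^2 - x_{t-1}^2) = 4(t - 1) + 2,
# i.e. 38 at t = 10 and 798 at t = 200.
est = var_of_diff_square([10, 200])
print(est)
```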
As a benchmark case, where we avoid the issue of nonlinear transformations of integrated regressors, we also include the linear specification
$$e_{it} = \alpha_i + \gamma_i t + \beta y_{it} + u_{it} \qquad (2)$$
in our analysis. It is only for this linear case that the panel unit root and cointegration tests can be applied with a sound theoretical basis, given that log per capita GDP is indeed integrated.
8 Here and throughout we ignore issues related to starting values as they are inessential to our discussion.
9 Several authors, e.g. Perman and Stern (2003), even present unit root test results on log per capita GDP and its square. Furthermore, they even present 'cointegration' estimates of the EKC. This does not have a sound econometric basis. Consistent estimation techniques for this type of estimation problem have to be established first.
10 Relevant papers are Park and Phillips (1999), Chang, Park, and Phillips (2001) and Park and Phillips (2001). Current research is concerned with an application of these theoretical results to the EKC/CKC hypothesis.
The second first-level issue is that all the EKC papers that use panel unit root or cointegration techniques apply only so-called 'first-generation' methods. These methods require that the regressors and the errors in the individual equations are independent across countries. In this paper we present the first application of 'second-generation' panel unit root tests that allow for cross-sectional dependence. Indeed, strong evidence for cross-sectional dependence is found, as discussed in Section 3.2. In the following sections, to parallel the historical development of methods, we nevertheless start by reporting the results obtained by bootstrapping first-generation methods. All results, and in particular the first-generation results, have to be seen in the light of the critical issues this paper is concerned with.

Panel Unit Root Tests
The time dimension of the sample, with only 13 years, necessitates the application of panel unit root tests. The section is split into two subsections. In subsection 3.1 we discuss first-generation tests that rely upon the assumption of cross-sectional independence. So far, only this type of test has been used in the EKC literature. In particular we show that a straightforward application of such tests can be misleading, since the finite sample distribution of the test statistics can differ substantially from the asymptotic distribution. This implies that inference based on the asymptotic critical values can be misleading, see Hlouskova and Wagner (2006a) for large scale simulation evidence in this respect. Panel unit root tests should therefore only be applied with great care.
In subsection 3.2 we report results obtained by applying second-generation panel unit root tests. We find strong evidence for cross-sectional correlation. Of course, these second-generation methods should be applied first, and only when no cross-sectional correlation is found can one resort to first-generation methods. We reverse this logical sequence to show that, conditionally upon staying in the first-generation framework, much more care than is common in the literature should be taken.

First Generation Tests
Let $x_{it}$ denote the variable we want to test for a unit root, i.e. we want to test the null hypothesis $H_0: \rho_i = 1$ for $i = 1, \ldots, N$ in
$$x_{it} = \alpha_i + \gamma_i t + \rho_i x_{i,t-1} + u_{it},$$
with $\gamma_i = 0$ when only fixed effects are included, where the $u_{it}$ are stationary processes assumed to be cross-sectionally independent. 11 The tests applied differ with respect to the alternative hypothesis. The first alternative is the homogeneous alternative $H_1^1: \rho_i = \rho$ with $-1 < \rho < 1$ for $i = 1, \ldots, N$. The heterogeneous alternative is given by $H_1^2: \rho_i < 1$ for $i = 1, \ldots, N_1$ and $\rho_i = 1$ for $i = N_1 + 1, \ldots, N$. 12 Especially for heterogeneous panels the alternative $H_1^2$ might be the more relevant one. However, in the literature both alternatives have been used. In our data set we observe no systematic differences in the results between tests with the homogeneous and the heterogeneous alternative; see the results below and in Table 1.
In general, some correction for serial correlation in $u_{it}$ will be necessary. Two main approaches are followed in the tests: either a non-parametric correction in the spirit of Phillips and Perron (1988) or a parametric correction following the augmented Dickey-Fuller (ADF) principle. The ADF correction adds lagged differences of the variable ($\Delta x_{i,t-j}$) to the regression to achieve serially uncorrelated errors.
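The ADF correction for a single country can be sketched as an OLS regression of $\Delta x_t$ on a constant (the fixed effect), an optional linear trend, the lagged level and a fixed number of lagged differences; the unit root test uses the t-statistic on the lagged level. This is a minimal illustration assuming numpy and a lag order chosen in advance (lag selection, e.g. by an information criterion, is not shown); the function name is hypothetical. The statistic has a non-standard Dickey-Fuller null distribution, so normal critical values do not apply.

```python
import numpy as np

def adf_tstat(x, lags=1, trend=False):
    """t-statistic on phi in the ADF regression
    dx_t = c [+ g*t] + phi * x_{t-1} + sum_{j=1}^{lags} d_j * dx_{t-j} + error."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    y = dx[lags:]                          # dependent variable dx_t
    n = y.size
    cols = [np.ones(n), x[lags:-1]]        # constant and lagged level x_{t-1}
    if trend:
        cols.append(np.arange(n, dtype=float))
    for j in range(1, lags + 1):
        cols.append(dx[lags - j:-j])       # lagged difference dx_{t-j}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - X.shape[1])  # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)
print(adf_tstat(np.cumsum(rng.standard_normal(13)), lags=1))
```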
The following tests have been implemented: 13 The first is the test of Levin, Lin, and Chu (2002) (LL), which, after suitable first-step corrections, is a pooled ADF test. The second is the test of Breitung (2000) (UB), which is a pooled ADF-type test based on a simple bias correction. These two tests, due to their pooled estimation of $\rho$, test against the homogeneous alternative. We have implemented three tests with the heterogeneous alternative. Two of them were developed by Im, Pesaran, and Shin (1997, 2003). One is essentially the group mean of individual ADF t-statistics (IPS), and the other is a group-mean LM statistic (IPS-LM). Finally, we present one test based on the Fisher (1932) test principle. The idea of Fisher is to use the fact that under the null hypothesis the p-values of a continuous test statistic are uniformly distributed over the unit interval. Then minus two times the logarithm of a p-value is distributed as $\chi^2_2$. This implies that the sum of $N$ independent transformed p-values is distributed as $\chi^2_{2N}$. 14 We follow the work of Maddala and Wu (1999) (MW) and implement this idea by using the ADF test for each cross-sectional unit.
11 Note that time specific effects $\theta_t$ can also be included.
12 With $\lim_{N \to \infty} N_1 / N > 0$.
13 We abstain here from a discussion of the limit theory underlying the asymptotic results. Most of the results are based on sequential limit theory, where first $T \to \infty$, followed by $N \to \infty$.
We furthermore report the Harris and Tzavalis (1999) test results. Their test is identical to the Levin, Lin, and Chu (2002) test, except that Harris and Tzavalis derive the exact finite T test distribution. This may be advantageous for our short panel. The exact test distribution comes, however, at a high price: Harris and Tzavalis derive their results only for the case when $u_{it}$ is white noise. All tests except for MW, which is $\chi^2_{2N}$ distributed, are asymptotically standard normally distributed. We perform tests with both the homogeneous and the heterogeneous alternative to see whether there are big differences in test behavior between these two types of tests. This, however, does not appear to be the case.
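The Fisher combination used in the Maddala and Wu test reduces to a few lines once the individual p-values are available. The sketch below assumes the country-level p-values have already been computed (e.g. from ADF tests) and are strictly positive; the function names are hypothetical. Instead of an exact $\chi^2$ quantile it uses the Wilson-Hilferty normal approximation, which needs only Python's standard library. For $N = 107$, i.e. $2N = 214$ degrees of freedom, it gives about 249.13, close to the exact 5% value of 249.128 quoted for the Maddala and Wu test.

```python
import math
from statistics import NormalDist

def fisher_stat(pvalues):
    """Fisher / Maddala-Wu combination: -2 * sum(log p_i).
    Chi-squared with 2N degrees of freedom under H0 if the N p-values
    are independent and uniform on (0, 1)."""
    return -2.0 * sum(math.log(p) for p in pvalues)

def chi2_crit_wilson_hilferty(df, level=0.05):
    """Approximate upper `level` critical value of chi-squared(df)
    via the Wilson-Hilferty cube-root normal approximation."""
    z = NormalDist().inv_cdf(1.0 - level)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3

N = 107
print(round(chi2_crit_wilson_hilferty(2 * N), 2))
```

The test rejects the joint unit root null when the combined statistic exceeds this critical value; because the p-values enter independently, the approximation inherits the cross-sectional independence assumption of the first-generation framework.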
As mentioned already, it is known that for panels of the size available in this study (with T equal to only 13), the asymptotic distributions of panel unit root and panel cointegration tests provide poor approximations to the small sample distributions (see e.g. Hlouskova and Wagner, 2006a). Hence, the notorious size and power problems for which unit root tests are known in the time series context also appear in short panels. In Figure 1 we display the asymptotic null distribution (the standard normal distribution) and the bootstrap null distributions (from the non-parametric bootstrap) for the five asymptotically standard normally distributed tests, when testing for a unit root in CO2 emissions with only fixed effects included in the test specification. The figure shows substantial differences between the bootstrap approximations to the finite sample distributions of the tests and their asymptotic distribution. Thus, basing inference on the asymptotic critical values can lead to substantial size distortions. The discrepancy between the asymptotic and the bootstrap critical values can also be seen in Table 1, where the 5% bootstrap critical values are displayed in brackets. They vary substantially both across tests and across the two variables. In most cases they are far away from the asymptotic critical values of ±1.645, or 249.128 for the Maddala and Wu test.
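The idea behind bootstrap critical values can be sketched for a simple group-mean Dickey-Fuller statistic (an unstandardized IPS-type t-bar without lag augmentation). Under the unit root null, each country's centered first differences are resampled with replacement and cumulated into a random walk, the statistic is recomputed, and its 5% quantile serves as critical value. This is a simplified i.i.d. version of the non-parametric bootstrap, assuming numpy, no serial correlation (so no pre-whitening step) and independent resampling across countries, which itself imposes the first-generation independence assumption; the function names are hypothetical.

```python
import numpy as np

def df_tstat(x):
    """Dickey-Fuller t-statistic with a constant (no lag augmentation)."""
    dx = np.diff(x)
    X = np.column_stack([np.ones(dx.size), x[:-1]])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    s2 = resid @ resid / (dx.size - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

def tbar(panel):
    """Group mean of the country-level DF t-statistics (rows = countries)."""
    return float(np.mean([df_tstat(row) for row in panel]))

def bootstrap_tbar_crit(panel, reps=499, level=0.05, seed=0):
    """Bootstrap `level` quantile of t-bar under the unit root null."""
    panel = np.asarray(panel, dtype=float)
    rng = np.random.default_rng(seed)
    diffs = np.diff(panel, axis=1)
    diffs = diffs - diffs.mean(axis=1, keepdims=True)   # impose zero drift
    stats = np.empty(reps)
    for b in range(reps):
        idx = rng.integers(0, diffs.shape[1], size=diffs.shape)
        boot_d = np.take_along_axis(diffs, idx, axis=1)   # resample within country
        start = panel[:, :1]
        boot = np.concatenate([start, start + np.cumsum(boot_d, axis=1)], axis=1)
        stats[b] = tbar(boot)
    return float(np.quantile(stats, level))

rng = np.random.default_rng(1)
panel = np.cumsum(rng.standard_normal((107, 13)), axis=1)  # synthetic N = 107, T = 13
print(tbar(panel), bootstrap_tbar_crit(panel, reps=199))
```

The null is rejected when the observed t-bar falls below the bootstrap quantile; with real data the synthetic panel would be replaced by the observed $N \times T$ array.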
It is customary practice in unit root testing to test in specifications with and without linear trends included. A linear trend in the test equation, when there is no trend in the data generating process, reduces the power of the tests. Conversely, omitting a trend when there is a trend in the data induces a bias in the tests towards the null hypothesis. Graphical inspection of the data leads us to conclude that for CO2 emissions the specification without trend might be sufficient, whereas for GDP the specification with trend might be more appropriate. The nature of the trend component of GDP is a widely discussed topic in macro-econometrics.
Both unit root nonstationarity, with its underlying stochastic trend, and trend-stationarity, usually with a linear deterministic trend, are plausible and widely used specifications. This uncertainty concerning the trend specification for GDP manifests itself also in our panel test results, see below. For completeness we report both types of results for both variables. The first block in Table 1

Second Generation Tests
In this subsection we now discuss the results obtained with several second generation panel unit root tests that allow for cross-sectional correlation. 15 Since there is no natural ordering in the cross-sectional dimension as compared to the time dimension, the first issue is to find tractable specifications of models for cross-sectional dependence in non-stationary panels.
There are two main strands that have been followed in the literature: one is a factor model approach, the other is based, more classically for the panel literature, on error components models.
Let us turn to the idea of the factor model approach first. In this set-up the cross-sectional correlation is due to common factors that are loaded in all the individual country variables, e.g.
$$x_{it} = \lambda_i' F_t + v_{it},$$
with idiosyncratic component $v_{it}$. Here $F_t \in \mathbb{R}^k$ are the common factors and $\lambda_i \in \mathbb{R}^k$ are the so-called factor loadings. In general the factors can be either stationary or integrated. After de-factoring the data, i.e. subtracting the factor component contained in the variables in each country, panel unit root tests (of the first-generation type) can be applied to the asymptotically cross-sectionally uncorrelated de-factored data.
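A minimal sketch of the de-factoring step, loosely following the PANIC idea of Bai and Ng (2004) in the fixed-effects case: the data are differenced (which removes the fixed effects), $k$ factors are extracted from the differences by principal components via a singular value decomposition, and the idiosyncratic residuals are re-cumulated. It assumes numpy, a balanced panel stored as an $N \times T$ array and a known number of factors $k$ (in the paper $k$ is estimated with an information criterion, which is not reproduced here); the function name is hypothetical and this is not the authors' code.

```python
import numpy as np

def defactor_panic(x, k):
    """Simplified PANIC-style de-factoring (fixed-effects case).
    x: array of shape (N, T), countries in rows; k: number of factors.
    Returns the cumulated idiosyncratic components, shape (N, T - 1)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x, axis=1)                  # differencing removes the fixed effects
    u, s, vt = np.linalg.svd(dx, full_matrices=False)
    loadings = u[:, :k] * s[:k]              # N x k loadings (identified up to rotation)
    dfactors = vt[:k, :]                     # k x (T - 1) differenced factors
    dresid = dx - loadings @ dfactors        # differenced idiosyncratic part
    return np.cumsum(dresid, axis=1)         # re-cumulate, as in PANIC

# Synthetic example: one integrated common factor plus small idiosyncratic noise.
rng = np.random.default_rng(3)
N, T = 30, 50
f = np.cumsum(rng.standard_normal(T))
lam = rng.standard_normal(N)
x = lam[:, None] * f[None, :] + 0.1 * rng.standard_normal((N, T))
res = defactor_panic(x, k=1)
print(res.shape, float(np.var(res)))
```

First-generation unit root tests would then be applied to the rows of the returned array, while the estimated factors are examined separately for common trends.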
The most general approach in this spirit is due to Bai and Ng (2004). They provide estimation criteria for the number of factors, as well as, in the case of more than one common factor, tests for the number of common trends in the factors. 16 Thus, the factors are allowed to be stationary or integrated of order 1. After subtracting the estimated factor component, Bai and Ng (2004) propose Fisher-type panel unit root tests in the spirit of Maddala and Wu (1999) and Choi (2001). The first, $BN_{\chi^2}$, is asymptotically $\chi^2$ distributed, and the second, $BN_N$, is asymptotically standard normally distributed. Both tests are specified against the heterogeneous alternative. See the results in Table 2. The number of common factors is estimated to be three for CO2 and four for GDP. These estimation results are based on the information criterion $BIC_3$; see Bai and Ng (2004) for details. The two tests for common trends within the common factors, $CT$ and $CT_{AR}$, indicate three common trends, except for GDP when both fixed effects and individual trends are included (where four common trends are found). 17 Thus, essentially all common factors seem to be nonstationary.
Let us next turn to the unit root tests on the de-factored data (only implemented for the fixed effects specification). Somewhat surprisingly, the null hypothesis is not rejected for CO2 emissions, but is clearly rejected for GDP by both tests. Thus, it seems that some nonstationary idiosyncratic component is present in the CO2 emissions series.
16 Testing for common trends can be seen as the multivariate analogue to testing for unit roots. In the case of a single common factor, a unit root test for this common factor is sufficient, of course.
17 The two tests for the number of common trends differ in the treatment of serial correlation. In $CT$ a non-parametric correction is performed, whereas $CT_{AR}$ is based on a vector autoregressive model fitted to the …
Bai and Ng (2004) present the most general factor model approach to non-stationary panels currently available, and the only one that also allows for testing the stochastic properties of the common factors. For completeness we also report the results obtained with two more restricted factor model approaches, due to Moon and Perron (2004) and Pesaran (2003). Moon and Perron (2004) present pooled t-type test statistics based on de-factored data (where we use the factors estimated according to Bai and Ng). We report two asymptotically standard normally distributed tests with serial correlation correction in the spirit of Phillips and Perron (1988), denoted by $MP_a$ and $MP_b$. Pesaran (2003) provides an extension of the Im, Pesaran, and Shin (2003) test that allows for one factor with heterogeneous loadings. His procedure, a suitably cross-sectionally augmented IPS Dickey-Fuller type test, works by adding cross-section averages of the level and of lagged differences to the IPS-type regression. Pesaran … The results from these factor model approaches are contained in the upper block of … show that the asymptotic behavior established in Chang (2002) holds only for $N \ln T / \sqrt{T} \to 0$, which requires $N$ to be very small compared to $T$. This is of course not the case in our data set, with $N = 107$ countries and $T = 13$ years.
Thus, the results of the Chang NL-IV tests should be interpreted very carefully.

Conclusions from Panel Unit Root Analysis
There seems to be evidence for cross-sectional correlation for both variables. The results obtained with the method of Bai and Ng (2004) indicate the presence of three to four integrated common factors. The general conclusion from the second generation tests, except for the Chang tests, is that after subtracting the common factors, the idiosyncratic components may well be stationary. The evidence in that direction is stronger for GDP than for CO 2 emissions.

Panel Cointegration Tests
In this section we present panel cointegration tests for cross-sectionally uncorrelated panels.
We do this to show, similarly to the panel unit root tests, that a more careful application of these methods would lead researchers to be skeptical about the validity of their results. This second-level discussion is, of course, overshadowed by the two first-level problems.
We test for the null of no cointegration in both the linear (2) and the quadratic (1) specification of the relationship between the logarithm of per capita CO2 emissions and the logarithm of per capita GDP. We test the quadratic version solely to show that a careful statistical analysis with the available (but inappropriate) tools of panel cointegration would already lead to ambiguous results. In particular we show that the test results depend highly upon the test applied and upon whether the asymptotic or some bootstrap critical values are chosen. These observations, which can be made by just using standard methods, should lead the researcher to draw only very cautious conclusions. Of course, we know from the discussion in Section 2 that cointegration in the usual sense is not defined for equation (1).
This observation has been ignored in the empirical literature, and several published papers, e.g. Perman and Stern (2003), discuss 'cointegration' in the quadratic specification based on unit root testing for emissions, GDP and the square of GDP.
We have performed ten cointegration tests in total, seven of them developed in Pedroni (2004) and three in Kao (1999).
If both log emissions and log GDP are integrated, the possibility of cointegration between the two variables arises. Cointegration means that there exists a linear combination of the variables that is stationary. Thus, the null hypothesis of no cointegration in the cointegrating regression is equivalent to the hypothesis of a unit root in its residuals, $\hat{u}_{it}$ say. The usual specifications of the deterministic components have been implemented: in Table 4 we report test results when including only fixed effects and when including fixed effects and individual specific trends.
Pedroni (2004) develops four pooled tests and three group-mean tests. Three of the four pooled tests are based on a first order autoregression and correction factors in the spirit of Phillips and Ouliaris (1990). These are a variance-ratio statistic, $P_P^{\sigma}$; a test statistic based on the estimated first-order correlation coefficient, $P_P^{\rho}$; and a test based on the t-value of the correlation coefficient, $P_P^{t}$. The fourth test is based on an augmented Dickey-Fuller type test statistic, $P_P^{df}$, in which the correction for serial correlation is achieved by augmenting the test equation with lagged differenced residuals of the cointegrating regression. Thus, this test is a panel cointegration analogue of the panel unit root test of Levin, Lin, and Chu (2002).
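The equivalence between no cointegration and a unit root in the residuals can be illustrated for a single cross-section unit. The following minimal sketch (standard-library Python) estimates the cointegrating regression by OLS and then computes the t-value of the first-order coefficient in a Dickey-Fuller type regression of the differenced residuals on the lagged residuals. It is an Engle-Granger type building block, not Pedroni's statistic: there is no serial-correlation correction and no centering or scaling, and the function names and simulated data are hypothetical.

```python
import math
import random


def ols_simple(x, y):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_df_t(x, y):
    """Residual-based statistic for one unit.

    Step 1: cointegrating regression y_t = a + b x_t + u_t by OLS.
    Step 2: d(u_t) = phi * u_{t-1} + e_t; return the t-value of phi (= rho - 1).
    Strongly negative values speak against the null of no cointegration."""
    a, b = ols_simple(x, y)
    u = [yi - a - b * xi for xi, yi in zip(x, y)]
    lag = u[:-1]
    du = [u[t] - u[t - 1] for t in range(1, len(u))]
    sll = sum(v * v for v in lag)
    phi = sum(d * v for d, v in zip(du, lag)) / sll
    s2 = sum((d - phi * v) ** 2 for d, v in zip(du, lag)) / (len(du) - 1)
    return phi / math.sqrt(s2 / sll)


def random_walk(T, rng):
    """Gaussian random walk of length T starting from 0."""
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


rng = random.Random(1)
x = random_walk(500, rng)
y_coint = [0.7 * xi + rng.gauss(0.0, 1.0) for xi in x]  # cointegrated with x
y_spur = random_walk(500, rng)                           # independent of x
print(residual_df_t(x, y_coint), residual_df_t(x, y_spur))
```

In the cointegrated case the statistic is far in the left tail; for two independent random walks it stays in the range typical under the null, whose critical values are not standard normal.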
For these four tests the alternative hypothesis is stationarity with a homogeneity restriction: the first-order autoregressive coefficient of the residuals is the same in all cross-section units.
To allow for a less restrictive alternative, Pedroni (2004) develops three group-mean tests. For these tests the alternative allows for completely heterogeneous correlation patterns across the cross-section members. Pedroni discusses the group-mean analogues of all but the variance-ratio statistic. Analogously to the pooled tests, we denote them by $P_G^{\rho}$, $P_G^{t}$ and $P_G^{df}$. We report both the pooled and the group-mean test results to see whether the test behavior differs systematically between these two types of tests.
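The difference between the pooled and the group-mean construction can be illustrated with their unstandardized building blocks. The sketch below (standard-library Python, hypothetical function names) takes per-unit residual series from the cointegrating regressions, computes one common first-order coefficient from sums pooled across units, and separately averages the unit-by-unit t-values. It omits Pedroni's serial-correlation corrections and the centering and scaling that deliver asymptotic normality.

```python
import math


def df_sums(u):
    """Lagged levels, differences and the two sums used in d(u_t) = phi * u_{t-1} + e_t."""
    lag = u[:-1]
    du = [u[t] - u[t - 1] for t in range(1, len(u))]
    return lag, du, sum(v * v for v in lag), sum(d * v for d, v in zip(du, lag))


def pooled_and_group_mean(residuals):
    """residuals: list of per-unit residual series from the cointegrating regressions.

    Pooled: one phi estimated from sums over all units, matching an alternative
    with a common autoregressive coefficient.
    Group-mean: the average of the unit-specific t-values, matching an alternative
    with heterogeneous coefficients."""
    num = den = 0.0
    t_values = []
    for u in residuals:
        lag, du, sll, sdl = df_sums(u)
        num += sdl
        den += sll
        phi_i = sdl / sll
        s2 = sum((d - phi_i * v) ** 2 for d, v in zip(du, lag)) / (len(du) - 1)
        t_values.append(phi_i / math.sqrt(s2 / sll))
    return num / den, sum(t_values) / len(t_values)
```

The pooled coefficient weights each unit by the sum of its squared lagged residuals, whereas the group-mean statistic gives every unit the same weight.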
After centering and scaling by suitable correction factors, which correct for serial correlation of the residuals and for potential endogeneity of the regressors in the cointegrating regression, all test statistics are asymptotically standard normally distributed. Figures similar to Figure 1 are available from the authors upon request. Again, substantial differences between the asymptotic and the bootstrap critical values emerge.
The first block in Table 4 corresponds to the parametric bootstrap, the second to the non-parametric bootstrap and the third to the RBB bootstrap. Within each block, the first block-row corresponds to the linear specification and the second to the quadratic specification.
Both the linear and the quadratic specifications have been tested with fixed effects and with fixed effects plus individual specific linear trends. Note again that testing for cointegration in the quadratic formulation lacks a theoretical econometric foundation.
Let us start with the linear specification, which is 'only' subject to the first level problem of cross-sectional correlation. There is some variability of results across bootstrap methods and again in a variety of cases bootstrap inference leads to different conclusions than resorting to the asymptotic critical values. This happens in particular for the RBB bootstrap. For the quadratic specification, i.e. the Kuznets curve in its usual formulation, roughly the same observations as for the linear specification can be made, ignoring again the problem that a correct econometric foundation is lacking due to the nonlinear transformation. Again the RBB bootstrap leads to the fewest rejections of the null hypothesis. The null hypothesis of no cointegration is more often rejected for the linear formulation than for the quadratic specification. Note that no systematic differences between the pooled and the group-mean tests occur.
The above results provide some weak evidence for the presence of a cointegrating relationship between GDP and emissions. However, as for the panel unit root tests, any 'conclusion' can be supported by choosing the test and the bootstrap strategically. This 'volatility' of the results should lead researchers to be more cautious than is usually the case.
The autoregressive lag lengths in the autoregression-based tests, the parametric bootstrap and the non-parametric bootstrap are equal to 1. The window length of the Bartlett kernels used in the non-parametric tests is also equal to 1. The block length in the RBB bootstrap is equal to 2.

Estimation of the Carbon Kuznets Curve with Panel Cointegration Methods and Using De-factored Observations
We finally turn to estimating the CKC relationship. In the first subsection we estimate the CKC with panel cointegration methods that correspond to the first generation panel unit root and cointegration tests. These methods are, of course, subject to the two first level critiques. As for the panel unit root and cointegration tests, we include results based on these methods to show that, even when one stays within this framework and applies it carefully, the conclusions one can draw are very weak. In the second subsection we estimate the CKC relationship on de-factored data. Apart from the potentially poor small sample performance of the Bai and Ng (2004) procedure, these data are stationary, so standard panel regression techniques are applicable to them. Note also that the de-factored data are (asymptotically) cross-sectionally uncorrelated.

Panel Cointegration Estimation
Two types of estimators for the cointegrating relationship in panels are applied: fully modified OLS (FM-OLS) and dynamic OLS (D-OLS). […] and Moon (1999), nesting the discussions in Pedroni (2000) and Kao and Chiang (2000). As in the time series case, the idea of FM-OLS is to estimate, in a first step, long-run variance matrices based on OLS residuals. In the second step another regression is run on corrected variables, with the correction factors being functions of the estimated long-run variance matrices. The idea of D-OLS is to correct for serial correlation and endogeneity by augmenting the cointegrating regression with leads and lags of first differences of the regressors. The panel extensions of D-OLS are discussed in Kao and Chiang (2000) and Mark and Sul (2003). Both FM-OLS and D-OLS yield estimated cointegrating vectors that are asymptotically normally distributed (for sequential limits, first $T \to \infty$ and then $N \to \infty$), which implies that $\chi^2$ inference, e.g. via Wald tests, can be conducted. Note for completeness that various weighted and unweighted versions of both FM-OLS and D-OLS have been implemented, see Hlouskova and Wagner (2006b) for a description. These differ, inter alia, in how the correction factors are computed.
Let us start with a discussion of the results obtained when estimating the linear formulation (2). Note again that the linear specification is 'only' subject to the problem of cross-sectional correlation, i.e. only to one of the first level problems. In the specification including only fixed effects, the coefficient of log per capita GDP is between 0.6 and 0.8, depending on the estimation method. For the specification including unit specific trends, the estimated coefficient on log per capita GDP varies between 0.4 and 0.8, depending on the estimation method. The null hypothesis of a unit GDP elasticity of emissions, i.e. $H_0: \beta_1 = 1$ in equation (2), is rejected for all estimation methods and specifications.
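The D-OLS idea can be illustrated for a single cross-section unit. The following standard-library Python sketch regresses $y_t$ on a constant, $x_t$ and the leads and lags $\Delta x_{t+j}$, $j = -K, \dots, K$, and returns the coefficient on $x_t$. It is a simplified time-series illustration, not the panel estimators of Kao and Chiang (2000) or Mark and Sul (2003): there is no weighting, no pooling across units and no long-run variance estimate, so it does not deliver the standard errors a Wald test of $H_0: \beta_1 = 1$ would require. The function names and the simulated data are hypothetical.

```python
import random


def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def dols_slope(x, y, K=1):
    """D-OLS for one unit: OLS of y_t on 1, x_t and dx_{t+j}, j = -K..K.
    Returns the estimated cointegrating coefficient on x_t."""
    dx = [0.0] + [x[t] - x[t - 1] for t in range(1, len(x))]  # dx[t] = x_t - x_{t-1}
    rows, ys = [], []
    for t in range(K + 1, len(x) - K):  # keep only t where all leads and lags exist
        rows.append([1.0, x[t]] + [dx[t + j] for j in range(-K, K + 1)])
        ys.append(y[t])
    p = len(rows[0])
    XtX = [[sum(r[a] * r[c] for r in rows) for c in range(p)] for a in range(p)]
    Xty = [sum(r[a] * yt for r, yt in zip(rows, ys)) for a in range(p)]
    return solve(XtX, Xty)[1]


# Hypothetical data: x is a random walk, and the error of the cointegrating
# relation is correlated with the innovations of x (endogeneity).
rng = random.Random(0)
e = [rng.gauss(0.0, 1.0) for _ in range(1000)]
x, level = [], 0.0
for et in e:
    level += et
    x.append(level)
y = [0.7 * xt + 0.8 * et + rng.gauss(0.0, 1.0) for xt, et in zip(x, e)]
beta_hat = dols_slope(x, y, K=2)
print(beta_hat)  # close to the true value 0.7
```

Including $\Delta x_t$ among the regressors absorbs the correlation between the error and the innovations of $x_t$, which is the endogeneity correction D-OLS relies on.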

We now turn to the estimation results obtained for the quadratic formulation (1), which is subject to both first level problems. Table 5 reports one FM-OLS estimation result and two different versions of D-OLS estimation results, abbreviated as D-OLS and wD-OLS, due to Mark and Sul (2003) and Kao and Chiang (2000). We report two different D […] The final column in Table 5 reports the estimation results based on the LSDV (least squares dummy variable) estimator, to show which results are obtained when the nonstationarity issue is ignored altogether.
When only fixed effects are included, the differences from the FM-OLS and wD-OLS estimates are not too large. However, when fixed effects and trends are included, the differences from the cointegration results become substantial; furthermore, no coefficient appears to be significant in that case. By choosing other estimators for stationary panels, all kinds of results can be generated. Thus, even when ignoring issues of nonstationarity, a researcher may or may not conclude that a relationship between emissions and GDP prevails, depending on the specification of the deterministic component and the estimation method.
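For reference, the LSDV estimator with country fixed effects is numerically identical to OLS after subtracting country means from all variables (the within transformation). The minimal standard-library sketch below applies this to a quadratic specification with country fixed effects only; it includes no trends and no time effects, and the function names and example numbers are hypothetical.

```python
def demean(v):
    """Subtract the mean of a series (the within transformation for one country)."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def lsdv_quadratic(panel):
    """panel: list of (lny, lne) pairs of equal-length lists, one pair per country.

    Within (LSDV) estimates of b1, b2 in
        lne_it = a_i + b1 * lny_it + b2 * lny_it**2 + u_it,
    obtained from the 2x2 normal equations of the demeaned data."""
    s11 = s12 = s22 = s1y = s2y = 0.0
    for lny, lne in panel:
        z1 = demean(lny)
        z2 = demean([v * v for v in lny])
        w = demean(lne)
        s11 += sum(a * a for a in z1)
        s12 += sum(a * b for a, b in zip(z1, z2))
        s22 += sum(b * b for b in z2)
        s1y += sum(a * c for a, c in zip(z1, w))
        s2y += sum(b * c for b, c in zip(z2, w))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det  # Cramer's rule
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical noise-free example: two countries with different intercepts.
panel = []
for a_i, lny in [(0.5, [1.0, 2.0, 3.0, 4.0]), (-1.0, [2.0, 3.0, 5.0, 7.0])]:
    lne = [a_i + 0.9 * v - 0.05 * v * v for v in lny]
    panel.append((lny, lne))
print(lsdv_quadratic(panel))  # approximately (0.9, -0.05)
```

The country intercepts drop out through demeaning, which is why they never need to be estimated explicitly.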

Estimation with De-Factored Observations
We finally report the estimation results based on the de-factored observations, using the approach developed by Bai and Ng (2004) to de-factor the data. Recall from Section 3 that three and four common factors, respectively, have been found, all of which appear to be nonstationary according to the Bai and Ng tests. Applying the unit root tests of Bai and Ng (2004) to the de-factored data indicates that the idiosyncratic components are stationary.
This implies that standard regression theory for stationary variables applies to the de-factored data. The results are displayed in Table 6, where we present two estimation results.
The first applies when de-factoring is performed in the model with only fixed effects (DF-2), the second when it is performed in the model with fixed effects and trends (DF-3). In both cases the preferred specifications of the estimated CKCs contain fixed country and time effects. GLS estimation with cross-section weights is performed to allow for different error variances across countries.
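The de-factoring step of Bai and Ng (2004) applies principal components to the first differences of the data and cumulates the estimated idiosyncratic increments back to levels. The following is a deliberately simplified standard-library sketch for a single common factor ($r = 1$) in the intercept-only case. It uses power iteration in place of a full eigen-decomposition and normalizes the loadings to unit length, which gives the same fitted common component as other normalizations. It does not include the information criteria for choosing the number of factors, the trend case or the subsequent GLS step, and the function name is hypothetical.

```python
def defactor_one_factor(panel, iters=200):
    """Simplified PANIC-style de-factoring with one common factor (r = 1).

    panel: list of N series, each a list of length T (intercept-only case).
    Returns the estimated idiosyncratic components in levels, each starting at 0."""
    N, T = len(panel), len(panel[0])
    # first differences remove the individual intercepts
    dx = [[s[t] - s[t - 1] for t in range(1, T)] for s in panel]  # N x (T-1)
    # N x N second-moment matrix of the differenced data
    C = [[sum(a * b for a, b in zip(dx[i], dx[j])) for j in range(N)] for i in range(N)]
    # power iteration for the leading eigenvector = loadings (unit length)
    lam = [1.0] * N
    for _ in range(iters):
        v = [sum(C[i][j] * lam[j] for j in range(N)) for i in range(N)]
        norm = sum(a * a for a in v) ** 0.5
        lam = [a / norm for a in v]
    # estimated factor increments: projection of each cross-section on the loadings
    f = [sum(lam[i] * dx[i][t] for i in range(N)) for t in range(T - 1)]
    out = []
    for i in range(N):
        level, series = 0.0, [0.0]
        for t in range(T - 1):
            level += dx[i][t] - lam[i] * f[t]  # idiosyncratic increment
            series.append(level)               # cumulate back to levels
        out.append(series)
    return out
```

If the data follow an exact one-factor structure plus individual intercepts, the returned series are zero up to rounding, since the whole variation is attributed to the common factor.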
Since the data are de-factored here, the size of the coefficients cannot be directly compared with the results of Table 5, ignoring for the moment that the results presented in Table 5 are subject to the problems discussed throughout the paper. Both coefficients are positive and significant, the coefficient on squared log per capita GDP in DF-2 only at the 7% level. A positive coefficient on the squared term implies a convex rather than a concave relationship; thus, there is no evidence for an inverse U-shaped relationship as postulated by the CKC hypothesis.
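The shape argument can be made explicit for a generic quadratic specification. The symbols below are chosen for illustration and need not match the notation of equation (1); $e_{it}$ denotes per capita emissions and $y_{it}$ per capita GDP:

$$\ln e_{it} = \alpha_i + \beta_1 \ln y_{it} + \beta_2 (\ln y_{it})^2 + u_{it}, \qquad \frac{\partial \ln e_{it}}{\partial \ln y_{it}} = \beta_1 + 2\beta_2 \ln y_{it}, \qquad \frac{\partial^2 \ln e_{it}}{\partial (\ln y_{it})^2} = 2\beta_2.$$

An inverse U-shape requires $\beta_2 < 0$, in which case emissions peak at $\ln y^{*} = -\beta_1/(2\beta_2)$. With $\beta_2 > 0$ the second derivative is positive, so the stationary point $-\beta_1/(2\beta_2)$ is a minimum rather than a maximum, and the fitted curve cannot have the inverse U-shape postulated by the CKC hypothesis.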
Of course, these results are subject to the properties of de-factoring in short samples, which are not yet well understood. Apart from this problem, however, these estimates are the only ones presented in this paper that are based on an asymptotically well-founded estimation theory, given that the data are indeed unit root nonstationary. Therefore, with all necessary reservations, we tentatively conclude that within our panel data set there is no evidence for an inverse U-shaped relation between log per capita GDP and log per capita CO2 emissions (after de-factoring the data).

Summary and Conclusions
In this paper we discuss three important econometric problems associated with the Environmental Kuznets Curve that arise when the data are unit root nonstationary.
We exemplify the discussion for the Carbon Kuznets Curve, relating per capita GDP to per capita emissions of CO2, on a panel comprising 107 countries over the years 1986-1998.
The three problems are grouped into two first level problems and one second level problem.
The two first level problems are the use of nonlinear transformations of integrated processes as regressors and cross-sectional dependence in nonstationary panels. The second level problem is the poor performance of (panel) unit root and cointegration techniques for short time series or panels.
Let us start with the first level problems. The discussion in Section 2 shows that nonlinear transformations of an integrated process, like the square, are in general not integrated.
This implies that the usual unit root and cointegration techniques cannot be applied to the EKC and CKC if log per capita GDP is indeed integrated. This point has been completely overlooked in the empirical EKC and CKC literature so far, even in the part of the literature that acknowledges the potential presence of integrated processes. We do not solve the problem in this study, since no estimation techniques for panels containing nonlinear transformations of integrated processes are available yet. Currently only results for the time series case, developed by Peter Phillips and co-authors, are available. Ongoing research is investigating the applicability of (panel extensions of) these methods to EKC/CKC estimation.
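A short calculation shows why the square of a random walk is not integrated of order one. With $x_t = x_{t-1} + \varepsilon_t$, $x_0 = 0$ and $\varepsilon_t$ i.i.d. $N(0,1)$, we have $\Delta(x_t^2) = 2x_{t-1}\varepsilon_t + \varepsilon_t^2$, whose variance is $4(t-1) + 2$ because the two terms are uncorrelated. This variance grows with $t$, so differencing $x_t^2$ does not yield a stationary process. The Monte Carlo sketch below (standard-library Python; the function name and simulation settings are illustrative assumptions) checks this formula.

```python
import random


def var_of_diff_of_square(t, reps=3000, seed=0):
    """Monte Carlo variance of d(x_t^2) = x_t^2 - x_{t-1}^2 for a Gaussian random walk
    with x_0 = 0. Theory: Var = 4*(t - 1) + 2, i.e. it grows linearly in t."""
    rng = random.Random(seed)
    vals = []
    for _ in range(reps):
        x_prev = sum(rng.gauss(0.0, 1.0) for _ in range(t - 1))  # x_{t-1}
        x_t = x_prev + rng.gauss(0.0, 1.0)
        vals.append(x_t ** 2 - x_prev ** 2)
    mean = sum(vals) / reps
    return sum((v - mean) ** 2 for v in vals) / (reps - 1)


for t in (10, 200):
    # simulated variance next to the theoretical value 4(t-1) + 2
    print(t, round(var_of_diff_of_square(t), 1), 4 * (t - 1) + 2)
```

By contrast, $\Delta x_t = \varepsilon_t$ has variance 1 for every $t$, which is exactly what makes $x_t$ itself integrated of order one.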
To address the second of the first level problems, cross-sectional dependence in nonstationary panels, the literature now offers several approaches. Prior to this study, only so-called first generation panel unit root and cointegration techniques, which all rely on cross-sectional independence, had been applied. In the CKC case this amounts to independence of both GDP and CO2 emissions across countries. In this paper we present the first application in the EKC/CKC context of second generation methods that allow for cross-sectional correlation. The results obtained with the method of Bai and Ng (2004) indicate that nonstationary common factors may well be present in both GDP and emissions. The results also indicate that the idiosyncratic components (i.e. the de-factored data) are stationary; in this respect the evidence is stronger for GDP than for emissions. Based on these findings we estimate the CKC on de-factored data, which are cross-sectionally uncorrelated and, as noted above, also stationary. Thus, standard panel regression techniques are applicable to the de-factored data, and the nonlinearly transformed regressor does not pose additional problems in the stationary context. We find no evidence for an inverse U-shaped relationship.
These results are, of course, subject to the potentially poor small sample performance of the Bai and Ng de-factoring procedure, to a potential failure of the homogeneity assumption across countries and to potential structural instabilities over time. The first issue is not yet well understood, and the second and third issues have not been discussed in detail in this paper, since its focus is solely on the implications of unit root nonstationarity for the estimation of Environmental Kuznets Curves.
The second level problem is, in our opinion, the relatively uncritical use of unit root and cointegration methods in the EKC/CKC literature. It is known that unit root and cointegration techniques perform poorly for short time series. This poor performance carries over to short panels, see Hlouskova and Wagner (2004a,b) for simulation evidence. Staying within the first generation framework (and thus ignoring the first level problems!), we show that a careful application of the methods indicates that the results should be interpreted with caution. By implementing three different bootstrap algorithms we show that (three different estimates of) the finite sample distributions differ substantially from the asymptotic distributions. This implies that inference based on the asymptotic critical values can be highly misleading. Thus, we conclude that by a 'strategic' choice of the unit root and cointegration tests any conclusion can be 'supported'. This holds, to a lesser extent, even when resorting to bootstrapping, where the RBB bootstrap results differ in several cases from the other two. This finding, however, may be due to the short time dimension, which poses a challenge to bootstrap schemes based on block re-sampling. The results for the two other bootstrap algorithms are rather similar.
Ignoring the first level problems also for estimation, we estimate the CKC with panel cointegration estimators. This exercise leads to highly variable results across different variants of estimators, with less variability across the FM-OLS variants than across the D-OLS variants. From this variability we conclude that also estimation results obtained within the first generation framework should have been interpreted with much more caution than has been done in the literature.
Summing up, we conclude (a bit polemically) that a large part of the empirical EKC and CKC literature has so far been plagued by the sloppy use of inappropriate methods; hence the title of the paper. However, recent progress in the theoretical literature will soon equip the empirical researcher with the necessary tools to clear the sky.

Appendix: Data and Sources
Our analysis is based on balanced panel data for 107 countries for the period 1986-1998 listed in […]
$$\Delta y_{it} = \gamma_{i0} + \sum_{j=1}^{p_i} \gamma_{ij}\, \Delta y_{it-j} + u_{it}, \qquad (5)$$
with $\Delta$ denoting the first difference operator. The lag lengths $p_i$ are allowed to vary across the individual units in order to whiten the residuals $u_{it}$. Denote by $\hat{u}_{it}$ the residuals of equation (5). The following two bootstrap procedures are based on these autoregression residuals.
(i) Parametric: The bootstrap residuals are given by $u^*_{it} = \hat{\sigma}_i \varepsilon_{it}$, where $\hat{\sigma}^2_i$ denotes the estimated variance of $\hat{u}_{it}$ and $\varepsilon_{it} \sim N(0, 1)$.
Given $u^*_{it}$, the bootstrap data are generated from
$$y^*_{it} = \begin{cases} y_{it}, & t = 1, \dots, p_i + 1, \\ \hat{\gamma}_{i0} + y^*_{it-1} + \sum_{j=1}^{p_i} \hat{\gamma}_{ij}\, \Delta y^*_{it-j} + u^*_{it}, & t = p_i + 2, \dots, T. \end{cases}$$
As indicated above, Paparoditis and Politis (2003) propose a different bootstrap algorithm, the residual-based block (RBB) bootstrap, based on unrestricted residuals. By unrestricted residuals we mean residuals that are not generated from an equation like (5), where a unit root is imposed through estimation in first differences, but from an unrestricted first order autoregression. Higher order serial correlation is not dealt with by fitting an autoregression but by bootstrapping blocks, with the block length increasing with the sample size at a sufficient rate (see footnote 23). The implementation of the RBB bootstrap is as follows:
(i) Estimate the equation $y_{it} = \gamma_{i0} + \rho_i y_{it-1} + u_{it}$ by OLS for each unit.
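(ii) Calculate the centered residuals
$$\tilde{u}_{it} = (y_{it} - \hat{\rho}_i y_{it-1}) - \frac{1}{T-1} \sum_{\tau=2}^{T} (y_{i\tau} - \hat{\rho}_i y_{i\tau-1}), \qquad t = 2, \dots, T.$$
(iii) Choose the block length $b$ and draw $j_0, \dots, j_{k-1}$ from the discrete uniform distribution over the set $\{1, \dots, T-b\}$, with $k = \lfloor (T-1)/b \rfloor$, where $\lfloor x \rfloor$ denotes the integer part of $x$. By taking the same realizations $j_m$ for all cross-sections, the cross-sectional correlation is preserved in the bootstrap data.
(iv) Denoting $m = \lfloor (t-2)/b \rfloor$ and $s = t - mb - 1$, the bootstrap data are given by
$$y^*_{it} = \begin{cases} y_{i1}, & t = 1, \\ \hat{\gamma}_{i0} + y^*_{it-1} + \tilde{u}_{i,\, j_m + s}, & t = 2, \dots, kb + 1. \end{cases}$$
Note again for completeness that, for the tests that only allow for an intercept in the test equation, $\hat{\gamma}_{i0}$ above is replaced by zero.
22 For notational simplicity we assume $p_i = p$ for all units here in the discussion.
23 For an autoregression-based implementation of this idea of using unrestricted residuals see Paparoditis and Politis (2005).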
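The univariate RBB steps (i) to (iv) can be sketched in standard-library Python as follows. This is a minimal illustration of the formulas above, not the authors' code; the function name, the seeding and the assumption that all series share a common length $T$ are ours. The block starting points are drawn once and reused for every unit, which is how the cross-sectional correlation is carried over into the bootstrap panel.

```python
import random


def rbb_panel(panel, b, intercept=True, seed=None):
    """One residual-based block (RBB) bootstrap replicate of a panel under the
    unit-root null. panel: list of N series of common length T; b: block length."""
    rng = random.Random(seed)
    T = len(panel[0])
    k = (T - 1) // b
    # (iii) block starting points j_0..j_{k-1} from {1, ..., T-b}, shared by all units
    starts = [rng.randint(1, T - b) for _ in range(k)]
    out = []
    for y in panel:
        # (i) OLS of y_t on (1, y_{t-1})
        lag, cur = y[:-1], y[1:]
        ml, mc = sum(lag) / len(lag), sum(cur) / len(cur)
        rho = (sum((a - ml) * (c - mc) for a, c in zip(lag, cur))
               / sum((a - ml) ** 2 for a in lag))
        g0 = mc - rho * ml
        # (ii) centered residuals; res[t - 2] belongs to time t = 2, ..., T
        raw = [c - rho * a for a, c in zip(lag, cur)]
        mean_raw = sum(raw) / len(raw)
        res = [r - mean_raw for r in raw]
        # (iv) rebuild the series with rho = 1 imposed
        drift = g0 if intercept else 0.0
        ystar = [y[0]]
        for t in range(2, k * b + 2):
            m = (t - 2) // b
            s = t - m * b - 1
            ystar.append(drift + ystar[-1] + res[starts[m] + s - 2])
        out.append(ystar)
    return out
```

Each replicate has length $kb + 1$, which may be slightly shorter than $T$ when $b$ does not divide $T - 1$.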
For the panel cointegration tests used in this study we also apply three bootstrap algorithms, which are essentially multivariate extensions of the above. The starting point for the autoregression-based bootstrap procedures is now given by equations (8) and (9) […] for $i = 1, \dots, N$, $t = 1, \dots, T$. Now $\alpha_i, \delta_i \in \mathbb{R}$, $X_{it} = [x_{it1}, \dots, x_{itk}]$ and $A_i, \beta_i \in \mathbb{R}^k$. Note for completeness that for the tests proposed by Kao (1999), $\beta_i = \beta$ holds for all units. Under the null hypothesis of no cointegration between $y_{it}$ and $X_{it}$, it follows that $u_{it}$ is integrated and that $\varepsilon_{it}$ is stationary.
We estimate equations (8) and (9) to obtain the estimated residuals $\hat{v}_{it} = $ […]. Under the null hypothesis, $v_{it} \in \mathbb{R}^{k+1}$ is a process whose first coordinate is integrated and whose other coordinates are stationary. These known restrictions can be incorporated into the autoregressive modelling to obtain white noise residuals, by fitting a vector error correction model that incorporates the exact knowledge about the cointegrating space. This is achieved by estimating equation (10) […], with $B_i \in \mathbb{R}^{(k+1) \times k}$. The residuals from equation (10), $\hat{\mu}_{it}$ say, resemble white noise due to an appropriate choice of the lag lengths $p_i$.
As in the univariate case for the panel unit root tests, two bootstrap versions are implemented based on $\hat{\mu}_{it}$.
(i) Parametric: Estimate the variance-covariance matrix of $\hat{\mu}_{it}$, $\hat{\Sigma}_i$ say. Denote its lower triangular Cholesky factor by $\hat{L}_i$ and generate the bootstrap residuals $\mu^*_{it} = \hat{L}_i \eta_{it}$ with $\eta_{it} \sim N(0, I_{k+1})$.
(ii) Non-parametric: $\mu^*_{it}$ is obtained by re-sampling $\hat{\mu}_{it}$. By choosing the same re-sampling scheme for all cross-sectional units, the contemporaneous correlation structure is preserved.
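The two residual-generation steps above can be sketched in standard-library Python. The inputs (a covariance matrix estimate and per-unit residual series) are hypothetical, and the function names are ours. For the non-parametric variant we assume i.i.d. re-sampling of time indices with replacement, since the text specifies only that the scheme is common to all units.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L' = S, for a symmetric positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def parametric_draw(L, T, rng):
    """(i) Parametric: T draws of mu*_t = L eta_t with eta_t ~ N(0, I)."""
    n = len(L)
    draws = []
    for _ in range(T):
        eta = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append([sum(L[i][j] * eta[j] for j in range(i + 1)) for i in range(n)])
    return draws


def nonparametric_draw(residuals_by_unit, rng):
    """(ii) Non-parametric: draw one set of time indices (assumed i.i.d. with
    replacement) and apply it to every unit, preserving the contemporaneous
    correlation across units."""
    T = len(residuals_by_unit[0])
    idx = [rng.randrange(T) for _ in range(T)]
    return [[unit[t] for t in idx] for unit in residuals_by_unit]
```

Because the same index vector is used for every unit, any residual pair that is observed together at some date in the data is also observed together in the bootstrap sample.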
The bootstrap series $y^*_{it}$ and $X^*_{it}$ are generated by first inserting $\mu^*_{it}$ in (10) and then inserting the resulting $v^*_{it}$ in (8) and (9). The multivariate implementation of the RBB bootstrap is based on an unrestricted VAR(1) for $Z_{it} = [y_{it}, X_{it}]$ as follows:
(i) Estimate the first order VAR $Z_{it} = A_{i0} + A_{i1} Z_{it-1} + v_{it}$.
(ii) Compute the centered residuals
$$\tilde{v}_{it} = (Z_{it} - \hat{A}_{i1} Z_{it-1}) - \frac{1}{T-1} \sum_{\tau=2}^{T} (Z_{i\tau} - \hat{A}_{i1} Z_{i\tau-1}), \qquad t = 2, \dots, T.$$
(iii) Choose the block length $b$ and draw $j_0, \dots, j_{k-1}$ from the discrete uniform distribution over the set $\{1, \dots, T-b\}$, with $k = \lfloor (T-1)/b \rfloor$, where $\lfloor x \rfloor$ denotes the integer part of $x$. By taking the same realizations $j_m$ for all cross-sections, the cross-sectional correlation is preserved in the bootstrap data.
(iv) Denoting $m = \lfloor (t-2)/b \rfloor$ and $s = t - mb - 1$, the bootstrap data are given by
$$Z^*_{it} = \begin{cases} Z_{i1}, & t = 1, \\ \hat{A}_{i0} + Z^*_{it-1} + \tilde{v}_{i,\, j_m + s}, & t = 2, \dots, kb + 1. \end{cases}$$
Note again for completeness that, for the tests that only allow for an intercept in the test equation, $\hat{A}_{i0}$ above is replaced by zero.