A Topological View on the Identification of Structural Vector Autoregressions

The notion of the group of orthogonal matrices acting on the set of all feasible identification schemes is used to characterize the identification problem arising in structural vector autoregressions. This approach presents several conceptual advantages. First, it provides a fundamental justification for the use of the normalized Haar measure as the natural uninformative prior. Second, it allows one to derive the joint distribution of blocks of parameters defining an identification scheme. Finally, it provides a coherent way of studying perturbations of identification schemes, which becomes relevant, among other things, for the specification of vector autoregressions with time-varying covariance matrices.

JEL classification: C1, C18, C32


Introduction
Structural vector autoregressive (SVAR) models have established themselves as an indispensable tool in empirical macroeconomics. While these models capture the dynamic properties of the data reasonably well, their economic interpretation in terms of structural shocks remains controversial because these models suffer from a fundamental identification problem. This problem is addressed by imposing restrictions (short-run, long-run, sign restrictions, etc.) which are more or less grounded in a priori economic reasoning. The econometric aspects of the identification problem have been analyzed by Rubio-Ramírez, Waggoner, and Zha (2010) and Waggoner and Zha (2003) in the spirit of Rothenberg (1971).
We view the identification problem as an invariance property of the group of orthogonal matrices acting on the set of observationally equivalent identification schemes. While already anticipated in the previously mentioned papers, following this route rigorously presents several advantages. First, the identification problem is given a precise mathematical framework. In this framework, the invariance principle naturally leads to the use of the normalized Haar measure as an uninformative prior (Jaynes, 1968). Second, it allows the derivation of the joint distribution of the impact effects, and not just of a single coefficient as in Baumeister and Hamilton (2015, section 3). Third, the action of the group allows one to conceive a kind of perturbation analysis of the identification scheme. This is not only interesting in itself, but can be used to formulate time-varying covariance matrices in a coherent way.

Structural Vector Autoregressive Models
Consider a vector autoregressive (VAR) process {X_t} with observations in the state space R^n, defined as the stationary solution of the stochastic difference equation of order p with constant coefficients Φ_1, ..., Φ_p:

X_t = Φ_1 X_{t-1} + ... + Φ_p X_{t-p} + Z_t,    Z_t ~ WN(0, Σ),    (2.1)

where Σ is symmetric and positive definite. The reduced-form shocks {Z_t} are obtained from the structural shocks {V_t} by a linear weighting scheme

Z_t = B V_t,    V_t ~ WN(0, I_n),    (2.2)

where the n × n matrix B is left unrestricted. The assumption that the structural shocks are uncorrelated is widely accepted in the literature; otherwise, there would remain some unexplained relationship between them. The assumption that the structural shocks have a covariance matrix equal to the identity is just a convenient normalization. Although this is not necessary for the discussion, it will be assumed that {X_t} admits a causal representation with respect to {Z_t}. Thus, there exists a sequence of matrices {Ψ_j}, j = 0, 1, 2, ..., with Ψ_0 = I_n and ∑_{j=0}^∞ ||Ψ_j|| < ∞, such that

X_t = ∑_{j=0}^∞ Ψ_j Z_{t-j}.

Such a causal representation of X_t in terms of current and past Z_t's exists if and only if det Φ(z) ≠ 0 for all complex z with |z| ≤ 1, where Φ(z) = I_n - Φ_1 z - ... - Φ_p z^p.

While the VAR usually gives a good summary of the data, at least up to second moments, it is just a preliminary first step in the analysis. The second and more controversial step aims at identifying the structural shocks {V_t} and their effects on X_{t+j}, j = 0, 1, 2, .... These effects are propagated over time and captured by the sequence {Ψ_j B}, j = 0, 1, 2, ..., known as the impulse response function. The shocks and their propagation are usually given an economic interpretation and are at the core of the SVAR approach. Relying on second moments only, or assuming a Gaussian framework, it is easy to see that the simultaneous equation system (2.2) is not identified, i.e. it is impossible to extract B just from the knowledge of Z_t alone: taking second moments in (2.2) delivers only the nonlinear equation system

Σ = E[Z_t Z_t'] = B B'.    (2.5)
Indeed, taking the symmetry of covariance matrices into account, the nonlinear equation system (2.5) delivers only n(n+1)/2 independent equations for the n² unknown coefficients in B. Thus, there is a need for n² - n(n+1)/2 = n(n-1)/2 additional equations. A customary solution to the underidentification problem is to place enough restrictions on the matrix B so that the equation system (2.5) admits a unique solution. We call these identifying restrictions an identification scheme.
One popular form of restriction is to set some coefficients a priori to zero. These so-called short-run restrictions have to come from additional, usually theoretical, a priori reasoning and are subject to controversy. Another common way to place restrictions on B is to assume that the cumulated effect of some particular shock on some variable equals zero. Thus, these so-called long-run restrictions impose zeros on Ψ(1)B. Obviously, short-run and long-run restrictions do not exclude each other, but can complement each other. As the gap between equations and unknowns grows quadratically, it becomes more and more difficult to come up with reasonable restrictions as the dimension n of the VAR increases.

An Algebraic Interpretation of the Identification Problem

The group action of orthogonal matrices
One aim of this paper is to provide a deeper conceptual framework which should ultimately allow a better understanding of the fundamental identification problem and of the solution techniques proposed in this context. Before presenting the results, it is necessary to introduce some algebraic and topological notions. Let M_n be the vector space of n × n matrices with real entries. It is clear that to any matrix A = (A_ij) ∈ M_n we can associate a point in R^{n²} and hence identify the vector space M_n with R^{n²}. In this way, M_n can be equipped with the Euclidean metric of R^{n²}. With respect to this metric, the usual matrix operations are continuous and even smooth. Since det A is a continuous function from M_n to R, the set of invertible matrices is an open subset of M_n which forms a group with respect to matrix multiplication. This group is called the general linear group and is denoted by GL_n.
In the following, the subgroup of orthogonal matrices O_n, i.e. matrices Q with the property Q'Q = I_n, will be of special interest. It can be shown that O_n is a compact (closed and bounded) subgroup of GL_n. This implies that O_n has a finite Haar measure (see Diestel and Spalsbury, 2014, chapter 5). This measure can be normalized to make it a probability distribution. This distribution can be implemented efficiently by applying the QR decomposition to a random matrix A with law L(A) = N(0, I_n ⊗ I_n), i.e. a matrix whose elements are iid N(0, 1) random variables (see Birkhoff and Gulati, 1979; Stewart, 1980; Edelman and Rao, 2005, for details). In Section 3.2, we derive an analytic expression for the density of subblocks of the normalized Haar measure on O_n. This result will then be used to derive a corresponding result for the identification schemes.

For this purpose, we define the set of conceivable identification schemes, called the set of structural factorizations, as

B(Σ) = {B ∈ M_n : B'B = Σ}.

(Parametrizing a factorization by a matrix B with B'B = Σ is equivalent, up to transposition, to the convention Σ = BB' of equation (2.5).) This set is nonempty because every positive definite symmetric matrix admits a unique Cholesky factorization Σ = R'R with R an upper triangular matrix with positive diagonal entries (see, for example, Meyer, 2000, 154-155). Clearly, any B ∈ B(Σ) is invertible because (det B)² = det Σ > 0. Moreover, B(Σ) is a closed subset of M_n.

Proof. Consider the function F(B) = B'B - Σ. Because the usual matrix operations are continuous and the set consisting just of the zero matrix is closed, B(Σ) = F^{-1}({0}) is closed.

Consider the following map:

O_n × B(Σ) → B(Σ),    (Q, B) → QB.

Note that it is well-defined, because QB ∈ B(Σ) as (QB)'(QB) = B'Q'QB = B'B = Σ, and continuous, in fact even smooth. Moreover, the map satisfies:

(i) I_n ∈ O_n and I_n B = B for all B ∈ B(Σ);
(ii) (Q_1 Q_2)B = Q_1(Q_2 B) for all Q_1, Q_2 ∈ O_n and B ∈ B(Σ).
Thus, the map defines a continuous (even smooth) group action of O_n on B(Σ).
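As an aside, the QR-based recipe for sampling from the normalized Haar measure mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the paper; the function name is hypothetical, and the sign correction on the diagonal of R is what makes the resulting distribution exactly Haar (Edelman and Rao, 2005):

```python
import numpy as np

def haar_orthogonal(n, seed=None):
    """Draw Q from the normalized Haar measure on O_n by applying the
    QR decomposition to a matrix with iid N(0, 1) entries."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(A)
    # Multiply each column of Q by the sign of the corresponding diagonal
    # entry of R; this makes the factorization unique and Q exactly Haar.
    return Q * np.sign(np.diag(R))
```

Without the sign correction, the distribution of Q would depend on the sign convention of the QR routine and would not be invariant under the group action.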
In addition, the group action is transitive because for any B_1, B_2 ∈ B(Σ) we can find Q ∈ O_n such that (Q, B_1) → QB_1 = B_2. To see this, take Q = B_2(B_1'B_1)^{-1}B_1' = B_2 Σ^{-1} B_1'. This property means that we can move matrices in B(Σ) around via homeomorphisms of B(Σ) onto itself. For any given element B ∈ B(Σ), the isotropy subgroup or stabilizer H_B is defined as

H_B = {Q ∈ O_n : QB = B}.

Given the invertibility of B, this subgroup is the trivial subgroup which consists just of the identity element I_n. Thus, the group action is not only transitive, but also free. The quotient O_n/H_B is therefore just O_n itself. Given these preliminaries, we can apply two classic theorems of Weil (see Diestel and Spalsbury, 2014, chapter 6) to draw the following two conclusions: for any fixed B ∈ B(Σ), the orbit map Q → QB is a homeomorphism of O_n onto B(Σ); and there exists a unique O_n-invariant probability measure on B(Σ), namely the image of the normalized Haar measure under this map. This invariance property provides an important justification for stating our ignorance about the correct identification scheme in B(Σ) in terms of the normalized Haar measure on O_n (see Jaynes, 1968, sections 7 and 8). This view stands in contrast to the arguments put forward by Baumeister and Hamilton (2015).
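The transitivity argument can be checked numerically. The following sketch (all variable names hypothetical) constructs two factorizations of the same Σ and recovers the orthogonal matrix Q = B_2 Σ^{-1} B_1' carrying one into the other:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)        # an arbitrary positive definite Sigma

R = np.linalg.cholesky(Sigma).T        # upper triangular, Sigma = R'R
B1 = R                                 # the Cholesky factorization scheme
Q0, _ = np.linalg.qr(rng.standard_normal((n, n)))
B2 = Q0 @ R                            # an observationally equivalent scheme

# the orthogonal matrix moving B1 to B2
Q = B2 @ np.linalg.inv(Sigma) @ B1.T
```

By construction Q'Q = I_n and QB_1 = B_2, illustrating that any two structural factorizations are connected by an element of O_n.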

The distribution of structural factorizations
The next step consists in the characterization of the Haar probability measure on O_n, which will then allow us to derive the corresponding probability measure on B(Σ). For this purpose, we closely follow the exposition in Eaton (1989, chapter 7). Let F_{q,n}, q ≤ n, denote the set of n × q real matrices F such that F'F = I_q. These are matrices whose q columns belong to some orthogonal matrix in O_n. For q ≤ n, let A denote an n × q random matrix with probability law L(A) = N(0, I_n ⊗ I_q); then F = A(A'A)^{-1/2} is a well-defined element of F_{q,n}. According to Eaton (1989, proposition 7.1), F has the uniform distribution on F_{q,n}. Partition the matrix A as A = (A_1', A_2')' with A_1 and A_2 being n_1 × q, respectively (n - n_1) × q, matrices; then F can be written as

F = ( A_1(A'A)^{-1/2} )
    ( A_2(A'A)^{-1/2} ).

Denote the upper n_1 × q block of F by F_1 = A_1(A'A)^{-1/2}. Thus, F_1 can be thought of as the upper left n_1 × q block of some random orthogonal matrix.
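The construction F = A(A'A)^{-1/2} is directly implementable. A minimal sketch (function name hypothetical), with the symmetric inverse square root computed from the eigendecomposition of A'A:

```python
import numpy as np

def uniform_stiefel(n, q, seed=None):
    """Draw F uniformly on F_{q,n} as F = A (A'A)^{-1/2}, where A is an
    n x q matrix with iid N(0, 1) entries (Eaton, 1989, proposition 7.1)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, q))
    w, V = np.linalg.eigh(A.T @ A)      # A'A = V diag(w) V'
    inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    return A @ inv_sqrt                 # satisfies F'F = I_q
```

The upper n_1 × q block of the returned matrix is then a draw of F_1.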
Using the same notation as in Theorem 1, it is possible to derive an explicit formula for the density of F_1.
Theorem 2. For n_1 ≥ q and n_1 + q ≤ n, the density of F_1 is given by

f_1(X) = C_1 (det(I_q - X'X))^{(n-n_1-q-1)/2} 1(I_q - X'X positive definite),    (3.2)

where the constant C_1 is given by C_1 = (√(2π))^{-n_1 q} ω(n - n_1, q)/ω(n, q) and ω(·, ·) denotes the Wishart constant of Eaton (1989). Further background material can be found in Eaton (1983) and Anderson (1984).
Remark 1. The case n_1 ≤ q is treated by taking transposes.
It is easy to see that for the special case n_1 = q = 1, equation (3.2) reduces to

f_1(x) = C_1 (1 - x²)^{(n-3)/2},    |x| < 1,

which is exactly the formula given in Baumeister and Hamilton (2015, equation (32)). From the preceding Section 3.1 it is clear that every B ∈ B(Σ) can be represented as B = QR, where Q is some orthogonal matrix and R is the Cholesky factor of Σ. Assuming as in Theorem 2 that n_1 ≥ q and n_1 + q ≤ n, partition Q and the Cholesky factor R conformably into four submatrices such that Q_11 and R_11 are n_1 × q, respectively q × q, submatrices. Remember that R_11 is an invertible upper triangular square matrix. The upper left n_1 × q block of B ∈ B(Σ), B_11, is then given by B_11 = Q_11 R_11. From this relation we can deduce the distribution of B_11.

Corollary 1. Given that the n_1 × q matrix Q_11, n_1 ≥ q, has density (3.2), B_11 = Q_11 R_11 has a density given by

f_2(X) = C_1 |det R_11|^{-n_1} (det(I_q - R_11^{-1}' X'X R_11^{-1}))^{(n-n_1-q-1)/2} 1(I_q - R_11^{-1}' X'X R_11^{-1} positive definite)
       = C_1 (det Σ_11)^{-(n-q-1)/2} (det(Σ_11 - X'X))^{(n-n_1-q-1)/2} 1(Σ_11 - X'X positive definite),    (3.3)

where Σ_11 = R_11'R_11 is the upper left q × q block of Σ. Using transposes, a similar argument can be made for the case n_1 ≤ q.
Proof. As Q_11 is distributed according to the density in equation (3.2), the change-of-variables formula of multivariate analysis (see, for example, Magnus and Neudecker, 1999) applied to the linear map X → XR_11 implies that B_11 = Q_11 R_11 has the density given in (3.3).
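As a sanity check on Corollary 1, one can draw B = QR with Q Haar-distributed and verify that the upper left block always lies in the support {X : Σ_11 - X'X positive semidefinite} of the density (3.3). A sketch with hypothetical variable names:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n1, q = 6, 3, 2                      # satisfies n1 >= q and n1 + q <= n

M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)
R = np.linalg.cholesky(Sigma).T         # Sigma = R'R, R upper triangular
R11 = R[:q, :q]
Sigma11 = Sigma[:q, :q]                 # equals R11' R11

min_eig = np.inf
for _ in range(500):
    Q, T = np.linalg.qr(rng.standard_normal((n, n)))
    Q = Q * np.sign(np.diag(T))         # Haar-distributed Q
    B11 = Q[:n1, :q] @ R11              # upper left n1 x q block of B = QR
    # track the smallest eigenvalue of Sigma11 - B11'B11 over all draws
    min_eig = min(min_eig, np.linalg.eigvalsh(Sigma11 - B11.T @ B11).min())
```

Every draw should leave Σ_11 - B_11'B_11 positive semidefinite (up to rounding error), in line with the indicator function in (3.3).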

Perturbation analysis
As O_n is a smooth submanifold of R^{n²}, we have the notion of a tangent space at any point Q ∈ O_n, denoted by T_Q O_n. We can explicitly realize this tangent space as

T_Q O_n = {γ'(0) : γ a smooth curve in O_n with γ(0) = Q}.

It can be shown by simple algebra that T_Q O_n is a vector space. Because γ(t) ∈ O_n, γ(t)^T γ(t) = I_n, where the superscript T denotes transposition. Differentiating this expression with respect to t and evaluating at t = 0 gives

γ'(0)^T γ(0) + γ(0)^T γ'(0) = γ'(0)^T Q + Q^T γ'(0) = 0.

This implies that Q^T γ'(0) equals some skew-symmetric matrix S, so that γ'(0) = QS. The tangent space at the identity therefore consists of the skew-symmetric matrices, and we have T_Q O_n = Q T_{I_n} O_n = Q o_n, where o_n is the so-called Lie algebra of O_n. The Lie algebra of a Lie group is just the tangent space at the identity together with the Lie bracket [·, ·], which for matrix groups is just the commutator [A, B] = AB - BA. In our case the Lie algebra o_n is the vector space of n × n skew-symmetric matrices S, i.e. of matrices with the property S^T = -S. These matrices have trace, tr(S), equal to zero and purely imaginary eigenvalues. The dimension of the vector space o_n is given by the number of free parameters of an n × n skew-symmetric matrix:

dim o_n = n(n - 1)/2 = number of missing equations.
The advantage of working with the Lie algebra is that it is a vector space in which many computations can be accomplished more easily. The mechanism for passing information from the Lie algebra to the matrix Lie group is the matrix exponential map, which is defined through the power series expansion

exp(A) = ∑_{k=0}^∞ A^k / k!.

It can be shown that the exponential map is defined for every A ∈ M_n and that exp(A) is invertible. Thus, exp : M_n → GL_n. Note that exp(A + B) is in general not equal to exp(A) exp(B); a sufficient condition for equality is that A and B commute, i.e. that AB = BA. For skew-symmetric matrices, the matrix exponential delivers the special orthogonal group SO_n of rotations, which is defined as the set of orthogonal matrices with determinant equal to one. Thus, we have exp(o_n) = SO_n ⊂ O_n. Given a fixed identification scheme B ∈ B(Σ), we can consider (small) random perturbations of this scheme as follows:

B̃ = exp(S)B,

where the free elements of the skew-symmetric matrix S are drawn independently from a zero-mean normal distribution. Other centered distributions are, of course, reasonable as well. Thus, S is centered at the zero matrix, whose exponential equals the identity matrix. Clearly, exp(S)B ∈ B(Σ). Note that because elements of SO_n have determinant equal to one, exp(S) excludes reflections and simple interchanges in the ordering of the structural shocks.
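Such a random perturbation can be sketched as follows. All names are illustrative; since only exponentials of skew-symmetric matrices are needed, exp(S) is computed here from the complex eigendecomposition of S (the result is a real rotation), avoiding any dependence beyond NumPy:

```python
import numpy as np

def expm_skew(S):
    """exp(S) for skew-symmetric S via its eigendecomposition; the
    eigenvalues are purely imaginary and the result is real orthogonal."""
    w, V = np.linalg.eig(S)
    return ((V * np.exp(w)) @ np.linalg.inv(V)).real

def perturb_scheme(B, scale, seed=None):
    """Perturb B in B(Sigma) by a random rotation exp(S), where the free
    elements of the skew-symmetric S are iid N(0, scale^2)."""
    rng = np.random.default_rng(seed)
    n = B.shape[0]
    U = np.triu(rng.normal(0.0, scale, (n, n)), k=1)
    S = U - U.T                         # skew-symmetric: S^T = -S
    return expm_skew(S) @ B
```

Because exp(S) ∈ SO_n, the perturbed scheme exp(S)B factorizes the same Σ as B.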
Alternatively, we can construct a random-walk-like time-varying scheme

B_t = exp(S_t) B_{t-1},    t = 1, 2, ...,    (3.4)

starting with some identification scheme B_0 ∈ B(Σ). The term random walk is motivated by taking the matrix logarithm of B_t and expressing it using the Baker-Campbell-Hausdorff formula (see Hall, 2003; Higham, 2008):

log B_t = log(exp(S_t) B_{t-1}) = S_t + log B_{t-1} + (1/2)[S_t, log B_{t-1}] + higher order terms.
The "higher order terms" are given only in terms of iterated Lie brackets of S_t and log B_{t-1}. Ignoring the higher order terms, we can approximate log B_t as

log B_t ≈ log B_{t-1} + S_t.

Since the norm of a commutator is bounded by the norms of its factors (Böttcher and Wenzel, 2008), we can make log B_t follow a random walk approximately by making the perturbations S_t small. It is, however, not wise to use the logarithmic formulation because the matrix logarithm is not unique and the principal matrix logarithm is not always defined (see Higham, 2008, for more details). Thus, it is recommended to work directly with the propagation mechanism (3.4) and to take the logarithmic specification only as motivation.
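Working directly with the propagation mechanism, a time-varying path of identification schemes can be simulated as follows. This is a sketch under illustrative choices (dimension, step size, number of periods); it also confirms that every scheme on the path factorizes the same Σ:

```python
import numpy as np

rng = np.random.default_rng(1)
n, periods, scale = 3, 50, 0.05

M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)
path = [np.linalg.cholesky(Sigma).T]    # B_0: the Cholesky scheme

for t in range(periods):
    U = np.triu(rng.normal(0.0, scale, (n, n)), k=1)
    S = U - U.T                         # small skew-symmetric step S_t
    w, V = np.linalg.eig(S)
    rot = ((V * np.exp(w)) @ np.linalg.inv(V)).real   # exp(S_t) in SO_n
    path.append(rot @ path[-1])         # B_t = exp(S_t) B_{t-1}
```

Every B_t on the path satisfies B_t'B_t = Σ, so the covariance matrix is held constant while the identification scheme drifts.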
As B_t ∈ B(Σ), the resulting identification schemes hold the covariance matrix constant. To relax this assumption, one can follow a suggestion of Primiceri (2005) and factorize the time-varying covariance matrix Σ_t as Σ_t = L_t Ω_t L_t', where L_t is an upper triangular matrix with ones on the diagonal and Ω_t is a diagonal matrix with strictly positive diagonal elements. Like the Cholesky decomposition, this factorization is also unique. The evolution of Σ_t can then be specified as L_t = exp(T_t)L_{t-1}, where T_t is an upper triangular matrix with zeros on the diagonal and normally distributed zero-mean elements above the main diagonal, and Ω_t = Ω_{t-1} exp(D_t), where D_t is a diagonal matrix with zero-mean normally distributed diagonal elements. The scheme is initialized with L_0 an upper triangular matrix with ones on the diagonal and Ω_0 a diagonal matrix with strictly positive diagonal elements. As exp(T_t) is an upper triangular matrix with ones on the diagonal, the above scheme is well-defined and produces symmetric positive definite matrices Σ_t.
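A sketch of this time-varying specification, with all names and tuning values illustrative. Since T_t is strictly upper triangular it is nilpotent, so exp(T_t) is a finite sum that stays upper triangular with ones on the diagonal:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n, horizon, scale = 3, 20, 0.05

L = np.eye(n)                           # L_0: unit upper triangular
Omega = np.eye(n)                       # Omega_0: positive diagonal
covariances = []

for t in range(horizon):
    Tt = np.triu(rng.normal(0.0, scale, (n, n)), k=1)
    # T_t^n = 0, so the exponential series terminates after n - 1 terms
    expT, P = np.eye(n), np.eye(n)
    for k in range(1, n):
        P = P @ Tt
        expT = expT + P / math.factorial(k)
    L = expT @ L                        # L_t = exp(T_t) L_{t-1}
    Dt = rng.normal(0.0, scale, n)
    Omega = Omega @ np.diag(np.exp(Dt)) # Omega_t = Omega_{t-1} exp(D_t)
    covariances.append(L @ Omega @ L.T) # Sigma_t = L_t Omega_t L_t'
```

Each Σ_t produced this way is symmetric and positive definite by construction, since L_t is invertible and Ω_t has strictly positive diagonal entries.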

Conclusion
Although the flavor of this note is theoretical, it should be clear that the suggestions derived from viewing the identification problem in terms of a group action await implementation in empirical applications. This should confirm their usefulness and open the way for further insights along this route.