Applying Markowitz’s Critical Line Algorithm

We provide a Matlab quadratic optimization tool based on Markowitz's critical line algorithm that significantly outperforms standard software packages and a recently developed operations research algorithm. As an illustration: for a 2000 asset universe our method needs less than a second to compute the whole frontier, whereas the quickest competitor needs several hours. This paper can be considered a didactic alternative to the critical line algorithm as presented by Markowitz; all steps required by the algorithm are treated explicitly. Finally, we present a benchmark of different optimization algorithms' performance.


INTRODUCTION
Markowitz's (1952) Portfolio Theory formulates investors' decisions in a mean-variance setting as a problem of minimizing portfolio variance at a certain level of expected return. The solution set of this problem is visualized by the minimum variance frontier and its positively sloped segment, the efficient frontier.
When including the practically relevant condition that short selling cannot take place (thus, that investors cannot weight assets negatively in portfolios), the system of linear equations is extended by n (weak) inequalities, one for each asset. Minimizing portfolio variance with equality and inequality conditions requires computationally expensive quadratic optimization algorithms, as first stated in the critical line algorithm (CLA) of Markowitz (1956) and the extended simplex algorithm of Wolfe (1959).
Current research by Steuer, Qi, and Hirschberger (2006) indicates the still remaining need to improve quadratic optimization algorithms' performance.¹ They develop a simplex-based algorithm that calculates all turning points of the constrained minimum variance frontier while significantly reducing computational time compared to standard software packages such as Matlab, Cplex, LINGO, Mathematica and premium Solver. When comparing their algorithm's performance with the VBA-based implementation of the Optimizer by Markowitz and Todd (2000), they encounter the problem of the 256-column limitation of MS Excel. In order to circumvent this problem we implemented an algorithm similar to that of Markowitz and Todd (2000) in Fortran 90, where lower and upper bounds on asset weights are imposed.² We show that this algorithm outperforms the algorithm in Steuer, Qi, and Hirschberger (2006) by a factor of almost 10 thousand (for 2000 assets) and standard software packages by even more.

¹ As an example, resampling simulations as discussed in Michaud (1998) benefit from an algorithmic improvement of quadratic optimization. For a recent application see Wolf (2006).
From this observation we conclude that the high performance of the CLA is not well known. In fact, excluding the paper by Steuer, Qi, and Hirschberger (2006), no studies benchmarking quadratic optimization algorithms' performance are known to us. Moreover, as no publicly available software package exists that computes the entire constrained minimum variance frontier, we provide a Matlab optimization package using our Fortran 90 implementation of the CLA.
Finally, this paper can be considered as a didactic alternative to the standard CLA as presented by Markowitz. All numerical improvements to the algorithm are treated explicitly.
The rest of the paper is organized as follows: Section 2 introduces the mathematical framework and definitions required by the CLA. Section 3 formulates the quadratic optimization method and the numerical improvement. Section 4 describes performance tests and computational experience.

The Framework
Given is a universe of n assets with

Σ: an (n × n) positive definite covariance matrix,
µ: an n vector of the assets' expected returns,
w: an n vector of the assets' weights.

For a minimum variance portfolio where lower and upper bounds on asset weights are included we define

l: an n vector containing the asset weights' lower bounds (w_i ≥ l_i, ∀i),
u: an n vector containing the asset weights' upper bounds (w_i ≤ u_i, ∀i),
F: a subset of N = {1, 2, ..., n} containing the indices of all assets whose weights lie strictly within their bounds (l_i < w_i < u_i); we shall call the corresponding assets free assets,
B: the subset of all asset indices whose weights lie on one of their bounds; the sets of assets on their upper and lower bounds will be called U and L, respectively, with B = U ∪ L,
k ≡ |F|: the number of elements in F.
When writing the free assets' indices first, the covariance matrix Σ, the expected return vector µ and the weight vector w can be subdivided into

Σ = ( Σ_F  Σ_FB ; Σ_BF  Σ_B ),   µ = (µ_F', µ_B')',   w = (w_F', w_B')',   (1)

where the semicolon separates the rows of the block matrix.
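This partitioning can be illustrated in a few lines. The Python snippet below is purely illustrative (the paper's tool is implemented in Fortran 90 with a Matlab interface); the index sets F and B and all data are arbitrary example values:

```python
import numpy as np

# Illustrative sketch of the partitioning by free (F) and bound (B) assets.
n = 4
rng = np.random.default_rng(0)
A = rng.random((n, n))
Sigma = A @ A.T + n * np.eye(n)     # positive definite covariance matrix
mu = rng.random(n)
w = np.full(n, 1.0 / n)

F = [0, 2]                          # free assets: l_i < w_i < u_i
B = [1, 3]                          # assets on one of their bounds
order = F + B                       # write the free assets' indices first

Sigma_F  = Sigma[np.ix_(F, F)]      # (k x k) block
Sigma_FB = Sigma[np.ix_(F, B)]      # (k x (n-k)) block
mu_F, mu_B = mu[F], mu[B]
w_F, w_B = w[F], w[B]
```

Reordering Σ with the free indices first makes Σ_F and Σ_FB the leading blocks of the permuted matrix.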

Unconstrained Case
Before turning to the constrained variance minimization it is worth revisiting the unconstrained portfolio minimization problem.
The unconstrained problem can be written as a Lagrange function with the Lagrange coefficients γ and λ and the expected return level µ_p:³

L(w, γ, λ) = ½ w'Σw − γ(w'1 − 1) − λ(µ'w − µ_p).   (2)

The first constraint in (2) ensures that the assets' weights sum to one; the second constraint ensures that portfolio variance is minimized at the expected return level µ_p. Differentiating with respect to w, γ and λ and setting the results to zero, one obtains a system of (n + 2) linear equations. Solving this system leads to the variance minimizing weight vector w*.
Obviously, w * will not generally satisfy the constraints l i ≤ w i ≤ u i .
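The resulting (n + 2) linear system can be solved directly. The following Python sketch is illustrative only; it uses a ½ scaling of the variance term (a convention that leaves the optimal weights unchanged) and randomly generated example data:

```python
import numpy as np

# Sketch: solving the unconstrained minimum variance problem as an
# (n + 2)-dimensional linear system from the first-order conditions.
rng = np.random.default_rng(1)
n = 5
A = rng.random((n, n))
Sigma = A @ A.T + n * np.eye(n)     # positive definite covariance matrix
mu = rng.random(n)
mu_p = mu.mean()                    # target expected return (example value)

ones = np.ones(n)
# KKT system: Sigma w - gamma 1 - lambda mu = 0,  w'1 = 1,  mu'w = mu_p
K = np.zeros((n + 2, n + 2))
K[:n, :n] = Sigma
K[:n, n] = -ones
K[:n, n + 1] = -mu
K[n, :n] = ones
K[n + 1, :n] = mu
rhs = np.zeros(n + 2)
rhs[n] = 1.0
rhs[n + 1] = mu_p

sol = np.linalg.solve(K, rhs)
w_star, gamma, lam = sol[:n], sol[n], sol[n + 1]
```

As the text notes, nothing in this system keeps w* inside the bounds l and u; it only enforces the two equality constraints.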

Constrained Case
The computation of efficient portfolios becomes more difficult when inequality constraints on asset holdings are included. If short selling of assets is forbidden, asset weights must be non-negative and the constraint has the form of w ≥ 0.
However, some problems require a more general constraint with upper and lower bounds. For such problems the optimal solution will be a portfolio where the assets' weights lie within their respective bounds, thus l_i ≤ w_i ≤ u_i for all i. In the following, we shall call the solution of this problem a constrained minimum variance portfolio⁴ and the graphical representation of the set of solutions in the (µ_p, σ_p) plane the constrained minimum variance frontier (CMVF).

³ In the following, we will denote a vector of ones (1, ..., 1)' as 1. To emphasize that a vector 1 has the same size as a set A, we will write 1_A (e.g. 1_F has size k and 1_B has size n − k).

⁴ This is also called a feasible mean-variance efficient portfolio in the literature.
One important feature of the CMVF is the existence of turning points.⁵

Definition 1 A constrained minimum variance portfolio is called a turning point if in its vicinity other constrained minimum variance portfolios contain different free assets.
When one knows which assets are free in the constrained minimum variance portfolio at a certain expected return level µ_p, thus knowing F, the problem can be formulated easily. This is stated in the following proposition.
Proposition 1 The free weights in the solution w* of the constrained case are equal to the weights in the solution of the unconstrained case with the Lagrange function

L(w_F, γ, λ) = ½ (w_F'Σ_F w_F + 2 w_F'Σ_FB w_B + w_B'Σ_B w_B) − γ(w_F'1_F + w_B'1_B − 1) − λ(µ_F'w_F + µ_B'w_B − µ_p),   (3)

where the components w_F are subject to minimization and the components w_B are fixed at their actual values.
Proof This is obvious: an infinitesimal change of w_F leaves all weights free, and such a change cannot lead to a smaller L, as otherwise w* would not be a solution of the constrained case.
There is a major difference between (2) and (3):⁶ the underlying subsets in (3) depend on the constrained minimum variance portfolio's location.

⁵ In Markowitz (1959) turning points are described as the intersections of two critical lines.

⁶ Note that for the case l_i = 0 and u_i = ∞ one has w_B = 0, so that (3) becomes ½ w_F'Σ_F w_F − γ(w_F'1_F − 1) − λ(µ_F'w_F − µ_p), which is the same as (2) bar the subscript F.
Since the subsets F and B do not change between turning points, the solution between two turning points is that of an unconstrained optimization over the subset F.
Corollary 1 Combining two neighboring turning points with a real weight ω ∈ [0, 1] always leads to a constrained minimum variance portfolio.
Proof This follows from Proposition 1 and the fact that a linear combination of two solutions (for different values of µ p ) of the unconstrained problem is a solution as well.
Differentiating (3) with respect to w_F yields

Σ_F w_F + Σ_FB w_B − γ 1_F − λ µ_F = 0.   (4)

At this point the constraint w'1 = 1 must be adapted according to (1), leading to

w_F' 1_F = 1 − w_B' 1_B.   (5)

Solving (4) together with (5) for γ yields

γ = (1 − 1_B' w_B + 1_F' Σ_F⁻¹ Σ_FB w_B − λ 1_F' Σ_F⁻¹ µ_F) / (1_F' Σ_F⁻¹ 1_F).   (6)

Note that here λ is set exogenously instead of µ_p. Therefore, λ determines the value of γ and finally the expected return of the minimum variance portfolio. The value of µ_p is fictitious in (3) and the optimal solution is solely determined by λ. In fact, setting λ exogenously is very similar to calculating with a fixed µ_p and regarding λ as a Lagrange multiplier. This is because λ and µ'w (= µ_p) are linearly related between turning points and because a higher λ yields a (constrained) minimum variance portfolio with higher expected return. This is stated in Proposition 2, with the proof relegated to the appendix.
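Given the set F, the free weights for a fixed λ follow from the first-order condition and the adapted budget constraint. The sketch below is an illustrative Python version (not the paper's Fortran code): it assumes a ½ scaling of the variance term in the Lagrangian, and the sets F and B, the fixed bound weights w_B and all data are arbitrary example values. It also checks numerically that µ'w is affine in λ, as Proposition 2 states:

```python
import numpy as np

# Sketch: free weights w_F for a fixed lambda and a fixed set F,
# with w_B held at its bound values.
rng = np.random.default_rng(2)
n = 5
A = rng.random((n, n))
Sigma = A @ A.T + n * np.eye(n)
mu = rng.random(n)

F, B = [0, 1, 2], [3, 4]            # example index sets
w_B = np.array([0.0, 0.1])          # example weights fixed on their bounds
S_F = Sigma[np.ix_(F, F)]
S_FB = Sigma[np.ix_(F, B)]
mu_F, mu_B = mu[F], mu[B]
one_F = np.ones(len(F))

def free_weights(lam):
    S_inv = np.linalg.inv(S_F)
    C11 = one_F @ S_inv @ one_F
    # gamma from the adapted budget constraint w_F'1_F = 1 - w_B'1_B
    gamma = (1.0 - w_B.sum() + one_F @ S_inv @ (S_FB @ w_B)
             - lam * (one_F @ S_inv @ mu_F)) / C11
    # first-order condition solved for w_F
    return S_inv @ (gamma * one_F + lam * mu_F - S_FB @ w_B)

# between turning points mu'w is an affine function of lambda
mus = [mu_F @ free_weights(l) + mu_B @ w_B for l in (0.0, 0.5, 1.0)]
```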

Proposition 2 Between two turning points λ and µ ′ w are linearly related with a positive slope
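The positivity of this slope rests on a Cauchy-Schwarz-type inequality between the constants C_11, C_1µ and C_µµ used in the appendix proof. The following Python snippet is a purely illustrative numerical check with randomly generated example data:

```python
import numpy as np

# Numerical illustration: for a positive definite Sigma_F,
# C11 * Cmm - C1m^2 > 0 whenever mu_F is not proportional to 1_F,
# which makes the slope in Proposition 2 positive.
rng = np.random.default_rng(6)
k = 4
M = rng.random((k, k))
S_inv = np.linalg.inv(M @ M.T + k * np.eye(k))   # plays the role of Sigma_F^{-1}
one = np.ones(k)
mu_F = rng.random(k)

C11 = one @ S_inv @ one
C1m = one @ S_inv @ mu_F
Cmm = mu_F @ S_inv @ mu_F
discriminant = C11 * Cmm - C1m ** 2              # strictly positive here
```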

The Algorithm
The main idea of the presented algorithm is the following: first, the turning point with the highest expected return is found; then the next lower turning point is calculated. This is illustrated in Figure 1.

Figure 1: This figure shows the minimum variance frontier (dashed line) and the constrained minimum variance frontier for ten assets and an arbitrarily chosen non-singular covariance matrix. The dots represent the constrained frontier's turning points.
From the definition of a turning point we know that each of them will differ in the composition of its free assets. Therefore, for each of them (4) will hold for a different subset F. Except for the case where two or more turning points lie upon each other⁷, passing a turning point one has to add or remove exactly one element from the set of free assets F.
Moreover, when moving downwards from one turning point to the next, λ will decrease (see Proposition 2). When looking at turning points as in Figure 1 it must therefore hold that

λ_1 > λ_2 > ... > λ_T,

with T being the number of turning points.
The next two subsections show how to find the turning point with the highest expected return (turning point 1) and how to move to the next lower turning point. The third subsection shows how to improve the algorithm's performance significantly.

The Starting Solution
This algorithm requires an initial solution on the constrained minimum variance frontier. It is convenient to find the turning point with the highest expected return before moving to the next lower turning point.
Therefore, we order all assets with respect to their expected return values such that the first asset has the highest and the last asset the lowest expected return. After setting all asset weights to their lower bounds (w_i = l_i) we start to increase the weight of the first asset. If the upper bound is reached and w'1 < 1, we start to increase the second asset's weight, and so forth.
This procedure terminates when w'1 = 1 is reached.⁸

⁷ We do not discuss this possibility even though the algorithm can cope with it; when calculating numerically, there will hardly be two or more turning points on one (σ, µ) location. Markowitz (1959) proposes as a solution to this situation to either alter the µ of one asset slightly or to use the method described in Markowitz (1956).

⁸ Obviously, this requires 1'l ≤ 1 ≤ 1'u. For 1'u < 1 or 1'l > 1 there is no solution to the portfolio optimization problem. For 1'u = 1 or 1'l = 1 the whole efficient frontier consists of only one portfolio, namely w = u or w = l, respectively.

The solution is typically a weight vector where the weights of the first assets are set to their upper bounds and the last assets' weights are set to their lower bounds. There is one asset in the middle whose weight lies between its bounds; we will call this asset the free asset.

We solve a simpler problem than Markowitz and Todd (2000), as our only equality constraint is w'1 = 1, whereas Markowitz and Todd consider the general constraint Aw = b. However, because in many practical situations the specific case w'1 = 1 is sufficient, one can use our simpler algorithm to find the first turning point instead of the simplex algorithms used in Markowitz and Todd (2000).
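The starting solution can be sketched as a simple greedy fill. The Python version below is illustrative only (the bounds l and u are hypothetical example values); it returns the weight vector of the first turning point:

```python
import numpy as np

# Sketch: order assets by expected return and fill weights up to their
# upper bounds until the budget w'1 = 1 is spent (requires 1'l <= 1 <= 1'u).
def starting_solution(mu, l, u):
    w = l.astype(float).copy()          # start all weights at lower bounds
    budget = 1.0 - w.sum()
    for i in np.argsort(-mu):           # highest expected return first
        add = min(u[i] - l[i], budget)
        w[i] += add
        budget -= add
        if budget <= 0.0:
            break
    return w

mu = np.array([0.05, 0.12, 0.08, 0.10])
l = np.zeros(4)
u = np.full(4, 0.4)
w0 = starting_solution(mu, l, u)
```

With these example bounds, assets 1 and 3 (the two highest µ) end up on their upper bound 0.4 and asset 2 is the single free asset with weight 0.2.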

Iteration
When moving from a turning point to the next lower one by decreasing λ, one of the following two situations will occur: either a free asset moves to one of its bounds, or an asset formerly on its bound becomes free. These two cases have to be considered in order to compute the next turning point's λ and w.
Case a) one formerly free asset moves to its bound Let λ current belong to a turning point and let F be the set of the free assets slightly below this turning point (i.e. for λ such that λ current ≡ λ t > λ > λ t+1 ).
For this subset containing k variables (4) holds, and thus

w_F = Σ_F⁻¹ (γ 1_F + λ µ_F − Σ_FB w_B).   (7)

Substituting γ from (6) into (7) gives w_F as a linear function of λ. When decreasing λ, asset i will hit its lower bound if the derivative dw_i/dλ is positive and its upper bound if the derivative is negative. We denote by λ_(i) the value of λ at which asset i hits the corresponding bound.

λ_(i) can be derived from the linear relation between w_F,i and λ resulting from (6) and (7); this gives (8), with the auxiliary constants C_i defined in (9) and (10). Note that for k = 1 one gets C_i = 0, which reflects the fact that with a single free asset its weight is already fixed by the budget constraint (5). Therefore, case a) should be considered only for k > 1. C_i is also zero for all i if µ_F happens to be proportional to 1_F, i.e. µ_i = µ_j for all i, j ∈ F. The next λ < λ_current at which an asset wants to leave the subset F is

λ_inside ≡ max_{i ∈ F} {λ_(i) | λ_(i) < λ_current},

or λ_inside does not exist if k = 1 or C_i = 0 for all i.
However, λ_inside only describes the next lower turning point if there is no portfolio with a λ such that λ_current > λ > λ_inside at which an asset on its bound wants to enter the subset F. This situation is summarized by case b).
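Since each free weight is affine in λ between turning points, λ_(i) can be found from one slope and one intercept per asset. The following Python sketch is illustrative only (it assumes a ½ scaling of the variance term in the Lagrangian; data, bounds and index sets are arbitrary example values):

```python
import numpy as np

# Sketch of case a): each free weight w_i is affine in lambda, so the
# lambda at which asset i reaches a bound follows from slope and intercept.
rng = np.random.default_rng(3)
n = 4
A = rng.random((n, n))
Sigma = A @ A.T + n * np.eye(n)
mu = rng.random(n)
l, u = np.zeros(n), np.ones(n)       # example bounds

F = [0, 1, 2]                        # free assets; asset 3 on its lower bound
w_B = np.array([0.0])
S_inv = np.linalg.inv(Sigma[np.ix_(F, F)])
S_FB = Sigma[np.ix_(F, [3])]
mu_F = mu[F]
one_F = np.ones(len(F))

def w_F(lam):
    C11 = one_F @ S_inv @ one_F
    gamma = (1.0 - w_B.sum() + one_F @ S_inv @ (S_FB @ w_B)
             - lam * (one_F @ S_inv @ mu_F)) / C11
    return S_inv @ (gamma * one_F + lam * mu_F - S_FB @ w_B)

w0, w1 = w_F(0.0), w_F(1.0)
slope = w1 - w0                      # dw_i/dlambda for each free asset
# decreasing lambda: positive slope -> lower bound, negative -> upper bound
target = np.where(slope > 0, l[F], u[F])
lam_i = (target - w0) / slope        # lambda at which asset i hits its bound
```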
Case b) one asset formerly on its bound wants to become free When moving downwards in µ'w it might occur that an asset i formerly on its bound wants to become free. The corresponding portfolio is therefore a turning point at which the subsets F and B have to be redefined. Let us denote the new subsets by F_i ≡ F ∪ {i} and B_i ≡ B \ {i}, where i ∈ B. Analogously to (8), the value λ_(i) at which the newly included asset i's weight moves away from its bound is given by (12), with the auxiliary constants defined in (13), where for convenience b_i is used for the value of (w_{F_i})_i = w_i, the weight of the asset on its bound possibly becoming free. If asset i was previously on its upper bound, b_i stands for u_i; if it was on the lower bound, it stands for l_i.¹⁰ In order to find the maximal λ_(i) < λ_current at which an asset i currently on its bound wants to become free, (12) must be applied for all i ∈ B.
Again, if no λ_(i) < λ_current exists, we record that there is no solution λ_outside.

Finding the next turning point
In order to find out which case will occur, the values of λ inside and λ outside must be compared.
• If solutions for both λ_inside and λ_outside could be found, the next turning point has

λ_new = max{λ_inside, λ_outside}.

Thus, e.g. case a) is characterized by λ_inside > λ_outside.
• If a solution could be found for only one of λ_inside and λ_outside, λ_new is set to that value.
• Depending on which case occurs we replace F by F \ {i} or by F i , B by B ∪ {i} or B i and λ current by λ new .
• If no solution for either λ_inside or λ_outside could be found, we have reached the lowest turning point and the algorithm terminates.¹¹

The pseudo-code in Appendix B summarizes the algorithm for the simplified case where l_i = 0 and u_i = +∞.¹² One way of checking the results is to look at the last turning point's weight vector. This weight vector must be the 'opposite' of the initial solution: when ordering all weights with regard to their expected returns, the last weights must be at their upper bounds and the first weights at their lower bounds. The remaining weight corresponds to the free asset.
11 For practical purposes, the lower half of the efficient frontier (with µ decreasing and σ increasing) does not interest us. In this case we can terminate the algorithm when σ starts increasing again which corresponds to λ = 0.
12 Note that in this case wB = 0 and the equations become much simpler. The starting solution is also simpler: one has to set the weight of the asset with the highest expected return to 1.
Note that, in contrast to the calculations in (8), the specification of F_i in (12) depends on i, and Σ_{F_i}⁻¹ must be recalculated for each i ∈ B.¹³ Moreover, as the next turning point has one asset more or one asset less in F, the inverse of the respective covariance matrix Σ_F⁻¹ must be recalculated each time. We show in the following how these time-consuming computations of inverses can be avoided.

Improving Performance
In the algorithm described above, the compositions of F and B change with one asset being included or excluded. Here we show that one can avoid recalculating the inverse of the corresponding matrices each time.
Expansion of the covariance matrix Σ_F

Lemma 1 Let A be a symmetric non-singular k × k matrix, a a k × 1 vector and α a scalar. Then for the expanded matrix's inverse

( A  a ; a'  α )⁻¹ = ( A⁻¹ + β A⁻¹ a a' A⁻¹ ,  −β A⁻¹ a ;  −β a' A⁻¹ ,  β ),   with β ≡ 1/(α − a' A⁻¹ a),   (15)

where the semicolon separates the rows of the block matrices.

Proof Multiplying the expanded matrix with the right-hand side of (15) yields the identity matrix.

Our algorithm often requires expanding the subset F by one element i and recalculating the inverse of the covariance matrix, Σ_{F_i}⁻¹, for the new subset. Lemma 1 frees us from the burden of making this calculation all over again. This reduces the number of operations for inverting the expanded matrix ( A  a ; a'  α ) from about k³/3 to about 2k².

¹³ Or at least the vectors Σ_{F_i}⁻¹ 1_{F_i} and Σ_{F_i}⁻¹ µ_{F_i} have to be calculated.
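Lemma 1 can be checked numerically. This Python sketch (illustrative only) assembles the expanded inverse from A⁻¹ and the inverse Schur complement β, and compares it against a direct inversion:

```python
import numpy as np

# Numerical check of Lemma 1: the inverse of a symmetric matrix bordered
# by a column a and a scalar alpha can be assembled from A^{-1} alone.
rng = np.random.default_rng(4)
k = 5
M = rng.random((k, k))
A = M @ M.T + k * np.eye(k)            # symmetric non-singular
a = rng.random(k)
alpha = float(a @ np.linalg.inv(A) @ a) + 1.0   # keeps the border invertible

A_inv = np.linalg.inv(A)
beta = 1.0 / (alpha - a @ A_inv @ a)   # inverse Schur complement
v = A_inv @ a
top_left = A_inv + beta * np.outer(v, v)
expanded_inv = np.block([
    [top_left,             (-beta * v)[:, None]],
    [(-beta * v)[None, :], np.array([[beta]])],
])

expanded = np.block([[A, a[:, None]], [a[None, :], np.array([[alpha]])]])
```

Only matrix-vector products with A⁻¹ appear, which is where the O(k²) operation count comes from.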

Reduction of the covariance matrix Σ F
Reducing the covariance matrix by one row and column does not require the inversion of the newly obtained matrix either. Having calculated the inverse of the expanded covariance matrix as in the previous section and now deleting the given row and column, the newly obtained matrix's inverse can be calculated. This is stated in Lemma 2 where for presentational purposes the given index is assumed to be the last one.
Lemma 2 Let A and B be k × k matrices, a and b k vectors, and α and β scalars. If

( A  a ; a'  α )⁻¹ = ( B  b ; b'  β )

holds, then

A⁻¹ = B − (1/β) b b'

holds as well.
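Lemma 2 can be verified the same way: the reduced inverse follows from the blocks of the already-known expanded inverse, without a fresh O(k³) inversion. An illustrative Python check:

```python
import numpy as np

# Numerical check of Lemma 2: deleting the last row and column, the
# reduced inverse is A^{-1} = B - b b' / beta.
rng = np.random.default_rng(5)
k = 5
M = rng.random((k + 1, k + 1))
E = M @ M.T + (k + 1) * np.eye(k + 1)  # expanded symmetric PD matrix
E_inv = np.linalg.inv(E)

A = E[:k, :k]                          # matrix after deleting row/column k
B = E_inv[:k, :k]
b = E_inv[:k, k]
beta = E_inv[k, k]
A_inv_fast = B - np.outer(b, b) / beta
```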

Key Improvement
The most remarkable improvement stems from the fact that in (12) the full inverse Σ_{F_i}⁻¹ is not needed. This is stated in Proposition 3.

Proposition 3 Expression (12) can be rewritten as in (16) and (17).

Proof The derivation is shown in the appendix.
It is remarkable that in (17) the vector a (the i-th column of Σ corresponding to a trial i ∈ B) enters linearly. Therefore the calculation of λ_(i) is fast: only two scalar products with a have to be calculated for each i ∈ B.

Performance Tests
Obviously, it is problematic to compare different algorithms based on absolute CPU times. Their performance strongly depends on the programming language and on the algorithm's memory requirements. When the algorithms are not tested on the same computer, differences in processor performance, RAM size and system configuration do not allow comparing CPU times directly.

However, there are two reasons for us to make such performance comparisons. First, when looking at the increase of CPU time with an increasing number of assets, the algorithms' relative performance is independent of the programming language and other hardware and software properties (as long as there are no memory bottlenecks). Second, the difference in programming languages cannot explain CPU time improvements of the magnitude obtained in our tests.
In the following, a Fortran 90 implementation (Fortran 90 CLA) of the discussed algorithm is tested against three programs for the case where the lower bound is zero and the upper bound is infinity: a simplex-like algorithm based on Wolfe (1959) coded in Java (Java Wolfe-Simplex, as described and implemented in Niedermayer (2005)), the quadratic optimization package of Matlab, and the simplex based multi-parametric optimization algorithm of Steuer, Qi, and Hirschberger (2006)¹⁴, implemented in Java (Java MPQ). The latter comparison is important; as argued in Steuer, Qi, and Hirschberger (2006), the MPQ outperforms Matlab, Cplex, LINGO, Mathematica, and Excel's premium Solver. Steuer, Qi, and Hirschberger (2006) did not compare the Java MPQ algorithm to the Excel Optimizer by Markowitz and Todd (2000) due to the 256-column limitation of Excel. Finally, we also provide run times of the Excel Optimizer by Markowitz and Todd (2000). Note that this implementation is provided by Markowitz and Todd (2000) for illustrative purposes in the form of an Excel VBA macro and can calculate the efficient frontier for up to 256 securities (the maximal number of columns in Excel). Note further that even though we ran the Optimizer with the same set of constraints as the other problems, it can solve the optimization problem for a more general set of constraints.
For the tests illustrated in Figure 2, a positive definite covariance matrix was generated from random n vectors r(i), each containing random numbers in [0, 1] and regenerated for each i. Since our results and the MPQ results in Steuer, Qi, and Hirschberger (2006) strongly depend on the number of free assets (i.e. assets not on their bounds), thus on the maximum dimension k̄ of (8) and (12), we made sure that k̄/n is similar to that in Steuer, Qi, and Hirschberger (2006). In our tests k̄ was 60 with 1000 assets and 250 with 2000 assets.
In Figure 2 both axes are logarithmic. The slope of the linear (OLS) fit therefore corresponds to the exponent of the respective algorithm's CPU time growth with an increasing number of assets. Note that the problem with the Java Wolfe-Simplex is that the program's RAM requirements increase rapidly, which allows only for the computation of problems with up to 150 assets.
The test's results are summarized in Table 1.
Since the Wolfe-Simplex algorithm and the Matlab quadratic optimization package only calculate one single point on the constrained minimum variance frontier, and do not calculate the whole frontier analytically as our Fortran CLA algorithm, the Optimizer by Markowitz and Todd (2000) and the algorithm of Steuer, Qi, and Hirschberger (2006) do, the CPU times reported in Table 1 support our method even more.
As in Steuer, Qi, and Hirschberger (2006), our tests were conducted on a Dell desktop with a 3.06 GHz processor. Note further that the performance we have provided for the Fortran 90 CLA is calculated from the run times without matrix sizes 100 and 150, because for smaller matrix sizes the fixed costs of calculation seem to distort the data. When including matrix sizes 100 and 150 we get O(n^1.3).

¹⁴ Hirschberger, Qi, and Steuer (2004).

Conclusions
This paper presents the critical line algorithm (CLA) developed by Markowitz and demonstrates its strong computational performance compared to standard software packages and to a recently published optimization algorithm. We find that our implementation of the CLA, available on request from the authors in the form of a Matlab package, outperforms the current Matlab optimization tool by a factor of approximately 15 thousand when the problem size (number of assets) is 2000. When comparing with the algorithm in Steuer, Qi, and Hirschberger (2006), which also computes all turning points analytically as the CLA does, the performance improvement is still around 8 thousand.
As all steps of the algorithm are treated explicitly, this code can be directly used for implementation in other programming languages, for large scale portfolio optimization problems and for CPU time intensive Monte Carlo simulations.

A Proofs
Proof (Proposition 2) For tractability we define the three constants

C_11 ≡ 1_F' Σ_F⁻¹ 1_F,   C_1µ ≡ 1_F' Σ_F⁻¹ µ_F,   C_µµ ≡ µ_F' Σ_F⁻¹ µ_F.

From equation (4) and the definitions given in (1) it follows that µ_p(λ) = µ_F' w_F(λ) + µ_B' w_B, with w_F(λ) as in (7). Differentiating this expression with respect to λ yields

dµ_p/dλ = (C_11 C_µµ − C_1µ²) / C_11.   (18)

Since Σ_F does not change between two turning points, µ_p(λ) = µ'w(λ) is linear in λ with a slope given by (18). We show below that this slope is positive.
From the positive definiteness of Σ it follows that its submatrix Σ_F and its inverse Σ_F⁻¹ (≡ (Σ_F)⁻¹) are positive definite as well.
We introduce a vector x ≡ 1_F − αµ_F, with α ∈ R. Then x'Σ_F⁻¹x can be written as

x' Σ_F⁻¹ x = C_11 − 2α C_1µ + α² C_µµ.

Positive definiteness of Σ_F⁻¹ means x'Σ_F⁻¹x > 0 for any vector x ≠ 0; hence, provided µ_F is not proportional to 1_F, the equation C_11 − 2αC_1µ + α²C_µµ = 0 cannot have a solution for α. Therefore the discriminant is negative, which gives

C_11 C_µµ − C_1µ² > 0.

Proof (Proposition 3) According to Lemma 1, Σ_{F_i}⁻¹ can be expressed in terms of Σ_F⁻¹, a and α. Multiplying Σ_{F_i}⁻¹ by 1_{F_i} and µ_{F_i} respectively yields (19) and (20). Multiplying (19) and (20) by 1'_{F_i} and plugging the values into (12) yields the denominator in (17).
Using the following properties and Lemma 1 yields the numerator in (17):

B Pseudo-Code
The following pseudo-code describes the procedure that calculates all turning points of the minimum variance frontier when negative portfolio weights are excluded. Parameters are the covariance matrix Σ and the expected returns µ. Asset weights w(t)_F are returned for each turning point t. Between turning points, weights are linear combinations of the two surrounding turning points. All equation numbers in the pseudo-code and its description below refer to equations in Niedermayer and Niedermayer (2006).
Calculate-Turningpoints(Σ, µ)
 1  j ← arg max_i µ_i
 2  w(1)_j ← 1;  ∀i ≠ j : w(1)_i ← 0
 3  F ← {j}
 4  λ_current ← ∞
 5  t ← 1
 6  repeat
 7      £ Case a) Free asset moves to its bound
 8      if size(F) > 1
 9          then for each i ∈ F
10                   compute λ_i from (9)
13      i_inside ← arg max_{i ∈ F} {λ_i | λ_i < λ_current}
14      £ Case b) Asset on its bound becomes free
15      if size(F) < size(µ)
16          then
17              for each i ∉ F
20                  compute λ_i from (12)
22              i_outside ← arg max_{i ∉ F} {λ_i | λ_i < λ_current}
23      if i_inside ≠ nil or i_outside ≠ nil
24          then t ← t + 1
25               λ_current ← max{λ_{i_inside}, λ_{i_outside}}
26               if λ_{i_inside} = max{λ_{i_inside}, λ_{i_outside}}
27                   then F ← F \ {i_inside}
28                   else F ← F ∪ {i_outside}
29               compute w(t)_F from (7)
31  until i_inside = nil and i_outside = nil
32  return w(1)_F, ..., w(t)_F

In our notation x ← y means that the value y is assigned to the variable x. We define arg max_i {x_i} to return i* with x_{i*} ≥ x_i for all i, or nil if the set {x_i} is empty. max{·} returns the greatest value unequal to nil.
As in the main text, Σ_F is short-hand for the matrix {Σ_ij | i, j ∈ F} and µ_F ≡ {µ_i | i ∈ F}. The same applies to Σ_{F_i} and µ_{F_i} with F_i ≡ F ∪ {i}.
Note that the performance of the algorithm improves significantly if one uses (16) from Proposition 3 instead of (11) on line 20.