Faculty Working Paper #122
College of Commerce and Business Administration
University of Illinois at Urbana-Champaign
August 13, 1973

MULTICOLLINEARITY AND THE CHOICE OF ESTIMATOR UNDER SQUARED ERROR LOSS

G. G. Judge and M. E. Bock
University of Illinois, Urbana-Champaign

Using the traditional linear statistical model, the impact of multicollinearity on the choice among conventional, pre-test, and variants of the Stein-James estimators is analytically evaluated under a squared error loss measure of goodness, for both the conditional mean forecasting and parameter estimation objectives.

Building on this work, Sclove et al. [15] have developed a modified version of the Stein-James estimator which may be extended to our specification in the form

(2.11)  $\beta^{**} = I_{[c,\infty)}(u)\,(1 - c^*/u)(b - \bar{\beta}) + \bar{\beta} = I_{[c,\infty)}(u)\,(1 - c^*/u)\,b,$

which takes the value $\beta^{**} = \bar{\beta}$ when $u < c$. When $c = c^*$, the modified version (2.11) is the same as the extension of the positive part estimator (2.10).

The measure of goodness

Given this set of estimators, we will evaluate the impact of multicollinearity on the choice of estimator by making use of the quadratic loss function

(2.12)  $L(\hat{\beta}, \beta) = (\hat{\beta} - \beta)'(\hat{\beta} - \beta) = \|\hat{\beta} - \beta\|^2,$

where $\hat{\beta}$ is any particular estimator, with risk

(2.13)  $\rho(\hat{\beta}, \beta) = E[L(\hat{\beta}, \beta)] = E(\hat{\beta} - \beta)'(\hat{\beta} - \beta).$

In comparing the risk functions of two estimators we will say that the estimator $\hat{\beta}$ is superior to $\tilde{\beta}$ if

(2.14)  $E(\hat{\beta} - \beta)'(\hat{\beta} - \beta) - E(\tilde{\beta} - \beta)'(\tilde{\beta} - \beta) < 0$ for all $\beta$,

i.e., if the risk of the estimator $\hat{\beta}$ is less than that of $\tilde{\beta}$ over the region of the parameter space considered. In general, risk functions for alternative estimators cross; that is, the difference in the risks of the estimators changes sign over different regions of the parameter space. When this happens for the estimators considered in this paper we will identify the point(s) in the parameter space where the risk functions cross.

The reparametrized model

Although we are concerned with a situation where a near linear relation(s) exists between the explanatory variables, we have assumed that the $X'X$ matrix is of full rank. Since it simplifies the algebra, we perform a canonical reduction on the statistical model (2.1) and the restrictions (2.3) and work with the following reparametrized model:

(2.15)  $y = XS^{-1/2}S^{1/2}\beta + e = Z\theta + e$

and

(2.16)  $\theta = \theta_o,$

where $S^{1/2}$ is a positive definite symmetric matrix with $S^{1/2}S^{1/2} = S$, $\theta = S^{1/2}\beta$, and $Z'Z = S^{-1/2}(X'X)S^{-1/2} = I_K$. An estimator $\hat{\theta}$ for $\theta$ yields an estimator $S^{-1/2}\hat{\theta} = \hat{\beta}$ for $\beta$. This equivalent model leads to the least squares or maximum likelihood estimator for $\theta$,

(2.17)  $\omega = (Z'Z)^{-1}Z'y = Z'y,$

and the restricted least squares estimator

(2.18)  $\bar{\theta} = \theta_o.$

The likelihood ratio test statistic becomes

(2.19)  $u = \omega'\omega(T-K)\big/\big[K(y - Z\omega)'(y - Z\omega)\big] = \omega'\omega/K\hat{\sigma}^2,$

which has a non-central $F(\lambda, K, T-K)$ distribution with $K$ and $T-K$ degrees of freedom, where

(2.20)  $\lambda = \theta'\theta/2\sigma^2,$

and its use implies the preliminary test estimator for $\theta$,

(2.21)  $\hat{\theta} = \omega - I_{(0,c)}(u)\,\omega.$
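As a concrete illustration of the canonical reduction and the estimators just defined, the following minimal Python sketch computes (2.17)-(2.21) together with the Stein-type shrinkage rules discussed above. The function names are ours, and the restricted value is taken to be $\theta_o = 0$; both are assumptions made purely for illustration.

```python
import numpy as np

def canonical_reduction(X):
    """Canonical reduction (2.15): Z = X S^{-1/2} with S = X'X, so that Z'Z = I_K."""
    S = X.T @ X
    vals, vecs = np.linalg.eigh(S)                      # spectral decomposition of S
    S_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T     # symmetric square root S^{1/2}
    S_neg_half = vecs @ np.diag(vals ** -0.5) @ vecs.T  # S^{-1/2}
    return X @ S_neg_half, S_half, S_neg_half

def theta_estimators(X, y, c, c_star):
    """Least squares, restricted, pre-test and Stein-type estimators of theta,
    assuming the restricted value theta_o = 0 (an illustrative choice)."""
    T, K = X.shape
    Z, S_half, S_neg_half = canonical_reduction(X)
    omega = Z.T @ y                                     # (2.17): (Z'Z)^{-1} Z'y = Z'y
    resid = y - Z @ omega
    sigma2_hat = resid @ resid / (T - K)
    u = (omega @ omega) / (K * sigma2_hat)              # test statistic (2.19)
    theta_hat = {
        "least_squares": omega,
        "restricted": np.zeros(K),                      # (2.18) with theta_o = 0
        "pre_test": omega if u >= c else np.zeros(K),   # (2.21)
        "stein_james": (1.0 - c_star / u) * omega,      # shrinkage rule (1 - c*/u) omega
        "positive_part": max(0.0, 1.0 - c_star / u) * omega,
        "sclove": (max(0.0, 1.0 - c_star / u) * omega
                   if u >= c else np.zeros(K)),         # (2.11)-type rule
    }
    # an estimator of beta is recovered as S^{-1/2} times the theta estimator
    beta_hat = {name: S_neg_half @ th for name, th in theta_hat.items()}
    return theta_hat, beta_hat, u
```

Any estimator of $\beta$ is recovered from the corresponding estimator of $\theta$ by premultiplying by $S^{-1/2}$, as noted below (2.16).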
In terms of the reparametrized model the extension of the Stein-James estimator becomes

(2.22)  $\theta^* = (1 - c/u)\,\omega,$

with the corresponding changes made for the extensions of the positive part and the modified Sclove estimators (2.10) and (2.11).

3. Choice of Estimator under the Prediction Goal

In this section we consider the impact of multicollinearity on estimator choice when the objective is one of conditional mean forecasting. Our interest centers on comparing the risk functions

(3.1)  $E(X\hat{\beta} - X\beta)'(X\hat{\beta} - X\beta) = E(\hat{\beta} - \beta)'X'X(\hat{\beta} - \beta),$

which weight the quadratic form $E(\hat{\beta} - \beta)'(\hat{\beta} - \beta)$ with the cross product matrix $S = X'X$, where $\hat{\beta}$ is any of the six estimators for $\beta$ developed in the previous section. Since for the reparametrized model

(3.2)  $\rho(\hat{\theta}, \theta) = E(\hat{\theta} - \theta)'(\hat{\theta} - \theta) = E(\hat{\beta} - \beta)'S^{1/2}S^{1/2}(\hat{\beta} - \beta) = E[(X\hat{\beta} - X\beta)'(X\hat{\beta} - X\beta)],$

we can make our comparisons with an unweighted risk function in terms of $\theta$. Since the explanatory variables $Z$ are orthogonal, multicollinearity is not a problem, and therefore, in the conditional mean forecasting case, the conventional results regarding the risk of the alternative estimators hold and may, from the work of [11, 15, 18, 21, 22, 24], be summarized as follows:

i) In terms of (3.2) the risk of the unrestricted least squares estimator $\omega$ is $\rho(\omega, \theta) = \sigma^2 K$, and the risk of the restricted least squares estimator $\bar{\theta}$ is $\rho(\bar{\theta}, \theta) = \theta'\theta$. If the restrictions are correct, $\theta = 0$ and $\rho(\bar{\theta}, \theta) < \rho(\omega, \theta)$. If $\theta \neq 0$, then in order that $\rho(\bar{\theta}, \theta) - \rho(\omega, \theta) \le 0$, the quantity $\sigma^2 K - \theta'\theta$ must be non-negative, which implies that the condition $\theta'\theta/\sigma^2 \le K$, or in terms of the noncentrality parameter of the test statistic, $\lambda \le K/2$, must be satisfied. The risk of the restricted estimator is unbounded in $\theta$, and we have the typical situation in which the risk functions of the two estimators cross.

ii) Under (3.2), with a weighted squared error loss measure of goodness in terms of $\beta$, from the work of Cohen [8], Sclove et al. [15], and Bock et al. [5] on the preliminary test estimator $\hat{\theta}$: if the restrictions are correct, $\theta = 0$ and $\rho(\hat{\theta}, \theta) < \rho(\omega, \theta)$. If $\theta \neq 0$, then it is necessary that $\theta'\theta/\sigma^2 > K/2$, or $\lambda > K/4$, in order for the risk of the preliminary test estimator to exceed that of the least squares estimator; alternatively, $\rho(\hat{\theta}, \theta) > \rho(\omega, \theta)$ whenever $\theta'\theta/\sigma^2 > K$, or $\lambda > K/2$. Although there are conditions under which the pre-test estimator has a smaller risk than the conventional least squares estimator, the pre-test estimator is inferior to the least squares estimator over an infinite interval of the parameter space of $\lambda = \theta'\theta/2\sigma^2$.

iii) In terms of the measure of goodness reflected by (3.2), and from the work of James and Stein [11], if $K > 2$ and $0 < c < 2(K-2)/(T-K+2)$, then the Stein-James estimator $\theta^*$ is uniformly superior to the least squares estimator $\omega$. The optimal choice of $c$ is $c_o = (K-2)/(T-K+2)$.

iv) Under (3.2) the Stein-James positive part estimator

(3.3)  $\theta^+ = (1 - c^*/u)^+\,\omega = I_{[c^*,\infty)}(u)\,(1 - c^*/u)\,\omega,$

where $0 < c^* < 2(K-2)/(T-K+2)$, or $0 < c^* < 2c_o$, is uniformly superior to the Stein-James estimator (2.22) and thus demonstrates its inadmissibility under squared error loss [11, 18]. In addition, as Bock has shown [4], if $c \le c^*$ and $K \ge 3$, for comparable values of $c$ the positive part estimator dominates the preliminary test estimator.

v) The positive part version of the Sclove et al. [15] modified Stein-James estimator is

(3.4)  $\theta^{**} = I_{[c,\infty)}(u)\,(1 - c^*/u)^+\,\omega,$

where $0 < c^* < 2(K-2)/(T-K+2)$.
If $c \le 2(K-2)/(T-K+2)$, let $c^* = c$; then $\theta^{**} = \theta^+ = I_{[c^*,\infty)}(u)(1 - c^*/u)\,\omega$. Alternatively, if $c > 2(K-2)/(T-K+2)$, let $c^* < 2(K-2)/(T-K+2)$; then $\theta^{**} = I_{[c,\infty)}(u)(1 - c^*/u)\,\omega$.

If the value of $c$ is equal to or less than $c^*$, then (3.4) is the conventional positive part estimator, and this estimator is uniformly superior over the range of the parameter space to the least squares (2.17), pre-test (2.21), and Stein-James (2.22) estimators. If $c > c^*$, the modified Sclove positive part estimator (3.4) is uniformly superior to the conventional preliminary test estimator (2.21). When $c^* < 2c_o$, the modified Stein-James estimator (3.4) provides a minimax substitute for the conventional preliminary test estimator (2.21). The estimator given in (3.4) is in reality a preliminary test estimator in which the outcome of the preliminary test, at a level of significance dictated by the value of $c$, results in either selection of the Stein-James positive part estimator or the restricted least squares estimator.

In summary, in the prediction case, although the $X$'s may be "almost collinear," as long as $X$ is of rank $K$ the conventional results for estimator choice under quadratic loss hold. The Stein-James positive part estimator (3.3) dominates, and thus provides a minimax counterpart for, the conventional least squares (2.17), Stein-James (2.22), and pre-test (2.21) estimators when $K \ge 3$ and $0 < c^* < 2c_o$, and the Sclove estimator (3.4) dominates the conventional preliminary test estimator over the whole range of $c$. It would appear that, with or without multicollinearity, if we are willing to leave the class of unbiased estimators, for prediction purposes a version of the Stein-James preliminary test estimator should be our choice. It should be remarked that although the pre-test Stein-James estimator is minimax when the conditions $K > 2$ and $0 < c^* < 2(K-2)/(T-K+2)$ are fulfilled, this estimator, along with others using Stein rules, is not admissible. Strawderman [19] has developed an estimator of this general form that is both minimax and admissible when certain conditions are fulfilled (one being that $K \ge 5$).

4. The Choice of Estimator under the Estimation Goal

If we are interested in a measure of goodness involving an unweighted risk function in terms of the original parameters $\beta$, then under squared error loss and the reparametrized model of the last section,

(4.1)  $\rho(\hat{\beta}, \beta) = E[(\hat{\beta} - \beta)'(\hat{\beta} - \beta)] = E[(\hat{\theta} - \theta)'D(\hat{\theta} - \theta)] = \rho_D(\hat{\theta}, \theta),$

where $D = S^{-1}$ and $\hat{\beta} = S^{-1/2}\hat{\theta}$. An unweighted risk function for $\beta$ implies a weighted risk function for $\theta$ in the reparametrized model, and indicates why the $X'X = I$ case traditionally analyzed in the statistical literature is not sufficient for gauging estimator performance in the general (usual) case where the emphasis is on estimation and $X'X$ is some positive definite symmetric matrix. From the standpoint of multicollinearity this focus on parameter estimation is relevant, since we are concerned with the implications or incidence of multicollinearity for the comparative sampling performance of the alternative estimators of $\beta$.

Risk of traditional estimators

Using the measure of performance reflected by (4.1), the risk of the conventional least squares estimator is

(4.2)  $\rho(b, \beta) = E[(b - \beta)'(b - \beta)] = E[(\omega - \theta)'D(\omega - \theta)] = E[(\omega - \theta)'S^{-1}(\omega - \theta)] = \rho_D(\omega, \theta) = \sigma^2\,\mathrm{tr}\,S^{-1}.$
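The equality in (4.2) is easy to verify by simulation, and doing so makes the effect of collinearity on the least squares risk visible. The sketch below is a minimal check of this kind; the design matrix, error variance, and replication count are arbitrary illustrative choices, and the function name is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def ls_risk_check(X, sigma2=1.0, reps=20000):
    """Monte Carlo check of (4.2): the unweighted risk of the least squares
    estimator of beta equals sigma^2 * tr(S^{-1}), where S = X'X."""
    T, K = X.shape
    beta = np.zeros(K)                       # the risk of b does not depend on beta
    S_inv = np.linalg.inv(X.T @ X)
    proj = S_inv @ X.T
    losses = np.empty(reps)
    for r in range(reps):
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=T)
        b = proj @ y
        losses[r] = (b - beta) @ (b - beta)
    return losses.mean(), sigma2 * np.trace(S_inv)

# a nearly collinear two-regressor design: the second column is the first plus
# small noise, so the smallest root of X'X is close to zero
x1 = rng.normal(size=25)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=25)])
print(ls_risk_check(X))    # the two numbers should agree, and both are large
```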
In contrast, the risk of the restricted least squares estimator, in the context of Wallace [21] and Yancey et al. [22], is

(4.3)  $\rho(\bar{\beta}, \beta) = E[(\bar{\beta} - \beta)'(\bar{\beta} - \beta)] = E[(\bar{\theta} - \theta)'D(\bar{\theta} - \theta)] = \rho_D(\bar{\theta}, \theta) = \rho_D(\omega, \theta) + \theta'S^{-1}\theta - \sigma^2\,\mathrm{tr}\,S^{-1} = \theta'S^{-1}\theta.$

The difference between the risk of the least squares estimator (4.2) and the risk of the restricted estimator (4.3) is

(4.4)  $\rho(b, \beta) - \rho(\bar{\beta}, \beta) = \sigma^2\,\mathrm{tr}\,S^{-1} - \theta'S^{-1}\theta,$

which is non-negative if

(4.5)  $\sigma^2\,\mathrm{tr}\,S^{-1} \ge \theta'S^{-1}\theta.$

From the work of Wallace [21] and Yancey et al. [22],

(4.6)  $\theta'S^{-1}\theta \le d_L\,\theta'\theta = 2\lambda\sigma^2 d_L,$

where the $d_i$ are the roots of $S^{-1}$, with $d_L$ the largest, and $t_L = d_L\big/\sum_{i=1}^{K} d_i$. From (4.6) the difference in the risks (4.4) will be non-negative if

(4.7)  $\lambda \le 1/2t_L = (1/2)\sum_{i=1}^{K} d_i/d_L = (1/2)\,\mathrm{tr}\,S^{-1}/d_L.$

Alternatively, the difference (4.4) will be non-positive, and the risk of the conventional estimator less than that of the restricted estimator, if

(4.8)  $\lambda \ge 1/2t_S = (1/2)\sum_{i=1}^{K} d_i/d_S = (1/2)\,\mathrm{tr}\,S^{-1}/d_S,$

where $d_S$ is the smallest root of $S^{-1}$ and $t_S = d_S\big/\sum_{i=1}^{K} d_i$. Therefore, the risk functions of the unrestricted and restricted least squares estimators cross for some value of $\lambda$ in $[\lambda_o, \lambda_1]$, where

(4.9)  $1/2t_L \le \lambda_o \le \lambda_1 \le 1/2t_S.$

Since the incidence of collinearity between the explanatory variables means that the smallest root of $X'X = S$ approaches zero, the largest root of $S^{-1}$ approaches infinity. As the degree of collinearity increases, the largest root $d_L$ of $S^{-1}$ increases, and the range of the parameter space, in terms of $\lambda$, over which the risk of the restricted least squares estimator is guaranteed to be less than that of the least squares estimator shrinks toward the interval $0 \le \lambda \le 1/2$. The interval of uncertainty about the equality of the risk functions depends, of course, on the relative sizes of the roots $d_L$ and $d_S$. Therefore, the degree of collinearity in the explanatory variables affects, for a given $\lambda$, the location in the parameter space where the risk of one estimator is equal to, less than, or greater than that of another estimator, and thus the choice of estimator.

Risk of the pre-test estimator

Alternatively, following Wallace [21], Feldstein [10], and others in using either a new or conventional test statistic along with a preliminary test of significance rule, we now compare analytically, using (4.1), the risk of the resulting pre-test estimator with that of the least squares estimator and analyze the impact of collinearity on estimator choice. The risk of the pre-test estimator, from the work of Bock et al. [6], is

(4.10)  $\rho(\hat{\beta}, \beta) = E[(\hat{\beta} - \beta)'(\hat{\beta} - \beta)] = E[(\hat{\theta} - \theta)'D(\hat{\theta} - \theta)] = \rho_D(\omega, \theta) - \sigma^2 p_2\,\mathrm{tr}\,S^{-1} + (2p_1 - p_2)\,\theta'S^{-1}\theta,$

where $p_i$ is the probability that a random variable with a non-central $F$ distribution is smaller than a constant, i.e., $p_i = \Pr\big[\chi^2_{(\lambda,\,K+2i)}\big/\chi^2_{(T-K)} \le cK/(T-K)\big]$, and all other symbols were previously defined in connection with (4.5)-(4.7). Since the risk depends on both $\lambda$ and $\theta'S^{-1}\theta$, the risk of the pre-test estimator may be bounded by

(4.11)  $\rho(\hat{\beta}, \beta)/\sigma^2 \le (1 - p_2)\,\mathrm{tr}\,S^{-1} - 2t_L\lambda\,\mathrm{tr}\,S^{-1}(p_2 - 2p_1)$

and

(4.12)  $\rho(\hat{\beta}, \beta)/\sigma^2 \ge (1 - p_2)\,\mathrm{tr}\,S^{-1} - 2t_S\lambda\,\mathrm{tr}\,S^{-1}(p_2 - 2p_1),$

where $t_L$ and $t_S$ are defined as in (4.6)-(4.8). The difference between the risks of the least squares and pre-test estimators, (4.2) and (4.10), is

(4.13)  $\rho(b, \beta) - \rho(\hat{\beta}, \beta) = \sigma^2\,\mathrm{tr}\,S^{-1}\Big[p_2 + 2(p_2 - 2p_1)\,\theta'S^{-1}\theta\big/\big(2\sigma^2\,\mathrm{tr}\,S^{-1}\big)\Big].$

This difference in the risk functions (4.13) will be non-negative if $\lambda \le 1/4t_L$ and non-positive if $\lambda \ge 1/2t_S$. Therefore, the risk functions of the preliminary test and least squares estimators cross for some value of $\lambda$ in the interval

(4.14)  $1/4t_L \le \lambda = \theta'\theta/2\sigma^2 \le 1/2t_S.$

It is not possible to be more precise about the relation between the risk functions unless more information exists about $\theta$ or $\theta_o$.
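The bounds in (4.7)-(4.9) and (4.14) are simple functions of the roots of $S^{-1}$ and are easy to compute for any given design. The following sketch does so; the function name, return format, and the example design are ours and purely illustrative.

```python
import numpy as np

def crossing_intervals(X):
    """Intervals (4.9) and (4.14) that contain the values of lambda at which the
    restricted and pre-test risk functions cross the least squares risk,
    computed from the roots d_i of S^{-1} = (X'X)^{-1}."""
    d = 1.0 / np.linalg.eigvalsh(X.T @ X)   # roots of S^{-1}
    tr_S_inv = d.sum()
    t_L = d.max() / tr_S_inv                # t_L = d_L / sum of the d_i
    t_S = d.min() / tr_S_inv                # t_S = d_S / sum of the d_i
    return {
        "restricted_vs_ls": (0.5 / t_L, 0.5 / t_S),   # interval containing (4.9)
        "pretest_vs_ls": (0.25 / t_L, 0.5 / t_S),     # interval (4.14)
    }

# as two columns become nearly collinear the upper end 1/(2 t_S) explodes, so the
# region of uncertainty about estimator choice becomes very wide
rng = np.random.default_rng(1)
x1 = rng.normal(size=30)
X_coll = np.column_stack([x1, x1 + 0.001 * rng.normal(size=30), rng.normal(size=30)])
print(crossing_intervals(X_coll))
```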
This outcome, reflected by (4.14), contrasts with the range of uncertainty for $\lambda$ in the pre-test prediction case of $K/4 \le \lambda \le K/2$. Thus again, the degree of collinearity, as reflected by the size of the roots of the $S^{-1}$ matrix, conditions the range of $\lambda$ over which the risk of the pre-test estimator is less than that of the least squares estimator. In the face of multicollinearity this range of uncertainty may be very large, since the smallest root of $S$ approaches zero and the largest root of $S^{-1}$ approaches infinity. Thus the interval of the parameter space where there is a gain from testing may be very small indeed, and the losses of the pre-test estimator relative to the least squares estimator may be positive and large over a significant interval of the parameter space. The Monte Carlo results of Feldstein [10] appear consistent with the analytical results presented above.

Risk of the Stein rule estimators

Let us now consider the impact of collinearity on the choice between the conventional least squares estimator and the extensions of the Stein-James estimators. The risk of an extension of the Stein-James estimator, $\beta^* = (1 - c/u)b = (1 - c/u)S^{-1/2}\omega = S^{-1/2}\theta^*$, is

(4.15)  $\rho(\beta^*, \beta) = E[(\theta^* - \theta)'D(\theta^* - \theta)] = \sigma^2\,\mathrm{tr}\,S^{-1} + \sigma^2\,\mathrm{tr}\,S^{-1}\,c(T-K)\, E\Big\{\frac{1}{(K+2H)(K-2+2H)}\Big[c(T-K+2) - 2(K-2) + \frac{\theta'S^{-1}\theta}{\theta'\theta\,\mathrm{tr}\,S^{-1}}\,2H\Big(c(T-K+2) - 2\Big(\frac{\theta'\theta\,\mathrm{tr}\,S^{-1}}{\theta'S^{-1}\theta} - 2\Big)\Big)\Big]\Big\},$

where $H$ is a Poisson random variable with parameter $\lambda = \theta'\theta/2\sigma^2$. In order for the expression between the brackets to be zero or negative, and thus the risk of the Stein-James estimator to be equal to or less than the risk of the least squares estimator over the whole range of the parameter space (i.e., for the estimator to be uniformly superior), we require

(4.16a)  $\sum_{i=1}^{K} d_i\big/d_L = \mathrm{tr}\,S^{-1}\big/d_L > 2$

and

(4.16b)  $0 < c \le 2d_L^{-1}\big(\mathrm{tr}\,S^{-1} - 2d_L\big)\big/(T-K+2).$

Thus the uniform superiority of this extension of the Stein-James estimator over the conventional estimators in the general regression model depends not only on the number of explanatory variables or hypotheses, as was the requirement in the orthonormal or prediction case, but also on whether or not $\mathrm{tr}\,S^{-1}$ divided by the largest root of $S^{-1}$ is equal to or larger than 2. If $\mathrm{tr}\,S^{-1}/d_L < 2$, then for no value of $c > 0$ does this extension of the Stein-James estimator dominate the least squares estimator. Since the degree of collinearity is related to the magnitude of the roots of $S$, it therefore affects whether or not the risk functions cross at some point in the parameter space, and thus has a direct impact on the choice of estimator. In addition, if multicollinearity exists, then at least one of the roots of $X'X$ will approach zero, and thus $d_L$, the largest root of $S^{-1}$, will approach infinity and $\mathrm{tr}\,S^{-1}/d_L$ may well be less than 2.
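Conditions (4.16a) and (4.16b) can be checked mechanically for any given design. The sketch below (the function name and return format are ours) computes $\mathrm{tr}\,S^{-1}/d_L$ and, when that ratio exceeds 2, the largest admissible shrinkage constant $c$. When one root of $X'X$ is nearly zero, $d_L$ dominates the sum of the roots of $S^{-1}$, the ratio is close to one, and no positive $c$ satisfies (4.16b).

```python
import numpy as np

def stein_dominance_check(X):
    """Check condition (4.16a) and, when it holds, compute the admissible
    range (4.16b) of the shrinkage constant c under the unweighted beta-risk."""
    T, K = X.shape
    d = 1.0 / np.linalg.eigvalsh(X.T @ X)   # roots d_i of S^{-1}
    d_L = d.max()
    ratio = d.sum() / d_L                   # tr S^{-1} / d_L
    if ratio <= 2.0:
        # no positive c yields uniform superiority over least squares
        return {"ratio": ratio, "dominates_ls": False, "c_max": 0.0}
    c_max = 2.0 * (d.sum() - 2.0 * d_L) / (d_L * (T - K + 2))   # upper bound in (4.16b)
    return {"ratio": ratio, "dominates_ls": True, "c_max": c_max}
```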
That this situation may often occur in econometric work can be seen from the following first order correlation matrix for four explanatory variables and a sample of size 10, which was initially generated to reflect the characteristics of economic time series data and was used in a Monte Carlo study [23]:

(4.17)
        1.00   .58   .76   .44
               1.00   .28   .29
                      1.00   .87
                             1.00

This correlation matrix is certainly not atypical of that reflected by much passively generated economic sample data, and the $X'X$ matrix underlying it has one root equal to .000006, which is small relative to the other roots. Thus $\mathrm{tr}\,S^{-1}/d_L < 2$ and in fact is very close to one.

These same conditions or requirements ($\mathrm{tr}\,S^{-1}/d_L > 2$) must also hold in order for this particular extension of the Stein-James positive part estimator, $\beta^+$, and the extension of the Sclove-Stein-James preliminary test estimator, $\beta^{**}$, to dominate the least squares and conventional preliminary test estimators, respectively. This means that when we are concerned with the risk in the estimation case, the appearance of 3 or more regressors and a suitably small $c$ do not insure, as they did in the prediction case, that the risk of these various extensions of the Stein rule estimators will be smaller over the entire parameter space than that of the conventional and pre-test estimators. As a consequence we have a new rule, (4.16a), for determining the degree of collinearity that is permissible if these extensions of the Stein rule estimators are to dominate the other sampling theory estimators.
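The analytical points of this section can also be examined by simulation. In the following sketch the designs, coefficient values, and constants are arbitrary illustrative choices (they are not those of the Monte Carlo study cited above); it estimates the unweighted $\beta$-risk (4.1) of the least squares, pre-test, Stein-James, and positive part rules under an orthonormal and under a nearly collinear design, so that the behavior of the Stein rules under collinearity can be examined directly.

```python
import numpy as np

rng = np.random.default_rng(2)

def empirical_beta_risks(X, beta, sigma=1.0, c=1.0, c_star=None, reps=5000):
    """Monte Carlo estimates of the unweighted beta-risk (4.1) for the least
    squares, pre-test, Stein-James and positive part estimators."""
    T, K = X.shape
    if c_star is None:
        c_star = (K - 2) / (T - K + 2)       # the prediction-optimal c_o
    S = X.T @ X
    proj = np.linalg.inv(S) @ X.T
    names = ["least_squares", "pre_test", "stein_james", "positive_part"]
    loss = {n: 0.0 for n in names}
    for _ in range(reps):
        y = X @ beta + rng.normal(scale=sigma, size=T)
        b = proj @ y
        resid = y - X @ b
        sigma2_hat = resid @ resid / (T - K)
        u = (b @ S @ b) / (K * sigma2_hat)   # test statistic, as in (2.19)
        est = {
            "least_squares": b,
            "pre_test": b if u >= c else np.zeros(K),
            "stein_james": (1.0 - c_star / u) * b,
            "positive_part": max(0.0, 1.0 - c_star / u) * b,
        }
        for n in names:
            diff = est[n] - beta
            loss[n] += diff @ diff
    return {n: loss[n] / reps for n in names}

# an orthonormal design versus a nearly collinear one with the same T and K
T, K = 30, 4
Q, _ = np.linalg.qr(rng.normal(size=(T, K)))            # columns with Q'Q = I_K
x1 = rng.normal(size=T)
X_coll = np.column_stack([x1, x1 + 1e-3 * rng.normal(size=T),
                          rng.normal(size=T), rng.normal(size=T)])
beta = np.full(K, 0.5)
print(empirical_beta_risks(Q, beta))
print(empirical_beta_risks(X_coll, beta))
```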
line" full rank, and thus at least one root of the relevant weight matrix approaches infinity, the range of uncertainty as to estima- tor choice between the pre test and least squares estimator goes over almost the entire range of the parameter space, (c) means the appearance of 3 or more regressors and a small c does- not insure that these extensions of the Stein-James estimators will dominate the least squares and conventional preliminary test estimators, (d) means that in order for these extensions of the Stein-James estimators to be uniformly superior to conventional estimators, the ratio of the sum of the characteristic roots of the S matrix ' (4.5), to the largest root of this matrix must be equal to or greater than 2, rather than the traditional condition K > 2, (e) means that if this ratio is not greater than two then some members of the fam- ily of potential risk functions for the extensions of the Stein-James es- . timators in a given problem cross the risk function for the least squares estimator, and (f) means that if the conditions (4.16a and b) are fulfilled, the extension of the Stein rule pre test estim.ator (2.11), for the general model, is uniformly superior to the conventional pre test estimator over the parameter space, but like the conventional pre test estimator its risk function crosses that of the least squares estimator for large values of the critical value c or small values of a, the level of the test; (2.11) is only a minimax esti- mator for smaller values of c or larger values of a than are ordi- narily used. These anal>^ical results suggest that when multicollinearity is present to the extent that the trS < 2d , under a squared error loss measure of goodness the restricted, pro test and the family of Stein rule estimators are superior (smaller risk) to the least squares estimator only over a very small -19- interval of the parameter space and are inferior (larger risk) over a large and in some cases infinite range of the parameter space. Unless the re- searcher has great confidence that his linear hypotheses RS - r = 6 = are true^ under the risk measure we have employed when multicollinearity is present, he has much to lose and very little to gain by broadening the class of estimators and using the two stage procedure. 6. References [13 Ashar, V.G. (1970): "On the Use of Preliminary Tests in Regression," Unpublished Thesis, North Carolina State University, Raleigh. [2] Bancroft, T.A. C1944) : "On Biases in Estimation Due to the Use of Preliminary Tests of Significance," Annals of Mathematical Statistics , Vol. 15, pp. 190-204. [3] Baranchik, A.J. (1964): "Multiple Regression and the Estimation of the Mean of a Multivariate Normal Distribution," Stanford University Technical Report, no. 51. [4] Bock, M.E. (1973): "flinimax Estimators of the Mean of a Multivariate Distribution," Research Paper, University of Illinois. (S] Bock, M.E. (1972): "A Comparison of the Risk Functions for Preliminary Test and Positive Part Estimators," Research Paper, Ur^iversity of Illinois. [6] Bock, M.E., T.A. Yancey, and G.G. Judge (1972): ''Vne Statistical Con- sequences of Preliminary Test Estimators in Regression," Journal of American Statistical Association , forthcoming. [7] Chipman, J.S. and M.M. Rao (1964): "The Treatment of Linear Restric- tions in Regression Analysis," Econometrica , Vol. 32, pp. 198-209. [8] Cohen, A. (1965): "Estimates of the Linear Combination of Parameters in the Mean Vector of a Multivariate Distribution," Annals of Mathe- matical Statistics, Vol. 
[9] Farrar, D.E. and R.R. Glauber (1967): "Multicollinearity in Regression Analysis: The Problem Revisited," Review of Economics and Statistics, Vol. 49, pp. 92-107.

[10] Feldstein, M.S. (1971): "Multicollinearity and the Mean Square Error of Alternative Estimators," to appear in Econometrica.

[11] James, W. and C. Stein (1961): "Estimation with Quadratic Loss," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, pp. 361-379.

[12] Judge, G.G., M.E. Bock, and T.A. Yancey (1973): "On Post Data Model Evaluation," to appear in Review of Economics and Statistics.

[13] Larson, H.J. and T.A. Bancroft (1963): "Sequential Model Building for Prediction in Regression Analysis," Annals of Mathematical Statistics, Vol. 34, pp. 462-479.

[14] Massy, W.F. (1965): "Principal Components Regression in Exploratory Statistical Research," Journal of the American Statistical Association, Vol. 60, pp. 234-256.

[15] Sclove, S.L., C. Morris, and R. Radhakrishnan (1972): "Non-Optimality of Preliminary-Test Estimators for the Multinormal Mean," Annals of Mathematical Statistics, Vol. 43, pp. 1481-1490.

[16] Scott, J.T. (1966): "Factor Analysis and Regression," Econometrica, Vol. 34, pp. 552-562.

[17] Silvey, S.D. (1969): "Multicollinearity and Imprecise Estimation," Journal of the Royal Statistical Society, Series B, Vol. 31, pp. 539-552.

[18] Stein, C. (1966): "An Approach to the Recovery of Inter-Block Information in Balanced Incomplete Block Designs," in F.N. David (ed.), Research Papers in Statistics: Festschrift for J. Neyman, John Wiley and Sons, New York, pp. 351-366.

[19] Strawderman, W.E. and A. Cohen (1971): "Admissibility of Estimators of the Mean Vector of a Multivariate Normal Distribution with Quadratic Loss," Annals of Mathematical Statistics, Vol. 42, pp. 270-296.

[20] Toro-Vizcarrondo, C. and T.D. Wallace (1968): "A Test of the Mean Square Error Criterion for Restrictions in Linear Regression," Journal of the American Statistical Association, Vol. 63, pp. 558-572.

[21] Wallace, T.D. (1971): "Weaker Criteria and Tests for Linear Restrictions in Regression," to appear in Econometrica.

[22] Yancey, T.A., G.G. Judge, and M.E. Bock (1973): "Wallace's Mean Square Error Criterion for Testing Linear Restrictions in Regression: A Tighter Bound," forthcoming in Econometrica.

[23] Yancey, T.A., M.E. Bock, and G.G. Judge (1972): "Some Finite Sample Results for Theil's Mixed Regression Estimator," Journal of the American Statistical Association, Vol. 67, pp. 176-179.

[24] Zellner, A. and W. Vandaele (1972): "Bayes-Stein Estimators for k Means, Regression, and Simultaneous Equation Models," Unpublished Paper, University of Chicago.