BEBR FACULTY WORKING PAPER NO. 90-1676

College of Commerce and Business Administration
Bureau of Economic and Business Research
University of Illinois at Urbana-Champaign

August 1990

Tests of Linear Hypotheses Based on Regression Rank Scores

C. Gutenbrunner, J. Jurečková, R. Koenker, S. Portnoy

Dedicated to the memory of Jaroslav Hájek

Philipps Universität, Marburg, Germany
Charles University, Prague, Czechoslovakia
University of Illinois at Urbana-Champaign, USA

The work was partially supported by NSF grants 88-02555 and 89-22472 to S. Portnoy and R. Koenker and by support from the Australian National University to J. Jurečková and R. Koenker.

ABSTRACT

We propose a general class of asymptotically distribution-free tests of a linear hypothesis in the linear regression model. The tests are based on regression rank scores, recently introduced by Gutenbrunner and Jurečková (1990) as dual variables to the regression quantiles of Koenker and Bassett (1978). Their properties are analogous to those of the corresponding rank tests in the location model. Unlike the other regression tests based on aligned rank statistics, however, our tests do not require preliminary estimation of nuisance parameters.
AMS 1980 subject classifications: 62G10, 62J05
Key words and phrases: Regression quantile, regression rank scores

1. Introduction

Several authors including Koul (1970), Puri and Sen (1985) and Adichie (1978) have proposed and investigated asymptotically distribution-free tests of some types of linear hypothesis for the linear regression model based upon aligned rank statistics. A good review of these results, including extensions to multivariate models, may be found in Puri and Sen (1985) and Adichie (1984). The hypothesis under consideration typically involves nuisance parameters which must first be estimated; the aligned (or signed) rank statistics are then based on residuals from the preliminary estimates. Alternative approaches to inference based on rank estimation have been considered by McKean and Hettmansperger (1978), Aubuchon and Hettmansperger (1989) and Draper (1988).

In this paper we explore an alternative approach to the construction of rank tests for the linear regression model based on the regression rank scores introduced in Gutenbrunner and Jurečková (1990). Regression rank scores represent a natural extension of the location rank scores introduced in Hájek and Šidák (1967, Section V.3.5), which stand in one-to-one relation to the ranks of the sample observations. The tests based on regression rank scores offer a natural extension of rank-based methods of testing to the general linear model and avoid many of the difficulties introduced by preliminary estimation of nuisance parameters in prior proposals.

The next section of the paper surveys our results and establishes notation. Section 3 develops some theory of the regression rank score process.
Section 4 treats the theory of simple linear rank statistics based on this process, Section 5 contains a formal treatment of the proposed tests, and Section 6 describes an example.

2. Notation and preliminary considerations

Consider the classical linear regression model

$$Y = X\beta + E \tag{2.1}$$

which we will partition as

$$Y = X_1\beta_1 + X_2\beta_2 + E \tag{2.2}$$

where $\beta_1$ and $\beta_2$ are $p$- and $q$-dimensional parameters, and $X = X_n$ is a known $n \times (p+q)$ design matrix with rows $x_{ni}' = x_i' = (x_{1i}', x_{2i}') \in R^{p+q}$, $i = 1,\dots,n$. We will assume throughout that $x_{i1} = 1$ for $i = 1,\dots,n$. $Y$ is a vector of observations and $E$ is an $n \times 1$ vector of i.i.d. errors with common distribution function $F$. The precise form of $F$ need not be known, but we shall generally assume that $F$ has an absolutely continuous density $f$ on $(A, B)$ where $-\infty \le A = \sup\{x: F(x) = 0\}$ and $+\infty \ge B = \inf\{x: F(x) = 1\}$. Moreover, we shall impose some conditions on the tails of $f$, assuming, among other conditions, that $f$ monotonically decreases to $0$ when $x \to A+$ or $x \to B-$.

Denoting $D_n = n^{-1}X_1'X_1$, $H_1 = X_1(X_1'X_1)^{-1}X_1'$ and

$$Q_n = n^{-1}(X_2 - \hat X_2)'(X_2 - \hat X_2) \tag{2.3}$$

with $\hat X_2 = H_1 X_2$ being the projection of $X_2$ on the space spanned by the columns of $X_1$, we shall also assume

$$\lim_{n\to\infty} D_n = D, \qquad \lim_{n\to\infty} Q_n = Q \tag{2.4}$$

where $D$ and $Q$ are positive definite $(p \times p)$ and $(q \times q)$ matrices, respectively.

We are interested in testing the hypothesis

$$H_0: \beta_2 = 0, \quad \beta_1 \text{ unspecified} \tag{2.5}$$

versus the Pitman (local) alternatives

$$H_n: \beta_{2n} = n^{-1/2}\beta_0 \tag{2.6}$$

with $\beta_0$ being a fixed vector from $R^q$.

The regression rank scores introduced in Gutenbrunner and Jurečková (1990) arise as a vector of solutions

$$\hat a_n(\alpha) = (\hat a_{n1}(\alpha),\dots,\hat a_{nn}(\alpha))', \quad 0 < \alpha < 1 \tag{2.7}$$

of the dual form of the linear program required to compute the regression quantile statistics of Koenker and Bassett (1978).
More precisely, the vector $\hat\beta(\alpha) = (\hat\beta_1(\alpha),\dots,\hat\beta_p(\alpha))' \in R^p$ of regression quantiles corresponding to the submodel

$$Y = X_1\beta_1 + E \tag{2.8}$$

is any solution of the minimization

$$\sum_{i=1}^n \rho_\alpha(Y_i - x_{1i}'t) := \min, \quad t \in R^p \tag{2.9}$$

where

$$\rho_\alpha(x) = |x|\,\{(1-\alpha)I[x<0] + \alpha I[x>0]\}, \quad x \in R^1. \tag{2.10}$$

Koenker and Bassett (1978) characterized the regression quantile as the component $\beta$ of the optimal solution $(\beta, r^+, r^-)$ of the linear program

$$\alpha 1_n'r^+ + (1-\alpha)1_n'r^- := \min, \qquad X_1\beta + r^+ - r^- = Y \tag{2.11}$$

with $\beta \in R^p$, $r^+, r^- \in R^n_+$, $1_n = (1,\dots,1)' \in R^n$ and $0 < \alpha < 1$. Finite-sample as well as asymptotic properties of $\hat\beta(\alpha)$ are studied in Koenker and Bassett (1978), Ruppert and Carroll (1980), Jurečková (1984), Gutenbrunner (1986), Koenker and Portnoy (1987), and Gutenbrunner and Jurečková (1990), among others.

The dual program to (2.11) can be written in the form

$$Y'\hat a(\alpha) := \max, \qquad X_1'\hat a(\alpha) = (1-\alpha)X_1'1_n \tag{2.12}$$

with $\hat a(\alpha) \in [0,1]^n$, $0 < \alpha < 1$. As shown in Gutenbrunner and Jurečková (1990), many aspects of the duality of order statistics and ranks in the location model generalize naturally to the linear model through (2.11) and (2.12).

To motivate the approach, let us illustrate $\{\hat a_n(\alpha), 0 < \alpha < 1\}$ in the location model, which in the present notation may be viewed as (2.11) with $X_1 = 1_n$. Then $\hat a_n(\alpha)$ specializes to

$$\hat a_{ni}(\alpha) = a_n^*(R_i, \alpha) = \begin{cases} 1 & \text{if } \alpha \le (R_i - 1)/n \\ R_i - \alpha n & \text{if } (R_i - 1)/n < \alpha \le R_i/n \\ 0 & \text{if } R_i/n < \alpha \end{cases} \tag{2.13}$$

where $R_i$ is the rank of $Y_i$ among $Y_1,\dots,Y_n$. The function $a_n^*(j, \alpha)$, $j = 1,\dots,n$, [...] $\varphi: (0,1) \to R^1$ which is nondecreasing and square-integrable on $(0,1)$. We may then construct scores in the following way:

$$\hat b_{ni} = -\int_0^1 \varphi(t)\,d\hat a_{ni}(t), \quad i = 1,\dots,n. \tag{2.14}$$

Defining

$$S_n = n^{-1/2}(X_2 - \hat X_2)'\hat b_n \tag{2.15}$$

where $\hat b_n = (\hat b_{n1},\dots,\hat b_{nn})'$, we propose the following statistic for testing $H_0$ against $H_n$:

$$T_n = S_n'Q_n^{-1}S_n / A^2(\varphi) \tag{2.16}$$

with

$$A^2(\varphi) = \int_0^1 (\varphi(t) - \bar\varphi)^2\,dt, \qquad \bar\varphi = \int_0^1 \varphi(t)\,dt \tag{2.17}$$

and with $Q_n$ defined as in (2.3). An important feature of the test statistic $T_n$ is that it requires no estimation of nuisance parameters, since the functional $A(\varphi)$ depends only on the score function and not on $F$.
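The location-model case of (2.9) through (2.17) is small enough to work by hand, so a minimal sketch may help fix ideas. It assumes the location submodel ($X_1 = 1_n$), a single tested regressor ($q = 1$), Wilcoxon scores $\varphi(t) = t - 1/2$ with $A^2(\varphi) = 1/12$, and no ties. The function names and the four-point data set are hypothetical and chosen only for illustration; this is not the authors' computational procedure.

```python
import math


def rho(x, alpha):
    """Check function rho_alpha of (2.10)."""
    return abs(x) * ((1.0 - alpha) if x < 0 else alpha)


def location_quantile(y, alpha):
    """A minimizer of sum_i rho_alpha(y_i - t), i.e. (2.9) with X_1 = 1_n.
    The objective is piecewise linear, so a minimum occurs at an observation."""
    return min(y, key=lambda t: sum(rho(yi - t, alpha) for yi in y))


def ranks(y):
    """Ranks 1..n of the observations (assumes no ties)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    r = [0] * len(y)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r


def location_rank_score(rank, n, alpha):
    """Closed form (2.13) of the rank score in the location model."""
    if alpha <= (rank - 1) / n:
        return 1.0
    if alpha <= rank / n:
        return rank - n * alpha
    return 0.0


def wilcoxon_score(rank, n):
    """Score (2.14) for phi(t) = t - 1/2: the area under the rank score
    function, (2R - 1) / (2n), minus 1/2."""
    return (2 * rank - 1) / (2 * n) - 0.5


def slope_test_statistic(x, y):
    """T_n of (2.16) for q = 1 when X_1 holds only the intercept."""
    n = len(y)
    b = [wilcoxon_score(r, n) for r in ranks(y)]
    x_bar = sum(x) / n
    resid = [xi - x_bar for xi in x]  # X_2 minus its projection on 1_n
    s_n = sum(ri * bi for ri, bi in zip(resid, b)) / math.sqrt(n)  # (2.15)
    q_n = sum(ri * ri for ri in resid) / n                         # (2.3)
    return s_n ** 2 / (q_n * (1.0 / 12.0))                         # (2.16)


# Hypothetical data, for illustration only.
y = [3.1, 0.2, 5.0, 1.7]
alpha = 0.4
n = len(y)
quantile = location_quantile(y, alpha)
scores = [location_rank_score(r, n, alpha) for r in ranks(y)]
wilcoxon = [wilcoxon_score(r, n) for r in ranks(y)]
t_n = slope_test_statistic([1.0, 2.0, 3.0, 4.0], [0.2, 1.7, 3.1, 5.0])
print(quantile, scores, wilcoxon, t_n)
```

In this sketch the observation that coincides with the $\alpha$-quantile receives a fractional score, observations above it receive score 1 and those below receive score 0, and the scores sum to $n(1-\alpha)$, as the constraint in (2.12) requires.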
The absence of any nuisance-parameter estimate is familiar from the theory of rank tests, but stands in sharp contrast with other methods of testing in the linear model, where typically some estimation of a scale parameter of $F$ is required to compute the test statistic. See for example the discussion in Aubuchon and Hettmansperger (1989) and Draper (1988).

As we shall show in Section 5, the asymptotic distribution of $T_n$ under $H_0$ is central $\chi^2$ with $q$ degrees of freedom, while under $H_n$ it is noncentral $\chi^2$ with $q$ degrees of freedom and noncentrality parameter

$$\eta^2 = [\gamma^2(\varphi, F) / A^2(\varphi)]\,\beta_0'Q\beta_0 \tag{2.18}$$

where

$$\gamma(\varphi, F) = -\int_0^1 \varphi(t)\,df(F^{-1}(t)). \tag{2.19}$$

The quantities $\gamma$ and $A$ are familiar from the theory of rank tests. The test based on $T_n$ is asymptotically distribution free in the sense that, under $H_0$, neither $T_n$ nor its asymptotic distribution depends on $F$. Moreover, it follows from (2.18) that the Pitman efficiency of the test based on $T_n$ with respect to the classical $F$ test of $H_0$ coincides with that of the two-sample rank test of shift in location with respect to the $t$-test. For $f$ unimodal, we obtain an asymptotically optimal test if we take

$$\varphi(t) = \varphi_f(t) = -\frac{f'(F^{-1}(t))}{f(F^{-1}(t))}, \quad 0 < t < 1. \tag{2.20}$$

The asymptotic relative efficiency of the test based on Wilcoxon scores ($\varphi(u) = u - 1/2$) is $3/\pi = .955$ at the normal distribution and is bounded below by $.864$ for all $F$. When $F$ is heavy tailed this asymptotic efficiency is generally greater than one, and can in fact be unbounded. For normal (van der Waerden) scores ($\varphi(u) = \Phi^{-1}(u)$) the situation is even more striking. Here the test based on $T_n$ has asymptotic efficiency at least one, relative to the classical $F$ test, for all symmetric $F$, attaining one at the normal distribution. See e.g. Lehmann (1959, p. 239) and Lehmann (1983, pp. 383-87).

Let us now look at the scores (2.14), which can be written as

$$\hat b_{ni} = -\int_0^1 \varphi(t)\,\hat a_{ni}'(t)\,dt, \quad i = 1,\dots,n \tag{2.21}$$

where the functions $\hat a_{ni}'(t)$ are piecewise constant on $[0,1]$. In the location model, using (2.13), this reduces to

$$\hat b_{ni} = n\int_{(R_i-1)/n}^{R_i/n} \varphi(t)\,dt, \quad i = 1,\dots,n.$$

There are three typical choices of $\varphi$:

(i) Wilcoxon scores: $\varphi(t) = t - 1/2$, $0 < t < 1$.
The scores are

$$\hat b_{ni} = -\int_0^1 (t - 1/2)\,d\hat a_{ni}(t) = \int_0^1 \hat a_{ni}(t)\,dt - 1/2$$

while $A^2(\varphi) = 1/12$, and $\gamma(\varphi, F) = \int f^2(x)\,dx$. Wilcoxon scores are optimal when $f$ is the logistic density.

(ii) Normal (van der Waerden) scores: $\varphi(t) = \Phi^{-1}(t)$, $0 < t < 1$, $\Phi$ being the d.f. of the standard normal distribution. Here $A^2(\varphi) = 1$ and $\gamma(\varphi, F) = \int f(F^{-1}(\Phi(x)))\,dx$. These scores are asymptotically optimal when $f$ is normal.

(iii) Median (sign) scores: $\varphi(t) = \tfrac12\,\mathrm{sign}(t - \tfrac12)$, $0 < t < 1$.

[...] $F^{-1}(t)]$ (3.6)

as $n \to \infty$ uniformly on any fixed interval $[\epsilon, 1-\epsilon]$, where $0 < \epsilon < 1/2$, for any properly standardized triangular array $\{d_{ni}: i = 1,\dots,n\}$ of vectors from $R^q$. They also showed that the process (3.4) (and hence (3.5)) has continuous trajectories and, under the standardization $\sum_{i=1}^n d_{ni} = 0$, (3.5) is tied down to $0$ at $t = 0$ and $t = 1$. The same authors also established the weak convergence of (3.5) to the Brownian bridge over $[\epsilon, 1-\epsilon]$. Note however that Theorem V.3.5 in Hájek and Šidák (1967) establishes the weak convergence of (3.5) to the Brownian bridge over the entire interval $[0, 1]$ in the special case of the location submodel.

Here we extend the results of Gutenbrunner and Jurečková (1990) into the tails of $[0,1]$, in order to find the asymptotic behavior of the rank scores and the test statistics (2.14) and (2.15).

To this end, we will assume that the errors $E_1,\dots,E_n$ in (2.1) are independent and identically distributed according to the distribution function $F(x)$, which has an absolutely continuous density $f$. We will assume that $f$ is positive for $A < x < B$ and decreases monotonically as $x \to A+$ and $x \to B-$, where $-\infty \le A = \sup\{x: F(x) = 0\}$ and $+\infty \ge B = \inf\{x: F(x) = 1\}$. For $0 < \alpha < 1$ denote by $\psi_\alpha$ the score function corresponding to (2.10),

$$\psi_\alpha(x) = \alpha - I[x < 0], \quad x \in R^1. \tag{3.7}$$

We shall impose the following conditions on $F$:

(F.1) $|F^{-1}(\alpha)| \le c(\alpha(1-\alpha))^{-a}$ for $0 < \alpha \le \alpha_0$, $1-\alpha_0 \le \alpha < 1$, [...] and $c > 0$.

(F.2) $1/f(F^{-1}(\alpha)) \le c(\alpha(1-\alpha))^{-1-a}$ for $0 < \alpha \le \alpha_0$ and $1-\alpha_0 \le \alpha < 1$, $c > 0$.
(F.3) / (x ) > is absolutely continuous, bounded and monotonically decreasing as x —* A + and x —> B-. The derivative / ' is bounded a.e. (F.4) /'(*) fix) K > 0, c >0. REMARK. These conditions are satisfied, for example, by the normal, logistic, double exponential and t distributions with 5, or more, degrees of freedom. The following design assumptions will also be employed. (X.1) x $1 -l, i=l » (X.2) lim D„ = D where D„ = n _1 X n 'X n and D is a positive definite p x p matrix. (X.3) n - 1 £ 11^,- II 3 = O ( 1 ) as n -+ 00. (X.4) max||x,|| = CK/iW^M^ 1 -* 6 )) f or some b>0 and £>0 such that < b-a 0, where r n (t, a) = (ad-a^/V-^tPaC^.a - n^a^'i)- p a (E ia )) i=i + n-^ad-ar^ExrWfJ - %t'D„t (3.11) »=i and E ia = E, ,- F-Ha), 1=1, ...,». (3.12) PROOF. (i) First fix a e [a tt *, l-a„*] and t such that ||t|| < C. Define fl B = minfo-^-J/W 1 -" 6 )), /j- 2 */(i-4M) (3.13) We wish to show that for any A > P(\r n (t,a)\ >(X+l)B n ) 0. To do this, we will use the Markov inequality P(K(t, a)\ > s n ) < exp(-us n )(M(u) +M(-u)\ u>0 (3.15) where M{u) = Eexp(ur n (t, a)). 11 Denote c ra - = e„(t, a) = w 1/2 a a x,'t (3.16) and tf,(t, a) = (a(l-a))- 1 VW£ ia -"- 1/ V,'t)-p a (£ ia )] + fi^ad-a))" 1 ^ >«(£* J-Mi/i^Cx,- 't) 2 i =1 «. (3.17) By definition of E ia , a a , p a and V'c ^•(t,a)+%/t- 1 0c,'t) 2 = (ct(l-a))-*' !i 0. By (3.18) and conditions F.1-F.4, for e„- > and for a n *< a < a , l-a n *> a > l-a , ERi(t, a) + %«- 1 (x f 't) a < (a(l-a))- 1 / 2 a-7j m '(e m -z)/ o *(|F- 1 (a)| +W )/(F- 1 (a))^^z < c(a(l-a))- 1 / 2 - 2a «^/ 2 |x,'t| 3 +c(a(l-a))- 1 - 2a «- 2 |x,'t| 4 (3.21) and we get the same inequality for c m < 0. The same expressions are 0(«^/ 2 |x,'t| 3 ) + 0(>7- 2 |x.-'t| 4 ) if a < a < l-a . Hence, ££/?,(t,a) + %t'D n t = <9 ' -2(6 -a) ~\ 1+46 (3.22) Similarly, 12 £Var/?,-(t,a)< £E/tf(t,a) «=i 1=1 a(l-a) ^2 i=i (3.23) -(26 -o) = 0(/7- 1 / 2 -(a(l-a)) 1 / 2 //(/ r_1 a))) = C>(« l+lt ). These results hold uniformly in a, t. 
Hence, using (3.15) and (3.13) with u = log n/B n = 0(n 3a ^ 1M % so that 3.20 holds, -2(6 -a) -(2b- P(\r n (t,a)\>(\ + l)B n )n where K > and « do not depend on a and t. (ii) Following the proof of Lemma A.2 in Koenker and Portnoy (1987), choose intervals [a,, a, +1 ] of length \/n h covering [a B *, l-a„*| and balls of radius 1//2 5 covering {t: ||t|| < C}. Let (a x , a 2 ) c [a,, a, +1 ] and t u X^ lie in one of the balls covering {t:||t|| < C). For /' £ {l x , / 2 } we use (3.17) (and the boundedness of/ and / ') to obtain \R i {h,a l )-R l {X 2 ,a 2 )\< a 1 (l-a 2 ) -i^ * |(F- 1 (a-) + 6 ra )| + /(F"»(a") + € ra )^- 1 (a ) | -V~*< 1 1 a 1 (l-a 1 ) a 2 (l-a 2 ) •-U~*' Ofi(l-a 2 ) + /7- 1 / 2 (a 1 (l- Ql ))- 1 /2|| x ,|||| tl -y 13 + Cn-^WxiW IMl-cO)- 1 /* - (a 2 (l-a 2 ))-^\ = CK*" 2 ); |ai-a 2 | here a * e (a, , a, +1 ) and we have used the inequalities F 1 (a 1 ) - F l (a 3 )| < f(F-\a)) s + -}^ = 0{n 1446 ) = Oin- 37 ^. Hence, on any ball in the covering, \r n (t lt a x ) - r n (t 2 , a 2 )l < K/n. Since the number of sets needed to cover the set S =[a„, 1-cO x{t : ||t|| < C) is bounded by « 5 (p+ 1 ) wc obtain from (3.14) for A > 50 + 1) P(sup |r„(t, a) | > (A + l)5 n + K /n) < n 5 ^^n^ -* ■ (a,t)€S LEMMA 3.2. Assume the conditions of Lemma 3.1 and let d„ = (d nU - - • ,d m Y be a sequence of vectors satisfying X„'d n = 0, -i-£42 - A 2 , < A 2 < oo (D.l) n i=i n~ l Y,\ d ni\ Z = 0{\) as n-^oo (D.2) i=i max |rf„- | =0 |„(2(6-a)^)/(i-K6) (D 3) Ki and for a* given in (3.8). PROOF. Consider the model Y = X'p* + E (3.26) where X* = (X„ : dj, p' = (ft, • • • ,ft,,/? p+1 ). Then X*'X* = 14 X n 'X n o d„'d„ and the conditions of Lemma 3.1 are satisfied even when replacing X by X*. Computing the right derivative of (3.1 1) with respect to teR p+1 , we arrive at (3.25). ■ Let fi n (a) be the a-regression quantile corresponding to the model (2.1) with the design matrix of order (n xp ); i.e. 
, /?»(<*) is a solution of the minimization YsPJXi ; - x t - 't) : - min, / € R p . (3.27) The following theorem establishes the rate of consistency of regression quantiles, and is needed for the representation of the dual process. THEOREM 3.1. Under the conditions (F.l) - (F.4) and (X.l) - (X.4), n l ft*?@{a) - 0(a)) = n^\a{\-a))-^Ji n Y l rl> a (E ia ) + o,(l) »=i uniformly in a n * < a < l-a n *. Consequently, . sup . tffivf&Jfx) - fta))\\ = 0,(1). min G na (t) = min{-t'Z na + %t'D B t} + o p (l) n 0. This ball lies in a compact set with probability exceeding (1 - e) for n > n ; actually, for t G B na , lltll < l|t-U na || + ||IU| <6 + K, for some K t with probability exceeding 1 -e fjorn > n . Hence, by (3.10), 16 p A na = , sup , sup |r„(t, a)| — 0. (3.41) Following Pollard (1988), consider the behavior of G na (t) outside B„ a . Suppose t a = U n01 + k £, k > 6 and ||£|| = 1. Let t* be the boundary point of B„ a that lies on the line from U na to t a , i.e., t a *=U na + *£ Then t a * = ( 1 - (6/k ))U na +(6/k )t a and hence, by (3.38) and (3.39), S/kCJX) + (W/*)CJUJ > G na (0 > WXo + G na (U na ) - 2A na where A is the minimal eigenvalue of D. Hence, inf <7 na (t)> (7 na (U na ) + (k /S)(WX - 2A na ). (3 42) Using (3.39) the last term is positive with probability tending to one uniformly in a for any fixed 6 > 0. Hence, given 6 > and e > 0, there exist n and r\ > such that for n > n , />( . inf J inf C na (t) - C na (U na )] > V )> 1 - e *n ^" ^ * "n (3.43) and hence (since the event in (3.43) implies that G na must be minimized inside the ball of radius 6) P( t sup t \\T na - U„J < S) — ► 1 for any fixed 6 > 0, as n -* oo. ■ a n F~\a)], / = l,...,/i. (3.45) PROOF. Insert n l l 2 o- l (p n {a) - 0(a)) for t in (3.25) and notice (3.29) and the fact that 17 .sup .{n-^aO-cOr^Erfrf/m -x,'i§(a)]}-» 0, ( 3 .46) from which (3.44) follows. 
■ The following theorem which follows from Theorem 3.2 is an extension of Theorem V.3.5 in Hajek and Sidak (1967) to the regression rank scores. Some applications of this result to Kolmogorow-Smirnov type tests will appear in Jureckova(1990). THEOREM 3.3. Under the conditions of Theorem 3.2, as n — ► oo, sup (In-^E^CM") " S»(«))l> * (3.47) 0 under which we may integrate (3.47) and obtain an asymptotic representation for S n of the form S n = n-^d^HFiEi)) + 0,(1). (4.2) We shall prove (4.2) for satisfying a condition of the Chernoff-Savage(1958) type; thus our results will cover Wilcoxon, van der Waerden, and median scores, among others. THEOREM 4.1. Let 4>(t) : < / < 1, be a nondecreasing square integrable function such that '(t) exists for < t < a , l-a < t < 1 and satisfies l^'(/)l '{t)\ £(1 - Ut))dt. (4.7) Then / u < n 1 ' 2 max |c/ ra | / V dt = 0(w l+ " 1 " K6 ) = 0(n~ 2 ^ >). Finally, * /12 = «- 1/2 E^« / °V(O/[/ > F(£,)]^ = n-^Zd^M*^ - ^(F(£,))]/(F(£:,) < o^ »=i «=i Now we may assume that <£(<*„*) < for n > w , since otherwise if 4> were bounded from below p then 1 12 — * 0. Hence * ^r(/ 12 ) < n^Erf^C^CFC^))] 2 /^^) < a„*]) < C\u)du> (9(1) - • =1 due to the square-integrability of . Treating the integrals 7 4 , 7 6 analogously, we arrive at (4.5) and this proves the representation (4.2). ■ 5. Tests of linear subhypotheses based on regression rank scores Returning to the model (2.2), assume that the design matrix X = (X x • X 2 ) satisfies the conditions (X.l) - (X.4), (2.3) and (2.4). We want to test the hypothesis H : fi 2 = (Pi unspecified) against the alternative H n : p 2n = « _ly,2 ^o (Po^ R q fixed). Let S„(a) = (S nl (a), ..., & m (a)) denote the regression rank scores corresponding to the submodel Y = X 1 £ 1 + E under 77 . (5.1) Let <£(/):(0, 1) — > R 1 be a nondecreasing and square integrable score-generating function. 
Define the scores $\hat b_{ni}$, $i = 1,\dots,n$, by the relation (2.14), and consider the test statistic

$$T_n = S_n'Q_n^{-1}S_n / A^2(\varphi) \tag{5.2}$$

where

$$S_n = n^{-1/2}(X_{n2} - \hat X_{n2})'\hat b_n \tag{5.3}$$

and where $Q_n$ and $A^2(\varphi)$ are defined in (2.3) and (2.17), respectively. The test is based on the asymptotic distribution of $T_n$ under $H_0$, given in the following theorem. Thus, we shall reject $H_0$ provided $T_n \ge \chi^2_q(\omega)$, i.e. provided $T_n$ exceeds the $\omega$ critical value of the $\chi^2$ distribution with $q$ d.f. The same theorem gives the asymptotic distribution of $T_n$ under $H_n$ and thus shows that the Pitman efficiency of the test coincides with that of the classical rank test.

THEOREM 5.1. Assume that $X_1$ satisfies (X.1) - (X.4) and $(X_1 \,\vdots\, X_2)$ satisfies (2.3) and (2.4). Further assume that $F$ satisfies (F.1) - (F.4). Let $T_n$ defined in (5.2) and (5.3) be generated by the score function $\varphi$ satisfying (4.3), nondecreasing and square-integrable on $(0, 1)$.

(i) Then, under $H_0$, the statistic $T_n$ is asymptotically central $\chi^2$ with $q$ degrees of freedom.

(ii) Under $H_n$, $T_n$ is asymptotically noncentral $\chi^2$ with $q$ degrees of freedom and with noncentrality parameter

$$\eta^2 = \beta_0'Q\beta_0\,\gamma^2(\varphi, F) / A^2(\varphi) \tag{5.4}$$

with

$$\gamma(\varphi, F) = -\int_0^1 \varphi(t)\,df(F^{-1}(t)). \tag{5.5}$$

PROOF. (i) It follows from Theorem 4.1 that, under $H_0$, $S_n$ has the same asymptotic distribution as

$$\tilde S_n = n^{-1/2}(X_{n2} - \hat X_{n2})'\tilde b_n$$

where $\tilde b_n = (\tilde b_{n1},\dots,\tilde b_{nn})'$ and $\tilde b_{ni} = \varphi(F(E_i))$, $i = 1,\dots,n$. The asymptotic distribution of $\tilde S_n$ follows from the central limit theorem and coincides with the $q$-dimensional normal distribution with expectation $0$ and covariance matrix $Q \cdot A^2(\varphi)$.

(ii) The sequence of local alternatives $H_n$ is contiguous with respect to the sequence of null distributions with the densities $\{\prod_{i=1}^n f(e_i)\}$. Hence, (4.1) holds also under $H_n$ and the asymptotic distributions of $S_n$ and $\tilde S_n$ under $H_n$ coincide. The proposition then follows from the fact that the asymptotic distribution of $\tilde S_n$ under $H_n$ is normal $N_q(\gamma(\varphi, F)Q\beta_0,\ QA^2(\varphi))$.

6. An Example

To illustrate the tests proposed above we consider briefly an example taken from Adichie (1984, Example 3). The log of the leaf burn (in seconds) of 30 batches of tobacco is thought to depend upon the percent composition of nitrogen, chlorine, and potassium. Adichie suggests testing the potassium effect and describes an aligned rank version of the test. We are unable to reproduce some details of his calculations; however, using his approach we get least squares estimates of the nitrogen and chlorine effects of -.529 and -.290 with an intercept of 2.653. With these preliminary estimates we obtain aligned ranks

 7 17  2 18  6  1 11  3 30 13
25 16  4 29 26 27 21 23 19 12
28 10  8 15 24 20 22  5 14  9

which yield a test statistic of 13.59, highly significant relative to the 1% $\chi^2_1$ critical value of 6.63. In contrast the Wilcoxon regression rank scores, computed as

$$\hat b_i = -\int_0^1 (t - 1/2)\,d\hat a_i(t) = \int_0^1 \hat a_i(t)\,dt - 1/2$$

and based on the restricted model excluding potassium, are

-0.27  0.06 -0.41  0.09 -0.32 -0.48 -0.17 -0.38  0.48 -0.06
 0.23  0.04 -0.37  0.42  0.28  0.37  0.19  0.41  0.15 -0.26
 0.38 -0.16 -0.23 -0.01  0.33  0.12  0.15 -0.42 -0.10 -0.06

and yield a test statistic of 13.17. The full set of regression rank scores $\hat a_i(t)$ for these data is illustrated in Figure 6.1, with the plots ordered according to their Wilcoxon rank score. Note that as a practical matter when $\bar\varphi = \int_0^1 \varphi(t)\,dt = 0$, we may omit the $\hat X_2$ term in the computation of $S_n$ in (5.3), since $\hat b_n$ is orthogonal to $X_1$. This is in contrast with the aligned rank situation, where the use of $X_2 - \hat X_2$ is essential. Corresponding calculations for the normal scores yield

-0.74  0.15 -1.41  0.23 -0.91 -2.13 -0.45 -1.17  2.08 -0.15
 0.63  0.10 -1.25  1.44  0.78  1.15  0.50  1.35  0.40 -0.72
 1.41 -0.40 -0.61 -0.03  0.94  0.30  0.39 -1.45 -0.26 -0.18

and a test statistic of 12.87. The corresponding normal score aligned rank statistic is 11.72. Finally, the regression rank score version of the sign test yields the scores

-1.00  1.00 -1.00  1.00 -1.00 -1.00 -1.00 -1.00  1.00 -1.00
 1.00  1.00 -1.00  1.00  1.00  1.00  1.00  1.00  0.16 -1.00
 1.00 -1.00 -1.00 -0.37  1.00  1.00  1.00 -1.00 -1.00 -0.79

and a test statistic of 8.42, while the aligned rank sign scores yield 10.20. Obviously, all versions of the tests lead to a decisive rejection of the null. Note that for the sign scores the test coincides with the $l_1$ Lagrange multiplier test discussed in Koenker and Bassett (1982).

Since an important objective of the proposed rank tests is robustness to outlying observations, it is interesting to observe the effect of perturbing the first $y$ observation of the Adichie data set on the aligned and rank score versions of the test statistic. This sensitivity analysis is illustrated in Figure 6.2. Even a modest perturbation in $y_1$ is enough to confound the initial least squares estimate and reverse the conclusion of the aligned rank test. However, the regression rank score version of the test is seen to be relatively insensitive to such perturbations. One should be aware that comparable perturbations in the design observations may wreak havoc even with the rank score form of the test. Recent work of Antoch and Jurečková (1985) and deJongh, deWet, and Welsh (1988) contains suggestions on robustifying regression quantiles to the effect of influential design points.

Computation of the tests was carried out in S+ using the algorithm described in Koenker and d'Orey (1987, 1990) to compute regression quantiles.

REFERENCES

Adichie, J.N. (1978). "Rank tests for sub-hypotheses in the general linear regression," Ann. Statist., 6, 1012-1026.
Adichie, J.N. (1984). "Rank tests in linear models," in P.R. Krishnaiah and P.K. Sen (eds.), Handbook of Statistics, vol. 4, Elsevier, New York.
Amemiya, T. (1985). Advanced Econometrics, Harvard University Press, Cambridge.
Antoch, J. and Jurečková, J. (1985). "Trimmed LSE resistant to leverage points," Comp. Statist. Quarterly, 4, 329-339.
Aubuchon, J.C. and Hettmansperger, T.P. (1989). "Rank based inference for linear models: Asymmetric errors," Statistics and Probability Letters, 8, 97-107.
Chernoff, H. and Savage, I.R. (1958). "Asymptotic normality and efficiency of certain nonparametric test statistics," Ann. Math. Statist., 29, 972-994.
Draper, D. (1988). "Rank-based robust analysis of linear models. I. Exposition and Review," Statistical Science, 3, 239-271.
Gutenbrunner, C. (1986). Zur Asymptotik von Regressionquantileprozessen und daraus abgeleiten Statistiken, Ph.D. Dissertation, Universität Freiburg.
Gutenbrunner, C. and Jurečková, J. (1990). "Regression rank-scores and regression quantiles," Ann. Statist. (to appear).
Hájek, J. and Šidák, Z. (1967). Theory of Rank Tests, Academia, Prague.
deJongh, P.J., deWet, T. and Welsh, A.H. (1988). "Mallows type bounded-influence-regression trimmed means," J. Amer. Statist. Assoc., 83, 805-810.
Jurečková, J. (1971). "Nonparametric estimate of regression coefficients," Ann. Math. Statist., 42, 1328-1338.
Jurečková, J. and Sen, P.K. (1989). "Uniform second order asymptotic linearity of M-statistics in linear models," Statistics & Decisions, 7, 263-276.
Koenker, R. and Bassett, G. (1978). "Regression quantiles," Econometrica, 46, 33-50.
Koenker, R. and Bassett, G. (1982). "Tests of linear hypotheses and $l_1$-estimation," Econometrica, 50, 1577-83.
Koenker, R. and d'Orey, V. (1987). "Computing regression quantiles," Applied Statistics, 36, 383-393.
Koenker, R. and d'Orey, V. (1990). "Remark on algorithm 229," preprint.
Koul, H.L. (1970). "A class of ADF tests for subhypotheses in the multiple linear regression," Ann. Math. Statist., 41, 1273-1281.
Lehmann, E.L. (1959). Testing Statistical Hypotheses, Wiley, New York.
Lehmann, E.L. (1983). Theory of Point Estimation, Wiley, New York.
McKean, J.W. and Hettmansperger, T.P. (1978). "A robust analysis of the general linear model based on one-step R-estimates," Biometrika, 65, 571-79.
Pollard, D. (1988). "Asymptotics for least absolute deviation regression estimators," Econometric Theory, to appear.
Puri, M.L. and Sen, P.K. (1985). Nonparametric Methods in General Linear Models, J. Wiley, New York.
Ruppert, D. and Carroll, R.J. (1980). "Trimmed least squares estimation in the linear model," J. Amer. Statist. Assoc., 75, 828-838.
Scheffé, H. (1959). The Analysis of Variance, Wiley, New York.

Figure 6.1. Regression Rank Scores for Tobacco Data. [Thirty panels, one per observation, each drawn over the interval from 0.0 to 1.0. Panel titles, in order of Wilcoxon rank score: Obs 6 (-0.48), Obs 28 (-0.42), Obs 3 (-0.41), Obs 8 (-0.38), Obs 13 (-0.37), Obs 5 (-0.32), Obs 1 (-0.27), Obs 20 (-0.26), Obs 23 (-0.23), Obs 7 (-0.17), Obs 22 (-0.16), Obs 29 (-0.10), Obs 30 (-0.06), Obs 10 (-0.06), Obs 24 (-0.01), Obs 12 (0.04), Obs 2 (0.06), Obs 4 (0.09), Obs 26 (0.12), Obs 19 (0.15), Obs 27 (0.15), Obs 17 (0.19), Obs 11 (0.23), Obs 15 (0.28), Obs 25 (0.33), Obs 16 (0.37), Obs 21 (0.38), Obs 18 (0.41), Obs 14 (0.42), Obs 9 (0.48).]

Figure 6.2. Sensitivity Curves for Rank Tests. [Two curves, labeled Aligned and Rankscore; horizontal axis: Perturbation of y(1), from -10 to 10.]