UIUCDCS-R-74-624

ON THE COVERING PROBLEM FOR UNAMBIGUOUS CONTEXT-FREE GRAMMARS

by M. Dennis Mickunas

February, 1974

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported by the Department of Computer Science.

Digitized by the Internet Archive in 2013
http://archive.org/details/oncoveringproble624mick

ABSTRACT

It is shown that every unambiguous grammar which does not generate the empty string is covered by a $\Lambda$-free grammar. Every unambiguous grammar which does generate the empty string is covered by a grammar which is partitioned into a $\Lambda$-free portion and a portion which generates only the empty string. Finally, every unambiguous grammar is covered by such a partitioned grammar in operator form.

1. INTRODUCTION

The presence of $\Lambda$-rules in a grammar often poses tricky problems for practical translators. Thus, practitioners usually restrict themselves to considering $\Lambda$-free grammars, arguing that the restriction imposes no significant hardships. In this paper, we present supporting evidence for that position by showing that every unambiguous grammar is completely covered by a grammar in which the $\Lambda$-generating portion (if present) is completely isolated from the remaining portion. That remaining portion is then shown to be covered by a $\Lambda$-free grammar. As a followup, it is then shown that such an unambiguous "$\Lambda$-isolated" grammar is covered by an operator grammar. This strengthens a result of Gray and Harrison [3].

Definition. A context-free (CF) grammar is a 4-tuple $G = (V, \Sigma, P, S)$ where:
(a) $V$ is a finite non-empty set of symbols (vocabulary);
(b) $\Sigma \subseteq V$ (terminal vocabulary);
(c) $N = V - \Sigma$ (non-terminal vocabulary);
(d) $S \in N$ (goal symbol), and;
(e) $P$ is a finite subset of $N \times V^{*}$ (productions).
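As an informal illustration of the 4-tuple just defined, the following Python sketch stores a grammar and checks conditions (a) through (e). The class name `Grammar`, its field names, and the method `is_well_formed` are hypothetical choices made for this sketch and do not come from the report. Right-hand sides are assumed to be tuples of symbols, with the empty tuple standing for the null word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Hypothetical container for a CF grammar G = (V, Sigma, P, S)."""
    vocabulary: frozenset    # V
    terminals: frozenset     # Sigma
    productions: frozenset   # P: pairs (lhs, rhs), rhs a tuple over V
    goal: str                # S

    @property
    def nonterminals(self):
        # N = V - Sigma
        return self.vocabulary - self.terminals

    def is_well_formed(self):
        """Check conditions (a)-(e) of the definition."""
        n = self.nonterminals
        return (
            len(self.vocabulary) > 0                    # (a) V non-empty
            and self.terminals <= self.vocabulary       # (b) Sigma within V
            and self.goal in n                          # (d) S in N
            and all(                                    # (e) P within N x V*
                lhs in n and all(s in self.vocabulary for s in rhs)
                for lhs, rhs in self.productions
            )
        )


# Illustrative grammar: S -> AB, A -> a | Lambda, B -> b
g = Grammar(
    vocabulary=frozenset({"S", "A", "B", "a", "b"}),
    terminals=frozenset({"a", "b"}),
    productions=frozenset({
        ("S", ("A", "B")),
        ("A", ("a",)),
        ("A", ()),          # the empty tuple encodes a Lambda-rule
        ("B", ("b",)),
    }),
    goal="S",
)
```

Encoding a production as a pair `(lhs, rhs)` mirrors the view of $P$ as a subset of $N \times V^{*}$; finiteness holds automatically because the sets are built explicitly.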
We will denote an element $(u, v)$ of $P$ by $u \to v$, and we will often ascribe indices to productions: $\pi_i = u \to v$. We also employ the usual binary relation $\Rightarrow\ \subseteq V^{*} \times V^{*}$, writing $u \Rightarrow v$ instead of $(u, v) \in\ \Rightarrow$.

Let $X$ and $Y$ be sets of words. Write $XY = \{xy \mid x \in X,\ y \in Y\}$ where $xy$ is the concatenation of $x$ and $y$. Define $X^{0} = \{\Lambda\}$ where $\Lambda$ is the null word. For each $i \geq 0$, define $X^{i+1} = X^{i}X$ and $X^{*} = \bigcup_{i \geq 0} X^{i}$. Let $X^{+} = X^{*}X$ and let $\emptyset$ denote the empty set.

Definition. Let $u, v \in V^{*}$. Define $u \Rightarrow v$ if there exist words $x, y, w \in V^{*}$ and $A \in N$ so that $u = xAy$, $v = xwy$, and $A \to w$ is in $P$. If $y \in \Sigma^{*}$, we write $u \Rightarrow_R v$. Furthermore, we will write the reflexive-transitive closure of $\Rightarrow$ as $\overset{*}{\Rightarrow}$. If we wish to make clear that the grammar $G$ is being used, we will write $\Rightarrow_G$.

Definition. Let $x_i \in V^{*}$ ($0 \leq i \leq r$). If $x_i \Rightarrow x_{i+1}$ by applying the production $\pi_i$, then we say that $x_i$ directly derives $x_{i+1}$ via $\pi_i$. If $x_0 \Rightarrow x_1 \Rightarrow \cdots \Rightarrow x_r$ where $x_i$ directly derives $x_{i+1}$ via $\pi_i$ (for all $0 \leq i < r$) then we say that $x_0$ derives $x_r$ via $(\pi_i)_{i=0}^{r-1}$ and that $(\pi_i)_{i=0}^{r-1}$ is a derivation of $x_r$ from $x_0$. If $x_i \Rightarrow_R x_{i+1}$ (for all $0 \leq i < r$) then $(\pi_i)_{i=0}^{r-1}$ is called a canonical derivation of $x_r$ from $x_0$.

Definition. We define $L(G;X) = \{x \in \Sigma^{*} \mid X \overset{*}{\Rightarrow}_G x\}$ for all $X \in V$. $L(G;X)$ is called the language generated by $G$ from $X$. If $X = S$, the goal symbol of $G$, then $L(G;S)$ is called simply the language generated by $G$, and is denoted by $L(G)$.

Definition. A CF grammar $G$ is said to be unambiguous if and only if for all $x \in L(G;S)$ and for all canonical derivations $(\pi_i)_{i=1}^{n}$, $(\pi'_i)_{i=1}^{n'}$ which derive $x$ from $S$, $n = n'$ and $\pi_i = \pi'_i$ (for all $1 \leq i \leq n$). A CF grammar which is not unambiguous is said to be ambiguous.

Definition.
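The notion of a canonical derivation can be made concrete with a short Python sketch. It assumes words are tuples of symbols and productions are pairs `(lhs, rhs)`; the function names `rightmost_step` and `derive` are hypothetical and chosen only for this illustration. A step is accepted only when the rewritten non-terminal is the rightmost one, which is the condition $y \in \Sigma^{*}$ in the definition of $\Rightarrow_R$.

```python
def rightmost_step(word, production, nonterminals):
    """Rewrite word = xAy (with y terminal-only) to xwy using A -> w."""
    lhs, rhs = production
    # Indices of all non-terminals in the word; the last one is rightmost,
    # so everything to its right lies in Sigma*.
    positions = [i for i, s in enumerate(word) if s in nonterminals]
    if not positions or word[positions[-1]] != lhs:
        raise ValueError(f"{lhs} is not the rightmost non-terminal of {word}")
    i = positions[-1]
    return word[:i] + rhs + word[i + 1:]


def derive(start, derivation, nonterminals):
    """Apply a sequence of productions as a canonical derivation."""
    word = (start,)
    for production in derivation:
        word = rightmost_step(word, production, nonterminals)
    return word


# Illustrative grammar: S -> AB, A -> a, B -> b
NONTERMINALS = {"S", "A", "B"}
CANONICAL = [("S", ("A", "B")), ("B", ("b",)), ("A", ("a",))]
result = derive("S", CANONICAL, NONTERMINALS)
```

Under these assumptions `result` is the word `("a", "b")`, while applying `A -> a` before `B -> b` raises `ValueError`, because `B` is then still the rightmost non-terminal.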
A context-free grammar $G = (V, \Sigma, P, S)$ is said to be
(a) $\Lambda$-free if $P \subseteq N \times V^{+}$;
(b) reduced if
(i) for each $A \in V$, there exist $x, y \in V^{*}$ so that $S \overset{*}{\Rightarrow} xAy$, and
(ii) for each $A \in N$ there exists $x \in \Sigma^{*}$ so that $A \overset{*}{\Rightarrow} x$;
(c) in operator form if $P \subseteq N \times (V^{*} - V^{*}N^{2}V^{*})$;
(d) in canonical two form if $P \subseteq N \times (\{\Lambda\} \cup V \cup N^{2})$.

Following Gray and Harrison [3], we define the notion of a cover.

Definition. Let $G = (V, \Sigma, P, S)$ be a grammar and let $H \subseteq P$. Let $D = (A_i \to x_i)_{i=1}^{n}$ be a canonical derivation in $G$. Then the corresponding $H$-sparse derivation is $D_H = (A_i \to x_i \mid A_i \to x_i \text{ is in } H)_{i=1}^{n}$.

Definition. Let $G = (V, \Sigma, P, S)$ and $G' = (V', \Sigma, P', S')$ be context-free grammars. Let $H \subseteq P$ and $H' \subseteq P'$. Let $\varphi$ be a map from $H'$ into $H$. For any canonical derivation $D = (A_i \to x_i)_{i=1}^{n}$ in $G'$ of some $x \in \Sigma^{*}$, define the image of $D$ under $\varphi$ to be $\varphi(D) = (\varphi(A_i \to x_i) \mid A_i \to x_i \text{ is in } H')_{i=1}^{n}$. $\varphi(D)$ is an element of $H^{*}$. $(G', H')$ is said to cover $(G, H)$ under $\varphi$ iff
(a) $L(G) = L(G')$, and
(b) for each $x \in L(G)$,
(i) if $D$ is an $H$-sparse derivation of $x$ in $G$ then there is an $H'$-sparse derivation $D'$ of $x$ in $G'$ so that [...]

[...] $S,\ S' \to S_\Lambda\}$
$P_2 = \{A \to B,\ A_\Lambda \to B_\Lambda \mid A, B \in N;\ A \to B \in P\}$
$P_3 = \{A \to BC,\ A \to B_\Lambda C,\ A \to BC_\Lambda,\ A_\Lambda \to B_\Lambda C_\Lambda \mid A, B, C \in N;\ A \to BC \in P\}$
$P_4 = \{A_\Lambda \to \Lambda \mid A \in N,\ A \to \Lambda \in P\}$
$P_5 = \{A \to a \mid A \in N,\ a \in \Sigma,\ A \to a \in P\}$.

Properties a) - e) hold by construction. It is also clear that $(G', P_2 \cup P_3 \cup P_4 \cup P_5)$ covers $(G, P)$. Hence $G'$ completely covers $G$. Moreover, the canonical two form is preserved. □

A grammar $G' = (V', \Sigma, P', S')$ which satisfies a) - c) of Claim 1 and for which $\Lambda \notin L(G';S)$ and $L(G';S_\Lambda) \subseteq \{\Lambda\}$, is called a $\Lambda$-split grammar.

Having obtained such a $\Lambda$-split grammar, we wish to obtain a $\Lambda$-free cover for the portion which generates $L(G';S)$. The following claim establishes that ability.

Claim 2. Every unambiguous CF grammar $G' = (V', \Sigma, P', S)$ for which $\Lambda \notin L(G';S)$ is completely covered by a $\Lambda$-free grammar.

Proof.
We may assume without loss of generality that $G'$ is in reduced canonical two form and is $\Lambda$-split via the transformation of Claim 1. Since $G'$ is unambiguous, it follows that for each $B_\Lambda \in N'$, the length of the canonical derivation $B_\Lambda \overset{*}{\Rightarrow}_R \Lambda$ in $G'$ is finite. Let $\mu(B_\Lambda)$ denote that length, and let

$M = \{1, 2, \ldots, \max_{B_\Lambda \in N'} \mu(B_\Lambda)\}$.

Let $G'' = (V'', \Sigma, P'', S)$ where $V'' = V' \cup (N' \times N' \times M)$. Define $P'' = P'_1 \cup P'_2 \cup P'_3$ where

$P'_1 = \{A \to v \mid A \in N',\ v \in V'^{+},\ A \to v \in P'\}$

$P'_2 = \{A \to (B_\Lambda, C, 1),\ (B_\Lambda, C, 1) \to (B_\Lambda, C, 2),\ \ldots,\ (B_\Lambda, C, \mu(B_\Lambda) - 1) \to (B_\Lambda, C, \mu(B_\Lambda)),\ (B_\Lambda, C, \mu(B_\Lambda)) \to C \mid A, B_\Lambda, C \in N';\ A \to B_\Lambda C \in P'\}$

$P'_3 = \{A \to (B, C_\Lambda, 1),\ (B, C_\Lambda, 1) \to (B, C_\Lambda, 2),\ \ldots,\ (B, C_\Lambda, \mu(C_\Lambda) - 1) \to (B, C_\Lambda, \mu(C_\Lambda)),\ (B, C_\Lambda, \mu(C_\Lambda)) \to B \mid A, B, C_\Lambda \in N';\ A \to BC_\Lambda \in P'\}$.

We define $\varphi$ by cases:
a) $\varphi(A \to v \in P'_1) = A \to v$;
b) $\varphi(A \to (B_\Lambda, C, 1) \in P'_2) = A \to B_\Lambda C$;
c) $\varphi((B_\Lambda, C, i) \to (B_\Lambda, C, i+1) \in P'_2) = \pi_i$ for $1 \leq i < \mu(B_\Lambda)$;
d) $\varphi((B_\Lambda, C, \mu(B_\Lambda)) \to C \in P'_2) = \pi_{\mu(B_\Lambda)}$, where $B_\Lambda \overset{*}{\Rightarrow}_R \Lambda$ in $G'$ via $(\pi_i)_{i=1}^{\mu(B_\Lambda)}$;
e) $\varphi(A \to (B, C_\Lambda, 1) \in P'_3) = A \to BC_\Lambda$;
f) $\varphi((B, C_\Lambda, i) \to (B, C_\Lambda, i+1) \in P'_3) = \pi'_i$ for $1 \leq i < \mu(C_\Lambda)$, and;
g) $\varphi((B, C_\Lambda, \mu(C_\Lambda)) \to B \in P'_3) = \pi'_{\mu(C_\Lambda)}$, where $C_\Lambda \overset{*}{\Rightarrow}_R \Lambda$ in $G'$ via $(\pi'_i)_{i=1}^{\mu(C_\Lambda)}$.

It is easily seen that $G''$ completely covers $G'$ under $\varphi$. Furthermore, $G''$ is, by construction, $\Lambda$-free and in canonical two form. □

Theorem 2.1. Every unambiguous CF grammar $G = (V, \Sigma, P, S)$ is completely covered by a $\Lambda$-split grammar $G' = (V', \Sigma, P', S')$ for which
a) $S' \to S \in P'$;
b) $S' \to S_\Lambda \in P'$;
c) $L(G';S) = L(G;S) - \{\Lambda\}$, and;
d) the reduced grammar obtainable from $(V', \Sigma, P', S)$ is $\Lambda$-free.

Proof. The proof follows directly from Claims 1 and 2. □

A $\Lambda$-split grammar $G'$ which satisfies a) - d) of Theorem 2.1 is called a $\Lambda$-isolated grammar. If $\Lambda \notin L(G';S')$, then $G'$ is clearly $\Lambda$-free as well.
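The chain construction of $P'_2$ and $P'_3$ in the proof of Claim 2 can be sketched in Python. This is only an illustration under stated assumptions, not the report's own formulation: the names `mu`, `chain`, and `NULL_RULES` are hypothetical, symbols of the form `X_L` stand for $X_\Lambda$, and each $\Lambda$-generating symbol is assumed to have exactly one production, which is what one expects in a reduced unambiguous grammar. Under that assumption $\mu$ is one (for the symbol's own production) plus the lengths for the symbols on its right-hand side.

```python
def mu(symbol, null_rules):
    """Length of the canonical derivation symbol =>* Lambda.

    Assumes `null_rules` maps each Lambda-generating symbol to the
    right-hand side (a tuple, possibly empty) of its single production.
    """
    return 1 + sum(mu(s, null_rules) for s in null_rules[symbol])


def chain(lhs, left, right, null_rules):
    """Productions replacing lhs -> left right, where exactly one of
    `left` and `right` is a Lambda-generating symbol."""
    if left in null_rules:
        null_symbol, survivor = left, right
    else:
        null_symbol, survivor = right, left
    n = mu(null_symbol, null_rules)

    def node(i):
        # New non-terminal (B, C, i) drawn from N' x N' x M
        return (left, right, i)

    rules = [(lhs, (node(1),))]
    rules += [(node(i), (node(i + 1),)) for i in range(1, n)]
    rules.append((node(n), (survivor,)))
    return rules


# Illustrative Lambda-generating portion:
# S_L -> A_L B_L, A_L -> Lambda, B_L -> C_L D_L, C_L -> Lambda, D_L -> Lambda
NULL_RULES = {
    "S_L": ("A_L", "B_L"),
    "A_L": (),
    "B_L": ("C_L", "D_L"),
    "C_L": (),
    "D_L": (),
}
rules_for_A_BL = chain("S", "A", "B_L", NULL_RULES)
rules_for_AL_B = chain("S", "A_L", "B", NULL_RULES)
```

Each chain has exactly as many unit steps between new non-terminals as there are productions in the erased derivation, which is what lets $\varphi$ map the chain's productions one by one onto $\pi_1, \ldots, \pi_{\mu}$.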
As an example of the application of these transformations, we see that the grammar with productions

$S \to AB$
$A \to a \mid \Lambda$
$B \to b \mid CD$
$C \to \Lambda$
$D \to \Lambda$

is completely covered by the $\Lambda$-split grammar with productions

$S' \to S \mid S_\Lambda$
$S \to AB \mid A_\Lambda B \mid AB_\Lambda$
$A \to a$
$B \to b$
$S_\Lambda \to A_\Lambda B_\Lambda$ ($\mu(S_\Lambda) = 5$)
$A_\Lambda \to \Lambda$ ($\mu(A_\Lambda) = 1$)
$B_\Lambda \to C_\Lambda D_\Lambda$ ($\mu(B_\Lambda) = 3$)
$C_\Lambda \to \Lambda$ ($\mu(C_\Lambda) = 1$)
$D_\Lambda \to \Lambda$ ($\mu(D_\Lambda) = 1$)

which is completely covered by the $\Lambda$-isolated grammar with productions

$S' \to S \mid S_\Lambda$
$S \to AB \mid (A_\Lambda, B, 1) \mid (A, B_\Lambda, 1)$
$A \to a$
$B \to b$
$(A_\Lambda, B, 1) \to B$
$(A, B_\Lambda, 1) \to (A, B_\Lambda, 2)$
$(A, B_\Lambda, 2) \to (A, B_\Lambda, 3)$
$(A, B_\Lambda, 3) \to A$
$S_\Lambda \to A_\Lambda B_\Lambda$
$A_\Lambda \to \Lambda$
$B_\Lambda \to C_\Lambda D_\Lambda$
$C_\Lambda \to \Lambda$
$D_\Lambda \to \Lambda$

Gray and Harrison have shown [3] that the $\Lambda$-free portion of a $\Lambda$-isolated grammar can be completely covered by an operator grammar. We will now show that, in fact, the entire grammar can be so covered.

Claim 3. Every unambiguous grammar $G = (V, \emptyset, P, S_\Lambda)$ for which $L(G;S_\Lambda) = \{\Lambda\}$ is completely covered by an operator grammar.

Proof. We may assume without loss of generality that $G$ is in reduced canonical two form. Since $G$ is unambiguous, it follows that the length of the canonical derivation $S_\Lambda \overset{*}{\Rightarrow}_R \Lambda$ in $G$ is finite. Denote that length by $\mu(S_\Lambda)$, and let $M = \{1, 2, \ldots, \mu(S_\Lambda)\}$. Let $G' = (V', \emptyset, P', S_\Lambda)$ where $V' = \{S_\Lambda\} \cup (N \times M)$. Define $P'$ by

$P' = \{S_\Lambda \to (S_\Lambda, 1),\ (S_\Lambda, 1) \to (S_\Lambda, 2),\ \ldots,\ (S_\Lambda, \mu(S_\Lambda) - 1) \to (S_\Lambda, \mu(S_\Lambda)),\ (S_\Lambda, \mu(S_\Lambda)) \to \Lambda\}$.

Clearly, the derivation in $G'$ is isomorphic to the derivation in $G$. Thus $G'$ completely covers $G$. Furthermore, $G'$ is, by construction, an operator grammar. □

By a construction similar to that used in Claim 2, it is possible to prove the following:

Theorem 2.2. Every operator grammar is completely covered by a $\Lambda$-isolated operator grammar.

Proof. Omitted. □

We may now state our main result.

Theorem 2.3. Every unambiguous CF grammar is completely covered by a $\Lambda$-isolated operator grammar.

Proof. The proof follows immediately from Theorem 2.1, Gray and Harrison's Theorem 1.2 [3], Claim 3, and Theorem 2.2. □

3.
Summary and Conclusions

We have shown that it is often possible to remove $\Lambda$-rules from a grammar without significantly altering its parse trees, and hence its attached semantics. This may be taken as supporting evidence of the commonly held notion that in practical situations (i.e. in the case of translators for unambiguous grammars which do not generate the empty string), one can, without difficulty, dispense with $\Lambda$-rules.

More surprisingly, we have been able to strengthen the operator grammar results of Gray and Harrison [3]. There is, of course, no contradiction between their Theorem 1.3 and our Theorem 2.3. Their Theorem 1.3 holds that there is a grammar (namely the one with productions $S \to SS \mid \Lambda$) which cannot be covered by any operator grammar. Our results show that their Theorem 1.3 is rooted, not in the fact that the grammar contains a $\Lambda$-rule, but rather in the ambiguity of the grammar.

REFERENCES

1. Floyd, R.W. Syntactic Analysis and Operator Precedence. J. ACM 10, 3 (July, 1963), 316-333.
2. Ginsburg, S. The Mathematical Theory of Context-Free Languages. McGraw-Hill, New York (1966).
3. Gray, J.N., and Harrison, M.A. On the Covering and Reduction Problems for Context-Free Grammars. J. ACM 19, 4 (October, 1972), 675-698.

BIBLIOGRAPHIC DATA SHEET

1. Report No.: UIUCDCS-R-74-624
4. Title and Subtitle: ON THE COVERING PROBLEM FOR UNAMBIGUOUS CONTEXT-FREE GRAMMARS
5. Report Date: February, 1974
7. Author(s): M. Dennis Mickunas
9. Performing Organization Name and Address: Department of Computer Science, University of Illinois, Urbana, Illinois 61801
12. Sponsoring Organization Name and Address: Department of Computer Science, University of Illinois, Urbana, Illinois 61801
13. Type of Report & Period Covered: Research
17. Key Words and Document Analysis. 17a. Descriptors: covers, parsing, ambiguity, $\Lambda$-free, operator grammars, context-free grammars
18. Availability Statement: RELEASE UNLIMITED
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages: 12