Report No. 355
COO-1469-0149

PROBABILISTIC LANGUAGES AND AUTOMATA*

by

Clarence Arthur Ellis, Ph.D.

October 1969

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This report was supported in part by U.S. AEC grant AT(11-1)1469 and by the Department of Computer Science, and was submitted as a doctoral thesis to the Graduate College of the University of Illinois by the Department of Computer Science, October 1969.

ACKNOWLEDGMENT

Profuse thanks are due to Professor D. E. Muller for his advice and encouragement during the preparation of this thesis. The author is also indebted to the Department of Computer Science, University of Illinois, for its support, and to Miss Barbara Hurdle for her typing of this thesis.
Finally, the author extends his appreciation to his wife, Anna, whose patience and understanding assistance were of great value.

PROBABILISTIC LANGUAGES AND AUTOMATA

Clarence Arthur Ellis, Ph.D.
Department of Computer Science
University of Illinois, 1969

The concept of a probabilistic language is defined and investigated. The motivation for the definition stems from the hope of using this tool to investigate programming languages and their translators. A probabilistic language over a vocabulary T is defined as a class C of words formed from T together with a probability measure on C. The classes T* of finite strings, T^ω of infinite strings, and the analogous classes of finite and infinite trees are considered. Context Free Probabilistic Languages are characterized in terms of (1) Probabilistic Grammars and (2) Probabilistic Tree Automata.

TABLE OF CONTENTS

1. INTRODUCTION
2. BASIC DEFINITIONS AND NOTATION
3. PROBABILISTIC GRAMMARS AND LANGUAGES
4. PROBABILISTIC AUTOMATA
5. CONTEXT FREE GRAMMARS
6. PROBABILISTIC TREE AUTOMATA
7. SUMMARY AND CONCLUSIONS
LIST OF REFERENCES
APPENDIX A. APPROXIMATION OF PROBABILISTIC TURING AUTOMATA BY PROBABILISTIC PUSHDOWN AUTOMATA
APPENDIX B. EXAMPLES OF REGULAR TREE EXPRESSIONS
VITA

1. INTRODUCTION

In recent years, much work has been done on extensions of the theory of finite automata to obtain models of acceptors and translators of programming languages. Examples are pushdown store automata, stack automata, minimax automata, and balloon automata; there are many others. The purpose of this thesis is not simply to introduce another type of automaton, but to describe a general concept which can be adapted to any of the automata present in the literature. It is quite natural to assign probabilities (or frequencies) to the strings of a language in order to obtain some quantitative measure of the "efficiency" of grammars and translators.
The model obtained by doing this is called a probabilistic language, which may be considered a fuzzy set [27], containing all valid sentences of the language together with a grade-of-membership function for these sentences. Acceptors and generators for these probabilistic languages are defined as Probabilistic Automata and Probabilistic Grammars, respectively. Specifically, Context Free Probabilistic Languages are explored in depth in this thesis.

This investigation does not consider how one would find the "best" grammar or automaton for a language, or how to improve a given grammar. Indeed, the meaning of "best" is open to many interpretations. The related idea of finding good approximation grammars for languages is also unexplored. It is hoped that the tools developed here will lead to quantitative analysis in these and other areas.

2. BASIC DEFINITIONS AND NOTATION

This section presents notation and concepts which have been previously defined in the literature and are heavily used in this paper. These definitions are then altered to form probabilistic analogues.

A language over a set T of terminal symbols is a subset of the set T* of all strings over T. A phrase structure grammar over a set T is a system (N, P, S) in which N is a finite set of nonterminal symbols and P is a set of rules (called productions) of the form (Ψ → ζ), where ζ is any string of symbols of T ∪ N (denoted ζ ∈ (T ∪ N)*) and Ψ is any non-empty string of symbols of T ∪ N (denoted Ψ ∈ (T ∪ N)+). Ψ is called the generatrix of the production, and ζ is called the replacement string. S ∈ N is the initial nonterminal.

Notation: Hereafter, when discussing languages and grammars, A, B, and C will always denote elements of N, while X, Y, and Z denote strings over N. Similarly, a, b, c ∈ T; x, y, z ∈ T*; α, β, γ ∈ T ∪ N; Ψ, χ, ζ ∈ (T ∪ N)*. If x = a_1 a_2 ... a_n, then the length of x is ℓ(x) = n. The null string is denoted by λ (ℓ(λ) = 0), and the empty set is φ.
L always denotes a language, G a grammar, and A an automaton. I denotes the set of positive integers, Π the rationals, and R the reals.

Let G = (N, P, S) be a grammar. If χ = ζ_1 Ψ ζ_2 and (Ψ → ζ) ∈ P, then we write χ → ζ_1 ζ ζ_2. If there exist strings ζ_0, ζ_1, ..., ζ_n such that ζ_i → ζ_{i+1}, then we write ζ_0 ⇒ ζ_n, and we say there is a derivation of ζ_n from ζ_0 with respect to G. The language L generated by a grammar G is L(G) = {x | S ⇒ x, x ∈ T*}. If L is generated by a grammar in which every production (Ψ → ζ) has Ψ ∈ N, then L is a context free language. If, further, ζ is of the form aB for some a ∈ T, B ∈ N ∪ {λ} in all productions of G, then L is a regular language.

3. PROBABILISTIC GRAMMARS AND LANGUAGES

Definition: A Probabilistic Language (P language) over T is a system L = (L, μ) where L is a class of words formed from T and μ is a measure on the set L. If μ is a probability measure, then L is a Normalized Probabilistic Language (NP language).

Definition: A Probabilistic Grammar (P grammar) over T is a system G = (N, P, Δ) where N is the finite set of nonterminals A_1, A_2, ..., A_n; Δ is an n-dimensional vector (δ_1 ... δ_n) with δ_i being the probability that A_i is chosen as the initial nonterminal; and P is a finite set of probabilistic productions Ψ_i --p_ij--> ζ_ij, with Ψ_i ∈ (N ∪ T)+, ζ_ij ∈ (N ∪ T)+, and p_ij ∈ R (p_ij ≥ 0). If Δ is stochastic, if 0 ≤ p_ij ≤ 1, and if Σ_j p_ij = 1 for every generatrix Ψ_i contained in productions of P, then G is a Normalized Probabilistic Grammar (NP grammar). If all productions of G are of the form A --p--> aB or A --p--> a (A ∈ N, B ∈ N, a ∈ T), then G is called a left linear P grammar.

The probability of a derivation of ζ from ζ_0 is defined as pr(ζ_0 ⇒ ζ) = Σ_{i=1}^{k} Π_{j=1}^{k_i} p_ij, where k is the number of derivations of ζ from ζ_0, k_i is the number of derivation steps ζ_{i,j-1} --p_ij--> ζ_{i,j} used in the i-th derivation, and p_ij
is the probability associated with the j-th step of the i-th derivation. The derived probability of a terminal string x ∈ T+ with respect to a left linear grammar G is μ(x) = Σ_{i=1}^{n} δ_i pr(A_i ⇒ x), where N = {A_1, A_2, ..., A_n} and Δ = (δ_1 δ_2 ... δ_n). The P language generated by G is L = (T+, μ), where μ(x) is the derived probability of x. An admissible P grammar (see Greibach [8]) is a grammar in which there exists a derivation of some x ∈ T+ from each A ∈ N. A generalized admissible P grammar is one in which there exists a production with A in the generatrix for each A ∈ N.

Theorem 1: Every normalized left linear admissible P grammar G generates a normalized P language.

Proof: Define an (n+1) × (n+1) matrix U = [u_ij] as follows:

  u_ij = Σ_{a ∈ T, (A_i → aA_j) ∈ P} pr(A_i → aA_j),   i ≤ n, j ≤ n;
  u_{i,n+1} = Σ_{a ∈ T, (A_i → a) ∈ P} pr(A_i → a),   i ≤ n;
  u_{n+1,j} = 0 for j ≤ n, and u_{n+1,n+1} = 1.

Let Δ' = (δ_1 δ_2 ... δ_n 0). The total probability of the strings of length at most k is then the (n+1)-st component of Δ'·U^k: Σ_{x ∈ T+, ℓ(x) ≤ k} μ(x) = (Δ'·U^k)_{n+1}. Finally, Σ_{x ∈ T+} μ(x) = lim_{k→∞} Σ_{ℓ(x) ≤ k} μ(x) = lim_{k→∞} (Δ'·U^k)_{n+1}. Since G is normalized, U is a stochastic matrix; and since G is admissible, for each i = 1, 2, ..., n+1 there exists k ∈ I such that (U^k)_{i,n+1} > 0. Thus, using the theory of Markov chains [6], each row vector of U^k approaches a steady state vector t as k approaches infinity; the (n+1)-st row is (0 0 ... 0 1) for every k ∈ I, which implies t = (0 0 ... 0 1), and the (n+1)-st element of lim_{k→∞} (Δ'·U^k) is Σ_{i=1}^{n+1} δ'_i = 1. QED.

A P language which is generated by a left linear P grammar is called a regular P language.

Theorem 2: There exists a regular language L with a probability μ(x) assigned to each x ∈ L such that no left linear P grammar generates (L, μ).

Proof: The proof consists simply of exhibiting such a language.

(1) Let T = {a}; then T+ is the set of strings {a^n | n ∈ I}.
(2) Assign probabilities μ(a^n) = 1/√(t_n), n > 1, where t_1 = 2 and t_i is the smallest prime such that t_i > max(t_{i-1}, 2^{2i}) for i > 1.
(3) Assign μ(a) = 1 − Σ_{n=2}^{∞} 1/√(t_n). This guarantees that Σ_{n=1}^{∞} μ(a^n) = 1.
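The convergence claim behind steps (2) and (3) can be checked numerically. Below is a short sketch in modern notation; the growth floor 2^{2i} is as reconstructed above, so that √(t_i) > 2^i and the tail sum stays below 1/2, leaving a positive probability for μ(a):

```python
from math import isqrt

def next_prime(n):
    """Smallest prime strictly greater than n (trial division; fine at this scale)."""
    k = n + 1
    while any(k % d == 0 for d in range(2, isqrt(k) + 1)):
        k += 1
    return k

# t_i = smallest prime exceeding max(t_{i-1}, 2^(2i)); mu(a^i) = 1/sqrt(t_i) for i > 1.
t, tail = 2, 0.0
for i in range(2, 12):
    t = next_prime(max(t, 2 ** (2 * i)))
    tail += t ** -0.5
print(tail)  # partial tail sum; bounded above by sum of 2^(-i), i.e., by 1/2
```

Since √(t_i) > 2^i, each term is below 2^(-i), so the series converges and μ is a probability measure.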
Next we show that no left linear P grammar generates the language (T+, μ).

(1) Suppose the grammar G = (N, P, Δ) is alleged to generate (T+, μ). Then all μ(a^n) lie in the field of numbers generated by the rationals with field extensions p_i, where p_i is the probability associated with the i-th production of P if 0 < i ≤ |P|, and p_i is the probability δ_j in the vector Δ if i = |P| + j. This field is denoted Π(p_1 ... p_k), where k = |P| + |N|.

(2) If all p_i are in the field Π or are algebraic extensions of it, then the total extension is of finite degree. Consider the extension Π(1/√(t_1), 1/√(t_2), ...). This may be written as a union of fields, each of which is a finite extension of degree 2 of the previous field; since the t_i are distinct primes, ⋃_{n=1}^{∞} Π(1/√(t_1), 1/√(t_2), ..., 1/√(t_n)) is a field whose degree must be infinite. Thus all of these irrationals cannot lie within the finite degree algebraic field extension Π(p_1 ... p_k). Since all derived probabilities of finite strings under the grammar G are expressible as finite sums of products of the p_i, these derived probabilities must lie within Π(p_1 ... p_k). Thus (T+, μ) cannot be derived using G.

(3) If some of the p_i are transcendental extensions, then Π(p_1 ... p_k) can be obtained by a pure transcendental extension Q = Π(p_1 ... p_ℓ) followed by an algebraic extension Q(p_{ℓ+1} ... p_k) of finite degree. In this case, 1/√t ∉ Π(p_{ℓ+1} ... p_k) implies 1/√t ∉ Q(p_{ℓ+1} ... p_k), by the following argument. Let the polynomial f(x) = x^2 − 1/t be irreducible over Π_i = Π(p_1 ... p_i) but reducible over Π_{i+1} = Π_i(p_{i+1}), where p_{i+1} is transcendental (i < ℓ). Then f(x) = (x − α)(x + α) with α ∈ Π_i(p_{i+1}), and α is expressible as g(p_{i+1})/h(p_{i+1}), where g/h is in reduced form; α^2 = 1/t gives g^2 − (1/t)h^2 = 0. But this equation implies that p_{i+1} is algebraic over Π_i, which is a contradiction. Thus if f(x) is irreducible over Π_i, then it is irreducible over
Π_{i+1}. This argument can be applied not once but ℓ times to yield: 1/√t ∉ Π(p_{ℓ+1} ... p_k) implies 1/√t ∉ Π(p_1 ... p_k). Using the previous part (2) of this proof for the algebraic elements p_{ℓ+1}, ..., p_k, some 1/√(t_n) ∉ Π(p_{ℓ+1} ... p_k), and therefore 1/√(t_n) ∉ Π(p_1 ... p_k). QED.

4. PROBABILISTIC AUTOMATA

The idea of the probabilistic finite automaton was originally conceived by Rabin [17]. Basically, if an automaton is in some state q and receives an input a, then it can move into any state, and the probability of moving into state q' is p(q, a, q'). Rabin requires that Σ_{q' ∈ Q} p(q, a, q') = 1 (called type 1 normalization in this paper) for all q in the set of states Q and for all a ∈ T. Practical motivation for this requirement is that these automata can model sequential circuits which are intended to be deterministic, but which exhibit stochastic behavior because of random malfunctioning of components. Thus p(q, a, q') is interpreted as the conditional probability of q' given q and a, pr(q'|q, a), so by the theorem of total probability, Σ_{q' ∈ Q} pr(q'|q, a) = 1.

Other interpretations may give rise to other normalizations. For example, in performing the state identification experiment with a probabilistic automaton, one might interpret p(q, a, q') as pr(q, q'|a). This implies a normalization by summing over all possible q, q' values: Σ_{q ∈ Q} Σ_{q' ∈ Q} p(q, a, q') = 1. In fact, eight different types of probabilistic automata can be defined by the various interpretations listed in the following table.

Normalizations for Probabilistic Finite Automata

TYPE   INTERPRETATION     NORMALIZATION
 1     pr(q'|q, a)        Σ_{q' ∈ Q} p(q, a, q') = 1   ∀ q ∈ Q, ∀ a ∈ T
 2     pr(q|a, q')        Σ_{q ∈ Q} p(q, a, q') = 1   ∀ q', ∀ a
 3     pr(a|q, q')        Σ_{a ∈ T} p(q, a, q') = 1   ∀ q, ∀ q'
 4     pr(q', a|q)        Σ_{a ∈ T} Σ_{q' ∈ Q} p(q, a, q') = 1   ∀ q
 5     pr(q, a|q')        Σ_{a ∈ T} Σ_{q ∈ Q} p(q, a, q') = 1   ∀ q'
 6     pr(q, q'|a)        Σ_{q ∈ Q} Σ_{q' ∈ Q} p(q, a, q') = 1   ∀ a
 7     pr(q, a, q' | )    Σ_q Σ_a Σ_{q'} p(q, a, q') = 1
 8     pr( | q, a, q')    p(q, a, q') = 1   ∀ q, a, q'

One of the important theorems concerning finite automata, first proved by Kleene in 1956, states that for every left linear grammar there exists a finite automaton which accepts all and only the strings generated by the grammar, and conversely, for every finite automaton there is a left linear grammar which generates all and only the strings it accepts. Surprisingly, an identical theorem was proved by Chomsky and Schutzenberger [3] in 1963 concerning context free languages and pushdown store automata. The analogous problems for probabilistic automata are attacked in this paper. If the symbols a ∈ T are interpreted as outputs instead of inputs, then the automaton becomes a generator similar to a grammar. In this case, type 4 normalization must be chosen so that an NP grammar will correspond to an NP automaton.

Definition: A Probabilistic Automaton (P automaton) over T is a system A = (Q, M, Ξ, S), where Q is a finite set of states, S is a finite set of storage tape symbols, Ξ is an initial state vector, and M is a function, called a probabilistic transition function, which has associated with it a second function p. The specific nature of these functions determines the type of P automaton defined. If Ξ is a stochastic vector and if A is constrained to some normalization type, then A is a Normalized Probabilistic Automaton (NP automaton). Cases in which S = φ will be simplified to A = (Q, M, Ξ). Particular classes of automata are obtained by attaching constraints to the general definition. The following table lists some of the automata definable, with the domain (D(M)) and range (R(M)) constraints on the mapping M and their normalization constraints.

Types of Automata

1.
Deterministic Finite Automaton
   Norm: Type 1;  D(M) = Q × T, R(M) ⊆ Q

2. Nondeterministic Finite Automaton
   Norm: none;  D(M) = Q × T, R(M) ⊆ P(Q)

3. Probabilistic Rabin Automaton
   Norm: Type 1;  D(M) = Q × T, R(M) ⊆ P(Q)

4. Probabilistic Ellis Automaton
   Norm: Type 4;  D(M) = Q × T, R(M) ⊆ P(Q) ∪ {λ}

5. Probabilistic Tree Automaton
   Norm: Type 4;  D(M) = Q × T, R(M) ⊆ P(Q*)

6. Probabilistic Pushdown Store Automaton
   Norm: Type 4;  D(M) = Q × S × T, R(M) ⊆ P(Q × S) ∪ {λ}

Note: P(E) for any set E denotes the power set of E.

In the case of a type 4 normalized finite P automaton, M(q, a) = λ is used to designate termination. Any q' ∈ Q such that λ ∈ M(q', a) for some a ∈ T is called a terminable state. If for all q ∈ Q there exists a terminable state q' accessible from q (i.e., there is a sequence of states q = q_1, q_2, ..., q_m with q_{i+1} ∈ M(q_i, a_i) and λ ∈ M(q_m, a_m) for some sequence of inputs a_1 a_2 ... a_m ∈ T+), then A is a terminating P automaton.

A transition is a change from some state q_i ∈ Q under an input a ∈ T to some state q_j ∈ Q such that q_j ∈ M(q_i, a), and will be written (q_i, a) → q_j; λ ∈ M(q_i, a) will be written (q_i, a) → halt. Associated with each transition is a probability; the product of these transition probabilities is the probability p_i of a sequence q_1 q_2 ... q_n. A mapping M(q, a) = φ has probability zero associated with it, and designates that a transition out of the state q under input a is disallowed. The probability of acceptance of a string x = a_1 ... a_n is Σ_{i=1}^{m} ξ(q_{i1}) p_i p_{τi}, where m is the number of sequences q_1 q_2 ... q_n such that q_{j+1} ∈ M(q_j, a_j), j = 1, 2, ..., n−1, and λ ∈ M(q_n, a_n); ξ(q) is a function whose value is the probability of starting in state q; q_{i1} is the first state of the i-th sequence; and p_{τi} is the probability of the terminating transition from the last state in the i-th sequence. The P language accepted by a P automaton A is L = (T+, μ), where μ(x) is the probability of acceptance of x.
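As a concrete reading of this acceptance probability, here is a sketch of a small type 4 normalized finite P automaton. The two-state automaton is a hypothetical example (it mirrors the left linear grammar A → aA | aB, B → bB | b, not an example from the thesis); the dictionary encoding and the use of None for the halting move λ are illustrative assumptions:

```python
from fractions import Fraction

# M maps (state, input) to {next_state: probability}; the key None stands for
# the halting move lambda. Type 4 normalization: for each state q, the
# probabilities over all inputs and outcomes sum to 1.
M = {
    ("A", "a"): {"A": Fraction(1, 2), "B": Fraction(1, 2)},
    ("B", "b"): {"B": Fraction(2, 3), None: Fraction(1, 3)},
}
xi = {"A": Fraction(1), "B": Fraction(0)}  # initial state vector

def acceptance_prob(x):
    """Sum of xi(q1) * (transition probabilities) * (terminating probability)
    over all state sequences that read x and then halt."""
    total = Fraction(0)
    def run(q, rest, p):
        nonlocal total
        moves = M.get((q, rest[0]), {})
        if len(rest) == 1:
            total += p * moves.get(None, Fraction(0))
        else:
            for q2, p2 in moves.items():
                if q2 is not None:
                    run(q2, rest[1:], p * p2)
    for q, d in xi.items():
        if d > 0 and x:
            run(q, x, d)
    return total

print(acceptance_prob("abb"))  # 1/2 * 2/3 * 1/3 = 1/9
```

A missing entry of M plays the role of M(q, a) = φ and contributes probability zero, as in the definition above.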
Theorem 3: Every finite P automaton accepts a P language which is generated by some left linear P grammar, and conversely, every left linear P grammar generates a P language which is accepted by some finite P automaton.

Proof: (a) Consider any left linear P grammar G = (N, P, Δ) over T. The equivalent automaton is constructed as follows: A = (Q, M, Ξ) where Q = N, Ξ = Δ, and for each (A_i → aA_j) ∈ P we define q_j ∈ M(q_i, a), where q_i = A_i and q_j = A_j. For each (A_i → b) ∈ P, we define λ ∈ M(q_i, b). The probability of each of these transitions is defined as the probability associated with the corresponding production A_i --p_ij--> ζ_j. All other transitions are of the form M(q, a) = φ and have probability zero. For each derivation of x with respect to G, there is a set of transitions which accepts x using A, and by construction the probabilities are the same. Also, each δ_i ∈ Δ is equivalent to ξ(q_i), so the derived probability of x, Σ_{i=1}^{n} δ_i pr(A_i ⇒ x), equals Σ_{i=1}^{m} ξ(q_{i1}) p_i p_{τi}, the probability of acceptance of x. Thus the P language generated by G and the P language accepted by A are the same.

(b) The construction of a P grammar from an automaton is as follows: if A = (Q, M, Ξ), then construct G = (N, P, Δ) where N = Q, Δ = Ξ, and for each q_j ∈ M(q_i, a) add a production A_i --p_ij--> aA_j to P; for each λ ∈ M(q_i, a) add a production A_i --p_i--> a, where p_ij and p_i are respectively the probabilities associated with the corresponding transitions (q_i, a) → q_j and (q_i, a) → halt. By the argument used in part (a) of this proof, the P languages generated and accepted must be the same.

Corollary 3.1: Every finite normalized P automaton A accepts a P language which is generated by some left linear normalized P grammar G, and conversely, for every normalized G there exists a normalized A such that L(G) = L(A), where L(G) means the P language generated by G and L(A) means the P language accepted by A.
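The construction in part (a) can be sketched directly: build M from the productions and check that derived probability and acceptance probability agree. The grammar below is a hypothetical left linear NP grammar (A → aA | aB, B → bB | b), and all encodings and names in the sketch are my own:

```python
from fractions import Fraction

# A left linear NP grammar: A -1/2-> aA, A -1/2-> aB, B -2/3-> bB, B -1/3-> b.
grammar = {"A": [(Fraction(1, 2), "a", "A"), (Fraction(1, 2), "a", "B")],
           "B": [(Fraction(2, 3), "b", "B"), (Fraction(1, 3), "b", None)]}
delta = {"A": Fraction(1), "B": Fraction(0)}

# Theorem 3(a): each production A_i -> aA_j gives a transition q_j in M(q_i, a);
# each A_i -> b gives a halting move (None plays the role of lambda).
M = {}
for nt, prods in grammar.items():
    for p, a, nxt in prods:
        M.setdefault((nt, a), {})[nxt] = p

def derived(nt, x):
    """pr(A_i => x): total probability of all derivations of x from A_i."""
    total = Fraction(0)
    for p, a, nxt in grammar[nt]:
        if x and x[0] == a:
            if nxt is None:
                total += p if len(x) == 1 else Fraction(0)
            else:
                total += p * derived(nxt, x[1:])
    return total

def accepted(x):
    """Acceptance probability of x by the constructed automaton."""
    total = Fraction(0)
    def run(q, rest, p):
        nonlocal total
        moves = M.get((q, rest[0]), {})
        if len(rest) == 1:
            total += p * moves.get(None, Fraction(0))
        else:
            for q2, p2 in moves.items():
                if q2 is not None:
                    run(q2, rest[1:], p * p2)
    for q, d in delta.items():
        if d > 0 and x:
            run(q, x, d)
    return total

for x in ["ab", "abb", "aab", "aabb"]:
    assert accepted(x) == sum(delta[q] * derived(q, x) for q in grammar)
print("derived and acceptance probabilities agree")
```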
Corollary 3.2: Every finite terminating P automaton A accepts a P language which is generated by some left linear admissible P grammar G, and conversely, for every admissible G there exists a terminating A such that L(G) = L(A).

Proof: These corollaries follow immediately from the construction in the proof of Theorem 3.

[Figure: state diagram of an NP automaton with states A, B, C; the surviving arc labels include a, 1/3 and b, 1/3.]

Example 1. State diagram of an NP automaton, with the corresponding NP grammar:

  A --1/3--> aA,  A --1/3--> aB,  B --2/3--> bB,  B --1/3--> bC,  C → cC,  C → c,
  Δ = (δ_A, δ_B, δ_C) = (1, 0, 0).

5. CONTEXT FREE GRAMMARS

If all productions of a P grammar G are of the form A --p--> ζ (A ∈ N, p ∈ R, ζ ∈ (N ∪ T)+), then G is called a context free P grammar. The definitions of derivation and derived probability are the same as for left linear grammars, except that the replacement string may now consist of more than one nonterminal, so all derivations must be performed by operating upon the left-most nonterminal at each step to avoid undesirable ambiguities. The definition of the P language L(G) generated by G is unchanged; if G is a context free P grammar, then L(G) is called a context free P language.

Theorem 4: Every admissible context free NP grammar can be transformed into an equivalent NP grammar in Chomsky Normal Form, which means all productions are of the form A → BC or A → b (A, B, C ∈ N, b ∈ T). Equivalent P grammars are ones which generate the same P language.

Proof: The proof is a constructive one.

(a) Given an admissible context free NP grammar G = (N, P, Δ), we first eliminate all productions B_i --p_ij--> B_j by constructing a matrix U whose rows and columns are labeled by the nonterminal symbols B_i. As the element in the B_i row and B_j column, we take p_ij if B_i --p_ij--> B_j is a production in G; otherwise, the element is zero. Construct a matrix V whose rows are labeled by nonterminals and whose columns are labeled by the strings ζ_j ∉ N which appear as replacement strings in productions in P. The element in row B_i and column ζ_j is p'_ij if there is a production B_i --p'_ij--> ζ_j in G; otherwise,
*-C, in G. Otherwise, id i J the element is zero. u V is a matrix with q . , in the row labeled B. and column labeled z, , where q. . is the probability of a derivation B. -> B, -*...-*■ B„ -*■ X, , of l Is. I J length n + 1. Thus ( Z u ) v is the total probability matrix for B. ^> £.. To n=0 x J show normalization, we must show that ( Z u )v is a stochastic matrix. This is true for the n=0 ,u V N combined square matrix (--), and so it is true for n n f UV ) n = £L_ { 1i } ^ 2) can be reduced to productions with replacement strings of length n-1 by the following procedure: replace A-^-a 6 ¥ by A-^-a D, D— M$ V* where D is a nonterminal not in N of the old grammar and a H is a string of length n, so M has length n-1. By repeating this procedure, the maximum length can be reduced to 2. (c) Replace all productions A-^.a a where at least one a. is in T by A— =i*.B, B^ where B. = a. if a e N, and B. is a new 12 l i i l nonterminal if a. e T with the production B. — ►a. inserted into i ii the grammar. The same strings of terminals are generated by a grammar before and after steps a, b, c, and d, and the derived probabilities are unchanged. Thus, the new grammar is equivalent to the old because exactly the same strings with the same probabilities are generated. Theorem 5: Every admissible context free NP grammar can be transformed into an equivalent NP grammar in Greibach Normal Form-(GNF), which means all productions are of the form A^bC, C....C (A,C....C e N, b e T, n > 0). 12 n ' 1 n — 19 A Algorithm: Given any admissible context freeNP grammar G = (N, P, A), eliminate all productions of the form A-E*B by the technique given in the proof of Theorem k. Then define the set of handles of G as H(G) = {ot|a is the first symbol of some replacement string of P} . In this case, a is called a handle. M(G) = {a|a is the generatrix of some production with handle in N} . 
The requirement for an admissible P grammar G to be in Greibach Normal Form is that ζ must be a string of nonterminals for each production A --q--> aζ, and (1) T ⊇ H(G), or equivalently, (2) M(G) = φ. Note that if any β within ζ is a terminal, it can be replaced by a new nonterminal C with a production C --1--> β added to P. Thus our goal will be to obtain productions of the form A → aβ_1 β_2 ... β_n with each β_i ∈ N ∪ T. The method of proof, which is analogous to the method used for nonprobabilistic grammars by Greibach [8], is to employ an iterative technique which generates at each step a new grammar G_0 = (N_0, P_0, Δ_0) which has one less A ∈ N in the set of handles. New nonterminals are created, but the new productions added are such that no new symbols ever appear as a handle. Eventually, all handles become members of T. The construction and proof which follow are illustrated by an example.

  T = {a, b},  N = {A, C},  Δ = (1, 0)
  P = {A --1/2--> a,  A --1/2--> Ca,  C --2/3--> bA,  C --1/3--> ACC}

Example 2.

For any A ∈ N ∩ M(G), the following procedure eliminates A from M(G) and from H(G):

(a) First construct a finite directed graph. One node s_0 is labelled by A = A_0. For each node s_i labelled A_i, and each production A_i --p--> A_j β_1 β_2 ... β_n where A_i, A_j ∈ N and β_1, β_2, ..., β_n ∈ N ∪ T (n ≥ 0), create a node s_j labelled A_j (if one does not already exist) and create an arc from s_i to s_j labelled β_1 β_2 ... β_n. For each production A_i --p--> c β_1 β_2 ... β_n where c ∈ T, create a new node labelled t ∈ I and create an arc from s_i to t labelled c β_1 β_2 ... β_n. At times, nodes will be denoted by their labels, since no two nodes have the same label. Numbered nodes are terminal and are not connected to any other node. Each arc is also labelled with the probability p of the corresponding production. Repeat this process until all nodes accessible from A_0 and their arcs have been created. P is a finite set, so this process will terminate after a finite number of steps.
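Steps (a) and (b) can be sketched computationally for Example 2: the arcs follow from the productions, and each simple path from A_0 to a terminal node yields one production. The node names t1, t2 and the tuple encoding are my own:

```python
from fractions import Fraction

# Arcs (source, target, label string, probability) of the step (a) graph:
arcs = [
    ("A", "t1", "a",  Fraction(1, 2)),   # from A -1/2-> a
    ("A", "C",  "a",  Fraction(1, 2)),   # from A -1/2-> Ca
    ("C", "t2", "bA", Fraction(2, 3)),   # from C -2/3-> bA
    ("C", "A",  "CC", Fraction(1, 3)),   # from C -1/3-> ACC
]

def simple_paths(node, visited):
    """Step (b): each simple path from A0 to a terminal node t yields a
    production A -> zeta_n ... zeta_1 with probability q = product of arc probs."""
    for src, dst, s, p in arcs:
        if src != node:
            continue
        if dst.startswith("t"):
            yield s, p                      # path ends at a terminal node
        elif dst not in visited:
            for s2, p2 in simple_paths(dst, visited | {dst}):
                yield s2 + s, p * p2        # later arc labels concatenate on the left

for repl, q in simple_paths("A", {"A"}):
    print("A ->", repl, " probability", q)
```

The two paths found, A → a with probability 1/2 and A → bAa with probability 1/3, are exactly the loop-free productions obtained for G_0 later in this section.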
Example 2 (continued): Taking A = A_0, we construct the graph with nodes A and C and terminal nodes t_1, t_2, and with arcs: from A to t_1 labelled a, 1/2; from A to C labelled a, 1/2; from C to t_2 labelled bA, 2/3; and from C to A labelled CC, 1/3. [Figure omitted.]

(b) From this graph, a set of productions is obtainable. Put all productions of P into P_0 except those with generatrix A. Put the productions of the forms A → ζ and B_i → Ψ which are created below into P_0. For each terminal node t, and for each simple path from s_0 to t, call it s_0 s_1 s_2 ... s_n with t = s_n, one can write a production A --q--> ζ_n ζ_{n-1} ... ζ_1, q = Π_{i=1}^{n} p_i, where ζ_i and p_i are respectively the string of symbols and the probability associated with the arc from s_{i-1} to s_i, i = 1, 2, ..., n. In general, there may be several arcs from some s_i to s_j, so s_0 s_1 ... s_n actually specifies a set of productions, one for each possible sequence of arcs connecting the sequence of nodes.

Note: For any finite directed graph, one can always find all simple paths from any node A_i to any node A_j, because one need only consider all sequences s_1 s_2 ... s_n with s_1 = A_i such that n is no greater than the number of nodes in the graph.

Consider a particular simple path from s_0 to some terminal node t. Any node s_i in this path s_0 s_1 s_2 ... s_n (s_n = t) such that there is a path from s_i to s_i not containing any s_j with j < i is said to fulfill the loop condition with respect to the given path. Define one new nonterminal B_i for each such s_i. For each possible combination s_i, s_j, ..., s_k of one or more nodes fulfilling the loop condition, and for each sequence of arcs connecting s_0, s_1, ..., s_n, a production must be written of the form A --r--> ζ_n ζ_{n-1} ... ζ_{k+1} B_k ζ_k ... ζ_{j+1} B_j ζ_j ... ζ_{i+1} B_i ζ_i ... ζ_1, where ζ_ℓ is the label of the arc connecting s_{ℓ-1} to s_ℓ (ℓ = 1, 2, ..., n); B_0 is allowed, but not B_n; and r will be defined later.

(c) For each s_i fulfilling the loop condition, productions must be created to describe all paths from s_i to s_i.
Erase all nodes s_j of s_0 s_1 ... s_n such that j < i, and all arcs connected to these s_j. Then split s_i into two nodes, s_i^1 and s_i^2; s_i^1 has those arcs of s_i leading out, and s_i^2 has those arcs leading into s_i, with associated strings and probabilities unchanged. Productions with generatrix B_i must be constructed and put into P_0 for each simple path from s_i^1 to s_i^2, using the technique of parts (b) and (c) of this proof with s_0 = s_i^1 and s_n = s_i^2. These productions, which are called B_i-loop generating productions, are constructed to allow looping, so at the right end of the right-hand side of each, B_i must occur to allow the loop to repeat. Finally, for each of these productions B_i --p--> ζB_i, where p is derived from part (b), another production, called a B_i-loop terminating production, must be added to P_0 to allow the loop to terminate: B_i --p̄--> ζ, where p̄ = p((1 − r_i)/r_i); r_i is defined below. This procedure is recursive, because steps (b) and (c) may need to be repeated many times if the path contains many loops within loops. Furthermore, the whole process is repeated for each simple path from s_0 to a terminal node.

The probability of a production (A --r--> ζ) ∈ P_0 is defined recursively:

(1) If ζ = ζ_n ζ_{n-1} ... ζ_1, then r = q, which was previously defined.

(2) If ζ contains new nonterminals B_i, B_j, ..., B_k, then ζ = ζ_n ζ_{n-1} ... ζ_{k+1} B_k ζ_k ... ζ_{j+1} B_j ζ_j ... ζ_{i+1} B_i ζ_i ... ζ_1 and r = q (r_i/(1 − r_i)) (r_j/(1 − r_j)) ... (r_k/(1 − r_k)), where r_ℓ is the sum over all B_ℓ-loop generating productions of P_0 of pr(B_ℓ → ζB_ℓ), ℓ = i, j, ..., k.

G_0 for Example 2:

  A --1/3--> bAa      A --1/15--> bAaB
  A --1/2--> a        A --1/10--> aB
  B --1/6--> CCaB     (B-loop generating production)
  B --5/6--> CCa      (B-loop terminating production)

The recursion terminates because simple loops from B_i to B_i are sequences of length at most |N|. A simple loop is a path s_0 s_1 ... s_n such that s_0 = s_n and s_0 s_1 ... s_{n-1} is a simple path.
At the second level of recursion, we consider simple loops emanating from some node within sequences from B_i to B_i. These may be expressed as simple loops from some B_i^(2) to B_i^(2). Notice that these paths do not have B_i in them, so they are of length at most |N| − 1. Similarly, at the (m+1)-st level of recursion, simple paths are of length at most |N| − m, because the nodes B_i, B_j, ..., B_k of the enclosing levels cannot occur. This is shown by the following argument: if s_i ≠ s_0, then the algorithm erases the initial node, and the final node, by construction, has no arcs going out of it, so it cannot contribute to any loops; if s_i = s_0, then it was constructed by splitting some node, and so it has no incoming arcs and therefore no loops. At recursion level |N|, simple paths are of maximum length 1. There can be no further loops, so part (1) of the recursive definition applies.

Lemma 5.1: The P grammar G_0 is normalized.

Proof: Case 1: Suppose C ∈ N, C ≠ A_0. Then the productions C --p--> ζ of P_0 are exactly those of P. So by the normalization of G, the sum of their probabilities must equal one.

Case 2: Suppose C ∈ N, C = A_0. If there are no loops, then a proof by induction on the maximum number of nodes in a simple path from A_0 to terminal nodes can be given.

(a) Let n = 1 be the number of nodes minus one (i.e., a maximal path has two nodes, s_0 and s_1). In this case, G and G_0 have the same productions A_0 --r_i--> ζ_i, so the sum is unchanged.

(b) Let n > 1 be the maximum number of nodes minus one in the graph for G, and let the lemma hold for all NP grammars with graphs having no more than n nodes in every simple path and having no loops. There is a one-to-one correspondence between productions A_0 --q--> ζ_n ζ_{n-1} ... ζ_1 of the P grammar G_0 and derivations A_0 ⇒ ζ_n ζ_{n-1} ... ζ_1 with respect to G such that, by construction, the probabilities are the same.
Thus the sum of probabilities of productions for G_0 with generatrix A_0 is

  Σ_{a ∈ T, ζ ∈ (N ∪ T)*} pr(A_0 ⇒ aζ)
    = Σ_{j=1}^{|N|-1} p_j ( Σ_{a ∈ T, ξ ∈ (N ∪ T)*} pr(A_j ⇒ aξ) ) + Σ_{a ∈ T, ζ ∈ (N ∪ T)*} pr(A_0 → aζ),

where all derivations are with respect to G, and p_j = Σ_{ζ ∈ (N ∪ T)*} pr(A_0 → A_j ζ). Since there are no loops, each derivation A_j ⇒ aξ corresponds to a path with no more than n nodes, so by the induction hypothesis Σ_{a, ξ} pr(A_j ⇒ aξ) = 1 for j = 1, 2, ..., |N|−1. Then by the normalization criterion, the sum equals one:

  Σ_{a, ζ} pr(A_0 ⇒ aζ) = Σ_{j=1}^{|N|-1} Σ_{ζ} pr(A_0 → A_j ζ) + Σ_{a, ζ} pr(A_0 → aζ) = 1.

(c) For the most general case, we use induction on h, the maximum number of nodes fulfilling the loop condition. The lemma has been proven by (a) and (b) for h = 0. Next assume the lemma is valid for the values 1, 2, ..., h−1, with h > 0. By omitting a node s_i fulfilling the loop condition in the path with the maximal number of nodes fulfilling the loop condition, and by adjusting the probabilities of the remaining sequences of arcs out of s_i, the graph comes to represent a new NP grammar G' under the algorithm of Theorem 5. In this graph, V_k is the probability of the k-th path from s_0 to s_i, and u_j is the probability of the j-th path from s_i to a terminal node. In G', Σ_{j=1}^{m} u'_j = 1, and in G, Σ_{j=1}^{m} u_j + r = 1 by the normalization of G, the construction of G' taking u'_j = u_j/(1 − r). By the induction hypothesis, the productions of G' with A_0 as generatrix sum to one. These are productions of the form A --u'_j V_k--> γ_jk for paths passing through node s_i, and their probabilities sum to Σ_{j=1}^{m} Σ_{k=1}^{ℓ} u'_j V_k = Σ_{k=1}^{ℓ} V_k. The analogous productions of G_0 are of the form A --u_j V_k--> γ_jk and A --u_j V_k (r/(1−r))--> γ_jk with B_i inserted, where r is, by definition, 1 − Σ_{j=1}^{m} u_j.
Summing over these two types of productions yields

  Σ_{j=1}^m Σ_{k=1}^ℓ ( u_j v_k + u_j v_k (r/(1−r)) ) = Σ_{k=1}^ℓ v_k ( Σ_{j=1}^m u_j + r )/(1−r) = Σ_{k=1}^ℓ v_k,

which is identical to the sum for G′, so the probabilities of all productions of Ĝ with generatrix A_0 sum to one. This technique can be applied to other paths not passing through node s_i, so the above proof applies if several paths have the maximum number of nodes fulfilling the loop condition.

Case 3: Suppose C = B_i ∈ N̂ − N. By definition, the B_i-loop generating productions sum to r_i, and by construction the B_i-loop terminating productions sum to r_i((1−r_i)/r_i) = 1 − r_i. The two sums together give a total of 1. □

Lemma 5.2: G and Ĝ generate the same P language.

Proof: Ĝ differs from G only in the rules with generatrix A_0 and in the new rules. The same set of strings is still generated from A_0, and the other members of N are not affected. Δ̂ has zero entries for all new nonterminals, so that no new generations occur, and the old nonterminals retain the same values as in Δ; Δ̂ is a stochastic vector because Δ is stochastic. We need only show that any derivation of a terminal string x has the same probability under G and Ĝ. Consider any derivation which corresponds to a path from node A_0 to a terminal node, i.e., A_0 ⇒ aζ; the general case can be inferred from this. We claim that using G or Ĝ, the probability of this derivation is the same. To use induction on the recursion level of the path (i.e., the number of simple loops within simple loops), first assume the path s_0 s_1 ... s_n is simple. The probability of this path using G is the product of the probabilities of the derivation steps. If A_{i−1} →(p_i) A_i ζ_i (0 < i < n) and A_{n−1} →(p_n) a ζ_n, then the probability is Π_{i=1}^n p_i. By the construction of Ĝ, there exists a production A_0 →(q) a ζ_n ζ_{n−1} ... ζ_1 with q = Π_{i=1}^n p_i, so the probability of the derivation is the same using G or Ĝ.
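In outline, the probability of a derivation is simply the product of the probabilities of the productions applied at each step. A minimal sketch (the grammar encoding and names are ours, not the thesis's notation):

```python
from fractions import Fraction

# A small illustrative P grammar, stored as numbered productions:
# (generatrix, replacement, probability).
productions = {
    1: ("A", "AA", Fraction(2, 3)),   # A -> AA with probability 2/3
    2: ("A", "a",  Fraction(1, 3)),   # A -> a  with probability 1/3
}

def derivation_probability(steps):
    """Probability of a derivation = product of the step probabilities."""
    p = Fraction(1)
    for label in steps:
        p *= productions[label][2]
    return p

# The derivation A => AA => aA => aa uses productions 1, 2, 2:
print(derivation_probability([1, 2, 2]))  # -> 2/27
```

Exact rational arithmetic (Fraction) keeps the products free of rounding, which matters when summing many derivation probabilities.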
Consider a derivation with recursion level m, i.e., loops nested up to m times within other loops. Then at the m-th level of recursion there are only simple loops emanating from any B_i^(m). Suppose that in this derivation the B_i^(m) node is traversed k+1 times within a B_i^(m−1)-loop. The probability of this using G is Π_{j=1}^ℓ t_j^{m_j}, where there may be many simple loops emanating from B_i^(m); ℓ is the number of different simple loops traversed, m_j is the number of times the j-th loop was traversed, and t_j is the probability associated with the j-th loop. In Ĝ there is a production B_i^(m) →(t_j) ζ B_i^(m) for the j-th loop; the k-th B_i^(m) loop is the final one and uses a loop terminating production B_i^(m) →(t_j(1−r_i)/r_i) ζ. Thus the product will be ((1−r_i)/r_i) Π_{j=1}^ℓ t_j^{m_j}.

By the induction hypothesis, the probability is the same using G or Ĝ for any path of recursion level m−1, so let q be the probability of the path under discussion when all B_i^(m) loops are removed. Then q ( Π_{j=1}^ℓ t_j^{m_j} ) is the probability, using G, of the path obtained when the k loops of B_i^(m) are added. Under Ĝ, the addition of the k loops means that a production B_i^(m) →(p) ζ_n ζ_{n−1} ... ζ_1 must be replaced by B_i^(m) →(p(r_i/(1−r_i))) ζ_n ... ζ_1 B_i^(m) ζ. The probability using Ĝ then becomes

  ( q (r_i/(1−r_i)) ) ( ((1−r_i)/r_i) Π_{j=1}^ℓ t_j^{m_j} ) = q Π_{j=1}^ℓ t_j^{m_j},

which is identical to the probability using G. This result can be used, and the above argument repeated, to add further B_i^(m) loops and to show that the probability using G or Ĝ is the same for any path of recursion level m. □

Several things can be mentioned about Ĝ. A_0 ∉ M(Ĝ), i.e., all productions A_0 → ξ have handles in T; and H(Ĝ) ∩ N̂ ⊆ N, i.e., no new symbol is a handle.
If A is a handle of any production C →(q) A ψ, then the following substitution changes this handle to a set of terminal handles. Suppose the productions with generatrix A in P are A →(p_1) a_1 ζ_1, A →(p_2) a_2 ζ_2, ..., A →(p_n) a_n ζ_n. Then C →(q) A ψ can be replaced by C →(p_1 q) a_1 ζ_1 ψ, C →(p_2 q) a_2 ζ_2 ψ, ..., C →(p_n q) a_n ζ_n ψ. Thus A ∉ H(Ĝ).

Final Ĝ for Example 2: the P grammar contains C →(1/3) ACC with handle A. Using the productions with generatrix A, we replace this production by C →(1/9) bAaCC, C → bAaBCC, C → aCC, and C → aBCC, the probability of each new production being 1/3 times that of the production for A which was substituted.

Proof (of Theorem 5): Let Ĝ_1 = Ĝ = (N̂, P̂, Δ̂). Consider Ĝ_i. If N ∩ M(Ĝ_i) = ∅, we stop; if not, the algorithm is used to find Ĝ_{i+1} = (N̂_{i+1}, P̂_{i+1}, Δ̂_{i+1}) such that L(Ĝ_i) = L(Ĝ_{i+1}), M(Ĝ_{i+1}) ∩ N ⊂ M(Ĝ_i), N̂_{i+1} ∩ H(Ĝ_{i+1}) ⊆ N, and H(Ĝ_{i+1}) ∩ N ⊆ M(Ĝ_{i+1}). At each step one member of M(Ĝ_i) ∩ N is eliminated and new members are never added; no new nonterminal symbols ever become handles. Thus, since N is finite, we eventually reach an n such that M(Ĝ_n) ∩ N = ∅. Consider M(Ĝ_n). If M(Ĝ_n) = ∅, we are finished. Suppose C ∈ M(Ĝ_n). Then there exists a production C →(q) A Ω. By construction A cannot be a new symbol, i.e., A ∈ N. Then A ∉ M(Ĝ_n), since M(Ĝ_n) ∩ N = ∅, and A ∈ H(Ĝ_n); but by construction H(Ĝ_n) ∩ N ⊆ M(Ĝ_n). This contradiction implies C ∉ M(Ĝ_n); thus M(Ĝ_n) = ∅ and H(Ĝ_n) ⊆ T, so Ĝ_n is the sought-after P grammar. □

Theorem 6: There exist context free languages L with probabilities μ(x) attached to the strings of L which cannot be generated by any context free P grammar.

Proof: This theorem is the context free analogue of Theorem 2. The example and proof given there are still valid if the P grammar is generalized to be any context free P grammar.

The context free extension of Theorem 1, however, is false. The following example of a normalized P grammar which generates an un-normalized P language was suggested by D. E. Muller.
Example 3: A →(2/3) AA, A →(1/3) a. This NP grammar generates a with probability 1/3; aa, 2/27; aaa, 8/243; ...; and the total probability is Σ_{n=1}^∞ μ(aⁿ) = 1/2.

Theorem 7: There exist context free NP grammars over T which do not generate NP languages over T*.

Proof: First a general criterion for an NP grammar to generate an NP language will be developed. Then the proof of the theorem is simply to observe that Example 3 does not fulfill the criterion.

Let Ĝ = (N, P, Δ) be an admissible context free NP grammar. By Theorem 4, Ĝ can be put into Chomsky normal form (A_i → A_j A_k or A_i → a). For any particular A_i, define a matrix Z_i with entries p_{ijk} = pr(A_i → A_j A_k). Also define a column vector Y(t) with entries y_i(t), the probability of derivation of a terminal string within t derivation steps given the starting nonterminal A_i; y_i(1) = q_i = Σ_{a∈T} pr(A_i → a). The total probability of a derivation of length ≤ t is σ(t) = Σ_{i=1}^{|N|} δ_i y_i(t). A derivation A_i ⇒ x of length less than or equal to t+1 can be obtained by (1) a production A_i → a, or (2) A_i → A_j A_k where both A_j and A_k yield derivations of length at most t. Thus we can write the equation

  y_i(t+1) = Σ_{j=1}^{|N|} Σ_{k=1}^{|N|} p_{ijk} y_j(t) y_k(t) + q_i.

In matrix form, this is y_i(t+1) = Yᵀ(t) Z_i Y(t) + y_i(1).

Lemma 7.1: y_i(t) ≤ 1, i = 1, 2, ..., |N|, implies y_i(t+1) ≤ 1, i = 1, 2, ..., |N|.

Assume y_i(t) ≤ 1 for all i, for some t. Then y_i(t+1) = Yᵀ(t) Z_i Y(t) + y_i(1) ≤ (1, 1, ..., 1) Z_i (1, 1, ..., 1)ᵀ + q_i = Σ_j Σ_k p_{ijk} + q_i = 1 by the normalization of Ĝ. Thus y_i(t+1) ≤ 1.

Lemma 7.2: y_i(t+1) ≥ y_i(t). This is easily shown by induction on t. Case t = 1: y_i(2) = Yᵀ(1) Z_i Y(1) + y_i(1) ≥ y_i(1). Case t = m > 1: suppose y_i(t+1) ≥ y_i(t) for t = m−1. Then

  y_i(m+1) = Σ_j Σ_k p_{ijk} y_j(m) y_k(m) + y_i(1)  and  y_i(m) = Σ_j Σ_k p_{ijk} y_j(m−1) y_k(m−1) + y_i(1).

Each component satisfies y_j(m) ≥ y_j(m−1) and y_k(m) ≥ y_k(m−1) by the induction hypothesis, so the first sum is at least the second: y_i(m+1) ≥ y_i(m).
Lemmas 7.1 and 7.2 show that y_i(t) must approach some fixed point as t approaches infinity (a bounded, monotone increasing sequence converges), and (1, 1, ..., 1)ᵀ is a fixed point vector. Suppose lim_{t→∞} Y(t) = Y; then Y must be a fixed point vector. This means y_i = Yᵀ Z_i Y + q_i, i.e.,

  y_i = Σ_{j=1}^{|N|} Σ_{k=1}^{|N|} p_{ijk} y_j y_k + q_i
      = Σ_{j≠i} Σ_{k≠i} p_{ijk} y_j y_k + Σ_{j≠i} p_{iji} y_j y_i + Σ_{k≠i} p_{iik} y_i y_k + p_{iii} y_i² + q_i,

so that

  y_i² p_{iii} + y_i ( Σ_{ℓ≠i} y_ℓ (p_{iiℓ} + p_{iℓi}) − 1 ) + ( Σ_{j≠i} Σ_{k≠i} p_{ijk} y_j y_k + q_i ) = 0.

Solving this quadratic for y_i yields

  y_i = [ 1 − Σ_{ℓ≠i} y_ℓ (p_{iiℓ} + p_{iℓi}) ± √( ( Σ_{ℓ≠i} y_ℓ (p_{iiℓ} + p_{iℓi}) − 1 )² − 4 p_{iii} ( Σ_{j≠i} Σ_{k≠i} p_{ijk} y_j y_k + q_i ) ) ] / (2 p_{iii}),

or, if p_{iii} = 0, a simpler linear equation results. This system of |N| equations has the solution Y = (1, 1, ..., 1)ᵀ. This solution implies lim_{t→∞} σ(t) = Σ_{i=1}^{|N|} δ_i y_i = Σ_{i=1}^{|N|} δ_i = 1.

Criterion: An NP grammar Ĝ generates an NP language iff the equation has no solution vector smaller than (1, 1, ..., 1), where a smaller solution vector Y means 0 ≤ y_i ≤ 1 for all i and 0 ≤ y_i < 1 for at least one i. In case this solution Y exists, lim_{t→∞} σ(t) = Σ_{i=1}^{|N|} δ_i y_i < 1 is the probability of a terminal string being generated by Ĝ, since the smallest solution is always the one approached. It can be shown that this is the case by the following argument. Since both Z_i and Y must have all non-negative components (because Ĝ is normalized), y_i = Yᵀ Z_i Y + q_i ≥ q_i for any solution Y. Replacing the unit vector (1, 1, ..., 1) by the smallest solution vector Y in Lemma 7.1 yields a proof that y_i(t) ≤ y_i for all t; so y_i(t) ↑ y_i.

In the example A →(2/3) AA, A →(1/3) a, we have Δ = (1), q_1 = 1/3, and p_{111} = 2/3, so

  y_1 = ( 1 ± √(1 − 4(2/3)(1/3)) ) / (2(2/3));

the solutions are 1 and 1/2. Thus Σ_{n=1}^∞ μ(aⁿ) = Σ_{i=1}^{|N|} δ_i y_i = 1/2.
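The monotone iteration of Lemmas 7.1 and 7.2 can be carried out numerically for the one-nonterminal case, where the equation reduces to y(t+1) = p·y(t)² + q. A sketch (function names are ours):

```python
def termination_probability(p, q, steps=200):
    """Iterate y(t+1) = p*y(t)**2 + q starting from y(1) = q.

    For the grammar A ->(p) AA, A ->(q) a, the iterates increase
    monotonically to the smallest fixed point of y = p*y**2 + q,
    which is min(1, q/p)."""
    y = q
    for _ in range(steps):
        y = p * y * y + q
    return y

print(round(termination_probability(2/3, 1/3), 6))  # -> 0.5
print(round(termination_probability(1/3, 2/3), 6))  # -> 1.0
```

The iteration converges geometrically here, so a few hundred steps suffice to reproduce the closed-form roots 1 and q/p of the quadratic.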
This example, therefore, does not fit the criterion. Furthermore, by solving this equation for the more general p_{111} = p and q_1 = q = 1 − p, we get y_1 = (1 ± √(1 − 4pq))/(2p); the solutions are 1 and q/p. Thus A →(p) AA, A →(q) a yields Σ_{n=1}^∞ μ(aⁿ) = q/p if p ≥ q, and 1 if p < q.

The significance of this result is that P grammars may have nonzero probability of an endless cycle of generation. This leads to the concept of nonterminating P grammars, which generate infinite strings. Define T^ω as the set of all infinite strings of symbols of T.

Definition: A generalized P grammar Ĝ is strictly nonterminating iff (1) it is in Greibach normal form; (2) it contains no productions of the form A →(p) a (these are called terminating productions).

A strictly nonterminating P grammar must be considered to generate strings of T^ω. By introducing tree automata in the next chapter, an analogue for context free P grammars of Theorem 3 will be proven. The following theorems will be an immediate consequence of the theorems for tree automata.

Theorem 8: Every strictly nonterminating left linear NP grammar generates a P language (T^ω, μ) such that μ is normalized (i.e., μ is a probability measure).

Theorem 9: Every strictly nonterminating context free NP grammar generates a P language (T^∞, μ) such that μ is normalized.

It will also be shown that P grammars which generate strings over T* ∪ T^ω can be considered as a sub-class of P grammars over T^∞.

6. PROBABILISTIC TREE AUTOMATA

Definition: A tree is a set D ⊆ I*, where I* is the set of all finite strings of positive integers, satisfying the following three requirements. Let d = (n_1 n_2 ... n_k), n_i ∈ I, and write dn for (n_1 n_2 ... n_k n), n ∈ I. Then:
1. dn ∈ D, n > 1 ⇒ d(n−1) ∈ D;
2. dn ∈ D, n = 1 ⇒ d ∈ D;
3. if the set {n | dn ∈ D, n ∈ I} = ∅, then n⁺_d is defined as n⁺_d = 0; in this case d is a terminal node of D.

Definition: A valued tree over a finite alphabet T is a pair (v, D) where D is a tree and v is a function, v: D → T.
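The two closure conditions in the definition of a tree can be checked mechanically. A sketch, with node addresses represented as tuples of positive integers and Λ as the empty tuple (the encoding is ours):

```python
def is_tree(D):
    """Check the defining conditions: if d.n is in D with n > 1 then
    d.(n-1) must be in D, and if d.1 is in D then d must be in D."""
    D = set(D)
    for d in D:
        if not d:                 # the root Lambda needs no check
            continue
        parent, n = d[:-1], d[-1]
        if n > 1 and parent + (n - 1,) not in D:
            return False
        if n == 1 and parent not in D:
            return False
    return True

print(is_tree({(), (1,), (2,), (1, 1)}))   # True
print(is_tree({(), (2,)}))                 # False: node (1,) is missing
```

The second example fails because a node numbered 2 requires its left sibling 1 to be present, exactly condition 1 above.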
Valued tree will sometimes be abbreviated to tree when there is no possibility of confusion with the previous definition of tree. Define ℓ(n_1 n_2 ... n_k) = k. The length of a tree is sup_{d∈D} ℓ(d). Define the m-th level of a tree to be the set {d ∈ D | ℓ(d) = m}. If (∃ m ∈ I)(∀ d ∈ D)[ℓ(d) ≤ m], then (v, D) is called a finite valued tree. Define a subtree of (v, D) to be (v′, D′), where D′ ⊆ D is itself a tree and the function v′ is v on the restricted domain D′, v′ = v|D′. The set of all finite valued trees over T is denoted by T^Δ. If d ∈ D ⇒ d1 ∈ D, then (v, D) is a full infinite valued tree; T^∞ denotes the set of all full infinite valued trees over T.

Definition: A Probabilistic Tree Automaton (PTA) over T is a triple (Q, M, Ξ), where Q is a finite set of states, Ξ is the initial state distribution vector, and M is the next state function, M: Q × T → P(Q*). Associated with each q ∈ Q, a ∈ T is a function p_{q,a} with domain D(p_{q,a}) = M(q, a) and range R(p_{q,a}) ⊆ R (the transition probability). An elementary transition τ is a change, consistent with M, from state q under input a to a sequence of states Q_k = q_1 q_2 ... q_n (consistent with M means Q_k ∈ M(q, a)). Thus a transition out of any state may go into many states rather than into a single next state. The nonzero probability of this elementary event is p(τ) = p_{q,a}(Q_k).

A probabilistic tree automaton is normalized (an NPTA) iff for all states q the total probability of leaving q is one (i.e., Σ_{a∈T} Σ_{Q_k∈M(q,a)} p_{q,a}(Q_k) = 1), 0 ≤ p_{q,a}(Q_k) ≤ 1, and Ξ is a stochastic vector (i.e., Σ_{q∈Q} ξ(q) = 1, where ξ is the function Q → R which assigns to each state q its probability value from the vector Ξ).

Definition: A PTA is complete iff (∀ q ∈ Q)(∃ b ∈ T)[M(q, b) ≠ ∅]. Thus, complete means there is always a positive probability of exiting from any state q.

Definition: A PTA Â is strictly nonterminating iff (1) Â is complete; (2) (∀ q ∈ Q)(∀ a ∈ T)[λ ∉ M(q, a)].
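The normalization condition just stated can be checked mechanically. A sketch, with M and Ξ encoded as Python dictionaries (the encoding is our own, not the thesis's):

```python
from fractions import Fraction

def is_normalized(Q, M, xi):
    """M maps (q, a) to a dict {successor-state-tuple: probability};
    xi maps each q to its initial probability.  Checks that the total
    probability of leaving each state is 1 and that xi is stochastic."""
    for q in Q:
        total = sum(p
                    for (q1, a), succs in M.items() if q1 == q
                    for p in succs.values())
        if total != 1:
            return False
    return sum(xi[q] for q in Q) == 1

# A two-state example: from q0 under input 'a', branch to (q0, q1)
# or take the terminal transition lambda (the empty tuple).
Q = {"q0", "q1"}
M = {("q0", "a"): {("q0", "q1"): Fraction(1, 2), (): Fraction(1, 2)},
     ("q1", "b"): {("q1",): Fraction(1)}}
xi = {"q0": Fraction(1), "q1": Fraction(0)}
print(is_normalized(Q, M, xi))  # True
```

Note that the terminal transition λ (here the empty tuple) counts toward the total leaving probability, just as in the definition.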
Any complete PTA can be altered to form a strictly nonterminating PTA by adding a new state q_t if there are elementary transitions λ ∈ M(q, a), and replacing these by elementary transitions q_t ∈ M(q, a) with probabilities equal to those of the omitted transitions. Next add a new blank symbol b to the alphabet and the transition M(q_t, b) = {q_t}, p_{q_t,b}(q_t) = 1.

It is useful and convenient to represent these tree automata using a modification of the directed graphs usually used for state diagrams. A context free state diagram consists of a set of vertices, represented by small circles, each of which may be connected to others by incoming arcs (lines with arrowheads) and outgoing cables (heavy lines). The vertices denote states, and the cables denote possible elementary transitions. Each cable is labelled with the input symbol (∈ T) which produces the elementary transition and with the probability associated with the transition. Each cable splits into one or more arcs, each of which goes to a state. The initial states (all q such that ξ(q) ≠ 0) are designated by incoming labelled arcs with no source states and no inputs. In the case of a terminal transition, λ ∈ M(q, a), the automaton literally goes into no state (it stops). In drawing state diagrams, this can be represented by drawing an arc from the vertex q to a dead-end vertex q_t which has no cables out of it and which does not correspond to any state in Q. (Figures: the state diagram of an automaton, and the trees D_1 = {1^m | m ≥ 0} with v_1(d) = a for all d ∈ D_1, and D_2 = D_1 ∪ {1^m 2 | m ≥ 0} with v_2(d) = a if d ∈ D_1 and v_2(d) = b otherwise.)

Definition: A run of Â on a tree, denoted r(v, D), is a function r: D → Q such that for all d ∈ D, (r(d1), ..., r(dn⁺_d)) ∈ M(r(d), v(d)), where (a) ξ(r(Λ)) ≠ 0 and (b) if n⁺_d = 0 then λ ∈ M(r(d), v(d)). If condition (b) is omitted in the case ℓ(d) = length of the tree (v, D) < ∞, then r(v, D) is a pre-run. The set of all runs on (v, D) is denoted by Rn(v, D).
A transition t is defined as the sequence of elementary transitions τ used in a run to go from some level n of a tree to level n+1. The probability of this event is p(t) = Π_{τ∈t} p(τ). The sequence of transitions used in a run is denoted by t(r). The response function of a run (or pre-run) r is defined as the product of the probabilities associated with the transitions used in the run (or pre-run) and the initial state probability:

  rf(r) = ξ(r(Λ)) Π_{t∈t(r)} p(t).

Definition: A k-prefix of a tree (v, D) is the subtree (v_k, D_k), where D_k = {x ∈ D | ℓ(x) ≤ k} and v_k = v|D_k.

In defining acceptance of trees, it is desirable to accept a prefix as part of a longer tree without requiring the machine to terminate. Thus pre-runs on prefixes are necessary, and likewise the following definitions. An elementary pre-transition τ̄ out of state q under input a is the set of all elementary transitions from state q under input a. The probability of this event is p(τ̄) = Σ_{Q_k∈M(q,a)} p_{q,a}(Q_k); if M(q, a) = ∅, then p(τ̄) = 0. The final transition t_k of a pre-run on (v_k, D_k) is the sequence of elementary pre-transitions used to leave the k-th level. The probability of this event is p(t_k) = Π_{τ̄∈t_k} p(τ̄).

Definition: The behavior of Â is the set of all trees (v, D) over T on which there is at least one run of Â: B(Â) = {(v, D) | Rn(v, D) ≠ ∅}. The k-behavior of Â, B_k(Â), is the set of all k-prefixes (v_k, D_k) over T.

Definition: The probability of acceptance of a tree, μ(v, D), is the sum of the response functions of all runs on (v, D): μ(v, D) = Σ_{r∈Rn(v,D)} rf(r). Define the k-probability of a tree as μ_k(v, D) = μ(v_k, D_k) = Σ_{r∈Rn(v_k,D_k)} rf(r), where Rn(v_k, D_k) is the set of all pre-runs on (v_k, D_k). Two trees over T are defined to be k-equivalent, (v¹, D¹) ≡_k (v², D²), iff D¹_k = D²_k and v¹_k = v²_k.

Theorem 10: ≡_k is an equivalence relation.

Proof: It is easy to verify that k-equivalence is reflexive, symmetric, and transitive.
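For a finite tree, the runs and the sum μ(v, D) of their response functions can be enumerated directly. A small sketch (the encoding and the particular automaton are illustrative, not from the thesis):

```python
from fractions import Fraction
from itertools import product as iproduct

def runs_and_mu(D, v, M, xi):
    """Enumerate all runs r: D -> Q and sum their response functions.
    M maps (q, symbol) to {tuple-of-successor-states: probability};
    the empty tuple stands for the terminal transition lambda."""
    nodes = sorted(D, key=len)
    Q = {q for (q, _) in M}
    mu = Fraction(0)
    for assignment in iproduct(Q, repeat=len(nodes)):
        r = dict(zip(nodes, assignment))
        if not xi.get(r[()], 0):          # initial state must have xi > 0
            continue
        rf = xi[r[()]]
        for d in nodes:
            kids = tuple(r[d + (n,)] for n in range(1, 100) if d + (n,) in D)
            p = M.get((r[d], v[d]), {}).get(kids, 0)
            rf *= p
            if not p:
                break
        mu += rf
    return mu

# A root labelled 'a' with one child labelled 'b':
D = {(), (1,)}
v = {(): "a", (1,): "b"}
M = {("q0", "a"): {("q1",): Fraction(1, 2), (): Fraction(1, 2)},
     ("q1", "b"): {(): Fraction(1)}}
xi = {"q0": Fraction(1), "q1": Fraction(0)}
print(runs_and_mu(D, v, M, xi))  # 1/2
```

Only the assignment r(Λ) = q0, r(1) = q1 is a run; its response function is ξ(q0) · 1/2 · 1 = 1/2, matching the definition of μ as a sum over runs.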
Theorem 11: (v¹, D¹) ≡_k (v², D²) ⇒ μ_k(v¹, D¹) = μ_k(v², D²).

Proof: Let D¹_k = D²_k = D_k and v¹_k = v²_k = v_k; then μ_k(v¹, D¹) = μ(v_k, D_k) = μ_k(v², D²).

Theorem 12: If Â is a normalized PTA, then Σ_{(v_k,D_k)∈B_k(Â)} μ(v_k, D_k) = 1, k = 0, 1, 2, ...

Proof: Let Σ rf(r_k) denote Σ_{(v_k,D_k)∈B_k(Â)} Σ_{r∈Rn(v_k,D_k)} rf(r) = Σ_{(v_k,D_k)∈B_k(Â)} μ(v_k, D_k). The proof is by induction on k.

(1) Case k = 0: For each pre-run there is only a single transition, a final transition consisting of one elementary pre-transition, because the 0-th level of a tree is the single node Λ. Given any q ∈ Q, suppose there are m inputs a_j ∈ T with M(q, a_j) ≠ ∅. Then there are m prefixes a_1, a_2, ..., a_m, each with a pre-run of the form r: Λ → q, and for the tree a_j the response function is ξ(q) Σ_{Q_i∈M(q,a_j)} p_{q,a_j}(Q_i). The sum of these m response functions is Σ_{a_j∈T} ξ(q) Σ_{Q_i∈M(q,a_j)} p_{q,a_j}(Q_i), which by the normalization of Â reduces to ξ(q). Finally, summing over all states q, we get Σ_{q∈Q} ξ(q) Σ_{a_j∈T} Σ_{Q_i∈M(q,a_j)} p_{q,a_j}(Q_i) = Σ_{q∈Q} ξ(q) = 1 by the normalization criteria.

(2) Case k > 0: Assume Σ μ(v_{k−1}, D_{k−1}) = 1. The set of (k−1)-prefixes of trees in B(Â) partitions the set of k-prefixes via the equivalence relation ≡_{k−1}. This is indeed a partition because each (v_k, D_k) has some (v_{k−1}, D_{k−1}) as prefix, so ∪_{i=1}^I E_i includes all prefixes (v_k, D_k), where E_i is an equivalence class of k-prefixes all having the same (k−1)-prefix and I is the number of equivalence classes created by ≡_{k−1}. Furthermore, E_i ∩ E_j = ∅, because each (v_k, D_k) has a unique (k−1)-prefix. For every class E_i, all members have the same (k−1)-probability by Theorem 11. For a particular pre-run of some (v_k, D_k) ∈ E_i, the transition from level k−1 to level k of the tree yields a sequence of states q_1 ... q_n.
All possible transitions emanating from this set of states to possible (k+1)-level trees have a total probability of

  Σ_{k_1=1}^{m_1} Σ_{k_2=1}^{m_2} ... Σ_{k_n=1}^{m_n} p_{k_1} p_{k_2} ... p_{k_n},  where m_i = Σ_{a∈T} |M(q_i, a)|

and each value p_{k_i} is associated with one of the transitions out of state q_i. Thus, summing over all pre-runs of all k-prefixes, the result is Σ rf(r_k) = Σ rf(r_{k−1}) Σ_{k_1=1}^{m_1} ... Σ_{k_n=1}^{m_n} Π_{i=1}^n p_{k_i}. The inner sums can be written

  Σ_{k_1=1}^{m_1} p_{k_1} ( Σ_{k_2=1}^{m_2} p_{k_2} ... ( Σ_{k_n=1}^{m_n} p_{k_n} ) ... ) = 1,

since by normalization the total probability of leaving state q_i is Σ_{k_i=1}^{m_i} p_{k_i} = 1. Thus Σ_{(v_k,D_k)∈B_k(Â)} μ(v_k, D_k) = Σ rf(r_k) = Σ rf(r_{k−1}) = Σ μ(v_{k−1}, D_{k−1}) = 1 by the induction hypothesis, so Σ μ(v_k, D_k) = 1. □

Theorem 13 (Kolmogorov): Let the B_k(Â) be probability spaces, k = 0, 1, 2, ...; let all of these spaces be consistent. Then B(Â) forms a probability space consistent with the B_k(Â). (In other words, a consistent specification of all the μ_k determines μ uniquely.)

Lemma 13.1: Given any normalized PCF tree automaton Â over T, the set B_k(Â) forms a probability space for each k ∈ I.

Proof: A probability space consists of a set Ω, a sigma field Φ of allowable events (which can always be chosen as P(Ω), the power set of Ω, if Ω is countable), and a probability measure ν. Assign (Ω_k, Φ_k, ν_k) as follows:

  Ω_k = B_k(Â),  Φ_k = P(Ω_k),  ν_k(B) = Σ_{(v_k,D_k)∈B} μ(v_k, D_k) for all B ⊆ B_k(Â).

The events e_i of Φ_k are sets of k-prefixes. By the previous theorem, ν_k(Ω_k) = Σ μ(v_k, D_k) = 1. Obviously, ν_k(B ∪ B′) = Σ_{(v_k,D_k)∈B∪B′} μ(v_k, D_k); assuming B ∩ B′ = ∅, we can write ν_k(B ∪ B′) = Σ_{(v_k,D_k)∈B} μ(v_k, D_k) + Σ_{(v_k,D_k)∈B′} μ(v_k, D_k) = ν_k(B) + ν_k(B′). The same argument, with the set B_k(Â) replaced by B′, establishes that the theorem still holds; thus ν_k(B) = ν_ℓ(B′). (b) In case ∞ > ℓ > k+1, apply part (a) ℓ − k times.

Proof (of Theorem 13): Choose Ω = T^∞.
The set of all trees k-equivalent to (v, D) is called the Borel cylinder over (v_k, D_k). Define the probability of this Borel cylinder as ν{(v′, D′) | (v′, D′) ≡_k (v, D)} = μ_k(v, D). Choose as Φ the smallest sigma field over T^∞ containing all Borel cylinders. Specification of the probabilities of all Borel cylinders for all k completely specifies ν on T^∞. Since k-equivalence is an equivalence relation (Theorem 10), the definition does not depend upon the particular (v, D) chosen as representative of the equivalence class. This and the consistency of all the B_k(Â) assure that ν is well defined: it yields values for all Borel cylinders and therefore for all measurable subsets of T^∞. Finally, ν really is a probability measure, because ν is countably additive and ν(Ω) = ν(B(Â)) = ν(∪_k B_k(Â)) = 1.

There is a correspondence between PTA's and Greibach normal form P grammars which becomes apparent by considering state diagrams. Each vertex q corresponds to a nonterminal A, except for q_t. Each cable leaving q corresponds to a production with A as generatrix. The various states to which the arcs go correspond to the nonterminals in the replacement string of the production. The terminal symbol and the probability of the production are the labels of the corresponding cable. In the example given of a state diagram, the corresponding P grammar would be Ĝ = (N, P, Δ) with N = {A, B}, Δ = (δ_A, δ_B) = (1, 0), and P = (A →(p_1) aA, A →(p_2) aAB, B →(p_3) bB, B →(p_4) bAB).

By changing the requirement that all derivations proceed from left to right to a requirement that derivations from all nonterminals in a string occur simultaneously, the P grammar in Greibach normal form becomes a generator of trees, where a production A →(p) a B_1 B_2 ... B_n generates a node with value a, probability p, and n branches. The P language generated is accepted by the PTA given in the above correspondence. This correspondence also shows that tree automata can be viewed as acceptors of strings.
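The grammar-to-automaton direction of the correspondence just described can be written out directly. A sketch (the encoding is ours, and the probabilities below are placeholders, since the labels p_1, ..., p_4 of the example diagram are not specified):

```python
def grammar_to_pta(productions, delta):
    """productions: list of (A, a, body, p) for a GNF production
    A ->(p) a B1...Bn, with body a tuple of nonterminals (possibly
    empty).  Returns (M, xi): states are named by nonterminals, each
    production becomes an elementary transition, as in Theorem 14."""
    M = {}
    for A, a, body, p in productions:
        M.setdefault((A, a), {})[tuple(body)] = p
    return M, dict(delta)

# The grammar from the state-diagram example, with placeholder
# probabilities of 1/2 on each pair of competing productions:
P = [("A", "a", ("A",), 0.5), ("A", "a", ("A", "B"), 0.5),
     ("B", "b", ("B",), 0.5), ("B", "b", ("A", "B"), 0.5)]
M, xi = grammar_to_pta(P, {"A": 1.0, "B": 0.0})
print(M[("A", "a")])  # {('A',): 0.5, ('A', 'B'): 0.5}
```

Each cable of the state diagram appears as one entry of M[(A, a)]; the initial distribution Δ carries over unchanged as Ξ.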
If a set B ⊆ T^Δ is the behavior of Â, the corresponding set of strings x ∈ T* is found by forming parenthesized strings from trees (see Brainerd [2]) and then removing the parentheses. Alternatively, the process can be described as follows. For each (v, D) ∈ T^Δ, add zeroes to the right end of each d′ ∈ D until it has length ℓ(d′) = max_{d∈D} ℓ(d). Consider these strings as integers and order them d_1, d_2, ..., d_k so that d_{i−1} < d_i for i = 2, ..., k. Then form x = (v(d_1), v(d_2), ..., v(d_k)) as the corresponding string. The following is an example of this process.

Example 5: (Figure: the valued tree and its underlying tree.) D = {Λ, 1, 2, 11, 12, 111}, with v(Λ) = b, v(1) = c, v(2) = b, v(11) = b, v(12) = c, v(111) = b. Expanding D yields 000, 100, 200, 110, 120, 111. Ordering this set yields 000, 100, 110, 111, 120, 200. The corresponding string is x = (v(Λ), v(1), v(11), v(111), v(12), v(2)) = (b, c, b, b, c, b).

Thus the sets of strings accepted by tree automata over T^Δ are exactly the context free languages. Next we state this correspondence formally.

Theorem 14: For every P grammar Ĝ in Greibach normal form there is a PTA Â which accepts the P language generated by Ĝ, and conversely, for every PTA Â there exists a Ĝ in GNF which generates the P language accepted by Â.

Proof: The proof is a construction identical to that of Theorem 3, using
(1) Q = N;
(2) Ξ = Δ;
(3) q_{B_1} q_{B_2} ... q_{B_n} ∈ M(q_A, a) with p_{q_A,a}(q_{B_1} q_{B_2} ... q_{B_n}) = p iff (A →(p) a B_1 B_2 ... B_n) ∈ P;
(4) λ ∈ M(q_A, a) with p_{q_A,a}(λ) = p iff (A →(p) a) ∈ P;
(5) M(q_A, a) = ∅ iff there are no productions A →(p) aX in P for any X ∈ N*.

Corollary 14.1: For every normalized P grammar Ĝ in Greibach Normal Form (GNF) there is a normalized PTA Â such that L(Ĝ) = L(Â), and conversely, for every normalized PTA Â there is a normalized Ĝ in GNF such that L(Â) = L(Ĝ).

Corollary 14.2: For every admissible P grammar Ĝ in GNF there is a terminating PTA Â such that L(Ĝ) = L(Â), and conversely, for every terminating PTA Â there is an admissible Ĝ in GNF such that L(Â) = L(Ĝ).
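Returning to the tree-to-string conversion described above, the padding-and-ordering procedure is easy to mechanize. A sketch, with addresses as tuples of integers (encoding ours):

```python
def tree_to_string(v):
    """v: dict mapping node addresses (tuples of ints, () = Lambda)
    to symbols.  Pad every address with zeroes to the maximum length,
    sort lexicographically, and read off the node values in order."""
    depth = max(len(d) for d in v)
    padded = sorted((d + (0,) * (depth - len(d)), d) for d in v)
    return "".join(v[d] for _, d in padded)

# The example tree D = {Lambda, 1, 2, 11, 12, 111}:
v = {(): "b", (1,): "c", (2,): "b",
     (1, 1): "b", (1, 2): "c", (1, 1, 1): "b"}
print(tree_to_string(v))  # "bcbbcb"
```

Sorting padded tuples reproduces the integer ordering 000, 100, 110, 111, 120, 200 of the example, which is exactly a pre-order (leftmost-depth-first) traversal of the tree.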
Corollary 14.3: For every generalized admissible Ĝ in GNF there is a complete Â such that L(Ĝ) = L(Â), and conversely, for every complete Â there is a Ĝ in GNF such that L(Â) = L(Ĝ).

Proof: These corollaries follow immediately from the construction in the proof of Theorem 14.

Define the P language accepted by the (strictly nonterminating) probabilistic tree automaton Â as the probabilistic language (T^∞, μ), where Â determines μ_k for all k and these determine μ on B(Â); define μ(T^∞ − B(Â)) = 0.

Definition: The union with weighting vector w of the P languages L̂_i = (T^∞, μ_i), i = 1, ..., n, is L̂ = (T^∞, μ) where μ = Σ_{i=1}^n w_i μ_i.

Theorem 14: The union with weighting vector w of P languages L̂_1, L̂_2, ..., L̂_n forms a P language. Furthermore, if each L̂_i is normalized, i = 1, ..., n, and w is a stochastic vector, then the union L̂ = ∪_w L̂_i is normalized.

Proof: (a) Let B_1, B_2, ... be a countable number of disjoint measurable sets of T^∞. Then

  μ(∪_{j=1}^∞ B_j) = Σ_{i=1}^n w_i μ_i(∪_{j=1}^∞ B_j)  by the definition of μ
                   = Σ_{i=1}^n w_i Σ_{j=1}^∞ μ_i(B_j)  by the countable additivity of μ_i
                   = Σ_{j=1}^∞ Σ_{i=1}^n w_i μ_i(B_j)  by algebraic manipulation
                   = Σ_{j=1}^∞ μ(B_j)  by definition.

Thus μ is a measure on T^∞, and L̂ is a P language.

(b) μ(Ω) = Σ_{i=1}^n w_i μ_i(Ω) by definition, = Σ_{i=1}^n w_i since μ_i(Ω) = 1 for i = 1, 2, ..., n, = 1 since w is stochastic. □

If (v¹, D¹) and (v², D²) are trees, then b∘((v¹, D¹), (v², D²)) is a tree whose root has value b and which has two branches going out to (v¹, D¹) and (v², D²) respectively. The generalization of this operation is formalized as follows.

Definition: The concatenation under b of the P languages L̂_i = (T^∞, μ^i), i = 1, ..., n, is L̂ = (T^∞, μ), defined as follows: for each combination of (v¹, D¹) ∈ L̂_1, ..., (vⁿ, Dⁿ) ∈ L̂_n, define a tree (v, D) with D = {Λ} ∪ 1D¹ ∪ ... ∪ nDⁿ, where kD^k means all strings in D^k prefixed by k ∈ I, and with v(Λ) = b and v(kd) = v^k(d), where d ∈ D^k, k ∈ I.
Define μ_k(v, D) = Π_{i=1}^n μ^i_{k−1}(v^i, D^i), k = 1, 2, ..., and μ_0(v, D) = 1. Also define μ_k((v, D) + (v′, D′)) = μ_k(v, D) + μ_k(v′, D′), provided (v, D) ≠ (v′, D′); this guarantees that μ_k is finitely additive. By simply omitting the definitions for μ_k, concatenation under b can be defined for nonprobabilistic languages.

Theorem 15: The concatenation under b of the P languages L̂_1, L̂_2, ..., L̂_n forms a P language for any b ∈ T. Furthermore, if each L̂_i is normalized, i = 1, ..., n, then the concatenation L̂ = b∘(L̂_1, ..., L̂_n) is normalized.

Proof: The k-probabilities of trees, μ_k(v, D), were defined as Π_{i=1}^n μ^i_{k−1}(v^i, D^i), so consider the total sum for some k. If k = 0, the sum is 1. If k > 0, then μ_k(Ω) = μ(Ω_k) = Σ_{(v_k,D_k)∈Ω_k} μ(v_k, D_k), since μ_k was previously defined so as to be finitely additive on the finite set Ω_k. Then

  μ_k(Ω) = Σ_{(v¹_{k−1},D¹_{k−1})∈Ω¹_{k−1}} ... Σ_{(vⁿ_{k−1},Dⁿ_{k−1})∈Ωⁿ_{k−1}} Π_{i=1}^n μ^i_{k−1}(v^i, D^i)
         = Π_{i=1}^n ( Σ_{(v^i_{k−1},D^i_{k−1})∈Ω^i_{k−1}} μ^i_{k−1}(v^i, D^i) ) = Π_{i=1}^n 1 = 1,

assuming each μ^i is a probability measure. Thus each (Ω_k, μ_k) forms a probability space. By Theorem 13, a consistent specification of all the μ_k determines μ; L̂ is therefore an NP language. □

Definition: The direct sum, with weighting vector w, of the PTA's Â_1, Â_2, ..., Â_n, where Â_i = (Q_i, M_i, Ξ_i), is the PTA Â = ⊕_w Â_i = (Q, M, Ξ), where Q = ∪_{i=1}^n Q_i (assuming without loss of generality that Q_i ∩ Q_j = ∅ if i ≠ j), M(q, a) = M_i(q, a) and p_{q,a} = p^i_{q,a} for all q ∈ Q_i (p^i_{q,a} being the probability function associated with M_i), and ξ(q_i) = w_i ξ_i(q_i).

Theorem 16: If Â_1, Â_2, ..., Â_n are normalized PTA's, then Â = ⊕_w Â_i is normalized.

Proof: One only needs to show that Ξ is a stochastic vector. We have

  Ξ = (w_1 ξ_1(q_{11}), w_1 ξ_1(q_{12}), ..., w_1 ξ_1(q_{1k_1}), w_2 ξ_2(q_{21}), ..., w_n ξ_n(q_{nk_n})),

where q_{ij} ∈ Q_i and k_i = |Q_i|, i = 1, 2, ..., n. Then

  |Ξ| = Σ_{q∈Q} ξ(q) = Σ_{i=1}^n Σ_{q_{ij}∈Q_i} w_i ξ_i(q_{ij}) = Σ_{i=1}^n w_i ( Σ_{q_{ij}∈Q_i} ξ_i(q_{ij}) ) = Σ_{i=1}^n w_i = 1. □
Definition: The direct b-product of Â_1, Â_2, ..., Â_n, b ∈ T, where Â_i is the PTA (Q_i, M_i, Ξ_i), is Â = b⊗(Â_1, ..., Â_n) = (Q, M, Ξ), where Q = ∪_{i=1}^n Q_i ∪ {q_0} with q_0 ∉ ∪ Q_i and Q_i ∩ Q_j = ∅ for i, j = 1, ..., n, i ≠ j; ξ(q_0) = 1 and ξ(q) = 0 if q ≠ q_0; M(q, a) = M_i(q, a) and p_{q,a} = p^i_{q,a} if q ∈ Q_i; M(q_0, a) = ∅ if a ≠ b; and for all possible combinations q_1 q_2 ... q_n such that q_i ∈ Q_i and ξ_i(q_i) > 0, q_1 q_2 ... q_n ∈ M(q_0, b) with p_{q_0,b}(q_1 q_2 ... q_n) = Π_{i=1}^n ξ_i(q_i).

Theorem 17: If Â_1, ..., Â_n are normalized PTA's, then Â = b⊗(Â_1, ..., Â_n) is normalized.

Proof: By definition Ξ is a stochastic vector, and since the state sets Q_i are disjoint, each q_i ∈ Q satisfies the normalization criterion because Â_i is normalized. Finally,

  Σ_{q_1 q_2 ... q_n ∈ M(q_0,b)} p_{q_0,b}(q_1 q_2 ... q_n) = Σ_{q_1∈Q_1} Σ_{q_2∈Q_2} ... Σ_{q_n∈Q_n} Π_{i=1}^n ξ_i(q_i)
    = Σ_{q_1∈Q_1} ξ_1(q_1) ( Σ_{q_2∈Q_2} ξ_2(q_2) ( ... ( Σ_{q_n∈Q_n} ξ_n(q_n) ) ... ) ) = 1,

since Σ_{q_i∈Q_i} ξ_i(q_i) = 1 for each Q_i. □

Theorem 18: There exists an operator homomorphism h from the set of strictly nonterminating PTA's over T into the set of P languages of the form (T^∞, μ) under the operation pairs (direct sum, union with weighting vector w) and (direct b-product, concatenation under b).

Proof: (a) Define, for every PTA Â, h(Â) as the language (T^∞, μ) which is accepted by Â. The restrictions of completeness and strict nontermination guarantee that Â accepts some set of infinite trees. Two acceptance distributions determined by Â are equivalent on all Borel cylinders, so they are equivalent on all measurable subsets of T^∞; μ¹ = μ², so every PTA accepts exactly one language.

(b) If Â_i accepts (T^∞, μ^i) and (T^∞, μ) = ∪_w (T^∞, μ^i), i = 1, ..., n, then by definition μ_k(v, D) = Σ_{i=1}^n w_i μ^i_k(v, D) for all trees (v, D). The probability of acceptance by ⊕_w Â_i of an arbitrary Borel cylinder is

  Σ_{r∈RnÂ(v_k,D_k)} rf_Â(r) = Σ_{i=1}^n w_i Σ_{r∈RnÂ_i(v_k,D_k)} rf_{Â_i}(r)
    = Σ_{i=1}^n w_i Σ_{r∈RnÂ_i(v_k,D_k)} ξ_i(r(Λ)) Π_{t∈t(r)} p^i(t)
    = Σ_{i=1}^n w_i μ^i_k(v, D).
Since this calculation was carried out for an arbitrary tree (v, D), it implies that the direct sum of P automata, ⊕_w Â_i, accepts the union with weighting vector w of the P languages generated by the P automata.

(c) Suppose (v_k, D_k) is an arbitrary cylinder which has a pre-run on the PTA Â = b⊗(Â_1, ..., Â_n). Any pre-run has a first transition from q_0 to q_1 q_2 ... q_n, q_i ∈ Q_i, the probability of which is Π_{i=1}^n ξ_i(q_i). After this, the transitions out of state q_i correspond to a pre-run on Â_i, so the pre-run r_i must accept a prefix (v^i_{k−1}, D^i_{k−1}) contained in L̂_i. The probability of this pre-run on Â_i is rf_{Â_i}(r_i), and the total probability associated with the whole pre-run is the product Π_{i=1}^n rf_{Â_i}(r_i).

APPENDIX A. APPROXIMATION OF PROBABILISTIC TURING AUTOMATA BY PROBABILISTIC PUSHDOWN AUTOMATA

A pushdown P automaton Â is started in an initial configuration (q, #, a_1), where ξ(q) > 0 and a_1 is the first symbol (not including #) of some string x ∈ T* which is printed on the input tape. (Figure: the initial configuration.) Then, using the terminology of Haines, (q′, s′) ∈ M(q, s, a) will be written (q, s, a) →(p) (q′, s′), where p is the probability associated with the transition. Only three types of instructions are allowed:

(1) (q, s, a_i) →(p) (q′, s′). This means that if the automaton is in state q and scanning symbols s and a_i on the storage and input tapes respectively, then with probability p a transition into state q′ will occur, the storage and input tapes will move one square to the left, and then s′ ∈ S − {#} will be printed on the storage tape one square to the right of s.

(2) s′ = λ implies the storage tape is left unchanged and unmoved, so (q′, s) ∈ M(q, s, a_i). The next situation is (q′, s, a_{i+1}), assuming 1 ≤ i < n, where the input string is x = a_1 a_2 ... a_n.

(3) s′ = σ implies the symbol s is erased from the scanned square of the storage tape and the tape is moved one square to the right. If the string written on the storage tape at time t before the transition was s_1 s_2 ... s_k s, s_i ∈ S, then the string at time t+1 after the transition is s_1 s_2 ... s_k.
In this case (q', s_k) ∈ M(q, s, a_i) and the new situation is (q', s_k, a_{i+1}); this is defined only if k ≥ 0.

If a = λ in any of these instructions, then the input tape is left unmoved and the transition is independent of the input symbol scanned: (q, s, λ) →p (q', s') implies (q', s') ∈ M(q, s, a) for all a ∈ T, and the next situation is (q', s', a).

A pushdown P automaton Â terminates if it is in a situation (q, s, a) such that M(q, s, a) = ∅. Also, if λ ∈ M(q, s, a), then Â may terminate in acceptance if the read-write head is scanning the first square of the storage tape and if during the last transition the read head has just read a_n and the tape moved. We introduce the useful notational symbols q_d and # for use in instructions: q_d is the fictitious dead-end state discussed in Chapter 6 ((q, s, a) → (q_d, s') means λ ∈ M(q, s, a)), and # is a fictitious blank symbol written on the square to the left of a_1 and on all squares to the right of a_n on the input tape. Note # ∉ T, so a = # is not allowed in any instruction.

A transition sequence for x ∈ T⁺ is a sequence of situations (q_0, s_0, a_0), (q_1, s_1, a_1), ..., (q_n, s_n, a_n) where Â is started in an initial configuration with xy written on the input tape for some string y ∈ T*, and a sequence of elementary transitions consistent with M occurs, (q_i, s_i, a_i) →p_{i+1} (q_{i+1}, s_{i+1}), where p_{i+1} is the probability of the transition. The situation after the sequence of transitions must be (q_n, s_n, a_n) where a_n = y_1, the first terminal symbol of y. If, further, y = λ and the final situation (q_n, s_n, a_n) is (q_n, #, #), then the sequence is called an accepting transition sequence. The probability of a transition sequence is the product of the probabilities of the elementary transitions, p = Π_{i=1}^k p_i. The probability of partial acceptance of x is μ*(x) = Σ_{k=1}^m ξ(q_0) p_k, where m is the number of transition sequences for x.
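The two formulas at the end of this passage, p = Π p_i for one transition sequence and μ*(x) = Σ ξ(q_0) p_k over all m sequences, transcribe directly into code; the sequence lists below are placeholders rather than the output of any particular machine:

```python
# Direct transcription of the transition-sequence probability and the
# probability of partial acceptance.  The probability lists below are
# placeholders, not derived from a specific pushdown P automaton.
from math import prod

def sequence_prob(xi_q0, elementary_probs):
    """Probability of one transition sequence: xi(q0) * prod_i p_i."""
    return xi_q0 * prod(elementary_probs)

def partial_acceptance(xi_q0, sequences):
    """mu*(x): sum of the probabilities of all m transition sequences."""
    return sum(sequence_prob(xi_q0, seq) for seq in sequences)

# Two hypothetical transition sequences for the same string x:
print(partial_acceptance(1.0, [[0.5, 0.5], [0.5, 1.0]]))  # 0.25 + 0.5
```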
The probability of acceptance of x is μ(x) = Σ_{k=1}^ℓ ξ(q_0) p_k, where ℓ is the number of accepting transition sequences for x. The analogue of type 4 normalization (i.e., the total probability of leaving any state must sum to 1) is, for each q ∈ Q,

Σ_{s ∈ S} Σ_{s' ∈ S} Σ_{a ∈ T} Σ_{q' ∈ Q} p[(q, s, a) → (q', s')] = 1.

Example: A pushdown P automaton for the language {a^n b^n | n > 0}. This automaton is type 4 normalized. Â = (Q, M, S, ξ) over T = {a, b}, where

(1) Q = {q_0, q_1, q_2}
(2) S = {s}
(3) M has the instructions
    (q_0, #, a) → (q_1, s)   with probability 1
    (q_1, s, a) → (q_1, s)   with probability 1/2
    (q_1, s, b) → (q_2, σ)   with probability 1/2
    (q_2, s, b) → (q_2, σ)   with probability 1/2
    (q_2, s, b) → (q_d, σ)   with probability 1/2
(4) ξ = (1, 0, 0)

The model of Turing automaton defined herein is derived from the one-tape online Turing machine. A Probabilistic Turing Automaton (called a Turing P automaton) is basically a pushdown P automaton in which the storage tape is allowed to move left and right without erasing. More formally:

Definition: A Probabilistic Turing Automaton Â over T is a system (Q, M, S, ξ) where Q is a finite set of states, S is a finite set of storage tape symbols, M is a probabilistic transition function, M: Q × T × S → P(Q × S × J) where J = {−1, 0, 1}, and ξ is the initial distribution vector. An instruction of Â may be written (q, s, a) →p (q', s', j), where j = 1 indicates move the tape one square to the left and then print s', j = −1 implies print s' and then move to the right, and j = 0 implies print s' but do not move the tape. All other definitions and restrictions are the same as for pushdown P automata. This version of Turing automaton differs from the usual one in that it is probabilistic, it contains a move-before-write type of instruction, its storage tape is only infinite to the right, and it must terminate by scanning the # in the initial square. It is easy to see that the latter three alterations do not change the set of languages accepted by a Turing automaton.
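A runnable sketch of a probabilistic pushdown acceptor for {a^n b^n | n > 0} in the spirit of this example: each a pushes an s, each b pops one, and acceptance requires the input consumed with the storage tape back to #. The simplified instruction table and its 1/2 probabilities are illustrative choices, since the example's instruction list is only partially legible in the source:

```python
# Sketch of a probabilistic pushdown acceptor for {a^n b^n | n > 0}.
# The instruction table is a simplification chosen for illustration;
# it is not the thesis's exact example.
#
# (state, storage top, input symbol) -> list of (prob, next state, action);
# "push" prints an s to the right, "pop" erases the scanned storage symbol.
INSTR = {
    ("q0", "#", "a"): [(1.0, "q1", "push")],
    ("q1", "s", "a"): [(0.5, "q1", "push")],
    ("q1", "s", "b"): [(0.5, "q2", "pop")],
    ("q2", "s", "b"): [(1.0, "q2", "pop")],
}

def accept_prob(x):
    """Probability that x is accepted: input consumed, storage back to #."""
    # configurations: (state, storage tuple, remaining input, path prob)
    configs = [("q0", ("#",), x, 1.0)]
    total = 0.0
    while configs:
        state, stack, rest, p = configs.pop()
        if not rest:
            if state == "q2" and stack == ("#",):
                total += p          # accepting configuration
            continue
        key = (state, stack[-1], rest[0])
        for prob, nstate, action in INSTR.get(key, []):
            nstack = stack + ("s",) if action == "push" else stack[:-1]
            configs.append((nstate, nstack, rest[1:], p * prob))
    return total

print(accept_prob("ab"), accept_prob("aabb"), accept_prob("ba"))
```

With these instructions μ(ab) = 1/2 and μ(aabb) = 1/4, while strings outside the language receive probability 0.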
Algorithm: Given any Turing P automaton Â, the following procedure yields a pushdown P automaton Â' which accepts an approximation language L(Â') = (T⁺, μ') to the language accepted by Â, L(Â) = (T⁺, μ). Let Â = (Q, M, S, ξ), where we restrict ξ to be 1 for one state (the initial state). Put the new state q_d into Q' and, for each s ∈ S, a new marked symbol ?s into S'. Each instruction (q, s, a) →p (q', s', j) of M is replaced by pushdown instructions of six types, (a) through (f), one for each combination of tape direction j and of whether or not a symbol is printed; the detailed case analysis follows the pattern of instruction types (1) through (3) above.

The six instruction types cover all possible directions, with and without printing, in which the Turing P automaton can move. Notice that the pushdown P automaton can simulate all of these transitions except moving to the right without printing, in which case it prints ?s. Thus, if type (f) instructions are not used in the Turing P automaton program, then the P language recognized will also be recognized by the pushdown P automaton constructed from the Turing P automaton by the previous algorithm. Furthermore, if the square in which ?s would get printed by the pushdown P automaton is never revisited by the Turing P automaton, then the approximation is again exact.

Definition: The Initial Definite P language L_ID of a pushdown P automaton Â' which approximates a Turing P automaton Â is the set of all strings x which are maximal initial definite segments of strings z ∈ T⁺ with μ(z) > 0. The probability assigned to x is the probability of partial acceptance with respect to Â', μ_ID(x) = μ*(x). All other elements y of T⁺ have μ_ID(y) = 0.
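The surviving text pins down only the lossy case of the translation: a rightward move that prints nothing must be simulated by printing the marked symbol ?s, which makes the resulting pushdown instruction indefinite. A sketch under that reading follows; the direction conventions, action labels, and function name are my reconstruction, not the thesis's (a)-(f) case list:

```python
# Sketch of the instruction translation behind the approximation algorithm.
# Only the lossy case is firmly described in the text: moving right without
# printing forces the pushdown machine to print the marked symbol ?s.
# The action labels and direction conventions here are assumptions.
def translate_action(s, s_print, j):
    """Map the (s', j) part of a Turing P instruction to a pushdown action.

    Returns (action, printed_symbol, indefinite_flag).
    s_print is None when the Turing instruction prints nothing.
    """
    if j == 0:
        return ("overwrite", s_print, False)
    if j == 1:                           # move left, then print: a push
        return ("push", s_print, False)
    if s_print is not None:              # print s', then move right
        return ("pop-print", s_print, False)
    return ("pop-print", "?" + s, True)  # move right w/o printing: lossy

print(translate_action("s", None, -1))  # the indefinite case
```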
A string x is maximal initial definite if it fulfills:

(1) initial: there exists y ∈ T* such that xy = z and μ(z) > 0;
(2) definite: there exists a transition sequence for x with respect to Â' which contains no instructions of the form (q, ?s, a) →p (q', s') (called indefinite instructions);
(3) maximal: there exists a transition sequence for x fulfilling (2) above which can be extended to an accepting transition sequence for z with respect to Â' such that the first instruction after the initial definite segment x is indefinite, (q, ?s, a) → (q', s').

Define Σ_ID = Σ_{x ∈ T⁺} μ_ID(x). Define the L_ID state set as the set of states of Q' in which the pushdown P automaton can reside after strings x with μ_ID(x) > 0 have been input.

Theorem: Let Â be a Turing P automaton and let Â' be the approximating pushdown P automaton described above. If the states of Â' reachable from the L_ID state set all have type 4 normalization (where q reachable means there is a transition sequence such that q_n = q and q_0 is in the L_ID state set), then the following error bound holds:

Σ_{z ∈ T⁺} |μ(z) − μ'(z)| ≤ Σ_ID − Σ_{z ∈ T⁺} μ(z).

Proof: Consider a Markov chain whose states are the states of the automaton Â', and whose transition probabilities p_ij are just the probabilities of a transition from state q_i to q_j. Take as initial state any q_i in the L_ID state set. By type 4 normalization, Σ_j p_ij = 1. It is then possible to compare μ_ID(x) to μ'(z) for all strings z ∈ L(Â') with z = xy for some string y:

μ_ID(x) ≥ Σ_{z = xy} μ'(z).

The inequality is not a strict equality because we have not excluded anomalies such as instructions which lead to dead-end, nonaccepting states. Since all accepting transition sequences with respect to Â' begin with transition sequences for some x with μ_ID(x) > 0,

Σ_ID = Σ_x μ_ID(x) ≥ Σ_{z ∈ T⁺} μ'(z).

Moreover, since each string accepted under this approximation is accepted with probability at least μ(z), the absolute differences sum to Σ_{z ∈ T⁺} μ'(z) − Σ_{z ∈ T⁺} μ(z).
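A toy numeric check of the error bound, with made-up distributions chosen to satisfy the proof's hypotheses (μ' dominates μ pointwise, and Σ_ID dominates the total mass of μ'); none of the numbers come from the thesis:

```python
# Toy illustration of the theorem's error bound:
#   sum_z |mu(z) - mu'(z)|  <=  Sigma_ID - sum_z mu(z).
# All values are made up so that the proof's hypotheses hold: the
# approximation only adds probability, and Sigma_ID >= total mass of mu'.
mu       = {"ab": 0.40, "aabb": 0.20}              # exact Turing P language
mu_prime = {"ab": 0.45, "aabb": 0.25, "b": 0.10}   # pushdown approximation
sigma_id = 0.85                    # mass of maximal initial definite segments

error = sum(abs(mu_prime.get(z, 0.0) - mu.get(z, 0.0))
            for z in set(mu) | set(mu_prime))
bound = sigma_id - sum(mu.values())
print(error, bound, error <= bound)
```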
Hence

Σ_{z ∈ T⁺} |μ(z) − μ'(z)| = Σ_{z ∈ T⁺} μ'(z) − Σ_{z ∈ T⁺} μ(z) ≤ Σ_ID − Σ_{z ∈ T⁺} μ(z) = Σ_ID − 1

if L(Â) is an NP language. Thus, if the error of approximation is measured by Σ_{z ∈ T⁺} |μ(z) − μ'(z)|, then an error bound is Σ_ID − 1.

APPENDIX B

EXAMPLES OF REGULAR TREE EXPRESSIONS

The expression conventionally written (a ∪ b)*c is written [a ∪ b]*[c].

The context free grammar of example 7 is

A → bAB, A → bB, B → b,
A → cAC, A → cC, C → c.

The corresponding RTE is

b<[ω]*[b]> + c<[ω]*[c]> + b<[b]> + c<[c]>.

The PTA of example 4 is given by a state diagram (omitted). The corresponding RTE, with superscript probabilities and subscript vectors, is

a<[ω1]*[b<[ω1]*[ω2]> + b<[ω1]>]> + a<[ω1]>.

VITA

Clarence Arthur Ellis was born in Chicago, Illinois, on May 11, 1943. He graduated from Beloit College in 1964 with the degree of Bachelor of Arts in Physics and Mathematics. Since that time he has been a research assistant in the Department of Computer Science at the University of Illinois. In 1966 he graduated from the University of Illinois with the degree of Master of Science in Mathematics. He has been a member of Sigma Xi, the Association for Computing Machinery, the Mathematical Association of America, and the Institute of Electrical and Electronics Engineers.

Form AEC-427 (6/68), U.S. Atomic Energy Commission: University-Type Contractor's Recommendation for Disposition of Scientific and Technical Document.
1. AEC Report No.: COO-1469-0149
2. Title: Probabilistic Languages and Automata
3. Type of document: Other (Thesis)
4. Recommended announcement and distribution (check one): a. AEC's normal announcement and distribution procedures may be followed; b. Make available only within AEC and to AEC contractors and other U.S.
Government agencies and their contractors; c. Make no announcement or distribution.
5. Reason for recommended restrictions: (blank)
6. Submitted by: C. W. Gear, Professor and Principal Investigator, Department of Computer Science, University of Illinois, Urbana, Illinois 61801. Date: October 1969.
7. AEC contract administrator's comments on the above recommendation: (blank)
8. Patent clearance: a. AEC patent clearance has been granted by responsible AEC patent group; b. Report has been sent to responsible AEC patent group for clearance; c. Patent clearance not required.