UIUCDCS-R-74-663

A COMPARATIVE DISCUSSION OF VARIABLE-VALUED LOGIC AND GRAMMATICAL INFERENCE

by A. B. Baskin

July 1974

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

Digitized by the Internet Archive in 2013: http://archive.org/details/comparativediscu663bask

ACKNOWLEDGMENT

The author would like to thank Dr. R. S. Michalski for suggesting this line of inquiry and for his discussions concerning the material contained in this work. The author would also like to acknowledge Connie Slovak for her help in preparing the manuscript.

ABSTRACT

This paper reviews what is meant by grammatical inference and discusses some of the currently used methods of grammatical inference. The application of a variable-valued logic system to this inference problem is explored and, where possible, direct comparisons between methods are discussed. Examples of a correspondence between a variable-valued logic inference process and a grammatical inference process are presented. Operations which increase the class of languages to which variable-valued logic can be effectively applied and operations which allow simplified variable-valued logic descriptions of classes of grammars are discussed. Type 3' and type 3" grammars are defined (both subsets of type 3 grammars) and the equivalence of variable-valued logic formulas and type 3' and type 3" grammars is illustrated.
TABLE OF CONTENTS

                                                           Page
1.0 Grammatical Inference ................................... 1
2.0 Methods of Grammatical Inference ........................ 8
3.0 Inference of Pattern Grammars .......................... 18
4.0 Variable-valued Logic .................................. 22
5.0 VL Systems and Grammatical Inference ................... 27
6.0 VL Representations of Language ......................... 30
7.0 Simplified VL Representations .......................... 37
8.0 Grammatical Inference and Pattern Recognition .......... 41
NOTES ...................................................... 45

1.0 Grammatical Inference

In order to describe the process of grammatical inference, it is necessary to present a few definitions. Formal statements of the process of inductive inference [22] and specifically grammatical inference exist in the literature [5,9]. Since no formal theoretical comparisons will be included in this work, the appropriate constructs will be introduced only as needed to supplement the intuitive discussions to follow. Feldman [9] has correctly observed that any discussion of grammatical inference can be generalized to include a much larger set of problems including the inference of functions, theories, and patterns. Because the process of grammatical inference holds promise as a vehicle to study other forms of inference, more detail will be included in the discussion of the inference of a formal language grammar than might otherwise be justified.

A grammar, G, is a 4-tuple G = < N, T, P, S >, where N is a finite set of non-terminals, T is a finite set of terminals (the input alphabet), P is a finite set of productions or rules, and S is a non-terminal called the sentence symbol or the start symbol. It is necessary that N and T be disjoint sets (i.e. their intersection must be empty). Lower case letters will be used to represent terminals, and capital letters will be used to represent non-terminals. A string is a concatenation of symbols from the sets N and T. A language is a set of strings where each string contains symbols from the input alphabet T only. The set of all strings over an input alphabet T will be denoted as T*.
Thus T* represents all strings which can be formed by the concatenation of elements from T selected with replacement.

The difference between terminals and non-terminals can best be demonstrated in terms of a simple example. The terminal is the atomic unit of a language; it is the unit below which it is not profitable to subdivide the language. Consider ordinary English: the logical terminals are letters. For information to be conveyed, it is not necessary to examine the structure of each individual letter to determine its component strokes. It is clear that a language could be formed by considering as terminals a set of strokes from which all letters could be formed; however, letters seem the most appropriate terminals for the English language. As an example of a non-terminal, consider the concept of a word. It would be possible to describe English without the concept of a word, but it is simpler to group letters together into logical constructs which can be treated as a unit. Non-terminals serve just such a purpose. In addition to words, it is sometimes convenient to group words and letters (one-letter words, for instance) to form sentences. Thus the non-terminal "sentence" is formed from both terminals and non-terminals. The ideas of a paragraph and a chapter are further examples of the use of non-terminals in the description of English. One speaks of the rules of grammar for English as the rules which specify the manner in which terminals and non-terminals can be put together to form statements in the English language.

As indicated by the example above, the use of a set of rules or productions is an essential part of a grammar. The set of rules specifies the valid substitutions allowed in forming strings of the language. The start symbol is required to be the first symbol in the derivation of any string in the language. A derivation consists of the successive application of rules from the set of rules which change one string into another.
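As a concrete illustration of these definitions, the 4-tuple and a one-step derivation can be sketched in a few lines of Python. The representation below is ours, not part of the paper; the grammar shown is a trivial grammar for the small language {abc, ab, cab}.

```python
# A minimal sketch of the 4-tuple G = <N, T, P, S> using Python sets
# and a list of (left-hand side, right-hand side) productions.
N = {"S"}                                       # non-terminals
T = {"a", "b", "c"}                             # terminals (input alphabet)
P = [("S", "abc"), ("S", "ab"), ("S", "cab")]   # S -> abc | ab | cab
S = "S"                                         # start symbol

def derive_once(string, lhs, rhs):
    """Apply one production: replace the first occurrence of lhs by rhs."""
    assert lhs in string
    return string.replace(lhs, rhs, 1)

# A derivation starts from the start symbol and applies productions
# until only terminals remain.
sentential = derive_once(S, "S", "cab")
print(sentential)                 # -> cab
assert set(sentential) <= T       # derivation complete: terminals only
```

The derivation here is a single step because every right-hand side is already a terminal string; grammars with non-terminals on the right require repeated applications of `derive_once`.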
The idea that a grammar can capture the pattern in a language is important to the discussions which will follow. Consider the following two grammars for the trivial language {abc, ab, cab}, where the input alphabet T = {a,b,c}:

    S → abc          S → Dc
    S → ab     or    S → D      and  N = {D,S}
    S → cab          S → cD
                     D → ab

The trivial set of productions on the left (and thus the grammar of which they are a part) defines the set of strings which constitute the language, but they do not demonstrate the pattern or structure that is present. The productions on the right specifically state that the string ab occurs in three places. In addition, in situations in which enumeration is not desirable, or even possible, the use of non-terminals can simplify the set of productions used to describe the language. The idea of simplicity has been studied along with the companion idea of cost [5,6,9]. For many applications, the measure of cost and the measure of complexity are the same. Several different measures of simplicity will be mentioned in section 2.

A string from the set T* which is an element of a given language, L, is said to be a positive instance of the language L. In a similar manner, a string from T* which is not in the language L is a negative instance of the language L. An information sequence, or just a sequence, is a group of strings presented in a specified order. A positive information sequence is a sequence in which all strings in the sequence are also in the language L, and a negative information sequence is a sequence in which all strings are in the set T* but not in L.

With the above definitions, it is possible to specify what is meant by grammatical inference in more detail. Grammatical inference is the process of inferring a grammar which describes a positive information sequence while not including any strings from any given negative information sequence. The omission of a given positive instance of the language from the inferred language is considered an error.
The inclusion of a known negative instance in the inferred language is also considered an error. Generally, errors are not allowed in the inference process. The inclusion of a string from T* about which nothing is known is called the introduction of a discrepancy in the inferred language. The inferred language is then larger than the initially given set of positive instances, but it does not contain any error. Cook [5] has noted that by increasing the discrepancy it is often possible to reduce the complexity of a grammar.

It is possible to classify grammars based on the strings used in the productions. The classification to be discussed below was introduced by Chomsky [3] and will be extended to include two interesting classes of grammars (and thus languages). For the purpose of the discussion below, a general production will be of the form:

    φAψ → φαψ

where A is a non-terminal, α is a string of terminals and non-terminals, and φ and ψ are arbitrary strings. The interpretation of the rule above is that the string A may be replaced by the string α. The strings φ and ψ are the left and right context of A respectively. Type 0 grammars are grammars with productions of the form described above. There are no other restrictions on the form of productions of type 0 grammars. Type 1 grammars are grammars with productions of the form:

    φAψ → φαψ   where α is not the null string.

Type 1 grammars are the so-called context-sensitive grammars because the replacement must be made within the left and right context. Type 2 grammars have productions of the form:

    A → α   where α is not the null string.

Type 2 grammars are called context free because there is no specification of either left or right context in the productions. Type 3 grammars, the so-called finite-state grammars, are of the form:

    A → α   where α = a or α = aB

(a is an arbitrary terminal and B is an arbitrary non-terminal). There are several important results from the formal theory of languages which will be important to the discussions which follow [13,14].
The most important idea is the idea of a machine which accepts a given language as defined by a grammar. This equivalence between machines (called acceptors) and grammars will be exploited when a comparison is made between grammatical inference and variable-valued logic. Type 3 languages can be represented by a finite-state machine (so a finite-state machine corresponds to a finite-state grammar and its language). Such a machine is totally specified by giving the states of the machine and a tabulation of the state transition and output behaviors as a function of the input and the current state of the machine. The machine is seen as being presented the letters in a candidate string, one at a time, and the output of the machine after the input of the entire string indicates whether the string is in the language in question. In general, a different machine is used for each different grammar.

For the purpose of this work, type 3' grammars will be defined as type 3 grammars without recursion. This means that productions of the form:

    A → aA

(or sequences of productions which accomplish the same thing) are not allowed. This restriction can also be expressed in terms of the machine acceptor for a type 3' grammar: the state transition graph of the finite-state acceptor for a type 3' grammar is a directed graph with no loops.

As an additional extension to the formalism of Chomsky, type 3" grammars will be defined as type 3' grammars whose corresponding machines can be viewed as a tree with the root at the sentence symbol and no links that connect different levels of the tree. The corresponding restriction on the form of a grammar will be defined by example. The grammar below is type 3' but not type 3".

    T = {a,c}        S → aB     S → cD
    N = {B,D,S}      D → aB     D → a
                     B → a      B → c

The use of B on the right-hand side of productions for two different non-terminals (S, D) distinguishes this grammar from type 3".
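The correspondence between the type 3' grammar above and its loop-free acceptor can be sketched as follows. This is a minimal illustration in Python; the state names and the DONE convention are our own, not from the paper.

```python
# Each non-terminal of the type 3' example grammar is a state; a
# production A -> aB becomes the transition (A, a) -> B, and A -> a
# moves to an accepting "DONE" state.
transitions = {                    # (state, input letter) -> set of next states
    ("S", "a"): {"B"}, ("S", "c"): {"D"},
    ("D", "a"): {"B", "DONE"},     # D -> aB and D -> a
    ("B", "a"): {"DONE"}, ("B", "c"): {"DONE"},
}

def accepts(string, start="S"):
    """Simulate the (possibly non-deterministic) acceptor on a string."""
    states = {start}
    for ch in string:
        states = set().union(*(transitions.get((s, ch), set()) for s in states))
        if not states:             # no state can consume this letter
            return False
    return "DONE" in states        # the whole string was derived

print(accepts("aa"))    # True:  S -> aB, B -> a
print(accepts("ca"))    # True:  S -> cD, D -> a
print(accepts("aab"))   # False: not derivable
```

Because the grammar is type 3', the transition graph above has no cycles, so every accepted string has length bounded by the depth of the graph.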
It is important to notice that, by construction, the types of grammars are related in the following way:

    type 3" ⊂ type 3' ⊂ type 3 ⊂ type 2 ⊂ type 1 ⊂ type 0.

This observation will be important later. An equally important observation is the fact that the machines necessary to accept the languages above increase in power from left to right. This means that a machine (or other formalism such as a grammar) which is sufficient for type 3 is not sufficient, in general, to accept a type 2 language. Since subsets of type 3 are the simplest grammars and thus the simplest languages to represent, it is not surprising that early work in grammatical inference concentrated on type 3 languages and subsets thereof. The next section will present the general aspects of several different methods of grammatical inference. The range of applicability of each method will be indicated.

2.0 Methods of Grammatical Inference

Several surveys of results in grammatical inference exist in the literature [1,5,11]. Gold [12] presented a number of interesting results about grammatical inference which were extended by Feldman [9] and reviewed by Cook [5]. The details of the results are not of interest here because the problem was treated from an enumerative approach. Such an approach to the inference problem implies the enumeration of all possible grammars over a given input alphabet and the selection of the first grammar found to include all of the given positive instances and none of the negative instances. A more sophisticated problem, also treated by Feldman, is the problem of selecting the simplest grammar that meets the above requirements. An interesting result of these discussions is the fact that the selection of the least complex grammar from the set of all candidate grammars is unsolvable, in general, unless negative instances are given.
This result, combined with the fact that the number of candidates to consider for the enumeration grows combinatorially with the size of the language, implies that the enumerative approach is impractical for any but simple examples.

As an alternative to the enumerative approach, several constructive approaches to the problem have been proposed. Most work that has been done involves finite-state grammars, with some work being done for more powerful grammars [5]. Before discussing specific algorithms which have been suggested for grammatical inference, an intuitive example will be presented.

Consider the set of all well-formed strings of parentheses. As a sample (a positive information sequence) take all such strings of length six or less. The sample set is thus:

    (), (()), ((())), (()()), ()(), ()(()), ()()(), (())().

For the sets of productions below, the shorthand notation S → a/b will be used to replace the more bulky S → a, S → b. The generalization is obvious. The most trivial solution to the inference problem is the simple set of productions:

    S → ()/(())/((()))/(()())/()()/()(())/()()()/(())()

This set of productions portrays the sample with no discrepancy (the inclusion of strings not in the sample and not known as negative instances). It does not satisfy the desire to capture the structure of the language under study, and the method does not generalize to large and potentially infinite languages. Since we know that non-terminals are used to portray the structure of a grammar or to simplify a grammar, it seems reasonable to try to introduce some non-terminals into the trivial grammar in order to simplify it. Three different identifications of non-terminals will be demonstrated below:

    1. Y → ()    S → Y/(Y)/((Y))/(YY)/YY/Y(Y)/YYY/(Y)Y
    2. Y → ((    S → ()/Y))/Y()))/Y)())/()()/()Y))/()()()/Y))()
    3. Y → ))    S → ()/((Y/(((Y)/(()(Y/()()/()((Y/()()()/((Y()

Other choices for Y are possible but they lead to less compact grammars than the above.
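The sample above can be reproduced mechanically; the following sketch (ours, not from the paper) enumerates all strings over the alphabet {(, )} of length six or less and keeps the well-formed ones:

```python
from itertools import product

def well_formed(s):
    """A parenthesis string is well-formed iff the nesting depth never
    goes negative and ends at zero."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

# Brute-force enumeration of all strings of length 2, 4, and 6
# (odd lengths can never balance), keeping the well-formed ones.
sample = ["".join(p)
          for n in (2, 4, 6)
          for p in product("()", repeat=n)
          if well_formed("".join(p))]
print(sample)
# ['()', '(())', '()()', '((()))', '(()())', '(())()', '()(())', '()()()']
```

The eight strings produced are exactly the positive information sequence used in the example.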
On a purely intuitive basis, it appears that grammar 1 is the most compact representation of the sample and should be chosen as the best from this group. Can the reduction in complexity be improved? It would seem logical to take the new grammar and try to improve it by selecting a substitution for some string which might include Y and assigning it to a new non-terminal. This iterative process forms the heart of many constructive approaches to the grammatical inference problem.

In general, the constructive approach to grammatical inference is the same as a heuristic search procedure. Such a procedure tries to make the best choice possible at any stage of the search. Since the objective of the constructive approach is to avoid an exhaustive search, the stopping condition becomes all-important. Cook [6] discusses several typical stopping criteria in the context of the inference of stochastic grammars, but many of his remarks are generally applicable. The first method for assuring the termination of a search process in an acceptable time is to restrict the set of allowed candidate solutions in such a way that an exhaustive search of this set can be performed in the worst case. This procedure is very similar to the total enumerative method and suffers from most of its problems. Any constructive method starts with a candidate solution or a part of a candidate solution and tries to add to or subtract from the current best guess in order to improve the candidate solution. Implicit in such a process is the potential for backing up and trying another path which was previously rejected. At the point where a back up is suggested, a decision must be made as to whether the amount of improvement to be gained by the retry is worth the effort. No definitive solution to this problem has been presented for the existing algorithms for grammatical inference. (More will be said about this problem when variable-valued logic is discussed.
) The proposed stopping criteria in use at present are to limit the time or steps in the entire process, to avoid back up entirely, or to try to back up only when the discrepancy of the new path proves to be less than the final discrepancy of the current best grammar. The last of these is the criterion used by Cook, and it is the most promising. At present it is not possible to prove that it produces a correct choice in the general case.

The first specific method of grammatical inference to be discussed, unlike those which will follow, does not involve the possibility of a back up. Biermann and Feldman [1] have reported a method of inferring finite-state grammars which creates and compares sublanguages. The original sample is divided into equivalence classes which share a common initial string. The resulting equivalence classes are identified with non-terminals and the grammar can be produced. The mechanism is best understood in its application by Biermann and Feldman [2] to the problem of the inference of finite-state machines from samples of their input and output behavior.

Biermann and Feldman define a relation which they use to partition the sample set of strings into equivalence classes. The relation uses only the first k terminals in a string. This means that the adjustment of the parameter k can cause the machine to pass from the universal acceptor (k = 0) to the minimal deterministic finite-state acceptor for the sample (k greater than the length of the longest string). When used as an acceptor, the machine will have the desired behavior for the first k letters of a string, and the behavior for longer strings is not assured to be as given in the sample. Thus, if an acceptor for only k letters were needed, the algorithm would be faster since it would not have to consider letters in strings in the sample after the k-th letter. For small values of k the machines are non-deterministic.
Such a non-deterministic machine must be converted to a deterministic acceptor before it can be converted to a grammar. Because of the equivalence between finite-state acceptors and type 3 grammars, the resulting grammar has the fewest possible number of non-terminals. Biermann and Feldman have programmed the algorithm and they claim that typical constructions of machines with 10 to 20 states require only a few seconds of processor time. Since the process minimizes the number of states in the acceptor, and thus the number of non-terminals, the process is not generalizable to alternate minimization criteria. A generalization of the algorithm has been made to infer productions of the form A → a or A → aBc, but it will not be discussed.

In order to simplify the example below, the algorithm will be used to find the minimal grammar for the sample set shown below:

    {caaab, bbaab, caab, bbab, cab, bbb, cb}   where k ≥ 5.

(Two sublanguages are considered equal if they are equal for the first k letters.) The first two sublanguages are:

    S_c = {aaab, aab, ab, b}   and   S_b = {baab, bab, bb}

which are distinct sets (not equal and neither is a subset of the other). The two sets above each give rise to a non-terminal and the first production is:

    A → cB/bD.

The two sets above will now be divided into sublanguages, and all sublanguages which are distinct from all previous sets will give rise to new non-terminals. Sets which are subsets of previous sets or equal to previous sets will be identified with the non-terminal created by that set. Thus we have:

    S_ca = {aab, ab, b}   and   S_bb = {aab, ab, b}

where S_ca = S_bb ⊂ S_c and thus no new non-terminal is needed. The set S_cb contains only a single terminal (all strings of length one) and thus no other new non-terminal is needed. The final set of productions is thus:

    A → cB/bD
    B → b/aB
    D → bB

which is the minimal grammar which produces the strings in the sample as required by the choice of k.
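The sublanguage construction above can be sketched as repeated left-derivatives of the sample. This is a simplified illustration in Python, assuming k is at least the length of the longest string; the name `derivative` is ours.

```python
# Each sublanguage is the set of suffixes of sample strings that begin
# with a given prefix letter; equal or subset sublanguages are
# identified with an existing non-terminal.
sample = {"caaab", "bbaab", "caab", "bbab", "cab", "bbb", "cb"}

def derivative(strings, letter):
    """Strings of the set that start with `letter`, with it removed."""
    return {s[1:] for s in strings if s and s[0] == letter}

S_c = derivative(sample, "c")    # sublanguage after reading c
S_b = derivative(sample, "b")    # sublanguage after reading b
S_ca = derivative(S_c, "a")      # after reading ca
S_bb = derivative(S_b, "b")      # after reading bb

print(S_c == {"aaab", "aab", "ab", "b"})   # True
print(S_b == {"baab", "bab", "bb"})        # True

# S_ca equals S_bb and is a proper subset of S_c, so both are
# identified with the non-terminal already created for S_c.
assert S_ca == S_bb and S_ca < S_c
```

The assertion at the end is exactly the condition in the text under which no new non-terminal is introduced.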
Notice that the recursive production allows a language which is larger than the sample, but all strings of length 5 or less are properly described. The more general case of k less than the length of the longest string in the sample involves the conversion of a non-deterministic finite-state machine to a deterministic machine, and it will not be treated further here. The method above is capable of inferring a type 3 grammar with the minimal number of non-terminals for a given sample set.

Feldman et al. [10] have implemented another algorithm for inferring finite-state grammars from a sample of a language. The process constructs a non-recursive type 3 grammar with residues which represents the sample. A residue is a production with a non-terminal on the left-hand side of the rewriting arrow and a string of terminals on the right. The grammar with residues is simplified to produce a finite-state recursive grammar. The resulting grammar is near minimal, with no precise estimate of the distance from the minimum. As an example of the method consider the sample set:

    {caaab, bbaab, caab, bbab, cab, bbb, cb}.

Strings are processed sequentially in order of decreasing length. The first string can be generated by the following:

    S → cA    A → aB    B → aC    C → ab (a residue).

In order to generate the second string, bbaab, the following productions are needed:

    S → bD    D → bE    E → aF    F → ab (a residue).

In order to generate the third string, caab, the production C → b must be added to the list of productions. If this process is continued, the set of non-recursive productions with residues found below will be generated.

    S → cA/bD
    A → b/aB
    B → b/aC
    C → b/ab (a residue production)
    D → bE
    E → b/aF
    F → b/ab (a residue production)

A set of productions which generate the sample has been formed. It is now necessary that the residue productions be removed and, if possible, recursion be used in the replacements and simplifications.
The residue production F → b/ab can be removed if the production E → b/aF is replaced by E → b/aE and all occurrences of F are replaced by E. A similar merger of the other residue production produces the following grammar:

    S → cA/bD
    A → b/aB
    B → b/aB
    D → bE
    E → b/aE

where there are now no residue productions. The productions involving non-terminals A, B, and E are of the same form and can be merged together to form the set of productions below:

    S → cB/bD
    B → b/aB
    D → bB

which is the same grammar which was obtained in the first example for the same sample set. As was pointed out in the first example, the inclusion of recursion automatically increases the discrepancy in the generated grammar, but the reduction in complexity appears to be worth it. The example above is presented by Cook [5] and Feldman [10].

Unlike the first example, the example above provided a large number of places to make a choice. The string to expand in productions must be chosen on the basis of other criteria if several strings are of the same length (the process normally processes strings in order of decreasing length). When residues are merged they may be merged in any order, and similar productions may be merged in any order. It is thus possible to apply other measures of cost or complexity at each choice. This ability and the number of choices to be made mean that the process cannot provide a guarantee of minimality, but a near-minimal solution is possible using different measures of minimality than the number of non-terminals.

Solomonoff [23], Chomsky [4], and Crespi-Reghizzi [7] have developed inference methods depending on an informant. The informant is used to indicate whether a string is in the language. The process might try to generalize on the sample and in so doing create a string about which it has no information. The informant is used to answer the membership question about the new string.
These processes have been applied with success in special cases [7], but it is generally as big a problem to act as an informant as it is to infer the grammar. Thus, in order to solve the problem, it must first be solved.

The final method to be considered is the method due to Cook [6]. Cook has developed a cost and discrepancy measure for type 2 languages defined by type 2 stochastic grammars. A stochastic grammar is a grammar in which a probability is associated with any choice allowed in the productions. The productions using / are those with choices. A stochastic grammar defines a stochastic language in which a probability can be associated with each string in the language. The probability associated with each string in the language is the sum of the probabilities associated with each possible derivation of the string. A derivation has probability equal to the product of the probabilities of all productions used in the derivation.

The method used by Cook is the one which was described in the first example in this section. Cook makes an initial grammar consisting of the trivial grammar and the given (or assigned) probabilities. Specific kinds of simplifications are considered and the cost and discrepancy of each is computed. The alternative with the lowest cost is chosen, and the discrepancy measure is used when the costs are nearly equal: cost being the same, the lesser discrepancy alternative will be chosen. Just as in the last example, the process is a search and thus does not always find the minimum solution. There is no measure of the distance to the minimum.

Cook applied the algorithm above to the same example as the one used twice above. The intermediate steps are too lengthy to state here, but the algorithm converged to the following solution after 13 steps:

    X → WY      (1)
    Y → b/aY    (0.5, 0.5)
    W → c/bb    (0.5, 0.5)

Recursion was not explicitly considered, but it was found to have the smallest cost and thus was chosen.
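The string-probability definition above (sum over derivations of the product of the production probabilities used) can be sketched for this grammar as follows. The code and function names are ours, not Cook's; it sums the probabilities of all leftmost derivations of a target string.

```python
from functools import lru_cache

# The stochastic grammar inferred above:
# X -> WY (1.0), Y -> b (0.5) | aY (0.5), W -> c (0.5) | bb (0.5).
rules = {
    "X": [("WY", 1.0)],
    "Y": [("b", 0.5), ("aY", 0.5)],
    "W": [("c", 0.5), ("bb", 0.5)],
}

@lru_cache(maxsize=None)
def prob(form, target):
    """Probability that the sentential form derives the terminal string.
    Branching only at the leftmost non-terminal counts each leftmost
    derivation exactly once, so the probabilities add correctly."""
    if not form:
        return 1.0 if not target else 0.0
    head, rest = form[0], form[1:]
    if head.islower():                     # terminal: must match the target
        if target and target[0] == head:
            return prob(rest, target[1:])
        return 0.0
    # non-terminal: sum over its alternative productions
    return sum(p * prob(rhs + rest, target) for rhs, p in rules[head])

print(prob("X", "cb"))     # 1.0 * 0.5 (W->c) * 0.5 (Y->b) = 0.25
print(prob("X", "caab"))   # 1.0 * 0.5 * 0.5 * 0.5 * 0.5 = 0.0625
```

Strings outside the language, such as ab, receive probability zero.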
Probabilities associated with the productions are shown in parentheses. When the probability information is available or can be generated, the algorithm is a more general solution to the problem than has been discussed. If the probabilities of sample strings must be arbitrarily assigned, the simplest solution that the algorithm could produce might not be found because of a poor choice of probabilities.

Other authors have approached the problem from the point of view of machines [21] or specialized applications. The algorithms discussed are a representative sample of those which exist.

3.0 Inference of Pattern Grammars

Evans [8] has reported an attempt to use the results of the inference of grammars for formal languages in the inference of descriptions of patterns in objects or images. The inference of a pattern grammar to describe a set of objects is more difficult than the problem of grammatical inference for formal languages.

The inference of a pattern grammar begins with the formulation of a pattern description. The first step in the formation of a description is the quantization of the object. In some instances the word 'digitize' is applicable, but in general the object is described in terms of a set of prespecified numeric quantities which are thought to have some relevance to the set of objects in question. Evans calls these numbers terminals and quite properly calls them lowest-level object types. After the object has been quantized, a set of predicates is defined. The predicates are operators which form equivalence classes over the set of terminals in the pattern grammar. After the proper relations are chosen, it is possible to describe a pattern in terms of terminals and the relations which collections of terminals from the pattern obey. The description takes the form of a set of rules or productions which specify the description. An example will be used to define the form of the productions.
Consider the figure shown below:

    [Figure: a simple face consisting of a circle (head) containing two dots (eyes), a square (nose), and a line segment (mouth)]

As Evans points out, the natural terminals to choose for such a pattern are circle, dot, line segment, and square. The relations are defined by the operators: above, inside, and left, each of which is binary. The description of the figure above is:

    face → features, head: inside(features, head)
    head → circle
    features → eyes, nose, mouth: above(eyes, nose) ∧ above(eyes, mouth) ∧ above(nose, mouth)
    eyes → dot, dot: left(dot, dot)
    nose → square
    mouth → lineseg.

This description does not precisely specify the image, since no comment was made about the angle of inclination of the two dots or the line segment. It does contain the necessary information as specifiable by the terminals and the predicates chosen.

A statement of the algorithm for the inference of a pattern grammar is:

    1. Quantize the pattern into a suitable set of parameters.
    2. Define relations on the parameters which have structural meaning for the pattern (predicates).
    3. Describe each instance of the pattern in terms of the relations which are true for that pattern. Express the description as a set of productions.
    4. Form a grammar for the set of instances as the union of the grammars above.
    5. Simplify the grammar.

The details of the various steps will be explained in terms of an example. Consider the three line drawings below.

    [Figure: three line drawings — a square containing a triangle, a square containing a circle, and a square containing a dot]

The relevant terminals are circle, dot, triangle, and square. The first step results in the description of each of the three objects above in terms of the terminals. For step 2 the only relation between terminals that looks appropriate is the relation inside. The results of step 3 are shown below:

    a. S → triangle, square: inside(triangle, square)
    b. S → circle, square: inside(circle, square)
    c. S → dot, square: inside(dot, square)

The union of the rules above creates a trivial grammar which completely specifies the set of images presented.
This corresponds to the trivial grammar formed as a first step of the grammatical inference for formal languages. At this point, the simplification problem is the same as it was for the formal language case, and it should be hoped that many of the same tools could be brought to bear.

The example above points out a fundamental difference between the inference of pattern grammars and the inference of formal language grammars. The system of productions above is not capable of simplification with the set of terminals chosen. Evans indicates that at this point it is necessary to weaken some of the rules used to specify the initial set of objects. The significance of this observation is equivalent to the realization in the formal language case that the initially given input alphabet should be replaced with another. No inference procedure for formal languages known to the author has this provision. The example above should be restructured so that the simplification to the form below is possible.

    S → any, square: inside(any, square)

The grammar above represents a generalization in order to simplify the grammar. This concept will be reviewed again in the discussion of variable-valued logic as a vehicle for inference.

4.0 Variable-valued Logic

Detailed definitions of several forms of variable-valued logic may be found in the literature [15,16,17]. The system to be defined below will be a simple subset of the systems in current use, and it will be expanded as more sophisticated operations are found to be needed. The operations defined below are sufficient to define a minimal subset of the variable-valued logic system VL [17]. A variable-valued logic system (a VL system) is a quintuple:

    < X, Y, S, R_F, R_I >

where

    X — is a non-empty set of input or independent variables, whose domains, denoted by D_i, i = 1, 2, 3, ..., are any non-empty sets.

    Y — is a set of output or dependent variables, whose domains, denoted by D^j, j = 1, 2, 3, ..., are any non-empty sets.
S -- is a set of symbols called connecting symbols. Initially the only connecting symbols we will use are: = ( ) ∧ ∨ [ ].

R_F -- is a set of formation rules which define well-formed formulas (wff) in a VL system. A string of elements from X, D_i, Y, D_j, and S is a wff if and only if it can be derived from a finite number of applications of the formation rules.

R_I -- is a set of interpretation rules which give an interpretation to VL formulas. They specify the mapping from all wff to elements of the sets D_j.

In the discussions which follow, the symbol x_i will be used to denote a variable which may take on values selected from the input set D_i. Only one set of output variables will be needed, and elements from it will be referenced by name where required. Allowable wff will consist of simple selectors:

[x_i = a sequence of elements from the input set D_i]

or more than one simple selector joined by the operators ∧ or ∨. The interpretation of VL formulas will be as follows: A simple selector will have the value equal to the maximal element of the output set if the statement in it is true and the value zero otherwise. Selectors joined by ∧ will take on the value of the smallest selector so joined. Selectors joined by the symbol ∨ will take on the value of the largest selector so joined. Parentheses may be used to specify the order of evaluation in the normal way. An example should help to indicate the nature of well-formed formulas and their evaluation. Consider three input sets:

X_1 = {1, 2, 3}
X_2 = {r, e, s, t}
X_3 = {cat, boy, dog}

and the output set: D = {0, 1}. The following are well-formed formulas:

[x_1 = 1] ∨ [x_1 = 2][x_2 = r] ∨ [x_3 = cat]
[x_3 = dog][x_2 = e] ∨ [x_2 = t]
[x_1 = 1][x_2 = s][x_3 = cat]

where ∧ has been omitted where unambiguous, and the order of evaluation is left to right with ∧ given a higher precedence than ∨. The three wff above can take on a value only after the values of each of the variables have been specified.
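A minimal sketch of these interpretation rules in code may help. The function names (`selector`, `conj`, `disj`) and the use of Python dictionaries as variable assignments are assumptions made for illustration; the min/max semantics follow the rules just stated with the two-element output set {0, 1}.

```python
# Sketch of the VL1 interpretation rules: a selector evaluates to the
# maximal output value (here 1) when its condition holds, else 0;
# conjunction takes the minimum, disjunction the maximum.

def selector(var, values):
    """[x_var = values]: 1 if the variable's value is in `values`, else 0."""
    return lambda env: 1 if env[var] in values else 0

def conj(*terms):  # ∧: the value of the smallest term so joined
    return lambda env: min(t(env) for t in terms)

def disj(*terms):  # ∨: the value of the largest term so joined
    return lambda env: max(t(env) for t in terms)

# Formula 1 above: [x1 = 1] ∨ [x1 = 2][x2 = r] ∨ [x3 = cat]
f1 = disj(selector("x1", {1}),
          conj(selector("x1", {2}), selector("x2", {"r"})),
          selector("x3", {"cat"}))

print(f1({"x1": 2, "x2": "r", "x3": "dog"}))  # 1
print(f1({"x1": 3, "x2": "t", "x3": "dog"}))  # 0
```

Note that `disj` could short-circuit as soon as a term reaches the maximal output value, which is exactly the evaluation shortcut the text discusses for formula two.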
The first formula will have the value 1 if x_1 is assigned the value 1, or x_3 the value cat, or x_1 the value 2 and x_2 the value r. In all other cases the first formula will be assigned the value zero. The second wff will have the value 1 if x_3 has the value dog and x_2 has the value e, or x_2 has the value t; otherwise, the formula will take on the value zero. Formula two points out an interesting aspect of the evaluation of VL formulas. Since the operation ∨ selects the larger of the values it joins, the evaluation of formula two could terminate after the first condition was satisfied, since it is not possible to find a larger value. Only if the first condition (x_3 = dog and x_2 = e) were not met would the second condition need to be evaluated. The third formula will be assigned the value 1 only if x_1 is 1 and x_2 is s and x_3 is cat. In all other cases the value will be zero.

The formulas above can be thought of as selecting points from an event space. The event space has three dimensions because there were three input sets. The total number of distinct events in the space is the product of the cardinalities of the input sets. For the example above, this would be 36 events. The third formula above specifies a single event, since it specifies a value for each of the input variables. The second formula specifies two groups of events sharing one of two properties (either x_3 = dog and x_2 = e, or x_2 = t). A similar statement can be made about formula 1. The generalized logical diagram (GLD) has been proposed by Michalski [18] as a geometrical representation of a multi-dimensional space using the thickness of dividing lines to signify the dimensions. The diagram is constructed so that each event is represented by a single square in the diagram. The specification of a formula on such a diagram consists of putting a 1 in the squares of the diagram represented by the formula.
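The event space for this example can be enumerated directly. The sketch below confirms the counts above; treating each tuple as one square of a GLD is an assumption about how the diagram would be marked, made only for illustration.

```python
from itertools import product

# The event space for the three input sets above: one event per
# combination of input values.
X1, X2, X3 = [1, 2, 3], ["r", "e", "s", "t"], ["cat", "boy", "dog"]

events = list(product(X1, X2, X3))
print(len(events))  # 3 * 4 * 3 = 36 distinct events

# Formula 3, [x1 = 1][x2 = s][x3 = cat], selects exactly one event:
covered3 = [e for e in events if e == (1, "s", "cat")]
print(len(covered3))  # 1

# Formula 2, [x3 = dog][x2 = e] ∨ [x2 = t], covers a larger region:
covered2 = [(a, b, c) for (a, b, c) in events
            if (c == "dog" and b == "e") or b == "t"]
print(len(covered2))  # 3 events with x3=dog, x2=e, plus 9 with x2=t
```

The 12 events covered by formula 2 correspond to the two groups of squares that would be marked on the GLD.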
The GLD provides an excellent geometrical model for describing formulas and can be used to infer formulas from sets of events in simple cases. Michalski [19] has reported an algorithm for inferring formulas based on sets of events to be included in the formulas. The algorithm has been used in a computer program, AQVAL, and has met with success in its applications. The details of the algorithm (based on the algorithm A^q) are beyond the scope of this work, but some general comments about the algorithm are appropriate. The program AQVAL produces a near minimal formula for the representation of a set of events E_1 while not including any elements from the set of events E_0. The problem can be stated graphically on a GLD by placing a 1 in the squares which correspond to the elements of E_1 and a 0 in the squares of the diagram which correspond to the events in the set E_0. Those squares not marked are so-called "don't care" conditions and may be included in the formulas if their inclusion will simplify the result. The inclusion of such don't care events in the formula corresponds to the introduction of discrepancy in grammatical inference.

In addition to near minimal formulas for the event set E_1, the program produces the maximum possible distance between the present formula and the minimal formula. The application of the algorithm is deterministic and does not require any backup. Repeated application of the algorithm to the problem can yield successively better formulas or better estimates of the maximum distance to the minimal solution. A minimal formula is the cheapest to evaluate using a cost function defined by the program user. The estimate of the maximum distance between the present formula and the minimal formula produced by AQVAL appears to answer the backup question posed earlier for grammatical inference.
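To make the covering problem concrete, here is a deliberately naive greedy sketch. This is not Michalski's A^q algorithm (which, as noted, is beyond the scope of this work); it only illustrates the problem AQVAL solves: cover every event in E_1, exclude every event in E_0, and feel free to absorb unmarked "don't care" events. All names and the tiny two-variable event space are invented for the example.

```python
from itertools import product

# Toy event space: two variables, four events in all.
X = {"x1": [1, 2], "x2": ["r", "t"]}
events = [dict(zip(X, v)) for v in product(*X.values())]

E1 = [{"x1": 1, "x2": "r"}, {"x1": 1, "x2": "t"}]  # must be covered
E0 = [{"x1": 2, "x2": "t"}]                        # must be excluded
# The remaining event {"x1": 2, "x2": "r"} is a don't-care square.

# Candidate single-selector terms: fix one variable to one value.
candidates = [(var, val) for var in X for val in X[var]]

cover = []
uncovered = list(E1)
while uncovered:
    # Greedily pick the candidate covering the most uncovered E1
    # events while covering no E0 event.
    best = max(
        (c for c in candidates
         if not any(e[c[0]] == c[1] for e in E0)),
        key=lambda c: sum(e[c[0]] == c[1] for e in uncovered),
    )
    cover.append(best)
    uncovered = [e for e in uncovered if e[best[0]] != best[1]]

print(cover)  # [('x1', 1)], i.e. the single-selector formula [x1 = 1]
```

Note that the result [x1 = 1] is simpler than listing the two E_1 events separately; the simplification is possible precisely because the don't-care square is allowed into the formula's region, which is the discrepancy tradeoff described in the text.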
After the application of AQVAL it is possible to decide whether the next iteration can produce a simplification which is worth the time it might take, by examining the distance to the minimum. No such facility is known to the author in the case of currently applied methods of grammatical inference.

5.0 VL Systems and Grammatical Inference

Before attempting to apply the process of inference for VL formulas to the problem of grammatical inference, it is necessary to evaluate the applicability of the VL system as a means of expressing the problem and its solution. This comparison should yield interesting insights into the process of grammatical inference and into the structure of VL formulas. This section will discuss the formulation of the grammatical inference problem for formal languages in terms of the formalism of variable-valued logic. (Actually a subset of VL1 will suffice for the early discussions, in which all input sets are of the same cardinality.) As before, the problem is to infer, from a positive information sequence and a (possibly empty) negative information sequence, a grammar which describes the language of which the positive information sequence is a subset. This statement will be modified slightly to state that the desired result is a representation of the language, not just a grammar. The nature of a VL representation of a language will be discussed at length in the next section. For the discussions below, the simplest possible mapping from strings in the sample to VL variables will be made. All input sets will be of the same cardinality. The cardinality of the input sets will be that of the input alphabet as observed in the sample plus 1. The domains of the input variables will be the input alphabet and the previously unused symbol $. The output set D will contain the values 0 and 1, and the meanings associated with these values will be: 0 = not in the language, 1 = in the language.
A number of input variables equal to the number of terminals in the longest string in the sample of the language will be needed. To simplify the constructions below, the special symbol $ will be used to signify the end of a string. That is, the original positive and negative information sequences will be replaced by sequences in which $ has been added to the end of each string. A string will be represented in a VL system as a set of values of the input variables in positional notation. That is, the first input variable x_1 will be assigned the value of the first terminal in the string, x_2 the second, and so on until the $ has been assigned. Input variables following the $ may be assigned arbitrary values as desired. For the discussion below, the input variables after the $ will also be assigned the value $. By using the procedure stated above it is possible to write a unique set of values for the input variables corresponding to any string over the alphabet used in the sample. This means that every string in the original alphabet (before the $ was added) corresponds to a unique square in the GLD. The inference problem is thus stated: For each positive instance of a string in the sample, place a 1 in the appropriate square of the GLD. For each negative instance of a string in the sample, place a 0 in the appropriate square of the GLD. All squares not marked correspond to strings about which nothing is known. The desired result is a formula which includes all of the events marked by a 1 and none marked by 0 (no error) and contains as few of the unmarked squares as possible (minimum discrepancy). In addition, the formula should be as simple as possible.

The tradeoff between complexity and discrepancy discussed by Cook [5,6] is clearly evident if the construction discussed above is used. Simplification is possible when specially selected don't care events are included.
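The encoding just described can be sketched as follows; the function name `encode` and the tuple representation are illustrative choices, not part of the original construction.

```python
# Sketch of the string-to-event mapping: each sample string is
# $-terminated and then $-padded out to the fixed number of input
# variables, giving a unique assignment of x1..xn.

def encode(string, n_vars):
    """Map a string to a tuple of input-variable values."""
    s = string + "$"
    assert len(s) <= n_vars, "string longer than the variable vector"
    return tuple(s) + ("$",) * (n_vars - len(s))

# n_vars = length of the longest sample string + 1 (for the $)
print(encode("ac", 5))    # ('a', 'c', '$', '$', '$')
print(encode("abba", 5))  # ('a', 'b', 'b', 'a', '$')
```

Because the padding after the first $ is fixed, distinct strings always map to distinct variable assignments, which is what makes each string a unique square on the GLD.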
Having stated the problem in terms of a VL system, it is necessary to investigate the nature of possible solutions to the problem before attempting to infer descriptions for specific languages. The next section will discuss the representations of simple grammars in the VL system outlined above. The applicability and simplicity of VL descriptions of languages will be discussed.

6.0 VL Representations of Languages

Before the inference of VL1 formulas can be applied to the inference of descriptions of languages, it is necessary to ascertain the applicability of VL systems to the representation of formal languages. It should be obvious from the definition of a VL system given above that the VL system as defined is not sufficient for recursion. There is no provision in the formalism to handle it. A less obvious result, which will be demonstrated below, is the fact that VL systems as restricted above are not sufficient to represent languages as well as type 3' grammars. The correspondence between VL1 and grammatical representations of languages will be explored for simple classes of grammars below. New operations will be proposed for the VL system above, and they will be used to extend the comparison. The simplest grammar to be discussed is the type 3" grammar. In order to demonstrate that the VL system defined above is sufficient for languages described by type 3" grammars, algorithms for conversion between the two different formalisms will be outlined. In addition to the remarks of section 5, it is sufficient to specify the correspondence between productions and formulas.

Productions to VL1 formulas.

1. Rewrite all productions with the same left-hand side (lhs) as one production using /.
2. Assign level number 1 to the production with the starting symbol on its lhs.
3. If all productions have been assigned level numbers, go to 6.
4.
Write all productions whose lhs contains non-terminals referenced in the level above, and assign this group of productions the next level value not assigned.
5. Go to 3.
6. Replace -> with =.
7. Replace / with ∨.
8. Replace any single terminal, a, at level i with the form: [x_i = a][x_{i+1} = $].
9. Replace any terminal, a, next to a non-terminal at level i with: [x_i = a], where a may represent any terminal.

The result is a system of VL1 formulas which represents the grammar. An example will be presented below to demonstrate the process. Consider the type 3" grammar with start symbol S below:

E -> a
D -> c
E -> c
S -> aA
A -> c
A -> bD
A -> cE

which can be rewritten in the form:

S -> aA          level 1
A -> c/bD/cE     level 2
D -> c
E -> c/a         level 3

and the resulting VL1 formulas are:

S = [x_1 = a]A
A = [x_2 = c][x_3 = $] ∨ [x_2 = c]E ∨ [x_2 = b]D
D = [x_3 = c][x_4 = $]
E = [x_3 = c][x_4 = $] ∨ [x_3 = a][x_4 = $]

The grammar and the VL formulas above both represent the language: {abc, ac, acc, aca}.

In order to show that the VL system defined above and type 3" grammars are equivalent, it is necessary to show that for every VL1 representation of a language there exists a corresponding type 3" grammar. Once again, the proof will be constructive.

VL1 formulas to productions.

1. Rewrite the VL1 formulas, leaving out all selectors containing the symbol $.
2. Rewrite the VL1 formulas so that no two selectors are joined by the operation ∧. Create new formula names if needed.
3. Remove the brackets and the x_i = from all selectors.
4. Replace = with -> and ∨ with /.

The result is a set of productions which represents the same language as the set of VL formulas. From the nature of the constructions it should be obvious that, given any set of type 3" productions, if the productions are converted into a set of VL formulas and then converted back to a set of productions, the resulting set of productions will be the same as the original set.
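The claim that the grammar generates {abc, ac, acc, aca} can be checked by brute force. The `language` helper below is hypothetical and assumes exactly the restricted shape of grammar used here (each right-hand side is a terminal string optionally ending in one non-terminal, with no recursion, so the language is finite).

```python
# Enumerate the finite language of a non-recursive right-linear
# grammar of the restricted kind used in the example above.

def language(productions, symbol="S"):
    """All terminal strings derivable from `symbol`."""
    out = set()
    for rhs in productions[symbol]:
        if rhs[-1] in productions:  # rhs ends in a non-terminal
            head, nt = rhs[:-1], rhs[-1]
            out |= {head + tail for tail in language(productions, nt)}
        else:                       # rhs is all terminals
            out.add(rhs)
    return out

g = {"S": ["aA"], "A": ["c", "bD", "cE"], "D": ["c"], "E": ["c", "a"]}
print(sorted(language(g)))  # ['abc', 'ac', 'aca', 'acc']
```

The same four strings are exactly those whose $-padded encodings satisfy the VL1 formula for S.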
Since the inference of the simplest possible VL1 expression for a language described by a type 3" grammar can be converted to the inference of a type 3" grammar which is just as simple, the inference of a VL1 representation for a type 3" language is equivalent to the inference of a type 3" grammar directly.

A great deal can be learned from trying the conversion algorithms above on a grammar which is type 3' but not type 3". The examination of a few examples should be sufficient to convince the reader that the ordering of productions used above is not possible for type 3'. If non-terminals which occur at more than one level are treated as a special case, then a set of VL1 formulas which represent the grammar can be produced. The algorithm which was used to convert VL1 formulas to a grammar can be used to convert back to a set of productions. An example will demonstrate the process. Consider the following type 3' grammar:

S -> aA
A -> c/bD/cE
D -> c/bE
E -> c/a

The reader can easily verify that the grammar above represents the language: {abc, ac, acc, aca, abbc, abba}, and that the same language is represented by the following set of VL1 formulas:

S = [x_1 = a]A
A = [x_2 = c][x_3 = $] ∨ [x_2 = c]E ∨ [x_2 = b]D
D = [x_3 = c][x_4 = $] ∨ [x_3 = b]E'
E = [x_3 = c][x_4 = $] ∨ [x_3 = a][x_4 = $]
E' = [x_4 = c][x_5 = $] ∨ [x_4 = a][x_5 = $]

This example is simple enough that it can be shown that the VL1 formulas above cannot be simplified. A most interesting situation arises when the set of formulas above is converted back to a set of productions. That set is shown below:

S -> aA
A -> c/cE/bD
D -> c/bE'
E -> c/a
E' -> c/a

This grammar contains a non-terminal which was not found in the original set of productions. A moment's thought will show that the new non-terminal is not necessary and can be removed to produce the grammar above. Thus, in converting to a VL representation and back, something has been lost.
The simplest VL1 representation of the language is not as simple as the simplest type 3' grammar for the same language. This means that the inference problem for type 3' grammars is not solved even if the simplest VL1 description of the language can be found. The simplest VL representation can be simplified after first being converted to a grammar. In looking at the formulas for E and E' there is a distinct similarity in the structure of the formulas. In a type 3' grammar this similar structure could be portrayed by one single non-terminal E, but in the VL system defined above two formulas are required. In order to correct this situation, an addition will be made to the definition of the VL system above in order to allow the VL system to represent type 3' grammars. The addition proposed by Michalski [20] is to use selectors of the form:

[x_i, x_j, ... = a]

where a is an arbitrary terminal, and the sequence of variables in the selector should be interpreted as follows: if x_i or x_j or ... is equal to the terminal a, then the selector takes the maximal value in the output set; else it takes the value zero. As an example of its use, the set of formulas for the language above will be rewritten using the new selector:

S = [x_1 = a]A
A = [x_2 = c][x_3 = $] ∨ [x_2 = c]E ∨ [x_2 = b]D
D = [x_3 = c][x_4 = $] ∨ [x_3 = b]E
E = [x_3, x_4 = c][x_4, x_5 = $] ∨ [x_3, x_4 = a][x_4, x_5 = $]

The set of formulas above represents the language and can be converted into the original grammar by the algorithm stated below.

VL1 formulas to productions.

1. Rewrite the VL1 formulas, leaving out all selectors containing the symbol $.
2. Rewrite the formulas so that no two selectors are joined by the operation ∧. Create new formulas if needed.
3. Remove the brackets, the sequences of variable references, and the equal sign from the selectors.
4. Replace = with -> and ∨ with /.
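The interpretation of the new multi-variable selector can be sketched directly. The evaluation below of the compacted formula for E against the string abbc$ uses hypothetical helper names; the any/min/max logic follows the rule just stated.

```python
# Sketch of the extended selector [x_i, x_j, ... = a]: it takes the
# maximal output value if ANY of the listed variables equals a.

def multi_selector(variables, value):
    return lambda env: 1 if any(env[v] == value for v in variables) else 0

# E = [x3, x4 = c][x4, x5 = $] ∨ [x3, x4 = a][x4, x5 = $]  (as above)
def E(env):
    c_part = min(multi_selector(["x3", "x4"], "c")(env),
                 multi_selector(["x4", "x5"], "$")(env))
    a_part = min(multi_selector(["x3", "x4"], "a")(env),
                 multi_selector(["x4", "x5"], "$")(env))
    return max(c_part, a_part)

# "abbc$" assigns x1..x5 = a, b, b, c, $ and should satisfy E:
print(E({"x1": "a", "x2": "b", "x3": "b", "x4": "c", "x5": "$"}))  # 1
# "aca$$" assigns x3 = a, x4 = $, and should also satisfy E:
print(E({"x1": "a", "x2": "c", "x3": "a", "x4": "$", "x5": "$"}))  # 1
```

One selector now does the work of the separate E and E' formulas, at the cost of the looser matching behavior the note at the end of this section warns about.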
The conversion from a type 3' grammar to a set of VL1 formulas can be accomplished by the following algorithm:

Productions to VL1 formulas.

1. Rewrite all productions with the same lhs as one production using /.
2. Assign level number 1 to the production with the starting symbol as its lhs.
3. If all productions have been assigned level numbers and the previous level contained no non-terminals on the rhs of any productions, then go to 6.
4. Write all productions whose lhs contains non-terminals referenced in the level above, and assign this group of productions the next level value not yet used.
5. Go to 3.
6. Remove all multiple copies of productions, but associate the level numbers of the productions removed with the copy left behind.
7. Replace -> with = and / with ∨.
8. Replace any single terminal, a, in a production with associated level values i, j, k, ... with the form: [x_i, x_j, x_k, ... = a][x_{i+1}, x_{j+1}, x_{k+1}, ... = $].
9. Replace any terminal, a, next to a non-terminal in a production with associated level values i, j, k, ... with the form: [x_i, x_j, x_k, ... = a], where a may represent any terminal.

As an example of the application of the conversion algorithm for the conversion from a grammar to a set of VL formulas, the type 3' grammar above will be converted to a VL system. The result of the ordering of productions is:

S -> aA          level 1
A -> c/cE/bD     level 2
D -> c/bE
E -> c/a         level 3
E -> c/a         level 4

and the resulting set of formulas is the set of formulas used as an example in the definition of the new selector. It is simple to show that the set of formulas can be converted back to the same grammar for any type 3' grammar, so the equivalence between the inference problem for type 3' grammars and the extended VL system defined above is established. This means that the simplest result in one representation can be converted to an equivalent representation which is isomorphic to the original.
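The level-numbering loop of steps 2 through 5 can be sketched as follows; the dictionary representation of the grammar is an illustrative assumption. Note how E is assigned two levels, just as in the worked example above.

```python
# Sketch of the level assignment in steps 2-5: level 1 holds the
# start symbol's production; each later level holds the productions
# for the non-terminals referenced on the previous level.

g = {"S": ["aA"], "A": ["c", "cE", "bD"], "D": ["c", "bE"], "E": ["c", "a"]}

levels = [["S"]]
while True:
    # Non-terminals appearing on the rhs of the previous level's
    # productions, in order of appearance.
    referenced = [sym for lhs in levels[-1]
                  for rhs in g[lhs]
                  for sym in rhs if sym in g]
    if not referenced:
        break
    levels.append(referenced)

print(levels)  # [['S'], ['A'], ['E', 'D'], ['E']]
```

E receives both level 3 and level 4 (because D, itself at level 3, references it again), which is exactly what step 6 collapses into one production carrying two level numbers.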
The relationship between an extended VL1 system, called VL [20], and grammatical inference will be discussed in a later paper [24]. In general, it may be necessary to add selectors of the form [x_j = $], for j > i, to eliminate the introduction of discrepancy.

7.0 Simplified VL Representations

The previous section presented an extension to the VL1 system which served to increase the classes of languages which could be represented by a set of VL formulas. This section will explore the use of some operators in VL which do not change the classes of languages which can be represented, but which allow a more concise representation of those already represented. The comma was introduced as a way to indicate that, for the purposes of the given selector, a group of variables could be treated together, and the selector could be satisfied by any one of the variables. The existing VL system allows the comma to be employed in a similar manner when specifying the values of the input variables for comparison. The meaning is the same as for the previous case. A selector with input values separated by commas is satisfied if the indicated input variable has one of the values in the list. Using this feature, the first example of section 6.0 can be written:

S = [x_1 = a]A
A = [x_2 = c]([x_3 = $] ∨ E) ∨ [x_2 = b]D
D = [x_3 = c][x_4 = $]
E = [x_3 = a, c][x_4 = $]

where the use of the comma is as defined above. This equivalent representation contains only 8 selectors where the previous one contained 11. The selectors more compactly represent the same language. As a natural extension of this construct, the use of : as defined by Michalski provides an additional savings of expression. In addition, when the input set is ordered on the relation < (or mapped into a representation such as the positive integers for which the relation is defined), then the use of : to indicate a sequence of possible values represents a computational savings.
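These two constructs can be sketched as follows (function names hypothetical). The point of the range form is that, over an ordered domain, only the two end-point comparisons are needed.

```python
# Sketch of the value-list and range selectors: a comma-separated
# list of values is satisfied by membership; a range written with ":"
# over an ordered domain is checked against its end points only.

def list_selector(var, values):
    """[x_var = v1, v2, ...]: satisfied by any listed value."""
    return lambda env: 1 if env[var] in values else 0

def range_selector(var, lo, hi):
    """[x_var = lo : hi]: two comparisons, regardless of how many
    values lie in the interval."""
    return lambda env: 1 if lo <= env[var] <= hi else 0

# E = [x3 = a, c][x4 = $] from the compacted formulas above:
e = list_selector("x3", {"a", "c"})
print(e({"x3": "c"}))  # 1

r = range_selector("x2", "b", "d")  # alphabetic order on the terminals
print(r({"x2": "c"}))  # 1
print(r({"x2": "a"}))  # 0
```

The membership test treats the listed values as a single equivalence class, as the text describes, while the range test replaces per-value comparisons with the two end-point checks.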
It is no longer necessary to make the individual comparisons for each value in the interval defined by the :, since the question of whether the specified input variable has a value in the range can be answered by checking the end points only. The use of the comma for the specification of a sequence of values is the same as specifying an equivalence class of values which, for the process of evaluating the selector, may be treated as a unit. The use of the symbol : makes the manipulation of the specified sequence especially simple. Michalski defines a number of other operators which can be used to simplify the expression and computation of the value of a VL1 expression. The comparison ≠ is especially useful when the number of elements of the input domain to be excluded is less than the number of elements to be included. This simplicity of expression becomes most effective when combined with the use of sequences of values as defined above. For input sets which are ordered, the operations <, >, ≤, ≥ can provide simplifications of selectors. Since all of the above operations can be performed (although less concisely) with the test for equality, the addition of these operators does not change the class of languages for which VL is applicable, but they do allow a more concise manner of representation. As an example of simplification, a grammar will be presented below and converted to a set of VL1 formulas which will be simplified. The set of simplified VL formulas will be used to answer the membership question for the language defined by the grammar. The performance of the grammar will be compared to that of the VL1 formulas on a production by production basis. The sample grammar is shown below:

T = {a, b, c, d}          S -> aB/cB/bB/aD/bE
N = {S, B, D, E, F}       B -> a/b/d
                          D -> aF/cF
                          E -> bF
                          F -> c

The order relations will test order in the alphabetic sense, thus the expression a