Report No. UIUCDCS-R-77-907
UILU-ENG 77 176[...]
NSF-OCA-MCS73-07980-000029

Digitized by the Internet Archive in 2013
http://archive.org/details/multiprocessorfo907chen

MULTIPROCESSOR FOR STRING MANIPULATION*

by

Wing Kai Cheng

October 1977

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the National Science Foundation under Grant No. US NSF-MCS73-07980 and was submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, October 1977.

ACKNOWLEDGEMENT

I would like to thank my advisor Professor David J. Kuck for his advice, guidance, incisive questions, encouragement, and general concern. Many others also deserve much thanks; a partial list includes Professor R. P. Futrelle, my friends, and my parents.

TABLE OF CONTENTS

1. INTRODUCTION
2. AN OVERVIEW OF SNOBOL
   2.1 Introduction
   2.2 Syntax and Semantics
   2.3 Serial Execution
       2.3.1 Program Execution
       2.3.2 Statement Execution
       2.3.3 Serial Pattern Matching
   2.4 Speedup
       2.4.1 Speedup of the Basic Operations
       2.4.2 Speedup at Statement Level
       2.4.3 Speedup in the Program Level
       2.4.4 IF Tree
3. PATTERNS AND PATTERN MATCHING PROCESSOR
   3.1 Introduction
   3.2 Algebra of Pattern Matching
       3.2.1 Deviation from SNOBOL4
   3.3 Time Bound for Pattern Matching Algorithm in SNOBOL4
   3.4 Pattern Matching Network
       3.4.1 Stage I
             3.4.1.1 Comparison of the Methods
       3.4.2 Design of Stage I
             3.4.2.1 Speed
       3.4.3 Stage II
       3.4.4 Stage III
       3.4.5 Stage IV
       3.4.6 Stage V
       3.4.7 Stage VIa
       3.4.8 Stage VIb
   3.5 Gate, Time, and Gate*Time
4. MACHINE DESIGN
   4.1 Overall Organization
   4.2 Garbage Collection
   4.3 IF-Tree Processor
   4.4 Memory
   4.5 Program Memory
   4.6 Data Memory
   4.7 Instruction Set
5. EMPIRICAL RESULTS
   5.1 Introduction
   5.2 Static Frequency Count
   5.3 Dynamic Statistics
6. SYSTEM PARAMETERS
   6.1 Introduction
   6.2 Descriptor Size
   6.3 Descriptor Format
   6.4 Character Size
   6.5 ALU Processor
   6.6 IF Tree Processor
   6.7 Program Memory
   6.8 Data Memory
   6.9 PMN
   6.10 Expected Speedup
7. CONCLUSION
LIST OF REFERENCES

1. INTRODUCTION

In the past decade, there has been significant improvement in computer speed as a result of faster and faster gates, and of computer organization which exploits parallel processing. This trend is expected to continue, with major improvement to come from computer organization, as switching speed is approaching its theoretical limit, and the declining cost and the progress of LSI technology are making the exploitation of parallelism economically and physically possible. It is this computer organization (machine and software) aspect with which this thesis is concerned.

Many proposals and attempts have been made to speed up the execution of numerical problems on computers, and each has achieved some form of success. In some cases, actual machines have been built or are being built; some examples are ILLIAC IV, CRAY I, CDC 7700, CDC STAR, TI ASC, etc.
On the other hand, computer organization for non-numerical computing is still in its infancy, although there are probably more non-numerical problems than numerical ones in this world. The following result is probably familiar to many people [GIM76]: To multiply two 3-digit numbers takes the computer 6 microseconds, as compared to about 60 seconds for a human; the computer is therefore 10**7 times faster than the human. To search for a ten-character string in a 3000-character text takes a typical computer 6 milliseconds, whereas it takes a human being 60 seconds; thus the machine is 10**4 times faster than the human. Comparing the two situations, one may conclude that the machine is about 10**3 times faster with numerical problems than it is with non-numerical problems.

The obvious reason for this inefficiency for non-numerical problems is that non-numerical problems and languages contain facilities and concepts so far removed from the conventional machines that elaborate software is required to bridge the gap. Furthermore, parallelism and functional requirements are not as well understood and exploited in machines for these non-numerical problems as they are for their numerical counterparts. With the increasing usage of computers for non-numerical data processing, it has become more important to process this kind of data more efficiently and faster.

The primary objective of this research is to study the requirements and peculiarities of string processing, and to propose a machine organization that is suitable for string processing in terms of speed and ease of supporting the software features of string processing languages. A well known string processing language called SNOBOL4 [POA71] will be used as a representative string processing language for our study. We start off with Chapter 2 as a synopsis of the basic features of SNOBOL, and an introduction to our approach of speeding up SNOBOL.
Algebraic properties of patterns and the design of a pattern matching network are presented in Chapter 3. The machine organization and its instruction set are described in Chapter 4. Some empirical results of SNOBOL programs are given in Chapter 5. Chapter 6 discusses the system parameters and gives an estimation of the speedup achievable. Conclusions and suggestions for further research can be found in Chapter 7.

2. AN OVERVIEW OF SNOBOL

2.1 Introduction

SNOBOL is mainly a language for string manipulation, containing many features not readily found in other programming languages. This language proves to be a useful tool in areas such as compilation, machine simulations, symbolic mathematics, text preparation, natural language translation, linguistics, music analysis, information retrieval, artificial intelligence research, and the like. The fact that it is implemented on several different computers (including IBM System/360 and 370, UNIVAC 1108, GE 635, CDC 3600, CDC 6000 series, PDP-10, Sigma 5/6/7, Atlas 2, and RCA Spectra Series) probably reflects its wide acceptance by users.

The basic operations in SNOBOL, in addition to the usual arithmetic and assignments, include concatenation, alternation, and pattern matching. The basic data element of SNOBOL is the string, that is, a string of characters. SNOBOL also provides numerical capabilities with both integers and real numbers, and automatic conversion between string and numerical values. Integers are more often used than real numbers because most numerical operations involve character counting. Structured data types include arrays, which can be accessed through an index, and tables, which can be accessed through variables.

2.2 Syntax and Semantics

A brief description of the SNOBOL syntax and semantics is provided in this section in order to point out some of the peculiarities and properties of SNOBOL and to highlight the features we propose for speedup.
A more detailed and complete description of SNOBOL can be found in [POA71, GRI72]. The syntax of SNOBOL will be defined by the traditional Backus-Naur form [BAC59, NAU63], where syntactic constructs are denoted by English words in lower case letters. Sequences of constructs enclosed by the meta-symbols { and } imply their repetition zero or more times. In addition, we choose to use '::=' to separate the left part of the production from the possible right parts. The notation [x] will be used to indicate that the term x is optional.

PRODUCTION RULES                                          COMMENT

program   ::= {[label] statement} END [character-string]
label     ::= character-string
statement ::= [rule]                                      no goto
          ::= rule : goto                                 unconditional goto
          ::= rule : S goto                               S goto
          ::= rule : S goto1 F goto2                      S-F goto
          ::= rule : F goto                               F goto
          ::= rule : F goto1 S goto2                      F-S goto
goto      ::= (expr)                                      normal goto
          ::= <expr>                                      direct goto
rule      ::= expr                                        expression evaluation
          ::= expr =                                      null assignment
          ::= expr1 = expr2                               assignment
          ::= expr1 expr2                                 pattern match
          ::= expr1 expr2 =                               null replacement
          ::= expr1 expr2 = expr3                         string replacement
expr      ::= 'character-string'                          literal
          ::= number                                      number
          ::= identifier                                  identifier
          ::= operator expr                               unary operation
          ::= expr1 operator expr2                        binary operation
          ::= identifier ({expr})                         function call
          ::= identifier <{expr}>                         array indexing

Table 2.2-1 Simplified SNOBOL Syntax

A SNOBOL program consists of a sequence of statements which are executed sequentially unless control is transferred elsewhere. A statement, as shown in Table 2.2-1, contains up to two parts: rule and goto. The execution of a statement merely determines which statement is to be evaluated next; in other words, the statement itself carries with it no value and does not succeed or fail. Within a statement, rule is evaluated first, and its success or failure is used to influence the goto part to determine the subsequent statement to be evaluated.
If the statement is without a goto, the subsequent statement is evaluated; however, if the statement has a conditional goto, control is transferred to the statement indicated by the evaluation of the goto. S goto or F goto is evaluated when the rule signals success or failure respectively. If the rule signals success (failure) and the statement is without S goto (F goto), control is simply transferred to the following statement in the program.

The three main classes of statements are (1) assignment, (2) pattern matching, and (3) replacement, depending on the rule part. As shown in Table 2.2-1, assignment has the form

    subject = object

Pattern matching takes the general form

    subject pattern

whereby during execution, subject is inspected for the occurrence of pattern. The replacement statement is a combination of pattern matching and assignment, in which the matched string is replaced by the string in the object field. Its form is

    subject pattern = object

The result of the execution of the rule is either a success or a failure. If the evaluation of any one of expr, expr1, expr2, or expr3 should fail, then no further parts of the rule are evaluated, and the whole rule fails. More specifically, if the rule is an expression (expr) evaluation, null assignment, or assignment, the rule succeeds if none of the expr fails.

In pattern matching, expr1 and then expr2 are evaluated and coerced. Expr1 is then used as the subject or text, and expr2 is used as the pattern. Pattern matching either succeeds or fails. In the case of the pattern match replacement, expr1 and expr2 are evaluated as in pattern match. If the match succeeds, expr3 replaces the matched substring through the use of concatenations. If expr3 is missing from the rule, expr3 is assumed to be a null string.

The evaluation of an expression may either succeed or fail.
If evaluation succeeds, the result may either be returned by name (when the result of the evaluation is a location) or returned by value (when the result is a storable value). If the evaluation fails, it is known as fail return. If a location is returned by name when a value is needed in an operation, the operation may retrieve the value associated with the location; this process of retrieval for value when a location is returned is known as coercion.

2.3 Serial Execution

The order of execution in a serial SNOBOL program is presented in this section.

2.3.1 Program Execution

As noted earlier, statements in a SNOBOL program are executed in a sequential order unless control is transferred elsewhere by a goto. Transfer of control can be either unconditional or conditional, as in other programming languages.

2.3.2 Statement Execution

A SNOBOL statement, having the general form

    subject pattern = object goto

is evaluated in a sequential order as follows [POA71]:

1. The subject is evaluated first. If the evaluation fails, the statement fails, goto is evaluated, and the other components of the statement are not evaluated. If no failure goto is specified, control is passed to the next statement.
2. The pattern is then evaluated. If the evaluation fails, goto is evaluated as in the case of subject failure in (1).
3. The pattern match is performed next. If the match fails, the statement fails, conditional value assignment is not performed, the replacement is skipped, and the goto is processed. Immediate value assignments may take place before the pattern match fails. If pattern matching succeeds, conditional value assignment is performed for those components that matched.
4. The object is evaluated. If evaluation fails, the statement fails, no replacement is performed, and the goto is processed.
5. The replacement is performed.
6. The goto is processed.
The three possibilities are:
   (a) If the statement succeeds, only an unconditional or success goto in the statement is processed.
   (b) If the statement fails, only an unconditional goto or failure goto in the statement is evaluated.
   (c) Transfer is made to an evaluated goto, if there is one.
If none of the three conditions above is met, control is transferred to the next statement.

2.3.3 Serial Pattern Matching

Pattern matching can be a very time consuming process, as illustrated by the following example. Consider a pattern of the form

    ST = ('B'!'BRE') ('E'!'A') 'D'

which has three subsequents to the right of the equality sign, where the first and second have two alternates each. '!' is used to denote alternation; the spaces between ('B'!'BRE') and ('E'!'A') and that between ('E'!'A') and 'D' are concatenation operators. The process of pattern matching for the pattern ST, which matches any of the strings in the set {BED, BAD, BREED, BREAD}, is shown in Fig. 2.3.3-1.

The matching algorithm for SNOBOL4 [POA71, GRI72] may be summarized as follows:

1. Matching starts with the first alternate of the first subsequent.
2. When a component matches, the next subsequent to the right is attempted; otherwise, go to (4).
3. If a match can be obtained for a series of components leading from the left through the right, the pattern match is successful.
4. If a component does not match, an alternate within this subsequent is tried. If no alternate exists, the matching process backs up, rejecting the previously matched component, and seeks an alternate for it.

[Figure: six snapshots of the matcher's position among the alternates of each subsequent.]
   (a) match for B
   (b) attempt to match BE, but fail
   (c) attempt to match BA, but fail
   (d) back up, match for BRE
   (e) attempt to match BREE, but fail
   (f) match for BREA

   subject string: BREAD
   pattern: ('B'!'BRE') ('E'!'A') 'D'

Fig. 2.3.3-1 Snapshots of a pattern matching process in SNOBOL4

2.4 Speedup

The speedup proposed will be a combination of special hardware and algorithms. Usually, for a given programming language, one can expect that parallelism may be exploited at three different levels: (1) in basic numerical and non-numerical operations, (2) at the statement level, and (3) at the program level. All these methods will be explored in more detail in the following sections.

2.4.1 Speedup of the Basic Operations

Speedup of the basic numerical operations, such as addition of two numbers or multiplication of two numbers, has been investigated by many people, such as [UAD65, MSn61]. In the next chapter, we shall propose a way to speed up pattern matching. Speedup of concatenation and alternation can be expected to come mainly from memory bandwidth.

2.4.2 Speedup at Statement Level

To speed up at the statement level is to execute as many of the components of a statement in parallel as possible. For a numerical statement, it is to evaluate the numerical expression through multiprocessing [MUR71, KRA72, [...]

[...] post-cursor position, pointing to the text where the match begins. We can extend the definition a little further to define a pattern matching [...] A, and c=0, then M(s,p,c)=4.

One can think of an inverse of a pattern p as a pattern which doesn't match p. As an example, the pattern in SNOBOL which matches any character except the character 'A' is written in SNOBOL as [...]. In general, a pattern which matches B but not A is written [...].

Two patterns P1 and P2 are equal iff M(s,P1,c)=M(s,P2,c) for all s and c, or AM(s,P1,c)=AM(s,P2,c).

Def: The concatenation of P and Q, represented here as P%Q, where P and Q are strings, is defined as

$$M(s, P\%Q, c) = M(s, P, c) \cap [\,M(s, Q, c) - |P|\,]$$

and in the case for all matches

$$AM(s, P\%Q, c) = AM(s, P, c) \cap [\,AM(s, Q, c) - |P|\,].$$
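As a rough illustration of the concatenation definition above, the following Python sketch treats AM(s,p,c) as the set of 1-indexed positions after the cursor c at which the simple string p begins in s. That reading of AM, the helper names `am` and `concat`, and the sample subject string are assumptions made for illustration only; they are not definitions taken from the text.

```python
def am(s, p, c=0):
    """Assumed reading of AM: all 1-indexed positions i > c where p begins in s."""
    k = len(p)
    return {i + 1 for i in range(c, len(s) - k + 1) if s[i:i + k] == p}


def concat(s, p, q, c=0):
    """AM(s, P%Q, c) = AM(s, P, c) intersected with [AM(s, Q, c) - |P|]."""
    # Subtracting |P| moves each start of Q back to where P would have to start.
    shifted_q = {i - len(p) for i in am(s, q, c)}
    return am(s, p, c) & shifted_q


subject = "BREADBREED"
# The set-based result agrees with matching the concatenated string directly.
print(concat(subject, "BRE", "AD") == am(subject, "BREAD"))
print(concat(subject, "BRE", "E") == am(subject, "BREE"))
```

The subtraction of |P| is what later becomes a shift of a bit vector, and the intersection becomes a vector AND.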
An alternation is an operation which assigns more than one string to a variable; it can be defined in the following manner.

Def: The alternation of P and Q, represented as P!Q, is defined as M(s,P!Q,c) = M(s,P,c) U M(s,Q,c) and AM(s,P!Q,c) = AM(s,P,c) U AM(s,Q,c), where U denotes the set union.

It is easy to show that the following properties hold under the operations of concatenation and alternation where P and Q are patterns; thus, the properties are listed without proof. Let P be the set of patterns.

P1. P is closed under alternation. If p∈P and q∈P, then p!q∈P and q!p∈P.
P2. Commutative law holds under alternation. If p,q∈P then p!q=q!p.
P3. Associative law holds under alternation, i.e. if p,q,r∈P then (p!q)!r=p!(q!r).
P4. There exists an identity element FAIL in P such that p!FAIL=p.
P5. P is closed under concatenation. If p,q∈P, then p%q∈P and q%p∈P.
P6. Associative law holds under concatenation: (p%q)%r=p%(q%r).
P7. Distributive law holds: p%(q!r)=(p%q)!(p%r) and (q!r)%p=(q%p)!(r%p).
P8. There exists an inverse -p such that p!-p=FAIL.
P9. p%NULL=NULL%p=p.
P10. Any pattern formed by concatenations and alternations.

As a side remark, patterns, under the operations of alternation and concatenation, form a ring with unit element [...] as a result of P1 through P10.

3.2.1 Deviation from SNOBOL4

Readers familiar with SNOBOL4 would probably notice that P2, the commutativity of the alternation operation (p!q=q!p), is not strictly valid in SNOBOL4; in addition, pattern matching in SNOBOL4 returns only the first match, rather than all the matches. The non-commutativity in SNOBOL4 is actually imposed by the fact that SNOBOL4 systems are implemented on serial machines, where matching is done serially.
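Because alternation is defined as a set union, the order of the alternates cannot change the result under the algebra of Section 3.2. The Python sketch below spot-checks a few of the listed properties. It models a pattern as the set of strings obtained by fully expanding its alternations, which is a modelling assumption of this sketch rather than the thesis's representation; the names `lit`, `alt`, `cat` and `am` are hypothetical, and the inverse of P8 is not modelled.

```python
from itertools import product

FAIL = frozenset()        # matches nothing: identity for alternation (P4)
NULL = frozenset({""})    # null string: identity for concatenation (P9)


def lit(text):
    """A simple pattern consisting of one string."""
    return frozenset({text})


def alt(p, q):
    """P!Q: alternation as set union."""
    return p | q


def cat(p, q):
    """P%Q: every alternate of P followed by every alternate of Q."""
    return frozenset(a + b for a, b in product(p, q))


def am(s, pattern, c=0):
    """All 1-indexed positions i > c where some alternate of the pattern begins."""
    return {i + 1
            for a in pattern
            for i in range(c, len(s) - len(a) + 1)
            if s[i:i + len(a)] == a}


b, bre, e, a, d = (lit(x) for x in ("B", "BRE", "E", "A", "D"))

print(alt(b, bre) == alt(bre, b))                          # P2
print(cat(b, alt(e, a)) == alt(cat(b, e), cat(b, a)))      # P7
print(alt(b, FAIL) == b, cat(b, NULL) == b)                # P4, P9

# The pattern ST of Section 2.3.3, fully expanded.
st = cat(cat(alt(b, bre), alt(e, a)), d)
print(sorted(st), am("BREAD", st))
```

In this model the order of alternates is simply not represented, which is the situation a parallel matcher is in; a serial SNOBOL4 matcher would try the alternates left to right instead.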
In a serial implementation, the users would arrange the alternates in the order of the likelihood of successfully matching, to increase the probability of reducing the execution time. With the use of parallel processors, the lifting of the restriction on commutativity would be an advantage. A special mode can be implemented to force the execution of alternation in a sequential manner as defined in SNOBOL4 if the order is important. The fact that only the first match is sought in SNOBOL4 can also be regarded as a victim of the serial implementation. Again, in our proposed machine, the programmer can decide whether he wants only the first match or all of the matches. Hardware which could detect the first match is shown in the next section.

3.3 Time Bound for Pattern Matching Algorithm in SNOBOL4

Pattern matching in SNOBOL4 is done in almost an ad hoc manner. The algorithm for pattern matching used in SNOBOL4, given in Section 2.3.3, can easily be implemented using stacks. Due to the unnecessary backtracking of the algorithm, the time required to match two simple strings may be as much as O(pq), as in the case of matching the pattern A**pB against the subject string A**qB (where O indicates "in the order of", and A**p denotes a string of p characters of A). With a slightly more complex pattern, such as one with m alternates, it would require as many as mpq comparisons. When concatenation is introduced, such as p=(A1!A2!...!Am)(B1!B2!...!Bn), assuming |Ai|=|Bj|=k, and the length of the subject string is s, the time increases to 2kmns. As pattern matching is the central feature and a very time consuming process, it is of paramount importance to speed it up. Faster serial pattern matching algorithms for one pattern and one subject string do exist [KNU74], requiring processing time O(m+n) for a subject string of length n and a pattern of length m.
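To make the O(pq) bound of Section 3.3 concrete, the Python sketch below counts the character comparisons made by a plain left-to-right matcher that restarts at each cursor position. It is a simplified stand-in for the backtracking SNOBOL4 matcher, not a reproduction of it, and the function names are hypothetical.

```python
def count_comparisons(subject, pattern):
    """Return (1-indexed first match position or None, character comparisons made)."""
    comparisons = 0
    for start in range(len(subject) - len(pattern) + 1):
        matched = True
        for j, ch in enumerate(pattern):
            comparisons += 1
            if subject[start + j] != ch:
                matched = False
                break  # abandon this cursor position and restart one further right
        if matched:
            return start + 1, comparisons
    return None, comparisons


def worst_case(p, q):
    """Pattern A**p B against subject A**q B, the case quoted in the text."""
    return count_comparisons("A" * q + "B", "A" * p + "B")


for p, q in [(3, 5), (10, 20), (50, 100)]:
    position, comparisons = worst_case(p, q)
    # Each of the q-p+1 cursor positions costs p+1 comparisons in this case.
    print(p, q, position, comparisons, (q - p + 1) * (p + 1))
```

With p held at about half of q, the count grows roughly as q squared over four, which is the quadratic behaviour the bound describes.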
An algorithm for patterns with alternates also exists [AHO75], requiring processing time proportional to O(n + sum of the lengths of the alternates), where n is the length of the subject string. Both of these faster algorithms require some preprocessing. The time for the two faster algorithms quoted here includes the time needed for the preprocessing. The main weakness of these two algorithms is that they both require preprocessing, and storage is required for the tables generated during preprocessing. In the next sections, we will present a scheme which does not require any preprocessing.

3.4 Pattern Matching Network

In general, a pattern P in SNOBOL can be expressed as

$$P = (A_{11} \,!\, A_{12} \,!\, \cdots \,!\, A_{1n_1})\ (A_{21} \,!\, \cdots \,!\, A_{2n_2})\ \cdots\ (A_{m1} \,!\, \cdots \,!\, A_{mn_m})$$

where each Aij is a string, '!' denotes alternation, and the space between ')' and '(' denotes concatenation. One may view the statement as analogous to a product of sums. If fully expanded, the above equation is

$$P = A_{11}A_{21}\cdots A_{m1} \,!\, A_{11}A_{21}\cdots A_{m2} \,!\, \cdots \,!\, A_{1n_1}A_{2n_2}\cdots A_{mn_m}$$

Given a subject string T such that T=t(1)t(2)...t(s), let us define the pattern matching problem as the problem of finding post-cursor positions for some c or all c such that 1<=c<=s. We shall refer to a machine which can perform this task as a pattern matching network (PMN), as shown in Fig. 3.4-1. A sub-machine of this network, M(T,A), which takes the input text t(1)...t(n) and one simple pattern a(1)...a(k) as inputs and produces a bit string or vector of 0's and 1's m(1)...m(s-k+1), called the post-cursor-position-bit-vector, is called a pattern matcher (PM), with the condition that m(i)=1 whenever t(i)...t(i+k-1)=a(1)...a(k) for some i, 1<=i+k-1<=n.

To illustrate how the machine in Fig. 3.4-1 works, let us limit the pattern to the simplified case below, with only a limited number of concatenations:

$$P = (A_1 \,!\, A_2 \,!\, \cdots \,!\, A_m)\ (B_1 \,!\, B_2 \,!\, \cdots \,!\, B_n) \qquad (3.4.1)$$

$$\phantom{P} = A_1B_1 \,!\, A_1B_2 \,!\, \cdots \,!\, A_mB_n$$

By definition and using the properties of patterns in Section 3.2,

$$M(s, P, c) = \bigcup_{i=1}^{m} \bigcup_{j=1}^{n} \Big( M(s, A_i, c) \cap [\,M(s, B_j, c) - |A_i|\,] \Big)$$
The last equation above suggests that the pattern matching of a complex pattern (Eq. 3.4.1) can be broken up into (1) simple pattern matching such as M(s,A1,c) and M(s,B2,c), etc., where each of A1,...,Am, B1,...,Bn represents a simple pattern, and (2) appropriate subtractions, and union and intersection operations. Step 1 can be solved by using PMs, where each PM takes a subject string and a simple pattern as inputs, as shown in Stage I of Fig. 3.4-1. Since the outputs of the PMs are bit vectors, subtraction, intersection and union may be replaced by shifting, ANDing, and ORing respectively, as shown in Stages III through V of Fig. 3.4-1. An example connection for a particular pattern is given in Fig. 3.4-2. Note that in an effort to save gates and storage, we do not design the PMN so that it works with the expanded form A1B1!...!AmBn as inputs, where we use the notation AiBj to represent the concatenation of Ai and Bj. The outputs of Stage I are strings of bits, and the rest of the stages work with these strings of bits instead of strings of characters or indices, resulting in additional savings in gates and storage.

[Figure: block diagram of the PMN. The text T and the simple patterns feed the PMs, whose output vectors pass through the Broadcast, Align (shift), Vector AND, and Vector OR stages.]

Broadcast: broadcast each output vector M(T,Ai) to n places, and each M(T,Bi) to m places.
Align: shift each M(T,Ai) that is to be ANDed with M(T,Bj) in the next stage |Bj|-|Ai| positions to the right if |Bj| > |Ai|; else to the left.
Vector AND: vector AND M(T,Ai) and M(T,Bj) for all i and j.
Vector OR: OR the resulting vectors of the previous stage to form one vector.

Fig. 3.4-1 The different stages of PMN

[Figure: connection diagram.]

Fig. 3.4-2 Connection for pattern (A1!A2!A3!A4)(B1!B2!B3!B4)

3.4.1 Stage I

Three different ways to implement Stage I are considered:

1. Maximum Parallel (MP)
2. Pattern Serial (PS)
3. Serial using Knuth, Morris, and Pratt's algorithm (SKMP)

Method 1 (MP)

This method involves comparing the pattern in every possible position in relation to the subject string simultaneously. This is the most parallel way conceivable. Conceptually, this method involves comparing every character of the pattern with every character of the subject string and producing a bit map as shown in Fig. 3.4.1-1. The next step is to AND all these bits in groups as outlined by the ellipses. The time bound for this is asymptotically equal to log2 s, where s is the length of the subject string.

[Figure: bit map of pattern characters against subject characters, with groups of bits outlined by ellipses.]

Fig. 3.4.1-1 Maximal parallel way of matching

Method 2 (PS)

This is almost like Method 1 except that only one character of the pattern is compared with the whole subject string at a time, or with a section of the subject string at a time. One may think of this as Method 1 compressed in space and expanded (iterated) in time in the 'pattern' dimension. The speed of this method is very close to that of Method 1, especially for short patterns. Using this method, one may take advantage of the fact that in the real world, the pattern is usually short.

Method 3 (SKMP)

This is an implementation of the method of [KNU74, AHO75]. The time required is proportional to s plus preprocessing time.

3.4.1.1 Comparison of the Methods

Among the three methods, MP represents the fastest method with the highest gate requirement. At the other extreme, SKMP is the slowest and requires the least number of gates. Method 2, PS, lies between MP and SKMP in terms of gates and time. To further compare the merits of these three methods, we calculated the values of gate*time for all three. The results show that gate*time remains approximately constant for all these methods; thus, we are unable to select one of the above methods and incorporate it into our design by basing our decision solely on the gate*time criterion.
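Method 2 (PS) can be sketched in software by keeping one bit per subject position and ANDing in the result of one pattern character per step. In the Python sketch below, a list stands in for the bit vector, and the list comprehension models comparisons that the hardware would perform simultaneously. The function name and the convention that a 1 marks the position where a match begins are assumptions based on the bit-vector description in Section 3.4.

```python
def pattern_serial(text, pattern):
    """Return a bit vector with bits[i] == 1 where pattern begins at text[i]."""
    n = len(text)
    bits = [1] * n  # every position is a candidate until a comparison rules it out
    for j, ch in enumerate(pattern):  # one step per pattern character
        # Conceptually parallel: position i compares text[i + j] with the
        # broadcast pattern character; positions running off the end fail.
        bits = [b & (1 if i + j < n and text[i + j] == ch else 0)
                for i, b in enumerate(bits)]
    return bits


vector = pattern_serial("abcdefgh", "def")
print("".join(str(b) for b in vector))
```

The number of steps depends only on the pattern length, which is why the method stays close to Method 1 when patterns are short.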
Eventually, PS, Method 2, was selected for the following reasons: (1) the performance is very close to that of Method 1 (MP), especially for short patterns, (2) preprocessing is not necessary, and (3) Method 1 is very rigid: it can only process strings of a fixed length; strings of length greater than it is designed for require additional messy work. On the other hand, Method 2 can process strings of any length with equal ease. Method 2 can be regarded as the golden mean between the extremes. The design is given in the following sections.

3.4.2 Design of Stage I

The function of the PM, Fig. 3.4-2, is such that given the character string inputs p(1)...p(k) and t(1)...t(n), the PM can produce b(1)...b(n). The bit vector b(1)...b(n) is called the post-cursor-position-bit-vector, whereby b(i)=1 if p(1)...p(k)=t(i)...t(i+k-1) and 1<=i [...] >s, then b(1)...b(n) can be obtained in at most n/s major cycles with the inputs t(1)...t(n) and p(1)...p(k).

Sketch of Proof. t(1)...t(s)...t(s+k) are taken in the first major cycle to produce b(1)...b(s). t(s)...t(2s+k) are taken to produce b(s+1)...b(2s) at the second major cycle, and so on. It is easy to convince oneself that at most n/s major cycles are needed.

Clearly, to produce b(r)...b(r+s-1), for some r such that 1 [...]

    [...] length of pattern;
    read out D's. The bit vector formed by the content of D(1)...D(s) is a post-cursor-position-bit-vector for t(r)...t(r+s-1);
    set D's;
    r := r+s;
    until length of pattern > length of remaining subject string t(r)...

The working of the PM can best be illustrated by an example. Let us use the subject string abcdefgh and the pattern def. Fig. 3.4.2-2a is a snapshot of the content of the registers before and after the first character comparisons.
After the first minor cycle, the contents of the X registers are shifted left one character position, the new character e of the subject string is loaded into X(s), and the next character of the pattern, e, is also loaded into register P (Fig. 3.4.2-2b). The routine of comparison and updating continues. Fig. 3.4.2-2d marks the beginning of the second major cycle, since the content of D represents the post-cursor-position-bit-vector for the substring abcdef. The bit vector in D, which is 0001, indicates that there is a match at position 4 of the subject string. In Fig. 3.4.2-2d, the contents of the X registers are shifted left k-1 positions (where k is the length of the pattern), with successive characters of the remaining subject string entering X(s) as it is vacated. Matching begins for this new section of the subject string as it did in Fig. 3.4.2-2a. The shift which takes place after Fig. 3.4.2-2d introduces a null character into X(s), as no more characters are available from the subject string. The whole process terminates at Fig. 3.4.2-2f, when the length of the string in the X registers is less than k. The complete bit string obtained is 00010000.

Fig.        content of X    content of P   content of D   content of D
            (X(1)...X(s))                  before         after
                                           (D(1)...D(s))
3.4.2-2a    a b c d         d d d d        1111           0001
3.4.2-2b    b c d e         e e e e        0001           0001
3.4.2-2c    c d e f         f f f f        0001           0001
3.4.2-2d    e f g h         d d d d        1111           0000
3.4.2-2e    f g h Λ         e e e e        0000           0000
3.4.2-2f    g h Λ Λ         f f f f        0000           0000

Λ denotes the null character.

3.4.2.1 Speed

Using Schottky TTL, each level of AND-OR gate delay is approximately 10 nsecs. Each minor cycle is thus about 70 nsecs.

3.4.3 Stage II

Denote the output of each PM of Stage I by M(T,Ai), where T is the text or subject string and Ai is the pattern. The function of Stage II is to broadcast the output vector of each M(T,Ai) to n places and the output of each M(T,Bj) to m places.

3.4.4 Stage III

It is necessary to shift each vector M(T,Ai) that is to be ANDed with M(T,Bj) in Stage IV by |Bj|-|Ai| places to the right (or to the left if |Bj|-|Ai| < 0), to ensure that vector-ANDing in Stage IV results in a vector containing a '1' bit wherever Ai concatenated with Bj matches the subject string T. Denote the shifted result of this stage by S(M(T,Ai)).

3.4.5 Stage IV

Vector-AND S(M(T,Ai)) with M(T,Bj) for all combinations of i and j, where 1 <= i <= m and 1 <= j <= n.

3.4.6 Stage V

Vector-OR the results of Stage IV. The m*n vectors of Stage IV are reduced to one vector. The output is the post-cursor-position-bit-vector of the pattern match of the subject string T and the pattern (A1 | ... | Am) (B1 | ... | Bn).

3.4.7 Stage VIa

This stage converts the binary vector into a vector of indices, using recurrences similar to the 1's position count of Chen and Kuck [CHE75].

3.4.8 Stage VIb

This is the first-match recognizer, which consists simply of a tree of OR gates and produces a 1 if any of its inputs is a 1. If this output is 1 and the PMN is operating in first-match-only mode, it implies the presence of a match in this major cycle, and the task of the PMN is considered completed without further processing.

3.5 Gate, Time, and Gate*Time

Estimates of the number of gates, the number of gate delays, and the product of the two are given in Fig. 3.5-1 through Fig. 3.5-6 for Stage I of Method 1 and Method 2, as a function of text length and pattern length. The text length denotes the length of the subject string the processor can accommodate at one time; it is likened to the word length of an arithmetic processor. Fig. 3.5-7 through Fig. 3.5-12 show estimates of total gates, delay, and the product of the two for Stage I through Stage VI of Method 1 and Method 2. The pattern assumed is of the form (A1 | ... | An) (B1 | ... | Bn), with |A1| = ... = |Bn| = k.

[Figs. 3.5-1 through 3.5-12: plots of GATE, TIME, and GATE*TIME against text length and pattern length. The plotted data are not recoverable from the scan.]

4. MACHINE DESIGN

4.1 Overall Organization

Based on the characteristics of SNOBOL programs, we have come up with the following list of items which a machine should have in order to speed up string processing.

1. Since memory reclamation is one of the most frequent activities, it is desirable that the memory unit have a memory reclamation processor (garbage collector). This garbage collector, synchronized with the other memory accesses to avoid races [STE75], would mark, relocate, update, and reclaim unused memory space continually, independently of, and in parallel with, the other processors except for synchronization.

2. A pattern matching network, which would speed up most of the pattern matching processes.

3. Program instructions and data should be kept separate to abate access conflicts.

4. Either a type conversion unit to convert between different types of data representation, or an ALU which performs arithmetic in BCD.

5. A multi-function arithmetic logic unit.

6. Facilities built into the memory units for the transfer of data within the memory.

7. A control unit which is capable of synchronizing all the parallel activities and avoiding incorrect sequencing.

8. An IF tree processor [DAV72b] to evaluate decision trees. Results of program analysis show that IF trees of considerable length exist, that most of the trees are sparse, and that [illegible]% of the IF trees analyzed are less than six levels deep. A decision processor of six-level capability would thus be sufficient. Trees greater than six levels could be solved by mapping into free nodes (folding) or by repeated use of the processor (iteration).

4.2 Garbage Collection

SNOBOL is a "typeless" language, meaning that it is declaration-free and that storage is not preallocated for variables but rather is allocated on demand. When storage is no longer needed, it is freed automatically by a process called garbage collection, or memory reclamation. Storage is no longer needed when, for example, a variable is assigned a new string value; consequently, the storage for the old string is no longer in use, assuming that it is not referred to by anything else.

Allocation of memory in SNOBOL is very simple. Memory is divided into two regions: the allocated region and the free region. When storage is needed, it is taken from the beginning of the free region.

The basic garbage collection algorithm [GRI72] works as follows:

1. Mark: identify the storage areas which are no longer needed (or are inaccessible).

2. Relocate: compact the accessible cells into a contiguous region.

3. Update: pointer references to relocated cells have to be updated.
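The three phases above can be sketched in a few lines. This is a minimal illustration, not the thesis's collector: the heap model (a list of cells, each holding a value and a list of pointer indices), the root list, and the name `collect` are all assumptions made for the example, and the sketch runs serially rather than in parallel with other processors.

```python
def collect(cells, roots):
    """Mark, relocate, update on a toy heap.

    cells: list of (value, pointers) tuples; pointers are indices into cells.
    roots: indices of cells that the running program can reach directly.
    Returns the compacted heap and the relocated roots.
    (Hypothetical model, used only to illustrate the three phases.)
    """
    # 1. Mark: find every cell reachable from the roots.
    marked = set()
    stack = list(roots)
    while stack:
        i = stack.pop()
        if i not in marked:
            marked.add(i)
            stack.extend(cells[i][1])

    # 2. Relocate: slide accessible cells down into a contiguous region,
    #    remembering each cell's new address.
    new_addr = {}
    for old in range(len(cells)):
        if old in marked:
            new_addr[old] = len(new_addr)
    survivors = [cells[old] for old in sorted(marked)]

    # 3. Update: rewrite every pointer to use the new addresses.
    compacted = [(value, [new_addr[p] for p in ptrs]) for value, ptrs in survivors]
    new_roots = [new_addr[r] for r in roots]
    return compacted, new_roots


heap = [("a", [2]), ("dead", [0]), ("b", []), ("dead2", []), ("c", [0])]
compacted, new_roots = collect(heap, [4])
print(compacted)   # [('a', [1]), ('b', []), ('c', [0])]
print(new_roots)   # [2]
```

After compaction the free region begins at index `len(compacted)`, which is where the next allocation would be taken from.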
Garbage collection is a time-consuming process. Typically, a program spends about 30% of its execution time in garbage collection [GIM76, STE75]. Through the use of parallelism, we propose to overlap garbage collection with actual execution of the SNOBOL program. The garbage collector in our proposed machine is essentially a processor which executes the three phases, (1) mark, (2) relocate, and (3) update, in parallel with the other processors. The most important consideration for this processor is its synchronization with the other processors; a similar problem for the case of LISP [MCC62] has been discussed in some detail by Steele [STE75]. Note that in addition to reclaiming unused space, continuous garbage collection has the advantage of improving program locality in a paged memory system.

4.3 IF Tree Processor

SNOBOL programs tend to have a large number of decision statements compared to numerical programs such as those written in FORTRAN. In order to speed up the execution of decision statements, an IF tree processor [DAV72a] is used. The IF tree processor receives the conditional result set as input, whereby each bit in the conditional result set represents the result of a decision statement. The output of the processor is the identification of the actual path traversed due to the results of all these decision statements. By executing all the decision statements in parallel, passing all these results to the IF tree processor, and then choosing the correct results depending on the path, one is able to speed up the execution of decision statements which otherwise have to be processed sequentially. The compiler algorithm for the IF tree processor is sketched in Section 2.4.4.

4.4 Memory

Before discussing storage requirements and design, it is necessary to consider briefly the internal data representation of SNOBOL.

The basic unit of data representation in SNOBOL is a descriptor, which is a uniform representation of all SNOBOL values and locations. The descriptor can be thought of as the basic 'word' of the SNOBOL system. Every value descriptor contains a type code to indicate the datatype, and the associated value. Values with limited field length, such as numbers, are directly represented in the descriptor. Values with unlimited length, for example strings, are represented indirectly by a pointer to a memory location and perhaps an offset from that location. Consequently, memory must be addressable both byte-wise and descriptor-wise, just as in computers whose locations can be addressed both as a byte and as a word, where a word is a multiple of bytes. Since the size of a descriptor is far bigger than a byte, a descriptor is thus a multiple of bytes.

Memory is divided into two distinct units, program memory and data memory, in order to enhance overlapping in memory access. The program memory stores the code generated by compiling a SNOBOL source program. The data memory holds constants and natural variables generated during the compiling and execution phases.

4.5 Program Memory

The program memory consists of a hierarchy of storage devices [KUC70, MAT72]: the cache memory, primary program memory, and external storage such as CCD, magnetic bubble memory, disks, or drums.

4.6 Data Memory

The data memory is made up of a number of memory banks. As transfer within the memory (stemming from concatenation, garbage collection, etc.) is important in SNOBOL, special consideration is given in the design to speeding up the move of a sequence of data from one location to another.

The general organization of a single memory bank is shown in Fig. 4.6-1. The smallest addressable entity is a byte, and the largest addressable entity is a descriptor.

[Fig. 4.6-1 A memory bank]

For a transfer within a bank, data are read out to the output register, gated into the input register, and a write cycle is initiated. Transfer between memory banks may be performed via the fast inter-memory bus. The inter-memory bus, Fig. 4.6-2, is a simple barrel shifter, since the transfer of a string from one memory bank to another within a multi-bank memory system can be regarded as a shift when looking at the memory ports, as shown in Fig. 4.6-3.

[Fig. 4.6-2 Inter-memory bus]

[Fig. 4.6-3 The effect of moving a string]

4.7 Instruction Set

Speed is by no means the only concern which remains for string processing languages; implementation of such a language, related software, and compilation of programs are also far more complicated on a conventional machine. In ALGOL, a statement of the form A=B+C can easily be compiled into

    LOAD  B
    ADD   C
    STORE A

whereas in SNOBOL, no similar sequence of codes can be generated at the machine level for a string operation such as pattern matching. The gap between the software and the hardware has to be bridged by sophisticated software. With the proposed machine organization, we can have an instruction set which includes the ones mentioned below.

The instruction set of the machine is given in Table 4.7-1. An address usually designates a processor register or memory location. In addition to the instructions frequently found in a conventional machine, it has some instructions for pattern construction and pattern matching.

ARITHMETIC/LOGIC
    ADD X1,X2     X2 := X2+X1
    SUB X1,X2     X2 := X2-X1
    MPY X1,X2     X2 := X2*X1
    DIV X1,X2     X2 := X2/X1
    INC X1        X1 := X1+1
    DEC X1        X1 := X1-1
    NEG X1        X1 := -X1
    NUM X1        convert X1 to a number
    AND X1,X2     X2 := X1 AND X2
    OR  X1,X2     X2 := X1 OR X2
    XOR X1,X2     X2 := X1 XOR X2
    NOT X1        complement X1

FLOW OF CONTROL
    GOTO X        unconditional branch
    [mnemonic illegible] X,C
                  branch to X if the condition specified in C (EQ, NE, LT, LE, GT, GE) matches the condition specified in the condition code register

DATA TRANSFER
    MOV X1,X2     transfer value in X1 to X2

[The remainder of the table is missing from the scan.]

5. EMPIRICAL RESULTS

5.1 Introduction

A sensible way to speed up or design a machine is to study the characteristics of the programs that are to be run on that machine. To design a SNOBOL machine, it is therefore desirable to study some of the statistics of typical SNOBOL programs. Of course, there is no such thing as a typical program or a typical programmer; therefore, we cannot trust any measurement to be very accurate. Nevertheless, it would give us an indication of what these typical machine parameters are.

Static and dynamic statistics are given in this section. Dynamic statistics are generally more useful for computer designers, although they are more difficult to get. Dynamic statistics usually include the relative frequency of usage of different operations and the amount of time different operations take. When the relative frequency of usage of different operations is known, one knows which operations the speedup attempt should emphasize. When the speed of different operations is known, one has a better idea of how to balance the computer system.

5.2 Static Frequency Count

The results of the analyses of a sample of SNOBOL programs are presented in this section. These programs were unprotected files stored on disks at the Computer Services Office of the University of Illinois at Urbana-Champaign. This sample of 45 programs, written by a wide variety of people, includes a good cross section of application programs; some examples are those used for text or file processing, job control language processing, artificial intelligence research, data entry, language preprocessing, etc.
The total number of statements in the programs and the program size distribution are given in Table 5.2-1 and Table 5.2-2 respectively. From these distributions, one can perhaps obtain an estimate of the size of program memory needed.

The static frequency counts of different types of operators are summarized in Table 5.2-3. One would intuitively expect this result: arithmetic operators appear less frequently than string operators.

Frequency counts of different keywords are shown in Table 5.2-4. ANCHOR, which controls whether pattern matching is to be performed in anchored mode, is the most frequently occurring keyword.

According to our analyses, 593 out of the 2520 statements contain decision gotos. This result implies that 23% of the statements are decision statements. The occurrence of different types of 'goto' is illustrated in Table 5.2-5.

The number of levels in each IF tree [DAV72b] is also examined. Table 5.2-6 lists the result of the study. The average number of levels is approximately five, and the median is four. One further observation is that the IF trees are generally very sparse. These data pertaining to IF trees should prove useful in the design of the IF processor.

    Total number of executable statements               2520
    Total number of comment statements                   536
    Total number of statements                          3056
    Total number of programs examined                     45
    Total number of labels                               597
    Total number of statements longer than 80 chars     [illegible]
    Total number of statements with complex pattern       45

Table 5.2-1 Characteristics of programs examined

    Number of statements      Frequency
    (excluding comments)
    1-50                      32
    51-100                     6
    101-150                    3
    151-200                    2
    201-250                   [not recoverable]
    251-300                   [not recoverable]
    301-350                   [not recoverable]

    (One further frequency, 1, is legible in the scan, but its row cannot be determined.)

Table 5.2-2 Program size

    Type of operators                  Percent of all operators
    concatenation                      10.6%
    alternation                         3.9%
    pattern matching                   10.0%
    assignment (=)                     60.[remainder missing from the scan]
    arithmetic (+, -, *, /, LE, ...)   [missing from the scan]
    [earlier rows missing from the scan]
    pattern matching     16.1%
    I/O                  11.2%
    others               16.0%

Table 5.3-1 Execution time distribution

6. SYSTEM PARAMETERS

6.1 Introduction

The following sections describe the system parameters of the machine described in Chapter 4.

6.2 Descriptor Size

A descriptor, as implemented in SNOBOL on the IBM 360, occupies a double word (64 bits). The CDC 6000 descriptor is one word (60 bits). We choose a descriptor size of 64 bits since it seems to be the smallest descriptor size that is a power of two and is also adequate for its purpose, as manifested by the implementations of SNOBOL for the IBM 360 and the CDC 6000.

6.3 Descriptor Format

All data objects in SNOBOL are represented by descriptors. Descriptors in our machine have the form

    F | T | V

The F field is a flag which identifies the type of the descriptor. The F field contains 8 bits. The T field represents the size of the object in the V field or pointed to by the V field. Twenty-four bits are allocated to the T field, allowing it to indicate the length of a string of up to 2**24 bytes. The 32 bits allocated to the V field allow it to represent integers in the range -2**31 to (2**31)-1. Real numbers may have the approximate range of 10**-78 to 10**75. Pointers (addresses), when used in this field, may have the range of 0 to (2**23)-1.

6.4 Character Size

We propose that our machine use 8-bit characters, since this size is conveniently a power of two and provides a character set large enough to be useful in text processing. An 8-bit character will also provide a character set which includes a special character which matches any character and a character which does not match any character.

6.5 ALU Processor

The ALU processor is a multifunction processor with one unit for division, one for multiplication, and six adder units, since 99% of the arithmetic (such as addition, subtraction, test for greater than, equality, etc.) requires the adder.
In the programs analyzed, there are only five statements which require multiplication or division; thus, one unit of each is used in our machine. Six adders are chosen for the sake of convenience (so that the total number of functional units adds up to eight). A more accurate reflection of the number of adders needed can be obtained by transforming the programs and counting the number of additions and subtractions that can be carried out in parallel. Several logic processors (AND and OR) are used in the pattern matching network.

6.6 IF Tree Processor

Statistics from program analysis indicate that approximately 25% of the statements in a program are decision statements, which agrees very well with other languages, such as COBOL [STR74] and GPSS [DAV72b]. The average number of levels in the decision tree is five, and the tree is very sparse; thus, a decision processor that can handle a six-level decision tree should be sufficient. A six-level decision processor can handle at least six decision statements at once. In view of this grouping of six decision statements into one, the percentage of decision statements in a program may be regarded as 4%. Consequently, one IF tree processor for our machine is sufficient.

6.7 Program Memory

As can be seen in Table 5.2-2, SNOBOL programs are generally small in size. To obtain an estimate of the number of words needed for the program memory, we assume that (1) most programs are less than 350 statements long, and (2) a safe estimate is that 10 words (64 bits each) of machine instructions are needed for each statement. Using these assumptions, a program memory of 4K words will be adequate.

6.8 Data Memory

The number of data memory banks required is determined by the memory bandwidth required. The PMN requires 9 new characters from the memory per minor cycle (70 nsecs), one character for the text and eight for the patterns; consequently, the necessary bandwidth is 9 characters per 70 nsecs.
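The bank count implied by this bandwidth can be checked with a short calculation. The sketch below assumes a memory cycle of one microsecond and that each bank delivers one 64-bit descriptor (8 characters) per cycle; the function and parameter names are illustrative only.

```python
def banks_required(chars_per_cycle, cycle_ns, memory_cycle_ns, chars_per_access):
    """Ratio of the bandwidth the PMN demands to what one memory bank supplies."""
    demand = chars_per_cycle / cycle_ns          # characters per nsec wanted by the PMN
    supply = chars_per_access / memory_cycle_ns  # characters per nsec from one bank
    return demand / supply


# 9 characters per 70-nsec minor cycle; the 1000-nsec memory cycle and the
# 8 characters (one 64-bit descriptor) per access are the assumed bank parameters.
ratio = banks_required(9, 70, 1000, 8)
print(f"{ratio:.2f} banks")  # prints "16.07 banks"
```

The ratio is marginally above sixteen, so a sixteen-bank memory meets the requirement only approximately under these assumptions.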
Assuming that a one-microsecond memory is used, a simple calculation shows that sixteen banks of memory, each delivering one descriptor of 64 bits (8 characters), are needed to fulfill the memory bandwidth requirement.

6.10 PMN

Based on the observation of pattern matching in SNOBOL, we propose a PMN that can take patterns of the form

    pattern = (A1 | A2 | A3 | A4) (B1 | B2 | B3 | B4)

where Ai and Bj are strings. The connections can be changed to handle patterns in other forms. A PMN of this complexity contains eight pattern matchers, and can also be used as eight separate pattern matchers by intercepting the output of Stage I. The PMN also contains 64 AND-processors, each of which can AND two 64-bit vectors, producing a 64-bit vector as a result. In addition, the PMN has 32 OR-processors. The parameter s (Section 3.4.2) of the pattern matcher is chosen to be 128.

6.11 Expected Speedup

With the autonomous garbage collector, garbage collection time is expected to approach zero.

Concatenation, in SPITBOL, requires 0.05+0.000[?]n milliseconds on an IBM 360, where n is the number of characters involved. In our system, only one microsecond is needed to move 128 bytes; thus, the concatenation time of our system is approximately n/128 microseconds. The speedup (time for SPITBOL / time for our system) is about 640 for large n.

Pattern matching time is the sum of the times for subject evaluation, pattern building, scanning, and object evaluation and replacement. The replacement operation is approximately equivalent in time to two concatenations, and we have already seen the speedup achievable. In SPITBOL, a pattern match involving DUPL('A',100) and the pattern ('A' 'B') 'C' (the operators of the statement are illegible in the scan) requires 12 milliseconds plus overhead. Using our machine, the time is approximately 30 nanoseconds ([?]0 gate delays * 10 nanoseconds per gate). The speedup in this case is 1.2*10**6.

7. CONCLUSION

We have pointed out some architectural requirements for a language like SNOBOL, proposed an organization for the machine, and suggested some transformations and compiler algorithms that can be used on SNOBOL. We have also estimated the amount of speedup achievable using the special hardware mentioned earlier.

Several areas are opened up for more research by this project. First of all, more dynamic statistics on SNOBOL are desirable. Secondly, some simulations can be done to determine more accurately the speedup achievable. Simulation and program analysis for FORTRAN have been going on for quite a few years [KUC72].

LIST OF REFERENCES

[AHO75] Aho, A.V. and Corasick, M.J. "Efficient String Matching: An Aid to Bibliographic Search," Comm. ACM, vol. 18, 6.

[BAC59] Backus, J.W. "The Syntax and Semantics of the Proposed International Algebraic Language of the Zurich ACM-GAMM Conference," Proc. International Conf. on Information Processing, UNESCO (1959), 125-132.

[CHE75] Chen, S.C. and Kuck, D.J. "Combinational Circuit Synthesis with Time and Component Bounds," Dept. of Comp. Sci. Rep. No. [illegible], Univ. of Illinois, Urbana.

[COH65] Cohen, K. and Wegstein, J.H. "AXLE: An Axiomatic Language for String Transformations," CACM 8, 11.

[DAD65] Dadda, L. "Some Schemes for Parallel Multipliers," Alta Frequenza, 34, 5, 349-356.

[DAV72a] Davis, E.W. "Concurrent Processing of Conditional Jump Trees," COMPCON 72 Digest of Papers, 279-281.

[DAV72b] Davis, E.W. "A Multiprocessor for Simulation Applications," Ph.D. Thesis, Univ. of Illinois at Urbana-Champaign, Dept. of Comp. Sci. Rep. No. 527.

[DIF6?] di Forino, A.C. "String Processing Languages and Generalized Markov Algorithms," Symbol Manipulation Languages and Techniques, Proc. of the IFIP Working Conference on Symbol Manipulation Languages, D.G. Bobrow (ed.), North Holland Publishing Co.
[FIS74] Fischer, M.J. and Paterson, M.S. "String Matching and Other Products," TM-41, Project MAC, MIT, 1974.

[GIM73] Gimpel, J.F. "A Theory of Discrete Patterns and Their Implementation in SNOBOL4," Comm. ACM, 16, 2, 91-100.

[GIM76] Gimpel, J.F. Algorithms in SNOBOL4 (John Wiley & Sons, N.Y.).

[GRI72] Griswold, R.E. The Macro Implementation of SNOBOL4 (W.H. Freeman and Company, San Francisco).

[key illegible] Herstein, I.N. Topics in Algebra (Xerox College Publishing).

[KNU71] Knuth, D.E. "An Empirical Study of FORTRAN Programs," Software 1, 2, 105-133.

[KNU74] Knuth, D.E., Morris, J.H., Jr., and Pratt, V.R. "Fast Pattern [remainder of entry illegible]

[KUC70] Kuck, D.J. and Lawrie, D.H. "The Use and Performance of Memory Hierarchies: A Survey," Software Engineering, vol. 1, Academic Press, 1970, [pages illegible].

[KUC72] Kuck, D.J., Muraoka, Y., and Chen, S.C. "On the Number of Operations Simultaneously Executable in FORTRAN-like Programs and Their Resulting Speedups," IEEE Trans. on Computers, C-21, 12, 1293-1310.

[MAT72] Mattson, R.L. and Traiger, I.L. "Storage Hierarchy Design," COMPCON 72 Digest of Papers, [pages illegible].

[MCC6?] [authors illegible] "[...] Report on the Algorithmic Language ALGOL 60," CACM 6, 1, 1-17.

[key illegible] Poage, J.F. and Polonsky, I.P. The SNOBOL4 Programming Language (Prentice Hall, N.J.).

[key illegible] Stewart, G.F. "An Algebraic Model for String Patterns," M.Sc. Thesis, Dept. of Comp. Sci., Univ. of Toronto.

[STE75] Steele, G.L., Jr. "Multiprocessing Compactifying Garbage Collection," Comm. ACM 18, 9.

[STR74] Strebendt, R.E. "Program Speedup through Concurrent Record Processing," Ph.D. Thesis, Univ. of Illinois at Urbana-Champaign, Dept. of Comp. Sci. Rep. No. [illegible].

[TOW76] Towle, R.A. "Control and Data Dependence for Program Transformations," Ph.D. Thesis, Univ. of Illinois at Urbana-Champaign, Dept. of Computer Science Rep. No. 76-7[illegible].

[entry illegible in the scan]

BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-77-907
Title and Subtitle: Multiprocessor for String Manipulation
Report Date: October, 1977
Author(s): Wing Kai Cheng
Performing Organization Rept. No.: UIUCDCS-R-77-907
Performing Organization Name and Address: University of Illinois at Urbana-Champaign, Department of Computer Science, Urbana, Illinois 61801
Contract/Grant No.: US NSF MCS73-07980
Sponsoring Organization Name and Address: National Science Foundation, Washington, D.C.
Type of Report & Period Covered: Master's Thesis

Abstract: A machine organization is proposed for the fast processing of SNOBOL programs. Measurements of SNOBOL programs are given to support various choices. Since pattern matching is a major time consumer in SNOBOL, special hardware is proposed for it. Several alternative designs were studied in detail and cost-effectiveness measures are given for these.

Key Words and Document Analysis. Descriptors: High-level language architecture; Pattern matching processor; SNOBOL machine; SNOBOL measurements

Availability Statement: Release Unlimited
Security Class (This Report): UNCLASSIFIED
Security Class (This Page): UNCLASSIFIED
No. of Pages: 106