Report No. UIUCDCS-R-72-534

AN ASSEMBLER FOR EFFICIENT FILE MANIPULATION

by

Eugene J. Polley, Jr.

August 1972

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

This work was supported in part by the National Science Foundation under Grant No. US NSF GJ 27^6 and was submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, August 1972.

ACKNOWLEDGMENTS

I wish to express my deep appreciation to all the dedicated men and women who have guided me in my educational pursuits for the past 19 years. In particular, I wish to thank my advisor, Professor David J. Kuck, for his guidance throughout this project.
I also wish to thank Mrs. Vivian Alsip for her patience in typing this manuscript.

TABLE OF CONTENTS

1. INTRODUCTION
2. THE S-LANGUAGE
   2.1 Data Format
   2.2 Coding Format and Continuation Cards
   2.3 Options
   2.4 Reserved Storage
   2.5 Operand Format
   2.6 Word Instructions
   2.7 String Instructions
       2.7.1 Jump Controls
       2.7.2 Instruction List
   2.8 Pseudo Operations
   2.9 Comments
   2.10 Input/Output
   2.11 Errors
3. IMPLEMENTATION OF ASSEMBLER
   3.1 Assembler Background
   3.2 Assembler Overview
   3.3 Column Nine
   3.4 Symbol Table Management and Hashing Scheme
   3.5 Subroutine Descriptions
   3.6 DUMP Instruction
   3.7 Making Changes in the Program
APPENDIX -- FLOW CHARTS OF PROGRAM ROUTINES
LIST OF REFERENCES

LIST OF FIGURES

1. ASCII Character Set
2. Word Instruction Format
3. OP Field Format
4. Bit Patterns for Word Instructions
5. Search Forward and Reverse Model
6. I/O Instruction Object Code Format, Word 1
7. I/O Instruction Object Code Format, Word 2
8. I/O Instruction Object Code Format, Word 3
9. Disk Address Word
10. Conditions of Column Nine
11. Form of Symbol Table Entry
12. PASS1
13. PASS2
14. EVALUATE
15. SCAN
16. INOUT
17. STRINGS
18. STORE
19. LOAD
20. PUNCH
21. DPUNCH
22. SIMPUNCH

1. INTRODUCTION

All of us have looked through a stack of periodicals at one time or another. Maybe we were writing a term paper, or perhaps we were just looking through old issues of Time magazine for a special article that we knew was there without knowing which issue it was in. If we were lucky, the article turned out to be a cover story and the search did not take much time. More often than not, though, we paged through the whole pile and ended up with nothing to show for our efforts but a handful of paper cuts.
The trouble was that even when we were done, we couldn't positively say the article wasn't there, because we hadn't actually read all the articles. But, you say, it's our own fault: we should have gone to the Readers' Guide to Periodical Literature first. That is fine advice if the topic is George Wallace being shot in Maryland, which will undoubtedly be indexed in the Guide. But what if we have a more abstract topic, such as the effect of the Vietnam War on the Russian economy? There is no such category in the Guide. So either you go through all the articles on Russia or, worse yet, you look at all the articles on Vietnam trying to find a mention of the Russian economy. Either way, your chances of finding an article on the topic are very slim.

Imagine, however, being assigned that topic for a report, but with one small difference. Instead of the Readers' Guide to Periodical Literature, you have an information retrieval system at your disposal. The data base for the system consists of all the issues of Time magazine for the past two years. Using the system, you are able to search the articles for words and phrases that appear in the title of an article or in the text itself. For instance, a search on the word "Vietnam" would produce a list of where all the articles on Vietnam had appeared. The list would also include articles that merely mention the word. An article on a sports hero who spoke out against the war, for example, would be in the list as long as it actually contained the word "Vietnam". For this reason, we are careful never to ask for a search in which the number of successful matches between text and search pattern, called "hits", will be large. If we really wanted articles on just Vietnam, for instance, we would search the titles of articles and not the text, which would reduce the number of unwanted hits. How would this help you find the effect of Vietnam on the Russian economy, though?
The supervisor of the information retrieval system is smart. It recognizes that a single word or phrase is not enough to find the articles most people are interested in. Therefore, it allows several patterns to be searched for together within a single sentence, a single paragraph or an entire article. In our case, we want all occurrences of the patterns "Vietnam" and "Russian economic" in the same article. To be on the safe side, we might add variants of "Russia", such as "U.S.S.R.", because only exact matches score hits. Naturally, some of our hits will be extraneous to our topic. The sentence "On his trip to the Soviet Union the President will be accompanied by Vietnam expert, Henry Kissinger, and economic adviser, John Connally, who has recently studied the Russian economic plight," will score a hit. After all the hits have been found, you could ask the system to print a hard copy of the articles on a line printer. Even if no hits are found, the system has helped you, because you can be confident that no article in the magazine touches on your topic. If you were successful, then you obtained your information with far less effort, probably with greater accuracy (i.e. you didn't miss any articles), and with no paper cuts.

The current data base is not nearly as large as the one in the example above, but the example is typical of the file searching problems being researched under Professor David Kuck at the University of Illinois. At present, the data base is a series of 65 scientific papers stored on a 25 million bit, two-surface disk. The computer used for the file searching is a Burroughs D-Machine minicomputer. The machine is microprogrammable, with 1K of 64-bit micro-memory and 4K of 16-bit main memory words (S-Memory). Instructions are stored in main memory, then fetched one at a time and interpreted by a series of micro-instructions. The interpreter was written by Hirohide Yamada [1].
The file searching supervisor itself was written by William Stellhorn. Because main memory is limited, the program relies heavily on overlay structures. It was written in an assembly language called the S-Language. This report is a user's guide to the S-Language and a detailed explanation of its assembler. It is hoped that this paper will enable users to program easily in the S-Language, and other programmers to make additions to the S-Language and its assembler.

2. THE S-LANGUAGE

2.1 Data Format

The memory of the D-Machine is 4096 words long, and each word is 16 bits in length. All instructions are a multiple of 16 bits in length. All numeric data is handled in fullword integer format. Negative numbers are represented in two's (radix) complement form. Therefore, the largest positive number that may be represented on the machine is 32767 and the smallest negative number is -32768. Character data is stored in 8-bit ASCII form, so two characters may be stored in a single word. Each half word (8 bits) is called a byte. For the representation of the ASCII character set, see Figure 1. The bits in a word are numbered from right to left, from bit 0 up to bit 15. The high order byte of a word has the sign bit as its leftmost bit and therefore consists of bits 8-15. The low order byte contains bits 0-7.

2.2 Coding Format and Continuation Cards

Each instruction must start on a new card, and all information must be coded in columns 1-71 inclusive. Any number of continuation cards may be used for a single statement. To indicate that a statement is continued, any mark is placed in column 72; the following card is then interpreted as a continuation card. Coding is free format for all fields except mnemonics and labels. The mnemonic of an instruction must begin in column 10 of the card and must be followed by at least one blank before any operands.
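The column rules of this section can be made concrete with a short sketch. The Python function below is a hypothetical illustration, not part of the assembler. The name `parse_card` and its dictionary keys are our own, and it handles only the first card of a statement, since continuation cards start in column 19 instead. It checks only the positional rules given in this section:

- the statement occupies columns 1-71;
- any mark in column 72 means the statement is continued;
- a label of up to eight alphanumeric characters starts in column 1;
- column 9 is blank;
- the mnemonic starts in column 10 and ends at a blank.

```python
def parse_card(card: str) -> dict:
    """Split the first card of an S-Language statement into its fields.

    Hypothetical helper for illustration. Columns in the text are 1-based;
    Python indexes are 0-based, so column c is card[c - 1].
    """
    card = card.ljust(80)                 # treat a short line as a blank-padded card
    statement = card[:71]                 # columns 1-71 hold the statement
    continued = card[71] != " "           # any mark in column 72 means "continued"

    label = card[:8].rstrip()             # a label occupies at most columns 1-8
    if label.startswith(" "):
        raise ValueError("a label must start in column 1")
    if label and not (label[0].isalpha() and label.isalnum()):
        raise ValueError("a label is alphanumeric and starts with a letter")
    if card[8] != " ":
        raise ValueError("column 9 must be blank")

    rest = statement[9:]                  # the mnemonic begins in column 10
    if rest[:1] == " ":
        raise ValueError("the mnemonic must begin in column 10")
    mnemonic, _, operands = rest.partition(" ")   # a blank ends the mnemonic
    return {"label": label, "mnemonic": mnemonic,
            "operands": operands.strip(), "continued": continued}
```

A complete card reader would also append the text of each continuation card, read from column 19 on, before scanning the operands.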
When a card is continued, the first character on the continuation card should be in column 19. It is an error to begin a continuation card before column 19. Any column after column 19 is, of course, also accepted. Take care when coding continuation cards for character literals: starting the literal after column 19 on the continuation card will introduce unwanted blanks into the literal.

[Figure 1. ASCII Character Set (left half byte in hexadecimal; SP = blank character)]

Labels must start in column 1 and may be up to eight alphanumeric characters long, the first of which must be alphabetic. Combined with the restriction on the starting column for mnemonics, this means that column 9 must be blank. Variable names must also begin with an alphabetic character and may be up to eight alphanumeric characters long. No imbedded blanks are allowed within labels, mnemonics or variable names.

An instruction may be ended by leaving the rest of the card blank after the last operand, or by punching a semicolon (;) after the last operand. If the semicolon is used, it must be preceded by at least one blank. The rest of the card may then be used for a comment, since the assembler ignores it.

2.3 Options

The assembler is designed to produce object code for either the D-Machine or the Burroughs 5500 simulator. The type of output desired is specified with an OPTION statement. The mnemonic 'OPTION' is coded as in any other statement and is followed by any one of the following operands, with the effect listed beside it:

NOPUNCH      no object deck is punched; only a listing is produced (default if no option card is found)
D-MACHINE    object code for the D-Machine is produced
SIMULATOR    object code for the B-5500 simulator is produced

An option statement may occur anywhere in the program. If more than one is found, the last one is used. If no option card is found, the default is NOPUNCH.
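Under the rule that the last OPTION statement wins, with NOPUNCH as the default, choosing the output option takes a single pass over the program. The Python sketch below illustrates it. The function name and the (mnemonic, operand) input format are assumptions. Rejecting an unknown operand with an exception is also our own choice, since the text does not say how the assembler reports one.

```python
VALID_OPTIONS = ("NOPUNCH", "D-MACHINE", "SIMULATOR")

def effective_option(statements):
    """Return the output option in force for a program.

    `statements` is a sequence of (mnemonic, operand) pairs. Hypothetical
    helper: the last OPTION statement wins; NOPUNCH applies when none exists.
    """
    option = "NOPUNCH"                    # default when no option card is found
    for mnemonic, operand in statements:
        if mnemonic != "OPTION":
            continue                      # only OPTION statements matter here
        if operand not in VALID_OPTIONS:
            raise ValueError(f"unknown OPTION operand {operand!r}")   # assumed error handling
        option = operand                  # a later OPTION overrides an earlier one
    return option
```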
The statement is not executable and should not contain a label. A label on such a statement will be unknown to the assembler and will be flagged as undefined if referenced.

The assembler punches core addresses on each object card it produces, but the simulator does not use them. Therefore, when option SIMULATOR is used, STORAGE instructions produce the appropriate number of zero words. The forms of the object decks are as follows:

SIMULATOR: Each word of core is preceded by a blank and a percent sign (%) and is six octal digits long. There may be up to eight words on a card, with the % falling in columns 2, 10, 18, 26, 34, 42, 50 and 58. Columns 65-69 are blank, and column 70 has a slash (/) to denote the end of the code. Columns 71 and 72 are blank, and columns 73 and 74 contain the characters 'SM'. Columns 75-80 contain the memory address of the first word on the card, right justified, in base 10.

D-MACHINE: Columns 1-3 have the memory address, right justified, in hexadecimal. Column 4 is blank. Column 5 contains the character 'S'. Column 6 is blank. Column 7 has the number of data words minus one, in hexadecimal. Column 8 is blank. Columns 9-72 contain the data, punched as the EBCDIC equivalent of its hexadecimal form.

2.4 Reserved Storage

The first 89 memory locations in the S-Memory (0-88) are reserved for special purposes, so code generation always begins at word 89. The locations from 16 to 79 have been named as follows:

Location   Name
16-31      IAR0-IAR15, Interrupt Address Registers
32-47      PTR0-PTR15, Pointer Registers
48-63      CHR0-CHR15, Character Registers
64-79      CTR0-CTR15, Counter Registers

Throughout this paper these locations are referred to by their names rather than their addresses.

2.5 Operand Format

The following conventions are used for operands and addresses in word and string instructions. Assume A is a label on core location 100.

Operand    Meaning
A          Contents of core location 100
=A         Value of A, i.e. 100
*A         Contents of the core location whose address is in core location 100 (single indirect addressing)
**A        Double indirect addressing
<A>        High order byte address of A, i.e. 200
<<A>>      Low order byte address of A, i.e. 201

Operand    Meaning
150        Contents of core location 150
*150       Single indirect address
**150      Double indirect address
=25        The number 25
<150>      High order byte address, i.e. 300
<<150>>    Low order byte address, i.e. 301
=:1A:      Hexadecimal number, 26 (if the number specified does not fill the field, zeros are supplied on the left)
'DON''T FORGET 2 QUOTES'
           The character string DON'T FORGET 2 QUOTES; a single quotation mark is put into a character string by writing two single quotation marks
$          The value of the location counter

Anywhere the label A appears in the operand formats above, it may be replaced by A plus constant, A minus constant, $, $ plus constant or $ minus constant, where 'constant' is a base 10 number.

2.6 Word Instructions

The basic form of the Word Instructions is:

[label]  OP code  Operand 1, Operand 2, Operand 3

Most Word Instructions use a three-address format, in which an operator is applied to operands one and two and the result is placed in operand three. For the bit patterns of the Word Instructions, see Figures 2-4. In the list below, when two or more instructions are grouped together, the last mnemonic usually ends with a V, indicating an overflow instruction. For these instructions, if overflow occurs, two further actions follow the operation described. The address of the current instruction minus one is stored in IAR0, and program control is passed to the content of IAR1 plus one. All other instructions ignore overflow.

   OP field    Operand 1    Operand 2    Operand 3
   16 bits     16 bits      16 bits      16 bits
Figure 2. Word Instruction Format

[Figure 3. OP Field Format (16-bit OP field, bits 15-0, with fields labeled OP code, IL2 and IL1)]

Unless an instruction's description says otherwise, omitting the third operand makes the first operand the target of the operation. This has the same effect as repeating the first operand in the third position. The third operand, unlike the first two, is not doubly indirect addressable and may not be a literal. Therefore, if the third operand is omitted, the first operand may not be of either of these types. Operands must be separated by commas but may have any number of blanks between them. Notice that this is not permitted in String Instructions (see section 2.7).

[Figure 4. Bit Patterns for Word Instructions]

The following list gives each Word Instruction, the number of words of core it uses and a description of its action.

Mnemonic           Function                   Length

ADD, ADDV          Add                        4 words
The sum of operand 1 and operand 2 is stored in the location specified by operand 3.

AND                Logical AND                4 words
The logical product (conjunction) of operand 1 and operand 2 is stored in the location specified by operand 3.

CNVTB              Convert to binary          4 words
The character string at the address specified by operand 1, of the length specified by operand 2, is converted from ASCII to binary and stored in the location specified by operand 3. Byte addressing is used for operand 1; if operand 1 specifies a word address, it is converted to the byte address of its high order byte. The maximum length is six bytes.

CNVTD              Convert to decimal         4 words
The fullword specified by operand 1 is converted to its ASCII equivalent. The rightmost bytes of the result, as many as operand 2 specifies, are stored starting at the address specified by operand 3. Operand 3 should be a byte address; if it is not, it is converted to the byte address of the high order byte of that word. The maximum length is six bytes.

DEC, DECR, DECRV   Decrement                  3 words
Only one or two operands are specified. The first operand minus one is stored in the location specified by operand 2 or, if operand 2 is omitted, back into its own location.
Notice that DEC and DECR are the same instruction.

EQUIV              Equivalence                4 words
The complement (NOT) of the exclusive OR of operand 1 and operand 2 is stored in the location specified by operand 3. This is not an overflow instruction.

EXOR               Exclusive OR               4 words
The exclusive OR of operand 1 and operand 2 is stored in the location specified by operand 3.

INC, INCR, INCV    Increment                  3 words
Only one or two operands are specified. The first operand plus one is stored in the location specified by operand 2 or, if operand 2 is omitted, back into its original location. Notice that INC and INCR are the same instruction.

JUMP               Branch unconditionally     2 words
Transfer control to the location specified by the single operand.

JUMPEQ             Branch on equal            4 words
Compare operand 1 and operand 2; if they are equal, branch to the location specified by operand 3. All three operands must be present.

JUMPGE, JUMPGEV    Branch greater than or equal   4 words
Compare operand 1 and operand 2; if operand 1 is greater than or equal to operand 2, transfer control to the location specified by operand 3. The comparison is done by subtraction, so overflow may occur; JUMPGEV can be used to check for overflow as well. All three operands must be present.

JUMPLT, JUMPLTV    Branch less than           4 words
Compare operand 1 and operand 2; if operand 1 is less than operand 2, transfer control to the location specified by operand 3. All three operands must be present.

JUMPNEG            Branch negative            3 words
If operand 1 is negative, transfer control to the location specified by operand 2. Both operands must be present.

JUMPNEQ            Branch not equal           4 words
Compare operand 1 and operand 2; if they are not equal, transfer control to the location specified by operand 3. All three operands must be present.

JUMPNZ             Branch not zero            3 words
If operand 1 is not zero, transfer control to the location specified by operand 2. Both operands must be present.
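The overflow conditions behind the V instructions can be illustrated with 16-bit two's complement arithmetic (section 2.1). So can the caveat that JUMPGE compares by subtracting. The Python sketch below is a model, not the D-Machine interpreter. The helper names are hypothetical, and overflow is reported as a flag rather than by the IAR0/IAR1 transfer. `jumpge_taken` also assumes that the branch decision is read from the sign of the 16-bit difference, which is our interpretation of "the compare uses a subtract".

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a two's complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def add16(a: int, b: int):
    """ADD/ADDV model: returns (16-bit result, overflow flag)."""
    exact = a + b
    result = to_signed16(exact)
    return result, result != exact        # overflow: true sum outside -32768..32767

def sub16(a: int, b: int):
    """SUB/SUBV model: operand 2 is subtracted from operand 1."""
    exact = a - b
    result = to_signed16(exact)
    return result, result != exact

def jumpge_taken(a: int, b: int) -> bool:
    """Assumed JUMPGE decision: branch if the 16-bit difference a - b is not negative."""
    difference, _ = sub16(a, b)
    return difference >= 0
```

For example, comparing 32767 with -1 overflows. The true difference, 32768, wraps to -32768, so a sign-only test would not branch even though operand 1 is the larger. JUMPGEV exists to let a program detect this situation.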
JUMPPZ             Branch positive or zero    3 words
If operand 1 is positive or zero, transfer control to the location specified by operand 2. Both operands must be present.

JUMPST             Branch and store location counter   3 words
The address of the next instruction minus one is stored in the location specified by operand 1. Control is passed to the location specified by operand 2.

JUMPZ              Branch zero                3 words
If operand 1 is zero, control is passed to the location specified by operand 2. Two operands are required.

MOVE               Move                       3 words
One word (16 bits) of data, as specified by operand 1, is transferred to the location specified by operand 2.

NAND               Logical NAND               4 words
Operand 1 and operand 2 are NANDed together and the result is stored in the location specified by operand 3.

NOR                Logical NOR                4 words
Operand 1 and operand 2 are NORed together and the result is stored in the location specified by operand 3.

NOT                Logical NOT                3 words
All the bits of operand 1 are inverted, and the result is stored in the location specified by operand 2 or, if operand 2 is not specified, in the original location.

OR                 Logical OR                 4 words
The logical sum (disjunction) of operand 1 and operand 2 is formed and stored in the location specified by operand 3.

ROTR               Shift right circular       4 words
The fullword specified by operand 1 is shifted right the number of bits specified by operand 2, and the result is stored in the location specified by operand 3. Bits falling off the right side of the word are placed back in on the left side. Be sure that operand 2 is a numeric literal if the shift is to be a constant amount. The instruction

         ROTR      100,3

will not shift the word at location 100 three bits to the right.
Instead, it will shift the word by the amount stored in location 3. The correct instruction is:

         ROTR      100,=3

SHIFTL             Shift left                 4 words
SHIFTR             Shift right                4 words
Shift the fullword specified by operand 1 by the amount specified by operand 2, and store the result in the location specified by operand 3. As bits are shifted right (left) out of the word, zeros are introduced on the opposite side of the word. Be careful in specifying operand 2; see the note under the ROTR instruction.

SUB, SUBV          Subtract                   4 words
Operand 2 is subtracted from operand 1 and the result is stored in the location specified by operand 3.

2.7 String Instructions

The String Instructions are designed for character manipulation. They enable the programmer to search for strings of characters in other strings, to move blocks of characters, and to replace the individual characters of a string from a translate table. The basic form of each string instruction is as follows:

[label]  OP code  Operand{,Operand}  Jump Control  {Jump Control}  [Mask]

where [ ] means optional, and { } means optional but may occur more than once.

String instructions are coded differently from all other types of instructions, because many of them have a variable number of fields. Consequently, a blank character is always taken as the end of a field, which means there may be no blanks between the operands in a field. For example, one of the string instructions, search forward, might look like:

                  Field 1             Field 2    Field 3   Field 4
LABEL    SEARCHF  CHR14,CHR11,PTR8    0,*0,*0    0,+1      0,0

A blank after CHR14 in field 1 would be illegal, as would blanks around the plus sign (+) in field 3. Notice that only operand fields are numbered, so CHR14 is operand 1 in field 1. Knowing this numbering is necessary to understand some of the error messages produced for string instructions. Continuation cards are handled in the same way as for all other instructions; see section 2.2.
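Because a blank always ends a field, the operand part of a string instruction can be split on blanks first and on commas second. The Python sketch below does this for the SEARCHF example. Its helper names are hypothetical, not the assembler's own routines. It also reports the number of the first field that contains an empty operand, which is how a stray blank next to a comma shows up.

```python
def split_fields(operand_text: str) -> list:
    """Split the operand part of a string instruction into fields.

    A blank always ends a field, so fields are separated by runs of blanks;
    operands inside a field are separated by commas.
    """
    return [field.split(",") for field in operand_text.split()]

def first_bad_field(fields: list):
    """Return the 1-based number of the first field with an empty operand, or None."""
    for number, field in enumerate(fields, start=1):
        if "" in field:
            return number
    return None
```

With the SEARCHF operands written correctly, `split_fields` yields four fields. Writing `0, +1` instead of `0,+1` leaves field 3 ending in an empty operand and pushes `+1` into a field of its own.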
Take special care when an instruction is broken in the middle of a field. If the continuation card does not begin in column 19, a blank will probably be introduced into the field. That blank causes an error condition and prevents assembly of the instruction.

2.7.1 Jump Controls

String instructions have special fields that specify the actions to take when the operation named by the mnemonic is completed. These fields are called Jump Controls. Some string instructions have as many as three jump controls. Each control represents a set of actions, and which control applies depends on which of a set of conditions occurred during the instruction.

A jump control is a single field containing two or three operands. The first operand specifies where the next instruction is fetched from. It may name an interrupt address register or be zero (0). If an interrupt address register is specified, in the form IARn (0 ≤ n ≤ 15), the next instruction is fetched from one plus the address held in IARn. If zero is specified, the instruction following the current one is fetched next.

The remaining operands specify how pointers are treated. Each string instruction has one or more pointers associated with it, which usually point to character strings. A pointer may be updated in the following ways:

*0         Leave the pointer at its position when the instruction began
0          Leave the pointer at its position after the operation was completed
+1 or 1    Increment the pointer by one after the operation
-1         Decrement the pointer by one after the operation

Remember, operands in a field may contain no imbedded blanks.

2.7.2 Instruction List

The following is a list of all the string instructions, with a model of each one's structure and an explanation of its function. The model for each instruction gives alternatives for some operands. Unless these operands are enclosed in brackets, [ ], one of the alternatives must be selected for each operand.
For example, compare the SEARCHF instruction in section 2.7 with the model for the SEARCHF instruction in this section. In the models that follow, PD is an abbreviation for data pointer, PK for key pointer and JCn for Jump Control n.

COMPARE            Compare two strings        4 words

[label]  COMPARE  PTRn,PTRn,CHRn  IARn,*0,*0  IARn,*0

Field 1 holds PK, PD and Key, and the next two fields are JC1 and JC2. Key may be a CHRn or a CTRn register. The first operand of each jump control may be IARn or 0, and each pointer operand may be *0, 0, +1 or -1. JC1 updates (PD) and then (PK); JC2 updates (PD) only.

The string whose byte address is in the register specified by PK is compared with the string similarly specified by PD. The comparison continues until an unmatched character is encountered or the character specified by Key is found. Key is taken to be the low order byte (rightmost 8 bits) of the register it specifies. If Key is found before a mismatch, JC1 is used. If a mismatch is found, JC2 is used. For JC1, operand 2 updates PD and operand 3 updates PK. For JC2, operand 2 updates PD; PK may not be updated but is automatically restored to its original position. CTR0 receives the difference between the data pointer position at the beginning of the instruction and its position just before the update.

FIND               Find a character string in a second string   4 words

[label]  FIND  PTRn,PTRn,CHRn  IARn,*0,*0  IARn,*0

Field 1 holds PK, PD and Key1, followed by JC1 and JC2. Key1 may be a CHRn or a CTRn register, and an optional Key2 (a CHRn register or 0) may also be given. Jump control operands take the same alternatives as for COMPARE: JC1 updates (PD) and (PK), and JC2 updates (PD) only.

The string specified by PD is searched for the string specified by PK. The search continues until either the two strings match up to the character specified by Key2, or Key1 is found in the PD string. For both pointers, the register contains the byte address of the string. For the keys, the register must contain the character in its low order byte. If the match is successful (Key2 is found), JC1 is used. If Key1 is found, JC2 is used. If Key2 is omitted, it defaults to zero, the character :00:. Note that PK may not be updated using JC2. CTR0 receives the difference between the data pointer position at the beginning of the instruction and its position just before the update.
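The jump-control rules of section 2.7.1, which COMPARE and FIND apply when they finish, can be sketched as a small function. This Python model is illustrative only, under the following assumptions:

- The function name and its arguments are hypothetical.
- The caller supplies each pointer's position at the start of the instruction and after the operation, data pointer first, matching the (PD)(PK) order of the models.
- Pointers without an update operand, such as PK under COMPARE's JC2, are left to the caller.
- CTR0 is computed as the data pointer's end position minus its start position. This sign convention is our assumption, since the text only says "the difference".

```python
IAR_BASE = 16   # IAR0-IAR15 occupy locations 16-31 (section 2.4)

def apply_jump_control(jc, memory, next_addr, pointers):
    """Resolve one jump control field such as "IAR3,*0,+1" (hypothetical model).

    jc        -- field text: first operand "0" or "IARn", then pointer updates
    memory    -- mapping from S-Memory word address to contents
    next_addr -- address of the instruction following the current one
    pointers  -- (start, end) byte positions per pointer, data pointer first
    Returns (address of next instruction, updated pointers, CTR0 value).
    """
    target, *updates = jc.split(",")
    if target == "0":
        fetch = next_addr                               # continue in sequence
    elif target.startswith("IAR") and target[3:].isdigit() and int(target[3:]) <= 15:
        fetch = memory[IAR_BASE + int(target[3:])] + 1  # one plus the address in IARn
    else:
        raise ValueError(f"bad jump target {target!r}")

    new_positions = []
    for op, (start, end) in zip(updates, pointers):     # extra pointers are left to the caller
        if op == "*0":
            new_positions.append(start)                 # back to where the instruction began
        elif op == "0":
            new_positions.append(end)                   # leave where the operation stopped
        elif op in ("+1", "1"):
            new_positions.append(end + 1)               # one past the stopping point
        elif op == "-1":
            new_positions.append(end - 1)               # one before the stopping point
        else:
            raise ValueError(f"bad pointer update {op!r}")

    data_start, data_end = pointers[0]
    ctr0 = data_end - data_start                        # assumed sign convention
    return fetch, new_positions, ctr0
```

For instance, with IAR3 holding 200, the control `IAR3,*0,+1` sends execution to location 201. It also restores the data pointer to its starting position and leaves the key pointer one past where the operation stopped.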
MOVEF, MOVER       Move a character string    3 words

[label]  MOVEF  PTRn,PTRn,CHRn  [IARn,*0,*0]
[label]  MOVER  PTRn,PTRn,CHRn  [IARn,*0,*0]

Field 1 holds PK, PD and Count; Count may be a CHRn or a CTRn register. The optional jump control JC updates (PD) and then (PK).

The number of characters given by the count, plus one, is moved from string PD to string PK. The count is the value in the specified register modulo 256, so at most 256 characters may be moved at one time. The registers specifying PD and PK must contain the byte addresses of their respective strings. Move forward (MOVEF) starts at PD and PK and works to the right on both strings. Move reverse (MOVER) starts at PD and PK and works to the left on both strings. There is only one possible action, the one specified by JC. If JC is not specified, 0,0,0 is assumed. CTR0 receives the difference between the data pointer position at the beginning of the instruction and its position just before the update.

SEARCHF, SEARCHR   Search character string, forward and reverse   4 words

For the search forward and reverse instruction model, see Figure 5. The characters specified by Key1, Key2 and Key3 are searched for in the character string specified by PD, which must be a byte address. Key3 must be in the low order byte of the specified register. Key1 and Key2 are also in the low order bytes of their registers, unless they are preceded by an asterisk (*). The asterisk indicates indirect addressing: the key is then the character pointed at by the byte address in the specified register. If Key2 is not specified, Key1 is duplicated and used as Key2 as well. Jump Control 1 corresponds to finding Key3, Jump Control 2 to finding Key1, and Jump Control 3 to finding Key2. If a mask is specified, the high order byte of its register must be zero. The data is then ORed with the mask character before being compared with Key1 and Key2. No masking facility is available for Key3. For SEARCHF the search proceeds to the right of PD; for SEARCHR it proceeds to the left. Even if Key2 is not specified, Jump Control 3 must be.
CTR0 receives the difference between the data pointer position at the beginning of the instruction and its position just before the update.

TRANS              Translate                  4 words

[label]  TRANS  CHRn,PTRn,CHRn  [IARn,*0]

Field 1 holds PK, PD and Key. PK may be a CHRn or a PTRn register, and Key may be a CHRn or a CTRn register. The optional jump control JC updates (PD) only.

[Figure 5. Search Forward and Reverse Model]

The data pointer (PD) points to the string to be translated, and the key pointer (PK) holds the address of the translation table. PD and PK must both contain byte addresses, although PK may point to either byte of the first word of the translation table. Each character of the PD string is used as an offset into the translation table. The offset is the character's hexadecimal value; see Figure 1 for the hexadecimal representation of the ASCII character set. The character is then replaced by the character found at that entry of the table. Each entry in the translation table is a full word long and should have a high order byte of zero; the low order byte replaces the character in the string. Translation continues until the character in the low order byte of Key is encountered in the string, and that character itself is not translated. If the jump control is not specified, it is assumed to be 0,0. Note that the key pointer may not be updated. CTR0 receives the difference between the data pointer position at the beginning of the instruction and its position just before the update.

2.8 Pseudo Operations

The pseudo operations are instructions that either produce no object code or produce object code that is not executed (i.e. data rather than instructions).

EQU                Equate label

[label]  EQU  expression

The form of the EQU is given above. The instruction evaluates the operand 'expression' and enters the value into the symbol table entry for 'label'.
Thus the instruction

   IDENT  EQU  89

would have the same effect as placing the label 'IDENT' on the first statement of the program (remember that code is generated from location 89 on up). The form of 'expression' may be any of the following:

   89                  Base 10 number
   IDENT[± constant]   Identifier, or identifier plus or minus a base 10 number

If an identifier is used in 'expression', it must have appeared as a label on a statement previous to the EQU statement.

STORAGE   Allocate storage locations

   [label]  STORAGE  length

The number of full words of storage to be allocated is specified as a base 10 integer in 'length'. On the D-Machine, whatever was in these locations will remain after the current object code is loaded. The simulator receives the correct number of words of zeros, since its memory must be completely specified.

CONSTANT   Generate constants

   [label]  CONSTANT  length, operands

The CONSTANT instruction enables the user to produce any bit pattern desired in full-word regions. The total amount of core used by the constant(s), in full words, must be specified as a base 10 number, 'length'. While the total length of the instruction must be a whole number of full words, the individual operands may be half words (bytes) long. Any number of operands of the types listed below may be specified, and the total core used may be as large as the storage of the machine. Individual operands, though, must be within the limits given below.
All operands must be separated from each other by commas and must be of the following forms:

   'A QUOTE ('')'        Character literal; max length 133 bytes including all quotes; may start and end on a half word
   32767 or +32767       Full-word integer; in this case the largest possible positive integer
   -32768                Negative integer; occupies a full word
   =VAR[± constant]      Address of VAR [± constant]; occupies a full word
   <VAR[± constant]>     Byte address of the high-order byte of (VAR ± constant); occupies a full word
   <<VAR[± constant]>>   Byte address of the low-order byte of (VAR ± constant); occupies a full word
   :93ABC:               Hexadecimal literal; may start and end on a half word; max length 131 nibbles (half bytes)

When the previous operand ended on a half word and the next operand requires a full word, or if no other operands are present, the previous operand is expanded. For character literals this means adding blanks (:20:); for hexadecimal literals it means adding zero bytes (:00). Note that the hexadecimal literal in the example above requires three bytes and that a zero nibble must be added on the left to fill out the three bytes. Thus :093ABC: and :93ABC: are equivalent.

To fill an N-word field with zeros use:    CONSTANT N, :0:
To fill an N-word field with blanks use:   CONSTANT N, ' '

If too few core locations are allocated for the operands, truncation occurs on the right and a warning message is produced. Correct object code of the specified length, however, is generated.

EJECT   Page eject

Causes the next instruction in the program to appear at the top of a new page. This helps make program listings more readable.

ORG   Set value of location counter

   [label]  ORG  expression

The ORG instruction evaluates 'expression' and sets the location counter equal to it. If a label is placed on the instruction, it receives the same value. Valid expressions are base 10 numbers and previously defined variables plus or minus an offset.
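The expansion rules for CONSTANT operands can be illustrated with a short Python sketch. It assumes a 16-bit D-Machine word (two bytes), ASCII for character literals, and the blank (:20:) and zero (:00) fill bytes described above; the helper names are hypothetical, and integer and address operands are not modeled.

```python
WORD_BYTES = 2   # a D-Machine full word is 16 bits


def char_literal(lit: str) -> bytes:
    """Strip the enclosing apostrophes and reduce doubled ones to single ones."""
    if len(lit) < 2 or lit[0] != "'" or lit[-1] != "'":
        raise ValueError("character literal must be enclosed in apostrophes")
    return lit[1:-1].replace("''", "'").encode("ascii")


def hex_literal(lit: str) -> bytes:
    """An odd number of nibbles gets a zero nibble added on the left."""
    digits = lit.strip(":")
    if len(digits) % 2:
        digits = "0" + digits
    return bytes.fromhex(digits)


def expand_to_word(data: bytes, fill: int) -> bytes:
    """Expand an operand that ended on a half word (byte) to a full-word boundary."""
    if len(data) % WORD_BYTES:
        data += bytes([fill])
    return data


if __name__ == "__main__":
    print(expand_to_word(hex_literal(":93ABC:"), 0x00).hex())    # 093abc00
    print(expand_to_word(char_literal("'A QUOTE ('')'"), 0x20))   # b"A QUOTE (') "
```

Because the zero nibble is added before the bytes are formed, `hex_literal(":93ABC:")` and `hex_literal(":093ABC:")` produce the same three bytes, matching the equivalence noted above.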
END   End statement

This statement should contain no label and causes the assembler to stop working on the user's program. It is not necessary for the assembler to find an END statement, but it is a cleaner way to terminate an assembly than an end-of-file condition. The starting location for execution may be specified as an operand on the END card. If used, it must be a base 10 number or a label plus or minus an offset. If no END card is found or no starting location is specified, S-Memory location 89 is assumed.

2.9 Comments

A card with an asterisk in column 1 is ignored by the assembler and is intended for use as a comment. Column 72 of a comment card is not checked for a continuation mark.

2.10 Input/Output

There are four I/O devices connected to the D-Machine: a disk, a teletype, a line printer and a card reader. The following instructions enable the programmer to control these devices. The length of the data transfer must be in one of the following forms:

   CTRn      Length is in CTRn
   =CTRn     Stop character is in the low-order byte of CTRn
   ='#'      Stop character is #; may be a single character only
   =:20:     Stop character is :20:; should be two digits long

If a stop character is used for the length operand, it is not transferred during the I/O instruction's execution. Addresses must be in one of the following forms:

   Address       Word address
   *Address      Single level indirect
   **Address     Two levels indirect
   <Address>     High-order byte address
   <<Address>>   Low-order byte address

where 'Address' may be a label plus or minus an offset, or a base 10 number. For a diagram of the object code format for the I/O instructions, see Figures 6-8. The following is a list of the I/O instructions grouped by device.

Disk   instruction length 3 words

   [label]  READ   DISK, length, address
   [label]  WRITE  DISK, length, address

These instructions transfer whole sectors of data to and from the disk. The length should specify the number of sectors to be transferred. A sector contains approximately 1000 characters.
The address is the word address of a Control Block for the operation. Byte addressing is not allowed for this operand and will result in an error. The contents of the Control Block are as follows:

   Word   Content
   1      S-Memory byte address of the last character read (supplied by the I/O routine after a READ; meaningless for a WRITE)
   2      Disk address of the first sector to be read or written; must be supplied by the user. See Figure 9 for the format.
   3      S-Memory byte address where the first character of the first sector is to be read from or written into; must be supplied by the user
   4      S-Memory byte address where the first character of the second sector is to be read from or written into (supplied by the I/O routine after a READ; meaningless for a WRITE)
   ...
   N+2    S-Memory byte address where the first character of the Nth sector is to be read from or written into (supplied by the I/O routine after a READ; meaningless for a WRITE)

Notice that if N sectors are to be transferred, the Control Block contains N+2 words.

Teletype   instruction length 3 words

   [label]  READ   TTY, length, address  [FEED | NOFEED]
   [label]  WRITE  TTY, length, address  [FEED | NOFEED]

These instructions print character strings on the teletype and receive them from it, storing them in S-Memory. The length operand for the teletype instructions specifies the length of the string in bytes (characters) to be transferred. The address must be the byte address of the S-Memory location containing the first byte to be transferred (for a WRITE) or of the receiving location of the first byte (for a READ). If a word address is given, it is converted to the byte address of the high-order byte of that word. Subsequent bytes are loaded and removed contiguously. A carriage control is optional as a second field of the instruction: FEED produces a carriage return and line feed, for the WRITE instruction only, and NOFEED suppresses this action. The default is FEED.
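The N+2-word layout of the disk Control Block can be sketched as follows. This hypothetical Python helper represents S-Memory words as integers and leaves at zero the words that the I/O routine fills in after a READ; it only arranges the words and does not model the transfer itself.

```python
def disk_control_block(disk_address: int, first_byte_address: int, sectors: int) -> list:
    """Build the Control Block the user supplies for READ/WRITE DISK.

    Index 0 holds word 1 of the block, so word k is block[k - 1].
    """
    if sectors < 1:
        raise ValueError("at least one sector must be transferred")
    block = [0] * (sectors + 2)      # N sectors need N+2 words
    block[1] = disk_address          # word 2: disk address (Figure 9 format), user supplied
    block[2] = first_byte_address    # word 3: byte address for sector 1, user supplied
    # word 1 and words 4..N+2 are supplied by the I/O routine after a READ
    return block
```

For a three-sector READ into a buffer at byte address 400 (an arbitrary example), `disk_control_block(0x5B, 400, 3)` returns the five-word block `[0, 91, 400, 0, 0]`.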
Figure 6. I/O Instruction Object Code Format, Word 1 (Device Block)

   Bits    Purpose
   15-12   Unused
   11      =1, Card reader
   10      =1, Line printer
   9       =1, Teletype
   8       =1, Disk
   7-4     Unused
   3       =1, Write
   2       =1, Read
   1-0     11

Figure 7. I/O Instruction Object Code Format, Word 2 (Flag Block)

   Bits    Purpose
   15      =0, CTRn has the length of the data; bits 3-0 contain n
           =1, There is a stop character
   14      =0, Bits 3-0 have the number of the CTR register containing the stop character
           =1, Stop character is in bits 7-0
   13      =1, Carriage return, line feed
   12-8    Unused
   7-0     Stop character
   3-0     CTR number

Figure 8. I/O Instruction Object Code Format, Word 3 (Address Block)

   Bits    Purpose
   15-14   =00, Direct addressing
           =01, One level indirect
           =10, Two levels indirect
   13-0    Address

Figure 9. Disk Address Word

   Bits    Purpose
   15-12   Unused
   11-4    Track number
   3       Surface
   2-0     Sector

Printer   instruction length 3 words

   [label]  WRITE  LINE, length, address  [FEED | NOFEED]

This instruction is used to print a character string on the line printer. The length should specify the number of characters (bytes) to be printed. The address should be the byte address of the first character in the string. If a word address is used, it is converted to the byte address of the high-order byte contained in that word. A second field specifying a carriage control is optional. If FEED is used, the characters are printed beginning on a new line; if NOFEED is used, printing begins at the next position in the present line. The default is FEED.

Card Reader   instruction length 3 words

   [label]  READ  CARD, length, address

This instruction enables the user to transfer a character string from a card in the card reader to an S-Memory location. The length specification indicates the number of columns read, starting with column one. The address is the byte address of the S-Memory location which will receive the character in column one.
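The word layouts in Figures 6-9 can be packed with simple shifts and ORs. The following Python sketch assumes that bit 15 is the high-order bit of a 16-bit word and bit 0 the low-order bit, as the figures' numbering suggests; the function names are hypothetical, and the sketch does not enforce the other operand rules of the instructions (for example, word versus byte addressing).

```python
DEVICE_BIT = {"CARD": 11, "LINE": 10, "TTY": 9, "DISK": 8}   # Figure 6


def device_block(device: str, write: bool) -> int:
    """Word 1: device bit, read/write bit, and bits 1-0 set to 11."""
    word = 1 << DEVICE_BIT[device]
    word |= (1 << 3) if write else (1 << 2)
    return word | 0b11


def flag_block(*, length_ctr=None, stop_ctr=None, stop_char=None, feed=False) -> int:
    """Word 2 (Figure 7). Give exactly one length form:
    length_ctr for CTRn, stop_ctr for =CTRn, stop_char for ='#' or =:20:."""
    if sum(x is not None for x in (length_ctr, stop_ctr, stop_char)) != 1:
        raise ValueError("exactly one length form is required")
    if length_ctr is not None:
        word = length_ctr & 0xF                            # bit 15 = 0, n in bits 3-0
    elif stop_ctr is not None:
        word = (1 << 15) | (stop_ctr & 0xF)                # stop character held in CTRn
    else:
        word = (1 << 15) | (1 << 14) | (stop_char & 0xFF)  # literal stop character
    if feed:
        word |= 1 << 13                                    # carriage return, line feed
    return word


def address_block(address: int, indirect: int = 0) -> int:
    """Word 3 (Figure 8): 00 direct, 01 one level, 10 two levels of indirection."""
    if indirect not in (0, 1, 2) or not 0 <= address < (1 << 14):
        raise ValueError("bad indirection level or address")
    return (indirect << 14) | address


def disk_address_word(track: int, surface: int, sector: int) -> int:
    """Figure 9: track in bits 11-4, surface in bit 3, sector in bits 2-0."""
    if not (0 <= track < 256 and surface in (0, 1) and 0 <= sector < 8):
        raise ValueError("field out of range")
    return (track << 4) | (surface << 3) | sector
```

Under these assumptions, a teletype WRITE with the stop character ='#' and the default FEED would have a Device Block of 0x020B and a Flag Block of 0xE023.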
As with the teletype and the line printer, if a word address is specified for the card reader, it is converted to the byte address of the high-order byte of that word.

2.11 Errors

Any instruction which violates the syntax rules described in sections 2.6-2.10 will cause an error message to be printed under that statement in the listing, explaining the first error detected in that instruction. The assembler finds only one error per instruction, so it is helpful to check the entire instruction before reassembling any instruction so flagged.

When an instruction contains an error, the object code for that instruction is omitted for the D-Machine and consists of zeros for the simulator. For all errors except an undefined mnemonic, the location counter is incremented by the correct amount, so patching of the object deck is possible. Undefined mnemonics cause the location counter to be incremented by four. The only exception to the above is errors in the length specified on a CONSTANT instruction. These instructions produce correct object code up to the length specified and produce a warning message instead. At the end of the assembly, the number of errors found in the program is listed.

It is possible that a run of the assembler might cause a user abend. A list of the abends in the program follows. Only numbers 1 and 100 should ever occur.

   Abend   Purpose
   1       Intermediate file overflow; at present the maximum length of a program is 1500 cards. To increase the size, see the STORE routine.
   2       Programming error in assembler; see BLANKOP in the EVALUATE routine.
   3       Programming error in assembler; see the SIMPUNCH routine.
   4       Programming error in assembler; see the DPUNCH routine.
   5       Illegal character found in the SCAN routine.
   100     Program contains a DUMP statement, which is a debugging aid. The dump occurs during pass two.

3. IMPLEMENTATION OF ASSEMBLER

3.1 Assembler Background

The S-Language Assembler was written in 360 Assembly Language.
It consists of eleven subroutines (which are described individually later in this chapter) and is about 5000 statements in length. The program runs in a region of 180K bytes of memory, of which 117K is the intermediate file. Because the intermediate file is kept in core and the program is coded in assembly language, it is able to assemble about 15,000 cards a minute. The advantages of writing the program in assembly language include easy character manipulation using the TRT (translate and test) instruction, plus efficient core utilization. The real advantage of assembly language programming can best be seen by comparing the hashing scheme used in this assembler, coded in assembly language, with the identical scheme coded in a high-level language like PL/1. The assembly language version actually requires fewer statements and, as one would expect, beats the PL/1 version (actually PL/C) in execution speed [2].

A primary disadvantage is that the development time of projects written in assembly language is typically longer than for the same project coded in a higher-level language [3]. Furthermore, the IBM G-level assembler has a number of features which tend to slow it down. Consequently, development costs were also higher than they might have been. For example, assembling PASS2, the longest routine of the program, during prime time costs over $10, and the entire program costs about $30 to assemble. Naturally, long runs were made at times when lower rates were in effect. Whether the additional time and cost involved in developing an efficient assembler were worthwhile will only be shown when the D-Machine has been in full operation for some time.

3.2 Assembler Overview

The S-Language Assembler is a two-pass assembler. Pass one in general builds the intermediate file and generates the symbol table.
Pass two does the balance of the work, namely, resolving address specifications involving variables and actually generating the object code and program listing. Strictly speaking, the symbol table is not completely generated by pass one, since it is never empty. The reason is that the reserved storage locations (see section 2.4) are usually referred to by name, and consequently their definitions are a permanent part of the symbol table. Most of the pseudo operations are handled in pass two, but some, like EQU and OPTION statements, are the work of pass one.

In the subsequent text a distinction must be made between "pass one" and "pass two", which denote the two separate times that the statements are examined by the assembler, and "PASS1" and "PASS2", which are names of routines in the program. This convention is followed throughout the remainder of this paper.

To see how the various routines are used in the processing of a program, we will trace the path of a single instruction through the assembler. Let us assume that the instruction is the following word instruction:

   LBL      ADD A,B ; THIS IS AN ADD

PASS1 first checks to see if there is a label. Finding one, it hashes LBL into the symbol table and inserts the current value of the location counter and the statement number into the symbol table too. At this time a check is made to ensure that the mnemonic starts in the proper column. PASS1 then takes the mnemonic ADD and finds it in the table of legal mnemonics. Since ADD is the first instruction in the list, its offset from the start of the table is zero; therefore column nine of the card image (in core) is set to zero. PASS1 then looks into another table (at an offset of zero) to determine the length of the instruction; for ADD it is four words. The location counter is incremented by this amount. The card image is then stored in the next available position in the intermediate file by calling the STORE routine.
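The pass-one handling traced above can be sketched in Python. This is a simplification under stated assumptions: a dictionary stands in for the hashed symbol table of section 3.4, the mnemonic table is a small hypothetical subset kept sorted for the binary search (with instruction lengths taken from chapter 2), the undefined-mnemonic code is a placeholder from the FF-DC error range, and continuation cards, pseudo operations and the other pass-one error checks are omitted.

```python
from bisect import bisect_left

# Hypothetical subset of the mnemonic table, sorted for binary search,
# with instruction lengths in words as given in chapter 2.
STARTOPS = ["ADD", "MOVEF", "MOVER", "READ", "SEARCHF", "SEARCHR", "TRANS", "WRITE"]
LENGTHS = [4, 3, 3, 3, 4, 4, 4, 3]
UNDEFINED_MNEMONIC = 0xFC   # placeholder code from the FF-DC error range


def find_mnemonic(mnemonic: str):
    """Binary search of the sorted table; returns the offset or None."""
    i = bisect_left(STARTOPS, mnemonic)
    if i < len(STARTOPS) and STARTOPS[i] == mnemonic:
        return i
    return None


def pass_one(cards, origin=89):
    """Return (symbol table, intermediate file, final location counter)."""
    symbols = {}          # label -> (value, statement number)
    intermediate = []     # 80-column card images with column nine filled in
    location = origin     # code is generated from location 89 on up
    for statement, text in enumerate(cards, start=1):
        card = bytearray(text.ljust(80)[:80].encode("ascii"))
        if card[0] == ord("*"):              # comment card: stored unchanged
            intermediate.append(card)
            continue
        label = card[0:8].decode("ascii").rstrip()
        if label:
            symbols[label] = (location, statement)
        fields = card[9:].split()            # the mnemonic starts in column 10
        offset = find_mnemonic(fields[0].decode("ascii")) if fields else None
        if offset is None:
            card[8] = UNDEFINED_MNEMONIC     # column nine is index 8
            length = 4                       # undefined mnemonics advance by four
        else:
            card[8] = offset                 # column nine holds the table offset
            length = LENGTHS[offset]
        intermediate.append(card)
        location += length
    return symbols, intermediate, location
```

Run on the example card `LBL      ADD A,B ; THIS IS AN ADD`, the sketch enters LBL with value 89, sets column nine to zero and advances the location counter to 93.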
As soon as all cards are in the intermediate file, PASS1 is finished and PASS2 is called.

PASS2 gets the first card image of the next statement to be processed by calling the LOAD routine. It then clears the area reserved for object code so that OR instructions may be used to set bits in the code. It then checks the status of column nine of the card image. Since column nine contains a number less than the smallest error number (errors are FF to DC in hexadecimal), it finds the length of the instruction in the same table used by pass one. It then looks at the appropriate entry in a branch address table (zero offset for ADD) and goes to the proper section, in our case the word instruction section. If we were following a string or I/O instruction, the branch would be to a routine outside of PASS2, namely STRINGS or INOUT, respectively.

In the word instruction section, the OP field for an ADD is moved to the first word of object code. The next step is to call the SCAN routine to rip off operand A. The EVALUATE routine is then called to process A. It first looks up A in the symbol table (using the same hashing technique as PASS1) and finds its value (i.e., its location in core). It stores this in word two of the object code and sets bits one and two of word one (the OP field) to indicate that operand one is a straight address. EVALUATE then returns to PASS2.

PASS2 calls SCAN to get operand B, then calls EVALUATE to process it in the same way as A. From information passed by SCAN with operand B, PASS2 sees that operand three is missing. It checks the indirect bits of operand one (bits one and two of word one) and, seeing that A was of a correct form, duplicates word two (the address of A) into word four (the third operand). The object code for the instruction is now complete. PASS2 prints the location counter, the object code, the statement number and the card image in the program listing.
It then branches to the object code punch routine address, which was either set in PASS1 through an OPTION statement or remains at NOPUNCH by default. This completes PASS2's work on the instruction. It increments the location counter by the length it found earlier and goes to get the next card image from the intermediate file.

In case an error is found in SCAN or EVALUATE, each routine has an error return which causes control to be passed to an error handling section of PASS2. This causes the card image to be printed with an error message below it, based on the error message number stored in column nine of the card image. For a list of these errors see Figure 10. The location counter is then incremented by an amount equal to the length found earlier, and the next card image is sought in the intermediate file.

3.3 Column Nine

When the assembler was first being written, it was thought that some day the intermediate file might be written out on disk instead of being kept in core. Since the location of the file could change, it was desirable to keep the amount of data transferred to and from the file a whole number of full words in length. A single card image (80 columns long) is 20 words in length. Transferring a single card image at a time would be fine, except that certain information concerning each card image is useful to keep in the intermediate file: errors found in pass one, the number of the mnemonic if the card is an instruction, and whether the card has been printed yet.

Figure 10. Conditions of Column Nine

   Contents in hex    Purpose
   FF FE FD FC FB FA F9 F8 F7 F6 F5 F4 F3 F2 F1 F0
   EF EE ED EC EB EA E9 E8 E7 E6 E5 E4 E3 E2 E1 E0
   DF DE DC           Error and status codes, for the following conditions:
                         Label contains illegal character
                         Label has embedded blank
                         Mnemonic does not begin in column 10
                         Undefined mnemonic
                         Multiply defined label
                         Column one of label not alphabetic
                         Continuation card starts before column 19
                         Illegal continuation attempted
                         Unmatched apostrophe in operand
                         Operand longer than 133 characters
                         Operand missing
                         Label in operand too long
                         Incorrect form for an offset
                         Label is undefined
                         Illegal operand
                         Literal is too long
                         Error in numeric literal
                         Negative address
                         Illegal first operand type with missing operand
                         Operand value too big for 16 bits
                         Illegal use of apostrophe in literal
                         Illegal literal
                         Illegal double indirect addressing
                         No label on an EQU
                         Truncation in CONSTANT (warning)
                         Card has been printed
                         Card image is a fake, all blank
                         Illegal indirect addressing
                         Too many operands in a field
                         Too few operands in a field
                         Field is missing
                         Too many fields
                         Illegal I/O device
                         No blank before semicolon
   40                 Blank card, real one
   Anything else      Position of mnemonic in table

Since the intermediate file was to be in core for at least some time, it was also desirable not to use any more core than necessary. To allow even a single extra word for each card image would increase the size of the intermediate file by almost 6K. This is why a method for coding the information into the original 80 columns was developed.
By requiring the user to punch the mnemonic in column 10 on all instruction cards, and by placing a limit of eight characters on all labels, it was guaranteed that column nine of all statements except comments would be unused. This column could then be used for storing information about each card. Only one piece of information is stored at any one time, but this was found to be sufficient. Most statements use the column twice: once to store the mnemonic number, found in pass one, and once to indicate that the card has been printed. The codes for column nine are listed in Figure 10.

3.4 Symbol Table Management and Hashing Scheme

The information about all the labels is stored in the symbol table. The table is 1021 entries long, and each entry requires three words (32 bits each) of storage. This is 360 storage, since the table need only be present during assembly of the program, not during execution. For the form of a symbol table entry see Figure 11. Since 64 register definitions are in the symbol table at all times, an S-Language program may contain at most 957 labels. This should be sufficient for any 1500-card program.

Figure 11. Form of Symbol Table Entry

   Word      Bits    Usage
   1 and 2   31-0    Contain label
   3         31-28   Unused
   3         27-16   Statement number where label is defined
   3         15-0    Value of label

The purpose of the hashing routine is to take any legal label and associate with it one of the entry positions in the symbol table, in our case one of the 1021 locations. This process is called hashing. It is desirable for different labels to hash into different locations as often as possible. When two labels hash to the same location, it is called a collision. The goal of a good hashing scheme is to perform the hash function as quickly as possible while minimizing the number of collisions.

The first step of the hashing scheme is to reduce the label to a single 32-bit word.
This is done by considering the label to be two words long (4 characters per word on the IBM 360) and multiplying the two words together. If the label is less than five characters long, it is already contained in a single word, in which case that word is squared. This process, which produces a 64-bit result, gives a good distribution to the middle 32 bits [4]. These 32 bits are then divided by the size of the table (1021), and the remainder of this division is used as the entry position. It is always between 0 and 1020, so it is used as an offset.

When a collision occurs, the label must be rehashed to a new location. The assembler uses what is called a linear rehash, which is a fancy name for a sequential search of the rest of the table. This is not the best rehash scheme, but it is a good one if the table is not densely populated. This was the justification for its use in the S-Language Assembler, since a 1500-card program should never use anywhere near 1021 labels.

3.5 Subroutine Descriptions

The following section includes a description of each of the 11 routines comprising the program.

PASS1

PASS1's primary role is to build the symbol table. In order to do this it must keep track of the effect of each instruction on the location counter. Therefore, CONSTANT and STORAGE instructions must have their lengths established, and ORG instructions must also be evaluated at this time. Naturally, since they introduce entries in the symbol table, EQUs are handled in PASS1 too. If an OPTION statement occurs in the program, it is taken care of here also.

The following errors are checked for in PASS1: undefined mnemonic, incorrect labels, mnemonic not starting in column 10, labels not starting in column 1, labels being defined more than once, and incorrect form for the operands of those pseudo-operations, mentioned above, which are evaluated in this routine.

Mnemonics are found by doing a binary search on a table called STARTOPS.
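The hashing and linear rehash described in section 3.4 can be sketched in Python. The sketch makes assumptions the text leaves open: labels are blank-padded to eight EBCDIC characters (Python's cp037 codec), the words are treated as unsigned, the "middle 32 bits" are taken as bits 16-47 of the 64-bit product, and the linear rehash wraps around to the start of the table. The original 360 code may differ in these details, and the function names are hypothetical.

```python
TABLE_SIZE = 1021


def hash_label(label: str) -> int:
    """Map a label of up to eight characters to a position 0..1020."""
    raw = label.ljust(8).encode("cp037")          # EBCDIC, blank padded
    first = int.from_bytes(raw[:4], "big")
    second = int.from_bytes(raw[4:8], "big")
    # A label of four characters or fewer fits in one word, which is squared.
    product = first * first if len(label) <= 4 else first * second
    middle = (product >> 16) & 0xFFFFFFFF         # middle 32 of the 64 bits
    return middle % TABLE_SIZE


def enter(table: list, label: str, value: int) -> int:
    """Insert a label using a linear rehash on collision; returns its position."""
    pos = hash_label(label)
    for _ in range(TABLE_SIZE):
        if table[pos] is None:
            table[pos] = (label, value)
            return pos
        if table[pos][0] == label:
            raise ValueError("multiply defined label")
        pos = (pos + 1) % TABLE_SIZE              # try the next entry
    raise OverflowError("symbol table full")


def lookup(table: list, label: str):
    """Follow the same probe sequence; returns the value or None."""
    pos = hash_label(label)
    for _ in range(TABLE_SIZE):
        if table[pos] is None:
            return None
        if table[pos][0] == label:
            return table[pos][1]
        pos = (pos + 1) % TABLE_SIZE
    return None
```

Since lookup follows the same probe sequence as insertion, a label displaced by a collision is still found, at the cost of a longer search when the table is crowded.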
If a mnemonic is found in the table, the offset into the table is stored in column 9 of the card for later use. This offset is also used in a table called OFFSETS. This table has two entries for each legal instruction. The first entry contains an offset in a branch table and the second entry is the length of the instruction in words. It is this last entry which is stored in location LENGTH, to serve as an increment to the location counter. Instructions which have a variable length, like CONSTANT, STORAGE and ORG, have a zero length in the table, but fill in LENGTH when their operands are evaluated.

The branch table (BTABLE) is used after the offset and length of an instruction have been established. This table sends the instruction to one of the following sections:

   SAWB      strings and word instructions
   SACB      storage and constants
   END       end statements
   EQUB      EQUs
   OPTION    options
   ORGB      ORGs
   NOERROR   page ejects

The hashing scheme described in section 3.4 may be found at a location named HASH.

The intermediate file is generated during PASS1 by calling the STORE routine. This has the effect of loading the 80 characters starting at location CARD into the intermediate file. The symbol table is part of PASS1 and is called SYMBLTBL. It contains zeros for entries not used. When PASS1 is finished it calls PASS2. PASS2 eventually returns to PASS1, and then the run is terminated. Following is the register usage for PASS1:

   Register   Usage
   R15        Branch address
   R14        Return address
   R13        Save area pointer
   R12        Base register
   R10        Statement number
   R9         Card number
   R8         Location counter
   R7         Internal return point

PASS2

PASS2 does most of the work in the assembler. It does all the printing for the program listing and opens and closes all the output DCBs. As each card is taken from the intermediate file using the LOAD routine, it is first checked to see if it is a comment.
If it is not a comment, column 9 is checked for one of two things: either the number of an error found in PASS1 or the mnemonic's number. If it is the former, PASS2 branches to the error routine at BADBAD. If the latter, it finds an offset in a table called BTBL and uses this offset in a table of branch addresses called BADDS. There is an individual section of PASS2 for each string instruction and each pseudo-operation. There is also a section for word instructions.

Word instructions use a table called BITPAT extensively to generate their object code. There is an entry in BITPAT for each instruction, but it is meaningless for all but word instructions. This enables PASS2 to use the original offset from the mnemonic table in PASS1 as the offset in BITPAT. Each word instruction has five entries in BITPAT. The first two are the OP field for the first word of the object code (see Figures 2-4). The last three entries give the status of the three possible operands. For instance, the second and third operands of the JUMP instruction may not be present, and this is indicated in the table. Similarly, the first operand for convert to binary must be a byte address, and this is also indicated. Each word instruction looks at the specification for each operand and then takes the correct action for that type of operand. Generally this involves calling SCAN to rip off the operand and then calling EVALUATE to set the bits for the operand.

String instructions merely go to a section which calls the STRINGS routine. Each instruction has a separate entry point in STRINGS, so it makes sense to duplicate the few extra lines of code for the calling sequence rather than use one call to STRINGS and have to find the instruction type in that routine. STRINGS generates all the object code for string instructions.

After the object code is generated for a string or word instruction, control is passed to a section called INSTCOMP.
This section loads the object code into a buffer which holds as much object code as can fit on an object deck card. This amount ranges from 16 bytes for the simulator to 32 bytes for the D-Machine. If necessary, the buffer is output by calling DPUNCH or SIMPUNCH. The object code is then converted to EBCDIC and output on the listing along with the card image. This is done in a section called PRTCARD. After the location counter is incremented, the next card is removed from the intermediate file.

Constant and storage instructions must be reevaluated in PASS2, since their lengths are not saved. Furthermore, the individual operands of the constant instruction are evaluated. The process is as follows: SCAN is called to rip off the next operand, then CONEVL, a section of EVALUATE, is called to develop the object code. PASS2 itself puts the object code into the space reserved for it, unlike regular instructions, which let EVALUATE or STRINGS handle the task. Two separate sections, INST1F and INST1B, are devoted to printing out CONSTANT instructions; which one is used depends on whether the last operand required a full word or could end on a half word (byte).

All routines called by PASS2 have an error return provision. If an error is detected, the return is to the address passed in general register 14. If no error is detected, the return is to this address plus four bytes. Therefore, the first instruction after a subroutine call is always an unconditional branch to the error handling section.

Error numbers are stored in column nine of the current card image (see Figure 10). The error section prints the card image, then passes control to a section named EMSGP to print the error message. The buffer is always punched after an error, and a number of zeros equivalent to the length of the current instruction is punched for the simulator. No object code is punched for the D-Machine from a statement found to be in error.
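The buffering done by INSTCOMP and the error-time behavior described above can be modeled as follows. This Python sketch rests on assumptions not stated in the text: a full buffer is punched before an instruction that would not fit (so an instruction is never split across cards), a word is two bytes, and the zeros for an erroneous instruction pass through the same buffer. The class and function names are hypothetical, and `punch` stands in for DPUNCH or SIMPUNCH.

```python
class ObjectBuffer:
    """Accumulates object code until a card's worth is ready to punch."""

    def __init__(self, capacity: int, punch):
        self.capacity = capacity      # 16 bytes (simulator) or 32 bytes (D-Machine)
        self.punch = punch            # stands in for SIMPUNCH or DPUNCH
        self.data = bytearray()

    def add(self, code: bytes) -> None:
        if len(self.data) + len(code) > self.capacity:
            self.flush()              # punch before the instruction would overflow
        self.data += code

    def flush(self) -> None:
        if self.data:
            self.punch(bytes(self.data))
            self.data.clear()


def after_error(buffer: ObjectBuffer, length_words: int, simulator: bool) -> None:
    buffer.flush()                    # the buffer is always punched after an error
    if simulator:
        buffer.add(bytes(2 * length_words))   # zeros of the instruction's length
    # nothing is punched for the D-Machine
```

With a 16-byte simulator buffer, two 4-word (8-byte) instructions fill one card exactly, and the third instruction starts a new one.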
Input/output is handled in much the same way as string instructions. The only difference is that the routine called is INOUT.

The output listing allows 60 lines per page. Register 11 contains the number of lines printed on the page at any given time. Before any line is output, register 11 is checked, and if it is greater than or equal to 60, PASS2 goes to a new page before printing the line. EJECT statements simply cause register 11 to be set to 60.

After an END card is found, or when the last card in the intermediate file has been processed by PASS2, a card giving the starting execution point is punched if cards for the D-Machine are being punched. It has the same format as the object deck cards, except that column five contains an 'E' and the address on the card is the position at which execution is to begin. For a description of the END card see section 2.8.

After a program has been processed by the assembler, the symbol table is printed in alphabetical order. To alphabetize the table, a linear search is made of the 1021 possible entries. Every time a nonblank entry is found, the address of the entry is stored in the next available location on top of the intermediate file. A bubble sort is then used to sort the addresses according to the labels they point to. The information printed includes the label, the statement number at which it was defined and the label's value. After the symbol table is printed, the DCBs are closed and control is passed back to PASS1. Following is the register usage for PASS2:

   Register   Usage
   R15        Branch address
   R14        Return address
   R13        Save area pointer
   R12        Base register
   R11        Lines printed on page
   R10        Statement number
   R9         Location in intermediate file
   R8         Base register
   R7         Internal return point
   R6         Points to CARD

EVALUATE

EVALUATE handles the operands for all instructions. For certain types of instructions it returns a value, while for others it actually sets the bits in the object code. Each type of instruction has a separate entry point in the routine.
Word instructions use EVALUATE, EQUs use EQUEVL, STORAGE uses SCEVL, CONSTANTs use CONEVL and strings use STRINGVL. The sections of code at each of these entry points use common routines to process the various types of operands.

Straight addresses, either base 10 or a label, with or without an offset, are handled by a section called LEVAL. It returns the result in general register 11. If a label is used, a symbol table lookup is performed; the hash routine described in section 3.4 may be found at location HASH. To determine the magnitude of a base 10 number in EBCDIC, a routine called VALUE is used. It returns the value in general register 7.

Character literals are handled by a routine called ALPHALIT. When it is called, the literal should be in location OPERAND of PASS2. On return the literal is still in OPERAND, but leading and trailing apostrophes have been removed and double apostrophes in the literal itself have been reduced to single ones. The literal is no longer in EBCDIC, though; it has been converted to ASCII. The length of the literal is in general register 1.

Byte addresses use a section called BADDS. This section in turn uses either LEVAL or VALUE to get the straight address, then converts it to the appropriate byte address. The result is returned in general register 11.

For word instruction operands, the bits in the object code are actually set in EVALUATE. EQUs and STORAGE instructions simply return the value of the operand (in general register 0). CONSTANT instructions have two possible operand types, full words and bytes. If the operand occupies a full word, the result is returned in general register 1 and the return is to the address in general register 14 plus four bytes. For operands which may end on a half word (byte), the result is in location OPERAND and the length of the result (in bytes) is in general register 1. The return is to the address in general register 14 plus eight bytes.
Each call to CONEVL is thus followed by three things in order: a branch to an error routine, a branch to a full-word section (reached by the return at plus four bytes), and the code for the byte operands (reached by the return at plus eight bytes).

String instruction operands are either a register or an update indicator. The arguments and the values returned are listed below.

Input          Value Returned
Register n     Core location of register n
¬Register n    Complement of core location of register n
*0             -2
+1             1
1              1
-1             -1

The value is returned in general register 1. The register usage for EVALUATE varies with the type of instruction being processed.

SCAN

The SCAN routine removes the next operand from the current card image. When SCAN is called, POINTER, a location in SCAN, should hold the address of the first character of the next operand, or of any blank character in front of the operand. SCAN stops at a comma, at a semicolon preceded by a blank, or at the first of the series of blanks which finish off a statement. For string instructions only, SCAN also stops at a single blank character. The operand is moved to location OPERAND in PASS2. On return, POINTER holds the address of the comma if the operand is followed by one; the address of the first following blank if the operand is the last in a statement; or the address of the first character in the next field if it is the last operand in a field of a string instruction. The return is to the address passed in general register 14 plus four bytes for the first two cases, and to the address in general register 14 plus eight bytes for the last case. The instruction after the call to SCAN is a branch to an error-handling section, since the error return for SCAN is to the address in general register 14.

INOUT

INOUT generates the object code for input/output instructions. PASS2 sets the bits in the Device Block which specify whether the operation is a read or a write; it is INOUT's job to set the rest of the bits in the three words of the instruction.
For the bit patterns of the I/O instructions, see Figures 6-8. The first task of INOUT is to determine the device type. There is a separate section for each of DISK, CARD, LINE and TTY. Each section uses a common routine for setting the bits for the length operand. The last three devices also share a common section for their address operand; this section is not used for the disk because the disk uses word addressing while the others use byte addressing. The normal return is to the address passed in general register 14 plus four bytes.

STRINGS

STRINGS is a subroutine that generates object code for the string instructions. Each string instruction has a separate section of code, with certain bit patterns being generated by common routines. One such routine, RIPOFF, finds all the operands in a field, calls the routine STRINGVL to evaluate them, stacks the results in STACKR, and puts the number of operands found in general register 6. The routines A2TWO and A2ONE set the bits for word 1 of all string instructions; this word controls the A2 register referred to in Yamada's thesis [1]. SETA3 sets bits 7-10 of word 2, which indicate which PTR register is loaded into A3. SETA1 sets bits 11-15 of word 2, which contain the A1 load control. Bits 0-10 of word 3, the jump control one bits, are set by SETJC1. Bits 11-15 of word 3, which control the counter, are set by SETCTR. The bits of word 4, the second and third jump controls, are set by SETJC2 and SETJC3, respectively. For a description of the bit patterns for the string instructions, see pages 11-18 of Yamada's thesis [1].

STORE

The STORE subroutine looks at general register 9, which contains the card number, and checks whether the intermediate file is full. If it is full, STORE abends with user abend code 1. Otherwise it loads the 80 characters beginning at location CARD of PASS1 into the next spot in the intermediate file.
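The STORE check, together with the LOAD routine described next, can be modeled as a fixed-size array of 80-column card images. In this hypothetical Python sketch the slot count comes from the figures given in this chapter (1500 cards, 120,000 characters), card numbers are assumed to start at 0, and UserAbend is an illustrative stand-in for an OS/360 user abend; none of these names come from the assembler itself.

```python
CARD_LEN = 80
INTSIZE = 1500          # card slots: 1500 x 80 = 120,000 characters


class UserAbend(Exception):
    """Illustrative stand-in for an OS/360 user abend."""

    def __init__(self, code: int):
        super().__init__(f"user abend {code}")
        self.code = code


class IntermediateFile:
    """Fixed-size store of 80-column card images (an INTFILE analogue)."""

    def __init__(self, slots: int = INTSIZE):
        self.slots = slots
        self.buf = bytearray(slots * CARD_LEN)

    def store(self, card_no: int, card: bytes) -> None:
        """STORE: card_no plays the role of general register 9 (0-based here)."""
        if card_no >= self.slots:
            raise UserAbend(1)            # intermediate file is full
        start = card_no * CARD_LEN
        # Pad short images with blanks (ASCII here; real cards were EBCDIC)
        # and truncate long ones, so exactly 80 bytes are stored.
        self.buf[start:start + CARD_LEN] = card.ljust(CARD_LEN)[:CARD_LEN]

    def load(self, card_no: int) -> bytes:
        """LOAD: assumes card_no is a valid entry number, as the text does."""
        start = card_no * CARD_LEN
        return bytes(self.buf[start:start + CARD_LEN])
```

Keeping the buffer a single flat array mirrors the text's point that resizing the file only means changing its declared size.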
LOAD

LOAD takes the number in general register 9, assumes it is a valid entry number in the intermediate file, and loads the 80 characters at that position past the start of the intermediate file into location CARD of PASS1.

PUNCH

The PUNCH routine punches the 80 columns beginning at location OUTBUF. If entered at NOPUNCH, it punches nothing; this entry is used as the default if no OPTION card is found in a program.

DPUNCH

The DPUNCH routine controls the punching of object cards for the D-Machine. It first subtracts the address of the beginning of the object code buffer (in ABUF) from the current buffer pointer (in ABUFPTR), giving the number of bytes of object code to go on the current card, and converts this to a number of 16-bit words. If this number is greater than 16, it abends with user abend code 4. Otherwise it updates the starting location for the next card, stored in location CSTART in PASS2. It then converts the binary code to its hexadecimal equivalent and stores it in location OUTBUF of PUNCH, in the column pattern described in section 2.3. Finally, it calls PUNCH to output the card.

For compatibility with the simulator, the first word of the routine contains the offset to a section called FAKE. When the assembler finds a STORAGE instruction, it must punch zeros if the simulator is being used. Rather than check for this, both punch routines have a section for this occurrence. In DPUNCH, though, the action taken is to return without punching anything, since this routine is for the D-Machine. The normal call to this routine branches to the address four bytes past DPUNCH. The zero-punch call gets the address of the routine (DPUNCH) and adds to it the number stored at that location, which gives the address of FAKE. Branching to FAKE causes an immediate return.

SIMPUNCH

SIMPUNCH works almost identically to DPUNCH.
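Because SIMPUNCH mirrors DPUNCH, the card-capacity arithmetic they share can be sketched once with the target as a parameter. The capacities and abend codes below are taken from the text (16 words and user abend 4 for the D-Machine, 8 words and user abend 3 for the simulator); the function name, the rounding of a trailing half word, and the assumption that CSTART counts in words are illustrative guesses, and the column layout of section 2.3 is not modeled.

```python
from typing import Tuple


class UserAbend(Exception):
    """Illustrative stand-in for an OS/360 user abend."""

    def __init__(self, code: int):
        super().__init__(f"user abend {code}")
        self.code = code


# (maximum 16-bit words per object card, user abend code), from the text
D_MACHINE = (16, 4)     # DPUNCH
SIMULATOR = (8, 3)      # SIMPUNCH


def card_words(abuf: int, abufptr: int, cstart: int,
               target: Tuple[int, int] = D_MACHINE) -> Tuple[int, int]:
    """Return (words on this card, starting location for the next card)."""
    max_words, abend_code = target
    nbytes = abufptr - abuf            # bytes of object code buffered for this card
    nwords = (nbytes + 1) // 2         # round a trailing half word up (assumption)
    if nwords > max_words:
        raise UserAbend(abend_code)
    return nwords, cstart + nwords     # assumes CSTART counts in words
```

For example, 32 buffered bytes fill exactly one D-Machine card (16 words), while the same 32 bytes with target=SIMULATOR raise user abend 3.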
The object cards for the simulator hold only eight D-Machine words, and the abend is user number 3 if more than eight are called for. The column pattern for the object card is given in section 2.3. There is a zero-punch section for the simulator. It is called STOR, and when called it expects a general register to contain the number of words of zeros to be generated. This section also updates the starting location for the code on the next card (in CSTART) and calls PUNCH to do the punching. This action may be taken more than once if more than eight words of zeros are to be punched.

3.6 DUMP Instruction

If you look carefully at the mnemonics accepted in PASS1, you will find one called DUMP which has not been described yet. It is for debugging purposes only. When PASS2 finds a DUMP instruction, it causes an abend, user number 100. This produces a dump which lets the programmer see the state of PASS2 at that point. The statement is usually placed just after a card which is not working properly. Only one DUMP card per program will work, since the abend terminates the run.

3.7 Making Changes in the Program

This section describes how to make three changes in the assembler: increasing the acceptable operand length, lengthening the intermediate file, and adding a new instruction.

Throughout the user's guide (Chapter 2) you will notice restrictions on how long literals may be. For instance, character literals may be at most 133 characters long. This is expected to satisfy most users' needs, but if it becomes necessary to change it in the future, it will be easy to do. In PASS2, OPERAND is defined as 133 characters long (133C). Right after OPERAND come OL, which is equated to one less than the length of OPERAND, and OPLENGTH, which is an address containing the length of OPERAND. Both of these are set automatically when the length of OPERAND is specified.
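The point of deriving OL and OPLENGTH from OPERAND is that the buffer size appears in exactly one place. A hypothetical Python analogue of that design follows; the names mirror the assembler's labels, but the checking function is illustrative. One plausible reason for keeping a length-minus-one value such as OL is that System/360 storage-to-storage instructions like MVC encode their length field as the length minus one, though the text does not say how OL is used.

```python
# Hypothetical analogue of the OPERAND / OL / OPLENGTH arrangement:
# the buffer size is written once and everything else is derived from it.

OPERAND_SIZE = 133          # change only this (e.g. to 256)
OL = OPERAND_SIZE - 1       # "one less than the length of OPERAND"
OPLENGTH = OPERAND_SIZE     # what every length check refers to


def move_to_operand(text: str) -> str:
    """Copy an operand into a blank-padded, OPERAND-sized field."""
    if len(text) > OPLENGTH:
        raise ValueError(f"operand longer than {OPLENGTH} characters")
    return text.ljust(OPLENGTH)
```

Changing OPERAND_SIZE alone moves both derived values and the check, which is the property the text relies on.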
All references in the program to the maximum length of an operand use OPLENGTH. Therefore, if OPERAND is lengthened, the program remains consistent; OPERAND could be made 256 characters long, or any other length that is more convenient.

The user's guide mentions that the maximum length of a program is 1500 cards; anything longer causes user abend 1. This limit comes from the size of the intermediate file, which is contained in the STORE routine. At present it is 120,000 characters long (1500 cards of 80 characters each). It may be lengthened or shortened by changing both INTSIZE and INTFILE in the STORE routine.

It is anticipated that several new instructions will be introduced after the D-Machine has been in use for some time. To see how to add an instruction to the assembler, assume we have a multiply instruction with an OP code of 50. This will naturally be a word instruction, and requires the user to specify two or three operands. As a convenience, we will allow the user to specify the mnemonic as either MULT or MULTIPLY.

First, we must add the mnemonics to PASS1. We do this by putting MULT and MULTIPLY into the STARTOPS list, between MOVER and NAND; they are now recognized instructions. Since this is a word instruction, the branch address we want from BTABLE is the zero entry, SAWB (string and word branch). The length of our instruction will be four words, so the entries in the table OFFSETS should be:

         DC    FL1'0',FL1'4'            MULT
         DC    FL1'0',FL1'4'            MULTIPLY

Again, these go between MOVER and NAND. This is all that is required for PASS1.

In PASS2 we must first set the bit pattern in BITPAT. Since the only bits always on in the OP field for our instruction will be those of the OP code (bits 5-11), the bit pattern in hex for the OP field will be 0640 (see Figures 2-4). The first two operands must be present and the third is optional; these are represented by hex 40 and 10, respectively.
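The hex value 0640 can be checked by shifting the OP code into place. The Python sketch below assumes, since Figures 2-4 are not reproduced here, that the OP code 50 is decimal and that bits 5-11 are numbered from the least significant end of the 16-bit OP field; under those assumptions it reproduces the 0640 pattern and splits it into the two bytes X'06' and X'40'. The function name is hypothetical.

```python
# Check of the OP-field pattern for the hypothetical MULT instruction.
# Assumptions: OP code 50 is decimal, and "bits 5-11" count from the
# least significant bit (bit 0) of the 16-bit OP field.

OP_LOW_BIT = 5    # lowest bit of the OP-code field
OP_WIDTH = 7      # bits 5 through 11 inclusive


def op_field(op_code: int) -> int:
    """Place an OP code in bits 5-11 of a 16-bit OP field."""
    if not 0 <= op_code < (1 << OP_WIDTH):
        raise ValueError("OP code does not fit in bits 5-11")
    return op_code << OP_LOW_BIT


pattern = op_field(50)
high, low = divmod(pattern, 256)       # the two bytes of the OP field
print(f"{pattern:04X}")                # prints 0640
print(f"X'{high:02X}',X'{low:02X}'")   # prints X'06',X'40'
```

Since 50 shifted left by five bits is 1600, or hex 0640, the assumed bit numbering is at least consistent with the value given in the text.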
Therefore, the new entries in BITPAT are:

         DC    X'06',X'40',X'40',X'40',X'10'    MULT
         DC    X'06',X'40',X'40',X'40',X'10'    MULTIPLY

Again, these go between MOVER and NAND. The branch address we want in BADDS is WORDS, which is at an offset of zero. The entries in BTBL, therefore, should be:

         DC    X'00'    MULT
         DC    X'00'    MULTIPLY

This is all that is required to completely specify an instruction.

APPENDIX
FLOW CHARTS OF ASSEMBLER ROUTINES

Figure 12. PASS1 [flow chart not reproduced]
Figure 13. PASS2 [flow chart in three parts, not reproduced]
Figure 14. EVALUATE [flow chart in two parts, not reproduced]
Figure 15. SCAN [flow chart in two parts, not reproduced]
Figure 16. INOUT [flow chart not reproduced]
Figure 17. STRINGS [flow chart in two parts, not reproduced]
Figure 18. STORE [flow chart not reproduced]
Figure 19. LOAD [flow chart not reproduced]
Figure 20. PUNCH [flow chart not reproduced]
Figure 21. DPUNCH [flow chart not reproduced]
Figure 22. SIMPUNCH [flow chart not reproduced]

LIST OF REFERENCES

[1] Hirohide Yamada, "Emulation of Disc File Processor," University of Illinois at Urbana-Champaign, Department of Computer Science Report No. 436, June 1971.
[2] Paul L. Chouinard and Eugene J. Polley, Jr., Term Project for Computer Science 401, Compiler Construction, University of Illinois at Urbana-Champaign, Department of Computer Science, 1972.
[3] David J. Kuck, Lecture for Computer Science 397, Computer System Organization, University of Illinois at Urbana-Champaign, Department of Computer Science, December 16, 1970.
[4] David Gries, Compiler Construction for Digital Computers, John Wiley and Sons, Inc., New York, 1971.

BIBLIOGRAPHIC DATA SHEET

1. Report No.: UIUCDCS-R-72-534
4. Title and Subtitle: An Assembler for Efficient File Manipulation
5. Report Date: August 1972
7. Author(s): Eugene J. Polley, Jr.
8. Performing Organization Rept. No.: UIUCDCS-R-72-534
9. Performing Organization Name and Address: University of Illinois at Urbana-Champaign, Department of Computer Science, Urbana, Illinois 61801
11. Contract/Grant No.: US NSF GJ 27446
12. Sponsoring Organization Name and Address: National Science Foundation, Washington, D.C.
13. Type of Report and Period Covered: Master's Thesis
16. Abstract: The Burroughs D-Machine minicomputer will be used for file searching projects. These programs are written in an assembly language called the S-Language. This paper describes the assembler which translates S-Language programs into machine code for the D-Machine. The assembler runs on the IBM 360/75 and is written in 360 Assembler Language. A user's guide and implementation details are presented.
17a. Descriptors: Hashing Functions; Symbol Tables; Intermediate File; Information Retrieval System; Two Pass Assembler; Bubble Sort
18. Availability Statement: Release Unlimited
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages: 80