Report No. UIUCDCS-R-74-635
NSF-OCA-GJ-36936-000001

A PROGRAMMABLY LOADABLE CONTROL STORE FOR THE BURROUGHS D-MACHINE*

by

Lee Allen Hollaar

May 1974

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois 61801

* This work was supported in part by the National Science Foundation under Grant No. US NSF GJ-36936 and was submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, May 1974.

ACKNOWLEDGMENT

"...The matrices may be regarded as very high speed stores holding fixed information. If they could be replaced by an erasable store to which information could be transferred from the main store of the machine when required, we should have a machine with no fixed order code; the programmer would, in fact, be able to choose his order code to suit his own requirements and to change it during the course of the program if he considered it desirable. Such a machine would have a number of fascinating possibilities, but I doubt whether, in view of the amount of equipment it would doubtless involve, its construction could be justified."

Professor M. V. Wilkes
"The Best Way to Design an Automatic Calculating Machine"

TABLE OF CONTENTS

1. INTRODUCTION
2. PROGRAMMING AND USES
   2.1 Programming Loader Commands
   2.2 Device Zero Programming
   2.3 Loading Programs
   2.4 Advanced Programming Techniques
3. HARDWARE MODIFICATIONS
   3.1 System Data Flow
   3.2 Device Zero
   3.3 Control Memory Organization
   3.4 Other Hardware Modifications
   3.5 Microprogram Control Unit
APPENDIX
   1 D-MACHINE BOOTSTRAP PROGRAM
   2 OVERLAY LOADER
   3 D-MACHINE LOADER
   4 DEVICE ZERO COMMANDS
   5 LOGICAL OPERATION OF MPMCU CONTROL SECTION
BIBLIOGRAPHY

1. INTRODUCTION

In 1971, the Digital Computer Laboratory at the University of Illinois purchased a prototype D-Machine from the Burroughs Corporation for use in research into microprogramming and emulators; in particular, it would be used to develop an emulator for efficient information retrieval and data manipulation. The unit as supplied by Burroughs consisted of a sixteen bit processor; a 4096 word, 900 nanosecond cycle core memory system (later expanded to 8192 words); and four 1024 word by 16 bit, 400 nanosecond cycle MOS memories, built by the Cogar Corporation.

In order to test the hardware, a simple means of loading the control store, made up of the Cogar memories organized as 1024 words of 58 bits each, was devised. It consisted of an array of 58 switches on the front panel which allowed the user to load any bit pattern into the memory location selected by the normal control storage program counter. This allowed the user to load sequential words or, by forcing the processor to execute a jump instruction, any desired word. While this system proved very useful for the short programs typically used to check out the hardware, it was obvious that some type of automatic loading procedure from an external device was necessary if the machine were to be used to execute large control programs.

Originally, the control store loader hardware designed by Burroughs, consisting of a single board containing the logic necessary to load the control store from either the card reader or the main memory, was considered. However, since the organization of the control store had been modified to the more conventional single level to increase processor speed, rather than use the two level organization described in the various Burroughs reports, the unit would have required many modifications. In addition, it was desirable to have more flexibility of input formats and in transferring of code from main memory to control memory than was provided by the Burroughs board. A tentative design of a hard-wired loader to meet all of the requirements was made, but the resulting unit's complexity and expense were not warranted by the limited use it would receive.

It was decided to use the machine itself to control the formatting and loading of the control store, adding only the hardware necessary to allow the writing of the control store under program control. While the operation of this type of loader is not as fast as that of a hard-wired loader, the increase in overhead is unimportant for several reasons: when loading from an external device, such as a card reader, the speed of loading depends only upon the speed of the card reader; the amount of time spent loading the control store dynamically from the main store is small compared to the time spent processing normal instructions; and so forth.

By using the processor itself to control the loading, the full data processing capabilities of the machine become available to the loading process. In particular, the loading process now has access to temporary storage (the working registers of the processor and the main memory of the machine), a multi-function arithmetic/logic unit including a full barrel switch, and the condition testing and sequencing logic.
This allows the loading of the control storage from any device on the system, using any input format desired. A simplified card format was devised for a special bootstrap program which is entered by the operator from the front panel; this short control program can then read more complex loader programs into the control store. Loader programs have been written which read cards produced by the D-Machine assembler (TRANSLANG); in addition, the control store has been loaded from an exact image stored in main memory.

The next section of the report describes the programming required to use the loader hardware; following that is a section which describes the modifications made to the hardware to implement the loader facilities. Each of these sections is independent of the other, so a person wishing to know how to program the loader need only read section 2, while a person interested only in the logical design of the loader would refer to section 3. Because these sections can be studied independently, there is some overlap of the material contained in each section; for example, the commands for Device Zero are explained in both, once from a programming point of view and then from a hardware viewpoint. Both sections assume that the reader is familiar with the architecture and programming of the Burroughs D-Machine and microprogramming in general; if this is not the case, a set of references is given in the annotated bibliography at the end of the report.

2. PROGRAMMING AND USES

2.1 Programming Loader Commands

The loader responds to two commands issued by the programmer; these commands are similar to the commands used to write into core memory. They are executed when a memory/device operation is issued and bits 55 and 57 of the control word are zero and bit 56 is a one. A memory/device operation is issued when the instruction is Type I (bits 1 through 4 are all ones), and either the selected condition is true or bit 11 is a one.
Bit 58 is used to indicate which base register is to be used with the MAR to form the OS addressing bus. The micro-write instruction takes the data from the MIR bus and places it into the control memory location addressed by OS3 through OS16 (BR3 through BR8 and MAR1 through MAR8). The twelve bit address formed by OS3 through OS14 indicates the word in the control store which is to be modified, and the two bit field OS15 and OS16 indicates the quadrant of that word.

The control storage write operation is not initiated immediately, but is delayed by the hardware until after the instruction following the one which issued the loader command. This is to retain compatibility with the core memory system, which does not immediately start its operation, but waits one cycle; it also allows the selection of the proper base register to complete and, if the next instruction is a successful Type I, allows the contents of the MAR/BR or MIR to be updated. The loader hardware "steals" the next machine cycle, stopping the clock so that none of the registers are changed, and writes the data into the control memory. It then fetches the next control word, which would be the second instruction following the one issuing the command, and allows the machine to proceed.

With the exception that one machine cycle is lost, the programmer is not aware of the loader's operation; it is as if the instruction following the one with the operation took twice as long. No changes have been made to any of the conditions or registers, and execution has not been modified. Because the conditions remain the same, the program can use the RMI test to determine if it is safe to load the MAR or MIR registers; it will behave the same as with the core memory.

The programmer should be careful if he is writing into the words from which he is executing code; it is possible that this will cause a programming error if a partially written word (not all quadrants changed) is executed.
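The address split described above can be sketched in a few lines. This is only an illustration in Python, not D-Machine code; the function names are hypothetical, and it assumes that bits are numbered from the most significant end, so that BR3 through BR8 are the six low order bits of the base register and OS15 and OS16 are the two low order bits of the MAR.

```python
def os_address(br: int, mar: int) -> int:
    """Form the 14 bit loading address (OS3..OS16) from the selected BR and the MAR.

    Assumption: BR3..BR8 are the six low order bits of the base register,
    and MAR1..MAR8 are all eight bits of the MAR.
    """
    return ((br & 0x3F) << 8) | (mar & 0xFF)


def split_os_address(os_addr: int) -> tuple[int, int]:
    """Split a 14 bit loading address into (control word, quadrant)."""
    word = (os_addr >> 2) & 0xFFF   # OS3..OS14: twelve bit word address
    quadrant = os_addr & 0x3        # OS15..OS16: quadrant within that word
    return word, quadrant


def loading_addresses(word: int) -> list[int]:
    """Return the four loading addresses, one per quadrant, of a control word."""
    return [4 * word + q for q in range(4)]
```

Under these assumptions a complete control word takes four separate micro-writes, one per quadrant, which is why a word that is executed between those writes may be only partially updated.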
The programmer also should not issue another memory/device operation, and in particular another loader command, until after the time that the loader has executed the command. Failure to follow this rule may lead to the wrong base register being selected, or the second loader operation command being ignored by the hardware.

2.2 Device Zero Programming

Device Zero provides additional facilities for the programmer, in addition to controlling the input source for control store writes. It contains a real-time clock and a teletype interface, and allows access to the console data switches. The commands for Device Zero are summarized in Appendix 4.

The programmer may gate the contents of the console data switches onto the processor EXT bus, for clocking into the B register with either a BEX or BBE command, by issuing a device read (DR1 or DR2) with the selected BR containing a zero and the MAR having bit one set and bit two cleared. The data is placed immediately on the EXT bus, and no condition bits are changed.

If the operator has activated the interrupt switch on the front panel, Device Zero will generate both a STINT and DINT to the PSI, which will then set the particular condition code indicated by whether the device is selected or locked. This condition is cleared by reading the status into the machine either by an ASR or ASE device command or by issuing a device read with the selected BR zero and MAR bit one cleared.

The real-time clock consists of a sixteen bit counter which is incremented every 3.2 microseconds; this means that the counter overflows about every 0.21 seconds. The programmer can clear the counter by giving the general reset command (device write with MAR bit two set) or can load the contents of the MIR into the counter (by issuing a device write with MAR bit one set and bit two cleared). The clock can be read by issuing a device read with MAR bits one and two set to place it on the EXT bus.
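The overflow figure quoted for the real-time clock follows directly from the counter width and the increment period. The short Python sketch below works through that arithmetic. It also shows one way a program might compute the time between two readings of the counter. The function names are hypothetical, and the elapsed-time calculation assumes that less than one full overflow period separates the two readings.

```python
TICK_SECONDS = 3.2e-6          # increment period of the counter, from the text
COUNTER_BITS = 16              # width of the real-time clock counter
MODULUS = 1 << COUNTER_BITS    # number of distinct counter values (65536)


def overflow_period() -> float:
    """Time for the counter to run through all of its values once."""
    return MODULUS * TICK_SECONDS   # 65536 * 3.2 microseconds


def elapsed_seconds(start: int, end: int) -> float:
    """Time between two counter readings, allowing for a single wrap-around."""
    ticks = (end - start) % MODULUS
    return ticks * TICK_SECONDS
```

The product 65536 x 3.2 microseconds is 0.2097152 seconds, which agrees with the figure of about 0.21 seconds. Taking the difference modulo 65536 gives the correct interval even when the counter has wrapped once between the two readings.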

The teletype control can operate in either half or full duplex, depending upon how the teletype is connected. At present, the teletype is connected in half duplex, so the programmer does not have to echo the character to have it print on the teletype. The interface operates in a bit serial fashion, with character assembly done by the programmer using the barrel switch of the processor. A device read with MAR bits one and three set gates the state of the current loop onto bit 9 of the EXT bus, while a device write with MAR bit two cleared and bit three set loads a flip/flop which breaks the current flow in the loop from MIR bit 15. These bits were chosen to simplify the character assembly.

To read from the teletype, the programmer monitors the current loop state looking for a start bit; when this is found, he waits for half the bit time (which is the reciprocal of the data rate) and checks to see if the start bit is still there. If it isn't, an error has occurred, probably due to noise on the line. Assuming that it is still there, he then waits one bit time and reads in the data bit, ORing it with the previously assembled bits and shifting them to the right. When eight bits have been read this way, the character is complete and he looks for the start bit of the next character.

Writing characters to the teletype consists of sending a start bit by breaking the current loop for one bit time, then sending the data bits by either breaking or allowing the current loop. After the last data bit has been transmitted, the programmer must enable the current loop for at least two bit times to provide the proper stop bits. He can then transmit the next character.

There are some combinations of MAR bits which signify more than one valid command. In this case, all the specified commands are executed, except that the clear command overrides any loading of registers. On a read, the OR of the selected data sources is gated onto the EXT bus.
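The teletype receive procedure can be illustrated with a small simulation. The Python sketch below is not D-Machine microcode, and all of its names are hypothetical. It assumes that a level of 1 means current is flowing in the loop and 0 means the loop is broken (the start bit), that data bits arrive least significant bit first, and that bit 9 of a sixteen bit word numbered from the most significant end corresponds to the mask 0x0080.

```python
from typing import Callable, List, Optional

BIT9_MASK = 0x0080  # assumption: bit 1 is the most significant of sixteen bits


def receive_char(sample: Callable[[float], int],
                 start_time: float,
                 bit_time: float) -> Optional[int]:
    """Assemble one character, given the time at which a start bit was first seen.

    Returns None if the start bit is no longer present half a bit time later,
    which the text attributes to noise on the line.
    """
    t = start_time + bit_time / 2
    if sample(t) != 0:
        return None
    word = 0
    for _ in range(8):
        t += bit_time              # move to the middle of the next data bit
        word >>= 1                 # shift the bits assembled so far to the right
        if sample(t):
            word |= BIT9_MASK      # new bit enters at the bit 9 position
    return word


def frame(char: int) -> List[int]:
    """Line levels for one character: start bit, eight data bits, two stop bits."""
    data = [(char >> i) & 1 for i in range(8)]
    return [0] + data + [1, 1]


def line_from_levels(levels: List[int], bit_time: float,
                     start_time: float = 0.0) -> Callable[[float], int]:
    """Build a sampling function for a simulated current loop (idle level is 1)."""
    def sample(t: float) -> int:
        index = int((t - start_time) // bit_time)
        if 0 <= index < len(levels):
            return levels[index]
        return 1
    return sample
```

Because each new bit enters at the bit 9 position and the word is shifted right before every later bit, the first bit received ends up in the least significant position after eight bits. This is presumably the simplification the choice of bit 9 was meant to provide.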

2.3 Loading Programs

The actual programs used to load other programs are contained in Appendix 1, which describes the card reader bootstrap program, and Appendix 3, which describes the control and core memory loader currently in use.

Before any data can be loaded under program control, a bootstrap routine must be entered into the control store from the front switches; this program is as short as possible to make the loading by hand as easy as possible. The purpose of this program is to read the actual loader into the processor and write it into the control store. In order to simplify the bootstrap routine, it is capable of loading only a simple format of card data, and contains no error checking of any type.

Once the loader is bootstrapped into the control store, regular programs can be loaded. It is desirable, but not required, that these programs do not overlay the loader, so that bootstrapping need only take place when the machine is initially powered on; it must take place at this time, since the control memory loses its data when power is removed.

Although not installed at the present time, the hardware allows for a read-only memory to be added to the control store array. This ROM can hold the bootstrap code necessary to write the loader into the control store. In this case, the tedious loading of the bootstrap by hand can be avoided, and initializing the machine is simplified.

2.4 Advanced Programming Techniques

While this section primarily describes the programming commands used for writing data into the control store and the more conventional uses of these commands, it is interesting to look briefly at some of the tricks which can be programmed using the loader commands. Although there are many techniques which can be used, only the two most common will be discussed.

The first is the use of overlays in the control store; this can increase the effective size of the control store by allowing the sharing of addressing space by many different routines. Appendix 2 shows the main loop of a program which transfers control words from the core memory into the control store. This technique, which is also used by IBM on their various System/370 models, means that routines which are used infrequently or only used for initialization do not need to take up space within the control store.

IBM uses this technique to load a set of diagnostic routines which check the various data paths and the operation of the various functional units. When all these tests have been successfully completed, the actual control program is brought into the machine. If these diagnostics had to be kept in a static control memory along with the emulator program, the expense due to the increased number of words and the increased word length would be prohibitive.

On the information processing project for the University's D-Machine, the order code and the control program are written so that only one peripheral device at a time may be in operation. Since the device interfaces contain only logic to remember the device status and connect to the PSI and the various processor buses, the actual transfer of data to and from the device and the checking for error conditions is handled by code in the control store. In order to conserve space within the control store, and thus allow more available functions, the device handlers are overlaid in a special transient area in the control store. In other words, when the use of a device is requested by the information processing program, the emulator loads the appropriate device handler into the control store from the core memory and executes its code; the code is overlaid by the next device handler requested by the information processing program.
Since the time necessary to transfer the device handler is short when compared to the time required to perform the requested action on the peripheral device, such as reading a card into memory, the overhead is small while the savings of memory are significant.

Another interesting technique is for the control program to first generate and then execute code; an example of this was programmed by J. R. Rinewalt in a program to list cards on the line printer. Since the character timing of the card reader does not match the character timing of the printer, an entire card image must be buffered in the processor; however, this program was needed at a time when problems were occurring with the core memory system, making it unavailable as a buffer, and the number of registers in the processor is too small to buffer an entire card image. The problem was solved by converting the characters read in from the card reader into Type II control words and writing them into the control store in sequential locations. When it was time to print the data, the characters were retrieved by executing an EXEC order with the AMPCR pointing at the proper control store word; this loaded the character into a machine register, where it could be processed and sent to the line printer.

3. HARDWARE MODIFICATIONS

3.1 System Data Flow

Before discussing the modifications made to the hardware to accommodate the loader commands, it is useful to look at the data flow within the D-Machine system as installed at the University of Illinois. Since the data flow within the actual processor has not been changed from the normal flow which is covered in the Burroughs manuals mentioned in the bibliography, it will not be covered here; instead only the data paths located outside the processor, and those registers of the processor to which they are connected, will be discussed.

Figure 1 is a simplified diagram of the external data flow before the loader modifications were made.
The control memory is addressed by the processor's Incrementer register and is controlled by timing signals from the processor and by read/write control switches on the front console; it receives input data from 58 switches on the front panel and outputs 58 bit control words to the processor. Core memory and peripheral devices place their data on the EXT bus, which serves as an input to the processor through the B register's EXT selector; they receive data from the processor's MIR register and its bus.

The MAR and BR registers of the processor furnish the core memory with its address; the MAR contains the low order eight bits of the address while the selected BR contains the high order eight bits. The MAR register is also fed to the various peripheral controllers, where it is used as modifier bits for read and write commands. The timing and control signals for the core memory are generated by the processor's OMC board in response to memory read and write commands.

[Figure 1. External data flow before the loader modifications.]

[Figure 2. System data flow after the loader modifications.]

The contents of the selected BR are also fed to the PSI board, where, along with control signals from the processor generated by device commands, they produce read and write control signals for the selected peripheral controller. The PSI also returns signals to the processor used to set various condition codes, based on status signals returned to the PSI by the devices.

One fundamental difference between this configuration and the standard Burroughs machine is that it operates with only a single level of control store, while Burroughs uses two levels, one acting as a control store with short encoded commands, and the other acting as the decoder for these commands. While our operation is wasteful of bits, it allows the processor to operate at a speed twice what it would be capable of with the two level control store.

Figure 2 shows the system data flow after the loader modifications were made; note that two new units have been added (MPMCU and Device Zero), and that many data paths have been changed. Device Zero is a utility device which provides the hardware for the switch register, real-time clock, and teletype interface for the programmer, and selects the input source for the control store from either the MIR bus or the switch register.

The control of the control store has been moved from the processor to a new board, the microprogram control unit (MPMCU); this board contains all the logic necessary to control the control store and synchronize it to the processor. In addition, it selects the control store address to be used for the next memory cycle from either the normal Incrementer register, the MAR/BR register set selected, or twelve address switches located on the front panel.
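The address selection performed by the MPMCU amounts to a simple priority choice among three sources. The Python sketch below is only an illustration with hypothetical function and class names. It uses the signal names SINCR, SOS, and SSW and assumes the priority given in the later discussion of the MPMCU control section, where the operator's address switches override the other two sources and the MAR/BR pair replaces the Incrementer only during a write cycle.

```python
from enum import Enum


class AddressSource(Enum):
    """Control store address sources, labelled with their select signals."""
    INCREMENTER = "SINCR"   # normal instruction sequencing
    OS_BUS = "SOS"          # selected BR and MAR, used during a write cycle
    SWITCHES = "SSW"        # twelve address switches on the front panel


def select_address_source(write_cycle: bool,
                          switches_selected: bool) -> AddressSource:
    """Choose the address source for the next control memory cycle."""
    if switches_selected:
        # The operator's selection overrides both SINCR and SOS.
        return AddressSource.SWITCHES
    if write_cycle:
        return AddressSource.OS_BUS
    return AddressSource.INCREMENTER
```

For example, select_address_source(write_cycle=True, switches_selected=False) selects the MAR/BR pair, while setting switches_selected forces the front panel switches regardless of the write cycle.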

3.2 Device Zero

Device Zero not only provides input switching for the control store, but adds a number of additional functions to the system. Figure 3 is a block diagram of Device Zero. Selector II is capable of selecting either the switch register or the processor's MIR bus as the source for the control store's input, based on the signal SWENT from the MPMCU.

[Figure 3. Block diagram of Device Zero.]

The control section controls the real-time clock and the teletype interface, and selects the proper signals to place on the EXT bus based on commands from the PSI and modifier bits from the MAR register. For example, if either a CLEAR command or a WRITE command with MAR2 set is decoded, the control section clears the counter, the teletype output bit and the status register in the device. Appendix 4 contains a summary of these commands.

The programmer can read the value set in the switch register on the console of the machine by issuing a READ command with MAR1 set and MAR2 cleared. This causes Device Zero to place the data on the EXT bus, where it can be clocked into the B register of the processor.

The real-time clock consists of a sixteen bit counter which is incremented every 3.2 microseconds; the counter overflows in about 0.21 seconds. It can be reset to zero by issuing the general reset command mentioned above, and can be loaded to any value by placing that value in the MIR register and issuing a WRITE with MAR1 set and MAR2 cleared. The value in the counter can be read by issuing a READ with MAR1 and MAR2 set.

The teletype interface consists of a one bit register which contains the current state of the teletype current loop, and a one bit register which can block current passing in the loop; although it can be operated in full duplex, the interface is currently set for half duplex operation, so that the user does not have to echo the character received.
Issuing a READ instruction with MAR1 and MAR3 set reads the state of the current loop into bit 9 of the EXT bus, while a WRITE instruction with the same MAR setting sets the current control flip/flop based on the value of bit 15 of the MIR register. These bits are used to permit easy assembly and disassembly of the character by shifting to the right.

The final operation of Device Zero is to place the contents of a status register on the EXT bus and generate interrupts depending upon this status register. If the user sets the interrupt switch on the front panel, Device Zero sends an interrupt request to the processor, setting one of the condition bits. The programmer can read the state of the interrupt bit by issuing a READ with MAR1 cleared.

3.3 Control Memory Organization

The control memory organization can be described in two different ways--the logical organization, or how it looks to the user, and the physical organization, or how it is actually constructed. A knowledge of the first is vital for the programmer who wishes to use the loader facilities, while any person modifying or maintaining the system must know the latter.

Originally, Burroughs designed the machine for two levels of control store, the so-called micro and nano memories. The idea was that by dividing the control store into these two parts, the total number of bits in the memories would be less than for a single level memory; however, since the cycle times of the memories are a factor in determining the speed of the machine, this structure also requires a memory twice as fast for the processor to run at the same speed as when it has a single level control store. Since it was desired that the processor run as fast as possible, the control store's organization was modified by "folding" the nano memory into the micro memory.

In the original scheme, if the micro instruction had a zero bit in any of the first four bits, it was a Type II instruction; Type II instructions are only sixteen bits long, and are used to load constants into various processor registers. If the first four bits were all ones, then the instruction contained the address in the nano memory of a full length instruction (54 bits), used to control the data paths of the machine. The modification simply increased the length of all instructions to 58 bits, with the extra bits being ignored in Type II instructions. For a Type I instruction, the field containing the address of the nano instruction was replaced by the nano instruction itself.

Since the word length of the boards used to construct the control memory is sixteen bits, the actual length of a control store word is 64 bits; the six bits not used are reserved for expansion. At the present time, these bits are also used to help in maintaining the system. It is possible to write data into these bits from either the console or the loader, and this data can be used to flag control words of interest. If a scope is connected so that it triggers when one of these extra bits is a one, then the traces on its screen can be synchronized with execution of a specific control instruction. These bits can also be connected to special logic to halt the processor if a specific instruction is reached or to generate some other function based on the execution of flagged instructions and the state of the processor.

The logical organization of the memory is different when the user wishes to write into it. This is because the word length of the processor, and hence the length of the output data register, is only sixteen bits. In this case, the memory can be regarded as a sixteen bit memory, with the first four words forming what was the first word in the control store.
In other words, each 64 bit control word is composed of four sixteen bit quadrants--when executing from the control store, a complete word (four quadrants) is read out, but when writing into the control store, the quadrants are addressed singly.

The loading address for the first quadrant (bits 1 through 16) of control word N is 4N; for the second quadrant (bits 17 through 32), 4N+1; for the third (bits 33 through 48), 4N+2; and for the last quadrant (bits 49 through 64), 4N+3. These addresses are generated automatically by the hardware when loading is done from the front panel; all that is necessary is for the user to select the normal address either by stepping the processor or using the address switches and activating the proper quadrant write switch.

Figure 4 shows the physical arrangement of the control memory. The memory is mounted in a special rack below the processor; this rack contains the line drivers and receivers, cable connectors, and sockets to hold eight Cogar memory boards (with a total capacity of 2048 words of 64 bits) and two ROM boards to hold a bootstrap program. At the present time, only four Cogar boards and no ROM boards are installed. The line drivers and receivers invert the data when it is received and re-invert the data when transmitted, so that all data is stored in the memory in its one's complement form.

Each Cogar memory board is connected to the twelve bit address bus and the sixteen bit data bus; every fourth board is tied to the same output bus (the outputs of the Cogar memory are open collector, allowing tie OR operation) and to the same read/write control line. Each group of four boards is tied to a select line. By proper use of the select lines and read/write lines, it is possible to write into any single board or read from any group of four boards. Timing information is also bussed to every board.

The ROM slots also are connected to the address bus, and each slot is connected to two of the output data buses.
Since it is read only, it is not necessary to connect it to the input bus. A special select line controls the ROM's operation.

In order to implement the folded control memory, slight modifications [...]

[Figure 4. Physical arrangement of the control memory.]

[...] selected, data is read from the ROM, but is written into the regular memory as though the switch were in normal mode.

The various clock monitoring lines (SCLOCK, CYCLE, MCLOCK, CPHASE, and HSCLOCK) are used to tell where in a cycle the processor is and to synchronize to the various operations within the processor. Some of these lines are used to generate an internal version of the processor's system clock, since it is stopped by a command from the MPMCU. This is necessary for two reasons: if it weren't done, the operation of the processor in single step mode would differ from the operation in full speed mode, and if the regular system clock were depended upon to allow us to exit the micro-write operation, the machine would hang, since the system clock is stopped by the operation.

Command lines to the control section indicate that a load instruction has been decoded (PLOAD), that a system reset is being generated by the operator (CLEAR), and the state of the two low order bits of the MAR (OS15 and OS16), which are used to select the quadrant for the write operation.
The control section produces the command signal SWENT to indicate to Device Zero that data to the control store should come from the switch register instead of the MIR bus when any of the operator write switches is activated. It produces LMODE to inhibit the processor clock during the actual write cycle; it also switches the address source from the Incrementer to the BR/MAR pair by dropping SINCR and raising SOS during the write cycle. However, if the operator has selected the address switches, this selection overrides both SINCR and SOS and forces SSW to be high.

The lines to the control memory consist of timing signals (CLOCK and SET), generated according to Cogar's specifications and synchronized to the processor cycle, and selection signals. SEL0 through SEL3 are generated when the appropriate bank of 1024 control store words is addressed for reading, except when the ROM is selected by the front switch, in which case SELROM is true. During a write operation, SELROM is always false and one of SEL0 through SEL3 is true. R/W0 through R/W3 indicate the quadrant of the selected bank into which data is being written, and are set by decoding the OS bits or by the appropriate panel switch being set.

When the MPMCU control section detects a decoded loader instruction, it delays execution of that instruction until after the instruction following the one containing the loader instruction. It does this for two reasons. First, this allows the operation specified by Phase 3 of the issuing instruction to complete before the write cycle, assuming there is not a Type II instruction immediately following; second, it makes the timing of the write cycle identical to the timing of a core memory cycle. The delaying of the write cycle is handled by a special shift register in the control section of the MPMCU.
Because of this design, it is important that the user not issue another micro-write instruction before the write cycle occurs; if he does issue another one, it may be lost by the control section and not be executed.

Since the MPMCU does not modify the status of the RMI condition, and since the write cycle occurs synchronously with the instruction, the operation of this condition indicator is identical to that of the core memory.

APPENDIX 1
D-MACHINE BOOTSTRAP PROGRAM

The following program is used to bootstrap the loader into the control memory. It is loaded by hand by the user into the bottom of the control memory from the front switch panel of the machine. It reads cards punched in a special bootstrap format: one eight bit byte per column, in binary, in rows 2 through 9 of the card. The bootstrap continues to load until stopped by the user, who then forces the machine to jump to the starting address of the loader from the control panel.

          LOADERSTART -> AMPCR;            Initialize first loading location
          8 -> SAR;                        Shift of one byte
          SET LC1; AMPCR -> A1; SAVE;      Initialize pointer register, set first
                                           byte of word switch, remember next
                                           address (LOOP)
    LOOP: WHEN RMI B101 C -> MAR2;         When registers available, set card
                                           reader "GO" command
          DW2;                             Issue card feed command
          WHEN SRQ THEN -> MAR; DR2;       Wait for data character ready, set
                                           data read command
          A1 -> MAR1, BEX;                 Read byte and put in B register, put
                                           load address into MAR and BR1
          IF LC1 THEN B L -> A2;           If first byte of word, save it and go
             JUMP ELSE STEP;               to LOOP to get next; note that testing
                                           LC1 resets it
          A2 OR B -> MIR;                  Build data word
          A1 + 1 -> A1; SET LC1;           Increment pointer register, set first
             UW1; JUMP;                    byte flag, write the data into the
                                           control store, and go to LOOP to get
                                           next byte

APPENDIX 2
OVERLAY LOADER

This is an example of the code necessary to dynamically overlay microcode with other microcode kept in core memory. It shows only the loop which actually moves the data from the memory into the control store.
In this example, we will assume that register A1 contains the microaddress where the code should be loaded minus one, A2 contains the location of the code in core memory minus one, and A3 contains the one's complement of the number of words to load (that is, of four times the number of instructions to load).

          ; SAVE                           Load address of LOOP - 1 into AMPCR
    LOOP: WHEN RMI A2 + 1 -> A2, MAR1;     When MAR free, increment pointer and
                                           put in MAR/BR1 pair
          MR1;                             Read the word from core memory
          WHEN RDC A1 + 1 -> A1, MAR1,     When data is available, clock it into
             BEX                           the B register, increment pointer and
                                           put it in MAR/BR1
          B -> MIR; UW1;                   Move data to output register and write
                                           it into the control store
          A3 + 1 -> A3;                    Increment the count register
          IF NOT ABT JUMP                  Loop if more to transfer
    EXIT:                                  Continue when done

APPENDIX 3
D-MACHINE LOADER

This program was written by J. R. Rinewalt to load programs produced by the D-Machine assembler (TRANSLANG) on the B5500 or the S-Language assembler on the IBM System/360. It is capable of loading both the control store and the main core memory of the system.

The cards accepted by the loader are in the following format; all numbers are the EBCDIC representation of the hexadecimal value, with A through F meaning 10 through 15.

    Columns 1 through 3     The first address to be loaded from this card,
                            expressed as three hexadecimal digits.
    Column 4                Blank, and ignored.
    Column 5                An S if the data on this card should be loaded into
                            the main core memory, an M if it should be loaded
                            into the control store, or an E if loading should end
                            and control be transferred to the start of the loaded
                            control program.
    Column 6                Blank, and ignored.
    Column 7                The number of sixteen bit words of data contained on
                            this card, minus one.
    Columns 8 through 71    The data to be loaded, with four hexadecimal digits
                            for each word to be loaded.
    Columns 73 through 80   Ignored.
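The card format above can be restated as a short parsing sketch. This Python fragment is an illustration of the format only; it is not the loader itself, which is written in D-Machine microcode. The names LoaderCard and parse_loader_card are hypothetical. The sketch assumes that the card image has already been translated from EBCDIC into an ordinary character string, that the word count in column 7 is a single hexadecimal digit, and that nothing on an E card other than column 5 needs to be interpreted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoaderCard:
    target: str             # "S" (core memory), "M" (control store), "E" (end)
    address: Optional[int]  # first address to load; None for an E card
    words: List[int]        # sixteen bit data words, in card order


def parse_loader_card(card: str) -> LoaderCard:
    """Parse one 80 column card image (string index 0 is card column 1)."""
    card = card.ljust(80)           # pad short images with blanks
    target = card[4]                # column 5
    if target not in ("S", "M", "E"):
        raise ValueError("column 5 must be S, M or E")
    if target == "E":
        # Assumption of this sketch: the other columns of an E card are unused.
        return LoaderCard(target="E", address=None, words=[])
    address = int(card[0:3], 16)    # columns 1 through 3
    count = int(card[6], 16) + 1    # column 7 holds the word count minus one
    data = card[7:7 + 4 * count]    # columns 8 onward, four digits per word
    words = [int(data[i:i + 4], 16) for i in range(0, 4 * count, 4)]
    return LoaderCard(target=target, address=address, words=words)
```

A card whose column 7 contains F carries sixteen words, which fills columns 8 through 71 exactly; this agrees with the column limits given above.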
    LOADER: B101 C -> MAR2; 8 -> SAR; COLUMN - 1 -> AMPCR; SET GC2; DW2;
            -> A1; CALL;
            COLUMN - 1 -> AMPCR; CALL;
            COLUMN - 1 -> AMPCR; CALL;
            COLUMN - 1 -> AMPCR; RESET GC2; A1 - 1 -> MIR; CALL;
            COLUMN - 1 -> AMPCR; SET GC2; -> A1; CALL;
            START.OF.LOADED.PROGRAM -> AMPCR;
            THEN B L -> MIR, INC; RESET GC2; A1 - LIT -> A1, BMI; 5 -> LIT;
            IF NOT MST THEN B -> A2; JUMP ELSE STEP;
            A1 ->
            COLUMN - 1 -> AMPCR; B + 1 -> MIR; CALL;
            IF ABT THEN SET GC1;
            COLUMN - 1 -> AMPCR; SET GC2; -> A1; CALL;
            A1 -> CTR, BMI; IF GC1 THEN A1 + 1 L -> CTR; 14 -> SAR;
            COLUMN - 1 -> AMPCR; RESET GC2; CALL;
            IF GC1
    LOOP:   COLUMN - 1 -> AMPCR; SET LC2; -> A1; CALL;
            COLUMN - 1 -> AMPCR; CALL;
            COLUMN - 1 -> AMPCR; CALL;
            COLUMN - 1 -> AMPCR; CALL;
            A1 -> MIR, BMI; B -> MAR1, INC;
            IF GC1 THEN UW1; SKIP ELSE STEP;
            MW1;
            IF RMI THEN B + 1 -> MIR; STEP ELSE WAIT;
            LOOP - 1 -> AMPCR; IF COV THEN RESET GC1; STEP ELSE JUMP;
            LIT OAD -> MAR; SAVE; 4 -> SAR; 64 -> LIT;
            B100 R -> A3, BEX; DR2;
            A3 IMP B -> ; IF NOT ABT THEN LIT -> MAR; STEP ELSE RETN;
            LOADER - 1 -> AMPCR; JUMP;
            DW2;
    COLUMN: AMPCR -> A2, LMAR; 3 -> SAR; 128 -> LIT;
            B100 R -> A3, BEX; DR2; SAVE;
            A3 IMP B -> , BEX; DR2; IF ABT THEN A2 -> AMPCR; STEP ELSE JUMP;
            -> MAR; BEX; DR2; IF NOT GC2 THEN JUMP;
            A1 L -> A1; 12 -> SAR; 9 -> LIT;
            B L -> A3;
            A3 L -> A3; 15 -> SAR; IF MST THEN A1 + LIT -> A1;
            A3 L -> A3; SAVE;
            A3 L -> A3; IF MST THEN SET LC1;
            A2 -> AMPCR; A1 + 1 -> A1; JUMP;
            IF LC1 THEN A1 - 1 -> A1;

APPENDIX 4
DEVICE ZERO COMMANDS

The following commands are recognized by Device Zero:

    Operation   MAR1 MAR2 MAR3   Function
    Read        X X              Gate status to EXT bus
    Read        1 X              Gate switch register to EXT bus
    Read        1 1 X            Gate counter/clock to EXT bus
    Read        1 X 1            Gate current loop state to EXT bit 9
    Write       1 X              Load counter/clock from MIR bus
    Write       X 1 X            Clear counter/clock, clear status register,
                                 clear current control flip/flop
    Write                        Load current control flip/flop from MIR bit 15

MAR4 through MAR8 are currently ignored, but should be set to zero for compatibility with possible hardware modifications.
If the bit configuration in the MAR specifies more than one of the above commands, each of the specified commands is executed. For example, if a READ is issued with MAR1, MAR2, and MAR3 all set to one, Device Zero will place the counter/clock data on the EXT bus, and will logically OR the contents of the current loop state register with bit 9 of the counter/clock to produce EXT9. However, the clear command overrides all other commands. The symbol "X" in a column indicates that the bit may be either a 0 or a 1.

APPENDIX 5
LOGICAL OPERATION OF MPMCU CONTROL SECTION

This appendix describes the operation of the logic which makes up the control section of the MPMCU. Figure 5 shows the structure of the MPMCU, with the control section and its input and output signals on the left, while Figure 6 is the actual logic diagram for the control section, and will be used in the following discussion.

Gates 1 through 16 are used to generate the clock signals required by the Cogar memory boards (CLOCK and SET). The inverter strings (2 through 9 and 10 through 16) are used to provide the proper synchronization of each of the clock signals with the other and with the machine cycle. The number of inverters was determined experimentally, with an oscilloscope attached at the memory card connector to allow for cable delays. An output from the CLOCK string is also used to synchronize some of the MPMCU outputs.

Skipping to the right of the figure, gates 54 through 62 generate the select commands which select one of the 1024 word by 64 bit address banks or the ROM, depending on the two high order bits of the control store address and the position of the ROM select switch on the front panel. Only one of the select gates (59 through 62) is active at a time, and only when the ROM select is off (ROMSW-0 is high*) or during a write cycle. This enable for the select gates is generated by 56 and 57, while 58 generates the select signal for the ROM.
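The behavior of the select logic can be summarized as a truth function. The Python sketch below is a behavioral restatement only, not a gate level model of elements 54 through 62. The function name and its parameters are hypothetical. It assumes a twelve bit control store word address whose two high order bits choose the bank, and it reports each signal as logically true or false without regard to whether the line itself is active high or active low.

```python
def select_lines(address: int, rom_switch_on: bool, write_cycle: bool) -> dict:
    """Return the logical state of SEL0 through SEL3 and SELROM.

    address        twelve bit control store word address (assumption)
    rom_switch_on  True when the front panel ROM select switch is on
    write_cycle    True during a control store write cycle
    """
    if not 0 <= address < 4096:
        raise ValueError("address must fit in twelve bits")
    bank = (address >> 10) & 0b11  # two high order address bits
    # The bank selects are enabled when the ROM select is off or when writing.
    bank_enable = write_cycle or not rom_switch_on
    lines = {f"SEL{i}": bank_enable and i == bank for i in range(4)}
    # The ROM is selected only for reads made with the ROM switch on.
    lines["SELROM"] = rom_switch_on and not write_cycle
    return lines
```

With the ROM switch on, a read of address 1024 under this model raises only SELROM, while a write to the same address raises only SEL1. This matches the rule that data is read from the ROM but written into the regular memory.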
Elements 32 through 35, 36 through 39, 46 through 49, and 50 through 53 produce the read/write commands for the four control store quadrants. Their operation is identical, so only quadrant zero will be discussed. When quadrant zero is addressed by the user (as indicated by OS15 and OS16) and a control store write cycle is occurring, gate 32 is activated, enabling gate 33. This is then synchronized to the control memory by flip/flop 34 and sent to the control memory using inverter 35 as a line driver. If the operator has activated the read/write switch on the front panel, gate 33 is also enabled.

Activating any of the read/write switches enables gate 40, whose output is synchronized by flip/flop 41 and is used to select the switch register as the source for the control store input data. Flip/flop 29 senses the position of the address select switch, and forces the use of the address contained in the panel switches by enabling inverters 30 and 31 to select the switches as the address source and disabling the normal selection logic of gates 23 and 26.

* The convention used for signal names is to have the name followed by a hyphen and then either a one or a zero. A one indicates that the signal is asserted true when the level of the line is high, while a signal name followed by a zero is true when the level of the line is low.

[Figure 6. MPMCU Control Section]

When a loader command is decoded and the system clock pulse occurs, the Q output of flip/flop 17 goes low; the complemented output is fed to flip/flop 22 to indicate a load request. Flip/flop 22 is cleared by a normal system reset received through inverter 21, and is clocked by a signal generated by gates 19 and 20 based on CYCLE-0 and MCLOCK-0.
This signal is normally identical to SCLOCK, but is generated here since SCLOCK is blocked on the OMC board during a control store write cycle by LMODE-0, and a clock is necessary to return the MPMCU to its normal state at the end of the write cycle.

Flip/flop 22 is clocked by the generated clock term, sampling the output of flip/flop 17. If this is set for a load operation, the outputs of 22 reset 17 (cancelling the load request), switch the address lines from the incrementer (gates 26 through 28, SINCR) to the processor address bus (gates 23 through 25, SOS) if enabled by flip/flop 29, and enable gates 32, 36, 46, 50, and 56 to generate the proper select and write commands to the control store during the write cycle. At the next generated clock term, 17 is again sampled and, since it was previously cleared, the write cycle ends and the MPMCU returns to its normal state.

BIBLIOGRAPHY

This annotated bibliography covers material helpful in understanding this report; it is not a complete bibliography on the subject of microprogramming. For the person interested in a more complete bibliography, these are published occasionally in the ACM SIGMICRO Newsletter. (Volume 3, Issue 2, July 1972 has one covering 1951 through 1972.)

Burroughs Corporation, "Microprogramming Manual for Interpreter Based Systems," Defense, Space and Special Systems Group Technical Report TR 70-8, Paoli, Pennsylvania, November 1970.

    Describes the architecture and programming of the original D-Machine, including the use of the TRANSLANG assembler. It also gives examples of programming tricks for the machine.

Burroughs Corporation, "Burroughs D-Machine User's Manual," Defense, Space and Special Systems Group, Paoli, Pennsylvania, April 1971.

    Describes the programming of the D-Machine as delivered to the University of Illinois, including a description of the PSI and the various device and memory commands.
    This report, and the one mentioned above, are not generally available, since they contain proprietary information.

Cogar Corporation, "Cogar Memory Manual (Preliminary) Cogar Read/Write Memory Product P/N 5410109," Wappingers Falls, New York, April 1971.

    Difficult to obtain, since the company is no longer in business, this report describes the timing and control signals for the memory, and the various data and address lines.

R. L. Davis and S. Zucker, "Structure of a Multiprocessor Using Microprogrammable Building Blocks," ACM SIGMICRO Newsletter, Vol. 2, No. 3, pp. 27-42, October 1971.

    Includes a description of the programming of the D-Machine processor and of the two level control store concept. However, the input/output structure of the machine described differs from the machine described in this report.

S. S. Husson, Microprogramming: Principles and Practices, Prentice-Hall, Inc., 1970.

    A good introduction to the area of microprogramming. Chapter 4 covers writable control storage, presenting some of the advantages and problems.

E. J. Polley, Jr., "An Assembler for Efficient File Manipulation," (M.S. Thesis) University of Illinois at Urbana-Champaign, Department of Computer Science Report No. 534, August 1972.

    Describes the assembler used to produce the code emulated by the file processing system. It also includes a description of the cards processed by the loader program and the input/output operations available on the file processing emulator.

E. W. Reigel, U. Faber, and D. A. Fisher, "The Interpreter - A Microprogrammable Building Block System," Proceedings of the 1972 Spring Joint Computer Conference, pp. 705-723.

    This is possibly the most accessible description of the D-Machine's architecture and programming, although the input/output structure differs from the University of Illinois machine.

H. Yamada, "Emulation of Disc File Processor," (M.S. Thesis) University of Illinois at Urbana-Champaign, Department of Computer Science Report No.
436, June 1971.

    Describes the commands used by the file processing emulator.

BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-74-635
Title and Subtitle: A PROGRAMMABLE LOADABLE CONTROL STORE FOR THE BURROUGHS D-MACHINE
Report Date: May 1974
Author(s): Lee Allen Hollaar
Performing Organization Name and Address: University of Illinois at Urbana-Champaign, Department of Computer Science, Urbana, Illinois 61801
Contract/Grant No.: US NSF GJ 36936
Sponsoring Organization Name and Address: National Science Foundation, Washington, D. C.
Type of Report & Period Covered: Master's Thesis

Abstracts: The modifications made to the Burroughs D-Machine to allow the loading of the control store under microprogram control are described, as is the programming required to perform these operations. Since the modifications allow the programmer to write data into the control store as if it were the normal system memory, all the processing power of the system is available to accommodate various input devices and formats. It is assumed that the reader is already familiar with the architecture and programming of the D-Machine.

Key Words and Document Analysis. Descriptors: Microprogramming; Control Storage Organization; Processor Organization; Burroughs D-Machine; Writable Control Store

Availability Statement: Release Unlimited
Security Class (This Report): UNCLASSIFIED
Security Class (This Page): UNCLASSIFIED
No. of Pages: 39