Emulator Performance Analysis
5. CONCLUSIONS
LIST OF REFERENCES
APPENDIX
A. THEORETICAL PERFORMANCE SIMULATION PROGRAM
B. LOGICAL SYSTEM SIMULATION PROGRAM
C. IBM SYSTEM/360 EMULATOR MICROPROGRAM

LIST OF FIGURES
2.1 Microinstruction Format
2.2 Simplified Block Diagram of VK-1
2.3 Bus Cycle Timing Diagram
2.4 Bus Controller Flow Diagram
2.5 Cycle-Initiate Controller for Control Store Module i
2.6 Instruction Stream Generator
2.7 Instruction Strobe Controller
2.8 A Typical Sequence of Activity in the Sequence Controller
2.9 Immediate Data Logic
2.10 Source Bus Controller
2.11 Destination Bus Controller
2.12 Arithmetic and Logic Unit
2.13a Shifter Unit Block Diagram
2.13b Shifter Unit Control Format
2.14a Example Code for Branch
2.14b Instruction Queue of Sequence Controller During Execution of Branch with Post-instructions
2.15 Branch Control Unit
2.16 BCU Command Format
2.17 Auto-increment Scratch Memory Unit
2.18 Scratch Memory Controller Algorithms
2.19 Multiple Bus Control Unit Configuration
2.20a Bus Arbitration Controller
2.20b Bus Device Interface Controller
2.20c Instruction Strobe Controller
2.20d Bus Master Flag Controller
3.1 Model for Theoretical Performance Simulation
3.2 Execution Time vs. Number of Busses
3.3 Execution Time vs. Taccess
4.1 Bus Assignment for Example
4.2 Overlap of Bus Cycle for Example
4.3 Scratch Memory Assignment
4.4 System/360 Emulator Flow Chart
4.5 Flow of Control Through Emulator

LIST OF TABLES
2.1 Bus Signatures
2.2 Control Store Module Control Signals
2.3 ALU Operation Set
2.4 ALU Condition Flags
4.1 Bus Register Utilization
4.2 Device Mnemonics used in Emulator
4.3 360 Emulator Bus Assignment

1. INTRODUCTION

This thesis presents and evaluates the structure of a microprogram-controlled digital computer called VK-1. In this processor, highly encoded microinstructions are executed by a set of functional modules. Each functional module has autonomous control, which decentralizes the processor control function. Concurrency in the control activity is achieved by interconnecting the functional modules with a number of asynchronous busses; this concurrency appears as an overlap of microinstruction decoding with the setup of intermodule data transfers. A functional module may process a microinstruction independently of subsequent bus activity, which allows maximum module utilization.

This thesis is organized into three chapters which present and analyze VK-1, followed by a concluding chapter. Chapter 2 introduces the structural concepts and system organization of VK-1 and gives detailed logical specifications for all system components. Chapter 3 presents and discusses system simulation studies. Chapter 4 presents details on microprogramming VK-1 and develops a large application program, namely the emulation of the IBM System/360. Chapter 5 gives the conclusions of the research. Appendix material related to Chapters 3 and 4 is included.

2. SYSTEM PLANNING AND ORGANIZATION

The design of the VK-1 microprocessor organization was motivated by the need for a system with certain inherent qualities which appear to be lacking in commercially available hardware. This chapter describes these design goals and presents the functional description of the proposed hardware.

2.1 Design Objectives and Design Overview

Microprogrammed processors are finding applications in a wide variety of environments, ranging in sophistication from intelligent device controllers to central processing units. When the system designer considers a system composed of many hardware resources, the system takes on the appearance of a centralized network of computing elements.
The problems associated with interfacing such a collection of hardware become significant enough to overshadow many other design considerations. The author therefore considers it very desirable to base all, or a significant number, of the network elements on a single structural concept. For this to be feasible as well as desirable, both logically and economically, the basic structure must provide considerable flexibility in cost and performance, and must support efficient data manipulation across a wide spectrum of application areas. These ideas serve as the basic design objectives for the processor, called VK-1, described in this report.

As a result of the above considerations, a highly modular system organization seems essential. A standard interface for all devices allows the system designer to select various functional modules for a particular application, and at the same time allows a system configuration to be modified later as required. The speed objective dictates a hardware organization capable of concurrent processing. The above discussion also implies that the system should be designed to allow user microprogramming. The motivations for user-microprogrammed computers are discussed in references [3], [7], and [10].

In conventional systems, microprogramming complexity usually increases as concurrency increases, so fast systems with extensive operation overlap are generally difficult to program. To avoid this conflict, and thus produce a fast, easily microprogrammed machine, the VK-1 organization decentralizes control functions; that is, the functional modules mentioned above possess a degree of autonomous control. At this point one can raise the academic question of whether or not the organization is microprogram controlled in the classical sense. The author's opinion is that it is not, and the reader is referred to a thought-provoking argument in reference [1].
However, because industry usage of the term "microprogrammed control" has become loose, the author will nonetheless refer to VK-1 as a microprogrammed machine.

In keeping with these considerations, the basic approach used in the design of VK-1 is to connect a collection of asynchronously operating functional modules via a bus structure. Concurrency of operations is obtained by providing a number of autonomous busses which can operate simultaneously. The bus structure carries both control information and data; the control information is fed to the bus structure from the control store. This bus and control structure results in a vertically microprogrammed system in which all microinstructions take the very simple format illustrated in Figure 2.1.

Figure 2.1 Microinstruction Format (one microinstruction word):
| Bus Address | Source Address | Destination Address |

The overall flow of information in the system is quite simple to conceptualize when one bears in mind the microinstruction format. Figure 2.2 illustrates this flow: the bus address field steers the source and destination fields from the control store, through a multiplexor, to the controller of the addressed bus, which then carries out the transfer between devices on that bus.

[Figure 2.2 Simplified Block Diagram of VK-1: control store, source/destination multiplexor, bus controllers, busses, and bus devices]

The basic approach described above affords much flexibility in system design, since the number and types of functional modules (bus devices) are variable. The number of busses may be varied from a minimum of one, and the control store speed is also variable. The architecture is therefore suitable for a broad spectrum of computing power, and allows considerable economy at the low end of the performance range.

The above discussion has attempted to give the reader a feeling for the design philosophy used in creating VK-1 and a brief introduction to the proposed organization. The remainder of this chapter gives detailed descriptions of the VK-1 hardware and its principles of operation. This discussion is based on a high-performance configuration.
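As a minimal sketch of the format in Figure 2.1, the following Python packs and unpacks one microinstruction word. The field widths are assumptions for illustration: 2 bus address bits (four busses), and 6 source and 9 destination bits matching the S<0-5> and D<0-8> lines of Table 2.1, with the bus address placed in the most significant position. The helper `field_width` shows how a field would be sized once the number of devices is fixed; all names are hypothetical.

```python
from typing import NamedTuple

# Assumed field widths: 4 busses, S<0-5> and D<0-8> as in Table 2.1.
BUS_BITS, SRC_BITS, DST_BITS = 2, 6, 9


class Microinstruction(NamedTuple):
    bus: int          # which bus controller receives the fields
    source: int       # asserted on the S lines
    destination: int  # asserted on the D lines


def field_width(count: int) -> int:
    """Bits needed to give `count` devices distinct addresses (at least 1)."""
    return max(1, (count - 1).bit_length())


def encode(mi: Microinstruction) -> int:
    """Pack the fields, bus address most significant (assumed ordering)."""
    for value, bits in zip(mi, (BUS_BITS, SRC_BITS, DST_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"field value {value} does not fit in {bits} bits")
    return (mi.bus << (SRC_BITS + DST_BITS)) | (mi.source << DST_BITS) | mi.destination


def decode(word: int) -> Microinstruction:
    """Unpack a word into its bus, source, and destination fields."""
    return Microinstruction(
        bus=word >> (SRC_BITS + DST_BITS),
        source=(word >> DST_BITS) & ((1 << SRC_BITS) - 1),
        destination=word & ((1 << DST_BITS) - 1),
    )


word = encode(Microinstruction(bus=3, source=5, destination=0x1FF))
print(decode(word))  # Microinstruction(bus=3, source=5, destination=511)
```

With these assumed widths a word is 17 bits; `field_width(10)` gives 4, so a bus with about ten devices would need only a 4-bit field, which is the kind of sizing decision the text leaves to each configuration.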
Because systems of lower capability are subsets of this high-performance hardware, describing it will enable the reader to understand the entire range of performance obtainable with this structure.

2.2 Multibus Organization and Bus Operations

Concurrency in VK-1 is obtained by using several independent busses to perform a sequence of instructions. The result is similar to a pipelined execution unit: the instructions are not performed completely in parallel, but are initiated sequentially and executed with overlap. The number of busses is variable, depending on the configuration. A bus consists of control lines and data lines, and can perform device-to-device transfers over the parallel data lines. Table 2.1 lists the data and control lines of the bus.

Table 2.1 Bus Signatures

Signature   No. of lines   Function
X<00-15>    16             data transfer lines
D<0-8>      9              destination address lines
S<0-5>      6              source address lines
REQ         1              cycle request
ACKS        1              acknowledge set
ACKR        1              acknowledge reset
XEN         1              transfer enable (previous instruction complete)
BCC         1              bus cycle complete

Each bus has at least one bus controller associated with it which is capable of initiating a data transfer over the bus. Multiple bus controllers are present in multiprocessor systems and when sophisticated autonomous bus devices are used; these are discussed later.

Bus transfers are fully asynchronous and interlocked, and all bus devices require logic for address recognition and the handshake protocol. A bus cycle consists of the following events. First, the source and destination fields of the next microinstruction are multiplexed (via the bus address field) to the appropriate bus controller. As soon as the bus controller recognizes that its bus is available, it accepts the source/destination fields, signals the multiplexor, and asserts the fields on the S and D lines together with REQ (cycle request).
The source device recognizes its address and, as soon as it is ready, asserts the data to be transferred onto the X lines and sets the ACK flip-flop through ACKS. The destination device recognizes its address and waits, if necessary, until it is ready, until ACK is set, and until XEN is true; it then loads the data on the X lines into its internal register. As it does so, it signals the bus controller and the source that the transfer has completed by resetting the acknowledge flip-flop (through ACKR).

As described, several conditions can cause a bus cycle to be suspended pending the completion of some other operation; these conditions are described further below. The timing and flow diagrams of Figures 2.3 and 2.4 should clarify this operation.

The number of bus devices is variable within the limits of the address space of the source and destination address fields. The size of the microinstruction is also variable: for a given system, the maximum number of desired source and destination devices is determined first, and the microinstruction length is then fixed accordingly. On the order of ten devices per bus seems quite adequate for the systems considered in this research.

Bus devices may be assigned to busses in a variety of ways. A device which can function as both a source and a destination (e.g., a bus register) may occupy a source address on one bus and a destination address on a different bus. A device may also occupy source addresses or destination addresses on two or more busses. Such assignments are possible because a device may suspend a bus cycle while it is "not ready." Of course, the complexity of the device controller increases when multiple address assignments are made.
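The handshake just described can be sketched as a small step-by-step simulation. This is a minimal sequential model in Python, not a timing-accurate one: each loop iteration stands for an arbitrary interval, XEN is simply forced true from a chosen step onward to mimic the previous instruction completing, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bus:
    """Bus lines of Table 2.1, reduced to Python attributes."""
    x: int = 0          # X<00-15> data lines
    s: int = 0          # S<0-5> source address lines
    d: int = 0          # D<0-8> destination address lines
    req: bool = False   # REQ: cycle request
    ack: bool = False   # acknowledge flip-flop (set via ACKS, reset via ACKR)
    xen: bool = False   # XEN: transfer enable from the sequence controller
    bcc: bool = True    # BCC: bus cycle complete


class BusRegister:
    """A hypothetical device usable as both a source and a destination."""

    def __init__(self, src_addr: int, dst_addr: int, value: int = 0):
        self.src_addr, self.dst_addr, self.value = src_addr, dst_addr, value

    def source_step(self, bus: Bus) -> None:
        # Source: recognize address, drive the X lines, then set ACK (ACKS).
        if bus.req and bus.s == self.src_addr and not bus.ack:
            bus.x = self.value & 0xFFFF
            bus.ack = True

    def dest_step(self, bus: Bus) -> bool:
        # Destination: wait for ACK and XEN, load X, then reset ACK (ACKR).
        if bus.req and bus.d == self.dst_addr and bus.ack and bus.xen:
            self.value = bus.x
            bus.ack = False
            return True
        return False


def run_bus_cycle(bus, devices, src, dst, xen_at_step=0, max_steps=16):
    """Step one bus cycle; return the step on which the transfer completed."""
    bus.s, bus.d, bus.req, bus.bcc, bus.ack = src, dst, True, False, False
    for step in range(max_steps):
        bus.xen = step >= xen_at_step  # previous instruction completes here
        for dev in devices:
            dev.source_step(bus)
        for dev in devices:
            if dev.dest_step(bus):
                bus.req, bus.bcc = False, True  # controller ends the cycle
                return step
    raise RuntimeError("bus cycle suspended: no destination completed")


r1, r2 = BusRegister(1, 1, 0x1234), BusRegister(2, 2)
bus = Bus()
step = run_bus_cycle(bus, [r1, r2], src=1, dst=2, xen_at_step=3)
print(hex(r2.value), step)
```

In this run the source sets ACK immediately, but the destination is held until XEN becomes true at step 3, which is the suspension mechanism the text relies on to keep transfers in microprogram order.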
2.3 Instruction Stream Generation and Distribution

As indicated in the previous discussion, a stream of microinstruction source/destination fields is multiplexed to the various bus controllers via the microinstruction bus address fields.

[Figure 2.4 Bus Controller Flow Diagram]

In the discussion of bus cycles, it was pointed out that a bus cycle cannot be completed until the previous bus cycle has completed; this is required to assure sequential execution of the microprogram. It was also mentioned that a bus controller cannot accept a source/destination word while its bus is executing a bus cycle. For maximum microinstruction throughput, each bus cycle should be initiated at the earliest possible moment. These ideas form the basic constraints on the design of the control store and multiplexor subsystem illustrated in Figure 2.2.

To match control store bandwidth to execution rate, a very fast memory system is required. To achieve this bandwidth, the control store consists of four interleaved memory modules, each with its own address register and controller. Output data from the memory modules is not buffered. Four control signals are associated with each memory module; they are defined in Table 2.2.
Table 2.2 Control Store Module Control Signals

Signature   Definition          Description
MAVi        module available    module i may be accessed
DAVi        data available      module i data output is valid
CIi         cycle initiate      begin memory cycle in module i
AIi         address increment   increment address register i

Control store modules are accessed according to the following procedure. The two least significant bits of the control store address are decoded to select one of the four modules. If that module is available (MAVi = 1), a cycle is initiated in that module (CIi is pulsed). This control procedure is illustrated by the flowchart of Figure 2.5. (Note in Figure 2.5, and in the other flowcharts which follow, that some control signals are levels and some are pulses, as indicated.)

[Figure 2.5 Cycle-Initiate Controller for Control Store Module i]

The initiation of a memory cycle in a particular module also starts a timer (one-shot) which causes data available (DAVi) for that module to become true after a delay equal to the access time of the memory module. This ensures that the control store modules are accessed as soon as possible, maximizing microprogram look-ahead. The module available signals (MAVi), which are reset when module i is initiated, are set by the bus controllers which receive the microinstructions. These controllers are thus interlocked with the cycle-initiate controllers, so that a memory access cannot be initiated in a module before its current output data (microinstruction) has been received by a bus controller.

Parallel data paths connect the output data lines of the control store modules to the microinstruction multiplexor. Figure 2.6 illustrates the configuration under discussion. A modulo-four counter, the active module counter, indicates to the multiplexor which of the control store module outputs contains the next microinstruction to be transferred to a bus.
When data available (DAVi, where i = [AM], the contents of the active module counter) becomes true for the active module, the output lines of that memory module are switched onto the I bus and the microinstruction bus address field is decoded by the bus controllers. The controller of the addressed bus then waits, if necessary, for bus cycle complete (BCC) to be true, and then strobes the source and destination fields of the microinstruction into the bus instruction register. When this instruction strobe occurs, the active module counter is incremented so that it points to the next (modulo 4) module. In addition, module available (MAVi) for the module just unloaded is set, and that module's address register is incremented. This instruction strobe initiates the bus cycle corresponding to the microinstruction, as explained earlier.

[Figure 2.6 Instruction Stream Generator]

A parallel activity takes place in the sequence controller; it is described shortly. The actions of the instruction strobe controller (one per bus) are summarized in Figure 2.7.

[Figure 2.7 Instruction Strobe Controller: after reset, wait until the bus address is recognized and data is available (IAV is derived from DAVi); wait for bus cycle complete (BCC); then pulse IS to start the bus cycle, set MAVi for the module just read, pulse AIi to increment the address of the module just read, and pulse AMI to increment the active module counter.]

The reader should now be able to understand the basic functions of the VK-1 instruction sequencing mechanism. Note that if a bus cycle becomes suspended by a bus device interlock, as many as three subsequent instructions can be started (one on each of the other three busses), and memory accesses for the next four instructions beyond the four already strobed into bus control registers can be started and completed. Look-ahead can thus proceed up to seven microinstructions beyond the instruction which is interlocked.
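To make the interleaving and the look-ahead count concrete, here is a small Python sketch. It assumes that the address bits above the two module-select bits index a word within the selected module, which the text implies but does not state, and it reproduces the count of seven given above (three other busses plus four module accesses). All names are hypothetical.

```python
NUM_MODULES = 4  # four interleaved control store modules


def module_and_row(address: int, num_modules: int = NUM_MODULES):
    """Low-order bits pick the module; the rest index a word inside it."""
    return address % num_modules, address // num_modules


def strobe_order(start: int, count: int, num_modules: int = NUM_MODULES):
    """Module selected by the active module counter for successive strobes."""
    return [(start + k) % num_modules for k in range(count)]


def max_lookahead(num_busses: int, num_modules: int = NUM_MODULES) -> int:
    """Microinstructions that can proceed past one interlocked bus cycle:
    one strobed instruction on each other bus, plus one completed access
    waiting in each control store module."""
    return (num_busses - 1) + num_modules


print([module_and_row(a) for a in range(6)])
print(strobe_order(start=2, count=6))
print(max_lookahead(num_busses=4))
```

Sequential addresses rotate through the modules, so four consecutive fetches can be in progress at once; with one bus and one module the same formula gives a look-ahead of one, matching the low end of the configuration range.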
In cases where more than one instruction has been strobed into its bus controller (the normal situation), the order in which these instructions were strobed must be remembered if sequential execution is to be maintained. This function is handled by the sequence controller shown in Figure 2.6, which consists of a queue structure. Whenever an instruction is strobed by a bus controller, the bus address bits for that instruction are loaded into the rear of the queue. Whenever a bus cycle completes, the bus address at the front of the queue is discarded, and the front of the queue becomes the bus address of the instruction strobed just after the one which has just completed. For a bus cycle to complete, the address of that bus must be at the front of the queue. The sequence controller therefore generates four transfer enable signals (XEN0-3), one for each bus; the XEN signal which is true at any time corresponds to the bus whose address is at the front of the queue. Figure 2.8 illustrates a valid sequence of states for the instruction queue.

[Figure 2.8 A Typical Sequence of Activity in the Sequence Controller: successive queue states (rear to front) as instruction strobes (IS) enter bus addresses at the rear and bus cycle complete (BCC) signals remove them from the front; only the XEN line of the bus at the front is asserted.]

The queue can obviously hold at most as many bus addresses as there are busses (four in the machine being discussed). Queue overflow cannot occur, because an instruction cannot be strobed by a bus controller while that bus is busy, and if the bus is free there is at least one space in the queue.
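A software model of the sequence controller helps check the queue discipline. The following Python sketch is an illustration under stated assumptions, not the hardware design: the shift-register queue is replaced by a `deque`, and a bus is treated as busy exactly while its address is in the queue. Class and method names are hypothetical.

```python
from collections import deque


class SequenceController:
    """Queue of bus addresses in strobe order; front = oldest instruction."""

    def __init__(self, num_busses: int = 4):
        self.num_busses = num_busses
        self.queue = deque()  # left end is the front, right end is the rear

    def strobe(self, bus: int) -> None:
        """IS: an instruction was strobed into this bus controller."""
        if bus in self.queue:
            # A busy bus cannot accept an instruction, so overflow is impossible.
            raise RuntimeError(f"bus {bus} is still busy")
        self.queue.append(bus)

    def xen(self) -> list:
        """XEN0..XEN(n-1): only the bus at the front of the queue is enabled."""
        front = self.queue[0] if self.queue else None
        return [int(bus == front) for bus in range(self.num_busses)]

    def complete(self, bus: int) -> None:
        """BCC: a bus cycle completed; it must be the enabled (front) bus."""
        if not self.queue or self.queue[0] != bus:
            raise RuntimeError(f"bus {bus} completed without XEN")
        self.queue.popleft()


sc = SequenceController()
sc.strobe(3)
sc.strobe(2)
print(sc.xen())  # [0, 0, 0, 1]: bus 3 is at the front
sc.complete(3)
sc.strobe(1)
print(list(sc.queue), sc.xen())  # [2, 1] [0, 0, 1, 0]
```

Because `strobe` refuses a bus that is already queued, the queue length can never exceed `num_busses`, which is the overflow argument made in the text.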
The hardware required to implement the sequence controller is straightforward, consisting of two 4-bit shift registers (one for each bit of the bus address) and a decoder which activates the appropriate XEN line. The logic which loads the queue is driven by the four instruction strobe (IS) signals, and the logic which unloads the queue is driven by the four bus cycle complete (BCC) signals.

From the standpoint of microprogramming efficiency, it is essential to be able to obtain constants from control memory in a literal fashion. This is done by providing a double-word microinstruction format in which the first word has the usual bus address, source address, destination address format, and the second word is the data to be used as the source data. In this case the source field of the first word indicates that the data source is immediate (i.e., from control store). To implement this feature, each bus is provided with an immediate data register (IDR) which may be specified as the source. When the IDR is specified as the source, it is always loaded with the next word accessed from control store. The source address corresponding to the IDR must therefore be recognized by the bus controller; when this happens, a flag is set which loads the next word appearing on the I bus into the IDR. The data available (DAVi) signals coming from the memory modules are converted to instruction available (IAV) for a normal instruction word and to data available (DAV) when the word being multiplexed is a data word. Figure 2.9 depicts the logic involved.

This section has described the logical internal operation of the instruction generation segment of the VK-1 hardware.
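The immediate-data mechanism described above can be illustrated by classifying a stream of control store words. In this Python sketch the 17-bit word layout and the IDR source address (`IDR_SOURCE`) are assumptions chosen for illustration, and the function names are hypothetical; the point it shows is that a data word is distinguishable from an instruction only by the flag raised when the preceding instruction names the IDR as its source.

```python
# Assumed 17-bit word: 2-bit bus, 6-bit source, 9-bit destination.
SRC_BITS, DST_BITS = 6, 9
IDR_SOURCE = 0x3F  # hypothetical source address reserved for the IDR


def pack(bus: int, src: int, dst: int) -> int:
    return (bus << (SRC_BITS + DST_BITS)) | (src << DST_BITS) | dst


def unpack(word: int):
    return (word >> (SRC_BITS + DST_BITS),
            (word >> DST_BITS) & ((1 << SRC_BITS) - 1),
            word & ((1 << DST_BITS) - 1))


def classify_stream(words):
    """Tag each control store word as an instruction (IAV) or data (DAV).

    A word following an instruction whose source is the IDR is routed to
    that bus's immediate data register instead of being decoded.
    """
    tagged, idr_bus = [], None
    for word in words:
        if idr_bus is not None:
            tagged.append(("DAV", idr_bus, word))  # load into IDR of idr_bus
            idr_bus = None
            continue
        bus, src, dst = unpack(word)
        tagged.append(("IAV", bus, (bus, src, dst)))
        if src == IDR_SOURCE:
            idr_bus = bus  # flag: next word on the I bus is literal data
    return tagged


program = [pack(2, IDR_SOURCE, 7), 0x1234, pack(1, 4, 5)]
for entry in classify_stream(program):
    print(entry)
```

Decoded as an instruction, 0x1234 would be meaningless, so the flag is what keeps the literal out of the bus address decoders.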
As mentioned at the beginning of this discussion, the topic here is a high-performance model.

[Figure 2.9 Immediate Data Logic]

From the above description it should be evident that a single control store module could be used if lower throughput were acceptable. The modifications required to provide more or fewer than four busses are also evident. Some of the hardware described above will be discussed further along with the bus devices; the most notable case is branching, which alters the control store effective address and the contents of the completion queue.

2.4 Bus Devices and Instruction Types

The previous discussion described the mechanism which produces source-destination instructions and loads them into the addressed bus controller, and Section 2.2 defined bus operation. This section discusses the nature of the bus devices which respond to the source and destination addresses propagated on the bus. One of the basic design concepts of VK-1 is that bus device assignment is variable in configuration and device type, and that the assignment is chosen to optimize processor efficiency for a particular application. This section describes the design of several bus devices. The designs chosen for inclusion in this report are those which will typically be required regardless of application area, together with the devices selected for the IBM System/360 emulator described in Chapter 4.

Before the specific devices are described, the bus interface controllers for source devices and destination devices are presented. The requirements for this logic are apparent from the bus description given earlier. Figures 2.10 and 2.11 show the flow diagrams for these controllers.
Figure 2.10 Source Bus Controller
  1. Wait for REQ (request) from the bus controller.
  2. Examine the source address lines; proceed if the address is recognized.
  3. Check for local interlocks (these depend on device type).
  4. Enable data onto the X lines and acknowledge (pulse ACKS).
  5. Wait for the destination to clear the acknowledge.
  6. Disable data and decode.

Figure 2.11 Destination Bus Controller
  1. Wait for REQ (request) from the bus controller.
  2. Examine the destination address lines; proceed if the address is recognized.
  3. Wait for ACK (X lines valid), XEN (from the sequence controller), and LILK (clear of local interlock).
  4. Load data from the X bus and clear ACK (pulse ACKR).

The microinstruction repertoire is clearly determined by the set of bus devices selected for a particular system. Many of the devices are actually functional units which can perform a number of operations, such as read-write memories and arithmetic units, and such devices require control information. This command information is embedded in the source or destination address fields. One may therefore consider each function to possess its own bus address, giving some physical devices several addresses. The author has found it more convenient to regard the address field as containing control fields as well as device addresses, and this approach is used in discussing the various bus devices.

2.4.1 Arithmetic and Logical Unit

The design specifications chosen for the arithmetic and logical unit are based on the goals of speed and versatility. Most emulation applications require a significant amount of bit manipulation, so powerful logical instructions are of considerable value, and a powerful set of logical operations is quite feasible to implement with currently available MSI devices. Provision should also be made for convenient multiple-precision arithmetic, allowing efficient manipulation of a variety of target data formats.
For convenience and efficiency, it is also worthwhile to monitor a number of arithmetic and logical conditions. Evaluation of these considerations has led to a design with the structure shown in Figure 2.12. Notice in Figure 2.12 that the inputs and outputs may reside on different busses, and in general will, so that the two operands can be loaded concurrently.

[Figure 2.12 Arithmetic and Logic Unit]

The function command information for the ALU is provided when the ALL is loaded. Loading the ALL causes the ALU to begin its cycle, so the ALR must contain the proper argument before that time. During an ALU cycle the inputs and outputs are internally interlocked: if, for example, the ALO is addressed before the ALU cycle has completed, the source interlocks, i.e., it does not complete that bus cycle until the ALU has finished and the ALO contains valid data. Addressing an operand input interlocks in the same way if the previous ALU cycle is still in progress.

Table 2.3 lists the operations which could be considered. In applications which do not require such an extensive set, it is desirable to reduce the number of allowable operations, thus saving destination address space and reducing hardware cost. The set illustrated is probably a superset of the operations required for most applications.

The ALU automatically generates and saves condition flags during each ALU operation; Table 2.4 describes these flags. Condition flags are saved in the branch status register for examination by the branch control unit, discussed later in this chapter. Multiple-precision operations are facilitated by the ADDC and SUBC instructions, which use the previous carry as an input.
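The carry chaining and the Z-flag rule are easiest to see in a small model. The following Python sketch covers only ADD2, ADDC, SUB2 and SUBC on 16-bit words. It assumes, beyond what the text states, that subtraction is performed as ALL + NOT ALR + carry-in (so C = 1 means "no borrow"), which is consistent with the "1's complement with previous carry" description of SUBC in Table 2.3; `add32` and `sub32` are hypothetical helpers that perform a double-precision operation as two ALU cycles.

```python
MASK16, SIGN16 = 0xFFFF, 0x8000


class ALU:
    """Arithmetic subset of a 16-bit ALU with OV, C, Z, N flags (a sketch)."""

    def __init__(self):
        self.flags = {"OV": 0, "C": 0, "Z": 0, "N": 0}

    def _add(self, left: int, right: int, carry_in: int, chained: bool) -> int:
        total = left + right + carry_in
        result = total & MASK16
        self.flags["C"] = total >> 16  # carry out of the sign bit
        same_sign = (left & SIGN16) == (right & SIGN16)
        self.flags["OV"] = int(same_sign and (result & SIGN16) != (left & SIGN16))
        self.flags["N"] = int(bool(result & SIGN16))
        if chained:
            # ADDC/SUBC may only clear Z: it stays set only if all words were zero.
            self.flags["Z"] = int(bool(self.flags["Z"]) and result == 0)
        else:
            self.flags["Z"] = int(result == 0)
        return result

    def execute(self, op: str, all_reg: int, alr_reg: int) -> int:
        """Return the new ALO contents for op applied to ALL and ALR."""
        if op == "ADD2":
            return self._add(all_reg, alr_reg, 0, chained=False)
        if op == "ADDC":
            return self._add(all_reg, alr_reg, self.flags["C"], chained=True)
        if op == "SUB2":  # ALL + NOT ALR + 1
            return self._add(all_reg, ~alr_reg & MASK16, 1, chained=False)
        if op == "SUBC":  # ALL + NOT ALR + previous carry
            return self._add(all_reg, ~alr_reg & MASK16, self.flags["C"], chained=True)
        raise ValueError(f"{op} is not modelled in this sketch")


def add32(alu: ALU, a: int, b: int) -> int:
    low = alu.execute("ADD2", a & MASK16, b & MASK16)
    high = alu.execute("ADDC", a >> 16, b >> 16)
    return (high << 16) | low


def sub32(alu: ALU, a: int, b: int) -> int:
    low = alu.execute("SUB2", a & MASK16, b & MASK16)
    high = alu.execute("SUBC", a >> 16, b >> 16)
    return (high << 16) | low


alu = ALU()
print(hex(add32(alu, 0x0001FFFF, 0x00000001)), alu.flags)
```

In the example the low-order ADD2 produces zero and sets Z, and the high-order ADDC clears it because its result is nonzero, so Z correctly describes the full 32-bit sum rather than only its upper half.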
The ADDC and SUBC instructions also preserve valid condition codes throughout a multiple-precision add or subtract, since they can only clear the Z bit.

Table 2.3 ALU Operation Set

Code (hex)  Mnemonic  Function
00          CLR       clear ALO to all zeros: ALO <- 0...0
01          SET       set ALO to all ones: ALO <- 1...1
02          TRL       transmit left: ALO <- ALL
03          TRR       transmit right: ALO <- ALR
04          NOTL      NOT left: ALO <- NOT ALL
05          NOTR      NOT right: ALO <- NOT ALR
06          AND       left AND right: ALO <- ALL AND ALR
07          OR        left OR right: ALO <- ALL OR ALR
08          NAND      left NAND right: ALO <- NOT (ALL AND ALR)
09          NOR       left NOR right: ALO <- NOT (ALL OR ALR)
0A          XOR       left exclusive OR right: ALO <- ALL XOR ALR
0B          MASKR     mask right
0C          MASKL     mask left
0D          INSR      insert right
0E          INSL      insert left
0F          EQU       left EQU right: ALO <- NOT (ALL XOR ALR)
10          NEG       negate left, 2's complement
12          ADD1      add, 1's complement
13          ADD2      add, 2's complement
14          ADDC      add with previous carry
15          SUB2      subtract, 2's complement
16          SUBC      subtract, 1's complement with previous carry
17          DEC       decrement left
18          INCL      increment left
19          INCR      increment right

Table 2.4 ALU Condition Flags

Mnemonic  Description
OV        Overflow. Set by arithmetic operations when the result overflows the 16-bit result register, i.e., when the result sign bit O is inconsistent with the operand sign bits L and R (for example, two operands of equal sign producing a sum of the opposite sign); cleared otherwise. (L, R, and O represent the most significant (sign) bits of the ALL, ALR, and ALO registers, respectively.)
C         Carry. Set by arithmetic operations. C is set if a carry out of the most significant bit occurs, and is cleared otherwise.
Z         Zero. Set by logical and arithmetic operations. Z is set if all bits of the ALO are zero, and cleared otherwise. ADDC and SUBC can only clear the Z bit.
N         Negative. Set by arithmetic and logical operations. N is set if the most significant (sign) bit of the ALO is one, and cleared otherwise.

2.4.2 Shift Unit

Shift operations are rather frequent in emulation, because a variety of word formats must be decoded and arbitrary fields must be manipulated; a reasonably powerful shifter is therefore of considerable value. Since many shift operations are preceded by ALU operations such as masking, the shifter should have rapid access to the ALU result. It is also desirable to give the shifter rapid access to its own previous result, so that multiple shift operations are conveniently executed. These objectives are met by allowing the shift unit (SHR) to accept a source from the X lines as usual, and also to accept a "pseudo" source internally, from itself or from the ALO. When a pseudo source is specified, the SHR responds to both the source and destination lines and indicates to the bus controller that the bus cycle has completed (by setting and resetting the ACK flip-flop). This arrangement gives the SHR rapid access to the ALO and to its own output even if the SHR input is on a different bus from the ALO and the SHR output.

Figure 2.13a illustrates the SHR. The shift control command is contained in the destination field; its format is illustrated in Figure 2.13b. With the commands shown, an arbitrary shift can be executed in two microinstructions.

2.4.3 Branch Control Unit

The branch control unit (BCU) is the mechanism which allows conditional alteration of the sequential execution of microinstructions. This unit must possess considerable capability because of the look-ahead which normally takes place in an instruction stream. Since the instruction stream generator is pipelined, branching should be accomplished with minimum loss of pipeline speed-up, that is, without having to flush and then refill the pipe. The BCU described here has this capability when the microcode allows branch look-ahead. It is also desirable to be able to branch on a variety of conditions and to obtain the branch address from a variety of possible sources. These design goals motivate the specifications for the BCU.
[Figure 2.13a Shifter Unit Block Diagram: destination decoder and SHR controller, SHR output register, and source decoder and control, with inputs from the X lines and the ALO]

Figure 2.13b Shifter Unit Control Format (SHR control field of the destination address)
  input control:      00 = 0's in, 01 = 1's in, 10 = spill in*, 11 = end around
  length control:     00 = 1 place, 01 = 2 places, 10 = 4 places, 11 = 8 places
  direction control:  right or left
  *Spill is unconditionally set to the last bit shifted out.

Branching is accomplished by accessing the BCU as a destination. The specified source contains the branch address, which is loaded into the microinstruction counter if the branch is successful. If the branch is unsuccessful, execution proceeds without altering the microinstruction counter. When a successful branch occurs, the source word is loaded into the program counter: the two least significant bits are used to initialize the module enable register and the active module counter, and the remaining bits (all except the two LSBs) are loaded into the four address registers (refer to Figure 2.6). Before this can occur, however, the sequence controller must be examined and properly initialized.

The term "branch look-ahead" as used in this discussion refers to instances in which a branch instruction is followed by instructions which do not affect the branch condition or the branch address. In this situation VK-1 can branch very efficiently. These instructions, called post-instructions, are stored in control store at sequential addresses following the branch instruction. The branch instruction contains a field designating the number of post-instructions which follow it. Any number of post-instructions up to N-1, where N is the number of busses, may be specified; however, the post-instructions must specify distinct busses and may not specify the bus on which the branch is being executed.
The BCU waits until all post-instructions are loaded into bus control registers before strobing the branch address into the microinstruction counter. The effect of this is to allow the access of the instruction at the branch address to take place while the post-instructions are executing. In the case where zero post-instructions are specified, the pipe must be flushed and refilled. The queue contained in the sequence controller, as described in section 2.3, facilitates this activity.

An example will clarify this discussion. This example illustrates the execution of a branch with 3 post-instructions (on a four bus machine). The sequence of events to be described begins when the branch instruction reaches the front of the instruction queue. Figure 2.14a gives a section of code. Figure 2.14b shows the sequence of states which the instruction queue experiences during the execution of this code. Recall that the bus whose address is at the front of the queue is enabled to complete, that a bus address is entered into the rear of the queue when that bus control register is loaded with a microinstruction, and that the queue is "popped" when an instruction completes. Note in Figure 2.14 that no delay was experienced because of the branch. Such coding is therefore optimal, and code like this can often be generated if the programmer is aware of the advantages.

Since the post-instructions are issued, but not necessarily completed, before the instruction counter is altered, they must not alter the branch condition or the branch address. The post-instructions are completed before the branch becomes effective, however, since they are ahead of the instruction from the branch address in the instruction queue. Notice that all that is required to implement this plan is logic which detects the number of bus addresses in the queue (or, equivalently, the position of the "rear" queue pointer). The following situation must also be allowed for.
If, when the branch instruction reaches the front of the instruction queue, the queue length exceeds M+1 (where M is the number of post-instructions), then too many instructions have been issued (assuming the branch is successful). In this case, the queue must be reset to a length which includes only M+1 elements. Bus addresses in the queue which reside behind position M+1 represent instructions which have been issued but which should not be executed. Bus resets must be issued on these busses to abort execution of these bad instructions. This is accomplished by resetting the acknowledge flip-flop in the appropriate bus controllers. (The destination involved detects the abort because its acknowledge flip-flop has been reset by another unit while it is waiting for transfer enable (XEN).)

Figure 2.14a Example Code for Branch

Figure 2.14b Instruction Queue of Sequence Controller During Execution of Branch with Post-instructions

Notice also that in the described design, the source of the branch address may be any valid source device. This allows a great deal of flexibility in programming; examples are convenient branch table decoding and subroutine linkage.

Figure 2.15 diagrams the BCU. The portions of the BCU remaining to be discussed are the Branch Status Register (BSR) and the decision logic. The purpose of the BSR is to remember various machine conditions so that they may be tested by the BCU. The condition codes generated by the ALU are stored in the BSR. The BSR may also contain flag bits which may be set and cleared by other devices. Further capability may be added to the BCU by including a branch test register (BTR), which may be a source or destination. If this is done, then BCU commands can specify testing of either the BSR or the BTR.
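The queue rule for a branch that reaches the front of the queue, together with the single-bit BCU test, can be sketched in a few lines. This is a hedged Python model, not the sequence controller logic: queue entries are bus addresses in issue order with the branch's bus first, bit 0 is assumed to be the least significant BSR bit, and the function names are hypothetical.

```python
from collections import deque


def resolve_branch(queue, num_post, taken):
    """Apply the M+1 rule when the branch reaches the front of the queue.

    queue    -- bus addresses in issue order; queue[0] is the branch's bus
    num_post -- M, the number of post-instructions
    taken    -- whether the branch condition is satisfied
    Returns (queue to keep, busses whose instructions must be aborted).
    """
    entries = list(queue)
    if not taken or len(entries) <= num_post + 1:
        return deque(entries), []
    # Only the branch and its M post-instructions may remain; anything
    # behind position M+1 was issued along the wrong path.
    return deque(entries[: num_post + 1]), entries[num_post + 1 :]


def branch_condition(status, bit, test_set):
    """BCU decision: test one bit of the BSR (or BTR) for set or clear."""
    return ((status >> bit) & 1) == (1 if test_set else 0)


kept, aborted = resolve_branch([0, 1, 2, 3], num_post=2, taken=True)
print(list(kept), aborted)
```

In hardware the aborted busses would have their acknowledge flip-flops reset; here they are simply returned to the caller.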
The addition of a BTR allows the BCU to perform conditional branches on any bit of any word which can be transferred into the BTR. (This much capability is not normally required and is not assumed in the firmware discussed in Chapter 4.) Figure 2.16 shows the BCU command format (which is inherent in the destination address). There does not appear to be a simple way to test for boolean combinations of bits in the BSR or BTR.

Figure 2.15 Branch Control Unit

Figure 2.16 BCU Command Format. Fields of the destination address: BCU destination address; bit set or bit clear test; BSR or BTR (optional bit); address of the bit to be tested (may optionally be a 3 bit field).

2.4.4 Inter-Bus Communication

An obvious problem which is inherent in the multibus structure of VK-1 is the communication of data between busses. It is not economical, in general, to configure all devices so that they are each accessible from all busses. As a result, it is necessary to consider ways of effecting data transfer from a source on one bus to a destination on another. A straightforward solution to this problem is to provide a register called a Universal Communications Register (UCR), which may be loaded from a source on bus A and may then, on a subsequent microinstruction, load its contents into a destination on bus B. If such transfers are expected to occur frequently, it may be desirable to supply several such devices so that delays caused by the UCR being busy do not accumulate. However, if such transfers are frequent it is probably desirable first to reconsider the bus assignments of the various devices.
Since bus devices which act as sources and destinations may be connected across two busses, it should be possible to minimize the need for interbus transfers. Consequently, strong connectivity of busses by UCR's should not generally be required. This is desirable because a UCR transfer requires two bus cycles which cannot be overlapped to any great extent.

2.4.5 Scratch-Pad Memory

In order for a processor to achieve high throughput, frequently used data must be rapidly accessible. This is normally done with high-speed register-type memory, commonly referred to as scratch-pad memory. In preparing microcode it has been observed that accessing modes in addition to random access would be useful, and would save a number of microsteps. A simple example of such a case occurs when a 32 bit target machine is to be emulated on a 16 bit host. One would expect in this case that many of the references made to scratch memory will first access one word and then access the next sequential word. In this example, a microstep could be saved for each such access if the scratch-pad could respond to an "increment/read" operation (loading of the address register for the second read is eliminated). This idea could be extended further so that two microsteps are saved on each such reference, by executing an increment-read which is issued when the first word is unloaded from the scratch-pad output data register. Other applications may make it desirable to have more elaborate accessing modes which provide stack, queue, reverse stack, and reverse queue action. As one would expect, there is a significant trade-off between logical simplicity and scratch-pad capability. Also, as more elaborate structures are considered, one soon realizes that the required setup activities become significant, so that frequent mode switching causes a drastic reduction in throughput.
It was found that for the emulator described in Chapter 4, a scratch-pad memory with auto-increment addressing was desirable. More powerful capabilities would not have been sufficiently utilized to warrant the extra cost. The scratch-pad memory used in Chapter 4 will now be described.

The scratch memory unit (SMU) has the capability of automatically incrementing the address register. Because of this, the address register need only be loaded once when a series of accesses will be performed at sequential locations. The effective speed-up produced by this arrangement approaches a factor of two as the string of accesses becomes long, because only one bus cycle per memory reference is then required instead of two. It is assumed that the memory speed of the SMU is high enough to allow the access time to overlap bus cycles. Figure 2.17 illustrates the auto-increment SMU. The memory controller causes the address register to be incremented at the proper times, and supplies local interlock signals to the source and destination controllers in addition to controlling the memory.

Figure 2.17 Auto-increment Scratch Memory Unit

Operation of this SMU is as follows. Any memory operation is initiated by first loading the address register (SAR). When SAR is loaded, the mode, read or write, is specified by one of the destination bits. If the mode is read, then a memory cycle at the address loaded into SAR is initiated, with the data appearing in the output data register (SODR). A subsequent bus cycle using the SODR as the source retrieves the desired data from the SMU. When this is done, the SAR is incremented and another read operation is initiated. This allows a series of bus cycles specifying SODR as the source to unload sequential words from the memory.
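The SMU behavior (the read mode just described, and the write mode described next) can be modeled in a few lines. This is a hedged Python sketch: it assumes a 64 word memory of 16 bit words (the size used in Chapter 4), wraps SAR at the end of memory, and counts one bus cycle per register access. That count reproduces the factor-of-two argument: n sequential reads cost n+1 bus cycles instead of 2n.

```python
class ScratchMemoryUnit:
    """Behavioral model of the auto-increment SMU (SAR, SODR, SIDR)."""

    def __init__(self, size=64):
        self.mem = [0] * size
        self.sar = 0
        self.sodr = 0
        self.bus_cycles = 0  # one per register access over the bus

    def load_sar(self, address, mode):
        """Load SAR; in read mode a memory read into SODR starts at once."""
        self.bus_cycles += 1
        self.sar = address % len(self.mem)
        if mode == "read":
            self.sodr = self.mem[self.sar]

    def read_sodr(self):
        """Bus cycle with SODR as source; SAR then advances and the next read starts."""
        self.bus_cycles += 1
        value = self.sodr
        self.sar = (self.sar + 1) % len(self.mem)
        self.sodr = self.mem[self.sar]
        return value

    def write_sidr(self, value):
        """Bus cycle with SIDR as destination; write at SAR, then advance SAR."""
        self.bus_cycles += 1
        self.mem[self.sar] = value & 0xFFFF
        self.sar = (self.sar + 1) % len(self.mem)


def sequential_speedup(n):
    """2n bus cycles with an address load per read, versus n+1 with auto-increment."""
    return (2 * n) / (n + 1)
```

For example, writing two words and reading them back costs six bus cycles: one SAR load and two SIDR writes, then one SAR load and two SODR reads.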
If the mode specified when SAR is loaded is write, the memory cycle is not initiated until the input data register (SIDR) is loaded (from the bus). When SIDR is loaded, a memory write cycle is initiated using the new contents of the SIDR as memory input, and the SAR is incremented by this operation. Thus a series of bus cycles specifying the SIDR as the destination causes sequential memory locations to be loaded. Random access of the memory is accomplished by simply ignoring the auto-increment, although it must be remembered that for a memory write, the SAR is loaded first, then the SIDR. The mechanism described above also allows mixing of read and write operations at sequential memory locations. Figure 2.18 gives the algorithms implemented by the memory controller.

Figure 2.18 Scratch Memory Controller Algorithms
  SODR addressed:  X ← SODR;  SODR ← [SAR]+;  DONE
  SIDR addressed:  SIDR ← X;  [SAR]+ ← SIDR;  DONE
  (The "+" indicates that SAR is incremented after the memory operation.)

2.4.6 Main Memory

Implementation of main memory is very straightforward. Data available and memory available signals are generated by the memory controller and are used by the bus interface controllers (address, data in, and data out) to form the local interlock conditions. Interleaved memory modules are an obvious extension. Main memory design for VK-1 is typical of that for other systems, and will not be discussed further. In Chapter 4, a total storage capacity of 2^24 words is required. This necessitates adding an address extension register, so that loading the address register requires two bus cycles.

2.4.7 External Interface

External interface and communications (I/O) is one area which has not been fully investigated by the research reported here. The VK-1 bus structure allows convenient transfer of data between bus devices, and therefore allows fairly powerful types of I/O to be implemented in microcode.
External interface will be handled by a bus device which controls interlocked transfers to the "outside world." This controller could communicate in serial, byte parallel, or word parallel fashion. Autonomous block transfer capability could be implemented by allowing the interface module to become bus master, assert source and destination addresses, and execute a bus cycle. Such a capability would also require a word count and extended status information to be maintained by the interface module. Hardwired microprogram interrupt capability is also a possibility. This could be done by having the interface module obtain the bus, store the current microinstruction counter, and then execute a branch using its assigned interrupt vector as the source. (Macro-level interrupts are handled in microcode by polling external interface module status registers at the appropriate times.) The design questions with respect to I/O are application dependent, because most I/O functions can be implemented either in hardware or in microcode, depending on the requirements. For this reason I/O will not be discussed further.

2.5 Multiple Bus Controllers

As mentioned previously, the major objective of the design of VK-1 is to achieve a highly flexible structure into which a wide variety of special purpose functional hardware units may be plugged. The devices described in section 2.4 are only a small number of the possible units which could be included in a system. It is natural next to consider the problem of connecting processors together. This concept evolves naturally as one considers more and more sophisticated bus devices. Recall that the discussion of I/O mentioned the possibility of block transfer capability by allowing the I/O module to act as bus master. The problem of allowing a bus device to issue bus commands is very similar to the problem of multiprocessor connection.
In order for processors to be connected together in a useful way, provisions must be made for the communication of control information and data. Since both of these are transferred over the busses of a VK-1 processor, access to the busses of one processor by another processor gives the latter potentially any degree of control it wants to assert. Therefore a very natural way of connecting processors exists, namely making one or more busses of one processor continuous with a bus or busses of another processor. As an example, if bus A of processor I is connected to bus B of processor II, and the branch control unit of processor I is on bus A, then processor II has the power to alter the microinstruction counter of processor I, and can therefore completely monitor and control its activities. There is no need to restrict to two the number of processors which can control one bus; that is, in the above example, bus C of processor III could also be a continuation of bus A, B. Since processors may possess more than one bus, and since more than one BCU-type device could be associated with a processor, the possible processor network configurations seem almost limitless. Further implications of such configurations are beyond the scope of this report.

One aspect of this idea which will now be discussed is the problem of arbitrating bus mastership when multiple bus masters are allowed. Figure 2.19 shows a bus master arbitrator and the additional control lines which must be added to the bus structure already presented. Figure 2.20 gives the flow diagrams for the bus controllers and the bus master arbitrator. The following activities take place. A device desiring bus control asserts BRQ. A device asserting BRQ gains control by capturing BG, a daisy-chained signal. (By adding more BG lines an arbitrary priority tree may be implemented.) A bus device which has gained control may override the other requesting devices by asserting BGD, which inhibits the arbitrator from issuing BG. A device which has gained control maintains control until HLD clears (this can occur only when BGD is not being asserted). When HLD clears, control is lost and a new request must be issued to recapture control of the bus. The arbitrator honors a BRQ by issuing BG and setting HLD any time a BRQ occurs and BGD is false. The issuing of BG and the setting of HLD are synchronized with the bus cycle. Each bus device must maintain a status flag which tells it whether it is in control of the bus.

2.6 Implementation Considerations

Detailed design and implementation have not been carried out in this research. An attempt has been made, however, to investigate the feasibility of the proposed hardware, and some observations have been made.

Figure 2.19 Multiple Bus Control Unit Configuration (a bus master arbitrator and bus control units I, II, and III share the lines BRQ, bus request; BG, bus granted; BGD, bus grant disable; HLD, hold)

Figure 2.20a Bus Arbitration Controller

Figure 2.20b Bus Device Interface Controller

Figure 2.20c Instruction Strobe Controller

Figure 2.20d Bus Master Flag Controller

[...] then optimally written microcode could be executed with a theoretical speed-up of

$$S = \frac{T_{(1\ \mathrm{bus})}}{T_{(M\ \mathrm{busses})}} = \frac{T_{(1\ \mathrm{bus})}}{T_{(1\ \mathrm{bus})}/M} = M$$

This speed-up figure is not very valuable, since it is nearly impossible to write microcode which uses the busses in an optimally scheduled way; however, it is valid as an upper bound. A more reasonable model for the microinstruction sequence is one which schedules bus use randomly. This model should provide reasonable estimates of performance gain when bus address assignment optimization is not possible.
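The random-scheduling model can be illustrated with a short Monte Carlo sketch. It is not a transcription of the GPSS program of Appendix A, whose model (Figure 3.1) is not reproduced here; it is a hedged Python stand-in that assumes in-order issue, one bus cycle of length Tbus per microinstruction, a uniformly random bus for each instruction, and a control store that can deliver a new instruction every Taccess. Its numbers should not be expected to match Figures 3.2 and 3.3.

```python
import random


def execution_time(buses, num_buses, t_bus=100, t_access=0):
    """Finish time of a microinstruction stream under a simple in-order model.

    Each microinstruction occupies its bus for t_bus. Instruction i starts
    no earlier than t_access after instruction i-1 started (control store
    delivery) and no earlier than its bus becomes free.
    """
    bus_free = [0] * num_buses
    start = -t_access
    finish = 0
    for b in buses:
        start = max(start + t_access, bus_free[b])
        bus_free[b] = start + t_bus
        finish = max(finish, bus_free[b])
    return finish


def random_speedup(n, num_buses, t_bus=100, t_access=0, seed=1):
    """Speed-up over one bus when every instruction picks a bus at random."""
    rng = random.Random(seed)
    buses = [rng.randrange(num_buses) for _ in range(n)]
    return (n * t_bus) / execution_time(buses, num_buses, t_bus, t_access)


for m in (1, 2, 4, 8):
    print(m, round(random_speedup(10_000, m), 2))
```

A perfectly rotated bus sequence (0, 1, 2, 3, 0, ...) with Taccess = 0 reaches the upper bound S = M in this model, while random bus choices fall below it because an instruction must wait whenever its bus is still busy.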
The GPSS program which creates and executes the model with random bus addresses is shown in Appendix A. Parameters for this program were swept to obtain the data illustrated in Figures 3.2 and 3.3. Figure 3.2 was obtained using Taccess = 0, so that the data would reflect only scheduling collisions and not control-store bandwidth limitations. These data illustrate that speed-up increases at a decreasing rate as busses (and thus cost) are added. From this diagram, M = 4 appears to be a reasonable number of busses, and with this number the theoretical speed-up is

$$S_T = \frac{100}{45} \approx 2.22$$

Note that doubling the number of busses from 4 to 8 yields a speed-up of only 3.31. Figure 3.3 shows the effects of control-store bandwidth limited operation. This graph gives useful information for control-store design when the number of busses is known.

Figure 3.2 Execution Time vs. Number of Busses (Tbus = 100 time units, Taccess = 0 time units; horizontal axis: number of busses M)

Figure 3.3 Execution Time vs. Taccess (curves for 2, 4, 6, 8, and 10 busses; horizontal axis: control store effective access time Taccess, in time units)

3.4 VK-1 Logical Simulator

A brief discussion of the operation of this program is impossible without assuming that the reader is familiar with GPSS, and therefore will not be attempted. The program has the capability for inputting microprograms and loading core memory. Every control signal and logical activity is simulated, and timing parameters may be easily varied to reflect various implementation technologies. The program makes available an abundance of useful statistics, such as the utilization of control store, busses, and bus devices. Queue statistics are also generated for busses and bus devices.
These statistical outputs make the logical simulator a very powerful design tool, allowing the designer to detect bottlenecks, "tune" the system, and optimize bus device assignments. In addition, the logical simulator may be used to debug microcode and even macrocode. The simulator was very useful in the design work reported in Chapter 2, and will be very useful in continued VK-1 research. A listing of the logical simulator is given in Appendix B.

4. MICROPROGRAMMING VK-1

The success or failure of the proposed computer structure hinges on the throughput obtainable for practical problems. The microprograms which control the information processing must be organized so that they take advantage of possible concurrency in bus operations. Optimization of a firmware system therefore requires maximizing bus-cycle overlap. The development and optimization of firmware will be discussed, and a detailed example of a large microprogram will be presented.

4.1 Firmware Development

Development of VK-1 firmware is a two phase operation. Phase 1 is the selection of the hardware bus devices which will be used, and the generation of instruction streams which accomplish the desired data and control manipulations using these hardware devices. This phase may, in general, take place without regard for the physical bus configuration of the selected bus devices. Phase 1 is, in this sense, configuration independent, and can be carried out with relatively little knowledge of the actual VK-1 device configuration. This hardware independence also means that an instruction stream developed for a certain set of hardware modules may be used as phase 2 input for many system configurations. For example, phase 2 could be applied three times to the same phase 1 output to produce firmware for one, two, and four bus machines. This concept will become clear when phase 2 is discussed.
As an example of the phase 1 activity, suppose one wished to create a procedure which selects and enters one of many subprocedures by computing a control storage address and then jumping to it. A set of the following hardware devices might be used: two bus registers (symbolically referenced as BADD and DISP), an ALU, and a BCU (of the types previously discussed). Then the following microinstruction stream would accomplish this:

  SOURCE  DESTINATION  FUNCTION  COMMENT
  BADD    ALR                    ADD THE BASE ADDRESS
  DISP    ALL          ADD2      TO THE DISPLACEMENT, AND
  ALO     BCU          UC        JUMP TO THIS EFF. ADD

Phase 2 is the assignment of devices to busses, and the specification of a bus address for each instruction. Phase 2 is the critical stage of firmware development, because it is this assignment procedure which determines the amount of bus-cycle overlap. The problem in phase 2 is that of assigning devices and bus addresses so that sequential instructions use different busses, allowing overlap of the instruction setup. To clarify this notion, consider the simple instruction stream given above in the discussion of phase 1. Figure 4.1 illustrates a hardware configuration which could perform the above function. Obviously the first two instructions can be overlapped with this configuration, as illustrated in Figure 4.2.

Figure 4.1 Bus Assignment for Example
  BUS ADD  SOURCE  DEST  FUNCTION
  0        BADD    ALR
  1        DISP    ALL   ADD2
  2        ALO     BCU   UC

Figure 4.2 Overlap of Bus Cycles for Example (execution time in bus cycles, 1 to 3; ALR ← BADD on bus 0 and ALL ← DISP on bus 1 overlap, the ALU then forms the sum, and BCU ← ALO follows on bus 2)

Unfortunately the problem of determining the optimal bus assignment for a given instruction stream of practical size becomes extremely complex for a general instruction mix. The maximum obtainable speed-up is a factor equal to the number of busses.
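The effect of the phase 2 bus assignment on this example can be checked with a small scheduling sketch. This is a hedged Python model, not part of any VK-1 tool: it assumes one bus cycle per transfer, in-order issue, and that the ALU needs one bus cycle (the hypothetical parameter `alu_latency`) to form the sum before ALO can be read, which is consistent with the three cycles shown in Figure 4.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    bus: int
    source: str
    dest: str
    depends_on: tuple = ()  # indices of earlier ops whose results this op needs


def bus_cycles(ops, alu_latency=1):
    """Total bus cycles for an in-order stream, one bus cycle per transfer.

    An op starts when its bus is free, not before the previous op started,
    and not before alu_latency cycles after every op it depends on finished.
    """
    bus_free, finish, prev_start = {}, [], 0
    for op in ops:
        ready = max((finish[i] + alu_latency for i in op.depends_on), default=0)
        start = max(prev_start, bus_free.get(op.bus, 0), ready)
        finish.append(start + 1)
        bus_free[op.bus] = start + 1
        prev_start = start
    return max(finish)


# The example stream with the Figure 4.1 bus assignment ...
example = [
    MicroOp(0, "BADD", "ALR"),
    MicroOp(1, "DISP", "ALL"),
    MicroOp(2, "ALO", "BCU", depends_on=(0, 1)),
]
# ... and the same phase 1 stream assigned entirely to bus 0.
one_bus = [MicroOp(0, op.source, op.dest, op.depends_on) for op in example]
print(bus_cycles(example), bus_cycles(one_bus))  # 3 4
```

Applying different bus assignments to the same phase 1 stream, as here, is exactly the configuration independence described for phase 1.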
This theoretical maximum speed-up is virtually impossible to obtain unless the instruction stream exhibits highly regular patterns of sequences. An exact procedure for determining the optimal configuration would probably require exhaustive simulation of all reasonable possibilities. String analysis of the source-destination pairs comprising the instruction stream appears to be much too complex to be considered as an algorithmic approach to assignment optimization.

4.2 Symbolic Microprogramming Language

The language used to describe microinstructions is very simple and requires little explanation. It bears considerable resemblance to the symbolic assembler language of most two address computers. The following format is used for all microinstructions:

  LABEL  BUS ADDRESS  SOURCE DEVICE  DESTINATION DEVICE  FUNCTION  COMMENT

The LABEL and COMMENT fields are optional, and the FUNCTION field may or may not be required depending on the particular instruction. The bus address, source device, and destination device fields are required for every instruction. Immediate data, that is, a data word which is stored in the control store, is identified by enclosing single quotation marks. These are the only format requirements which were used in preparing microcode. Device and function mnemonics depend on the particular system and will be presented with the microprogramming example of the next sections.

4.3 Emulation of the IBM System/360 on VK-1

In order to further evaluate the design concepts of VK-1 from a microprogramming viewpoint, and to determine whether the device types considered are adequate, it was felt that a considerable microprogramming effort should be made. Emulation, the major application area of microprogramming, was chosen. In order to have a benchmark for comparison of the hardware and firmware, it was felt that firmware for emulating a familiar system should be developed.
As a result of these considerations, a partial emulator for the IBM System/360 line of general purpose computers was chosen.

Emulation may be loosely defined as the hardware/software interpretation of the instruction set of one machine, termed the target machine, by another, referred to as the host machine. In a microprogrammable host machine, this interpretation is accomplished by the firmware, which is referred to as the emulator. Design of an emulator involves mapping the target machine hardware into the host machine hardware. This mapping is required at the following four levels [3]:
  (a) main storage
  (b) I/O
  (c) registers, accumulators, flip-flops, etc.
  (d) all other addressable resources of the target machine

Obviously, the more similarity that exists between target and host, the less complex this mapping is, and, in general, the more efficient the emulator. In selecting the 360 as a target machine it was realized that close similarity does not exist. This was felt to be desirable because it would allow a fair evaluation of VK-1 as a general-purpose user-microprogrammed machine. Selection of such a target would also illustrate the use and configuration of various bus devices to achieve an efficient host for a specified target, which is a major design goal.

4.3.1 VK-1 Hardware Used in 360 Emulation

The adaptability of the system hardware to the specific application was not exploited to the fullest extent possible in the emulator which was developed. The bus devices used are the following:
  4-port ALU
  4-port shifter
  2^24 (max) x 16 bit core memory module
  64 word scratch pad memory with auto-increment
  branch control unit
  11 general purpose bus registers

Additional hardware would have been desirable if the full System/360 had been emulated. For example, a storage key memory module would have been essential for rapid operation. The 360 I/O would also have required more sophisticated hardware for efficient operation.
It is interesting to note that a significant portion of the internal instruction set has been emulated with a very modest hardware configuration. The ALU is the busiest device in the firmware developed, so it was found to be essential that the ALU be accessible to all busses. Without this feature it becomes very difficult to take advantage of possible bus cycle overlap.

4.3.2 Emulator Organization

The mapping of the target machine into the host machine is illustrated in Figure 4.3, and simply involves assigning the addressable storage units of the 360 to storage within VK-1. Frequently processed information is stored in bus registers, which are accessible in one bus cycle. These bus registers and their symbolic names and functions are listed in Table 4.1. The mnemonics are used only for convenience in writing and understanding the emulator, and the format defined in section 4.2 is adhered to throughout the microprogramming discussion.

Figure 4.3 Scratch Memory Assignment (16 bit words)
  Scratch addresses  Assignment
  0-31               general registers 0-15 (register n: low order half word at 2n, high order half word at 2n+1)
  32-47              floating point registers 0, 2, 4, 6
  48-51              program status word
  52-63              scratch area

Table 4.1 Bus Register Utilization
  Register  Bus
  Mnemonic  Add.  Function
  ICLO      3     low order half-word of instruction counter
  ICHI      2     high order byte of instruction counter and program mask
  IR0       1     low order half-word of the current instruction
  IR1       1     high order half-word of the current instruction
  EALO      3     low order half-word of the effective address to be used next
  EAHI      2     high order byte of the effective address to be used next
  OP2LO     2     low order half-word of the second operand
  OP2HI     2     high order half-word of the second operand
  LKRG      0,3   linkage register for subroutine linkage
  CCR             condition code register (holds internal condition codes)
  T1        3     temporary storage register

The overall program flow of the emulator is shown in Figure 4.4. Even though only a subset of the System/360 was emulated, the emulator was organized so that the complete 360 could be emulated by adding microroutines, without changing what has already been developed. Since all of the "hooks" for a full emulator are present, the execution times for the instructions implemented would not be affected by extending the emulator to implement the full 360. For example, decoding capability for all instruction types has been included, even though execution routines have been written only for the implemented subset.

In firmware design one is frequently confronted with a trade-off between execution speed and control store usage. Even though VK-1 has microaddress space for 64K words of control storage, a considerable attempt was made to conserve control storage, as this is still one of the more expensive system resources. Consistent with this objective, subroutines and shared code are used frequently. Subroutine linkage is not automatic in VK-1, so the calling routine must store a return address in a specified memory device to facilitate return from the subroutine. A bus register, with symbolic name LKRG, is dedicated to this purpose.
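The register layout of Figure 4.3 can be expressed as a small address map. The Python below is a hedged sketch with hypothetical helper names, assuming the layout shown in the figure, with each 64 bit floating point register occupying four consecutive words. Because the low order half-word sits at the lower address, one SAR load followed by two SODR reads on the auto-increment SMU fetches a complete 32 bit register, low half first.

```python
# Scratch memory layout from Figure 4.3 (16 bit words).
GR_BASE = 0        # general registers 0-15: addresses 0-31
FPR_BASE = 32      # floating point registers 0, 2, 4, 6: addresses 32-47
PSW_BASE = 48      # program status word: addresses 48-51
SCRATCH_BASE = 52  # scratch area: addresses 52-63


def gr_addresses(n):
    """Scratch addresses (low half, high half) of general register n."""
    if not 0 <= n <= 15:
        raise ValueError("general register number must be 0-15")
    low = GR_BASE + 2 * n
    return low, low + 1


def read_gr(scratch, n):
    """Assemble a 32 bit register value from its two half-words."""
    low, high = gr_addresses(n)
    return (scratch[high] << 16) | scratch[low]


def write_gr(scratch, n, value):
    """Split a 32 bit value into its two half-words."""
    low, high = gr_addresses(n)
    scratch[low] = value & 0xFFFF
    scratch[high] = (value >> 16) & 0xFFFF


def fpr_base(n):
    """First scratch address of floating point register n (0, 2, 4, or 6)."""
    if n not in (0, 2, 4, 6):
        raise ValueError("floating point register must be 0, 2, 4, or 6")
    return FPR_BASE + 2 * n
```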
The most frequently used fields of the program status word (PSW) are stored both in scratch memory and in bus registers. These fields are the instruction counter, the program mask, and the condition code. The condition code field of the PSW is two bits, so the four possibilities must be encoded. Since the condition code is infrequently required in this encoded form, its bus register form is not encoded. This allows the program to store the internal condition codes of VK-1, rather than converting to 360 condition codes, for all instructions which alter the condition code. Whenever the explicit 360 condition code is required, it is derived from this internal form.

Figure 4.4 System/360 Emulator Flow Chart

The main memory module is the slowest device in VK-1, so it is important to maximize the utilization factor for main memory. This objective is met by initiating memory cycles at the earliest possible time. The main memory utilization factor is felt to be a good measure of emulator efficiency, with main-memory-bandwidth-limited operation as the ultimate goal. If this situation is obtained, the next step would be to increase memory bandwidth by techniques such as increasing the word size accessed or interleaving several memory modules.
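The deferred condition code conversion can be sketched as follows. This is a hedged Python illustration: the flag names (zero, negative, overflow) and the functions are hypothetical stand-ins for the internal VK-1 condition codes held in CCR, while the mapping to condition code values follows the System/360 definitions for add/subtract (0 zero, 1 less than zero, 2 greater than zero, 3 overflow) and for compare (0 equal, 1 first operand low, 2 first operand high).

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


@dataclass(frozen=True)
class Flags:
    """Hypothetical internal condition flags kept in CCR."""
    zero: bool
    negative: bool
    overflow: bool


def _sign(x):
    return (x >> 31) & 1


def add32(a, b):
    """32 bit add; returns (result, internal flags)."""
    r = (a + b) & MASK32
    v = _sign(a) == _sign(b) and _sign(r) != _sign(a)
    return r, Flags(r == 0, bool(_sign(r)), v)


def sub32(a, b):
    """32 bit subtract (also used for compare); returns (result, internal flags)."""
    r = (a - b) & MASK32
    v = _sign(a) != _sign(b) and _sign(r) != _sign(a)
    return r, Flags(r == 0, bool(_sign(r)), v)


def cc_arithmetic(f):
    """360 CC after add/subtract: 0 zero, 1 negative, 2 positive, 3 overflow."""
    if f.overflow:
        return 3
    if f.zero:
        return 0
    return 1 if f.negative else 2


def cc_compare(f):
    """360 CC after compare done as a subtraction: 0 equal, 1 low, 2 high."""
    if f.zero:
        return 0
    return 1 if f.negative != f.overflow else 2
```

Only the flags need to be saved after each instruction; the two-bit 360 code is computed by `cc_arithmetic` or `cc_compare` only when, for example, a branch on condition or a PSW store actually needs it.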
To further clarify the operation of the 360 emulator, the major program segments are described below.

IFCH (instruction fetch). This section first checks for pending external interrupts; if any exist, control is transferred to EXT for handling. (External interrupt handlers were not implemented.) The next half-word (16 bits) of the target program is loaded from the main memory data register (CDR) into IR0 (instruction register 0). The memory cycle which accessed this word was initiated before IFCH was entered. The following half-word of the target program is then accessed, and the instruction counter (ICHI, ICLO) is incremented. A branch table decoder transfers control to one of 16 routines for second-level decoding. This decoding is based on bits 0-3 (the four leftmost bits) of the 360 operation code, and thus decodes both the format and the type specification. Four of the 16 possible branch points correspond to invalid operation codes, and error routines are entered if they occur. Three different operation code error routines are required (OPER2, OPER4, OPER6); they correspond to the 2-, 4-, and 6-byte instruction formats. Second-level decoders for the RR0, RR1, RX0, and RX1 instruction types were implemented.

RR0 (register-to-register format, type 0). This section performs a serial decoding to one of seven execution routines. The decoder checks first for the opcodes that occur most frequently. Execution routines for the BALR and BCR instructions have been implemented.

RR1 (register-to-register format, type 1). This section completes the setup required for execution of the individual instructions. This setup consists of accessing the 32-bit second operand, which is held in scratch memory, and storing it in bus registers OP2LO and OP2HI. The final decoding is then completed, transferring control to the individual execution routines. The LR, CR, AR, and SR instructions were implemented.

RX0 (register-and-indexed-storage format, type 0).
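The first-level decode in IFCH is a 16-way branch on opcode bits 0-3. The Python sketch below assumes the standard System/360 opcode groups: groups 0, 1, 4 and 5 lead to the RR0, RR1, RX0 and RX1 decoders described here, and groups A, B, C and E, which hold no S/360 operation codes, are treated as the four invalid branch points. Names for the other valid groups are hypothetical placeholders. With this map the invalid groups reach only OPER4 and OPER6, so OPER2 presumably serves invalid codes found by a second-level RR decoder; that is an inference, not something stated above. The sketch also shows the instruction counter increment, including the ICLO-to-ICHI carry that the later timing analysis assumes does not occur; keeping the program mask in ICHI's upper byte follows the S/360 PSW layout and is an assumption about VK-1.

```python
# Routine reached for each value of opcode bits 0-3 (the leftmost hex digit).
# RR0, RR1, RX0 and RX1 are the decoders named in the text; the other valid
# group names are hypothetical placeholders for unimplemented decoders.
FIRST_LEVEL = {
    0x0: "RR0", 0x1: "RR1", 0x2: "RR2", 0x3: "RR3",
    0x4: "RX0", 0x5: "RX1", 0x6: "RX2", 0x7: "RX3",
    0x8: "RS_SI0", 0x9: "RS_SI1", 0xD: "SS0", 0xF: "SS1",
}
INVALID_GROUPS = {0xA, 0xB, 0xC, 0xE}   # assumed: no S/360 opcodes here


def instruction_length(opcode: int) -> int:
    """S/360 rule: opcode bits 0-1 give the length: 00->2, 01/10->4, 11->6 bytes."""
    return (2, 4, 4, 6)[(opcode >> 6) & 0b11]


def first_level_decode(opcode: int) -> str:
    """16-way branch on bits 0-3; invalid groups go to OPER2/OPER4/OPER6."""
    group = (opcode >> 4) & 0xF
    if group in INVALID_GROUPS:
        return f"OPER{instruction_length(opcode)}"
    return FIRST_LEVEL[group]


def increment_ic(ichi: int, iclo: int, step: int = 2) -> tuple:
    """Advance the 24-bit instruction counter split over ICHI (low byte) and ICLO.
    ICHI's upper byte (assumed to hold the program mask) is left unchanged."""
    iclo += step
    carry, iclo = iclo >> 16, iclo & 0xFFFF
    ichi = (ichi & 0xFF00) | ((ichi + carry) & 0x00FF)
    return ichi, iclo


print(first_level_decode(0x5A), first_level_decode(0xE0))   # RX1 OPER6
```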
This routine first obtains the second half-word of the instruction (already accessed by IFCH) from the main memory data register (CDR) and stores it in bus register IR1. A branch table is used to complete the instruction decoding. The RX0 branch table differs from the others in that two microinstructions correspond to each operation code. The first saves the address of the execution routine for that particular instruction (e.g., 'ADD' is saved for the AH instruction), and the second transfers control to a shared half-word format setup routine (HWSU). HWSU then exits to the appropriate execution routine. Not all instructions require both microinstructions, so some words of control store in the RX0 branch table are unused.

RX1 (register-and-indexed-storage format, type 1). This routine obtains the second half-word of the current instruction from the main storage data register (CDR). A subroutine named EA2 is then executed to compute the effective address of the second operand. The RX1 branch table is entered next to carry control to the final execution routines. Like that of the RX0 routine, this branch table saves the execution routine address corresponding to the present instruction in the linkage register LKRG, then jumps to the shared routine FWSU, which does the setup required for full-word operations.

The remaining subprograms of the emulator are either execution routines used to complete the various instructions or subroutines used by execution routines. They are not described in detail here, because their operation is documented by the comments on the individual microinstructions in Appendix C. Much microcode is shared in the execution phase of the emulator; for example, LR, LH, and L all use the same execution routine, and similarly for add, subtract, and compare.

The symbolic microcode for the System/360 emulator is given in Appendix C. Table 4.2 gives the device mnemonics used in the emulator code.
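The EA2 subroutine mentioned above performs the standard System/360 RX address arithmetic: the 12-bit displacement plus the contents of the index and base registers, where register 0 means "none", truncated to 24 bits. A Python sketch follows. It assumes IR0 holds the first half-word of the instruction (opcode, R1, X2) and IR1 the second (B2, D2), as in the IFCH and RX1 descriptions, and returns the result split the way Table 4.1 stores it in EAHI and EALO; the function name is hypothetical.

```python
def ea2(ir0: int, ir1: int, regs: list) -> tuple:
    """Sketch of the RX effective-address computation, returned as (EAHI, EALO)."""
    x2 = ir0 & 0xF                 # index register number
    b2 = (ir1 >> 12) & 0xF         # base register number
    d2 = ir1 & 0xFFF               # 12-bit displacement
    ea = d2
    if x2:                         # register 0 means "no index"
        ea += regs[x2]
    if b2:                         # register 0 means "no base"
        ea += regs[b2]
    ea &= 0xFFFFFF                 # S/360 addresses are 24 bits
    return ea >> 16, ea & 0xFFFF   # high byte -> EAHI, low half-word -> EALO


regs = [0] * 16
regs[12] = 0x012000                # base register
print([hex(v) for v in ea2(0x5830, 0xC004, regs)])   # ['0x1', '0x2004']
```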
The function mnemonics used are the same as those used in the descriptions of the various devices and are not repeated here. Bus register mnemonics were presented in Table 4.1. As a further aid to understanding the operation of the S/360 emulator, figure 4.5 depicts the interaction between, and flow through, the various program modules.

Type*   Mnemonic   Description
D       ALR        right input to arithmetic/logic unit
D       ALL        left input to arithmetic/logic unit
S       ALO        output of arithmetic/logic unit
S,D     SHR        output or input of shifter
D       CARH       high-order bits (8) of address for main memory
D       CARL       low-order bits (16) of address for main memory
S,D     CDR        main memory data (buffer) register
D       SAR        address register for scratch pad memory
S       SODR       output data register for scratch pad memory
D       SIDR       input data register for scratch pad memory
D       BCU        branch control unit
S,D     STR        system status register

* S means the device is a source; D means the device is a destination;
  S,D means the device may be either a source or a destination.

Table 4.2 Device Mnemonics used in Emulator

Figure 4.5 Flow of Control Through Emulator

4.3.3 Bus Assignment for Emulator Microprogram

As discussed in section 4.1, microprogram development for VK-1 is a two-phase process. Section 4.3.2 has discussed the phase-one development: the generation of an instruction stream which executes the desired data transformations. The second phase, bus assignment, is the problem addressed here. As was pointed out above, this phase is critical to the development of firmware capable of exploiting the computer structure under investigation. Since the optimal assignment is program dependent, the first step in approaching the assignment problem is to analyze the instruction stream to be executed.
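As an illustration of this first step, the Python sketch below applies two simple measures to the IFCH segment listed in Table 4.5 (18 statements; the post-instruction count on statement 9 is ignored): how often each source-destination pair occurs, and what fraction of transfers touch an ALU port. Counting ALR, ALL and ALO as ALU traffic, and using two-statement runs as the test for repeated sequences, are choices made for the sketch rather than the procedure used in the thesis.

```python
from collections import Counter

# (source, destination) for the 18 IFCH statements of Table 4.5
IFCH = [
    ("literal", "ALR"), ("STR", "ALL"), ("literal", "BCU"), ("ICLO", "UCOM"),
    ("CDR", "IR0"), ("UCOM", "ALL"), ("ICHI", "CARH"), ("ICLO", "CARL"),
    ("literal", "BCU"), ("ALO", "ICLO"), ("IR0", "SHR"), ("literal", "ALR"),
    ("SHR", "ALL"), ("ALO", "SHR"), ("literal", "ALR"), ("SHR", "ALL"),
    ("ALO", "BCU"), ("literal", "BCU"),
]
ALU_PORTS = {"ALR", "ALL", "ALO"}


def pair_counts(stream):
    """How often each individual source-destination pair occurs."""
    return Counter(stream)


def alu_fraction(stream):
    """Fraction of transfers with the ALU as source or destination."""
    hits = sum(1 for src, dst in stream if src in ALU_PORTS or dst in ALU_PORTS)
    return hits / len(stream)


def repeated_runs(stream, length=2):
    """Consecutive runs of `length` transfers that occur more than once."""
    runs = Counter(tuple(stream[i:i + length])
                   for i in range(len(stream) - length + 1))
    return {run: n for run, n in runs.items() if n > 1}


print(f"{alu_fraction(IFCH):.2f}")   # 0.56
print(repeated_runs(IFCH))           # only literal->ALR followed by SHR->ALL repeats
```

More than half of the IFCH transfers involve the ALU, while only a single two-statement pattern recurs, which is consistent with the observations that follow.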
An inspection of the emulator code immediately reveals two things. First, there are few repeated sequences of source-destination pairs; second, instructions using the arithmetic and logic unit appear with very high relative frequency. Consequently, to achieve a worthwhile amount of bus concurrency, a four-port arithmetic and logic unit is essential. Since the type of firmware considered here possesses neither inherent parallelism nor repetitive sequences, further optimization of the hardware configuration is not attempted.

Table 4.3 contains one possible hardware configuration for execution of the System/360 emulator. This configuration was arrived at heuristically, based on familiarity with the microcode. It attempts to distribute the devices evenly among the busses and to minimize interbus transfers. In this table, symbolic names are used for the bus registers to make the device assignment easier to follow. Note that Table 4.3 includes only those devices which are required for execution of the emulator routines; input/output bus devices, for example, have not been considered in this emulation study.

Bus Address        0       1       2       3
device mnemonic    ALL     ALR     ALL     ALR
                   ALO     ALO     ALO     ALO
                   SHR     SHR     SHR     SHR
                   STR* SAR CARH SDDR CARL UCOM BCU CDR SDR SIDR ICLO*
                   UCOM UCOM UCOM CCR* IR0* ICHI* EALO* LKRG* IR1* EAHI*
                   T1* SODR OP2HI* LKRG* LKRG* OP2LO*
* registers

Table 4.3 360 Emulator Bus Assignment

4.3.4 Emulator Performance Analysis

The performance of the System/360 emulator may be evaluated analytically. This analysis yields instruction times for the VK-1 system, and also reveals some of the inherent problems of the structure. The analysis procedure is simply to draw a hardware scheduling diagram for the microcode, using the constraints explained in chapter 2. Table 4.4 gives the timing parameters used in the scheduling procedure.

Activity                                         Time Required (ns)
minimum bus cycle (no conflicts)                 100
instruction multiplex time (min.
time between instructions)                       25
control store access time (per module)           75
arithmetic ALU operation                         100
logical ALU operation                            75
shifter operation (any distance)                 50
core memory access time                          300
core memory cycle time                           650
scratch memory access time                       50

Table 4.4 Timing Parameters Used in Analysis

Consider the program segment "IFCH", shown in Table 4.5. The scheduling diagram of figure 4.6 shows how execution of "IFCH" proceeds. It is assumed that no carry from the low-order 16 bits to the high-order 8 bits of the instruction counter occurs when the instruction counter is incremented, and that no external interrupts are pending.

Statement No.   Bus   Source    Destination   Function
1               1     literal   ALR
2               2     STR       ALL           logical
3               0     literal   BCU           unsuccessful
4               3     ICLO      UCOM
5               1     CDR       IR0
6               0     UCOM      ALL           arithmetic
7               2     ICHI      CARH
8               3     ICLO      CARL          read
9               0     literal   BCU/1         unsuccessful (1 post inst)
10              3     ALO       ICLO
11              1     IR0       SHR           shift
12              3     literal   ALR
13              0     SHR       ALL           logical
14              2     ALO       SHR           shift
15              1     literal   ALR
16              0     SHR       ALL           arithmetic
17              0     ALO       BCU           unconditional
18              0     literal   BCU           unconditional

Table 4.5 "IFCH" Microcode

Figure 4.6 Scheduling Diagram for "IFCH"

is more easily obtained when the microcode contains regular patterns of instructions allowing bus assignment optimization. The relatively high "level" of the VK-1 microinstruction could probably be better exploited in applications such as the direct execution of high-level languages. The use of multiple processor configurations also allows some interesting possibilities. The use of a VK-1 type structure as a machine language emulation engine is of dubious value, especially when cost is considered.

LIST OF REFERENCES

[1] Author unknown, "Microprogramming: The Way of the Past?", Computer, March/April, 1972.
[2] Fairchild Semiconductor, "9500 Series High Speed Logic," Fairchild Semiconductor, Inc., May, 1971.
[3] Husson, Samir S., Microprogramming Principles and Practices, Prentice-Hall, Inc., 1970.
[4] International Business Machines Corporation, "General Purpose Simulation System/360 User's Manual," Fifth Edition, January, 1970.
[5] International Business Machines Corporation, "IBM System/360 Model 50 Functional Characteristics," Second Edition, 1967.
[6] International Business Machines Corporation, "IBM System/360 Model 40 Functional Characteristics."
[7] International Business Machines Corporation, "IBM System/360 Principles of Operation," Ninth Edition, 1970.
[8] Morris, Mel, "National's Tri-State Logic," National Semiconductor, Inc., May, 1971.
[9] Ramamoorthy, C. V. and Tsuchiya, M., "A Study of User-Microprogrammable Computers," 1970 Spring Joint Computer Conf., pp. 165-181.
[10] Rosin, Robert F., "Contemporary Concepts of Microprogramming and Emulation," Computing Surveys, Vol. 1, No. 4, December, 1969.
[11] Wilkes, M. V., "The Best Way to Design an Automatic Calculating Machine," Manchester University Computer Inaugural Conference, p. 16, 1951.
[12] Wilkes, M. V., "The Growth of Interest in Microprogramming--A Literature Survey," Computing Surveys, Vol. 1, September, 1969, pp. 139-145.

APPENDIX A
THEORETICAL PERFORMANCE SIMULATION PROGRAM

The following GPSS program implements the theoretical performance model discussed in section 3.3. Program operation is explained in the extensive statement comments given.

//JOBLIB DD DSNAME=SYS1.GPSSLIB,DISP=SHR
// EXEC GPSS
//GPSS.SYSIN DD *
         SIMULATE
* THIS PROGRAM SIMULATES THE EXECUTION TIME OF THE VK-1 ORGANIZATION
* FOR RANDOMLY DISTRIBUTED BUS ADDRESSES.
* FUNCTION 1 PROVIDES UNIFORMLY DISTRIBUTED BUS ADDRESSES.
* A FACILITY AND A QUEUE ARE PROVIDED FOR EACH BUS TO GATHER THE
* DESIRED STATISTICS.
* LOGIC SWITCH 'STEN' IS USED TO CONTROL THE CREATION OF
* MICROINSTRUCTIONS.
THE NUMBER OF PUSSES ANC THF ACCESS TIME OF THE CONTROL STORE M AY BE VARIED. LNTIT SERNC HUSl BUS 2 (UJS^ PUS- BUS 5 BUS 6 BUS7 * US^ a ij s 9 BLSiO PLSQ1 RUS02 BUSQ3 BUSQ*» BUSQ5 BUSQ^ BUSQ7 BUSQH BUS0 9 PSQ10 STEN CHNGI 1 ? 5 , : / IFS ecu C QU 3QU EOU EUU EOU FOU FUU EOU EQU ecu ECU "-QU £ CO EQU EQU ECU FOU ECU EQU FOU ecu ir I FUN . t>0, UTILI?EC RY MODEL 1,H 1,F 2,F 3.F *t F K ,^ 6,F 7,F «|F 9,F 1 , F l«Q 2 f 3,0 <>,Q 5,Q 6,0 M, 9,0 10,0 1,L LSI RN1 ,D* 5ERI AL I7ATI0N CONS T ANT BLS1 FACILITY RUS2 " TI AL CTICN 2/. 75, >/!,* BUS? •i RLS4 it BUS5 •i RLS6 •i RLS7 i« RLS* it RLS9 it RLS10 •i HLS1 I OUFUE RLS'' it RUS3 n BUS^ it RUS5 it ,: * ( '. t *- »♦* fl,**"!*** %«•******•*.«. OPE RAT I uN A,B,C,D, E ,F,G CCMM^NTS CHNG2 GENERATE GATE LS LCGIC P SAVEV ASSIGN ASSIGN CUEUE SEI7F DEPART LOGIC S ADVANCE RELEASE STEN STEN SE4N0+ ,K1 ,H 1, XHJSEPNC 2.FN1 + ~> • ? v 2 STEN 100 v ~> TFRMINATF 1 CREATE A MICROINSTRUCTION (T=0) CCNTROL GATE OPEN? (IF NP--WAIT) IF Y C S CLOSE GATE AND PROffESS. UPDATE SERIAL NUMBER AND ASSIGN T PI. ASSIGN RUS ADDRESS TO PARM? TAKE QUEUE STATS ON BUS USE USE ADDRESSED BUS JACILITY FXIT FROM QUEUE BUS QRTAINED--OPEN CONTROL GATE DC CYCLE-- 100 TIME UNITS GFT OFF BUS KILL, AND INCREMENT TEP M COUNT STAR T 1000 RMULT 1 CLEAR CHNGI INI TI AL LSI CHNG? GENERATE 10,,,, ,2 START 1000 RMLLT 1 CLEAR f HNG ! INI TI AL LSI CHNG"> GENEPATE ?0, , , , f? START 10 00 RMUL T 1 CLEAP CHNGI INI TI AL LSI C HN G ?. GENERATE 30, , , , ,2 STAOT 1000 R^ULT 1 CLEAP CHNG1 INITIAL LSI TMNG2 GENEPA T F 40 , , , , t2 START 1000 RMULT 1 CLEAP CHNG1 INITIAL LSI CHNG 2 GENEPATF 50,,,, ,2 START 1000 RMULT 1 CLE4R CHNGI IM TI AL LSI CHNG2 GENERATE 60,,,, ,2 START 1000 RMULT 1 CLEAP CHNGI INITIAL LSI CHNG 2 GENEPATE ~o , , , , »2 ST/SRT 1000 RMIJLT 1 CLEAh CHNGI INITIAL LSI CHNG 7 GEN C PATE eo,,, , ,2 START 1000 RVIJLT 1 4. CLE A<< C HN G 1 INITIAL LSI CHNG2 GENERATE 90,,,, ,2 START 10 00 RMlJLT 1 CLEAP CHNGI INI TI AL LSI CHNG? 
GENERATE 100, ,, .,2 START 1000 END SIMULATE 1000 INSTRUCTIONS 82 PfcPFAT SIMULATION WITH ACCESS TIME = 10 TIMP UNITS T-ACCFSS = 20 T-ACTESS = 30 T-ACCESS = 40 T-ACC C SS 50 T-ACCESS = 60 T-ACCFSS = 70 T-ACCESS = 80 T-ACCESS = 90 T-ACCESS = 100 83 APPENDIX B LOGICAL SYSTEM SIMULATION PROGRAM The following GPSS program implements the logical simulator of section J>*h, It is a complex program utilizing many of the more advanced features of the GPSS simulation language. The extensive comments provided should explain the operation of the program to the experienced GPSS user. //JOBLIB DC DSNAME=SYS1.GPSSLIB,DISP=SHR 8^ // EXEC GPSS //GPSS.SYSIN CC * REALLOCATE BVR,15 SIMULATE THIS PROGRAM SIMULATES THE OPERATION OF VK1. THE PROGRAM CONSISTS ♦OF MICRCINSTRUCTICN ASSEMBLER, IBUS CONTROL, BUS -MAS T ER SH IP AKDITPA- *TION, COMPLETION CUEUEING, SOURCE ANC DESTINATION DECODING, AND BUS 'DEVICE CCNTPCLLEPS. * t "MOCEL ENTITIES DEFINITION: (SYMBOLIC NAMES MUST BE EXPLICITLY ASSOCI- *ATED WITH DEVICE ENTITIES TO INSURE VALID OFFSET INDEXING.) 
"hALFWORD MATRIX SAVEVALUES PCMO RCM1 RCM2 RCM3 MATRIX MATRIX MATRIX MATRIX H,5,32 H,5,32 H,5,32 H,5,32 COLUMN i = ADDRESS RCh 1 = BUS ADD ROW 2 = SOURCE ADD ROW 3 = SOURCE FUNCTION ROW 4 = DESTINATION ADD ROW 5 = DESTINATION FUNCTION *HALFWORD SAVEVALUES: MARO EQU 1,H MAR1 EQU 2,H MAR2 EQU 3,H MAR 3 EQU *,H IMRO ECU 5.H IMR1 ECU 6,H IMK2 EQU 7,H IMR3 ECU 8,H MEN EQU 9,H AMP EQU 10, H IMRP * EQU lit H MDIR EQU 12, H XBUSO EQU 13, H XBUS1 EQU U,H XBUS2 EQU 15, H XBUS3 EQU 16, H MDOR EQU 17, H MAR EQU 18, H ILLDO EQU 15, H ILLOl ECU 20, H ILLD2 EQU 21, H ILLD3 EQU 22, H ILLSO ECU 23, H ILLS1 ECU 24, H ILLS2 ECU 25, H ILLS3 ECU 26, H REGOO EQU 27, H REGOl EQU 28, H REG02 EQU 29, H REG03 ECU 30, H REGIO EQU 31, H REG11 EQU 32, H REG12 EQU 33, H MEMORY H II •I ADDRESS n REGISTER ii ii DATA •i •• M MODULE ENABLE IMMEDIATE ii ii it REGISTER ii n ii (MODULO 4 1 2 3 BUS " 1 " 2 " 3 COUNTR) ACTIVE MODULE POINTER (MODULO 4) POINTER TO IMMEDIATE DATA REG TO BE LOADED WITH NEXT INST. MEMORY DATA INPUT REGISTER XFER REGISTER — BUS XFER REGISTER—BUS 1 XFER REGISTER — BUS 2 XFER REGISTER — BUS 3 MEMORY DATA OUTPUT REGISTER MEMORY ADDRESS REGISTER ILLEGAL DESTINATION CODE FLAG ILLEGAL DESTINATION CODE FLAG ILLEGAL DESTINATION CODE FLAG ILLEGAL DESTINATION CODE FLAG ILLEGAL SOURCE CODE— BUS ILLEGAL SOURCE CODE--BUS 1 ILLEGAL SOURCE CODE— BUS 2 ILLEGAL SOURCE CODE--BUS 3 REGISTER -- BUS REGISTER 1 — BUS REGISTER 2 — BUS REGISTER 3 — BUS REGISTER 0— BUS 1 REGISTER 1— BUS 1 REGISTER 2— BUS 1 REG13 ECU 34, H REG20 ECU 35 , H REG21 ECU 36, H REG22 EQU 37, H REG23 EQU 38, H REG30 EQU 39, H REG31 EQU 40, H REG32 ECU 41, H REG33 EQU 42, H UCOMR EQU 43, H ALLRG EQU 44, H ALRRG EQU 45, H ALLFN EQU 46, H ALRFN EQU 47, H ALORG EQU 48, H SERNC * EQU 49, H m •FACILITIES: ROMO EQU ltF RCM1 EQU 2,F ROM2 EQU 3,F RCM3 ECU 4,F BUSO EQU 5,F BUS1 EQU 6,F BUS2 EQU 7,F BUS3 EQU 8,F CORE EQU 9,F m ♦QUEUES: BUSQO ECU 1,C BUSQ1 EQU 2,Q BUSQ2 ECU 3,Q BUSQ3 ECU 4,Q * *USER CHAIN: 4c COMPQ EQU 1,C RCMQ EQU 2.C 
ME INC ECU 3,C XDLQO EQU 4,C XDLQ1 EQU 5,C XDL02 EQU 6,C XDLQ3 EQU 7,C SDLOO EQU 8,C SDLQ1 EQU 9,C SDLQ2 EQU 10, C SDLQ3 * EQU 11, C * ♦LOGIC * SWITCHES: * MAVO EQU 1,L MAV1 ECU 2,L MAV2 EQU 3,L MAV3 EQU 4,L REGISTER REGISTER REGISTER REGISTER REGISTER REGISTER REGISTER REGISTER REGISTER UNIVERSAL ALU— LEFT 3— BUS 1 — BUS 2 1 -- BLS 2 2 — BUS 2 3 — BUS 2 — BUS 3 1 — BUS 3 2 — BUS 3 3 — BUS 3 COMMUNICATIONS HANO REGISTER 85 REG. ALU — RIGHT HAND REGISTER ALU — LEFT FUNCTION REGISTER ALU — RIGHT FUNCTION REGISTER ALU — OUTPUT REGISTERR SERIAL NUMBER FOR INSTRUCTIONS RCM(FACILITY) " 1 " 2 " 3 BUS(FACILITY) " 1 " 2 " 3 FACILITY FOR GETTING CORE STATS BUS QUEUE BLS QUEUE 1 BLS QUEUE 2 BLS QUEUE 3 CCMPLETION QUEUE CHAN FOR ROM OELAYS CHAIN FOR MEINC CELAY CHAIN FOR XFER DELAY CHAIN FOR XFER DELAYS— BUS1 CHAIN FOR XFER DELAYS— BUS2 CHAIN FOR XFER DELAYS— BUS3 CHAIN FOR SOURCE DELAYS — BUSO CHAIN FOR SOURCE DELAYS — BUS1 CHAIN FOR SOURCE DELAYS — BUS2 CHAIN FOR SOURCE OELAYS — BUS 3 INIT MCDULE n AVAILABLE M VALUE I 1 1 1 INSTF OMRO BMRl RMR2 BMR3 BHLDO RHLOl BHLD2 BHL03 BCCO BCCl BCC2 BCC3 ACKO ACK1 ACK2 ACK3 BRQIO BRQ11 BRQ12 BRQ13 BR020 BRQ21 BRQ22 BRQ23 BR030 ERQ31 BPQ32 BR033 BGTIO BGTU BGT12 BGT13 XENO XEN1 XEN2 XEN3 MAV DAV RCMEN ALUAV CCNOO CONDI COND2 COND3 CCND4 EQU EQU EQU ECU EQU EQU EQU ECU ECU EQU ECU EQU EQU EQU EQU EQU EQU EQU EQU ECU EQU ECU EQU EQU EQU EQU EQU EQU EQU EQU EQU ECU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU EQU "ARITHMETIC MNDEX FUNC ADDI VARI MAVI VARI MEINC VARI AMINC VARI BMRI VARI BHLDI VARI BCCI VARI BUSQI VARI IMMED VARI BLSI VAPI 5,L 6,1 7,L 8,L 9,L 10, L lit L 12, L 13, L 14, L 15, L 16, L 17, L 18, L 19, L 20, L 21, L 22, L 23, L 24, L 25, L 26, L 27, L 28, L 29, L 30, L 31, L 32, L 33, L 34, L 35, L 36, L 37, L 38, L 39, L 40, L 41, L 42, L 43, L 44, L 45, L 46, L 47, L 48, L 49 ,L 50, L VARIABLES: (VARIABLE TICN WHICH ADDS A BASE ABLE P3+K1 ABLE XH$MEN+K1 ABLE (XH$MEN*Kl)34 ABLE (Xh$AMP*Kl)a4 ABLE P4+K6 ABLE P4+K10 ABLE 
P4+K14 ABLE P44-K1 ABLE Kl ABLE P^+K5 INSTRUCTION FLAG BLS MASTER FLAG 1 2 ■ 3 BLS M it n HOLD it FLAG » 1 tt ii "2 n ii ii 3 BLS CYCLE COMPLETE n m ni 4. ii •• "2 ii it ii 5 SOURCE ACKNOWLEDGE BUS •i ii ii j_ •i ii ii 2 •• it ii -i BLS REQUEST, DEVICE #1 BUS Kl " #1 " #1 « ¥2 ■ #2 ■ *2 " #2 " #3 " #3 " *3 " #3 " #1, BUS BLS GRANT TO CEVICE TRANSFER ENABLE BUS CORE MEMORY IS AVAIL CCRE MEMORY DATA IS MEMORY ENABLE FOR PO ALL IS AVAILABLE CONDITION CODE *0 I 2 3 ABLE AVAILAB M'S LE CONDITION CONDITION CONDITION CCNDITION CODE CODE CODE CODE #1 #2 #3 ¥4 3 NAME WITH "I" SUFFIX IS AN ENTITY CONSTANT TO BUS NUMBER.) MCCULE ADDRESS INDEX MCDULE AVAILABLE INDEX MCDULE ENABLE INCREMENT ACTIVE MODULE INCREMENT BUS MASTER INDEX BUS HOLD INDEX BLS CYCLE COMPLETE INDEX BLS QUEUE INDEX SCURCE CODE FOR IMMED DATA = 1 BUS FACIL ITY INDEX 87 IMRGI VARIABLE XHSIMRP+K5 IMMEDIATE CATA REG SAVEV INDES ACKI VARI XHJIMRP+K18 ACK INDEX FOR ASSERTION OF IDAT XENI VARIABLE P4+K33 TRANSFER ENABLE INDEX MADD VARIABLE P2-K1 CCMPUTE MODULE ADDRESS PCINC VARIABLE XH*2+1 INCREMENT APPROPRIATE PC CCNDI VARI P8+A6 INDEX TO COND-CODE LOGIC SWITCH POSTI VARI XH$XBUS1/ 1000 Tf-OLSANDS DIGIT = # OF POSTI'S ABNO VARI CH$CCMPQ-V$POSTI NC . 
OF BUS CYCLES TO ABORT LSBS VARI (XH$XBUS1-V$P0STI ) 34 LSB'S OF BRANCH ADDRESS MSBS VARI (XH$XBLSl-V$PCSTI-V$LSBS)/4 MSB'S CF BRANCH ADDRESS XBUSI VARI XHIIMRP+K13 INDEX TO WHICH XBUS TO ASSERT SDLOI VARI P4+8 SCURCE DELAY CUtUE INDEX XDLQI VARI P4+4 XFER DELAY QUEUE INDEX REGOI VARI K25 BASE ADDRESS OF BUS REG SET REG1I VARI K29 BASE ADDRESS OF RUS 1 REG SET REG2I VARI K33 BASE ADDRESS OF BUS 2 REG SET REG3I VARI K37 BASE ADDRESS OF BUS 3 PEG SET CORAC VARI XHJMAR+6C INDEX TO GET CURE XH'S ¥ •BOOLEAN VARIABLES: DELQ BVARIABLE ( L S$BCCO* LSSXENO ) + ( LSSBCC 1 *LS$XEN1 ) *-BV$DEL QC DELQC BVARIABLE ( LSSBCC2* LS$ XEN2 )+( LS$BCC3* L S* XEN3 ) ( CONT I NUAT I CN ) •DELQ+DELQC ALLCWS THE FRCNT OF THE INSTRUCTION COMPLETION QUEUE TO BE ♦DELETED AND FCR A TRANSFER ENABLE (XEN) TC BE ISSUED. * MODIO eVARIABLE ( XHtMEN • E ( K0 ) * L S $M A VO MODI! eVARIABLE (XH$MEN«E «K1)*LS$^AV1 MC0I2 eVARIAELE < XF$MEN ' E 'K2 )*L SSMAV2 M0DI3 eVARIABLE ( XH$MEN' E «K3 ) *L S$* AV3 MINIT BVARIABLE BVSMOD I 0+ EV $MOD I 1 + BV $MCD I 2* B V$MODI 3 * LS$ROMEN * SHOULD GO IF = 1 BARBC BVARIABLE LRSBGDO* ( IS $ER Ql C + L S$BR C20 + L S$BRQ30 ) BARB1 EVARIABLE LR SBGD1 * ( LS SBRQl 1 + L S$BR C 2 1 +L SSBRQ3 1 ) BARB2 EVARIABLE LR JBGD2* ( LS$ER CI 2+LSSBR C22+L SSBRQ32 ) BARB3 EVARIABLE LR $BGD3 * ( LS $ eRQ 1 2+LSSBR C 23+L SSBRQ3 3 ) •BARBI SIGNALS ACTIVATE BUS MASTER ARBITRATION CONTROLLERS (1 PER BUS) * ♦INITIALIZATICN CF LOGIC SWITCHES: INITIAL LS1/LS2/LS3/LS4/LS5/LS6/LS7/LS8/LS9/LS10/LS11/LS12 INITIAL LS13/LS1WLS15/LS16/LS17 INITIAL LS42/LS43/LS44/LS45 * ♦INITIALIZATICN CF CCNTROL STCRE: * * *****»<*************»» MICPC PROGRAM ***************** *************** * * THE MICROPROGRAM IS 5TCREC IN 4 MATRIX SAVEVALUES WHICH REPRE- SENT THE FOUR RCM MODULES. EACH COLUMN REPRESENTS A MICRO INSTRUCTION. •COLUMNS CORRESPOND TO ADDRESSES OFFSET B\ ONE. 
THE ROWS CORRESPOND TO •MICRO INSTRUCTION FIELDS AS FCLLCWS: * RCW1 = BUS ADDRESS * RCW2 = SOURCE ADDRESS * RCW3 = SOURCE FUNCTION RCW4 = DESTINATION ADDRESS MODULE IS READY TO GO MODULE 1 IS READY TO GO MCDULE 2 I s READY TO GO MODULE 3 IS READY TO GO 88 * RCW5 = DESTINATION FUNCTION * «•**«»*******•«***««*»** *****MICPC-DI AGNOSTI C *3 *«♦**•*«*»«***« •*»****» * INITIAL MH1(2,1 ) , 1/MHl (4,1 ) ,9 INI TI AL MH2<3, 1 ) , 20 INITIAL MH3(2, 1) , ~i/MH3( 4,1 ) ,2 INITIAL MHM2, 1) , 1/MH4< 4,1 ) ,9 INITIAL MHl(3,2),21 INITIAL MH2(2t2) ,7/MH2( A, 2 ) ,6 INITIAL MH3(1,2) , 1/MH3( 2,2) ,6/ MH3 ( 4 , 2 ) , 3 INI TIAL MH4( 1 ,2) , 1/MH4< 2,2) , 1/MH4(4,2) ,8 INITIAL MHl(3,3),40 INITIAL MH2(2,3) , 1/ MH2 ( 4 , 3 ), 9/ MH2 ( 5 , 3) ,1 INITIAL MH3<3,3),22 INITIAL MH4(2,3) , 7/MH4(4,3) ,8 INITIAL MH1(2,4),1/MH(4,4),7/MH1(5,4),3 INITIAL MH2(3,4),2 INITIAL Mh3{l,^),l/HH2(2,1 ),1/MH3(4,4) , 7/MH3<5,*) ,6 INITIAL MH4(3,4),20 INITIAL MH1I1 »6) , 1/MH1(2,6) , 7/MH1 (4,6) ,4- M«tMUMU*MM4U*<*lNIT IAL IZATIGN OF CORE MEMORY* **■********.****< ** « XH8C30 XH81,31 XH82,32 INITIAL INITIAL INITIAL * * * ♦ * * * * ***** * * 4c « * « ***** • « *LCC CPERATICN * GENERATE TEST E SAVEV ASSIGN ASSIGN ASSIGN ASSIGN ASSIGN ASSIGN ASSIGN ASSIGN SPLIT SEIZE LOGIC R SPLIT LINK CORE MEMORY***"- (ATCRESS 20) ( ADDRESS 21) ( ACCRESS 22) #***************«*r******»** SI^LLATICK ^ODEL BLOCKS * * ft******************** ****** A,B,C,C,E ,F,G CCMMENTS » , , , , 8 BV$MINIT,K1 INITIATE A MEM CYCLE MAV(MEN)=1 SEPNO+tKliH SERIAL NUMBER FOR INST'S 1,XH$SERNC ASSIGN SERIAL NO. 2,V$MAVI P2=R0« FACILITY NO. 
3,XH*2 P3=M00ULE PC 4,MH*2(1,V$ACCI ) P4=BUS ADDRESS (READ FROM POM) 5,MH*2(2, VSADCI ) P5=S0URCE ADDRESS 6,MH*2( 3, V$ACCI ) P6=S0URCE FUNCTION 7,MH*2U,V$ACCI ) P7 = DEST ADD 8,MH*2( 5, V$ACCI ) P e=DE ST F UNC T ION ltlNCME SEND XACTION TO INC MOD ENABLE *2 ACCESS PRESENTLY ENABLED ROM *2 RESET MAV OF ROM JUST ACCESSED 1,PCMQU SEND OFFSPRING TO JOIN ROMQ ROMQ.FIFC RCMQU MICRC BUSOK ISTB INCME STAGR INCH ADVANCE UNLINK TEPMINATE RELEASE TEST E GATE LS QUEUE GATE LS DEPART TEST NE SEIZE LOGIC S SAVEVALUE SAVEVALUE LCGIC R SPLIT TEST NE TEST NE TEST NE TRANSFER SPLIT LINK ADVANCE UNLINK TERMINATE SAVEV BUFFER TERMINATE 100 P0MQ,MICRC,1 ,1 ♦2 XH$AMP,V$*ADC INSTF, ICATA VSBUSQI VSBCCI VSBUSQI P5,V$IMMEC,IMMED VSBUSI *2 *2,VJPCINC,H AMP,V$AMINC,H V$BCCI It INSTQ P*»,KO,BSCCO P4,K1,BSCCI P4,K2,BSDC2 ,BS0C3 It STAGR MEINCFIFC 10 MEINCf INC IT, 1,1 MEN,VSMEINC t H T-ACCESS = 100NS UNLINK £ SEND TO MICRO 89 RELEASE ROM FACILITY LET XACTN ENABLED BY AMP TO GO IF INSTF = 0, GO TO IDATA GATHER STATS ON WAINING FOR BUS WAIT FOR BUS CYCLE COMPLFTE LEAVE QUEUE IF SOURCE IS IMMED, SEIZE BUS FACILITY SET MAV OF ROM JUST INCREMENT MOOULE PC INCREMENT ACTIVE MODULE CLEAR BCC SEND ONE OFFSRING TO JOIN CCMPQ * * * GC GO TO IMMED UNLOADED POINTER THE BUS ADDRESS IS DECODED TO BUS DECODER/CONTROLLER— 3 SEND OFFSPRING TO DELAY PUT IN DELAY QUEUE 10 NS DELAY TO INC MOD ENABLE REMOVE FROM DELAY QUEUE INCREMENT MOOULE ENABLE RESTART CURRENT EVENTS SCAN KILL IT IDATA SAVEVALLE VUMRGI,P6,H SAVEVALUE LOGIC S LCGIC S LCGIC S SAVEVALUE SAVEVALUE TERMINATE V$XBUSI,P6,H VtACKI INSTF *2 *2,V$PCINC,H AMP,V$AMINC,H LCAD IMMEDIATE CATA REG iNOTE THAT ?6 CONTAINS THE DATA) ALSO ASSERT IMMED DATA ON XBUS ASSERT APPROPRIATE ACKNOWLEDGE SET INST FLAG SET MAV INCREMENT MODULE PC INCREMENT AMP KILL THE TRANSACTION THIS SEGMENT CREATES XACTIONS AS REC'C TO POP THE COMPLETION QUEU GENERATE TEST E UNLINK LCGIC R LOGIC R LCGIC R LCGIC R TERMINATE f f f f f BV$DELQ,1 C0MPQ,XEN,1 XENO XEN1 XEN2 XEN3 DELETE TOP OF 
INSTQ? CLEAR CURRENTLY SET XEN (CNLY ONE SHOULD BE ON AT A TIM * THIS SEGMENT IS ENTEREC BY TRANSACTIONS WHICH HAVE JUST BEEN RE- MOVED FRCM THE CCMPLETION QUEUE WHICH IS STORED IN USER CHAIN "COMPQ" XEN LOGIC S VSXENI TERMINATE ASSERT XEN KILL IT 90 THIS SEGMENT IS ENTERED WHEN AN INST SPECIFIES IMMED DATA IMMED SAVEVALLE IMRP,P4,H LOGIC R INSTF TRANSFER ,ISTB SAVE BUS ADD IN POINTER REG SO THAT THE DATA KNOWS WHERE TO BE LOADED CLEAR INSTF- NEXT WORD IS DATA GO TO ISTB ♦THIS SEGMENT IS ENTERED BY AN OFFSPRING CF EVERY INST. IT PLACES THE ♦INSTRUCTION IN THE INSTRUCTION CUEUE * INSTQ LINK CCMPQ,FIFC,XEN PUT INST IN COMP QUEUE UNLESS TERMINATE IT IS THE VERY FIRST INSTRUCTION BSOCO SPLIT It SDCO GATE LS ACKO WAIT FOR ACKNOXLEDGE FROM S SPLIT 1,XDL0 * LINK XDLQO,FIFC * XDLO ADVANCE 20 ♦20NS INTERRUPTABLE DELAY UNLINK XDLQ0,XG0C,1 * TERMINATE * XGOO GATE LS XENO WAIT FOR X-ENABLE FOR XFER TEST NE P7,K1,IMRCC * TEST NF P7,K2»RRCC * TEST NE P7,K3,RRDC * TFST NE P7,K4,RRCC * TEST NE P7fK5,RRDC * TEST NE P7,K6,UCLD0 * TEST NE P7,K7,ALL * TEST NE P7,K8,ALR * TEST NE P7,K9,MEMR * SAVEVALUE ILLC0,K1 ,F SET DEST ERROR BUSO FLAG FINDO LOGIC R ACKO RETURN POINT FROM DEV SEGME ADVANCE 10 IONS NONINTERRUPTABLE DELAY LCGIC S BCCO SET BUS CYCLE COMPLETE RELEASE VSBUSI RELEASE BUS FACILITY PRINT » » C t x PRINT . 
fMOVfX PRINT t f XH t X PR INT ffLG,X * TERMINATE * SDCO TEST NE P5,K1,BUKET IF IMMEDIATE INST —DISCARD SPLIT 1,S0L0 * LINK SDLOO,FIFC * SDLO ADVANCE 50 ♦INTERRUPTABLE 50NS DELAY UNLINK SDLQO, AKGC0,1,1 * TERMINATE * AKGOO TEST NE P5,K1,IMRS0 ** TEST NE P5,K2,RRSC ** TEST NE P5,K3,RRSC ** TEST NE P5,K4,RRSC ** TEST NE TEST NE TEST NE SAVEVALUE FINSO LOGIC S TERMINATE BSDC1 SPLIT GATE LS SPLIT LINK XDL1 ACVANCE UNLINK TERMINATE XGC1 GATE LS TEST NE TEST NE TEST NE TEST NE TEST NE TEST NE TEST NE TEST NE SAVEVALUE FIND1 LOGIC R ACVANCE LOGIC S RELEASE PRINT PRINT PRINT PRINT TERMINATE SDC1 TEST NE SPLIT LINK SDL1 ADVANCE UNL INK TERMINATE AKGOl TEST NE TEST NE TEST NE TEST NE TEST NE TEST NE TEST NE SAVEVALUE FINS1 LOGIC S TERMINATE P5,K5,RRSC P5,K6,UCUS0 P5,K7,MEMCR ILLSO,Kl,F ACKO 1,SDC1 ACK1 l,XOLl XDLQl.FIFC 20 XDLQ1,XG01,1 XEN1 P7,K1,IMRC1 P7.K2.RRC1 P7,K3,RRD1 P7,K4,RR0l P7,K5,RR01 P7.K6,UCUC1 P7,K7,BCU P7,K8,MEMIR ILL01,K1,F ACK1 10 BCC1 V$BUSI » ,MOV,X i t c » X ,,LG,X t ? XH t X 1 P5 t Kl, BUKET 1,S0L1 SDLQl.FIFC 50 SDLQltAKGCl.1,1 P5,K1, IMPS1 P5tK2,RRSl P5 t K3,RRSl P5,KA,RRS1 P5»K5,RRS1 P5,K6,UCUS1 P5.K7.AL0 ILLS1,K1,F ACK1 »* ** ** EPRCR IF THIS BLOCK IS ENTERED RETURN POINT FROM SOURCE SEG. KILL IT 91 WAIT FOR ACKNCXLEDGE FROM SOURCE * *20NS INTERRUPTABLE DELAY * WAIT FOR X-ENABLE FOR XFER Jr * * SET DEST ERROR BUSl FLAG RETURN POINT FRCM DEV SFGMENTS IONS NONINTERRUPTABLE OFLAY SET BUS CYCLE COMPLETE RELEASE BUS FACILITY IF IMMEDIATE INST —DISCARD MNTERRUPTABLE 50NS DELAY * * #* ** ** ** ** ERROR IF THIS BLOCK IS ENTERED RETURN POINT FROM SOURCE SEG. 
KILL IT BSDC2 SPLIT GATE LS SPLIT 1.SDC2 ACK2 1,XDL2 WAIT FOR ACKNOXLEDGE FRCM SOURCE LINK XDL2 ADVANCE UNLINK TERMINATE XG02 GATE LS SAVEVALUE FIND2 LOGIC R ADVANCE LOGIC S RELEASE TERMINATE XDLQ2.FIFC 20 XDLQ2,XG02,1 XEN2 ILLD2iKl ,F ACK2 10 BCC2 VIBUSI * 92 ♦20NS INTERRUPTABLE DELAY * WAIT FOR X-ENABLE FOR XFER SET DEST ERROR BUS2 FLAG RETURN POINT FROM DFV SEGMENTS IONS NONINTERRUPTABLE DELAY SET BUS CYCLE COMPLETE RELEASE BUS FACILITY KILL IT SDC2 TEST NE SPLIT LINK SDL2 ADVANCE UNLINK TERMINATE AKG02 SAVEVALUE FINS2 LOGIC S TEPMINATE P5,K1,BUKET 1,S0L2 SDLQ2.FIFC 50 SDLQ2,AKGC2,1,1 ILLS2 t Kl,F ACK2 IF IMMEDIATE INST —DISCARD * "INTERRUPTABLE 50NS DELAY ERFOR IF THIS BLOCK IS ENTERED RETURN POINT FROM SOURCE SEG. KILL IT BSDC3 SPLIT GATE LS SPLIT LINK XDL3 ACVANCE UNLINK TERMINATE XG03 GATE LS SAVEVALUE FIND3 LOGIC R ADVANCE LOGIC S RELEASE TERMINATE 1»SDC3 ACK3 1,XDL3 XDL03,FIFC 20 XDLQ3,XGC3,1 XEN3 ILLD3,K1»F ACK3 10 BCC3 VSBLSI WAIT FOR ACKNOXLEDGE FROM SOURCE * *20NS INTERRUPTABLE DELAY WAIT FOR X-ENABLE FOR XFER SET DEST ERROR BUS3 FLAG RETURN POINT FROM DEV SEGMENTS IONS NONINTERRUPTABLE DELAY SET BUS CYCLE CCMPLETE RELEASE BUS FACILITY KILL IT SDC3 TEST NE SPLIT LINK SDL3 ADVANCE UNLINK TERMINATE AKG03 SAVEVALUE FINS3 LOGIC S TEFMINATE P5,K1,BUKET l,SDL3 SCLQ3,FIFC 50 SDLQ3,AKGC3,1.1 ILLS3,Kl,h ACK3 IF IMMEDIATE INST —DISCARD * * ♦INTERRUPTABLE 50NS DELAY * * ERROR IF THIS BLOCK IS ENTERED RETURN POINT FROM SOURCE SEG. 
KILL IT RPSO ASSIGN SAVEVALUE TRANSFER RPDO ASSIGN 5+,V$REG0I XBLS0,XH*5,H ,FINS0 7*,V$REG0I SAVEVALUE *7 , XHSXBU SO » H ACC BASE TO P5=RELATIVE ADDRESS ASSERT $BUS QUIT ADD BASE TO P7=REL ADDRESS ASSERT REG CONTENTS ON X-BUS TRANSFER ,FINDO 93 * UNIVERSAL COMMUNICATIONS UNIT UCUOO SAVEVALLE UCCMR,XH$ XBUSO , H TRA^SFEP ,FINDO LCAD UCOMREG W/ (XBUS) RETURN TO BUS OECODE /CONTROL UCLSO GATE LS SAVEVALUE TRANSFER XENO XRLSO,XH$LCOMR,H ,FINSO NEEO TO WAIT TIL PREVIOUS CYCLE LOAD (UCOMR) ONTO XBUS RETURN TO BUS DECODE/CCNTROL RRS1 ASSIGN SAVEVALUE TRANSFER 5+,V$REGl I XBLS1,XH*5,H ,FINS1 ADD BASE TO P5=RELATIVE ADDRESS ASSERT $BUS QLIT RRD1 ASSIGN SAVEVALUE TRANSFER 7+,V$REGlI *7,XH$XBLS1,H ,FIND1 ♦ * UNIVERSAL COMMUNICATIONS LNIT UCUD1 SAVEVALLE UCCMR, XH$ XEUS 1 , H UCUSl TRANSFER GATE LS SAVEVALUE TRANSFER ,FIND1 XEN1 XBUSl,XH$LCOMR,H ♦FINS1 ACD BASE TO P7=REL ADDRESS ASSERT REG CONTENTS ON X-BUS LCAD UCOMREG W/ (XBUS) RETURN TO BUS DECODE/CCNTROL NEED TO WAIT TIL PREVIOUS CYCLE LCAD (UCOMR) CNTO XBUS RETURN TO eUS CECODE/CONTROL RRS2 ASSIGN SAVEVALUE TRANSFER 5*,V$REG2I XRUS2,Xh*5,H ,FINS2 ADC BASE TO P5=RELATIVE ADDRESS ASSERT $BUS CLIT RRD2 ASSIGN SAVEVALUE TRANSFER 7*,V$REG2 I *7,XH$X8LS2,H ,FINC2 UNIVERSAL COMMUNICATIONS LNIT UCUD2 SAVEVALLE UCOMR, XHSXBUS2 , H TRANSFER ,FIND2 ACD BASE TO P7=REL ADDRESS ASSERT REG CONTENTS ON X-BUS LCAD UCOMREG W/ (XBUS) RETURN TO BUS CECODE/CONTROL UCUS2 GATE LS XEN2 NEED TO WAIT TIL PREVIOUS CYCLE SAVEVALUE XBLS2,XH$LCCMR,H LCAD (UCOMR) CNTO XBUS * TRANSFER ,FINS2 RETURN TO BUS DECODE /CONTROL * RRS3 ASSIGN 5+.VSREG3 I ADD BASE TO P5=RELATIVE ADDRESS SAVEVALUE XBUS3,XH*5,H ASSERT $BUS + TRANSFER ,FINS3 QUIT RRD3 ASSIGN 7+,V$REG3I ACD BASE TO P7=REL ADDRESS SAVEVALUE *7,XH$XBUS3,H ASSERT REG CONTENTS ON X-BUS TRANSFER ,FIND3 UNIVERSAL COMMUNICATIONS UNIT UCU03 SAVEVALIE UCC MR , XHt XBUS 3 , H LCAD UCOMREG W/ (XBUS) 9^ TRANSFFR ,FIND3 RETURN TO BUS OECOOE /CONTROL t i UCUS3 GATE LS XEN3 NEED TO WAIT TIL PREVIOUS CYCLE 
SAVEVALUE XBUS3 , XHSLCC MR , H LOAD (UCOMR) ONTO XBUS TRANSFER ,FINS3 RETURN TO BUS CECODE/CONTROL THIS SEGMENT IS ENTERED FPOK A BUS DECODER. IT SIMULATES THE "AfllfNS OF THE ERANCH CCNTFOL UNIT. THE eRANCH ALGORITHM ENVOLVES MESTING THE LENGTH OF THE CCMPIETICN QUEUE (COMPQ), ABORTING ♦APPROPRIATE MEMORY AND BUS CYCLES, AND ALTERING THE PROGRAM COUNTER *WMtN A BRANCH IS SUCCESSFUL. BCU TEST LE P8,K5,BITCL BIT SET OR BIT CLEAR CONDITION? GATE LS VSCONDI ,BCONE BPANCH FAILS — GO TO BDONE TRANSFER ,JUMP BRANCH IS SUCCESSFUL BITCL ASSIGN 8-,K5 SLBTRACT TO GET SWITCH INDEX GATE LR VSCCNDI,8CCNE BRANCH FAILS — GO TO BDONE JUMP TEST GE CHSCCMPQ, VSFCSTI CHECK * OF LOADED INSTS, GWAIT * IF NECESS4RY LCGIC R RCMEN DISABLE NEW ROM INITIATES. UNLINK ROMQ,ABMEM,ALl ABORT ALL MEM CYCLES IN PROGRESS UNLINK COMPQ, ABBLS , VS A BNO , BACK ABORT BAD BUS CYCLES SAVEVALLE AMP,V$LSBS,H ♦ SAVEVALUE MEN,XH$A*F,H * SAVEVALLE MARO , V$MS ES ,H *LOAD NEW PROGRAM COUNTER SAVEVALUE MAR1 ,XH$* /RO ,H * SAVEVALUE MAR2 , XH$N ARC ,H * SAVEVALUE MAR3 » XH$t* ARO , H * DEPART 1,Q1 * DEPART 2,Q2 *ZERO ALL BUSS QUEUE S—NOTHI NG DEPART 3,Q3 *SHOULD BE WAITING FOR A RCC DEPART 4,Q4 *AT THIS TIME. TRANSFER ,BCONE GC TO 'FINISH ABMEM LOGIC S *2 SET MAV OF ABORTED MEMORY CYCLES RELEASE *2 RELEASE ROM FACILITY OF ABBORT BUKET TERMINATE THIS IS THE BIT BUCKET ABBUS UNLINK V$SDLQI , BLKET , 1 *THROW ALL X'S WAITING IN OELAYS UNLINK V«XDLQI,eLKET,l *IN THE BIT BUCKET FOR ALL BAD * *BUS CYCLES STARTED. LOGIC R VSACKI RESET ACK BDONE LOGIC S ROMEN ENABLE ROM'S AGAIN TRANSFER ,FIN01 4c * THIS SEGMENT SIMULATES CCRE MEMORY. SECTIONS FOR MAR ,MDOR, MDI R ARE * INCLUDED. 
* MEMORY ACCESS IS INTERLOCKED BY MAV ON ADDRESS AND INPUT DATA AS
* DESTINATION, AND BY DAV ON OUTPUT DATA AS A SOURCE.
*
MEMIR  GATE LS    MAV                 WAIT FOR MEMORY AVAILABLE
       SAVEVALUE  MDIR,XH$XBUS1,H     LOAD IR FROM X-BUS
       TRANSFER   ,FIND1              RETURN TO BUS DECODE/CONTROL
MEMOR  GATE LS    DAV                 WAIT FOR DAV=1
       SAVEVALUE  XBUS0,XH$MDOR,H     ASSERT X-BUS FROM OD REG
       TRANSFER   ,FINS0              RETURN TO BUS CONTROLLER
MEMAR  GATE LS    MAV                 WAIT FOR MAV=1
       SAVEVALUE  MAR,XH$XBUS0,H
       SPLIT      1,MCONT             SEND OFFSPRING TO DO MEM CONTROL
       TRANSFER   ,FIND0
MCONT  SEIZE      CORE                TAKE FAC. STATS ON CORE
       LOGIC R    DAV                 CLEAR DAV
       LOGIC R    MAV                 CLEAR MAV
       ADVANCE    300                 T-ACCESS = 300NS
       ASSIGN     7,V$CORAC           P7=SAVEVALUE INDEX FOR CORE WORD
       SAVEVALUE  MDOR,XH*7,H         LOAD MDOR
       LOGIC S    DAV                 SET DAV
       ADVANCE    600                 T-RESTORE/WRITE = 600NS
       TEST E     P8,K1,RESTR         P8=1 IS MEMORY WRITE, ELSE READ
       SAVEVALUE  *7,XH$MDIR,H        DO MEMORY WRITE
RESTR  LOGIC S    MAV                 SET MAV
       RELEASE    CORE                RELEASE
       TERMINATE                      KILL CONTROL TRANSACTION
*
* THIS SEGMENT SIMULATES THE OPERATION OF THE ALU. THE ACTUAL
* OPERATIONS ARE NOT DONE. PROVISIONS ARE MADE FOR SETTING AND
* CLEARING OF VARIOUS CONDITION CODES WHICH WOULD BE AUTOMATICALLY SET
* IF ACTUAL OPERATIONS WERE SIMULATED. LOADING OF ALL CAUSES EXECUTION
* OF THE SPECIFIED OPERATION. INTERLOCKING PREVENTS ACCESS OF INVALID
* RESULTS. THE ACTUAL ARITH & LOG OPERATIONS MAY BE SIMULATED BY
* DECODING THE FUNCTION, AND BRANCHING TO 'HELP' BLOCKS.
*
ALL    GATE LS    ALUAV               WAIT UNTIL ALU AVAILABLE
       SPLIT      1,ALUGO             SEND OFFSPRING TO ALU CONTROLLER
       TRANSFER   ,FIND0              COMPLETE BUS CYCLE
ALUGO  LOGIC R    ALUAV               CLEAR ALU AVAILABLE
       TEST GE    P8,K10,LOGOP        P8<10 ARE LOGIC OPERATIONS
ARTOP  ADVANCE    160                 ARITH OPS TAKE 160NS
       TRANSFER   ,OPDON
LOGOP  ADVANCE    130                 LOGIC OPS TAKE 130NS
OPDON  LOGIC S    ALUAV               OUTPUT VALID, SET ALU AVAILABLE
       TERMINATE                      OVER AND OUT
ALR    GATE LS    ALUAV               WAIT UNTIL ALUAV = 1
       TRANSFER   ,FIND0              GO BACK TO BUS CONTROLLER
ALO    GATE LS    ALUAV               WAIT TILL ALUAV = 1
       TRANSFER   ,FINS1
IMRS0  SAVEVALUE  XBUS0,XH$IMR0,H     ASSERT XBUS
       TRANSFER   ,FINS0
IMRD0  SAVEVALUE  IMR0,XH$XBUS0,H     LOAD EMREG FROM XBUS
       TRANSFER   ,FIND0              RETURN TO DECODE/CONTROL
IMRS1  SAVEVALUE  XBUS1,XH$IMR1,H     ASSERT XBUS
       TRANSFER   ,FINS1              RETURN TO BUS DECODE/CONTROL
IMRD1  SAVEVALUE  IMR1,XH$XBUS1,H     LOAD EMREG FROM XBUS
       TRANSFER   ,FIND1              RETURN TO DECODE/CONTROL
IMRS2  SAVEVALUE  XBUS2,XH$IMR2,H     ASSERT XBUS
       TRANSFER   ,FINS2              RETURN TO BUS DECODE/CONTROL
IMRD2  SAVEVALUE  IMR2,XH$XBUS2,H     LOAD EMREG FROM XBUS
       TRANSFER   ,FIND2              RETURN TO DECODE/CONTROL
IMRS3  SAVEVALUE  XBUS3,XH$IMR3,H     ASSERT XBUS
       TRANSFER   ,FINS3              RETURN TO BUS DECODE/CONTROL
IMRD3  SAVEVALUE  IMR3,XH$XBUS3,H     LOAD EMREG FROM XBUS
       TRANSFER   ,FIND3              RETURN TO DECODE/CONTROL
       START
       END

APPENDIX C
IBM SYSTEM/360 EMULATOR MICROPROGRAM

The following listing is the partial System/360 emulator described in Chapter 4. The symbolic microcode conventions described in Chapters 2 and 4 apply. Literal source data is enclosed in single quotation marks. It is given in 4-digit hexadecimal if it is a numerical constant, and in its symbolic form if it is an address constant. The extensive comments, together with the explanation given in Chapter 4, should make the code easy to understand.
Label  Bus Source   Dest.   Funct. Comment
*  "INIT" initiates the fetch of the first half-word of the program.
INIT   2  ICHI     CARH           Load high-order byte of instruction counter into core address register.
       3  ICLO     CARL           Load low-order 16 bits of instruction counter into core address register.
       3  ICLO     ALL     INC    Increment instruction counter.
       3  ALO      ICLO           Store updated instruction counter.
          'IFCH'   BCU     CCL    If carry clear, then instruction counter is OK. Go to "IFCH."
       2  ICHI     ALL     INC    Carry is set. Increment high-order byte.
       2  ALO      ICHI           Store updated high-order byte of instruction counter.
*  "IFCH" gets the first half-word of the program from core memory, and decodes the first 4 bits,
*  branching to one of sixteen routines for final decoding. External interrupts are checked first.
IFCH   1  '00FF'   ALR            Check low-order byte of STR for pending interrupts.
       2  STR      ALL     AND
          'EXT'    BCU     ZCL    "EXT" handles external interrupts (unimplemented).
       0  ICLO     UCOM           Route ICLO to bus 0.
       1  CDR      IR0            Get 1st half-word of current instruction.
          UCOM     ALL     INC
       2  ICHI     CARH           Access next half-word of program.
       3  ICLO     CARL    R
          'A'      BCU/1   CSET   If carry is set, increment high-order byte of IC.
B         IR0      SHR     R12/0  Shift high-order 4 bits to low-order position.
       3  '000F'   ALR            Load mask into ALU.
          SHR      ALL     AND    Mask off all but the low 4 bits.
       2  ALO      SHR     L1/0   Double the format/type field to get branch-table displacement.
       1  'DCBS'   ALR            Load the base address of the branch table into ALU.
          SHR      ALL     ADD2   Add the displacement to get branch-table entry address.
          ALO      BCU     UC     Enter the branch table.
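The counter update and table dispatch performed by INIT and IFCH can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the emulator itself:

- ICHI is treated as a plain 8-bit address byte. LINK also keeps the program mask in the upper byte of ICHI, and the sketch ignores that.
- ICLO is advanced by one per half-word access, as in the listing.
- The branch-table base is a hypothetical value, because the listing gives DCBS only symbolically.

```python
MASK16 = 0xFFFF  # ICLO and the instruction half-word are 16 bits wide


def increment_ic(ichi: int, iclo: int) -> tuple[int, int]:
    """Advance the split instruction counter by one.

    Mirrors "ICLO ALL INC" followed by a conditional "ICHI ALL INC" when
    the ALU reports a carry out of the low-order 16 bits.
    """
    total = iclo + 1
    carry = total > MASK16          # true only when ICLO wraps from 0xFFFF
    iclo = total & MASK16
    if carry:
        ichi = (ichi + 1) & 0xFF    # assumption: ICHI modeled as an 8-bit address byte
    return ichi, iclo


def dispatch_address(ir0: int, dcbs: int) -> int:
    """Branch-table entry selected by the high-order 4 bits of IR0.

    Shift right 12 and mask with 0x000F, double the field because each
    table entry is two words long, then add the table base.
    """
    field = (ir0 >> 12) & 0x000F
    return (dcbs + (field << 1)) & MASK16


# Hypothetical table base; the listing gives DCBS only symbolically.
DCBS_EXAMPLE = 0x0100
print(increment_ic(0x00, 0xFFFF))                    # (1, 0)
print(hex(dispatch_address(0x5810, DCBS_EXAMPLE)))   # 0x10a
```

With the hypothetical base 0x0100, a first half-word of 0x5810 (the L instruction) has a format/type field of 5. It therefore selects the entry at 0x010A, which is the sixth slot of the DCBS table ('RX1').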
A      2  ICHI     ALL     INC    Carry set, increment high byte of instruction counter.
          'B'      BCU/1   UC     Jump back to "B" after 1 post instruction.
       2  ALO      ICHI           (Post instruction) Save updated high-order byte of instruction counter.
DCBS      'RR0'    BCU     UC     Branch table routes control to 1 of 16 routines.
          'RR1'    BCU     UC
          'RR2'    BCU     UC
          'RR3'    BCU     UC
          'RX0'    BCU     UC
          'RX1'    BCU     UC
          'RX2'    BCU     UC
          'RX3'    BCU     UC
          'RSI0'   BCU     UC
          'RSI1'   BCU     UC
          'RSI2'   BCU     UC
          'RSI3'   BCU     UC
          'OPER4'  BCU     UC     Op-code error (4-byte instruction).
          'OPER4'  BCU     UC     Op-code error (4-byte instruction).
          'OPER6'  BCU     UC     Op-code error (6-byte instruction).
          'SS1'    BCU     UC
          'OPER6'  BCU     UC     Op-code error (6-byte instruction).
          'SS3'    BCU     UC
*  "RR0" decodes bits <4-7> of the current instruction. This is done by testing the field directly.
RR0       IR0      SHR     R8     Move instruction code bits to least significant position.
          '000F'   ALR            Load mask into ALU.
          SHR      ALL     AND    Mask off all but instruction code bits.
       1  ALO      ALR            Load instruction code into ALU.
       2  '0007'   ALL     XOR    Compare code to the "BCR" code.
          'BCR'    BCU     ZSET   If match, jump to "BCR."
       2  '0005'   ALL     XOR    Continue similarly.
          'BALR'   BCU     ZSET
       2  '0006'   ALL     XOR
          'BCTR'   BCU     ZSET
       2  '0004'   ALL     XOR
          'SPM'    BCU     ZSET
       2  '0008'   ALL     XOR
          'SSK'    BCU     ZSET
       2  '0009'   ALL     XOR
          'SVC'    BCU     ZSET
          'OPER2'  BCU     UC     Op-code error (2-byte instruction).
*  "RR1" decodes the instruction code (<4-7>) via a branch table, and transfers control to execution routines.
RR1    1  IR0      ALR            Load instruction into ALU.
       2  '000F'   ALL     AND    Load mask and zero all but R2 field.
       3  ALO      SHR     L1/0   Double R2 to get scratch memory address.
       1  SHR      SAR     R      Access low-order 16 bits of operand two.
       2  '0F00'   ALL     AND    Mask all but instruction code field from instruction.
          SODR     OP2LO          Get low-order half-word of operand two and save.
       3  ALO      SHR     R8/0   Right justify instruction code field.
       1  SHR      SHR     L1/0   Double this to get branch table displacement.
          SODR     OP2HI          Get high-order half-word of operand 2 and save.
       3  'RR1B'   ALR            Load base address of RR1 branch table.
       2  SHR      ALL     ADD2   Add the displacement to get branch table entry address.
          ALO      BCU     UC     Enter RR1 branch table.
RR1B      'LPR'    BCU     UC     RR1 branch table starts here (2 words per entry).
*                                 Unimplemented instructions.
          'LOAD'   BCU     UC
          'CMP'    BCU     UC
          'ADD'    BCU     UC
          'SUB'    BCU     UC
          'MR'     BCU     UC
*                                 Unimplemented instructions.
          'SLR'    BCU     UC
*  "RX0" does final decode for the RX0 group of instructions. A branch table with 4 words per entry
*  is used, which allows the execution routine entry point to be stored prior to the jump to the
*  setup routine.
RX0    1  IR0      SHR     R8     Right justify instruction code field (<4-7>) of current instruction.
       3  '000F'   ALR            Load mask into ALU.
          SHR      ALL     AND    And zero all but instruction code bits.
       1  CDR      IR1            While waiting for ALU, get second half-word of current instruction.
       2  ALO      SHR     L2/0   Multiply instruction code bits by 4 to get branch table displacement.
       3  'RX0B'   ALR            Load base address of RX0 branch table into ALU.
       2  SHR      ALL     ADD2   And add the displacement to get table entry address.
          ALO      BCU     UC     Enter table.
RX0B      'STH'    BCU     UC     RX0 branch table starts here.
*                                 Don't-care words.
*                                 Unimplemented instructions; skip to address RX0B+20.
B+20      'BAL'    BCU     UC
*                                 Unimplemented instruction; address = RX0B+28.
B+28      'BC'     BCU     UC
*                                 Don't-care words.
          'HWSU'   BCU/1   UC     Jump (after 1 post instruction) to half-word setup routine.
       3  'LOAD'   LKRG           (Post instruction) Save entry address of execution routine.
          'HWSU'   BCU/1   UC
       3  'CMP'    LKRG
          'HWSU'   BCU/1   UC
       3  'ADD'    LKRG
          'HWSU'   BCU/1   UC
       3  'SUB'    LKRG
*                                 Unimplemented instructions.
*  "RX1" uses subroutine "EA2" to prepare the effective address of operand 2. Final decode is then
*  accomplished via a 4-word/entry branch table, using "FWSU" to complete the setup.
RX1       'EA2'    BCU/2   UC     Link to subroutine "EA2", with 2 post instructions.
       3  'C'      LKRG           (Post instruction) Save return address.
       1  CDR      IR1            (Post instruction) Get second half-word of current instruction.
C      1  IR0      SHR     R8     Right-justify instruction code field (<4-7>).
       3  '000F'   ALR            Load mask.
       2  SHR      ALL     AND    Zero all but instruction code bits.
          ALO      SHR     L2/0   Multiply by 4 to get branch table displacement.
       1  'RX1B'   ALR            Load branch table base address.
       2  SHR      ALL     ADD2   Add the displacement to form table entry address.
          ALO      BCU     UC     Branch to table.
RX1B      'ST'     BCU     UC     (Branch table starts here.) Jump to "ST" to execute ST.
          'OPER4'  BCU     UC     Op-code error.
          'OPER4'  BCU     UC
          'OPER4'  BCU     UC
*                                 Unimplemented instructions; skip to address RX1B+32.
B+32      'FWSU'   BCU/1   UC     Jump to full-word setup, "FWSU", after 1 post instruction.
       3  'LOAD'   LKRG           (Post instruction) Save execution routine for L.
          'FWSU'   BCU/1   UC
       3  'CMP'    LKRG
          'FWSU'   BCU/1   UC
       3  'ADD'    LKRG
          'FWSU'   BCU/1   UC
       3  'SUB'    LKRG
*                                 Unimplemented instructions.
*  "BALR" sets up the BALR instruction, then jumps to "LINK."
BALR   1  IR0      ALR            Load first half-word of instruction into ALU.
          '000F'   ALL     AND    Zero all but R2 field.
       3  '4000'   T1             Save instruction length code.
       2  ALO      SHR     L1/0   Double R2 field to get scratch address.
       1  SHR      SAR     R      Access 1st half-word of operand 2.
       2  SODR     OP2LO          Get 1st half-word of operand 2 and save.
          'LINK'   BCU/1   UC     Jump to "LINK" (after 1 post instruction) to complete BALR.
       2  SODR     OP2HI          (Post instruction) Get and save 2nd half-word of operand 2.
*  "BCR" uses subroutine "TEST" to check the branch condition. If unsuccessful, control returns to
*  "IFCH." If successful, the branch address is fetched and subroutine "JUMP" entered.
BCR       'TEST'   BCU/1   UC     Jump to "TEST" (1 post instruction).
          'U1'     LKRG           (Post instruction) Save return address.
U1        'IFCH'   BCU     UC     If "TEST" unsuccessful, return here, and go to "IFCH."
S1     1  IR0      ALR            Load 1st half-word of instruction.
       2  '000F'   ALL     AND    Mask all but R2 field.
       3  ALO      SHR     L1/0   Double to get scratch memory address.
       1  SHR      SAR     R      Access operand 2 in scratch memory.
       2  SODR     OP2LO          Get and save 1st half-word.
          'JUMP'   BCU/1   UC     Jump (after 1 post instruction) to "JUMP."
       2  SODR     OP2HI          (Post instruction) Get and save 2nd half-word.
*  "BAL" uses subroutine "EA2" to complete the effective address of operand 2, and fetches operand 2,
*  the branch address. Control then jumps to "LINK" to complete the "BAL" instruction.
BAL       'EA2'    BCU/1   UC     Link to "EA2."
       3  'D'      LKRG           (Post instruction)
D      2  EAHI     CARH           Access first half of branch address.
       3  EALO     CARL    R
       3  EALO     UCOM           Route low-order half-word of effective address.
       1  ALO      EALO           Store updated low-order half-word.
          'E'      BCU     CCL    Jump to "E" if high-order byte of effective address is OK.
       2  EAHI     ALL     INC    Else increment high-order byte.
       2  ALO      EAHI           And store updated high-order byte.
E      1  CDR      UCOM           Route the 1st word of the branch address to bus 2, and save in OP2LO.
       2  UCOM     OP2LO
       2  EAHI     CARH           Access second word of branch address.
       3  EALO     CARL    R
          'LINK'   BCU/3   UC     Go to "LINK" to complete the BAL instruction (3 post instructions).
       3  '8000'   T1             Save instruction length code.
          CDR      UCOM           Route second word of branch address to bus 2.
          UCOM     OP2HI          Save second word of branch address.
*  "BC" uses subroutine "TEST" to check the condition. If successful, "EA2" computes the effective
*  address of the branch address, and "JUMP" completes the instruction. If unsuccessful, the next
*  program half-word is accessed, and control returns to "IFCH."
BC        'TEST'   BCU/1   UC     Go to "TEST" to check branch condition.
       3  'U2'     LKRG           (Post instruction) Save "unsuccessful" return address.
U2        'UN'     BCU     UC     If unsuccessful, "TEST" returns here, and then jumps to UN.
          'EA2'    BCU/1   UC     If successful, "TEST" returns here. Link to "EA2."
       3  'F'      LKRG           (Post instruction) Save return address.
F      2  EAHI     CARH           Access the branch address in core.
       3  EALO     CARL    R
       3  EALO     UCOM           Route EALO to bus 2.
       2  UCOM     ALL     INC    Increment EALO and save in EALO.
       3  ALO      EALO
          'G'      BCU     CCL    If no carry, EA now OK for second half-word of branch address.
       2  EAHI     ALL     INC    Carry: need to increment EAHI.
       2  ALO      EAHI           Save incremented EAHI.
G      1  CDR      UCOM           Route core data to bus 3, and save operand 2.
       3  UCOM     OP2LO
       2  EAHI     CARH           Access 2nd half-word of branch address.
       3  EALO     CARL    R
          'JUMP'   BCU/1   UC     Branch (after 1 post instruction) to "JUMP."
       2  CDR      OP2HI          Get 2nd half-word of effective address.
UN     2  ICHI     CARH           Access next half-word of program.
       3  ICLO     CARL    R
       3  ICLO     UCOM           Route ICLO to bus 2.
       2  UCOM     ALL     INC    Increment ICLO, and save.
       3  ALO      ICLO
          'IFCH'   BCU     CCL    If no carry, go to "IFCH."
       2  ICHI     ALL     INC    Carry set, increment ICHI and save.
          ALO      ICHI
          'IFCH'   BCU     UC     Now go to "IFCH."
*  "STH" uses "EA2" to calculate the effective address and then completes the STH instruction,
*  returning control to "IFCH."
STH       'EA2'    BCU/2   UC     Link to "EA2" to compute effective address of operand 2.
       3  'H'      LKRG           (Post instruction) Save return address.
H      1  IR0      SHR     R4     Right justify R1 field.
       2  EAHI     CARH           Load high-order byte of effective address into core address buffer.
       3  '000F'   ALR            Load mask.
          SHR      ALL     AND    Zero all but R1 field.
       3  ALO      SHR     L1/0   Double R1 to get scratch address.
       1  SHR      SAR     R      Access (R1).
       2  SODR     UCOM           Route (R1) to bus 1.
       1  UCOM     CDR            Load (R1) into core data buffer.
       3  EALO     CARL    W      Load address, and do the memory write.
       3  ICLO     UCOM           Route low-order half-word of instruction counter to bus 2.
       2  UCOM     ALL     INC    Increment low-order half-word of instruction counter.
          'J'      BCU     CSET   If carry, jump to "J" to increment ICHI.
       3  ALO      ICLO           Otherwise, save the incremented instruction counter
          'IFCH'   BCU/2   UC     and return to "IFCH" (after 2 post instructions).
       2  ICHI     CARH           (Post instruction) Load high-order byte of instruction counter into core address buffer.
          UCOM     CARL    R      (Post instruction) Load low-order half-word of instruction counter and access.
J      3  ALO      ICLO           Save incremented ICLO.
       2  ICHI     ALL     INC    Increment ICHI,
       2  ALO      CARH           and load it into core address buffer.
          'IFCH'   BCU/2   UC     Return to "IFCH" (after 2 post instructions).
          ALO      ICHI           (Post instruction) Save incremented ICHI.
          UCOM     CARL    R      (Post instruction) Access next half-word of program.
*  "ST" completes execution of the ST instruction, then accesses the next half-word of the program
*  and returns to "IFCH."
ST     1  IR0      SHR            Right justify R1 field.
       3  '000F'   ALR            Load mask.
          SHR      ALL     AND    Zero all but R1 field.
       2  EAHI     CARH           Load high-order byte of effective address into core address buffer.
          ALO      SHR     L1/0   Double R1 field to get scratch address of R1.
       2  SODR     UCOM           Route (R1) to bus 1, and load into core data buffer.
       1  UCOM     CDR
       3  EALO     CARL    W      Load low-order half-word of effective address into core address buffer and write.
       3  EALO     UCOM           Route EALO to bus 2, and increment.
       2  UCOM     ALL     INC
       3  ALO      EALO           Save EALO.
          'K'      BCU     CCL    If no carry, go to K.
       2  EAHI     ALL     INC    Carry, so increment EAHI.
       2  ALO      CARH           And load into core address buffer.
          'I'      BCU/3   UC     Jump (after 3 P.I.'s) to I.
K      2  SODR     UCOM           (P.I.) Route SODR to bus 1.
       1  UCOM     CDR            (P.I.) Load (R1) into core data buffer.
          EALO     CARL    W      (P.I.) Load low-order half-word of effective address to core, and write.
*  "EA2" computes the effective address for operand 2 in RX-format instructions. It assumes IR0 and
*  IR1 contain the instruction and LKRG contains the return address.
EA2    2  '0000'   EAHI           Initialize EAHI to zero.
       1  IR1      ALR            Load IR1 into ALU, and zero all but D2 field.
          '0FFF'   ALL     AND
       3  ALO      EALO           Store this in EALO.
       2  'F000'   ALL     AND    Now zero all but B2 field, and right justify.
       1  ALO      SHR     R12/0
          'BZ'     BCU     ZSET   If B2 = 0, jump to BZ.
       3  SHR      SHR     L1/0   Otherwise double B2 to get scratch memory address.
       1  SHR      SAR     R      Access the base register.
       3  EALO     ALR            Load EALO into ALU, and add the base register contents.
       2  SODR     ALL     ADD2
       3  ALO      EALO           Store this in EALO.
       2  EAHI     UCOM           Route EAHI to bus 3, and load into ALU.
       3  UCOM     ALR
       2  SODR     ALL     ADC    Get high-order byte of base address from scratch, and add it to EAHI.
       2  ALO      EAHI           Store this in EAHI.
BZ     1  IR0      ALR            Load IR0 into ALU.
          '000F'   ALL     AND    Zero all but X2 field.
          LKRG     BCU     ZSET   If X2 field is zero, return to caller.
       2  ALO      SHR     L1/0   Otherwise double to get scratch address of X2.
       1  SHR      SAR     R      Access scratch memory.
       3  EALO     ALR            Load EALO into ALU.
       2  SODR     ALL     ADD2   And add (X2).
       3  ALO      EALO           Store this in EALO.
       2  EAHI     UCOM           Route EAHI to bus 1, and load it into ALU.
       1  UCOM     ALR
       2  SODR     ALL     ADC    Add the carry (if any), and store in EAHI.
       2  ALO      EAHI
       3  EALO     SHR     R1     Shift the l.s.b. of EALO into spill.
          'SPE'    BCU     SSET   If spill set, there is a specification exception.
          LKRG     BCU     UC     Otherwise the effective address is ready. Return to caller.
*  "TEST" assimilates the condition code from the internal format, and compares it to the mask bits.
*  It returns via LKRG if unsuccessful, and to [LKRG] + 1 if successful.
TEST   1  '2000'   ALR            Load mask corresponding to overflow bit.
          CCR      ALL     AND    Load CCR and compare to mask.
          'CC3'    BCU     ZCL    If zero bit is clear, then match. Branch CC = 3.
       1  CCR      SHR     L1/0   Shift N bit into spill.
          'CC1'    BCU     SSET   If spill set, then CC = 1, branch there.
       2  SHR      SHR     L1/0   Shift Z bit into spill.
          'CC0'    BCU     SSET   If spill set, then CC = 0, branch there.
CC2       'MSK'    BCU/1   UC     CC = 2 by default. Jump to MSK after 1 P.I.
       3  '0020'   T1             (P.I.) Put mask for CC2 into T1.
CC3       'MSK'    BCU/1   UC
       3  '0010'   T1             (P.I.)
CC1       'MSK'    BCU/1   UC
       3  '0040'   T1             (P.I.)
CC0    3  '0080'   T1
MSK    3  T1       ALR            Load the condition code into ALU.
       1  IR0      UCOM           Route IR0 to bus 2.
       2  UCOM     ALL     AND    Compare condition code with M1 field.
       3  LKRG     UCOM           Route LKRG to bus and bus 2.
          UCOM     BCU     ZSET   If zero bit is set, then unsuccessful, so return to (LKRG).
       2  UCOM     ALL     INC    Otherwise increment (LKRG) and return there.
          ALO      BCU     UC
*  "LINK" has 2 global entry points: "LINK" and "JUMP." "LINK" assembles PSW <32-63> in R1 and then
*  loads the branch address into the core address buffer and into ICHI and ICLO. The access of this
*  memory location is started, and control is returned to "IFCH." Conditional branches enter at
*  "JUMP" when a branch is successful.
LINK   1  '2000'   ALR            Load overflow bit mask and check OV bit of condition register.
          CCR      ALL     AND
       3  'CC3'    BCU     ZCL    If OV set, CC = 3.
          CCR      SHR     L1     Shift N bit into spill.
          'CC1'    BCU     SSET   If N set, CC = 1.
       1  SHR      SHR     L1     Shift Z bit into spill.
          'CC0'    BCU     SSET   If set, CC = 0.
          'L'      BCU/1   UC     Jump to L. CC = 2 by default.
       1  '2000'   ALR            (P.I.) Code for CC = 2.
CC3       'L'      BCU/1   UC     Go to L (1 P.I.).
       3  '3000'   ALR            (P.I.) CC = 3 code.
CC1       'L'      BCU/1   UC     Go to L (1 P.I.).
       1  '1000'   ALR            (P.I.) CC = 1 code.
L      2  T1       ALL     OR     T1 contains instruction length code.
       3  ALO      ALR            Insert high-order byte of instruction counter.
       2  ICHI     ALL     OR
       1  IR0      SHR     R4     Right justify R1 field.
       3  ALO      T1             Save T1 = PSW <32-47>.
       2  '000F'   ALL     AND    Zero all but R1 field.
          ALO      SHR     L1/0   Double to get scratch address.
       1  SHR      SAR     W      Load scratch address register.
       2  T1       SIDR           Write PSW <32-47> into scratch memory.
          ICLO     SIDR           Write PSW <48-63> into scratch memory. Linkage now complete.
JUMP   2  OP2LO    UCOM           Route OP2LO to ICLO.
       3  UCOM     ICLO
          'FF00'   ALL     AND    Zero old instruction counter m.s. byte but save program mask.
       1  ALO      ALR
       2  OP2HI    ALL     OR     Insert new m.s. byte of instruction counter.
          ALO      ICHI           Save new ICHI.
       2  ICHI     CARH           Access next word of program.
       3  ICLO     CARL    R
       3  ICLO     UCOM           Route ICLO to bus 2.
       2  UCOM     ALL     INC    Increment ICLO.
       3  ALO      ICLO           Save incremented ICLO.
          'IFCH'   BCU     CCL    If no carry, go to "IFCH."
       2  ICHI     ALL     INC    If carry, increment ICHI.
          'IFCH'   BCU/1   UC     Jump to "IFCH" (1 P.I.).
       2  ALO      ICHI           (P.I.) Save incremented instruction counter.
*  "HWSU" does set-up for the half-word instructions. "EA2" is used to compute the effective address
*  of operand 2. "HWSU" transfers directly to the execution routine whose entry point is in LKRG.
HWSU   3  LKRG     T1             Save (LKRG) in T1.
          'EA2'    BCU/1   UC     Link to subroutine "EA2" to compute effective address.
       3  'M'      LKRG           (P.I.) Save return address.
M      2  EAHI     CARH           Access half-word at EA.
       3  EALO     CARL    R
       3  ICLO     UCOM           Route ICLO to bus 0.
          UCOM     ALL     INC    Increment low-order half-word of instruction counter.
       1  CDR      UCOM           Route CDR to bus 2.
       2  UCOM     OP2LO          Get operand 2 from core.
       2  ICHI     CARH           Access next word of program.
       3  ICLO     CARL    R
       3  ALO      ICLO           Store incremented ICLO.
          'N'      BCU     CSET   If carry clear, ICHI is OK; else go to "N".
P      2  OP2LO    ALL     TRL    Transfer OP2LO to ALO.
       3  T1       UCOM           Route T1 (= execution routine address) to bus 0.
          UCOM     BCU/1   NSET   If high-order bit of OP2LO is 1, branch after (P.I.) setting OP2HI
       2  'FFFF'   OP2HI          to all ones (sign extend).
          UCOM     BCU/1   UC     (N not set) Branch after zeroing OP2HI.
       2  '0000'   OP2HI
N      2  ICHI     ALL     INC    Increment ICHI and jump to P (after 1 P.I.).
          'P'      BCU/1   UC
       2  ALO      ICHI           (P.I.) Save incremented ICHI.
*  "FWSU" does set-up for full-word instructions. "EA2" is used to calculate the address of
*  operand 2. "FWSU" transfers directly to the appropriate execution routine (in LKRG).
FWSU   2  EAHI     CARH           Access operand 2.
       3  EALO     CARL    R
       3  EALO     UCOM           Route EALO to bus 0.
          UCOM     ALL     INC    Increment effective address.
       3  ALO      EALO           And save in EALO.
          'Q'      BCU     CCL    If carry clear, EAHI is OK.
       2  EAHI     ALL     INC    Otherwise increment EAHI.
       2  ALO      EAHI           And save in EAHI.
Q      1  CDR      UCOM           Route core output data to bus 2.
       2  UCOM     OP2LO          Save low-order half-word of operand 2 in OP2LO.
       3  ICLO     UCOM           Route ICLO to bus 0.
          UCOM     ALL     INC    Increment ICLO.
       2  EAHI     CARH           Access second half-word of operand 2.
       3  EALO     CARL    R
       1  CDR      UCOM           Route core output to bus 2.
       2  UCOM     OP2HI          Save core output in OP2HI.
       2  ICHI     CARH           Access next half-word of program.
       3  ICLO     CARL    R
          LKRG     BCU/1   CCL    If no carry, jump to execution routine.
       3  ALO      ICLO           (Post instruction) Save new ICLO.
       3  ICHI     ALL     INC    Carry set, so increment ICHI.
          LKRG     BCU/1   UC     Now go to execution routine.
       3  ALO      ICHI           (Post instruction) Save updated ICHI.
*  "CMP" does final execution steps for CR and C instructions.
CMP    1  IR0      SHR     R4/0   Right justify R1 field.
       3  '000F'   ALR            Load mask.
       2  SHR      ALL     AND    Zero all but R1 field.
          ALO      SHR     L1/0   Double to get operand 1 scratch address.
       1  SHR      SAR     R      Access low-order half-word of (R1).
       3  OP2LO    ALR            Load low-order half-word of comparand 2 into ALU.
       2  SODR     ALL     SUB2   Subtract comparands.
       2  OP2HI    ALR            Load high-order half-word of comparand 2.
          SODR     ALL     SUBC   Get and compare high-order half-word of comparand 1.
          'R'      BCU     OVCL   If overflow clear, go to R.
       1  STR      SHR     L1/0   Overflow set. Shift sign bit into spill.
          'S'      BCU     SSET   If spill set, go to S (result < 0).
       3  SHR      SHR     R1/1   Complement N bit by shifting in a 1.
          'IFCH'   BCU/1   UC     Go to "IFCH" (after 1 P.I.).
          SHR      CCR            (P.I.) Save new condition in CCR.
S      1  SHR      SHR     R1/0   Complement N bit by shifting in a 0.
          'IFCH'   BCU/1   UC     Go to "IFCH" (after 1 P.I.).
       1  SHR      CCR            (P.I.) Save new condition.
R         'IFCH'   BCU/1   UC     Go to "IFCH" (after 1 P.I.).
       1  STR      CCR            (P.I.) Save new condition.
*  "LOAD" completes execution of the LH, L, and LR instructions. Control is returned to "IFCH."
*  Operand 2 is located in OP2LO and OP2HI.
LOAD   1  IR0      SHR     R4/0   Right justify R1 field.
       3  '000F'   ALR            Load mask into ALU.
       2  SHR      ALL     AND    Zero all but R1 field.
          ALO      SHR     L1/0   Double R1 to get scratch address of (R1).
          SHR      SAR     W      Load address into scratch address register.
       2  OP2LO    SIDR           Load OP2LO, and begin write cycle.
          'IFCH'   BCU/1   UC     Return to "IFCH" (after 1 P.I.).
       2  OP2HI    SIDR           Load OP2HI.
*  "ADD" executes the addition required for the AR, AH, and A instructions. The condition is updated,
*  exceptions are checked, and control is transferred to "LOAD" upon completion of "ADD." The code
*  for handling fixed-point overflow immediately follows this code as indicated, but has not been
*  implemented.
ADD    1  IR0      SHR     R4/0   Right justify the R1 field.
       3  '000F'   ALR            Load mask.
          SHR      ALL     AND    Zero all but R1 field.
       2  ALO      SHR     L1/0   Double to get scratch memory address of R1.
       1  SHR      SAR     R      Access low-order half-word of R1.
       2  OP2LO    UCOM           Route OP2LO to bus 3.
       3  UCOM     ALR            Load OP2LO into ALU.
          SODR     ALL     ADD2   Add (R1) low-order half-word to OP2LO.
          SODR     UCOM           Route R1 high-order half-word to bus 1.
       2  ALO      OP2LO          Save low-order sum in OP2LO.
       1  UCOM     ALR            Load R1 high-order half-word into ALU.
       2  OP2HI    ALL     ADC    Add with carry OP2HI.
          STR      CCR            Store the resulting condition code.
       2  ALO      OP2HI          Save high-order sum in OP2HI.
          'CMK'    BCU     OVSET  If OV set, jump to "CMK" to check program mask.
          'LOAD'   BCU     UC     Go to "LOAD" to complete this instruction.
CMK    1  '2000'   ALR            Load mask.
       2  ICHI     ALL     AND    Load program mask and check OV bit.
          'LOAD'   BCU     ZSET   If clear, go to "LOAD."
FPOV                              Otherwise process fixed-point overflow exception. (Unimplemented)
*  "SUB" executes the SR, SH, and S subtractions in the same way as "ADD." The code is identical to
*  "ADD" except that the SUB2 and SUBC arithmetic operations replace ADD2 and ADC. In the case of an
*  overflow exception, control is transferred to FPOV above. The code is not repeated here for brevity.

BIBLIOGRAPHIC DATA SHEET
1. Report No.: UIUCDCS-R-73-56^
4. Title and Subtitle: A Modular Microprogrammed Computer with Concurrent Decentralized Control
5. Report Date: January, 1973
7. Author(s): Mark Loren Ketelsen
9. Performing Organization Name and Address: Computer Science Department, University of Illinois, Urbana, Illinois 61801
11. Contract/Grant No.: US NSF GJ-36265
12. Sponsoring Organization Name and Address: National Science Foundation, Washington, D.C.
16. Abstract: This thesis presents a computer organization capable of concurrent processing at the microinstruction level. A complete logical description is given, and the results of simulation studies are presented. An emulator for the IBM System/360 is developed, and its performance analyzed for comparison with various models of the IBM line.
17a. Descriptors: Computer Organization; Microprogramming; Emulation
18. Availability Statement: Release unlimited
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages: 119
FORM NTIS-35 (10-70)