UIUCDCS-R-74-619

DEADLOCK IN DISTRIBUTED COMPUTER NETWORKS

by

Thomas Jay Miller

December 1974

Department of Computer Science
University of Illinois at Urbana-Champaign
Urbana, Illinois

This work was supported in part by the National Science Foundation under Grant No. US NSF GJ-36265 and was submitted in partial fulfillment for the Master of Science degree in Computer Science, 1974.

Acknowledgments

My sincere thanks go to Professor Jane Liu, who allocated a large amount of personal resources to me for direction, review and support in the preparation of this paper. I would like to thank June Wingler and my wife Judy for their excellent work on the typing, retyping and illustrations. Finally, thanks are also due to Professor Don Gillies, who was always available for discussion.

TABLE OF CONTENTS

page
1. INTRODUCTION 1
2. DEFINITION AND A GRAPH MODEL 5
3. SURVEY OF CENTRALIZED RESOURCE MANAGEMENT POLICIES 24
4. DESIGN OF A DISTRIBUTED RESOURCE MANAGEMENT POLICY 34
5. RECOVERY FROM DEADLOCK 50
6. PERFORMANCE OF DISTRIBUTED DEADLOCK POLICIES 54
LIST OF REFERENCES 57

1. INTRODUCTION

With the advent of computer systems capable of supporting a number of processes (running programs) simultaneously through multiprogramming and time sharing, a class of problems arose which did not exist previously in single-user systems.
These problems came about as the result of contention by the processes for the resources of the computer (such as tape drives, files, memory units and CPU time). It became apparent that if the behavior of the processes was unrestricted, not only could the overall resource utilization be inefficient, but also some processes in the system could actually induce incorrect operation in other processes. Thus, the operating system was forced to take on the additional roles of supervisor and mediator of the processes.

One of the problems which arose was related to the allocation of the resources of the computer. A computer system is said to contain a deadlock if one or more processes in the system can never proceed because their resource requirements can never be satisfied. For example, consider a system in which a process cannot release a resource while waiting for the request for another resource to be granted. Suppose that this system consists of processes P1 and P2 and resources R1 and R2. Furthermore, suppose that R1 has already been allocated to P1 and R2 has already been allocated to P2. Now, if P1 requests R2 and P2 requests R1, then P1 and P2 are deadlocked, since neither process can proceed. In general, deadlock can involve any number of processes interacting through their resource requirements.

Theories have evolved which allow us to determine necessary and sufficient conditions for the existence of deadlock in different types of computer systems. From these theories, resource management policies have been designed to deal with deadlock in real systems. The simplest policies prevent deadlock from occurring by restricting resource requests by processes. For example, if processes are required to request all the resources which they will need at one time, then the resource manager can allocate the resources for each process all at once and deadlock can never occur.
Thus, in simple policies such as this one the resource manager simply has to enforce the restrictions. Unfortunately, such restrictions often lead to inefficient resource utilization. Therefore, other resource management policies have been designed which are necessarily more complicated, but which remove restrictions on processes and attempt to increase resource utilization. In these policies the resource manager monitors the system and controls the allocation of resources to deal with deadlock dynamically [C1, H2, H4, H5, H6].

The resource management policies proposed so far have been centralized; that is, only one resource manager can be active in the system at once. Request and release operations placed by processes must be queued up for service by the centralized resource manager one at a time. Inherent in a centralized resource management scheme is a limitation on the size of the system which can be serviced. In large multiprocessors and computer networks with common resources, the centralized resource manager can become the bottleneck in the system. One way to alleviate this problem is with distributed resource management.

In this report we will study the design of distributed resource management policies. In particular, we shall reexamine the theory of deadlock and see how centralized algorithms may best be adapted for distributed control. In Chapter 2 we shall define the necessary terms and introduce a graph model (proposed by Holt [H4, H5]) for our analysis of deadlock. We have chosen this model because of its generality and because of the systematic approach to the building of deadlock-related algorithms which it allows. It has also been chosen because it is perhaps the only model proposed which considers the passing of messages in its analysis of deadlock, which is definitely an important feature when applied to computer networks.
Our knowledge of deadlock will then be applied in Chapter 3 to examine the types of centralized resource management schemes currently proposed (as related to their treatment of deadlock), with specific examples. Chapter 4 is devoted to the development of distributed resource management. When any process is allowed to execute system procedures to handle resource management, the main problem is one of synchronization. Thus we will discuss how current algorithms may be synchronized for distributed control and show how to do so for an algorithm given in Chapter 3.

Deadlock can occur in any system, even if an algorithm is employed to prevent it. In the latter case, deadlock can be caused by an unexpected occurrence, such as a hardware failure. Therefore, Chapter 5 is devoted to recovery from deadlock. The recovery process is explained and an example of a very fast recovery algorithm is discussed. Finally, in Chapter 6 we will discuss the performance and cost of the various types of distributed resource management as related to the algorithm employed for deadlock. We will give formulas for the size of the data base required and worst-case execution times. There will also be a brief discussion on how to decide what type of deadlock algorithm should be employed for any given system.

2. DEFINITION AND A GRAPH MODEL

We now introduce the terminology which we shall use in our study of deadlock. A process is a sequence of instruction executions specified by a computer program. In particular, in the analysis of deadlock, we are only concerned with those points in the execution at which the process changes or attempts to change its current allocation of resources. (These points are called operations and will be described later.) The interim execution is of no concern to us, except that it does take a non-zero amount of time. A resource is anything in the computer system which a process may require to continue in execution. The resources are organized into resource types.
Resources that are identical in terms of satisfying specific requirements by processes are said to be of the same type. The smallest assignable quantity of a resource is called a unit. For example, "tape drive" is a resource type and a specific tape drive is a unit of this type; or the "memory" type in a given system may be assignable in 512-byte units.

There are two classes of resources. A reusable resource is one which a process may acquire from the system, use, and return to the system for use by other processes. Examples of reusable resources are I/O devices, tapes, disk packs, files and memory units. A unit of a consumable resource, on the other hand, no longer exists after being acquired and used by a process. Messages in inter-process communication are examples of consumable resources. Every consumable resource type must have one or more processes which are designated as its producers.

A general resource system is completely characterized by:

1) a nonempty set of processes {P1, P2, ..., Pn},
2) a nonempty set of resource types {R1, R2, ..., Rm},
3) for each reusable resource type Rj, a positive integer tj, which denotes the total units of Rj,
4) for each consumable resource type, a nonempty set of producers.

In a general resource system, a process may place a request to acquire units of a resource type which are available for acquisition. Subsequent to a request, a process may make an acquisition of the requested units. At this point, these units are no longer available for acquisition by other processes. Finally, a process may release units of a resource type, making these units available to other processes. Reusable resource units are released by the processes which previously acquired them. Consumable resource units are released (or produced) only by the producers of the resource type. The request, acquisition and release of resources are the operations which a process may perform in a general resource system.
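In present-day notation, the four components above can be collected into a single record. The following sketch and its names are illustrative only, not part of the model's formal definition.

```python
from dataclasses import dataclass

@dataclass
class GeneralResourceSystem:
    """The four components characterizing a general resource system."""
    processes: set        # nonempty set of processes
    resource_types: set   # nonempty set of resource types
    total_units: dict     # reusable type Rj -> total units tj
    producers: dict       # consumable type -> nonempty set of producers

    def is_reusable(self, r):
        # a type is reusable exactly when a fixed total tj is given for it
        return r in self.total_units

example = GeneralResourceSystem(
    processes={"P1", "P2"},
    resource_types={"R1", "R2"},
    total_units={"R1": 2},        # R1 is reusable with t1 = 2
    producers={"R2": {"P1"}},     # R2 is consumable, produced by P1
)
```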
Later in this section we will give more precise definitions for these operations. At any time either one of the following transient relationships may exist between a process Pi and a resource type Rj:

1) Pi has requested, but not yet acquired, one or more units of type Rj.
2) Pi has acquired and not yet released one or more units of a reusable type Rj. We say these units have been assigned to Pi.

At any time the state of a general resource system is characterized by the number of available units of each resource type and the relationships between the processes and resource types in the system. Hence, any operation by a process changes the state of the system.

The behavior of a general resource system can be described by its state diagram, such as in Figure 2.1. Each node in the diagram represents a state of the system and is labelled by the name of the state. If an operation by Pi exists by which Pi transforms state S to state T, then there is an edge labelled with "i" in the state diagram directed from S to T. We also represent this operation by S -i-> T (read "process Pi takes S to T"). Figure 2.1, therefore, represents a system with states {S, T, U, V} and processes {P1, P2}. Some of the operations in this system are S -1-> U, U -1-> S and T -2-> V. If a sequence of zero or more operations exists which takes S to V, then we say S -*-> V. Thus, in Figure 2.1, T -*-> S since T -2-> V, V -2-> U and U -1-> S.

When a process Pi can perform no operations in state S, we say Pi is blocked in S. Pi is deadlocked in S if Pi is blocked in S and will remain blocked for any sequence of operations performed by other processes. That is, Pi is deadlocked in S if for all T such that S -*-> T, Pi is blocked in T.

Figure 2.1 The state diagram of a system.

We say S is a deadlock state if one or more processes are deadlocked in S. It follows that if S is a deadlock state, then for all T such that S -*-> T, T is a deadlock state.
(The converse of this statement is not true.) In Figure 2.1, P1 is blocked in T (but not deadlocked) and P2 is deadlocked in states S and U.

Though the state diagram discussed above is helpful in the formalization of basic concepts in deadlock analysis, it is not a useful tool for the analysis of deadlock in real systems. Just to generate the state diagram for any but the smallest of systems would be a formidable task. What we would like to have is the ability to analyze at any time the information contained in the current state of the system in order to answer such questions as whether the system is deadlocked or how to restrict the operations of processes so that deadlock will not occur. For this purpose we will introduce a graph model proposed by Holt [H4, H5].

To describe this graph model we need to define some terms from graph theory. A directed graph, such as the one in Figure 2.2, consists of a set of nodes and a set of edges connecting these nodes. We denote an edge directed from node n1 to node n2 by (n1, n2). Furthermore, we say n1 is the father of n2 and n2 is the son of n1. A node from which no edges are directed is called a sink. Thus, node g is the only sink in Figure 2.2. A path is a sequence of nodes (n1, n2, n3, ..., nk) such that (ni, n(i+1)) is an edge for i = 1, ..., k-1. A cycle is a path in which the first and last nodes are the same. (a, b, e, d, a) is a cycle in Figure 2.2. A knot is a set of nodes such that every node in the set has a path to every other node in the set, and no node in the set has a path to a node not in the set. In Figure 2.2 {a, b, c, d, e} is a knot.

Figure 2.2 A directed graph.

A general resource graph is a directed graph in which the nodes represent processes and resource types in a general resource system. An edge directed from a process node to a resource type node is called a request edge, and represents a request by the process for a unit of the resource type.
An edge directed from a reusable resource type node to a process node is called an assignment edge, and represents the assignment of a unit of the resource type to the process. An edge directed from a consumable resource type node to a process node is called a producer edge and represents the process being a producer of units of the resource type.

Figure 2.3 is an example of a general resource graph describing a state in a general resource system. We can see that this general resource system has processes P1, P2 and P3 (represented by square nodes) and resource types R1, R2 and R3 (represented by the large circular nodes). Each individual unit of a resource type is represented by a circular subnode within the type node. For reusable resources, such as R1 and R2, the subnodes represent the total units of the resource type. Thus, t1=2 and t2=3. Assignment edges are directed from the subnode representing the unit assigned. Since the number of available units ri of a reusable resource Ri is equal to the total number of units minus the number of units assigned, we can see that r1=0 and r2=1. For consumable resources, the subnodes simply represent the available units, that is, the units which have been produced, but not yet acquired. Thus, in Figure 2.3, r3=2.

Figure 2.3 A general resource graph.

To see exactly how operations by processes are reflected in the state graph, let us describe in detail the three operations.

Request. If in the graph describing state S, no request edges are directed from Pi and S -i-> T is a request, then the graphs describing S and T are identical, except that there are one or more request edges directed from Pi in the graph describing state T.

Acquisition. Suppose that in the graph describing state S there are one or more request edges directed from Pi to Rj and rj is equal to or greater than the number of request edges.
Then, if S -i-> T is an acquisition, the graphs describing S and T are the same, except, for each request edge (Pi, Rj) in the graph describing S:

1) rj is decreased by one,
2) if Rj is a reusable resource, then each request edge (Pi, Rj) is replaced by an assignment edge (Rj, Pi),
3) if Rj is a consumable resource, then each request edge (Pi, Rj) is deleted.

Release. Suppose that in the graph describing state S there are no request edges directed from Pi and there is at least one assignment edge or a producer edge directed to Pi. Then, if S -i-> T is a release, the graphs describing S and T are identical except that one or more of the resources Rj having edges directed to Pi has its available units (rj) increased by some number mj. If Rj is a reusable resource then mj assignment edges (Rj, Pi) are deleted.

Since a process is blocked when it can perform no operations, it follows from the definitions above that a process is blocked if and only if for some resource Rj the number of request edges (Pi, Rj) exceeds rj. We will say a state is expedient if all processes having requests are blocked in that state.

A process Pi is deadlocked if and only if no sequence of operations by other processes can produce a state in which Pi is not blocked. (If such a sequence exists then Pi is not deadlocked.) A direct method of determining if a sequence of operations leaves a process unblocked is a graph reduction method. A graph reduction by an unblocked process Pi corresponds to the best set of operations which Pi can perform to unblock other processes, which amounts to releasing as many units of reusable and consumable resources as possible. Specifically, a reduction by an unblocked process node does the following. For each reusable resource Rj, all request edges (Pi, Rj) are deleted, corresponding to acquisition and release operations.
Furthermore, for each assignment edge (Rj, Pi), the edge is deleted and the available units, rj, are increased by one, corresponding to release operations. Similarly, for consumable resources, rj is decremented by one for each request edge, and the edge is deleted. If Pi is a producer of Rj, then the producer edge is deleted and rj is set to positive infinity, corresponding to the production of sufficient units to unblock all processes. If by some sequence of reductions, all edges of the general resource graph are deleted, we say the graph is completely reducible. Thus, we have the following results [H4, H5]:

Result 2.1. A process in a general resource graph is not deadlocked if and only if a sequence of graph reductions produces a graph in which the process is not blocked.

Result 2.2. If a resource graph is completely reducible then the corresponding state is not a deadlock state.

These results allow us to design algorithms to determine if any process in a general resource system is deadlocked. Such algorithms operate by trying successive reduction sequences until the process in question is unblocked. Unfortunately, if a deadlock does exist, nearly n! reduction sequences (where n is the number of processes in the system) may have to be tried. This is clearly impractical in systems with large numbers of processes. As we shall see in the next chapter, by making various restrictions on the types of resource system which we will handle, algorithms with reasonable execution times can be designed.

But first, we will develop here an example of some data structures which can be used to encode the state information of a system as described by its state graph. These data structures will then be used in the examples of resource management procedures contained in succeeding chapters. We will develop these structures for single-unit request systems, i.e., systems in which each process has at most one request edge directed from it in the state graph at any one time.
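Before turning to these structures, the reduction procedure itself can be illustrated for the purely reusable case. The following sketch is ours, in present-day Python; the dictionary representation is an assumption, not the report's encoding, and consumable resources and producer edges are omitted for brevity.

```python
def completely_reducible(avail, request, alloc):
    """Reduce a reusable-resource graph by repeatedly choosing an
    unblocked process and releasing everything assigned to it.
    avail:   {resource type: free units}
    request: {process: {resource type: units requested}}
    alloc:   {process: {resource type: units assigned}}
    Returns the set of processes that could not be reduced; an empty
    set means the graph is completely reducible (cf. Result 2.2).
    """
    avail = dict(avail)                  # do not disturb the caller's state
    remaining = set(request) | set(alloc)
    progress = True
    while progress:
        progress = False
        for p in list(remaining):
            # p is unblocked if every outstanding request is grantable
            if all(avail.get(r, 0) >= n for r, n in request.get(p, {}).items()):
                for r, n in alloc.get(p, {}).items():
                    avail[r] = avail.get(r, 0) + n   # release operations
                remaining.discard(p)
                progress = True
    return remaining
```

On the two-process example of Chapter 1 (P1 holds R1 and requests R2, P2 holds R2 and requests R1) the function returns both processes, confirming the deadlock. With that illustration in hand, we turn to the data structures.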
First we will determine the information required to describe each node and subnode of the graph. Then we will discuss how these data can be organized into structured arrays and how these structures can be linked together to describe a state graph.

Table 2.1 lists the data fields necessary for describing each kind of node in the state graph and the edges directed from it.

Table 2.1 The data fields required to describe nodes in the state graph.

Process node: name, request_edge
Resource Type node: name, type, available_units, producer_edges
Resource Unit: name, assignment_edge

The "name" field for each type of node will hold the index of the name of the node. For example, the name field for process Pj will hold the value j. The "request_edge" field for each process node will hold the index of a resource type node if a request has been placed for a unit of that type by the process. It will hold a zero otherwise. In the description of resource type nodes, the "type" field will contain a zero if the node is a reusable resource and a one if it is consumable. The number of available units of the resource type will be stored in the "available_units" field. If the resource type is consumable, then there must be at least one "producer_edge" field which contains the index i for each process Pi which is a producer of the resource type. Reusable resource units which have been assigned to a process Pi will have the value i stored in the "assignment_edge" field of their description. This field will be zero for all unassigned units and for units of consumable resource types.

The information describing nodes in the state graph can be conveniently organized into structured arrays, as illustrated in Figure 2.4. Each identifier preceded by a 1 is the name of a structured array of information and is followed by a number specifying the size of the array. This name is merely a collective identifier for the data fields in the structure.
The data fields are specified by the identifiers in succeeding lines, each preceded by a 2. This sequence of data fields is repeated as many times as the number specified with the name of the structured array. For example, "1 PROCESS (100)" is a structured array containing a sufficient number of data fields for describing 100 processes. The data fields describing process Pi are organized into the structure "PROCESS(i)". Individual data fields describing Pi can be referred to separately as in "request_edge(i)".

Note that there are no "name" fields in any of the structures in Figure 2.4. The information which was contained in the "name" field is now implicitly contained in the index of the structure. Note also that, since a variable number of producer edges may be directed from a consumable resource type node, there are no "producer_edge" fields in the "RESOURCE_TYPE" array. The encoding of producer edges will be discussed shortly. Identifiers in Figure 2.4 which are not discussed here will be explained as they are used in the remainder of this chapter and Chapter 3.

1 PROCESS (100),
  2 request_edge,
  2 acquired_unit,
  2 private_semaphore,
1 RESOURCE_TYPE (100),
  2 type,
  2 available_units,
  2 unit_queue_ptr,
  2 producer_queue_ptr,
  2 request_count,
  2 request_queue_ptr,
1 RESOURCE_UNIT (1000),
  2 assignment_edge,
1 QUEUE_ELEMENT (3000),
  2 next_ptr,
  2 structure_index;

Figure 2.4 Structured arrays for the encoding of state graphs in single-unit-request systems.
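For comparison, the structured arrays of Figure 2.4 might be rendered in a present-day language as follows. The Python syntax and class names are ours; the field names, array sizes and zero-as-no-edge convention follow the figure.

```python
from dataclasses import dataclass

REUSABLE, CONSUMABLE = 0, 1

@dataclass
class Process:
    request_edge: int = 0       # index of requested resource type, 0 = none
    acquired_unit: int = 0      # index of the unit most recently acquired
    private_semaphore: int = 0

@dataclass
class ResourceType:
    type: int = REUSABLE
    available_units: int = 0
    unit_queue_ptr: int = 0
    producer_queue_ptr: int = 0
    request_count: int = 0
    request_queue_ptr: int = 0

@dataclass
class ResourceUnit:
    assignment_edge: int = 0    # index of owning process, 0 = unassigned

# One extra slot so that index 0 can mean "no edge", as in the report.
PROCESS = [Process() for _ in range(101)]
RESOURCE_TYPE = [ResourceType() for _ in range(101)]
RESOURCE_UNIT = [ResourceUnit() for _ in range(1001)]
```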
The circular queue, for example, is used to link the structures describing resource unit subnodes to the structure describing the associated resource type node. Members of the "QUEUE_ELEMENT " structured array are circularly linked through their "next_ptr" fields to form the queue. The "structure_index" field of each element in the queue contains the index of (or a pointer to) a member of the "RESOURCEJJNIT " structured array. The "unit_queue_ptr" in the associated member of the "RESOURCE_TYPE" structured array points to the "rear" element in this circular queue. Since the "next_jptr" in the rear element points to the "front" element, easy access to either end of the queue is accomplished via the single "unit_queue_ptr . " Figure 2 . 5a shows an example of how the relationship between consummable resource node R. and its units U. , IL and U is represented. We will assume system dequeue and queue procedures exist as illustrated in Figures 2.5b and 2.5c. In roughly the same manner we use a circular queue to encode the producer edges of a consummable resource type. In this case the "structure_index" field of each queue element contains the index of a process which is a producer of the resource. A pointer to the queue is stored in the "producer_queue_ptr" in the consummable "RESOURCE_TYPE" structure. The simple graph in Figure 2.6 and the data structure describing it in Figure 2.7 illustrates how the structured arrays discussed above are used to 20 unit queue ptr(j) • ' a) Initial circularly linked unit queue. unit queue ptr(j) m b) After dequeue (U, unit_queue_ptr (j ) ), U contains k. unit queue ptr ( j ) • > c) After queue (n, unit queue ptr(j)), Figure 2.5 An example of the circularly linked unit queue, 21 Reusable Resource Rl Process PI Consummable Resource R2 Process P2 Figure 2.6 A simple state graph. 22 rt H II w II O ■P Ph II -P cu | ft . CD 2 H CO h ,a i 0) o s rr 1 o II H 3 (U •H -H B erf ■H +3 cd P ft 23 encode a state graph. 
Each box in Figure 2.7 represents a structure. All such structures are labelled by the name of the node they represent, with the exception of the "QUEUE_ELEiyiENT " structures which are not labelled. Only those data fields which we have already explained are included in the figure. The values of all structure indices are indicated by arrows pointing to the structure indexed. Each of the dashed- line circles encloses a collection of structures which together describe a single resource type node. 2k 3. SURVEY OF CENTRALIZED RESOURCE MANAGEMENT POLICIES We now briefly discuss various policies used to handle deadlock in centralized resource management systems. Available units in a centralized system belong to the system and are controlled by the resource manager. Processes queue up request and release operations to the resource manager and the manager actually performs these operations on the system for the processes. Acquisition operations are also performed by the resource manager in accordance with the deadlock and allocation policies being followed. Basically three different types of policies have been used by centralized resource managers to deal with the problem of deadlock. We t shall refer to them as : 1) static prevention, 2 ) detection and recovery, and 3) dynamic prevention. Static prevention policies are the simplest and most straightforward of the three. A resource manager adopting such a strategy enforces restrictions on the placement of requests by processes in the system so that a necessary condition for the existence of deadlock cannot occur. An example of a resource manager using a static prevention policy is Static and dynamic prevention are sometimes referred to as prevention and avoidance respectively [CI], 25 described by Havender [H2]. The necessary condition denied in this policy, in terms of our graph model, is a cycle in the state graph. To deny this condition, all resource types are organized into an ordered set of classes. 
A process in the system may only request resource units in classes higher than the highest class of units currently assigned to the process. This means that while traversing a path directed from a process node in a state graph, each resource node encountered must be of a higher class than the previous one; thus a cycle can never exist and deadlock cannot occur.

In a resource management scheme following a detection and recovery policy, the placement of requests is unrestricted and the system state is examined at specific times for the existence of deadlock. If a deadlock is detected, a recovery scheme is invoked to eliminate the deadlock. (Recovery is discussed in Chapter 5.) Detection may be concurrent or periodic. Concurrent detection is accomplished by invoking the detection algorithm after each request and acquisition operation which could cause a deadlock. A periodic detection algorithm can be invoked at any time, regardless of the number of operations which have occurred since the last time it was invoked, and therefore may be invoked as frequently or infrequently as desired. Detection algorithms have been proposed by Murphy [M1] and Holt [H4, H5].

For an example of a detection algorithm, let us consider a general resource system in which the resource manager treats multiple-unit requests as a series of single-unit requests. It has been shown [H4, H5] that given an expedient general resource graph with single-unit requests:

Result 3.1. The graph represents a deadlock state if and only if it contains a knot.
The algorithm is essentially an exhaustive path search for a sink on the paths directed from the process node p ] . whose index, "I", is passed through the parameter I. (in an expedient graph, the only type of node which can be a sink on a path is a process node with no outstanding request.) The algorithm begins by setting a switch "D", which initially indicates that P is deadlocked. The "LIST, " which will be used to contain indices of processes with outstanding requests, is initialized to contain only "I". The main WHILE loop is then executed while "D" remains set. In this loop, successive process indices are read from the "LIST" into "Q. " A resource type index is then read from "request_edge (Q)" (see Figure 2.5) into "R." This resource "R/ is the son of process "P Q ". The algorithm then checks successive sons "P g " and "^ to see if they are sinks, that is, i: "request_edge(S)=0." If a son is found to be a sink, then "D" is cleared to indicate that % is not deadlocked. Otherwise, the son is added to the "LIST, " if it is not already there. The WHILE loop terminates when "D" is seen to be cleared or when the process indices on the "LEST" have been exhausted. At this point if "D" is still set, then no sink was found on the paths directed from »]*» and the recovery algorithm is invoked. (Recovery is discussed in Chapter 5. ) Result 3.1 indicates 27 that, in this case a knot exists consisting of the process nodes whose indices are contained in the LIST and the resource nodes to which the process nodes have directed requests. It can be shown [Rk, H5] that in an expedient single-unit-request system, the only operation which can cause a deadlock is a request for an unavailable unit. Therefore, Algorithm 3.1 must be invoked after each such request. The third type of policy is dynamic prevention. A resource manager adopting a dynamic prevention policy controls the allocation of resources dynamically in such a manner that deadlock cannot occur. 
To accomplish this, advance information is required from each process concerning its expected resource requirements. The placement of requests by processes is unrestricted as long as they do not contradict the advance information provided. Dynamic prevention algorithms have been proposed by Habermann [H1] and Holt [H4, H5] which apply to systems containing only reusable resources. Habermann's algorithm is based on the following results, stated in terms of our graph model [H1, H4, H5]. Let S be any state in a reusable resource system.

Result 3.3. S is not a deadlock state if and only if it is completely reducible.

Result 3.4. If, when different sequences of reductions are applied to S, the resulting states cannot be reduced, then all of these resulting states are identical.

Algorithm 3.2 is equivalent to Habermann's algorithm. The algorithm requires that each process state its claim for the maximum units of each resource type which it may require. Resources are only allocated
By making the following observation we can see one way to speed up Algorithm 3.2: if state S is safe and state T is formed from S by allocating resources to P_i, then T can be determined to be safe at that point in the graph reduction when P_i becomes unblocked, since it can then release the newly acquired units. Thus if the index of the process which has acquired the new units is passed as a parameter, then a special test can be added to terminate the algorithm early if this process becomes unblocked in the reduction of the claim-limited graph. In addition to a deadlock policy, a resource manager must have an allocation policy. The allocation policy determines the order in which to honor requests by more than one process for units of the same resource type. First-come first-serve (FCFS) and priority-driven allocation policies are common examples. Care must be taken that the allocation policy does not allow for the occurrence of effective deadlock. Effective deadlock is a situation in which a process never becomes unblocked, even though it is never deadlocked. As an example of effective deadlock, consider a priority-driven allocation policy in a system with a resource type for which a number of processes are always blocked with requests. A low-priority process requesting this resource could remain blocked indefinitely, even though it may never be deadlocked. In Chapter 4, when we consider distributed resource management, it will be necessary to consider all resource management procedures which effect operations by processes on the system. Therefore, we will now look at some simple request and release procedures by which a centralized resource manager effects these operations, so that they can be compared to the distributed procedures. Algorithms 3.3 and 3.4 are request and release procedures for single-unit-request systems.
In these algorithms "I", "R" and "U" are used to contain indices of "PROCESS", "RESOURCE_TYPE" and "RESOURCE_UNIT" structures respectively. For definitions of the structures referred to in these algorithms, see Figure 2.5. When a process performs a request or release operation in a centralized resource management scheme, this operation is queued up to be effected later by the resource manager. We will assume that after queueing a request operation, the process performs a P operation on its "private_semaphore," thus blocking itself. (For a discussion of semaphores and P and V operations see Dijkstra [D1] or Habermann [H6].) When the resource manager dequeues an operation it generates a call to Algorithm 3.3 or 3.4. Acquisition operations are also performed in these procedures whenever requested units become available. To simplify the acquisition operations in the release procedure, blocked requests are redundantly encoded in the "RESOURCE_TYPE" structure. A circular queue is maintained in which the "structure_index" field of each queue element contains the index of a process blocked on a request for a unit of the resource type. A pointer to this queue is stored in "request_queue_ptr." The "request_count" field keeps track of the number of blocked processes in the queue. Algorithm 3.3 effects request operations. If there are no available units, then the request edge is placed and the index of the process is queued into the request queue. (Queuing operations are illustrated in Figure 2.5.) If an available unit does exist, then the acquisition operation is effected immediately for the process. The index of the unit acquired is placed in the "acquired_unit" field of the process, to indicate to the process which unit it has acquired. Also, a V operation is performed on the "private_semaphore" of the process so that it may continue. Release operations are effected in Algorithm 3.4.
If the "request_count" is greater than zero for the type of unit being released, then the index of a process is dequeued from the "request_queue" and an acquisition operation is performed for the process. This is done by placing the index of the unit in the "acquired_unit" field of the process node and performing a V operation on the "private_semaphore" of the process. If the request queue is empty, then the unit is simply made available. Note that since the request queue is handled in a first-in first-out manner, a first-come first-serve allocation policy is being employed.

Algorithm 3.1. A detection algorithm for single-unit-request systems.

detect (I);
Set D;
Initialize LIST to contain only I;
WHILE D=1 DO for each process index Q on the LIST;
  R <- request_edge (Q);
  DO for each son S of R;
    IF request_edge (S)=0
      THEN clear D
      ELSE IF S is not on LIST
        THEN add S to LIST;
      FI;
    FI;
  OD;
ELIHW;
IF D is set THEN recover (I); FI;
END;

Algorithm 3.2. A deadlock prevention algorithm for reusable resource systems.

safe_state;
Initialize list to contain all processes;
WHILE list is not empty DO;
  Set D;
  DO for each process P on list;
    IF P is not blocked
      THEN reduce graph by P;
        Clear D;
        Remove P from list;
    FI;
  OD;
  IF D is set THEN RETURN (1); FI;
ELIHW;
RETURN (0);
END;

Algorithm 3.3. A request procedure for single-unit-request systems.

request (I, R);
IF available_units (R)=0
  THEN request_edge (I) <- R;
    increment request_count (R);
    queue (I, request_queue_ptr (R));
    detect (I);
  ELSE decrement available_units (R);
    IF type (R)=reusable
      THEN find available unit U of R;
        assignment_edge (U) <- I;
      ELSE dequeue (U, unit_queue_ptr (R));
    FI;
    acquired_unit (I) <- U;
    V (private_semaphore (I));
FI;
END;

Algorithm 3.4. A release procedure for single-unit-request systems.

release (R, U);
IF request_count (R) > 0
  THEN decrement request_count (R);
    dequeue (Q, request_queue_ptr (R));
    request_edge (Q) <- 0;
    acquired_unit (Q) <- U;
    IF type (R)=reusable
      THEN assignment_edge (U) <- Q;
    FI;
    V (private_semaphore (Q));
  ELSE increment available_units (R);
    IF type (R)=consumable
      THEN queue (U, unit_queue_ptr (R));
      ELSE assignment_edge (U) <- 0;
    FI;
FI;
END;

4. DESIGN OF A DISTRIBUTED RESOURCE MANAGEMENT POLICY

Under a distributed resource management policy, instead of having a single resource manager perform all operations, each process performs its own operations by invoking system procedures. To perform these operations, processes must manipulate the state information of the system, but if allowed to do so in an unrestricted manner, it is possible for them to interfere with each other and carry out the operations incorrectly. There are three types of interference problems encountered in distributed resource management. One of these is the mutual exclusion problem. This problem results from allowing processes to have read/write access to common data elements in the state information. For example, if two processes were simultaneously executing Algorithm 3.3 to request and acquire a unit of the same resource type when only one unit was available, they might first both see that "available_units (R)" is not equal to zero, then both decrement "available_units (R)" (making it -1) and then both attempt to acquire the same unit. When a dynamic deadlock algorithm is employed, it is possible for one process which is executing the algorithm to have the information which it has already examined modified by a second process. Hence, the first process may terminate its algorithm with an incorrect result. We shall call this type of interference problem the update problem. Finally, consider a distributed resource management system which employs a concurrent detection algorithm or a dynamic prevention algorithm.
The deadlock algorithm is invoked by a process, after it has performed an operation on the system, to see if further action is required, such as retracting the operation or recovering from deadlock. We shall, therefore, say that the newly performed operation is tentative and assume that it is marked as such in the state information. In such a system, the collision problem occurs when a process executing a dynamic deadlock algorithm comes upon the tentative operation of another process. In this situation we will say the first process has "collided" with the second. The first process must be able to make a decision under these circumstances which will eventually allow it to terminate the algorithm with an appropriate result. Solutions to these three interference problems will involve synchronization and occasional blocking of processes [D1, H6]. It is obviously desirable that these solutions require as little process blocking as possible. Care must also be taken so that it is impossible for processes to become permanently blocked in the resource management procedures. We shall briefly discuss the general solution to each of the interference problems. In particular, we shall show how these problems can be solved to adapt the resource manager described in Algorithms 3.1, 3.3 and 3.4 for distributed control. The resulting example of a distributed resource management scheme is given as Algorithms 4.1, 4.2 and 4.3. To implement these algorithms, additional data fields must be added to the structured arrays illustrated in Figure 2.5 to facilitate synchronization and blocking of processes. The expanded structures are illustrated in Figure 4.1. For simplicity, we assume that the state information resides in common memory accessible by all processes. The mutual exclusion problem can be solved by partitioning the state information and controlling access to each partition with a semaphore.
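The lost-update race described earlier (two processes both seeing one available unit, then both decrementing) and its cure can be sketched with modern primitives. This is an illustrative Python fragment, not part of the thesis's algorithms; `threading.Lock` stands in for a partition semaphore, and all names are hypothetical.

```python
import threading

available_units = 1
lock = threading.Lock()   # guards the partition holding available_units

def request_unit():
    """Check-then-decrement guarded by one lock, so the test
    'available_units != 0' and the decrement execute atomically."""
    global available_units
    with lock:
        if available_units == 0:
            return False          # caller would block and wait for a release
        available_units -= 1      # can no longer be driven below zero
        return True

threads = [threading.Thread(target=request_unit) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(available_units)   # → 0, never -1
```

Without the lock, the two threads could interleave between the test and the decrement, which is precisely the -1 scenario in the text.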
The semaphore is initialized to one, and any process which accesses the partition must first perform a P operation on the semaphore and then a V operation when it is done. For our algorithms, a semaphore is associated with each node in the state graph, i.e., a "node_semaphore" is added to the process node structure and the resource type node structure. Access to resource unit structures is controlled through the "node_semaphore" in the associated resource type structure. To solve the update problem, we must decide what part of the state information is critical to the outcome of the dynamic deadlock algorithm being used. Then we must prevent the critical information which has already been read by a process executing the algorithm from being changed by another process. For example, since Algorithm 3.1 is essentially an exhaustive path search for a sink, it is the information describing these paths which is critical. In the algorithm, processes are only added to the LIST if they have an outstanding request. Therefore, path changes can be prevented by preventing the release of units which would be acquired by processes on the LIST. To do this, a "hold_count" and a "holding_process" field are added to the process structure and both are initialized to zero. In detection Algorithm 4.1, the "hold_count" of each process is incremented as the process is added to the LIST.
1 PROCESS (100),
  2 request_edge,
  2 acquired_unit,
  2 private_semaphore,
  2 node_semaphore,
  2 hold_count,
  2 holding_process,
  2 tentative,
  2 rank,
  2 wait_count,
  2 wait_queue_ptr,
1 RESOURCE_TYPE (100),
  2 type,
  2 available_units,
  2 unit_queue_ptr,
  2 producer_queue_ptr,
  2 request_count,
  2 request_queue_ptr,
  2 node_semaphore,
1 RESOURCE_UNIT (1000),
  2 assignment_edge,
1 PRODUCER_ELEMENT (100),
  2 producer_edge,
1 QUEUE_ELEMENT (3000),
  2 next_ptr,
  2 structure_index

Figure 4.1. Structured arrays for the encoding of state graphs for single-unit-request systems with distributed resource management.

In Algorithm 4.3, when a process finds that it is releasing a unit to be acquired by a process with a non-zero "hold_count," the releasing process places its index in the "holding_process" field of the requesting process and performs a P operation on its own "private_semaphore." Each process completing Algorithm 4.1 decrements the "hold_count" for all processes on its LIST. If any "hold_count" goes to zero for a process whose "holding_process" field is non-zero, a V operation is performed on the "private_semaphore" of the holding process.

Somewhat more complicated is the solution to the collision problem. When one process collides with another, there are three possible courses of action it might take:
1) It may block itself and wait for the other process to either retract the operation or mark it as no longer tentative.
2) It may assume the operation will be retracted and continue.
3) It may assume the operation will be marked as no longer tentative and continue.
Unfortunately, it is undesirable to let all processes take the same course of action every time. To see why, consider the situation in which two processes, P_i and P_j, are executing a concurrent detection algorithm and each collides with the other. If both processes take the same course of action, the following three possibilities could result, corresponding to the three courses of action above:
1) P_i and P_j become permanently blocked.
2) A deadlock goes undetected.
3) Multiple reports to the recovery algorithm are generated for the same deadlock.
The first two possibilities are unacceptable. The third possibility is acceptable, but a very sophisticated recovery procedure is required to handle multiple deadlock reports efficiently. As we shall show in our example, more desirable solutions to the collision problem can be achieved by rank-ordering the processes and allowing the action taken by a process after a collision to depend upon its rank relative to that of the other process in the collision. We have added four more fields to the process structure to solve the collision problem. They are "tentative," "rank," "wait_count" and "wait_queue_ptr." The "rank" field is initialized to an arbitrary positive integer unique for every process. All other fields are initialized to zero. When a process makes a request for an unavailable unit in Algorithm 4.2, it sets "tentative" to one and invokes the detection Algorithm 4.1. A collision occurs in Algorithm 4.1 when a process node P_S on a path leading from the process P_I is found to have a tentative request ("tentative (S)=1"). In this event, P_I compares its rank with that of P_S. If the rank of P_I is larger, P_I relies on P_S to find a path to a sink or invoke the recovery routine if a deadlock exists. This is equivalent to assuming that P_S will retract its request. Thus in this case P_I benefits from the collision by being able to immediately terminate the algorithm. If the rank of P_I is smaller, then P_I waits for P_S to terminate its path search (and possibly invoke the recovery routine) before continuing. It accomplishes this wait by incrementing the "wait_count" of P_S, queueing itself on the "wait_queue" of P_S, and performing a P operation on its own "private_semaphore." Then as P_S terminates Algorithm 4.1, it will perform a V operation on the "private_semaphore" of every process in its "wait_queue."
Note that a process can have only one collision with a lower-ranked process, for then it will terminate Algorithm 4.1. On the other hand, a process can have any number of collisions with higher-ranked processes, each time waiting for the higher-ranked process to terminate Algorithm 4.1, then continuing. To illustrate that our distributed deadlock policy works correctly, we need to show that:
1) Every deadlock is reported to the recovery routine exactly once.
2) No spurious deadlock reports are generated.
3) No process can remain blocked forever in any of the algorithms.
To see that the first statement is true, we recall from the last section that a necessary and sufficient condition for deadlock in expedient single-unit-request systems is a knot in the state graph. Furthermore, a knot can only be formed by a request for an unavailable unit. Therefore, at least one process in every knot will execute Algorithm 4.1. If only one process in the knot executes Algorithm 4.1, then it will have no collisions and Algorithm 4.1 will lead to detection of the deadlock in exactly the same manner as would Algorithm 3.1. When two or more processes simultaneously make requests and form a knot, each will invoke Algorithm 4.1. Since, in a knot, a path exists from every process node to every other process node in the knot (and to no other process node), there will be collisions in the knot. Obviously, the highest-ranked process which is executing Algorithm 4.1 in the knot can never be blocked by a collision. Furthermore, when this process collides with another process, it will terminate Algorithm 4.1. As the highest-ranked process terminates the algorithm, it will unblock all processes which have collided with it. At this point we can again be sure that the process which is now highest-ranked will not be blocked by a collision.
Since at least one process will not be blocked at any time, the processes in the knot will continue to execute Algorithm 4.1 until all but the one with lowest rank have terminated the algorithm. At this point the lowest-ranked process will continue the algorithm, detect the deadlock and recover. We know now that exactly one process in a knot will report a deadlock, but a process P_K not in a knot can be deadlocked if it has a path to a knot. So we need to show also that such a process will not generate an extra deadlock report. Let us designate the lowest-ranked process in the knot executing Algorithm 4.1 as P_L. We will assume for now that, as the result of the recovery procedure invoked by P_L, P_L becomes a sink. (Recovery is discussed in Chapter 5.) Before it also can detect this deadlock, P_K must visit every process node in the knot and place each one on its LIST. If P_K reaches P_L after recovery has taken place, then P_L will be a sink and P_K will terminate the algorithm with no deadlock report. If P_K reaches P_L before recovery, one of two things will happen. If the rank of P_K is higher than that of P_L, then P_K will terminate with no deadlock report. If the rank of P_K is lower, then it will be blocked and on the "wait_queue" of P_L until after recovery has taken place. Of course, P_K may collide with another process with a lower rank and never reach P_L at all. In any case, the process P_K which has a path to a knot but is not in the knot will not detect a deadlock. Thus we have shown that every deadlock has exactly one report. No spurious deadlocks will be detected because of our solution to the update problem. A process detects a deadlock in Algorithm 4.1 after searching for a sink on all paths leading from process nodes on the LIST and not finding one. Since we have eliminated the possibility of any of these paths changing after being examined, every deadlock detected will be a real one.
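The rank rule at the heart of this argument can be sketched compactly. The following Python fragment is a simplified, single-threaded illustration of the collision decision only, not the thesis's Algorithm 4.1: the blocking on a wait_queue cannot be shown without real concurrency and is reduced to a comment, and all parameter names are hypothetical.

```python
def detect_distributed(i, rank, request_edge, holders, tentative):
    """Sink search from process i with the rank-based collision rule.
    tentative[s] is True while process s has an unconfirmed request."""
    lst = [i]
    for q in lst:
        r = request_edge[q]
        for s in holders[r]:
            if request_edge.get(s) is None:
                return 'no deadlock'              # found a sink
            if tentative.get(s) and s != i:
                if rank[i] > rank[s]:
                    # Assume s will retract or finish the search itself.
                    return 'defer to lower rank'
                # else: a real implementation blocks on s's wait_queue
                # until s terminates its search, then re-examines s.
            if s not in lst:
                lst.append(s)
    return 'deadlock'
```

With unique ranks, the higher-ranked party of every mutual collision terminates early, which is why exactly one process (the lowest-ranked in the knot) ends up reporting the deadlock.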
To see that no process will become permanently blocked, we must examine the use of semaphores in our algorithms. Each algorithm has been designed so that no process attempts to perform two successive P operations on semaphores without an intervening V operation being performed on the first semaphore. So all that we need to show is that there is a V operation for every P operation, and that processes cannot become permanently blocked in a circular manner. By examining the algorithms, we can see that this is obviously the case with each use of the "node_semaphore." Each "node_semaphore" is initialized to one and every process which performs a P operation on a "node_semaphore" eventually performs the V operation itself. Each "private_semaphore" is initialized to zero. When a process performs a P operation on its own "private_semaphore," it cannot continue until another process performs a V operation on the semaphore. There are three places where a process performs a P operation on its "private_semaphore." The first place is where a process has made a request for an unavailable unit in Algorithm 4.2. In this case, the process queues the index of its "PROCESS" structure in the "request_queue" so that another process releasing a unit will eventually dequeue the index of the requesting process and perform a V operation on its private semaphore. Permanent blocking in this case is, of course, a deadlock, and thus is eliminated by the detection and recovery algorithms. The second place occurs in Algorithm 4.3 when a process cannot release a unit until one or more other processes have terminated Algorithm 4.1. The process assures a V operation in this case by placing its index in "holding_process." This process will not be permanently blocked as long as the processes executing Algorithm 4.1 cannot become permanently blocked (which we show next). Processes executing Algorithm 4.1 become blocked upon collisions with higher-ranked processes.
The "wait_queue" is used in the same manner as the "request_queue" to insure the occurrence of a V operation. Since blocked processes can only wait on processes of higher rank to unblock them, circular dependencies obviously cannot form, and no process can become permanently blocked. Thus, no cases exist in which processes can become permanently blocked while executing our algorithms. This completes the demonstration of the correct operation of our deadlock policy. One pleasant side effect of the synchronization code in Algorithm 4.1 is its effect on the execution of the algorithm in a system heavily loaded with request operations. As this load increases, the frequency of collisions increases. On the average, half of all collisions will allow a process to terminate Algorithm 4.1 earlier than if the process had no collisions. Thus as the load increases, the average time spent in Algorithm 4.1 decreases. Another type of detection policy for distributed systems could be designed with a periodic detection algorithm. In such a policy no resource operations would be monitored. Periodically, a process would be scheduled (perhaps during CPU idle time) to execute the detection algorithm to see if any deadlocks have occurred since the last execution of the algorithm. Since only one process would be executing the algorithm, there would be no collision problem, and only the mutual exclusion problem and the update problem would have to be solved. For a distributed resource management system with dynamic prevention, solutions to the interference problems very similar to the ones we have discussed can be used with Habermann's algorithm (Algorithm 3.2). Unfortunately, the time-saving shortcut discussed in Chapter 3 for this algorithm cannot be used in the distributed case. Therefore, every process executing the distributed algorithm must attempt to reduce a claim-limited graph for the entire system.
Furthermore, since the algorithm operates by reducing the state graph, each process executing the algorithm needs its own copy of some of the information associated with each node in the graph. We will discuss how to choose the best deadlock algorithm for a given distributed system in Chapter 6.

Algorithm 4.1. A detection algorithm for single-unit-request systems with distributed resource management.

detect (I);
Set D;
P (node_semaphore (I));
IF request_edge (I)=0
  THEN clear D;
    tentative (I) <- 0;
  ELSE increment hold_count (I);
    Initialize LIST to contain only I;
FI;
V (node_semaphore (I));
WHILE D=1 DO for each process node Q on LIST;
  R <- request_edge (Q);
  DO for each son S of R;
    L1: P (node_semaphore (S));
    IF request_edge (S)=0
      THEN clear D;
        tentative (I) <- 0;
      ELSE IF tentative (S)=1
        THEN IF rank (I) > rank (S)
          THEN clear D;
            tentative (I) <- 0;
          ELSE IF I ¬= S
            THEN increment wait_count (S);
              queue (I, wait_queue_ptr (S));
              V (node_semaphore (S));
              P (private_semaphore (I));
              GOTO L1;
          FI;
        FI;
        ELSE IF S is not on LIST
          THEN add S to LIST;
            increment hold_count (S);
        FI;
      FI;
    FI;
    V (node_semaphore (S));
  OD;
ELIHW;
IF D=1 THEN recover (I); FI;
P (node_semaphore (I));
tentative (I) <- 0;
WHILE wait_count (I) ¬= 0 DO;
  decrement wait_count (I);
  dequeue (Q, wait_queue_ptr (I));
  V (private_semaphore (Q));
ELIHW;
V (node_semaphore (I));
DO for each process node Q on LIST;
  P (node_semaphore (Q));
  decrement hold_count (Q);
  IF hold_count (Q)=0
    THEN IF holding_process (Q) ¬= 0
      THEN V (private_semaphore (holding_process (Q)));
        holding_process (Q) <- 0;
    FI;
  FI;
  V (node_semaphore (Q));
OD;
END;

Algorithm 4.2. A request algorithm for single-unit-request systems with distributed resource management.
request (I, R);
P (node_semaphore (R));
IF available_units (R)=0
  THEN tentative (I) <- 1;
    request_edge (I) <- R;
    increment request_count (R);
    queue (I, request_queue_ptr (R));
    V (node_semaphore (R));
    detect (I);
    P (private_semaphore (I));
  ELSE decrement available_units (R);
    IF type (R)=reusable
      THEN find available unit U of R;
        assignment_edge (U) <- I;
      ELSE dequeue (U, unit_queue_ptr (R));
    FI;
    V (node_semaphore (R));
    acquired_unit (I) <- U;
FI;
RETURN (acquired_unit (I));
END;

Algorithm 4.3. A release algorithm for single-unit-request systems with distributed resource management.

release (I, R, U);
P (node_semaphore (R));
IF request_count (R) > 0
  THEN decrement request_count (R);
    dequeue (Q, request_queue_ptr (R));
    V (node_semaphore (R));
L1: P (node_semaphore (Q));
    IF hold_count (Q) ¬= 0
      THEN holding_process (Q) <- I;
        V (node_semaphore (Q));
        P (private_semaphore (I));
        GOTO L1;
      ELSE request_edge (Q) <- 0;
        acquired_unit (Q) <- U;
        IF type (R)=reusable
          THEN assignment_edge (U) <- Q;
        FI;
        V (node_semaphore (Q));
        V (private_semaphore (Q));
    FI;
  ELSE increment available_units (R);
    IF type (R)=consumable
      THEN queue (U, unit_queue_ptr (R));
      ELSE assignment_edge (U) <- 0;
    FI;
    V (node_semaphore (R));
FI;
END;

5. RECOVERY FROM DEADLOCK

We will now direct our attention to the problem of recovery from deadlock; that is, in the event that a deadlock does occur, how can it be eliminated? Since there can exist no sequence of the operations discussed so far by which the deadlock can be eliminated, recovery must be effected by special operations which we will define shortly. These operations may be performed by a recovery procedure in the operating system or by a human operator. The occurrence of the deadlock and the recovery from it should be made completely invisible to all processes in the system. The ability to recover will depend on the availability of a certain amount of backup information.
When necessary, this information is used to restart processes at prior points in their execution and to replace resource units which have been modified (such as files) with copies of the units made before the modification took place. We will restrict our discussion to recovery in reusable resource systems for the following reason. Consider messages for interprocess communication as an example of consumable resource units. The content of messages can be very time-dependent. A particular message may contain information obtained from system tables and queues at particular times or from other messages obtained over a period of time. In general, it may be impossible to determine the effects throughout a system of restarting a process which is a producer or consumer of messages. Since recovery from deadlock may necessitate restarting processes, we will not consider systems which have consumable resources. There are two special recovery operations which may be used to eliminate a deadlock. First, there is termination of a process, after which the process is restarted at its beginning. In the state graph, termination of a process means the removal of all request edges directed from the process node and the release of all units with assignment edges directed to the node (thus creating available units). Any resource units which have been modified by the process will have to be replaced by backup copies. Resource preemption is the second recovery operation. In terms of the state graph, preemption of a resource unit means that the assignment edge directed from the unit to a process node is deleted, and the unit is assigned to another (usually deadlocked) process. If the unit is one which cannot be modified, then the recovery procedure can place a request for the unit on behalf of the preempted process, and that process may continue execution when it becomes unblocked.
Otherwise, the recovery procedure may have to replace the unit with a backup copy and restart the process at a point at or before the one where the request was made. In this case, the recovery procedure will have to handle similarly any resource units which were assigned to the process after the point at which it will be restarted. Associated with each recovery operation is a cost. This cost includes the cost for the recovery procedure to perform the operation plus, if a process must be restarted, the cost of the resources which the process has used since the point at which it will be restarted. When a recovery procedure is invoked, it may simply perform recovery operations until it determines that the deadlock no longer exists. For example, a very simple recovery routine might simply terminate processes until the deadlock is eliminated. On the other hand, a recovery procedure may calculate in advance a sequence of recovery operations which it can perform to eliminate the deadlock. There may be any number of sequences of recovery operations which can be used to recover from a particular deadlock, and the cost of performing different sequences may vary widely. It is, of course, desirable to minimize the cost of recovery. A recovery procedure is said to be "optimum" if it always performs the lowest-cost sequence of recovery operations for any deadlock. Shoshani [S1] has designed an algorithm for optimum recovery in centralized reusable resource systems. There are, of course, extra problems to be considered in the design of recovery algorithms for systems with distributed resource management. The main problem is that in some systems it may be possible for deadlocks to be reported to the recovery routine more than once. Indeed, it may be possible for a deadlock to be reported after the recovery routine has already eliminated it.
If it is impossible to eliminate multiple deadlock reports in the deadlock algorithm being used, then the recovery routine must be able to handle multiple reports. Recovery from deadlock in single-unit-request systems is extremely simple. Furthermore, since, as we have shown in Chapter 4, Algorithm 4.1 reports each deadlock only once, recovery algorithms can be the same for the centralized and the distributed case. A necessary and sufficient condition for deadlock in single-unit-request systems is a knot in the state graph. To recover from deadlock, any unit of a resource node in the knot can be preempted and assigned to any process node which has requested a unit of that resource type. With its request now granted, this process node becomes a sink. Since all nodes in the knot had a path to the resource node from which the unit was preempted, they now all have a path to the process node to which the unit was assigned, and this process node is now a sink. The deadlock is therefore eliminated. For a very fast recovery algorithm we can preempt a resource unit of the type requested by the process which detected the deadlock. For optimum recovery, an algorithm simply needs to calculate the minimum cost of preempting any unit of a resource node in the knot.

6. PERFORMANCE OF DISTRIBUTED DEADLOCK POLICIES

In this chapter we will discuss the performance of various deadlock policies in distributed resource management schemes. The ultimate criterion by which resource management schemes should be judged is effective resource utilization. We define effective resource utilization to be the total resource utilization minus that which is spent in resource management procedures and that which must be duplicated when processes are restarted. For example, consider a system in which the resources are kept busy 50% of the time on the average, but 3% of the time is spent in resource management and 2% is wasted in duplication when processes must be restarted.
The effective resource utilization of this system would be 45%. There are two ways in which deadlock policies limit the effective resource utilization. First, inherent in each type of deadlock policy is a certain amount of inefficient resource utilization. We will discuss, in general terms, only those inefficiencies peculiar to each type of policy. Secondly, since processes executing resource management procedures compete with other processes for resources, the effective resource utilization goes down as the resources required by the management procedures increase. Thus, for the algorithms we have discussed, we will derive formulas for worst-case execution time. Static prevention algorithms are the simplest and fastest algorithms. Of course, the restrictions made on the placement of requests in these algorithms are not required by the dynamic deadlock algorithms. These restrictions are more than an inconvenience, for they lead to inefficient resource utilization as well. A process forced to request a resource unit T seconds before it will actually be used (as is often the case in the policy described by Havender [H2]) effectively prevents any use of this resource for those T seconds. Further inefficiency occurs when, after the process has acquired several units of resources, it becomes blocked waiting only for the acquisition of a unit which it does not immediately need. The execution time of static prevention algorithms does not vary with the number of processes and resource types in the system and is negligible in comparison to that of the dynamic algorithms. The inefficiencies in resource utilization characteristic of detection and recovery algorithms result from the resources expended in recovery from deadlock. This includes the resources used by the recovery procedure as well as the resources wasted when a process must be restarted.
To calculate the worst-case execution time of Algorithm 4.1, we will consider its operation on a slightly modified state graph with weighted edges. In such a graph, multiple edges directed from one node to another are represented by a single edge together with an integer designating the number of edges represented. Furthermore, assume that a test is added at the beginning of the first WHILE loop to ensure that the loop is not executed more than once for the same resource R. Then, for a graph with m resources and n processes, the WHILE loop can be executed a maximum of m times and the inner DO loop a maximum of n times. For large m and n, the execution time for the remainder of the algorithm is negligible. Thus the maximum execution time for Algorithm 4.1 is proportional to mn. Note that in single-unit-request systems with m > n, paths can exist to a maximum of only n resources, and thus in these systems the maximum execution time is proportional to n².

Dynamic prevention algorithms introduce inefficiencies into resource management by not allocating available units which have been requested when they find that the resulting state is unsafe. Naturally, in some cases, deadlock would not have occurred even if the allocations had been made. The worst-case execution time of Habermann's algorithm as stated in Algorithm 3.2 occurs when only one process is unblocked initially and only one process becomes unblocked on each pass through the DO loop. If each process is assigned one or more units of each of the m resources, then the reduction step in the inner loop takes time proportional to m. The first time through the DO loop there are n processes on the list, and there is one less on each pass after that. Thus the worst-case execution time of the loop is proportional to n + (n-1) + ... + 1 = n(n+1)/2, and the worst-case execution time of the entire algorithm is proportional to mn².
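For concreteness, the reduction loop just analyzed can be sketched as a banker's-style safety check in the spirit of Algorithm 3.2 (the array names and exact bookkeeping are illustrative assumptions). In the worst case each outer pass reduces only one process, so the passes cost n + (n-1) + ... + 1 inner iterations, each doing O(m) work in the reduction step.

```python
# Sketch of a safety check: repeatedly find a process whose remaining
# need fits in the available units, "reduce" it by releasing its
# allocation, and repeat until no further process can be reduced.
# available[j]: free units of resource j; alloc[i][j], need[i][j]:
# units held by / still needed by process i.

def is_safe(available, alloc, need):
    m = len(available)
    avail = list(available)
    unfinished = list(range(len(alloc)))   # processes not yet reduced
    progress = True
    while progress and unfinished:
        progress = False
        for i in list(unfinished):         # inner DO loop: O(n) per pass
            if all(need[i][j] <= avail[j] for j in range(m)):
                for j in range(m):         # reduction step: O(m)
                    avail[j] += alloc[i][j]   # release i's units
                unfinished.remove(i)
                progress = True
    return not unfinished                  # safe iff every process finishes
```

With one free unit the state below is safe (the processes can finish one after the other), while with none free neither process can ever be reduced:

```python
is_safe([1], [[1], [1]], [[1], [2]])   # True
is_safe([0], [[1], [1]], [[1], [1]])   # False
```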
Holt [H4,H5] has given an equivalent algorithm which uses extra data elements at the process and resource nodes and has a worst-case execution time proportional to mn. These factors would be the same for either algorithm if it were distributed.

To calculate formulas for the average execution time of the dynamic deadlock algorithms in a given system, or to predict which deadlock algorithm will give the maximum effective resource utilization in a system, detailed data on the typical resource requirements of the processes in the system are needed. With this information and the use of probability theory or simulation, an optimal distributed resource management scheme can be designed for any system.

LIST OF REFERENCES

[C1] Coffman, E. G., Jr., M. J. Elphick and A. Shoshani, "System Deadlocks," Computing Surveys, 3, 2 (June 1971), pp. 67-78.

[D1] Dijkstra, E. W., "Cooperating Sequential Processes," Programming Languages (F. Genuys, ed.), Academic Press (1968), pp. 43-112.

[F1] Fontao, R. O., "A Concurrent Algorithm for Avoiding Deadlocks in Multiprocess Multiple Resource Systems," SIGOPS Operating Systems Review, 6, 1/2 (June 1972), pp. 72-79.

[H1] Habermann, A. N., "Prevention of System Deadlocks," Comm. ACM, 12, 7 (July 1969), pp. 373-385.

[H2] Havender, J. W., "Avoiding Deadlocks in Multitasking Systems," IBM Systems Journal, 7, 2 (1968), pp. 74-84.

[H3] Hebalkar, P. G., Deadlock-free Sharing of Resources in Asynchronous Systems, MAC-TR-75, Project MAC, MIT, Cambridge, Massachusetts (September 1970).

[H4] Holt, R. C., On Deadlock in Computer Systems, TR-CSRG-6, Computer Systems Research Group, University of Toronto, Toronto, Canada (April 1971).

[H5] ________, "Some Deadlock Properties of Computer Systems," SIGOPS Operating Systems Review, 6, 1/2 (June 1972), pp. 64-71.

[H6] Habermann, A. N., "Synchronization of Communicating Processes," Comm. ACM, 15, 3 (March 1972), pp. 171-176.

Recovery from Deadlocks in Multi-process, Multiple Resource Systems, TR-80, Department of Electrical Engineering, Princeton University, Princeton, New Jersey.

BIBLIOGRAPHIC DATA SHEET

Report No.: UIUCDCS-R-74-619
Title and Subtitle: Deadlock in Distributed Computer Networks
Report Date: December 1974
Author: Thomas Jay Miller
Performing Organization: Department of Computer Science, University of Illinois, Urbana, Illinois
Contract/Grant No.: NSF DCR72-03740
Sponsoring Organization: National Science Foundation, Washington, D.C.
Security Class: UNCLASSIFIED

Abstract: In this report, the theory of deadlock is studied to determine how centralized resource management algorithms may best be adapted for distributed control. The types of centralized resource management schemes known to date are examined. A distributed resource management scheme is proposed. When any process is allowed to execute system procedures to handle resource management, the main problem is one of synchronization. We discuss how algorithms may be synchronized for distributed control and show how to do so for an algorithm given in this report. Finally, we discuss the worst-case performance and cost of the various types of distributed resource management as related to the algorithm employed for deadlock.