key: cord-029325-7zceop25
authors: Li, Xiao; Houshmand, Farzin; Lesani, Mohsen
title: Hampa: Solver-Aided Recency-Aware Replication
date: 2020-06-13
journal: Computer Aided Verification
DOI: 10.1007/978-3-030-53288-8_16
sha: 
doc_id: 29325
cord_uid: 7zceop25

Replication is a common technique to build reliable and scalable systems. Traditional strong consistency maintains the same total order of operations across replicas. This total order is the source of multiple desirable consistency properties: integrity, convergence and recency. However, maintaining the total order has proven to inhibit availability and performance. Weaker notions exhibit responsiveness and scalability; however, they forfeit the total order and hence its favorable properties. This project revives these properties with as little coordination as possible. It presents a tool called Hampa that given a sequential object with the declaration of its integrity and recency requirements, automatically synthesizes a correct-by-construction replicated object that simultaneously guarantees the three properties. It features a relational object specification language and a syntax-directed analysis that infers optimum staleness bounds. Further, it defines coordination-avoidance conditions and the operational semantics of replicated systems that provably guarantees the three properties. It characterizes the computational power and presents a protocol for recency-aware objects. Hampa uses automatic solvers statically and embeds them in the runtime to dynamically decide the validity of coordination-avoidance conditions. The experiments show that recency-aware objects reduce coordination and response time.

Replicated objects [12, 13, 23, 32, 45] are pervasively used for fault-tolerance, availability, responsiveness and scalability. They are used in diverse application areas [14, 20–22, 37, 39, 40, 50, 53] including embedded controllers, online services and game engines. However, coordinating the replicas has proven to be challenging. Strongly consistent replication, provided by consensus protocols such as Viewstamp [42], Paxos [34] and Raft [44], guarantees the same total order of operations across replicas. The total order simultaneously provides a host of favorable properties: integrity, convergence and recency. Replicas converge to the same state as the result of the same sequence of operations. Further, a propagated operation executes in the same state as the originating replica. Therefore, if an operation preserves the integrity properties [8] at the originating replica, it will certainly preserve them in the other replicas as well. In addition, the lockstep execution keeps the replicas recent: an operation executes in all replicas before the next. Thus, replicas can be stale by at most one operation.

This project was supported by the NSF grant #1942711.

However, strong consistency may not be available and responsive during network failures or offline use. Further, its scalability is limited. The tradeoff between strong consistency of replicated objects, and their availability and responsiveness is a famous dilemma [1, 3, 26–28]. Therefore, system designers opted for weaker notions of consistency such as eventual [4, 15, 17, 19, 24, 25, 48, 52] and causal [2, 13, 33] consistency that can provide availability, responsiveness and scalability but lose the same total order of operations. Several projects [16, 49, 51] provide programming interfaces for weak consistency notions. Unfortunately, the large collection of subtle weak consistency notions is unintuitive to users. If the chosen notion is too weak, it can affect correctness, and if it is too strong, it may degrade scalability.

Therefore, researchers have recently provided high-level abstractions to shield the user from low-level complexities of weak consistency. These projects seem to be the steps towards reviving the same three pillars of consistency, i.e. integrity, convergence and recency, with as little coordination [7, 35, 47] as possible. CRDTs [48] revived convergence. If an object satisfies a few algebraic properties, its replication can enjoy convergence even on top of eventual consistency. However, the replicas can experience states that violate the integrity properties. Therefore, follow-up projects revived the integrity property. CISE [29] and Soteria [41] present proof techniques to verify the integrity properties of a replicated object. Sieve [36], Indigo [10] and Hamsaz [30] translate the given high-level integrity properties to hybrid models. However, they are oblivious to state recency. The operations are eventually delivered to all replicas; however, they may be arbitrarily delayed. Some updates may be delivered too late and expose the clients to stale data. On the other hand, at the expense of more communication, some updates may be immediately sent and delivered. However, applications may prefer to obtain more scalability and energy efficiency in return for bounded staleness. In fact, many applications such as ticketing, distributed sensors and network accounting can work with fairly recent data. Previous work such as TACT [55], TRAPP [43], FRACT [59], and PBS [9] considered staleness but did not address integrity and communication minimization. Further, they did not provide automatic analysis, decision and synthesis. In addition to convergence and integrity, this project, Hampa, revives recency.
Given a sequential object with the declaration of its integrity properties and recency requirements for its methods, it automatically synthesizes a correct-by-construction replicated object that guarantees integrity, convergence and recency while avoiding unnecessary coordination.

To capture object specifications from the user, we present a relational language and its denotational semantics. The language provides a complete set of relational operators to define the object methods and integrity properties, and allows the user to declare recency requirements for the return value of each method. Given a principled object specification, we present a syntax-directed analysis that infers optimum staleness bounds for each element of the state.

We present the conditions required to simultaneously preserve the three properties: convergence, integrity and recency. These conditions are used to define a novel operational semantics of replicated objects that provably preserves convergence, integrity and the inferred staleness bound. We observe that recency-awareness not only guarantees a limit on the staleness, but also allows buffering of calls and reduces the coordination required to preserve integrity.

We characterize the computational power of recency-aware replicated objects. We show that recency-aware objects have the same power as the perfect failure detector. We present a novel protocol for recency-aware replicated objects that implements the semantics. We use off-the-shelf SMT solvers statically and also embed them at runtime to decide the validity of coordination-avoidance conditions. We present a tool called Hampa that given an object definition, analyzes the object and instantiates the protocol to synthesize replicated objects. Our experiments with the synthesized objects show that the staleness bound has an inverse relationship with the coordination and response time.

In summary, this paper presents the following contributions: (1) A relational object specification language that captures integrity and recency declarations, and its denotational semantics (Sect. 2). (2) The coordination conditions and the operational semantics of replicated systems that simultaneously preserve convergence, integrity and recency (Sects. 3 and 4). (3) A syntax-directed analysis that infers optimum staleness bounds for each element of the state (Sect. 5). (4) The characterization of the computational power and a protocol for recency-aware replicated objects (Sect. 6). (5) The Hampa replicated object synthesis tool and its experimental results (Sect. 7). All the proofs are available in the appendix [5].

Language. Figure 1 shows our core relational language for object specification. An object is a record $\langle \Sigma, I, M \rangle$ that includes a state type Σ, an invariant I on the state, and a set of methods M. The state can be a tuple of natural number Nat and relation Rel types. The invariant I is a boolean function on the state. A method m is a function from the parameter x and the pre-state $x_1, .., x_n$ to a record $\langle e_g, e_u, e_r \rangle$. The guard $e_g$ is a boolean expression that captures the semantic preconditions of m such as conditions on the arguments. The expressions $e_u$ and $e_r$ are for the post-state and the return value. We use guard, update and retv as functions that extract elements of this record. For each method, the user declares an integer as the staleness bound for its return value. A method call c is a method applied to its argument, i.e. it is a function from the current state to a record $\langle e_g, e_u, e_r \rangle$.

An expression e is either a value v (that can be either a number n or a relation R), a variable denoted by x, an application of the operators {+, −, =, <, &, !} to operand expressions where & is the conjunction and ! is the negation operator, a selection $\sigma_{\lambda \bar{x}.\, e}(e')$ that binds the attributes of each element of the relation $e'$ to the variables $\bar{x}$ and returns the elements that satisfy the condition $e$, a projection $\Pi_{\lambda \bar{x}.\, \bar{e}}(e')$ that, for each element of the relation $e'$, binds its attributes to the variables $\bar{x}$, calculates a tuple of elements $\bar{e}$ and returns the set of resulting tuples, a union $e \cup e'$ that results in a relation with the elements of both relations $e$ and $e'$, a difference $e \setminus e'$ that results in a relation with the elements of $e$ that are not in $e'$, and the Cartesian product $e \times e'$ that results in a relation of pairs whose first and second elements are in $e$ and $e'$ respectively. The language supports a complete set of relational operators: any relational algebra expression can be expressed by a combination of them. Selection (σ), projection (π), union (∪), difference (\), product (×) and renaming (ρ) are a complete set of operators. We note that since the language uses functions with argument names, a renaming operator is unnecessary. The update and join operations are defined as syntactic sugar. The update operation $U_{\lambda \bar{x}.\, e,\, \bar{e}}\; e'$ returns a relation that updates each element of $e'$ that satisfies the condition $e$ to the tuple $\bar{e}$. The join $e_1 \bowtie_{\lambda x_1, x_2.\, e} e_2$ results in pairs of elements of $e_1$ and $e_2$ that satisfy the condition $e$. Figure 1 presents a denotational semantics for expressions. The semantics for values, variables, and binary and unary operations is standard. The semantics of the selection expression $\sigma_{\lambda \bar{x}.\, e}(e')$ is the set of tuples t in the semantics of $e'$ such that substitution of the attributes $\bar{x}$ in $e$ with their corresponding values in t evaluates to true.
The semantics of the projection expression $\Pi_{\lambda \bar{x}.\, \bar{e}}(e')$ is a set of tuples, one for each tuple t in the semantics of $e'$: the tuple that results from substituting $\bar{x}$ with t in the expressions $\bar{e}$ and evaluating them. The semantics of union, difference and product are standard from set theory. We define the difference Δ between two values as follows: the difference between two natural numbers is the absolute value of their subtraction, i.e. $\Delta(n, n') = |n - n'|$; the difference of two relations is the size of their symmetric difference, i.e. $\Delta(R, R') = |R \setminus R'| + |R' \setminus R|$. We use delta δ to represent the staleness of a value, that is, the difference between the value and its target value. The delta for a completely recent (or exact) value is zero. For a call c, the weight weight(c) is a bound on the difference that the execution of c can make on the state of the object. In other words, for every call c, we have $\forall \sigma.\ \mathrm{let}\ \langle \_, \sigma', \_ \rangle := c(\sigma)\ \mathrm{in}\ \Delta(\sigma', \sigma) < \mathit{weight}(c)$.
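The difference Δ defined above can be sketched directly in Python. This is a minimal illustration, assuming natural numbers are modeled as `int` and relations as `frozenset`s of tuples; the function names `delta` and `staleness` are hypothetical and not taken from the Hampa tool.

```python
def delta(v, w):
    """Difference between two values of the same kind.

    Assumption: naturals are ints, relations are frozensets of tuples.
    """
    if isinstance(v, int) and isinstance(w, int):
        # Naturals: absolute value of the subtraction.
        return abs(v - w)
    # Relations: size of the symmetric difference.
    return len(v - w) + len(w - v)


def staleness(current, target):
    """Staleness of a value: its difference from the target value."""
    return delta(current, target)


old = frozenset({("alice", "m1"), ("bob", "m2")})
new = frozenset({("bob", "m2"), ("carol", "m3")})
```

Here `delta(old, new)` counts one tuple that is only in `old` and one that is only in `new`, and an exact value has staleness zero.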

Running Use-Case. Figure 2 shows the movie booking use-case. The state of the object is the two relations reservation rs and movie ms. The reservation relation rs stores the movies that the users have booked; it is the pairs of users u and movies m. The movie relation ms stores the number of available spaces for each movie; it is the pairs of movies m and spaces a. The integrity property I is a conjunction of three conditions: (1) The movie in ms should be unique. (2) The referential integrity requires that every movie in rs exists in ms. (3) The number of available spaces for every movie should be non-negative. The object provides five update methods and three query methods. Given a user u and a movie m, the method book adds the pair to rs and decrements the available spaces for m in ms. Similarly, the method cancelBook removes a reservation and increments available spaces. Given a movie m, the method offScreen removes the corresponding tuple from ms. Given a movie m and a number n, the method specialReserve subtracts n from the available spaces for m in ms. The dual method increaseSpace adds n to the spaces for m. Given a movie m, the method querySpace returns the number of available spaces for m. The method queryReservations returns the set of movies that the given user has booked. Given a user u, the method querySpaces returns the pairs of movies and their available spaces for the movies that u has booked. The staleness bound for the update methods is specified as 0. The returned none constant ⊥ is always exact. The bound values $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ of the query methods represent the number of tuples that are different between the current state and the pending stable state of the result relation. To reduce communication, certain calls can be executed locally and buffered, and the buffer can be communicated to other replicas later. As an example, in Fig. 3(a), the first two calls to the method increaseSpace do not exceed the staleness bound for ms and can be buffered.
However, the third call exceeds the bound and cannot be added to the buffer. Therefore, the buffer is flushed to other replicas and the third call is blocked until an acknowledgement for the delivery of the buffer is received. All the calls of the buffer can be sent in a single message and the acknowledgement for them can be sent in a single message as well.
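The buffering behaviour described for Fig. 3(a) can be sketched as a small simulation. This is only an illustration under stated assumptions: the class `BufferingReplica`, the bound of 5 and the per-call weight of 2 are hypothetical, the acknowledgement is assumed to arrive immediately, and a call is treated as fitting as long as the buffered weight does not exceed the bound.

```python
class BufferingReplica:
    """Toy replica that buffers calls until the staleness bound would be exceeded."""

    def __init__(self, bound):
        self.bound = bound   # staleness budget for buffered calls (assumed value)
        self.buffer = []     # buffered (method, weight) pairs
        self.sent = []       # each entry is one message carrying a batch of calls

    def buffered_weight(self):
        return sum(weight for _, weight in self.buffer)

    def flush(self):
        """Send all buffered calls in a single message and empty the buffer."""
        if self.buffer:
            self.sent.append(list(self.buffer))
            self.buffer.clear()

    def call(self, method, weight):
        if self.buffered_weight() + weight > self.bound:
            # The call does not fit: flush first (the ack is assumed immediate).
            self.flush()
        if weight > self.bound:
            # Too heavy to buffer at all: send it on its own.
            self.sent.append([(method, weight)])
        else:
            self.buffer.append((method, weight))


replica = BufferingReplica(bound=5)
for _ in range(3):
    replica.call("increaseSpace", 2)
```

After the three calls, the first two have been sent together in one message and the third sits in the fresh buffer, which mirrors the single-message flush described above.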

Let us now consider the interaction of buffering with coordination. We will see that buffering (staleness) interestingly reduces the coordination required for the conflicts. (We will define conflicting calls that should be synchronized later in Sect. 3.) Fig. 3(b) and (c) show the same execution without and with buffering respectively. In Fig. 3(b), the first replica $rep_1$ executes the sequence of calls increaseSpace, specialReserve and increaseSpace. The method increaseSpace does not conflict with any other method; therefore, calls to it are simply broadcast. The method specialReserve conflicts with itself and the method book; therefore, the call to it goes through synchronization. The second replica $rep_2$ calls book that conflicts with four other methods. Hence, it should synchronize. (The synchronization reaches the other replicas, blocks calling the four methods, and propagates previous calls to those methods.) In this example, the conflicting specialReserve call in $rep_1$ should be propagated to $rep_2$ before the book call can be executed.

In Fig. 3(c), the recency bound allows the three calls of $rep_1$ to be buffered. Replicas use SMT solvers at runtime to check the validity of three properties for the buffers: all-S-commutativity, invariant-sufficiency and let-P-R-commutativity, which we will formally define in Sect. 3. In this example, the buffer is invariant-sufficient if the number of spaces that the call specialReserve decrements is less than the number that the increaseSpace calls increment. Therefore, the buffer can be sent to other replicas without any additional synchronization; the invariant in the pre-state is sufficient for the invariant in its post-state. We note that the call specialReserve that previously went through synchronization does not need any synchronization inside the buffer. Further, the let-P-R-commutativity property of the buffer guarantees that the book call will preserve the integrity after the buffer. Thus, the book call that previously waited for the specialReserve call does not need to wait anymore.

In this section, we present the coordination conditions for replicated objects that preserve the three properties: convergence, integrity and recency. The state of the given sequential object is replicated across replicas. Clients can request method calls at every replica, and replicas coordinate the calls. Convergence is the safety property that when all pending updates are processed, the replicas converge to the same state. Integrity is the safety property that every method call is executed only on a state where the guard of the method and the invariant are satisfied. Recency is the safety property that bounds the difference between the state of a replica and its impending state after the pending calls are applied.

The state of each replica is initialized to the same state $\sigma_0$ that satisfies the invariant I. The replica that accepts the request for a call from the user is called the originating replica of the call. We uniquely identify requests by identifiers r. We use the two maps call and orig that map request identifiers to the method call and originating replica respectively. The execution history of a replica is modeled as a permutation of a set of request identifiers. An execution x of a set of requests R is a bijection from positions [0..|R| − 1] to R. We denote the range of x as R(x). An execution x of R defines the total order $\prec_x$ on R: a request r precedes another request $r'$ in an execution x, written as $r \prec_x r'$, iff $x^{-1}(r) < x^{-1}(r')$. A replicated execution xs is a function from replicas N to executions. The post-state of each call at a replica is the result of applying the call to its pre-state.
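The execution model above maps naturally onto lists. The following sketch assumes an execution is a Python list of request identifiers (position i holds x(i)) and a replicated execution is a dict from replica identifiers to such lists; the names `is_execution` and `precedes` are hypothetical.

```python
def is_execution(x, requests):
    """x is a bijection from positions [0..|R|-1] to the request set R."""
    return len(x) == len(requests) and set(x) == set(requests)


def precedes(x, r1, r2):
    """r1 precedes r2 in x iff the position of r1 is smaller than that of r2."""
    return x.index(r1) < x.index(r2)


requests = {"r1", "r2", "r3"}
# A replicated execution: every replica applies the same requests,
# possibly in a different order.
xs = {
    "n1": ["r1", "r2", "r3"],
    "n2": ["r2", "r1", "r3"],
}
```

In this example both replicas hold valid executions of the same request set, but they disagree on the relative order of `r1` and `r2`.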

We first revisit the coordination conditions for convergence and integrity [30] , and then present coordination conditions for recency and their impact on the prior conditions.

Convergence. A replicated execution is convergent if the state of the replicas is the same after all the calls are propagated. Out of order delivery of method calls at different replicas can lead to divergence of their states. Method calls such as special reservation specialReserve and increasing space increaseSpace result in the same state if their order of execution is swapped. However, the resulting state of the two method calls book and cancelBook is dependent on their execution order. Therefore, they should synchronize.
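The two examples in this paragraph can be checked on a concrete state. The sketch below is an assumption-laden illustration: the update functions are written from the prose description of the use-case (the actual definitions in Fig. 2 may differ, for instance in their guards), the state is modeled as a pair of a `frozenset` rs and a dict ms, and commutativity is only tested on the sample states that are passed in rather than proved for all states.

```python
def _add(ms, movie, n):
    """Return a copy of ms with n added to the spaces of movie (if present)."""
    return {k: a + n if k == movie else a for k, a in ms.items()}


def increase_space(m, n):
    return lambda state: (state[0], _add(state[1], m, n))


def special_reserve(m, n):
    return lambda state: (state[0], _add(state[1], m, -n))


def book(u, m):
    return lambda state: (state[0] | {(u, m)}, _add(state[1], m, -1))


def cancel_book(u, m):
    return lambda state: (state[0] - {(u, m)}, _add(state[1], m, 1))


def s_commute(c1, c2, sample_states):
    """Both orders of c1 and c2 give the same state on every sampled state."""
    return all(c1(c2(s)) == c2(c1(s)) for s in sample_states)


samples = [(frozenset(), {"m": 5})]
```

On the sample state, increaseSpace and specialReserve yield the same state in both orders, whereas book followed by cancelBook leaves no reservation and cancelBook followed by book leaves one.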

Integrity. The body of each method relies on the invariant in the pre-state. Further, methods have explicit guards that declare their pre-conditions. We say that a method call enjoys integrity at a state if the invariant and the guard of the method hold in that state.

Method calls should be executed only in states that they have integrity in. The integrity condition is simply lifted to executions and replicated executions: An execution enjoys integrity iff every request in it enjoys integrity.

In contrast to integrity that requires the invariant to hold in the pre-state, permissibility requires it to hold in the post-state. The post-state of a call is the pre-state of the next call in a replica. Further, the initial state is assumed to satisfy the invariant. Therefore, if every call is permissible in its pre-state, then every call enjoys integrity. By induction, permissibility leads to integrity.

To execute a method call, we check that it is permissible at its originating replica. Thus, we say that each method call is locally permissible. Otherwise, the call is aborted or delayed. Still, if the call is simply broadcast, it is not necessarily permissible when it arrives at other replicas. Some calls need coordination.

There are calls such as increaseSpace that are always permissible as long as they are applied to a state that satisfies the invariant. Increasing the space cannot result in a missing or duplicate movie or a negative number for available spaces. Thus, if it is broadcast and executed on another replica, it is sufficient that the pre-state satisfies the invariant to preserve it in the post-state. We call such calls invariant-sufficient.
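As an illustration, the three conditions of the movie-booking invariant can be written as a predicate and evaluated before and after a call. This is a sketch reconstructed from the prose of the running use-case, assuming rs and ms are Python sets of pairs; the function names are hypothetical and the method bodies in Fig. 2 may differ.

```python
def invariant(rs, ms):
    """The three integrity conditions of the movie-booking object."""
    movies = [m for m, _ in ms]
    unique = len(movies) == len(set(movies))                 # (1) movies are unique
    referential = all(m in set(movies) for _, m in rs)      # (2) booked movies exist
    non_negative = all(a >= 0 for _, a in ms)                # (3) spaces are >= 0
    return unique and referential and non_negative


def increase_space(ms, movie, n):
    """Add n spaces to movie; n is a natural number."""
    return {(k, a + n if k == movie else a) for k, a in ms}


def special_reserve(ms, movie, n):
    """Subtract n spaces from movie (no guard, as in the running example)."""
    return {(k, a - n if k == movie else a) for k, a in ms}


rs = {("alice", "m1")}
ms = {("m1", 0), ("m2", 3)}
```

Starting from a state that satisfies the invariant, `increase_space` keeps it satisfied for any natural n, while `special_reserve` on a movie with zero spaces drives the count negative, which is why the latter needs coordination.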

However, not all calls are invariant-sufficient. For example, a book call may be permissible in a replica but may become impermissible in another when it is executed after an already executed offScreen call for the same movie. These two calls should synchronize to preserve integrity. Nonetheless, some pairs of calls such as offScreen and specialReserve do not affect each other's permissibility. (In the running example, specialReserve has no guards. After an offScreen call, it remains permissible as it doesn't find the movie and leaves the relation unchanged).

The call $c_1$ P-R-commutes with the call $c_2$ written as

If a call $c_1$ is invariant-sufficient or P-R-commutes with another call $c_2$, then the call $c_1$ will stay permissible when it is propagated and applied to another replica even if $c_2$ is executed before it in that replica.

The call offScreen P-concurs with the call specialReserve; however, the call book P-conflicts with the call offScreen.

We say that two calls concur iff they both S-commute and P-concur with each other. Otherwise, we say they conflict and need synchronization.

A pair of calls c 1 and c 2 concur iff they S-commute and P-concur with each other. Otherwise, they conflict c 1 c 2 .

Dependency. As we saw above, invariant-sufficient method calls can always preserve the invariant. However, there are calls whose preservation of the invariant is dependent on the calls that have executed before them at that replica. For example, taking the movie off-screen offScreen is dependent on cancelling the last booking cancelBook. If offScreen is moved left before cancelBook, it can become impermissible. Nonetheless, taking a movie off-screen offScreen is independent of the previous special reservations specialReserve.

A call $c_2$ P-L-commutes with a call $c_1$, written as $c_2 \leftarrow_P c_1$, iff for every σ, if $P(\mathit{update}(c_1)(\sigma), c_2)$ then $P(\sigma, c_2)$.

A call can avoid tracking dependencies to another call if the former is invariant-sufficient or P-L-commutes with the latter.

If $c_1$ is executed before $c_2$ in the originating replica of $c_2$ and $c_2$ is dependent on $c_1$, then $c_2$ should be applied to other replicas only if $c_1$ is already applied.

Recency. Calls executed at a replica may be delayed in the network before they are executed in other replicas. Further, they may be buffered at the originating replica to reduce communication. The pending calls for a replica are the calls that have executed in other replicas but not at that replica yet. The staleness of a replica is the difference of its current state and its state after applying its pending calls. Given a bound ε, a replica is sufficiently recent if its staleness is less than ε. The calls that have originated in the current replica n but have not been received yet by another replica $n'$ make the state of $n'$ stale. To bound the staleness of $n'$ by ε, the staleness imposed on $n'$ by the calls originated by each of the other |N| − 1 replicas should be bounded by ε/(|N| − 1). The difference that these calls can make is bounded by the sum of their weights (defined in Sect. 2). The staleness bound can be evenly divided between the replicas. However, in general it can be distributed unevenly and even dynamically. In particular, replicas that tend to issue updates more often can get a larger share.
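The even split of the staleness bound can be illustrated with a short calculation. The sketch below assumes integer weights, an even distribution of the bound among the other |N| − 1 replicas, and a non-strict comparison; the names `per_replica_quota` and `in_bound` are hypothetical, and whether the comparison is strict is an assumption rather than something taken from the formal rules.

```python
from fractions import Fraction


def per_replica_quota(eps, n_replicas):
    """Share of the bound eps that each other replica may consume."""
    if n_replicas < 2:
        raise ValueError("at least two replicas are needed")
    return Fraction(eps, n_replicas - 1)


def in_bound(pending_weights, eps, n_replicas):
    """Do the pending calls of one replica stay within its share of eps?

    The difference the pending calls can make is bounded by the sum of
    their weights.
    """
    return sum(pending_weights) <= per_replica_quota(eps, n_replicas)
```

For example, with a bound of 6 and four replicas each replica gets a quota of 2, so two pending calls of weight 1 fit and a third one does not.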

Given a recency bound, a buffering quota can be calculated for each replica and the recency bound can be preserved when calls are buffered. Buffering calls can reduce communication; however, it can affect the convergence and integrity properties. To preserve these properties a buffer should have three properties: all-state-commutativity, invariant-sufficiency and let-P-R-commutativity. We consider each condition in turn.

The calls of the buffer are executed locally and are not synchronized with other replicas. Therefore, if the buffer is not all-S-commutative, concurrent execution of S-conflicting calls in other replicas can lead to divergence. Similarly, if the buffer is not invariant-sufficient, concurrent execution of P-conflicting calls in other replicas can lead to impermissibility of the buffer when it is propagated and executed in other replicas. The buffer in Fig. 3(c) is all-S-commutative: it includes increaseSpace and specialReserve calls that result in increasing or decreasing the space for movies; the result is S-commutative with respect to all method calls. Further, it is invariant-sufficient if the net result of its calls is a non-negative addition to the space of each movie. For example, if the increaseSpace calls add $s$ spaces and the specialReserve calls subtract $s'$ spaces from the same movie where $s' \le s$, then the net effect is adding spaces and the buffer is invariant-sufficient.
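For this particular kind of buffer, the invariant-sufficiency argument reduces to simple arithmetic on the net change per movie. The sketch below hand-codes that check for illustration only; it is not how Hampa decides the condition (the paper uses SMT solvers), and the tuple encoding of calls and the function names are assumptions.

```python
from collections import defaultdict


def net_space_change(buffer):
    """Net change of available spaces per movie for a buffer of calls.

    Each buffered call is a (method, movie, n) triple.
    """
    net = defaultdict(int)
    for method, movie, n in buffer:
        if method == "increaseSpace":
            net[movie] += n
        elif method == "specialReserve":
            net[movie] -= n
        else:
            raise ValueError(f"unsupported method in this sketch: {method}")
    return dict(net)


def buffer_is_invariant_sufficient(buffer):
    """True when the buffer never takes away more spaces than it adds."""
    return all(change >= 0 for change in net_space_change(buffer).values())


buffer = [
    ("increaseSpace", "m1", 2),
    ("specialReserve", "m1", 3),
    ("increaseSpace", "m1", 2),
]
```

The sample buffer adds 4 spaces and subtracts 3 for the same movie, so its net effect is a non-negative addition.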

Calls in other replicas are checked to be permissible with no knowledge of the buffered calls in the current replica. Let-P-R-commutativity of the buffer of the current replica guarantees that the calls in other replicas will continue to be permissible once they are propagated and executed after the buffer in the current replica. The buffer in Fig. 3(c) is let-P-R-commutative; it may only increase the number of spaces, which cannot make any call impermissible.

In this section, we define the operational semantics of replicated objects where (1) the integrity property I on the state of each replica is always preserved, (2) replicas converge to the same state once all the calls are propagated, and (3) the staleness of each replica is always bounded by ε. The semantics declares the conditions for execution and propagation of method calls on the replicated object to guarantee the three properties. In particular, it represents the conditions for local buffering of method calls to avoid communication while preserving the recency of the other replicas. In Sect. 5, we will see a static analysis that infers staleness bounds for the state. In this section, the semantics preserves the inferred staleness bound for the state σ of the object. (For objects with multiple pieces of state, the staleness of each piece can be tracked separately.) The semantics strives to concisely define the conditions; we will present the protocols that implement these conditions in Sect. 6. As Fig. 4 shows, the global state of the replicated system is represented as a world w that is a tuple $\langle h, t, xs, orig, call \rangle$. The hosts h is a mapping from replica identifiers N to the local state of replicas. Each call is assigned a unique request identifier r at the originating replica. The two maps call and orig keep a mapping from request identifiers to the call and the originating replica of the request respectively. The state of each replica is a statement s ∈ S, the state of the object σ ∈ Σ, and the identifier r ∈ R of the current buffer. A statement s is either $x \leftarrow c;\ s'$ that is the sequence of a call c and another statement $s'$, or the terminal statement skip. A call c is the application of a method m to an argument expression e. A call can also be the identity call id that leaves the state unchanged. (It is assumed that client statements do not make id calls.) The network t is the set of packets that are sent but not yet delivered.
A packet p contains the identifier of the destination replica n and the request identifier r of the call. If a packet is transmitting a buffered call, it is decorated with an asterisk *. The history xs is a mapping from replica identifiers N to the list of request identifiers of the calls that are previously applied to that replica. The initial value of the world state is $w_0$ where each replica n holds its initial statement $s_n$, the initial state $\sigma_0$ of the object that satisfies the integrity property I, and an empty buffer. Empty buffers are represented by mapping the buffer identifier $r_n$ of each replica n to the identity call id. Figure 5 presents the operational semantics. The rule Call executes a method call c at a replica n. The call c can be executed if the following conditions hold. (1) To preserve integrity, the call c should be locally permissible P(σ, c) in the current state σ. (2) To preserve convergence and integrity, any pair of conflicting calls should have the same order across the replicas, a property that we call conflict-synchronization. Thus, to execute a new request r, the rule Call requires the condition ConflictSyncInit: any call $r'$ that is already executed in another replica $n'$ and conflicts with the current call r should have been already executed in the current replica n. Otherwise, once the calls r and $r'$ are propagated and executed on the other replicas, they will have different orders in the two replicas n and $n'$. (3) To preserve recency, this rule requires the condition InBound: the difference that the pending calls from the current replica n can make to the state of every other replica $n'$ should be bounded by ε/(|N| − 1).
If the conditions above hold, a fresh identifier r is created for the call, the history xs and the maps orig and call are updated to reflect the new call, a packet is sent in the network t to every other replica, and the variable x is substituted with the returned value v of the call in the continuation statement s of the current replica.

The rule Deliver delivers a call that has been sent to the current replica. It requires two conditions: conflict-synchronization and dependency-preservation.

(1) Similar to the rule Call, conflict-synchronization requires ConflictSync: if a conflicting call $r'$ is executed before the received call r in another replica $n'$, then $r'$ should have been already executed before r in n as well. (2) To preserve integrity, the dependencies of calls should be preserved. Thus, the dependency-preservation condition DepPres requires that a call r originating from a replica $n'$ is executed in the current replica n only if every call $r'$ that was executed before r in $n'$ and on which r depends has already been executed in n.
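The two delivery conditions can be sketched as checks over the histories. This is a reading of the prose above and not the formal rules of Fig. 5: histories are assumed to be lists in a dict `xs`, `call` and `orig` are dicts keyed by request identifier, and the predicates `conflicts` and `depends_on` as well as all function names are hypothetical parameters supplied by the caller.

```python
def conflict_sync(r, n, xs, call, conflicts):
    """May replica n apply r without reordering conflicting calls?

    Every call that conflicts with r and was executed before r at some
    other replica must already be in the history of n.
    """
    for other, history in xs.items():
        if other == n or r not in history:
            continue
        for earlier in history[: history.index(r)]:
            if conflicts(call[earlier], call[r]) and earlier not in xs[n]:
                return False
    return True


def dep_pres(r, n, xs, orig, call, depends_on):
    """Are the dependencies of r from its originating replica applied at n?"""
    origin_history = xs[orig[r]]
    before = origin_history[: origin_history.index(r)]
    return all(
        earlier in xs[n]
        for earlier in before
        if depends_on(call[r], call[earlier])
    )
```

A replica would deliver a received call only when both checks hold for it; otherwise the delivery waits until the missing earlier calls have been applied.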

Recency-aware replication can be applied to any object, but it can improve performance when there are method calls that can be buffered. The rule CallLocal executes a call but locally buffers it. Similar to the rule Call, it first checks the local permissibility of the call c. Since a buffered call is not immediately coordinated with calls in other replicas, it should satisfy the three properties (that we saw in Sect. 3) to make it concur with any call: (1) all-state-commutativity AllSComm, (2) invariant-sufficiency InvSuff, and (3) let-P-Right-commutativity LetPRComm. The identifier of the current buffer is r; the current call c is composed with the current buffered call call(r) to result in a composed call $c'$ for the updated buffer. The composition · of calls simply cascades their updates to the state. The all-state-commutativity condition is stated for the single call c (which implies the same condition for the composed call $c'$ as well). This condition is required for the call c because there might be other calls delivered between the last buffered call and the currently buffered call c. The call c should state-commute past the calls in between. Further, as explained for the rule Call, the condition InBound requires that the added staleness remains within the bound. If the above conditions hold, the map call is updated with the new buffer call $c'$, and, if the buffer was empty and the current call c is the first buffered call, the identifier r of the buffered call is added to the history xs.
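The composition of calls used for the buffer can be sketched as plain function composition over state updates. This is an illustration under simplifying assumptions: a call is modeled only by its update function on the state (guards and return values are omitted), the state is a single natural number of available spaces, and the names `identity` and `compose` are hypothetical.

```python
def identity(state):
    """The identity call id: an empty buffer leaves the state unchanged."""
    return state


def compose(buffered, call):
    """Cascade the updates: apply the buffered call first, then the new call."""
    return lambda state: call(buffered(state))


def increase_space(n):
    return lambda spaces: spaces + n


def special_reserve(n):
    return lambda spaces: spaces - n


# Build the buffer of Fig. 3(c) style: increase, reserve, increase.
buffer_call = identity
for c in (increase_space(2), special_reserve(3), increase_space(2)):
    buffer_call = compose(buffer_call, c)
```

Applying `buffer_call` to a state has the same effect as applying the three buffered calls in order, which is what lets the whole buffer travel as one composed call.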

The rule SendBuffer sends the buffer to every other replica and resets the buffer. Packets transmitting buffers are decorated with an asterisk. The rule DeliverBuffer receives a packet containing a buffer. As we saw in the rule CallLocal, buffers are checked to be invariant-sufficient in the originating replica. Therefore, on receiving a packet containing a buffer, in contrast to the rule Deliver, the rule DeliverBuffer does not check the dependency-preservation DepPres and the conflict-synchronization ConflictSync conditions.

The following lemmas state the three properties of the semantics. The first lemma states that once the buffers are flushed (call(r) = call(r′) = id) and the messages are delivered (t = ∅), the replicas converge to the same state. For all h, n, n′, σ, σ′, r and r′

The following lemma states that every call enjoys the integrity property. For all h, n, r, c, w and σ , σ, then integrity(σ, c) .

The staleness of a replica is the difference between its current state and its state after applying its pending calls from others (buffered calls and in-transit calls). The following lemma states that the staleness of every replica is bounded by the staleness bound.

In Sect. 4, we presented an operational semantics that preserves a given staleness bound for the state. The users declare the recency that they expect from the return value of each method of the object. The specified bounds for the methods can be used to infer the bounds for the elements of the state. In this section, given an object specification that includes recency declarations for the methods, we present a static analysis that infers optimum staleness bounds for each element of the state. We present a syntax-directed analysis that derives recency constraints between bound variables for the state elements. A solution to the constraints assigns a bound value to each state element such that if every state element keeps its staleness bound, then the result of every method call respects the recency declaration of the method. The optimum solution maximizes the (weighted) sum of the bounds to increase buffered calls and hence decrease communication.

Fig. 6. Bound constraint derivation

Figure 6 presents the constraint inference rules for the object language that we saw in Fig. 1. A delta bound δ is either a natural number n, a delta variable dx, or the addition or multiplication of two deltas. A constraint C is an equality or comparison of two deltas, or a conjunction of two constraints. A delta environment Γ is a mapping from variables to delta variables or values. There are three forms of judgement: the judgement for an object o states the bounding constraint C for o; the judgement for a method m states the constraint C for m; and the judgement for an expression e states that, under the delta environment Γ, the staleness of e is bounded by δ when the constraints C are satisfied. The rule CObj states that the constraint for an object is the conjunction of the constraints for its methods. (We assume that the state variables passed to all the methods are renamed to the same variables σ₁, .., σₙ.)
The rule CMet infers the constraints for a method by first inferring the constraints for its return expression under a delta environment where the argument is mapped to the delta value of zero (exactly recent) and the state variables σᵢ are mapped to delta variables dσᵢ to be inferred, and second, bounding the return value. The rule CVal assigns the delta value zero to values with no constraints. (Values are exact.) The rule CVar retrieves the bindings for delta variables from the environment. The rule COp states that the delta for the result of the operators {+, −, ∪, \} is the sum of the deltas of its operands. On the other hand, the rule CBOp requires the operands of the boolean operators {=, <, &} to be exact and states that the result is exact as well. We elide the similar rule for the unary negation operator !. The rule CSel requires the selection condition to be exact and states that the delta of the resulting relation is the same as that of the input relation. In other words, the resulting relation is stale by the same number of elements as the input relation. Similarly, the rule CProj states that the delta of the resulting relation is the same as that of the input relation. On the other hand, the rule CProd states that the delta for the resulting relation is the product of the deltas for the input relations. In our running example, let us associate the bound variables drs and dms with rs and ms, respectively. Writing bound 1, bound 2 and bound 3 for the declared bounds of the three methods, the constraint inferred for querySpace is dms ≤ bound 2, for queryReservations is drs ≤ bound 1, and for querySpace that involves the join operator (product and selection) is drs × dms ≤ bound 3. A more detailed explanation of these derivations is available in the appendix [5].
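The rules described above can be read as a recursive function over the expression syntax. The following is a minimal sketch under stated assumptions, not the tool's analysis: the AST classes (Val, Var, Op, BOp, Sel, Proj, Prod), the tuple encoding of deltas and constraints, and the helper names are hypothetical, and the selection condition is modelled as an ordinary expression.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Val:            # constant value
    value: Any

@dataclass(frozen=True)
class Var:            # argument or state variable
    name: str

@dataclass(frozen=True)
class Op:             # one of +, -, union, set difference
    left: Any
    right: Any

@dataclass(frozen=True)
class BOp:            # boolean operator such as =, <, &
    left: Any
    right: Any

@dataclass(frozen=True)
class Sel:            # selection of rel by cond
    cond: Any
    rel: Any

@dataclass(frozen=True)
class Proj:           # projection of rel
    rel: Any

@dataclass(frozen=True)
class Prod:           # product of two relations
    left: Any
    right: Any


def exact(delta):
    """Constraint forcing delta to be zero; nothing to add if it is already 0."""
    return [] if delta == 0 else [(delta, "=", 0)]


def infer(env, e):
    """Return (delta, constraints) for expression e under delta environment env."""
    if isinstance(e, Val):                       # CVal: values are exact
        return 0, []
    if isinstance(e, Var):                       # CVar: look up the binding
        return env[e.name], []
    if isinstance(e, Proj):                      # CProj: same delta as the input
        return infer(env, e.rel)
    if isinstance(e, Sel):                       # CSel: exact condition, same delta
        dc, cc = infer(env, e.cond)
        dr, cr = infer(env, e.rel)
        return dr, cc + cr + exact(dc)
    if isinstance(e, (Op, BOp, Prod)):
        dl, cl = infer(env, e.left)
        dr, cr = infer(env, e.right)
        if isinstance(e, Op):                    # COp: deltas add up
            return ("+", dl, dr), cl + cr
        if isinstance(e, Prod):                  # CProd: deltas multiply
            return ("*", dl, dr), cl + cr
        return 0, cl + cr + exact(dl) + exact(dr)    # CBOp: exact operands
    raise TypeError(f"unknown expression: {e!r}")


def method_constraints(state_vars, arg, ret_expr, bound):
    """CMet: argument is exact, state variable v gets delta variable 'd' + v."""
    env = {arg: 0, **{v: "d" + v for v in state_vars}}
    delta, constraints = infer(env, ret_expr)
    return constraints + [(delta, "<=", bound)]
```

For instance, `method_constraints(["rs", "ms"], "x", Sel(Val(True), Prod(Var("rs"), Var("ms"))), 6)` yields the single constraint `(("*", "drs", "dms"), "<=", 6)`, mirroring the product constraint of the running example with an illustrative bound of 6.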

We now define the notion of sufficiently-recent states. Intuitively, a state is sufficiently-recent with respect to the target state if the difference of the return value of every method call on that state versus the target state is within the declared bound of the method.

A state v 1 , .., v n is a sufficiently-recent state with respect to the target state v * 1 , .., v * n for an object o iff for every method def m(x)( σ 1 , .., σ n ) e g , e u , e r of o, and every argu-

The following lemma states that the bound inference presented in Fig. 6 is sound. In other words, if the inference derives the constraints C for an object, then for any solution S of C, if the staleness of each state element σᵢ of the object remains within the bound S(dσᵢ), the state remains sufficiently-recent.

There may be many solutions for the derived constraints, and hence many sound state bounds that preserve the user-specified bounds for the object. However, solutions that allow more staleness (albeit appropriately bounded) are more favorable since they allow more buffered calls and require less communication. Thus, a candidate objective function to maximize is dσ₁ + .. + dσₙ. In other words, what are the largest delta bounds for the state elements that still preserve the recency specifications of the methods? This function gives the same weight to all the state elements; however, some may be updated more frequently. Let fᵢ be the relative update frequency of the state element σᵢ. Frequencies can be obtained from historical logs or profiling. The objective function is defined as the following weighted sum: dσ₁/f₁ + .. + dσₙ/fₙ. More frequently updated state elements are given proportionally larger bounds. In our running example, let bound 1 = 3, bound 2 = 4, and bound 3 = 6. If the update frequency of rs is twice that of ms, the optimum solution is drs = 3 and dms = 2. The objective function can be translated to a linear function by multiplying it by the least common denominator of the frequencies.
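The optimum of the running example can be checked by exhaustive search over the small space of natural-number bounds. The sketch below is only an illustration: the function optimum_bounds and its parameters are hypothetical, and a solver would replace the enumeration in practice. It multiplies each bound by its relative update frequency (2 for rs, 1 for ms); this weighting is an assumption made here because it matches the statement that more frequently updated elements receive larger bounds and reproduces the stated optimum.

```python
from itertools import product


def optimum_bounds(weights, upper, constraints):
    """Enumerate all assignments within `upper` and keep the best feasible one.

    weights:     delta variable -> weight in the objective (assumed: frequency)
    upper:       delta variable -> largest value to try (from single-variable bounds)
    constraints: predicates over an assignment dict (the remaining constraints)
    """
    names = list(upper)
    best, best_score = None, None
    for values in product(*(range(upper[n] + 1) for n in names)):
        candidate = dict(zip(names, values))
        if not all(check(candidate) for check in constraints):
            continue
        score = sum(weights[n] * candidate[n] for n in names)
        if best_score is None or score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Running example: drs <= 3, dms <= 4 and drs * dms <= 6.
best, score = optimum_bounds(
    weights={"drs": 2, "dms": 1},
    upper={"drs": 3, "dms": 4},
    constraints=[lambda s: s["drs"] * s["dms"] <= 6],
)
print(best, score)
```

Under these assumptions the search returns drs = 3 and dms = 2 with an objective value of 8; the next best candidates, (3, 1) and (2, 3), both score 7.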

Now, we show that recency-aware objects are at least as strong as the perfect failure detector abstraction [18] and present a protocol that implements recency-aware objects using perfect failure detectors. These two results show that recency-aware objects have the same computational power as the perfect failure detector.

The perfect failure detector abstraction P notifies processes about the crash of the other processes in a synchronous network. It has the following properties. Liveness: Every crashed process is eventually detected by all correct processes. Safety: No correct process is ever suspected by other processes. The recency-aware object R has the following liveness and safety properties. Liveness: If the user makes a request to a correct replica, it eventually responds. Safety: The set of executed calls that are still pending for each correct replica is bounded. The following lemma states that P is reducible to R and, conversely, that R is reducible to P.

Fig. 7. Recency-aware protocol

For the proof of the first conjunct, consider two replicas rep₁ and rep₂. We show by contradiction that rep₁ will eventually know whether rep₂ has crashed. We assume the opposite. Consider an execution where rep₁ has already executed a set of requests R and receives another request r from the user, such that the pending set R ∪ {r} makes a difference in the state of rep₂ that pushes it out of bound. By the contradiction assumption, rep₁ is never informed when rep₂ crashes. Therefore, if rep₁ does not hear from rep₂, the following two scenarios are indistinguishable to rep₁: (S₁) the replica rep₂ has crashed; (S₂) the replica rep₂ is too slow. The replica rep₁ has the following two choices: (C₁) rep₁ waits to hear from rep₂ about receiving a request in R before processing and responding to r; (C₂) rep₁ processes and responds to r. If the protocol makes the choice C₁, it might be the scenario S₁ and then the liveness property is violated. If the protocol makes the choice C₂, it might be the scenario S₂ and then the recency bound for rep₂ is violated. The second conjunct follows directly from the protocol.

We briefly describe the protocol in Fig. 7 that implements a recency-aware replicated object. The full description of the protocol is available in the appendix [5]. Given an object definition, the protocol benefits from both static and dynamic coordination analysis to guarantee convergence, integrity and recency. To reduce communication, replicas try to execute the calls locally while maintaining the staleness bound. Each replica keeps its locally executed calls in a buffer buff before they are broadcast. Replicas send an acknowledgement ack to the originating replica once they receive and execute a call or a buffer of calls. Each replica rep keeps a map called pending p from each replica rep′ to the set of pending calls sent from rep to rep′. When a replica originates a call c, it adds c to its local pending set for each of the other replicas; once it receives an acknowledgement for c from a replica rep′, it removes c from the set of pending calls for rep′. Each replica keeps the set of correct replicas up, and removes a replica from the set if the perfect failure detector pfd issues a crash event for that replica. A requested call can be executed only if it does not push the pending set for any correct replica out of the bound. Otherwise, it cannot be immediately executed and is kept in a waiting queue wq to be retried later; further, the buffer is sent to the other replicas and is reset to accelerate the shrinking of the pending set.

To decide whether a call can be executed locally, the conditions of the rule CallLocal of the operational semantics (Sect. 4) are checked. The statically calculated set of state-conflicting methods SConf is consulted to check whether the call is all-state-commutative. The validity of the two conditions invariant-sufficiency and let-P-R-commutativity of the buffer (after the new call is added) is dynamically decided by a solver at run-time. If the conditions do not hold, the call is coordinated with other replicas using the basic blocking coordination protocol bro [30] that guarantees integrity and convergence but not recency.
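The bookkeeping of the pending map, the set up and the waiting queue wq can be sketched as follows. This is a simplified illustration and not the protocol of Fig. 7: the class PendingTracker, the string identifiers, and the measurement of staleness as the sum of numeric call weights are assumptions, and buffering, broadcasting and the CallLocal conditions are omitted.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Set, Tuple


class PendingTracker:
    """Tracks, per remote replica, the calls sent but not yet acknowledged."""

    def __init__(self, self_id: str, replicas: Iterable[str], bound: int) -> None:
        others = set(replicas) - {self_id}
        self.up: Set[str] = set(others)                       # replicas believed correct
        self.pending: Dict[str, Dict[str, int]] = {r: {} for r in others}
        self.bound = bound                                    # staleness bound
        self.wq: Deque[Tuple[str, int]] = deque()             # waiting calls

    def _within_bound(self, weight: int) -> bool:
        # Only correct replicas constrain execution.
        return all(sum(self.pending[r].values()) + weight <= self.bound
                   for r in self.up)

    def _record(self, call_id: str, weight: int) -> None:
        for calls in self.pending.values():
            calls[call_id] = weight

    def request(self, call_id: str, weight: int) -> bool:
        """Execute the call if no correct replica is pushed out of bound."""
        if self._within_bound(weight):
            self._record(call_id, weight)
            return True
        self.wq.append((call_id, weight))                     # retry later
        return False

    def on_ack(self, replica: str, call_id: str) -> None:
        self.pending[replica].pop(call_id, None)
        self._retry()

    def on_crash(self, replica: str) -> None:
        """Crash event issued by the perfect failure detector."""
        self.up.discard(replica)
        self._retry()

    def _retry(self) -> None:
        # Retry waiting calls in arrival order while they fit the bound.
        while self.wq and self._within_bound(self.wq[0][1]):
            self._record(*self.wq.popleft())
```

The sketch also mirrors the argument of the proof: a call that is blocked by an unresponsive replica remains in wq until that replica either acknowledges or is reported as crashed.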

We have implemented the analysis and protocol as a synthesis tool called Hampa.

We applied it to two use-cases: the bank account use-case (with the withdraw, deposit and balance methods and the integrity property of non-negative balance) and the movie booking use-case (Fig. 2). The experiments show that as the staleness bound increases, the coordination overhead and response time of recency-aware objects decrease. Further, recency-aware objects are about twice as responsive as their sequentially consistent counterparts.

We measure two comparison criteria: coordination load and response time. At the lower layers, the protocol reduces to three communication primitives: total-order-broadcast (TOB), reliable-broadcast (RB) and point-to-point links (P2P). To measure the coordination overhead, we separately count the number of different types of messages that replicas send during the execution of their requests. The response time for a call is the duration between the time that the client requests the call and the time that the client receives the return value.

We performed three experiments. In the first experiment, we study the effect of increasing the staleness bound on the coordination load. We report the ratio of the number of messages that the protocol sends for the bound under test over the number of messages that it sends for the baseline bound. (The baseline recency bound is the maximum weight of the calls; it allows every single call to be buffered.) In the second experiment, we study the effect of increasing the staleness bound on the response time of each method. Finally, in the last experiment, we compare the response time of our protocol under the baseline recency bound with that of sequential consistency (SC). SC uses total-order broadcast for all the methods.

Figure 8(a) and (c) show the effect of increasing the staleness bound on the coordination load for the two use-cases. As the staleness bound is increased, the ratio of the messages sent by RB, TOB and P2P decreases. Figure 8(a) (bank account) shows an 88% decrease in the number of messages sent to RB when the bound is increased from 20 to 200. Likewise, the TOB and P2P ratios decrease by 78% and 90%, respectively. In Fig. 8(c) (movie booking), buffering helps to reduce TOB calls by 40% across the experiments. This decrease, however, unlike the bank account use-case, is steady over different bounds. This is because it is more difficult to "buffer" in the movie booking use-case. There are no S-conflicts in the bank account use-case and hence two out of two update methods can be buffered. However, S-conflicts in the movie use-case allow only 2 out of 4 update methods to be buffered: increaseSpace and specialReserve. Also, we observe that the number of RB and P2P messages decreases by at most 10%.

Figure 8(b) and (d) show the effect of increasing the staleness bound on the response time for the two use-cases. In Fig. 8(b) (bank account), the response times of the withdraw and deposit methods decrease by 71% and 75%, respectively, when the staleness bound is increased from 20 to 200. The withdraw method is the least responsive method. The reason is that it has a self-conflict and requires synchronization if it cannot be buffered. In Fig. 8(d) (movie booking), we observe a slight increase in response time for the book method while increasing the bound from 2 to 20. This is because the book operation cannot be buffered due to the S-conflict with other methods and has to be synchronized. On the other hand, the response time of the specialReserve method decreases by 33% when the bound is increased from 2 to 20. The reason is that it has a self-conflict and, if it cannot be buffered, it must be synchronized by the TOB, which incurs a high coordination overhead. Therefore, as buffered calls increase and the use of TOB decreases, the response time is significantly improved. The response time of the increaseSpace method also benefits from recency awareness; it decreases by 72%. The methods book and cancelBook have conflicts. In the blocking protocol that Hampa uses, the method book handles synchronization; therefore, the method cancelBook just broadcasts the request. As the recency bound is increased, the network is less crowded and therefore the response time of cancelBook decreases.

Figure 9 compares the response time of recency-aware objects under the baseline bound with that of sequentially consistent objects. The SC protocol synchronizes all the calls and orders them with respect to each other. However, Hampa minimizes coordination while preserving convergence, integrity and recency. We observe that the average response-time speedup is as high as 2× and 1.8× for the bank account and movie use-cases, respectively. More experiments are available in the appendix [5]. In particular, they show that the runtime cost of SMT solving is only 0.2% to 1% of the average response time.

Epsilon serializability [46] allows concurrent execution of updates with queries and bounds the difference of the inconsistent values that are observed in these executions and the consistent values that would be observed in a serializable execution. In contrast, Hampa preserves the integrity of the state, bounds staleness, allows different orders in different replicas, and formally defines the difference for relational operators.

In TACT [54] [55] [56] [57] [58], operations return tentative values; they might be eventually reordered to preserve strong consistency. TACT bounds the numeric error between the tentative and final return values. The user specifies the granularity of the bounded object "conit" and the strength of the protocol. In Hampa, on the other hand, the states are final and enjoy integrity provided on top of weak consistency. Further, the staleness bound with respect to the pending future state is automatically optimized with static and dynamic analyses.

In AQuA [31] , given a query and a staleness bound, the master server dynamically selects a recent enough server to service the query. Similarly, TRAPP [43] finds recent enough servers for different parts of data that are needed for the query. FRACS [59] allows operations to be buffered at replicas up to a given threshold. In contrast to Hampa, these projects do not guarantee integrity and convergence, and do not automatically infer the staleness bounds. PIQL [6] bounds the number of key-value store operations for each query trading the precision of the result for performance. However, it does not consider the staleness of replicas.

To reduce synchronization, PBS [9] communicates with only a partial quorum of replicas to bring a total order to operations, and probabilistically bounds the staleness of the observed states. In contrast, Hampa performs synchronization with full quorums but only for conflicting calls, and allows different orders for replicas. Further, it analyzes and synthesizes replicated objects and supports relational in addition to single-key operations.

The trade-off between consistency and latency presented as PACELC [1] aligns with our experiments. As the consistency decreases (staleness bound increases), the latency decreases (responsiveness increases). Warranties [38] and Homeostasis [47] allow local updates if they keep the validity of certain assertions. Although other replicas can rely on the validity of the assertions, the staleness of their state is not bounded. In contrast, Hampa maintains a staleness bound. Further, it exploits weak consistency and guarantees convergence.

This paper presented a relational object specification language that captures the integrity and recency requirements of the object. It presented a syntax-directed analysis that given a specification, infers optimum staleness bounds. In addition, it presented the coordination avoidance conditions, operational semantics, a protocol and a synthesis tool for replicated systems that guarantee convergence, integrity and recency. The recency-aware protocol embeds a solver to decide whether coordination avoidance is safe and increases the responsiveness.

Consistency tradeoffs in modern distributed database system design

Causal memory: definitions, implementation, and programming

Protocol-aware recovery for consensus-based distributed storage

Syntax and semantics of the weak consistency model specification language cat

PIQL: success-tolerant query processing in the cloud

Coordination avoidance in database systems

Feral concurrency control: An empirical investigation of modern application integrity

Probabilistically bounded staleness for practical partial quorums

Putting consistency back into eventual consistency

CAV 2011

PRACTI replication

Replication and fault-tolerance in the ISIS system

Automated conflict-free distributed implementation of component-based models

Verifying eventual consistency of optimistic replication systems

Formalizing and checking multilevel consistency

Replicated data types: specification, verification, optimality

Introduction to Reliable and Secure Distributed Programming

Monotonicity types for distributed dataflow

PNUTS: Yahoo!'s hosted data serving platform

Spanner: Google's globally distributed database

Dynamo: Amazon's highly available key-value store

PSYNC: a partially synchronous language for fault-tolerant distributed algorithms

Monitoring weak consistency

Weak-consistency specification via visibility relaxation

Impossibility of distributed consensus with one faulty process

Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services

Perspectives on the CAP theorem

cause i'm strong enough: reasoning about consistency choices in distributed systems

Hamsaz: replication coordination analysis and synthesis

An adaptive quality of service aware middleware for replicated services

Providing high availability using lazy replication

Time, clocks, and the ordering of events in a distributed system

The part-time parliament

Conflict-aware replicated data types

Automating the choice of consistency levels in replicated systems

Making georeplicated systems fast as possible, consistent when necessary

Warranties for faster strong consistency

Don't settle for eventual: scalable causal consistency for wide-area storage with COPS

Stronger semantics for low-latency geo-replicated storage

Proving the safety of highly-available distributed objects

Viewstamped replication: a new primary copy method to support highly-available distributed systems

Offering a precision-performance tradeoff for aggregation queries over replicated data

In search of an understandable consensus algorithm

Flexible update propagation for weakly consistent replication

A formal characterization of epsilon serializability

The homeostasis protocol: Avoiding transaction coordination through program analysis

A comprehensive study of convergent and commutative replicated data types

Declarative programming over eventually consistent data stores

Transactional storage for georeplicated systems

Consistency-based service level agreements for cloud storage

Eventually consistent

Replication-aware linearizability

Design and evaluation of a continuous consistency model for replicated services

Efficient numerical error bounding for replicated network services

Combining generality and practicality in a conit-based continuous consistency model for wide-area replication

The costs and limits of availability for replicated services

Minimal replication cost for availability

Trading replication consistency for performance and availability: an adaptive approach

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.