key: cord-0522029-oaznp2sx authors: Ferraro, Pietro; Zhao, Lianna; King, Christopher; Shorten, Robert title: Personalised Feedback Control, Social Contracts, and Compliance Strategies for Ensembles date: 2021-03-12 journal: nan DOI: nan sha: 23e2921ca228f418531634d5d4ad34626e112486 doc_id: 522029 cord_uid: oaznp2sx
This paper describes the use of Distributed Ledger Technologies as a means to enforce social contracts and to orchestrate the behaviour of agents in a smart city environment. Specifically, we present a scheme to price personalised risk in sharing economy applications. We provide proofs for the convergence of the proposed stochastic system and we validate our approach through the use of extensive Monte Carlo simulations.
promised, and in good condition. Our basic proposal is to deploy digital tokens as a bond or deposit to ensure compliance: if the agent remains in compliance then the token is returned, otherwise the token is lost. This issue of compliance has often been approached indirectly using, for example, a game theoretic framework, or using some privacy averse voting or recommendation platform. The game theoretic approach is based on creating an equilibrium that incentivizes good behaviour; self-regulatory recommendation platforms simply seek, in principle, to name and shame miscreants. Both strategies are flawed and are subject to attack from resource-rich nefarious actors (for example by making false recommendations). Our approach is fundamentally different. Building on ideas from stochastic approximations, hyperbolic discounting, and notions of digital identity, we use distributed ledger technology (DLT) coupled with a suite of control algorithms to create a personalised economic commitment mechanism which will enforce compliance. We also design privacy preserving distributed personalised interventions which encourage good behaviour for parties which share and trade assets in cities [2] .
It is also worth noting that this approach is well grounded in behavioural economics by the theory of hyperbolic discounting [3] , and can be implemented using DLTs that support Smart Contracts and which do not require transaction fees. Furthermore, our work in this arena addresses the thorny problem of how to price compliance individually, by using a control theoretic approach. This is not a new idea per se, as the use of control theory to model pricing signals has been applied in many different fields. As a first example, link pricing concepts are very common in networking [4] , and many stochastic signalling strategies can be interpreted as a price [5] , [6] , [7] . Moreover, a large body of literature has been published on the topic of dynamic pricing to increase the quality of service provided to customers in various domains. The term commonly used to refer to this field is transactive control, i.e., the use of financial transactions, as a feedback loop, to improve quality of service in various domains: some examples of work in this area can be found in [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] . Other specific instances of dynamic pricing are [18] (incentivizing users to schedule electricity-consuming applications more prudently), [19] (managing EV charging and discharging in order to reduce peak loads), [20] (combining the classical hierarchical control in the power grid with market transactions) and [21] (where the authors propose a transactive control system of commercial building heating, ventilation, and air-conditioning for demand response). While at first glance this may resemble traditional approaches to pricing, the methodology is actually quite different to that discussed elsewhere, both from a philosophical and technical perspective.
In all the aforementioned works, in fact, the control mechanism is obtained using economic transactions in the form of dynamic toll pricing; what we propose is a form of dynamic deposit pricing: we explicitly price the risk of not using an asset correctly, rather than access to the asset, and we provide a rigorous methodology to design personalised interventions supporting aggregated regulation constraints. In this context the specific contributions of this paper are to: (i) present a control system for stochastic agents, enabled by the use of DLTs, to enforce a desired level of compliance to social contracts in a fair manner; and (ii) present a theoretical analysis that establishes the stochastic convergence of the proposed control system. A. Paper structure The remainder of this paper is organised as follows: In Section II we discuss why the use of Distributed Ledger Technologies is desirable when designing a compliance scheme. Sections III and IV describe the proposed control strategy and provide theoretical guarantees on its convergence. In Section V we show the effectiveness of the proposed approach through extended Monte Carlo simulations and, finally, Section VI summarizes the presented results and outlines future lines of research. II. PREAMBLE: DISTRIBUTED LEDGER TECHNOLOGIES, SOCIAL CONTRACTS, AND COMPLIANCE As discussed in the introduction, the basic idea advocated in this paper is to exploit the opportunities afforded by distributed ledgers (often referred to as DLTs; one example is Blockchain) when designing and enforcing social contracts. In particular, we wish to exploit and combine the notions of digital identity and smart contracts coming from DLTs with the rigorous design methods afforded by control theory, in order to design personalised interventions which can orchestrate the behaviour of an entire ensemble of human-like actors. 
In our context a Social Contract is a set of rules, or policies, designed to govern the interaction between humans, other humans, and societal infrastructures. For example, in the context of shared assets such as a pool of vehicles, a social contract might require that vehicles are returned to a specified location at a contracted time. Another example of a social contract is the requirement that plastic bottles are returned to point-of-sale after use. For example, in Germany and other countries, bottle deposits, or pfands, are used to encourage consumers to adhere to the social contract. We argue that in many cases such schemes do not lead to the desired results. Often, the minimal marginal cost of the pfand is not large enough to encourage good behaviour; instead, in many cases the plastic bottles end up in waste bins. This effectively creates a bounty for anybody who is willing to sort through public rubbish bins in order to find these discarded bottles. The sight of people searching through rubbish bins is common in cities in the western world. So rather than incentivising responsible behaviour, the enforcement of this social contract has incentivised unfortunate people (who often belong to the most vulnerable in society) to sort rubbish on others' behalf. Often they must collect many bottles just to make minimum wage, and the sorting ecosystem is in fact akin to a modern form of exploitation. In the case of bottle deposits, the enforcement mechanism is fragile in many ways: through the creation of a de-facto currency that can be redeemed by anyone; by not placing a high enough price on the deposit to incentivise compliance; and by not applying differential penalties to miscreants depending on levels of personalised compliance. A further issue in such systems is that they can be attacked by resource-rich nefarious actors.
Anyone with enough resources and bad intent could attack the entire system by simply dumping many plastic bottles in the ocean, thereby causing an increased pfand for everyone. The major impediment in solving the problems alluded to above is the inability to assign deposits to individuals. If we could do this in a privacy preserving manner we would be able to create systems where only those purchasing goods could redeem deposits. We could also create personalised interventions based on differential penalties, and be able to price risk of misbehaviour accordingly. However, enforcing social contracts and nudging people to comply with rules for the greater good is a delicate and complex task. In particular, the use of privacy-invasive mechanisms to micromanage behaviour is not acceptable and might lead to dangerous scenarios where a centralised authority violates the rights of individuals. How is it then possible to mitigate the risks while still maintaining the advantages for the community? One technology that offers great potential in this context is Distributed Ledger Technology. We believe that DLTs represent a significant step towards a democratization process for these compliance mechanisms. In particular, DLTs possess a number of properties that make them desirable in a context where the presence of a central authority might be detrimental: A DLT is nothing more than a shared database. By its very nature it is decentralised and therefore no central authority is required in order to achieve consensus amongst users. In DLTs, due to the cryptographic nature of the private address, transactions are pseudo-anonymous. This ensures that users are able to protect their privacy behind a large number of transactions [22] . Of course, it is still feasible to trace back the original transaction and the true owner but, through randomization of the address, it is possible to make it expensive for malicious entities to trace the transactions.
This makes DLTs robust from a privacy perspective, while at the same time allowing for the issuing of digital identities. Transactions in DLTs can be encrypted. This allows every agent to manage access to the data present in their own transactions. In our setting the only requirement is that the ownership of the tokens needs to remain accessible to the compliance control algorithms, whereas other information (e.g., user quality of service, statistics on the usage of the system) can be encrypted. This allows each user to maintain ownership of their data and to use them as they please (e.g., to monetize them at a later stage). Moreover, due to their intrinsic consensus mechanism, DLTs are also immutable (meaning that it is not possible for single users to alter the content of previous transactions) and resistant to double-spending (a potential flaw in a digital cash scheme in which the same single digital token can be spent more than once). All these properties make DLTs the ideal infrastructure for a distributed compliance system. Accordingly, the basic idea is to use digital tokens pegged at a stable fixed value, in the form of a cryptocurrency, to nudge users to comply with a social contract E. Token balances, and records of compliance on the ledger, associated with digital identities are the basis of the compliance system. To be more specific, E represents a general statement, such as people should wear a face mask or each car needs to obey traffic signals, and its generality allows us to model a very large class of problems. The digital token is thus used as a bond, or digital deposit, to ensure that various agents comply with rule E: if the agent behaves in compliance with the contract then tokens are returned in their entirety to the owner, whereas if the agent misbehaves then some tokens are lost. Repeat offenders may in fact pay more and lose more tokens.
Thus, the risk of losing a token is the mechanism that encourages agents to comply with these social contracts. Note that this system is based on the loss of tokens and is not based on the willingness of agents to report misbehaviour to a central authority (such as the police). Rather than endangering civic rights, such systems should be regarded as protecting the rights of citizens by allowing a more nuanced and fairer form of punishment for indiscretions; fair in the sense that miscreants might pay more depending on the level of their compliance over time. Finally, in order to address the compliance issue, we will focus on the use of DLTs built around Directed Acyclic Graphs (DAGs) such as IOTA [23] [24] . This is due to the fact that such ledgers facilitate high transaction speeds and are fee-less. In contrast, many standard payment systems (e.g., VISA, Mastercard) and classical Blockchain architectures (e.g., Bitcoin, Ethereum) require users to pay a fee for each transaction. This makes such systems inadequate to serve as the backbone for the proposed compliance scheme. It is essential that the deposited token is returned in its entirety to the owner in the event of full compliance with rule E: the proposed social compliance mechanism would break down if the agent were required to pay a fee every time a token is deposited or returned, as this would effectively erode the value of her tokens over time. The proposed architecture is shown in Figure 1 . The scheme is divided into three main components: the Distributed Ledger discussed in this section, which acts as the communication backbone for the whole infrastructure; the Physical Layer, in which agents interact with their environment in the setting of the social contract E; and finally the Controller, whose task is to regulate the price of the token bonds in order to achieve the desired level of compliance. The latter two components of the architecture will be the focus of the next two Sections.
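To make the fee argument above concrete, the following minimal sketch computes how much of a deposit a fully compliant agent retains after repeated deposit and return cycles. The flat per-transaction fee model, the function name `remaining_value`, and the numerical values are illustrative assumptions and do not describe any specific ledger.

```python
def remaining_value(initial_tokens: float, fee_per_tx: float, cycles: int) -> float:
    """Tokens left to a fully compliant agent after `cycles` round trips.

    Assumption (illustrative only): each cycle consists of two transactions,
    one deposit and one return, and each transaction costs a flat fee.
    """
    value = initial_tokens - 2 * fee_per_tx * cycles
    return max(value, 0.0)


if __name__ == "__main__":
    # On a fee-less ledger the bond is returned in its entirety every time.
    print(remaining_value(100.0, 0.0, 200))
    # With a hypothetical fee of 0.05 tokens per transaction, the same
    # compliant agent loses part of the bond despite never misbehaving.
    print(remaining_value(100.0, 0.05, 200))
```

Under these assumed numbers, the compliant agent would be penalised by fees alone, which is exactly the effect a fee-less ledger avoids.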
A complete discussion of DAG-based DLTs and their comparison to classical Blockchain is beyond the scope of this paper; the interested reader can refer to [2] [23] [25] [26] [27] for a thorough discussion of their properties. For the purpose of this paper, all we need to assume is that there is a fast and secure way to execute the deposit and retrieval of these token bonds. Before proceeding to the analysis and the modelling of the proposed framework, it is worth stressing that the issue of compliance is often not incorporated into algorithms which are designed to regulate, control, and optimise city infrastructures. Many studies addressing human behaviour in this context assume full compliance with policies that have been engineered to optimally organise city infrastructures. As an example, consider traffic flow optimization: a crucial element that is often left out is that humans break rules, and the effect of this rule-breaking profoundly affects how cities operate and how well the engineered algorithms actually perform. As explained before, we are interested in using DLTs to create a type of digital bond which will encourage compliance with social contracts. Figure 2 provides a visual representation of this basic idea. In order to engage with this social scheme each agent stakes an amount of tokens (to which some monetary cost is associated) that acts as a bond. The whole process can be made automatic by using smart contracts (these are distributed computer programs which execute as soon as certain conditions are met) [28] to carry out the operations of deposit and return of the tokens. All these operations are recorded on a DLT that is shared amongst all agents (anonymously). A basic question that arises is how to price this bond: namely, how many tokens should be required as a bond in order to assure compliance with a social contract?
Clearly, if this number is too low, one can expect low levels of compliance (as in the case of the aforementioned plastic bottle deposits), and if it is too high, activity will cease and the social contract will be meaningless. In what follows, we shall develop a method for personalised pricing of the bond based on a feedback signal. The feedback signal will be designed so that aggregate levels of compliance satisfy some constraint. Before proceeding we present two examples of social contracts and show how they lead to different policy choices. The first example is akin to the traffic signal situation mentioned in the previous section. In this kind of application, a desirable policy might be the following: if you behave, your token is returned, otherwise you lose your token. The second example concerns an agent who enters and moves within a public building (such as an airport or a train station) where the social contract might represent a rule such as keep your mask on. Clearly, in this type of contract, if the agent behaves then all tokens are returned. But what should happen if the agent misbehaves? One policy might be to issue and redeem tokens at discrete intervals of time. This would punish participants who misbehave multiple times, and would also incentivise those individuals who remove their mask to wear it again so as to avoid further penalty. An alternative policy would be for a miscreant to lose their tokens, but for the pricing algorithm to operate at discrete intervals without further loss of tokens. In this case the miscreant is incentivised to wear the mask again so as to keep their personalised price as low as possible. Below we summarise some policies that are of interest to us. • Fixed penalty policy: Before participating in the social scheme each agent deposits a certain amount of tokens, the amount being set by the controller. 
When the action is completed or when the agent exits the scheme, all tokens are returned in the event that they complied with rule E; otherwise no tokens are returned to the agent. In the latter case the pricing algorithm continues to adjust the price based on both the agents' level of compliance and that of the network. • Adaptive penalty policies: Initially each agent deposits a certain amount of tokens, the amount being set by the controller. The contract is reissued at every time-step. At each time step, compliant agents retrieve their tokens, and stake new ones to continue the activity. Non-compliant agents lose all their tokens every time they do not comply. At all time steps, the pricing algorithm continues to adjust the price based on both the agents' level of compliance and that of the network. • Event driven policies: Initially each agent deposits a certain amount of tokens. Whenever the agent fails to comply with rule E the tokens are lost; in order to keep participating in the scheme the agent needs to deposit more tokens. In this version of the scheme the amount of tokens that are required as a bond changes over time (again a smart contract can easily take care of the update process). Clearly, these are just three possible policies that might be adopted by the issuer of a social contract, and many others are possible. Our main contribution in this paper is to develop a modelling and feedback control strategy to describe and enable a wide class of policies that include the three aforementioned ones. IV. MATHEMATICAL FRAMEWORK As previously explained, we are interested in designing a feedback mechanism to avoid scenarios in which the value of the bond is either too low (leading to non-compliance) or too high (meaning that agents would not engage in the scheme for fear of losing their tokens). The issue of finding this value is the subject of this section. 
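As a rough illustration of the fixed penalty, adaptive penalty and event driven policies described in Section III, the sketch below settles an agent's token account under each of them. It is not the authors' implementation: the `Account` class, the function names, the choice to re-stake at the next step's price after a violation, and the bond values are all hypothetical, and the bond prices are simply given as a list rather than produced by the controller.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Account:
    balance: float        # tokens held outside the scheme
    staked: float = 0.0   # tokens currently locked as a bond


def stake(acc: Account, bond: float) -> None:
    """Lock `bond` tokens as a deposit."""
    if bond > acc.balance:
        raise ValueError("insufficient tokens to post the bond")
    acc.balance -= bond
    acc.staked += bond


def settle(acc: Account, complied: bool) -> float:
    """Return the whole bond if compliant, forfeit it otherwise.

    Returns the number of tokens lost.
    """
    lost = 0.0 if complied else acc.staked
    acc.balance += acc.staked - lost
    acc.staked = 0.0
    return lost


def fixed_penalty(acc: Account, bond: float, history: Sequence[bool]) -> float:
    """One bond for the whole activity; lost if any violation occurred."""
    stake(acc, bond)
    return settle(acc, all(history))


def adaptive_penalty(acc: Account, bonds: Sequence[float],
                     history: Sequence[bool]) -> float:
    """The contract is reissued at every time step at the current price."""
    lost = 0.0
    for bond, complied in zip(bonds, history):
        stake(acc, bond)
        lost += settle(acc, complied)
    return lost


def event_driven(acc: Account, bonds: Sequence[float],
                 history: Sequence[bool]) -> float:
    """The bond stays locked until a violation; the agent then re-stakes.

    Assumption: the new bond is posted at the price of the following step.
    """
    lost = 0.0
    stake(acc, bonds[0])
    for k, complied in enumerate(history):
        if not complied:
            lost += settle(acc, False)
            if k + 1 < len(history):
                stake(acc, bonds[k + 1])
    settle(acc, True)  # whatever is still locked is returned on exit
    return lost
```

For instance, with hypothetical bond prices `[10.0, 12.0, 15.0]` and a compliance history `[True, False, True]`, the fixed penalty and event driven sketches both forfeit the initial bond of 10 tokens, whereas the adaptive sketch forfeits the 12 tokens staked at the step where the violation occurs.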
We will use typical elements of control theory in a stochastic environment where a large Distributed Ledger
Accordingly we consider $n$ agents and, for each of them, we define independent binary random variables $\{M_i(k)\}_{i=1}^{n}$, for discrete values of $k$, such that
$$P(i \text{ complies with rule } E \text{ at time } k) = P(M_i(k) = 1). \qquad (1)$$
Moreover, we assume that the probability of these events is entirely dependent on a constant $q_i$, which represents the proclivity of each agent to comply with rules, and two control variables, $C(k)$ and $c_i(k)$. The variable $C(k) + c_i(k)$ represents the value of the token bond staked by agent $i$ at time-step $k$. The combination $q_i + C(k) + c_i(k)$ determines the likelihood that agent $i$ will comply with the rule at time-step $k + 1$. Then, (1) can be expressed as
$$P(M_i(k+1) = 1) = p\big(q_i + C(k) + c_i(k)\big), \qquad (2)$$
with $p : \mathbb{R} \to [0, 1]$ being a monotone increasing function (which is used to bound the probability between 0 and 1). $C(k)$ and $c_i(k)$ represent, respectively, a global and an individual feedback signal whose purpose is to regulate the behaviour of each agent so as to achieve the desired level of compliance. Accordingly, we consider the following control laws, $\forall k \in \mathbb{N}$ and $\forall i \in \{1, \ldots, n\}$, with $\alpha > 0$ and $\beta > 0$ being two constants, $Q^* \in [0, 1]$ being the desired level of compliance and $\bar{M}_i(k)$ representing a windowed time average of the compliance of agent $i$, defined as In this last expression, the factor $(1 - \gamma)^{-1}$ plays the role of the length of the window for the average, with $\gamma < 1$. Notice that the proposed framework is very flexible and it would be possible to employ more sophisticated control laws. In this paper, however, we limit ourselves to the study of a proportional action, and the extension to more complex feedback loops will be the subject of future work.
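The following simulation is a minimal sketch of the kind of feedback loop described above. Because the displayed control laws are not reproduced in this text, every update rule in the code is an assumption chosen to match the verbal description: $p$ is taken to be clipping to $[0, 1]$, the windowed average is taken to be an exponentially weighted average with effective window length $(1 - \gamma)^{-1}$, and both the global signal $C(k)$ and the individual signal $c_i(k)$ are updated by a proportional correction of the compliance error. The function names, parameter values, and the returned statistics are likewise hypothetical.

```python
import random


def clip01(x: float) -> float:
    """Assumed form of p: clip the argument to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def simulate(q, q_star=0.7, alpha=0.01, beta=0.01, gamma=0.95,
             steps=5000, seed=0):
    """Simulate n agents under assumed proportional control laws.

    Assumed updates (not taken from the paper's equations):
        m_bar_i <- gamma * m_bar_i + (1 - gamma) * M_i
        C       <- C   + alpha * (Q* - mean_i M_i)
        c_i     <- c_i + beta  * (Q* - m_bar_i)
    Returns the final C, the final c_i, and each agent's empirical
    compliance rate over the second half of the run.
    """
    rng = random.Random(seed)
    n = len(q)
    C = 0.0
    c = [0.0] * n
    m_bar = [0.0] * n
    hits = [0] * n
    tail_start = steps // 2
    for k in range(steps):
        # Agent i complies with probability p(q_i + C(k) + c_i(k)).
        M = [1 if rng.random() < clip01(q[i] + C + c[i]) else 0
             for i in range(n)]
        m_bar = [gamma * m_bar[i] + (1.0 - gamma) * M[i] for i in range(n)]
        C += alpha * (q_star - sum(M) / n)
        c = [c[i] + beta * (q_star - m_bar[i]) for i in range(n)]
        if k >= tail_start:
            hits = [hits[i] + M[i] for i in range(n)]
    rates = [h / (steps - tail_start) for h in hits]
    return C, c, rates


if __name__ == "__main__":
    C, c, rates = simulate([0.2, 0.4, 0.6, 0.8])
    print("global signal:", round(C, 3))
    print("individual signals:", [round(x, 3) for x in c])
    print("empirical compliance:", [round(r, 3) for r in rates])
```

Under these assumptions, agents with a lower proclivity $q_i$ end up with a larger individual signal, which is the personalised pricing effect the text describes; the sketch makes no claim about the convergence rates established in the paper.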
The reason to use both a global and an individual control signal, as opposed to just an individual or a global one, is that these two feedback signals achieve different complementary goals: • Fairness: Due to differences in individual behaviour, some agents are going to comply with rules less than others. This means that if only a global shared signal were used to control the behaviour of multiple agents, the signal would be driven up by the behaviour of the least complying users, and this would result in an unfair price for the most virtuous agents. On the other hand, the introduction of a personalised cost ensures that individuals are going to be priced according to their own behaviour (e.g., the less you comply the more you are going to pay, and vice versa). • Distributed trading of compliance levels: While the presence of a global cost is not necessary to guarantee the desired level of average compliance in the presented framework (because if every agent's compliance signal were equal to the target value Q * then the overall average compliance would be Q * ), its absence would make the system vulnerable to the repeated misbehaviour of malicious agents who purposely attempt to drive down the compliance level. The introduction of a global signal ensures that the system is able to achieve the desired level of compliance even in the presence of this kind of disturbance. In effect, the global cost allows compliant agents to compensate for non-compliant ones. This aspect is further explored in Section VI. • Pricing attacks: In traditional pricing models, even one nefarious agent could, in principle, drive up the cost for all agents simply by misbehaving. In view of the previous comment, a natural concern is that similar effects might be possible in our schemes. Fortunately, in our scheme such attacks are not possible. 
Even though non-compliant agents may drive up the control signal $C(k)$ and hence drive up the cost of the bond, compliant agents will always receive their full deposit after expiration of a contract, leaving them unaffected by the increased price. On the other hand, miscreants would continue to lose tokens while they drive $C(k)$ to a high value. Furthermore, in the event that miscreants prevent the desired levels of compliance being reached, $c_i(k)$ may in fact tend to zero, further rewarding compliant agents.
Remark: Before moving on, we want to point out that, since we are considering a DLT as the communication backbone of the whole architecture, the loss of a token is recorded on each agent's copy of the ledger. This means that everyone is aware at all times of the behaviour $M_i(k)$ of every other actor in the scheme (as the loss of the token implies non-compliance at time $k$). Therefore, equations (3) can be validated by each user individually. Also, notice that equations (3) are consistent with the scenarios described in Section III.
In this section we provide a theoretical analysis of the convergence properties of the stochastic processes $\{C(k), c_i(k)\}$. We begin by noticing that $\bar{M}_i(k)$ satisfies the following recursion formula We also define the function Let $\mathcal{F}(k)$ be the $\sigma$-algebra generated by the random variables $\{M_i(j) : 1 \le i \le n, 1 \le j \le k\}$. Conditioned on $\mathcal{F}(k)$, the Bernoulli random variables $\{M_i(k+1)\}$ ($i = 1, \ldots, n$) are independent and the distribution of $M_i(k+1)$ is Before proceeding we note from this preamble that we chose a specific form for the probability function $p$. However, we emphasize that this choice of a single defined function is merely to aid exposition and that our results can be extended to allow personalised probability functions $\{p_i\}$ for each agent, where each of these functions is a non-decreasing, uniformly Lipschitz function.
Our main result is a concentration of probability for the time-averaged compliance variables $\bar{M}_i(k)$ around the target value $Q^*$ in the regime Accordingly we introduce a small parameter $\epsilon$ and a window size parameter $w$, and we allow $\alpha, \beta, \gamma$ to scale with $\epsilon, w$ as follows: where $\alpha_0, \beta_0$ are fixed constants. We will also assume that the parameter $\beta_0$ satisfies the following condition:
Theorem 1. Suppose that the processes $\{C(k), c_i(k), \bar{M}_i(k)\}$ satisfy the recursion equations (3) and (6) and the parameters $\alpha, \beta, \gamma$ satisfy the relations (7) and (8). Then there are positive constants $B, \epsilon_0$ such that for all $\epsilon < \epsilon_0$, for all initial values $\{c_i(0), C(0)\}$ satisfying $0 \le q_i + C(0) + c_i(0) \le 1$, and for all $\delta > 0$ and $i \in \{1, \ldots, n\}$,
Remark 1: The result of Theorem 1 says that the long-run average compliance level $\bar{M}_i(k)$ ultimately stays close to the target value $Q^*$, for every agent $i$. The crucial term is the factor on the right side of (9). By choosing $\epsilon$ sufficiently small the deviation between average compliance and $Q^*$ can be reduced to any desired level for all agents.
Remark 2: Although we chose a specific form for the probability function $p$, our results can be extended to allow personalised probability functions $\{p_i\}$ for each agent, where in each case $p_i : \mathbb{R} \to [0, 1]$ is a non-decreasing uniformly Lipschitz function.
We will establish a bound on the difference $|\bar{M}_i(k) - Q^*|$ for each agent $i = 1, \ldots, n$. Accordingly we fix the index $i$ and introduce the variables and we define the $\mathbb{R}^2$-valued process Then the equations (3) and (6) imply that $Y(k)$ is measurable with respect to $\mathcal{F}(k)$ and satisfies the recursion relation is an $\mathbb{R}^2$-valued martingale difference, and $G(k+1) = (G_1, G_2) \in \mathbb{R}^2$ is bounded and measurable with respect to $\mathcal{F}(k+1)$.
Specifically we have Clearly G(k) is uniformly bounded and the function h is uniformly Lipschitz on R 2 , so there are constants K 1 , L such that We also define the associated deterministic dynamical system y(k) = (y 1 (k), y 2 (k)) on R 2 by y(k + 1) = y(k) + h(y(k)) (19) and observe that the system (19) has the unique fixed point y * = (Q * , Q * ) T . There is K 2 > 1 such that for all sufficiently small and all y(0) ∈ [0, 1] 2 , the sequence y(k) defined by (19) satisfies b) There are constants τ, B 1 < ∞ such that for all sufficiently small and all y(0) satisfying y 1 (0) ∈ [0, 1] and |y 2 (0)| ≤ 2K 2 , the sequence y(k) defined by (19) satisfies Lemma 2 will be proved in the Appendix. First we note that ξ i (k + 1) = 0 if X i (k) ≤ 0 or X i (k) ≥ 1, so that Y = y whenever Y is outside the square [0, 1] 2 . Now suppose that 0 ≤ Y 2 (k − 1) ≤ 1, Y 2 (k) ≤ 0 and define y(j) by the system (19) with y(0) = Y (k). It follows that for all j such that y 2 (l) ≤ 0, all 0 ≤ l ≤ j. Equation (20) implies that for all j such that y 2 (l) ≤ 0, all 0 ≤ l ≤ j, while (21) implies that y 2 (j) ≥ 0 for some j = O( −1 ). Therefore This bound applies to any excursion of the process Y (k) into the region Y 2 ≤ 0, and therefore we deduce that Note that 0 ≤ Y 1 (k) ≤ 1 for all k, and therefore Y (k) ≤ 1 + 4K 2 2 1/2 for all k. We now complete the proof of Theorem 1 using ideas and techniques from [29] , Chapter 9. We will use the norm The next Lemma addresses the martingale part of (12); its proof appear in the Appendix. We now use Lemma 3 to bound the difference between the process Y (k) and the sequence y(k). From (12) we deduce that Assuming Y (0) = y(0) we get θ(j + 1) + and therefore We now define T = −1 τ (where τ was introduced in Lemma 2(b)), and by applying the discrete Grönwall inequality we deduce that (29) For any k ≥ 1 we write k = mT + j where m, j are integers and 0 ≤ j ≤ T − 1. We define y(l) as the solution of (19) with initial condition y(0) = Y (k). 
Combining the bound (22) with (21) we deduce that By iterating this bound and using (22) we deduce that Also using Markov's inequality we get Therefore (9)
Figures 3 and 4 show the results of the simulations in scenarios I and II. Even by visual inspection it is clear that the lack of an individual signal to control the agents' behaviour leads to unfair results: while the overall compliance converges to $Q^*$, this is achieved at the expense of the users that would behave better under normal circumstances (i.e., the users with larger $q_i$), who are forced to comply with a higher probability than $Q^*$ in order to compensate for the behaviour of the less compliant agents. Of course, this is undesirable, and the use of the personalised cost, as shown in Figure 4, tackles this problem by adjusting the individual price depending on the past behaviour of each agent. While in scenarios I and II we explored how the lack of an individual cost leads to unfair results, it is less clear why a global cost is needed at all. In fact, equations (3) show that, when $C(k)$ is set to zero, the personalised control signals would be sufficient to drive the average behaviour to the desired level of compliance. Nevertheless, without the global cost, the system might fail to achieve the desired target for compliance in scenarios where, for some reason, a certain number of agents repeatedly fail to comply with rule E. This could be due to malfunctions or malicious behaviour. This is highlighted in Figure 5, related to scenario III, where 10% of the agents, for $k \le 100$, do not comply with rule E and the system is not able to achieve the desired level of compliance $Q^*$.
Fig. 3. Compliance control using only the global signal. Compliance is enforced but at the expense of some agents that have to comply more than others.
Compliance is achieved and the use of personalised signals makes it so that every agent contributes in a fair way.
On the other hand, in scenario IV, shown in Figure 6, it is possible to see that the presence of the global signal corrects this disturbance, thus making the system more robust to malfunctions and malicious behaviour (of course, the drawback is that the honest agents will have to comply more in order to compensate for the misbehaviour of the non-compliant users).
In this paper we explored the use of a feedback control system to regulate the behaviour of stochastic agents and to enforce the desired level of compliance, both globally and individually. The use of personalised feedback signals takes into account the behaviour of each agent and leads to fair regulation, with respect to each individual base compliance $q_i$, whereas the global signal increases the robustness of the control system to malfunctions and malicious behaviour. We proved a theorem that establishes that the averaged compliance of each agent, under the proposed regulation scheme, will accumulate around the target compliance $Q^*$, and finally we validated our results through extensive Monte Carlo simulations. As future lines of research, we intend to provide theoretical results for the robustness of the proposed compliance control against malicious actors, and to explore different formulations of fairness that include, as an example, the economic status of each agent. Finally, we intend to extend our framework by using elements of game theory to take into account more complex scenarios.
Ferraro, Zhao and Shorten are funded in part by the IOTA Foundation, by EPSRC project EP/V018450/1, and by Science Foundation Ireland grant 16/IA/4610 respectively.
Finally we address the question of convergence of y(t) to z(t). It remains to show that the function y(t) satisfies (20) and (21) . For t = k (k integer) the function y can be written Letting j = −1 s , for all j ≤ s ≤ (j + 1) there is some a ∈ [0, 1] such that −1 s = ay(j) + (1 − a)y(j + 1) and therefore for some constant K 5 .
Therefore we have the bound

We now bound the difference between $y$ and $z$ as

The Grönwall inequality now yields the bound

Each excursion outside the square $[0, 1]^2$ has duration less than $\tau$, so (38) and (51) imply that (20) holds with K 2 = K 2 + K 5 τ e Lτ . Now suppose $y(0)$ satisfies $y_1(0) \in [0, 1]$ and $|y_2(0)| \le 2K_2$. Let $z(t)$ be the solution of (36) with $z(0) = y(0)$. For $\epsilon$ sufficiently small we have 2K 2 ≤ 3K 2 , and therefore from (46) and (51)

work, and was also the holder of a Marie Curie Fellowship. In 1996 he was invited to work as a visiting fellow at the Center for Systems Science, Yale University, commencing a long-standing research collaboration with Professor K. S. Narendra on the study of switched systems. Since returning to Ireland in 1997 as the recipient of a European Presidential Fellowship, Professor Shorten has been active in a number of theoretical and applied research areas including: computer networking; classical automotive research; collaborative mobility (including smart transportation and electric vehicles); as well as basic control theory and linear algebra.
Professor Shorten is a co-

[1] Analytics for the sharing economy: Mathematics, Engineering and Business perspectives
[2] Distributed Ledger Technology for Smart Cities, the Sharing Economy, and Social Compliance
[3] Hyperbolic discounting
[4] The mathematics of Internet Congestion Control
[5] Electric and Plugin Hybrid Vehicle Networks: Optimization and Control
[6] Delay-tolerant stochastic algorithms for parking space assignment
[7] On classical control and smart cities
[8] A model-based dynamic toll pricing strategy for controlling highway traffic
[9] A traffic congestion avoidance algorithm with dynamic road pricing for smart cities
[10] Dynamic traffic congestion pricing mechanism with User-Centric considerations
[11] Transactive Control in Smart Cities
[12] Residential transactive control demonstration
[13] Analytics and transactive control design for the pacific northwest smart grid demonstration project
[14] iParker-A New Smart Car-Parking System Based on Dynamic Resource Allocation and Pricing
[15] Transactive control of air conditioning loads for mitigating microgrid tie-line power fluctuations
[16] Smart buildings can help smart grid: Transactive controls
[17] Transactive control: a framework for operating power systems characterized by high penetration of distributed energy resources
[18] CTS2M: concurrent task scheduling and storage management for residential energy consumers under dynamic energy pricing
[19] Decentralized cloud-SDN architecture in smart grid: A dynamic pricing model
[20] A hierarchical transactive control architecture for renewables integration in smart grids: Analytical modeling and stability
[21] Transactive control of commercial buildings for demand response
[22] Security and privacy on blockchain
[23] The Tangle-Version 1.4.3
[24] Equilibria in the Tangle
[25] On the resilience of dag-based distributed ledgers in iot applications
[26] Decentralized Assignment of Electric Vehicles at Charging Stations Based on Personalized Cost Functions and Distributed Ledger Technologies
[27] On the stability of unverified transactions in a DAG-based Distributed Ledger
[28] A next-generation smart contract and decentralized application platform
[29] Stochastic approximation: a dynamical systems viewpoint

APPENDIX

1) Proof of Lemma 2: For $t \ge 0$ we define the function $y(t)$ as the linear interpolation of the sequence defined at values { −1 j} by

We will show shortly that as $\epsilon \to 0$ the function $y(t)$ converges to $z(t) = (z_1(t), z_2(t))$, which is the solution of the differential equation

Before showing this property we will first show that the solution $z(t)$ converges to the fixed point $y^* = (Q^*, Q^*)$ in a suitably uniform way. The system (36) changes at the boundaries $z_2 = 0$ and $z_2 = 1$, so in order to apply linear stability analysis we need to show that the solution eventually resides in the square $[0, 1]^2$. Inspection of the system (36) shows that the solution follows a trajectory that spirals clockwise around the fixed point $y^* = (Q^*, Q^*)$. The component $z_1$ remains within the range $0 \le z_1 \le 1$, so the set $0 \le z_1 \le 1$ is invariant for the system (36). Furthermore, in the region $\{z_2 \ge 1\}$ the solution of (36) has the form

If the solution enters the region $\{z_2 \ge 1\}$ from the region $\{z_2 < 1\}$ then we can take $z_2(0) = 1$ in (37). In this case (37) shows that $z_2(t) \le 1 + \beta_0$ while the solution remains in the region $\{z_2 \ge 1\}$, and also $z_2(t) \le 1$ for some $t \le w^{-1}(1 - Q^*)^{-1}$; that is, the solution must exit from the region $\{z_2 \ge 1\}$ by this time. Hence the duration of any excursion into the region $\{z_2 \ge 1\}$ is at most $w^{-1}(1 - Q^*)^{-1}$, and during this excursion the solution remains bounded, satisfying $1 \le z_2 \le 1 + \beta_0$.
Similar reasoning for the region $\{z_2 \le 0\}$ allows us to conclude that there are constants $\tau_1$ and $K_2 > 1$ such that if the initial value $z(0)$ lies within the region $\{0 \le z_2 \le 1\}$ and the solution subsequently leaves this region, then the solution $z(t)$ returns to the region $\{0 \le z_2 \le 1\}$ before time $\tau_1$, and during its excursion remains within the region

Furthermore, by using again (37) and the corresponding solution in the region $z_2 < 0$ we can conclude that there are constants $\tau_1$, $K_2$ such that if the initial value $z(0)$ lies within the region $\{-3K_2 \le z_2 \le 3K_2\}$, then the solution $z(t)$ returns to the region $\{0 \le z_2 \le 1\}$ before time $\tau_1$, and during its excursion remains within the region

We now consider separately the cases $Q^* \ge 1/2$ and $Q^* < 1/2$. Suppose first that $Q^* \ge 1/2$, and consider the function

Note that $\dot{V} = -2\beta_0 w (z_1 - Q^*)^2$ in the square $[0, 1]^2$, so $V$ is a Lyapunov function in the square $[0, 1]^2$. We define

The condition $\beta_0 \le 1$ implies that $E \subset [0, 1]^2$. Therefore if the solution $z(t)$ enters $E$ then it will remain thereafter inside the square $[0, 1]^2$, and thus its future evolution is determined by the linear system

This linear system can be written in matrix form as follows:

It is easy to see that the matrix $A$ is stable with eigenvalues

Therefore the condition $\beta_0 > 0$ implies that $\lambda_\pm$ have negative real parts, and therefore the solution of (41) converges exponentially to the fixed point $(Q^*, Q^*)$. It remains to show that the solution $z(t)$ enters the set $E$ within a bounded time. Accordingly we define three closed subsets of the square $[0, 1]^2$ as follows:

It follows from the definition of $E$ that

Recall again that the solution spirals clockwise around the fixed point $y^* = (Q^*, Q^*)$. If $z(0) \in S_1$ then $z(t)$ must eventually reach $S_2$, either directly from $S_1$ or after an excursion into the region $z_2 > 1$. We define $\tau_{12}$ to be the supremum over all starting points $z(0) \in S_1$ of the time until first entering the set $S_2$.
These times depend continuously on $z(0)$ and $S_1$ is compact, therefore $\tau_{12} < \infty$. Similarly $\tau_{23} < \infty$ is the maximum time to reach $S_3$ starting from $S_2$, and $\tau_{31} < \infty$ is the maximum time to reach $S_1$ starting from $S_3$. Therefore there is $\tau_2 \le \tau_{12} + \tau_{23} + \tau_{31}$ such that starting from any point $z(0)$ in $[0, 1]^2$, the solution $z(t)$ will reach the interval $S_2 \cap S_3$ within the time $\tau_2$. Finally, if the trajectory starts at a point $z(0)$ satisfying $1 \le z_2(0) \le 3K_2$ then it must enter the square $[0, 1]^2$ in the region $S_2$, and so must reach the interval $S_2 \cap S_3$ within the time $\tau_2 + \tau_1$. The same bound holds if the trajectory starts in the region $-3K_2 \le z_2(0) \le 0$.

Since $S_2 \cap S_3 \subset E$ this shows that the solution $z(t)$ enters the set $E$ within time $\tau_2 + \tau_1$, starting from any point in the region $-3K_2 \le z_2 \le 3K_2$. As noted before the system is a contraction in $E$, so for any $r > 0$ there is some $\tau(r) < \infty$ such that for all $z(0) \in E$, $\|z(t) - y^*\| \le r \|z(0) - y^*\|$ for all $t \ge \tau(r)$.

We now define $R_1 = \{z : 0 \le z_1 \le 1,\ -3K_2 \le z_2 \le 3K_2\}$. As noted before, for $\epsilon$ sufficiently small, for all $z(0) \in R_1$ the solution $z(t)$ remains in the region

We define

Then for all $z(0) \in R_1$, we know that $z(t)$ reaches $E$ by latest time $\tau_2 + \tau_1$, and then is contracted after time $\tau(r)$. So we have $\|z(t) - y^*\| \le r \rho \|z(0) - y^*\|$ for all $t \ge \tau(r) + \tau_2 + \tau_1$.

Finally we choose $r \le (2\rho)^{-1}$ and $\tau = \tau(r) + \tau_2 + \tau_1$ and conclude that for all $z(0) \in R_1$,

A similar argument applies when $Q^* \le 1/2$.
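As a worked step for the case $Q^* \ge 1/2$, the choice $r \le (2\rho)^{-1}$ can be substituted into the bound with factor $r\rho$. The display below is a sketch under two assumptions that are not recovered from the source: that the bound is a norm inequality, and that $\rho > 0$. It is offered as the natural consequence of the stated choices, not as the authors' own final display.

```latex
\begin{align*}
\|z(t) - y^*\|
  &\le r\rho\,\|z(0) - y^*\| \\
  &\le (2\rho)^{-1}\rho\,\|z(0) - y^*\|
   = \tfrac{1}{2}\,\|z(0) - y^*\|,
  \qquad t \ge \tau = \tau(r) + \tau_2 + \tau_1,\quad z(0) \in R_1 .
\end{align*}
```

Under these assumptions, every trajectory starting in $R_1$ is at most half as far from the fixed point $y^* = (Q^*, Q^*)$ after time $\tau$ as it was initially.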