On the Instability of Bitcoin Without the Block Reward

Miles Carlsten, carlsten@cs.princeton.edu
Harry Kalodner, kalodner@cs.princeton.edu
S. Matthew Weinberg, smweinberg@princeton.edu
Arvind Narayanan, arvindn@cs.princeton.edu

ABSTRACT

Bitcoin provides two incentives for miners: block rewards and transaction fees. The former accounts for the vast majority of miner revenues at the beginning of the system, but it is expected to transition to the latter as the block rewards dwindle. There has been an implicit belief that whether miners are paid by block rewards or transaction fees does not affect the security of the block chain. We show that this is not the case. Our key insight is that with only transaction fees, the variance of the block reward is very high due to the exponentially distributed block arrival time, and it becomes attractive to fork a “wealthy” block to “steal” the rewards therein. We show that this results in an equilibrium with undesirable properties for Bitcoin’s security and performance, and even non-equilibria in some circumstances. We also revisit selfish mining and show that it can be made profitable for a miner with an arbitrarily low hash power share, even one who is arbitrarily poorly connected within the network. Our results are derived from theoretical analysis and confirmed by a new Bitcoin mining simulator that may be of independent interest. We discuss the troubling implications of our results for Bitcoin’s future security and draw lessons for the design of new cryptocurrencies.

1. INTRODUCTION

The security of Bitcoin’s consensus protocol relies on miners behaving correctly. They are incentivized to do so via mining revenues under the assumption that they are rational entities. Any deviant miner behavior that outperforms the default is thus a serious threat to the security of Bitcoin.

Miners receive two types of revenue: block rewards and transaction fees.
The former accounts for the vast majority of miner revenues at the beginning of the system, but it is expected to transition to the latter as the block rewards dwindle (specifically, they halve every four years). There has been an unexamined belief that, in terms of the security of the block chain (including the incentives of the mining game), it is immaterial whether miners receive (say) 25 bitcoins in each block as a block reward or 25 bitcoins in expectation as transaction fees.

[Note: This is an extended version of our paper that appeared at ACM CCS 2016. Some of the figures have been updated with more accurate versions due to improvements to our simulator.]

Illustrative example (Figure 1). Imagine a population of rational, self-interested miners. Consider a block chain with blocks of exponentially distributed rewards, as we expect when the fixed block reward runs out. A miner has numerous options to consider when mining, but let’s focus on just two possibilities. She could extend the longest chain (Option One), obtaining a reward of 5 and leaving a reward of 0 for the next miner (at least until more transactions arrive). Alternatively, she could fork it (Option Two), obtaining a reward of 55 while leaving a reward of 50 bitcoins unclaimed. The Bitcoin protocol dictates Option One, but quick reasoning suggests that Option Two is better. To reason about this correctly, we must consider which strategies the other miners are using. For instance, if all other miners follow the heuristic of mining on the block they heard about first in the case of a 1-block fork (and if there is no latency in the network), then forking is ineffective, and Option One is clearly superior. On the other hand, since other miners are rational, perhaps they will choose to build on the fork instead of the older block, in which case Option Two would yield more rewards.

[Figure 1: One possible state of the block chain and two possible actions a miner could take.]
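To make the claim about high reward variance concrete, the following minimal Python sketch (an illustration, not part of the authors’ simulator) assumes the normalized model of Section 3.1: fees arrive at rate 1, the network finds one block per time unit in expectation with exponentially distributed inter-arrival times, and every block claims all fees available when it is found. Under these assumptions each block’s fee total is exponentially distributed with mean 1 and standard deviation 1, whereas a fixed block reward has zero variance. The function `simulate_block_fees` and its parameters are hypothetical names chosen for this sketch.

```python
import random
import statistics


def simulate_block_fees(num_blocks, fee_rate=1.0, mean_interarrival=1.0, seed=0):
    """Return the fee total of each of `num_blocks` consecutive blocks.

    Assumes fees accrue continuously at `fee_rate`, blocks are found with
    exponentially distributed inter-arrival times (mean `mean_interarrival`),
    and every block claims all fees that arrived since its parent.
    """
    rng = random.Random(seed)
    fees = []
    for _ in range(num_blocks):
        # random.expovariate takes a rate (1 / mean), not a mean
        gap = rng.expovariate(1.0 / mean_interarrival)
        fees.append(fee_rate * gap)  # everything that accrued during the gap
    return fees


if __name__ == "__main__":
    fees = simulate_block_fees(100_000)
    print("mean fee per block:", round(statistics.mean(fees), 3))
    print("std dev of fees:   ", round(statistics.pstdev(fees), 3))
    # share of "wealthy" blocks carrying more than three times the mean fee
    print("share above 3x mean:", round(sum(f > 3.0 for f in fees) / len(fees), 3))
```

With this many simulated blocks, the sample mean and standard deviation should both come out close to 1, and roughly e^(-3), or about 5%, of blocks should carry more than three times the average fee total. Blocks like these are the natural targets for the forking behavior in the example above.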
Examples like these reveal novel incentive issues that simply don’t arise when block rewards are fixed. The goal of this paper is to understand the potential impact on Bitcoin’s stability by investigating the mining game in the regime where the block reward has dwindled to a negligible amount and transaction fees dominate mining rewards. We find new and surprising incentive issues in a transaction-fee regime, even assuming that transactions (and associated fees) arrive at a steady rate. To be clear: the incentive issues we uncover arise not because transaction fees may arrive erratically, but because the time-varying nature of transaction fees allows for a richer set of strategic deviations that don’t arise in the block-reward model.

At a high level, there is an analogy with pool hopping [22]. With certain mining pool reward schemes, the miner’s expected reward for participation varies over time, depending on how many shares have been contributed since the pool found its last block. The concern is that miners would respond by “hopping” in real time to the pool that maximizes their expected rewards. For another illustration of this theme, consider a future where there are multiple cryptocurrencies with time-varying rewards that can be mined by the same hardware. Perhaps this will give rise to coin-hopping, i.e., miners hopping to the cryptocurrency with the largest transaction fee pool.

Contribution 1: A mining strategy simulator. While we establish a number of theoretical results in Sections 5 and 6, the variety of possible parameters and assumptions makes it infeasible to pose a perfectly accurate game-theoretic model of Bitcoin that is also tractable. To fill the gaps and to confirm our theoretical results, we’ve built a mining strategy simulator.
Theoretical results in simple yet principled models provide good intuition to guide practice, and simulations of more complex scenarios confirm that these results apply to more realistic models where mathematical proofs are intractable.

Miners in our simulation learn over time which strategies are successful using no-regret learning algorithms that iteratively update a probability distribution over strategies (Section 4.2). Our simulator is versatile and allows modeling different numbers of miners, hash power distributions, network latencies, and reward schemes. We show how it allows researchers to quickly prototype and study new settings within this parameter space. The simulator does have limitations: it cannot model mining pools or a non-constant arrival rate of transactions. We have made the simulator open source (see footnote 1).

In addition to the versatility of settings, our simulator allows exploring a large space of mining strategies, defined by the miner’s responses to three questions: which block to extend, how much of the outstanding transactions to include in the block, and when to publish found blocks. We define a formal language to compactly express any strategy in this space (Section 3.3).

Contribution 2: Undercutting attacks. The focus of this paper is on analyzing deviant mining strategies in the transaction-fee regime that can harm Bitcoin’s security. We begin with the observation that if there is a 1-block fork, it is more profitable for the next miner to break the tie by extending the block that leaves the most available transaction fees rather than the oldest-seen block. We call this strategy PettyCompliant. Once any non-zero fraction of miners is PettyCompliant, it enables various strategies that are more aggressive and harmful to Bitcoin consensus.
We call this the undercutting attack: miners actively fork the head of the chain and leave transactions unclaimed in the hope of incentivizing PettyCompliant miners to build on their block. In some scenarios, our simulation reveals a non-equilibrium with increasingly aggressive undercutting. But with an expanded strategy space and suitable assumptions, we are able to prove that an equilibrium exists. However, it is one where miners include only a fraction of available transactions in their blocks. This results in a backlog of transactions whose size grows indefinitely with time. We confirm this result using simulation.

[Footnote 1: https://github.com/citp/mining_simulator]

Accurately predicting the steady-state mining behavior requires modeling a vast number of variables, such as miners’ cost structure, and is not the goal of our work. Instead, our results can be seen as an informal “lower bound” on the departures from compliant behavior that are likely in a transaction-fee regime. We can realistically predict that PettyCompliant miners will arise, and that the existence of such miners opens the field for various more aggressive strategies (Section 5).

Contribution 3: Revisiting selfish mining. We revisit the selfish mining strategy of Eyal and Sirer [9] and show that, contrary to intuition, it performs even better in the transaction-fee regime than in the block-reward regime. Next, we propose a more sophisticated selfish mining strategy that accounts for the non-uniformity of rewards and outperforms both default mining and “classic” selfish mining. Worse, unlike classic selfish mining, this strategy works for miners with arbitrarily low hash power and regardless of their connectedness in the Bitcoin network.
Moreover, the attack is profitable as soon as it is deployed, whereas classic selfish mining only becomes profitable after a two-week difficulty adjustment period, which arguably gives the community a crucial window of time to detect and respond to such an attack [10]. We validate these results via both theory and simulation (Section 6).

Impact on Bitcoin security. If any of the deviant mining strategies we explore were to be deployed, the impact on Bitcoin’s security would be serious. At best, the block chain will have a significant fraction of stale or orphaned blocks due to constant forks, making 51% attacks much easier and increasing the transaction confirmation time. At worst, consensus will break down due to block withholding or increasingly aggressive undercutting.

This suggests a fundamental rethinking of the role of block rewards in cryptocurrency design. Nakamoto appears to have viewed the block reward as a necessary but temporary evil to achieve an initial allocation of bitcoins in the absence of a central authority, with the transaction fee regime being the ideal, inflation-free steady state of the system. But our work shows that incentivizing compliant miner behavior in the transaction fee regime is a significantly more daunting task than in the block reward regime. Perhaps instead, designers of new cryptocurrencies must resign themselves to the inevitability of monetary inflation and make the block reward permanent. Transaction fees would still exist, but merely as an incentive for miners to include transactions in their blocks.

2. RELATED WORK

Several recent works analyze incentives in Bitcoin mining. Some examples include [12] and [8], which analyze how strategic mining pools may attack competing pools in various ways, and [16], which analyzes how strategic Ethereum miners can trick others into wasting their computational power verifying the validity of complex scripts.
Understanding miner incentives in the Bitcoin system is important: there is empirical evidence that miners and mining pools are willing to attack others in order to maximize their own profits (e.g., by launching DDoS attacks against other pools) [24].

Eyal and Sirer develop the selfish mining attack [9], a deviant mining strategy that enables miners to get more than their fair share of rewards. We build on their results in Section 6. Other works, notably Sapirshtein et al. [23], have analyzed selfish mining in more detail using Markov Decision Processes (MDPs). In an MDP, a player moves through a discrete state space and tries to maximize reward (the state-transition function and reward function are probabilistic). This makes it a good fit for modeling Bitcoin mining. In the fixed-reward model, states are discrete. In the transaction fees model, states are continuous, so we cannot apply MDP machinery directly. Still, our analysis takes an MDP-like approach. In more recent work, Kiayias et al. [13] perform a theoretical analysis of various selfish mining strategies in the fixed-reward model, and prove that when miners are sufficiently small, the default mining behavior is an equilibrium.

There is some work on understanding the market for transaction fees and its relation to the block size (i.e., what fees will users have to pay in order for transactions to be included in a block?) [14, 11, 21, 18]. Our work avoids this discussion; we show that undesirable behavior emerges even if the market reaches an equilibrium where transaction fees are non-negligible and arrive steadily and reliably. Interestingly, Möser and Böhme reach the same conclusion as us (that monetary inflation is a preferable mechanism to transaction fees) through very different methods [18].

On the simulation side, numerous prior works have developed simulators for some aspect of Bitcoin.
Some simulators are aimed at aspects of Bitcoin other than strategic mining, such as privacy [3] or the peer-to-peer network [17]. Those developed in [9] and [8] also focus on simulating deviant mining strategies, but our understanding is that these simulators are tailor-made for the specific deviant strategies they wish to test. In comparison, our simulator allows for easy implementation of a broad range of strategies in various environments. Indeed, the versatility of our simulator was crucial for developing intuition for every result in this paper. We have made it open source and hope it will be a useful tool for future research on strategic miner behavior.

3. MODEL AND STRATEGIES

In this section, we cover the model of Bitcoin that we investigate. We will use this model to quickly illustrate how the switch to transaction-fee-dominated rewards may lead to interesting and potentially harmful effects for Bitcoin. We also introduce a formal language for describing Bitcoin mining strategies that we will use throughout the paper.

3.1 Model of the system

Briefly, let us describe the theme of our model before getting into specific details. The goal of this work is not to accurately predict exactly what mining behavior will arise in practice, but instead to uncover incentive issues that arise solely due to the time-varying nature of transaction fees versus block rewards. To this end, our model is intentionally simple, because we want to isolate the effects of time-varying versus fixed rewards. As an example, we will assume that transactions (and their associated fees) arrive at a constant and continuous rate. We make this assumption not because we necessarily predict it will hold in practice, but because without it we can’t guarantee that we’ve isolated time-varying transaction fees as the cause of any incentive issues we uncover.
Put another way, simplifying assumptions only make our results stronger: we claim that strange and undesirable consequences arise even when one is willing to grant those assumptions.

Getting to details, the model of Bitcoin that we analyze describes the system after the block reward has dropped to zero. That is, transaction fees are the only source of revenue for miners, and we model available transaction fees as arriving to the Bitcoin system at a constant rate. Specifically, we assume that for any time interval I of length t, the total sum of transaction fees for transactions announced during I is t (the choice of t instead of ct for some constant c is just normalization). This is different from Bitcoin as it is today, with a large block reward compared to small transaction fees, but this scenario is consistent with the vision of the long-term steady-state behavior of Bitcoin after all bitcoins have eventually been minted.

We also assume that the difficulty is set so that a hash puzzle is solved by someone in the network once per time unit in expectation (this is again just a normalization). Additionally, for simplicity, in our theoretical results and reported simulations we model the network as having no latency (unless otherwise stated). Once a miner publishes a block, all other miners immediately gain knowledge of it. Similarly, once a transaction is announced, all miners immediately learn of its existence. However, our simulator is capable of simulating latency of both types, and we do not see any substantive change in our results as latency changes.

Finally, we assume that when there are R transaction fees available, the miner can choose to include any real-valued amount of transaction fees between 0 and R in their block. That is, transactions are fine-grained enough that a miner can selectively choose a set of transactions whose fees are very close to whatever real-valued target they have in mind.
We believe this is a reasonable approximation due to the large number of transactions per block. We also assume that miners always have space to include all available transactions. If the block size is not large enough to meet demand for transactions, we believe the qualitative content of all our results continues to hold, but the quantitative impact is mitigated. This belief is supported by the following data, taken from the most recent 1000 blocks (roughly one week’s worth) as of July 11, 2016: of these 1000 blocks, 702 are full. Among the full blocks, the total sum of transaction fees ranges from 0.03 BTC to 4.51 BTC. The mean is 0.49 BTC and the standard deviation is 0.25 BTC, more than half the mean. It’s unclear how to extrapolate these data to the future, but it is clear that there will indeed be fluctuation in the available fees that fit in a block. So if the block size is not large enough to meet demand for transactions, then even though the available fees immediately after a block is found will not be zero (as they are in our analysis), they may be significantly lower than (say) ten minutes later. So even though our exact analysis will not apply in this setting, the intuition does carry over.

3.2 What could go wrong? The mining gap

Without a block reward, immediately after a block is found there is zero expected reward for mining but nonzero electricity cost, making it unprofitable for any miner to mine.

[Figure 2: Illustration of Mining Gaps. Miners will only mine when the instantaneous expected reward exceeds the instantaneous cost.]

In order to provide insight into how time-varying rewards could be harmful for Bitcoin, let’s walk through an example.
Imagine that we are in the model previously described, that all miners are using the default compliant strategy (mine on top of the longest chain, include all available transactions, publish immediately), but also that miners have some cost in electricity to run their mining rigs (i.e., running one rig for t units of time costs pt bitcoins worth of electricity). Now, immediately after a block is found, there will be no transactions left in the network to be claimed by a miner making the next block. This means that for the instant following the discovery of a new block, there is actually zero expected reward for mining, but a non-zero electricity cost for doing so!

Figure 2 shows how to extend this reasoning to the time period beyond. Essentially, every instant your rig is running, you claim some expected reward, which increases with the available transaction fees. But every instant your rig is running, you also have to pay a constant amount for electricity. So the expected reward for running your rig won’t exceed the cost of electricity until some minimum amount of transaction fees is available to include. If a is the fraction of the total (effective) hash power that a single rig generates, then s time units after a block is found the rig finds the next block at rate a and would claim the s fees that have accrued, so its expected reward rate is a·s against a cost rate of p. A miner must therefore wait t = p/a time units after a block is found before mining becomes profitable again.

In Appendix A, we discuss the effects of such a mining gap in more detail, and find that it leads to miners mining for a smaller and smaller fraction of the time between the arrival of blocks (with the difficulty dropping to compensate). Clearly, this would have a negative impact on Bitcoin security, as the effective hash power in the network would drop, and it would become easier for a malicious miner to fork. Of course, turning a rig on and off every ten minutes may be practically infeasible.
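The break-even argument for the mining gap reduces to a few lines of arithmetic. The Python sketch below is illustrative and not taken from the paper: it assumes the normalization of Section 3.1 (fees accrue at rate 1, one block per time unit in expectation), a rig holding a fraction a of the hash power with electricity cost p per time unit, and, for the last helper only, that the rest of the network keeps finding blocks at rate 1. All function names are hypothetical.

```python
import math


def reward_rate(s, a):
    """Expected fee income per unit time for a rig with hash share `a`,
    `s` time units after the last block: s fees are waiting (fees accrue
    at rate 1) and the rig finds the next block at rate a."""
    return a * s


def mining_gap(a, p):
    """Time after a block until a rig with hash share `a` and electricity
    cost `p` per unit time breaks even: a * t = p, so t = p / a."""
    if a <= 0:
        raise ValueError("hash power share must be positive")
    return p / a


def prob_block_within_gap(t):
    """Probability that the next block arrives before time t, so the rig never
    becomes profitable during that interval. Assumes (hypothetically) that the
    rest of the network keeps finding blocks at rate 1, with exponential
    inter-arrival times."""
    return 1.0 - math.exp(-t)


if __name__ == "__main__":
    a, p = 0.01, 0.005  # hypothetical: 1% of hash power, cost 0.005 per time unit
    t = mining_gap(a, p)
    print("mining gap (in expected block intervals):", t)
    print("reward rate at the gap:", reward_rate(t, a), "vs cost rate", p)
    print("P(block arrives inside the gap):", round(prob_block_within_gap(t), 3))
```

With these illustrative numbers the gap is half an expected block interval (about five minutes for ten-minute blocks), and under the stated assumption about 39% of inter-block intervals end before this rig would ever switch on.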
Even if such on/off switching is impractical, this analysis illustrates that strategic miners might look for ways to deviate when the default protocol would have them waste electricity mining a near-valueless block.

3.3 Formal language for mining strategies

In the rest of this paper, we focus on mining strategies that always mine within the same cryptocurrency, but may deviate from the default protocol in choosing how to build blocks and what to do with them once they’re found. We consider a variety of known and novel Bitcoin mining strategies. All of these can be formalized into the same general structure. At each instant, every miner makes several distinct decisions:

• Which block to extend.
• How much of the available transactions (and associated fees) to include in the block they are solving.
• For each unpublished block, whether or not to publish.

The first decision is which block to extend. As an example, the default compliant miner chooses to mine on the longest chain that they are aware of, and in the case of multiple blocks that are tied for the longest chain, they will favor mining on the first of these blocks that they became aware of. This decision forms the basis for how a mining strategy will determine which side of a fork it wants to support, or, alternatively, whether the miner wants to create a new fork.

The next decision is how much of the available transaction fees to claim. Again, as an example, the default compliant miner will include in their block all of the unclaimed transaction fees they are aware of.

The final decision is when to publish blocks. When a miner mines a block, only they are aware of its existence. At each moment, miners can choose whether or not to alert the other miners to the block that they have found. This allows for mining strategies where miners intentionally choose not to reveal their blocks (such as selfish mining [9]).
We define the following concepts in order to describe the mining strategies more rigorously. First, for a set of transactions T, we will abuse notation and use T to also denote the total transaction fees included for transactions in T. For a block B, we denote by Tx(B) the set of transactions included in block B, and by Rem(B) the remaining transactions after block B. That is, Rem(B) contains all announced transactions that are not included in B or any of its predecessors (thus, this is a set that varies over time). We will also use Height(B) to denote the height of a block (i.e., the height of a chain that ends at block B), denote by H the height of the current longest chain that has been announced (see footnote 2), and use Owner(B) to denote the miner that produced block B.

When a miner m is deciding which block of height i to extend in the case of a tie, all strategies considered in this paper first select a block that they themselves mined (Owner(B) = m). Also, all strategies in this paper avoid mining multiple blocks at the same height, so if a block with Owner(B) = m at height i exists, it is unique. If m did not produce any blocks at height i, the default client would then select the first block that m became aware of. So we define $\mathrm{Oldest}^m_i$ to be the unique block of height i produced by miner m if it exists, or otherwise the first block of height i that m became aware of. Note that if i = H, then this is the block m would extend using the default strategy. We also define $\mathrm{Most}_i$ to be the block of height i that maximizes the remaining transaction fees; formally, $\mathrm{Most}_i = \arg\max_{B \,:\, \mathrm{Height}(B) = i} \mathrm{Rem}(B)$. Note that while Rem(B) changes over time, the block $\mathrm{Most}_i$ can only change if a new block of height i is published: newly announced transactions are included in none of the existing height-i blocks, so they increase Rem(B) by the same amount for each of them and leave their ranking unchanged. Finally, we denote by $\mathrm{Most}^m_i$ the block of height i produced by m (if it exists), or otherwise the block of height i that maximizes the remaining transaction fees.
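These definitions can be prototyped directly on an in-memory block tree. The Python sketch below is an illustration under the zero-latency model of Section 3.1, not the simulator’s code: every miner sees a block at the moment it is published, fees are announced at rate 1 from time 0 (so `now` units of fees have been announced by time `now`), and, following the notational abuse above, Rem(B) is represented by its fee total. The `Block` class and all helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    name: str
    parent: Optional["Block"]
    owner: str
    fees: float     # total fees of Tx(B), the transactions included in B
    seen_at: float  # publication time; with zero latency every miner sees B then

    @property
    def height(self) -> int:
        return 0 if self.parent is None else self.parent.height + 1

    def claimed(self) -> float:
        """Total fees included in B and all of its predecessors."""
        total, block = 0.0, self
        while block is not None:
            total += block.fees
            block = block.parent
        return total


def rem(block: Block, now: float) -> float:
    """Rem(B) at time `now`: announced fees not included in B or its predecessors."""
    return now - block.claimed()


def at_height(blocks: List[Block], i: int) -> List[Block]:
    return [b for b in blocks if b.height == i]


def oldest(blocks, m, i):
    """Oldest^m_i: m's own block at height i if it exists, else the first one m saw."""
    own = [b for b in at_height(blocks, i) if b.owner == m]
    if own:
        return own[0]  # unique: no strategy here mines two blocks at one height
    return min(at_height(blocks, i), key=lambda b: b.seen_at)


def most(blocks, i, now):
    """Most_i: the height-i block that leaves the most remaining fees."""
    return max(at_height(blocks, i), key=lambda b: rem(b, now))


def most_for(blocks, m, i, now):
    """Most^m_i: m's own height-i block if it exists, else Most_i."""
    own = [b for b in at_height(blocks, i) if b.owner == m]
    return own[0] if own else most(blocks, i, now)


def default_target(blocks, m):
    """Block a DefaultCompliant miner m extends: Oldest^m_H."""
    h = max(b.height for b in blocks)
    return oldest(blocks, m, h)


if __name__ == "__main__":
    genesis = Block("G", None, "nobody", 0.0, 0.0)
    a = Block("A", genesis, "alice", 2.0, 2.0)  # claims all 2.0 fees available at t = 2
    c = Block("C", genesis, "carol", 1.0, 2.5)  # forks G at t = 2.5, claims only 1.0
    blocks = [genesis, a, c]
    now = 3.0
    print("Rem(A), Rem(C):", rem(a, now), rem(c, now))
    print("Oldest for dave:", oldest(blocks, "dave", 1).name)
    print("Most_1:", most(blocks, 1, now).name)
    print("Most for alice:", most_for(blocks, "alice", 1, now).name)
```

In this toy fork, a miner without a block at height 1 would extend A under the default first-seen rule, while $\mathrm{Most}_1$ is C because C leaves more fees on the table; this is exactly the tie-breaking difference that separates DefaultCompliant from a PettyCompliant miner.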
[Footnote 2: So, for instance, if a chain of height 2 has been announced, but some miner is privately storing a chain of length 10, we would define H = 2.]

We can now formally define the mining strategies we consider. We model strategies as time-driven (rather than event-driven): in every infinitesimally small time step, the miner must decide which block to extend (denoted by Mining(m)), what set of transactions to include, and, for each of their own unpublished blocks, whether to publish. Note that by publishing a block B, we mean ensuring that every node in the network is aware of B and all its predecessors; we aren’t concerned with exactly what physical measures m takes to ensure this. In this language, the default mining strategy would be formalized as follows:

DefaultCompliant: The default Bitcoin mining strategy: include all available transactions, mine on the end of the longest chain, choose the older block in a tie, and publish all blocks.
Which Block: Mining(m) = $\mathrm{Oldest}^m_H$.
How much: include Rem(Mining(m)).
Publish(B)?: yes.

4. MINING STRATEGY SIMULATOR

In order to analyze more clearly what the game-theoretic landscape will look like once the Bitcoin mining incentive becomes transaction-fee based instead of block-reward based, we have developed a versatile Bitcoin mining strategy simulator (see footnote 3). Here we discuss the strategies our simulator is capable of implementing, the process by which our simulator can explore a strategy space, the configurable parameters of the simulator, and its limitations.

4.1 Strategies, Rounds, and Games

We first describe the basic units of our simulator and how they interact with each other before getting into details.

Strategies. The simulator is designed to be able to run any strategy that fits the strategy space detailed in Section 3.3. That is, every strategy is fully defined by a function that outputs a block to extend, a set of transactions to include, and a rule to decide whether to publish any found blocks.
All of these functions may take as input any public information, including all published blocks and all announced transactions.

Rounds. Our simulator is time-driven, as opposed to event-driven. We made this decision because we want it to be easy to add new strategies to the simulator. In an event-driven simulation, new strategies would be limited by the current list of possible events. In our time-based simulations, however, any strategy that details how to make the decisions above at any moment can be easily implemented.

A round is the smallest unit of time in our simulator (currently, 1/600 of the time it takes for the entire network to find a block). During a round, every miner first takes as input the block chain (that they’re aware of) and all transactions (that they’re aware of), and decides which block to (try to) extend and which transactions to include. Then there is a random check (as a function of that miner’s hash rate and the current network difficulty) to determine whether the miner successfully found a block. Then, the miner decides which unpublished blocks to publish. The duration of a round is a configurable parameter, which we discuss in Section 4.3.

[Footnote 3: While this is the original motivation for developing our simulator, it is indeed capable of simulating a non-zero block reward as well; more on that in Section 4.3.]

Games. A game involves setting parameters such as the number of miners, their strategies and hash power, etc. (all detailed in Section 4.3). Once these parameters are set, a game runs for several rounds and keeps track of the rewards earned by each miner.

Simulations. A simulation might consist of a single game (to see how certain strategies fare against each other), or several games with parameter adjustments in between. For example, in order to model miners who learn over time, we have them play several games and decide which strategies to use in future games based on the results of past games.
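The round structure just described can be illustrated with a toy Python loop. This is a minimal sketch, not the authors’ C++ simulator: it assumes zero latency, strategies that always publish immediately, a per-round success probability of $\alpha_m / r$ for a miner with hash power share $\alpha_m$ when the network finds a block every r rounds in expectation, and fees arriving at 1/r per round (the normalization of Section 3.1). `play_game`, `default_compliant`, `longest_chain_tip`, and the `Block` fields are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Block:
    parent: Optional["Block"]
    owner: int            # -1 for the genesis block
    fees: float           # fees included in this block
    height: int
    published_round: int
    claimed: float        # fees claimed by this block and all of its ancestors


def longest_chain_tip(published):
    """First-published block at the maximum height."""
    top = max(b.height for b in published)
    return min((b for b in published if b.height == top),
               key=lambda b: b.published_round)


def default_compliant(published, announced, miner):
    """Simplified DefaultCompliant: extend the first-published longest-chain block
    and include every fee not yet claimed on that chain (the own-block
    tie-break of Oldest^m_H is omitted for brevity)."""
    parent = longest_chain_tip(published)
    return parent, announced - parent.claimed


def play_game(hash_power, strategies, rounds, rounds_per_block=600, seed=0):
    """Run one game and return each miner's fee income from the longest chain."""
    if abs(sum(hash_power) - 1.0) > 1e-9:
        raise ValueError("hash power shares must sum to 1")
    rng = random.Random(seed)
    fee_per_round = 1.0 / rounds_per_block  # fees arrive at a constant rate
    published = [Block(None, -1, 0.0, 0, 0, 0.0)]
    announced = 0.0
    for r in range(1, rounds + 1):
        announced += fee_per_round
        # 1) every miner picks a block to extend and how many fees to include
        choices = [strategies[m](published, announced, m) for m in range(len(hash_power))]
        # 2) random check: miner m finds a block with probability alpha_m / rounds_per_block
        found = [Block(parent, m, fees, parent.height + 1, r, parent.claimed + fees)
                 for m, (parent, fees) in enumerate(choices)
                 if rng.random() < hash_power[m] / rounds_per_block]
        # 3) publication: every strategy in this sketch publishes immediately
        published.extend(found)
    rewards = [0.0] * len(hash_power)
    block = longest_chain_tip(published)
    while block.parent is not None:
        rewards[block.owner] += block.fees
        block = block.parent
    return rewards


if __name__ == "__main__":
    alphas = [0.5, 0.3, 0.2]
    print(play_game(alphas, [default_compliant] * 3, rounds=6_000, rounds_per_block=60))
```

Passing the strategy as a plain function of public state mirrors the design choice described above: a new strategy only needs to answer the per-round questions, without registering new event types.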
In principle, any parameter can be adjusted between games.

4.2 Strategy exploration

For several of our simulations we want miners to use the strategies that are doing the best, to simulate how strategic miners might adapt over time. To accomplish this, we run several games, with hundreds of miners in each game. Miners choose strategies in proportion to how successful those strategies have historically been. Formally, miners in our simulator perform no-regret learning, a standard notion of learning that is popular in game-theoretic contexts. This is due to the fact that in any repeated game where each player separately performs no-regret learning, the repeated play converges to a coarse correlated equilibrium [1, 2]. Moreover, numerous simple no-regret learning algorithms are known that converge quickly (i.e., in a number of rounds sublinear in the number of possible strategies) [5, 6, 4, 15]. If a miner has no regret, their total reward across all of time is at least as good as if they had instead picked “the best” strategy and used it in every game. Similarly, a coarse correlated equilibrium is a joint distribution over strategy profiles such that every miner gets at least as much expected payoff by following the equilibrium as by deviating to any possible strategy.

These learning algorithms all maintain a weight for every strategy, and adjust the weights of the strategies from game to game depending on how well they’re doing. Our simulator offers two alternatives for these update rules. The first alternative is an exact implementation of the EXP3 algorithm for learning with adversarial bandits [5, 6]. This update rule provides a theoretical guarantee on the regret of each miner as a function of the number of games played and a tunable parameter ε in the update rule. The second alternative is based on the multiplicative weights update rule (MWU) for learning with experts [4, 15].
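The first of these update rules, EXP3, can be sketched in a few lines of Python. This is a textbook-style sketch of the standard EXP3 update, not the simulator’s C++ code; it assumes each game’s reward for the chosen strategy has been rescaled to [0, 1], and it uses ε both as the exploration rate and in the exponential weight update. The class and function names are hypothetical, and `customary_eps` applies the customary setting $\varepsilon \approx \sqrt{n \ln n / T}$ given under “Learning parameter” in Section 4.3.

```python
import math
import random


class Exp3:
    """Standard EXP3 learner over `n_strategies` arms for adversarial bandits.

    Each game the miner samples one strategy, observes only that strategy's
    reward (assumed rescaled to [0, 1]), and raises that strategy's weight
    using an importance-weighted estimate of the reward.
    """

    def __init__(self, n_strategies, eps, seed=0):
        if not 0.0 < eps <= 1.0:
            raise ValueError("eps must lie in (0, 1]")
        self.n = n_strategies
        self.eps = eps
        self.log_weights = [0.0] * n_strategies  # log scale avoids overflow over long runs
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the weight-proportional distribution with uniform exploration."""
        top = max(self.log_weights)
        weights = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(weights)
        return [(1.0 - self.eps) * w / total + self.eps / self.n for w in weights]

    def choose(self):
        return self.rng.choices(range(self.n), weights=self.probabilities())[0]

    def update(self, chosen, reward):
        if not 0.0 <= reward <= 1.0:
            raise ValueError("EXP3 expects rewards in [0, 1]")
        p = self.probabilities()[chosen]  # probability with which `chosen` was drawn
        estimate = reward / p             # unbiased estimate of the chosen arm's reward
        self.log_weights[chosen] += self.eps * estimate / self.n


def customary_eps(n_strategies, n_games):
    """eps ~ sqrt(n ln n / T), capped at 1/2 to stay within [0, 1/2]."""
    return min(0.5, math.sqrt(n_strategies * math.log(n_strategies) / n_games))


if __name__ == "__main__":
    T = 5000
    learner = Exp3(2, customary_eps(2, T), seed=3)
    for _ in range(T):
        k = learner.choose()
        learner.update(k, 0.8 if k == 0 else 0.2)  # strategy 0 is the better one
    print([round(q, 3) for q in learner.probabilities()])
```

In this toy run, the probability of picking the better strategy should approach 1 − ε/2, since the uniform exploration term never vanishes. Keeping weights in log space is a design choice for numerical safety; it does not change the update rule.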
We find that MWU is com- putationally expensive, so we use a less expensive proxy in- stead. That means there is no theoretical guarantee on the regret bounds. But in practice this update rule is signifi- cantly faster and does converge quickly to coarse correlated equilibrium. For a further discussion of these update rules, see Appendix B. All of the figures included in this paper were generated from simulations using EXP3, so they come with a theo- retical guarantee that all miners in the simulation have no regret. 4.3 Versatility Our simulator has many configurable parameters: Strategies. Just to reiterate: every miner in our simulator is assigned a time-driven strategy that chooses which block to extend, how many transactions to include, and whether to publish any found blocks. Any strategy that fits this framework can be implemented in the simulator. To design a new strategy, a user would create a new function that takes as input the current public state of Bitcoin network (the blockchain and available transaction fees), and the miner who is using the strategy. The function would then use this information to determine which block to extend, and how many of the transaction fees to include in the next block. Finally, the user would go to the publication rules and add a rule for how the strategy should choose whether or not to publish any found blocks. Hash Power. Every miner m is assigned a hash power αm. Any number of miners, and any αm such that ∑ m αm = 1 can be supported. Round Duration. The size of a round can be set so the network finds a block every r rounds in expectation, for any r > 0. Rewards. At the end of each game, miners are rewarded based on their blocks within the longest chain. The reward they receive is b per block (fixed reward), plus any trans- action fees. a transaction fees accrue in the system every round. Both of these parameters are configurable. Costs. There is a configurable parameter cm for every miner m that denotes the cost (i.e. 
in electricity) for miner m to mine. For our simulations, we always set c_m = 0 because we are not studying this aspect of mining.

Latency. If desired, latency can be introduced to the simulation. There is a configurable parameter λ such that when blocks are published, it takes λ rounds before other miners are aware of the block's existence. Latency in hearing about transactions can also be implemented; it is currently easiest to do this by modifying strategies to randomly "pretend" they have not heard of some transactions.

Learning parameter. Our learning rules are parameterized by an ε ∈ [0, 1/2]. For EXP3, it is customary to set $\varepsilon \approx \sqrt{n \ln n / T}$, where n is the number of strategies considered and T is the number of games played. For MWU (and our "MWU-like" update rule), it is customary to set $\varepsilon \approx \sqrt{\ln n / T}$. Larger ε encourages beliefs (about the strength of strategies) to be updated rapidly in response to recent games. Smaller ε encourages waiting for more evidence before updating beliefs.

Atomic versus Non-Atomic Miners. We say miners are atomic if there are finitely many of them, and each has a finite fraction of the total hash power. Such miners may have an interest in sacrificing immediate gains related to a block mined now in order to achieve greater gains for blocks mined in the future. Non-atomic miners are infinitesimally small, but there are infinitely many of them. When such a miner finds a block, it is only interested in maximizing its gains related to that block (because it will never find another block in the future). Obviously our simulation cannot create infinitely many miners, but we can functionally simulate them.
To simulate that an α fraction of non-atomic miners are using strategy s, we instead create a single atomic miner with an α fraction of the hash power, and ensure that all of this miner's strategic decisions take as input only the public information available to the entire network and do not treat "their own" blocks any differently than generic blocks.

Of course, the real world is atomic. But it is extremely helpful to compare simulation results between the two models to isolate behavior that arises only when miners are atomic (for example, selfish mining), as intuitively such behavior "gets worse" with big miners.

4.4 Implementation and performance

The simulator is written in C++, and has a running time proportional to the product of the number of games, the number of rounds per game, and the number of miners. We find that for accurate results, the games need to include enough rounds so that for every strategy, the miners using it together find tens of blocks. We also find that it takes on the order of a few hundred thousand games for our learning algorithms to converge to an equilibrium. On a commodity laptop with a 2.7 GHz Intel Core i5 processor, running a simulation of 1000 games with 200 miners, an average interarrival time of 600 rounds, and a total of 6,000,000 rounds (so that about 10,000 blocks are created) takes approximately 22 seconds.

Limitations. A current limitation of the simulator is that transaction fees can only be modeled as arriving at a uniform rate in time. Additionally, the simulator is not capable of modeling mining pool dynamics beyond treating a pool as a single miner with hash power equal to that of the pool. This does not allow for consideration of attacks such as those presented in [8].

5. NEW DEVIANT MINING BEHAVIOR

In this section, we examine what deviant mining behavior might unfold in the transaction fees model that does not arise in the block-reward model.
Specifically, we argue that:

• It is reasonable to expect self-interested miners to become PettyCompliant instead of DefaultCompliant once transaction fees take over.
• The existence of PettyCompliant miners in the network opens the field for a range of aggressive strategies with detrimental effects on Bitcoin's stability.

5.1 Phase One: Petty compliant

Observation: The default client behavior of mining on the oldest block is not optimal. Miners can do strictly better by mining on the block that leaves the most transaction fees unclaimed.

Consider the case where there is a fork: two blocks are tied for the longest chain. The traditional behavior, and the one programmed into the default client,4 would have the miner select the older of the two potential block heads. However, it costs the miner nothing to break the tie differently. In particular, if the miner is planning to include all unclaimed transactions in their block, it would be in that miner's interest not to mine on the oldest block, but instead on the block that leaves the most remaining fees. Therefore, a strategic miner would want to mine on Most^m_H instead of Oldest^m_H.

4 Note: this is not a self-enforcing part of the protocol. It is purely client-side behavior.

We call this strategy petty compliant, as it is still mining on a longest chain, including all available transactions, and publishing all blocks that are found (like a default compliant miner). It is just tie-breaking between longest chains in a "petty" way to achieve greater revenue.

PettyCompliant: Mine like a default compliant miner, except when choosing between two sides in a fork; then mine on the block that has claimed the fewest transaction fees.
Which Block: Mining(m) = Most^m_H.
How much: include Rem(Mining(m)).
Publish(B)?: yes.

If forks ever exist, then PettyCompliant strictly outperforms DefaultCompliant.
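The only difference between the two strategies is how they break a tie among longest-chain heads. A minimal sketch, assuming a hypothetical Block record that stores its height, the time this miner first heard of it, and Rem(B), the fees it leaves unclaimed:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    name: str
    height: int
    heard_at: float   # when this miner first heard of the block
    remaining: float  # Rem(B): transaction fees this block leaves unclaimed

def longest_heads(blocks):
    """All blocks tied for the greatest height."""
    top = max(b.height for b in blocks)
    return [b for b in blocks if b.height == top]

def default_compliant_target(blocks):
    """DefaultCompliant: among tied heads, extend the one heard about first."""
    return min(longest_heads(blocks), key=lambda b: b.heard_at)

def petty_compliant_target(blocks):
    """PettyCompliant: among tied heads, extend the one leaving the most fees (Most_H)."""
    return max(longest_heads(blocks), key=lambda b: b.remaining)

# A 1-block fork: A was heard first but claimed almost everything; B left 1.5 unclaimed.
fork = [
    Block("parent", 9, 0.0, 3.0),
    Block("A", 10, 1.0, 0.2),
    Block("B", 10, 4.0, 1.5),
]
d = default_compliant_target(fork)
p = petty_compliant_target(fork)
# Both include every remaining fee, so the next block is worth Rem(target)
# plus whatever fees arrive while mining.
print(d.name, d.remaining)  # A 0.2
print(p.name, p.remaining)  # B 1.5
```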
The two are identical except for the case where the miner is required to choose between two equal-height blocks to mine on. In this case PettyCompliant always makes the decision to mine in a location that maximizes their rewards, and DefaultCompliant might not. In our mining strategy simulator, we compare DefaultCompliant to PettyCompliant and do in fact see that PettyCompliant outperforms DefaultCompliant, regardless of the mix of strategies used by the other miners, in any simulation where there is enough latency (in learning of both new blocks and transactions) that forks naturally occur.

Note that the existence of petty compliant miners is not necessarily harmful by itself: so what if miners are tie-breaking differently in the rare event that forks naturally occur? The problem arises when other strategic miners notice the existence of petty compliant miners and choose to exploit them with more aggressive tactics. We will see some examples of this in the remainder of this section. The existence of PettyCompliant miners impacts other deviant strategies in surprising ways too. For example, a selfish miner (discussed more in Section 6) performs better against PettyCompliant miners than against DefaultCompliant ones.

5.2 Phase Two: Lazy Undercutting

Observation: Once some fraction of miners is petty compliant, other miners may profit by intentionally forking the chain.

The key insight for more aggressive strategies is that a deviant miner can incentivize petty compliant miners to extend their block, even if an older block of the same height was discovered several minutes earlier, for instance by extending that block's direct predecessor and including slightly fewer transaction fees. If the currently unclaimed transaction fees are substantially smaller than those included by the current Most_H, then it may be in a miner's interest to try to replace Most_H with a new block of height H, instead of continuing on top of it. We call this undercutting.
So what might a strategic miner do to take advantage of this? They might first compare the maximum rewards they could get by continuing versus undercutting (while still becoming the new Most_H), and mine on top of whichever block yields greater rewards. Then, to protect themselves with certainty against future undercutters using the same rule, they could take half of the remaining transactions. Because of the somewhat lax reasoning used to choose these parameters, we call this strategy LazyFork.

While the existence of PettyCompliant miners themselves is relatively benign, the existence of LazyFork miners would be bad: they frequently decide to intentionally orphan blocks in order to achieve greater rewards. In addition to creating uncertainty about when blocks are "safely" in the eventual longest chain, this decreases the effective hash power of the network and makes Bitcoin more prone to double-spend attacks.

For cleanliness in formally defining LazyFork and other undercutting strategies, we introduce the notation Gap_i = Rem(Most_{i−1}) − Rem(Most_i), the maximum transaction fees that a miner could include while mining on top of Most_{i−1} to become the new Most_i.

LazyFork: Forks the blockchain if the head block is more valuable than the unclaimed transaction fees it leaves behind. Only takes half of the possible transaction fees to prevent other lazy forkers from forking their block.
Which Block: if Owner(Most^m_H) = m or Rem(Most^m_H) ≥ Gap_H, Mining(m) = Most^m_H; else Mining(m) = Most^m_{H−1}.
How much: include Rem(Mining(m))/2.
Publish(B)?: yes.

5.3 Phase Three: Aggressive Undercutting

Simulation result: increasingly aggressive undercutting behavior evolves when miners strategize.
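As a baseline for the more aggressive strategies discussed next, the LazyFork rule defined above can be sketched as a pure function of three quantities: Rem(Most^m_H), Rem(Most^m_{H−1}), and whether Most^m_H is the miner's own block. The function name and return convention are illustrative only, and fees arriving while the miner works are ignored.

```python
def lazy_fork(rem_head, rem_parent, head_is_mine):
    """LazyFork decision.

    rem_head   -- Rem(Most_H): fees left unclaimed by the current best head
    rem_parent -- Rem(Most_{H-1}): fees left unclaimed by the head's parent
    Returns (action, fees_to_include).
    """
    gap = rem_parent - rem_head  # Gap_H: fees claimed by Most_H
    if head_is_mine or rem_head >= gap:
        return "continue", rem_head / 2
    # Undercut: rem_head < gap implies rem_parent / 2 < gap, so the new block
    # leaves more fees than Most_H and petty compliant miners will prefer it.
    return "undercut", rem_parent / 2

print(lazy_fork(0.5, 3.0, False))  # ('undercut', 1.5)
print(lazy_fork(2.0, 3.0, False))  # ('continue', 1.0)
print(lazy_fork(0.5, 3.0, True))   # ('continue', 0.25)
```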
Once miners consider undercutting, they may also try to aggressively optimize the tradeoff between maximizing the transaction fees included in blocks they mine and minimizing the chance that their block will be undercut by other miners in the system (as opposed to using the less principled reasoning of LazyFork). We define these strategies so that when they are presented with Rem(Mining(m)) = x, they will include f(x) worth of transaction fees, for some f(·) with f(x) ∈ [0, x] for all x, and call them function forkers.

While in principle forkers could consider going back several blocks to undercut, the strategies we study only consider mining on top of a block of height H or H − 1. It would certainly be an interesting direction for future work to see if any additional gains can be achieved by considering blocks of height H − 2 or less, but we already uncover interesting behavior when forkers go back just a single block.

A function-forking miner looks at the potential blocks at height H that they could extend, and within this set considers extending only Most^m_H, since it leaves the most remaining transaction fees. If a miner indeed chooses to mine on top of Most^m_H, we call this continuing. They also look at potential blocks of height H − 1, again considering only extending the block Most^m_{H−1} from this set. If a miner indeed chooses to mine on top of Most^m_{H−1}, we call this undercutting. When deciding whether to continue or undercut, a forker simply observes that they will claim f(Rem(Most^m_H)) by continuing, versus min{f(Rem(Most^m_{H−1})), Gap_H} if they undercut (the min is taken because they must actually undercut in order to incentivize future miners to select their block). So for a given f, we can define:

$$\mathrm{Val}_{\mathrm{cont}}(f) = f\big(\mathrm{Rem}(\mathrm{Most}^m_H)\big), \qquad \mathrm{Val}_{\mathrm{under}}(f) = \min\big\{f\big(\mathrm{Rem}(\mathrm{Most}^m_{H-1})\big),\ \mathrm{Gap}_H\big\}.$$

If Val_cont(f) > Val_under(f), then more rewards can be achieved by continuing. Otherwise, more rewards can be achieved through undercutting. Formally, for any function f(·), this induces the following strategy:

Function-Fork(f): Always takes a certain function, f(·), of the possible transactions it could claim. Always mines in the location that maximizes the size of the block it would make, with the constraint that if it forks, it must undercut.
Which Block: if Owner(Most^m_H) = m or Val_cont(f) > Val_under(f), Mining(m) = Most^m_H; else Mining(m) = Most^m_{H−1}.
How much: if Mining(m) = Most^m_H, include Val_cont(f); else include Val_under(f).
Publish(B)?: yes.

Any reasonable choice of f(·) will be monotonically non-decreasing, which means that f(Rem(Most^m_{H−1})) will always be at least as large as f(Rem(Most^m_H)), so the decision on whether to continue or undercut comes down to a comparison of f(Rem(Most^m_H)) versus Gap_H.

Figure 3: Normalized weights of different linear coefficient function forking strategies over a series of games. Strategies that are slightly more aggressive than the most common strategy perform the best and have their normalized weights increase. This simulation had 200 miners, 9 strategies, 10,000 blocks per game and an ε value of .01.

Figure 4: This is a simulation of 8 atomic miners. The simulation parameters are otherwise configured the same way as Figure 3. We see that when there are a small number of atomic miners, the more aggressive undercutters are no longer effective, since they are beaten by more gentle forkers who are lucky enough to mine two blocks in a row.

One natural family of f(·) to consider is linear functions (that is, f(x) = kx for some k ∈ [0, 1]). If we take a group of these strategies, and let non-atomic strategic miners learn over many games which perform best, we get the plot in Figure 3. What we see is the following: when the majority of miners are using Function-Fork(kx), the best response is to use Function-Fork(k′x), for k′ a little smaller than k (i.e., to undercut just a little bit more aggressively).
So eventually the smallest coefficient in our simulation becomes dominant.

If we instead consider atomic miners, we observe the behavior in Figure 4: less aggressive undercutters remain dominant. This is because even when other miners are aggressively undercutting, each miner still has a decent chance to get their block accepted "for free" by mining two blocks in a row. Note that simulation is vital to this understanding due to the large number of parameters to consider.

5.4 An Undercutting Equilibrium

Analytical result: An equilibrium exists where all miners use the same undercutting strategy. It induces a growing backlog of transactions.

Linear function-forking is of course a natural class of strategies to consider, but our simulations in the previous section show that long-term behavior may be erratic if miners only consider these strategies. Our goal in this section is to understand what undercutting behavior is stable.

Our approach is to find a function f(·) such that Function-Fork(f) is an equilibrium. That is, as long as every other miner is using the strategy Function-Fork(f), it is in your interest to do so as well. In other words, we would like to find an f such that Function-Fork(f) is a best response to the case when all other miners themselves use Function-Fork(f). We now provide intuition for why the f(·) we present yields an equilibrium.

So what does it mean for a strategy to be a best response to other miners' behavior? Recall that a strategy proposes which block to extend, how many transaction fees to claim, and which blocks to publish as a function of the currently held information. A strategy is a best response if it maximizes the miner's expected reward (taking into account future events, and in particular the probability that the current block is in the eventual longest chain) over all potential strategies that miner could have used instead.
In particular, a best response must be at least as good as all other strategies that mine at the same location and publish the same blocks (but differ in which transactions to include).

To get some intuition for what conditions a potential equilibrium must satisfy, let's first consider the decision facing a miner who has already decided to continue on top of the longest chain and is just deciding how many transaction fees to include. If F denotes the amount of transaction fees included, define π(F, f, x) to be the probability that this block is included in the eventual longest chain, conditioned on including F BTC worth of transaction fees in the block, all other miners using strategy Function-Fork(f), and x = Rem(Most^m_H) (note that π is well-defined). Then the miner's expected reward, should they be fortunate enough to find a block right now, would be F · π(F, f, x). A best response would then be to include $\arg\max_{F \le x}\{F \cdot \pi(F, f, x)\}$ transaction fees. The strategy Function-Fork(f) would recommend including f(x) transaction fees. So for Function-Fork(f) to be a best response to other miners using Function-Fork(f), it had better be the case that $f(x) \in \arg\max_{F \le x}\{F \cdot \pi(F, f, x)\}$ for all x. Note that this is a somewhat strong condition on f, as the fact that the other miners are using Function-Fork(f) affects π(F, f, x), whereas we also want this miner's best response to satisfy $f(x) \in \arg\max_{F \le x}\{F \cdot \pi(F, f, x)\}$.

At this point, we show that there is a continuous and piecewise differentiable function f(·) that satisfies this condition. We also show that, combined with the fact that f(·) is monotonically non-decreasing, this is sufficient for Function-Fork(f) to be an equilibrium under some assumptions (which we discuss after the theorem). In the theorem statement below, W_0 is the upper branch of the Lambert W function, which is defined on [−1/e, ∞), takes values W_0(x) ∈ [−1, ∞), and satisfies W_0(x e^x) = x for all x ≥ −1.
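The function f defined in Theorem 5.1 (stated below) can be evaluated numerically with any implementation of W_0. The sketch below is a minimal, dependency-free illustration: it computes W_0 on [−1/e, 0] by bisection (our own choice, not taken from the paper; a library routine would serve equally well), and its usage lines check that f is continuous at both breakpoints, since at x = y the argument −y e^{x−2y} equals −y e^{−y} (where W_0 = −y), and at x = 2y − ln(y) − 1 it equals −1/e (where W_0 = −1).

```python
import math

def lambert_w0(z, iters=200):
    """Principal branch W0(z) for z in [-1/e, 0], by bisection on w * exp(w) = z.

    On [-1, 0] the map w -> w * exp(w) increases from -1/e to 0,
    so bisection over that interval finds the unique root.
    """
    if not (-1 / math.e - 1e-12 <= z <= 0):
        raise ValueError("this sketch only handles z in [-1/e, 0]")
    lo, hi = -1.0, 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid * math.exp(mid) < z:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def equilibrium_f(x, y):
    """Fees to include when x fees are available, per the f of Theorem 5.1."""
    if not (0 < y <= 0.5 and 2 * y - math.log(y) >= 2):
        raise ValueError("Theorem 5.1 requires y <= 1/2 and 2y - ln(y) >= 2")
    upper = 2 * y - math.log(y) - 1
    if x <= y:
        return x                     # small backlog: claim everything
    if x < upper:
        return -lambert_w0(-y * math.exp(x - 2 * y))
    return 1.0                       # large backlog: claim a constant amount

y = 0.2                              # 2y - ln(y) is about 2.009, so y is admissible
upper = 2 * y - math.log(y) - 1
print(round(equilibrium_f(y + 1e-9, y), 6))       # 0.2 (continuous at x = y)
print(round(equilibrium_f(upper - 1e-12, y), 3))  # 1.0 (continuous at the upper breakpoint)
```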
The "Furthermore..." portion of the theorem is proved by showing a connection between the number of backlogged transactions and an unbiased one-dimensional random walk.

Theorem 5.1. For any constant y ≤ 1/2 such that 2y − ln(y) ≥ 2,5 define:

$$f(x) = \begin{cases} x, & x \le y, \\ -W_0\!\left(-y\,e^{x-2y}\right), & y < x < 2y - \ln(y) - 1, \\ 1, & x \ge 2y - \ln(y) - 1. \end{cases}$$

Then it is an equilibrium for every miner to use the strategy Function-Fork(f) as long as:
• Every miner is non-atomic.
• Miners may only mine on top of chains of length H or H − 1.
Furthermore, in any such equilibrium, the expected number of backlogged transactions after n time steps is Θ(√n).

5 Such y exist. This range is (0, ≈0.2].

Figure 5: Plot of the Lambert function fork starting with a weight of 0.0001 and becoming the strongest strategy in a learning simulation with ε = .01. This simulation had 100 miners, and 10,000 blocks per game. These miners are non-atomic.

A proof of Theorem 5.1 appears in Appendix C. To understand the impact of Theorem 5.1, first consider the block reward model. With non-atomic miners, DefaultCompliant is trivially an equilibrium, and this result is robust to general models of latency (proof in Appendix D). But as we move to atomic miners, strategies like selfish mining arise and equilibria get messy (if they exist at all). Now, in the transaction-fee model, even when miners are non-atomic, equilibrium behavior is complex and undesirable, as we have just shown. Therefore, we should expect analysis with atomic miners to reveal even more chaos.

Figure 5 shows miners learning to play this equilibrium, even with various other strategies available. Observe the interplay between theory and simulation: Theorem 5.1 guides us towards a potentially strong strategy, but it is intractable to prove that the equilibrium will actually arise via learning even when (say) 99% of miners are already there. Simulation fills the gap and shows that the equilibrium will indeed arise even when only .01% of the miners initially use the equilibrium strategy.6 Simulation alone could not search through the infinitely many possible strategies, and theory alone cannot prove that learning converges to the desired equilibrium.

6 It is hard to see in Figure 5, but the weight assigned to "Lambert" is initially .0001.

5.5 Undercutting Non-strategic Miners

Analytical and simulation result: even if 66% of miners remain default compliant, undercutting is profitable.

Our analysis and simulations in the previous sections assumed that all miners were strategic learners. While we clearly learn a lot from this analysis, it is perhaps more realistic to also consider a setting where some miners will stubbornly (or honestly, depending on your perspective) continue running DefaultCompliant even if it is suboptimal. If a large fraction of the miners are non-strategic, then function-forking becomes immediately less profitable, because only a small fraction of the network will actually mine on top of your block when you undercut. In particular, if 100% of other miners are non-strategic, undercutting serves no purpose.

In this section, we detail results from our simulation when varying fractions of miners are non-strategic. In these simulations, we fix a fraction of the network to always mine DefaultCompliant, and play enough games until the distribution of learned strategies stabilizes.7 Figure 6 shows a stacked area plot of our simulation results for equilibria at different fractions of miners refusing to abandon DefaultCompliant. There are many interesting features of the plot, but we focus on one: even if the majority of miners choose to stay DefaultCompliant (and the rest strategize), forking strategies start to become viable.

A theoretical analysis indeed predicts the continuing presence of FunctionFork(x) until 2/3 of the miners remain DefaultCompliant.
To see this, imagine that every miner in the system is currently DefaultCompliant or PettyCompliant, and we want to see if it is profitable for a PettyCompliant miner to switch to FunctionFork(x). At any point in time, consider the current Most_H. If the miner runs PettyCompliant, they will always try to continue, and will get Rem(Most_H) should they find a block (because no one else in the network is undercutting). If instead they run FunctionFork(x), they will continue whenever Rem(Most_H) > Gap_H and undercut otherwise. When they continue, they will always get Rem(Most_H). When they undercut, they include Gap_H transaction fees. If the next miner to find a block is PettyCompliant (or this miner), then the undercut will be successful and the miner will receive Gap_H in rewards. But if the next block is found by a DefaultCompliant miner, the undercut fails and they get nothing. So if y is the fraction of the network that remains DefaultCompliant, the expected reward obtained by FunctionFork(x) is proportional to:8

$$\mathbb{E}\big[\mathrm{Rem}(\mathrm{Most}_H)\cdot I\big(\mathrm{Rem}(\mathrm{Most}_H) > \mathrm{Gap}_H\big)\big] + (1-y)\cdot\mathbb{E}\big[\mathrm{Gap}_H\cdot I\big(\mathrm{Gap}_H > \mathrm{Rem}(\mathrm{Most}_H)\big)\big]$$

Finally, because Rem(Most_H) and Gap_H are i.i.d. exponential random variables with mean 1, we have that E[Gap_H · I(Gap_H > Rem(Most_H))] = E[Rem(Most_H) · I(Rem(Most_H) > Gap_H)] = 3/4 (for i.i.d. X, Y ~ Exp(1), E[X · I(X > Y)] = ∫_0^∞ x e^{−x}(1 − e^{−x}) dx = 1 − 1/4). The expression above therefore equals (3/4)(2 − y). Hence, whenever y ≤ 2/3, the reward from FunctionFork(x) is at least one, so it is at least as good a choice as PettyCompliant (which gets expected reward exactly one), and strictly better when y < 2/3.

We emphasize that while the theory gives us a crisp understanding of what should happen when exactly 2/3 of the miners are DefaultCompliant, it is intractable to rigorously analyze the equilibria at various other fractions of DefaultCompliant miners. Thus our simulation both confirms and extends our theoretical understanding (Figure 6).

7 Note that learning is by no means guaranteed to result in a static equilibrium at all, although in these simulations that happens to be the result.
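The 2/3 threshold argument of Section 5.5 can be checked numerically. The sketch below makes the same modelling assumptions as the text (Rem(Most_H) and Gap_H are i.i.d. Exp(1), and an undercut succeeds exactly when the next block is not found by one of the y DefaultCompliant miners). Its function_fork helper implements the general Function-Fork(f) choice between continuing and undercutting from Section 5.3; with the identity f it reduces to FunctionFork(x). All names are hypothetical.

```python
import random

def function_fork(f, rem_head, rem_parent, head_is_mine=False):
    """Function-Fork(f): return (action, fees_to_include)."""
    gap = rem_parent - rem_head                  # Gap_H
    val_cont = f(rem_head)                       # Val_cont(f)
    val_under = min(f(rem_parent), gap)          # Val_under(f)
    if head_is_mine or val_cont > val_under:
        return "continue", val_cont
    return "undercut", val_under

def closed_form_reward(y):
    """3/4 + (1 - y) * 3/4: expected reward of FunctionFork(x); PettyCompliant earns 1."""
    return 0.75 * (2 - y)

def simulated_reward(y, trials=100_000, seed=1):
    """Monte Carlo estimate under the i.i.d. Exp(1) assumption."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        rem = rng.expovariate(1.0)               # Rem(Most_H)
        gap = rng.expovariate(1.0)               # Gap_H
        action, fees = function_fork(lambda x: x, rem, rem + gap)
        if action == "continue":
            total += fees
        elif rng.random() >= y:                  # next block not found by a DefaultCompliant miner
            total += fees
    return total / trials

for y in (0.5, 2 / 3, 0.8):
    print(y, round(closed_form_reward(y), 3), round(simulated_reward(y), 2))
```

With the identity f, Val_under(f) = min(Rem(Most_H) + Gap_H, Gap_H) = Gap_H, so the rule continues exactly when Rem(Most_H) > Gap_H, matching the text.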
8 E[X] denotes the expectation of the random variable X, and I(E) denotes the indicator random variable for event E (that is, 1 when E occurs and 0 otherwise).

Figure 6: Stacked area chart showing the equilibrium distributions of strategies covered thus far, given that a fraction of miners will always use the default strategy. These simulations involved 100 miners, with 10,000 blocks per game. We found that the strategies would reach an equilibrium after around 300,000 games with ε = 0.01.

6. SELFISH MINING WITH TRANSACTION FEES

Selfish mining is a deviant strategy first identified by Eyal and Sirer [9]. Essentially, a selfish miner chooses not to release blocks immediately upon finding them, instead withholding them in hopes of tricking the rest of the network into wasting their mining power on blocks that will be orphaned. We find that the selfish mining strategy performs even better in the transaction fees model than in the block-reward model. A priori, there is no reason to expect this. In this section we provide simulation results, along with some intuition and a theoretical analysis proving this. Essentially, what winds up happening is that while the selfish miner mines the same fraction of blocks in either reward model, the selfish miner's blocks tend to be larger. In the block-reward model, this does not matter because all blocks are worth the same, but in the transaction fees model it means the selfish miner gets greater reward.

6.1 The Selfish Mining Strategy

Analytical and simulation result: selfish mining performs slightly better in the transaction fee model.

The goal of a miner employing the selfish mining strategy is essentially to trick the other miners in the Bitcoin network into mining on top of a block that will be orphaned. By having other miners waste their power, the selfish miner is able to exaggerate their own portion of the overall network hash rate.
Selfish miners do this by maintaining a chain in private that only they know about. When the selfish miner initially finds a block, they will not announce it to the rest of the network. They will continue to mine on their private block, hoping to find a second block before the rest of the network finds a block.

If the miner succeeds, they are now in a very strong position: they know of a block with height H + 2, whereas the rest of the network only knows a block of height H. If the rest of the network finds the next block at height H + 1, the selfish miner can reveal their private chain and the public block will be immediately orphaned. Of course, maybe the selfish miner will find the third block as well. In this case, they are in an even better position and can waste even more of the network's power. But the point is that with a lead of two or more, the selfish miner can guarantee that the rest of the network is wasting power.

Of course, the selfish miner might also fail to find a second block before the rest of the network finds its first. In this case, they immediately release their block and hope that others hear about theirs first. Obviously this is not ideal: had they released their block immediately, they could have guaranteed that it was heard about first. So there is a tradeoff: withholding the block has a chance to give the selfish miner a private chain of length two or more, in which case the selfish miner benefits, but it could also cause their block to be orphaned, resulting in less profit.

Selfish-Mine: Selfish mining strategy from [9]. This miner hides their blocks, which risks losing their first block, in order to try to get the rest of the network mining in a useless location, amplifying their own apparent hash power.
Which Block: Oldest^m_{Private^m}.
How much: include Rem(Mining(m)).
Publish(B)?: if Height(B) = H, yes; elseif Racing^m_H and Private^m = H + 1, yes; else no.
Assuming the selfish miner has less than half of the overall hash power of the network, they will eventually need to publish their private chain. In order to maintain our focus on the difference between transaction fees and fixed block rewards, we consider just "vanilla" selfish mining, although it is an interesting direction for future work to consider selfish miners who also undercut, or various other generalizations (e.g., [7, 20, 23]).

Similarly to [9], we examine the potential rewards a selfish miner would receive assuming that the rest of the network is default mining. In our analysis, we also use α to denote the fraction of the total mining power possessed by the selfish miner, and γ to denote the probability that, in the event of a race (the selfish miner is triggered to release a private block of length one) that ends with the honest portion of the network finding the next block, the selfish miner's block is not orphaned.

We introduce the notation Private^m to denote the height of the longest chain that m is aware of (at least as long as H, and possibly longer if m is keeping any blocks private). We also introduce the notation Racing^m_i for a boolean variable that is true iff there exist two blocks B_1, B_2 with Height(B_1) = Height(B_2) = i and Owner(B_1) = m ≠ Owner(B_2). In other words, Racing^m_i denotes whether or not there are two competing blocks of height i, one of which was produced by m.

Analysis. We now proceed with an analysis of the rewards obtained in the transaction fee model by a selfish miner. Parts will look similar to the analysis done in [9]. For every infinitesimally small transaction fee that arrives, we wish to compute the probability that it winds up in a block mined by the selfish miner. Note that if the selfish miner just used default mining instead, this probability would be exactly α. The determining factor in this probability will be the size of the selfish miner's private chain.
To this end, let's define the following states (the same states used in [9]), and we'll compute this probability separately for each state.

• State 0: Everyone agrees on the longest chain (Racing^m_H = false).
• State i > 0: The selfish miner m has a private chain of length i (Private^m = H + i).
• State 0′: There are competing blocks of height H, one of which was produced by the selfish miner, and the selfish miner has no private blocks (Racing^m_H = true and Private^m = H).

Let f_s denote the probability that a transaction winds up in a block mined by the selfish miner in the eventual longest chain, conditioned on the system being in state s when the transaction is announced. We compute these probabilities below. If we then define p_s to be the probability that the system is in state s, we can observe that the expected fraction of transaction fees claimed by the selfish miner is exactly ∑_s f_s · p_s.

Eyal and Sirer [9] have already computed p_s for all s. The values for p_s are:

$$p_0 = \frac{1-2\alpha}{2\alpha^3 - 4\alpha^2 + 1}, \qquad p_{0'} = \frac{(1-\alpha)(\alpha - 2\alpha^2)}{2\alpha^3 - 4\alpha^2 + 1}, \qquad p_i = \left(\frac{\alpha}{1-\alpha}\right)^{i-1}\frac{\alpha - 2\alpha^2}{2\alpha^3 - 4\alpha^2 + 1}, \quad i > 0.$$

To complete the analysis, we just need to compute f_s for each s. Appendix E contains the derivation of f_s for all s, which are stated below:

$$f_0 = \alpha^2 + \alpha(1-\alpha)\big(\alpha + \gamma(1-\alpha)\big), \qquad f_{0'} = \alpha, \qquad f_1 = \alpha + (1-\alpha)\alpha = \alpha(2-\alpha), \qquad f_i = 1 - (1-\alpha)^{i-1}(1 - f_0), \quad i \ge 2.$$

Finally, when α ∈ (0, 0.5) and γ ∈ [0, 1], we show in Appendix E that the selfish miner's rewards are given by

$$\mathrm{Reward}(\alpha,\gamma) = \frac{5\alpha^2 - 12\alpha^3 + 9\alpha^4 - 2\alpha^5 + \gamma\,(\alpha - 4\alpha^2 + 6\alpha^3 - 5\alpha^4 + 2\alpha^5)}{2\alpha^3 - 4\alpha^2 + 1}.$$

Figure 7: We see simulation matching the theory for selfish mining in a transaction based model for γ = 0, 0.5, and 1.

We make the following observations:

• Simulation confirms the above analytical formula for Reward(α, γ) (Figure 7).
• This function is extremely close to the reward function with block rewards from [9], $\frac{\alpha(1-\alpha)^2(4\alpha + \gamma(1-2\alpha)) - \alpha^3}{1 - \alpha(1 + (2-\alpha)\alpha)}$. We find, numerically, that the absolute difference never exceeds 0.026 in the region of interest.
• For 0 ≤ γ < 0.55 (in particular, for γ = 0) and all α ∈ (0, 0.5), the reward is strictly greater in the transaction fee model than in the block reward model.

We provide some intuition for this last point. First, it is clear that the fraction of blocks mined by the selfish vs. default miners is independent of the reward model. So the gap must come from the size of blocks found by the respective miners. Let's assume, just for the sake of example, that we are in state 100 and the selfish miner has an α = 1/10 fraction of the mining power. Almost certainly, the next un-orphaned block will be found by the selfish miner. How long will it take for this block to be found? The answer is approximately 10 time steps: while the entire network finds a block roughly every time step, the selfish miner is the only miner extending his chain, and he mines at 1/10 the speed of the full network, so it takes ten times as long. What this means is that blocks found by the selfish miner while the selfish miner has a huge lead are disproportionately large compared to blocks found when the selfish miner has no lead (or a tiny lead). So even though the selfish miner wins the same fraction of blocks, some of these blocks are much larger than those won by the default miners.

A brief discussion. The main point of this section is to highlight one example of surprising incentive issues that differ between the transaction fees model and the block-reward model, not to argue that selfish mining becomes significantly better (the improvement is minor). Still, we wish to point out two possibly salient differences between selfish mining in the two models. First, in the block-reward model, selfish mining is never immediately profitable; it only becomes profitable once the difficulty readjusts to account for the fact that the effective mining power in the network is lower.
Before the difficulty adjusts, the selfish miner in the block-reward model is literally just throwing blocks away, while tricking the rest of the network into throwing blocks away at a higher rate. In the transaction fees model, selfish mining is immediately profitable: every transaction that arrives ends up in some block, so neither the selfish miner nor the default miners are throwing rewards away. Note also that the accuracy of our analysis in no way depends on the difficulty adjusting; it would hold no matter how the difficulty of hash puzzles adjusted, or failed to adjust, over time. Moreover, if some of the rest of the network has switched to the PettyCompliant strategy, then the selfish miner’s block is actually more likely to win when a race is triggered (because it was mined earlier and therefore contains fewer transactions). So the existence of PettyCompliant miners in the transaction fees regime indirectly improves Selfish-Mine’s performance by increasing γ.

6.2 An Improved Selfish-Mine

Analytical and simulation result: in the transaction fee model, selfish miners can decide whether to hide their first block based on the value of the block. This improved selfish mining strictly and always outperforms both default mining and traditional selfish mining.

In this section we develop an improved selfish mining strategy. Essentially, we observe that in the transaction fees model, a selfish miner has additional information when deciding whether to hide or publish their private chain (namely, how many transactions are included). We show that, for all α, γ < 1, our strategy strictly outperforms both default mining and “vanilla” selfish mining in the transaction fees model.
Our strategy hides only “small” blocks, containing less than β in transaction fees (β is a cutoff parameter chosen by the strategy as a function of α and γ), and immediately publishes any “large” block, containing at least β in transaction fees, to avoid the risk of losing it.

Selfish-Mine(β): An improvement to the selfish mining strategy, where the miner chooses to mine as a selfish miner or as a default compliant miner based on the value of the block they risk losing.
Which block: $\mathrm{Oldest}_m \mathrm{Private}_m$.
How much: include $\mathrm{Rem}(\mathrm{Mining}(m))$.
Publish(B)? If Height(B) = H or Tx(B) ≥ β, yes. Else if $\mathrm{Racing}_m^H$ and $\mathrm{Private}_m = H + 1$, yes. Else, no.

Intuitively, imagine you are mining and find yourself solving a new block immediately after a previous block was announced and before any new transactions have been announced. This block is literally worthless, so instead of publishing it, why not use it to try to selfish mine? There is no cost, but a positive probability that you build a lead of two, no matter your hash power. Similarly, imagine instead that just by chance an hour goes by since the last block was found, and you just solved a new block including all transactions that arrived during that period. This block is worth roughly six “normal” blocks, so why risk losing it? Unless your hash power is very close to 50%, the expected gains from selfish mining are dwarfed by the possibility of losing this unusually wealthy block. So the trick is just choosing the proper cutoff β as a function of your hash power α and network connectivity γ.

Figure 8: We show the ideal cutoff factor, β, for a selfish miner with mining power α, and γ = 0.

Note that Selfish-Mine(0) = DefaultCompliant, and that Selfish-Mine(∞) = Selfish-Mine. So clearly, taking the optimal choice of β will result in a strategy that equals or outperforms both.
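The publish rule of Selfish-Mine(β) is small enough to transcribe directly. The sketch below assumes the caller supplies Height(B), H, Tx(B), the racing flag, and the private-chain height; the function and parameter names are hypothetical, and the block-selection and transaction-inclusion rules are left out.

```python
import math


def selfish_mine_beta_publishes(block_height, public_height, block_fees, beta,
                                racing, private_height):
    """Publish(B)? for Selfish-Mine(beta), as a sketch.

    block_height   -- Height(B) of the newly found block B
    public_height  -- H, the height of the longest published chain
    block_fees     -- Tx(B), transaction fees included in B
    beta           -- cutoff; 0 gives DefaultCompliant, math.inf gives vanilla Selfish-Mine
    racing         -- Racing_m^H: is miner m currently in a race at height H?
    private_height -- Private_m, height of m's private chain
    """
    if block_height == public_height or block_fees >= beta:
        return True   # publish ties at once, and never risk a wealthy block
    if racing and private_height == public_height + 1:
        return True   # one block ahead during a race: publish to win it
    return False      # otherwise keep the (cheap) block private


# A worthless block found right after another one is hidden (hypothetical beta = 0.5) ...
print(selfish_mine_beta_publishes(11, 10, 0.0, 0.5, False, 11))  # False
# ... while an unusually wealthy block worth about six normal blocks is published.
print(selfish_mine_beta_publishes(11, 10, 6.0, 0.5, False, 11))  # True
```

Setting `beta = 0` makes the fee test always pass, so every block is published, which mirrors the identity Selfish-Mine(0) = DefaultCompliant; `beta = math.inf` disables the fee test entirely.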
Using an analysis similar to that of Section 6.1, we are able to compute the expected reward achieved by a miner with an α fraction of the mining power, a γ success probability of winning a race, and using strategy Selfish-Mine(β). A derivation is included in Appendix E.2.

Reward(α,γ,β) =( 1 + β(1 −α)2(1 −γ) eβ − 1 + 5α + (1 −α)2γ + 2α2 1 − 2α − 2α2 ) × ( α(1 − 2α)(1 −e−β) 1 − 2e−βα− 3(1 −e−β)α2 )

Figure 8 contains a plot showing the optimal choice of β as a function of α when γ = 0. A few noteworthy points from this plot: as α → 0, so does the optimal β; as α → 1/2, the optimal β approaches ∞. Figure 9 plots our theoretical predictions against simulation results, confirming that the analysis is correct.

We conclude this section with Figure 10, which plots the (theoretical) performance of default mining, selfish mining, and selfish mining with the optimal cutoff for a range of α and γ = 0. Note that in some ranges, the gains are quite significant. Specifically, when α = 1/3, both selfish mining and default mining achieve an expected reward of ≈ 1/3, but selfish mining with the optimal cutoff achieves an expected reward of ≈ 0.38, a 13.6% increase!

Figure 9: Theory matching simulation for a variety of cutoff thresholds for selfish mining, all with γ = 0. The smaller cutoffs do better for a miner with smaller hash power (α), and the larger cutoffs do better for a miner with larger hash power. Intuitively, this makes sense, as a more powerful miner should be willing to risk a larger block to try to selfishly mine.

Figure 10: A selfish miner using the optimal cutoff outperforms both the original selfish mining protocol and default mining for all values of α, with γ = 0. The simulation points confirm that the theory is accurate.

7. IMPACT ON BITCOIN AND LESSONS FOR CRYPTOCURRENCY DESIGN

We have argued that deviant mining strategies in a transaction-fee regime could hurt the stability of Bitcoin mining and harm the ecosystem.
In a block chain with constant forks caused by undercutting, an attacker’s effective hash power is magnified, because he will always mine to extend his own blocks whereas other miners are not unified. This would make a “51%” attack possible with much less than 51% of the hash power. Many other unanticipated side-effects may arise. In the block size debate, it is frequently argued or assumed that space in the block chain will be a scarce resource and that a market will emerge, with users able to speed up the confirmation of a transaction by paying a sufficiently large transaction fee. But if miners intentionally “leave money on the table” when solving blocks, as is the case in undercutting attacks, this assumption breaks. That is because undercutting miners are not looking to maximize the transaction fee that they can claim, and don’t have a strong reason to prioritize a transaction with a high fee.⁹ Put another way, the block size imposes a constraint on the total size of transactions in a block, and the threat of being undercut imposes another constraint on the total fee. The two interact in complex ways. We believe that qualitatively our results will continue to hold in a world where the available block size is much smaller than the demand, but that quantitatively the impact of undercutting will be mitigated (see the end of Section 3.1). Still, it is an important direction for future research to understand this connection more rigorously.

Despite the variety of our results, we believe we have only scratched the surface of what can go wrong in a transaction-fee regime. To wit: we have not presented an analysis of miners whose strategy space includes both undercutting and selfish mining, primarily due to the complexity of the resulting models.

There has been scant attention paid to the transition to a transaction-fee regime. The Nakamoto paper addresses it briefly: “The incentive can also be funded with transaction fees...
Once a predetermined number of coins have entered circulation, the incentive can transition entirely to transaction fees and be completely inflation free” [19]. Similar comments on the Bitcoin Wiki and elsewhere suggest that the community views the transition as unremarkable. Some altcoins (Monero, Dogecoin) have even opted to hasten the block reward halving time.

Our results suggest a different view. We see the block reward as integral to the stability of the mining game. At a minimum, analyzing equilibria in the transaction-fee regime appears dramatically harder than in the block-reward regime, which is a cause for concern by itself. The monetary inflation resulting from making the block reward permanent, as Ethereum does, may be a small price to pay to ensure the stability of a cryptocurrency.

⁹ They do have a weak reason: miners benefit from creating the smallest possible block for a given value of the total transaction fee they seek to claim, since smaller blocks propagate faster through the network and are less likely to be orphaned.

8. ACKNOWLEDGMENTS

We are extremely grateful to Jiechen Chen, Kira Goldner, Anna Karlin, and Rainer Böhme for very detailed feedback on an earlier draft of this paper.

9. REFERENCES

[1] D. P. Foster and R. V. Vohra. Calibrated learning and correlated equilibrium. Games and Economic Behavior, 21(1):40–55, 1997.
[2] S. Hart and A. Mas-Colell. A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5):1127–1150, 2000.
[3] E. Androulaki, G. O. Karame, M. Roeschlin, T. Scherer, and S. Capkun. Evaluating user privacy in bitcoin. In Proceedings of Financial Cryptography, 2013.
[4] S. Arora, E. Hazan, and S. Kale. The multiplicative weights update method: a meta-algorithm and applications. Theory of Computing, 8(1):121–164, 2012.
[5] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.
[6] A. Blum and Y. Mansour. From external to internal regret. Journal of Machine Learning Research, 8:1307–1324, 2007.
[7] N. T. Courtois and L. Bahack. On subversive miner strategies and block withholding attack in bitcoin digital currency. CoRR, abs/1402.1718, 2014.
[8] I. Eyal. The miner’s dilemma. In 2015 IEEE Symposium on Security and Privacy (SP), pages 89–103. IEEE, 2015.
[9] I. Eyal and E. G. Sirer. Majority is not enough: Bitcoin mining is vulnerable. In Financial Cryptography and Data Security, pages 436–454. Springer, 2014.
[10] K. Hill. Bitcoin is not broken. Forbes, 2013. http://www.forbes.com/sites/kashmirhill/2013/11/06/bitcoin-is-not-broken/#55d4a8812568.
[11] N. Houy. The economics of bitcoin transaction fees. Working Paper GATE 2014-07, halshs-00951358, 2014.
[12] B. Johnson, A. Laszka, J. Grossklags, M. Vasek, and T. Moore. Game-theoretic analysis of DDoS attacks against bitcoin mining pools. In Proceedings of the First Workshop on Bitcoin Research, 2014.
[13] A. Kiayias, E. Koutsoupias, M. Kyropoulou, and Y. Tselekounis. Blockchain mining games. In ACM Conference on Economics and Computation (EC), 2016.
[14] J. A. Kroll, I. C. Davey, and E. W. Felten. The economics of bitcoin mining, or bitcoin in the presence of adversaries. In Proceedings of the Twelfth Annual Workshop on the Economics of Information Security (WEIS), 2013.
[15] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, 1994.
[16] L. Luu, J. Teutsch, R. Kulkarni, and P. Saxena. Demystifying incentives in the consensus computer. In Proceedings of the ACM Conference on Computer and Communications Security (CCS), 2015.
[17] A. Miller and R. Jansen. Shadow-bitcoin: scalable simulation via direct execution of multithreaded applications. In Proceedings of the Eighth Workshop on Cyber Security Experimentation and Test (CSET), 2015.
[18] M. Möser and R. Böhme. Trends, tips, tolls: A longitudinal study of bitcoin transaction fees. In Workshop on Bitcoin Research, pages 19–33, 2015.
[19] S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008.
[20] K. Nayak, S. Kumar, A. Miller, and E. Shi. Stubborn mining: Generalizing selfish mining and combining with an eclipse attack. In IEEE European Symposium on Security and Privacy (EuroS&P), 2016.
[21] R. Peter. A transaction fee market exists without a block size limit. 2015.
[22] M. Rosenfeld. Analysis of bitcoin pooled mining reward systems. CoRR, abs/1112.4980, 2011.
[23] A. Sapirshtein, Y. Sompolinsky, and A. Zohar. Optimal selfish mining strategies in bitcoin. In Financial Cryptography and Data Security, 2016.
[24] M. Vasek, M. Thornton, and T. Moore. Empirical analysis of denial-of-service attacks in the bitcoin ecosystem. In Proceedings of the First Workshop on Bitcoin Research, 2014.

APPENDIX

A. MINING GAP

This appendix contains a theoretical analysis of the mining gaps referenced in Section 3.2. Let’s consider the following simplified model: there is one style of “rig” available to miners, which costs p BTC per time unit in electricity to run. Let’s first analyze what effect this has in the fixed reward model, where each block found is worth one BTC, and the difficulty is adjusted so that the time between successive blocks is one unit in expectation. If there are k rigs in the network, the expected reward from running a rig for one time unit is exactly 1/k, whereas the cost in electricity is p. So the network is sustainable as long as 1/k ≥ p, i.e., k ≤ 1/p. In other words, the cost of electricity imposes a hard cap on the total effective mining power of 1/p rigs’ worth. Of course, this cap can always be adjusted if necessary by changing the fixed reward per block. It is also important to point out that as long as k ≤ 1/p, the effective hash power in the network will be k rigs’ worth.

Now let’s consider what happens in the transaction fee model, where transaction fees arrive continuously at a rate of 1 per time unit.
Miners will always turn off their rigs (or switch to mining another coin) immediately after a block is found, because the instantaneous expected reward of running a rig is 0, but the cost is non-zero. If the current effective hash power in the network is c rigs’ worth, then a miner needs to wait until x = cp transaction fees have arrived for mining to be profitable. Now, assuming that miners cleverly turn their rigs on and off at the right times, how many rigs must be in the network to attain an effective hash power of c? The rigs are all off for cp units of time, then all k of them are turned on, and the expected time to find a block is 1 unit of time (due to difficulty adjustment). This means that the expected time to find a block with all k rigs running must be 1 − cp, whereas the expected time to find a block with c rigs running is 1 (because the effective hash power is c). Finally, we observe that for a fixed difficulty, if x denotes the number of rigs running and $y_x$ denotes the expected time for x rigs to find a block, then $x_1 \cdot y_{x_1} = x_2 \cdot y_{x_2}$ for all numbers of rigs $x_1, x_2$. Together, this yields the following equation:

$$k\,(1 - cp) = c \cdot 1 \;\Rightarrow\; k = \frac{c}{1 - cp}.$$

What do we learn from this? First, no c ≥ 1/p can possibly be supported, just as in the fixed-reward model. On the other hand, it takes an additional factor of $\frac{1}{1-cp}$ rigs to obtain an effective hash power of c < 1/p rigs. As c → 1/p, the maximum possible effective hash power, this ratio approaches ∞! More quantitatively, if we plug in c = x/p for x < 1, the blow-up is $\frac{1}{1-x}$. This means the following: in the transaction fees model, to obtain an x fraction of the maximum possible effective hash power, a multiplicative blow-up of $\frac{1}{1-x}$ in the number of rigs is necessary. Recall that in the fixed-reward model, no blow-up is necessary.
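The relation $k = c/(1-cp)$, together with its inverse $c = k/(1+pk)$ obtained by solving the same equation for c, can be checked with a few lines of Python. This is a sketch of the simplified model only; the electricity cost p = 0.01 and the helper names are hypothetical.

```python
def rigs_needed(c, p):
    """Rigs k required for an effective hash power of c rigs' worth (fee model).

    From k * (1 - c*p) = c: all rigs sit idle for c*p time units after each block.
    """
    if not 0 <= c < 1 / p:
        raise ValueError("need 0 <= c < 1/p")
    return c / (1 - c * p)


def effective_hash_power(k, p):
    """Inverse relation in the fee model: c = k / (1 + p*k)."""
    return k / (1 + p * k)


def effective_hash_power_fixed_reward(k, p):
    """Fixed-reward model: all k rigs count, up to the cap of 1/p rigs."""
    return min(k, 1 / p)


p = 0.01  # hypothetical electricity cost per rig per time unit
for x in (0.25, 0.5, 0.9):  # target fraction of the maximum effective power 1/p
    c = x / p
    k = rigs_needed(c, p)
    print(f"x={x}: {k:.1f} rigs for {c:.1f} effective, "
          f"blow-up {k / c:.2f} vs 1/(1-x) = {1 / (1 - x):.2f}")

k = 1 / p  # as many rigs as the fixed-reward model can sustain
# Fee model: only about k/2 effective; fixed-reward model: all k.
print(effective_hash_power(k, p), effective_hash_power_fixed_reward(k, p))
```

The guard in `rigs_needed` encodes the observation that no c ≥ 1/p can be supported in either model.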
We can also reason in the other direction: for a fixed number k of rigs in the network, what is the effective hash rate in the fixed reward model versus the transaction fees model with mining gaps? In the fixed reward model, this is easy: it is just min{k, 1/p}. In the transaction fees model, for a fixed k, we need to solve for the c such that $k = \frac{c}{1-cp}$. This is:

$$k - kcp = c \;\Rightarrow\; c = \frac{k}{1 + pk}.$$

So for fixed k, the effective mining power of k rigs degrades by a factor of $\frac{1}{1+pk}$, which is always < 1. Note that at k = 1/p, every rig is 100% effective in the fixed reward model, whereas the effective mining power is just k/2 in the transaction fees model. We can again make a quantitative statement: in the transaction fees model, when the raw hash power in the network is an x fraction of the maximum possible, the effective hash power degrades by a factor of $\frac{1}{1+x}$. Recall that in the fixed rewards model, there is no degradation in effective hash power when x ≤ 1.

Figure 11: Illustration of a mining gap. The blue line shows the current P.D.F. of the time to the next block. If the block reward by itself is too small to incentivize mining, rational miners will wait until enough transactions have accumulated before starting to mine. This will lead to a P.D.F. of a different shape (red line). Note that in either scenario the mean time to the next block is 10 minutes (green line).

B. LEARNING MINERS IN SIMULATOR

As referenced in Section 4.2, we provide two options for learning in our simulator. Let’s introduce these with a clear set-up for learning. Let there be a set of strategies a learner can use, indexed by k. At each round i ∈ [T], strategy k yields (or would have yielded) some reward $r_k^i \in [0, 1]$, which may be arbitrary. The goal is to select a sequence of strategies $s_i$ guaranteeing:

$$\sum_{i=1}^{T} r_{s_i}^i \;\ge\; \max_k \Bigl\{\sum_{i=1}^{T} r_k^i\Bigr\} - c.$$

In other words, we would like to select a sequence of strategies that does nearly as well as the best fixed strategy, as if we had known it from the beginning. It is well known [4] that setting $w_k^i = w_k^{i-1}(1+\varepsilon)^{r_k^i}$ and selecting each strategy with probability proportional to its current weight results in a guarantee (in expectation) with $c = \varepsilon T + \ln(\#\text{strategies})/\varepsilon$. Similarly, [5] shows that even if we don’t learn $r_k^i$ for strategies k that we didn’t choose in round i, there is an algorithm (namely EXP3; see [5] for a description) that guarantees $c = 2\varepsilon T + \#\text{strategies} \cdot \ln(\#\text{strategies})/\varepsilon$.

So option one in our simulator is just to run EXP3 in earnest: whenever a miner uses some strategy k during game i, they learn their payoff and update their weights accordingly. Still, MWU converges faster, so it would be nice to learn how much payoff the miner would have received had they used strategy k during game i, for all k. But this is computationally very expensive, as it essentially requires rerunning the entire game for all miners and strategies k (which becomes more expensive than just running the additional games needed for EXP3 to converge). Instead, we make the following observation: even if this miner is not using strategy k during game i, maybe some other miner is. Could we use that miner’s payoff instead of recomputing exactly what payoff this miner would have received? We can, but we lose the theoretical guarantee that MWU provides when run in earnest. The payoffs from different miners’ perspectives are of course different, but not wildly so. Specifically, miner 1 faces opponents 2, 3, . . ., whereas miner 2 faces opponents 1, 3, . . .. If miner 1 and miner 2 use different strategies in round i, then strategy k would yield slightly different rewards when used by each of them. With many small miners, this difference should be small, so we include this learning option, as it seems to converge faster than EXP3, even though there is no theoretical guarantee.
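A minimal sketch of a full-information multiplicative-weights learner, paired with the payoff-averaging proxy used by the second learning option, might look as follows. The class and function names, the choice ε = 0.1, and the handling of strategies that no miner used in a round are illustrative assumptions, not the simulator’s actual code.

```python
import random


class MultiplicativeWeights:
    """Full-information multiplicative weights over a finite set of strategies.

    Rewards are assumed to lie in [0, 1]; each weight is multiplied by (1 + eps) ** reward,
    so strategies that paid (or would have paid) more gain probability mass.
    """

    def __init__(self, n_strategies, eps, rng=None):
        self.eps = eps
        self.weights = [1.0] * n_strategies
        self.rng = rng or random.Random()

    def probabilities(self):
        total = sum(self.weights)
        return [w / total for w in self.weights]

    def choose(self):
        # Sample strategy k with probability proportional to its current weight.
        return self.rng.choices(range(len(self.weights)), weights=self.weights)[0]

    def update(self, rewards):
        # rewards[k] is the (possibly estimated) payoff of strategy k this round.
        self.weights = [w * (1 + self.eps) ** r for w, r in zip(self.weights, rewards)]
        top = max(self.weights)  # rescale so long runs do not overflow
        self.weights = [w / top for w in self.weights]


def estimate_rewards(choices, payoffs, n_strategies, default=0.0):
    """Proxy rewards: the average payoff of all miners who used strategy k this round.

    choices[j] is the strategy miner j used and payoffs[j] is the payoff it received.
    Strategies nobody used get `default`, a hypothetical choice made for this sketch.
    """
    sums = [0.0] * n_strategies
    counts = [0] * n_strategies
    for k, payoff in zip(choices, payoffs):
        sums[k] += payoff
        counts[k] += 1
    return [s / c if c else default for s, c in zip(sums, counts)]


learner = MultiplicativeWeights(n_strategies=3, eps=0.1, rng=random.Random(0))
for _ in range(200):
    # Hypothetical round: four miners used strategies 0, 1, 2, 1 with these payoffs.
    learner.update(estimate_rewards([0, 1, 2, 1], [0.2, 0.9, 0.4, 0.7], 3))
print(learner.probabilities())  # mass concentrates on strategy 1
```

Normalizing by the largest weight after each update leaves the sampling probabilities unchanged while keeping the numbers in a safe floating-point range.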
Concretely, under this second option, instead of learning the payoff that the miner would have received had they used strategy k during round i, the miner simply takes the average payoff of all miners that used strategy k during round i. Improvements to the learning aspect of the simulation are certainly possible (and we encourage future work on this aspect once the simulator is open-source), but we note that the current implementations sufficed for the settings we studied.

C. PROOF OF THEOREM 5.1

Below is a complete proof of Theorem 5.1. Some quick notation: for an increasing function f(·), we denote $f^{-1}(x) = \min\{y \mid f(y) \ge x\}$. If no such y exists, then we denote $f^{-1}(x) = +\infty$.

First, we make an extremely useful observation about when miners will receive payment for their blocks. Essentially, because miners only consider mining on $\mathrm{Most}_H$ or $\mathrm{Most}_{H-1}$, once a block is a predecessor of both such blocks, it is guaranteed to be in the eventual longest chain.

Observation 1. As long as miners only consider mining on top of blocks $\mathrm{Most}_H$ or $\mathrm{Most}_{H-1}$, a miner receives eventual payment for mining a block if and only if the next block found chooses to continue her chain instead of undercutting.

Proof. Because miners only consider chains $\mathrm{Most}_{H-1}$ or $\mathrm{Most}_H$, immediately after a new block B is produced, B is in a longest chain. Either B goes on top of $\mathrm{Most}_{H-1}$, in which case it is in a chain of length H, which is the longest, or it goes on top of $\mathrm{Most}_H$, which creates a new longest chain of length H + 1. Let $H_{new}$ denote the new length of the longest chain (H if the miner undercut, and H + 1 if she continued). Either the newly minted B is equal to $\mathrm{Most}_{H_{new}}$, or it isn’t. If it isn’t, then neither the next miner nor any other miner in the future will ever mine on top of it, because there is a “better” chain of length $H_{new}$ to mine on top of instead. If it is, then the next miner will either undercut or continue.
If the next miner continues, then B is now equal to $\mathrm{Most}_{H_{new}}$ and is the predecessor of $\mathrm{Most}_{H_{new}+1}$. This means that all future miners will continue a chain containing B, and therefore it will certainly be in the eventual longest chain. If instead the next miner undercuts, then there will be a new chain of length $H_{new}$ that leaves more available BTC, meaning that $\mathrm{Most}_{H_{new}}$ does not contain B as a predecessor. $\mathrm{Most}_{H_{new}-1}$ clearly does not contain B either, as B was mined on top of this chain. So B is contained in neither $\mathrm{Most}_{H_{new}}$ nor $\mathrm{Most}_{H_{new}-1}$, and therefore no future miner will ever consider a chain containing B. In conclusion, whether or not a miner receives payment for block B depends entirely on whether or not the subsequent miner decides to mine on top of B.

Figure 12: One example of the function that function-forking miners might use that leads to an equilibrium. Recall, the function is f(x) = x on the range [0, y], $-W_0(-y e^{x-2y})$ on [y, 2y − ln(y) − 1], and 1 on [2y − ln(y) − 1, ∞).

We now want to figure out a best response for an individual non-atomic miner, conditioned on all other miners using FunctionFork(f). So we need to figure out the probability that a miner will get undercut when authorizing B BTC in transactions, assuming that all other miners are using FunctionFork(f). Note that as more and more new BTC of transactions arrive, other miners become less inclined to undercut. What we need to figure out is exactly how many new BTC of transactions need to arrive before the next miner switches from preferring to undercut to preferring to continue the longest chain.

Lemma C.1. If a miner authorizes B BTC of transactions on $\mathrm{Most}_i$ (of course, i will be in {H − 1, H}), then other FunctionFork(f) miners will try to undercut her until $\max\{0,\, f^{-1}(B) + B - \mathrm{Rem}(\mathrm{Most}_i)\}$ new BTC of transactions arrive ($\mathrm{Rem}(\mathrm{Most}_i)$ taken at the instant that the miner authorizes her block). Therefore, the expected BTC obtained by authorizing B BTC of transactions is $B\,e^{-\max\{0,\, f^{-1}(B) + B - \mathrm{Rem}(\mathrm{Most}_i)\}}$.
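To make Lemma C.1 concrete before its proof, the sketch below evaluates the expected reward $B e^{-\max\{0,\, f^{-1}(B)+B-\mathrm{Rem}\}}$ for the piecewise f of Figure 12, using the closed form $f^{-1}(B) = 2y + \ln(B/y) - B$ on [y, 1] that Fact 2 derives. Here $W_0$ is computed by bisection rather than through a library call, so the block needs only the standard library; the value y = 0.1 (which satisfies the constraint 2y − ln(y) ≥ 2 used in the proof) and all function names are assumptions made for illustration.

```python
import math


def lambert_w0(z, tol=1e-13):
    """Principal branch W0(z) for z in [-1/e, 0], by bisection on w * e**w = z over [-1, 0]."""
    if z > 0 or z < -1 / math.e - 1e-12:
        raise ValueError("this sketch only handles z in [-1/e, 0]")
    lo, hi = -1.0, 0.0  # w * e**w is increasing on [-1, 0]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * math.exp(mid) < z:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def f(x, y):
    """Piecewise f from Figure 12 (assumes y satisfies the theorem's constraints)."""
    x_max = 2 * y - math.log(y) - 1
    if x <= y:
        return x
    if x <= x_max:
        return -lambert_w0(-y * math.exp(x - 2 * y))
    return 1.0


def f_inv(b, y):
    """f^{-1}(b) = min{x : f(x) >= b}, via the closed form from Fact 2."""
    if b <= y:
        return b
    if b <= 1:
        return 2 * y + math.log(b / y) - b
    return math.inf  # f never exceeds 1


def expected_reward(b, rem, y):
    """Lemma C.1: expected BTC from authorizing b BTC on a chain with rem BTC available."""
    wait = max(0.0, f_inv(b, y) + b - rem)  # new BTC that must arrive before no one undercuts
    return b * math.exp(-wait)


y = 0.1
print(expected_reward(0.05, 10.0, y))  # large backlog: never undercut, so 0.05
print(expected_reward(0.5, 0.0, y), y * math.exp(-2 * y))  # both equal y*e^{-2y}
print(expected_reward(1.5, 10.0, y))  # more than 1 BTC is always undercut: 0.0
```

The second line illustrates the plateau from Fact 2: with no backlog, every B in [y, 1] yields the same expected reward $y e^{-2y}$.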
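Proof. First, observe that because the miner chooses to build upon $\mathrm{Most}_{H-1}$ or $\mathrm{Most}_H$, the chain containing her block is the new $\mathrm{Most}_H$, and that same chain minus her block is the new $\mathrm{Most}_{H-1}$. So the gap between the number of available BTC in $\mathrm{Most}_H$ versus $\mathrm{Most}_{H-1}$ ($\mathrm{Gap}_H = \mathrm{Rem}(\mathrm{Most}_{H-1}) - \mathrm{Rem}(\mathrm{Most}_H)$) for the next miner is exactly B. Now, immediately when the miner publishes her block, there are $\mathrm{Rem}(\mathrm{Most}_i)$ BTC of transactions available on $\mathrm{Most}_{H-1}$, and $\mathrm{Rem}(\mathrm{Most}_i) - B$ BTC of transactions available on $\mathrm{Most}_H$. So at this point, other miners would choose to undercut iff $f(\mathrm{Rem}(\mathrm{Most}_i) - B) < B$. As more new BTC of transactions arrive (call it x), the other miners would choose to undercut iff $f(\mathrm{Rem}(\mathrm{Most}_i) - B + x) < B$. As f(·) is increasing, we can look for the minimum x where this ceases to hold, which is exactly when $\mathrm{Rem}(\mathrm{Most}_i) - B + x = f^{-1}(B)$, or $x = f^{-1}(B) + B - \mathrm{Rem}(\mathrm{Most}_i)$. By Observation 1, the miner is paid iff the next block continues her chain. Since new fees arrive at a rate of 1 BTC per unit of time and the time until the next block is exponential with mean 1, the probability that no block is found before $x^* = \max\{0,\, f^{-1}(B) + B - \mathrm{Rem}(\mathrm{Most}_i)\}$ new BTC arrive is $e^{-x^*}$, which gives the expected reward $B e^{-x^*}$.

We now prove three corollaries of Lemma C.1 regarding what choices of B might possibly be optimal.

Corollary C.2. If every other miner is playing FunctionFork(f), then the optimal choice $B^*$ of BTC to authorize when building upon chain $\mathrm{Most}_i$ satisfies:
• $B^* \in \arg\max_{B \in [0, \mathrm{Gap}_H]} \bigl\{B e^{-\max\{0,\, f^{-1}(B) + B - \mathrm{Rem}(\mathrm{Most}_{H-1})\}}\bigr\}$, if i = H − 1;
• $B^* \in \arg\max_{B \in [0, \mathrm{Rem}(\mathrm{Most}_H)]} \bigl\{B e^{-\max\{0,\, f^{-1}(B) + B - \mathrm{Rem}(\mathrm{Most}_{H})\}}\bigr\}$, if i = H.

Proof. This is an immediate corollary of Lemma C.1, combined with the fact that a miner who chooses to undercut can authorize at most $\mathrm{Gap}_H$ BTC, while a miner who chooses to continue can authorize at most $\mathrm{Rem}(\mathrm{Most}_H)$.

Corollary C.3. If $B_1 \ge B_2$ and $B_1 e^{-B_1 - f^{-1}(B_1)} \ge B_2 e^{-B_2 - f^{-1}(B_2)}$, then for all X, the expected reward from authorizing $B_1$ BTC in transactions is at least as large as the expected reward from authorizing $B_2$ BTC when $\mathrm{Rem}(\mathrm{Most}_H) = X$.

Proof. There are two cases to consider. First, maybe $X > B_1 + f^{-1}(B_1)$ (the miner guarantees that she is not undercut by authorizing $B_1$ BTC in transactions).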
In this case, because $B_1 \ge B_2$ and $f^{-1}(\cdot)$ is increasing, we clearly have $X > B_2 + f^{-1}(B_2)$ as well, meaning that the expected reward from authorizing $B_1$ BTC is exactly $B_1$, and the expected reward from authorizing $B_2$ BTC is exactly $B_2$, by Lemma C.1. As $B_1 \ge B_2$, the reward from $B_1$ is at least as large. In the second case, maybe $X \le B_1 + f^{-1}(B_1)$ (the miner is undercut with positive probability by authorizing $B_1$ BTC in transactions). In this case, the reward from authorizing $B_1$ BTC is $B_1 e^{-B_1 - f^{-1}(B_1) + X}$, by Lemma C.1. Also by Lemma C.1, the reward from authorizing $B_2$ BTC is

$$B_2 e^{-\max\{0,\, B_2 + f^{-1}(B_2) - X\}} \le B_2 e^{X - B_2 - f^{-1}(B_2)} = e^X \cdot B_2 e^{-B_2 - f^{-1}(B_2)}.$$

By hypothesis, this is upper bounded by $e^X B_1 e^{-B_1 - f^{-1}(B_1)}$, which is exactly the reward obtained by authorizing $B_1$ BTC. So authorizing $B_1$ BTC provides at least as much reward. In both cases, authorizing $B_1$ BTC is at least as good as authorizing $B_2$.

Corollary C.4. If $B_1 \ge B_2$ and $B_1 e^{-B_1 - f^{-1}(B_1)} \le B_2 e^{-B_2 - f^{-1}(B_2)}$, then for all $X \le B_2 + f^{-1}(B_2)$, the expected reward from authorizing $B_2$ BTC in transactions is at least as large as the expected reward from authorizing $B_1$ BTC when $\mathrm{Rem}(\mathrm{Most}_H) = X$.

Proof. By hypothesis, $X \le B_2 + f^{-1}(B_2)$ (the miner is undercut with positive probability by authorizing $B_2$ BTC in transactions). Therefore, the expected reward from authorizing $B_2$ BTC is $B_2 e^{-B_2 - f^{-1}(B_2) + X}$. As $B_1 \ge B_2$ and $f^{-1}(\cdot)$ is increasing, we have $X \le B_1 + f^{-1}(B_1)$ as well. This means that the expected reward from authorizing $B_1$ BTC is $B_1 e^{-B_1 - f^{-1}(B_1) + X}$. By hypothesis, this is at most the reward from authorizing $B_2$.

We now quickly recall some properties of $W_0(\cdot)$:
• The domain of $W_0(\cdot)$ is [−1/e, ∞) and the range is [−1, ∞).
• $W_0(\cdot)$ is increasing.
• $W_0(x e^x) = x$ for all x ∈ [−1, ∞).

We will need some technical facts about f(·) (our specific choice from the statement of Theorem 5.1), which we first prove below.

Fact 1. f(x) ≤ x everywhere.

Proof. Clearly, f(x) ≤ x on [0, y].
Also, since f is constant (equal to 1) on [2y − ln(y) − 1, ∞), we have f(x) ≤ x on that range iff f(2y − ln(y) − 1) ≤ 2y − ln(y) − 1. So we just need to check the range [y, 2y − ln(y) − 1]. The derivative of $W_0(x)$ is $\frac{W_0(x)}{x(W_0(x)+1)}$. So the derivative of f on this range is (by the chain rule):

$$f'(x) = -\frac{W_0(-y e^{x-2y})}{-y e^{x-2y}\bigl(W_0(-y e^{x-2y}) + 1\bigr)} \cdot \bigl(-y e^{x-2y}\bigr) = \frac{-W_0(-y e^{x-2y})}{1 + W_0(-y e^{x-2y})} = \frac{f(x)}{1 - f(x)}.$$

As f(·) is increasing and positive on [y, 2y − ln(y) − 1], the form of f′(x) just derived shows that f′(·) is also increasing and positive on this interval (not every positive, increasing f(·) has an increasing derivative, but this one does). As the derivative of x is constant (1), f(x) − x is convex on this interval and attains its maximum at an endpoint; so if f(x) > x anywhere on this interval, then f(2y − ln(y) − 1) > 2y − ln(y) − 1 or f(y) > y. We can clearly see that $f(y) = -W_0(-y e^{-y}) = y$, and $f(2y - \ln(y) - 1) = -W_0\bigl(-y e^{(2y - \ln(y) - 1) - 2y}\bigr) = -W_0(-1/e) = 1$. So we can’t have f(y) > y, and we have f(2y − ln(y) − 1) > 2y − ln(y) − 1 if and only if 2y − ln(y) − 1 < 1, which is the same as 2y − ln(y) < 2. As this is exactly the range of y we disallow, we also can’t have f(2y − ln(y) − 1) > 2y − ln(y) − 1 for any y we allow. Therefore, f(x) ≤ x everywhere.

Fact 2. $B e^{-B - f^{-1}(B)}$ equals:
• $B e^{-2B}$, for B ∈ [0, y];
• $y e^{-2y}$, for B ∈ [y, 1];
• 0, for B > 1.

Proof. We first observe that $f^{-1}(B) = B$ for all B ∈ [0, y], which immediately proves the first bullet. We next observe that $f^{-1}(B) = +\infty$ for all B > 1, which immediately proves the last bullet. For the middle bullet, observe that for z ∈ [y, 1]:

$$-W_0\bigl(-y e^{(2y + \ln(z/y) - z) - 2y}\bigr) = -W_0\bigl(-y e^{\ln(z/y) - z}\bigr) = -W_0(-z e^{-z}) = z.$$

The last equality is due to the fact that $W_0(\cdot)$ is the inverse of $x e^x$ on [−1, ∞). This proves that $f^{-1}(B) = 2y + \ln(B/y) - B$ when B ∈ [y, 1], so $B + f^{-1}(B) = 2y + \ln(B/y)$ and $B e^{-B - f^{-1}(B)} = B e^{-2y} \cdot (y/B) = y e^{-2y}$, which completes the middle bullet.

Corollary C.5. If y ∈ (0, 1/2], then $B e^{-B - f^{-1}(B)}$ is strictly increasing on [0, y] and constant on [y, 1].

Proof. $B e^{-B - f^{-1}(B)}$ is clearly constant on [y, 1], so we just need to confirm that it is strictly increasing on [0, y].
The derivative of $B e^{-2B}$ is $(1 - 2B)e^{-2B}$, which is strictly positive on [0, 1/2) and vanishes only at B = 1/2, so $B e^{-2B}$ is strictly increasing on [0, 1/2] (and therefore on [0, y] for all y ≤ 1/2), as desired.

Proof of Theorem 5.1: We want to invoke Corollary C.3 combined with Corollary C.5. Together, these immediately say that for any $1 \ge B_1 > B_2 \ge 0$, it is at least as good to authorize $B_1$ BTC as $B_2$. As authorizing B > 1 BTC always results in an expected reward of 0, this immediately implies by Corollary C.2 that for any b,

$$\min\{1, b\} \in \arg\max_{B \in [0, b]} \bigl\{B e^{-\max\{0,\, B + f^{-1}(B) - \mathrm{Rem}(\mathrm{Most}_i)\}}\bigr\}.$$

Now, we also want to invoke Corollary C.4 to show that there may exist other maximizers as well if $\mathrm{Rem}(\mathrm{Most}_i) \in [y, 2y - \ln(y) - 1]$. Note that f(·) is strictly increasing in this range, meaning that $f^{-1}(f(\mathrm{Rem}(\mathrm{Most}_i))) = \mathrm{Rem}(\mathrm{Most}_i)$. Therefore, $\mathrm{Rem}(\mathrm{Most}_i) \le f(\mathrm{Rem}(\mathrm{Most}_i)) + f^{-1}(f(\mathrm{Rem}(\mathrm{Most}_i)))$ on this entire range (the miner will be undercut with positive probability when authorizing $f(\mathrm{Rem}(\mathrm{Most}_i))$ BTC). Together with Corollary C.5, this means that the hypotheses of Corollary C.4 are satisfied taking $B_2 = f(\mathrm{Rem}(\mathrm{Most}_i))$ and any $B_1 \ge B_2$. Combined with the reasoning above, this means that when $\mathrm{Rem}(\mathrm{Most}_i) \in [y, 2y - \ln(y) - 1]$ and $b \ge f(\mathrm{Rem}(\mathrm{Most}_i))$, we also have

$$f(\mathrm{Rem}(\mathrm{Most}_i)) \in \arg\max_{B \in [0, b]} \bigl\{B e^{-\max\{0,\, B + f^{-1}(B) - \mathrm{Rem}(\mathrm{Most}_i)\}}\bigr\}.$$

Therefore, when $b = \mathrm{Rem}(\mathrm{Most}_H)$, we recover that $f(\mathrm{Rem}(\mathrm{Most}_H))$ is an optimal choice of BTC to authorize when continuing. When $b = \mathrm{Gap}_H$, we recover that $\min\{1, \mathrm{Gap}_H, f(\mathrm{Rem}(\mathrm{Most}_{H-1}))\} = \min\{\mathrm{Gap}_H, f(\mathrm{Rem}(\mathrm{Most}_{H-1}))\}$ is an optimal choice of BTC to authorize when undercutting. So FunctionFork(f) correctly chooses how many BTC to authorize when continuing and when undercutting; we just need to check that it also correctly chooses when to undercut and when to continue. If $\mathrm{Gap}_H > f(\mathrm{Rem}(\mathrm{Most}_H))$, then $\min\{\mathrm{Gap}_H, f(\mathrm{Rem}(\mathrm{Most}_{H-1}))\} \ge f(\mathrm{Rem}(\mathrm{Most}_H))$ as well, and we can invoke Corollary C.3 with $B_1 = \min\{\mathrm{Gap}_H, f(\mathrm{Rem}(\mathrm{Most}_{H-1}))\}$ and $B_2 = f(\mathrm{Rem}(\mathrm{Most}_H))$.
By the argument above, because $1 \ge B_1 \ge B_2$, the hypotheses of Corollary C.3 are satisfied, and the expected reward is at least as high when authorizing $B_1$ as $B_2$, so undercutting is at least as good as continuing. Similarly, if $f(\mathrm{Rem}(\mathrm{Most}_H)) \ge \mathrm{Gap}_H$, then $f(\mathrm{Rem}(\mathrm{Most}_H)) \ge \min\{\mathrm{Gap}_H, f(\mathrm{Rem}(\mathrm{Most}_{H-1}))\}$. So we may again invoke Corollary C.3, this time with $B_1 = f(\mathrm{Rem}(\mathrm{Most}_H))$ and $B_2 = \min\{\mathrm{Gap}_H, f(\mathrm{Rem}(\mathrm{Most}_{H-1}))\}$, so continuing is at least as good as undercutting. So we have now shown that FunctionFork(f) correctly chooses how many BTC to authorize when continuing and when undercutting, and also chooses correctly whether to continue or undercut. So it is an equilibrium.

The last part we need to reason about is the connection to random walks. Observe that the amount of available transaction fees grows continuously at a rate of 1 per unit of time, and every time a block is found, it drops by at most 1. So the backlog of transactions is at least as large as that of a random walk that drops by exactly 1 per block (since that walk drops at least as far at every block). Lemma C.7 below proves that with constant probability, the number of blocks found in a time interval of length $n + \sqrt{n}$ is at most n. When this occurs, there is a backlog of at least $\sqrt{n}$ transactions at time $n + \sqrt{n}$. Therefore, the expected backlog is at least $\Theta(\sqrt{n})$ (in fact, it is exactly $\Theta(\sqrt{n})$). During this time, new transactions take $\Theta(\sqrt{n})$ time steps before they are included in a block. □

Before proving Lemma C.7, we recall the Berry-Esseen theorem:

Theorem C.6 (Berry-Esseen). Let $X_1, \ldots, X_n$ be i.i.d. random variables with mean 0, $E[X_i^2] = \sigma^2$, and $E[|X_i|^3] = \rho$. Then for all x:

$$\left|\Pr\Bigl[\frac{\sum_i X_i}{\sigma\sqrt{n}} \ge x\Bigr] - \Phi(x)\right| = O\Bigl(\frac{\rho}{\sigma^3 \sqrt{n}}\Bigr),$$

where Φ(x) denotes the probability that a Gaussian random variable with mean 0 and standard deviation 1 exceeds x.

Lemma C.7. Let $X_1, \ldots, X_n$ be independent exponential random variables with mean 1. Then:

$$\Pr\Bigl[\sum_{i=1}^{n} X_i > n + \sqrt{n}\Bigr] = \Theta(1).$$

In particular, this implies that the probability that fewer than n blocks are found in $n + \sqrt{n}$ time steps is Θ(1).

Proof. Define $Y_i = X_i - 1$.
Then the $Y_i$ are i.i.d. random variables with mean 0, $\mathbb{E}[Y_i^2] = \sigma^2 = 1$, and $\rho = \mathbb{E}[|Y_i|^3] < 6$. Plugging into Berry-Esseen (Theorem C.6 above), we get:
$$\Pr\left[ \sum_{i=1}^{n} Y_i > \sqrt{n} \right] = \Pr\left[ \frac{\sum_{i=1}^{n} Y_i}{\sigma\sqrt{n}} > \frac{1}{\sigma} \right] \ge \Phi\left(\frac{1}{\sigma}\right) - O\left(\frac{1}{\sqrt{n}}\right).$$
As $\sigma$ is a constant independent of $n$, $\Phi(1/\sigma)$ is also independent of $n$, so $\Phi(1/\sigma) - O(1/\sqrt{n}) = \Theta(1)$, as desired.

D. WHEN DEFAULT MINING IS AN EQUILIBRIUM FOR NON-ATOMIC MINERS

In the absence of latency, default mining is an equilibrium for non-atomic miners regardless of the reward model, and the reasoning is simple: if you do anything except extend the unique longest chain, your block will be orphaned and you will receive a reward of zero. If you wait to publish your block, you risk losing the option to publish it without being orphaned. All other miners ignore the transactions included in your block when deciding where to extend, so you may as well include as many transactions as possible.

In the presence of latency, forks will naturally occur, so PettyCompliant outperforms DefaultCompliant in the transaction fees model. In the fixed-reward model, DefaultCompliant remains an equilibrium under quite general models of latency (still assuming non-atomic miners). Consider, for instance, any model of latency with the following property: whenever miner $m$ finds a block, and miner $m'$ finds a block at a later time, we have $B_m \subseteq B_{m'}$, where $B_m$ denotes the set of blocks that miner $m$ had heard of when they found their block. In other words, by the time miner $m'$ solves their block, they have become aware of at least every block that $m$ was aware of when they solved their block earlier (but perhaps not $m$'s block, nor any blocks that $m$ was not herself aware of). It is easy to see that simple latency models (such as all announcements being grouped into chunks of $\lambda$ seconds) have this property, as do much more general latency models.
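A minimal Python sketch of the chunked latency model, assuming (this is our reading, not a definition from the paper) that every block found during $[k\lambda, (k+1)\lambda)$ is delivered to all miners at time $(k+1)\lambda$. The helper names `known_blocks` and `has_monotone_knowledge` are hypothetical. The sketch checks the property $B_m \subseteq B_{m'}$ on randomly generated block times, and shows that two blocks found in the same chunk are unaware of each other, which is why forks still occur under this model.

```python
import math
import random


def known_blocks(t, block_times, lam):
    """Indices of blocks every miner has heard of at time t.

    Assumed model: a block found at time s is delivered to all miners at the
    end of its chunk, i.e. at time (floor(s / lam) + 1) * lam.
    """
    return {j for j, s in enumerate(block_times)
            if (math.floor(s / lam) + 1) * lam <= t}


def has_monotone_knowledge(block_times, lam):
    """Check B_m ⊆ B_m' whenever block m is found before block m'.

    Since ⊆ is transitive, comparing consecutive blocks in time order suffices.
    """
    order = sorted(block_times)
    return all(known_blocks(a, block_times, lam) <= known_blocks(b, block_times, lam)
               for a, b in zip(order, order[1:]))


rng = random.Random(0)
times, t = [], 0.0
for _ in range(200):
    t += rng.expovariate(1.0)   # block inter-arrival times with mean 1
    times.append(t)

print(has_monotone_knowledge(times, lam=0.5))   # True
# Two blocks in the same chunk: the later finder has not heard of the earlier one.
print(known_blocks(0.3, [0.1, 0.3], 1.0))       # set()
```

Under this model the check holds for any block times, because the set of delivered blocks only grows with time; the interesting consequence is the second call, where a miner can fork an earlier block simply by not having heard of it yet.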
It is also easy to see that in the transaction fees model, the simple latency model where announcements are grouped into chunks of $\lambda$ seconds is rich enough that DefaultCompliant is strictly outperformed by PettyCompliant, and is therefore not an equilibrium.

Proposition D.1. When miners are non-atomic, even in the presence of any latency of the form described above, it is an equilibrium in the fixed-reward model for every miner to use DefaultCompliant.

Proof. The proof is straightforward: assuming that all other miners are DefaultCompliant, mining anywhere except on top of a longest chain guarantees that your block will be orphaned and you will receive a reward of zero (because our latency assumptions guarantee that the next miner and all future miners will have heard about the blocks you chose to undercut before yours, and they are all DefaultCompliant). So the only choice is how to tie-break among multiple longest chains. But this choice affects neither your rewards (they are fixed!) nor the likelihood that your block will be chosen by the next miner (as this depends only on how quickly they hear about your block and not on its contents). So tie-breaking in favor of the earliest chain is at least as good as any other tie-breaking rule. Finally, it is also easy to see that publishing as soon as possible is optimal, as this maximizes the likelihood that your block is chosen to be extended.

The point of Proposition D.1 is again just to contrast transaction fees with fixed rewards. In the non-atomic regime, even in quite general latency models, DefaultCompliant mining is an equilibrium in the fixed-reward model. The proof is simple and matches exactly our intuition for why DefaultCompliant should make sense. But in the transaction fees model, whenever forks are possible, DefaultCompliant is strictly outperformed by PettyCompliant, and the space of equilibria is therefore much more complex.
In particular, it would be interesting for future work to identify an equilibrium for non-atomic miners in the transaction fees model in any non-trivial latency model.

E. SELFISH MINING

E.1 Classic Selfish Mining with transaction fees

Here we provide details on how to analyze selfish mining in the transaction fee regime. Recall that Eyal and Sirer [9] have already computed $p_s$ for all $s$, the probability that the block chain is in state $s$. Below we compute $f_s$ for all states $s$: the probability that a transaction winds up with the selfish miner, conditioned on that transaction arriving while the block chain is in state $s$.

Computing $f_0$: Let's consider the possible outcomes when a transaction arrives in state 0:
• If a default miner mines the next block, it will contain this transaction, and this block will definitely be in the eventual longest chain. This happens with probability $1 - \alpha$.
• Alternatively, the selfish miner could find the next block. In that case they include the transaction in their block, but keep the block private. This happens with probability $\alpha$, but this block is not yet guaranteed to make it into the eventual longest chain.
• From here, the selfish miner may find the next block as well. This happens with probability $\alpha$. Once this happens, both blocks are guaranteed to be in the eventual longest chain. So this event contributes a probability $\alpha^2$ that the transaction winds up in the selfish miner's block.
• Alternatively, a default miner might find the next block, which triggers a race. This happens with probability $1 - \alpha$. Both racing blocks contain the transaction being considered, so whoever wins the race receives the corresponding transaction fees. The selfish miner wins the race with probability $\alpha + \gamma(1 - \alpha)$, so this event contributes $\alpha(1 - \alpha)(\alpha + \gamma(1 - \alpha))$ in total.
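The state-0 case analysis above can be checked with a small Monte Carlo sketch. It assumes, as the analysis does, that all other miners are DefaultCompliant, and that a race is won by the selfish branch when the selfish miner finds the next block, or when the default miner who finds it (with probability $\gamma$) builds on the selfish branch. The function names are hypothetical; `f0_closed_form` encodes the expression the analysis arrives at in equation (4).

```python
import random


def f0_closed_form(alpha, gamma):
    """Closed form for f_0 from the state-0 case analysis (equation (4))."""
    return alpha**2 + alpha * (1 - alpha) * (alpha + gamma * (1 - alpha))


def selfish_wins_tx_from_state0(alpha, gamma, rng):
    """Follow one transaction that arrives in state 0 (others DefaultCompliant)."""
    if rng.random() >= alpha:
        return False   # a default miner mines the next block, containing the tx
    # The selfish miner mines the next block (it holds the tx) and keeps it private.
    if rng.random() < alpha:
        return True    # lead grows to 2: both private blocks will stick
    # A default miner finds a competing block (it also holds the tx): a race.
    if rng.random() < alpha:
        return True    # the selfish miner finds the race-ending block
    return rng.random() < gamma   # a default miner extends the selfish branch


def f0_monte_carlo(alpha, gamma, trials=100_000, seed=1):
    rng = random.Random(seed)
    wins = sum(selfish_wins_tx_from_state0(alpha, gamma, rng) for _ in range(trials))
    return wins / trials


print(f0_closed_form(0.3, 0.5))   # ≈ 0.2265
print(f0_monte_carlo(0.3, 0.5))   # close to the closed form
```

The simulation follows the bullets one branch at a time, so any mismatch with the closed form would point at a specific case of the analysis.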
Therefore, we see that:
$$f_0 = \alpha^2 + \alpha(1 - \alpha)\left(\alpha + \gamma(1 - \alpha)\right) \qquad (4)$$

Computing $f_{0'}$: If a new transaction is announced in state $0'$, then the next block found is certainly contained in the eventual longest chain, because it is always announced and every miner chooses to mine on top of it. So this transaction is won by whichever miner finds the next block, which is the selfish miner with probability $\alpha$. Therefore:
$$f_{0'} = \alpha \qquad (5)$$

Computing $f_1$: Consider now a transaction announced in state 1, and where it might wind up:
• If the selfish miner finds the next block, they will have a private chain of length 2, in which case both blocks are guaranteed to make it into the final block chain. Therefore, this transaction will certainly wind up in a block mined by the selfish miner. This happens with probability $\alpha$.
• Alternatively, the rest of the network might find the next block. This happens with probability $1 - \alpha$. But we don't yet know whether this block will make it into the eventual longest chain, because this triggers the "race" and puts us in state $0'$. Note, though, that the racing selfish block does not contain this transaction, which arrived once we were already in state 1. Therefore, even if the selfish block wins the race, if it wins because a default miner built on it, that default miner's block contains the transaction and the selfish miner does not get it. So the only way for the selfish miner to win this transaction is to find the block that ends the race. This happens with probability $\alpha$.
$$f_1 = \alpha + (1 - \alpha)\alpha = \alpha(2 - \alpha) \qquad (6)$$

Computing $f_i$: Finally, consider a transaction arriving to the system in state $i$, $i > 1$. In these states, it is easier to consider what must happen in order for the transaction not to end up in a block the selfish miner owns. For the transaction to wind up in a default miner's block, the selfish miner must release their entire private chain before mining a new block (which would contain this transaction).
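This is because any blocks found by default miners before this trigger are all orphaned. For a release to be triggered, a default miner must find each of the next $i - 1$ blocks, which happens with probability $(1 - \alpha)^{i-1}$. If this happens, we still don't know where this transaction winds up, because each of the $i - 1$ blocks found will be orphaned. But we have now returned to state 0, and the remainder of the analysis proceeds as if the transaction had been announced during state 0. So the probability that a default miner winds up with a transaction arriving in state $i$ is $(1 - \alpha)^{i-1}(1 - f_0)$, and therefore:
$$f_i = 1 - (1 - \alpha)^{i-1}(1 - f_0) \qquad (7)$$
Summing everything together, we get the following:

Theorem E.1. If all other miners remain DefaultCompliant, a selfish miner in the transaction fees model with an $\alpha \in (0, 0.5)$ fraction of the mining power and racing parameter $\gamma \in [0, 1]$ achieves reward $\mathrm{Reward}(\alpha, \gamma)$ with:
$$\mathrm{Reward}(\alpha, \gamma) = \frac{5\alpha^2 - 12\alpha^3 + 9\alpha^4 - 2\alpha^5 + \gamma(\alpha - 4\alpha^2 + 6\alpha^3 - 5\alpha^4 + 2\alpha^5)}{2\alpha^3 - 4\alpha^2 + 1}$$

Proof. The only remaining part of the proof is summing $p_0 f_0 + p_{0'} f_{0'} + p_1 f_1 + \sum_{i>1} p_i f_i$:
$$p_0 f_0 = \frac{1 - 2\alpha}{2\alpha^3 - 4\alpha^2 + 1}\left(\alpha^2 + \alpha(1 - \alpha)(\alpha + (1 - \alpha)\gamma)\right) = \frac{2\alpha^2 - 5\alpha^3 + 2\alpha^4 + \alpha\gamma - 4\alpha^2\gamma + 5\alpha^3\gamma - 2\alpha^4\gamma}{2\alpha^3 - 4\alpha^2 + 1}$$
$$p_{0'} f_{0'} = \frac{(1 - \alpha)(\alpha - 2\alpha^2)}{2\alpha^3 - 4\alpha^2 + 1}\cdot\alpha = \frac{\alpha^2 - 3\alpha^3 + 2\alpha^4}{2\alpha^3 - 4\alpha^2 + 1}$$
$$p_1 f_1 = \frac{\alpha - 2\alpha^2}{2\alpha^3 - 4\alpha^2 + 1}\cdot\alpha(2 - \alpha) = \frac{2\alpha^2 - 5\alpha^3 + 2\alpha^4}{2\alpha^3 - 4\alpha^2 + 1}$$
$$p_i f_i = \left(\frac{\alpha}{1 - \alpha}\right)^{i-1}\frac{\alpha - 2\alpha^2}{2\alpha^3 - 4\alpha^2 + 1} - \alpha^{i-1}(1 - f_0)\frac{\alpha - 2\alpha^2}{2\alpha^3 - 4\alpha^2 + 1}$$
Using $\sum_{i>1}\left(\frac{\alpha}{1 - \alpha}\right)^{i-1} = \frac{\alpha}{1 - 2\alpha}$, $\sum_{i>1}\alpha^{i-1} = \frac{\alpha}{1 - \alpha}$, and $1 - f_0 = 1 - \alpha + \alpha(1 - \alpha)^2(1 - \gamma)$, we get:
$$\sum_{i>1} p_i f_i = \frac{\alpha^2}{2\alpha^3 - 4\alpha^2 + 1} - \frac{\alpha\left(1 - \alpha + \alpha(1 - \alpha)^2(1 - \gamma)\right)(\alpha - 2\alpha^2)}{(1 - \alpha)(2\alpha^3 - 4\alpha^2 + 1)}$$
$$= \frac{\alpha^2 - \alpha\left(1 + \alpha(1 - \alpha)(1 - \gamma)\right)(\alpha - 2\alpha^2)}{2\alpha^3 - 4\alpha^2 + 1}$$
$$= \frac{2\alpha^3 - \alpha^2\left(\alpha - 2\alpha^2 - \alpha^2 + 2\alpha^3 - \gamma\alpha + 2\gamma\alpha^2 + \gamma\alpha^2 - 2\gamma\alpha^3\right)}{2\alpha^3 - 4\alpha^2 + 1}$$
$$= \frac{\alpha^3 + 3\alpha^4 - 2\alpha^5 + \gamma\alpha^3 - 3\gamma\alpha^4 + 2\gamma\alpha^5}{2\alpha^3 - 4\alpha^2 + 1}$$
The proof concludes by summing the four terms.

E.2 Improved Selfish Mining with a cutoff

In this section, we complete our analysis of our improved selfish mining with a cutoff.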
In order to keep the analysis of this strategy tractable, we slightly tweak the strategy being analyzed (our theory-matches-simulation plot in Figure 6.2 shows that this tweak is essentially irrelevant). The only tweak we make is that right after the selfish miner simultaneously releases a chain of length two, they immediately publish the next block (if they find it), and then return to selfish mining. In the language of Eyal and Sirer, this is like adding an additional state $0''$ in which the selfish miner mines honestly. No matter who finds a block in this state, the next state is 0. The only transition into this state occurs when the honest portion of the network finds a block while the selfish miner has a lead of 2. Figure 13 shows an updated Markov chain with state $0''$. Again, note that this modification is just for analysis; the selfish mining with cutoff that is implemented in our simulator is as described in the body.

Figure 13: State machine for selfish mining with a cutoff, introducing state $0''$.

In order to calculate the selfish miner's expected revenue, we must again calculate the probability of the system being in any given state, and the chance that a transaction arriving while the system is in that state eventually ends up in a block mined by the selfish miner. From the state transitions in Figure 13, we can derive the following formulas relating the probabilities of being in each state (for all $i \ge 1$):
$$p_i\,\alpha = p_{i+1}(1 - \alpha) \qquad (8)$$
$$\implies p_i = \left(\frac{\alpha}{1 - \alpha}\right)^{i-1} p_1 \qquad (9)$$
$$p_{0''} = (1 - \alpha)p_2 = \alpha p_1 \qquad (10)$$
$$p_{0'} = (1 - \alpha)p_1 \qquad (11)$$
$$p_0 = \frac{p_1}{\alpha(1 - e^{-\beta})} \qquad (12)$$
We also know that the system is guaranteed to be in some state, which means that
$$p_0 + p_{0'} + p_{0''} + \sum_{i=1}^{\infty} p_i = 1. \qquad (13)$$
Together, these imply that
$$p_1 = \frac{\alpha(2\alpha - 1)(e^{\beta} - 1)}{3\alpha^2(e^{\beta} - 1) + 2\alpha - e^{\beta}}. \qquad (14)$$
With equations (9)-(12), this gives expressions for the probabilities of all the possible states the system could be in.
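As a sanity check on equations (8)-(14), the Python sketch below compares the closed-form probabilities against a power iteration on the per-block Markov chain of Figure 13, as we read it from the text: from state 0 the selfish miner moves to state 1 with probability $\alpha(1 - e^{-\beta})$ (it finds the next block within time $\beta$ and hides it), states $0'$ and $0''$ always return to 0, a lead of 2 drops to $0''$ when the honest miner finds a block, and larger leads shrink by one. Truncating the chain at a large lead `n_max` is an assumption of the sketch, and the function names are hypothetical.

```python
import math


def closed_form_probs(alpha, beta):
    """Stationary probabilities from equations (9)-(12) and (14)."""
    em1 = math.expm1(beta)                                   # e^beta - 1
    p1 = alpha * (2 * alpha - 1) * em1 / (3 * alpha**2 * em1 + 2 * alpha - (em1 + 1))
    p0 = p1 / (alpha * -math.expm1(-beta))                   # (12); 1 - e^{-beta}
    p0_prime = (1 - alpha) * p1                              # (11)
    p0_dprime = alpha * p1                                   # (10)

    def lead(i):                                             # (9), valid for i >= 1
        return (alpha / (1 - alpha)) ** (i - 1) * p1

    return p0, p0_prime, p0_dprime, lead


def power_iteration_probs(alpha, beta, n_max=60, steps=3000):
    """Stationary distribution of the per-block chain, truncated at lead n_max.

    Index 0 = state 0, 1 = state 0', 2 = state 0'', 2 + i = private lead i.
    """
    size = n_max + 3
    hide = alpha * -math.expm1(-beta)   # selfish block found within beta of entering 0
    pi = [1.0 / size] * size
    for _ in range(steps):
        nxt = [0.0] * size
        nxt[3] += pi[0] * hide                          # 0 -> 1 (block hidden)
        nxt[0] += pi[0] * (1 - hide) + pi[1] + pi[2]    # stay in 0; 0', 0'' -> 0
        for i in range(1, n_max + 1):
            mass = pi[2 + i]
            nxt[2 + min(i + 1, n_max)] += alpha * mass  # selfish block: lead grows
            if i == 1:
                nxt[1] += (1 - alpha) * mass            # race: state 0'
            elif i == 2:
                nxt[2] += (1 - alpha) * mass            # release both blocks: 0''
            else:
                nxt[1 + i] += (1 - alpha) * mass        # lead shrinks to i - 1
        pi = nxt
    return pi


alpha, beta = 0.3, 1.0
p0, p0p, p0pp, lead = closed_form_probs(alpha, beta)
pi = power_iteration_probs(alpha, beta)
print(p0 + p0p + p0pp + sum(lead(i) for i in range(1, 200)))  # ≈ 1, equation (13)
print(max(abs(u - v) for u, v in zip(pi[:5], [p0, p0p, p0pp, lead(1), lead(2)])))
```

The per-block chain suffices here because blocks arrive at rate 1 in every state, so per-block and per-unit-time state probabilities coincide.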
Now we need to compute the probability that a transaction that arrives when the system is in state $s$ winds up with the selfish miner. Unfortunately, this step is less clean: because in state 0 the selfish miner sometimes publishes and sometimes hides their block, depending on how much time has passed since the last block was found, we actually need to introduce a continuum of states, one for each elapsed time $x$ (equivalently, for each size of the block that is building during state 0). So let $p_0(x)$ denote the probability that the system is in state 0 and $x$ units of time have passed since the system entered state 0. Because we introduced the new state $0''$, whenever we enter state 0, the initial block is empty. Therefore, the probability that we are in state 0 with a block of size at least $x$ is $p_0 e^{-x}$, and we have:
$$p_0(x) = p_0 e^{-x}\,dx \qquad (15)$$
We must now calculate the associated $f_s$ (the probability that a transaction winds up with the selfish miner, conditioned on arriving during state $s$) in order to calculate the expected fraction of the rewards claimed by the selfish miner.

Computing $f_0(x)$. If a new transaction arrives in state 0, let's look at where this transaction might wind up. Note that this depends on how long it has been ($x$) since the last block was found.
• If the next block is found by the honest miner, then this transaction will certainly wind up with the honest miner. This happens with probability $1 - \alpha$.
• If $x \ge \beta$ and the next block is found by the selfish miner, then it certainly winds up with the selfish miner. This happens with probability $\alpha$.
• If $x < \beta$ and the next block is found by the selfish miner after time $\beta - x$ has passed, then it certainly winds up with the selfish miner. This happens with probability $\alpha e^{-\beta + x}$.
• If $x < \beta$ and the next block is found by the selfish miner within time $\beta - x$, then this transaction isn't determined yet, because the selfish miner chooses to hide that block. This happens with probability $\alpha(1 - e^{-\beta + x})$.
• If both of the next two blocks are found by the selfish miner (the first within time $\beta - x$), then this transaction is contained in a block of the selfish miner that will certainly be included in the eventual longest chain. This happens with probability $\alpha^2(1 - e^{-\beta + x})$.
• If the next block is found by the selfish miner (within time $\beta - x$), followed by a block by the honest miner, then a race is triggered. This transaction is contained in both racing blocks, so whoever wins the race gets this transaction. The race occurs with probability $\alpha(1 - e^{-\beta + x})(1 - \alpha)$, and the selfish miner wins the race with probability $\alpha + (1 - \alpha)\gamma$.

So in total, we see that $f_0(x) = \alpha$ when $x \ge \beta$, and
$$f_0(x) = \alpha e^{-\beta + x} + \alpha^2(1 - e^{-\beta + x}) + \alpha(1 - \alpha)(1 - e^{-\beta + x})(\alpha + (1 - \alpha)\gamma) \quad \text{if } x \le \beta.$$

Computing $f_{0'}$. If a new transaction arrives when two chains of the same length are competing, then the next block found is certainly contained in the eventual longest chain (because both miners choose to mine on top of it). So if the next block is found by the selfish miner, this transaction is won by him; otherwise, it is won by the honest miner. So we have $f_{0'} = \alpha$.

Computing $f_{0''}$. If a new transaction arrives during state $0''$, the next block found is again certainly contained in the eventual longest chain. So we again have $f_{0''} = \alpha$.

Computing $f_1$. If a new transaction arrives when the selfish miner has a private chain of length 1, let's consider where the transaction might wind up:
• If the next block is found by the selfish miner, then this transaction is contained in a block of the selfish miner that will certainly be included in the eventual longest chain. This happens with probability $\alpha$.
• If the next block is found by the honest miner, then this triggers a release of the private block and a race. But the racing selfish block does not contain this transaction, whereas the racing honest block does. So if the racing honest block wins, the honest miner gets this transaction.
If the racing selfish block wins, whoever finds the block that ends the race gets this transaction. So in this case the selfish miner gets the transaction only if he finds the block that ends the race. This happens with probability $(1 - \alpha)\alpha$.

So we see that $f_1 = \alpha + (1 - \alpha)\alpha = \alpha(2 - \alpha)$.

Computing $f_i$, $i > 1$. If a new transaction arrives when the selfish miner has a private chain of length $i > 1$, let's again consider where this transaction might wind up:
• If the next block is found by the selfish miner, then this transaction is contained in a block of the selfish miner that will certainly be included in the eventual longest chain. This happens with probability $\alpha$.
• If the next $i - 1$ blocks are all found by the honest miner, then this triggers a release of the private chain, and all those blocks found by the honest miner are immediately orphaned. At this point, the transaction has still not been included in any block, so it is as if the transaction arrived in state $0''$. So in this case the selfish miner gets this transaction with probability $f_{0''}$.
• If any of the next $i - 1$ blocks is found by the selfish miner, then that block is certainly included in the eventual longest chain, because it is found when the selfish miner has a lead of at least two.

So the only way the selfish miner might lose the transaction is if each of the next $i - 1$ blocks is found by the honest miner, and even in this case the selfish miner still wins the transaction with probability $f_{0''} = \alpha$. So the honest miner wins this transaction only with probability $(1 - \alpha)^{i-1}(1 - \alpha)$, and we have $f_i = 1 - (1 - \alpha)^i$.

Now, we just have to sum/integrate over all states and success probabilities to compute the fraction of transactions that go to the selfish miner:
$$f_{0'} p_{0'} = \alpha(1 - \alpha)p_1$$
$$f_{0''} p_{0''} = \alpha^2 p_1$$
$$f_1 p_1 = \alpha(2 - \alpha)p_1$$
$$f_i p_i = \left(1 - (1 - \alpha)^i\right)\frac{\alpha^{i-1}}{(1 - \alpha)^{i-1}}\,p_1, \quad i > 1$$
$$f_0(x)p_0(x) = \frac{p_1 e^{-x}\,dx}{1 - e^{-\beta}}, \quad x \ge \beta$$
$$f_0(x)p_0(x) = \frac{p_1 e^{-x}\left(e^{-\beta + x} + \alpha(1 - e^{-\beta + x}) + (1 - \alpha)(1 - e^{-\beta + x})(\alpha + (1 - \alpha)\gamma)\right)dx}{1 - e^{-\beta}}, \quad x \le \beta$$
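Using
$$\sum_{i>1}\frac{\alpha^{i-1}}{(1 - \alpha)^{i-1}} = \frac{\alpha}{1 - 2\alpha}, \qquad \sum_{i>1}\alpha^{i-1} = \frac{\alpha}{1 - \alpha},$$
we get
$$\sum_{i>1} f_i p_i = p_1\left(\sum_{i>1}\frac{\alpha^{i-1}}{(1 - \alpha)^{i-1}} - (1 - \alpha)\sum_{i>1}\alpha^{i-1}\right) = \left(\frac{\alpha}{1 - 2\alpha} - \alpha\right)p_1 = \frac{2\alpha^2 p_1}{1 - 2\alpha}.$$
For state 0,
$$\int_{x \ge \beta} f_0(x)p_0(x) = \frac{p_1}{1 - e^{-\beta}}\int_{\beta}^{\infty} e^{-x}\,dx = \frac{e^{-\beta}p_1}{1 - e^{-\beta}},$$
$$\int_{x=0}^{\beta} f_0(x)p_0(x) = \int_{0}^{\beta}\frac{p_1}{1 - e^{-\beta}}\Big(\big(e^{-\beta} - \alpha e^{-\beta} - (1 - \alpha)(\alpha + (1 - \alpha)\gamma)e^{-\beta}\big) + \big(\alpha e^{-x} + (1 - \alpha)(\alpha + (1 - \alpha)\gamma)e^{-x}\big)\Big)dx$$
$$= \frac{p_1\beta e^{-\beta}\left(1 - \alpha - (1 - \alpha)(\alpha + (1 - \alpha)\gamma)\right)}{1 - e^{-\beta}} + \frac{p_1(1 - e^{-\beta})\left(\alpha + (1 - \alpha)(\alpha + (1 - \alpha)\gamma)\right)}{1 - e^{-\beta}}$$
$$= \frac{p_1\left(\beta e^{-\beta}(1 - \alpha)(1 - \alpha - (1 - \alpha)\gamma) + (1 - e^{-\beta})(\alpha + (1 - \alpha)(\alpha + (1 - \alpha)\gamma))\right)}{1 - e^{-\beta}}.$$
Summing everything together, we then get:
$$\int_0^{\infty} p_0(x)f_0(x) + \sum_{i>1} p_i f_i + p_{0'}f_{0'} + p_{0''}f_{0''} + p_1 f_1 = \left(\frac{\beta e^{-\beta}(1 - \alpha)(1 - \alpha - (1 - \alpha)\gamma)}{1 - e^{-\beta}} + \alpha + (1 - \alpha)(\alpha + (1 - \alpha)\gamma) + \frac{e^{-\beta}}{1 - e^{-\beta}} + \frac{2\alpha^2}{1 - 2\alpha} + 3\alpha - \alpha^2\right)p_1.$$
Since $\frac{e^{-\beta}}{1 - e^{-\beta}} = \frac{1}{e^{\beta} - 1}$ and $(1 - \alpha)(1 - \alpha - (1 - \alpha)\gamma) = (1 - \alpha)^2(1 - \gamma)$, this simplifies to the bound provided in the paper:
$$\left(\frac{1 + \beta(1 - \alpha)^2(1 - \gamma)}{e^{\beta} - 1} + 4\alpha + (1 - \alpha)(\alpha + (1 - \alpha)\gamma) + \frac{2\alpha^2}{1 - 2\alpha} - \alpha^2\right)p_1.$$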
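The two closed forms in this appendix can be cross-checked numerically. The Python sketch below (function names hypothetical) evaluates Theorem E.1 and the final cutoff expression above, and compares each with a direct sum of $p_s f_s$ over states, truncating the sum over leads at `n_max` and integrating $f_0(x)p_0(x)$ over $[0, \beta]$ with the midpoint rule (both numerical choices are assumptions of the sketch). As an extra check of our own, letting $\beta \to 0$ means the selfish miner never hides a block, so the cutoff expression should approach $\alpha$.

```python
import math


def reward_classic_closed(a, g):
    """Theorem E.1 closed form."""
    num = 5*a**2 - 12*a**3 + 9*a**4 - 2*a**5 + g*(a - 4*a**2 + 6*a**3 - 5*a**4 + 2*a**5)
    return num / (2*a**3 - 4*a**2 + 1)


def reward_classic_sum(a, g, n_max=400):
    """Direct sum p_0 f_0 + p_0' f_0' + p_1 f_1 + sum_{i>1} p_i f_i (Appendix E.1)."""
    d = 2*a**3 - 4*a**2 + 1
    p0, p0p, p1 = (1 - 2*a) / d, (1 - a) * (a - 2*a**2) / d, (a - 2*a**2) / d
    f0 = a**2 + a*(1 - a)*(a + g*(1 - a))
    total = p0*f0 + p0p*a + p1*a*(2 - a)
    for i in range(2, n_max + 1):
        total += (a / (1 - a))**(i - 1) * p1 * (1 - (1 - a)**(i - 1) * (1 - f0))
    return total


def cutoff_p1(a, b):
    em1 = math.expm1(b)                                          # e^beta - 1
    return a*(2*a - 1)*em1 / (3*a**2*em1 + 2*a - (em1 + 1))     # equation (14)


def reward_cutoff_closed(a, g, b):
    """Final simplified expression of Appendix E.2 (including the p_1 factor)."""
    c = a + (1 - a)*g
    return cutoff_p1(a, b) * ((1 + b*(1 - a)**2*(1 - g)) / math.expm1(b)
                              + 4*a + (1 - a)*c + 2*a**2/(1 - 2*a) - a**2)


def reward_cutoff_sum(a, g, b, n_max=400, grid=4000):
    """Sum/integrate p_s f_s state by state; the x-integral uses the midpoint rule."""
    p1 = cutoff_p1(a, b)
    p0 = p1 / (a * -math.expm1(-b))
    c = a + (1 - a)*g
    total = (1 - a)*p1*a + a*p1*a + p1*a*(2 - a)          # states 0', 0'', 1
    for i in range(2, n_max + 1):                        # leads i > 1
        total += (a / (1 - a))**(i - 1) * p1 * (1 - (1 - a)**i)
    total += a * p0 * math.exp(-b)                       # state 0 with x >= beta
    h = b / grid
    for k in range(grid):                                # state 0 with x < beta
        x = (k + 0.5) * h
        hidden = 1 - math.exp(-b + x)   # selfish block would land within beta - x
        f0x = a*math.exp(-b + x) + a*a*hidden + a*(1 - a)*hidden*c
        total += f0x * p0 * math.exp(-x) * h
    return total


for a, g, b in [(0.1, 0.0, 1.0), (0.3, 0.5, 1.0), (0.4, 1.0, 2.5)]:
    print(a, g, b,
          reward_classic_closed(a, g) - reward_classic_sum(a, g),
          reward_cutoff_closed(a, g, b) - reward_cutoff_sum(a, g, b))
print(reward_cutoff_closed(0.3, 0.5, 1e-6))   # close to alpha = 0.3
```

Using `math.expm1` for $e^{\beta} - 1$ and $1 - e^{-\beta}$ keeps the small-$\beta$ check numerically stable.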