title: Inducing Equilibria via Incentives: Simultaneous Design-and-Play Finds Global Optima
authors: Liu, Boyi; Li, Jiayang; Yang, Zhuoran; Wai, Hoi-To; Hong, Mingyi; Nie, Yu Marco; Wang, Zhaoran
date: 2021-10-04

To regulate a social system comprised of self-interested agents, economic incentives (e.g., taxes, tolls, and subsidies) are often required to induce a desirable outcome. This incentive design problem naturally possesses a bi-level structure, in which an upper-level "designer" modifies the payoffs of the agents with incentives while anticipating the response of the agents at the lower level, who play a non-cooperative game that converges to an equilibrium. The existing bi-level optimization algorithms developed in machine learning raise a dilemma when applied to this problem: anticipating how incentives affect the agents at equilibrium requires solving the equilibrium problem repeatedly, which is computationally inefficient; bypassing the time-consuming step of equilibrium-finding can reduce the computational cost, but may lead the designer to a sub-optimal solution. To address this dilemma, we propose a method that tackles the designer's and the agents' problems simultaneously in a single loop. In particular, at each iteration, both the designer and the agents move only one step based on first-order information. In the proposed scheme, although the designer does not solve the equilibrium problem repeatedly, it can still anticipate the overall influence of the incentives on the agents, which guarantees optimality. We prove that the algorithm converges to the global optimum at a sublinear rate for a broad class of games.

Bypassing the time-consuming step of equilibrium-finding can reduce the computational cost, but may lead the designer to a sub-optimal solution. This dilemma prompts the following question that motivates this study: can we obtain the optimal solution to an incentive design problem without repeatedly solving the equilibrium problem? In this paper, we propose an efficient, principled method that tackles the designer's problem and the agents' problem simultaneously in a single loop. At the lower level, we use the mirror descent method [30] to model the process through which the agents move towards equilibrium. At the upper level, we use the gradient descent method to update the incentives towards optimality. At each iteration, both the designer and the agents move only one step based on first-order information. However, as discussed before, the upper-level gradient relies on the corresponding lower-level equilibrium, which is not available in the single-loop update. Hence, we propose to estimate the upper-level gradient with the implicit differentiation formula, in which the equilibrium strategy is replaced by the strategy at the current iteration; this estimate might be biased at the beginning. Nevertheless, we prove that if we improve the lower-level solution with larger step sizes, the upper-level and lower-level problems converge simultaneously. The proposed scheme hence guarantees optimality because, after convergence, it anticipates the overall influence of the incentives on the agents. The proposed approach is also efficient: the computational cost at each iteration is low, and the algorithm converges to the optimal solution at a provably fast rate.

Notation. We denote by $\langle \cdot, \cdot \rangle$ the inner product in vector spaces. Given $n$ matrices $\{A_i\}_{i=1}^n$, we denote by $\mathrm{blkdiag}(A_1, \ldots, A_n)$ the block diagonal matrix whose $i$th block is $A_i$.
For a vector $a = (a_i)$, we denote $a_{-i} = (a_j)_{j \neq i}$. For a finite set $\mathcal{X}$ of cardinality $n$, we denote $\Delta(\mathcal{X}) = \{\pi \in \mathbb{R}^n_+ : \sum_{x_i \in \mathcal{X}} \pi_{x_i} = 1\}$. For any vector norm $\|\cdot\|$, we denote by $\|\cdot\|_* = \sup_{\|z\| \le 1} \langle \cdot, z \rangle$ its dual norm. Given a strongly convex function $\psi$, we define the corresponding Bregman divergence as $D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla \psi(y), x - y \rangle$.

The incentive design problem studied in this paper is a special case of mathematical programs with equilibrium constraints (MPEC) [14], a class of optimization problems constrained by equilibrium conditions. MPEC is closely related to bi-level programs [7], which bind two mathematical programs together by treating one program as part of the constraints for the other. The study of bi-level programming can be traced back to the Stackelberg game in economics [36]. In the optimization literature, it was first introduced to tackle resource allocation problems [4] and has since found applications in such diverse topics as revenue management, network design, traffic control, and energy systems. In the past decade, researchers have discovered numerous applications of bi-level programming in machine learning (ML), including meta-learning [9], adversarial learning [16], hyperparameter optimization [23], and neural architecture search [19]. These bi-level programs in ML are often solved by descent methods, which require differentiating through the (usually unconstrained) lower-level optimization problem [20]. The differentiation can be carried out either implicitly on the optimality conditions, as in conventional sensitivity analysis [see e.g., 1, 32, 3], or explicitly by unrolling the numerical procedure used to solve the lower-level problem [see e.g., 23, 10]. In the explicit approach, one may "partially" unroll the solution procedure (i.e., stop after just a few rounds, or even only one round) to reduce the computational cost. Although this popular heuristic has delivered satisfactory performance on many practical tasks [21, 27, 9, 19], it cannot guarantee optimality for bi-level programs in the general setting, because it cannot produce an accurate upper-level gradient at each iteration [38].

Unlike bi-level programs, MPECs (see [22] for a comprehensive introduction) remain relatively underexplored in the ML literature. Recently, Li et al. [18] extended the explicit differentiation method for bi-level programs to MPECs. Their algorithm unrolls an iterative projection algorithm for solving the lower-level problem, which is formulated as a variational inequality (VI) problem. Leveraging recent advances in differentiable programming [1], they embedded each projection iteration as a differentiable layer in a computational graph and thereby transformed the explicit differentiation into standard backpropagation through the graph. Although backpropagation is efficient, constructing and storing such a graph, which may require a large number of projection layers to reach a good solution to the lower-level problem, is still demanding. To reduce this burden, one may partially unroll the iterative projection algorithm, yet this still cannot guarantee optimality for MPECs, for the same reason as for bi-level programs. The simultaneous design-and-play approach developed in this study addresses this dilemma. Our approach follows the algorithms of Hong et al. [15] and Chen et al. [6], which solve bi-level programs via single-loop updates.
Importantly, they solve both the upper-and the lower-level problem using a gradient descent algorithm and establish the relationship between the convergence rate of the single-loop algorithm and the step size used in gradient descent. However, their algorithms are limited to the cases where the lower-level optimization problem is unconstrained. Our work extends these single-loop algorithms to MPECs that have an equilibrium problem at the lower level. We choose mirror descent as the solution method to the lower-level problem because of its broad applicability to optimization problems with constraints [30] and generality in the behavioral interpretation of games [25, 17] . We show that the convergence of the proposed simultaneous design-and-play approach relies on the setting of the step size for both the upperand lower-level updates, a finding that echos the key result in [15] . We first give the convergence rate under mirror descent and the unconstrained assumption and then extend the result to the constrained case. For the latter, we show that convergence cannot be guaranteed if the lower-level solution gets too close to the boundary early in the simultaneous solution process. To avoid this trap, the standard mirror descent method is revised to carefully steer the lower-level solution away from the boundary. We study incentive design in both atomic games [29] and nonatomic games [33], classified depending on whether the set of agents is endowed with an atomic or a nonatomic measure. In social systems, both types of games can be useful, although the application context varies. Atomic games are typically employed when each agent has a non-trivial influence on the payoffs of other agents. In a nonatomic game, on the contrary, a single agent's influence is negligible and the payoff could only be affected by the collective behavior of agents. Consider a game played by a finite set of agents N = {1, . . . , n}, where each agent i ∈ N selects a strategy a i ∈ A i ⊆ R d i to maximize the reward, which is determined by a continuously differentiable Suppose that for all i ∈ N , the strategy set A i is closed and convex, and the reward function Example 3.1 (Cournot oligopoly). In the Cournot oligopoly model, each firm i ∈ N supplies the market with a quantity a i of goods. We assume that the price of the good is p = p 0 − j∈N γ j · a j , where a, γ 1 , . . . , γ n > 0. The profit of the firm i is given by where c i : R → R is the cost function for the firm i. Consider a game played by a continuous set of agents, which can be divided into a finite set of classes N = {1, . . . , n}. We assume that each i ∈ N represents a class of infinitesimal and homogeneous agents sharing the finite strategy set A i with |A i | = d i . The mass distribution for the class i is defined as a vector q i ∈ ∆(A i ) that gives the proportion of agents using each strategy. Let the cost of an agent in class i to select a strategy k ∈ A i given q = (q 1 , . . . , q n ) be c ik (q). Formally, a joint mass distribution q ∈ ∆(A) = i∈N ∆(A i ) is a Nash equilibrium, also known as a Wardrop equilibrium [37] , if for all i ∈ N , there exists b i such that The following result extends the VI formulation to Nash equilibrium in nonatomic game: denote Example 3.2 (Routing game). Consider a set of agents traveling from source nodes to sink nodes in a directed graph with nodes V and edges E. 
Denote N ⊆ V × V as the set of source-sink pairs, K i ⊆ 2 E as the set of paths connecting i ∈ N and E ik ⊆ E as the set of all edges on the path k ∈ K i . Suppose that each source-sink pair i ∈ N is associated with ρ i nonatomic agents aiming to choose a route from K i to minimize the total cost incurred. Let q ik and f ik be the proportion and number of travelers using the path k ∈ K w , then we have f ik = ρ i · q ik for all i ∈ N and k ∈ K i . Let x e be the number of travelers using the edge e, then the path and edge flow are related by x e = i∈N k∈K i f ik · δ eik , where δ eik equals 1 if e ∈ E ik and 0 otherwise. Let the cost on edge e be t e (x e ), then the total cost for a traveler selecting a path k ∈ K i will be e∈E t e (x e ) · δ eik . Despite the difference, we see that an equilibrium of both atomic and nonatomic games can be formulated as a solution to a corresponding VI problem in the following form where v i and X i denote different terms in the two types of games. Suppose that there exists an incentive designer aiming to induce a desired equilibrium. To this end, the designer can add incentives θ ∈ Θ ⊆ R d in the reward/cost functions. In this respect, the function v i in (3.3) will be parameterized by θ, denoted as v i θ . We assume that the designer's objective is determined by a function f : Θ × X → R. Denote S(θ) as the solution set to (3.3) . We then obtain the uniform formulation of the incentive design problem for both atomic games and nonatomic games If the equilibrium problem admits multiple equilibria, the agents may converge to different ones and it would be difficult to determine which one can better predict the behaviors of the agents without further information. Therefore, in this paper, we only consider the case where the game admits a unique equilibrium because, in the absence of uniqueness, additional rules are usually required to single out a representative one, which is beyond the scope of this paper. Sufficient conditions under which the game admits a unique equilibrium will be provided later. We propose to update θ and x simultaneously to improve the computational efficiency. The game dynamics at the lower level is modeled using the mirror descent method. Specifically, at the stage k, given θ k and x k , the agent first receives v i θ k (x k ) as the feedback. As in many cases, its exact value is difficult to obtain [25], we assume that the agents receive v i k as an estimator of v i θ k (x k ). After receiving the feedback, the agents update their strategies via is the Bregman divergence induced by a strongly convex function ψ i . Subsequently, the designer aims to update θ k in the opposite direction of the gradient of its objective function, which can be written as To compute the exact value of such a gradient, we first need to obtain the equilibrium x * (θ k ). However, at the stage k, we only have the current strategy x k . Therefore, we also have to establish an estimator of ∇x * (θ k ) and ∇f * (θ k ) using x k , the form of which will be specified later. We first consider unconstrained games with X i = R d i , for all i ∈ N . We select ψ i (·) as smooth function, i.e., there exists a constant H ψ ≥ 1 such that for all i ∈ N and Examples of ψ i satisfying this assumption include (but are not limited to) It can be directly checked that we can set H ψ = max i∈N δ i , where δ i is the largest singular value of Q i . In this case, the corresponding Bregman divergence becomes which is known as the squared Mahalanobis distance. 
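To make the lower-level update concrete, the following is a minimal sketch (not the paper's code) of one mirror-descent step under the assumptions of this subsection: an unconstrained strategy set X_i = R^{d_i} and the quadratic potential ψ_i(x) = ½ xᵀQ_i x, so that the Bregman divergence is the squared Mahalanobis distance and the update reduces to a preconditioned gradient step. The function name, the step-size symbol beta_i, and the toy values are illustrative.

```python
import numpy as np

def mirror_step_quadratic(x_i, v_hat_i, Q_i, beta_i):
    """One lower-level mirror-descent step for agent i (unconstrained case).

    With psi_i(x) = 0.5 * x^T Q_i x, the Bregman divergence D_{psi_i} is the
    squared Mahalanobis distance, and maximizing
        beta_i * <v_hat_i, x> - D_{psi_i}(x, x_i)
    over x in R^{d_i} gives the closed-form preconditioned step below.
    """
    return x_i + beta_i * np.linalg.solve(Q_i, v_hat_i)

# Toy usage: with Q_i = I the step is plain gradient ascent on agent i's reward.
x_i = np.zeros(2)
v_hat_i = np.array([1.0, -0.5])   # noisy feedback v_i^k received by agent i
print(mirror_step_quadratic(x_i, v_hat_i, np.eye(2), beta_i=0.1))
```

A better-conditioned Q_i only rescales the step direction; it does not change the fixed points of the dynamics.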
Before laying out the algorithm, we first give the following lemma characterizing ∇ θ x * (θ). Proof. See Appendix A.2 for a detailed proof. Then for any given θ ∈ Θ and x ∈ X , we can define as an estimator of ∇f * (θ). Now we are ready to present the following bi-level incentive design algorithm for unconstrained games. Update strategy profile where ∇f k is an estimator of ∇f (θ k , x k+1 ) as defined in (4.3). Output: Last iteration incentive parameter θ k+1 and strategy profile x k+1 . We then consider the case where for all i ∈ N , x i is constrained within the probability simplex Here 1 d i ∈ R d i is the vector of all ones. In such a case, we naturally consider which is the Shannon entropy. Such a choice gives the following Bregman divergence, which is known as the KL divergence. To construct the estimator of ∇f * (θ), we have the following lemma as an extension of Lemma 4.1. As a consequence of Lemma 4.2, corresponding to (4.3), we define as an extension to the gradient ∇f * (θ), where J θ is an estimate of J θ . In addition to a different gradient estimate, we also modify Algorithm 1 to keep the iterations x k from hitting the boundary at the early stage. The modification involves an additional step that mixes the strategy with a uniform strategy 1 d i /d i , i.e., imposing an additional step upon finishing the update (4.4), where ν k+1 ∈ (0, 1) is a the mixing parameter, decreasing to 0 when k → ∞. In the following, we give the formal presentation of the modified bi-level incentive design algorithm for simplex constrained games. Input: θ 0 ∈ Θ, x 0 ∈ X , step sizes (α k , {β i k } i∈N ), k ≥ 0, and mixing parameters ν k , k ≥ 0. For k = 0, 1, . . . do: Update strategy profile where ∇f k is an estimator of ∇f (θ k , x k+1 ) defined in (4.5). In this section, we study the convergence of the proposed algorithms. We first make the following assumptions on the incentive design problem and our algorithms. For simplicity, define Assumption 5.1. The lower-level problem in (3.4) satisfies: • The strategy set X i of player i is a nonempty, compact, and convex subset of R d i . • There exist constants ρ θ , ρ x > 0 such that for all x ∈ X and θ ∈ Θ, • For all θ ∈ Θ, the equilibrium x * (θ) of the game is variationally stable with respect to D ψ , i.e., for all x ∈ X , Assumption 5.2. The upper-level problem in (3.4) satisfies: • The set Θ is compact and convex. • The function f * (θ) is µ-strongly convex, i.e., there exists some µ > 0 such that, for any θ, θ ′ ∈ Θ and x ∈ X , • The gradient ∇f * (θ) is uniformly bounded, i.e., there exists M > 0 such that for all θ ∈ Θ, it holds that ∇f * (θ) 2 ≤ M . • The extended gradient ∇f (θ, x) is Lipschitz continuous with respect to D ψ , i.e., there exists H > 0 such that for all x, x ′ ∈ X and θ ∈ Θ, The filtrations satisfy: • The gradient estimates v k and ∇f k are unbiased estimates, i.e., • The gradient estimates v k and ∇f k have bounded mean squared estimation errors, i.e., there exist δ f , δ u > 0 such that Remark. Under Assumption 5. In this part, we establish the convergence guarantee of Algorithm 1 for unconstrained games. We define the optimality gap ǫ θ k and the equilibrium gap ǫ x k+1 as We track such two gaps as the convergence criteria in the subsequent results. Theorem 5.4. For Algorithm 1, set the step sizes where H * = ρ θ /ρ x . Suppose that Assumptions 5.1-5.3 hold, then we have Proof. See Appendix B for a detailed proof and a detailed expression of convergence rates. 
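As a reading aid, here is a self-contained sketch of the single-loop scheme of Algorithm 1 on a hypothetical two-firm Cournot game with a per-unit tax θ as the incentive. It is not the paper's implementation: the implicit-gradient estimate uses the standard implicit-function form ∇_θ x*(θ) ≈ −[∇_x v_θ(x_k)]^{-1} ∇_θ v_θ(x_k) evaluated at the current iterate, in the spirit of Lemma 4.1 and the estimator (4.3), and the step sizes follow the schedules of Theorem 5.4 (α_k ∝ 1/(k+1), β_k ∝ 1/(k+1)^{2/3}). The demand model, the designer's objective (steering total output toward a target), and all constants are made up for illustration.

```python
import numpy as np

# Hypothetical Cournot duopoly regulated by a per-unit tax theta (illustration only).
p0, gamma = 10.0, 1.0            # inverse demand: p = p0 - gamma * (a_1 + a_2)
c = np.array([1.0, 2.0])         # marginal costs of the two firms
target = 4.0                     # total output the designer wants to induce

def v(theta, a):
    """Stacked pseudo-gradient of the firms' profits: v_i = d r_i / d a_i."""
    return p0 - gamma * a.sum() - gamma * a - c - theta

def grad_a_v(a):
    """Jacobian of v with respect to a (constant for this linear-quadratic game)."""
    n = a.size
    return -gamma * (np.ones((n, n)) + np.eye(n))

def grad_theta_v(a):
    """Jacobian of v with respect to theta."""
    return -np.ones((a.size, 1))

def grad_f(theta, a):
    """Gradients of the designer's objective f(theta, a) = (sum(a) - target)^2."""
    grad_x = 2.0 * (a.sum() - target) * np.ones(a.size)   # d f / d a
    grad_t = np.zeros(1)                                  # d f / d theta (no direct dependence)
    return grad_t, grad_x

theta, a = np.zeros(1), np.array([1.0, 1.0])
alpha0, beta0 = 0.2, 0.5
for k in range(5000):
    alpha_k = alpha0 / (k + 1)               # upper-level step size
    beta_k = beta0 / (k + 1) ** (2.0 / 3.0)  # lower-level step size, decaying more slowly

    # Lower level: one mirror-descent step (identity potential => gradient ascent on profit).
    a = a + beta_k * v(theta, a)

    # Upper level: implicit-differentiation estimate of grad f*(theta), with the
    # equilibrium x*(theta) replaced by the current strategy profile a.
    dx_dtheta = -np.linalg.solve(grad_a_v(a), grad_theta_v(a))
    grad_t, grad_x = grad_f(theta, a)
    g = grad_t + dx_dtheta.T @ grad_x
    theta = np.clip(theta - alpha_k * g, 0.0, None)       # projection onto Theta = R_+

print("tax:", theta, "quantities:", a, "total output:", a.sum())
```

Because the lower level uses a larger (more slowly decaying) step size than the upper level, the strategy profile tracks the equilibrium induced by the current tax, which is what makes the biased gradient estimate asymptotically accurate.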
In this part, we establish the convergence guarantee of Algorithm 2 for simplex constrained games. We still define optimality gap ǫ θ k as ǫ θ k = E θ k − θ * 2 2 . Yet, corresponding to (4.6), we track ǫ x k+1 as a measure of convergence for the strategies of the agents, which is defined as We are now ready to give the convergence guarantee of Algorithm 2. Theorem 5.5. For Algorithm 2, set the step sizes α k = α/(k+1) 1/2 , β k = β/(k+1) 2/7 , β i k = λ i ·β k , and ν k = 1/(k + 1) 4/7 with constants α and β satisfying Proof. See Appendix C for a detailed proof and a detailed form of the convergence rates. The rates in Theorem 5.5 appears to be slower than those in Theorem 5.4. This is due to the fact that the KL divergence is lower bounded by squared vector norms. We also note here that, as opposed to Algorithm 1 that uses smooth potential functions, the ill behavior of the Shannon entropy on the boundaries of the simplex becomes a hurdle in both algorithm design and analysis. The key to overcoming such an obstacle is the mixing step in (4.6), which keeps the iterations at a diminishing distance away from the boundaries. [6] Chen, T., Sun, Y. and Yin, W. (2021). A single-timescale stochastic bilevel optimization method. arXiv preprint arXiv:2102.04671 . [7] Colson, B., Marcotte, P. and Savard, G. In this section, we present detailed discussions on the properties of the games. We establish the following sufficient condition for variational stability. Lemma A.1. Define the matrix H λ (x) as a block matrix with (i, j)-th block taking the form of Suppose that ψ i satisfies the smooth condition (4.2). If H λ (x) + H λ (x) ⊤ ≺ −2 · H ψ · I d for some λ ∈ R N + , then the Nash equilibrium x * is λ-variationally stable with respect to the Bregman divergence D ψ . Proof. We define the λ-weighted gradient of the game as By definition, we can verify that H λ (x) is the Jacobian matrix of v λ (x) with respect to x. For any Taking additional dot product · , x − x ′ on both sides of (A.1), we have where the inequality follows from H λ ( where the second inequality follows from the fact that x * is a Nash equilibrium. Eventually, as ψ i satisfies (4.2), we have Suppose that ψ i is the Shannon entropy. If H λ (x) + H λ (x) ⊤ ≺ 0 for some λ ∈ R N + , then Nash equilibrium x * is λ-strongly variationally stable with respect to the KL divergence. where the second inequality follows from the fact that x * is a Nash equilibrium. Unconstrained game. We now provide the proof of Lemma 4.1. Proof of Lemma 4.1. Since X i = R d i for all i ∈ N , by the definition of x * (θ), we have v θ (x * (θ)) = 0 for all θ ∈ R d . Then, differentiating this equality with respect to θ on both ends, for any i ∈ N , we have Thus, we have As a consequence of Lemma 4.1, we have the following lemma addressing the sensitivity of Nash equilibrium with respect to the incentive parameter θ. Lemma A.3. Under Assumption 5.1, we have for { x k } k≥0 generated by Algorithm 1, where Proof. We have for some θ = ω · θ k + (1 − ω) · θ k−1 with ω ∈ [0, 1], where the last inequality follows from Assumption 5.1. Simplex constrained game. As a consequence of Lemma 4.2, we have the following lemma as the simplex constrained version of Lemma A.3. Lemma A.4. Under Assumption 5.1, we have Proof. Recall that J θ is defined in Lemma 4.2 and that · 2 is the spectral norm when operating on a matrix. We have Thus, by Lemma 4.2, we have for some θ = ω · θ k + (1 − ω) · θ k−1 with ω ∈ [0, 1], where the last inequality follows from Assumption 5.1. 
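For the simplex-constrained setting analyzed in Theorem 5.5, the following sketch (again illustrative rather than the paper's code) spells out the two ingredients of Algorithm 2: the entropic mirror-descent step, whose closed form is the familiar multiplicative-weights update because ψ_i is the Shannon entropy, and the mixing step (4.6) that keeps the iterate a diminishing distance away from the boundary of the simplex. The step-size and mixing schedules follow Theorem 5.5; the single-class toy feedback vector is made up, and the upper-level incentive update is omitted (it mirrors the unconstrained sketch above).

```python
import numpy as np

def entropic_mirror_step(x_i, v_hat_i, beta_i):
    """One lower-level step for class i on the simplex Delta(A_i).

    With psi_i equal to the Shannon entropy, D_{psi_i} is the KL divergence and
    the mirror-descent update has the multiplicative-weights closed form
    x_i^{k+1} proportional to x_i^k * exp(beta_i * v_hat_i) (reward convention).
    """
    w = x_i * np.exp(beta_i * (v_hat_i - v_hat_i.max()))   # max-shift for numerical stability
    return w / w.sum()

def mix_with_uniform(x_i, nu):
    """Mixing step (4.6): pull the iterate toward the uniform strategy 1/d_i."""
    return (1.0 - nu) * x_i + nu * np.full(x_i.size, 1.0 / x_i.size)

def schedules(k, alpha=0.1, beta=0.5):
    """Step-size and mixing schedules used in Theorem 5.5."""
    alpha_k = alpha / (k + 1) ** 0.5        # upper level (not used in this toy)
    beta_k = beta / (k + 1) ** (2.0 / 7.0)  # lower level
    nu_k = 1.0 / (k + 1) ** (4.0 / 7.0)     # mixing parameter, decreasing to 0
    return alpha_k, beta_k, nu_k

# Toy usage: one class with three strategies and a fixed reward vector.
x = np.full(3, 1.0 / 3.0)
rewards = np.array([1.0, 0.5, 0.0])
for k in range(200):
    _, beta_k, nu_k = schedules(k)
    x = entropic_mirror_step(x, rewards, beta_k)
    x = mix_with_uniform(x, nu_k)
print(x)   # mass concentrates on the first strategy but stays off the boundary
```

For a nonatomic game where the feedback is a cost rather than a reward, the sign in the exponent flips. Since ν_k decays to zero, the bias introduced by the mixing vanishes, while the iterates never reach the boundary, where the entropy's gradient blows up.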
We first present the following two lemmas under the conditions presented in Theorem 5.4. (1 − β j /8) . Proof. See Appendix B.1 for a detailed proof. (1 − µα j ) . Thus, by Lemma B.1, we have Similarly, we have for α k = α/(k + 1), Thus, combining (B.1) with Lemma B.2, we obtain Here the third inequality follows from β k = β/α 2/3 · α Proof. Recall that We have where the second inequality follows from the fact that Taking the conditional expectation given F x k , we obtain where the second inequality follows from Assumption 5.3. Summing up (B.3) for all i ∈ N , we have By the λ-strong variational stability of x i (θ k ), we have Moreover, by Assumption 5. 3 we have where the first inequality follows from the optimality condition that v θ k (x * (θ k )) = 0, and the second inequality follows from Assumption 5.1. Thus, taking (B.5) and (B.6) into (B.4), we obtain By Assumptions 5.2 and 5.3, we have taking which into (B.9) and taking expectation on both sides, we further obtain By γ = (4 − 2β k )/β k , we have where the first inequality follows from α k−1 ≤ 2α k , and the second inequality follows from α k = α/(k + 1), β k = β/(k + 1) 2/3 , and α/β 3/2 ≤ 1/12H ψ HH * . Thus, we obtain Recursively applying (B.11), we obtain (1 − β j /4) . Thus, we conclude the proof Lemma B.1. Proof. Since the projection argmax x∈X is non-expansive, we have Taking the conditional expectation given F θ k , we obtain where the second inequality follows from Assumption 5.2 and the Cauchy-Schwartz inequality, and the last inequality follows from Assumption 5.2. Applying (B.10) to (B.12) and taking expectation on both sides, we get Recursively applying (B.13), we obtain (1 − µα j ) . Thus, we conclude the proof of Lemma B.2. We first present the following two lemmas under the conditions presented in Theorem 5.5 Lemma C.1. For all k ≥ 0, we have (1 − β j /8) (1 − β j /8) . Proof. See Appendix C.1 for a detailed proof. Lemma C.2. For all k ≥ 0, we have (1 − µα j ) . Proof. See Appendix C.2 for a detailed proof. Thus, by Lemma B.1, we have (1 − β j /8) (1 − β j /8) (1 − β j /8) . (C.1) Since ν l = 1/(l + 1) 4/7 , we have β l ν l log(1/ν l ) + 2ν l+1 + 2ν 2 l ≤ (4/β 2 + 4/7β) · β 2 l , taking which into (C.1), we obtain Similarly, we have for α k = α/(k + 1) 1/2 , Thus, combining (B.1) with Lemma B.2, we obtain (1 − µα j ) Here the third inequality follows from β k = β/α 4/7 · α 4/7 k . Therefore, (C.2) and (C.3) conclude the proof of Theorem 5.5. Proof. We first show that we have the exact form of x i k+1 as By the definition of KL divergence, we have summing up which gives where the second inequality follows from v θ (x * (θ)) ∞ ≤ V * , and the third inequality follows from Assumption 5.1. Thus, taking (C.9) and (C.10) into (C.8), we obtain for 6N H 2 where the second inequality follows from Lemma D.3. By Lemma D.3, we further have for ν k ≤ By Lemma D.2, we have for any γ > max{1, 1/ν 2 k }, By Assumption 5.3, we have where the third inequality follows from Assumptions 5.2 and 5.3, and the last inequality follows from Lemma D. 3 . Taking (C.14) into (C.13) and taking expectation on both sides, we obtain ǫ x k+1 ≤ 1 − β k /4 + 3 (1 + γ)/ν 2 k − 1 /2 · H 2 H 2 * α 2 k−1 · ǫ x k + (δ 2 u + 3V 2 * ) · λ 2 2 β 2 k + (1 − β k /4) · (1 + γ)/ν 2 k − 1 2 · H 2 * α 2 k−1 · (δ 2 f + 2M 2 + 6N H 2 ν k−1 ) − N β k ν k log ν k + 2N · (ν k+1 + ν 2 k ). 
(C.15) With γ = (4β k −2)/β k , α k = α/(k +1), β k = β/(k +1) 2/7 , ν k = 1/(k +1) 4/7 , and α/β 3/2 ≤ 1/7 H H * , we have taking which into (C.15), we obtain ǫ x k+1 ≤ (1 − β k /8) · ǫ x k + (δ 2 u + 3V 2 * ) · λ 2 2 β 2 k − N β k ν k log ν k + 2N · (ν k+1 + ν 2 k ) + β 2 k · (δ 2 f + 2M 2 + 6N H 2 ν k−1 )/8 H 2 ≤ (1 − β k /8) · ǫ x k + (δ 2 u + 3V 2 * ) · λ 2 2 β 2 k − N β k ν k log ν k + 2N · (ν k+1 + ν 2 k ) + β 2 k · (δ 2 f + 2M 2 + 6N H 2 )/8 H 2 . (C.16) Recursively applying (C.16), we obtain (1 − β j /8) + (δ 2 u + 3V 2 * ) · λ 2 2 + (δ 2 f + 2M 2 + 6N H 2 )/8 H 2 · k l=0 β 2 l · k j=l+1 (1 − β j /8) (−β l ν l log ν l + 2ν l+1 + 2ν 2 l ) · k j=l+1 (1 − β j /8) . Proof. Applying (C.14) to (B.12) and taking expectation on both sides, we get where the second inequality follows from ν k ≤ 1. Recursively applying (C.17), we obtain (1 − µα j ) . The following lemma is used in the analysis on unconstrained games. Lemma D.1. Let ψ(·) be 1-strongly convex with respect to the norm · . Assume that (4.2) holds. We have for any γ > H 2 ψ ≥ 1, Proof. By the definition of Bregman divergence, we have for any γ > 0, D ψ (x, z) − D ψ (y, z) = ψ(x) − ∇ψ(z), x − z − ψ(y) + ∇ψ(z), y − z = −D ψ (x, y) + ∇ψ(z) − ∇ψ(x), y − x ≤ −1/2 · x − y 2 + ∇ψ(z) − ∇ψ(x) * · y − x , where the inequality follows from 1-strong convexity of ψ(·) and the Cauchy-Schwartz inequality. By (4.2), we further have D ψ (x, z) − D ψ (y, z) ≤ −1/2 · x − y 2 + H ψ · z − x · y − x , where the second inequality follows from 1-strong convexity of ψ(·). Rearranging the terms in (D.1), we finish the proof of Lemma D.1. The following two lemmas are involved in the analysis on simplex constrained games. Lemma D.2. Let ψ(·) be the Shannon entropy. We have for any γ k > max{1, 1/ν 2 k }, for { x k } k≥0 generated by Algorithm 2. Proof. For the Shannon entropy ψ(·), we have ∇ x j ψ( x) = 1 + log[ x] j , which gives Thus, we have Replacing H ψ with 1/ν k in the proof of Lemma D.1, we conclude the proof of Lemma D.2. Lemma D.3. Let {x k } k≥0 and { x k } k≥0 be the sequences of strategy profiles generated by Algorithm 2 with ν k ≤ O(1/k). We have Proof. By the definition of KL divergence, we have Optnet: Differentiable optimization as a layer in neural networks Hierarchical optimization: An introduction Deep equilibrium models The theory of the market economy Some theoretical aspects of road traffic research Understanding short-horizon bias in stochastic meta-optimization where the last inequality holds with N H 2 u λ 2 2 β 2 k ≤ β k , which is satisfied by β ≤ 1/N H 2 u λ 2 2 . By Lemma D.1, we have for any γ > max{1, H 2 ψ },be the normalization factor of the exact form of x i k+1 in (C.5). We havewhere the last equality follows from the fact that j∈which concludes the proof of (C.4).Continuing from (C.4), we havewhere the second inequality follows from the fact thatTaking the conditional expectation given F x k , we obtainwhere the last inequality follows from Cauchy-Schwartz inequality and the fact thatMoreover, by Assumption 5.3, we haveBy (4.6), we have for allTaking (D.6) into (D.5), we obtainwhich concludes the proof of (D.2). Similar arguments also yields (D.3). Also, we havesumming up which for i ∈ N gives (D.4).
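The appendix repeatedly derives one-step recursions of the form ε_{k+1} ≤ (1 − β_k/8) ε_k + C β_k² and then applies them recursively. The following generic calculation (an illustration with generic constants C, β, M, not a line from the paper's appendix) shows how such a recursion with β_k = β(k+1)^{-2/3} yields a gap of order O(k^{-2/3}); the same template with the exponents of Theorem 5.5 gives the slower rates of the simplex-constrained case.

```latex
% Suppose the gap sequence satisfies
%   \epsilon_{k+1} \le \bigl(1 - \tfrac{\beta_k}{8}\bigr)\epsilon_k + C\beta_k^2,
%   \qquad \beta_k = \beta (k+1)^{-2/3}.
% Claim: \epsilon_k \le M (k+1)^{-2/3} for a suitable M, i.e. \epsilon_k = O(k^{-2/3}).
\begin{align*}
\epsilon_{k+1}
  &\le \frac{M}{(k+1)^{2/3}} - \Bigl(\frac{\beta M}{8} - C\beta^2\Bigr)\frac{1}{(k+1)^{4/3}}
  && \text{(induction hypothesis)} \\
  &\le \frac{M}{(k+2)^{2/3}}
  && \text{since } \frac{1}{(k+1)^{2/3}} - \frac{1}{(k+2)^{2/3}}
     \le \frac{2}{3(k+1)^{5/3}}
     \le \Bigl(\frac{\beta}{8} - \frac{C\beta^2}{M}\Bigr)\frac{1}{(k+1)^{4/3}},
\end{align*}
% where the last inequality holds once M \ge 16 C \beta and (k+1)^{1/3} \ge 32/(3\beta);
% enlarging M covers the finitely many earlier iterations.
```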