Robust Merging of Information

Henrique de Oliveira, Yuhta Ishii, Xiao Lin

May 31, 2021

Abstract. When multiple sources of information are available, any decision must take into account their correlation. If information about this correlation is lacking, an agent may find it desirable to make a decision that is robust to possible correlations. Our main results characterize the strategies that are robust to possible hidden correlations. In particular, with two states and two actions, the robustly optimal strategy pays attention to a single information source, ignoring all others. More generally, the robustly optimal strategy may need to combine multiple information sources, but can be constructed quite simply by using a decomposition of the original problem into separate decision problems, each requiring attention to only one information source. An implication is that an information source generates value to the agent if and only if it is best for at least one of these decomposed problems.

During the COVID-19 pandemic, testing has been essential in effectively monitoring the transmission of the virus. Two prevalent diagnostic tests are the molecular and antigen tests, which differ in whether they check for the virus's genetic material or for specific proteins.[1] It might then be appealing to use both tests.[2] However, in order to correctly interpret the joint pair of results from the two tests, knowledge of their correlation is crucial. For example, conditional on the molecular test producing a false negative, what is the probability that the antigen test also yields a false negative? Although the likelihoods of false positives and false negatives for each test are well understood, data regarding the correlations between these tests are scarce.[3] With such limited information about these correlations, how is a health authority supposed to make use of the results of both tests?

Such unclear correlation between information sources is a common difficulty in practical decision problems. For example, someone might have access to the opinions of multiple experts (such as doctors), but these experts might rely on similar specialized sources (such as a flawed study). In this paper, we assume that the agent fully understands each information source in isolation, but has no knowledge about the correlations between different information sources. We then look for strategies that are robust to such correlation, by considering the worst possible correlation that could occur.

Our main results characterize robustly optimal strategies. The simplest characterization occurs when we have two states and two actions. In that case, to guard against hidden correlation, one must resort to a rather extreme measure: the robustly optimal strategy involves paying attention to a single information source, ignoring all others. In the example of the health authority, the relevant state is whether the patient is infected with COVID-19 or not; if the decision to be made is whether to put the patient in quarantine or not, our result implies that only one test should be considered. Even if both tests have already been administered or are completely costless, the health authority should still ignore one of them. In more general settings, this extreme measure is no longer necessary and it can be beneficial to use multiple information sources.
However, we show a method of finding robust strategies that consists of decomposing a decision problem into subproblems, each requiring the use of a single information source. This shows the precise way in which information sources should be merged. In general, this decomposition can depend on the information sources, but we also show that, with two states, there is a canonical decomposition of a decision problem into binary action problems that is independent of the information sources. Finally, these characterizations of the robustly optimal strategy provide normative guidelines for constructing strategies that are robust to potentially misspecified correlations, and they reduce the computational burden of finding such strategies. They also provide an alternative explanation for some behavioral patterns documented empirically, such as decision makers ignoring free information when making their decisions.

[1] For more information regarding these tests, see for example https://www.fda.gov/health-professionals/closer-look-covid-19-
[2] Taking both tests is indeed recommended by the FDA: "(for antigen tests) positive results are usually highly accurate, . . . negative results may need to be confirmed with a molecular test." Some medical providers always require one to take both tests.
[3] See for example Dinnes et al. [2020].

Our paper provides practical robust strategies to deal with possible hidden correlation. The practice of finding robust strategies dates back at least to Wald [1950], and our modeling of information structures follows that of Blackwell [1953]. Our way of modeling robustness, by considering the worst-case scenario, is also in line with the literature on ambiguity aversion, going back to Gilboa and Schmeidler [1989]. More recently, Epstein and Halevy [2019] run an experiment that documents ambiguity aversion over correlation structures.

More closely related, some papers consider strategies that are robust to unknown correlations in different contexts. In particular, Carroll [2017] studies a multi-dimensional screening problem, where the principal knows only the marginals of the agent's type distribution and designs a mechanism that is robust to all possible correlation structures. With similar robustness concerns regarding the correlations of values between different bidders, He and Li [2020] study an auctioneer's robust design problem when selling a single indivisible good to a group of bidders.

A recent thread of related literature similarly studies how a decision maker combines forecasts from multiple sources. Levy and Razin [2020a] consider a model where the decision maker can consult multiple forecasts (posterior beliefs), but is uncertain about the information structures that generate these forecasts. Levy and Razin [2020b] study a maximum likelihood approach to combining forecasts, and derive a novel result that only extreme forecasts will be used. A key distinction is that the aforementioned papers consider robust optimality from an interim approach, while we study the decision maker's robustly optimal ex-ante decision plan. Finally, Arieli, Babichenko, and Smorodinsky [2018] also study features of robustly optimal ex-ante decision plans. An important difference is that they study robust aggregation in a specific decision problem, while we characterize the robustly optimal ex-ante decision plan in general decision problems.[4]
[4] Moreover, Arieli, Babichenko, and Smorodinsky [2018] study robust aggregation when the decision maker has limited knowledge of the distribution of posteriors/signals generated by each expert. In contrast, in order to focus our analysis on robustness concerns about correlations between information sources, we assume in our model that the decision maker possesses a perfect understanding of the marginal distribution of signals of each expert/information source in isolation.

An agent faces a decision problem Γ ≡ (Θ, ν, A, ρ) with binary state space Θ = {1, 2}, prior ν ∈ ∆Θ, finite action space A, and utility function ρ : Θ × A → ℝ. To simplify notation later, define u(θ, a) = ν(θ)ρ(θ, a), the prior-weighted utility function.

A marginal experiment P_j : Θ → ∆Y_j maps each state to a distribution over some finite signal set Y_j. The agent can observe the realizations of multiple marginal experiments {P_j}_{j=1}^m, but does not have detailed knowledge of the joint. To simplify notation, let Y = Y_1 × ⋯ × Y_m denote the set of possible observations the agent can see. Thus, the agent conceives of the following set of joint experiments:

𝒫(P_1, …, P_m) = { P : Θ → ∆(Y) : Σ_{y_{−j}} P(y_1, …, y_m | θ) = P_j(y_j | θ) for every j, every y_j ∈ Y_j, and every θ ∈ Θ }.

A strategy for the agent is a mapping σ : Y → ∆(A), and the set of all strategies is denoted by Σ. The agent's problem is to maximize his/her expected utility robustly among the set of possible joint experiments (i.e., considering the worst possible joint experiment):

V(P_1, …, P_m) = max_{σ∈Σ} min_{P∈𝒫(P_1,…,P_m)} Σ_{θ∈Θ} Σ_{(y_1,…,y_m)∈Y} P(y_1, …, y_m | θ) u(θ, σ(y_1, …, y_m)).

We call a solution to this problem a robustly optimal strategy. Clearly, if only one experiment P : Θ → ∆(Y) is considered (m = 1), V(P) is the same as the classical value of a Blackwell experiment, and a robustly optimal strategy is just an optimal strategy for a Bayesian agent.

It will be useful to rank experiments according to how much information they convey. For that, we will use the Blackwell order, which we review in this subsection for completeness. The reader familiar with the Blackwell order may skip this part.

Definition 1. P : Θ → ∆(Y) is more informative than Q : Θ → ∆(Z) if, for every decision problem, we have the inequality V(P) ≥ V(Q). We also say that P Blackwell dominates Q.

There are two other natural ways of ranking experiments by informativeness. The first uses the notion of a garbling.

Definition 2. Q : Θ → ∆(Z) is a garbling of P : Θ → ∆(Y) if there exists a function g : Y → ∆(Z) such that Q(z | θ) = Σ_y g(z | y) P(y | θ). The function g is then called "the garbling".

Thus Q is a garbling of P when one can replicate Q by "adding noise" to the signal generated from P. Another notion of informativeness can be obtained by considering the strategies that are feasible given the experiment.

Definition 3. Given a set of actions A and an experiment P : Θ → ∆(Y), the feasible set of actions given P is

Λ_P = { λ : Θ → ∆(A) : λ(a | θ) = Σ_y σ(a | y) P(y | θ) for some strategy σ : Y → ∆(A) }.

The feasible set of an experiment specifies what conditional action distributions can be obtained by some choice of strategy σ (see Figure 1). One might then try to rank the informativeness of experiments according to the size of the feasible set. Blackwell's Informativeness Theorem states that these three rankings of informativeness are equivalent (for a proof, see Blackwell [1953] or de Oliveira [2018]).

Blackwell's Theorem. The following statements are equivalent:
1. P is more informative than Q;
2. Q is a garbling of P;
3. For all action sets A, Λ_Q ⊆ Λ_P.

It should be clear from the definitions that the Blackwell order is not complete: it is possible for two experiments to be unranked.
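To make Definition 3 concrete, the following is a minimal computational sketch in Python (the function name feasible_set_vertices is ours, not the paper's; NumPy is assumed available). With binary actions, the feasible set Λ_P is the convex hull of the points generated by deterministic strategies, one point per map σ : Y → {0, 1}:

```python
import itertools
import numpy as np

def feasible_set_vertices(P):
    """Candidate vertices of the feasible set Lambda_P for a binary-action problem.

    P is an |Theta| x |Y| row-stochastic matrix, P[t, y] = P(y | theta_t).
    Each deterministic strategy sigma: Y -> {0, 1} induces the vector of
    conditional probabilities lambda(a=1 | theta) = sum_y sigma(y) P(y | theta);
    Lambda_P is the convex hull of these points.
    """
    n_states, n_signals = P.shape
    points = []
    for sigma in itertools.product([0, 1], repeat=n_signals):
        s = np.array(sigma, dtype=float)
        points.append(tuple(P @ s))  # lambda(a=1 | theta) for each theta
    return sorted(set(points))

# An illustrative binary experiment (numbers ours):
P1 = np.array([[0.9, 0.1],   # theta = 1
               [0.5, 0.5]])  # theta = 2
print(feasible_set_vertices(P1))
```

Plotting these points for two experiments and taking the convex hull of their union also illustrates the Blackwell supremum discussed below.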
It is also possible for two different experiments to be equivalent, in the sense that each Blackwell dominates the other. For example, we can change the labels in the signal set while keeping the probabilities the same. This lack of uniqueness is easily remedied by considering equivalence classes of experiments when necessary.

In the next section, we will use some lattice properties of the Blackwell order. In particular, the concept of a Blackwell supremum will be useful.

Definition 4. Let P and Q be two arbitrary experiments. We say that R is the Blackwell supremum of P and Q if
1. R is more informative than P and Q;
2. if S is more informative than P and Q, then S is also more informative than R.

The definition generalizes immediately to any number of experiments. It is immediate from the definition that, if there are two Blackwell suprema, they must Blackwell dominate each other. Hence, by considering the equivalence class of equally informative experiments, we can say that the Blackwell supremum is unique. However, when |Θ| > 2 a Blackwell supremum may not exist (see Bertschinger and Rauh [2014], Example 18). Fortunately for us, the Blackwell order does form a lattice when |Θ| = 2. In particular, the existence of a Blackwell supremum will be useful.

Lemma 1. When |Θ| = 2, the Blackwell supremum always exists.

For a proof, see Bertschinger and Rauh [2014], Proposition 16.

The Blackwell supremum can also be characterized using the feasible set. If R is the Blackwell supremum of P and Q, we know from Blackwell's Theorem that Λ_R must contain both Λ_P and Λ_Q. Moreover, if S is more informative than P and Q, it must be more informative than R as well, so Λ_S must also contain Λ_R. Hence the feasible set of the Blackwell supremum should be the smallest feasible set containing Λ_P ∪ Λ_Q. Since the feasible set is always convex, a candidate feasible set is co(Λ_P ∪ Λ_Q). If such an R exists satisfying Λ_R = co(Λ_P ∪ Λ_Q), it must be the Blackwell supremum (see Figure 1). The difficulty lies in showing that such an R exists, and that is where the proof of existence fails when |Θ| > 2.

Another useful property of the Blackwell order when |Θ| = 2 is that it is characterized by the feasible sets with only two actions: P is more informative than Q if, for any set A with |A| = 2, we have Λ_Q ⊆ Λ_P (see Blackwell [1953], Theorem 10). We can use this property to obtain a characterization of the Blackwell supremum.

Lemma 2. Suppose |Θ| = 2 and |A| = 2. Then R is the Blackwell supremum of P and Q if and only if Λ_R = co(Λ_P ∪ Λ_Q).

In fact, this result can be used to show the existence of the Blackwell supremum when |Θ| = 2.

Most of our focus will be on the robustly optimal strategies for the agent, but it will be helpful to first understand Nature's problem of choosing the worst possible correlation structure. First note that since the objective function is linear in both σ and P, and the choice sets of σ and P are both convex and compact, the minimax theorem implies that

V(P_1, …, P_m) = min_{P∈𝒫(P_1,…,P_m)} max_{σ∈Σ} Σ_{θ∈Θ} Σ_{(y_1,…,y_m)∈Y} P(y_1, …, y_m | θ) u(θ, σ(y_1, …, y_m)).

That is, the value of the agent's maxmin problem equals the value of a minmax problem where Nature chooses an experiment in the set 𝒫(P_1, …, P_m) to minimize a Bayesian agent's value in the decision problem.
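Because the marginal constraints in 𝒫(P_1, …, P_m) bind separately for each state, Nature's inner minimization against a fixed strategy decomposes into one small transportation-style linear program per state. The following sketch is ours (it assumes SciPy is available and is written for m = 2 experiments); it computes the worst-case payoff of a fixed pure strategy:

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_payoff(sigma, P1, P2, u):
    """Worst-case expected payoff of a fixed strategy over all joint
    experiments with marginals P1, P2 (a sketch for m = 2).

    P1, P2: |Theta| x |Y_j| matrices, Pj[t, y] = Pj(y | theta_t).
    sigma[y1][y2]: action index;  u[t, a]: prior-weighted utility.
    """
    n1, n2 = P1.shape[1], P2.shape[1]
    total = 0.0
    for t in range(P1.shape[0]):
        # Cost of each signal pair (y1, y2) in state theta_t under sigma.
        c = np.array([u[t, sigma[y1][y2]] for y1 in range(n1)
                      for y2 in range(n2)])
        # Equality constraints: the joint must reproduce both marginals.
        A_eq, b_eq = [], []
        for y1 in range(n1):
            row = np.zeros((n1, n2)); row[y1, :] = 1.0
            A_eq.append(row.ravel()); b_eq.append(P1[t, y1])
        for y2 in range(n2):
            row = np.zeros((n1, n2)); row[:, y2] = 1.0
            A_eq.append(row.ravel()); b_eq.append(P2[t, y2])
        res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                      bounds=[(0.0, None)] * (n1 * n2))
        total += res.fun  # Nature couples the signals adversarially
    return total

# Illustration (numbers ours): act only when both signals agree.
P1 = np.array([[0.9, 0.1], [0.5, 0.5]])
P2 = np.array([[0.5, 0.5], [0.9, 0.1]])
u = np.array([[0.0, 2.0], [0.0, -1.0]])
print(worst_case_payoff([[1, 0], [0, 0]], P1, P2, u))
```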
An immediate observation is that if there exists a Blackwell least informative element in the set 𝒫(P_1, …, P_m), it would solve Nature's problem: any other information structure would yield a higher value for the agent. Notice that every experiment in 𝒫(P_1, …, P_m) must be more informative than every P_j, since the projection onto the j-th coordinate defines a garbling. By Lemma 1, there is a Blackwell supremum: the least informative experiment that Blackwell dominates every P_j. The only question that remains is whether this Blackwell supremum can be expressed as a joint distribution with marginals P_1, …, P_m. This is proved in the following lemma.

Lemma 3. For any collection of experiments {P_j}_{j=1}^m, there exists a Blackwell supremum P̄(P_1, …, P_m) ∈ 𝒫(P_1, …, P_m), so that for any P ∈ 𝒫(P_1, …, P_m), V(P̄(P_1, …, P_m)) ≤ V(P).

Immediately from the lemma, we have the following proposition.

Proposition 1. V(P_1, …, P_m) = V(P̄(P_1, …, P_m)).

Thus, the agent's value from using a robust strategy is the same as the value she would obtain if she faced a single experiment: the Blackwell supremum of all marginal experiments. Moreover, the Blackwell supremum depends only on the marginal experiments, and not on the particular decision problem.

While Proposition 1 provides a useful characterization of the agent's value, it still does not answer our main question: what are the robust strategies? This is because a strategy may be a best response to the Blackwell supremum P̄(P_1, …, P_m) without being a robustly optimal strategy. In particular, the Blackwell supremum typically specifies a probability of zero for many signal realizations, so that any action is a best response to those signal realizations. But if we fix a strategy that chooses a particularly bad action after such a signal realization, it might be a best response for Nature to make it occur with positive probability. So we now turn to the question of finding the robustly optimal strategies.

For any decision problem, one simple strategy that can always be used is to choose exactly one experiment Q ∈ {P_1, …, P_m} and play the optimal strategy that uses that information alone, ignoring the signal realizations of all other experiments. By choosing Q optimally, the agent achieves an ex-ante expected payoff of max_{j=1,…,m} V(P_j), regardless of the actual joint experiment P ∈ 𝒫(P_1, …, P_m). Theorem 1 shows that if the decision problem has binary actions, this is indeed a robustly optimal strategy.

Theorem 1. If |A| = 2, then V(P_1, …, P_m) = V(P̄(P_1, …, P_m)) = max_{j=1,…,m} V(P_j).

Proof. By Proposition 1, it suffices to show that V(P̄(P_1, …, P_m)) = max_{j=1,…,m} V(P_j). By Lemma 2, an experiment P̄ is the Blackwell supremum of P_1, …, P_m if and only if

Λ_P̄ = co(Λ_{P_1} ∪ ⋯ ∪ Λ_{P_m}).  (1)

Now, the maximum utility achievable given the Blackwell experiment P̄(P_1, …, P_m) is V(P̄) = max_{λ∈Λ_P̄} Σ_{a,θ} u(θ, a) λ(a | θ). Since the maximand is linear in λ, the maximum is achieved at an extreme point of Λ_P̄. By (1), an extreme point of Λ_P̄ must belong to some Λ_{P_j}. Hence, we have V(P̄) = max_{j=1,…,m} V(P_j). ∎

(Figure 2: the maximum is achieved at an extreme point.)

The idea of Theorem 1 can be visualized in Figure 2 for two marginal experiments. Each marginal Blackwell experiment P_1, P_2 can be represented by Λ_{P_1}, Λ_{P_2}, the set of feasible state-action distributions generated by the experiment. The corresponding Λ_P̄ for the Blackwell supremum P̄ is the convex hull of Λ_{P_1} ∪ Λ_{P_2}. Since the utility function is linear with respect to λ ∈ Λ_P̄, the maximum is achieved at an extreme point, which belongs to either Λ_{P_1} or Λ_{P_2}, and thus can be achieved by using a single marginal experiment.
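Theorem 1 makes the binary action case computationally trivial: evaluate each marginal experiment on its own and keep the best one. A minimal sketch (ours), where value_single computes the classical Bayesian value V(P) of a single experiment by best responding to each signal:

```python
import numpy as np

def value_single(P, u):
    """Bayesian value V(P) of one experiment: best respond to each signal.
    P[t, y] = P(y | theta_t);  u[t, a] = prior-weighted payoff u(theta_t, a)."""
    return sum(max(u[:, a] @ P[:, y] for a in range(u.shape[1]))
               for y in range(P.shape[1]))

# Binary-action illustration (numbers ours):
P1 = np.array([[0.9, 0.1], [0.5, 0.5]])
P2 = np.array([[0.5, 0.5], [0.9, 0.1]])
u = np.array([[0.0, 2.0],    # theta = 1: payoffs of actions (0, 1)
              [0.0, -1.0]])  # theta = 2
print(max(value_single(P1, u), value_single(P2, u)))
```

By Theorem 1, the printed number is the robust value no matter how the two sources are correlated.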
Theorem 1 allows us to solve any binary action decision problem in a fairly simple way: find the best marginal information source and best respond to it. For decision problems with more actions, a robustly optimal strategy may need to use multiple information sources. To understand how robustly optimal strategies work in general, we start our discussion in Section 6.1 and Section 6.2 with a simple class of decision problems: those that can be written as a composition of multiple binary action problems. For these problems, we show that the optimal strategy can be obtained by simply "adding up" the optimal strategies for the isolated binary action problems. Finally, in Section 6.3 and Section 6.4, we show that this simple class of decision problems is exhaustive: any finite action decision problem can be decomposed into binary action decision problems.

We start with an example which showcases how an agent can benefit from using information from multiple sources when she faces a more complex problem.

Example 1. An investor can invest in two assets whose outputs depend on an unknown binary state θ ∈ {1, 2}. Outputs from each asset are given by:

             Asset 1                 Asset 2
         Invest   Not invest     Invest   Not invest
θ = 1       2         0            −1         0
θ = 2      −1         0             2         0

The investor's payoff is the sum of outputs from both assets. This can be written as a decision problem with A = {I, NI} × {I, NI} and u(θ, a) = u_1(θ, a_1) + u_2(θ, a_2), where a_1, a_2 ∈ {I, NI} and u_1, u_2 are the output functions given in the table above. Suppose the investor has access to two experiments P_1, P_2:

P_1      y_1 = 1   y_1 = 0        P_2      y_2 = 1   y_2 = 0
θ = 1      0.9       0.1          θ = 1      0.5       0.5
θ = 2      0.5       0.5          θ = 2      0.9       0.1

By paying attention to one experiment, for example P_1, the optimal strategy is to invest in both assets if y_1 = 1 and only in asset 2 if y_1 = 0. The expected payoff from this strategy is thus 0.9 · 1 + 0.1 · (−1) + 0.5 · 1 + 0.5 · 2 = 2.3.

Now suppose the investor makes the investment decision on asset 1 based on experiment P_1, and on asset 2 based on experiment P_2. Then for asset i = 1, 2, the optimal strategy is to invest iff y_i = 1. "Adding up" these two strategies yields:

            y_2 = 1              y_2 = 0
y_1 = 1     Invest in both       Invest in asset 1
y_1 = 0     Invest in asset 2    No investment

This strategy guarantees an expected output of 0.9 · 2 + 0.1 · 0 + 0.5 · (−1) + 0.5 · 0 = 1.3 from each asset regardless of the correlations, which gives a total output of 2.6 > 2.3. So the agent strictly benefits from utilizing information from both information sources.

The strategy constructed in Example 1 is in fact a robustly optimal strategy. There are two special structures in this example: 1. the action space is a product of binary action spaces; 2. the payoff function can be written in an additively separable form across binary action problems. These two features enable us to find a robustly optimal strategy in a fairly simple way: find the robustly optimal strategy for each binary action problem via Theorem 1, and then "add them up". We will generalize and formalize this idea in the next section.
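The arithmetic in Example 1 can be checked in a few lines (a sketch, ours). Because each asset's decision depends only on its own signal, the guarantee of the "added-up" strategy is correlation-free, while the single-source benchmark solves the composed four-action problem with one experiment:

```python
import numpy as np

def value_single(P, u):
    """Bayesian value of one experiment: best respond to each signal."""
    return sum(max(u[:, a] @ P[:, y] for a in range(u.shape[1]))
               for y in range(P.shape[1]))

P1 = np.array([[0.9, 0.1], [0.5, 0.5]])
P2 = np.array([[0.5, 0.5], [0.9, 0.1]])
u1 = np.array([[0.0, 2.0], [0.0, -1.0]])   # asset 1: (not invest, invest)
u2 = np.array([[0.0, -1.0], [0.0, 2.0]])   # asset 2

# Decomposed play: asset i follows experiment i alone.
print(value_single(P1, u1) + value_single(P2, u2))       # 1.3 + 1.3 = 2.6

# Benchmark: the composed four-action problem solved with P1 alone.
u_both = np.array([[u1[t, a1] + u2[t, a2] for a1 in (0, 1) for a2 in (0, 1)]
                   for t in range(2)])
print(value_single(P1, u_both))                          # 2.3
```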
Recall that we defined a decision problem as Γ ≡ (Θ, ν, A, ρ), which can simply be summarized by (A, u), where u(θ, a) = ν(θ)ρ(θ, a). Since in this section we are going to alter the decision problems along the analysis, we use V(P_1, …, P_m; (A, u)) to denote the agent's value in decision problem (A, u).

Definition 5. Given a finite collection of decision problems (A_1, u_1), …, (A_k, u_k), their composition, denoted by ⊕_{ℓ=1}^k (A_ℓ, u_ℓ), is a decision problem with action space A = A_1 × ⋯ × A_k and u(θ, a) = Σ_{ℓ=1}^k u_ℓ(θ, a_ℓ).

Thus, the composition of decision problems is a single decision problem that has a specific additively separable structure. Notice that the decision problem in Example 1 is a composition of two decision problems A_1 = {I_1, N_1}, A_2 = {I_2, N_2} with u_1(·, I_1) = (2, −1), u_1(·, N_1) = (0, 0), u_2(·, I_2) = (−1, 2), u_2(·, N_2) = (0, 0).

Consider a finite collection of binary action problems (A_1, u_1), …, (A_k, u_k), and consider the composition of these problems (Ā, ū) := ⊕_{ℓ=1}^k (A_ℓ, u_ℓ). In this decision problem, a simple, robust strategy that an agent can always use is to choose exactly one experiment Q_ℓ ∈ {P_1, …, P_m} for every binary problem ℓ and play the optimal strategy that uses that information alone, ignoring the signal realizations of all other experiments. Furthermore, by choosing this Q_ℓ optimally for each ℓ, regardless of the actual joint experiment P ∈ 𝒫(P_1, …, P_m), the agent can achieve a total ex-ante utility of Σ_{ℓ=1}^k max_{j=1,…,m} V(P_j; (A_ℓ, u_ℓ)), which is typically strictly greater than max_{j=1,…,m} V(P_j; (Ā, ū)). The following lemma shows that this is indeed the best that the agent can do in (Ā, ū).

Lemma 4. V(P_1, …, P_m; ⊕_{ℓ=1}^k (A_ℓ, u_ℓ)) = Σ_{ℓ=1}^k max_{j=1,…,m} V(P_j; (A_ℓ, u_ℓ)). Moreover, if σ_ℓ : Y → ∆A_ℓ is a robustly optimal strategy for decision problem (A_ℓ, u_ℓ) for each ℓ, then

σ := (σ_1, …, σ_k)  (2)

is a robustly optimal strategy for decision problem ⊕_{ℓ=1}^k (A_ℓ, u_ℓ).

Proof. Using Proposition 1, V(P_1, …, P_m; ⊕_{ℓ=1}^k (A_ℓ, u_ℓ)) = V(P̄(P_1, …, P_m); ⊕_{ℓ=1}^k (A_ℓ, u_ℓ)). Facing the single experiment P̄(P_1, …, P_m), the composed problem separates across its components, so applying Theorem 1 to each binary action problem (A_ℓ, u_ℓ) we then have

V(P̄(P_1, …, P_m); ⊕_{ℓ=1}^k (A_ℓ, u_ℓ)) = Σ_{ℓ=1}^k V(P̄(P_1, …, P_m); (A_ℓ, u_ℓ)) = Σ_{ℓ=1}^k max_{j=1,…,m} V(P_j; (A_ℓ, u_ℓ)).

To see the second statement, for any P ∈ 𝒫(P_1, …, P_m), the agent's payoff from strategy σ is

Σ_{θ∈Θ} Σ_{y_1,…,y_m} P(y_1, …, y_m | θ) Σ_{ℓ=1}^k u_ℓ(θ, σ_ℓ(y_1, …, y_m)) = Σ_{ℓ=1}^k Σ_{θ∈Θ} Σ_{y_1,…,y_m} P(y_1, …, y_m | θ) u_ℓ(θ, σ_ℓ(y_1, …, y_m)) ≥ Σ_{ℓ=1}^k max_{j=1,…,m} V(P_j; (A_ℓ, u_ℓ)),

where the inequality holds because each σ_ℓ is robustly optimal for (A_ℓ, u_ℓ). Since σ guarantees the maxmin value regardless of P, it is a robustly optimal strategy. ∎

Lemma 4 provides a simple solution to any problem that can be expressed as a composition of binary action problems: for each binary action problem, one derives a robustly optimal strategy by paying attention to the best marginal experiment and best responding to it. Then assembling these strategies as in (2) yields a robustly optimal strategy for the composite problem.

In the previous section, we saw how a problem that is a composition of binary action problems can be solved by combining the solutions of each binary action problem. It is natural to ask, then, which problems can be decomposed into binary action problems. As we will see in the next section, it turns out that such a decomposition is possible for any decision problem. Before we get to that, we must define precisely what it means to decompose a decision problem, and for that we need a notion of equivalence between decision problems. For each decision problem (A, u), define the associated polyhedron containing all payoff vectors that are either achievable or weakly dominated by some mixed action:

H(A, u) = co{u(·, a) : a ∈ A} − ℝ²_+.

An example of H(A, u) is depicted in Figure 3. We say that two decision problems are equivalent if they induce the same polyhedron, and that a collection (A_1, u_1), …, (A_k, u_k) is a decomposition of (A, u) if its composition ⊕_{ℓ=1}^k (A_ℓ, u_ℓ) is equivalent to (A, u).

Example 2. Consider two decision problems A_1 = {I_1, N_1}, u_1(I_1) = (2, −1), u_1(N_1) = (0, 0), and A_2 = {I_2, N_2}, u_2(I_2) = (−1, 2), u_2(N_2) = (0, 0). The associated polyhedra are the blue/red shaded areas in Figure 4(a).
Their composition (A_1, u_1) ⊕ (A_2, u_2) consists of four actions, which are depicted in Figure 4(b). Now consider a three-action decision problem A = {a_1, a_2, a_3} with u(a_1) = (−1, 2), u(a_2) = (1, 1), and u(a_3) = (2, −1). Notice that H(A, u) = H((A_1, u_1) ⊕ (A_2, u_2)), the shaded area in Figure 4(b), so (A, u) is equivalent to (A_1, u_1) ⊕ (A_2, u_2). Therefore, ((A_1, u_1), (A_2, u_2)) is a decomposition of (A, u).

The analysis in the previous section gives some hints on how to find robustly optimal strategies for general decision problems. If a given decision problem (A, u) admits a decomposition (A_1, u_1), …, (A_k, u_k) into binary action problems, then, since equivalent decision problems yield the same value,

V(P_1, …, P_m; (A, u)) = V(P_1, …, P_m; ⊕_{ℓ=1}^k (A_ℓ, u_ℓ)) = Σ_{ℓ=1}^k max_{j=1,…,m} V(P_j; (A_ℓ, u_ℓ)).  (3)

Moreover, the robustly optimal strategy for ⊕_{ℓ=1}^k (A_ℓ, u_ℓ) defined in (2) allows us to characterize robustly optimal strategies for (A, u), as in the following lemma.

Lemma 5. Let (A_1, u_1), …, (A_k, u_k) be a decomposition of (A, u) into binary action problems, and let σ_ℓ : Y → ∆A_ℓ be a robustly optimal strategy for decision problem (A_ℓ, u_ℓ). Then there exists σ* : Y → ∆A such that

u(σ*(y)) ≥ Σ_{ℓ=1}^k u_ℓ(σ_ℓ(y)) for all y ∈ Y.

Moreover, any such σ* is a robustly optimal strategy for (A, u).

Proof. For each y, Σ_{ℓ=1}^k u_ℓ(σ_ℓ(y)) ∈ H(⊕_{ℓ=1}^k (A_ℓ, u_ℓ)) = H(A, u). So there exists σ*(y) such that u(σ*(y)) ≥ Σ_{ℓ=1}^k u_ℓ(σ_ℓ(y)). Moreover, since σ* guarantees a weakly higher value in (A, u) than σ does in ⊕_{ℓ=1}^k (A_ℓ, u_ℓ), and V(P_1, …, P_m; (A, u)) = V(P_1, …, P_m; ⊕_{ℓ=1}^k (A_ℓ, u_ℓ)), σ* is a robustly optimal strategy for (A, u). ∎

If a decision problem (A, u) admits a decomposition into binary action problems, Lemma 4 and Lemma 5 characterize a set of robustly optimal strategies. However, it is not immediately clear what kind of decision problem admits a decomposition into binary action problems. Interestingly, we show by direct construction that any decision problem admits a decomposition into binary action problems.

We are now ready to show that any decision problem can be decomposed into binary action problems. Given an arbitrary decision problem (A, u), we start with some normalization to simplify exposition. First we remove all weakly*-dominated actions, so that the remaining actions can be ordered as

u(θ_1, a_1) < u(θ_1, a_2) < ⋯ < u(θ_1, a_n),
u(θ_2, a_1) > u(θ_2, a_2) > ⋯ > u(θ_2, a_n).

Moreover, by adding a constant vector, we can normalize u(·, a_1) = (0, 0).

Definition 8. Given a decision problem (A, u), the canonical decomposition of (A, u) is the following collection of n − 1 binary action problems (A*_1, u*_1), …, (A*_{n−1}, u*_{n−1}): for each ℓ = 1, …, n − 1, A*_ℓ = {0, 1}, u*_ℓ(·, 0) = (0, 0), and u*_ℓ(·, 1) = u(·, a_{ℓ+1}) − u(·, a_ℓ).

The canonical decomposition can be visualized in Figure 5 for an example with four actions. To see that a canonical decomposition is a decomposition, first notice that for any i = 1, …, n, u(a_i) = Σ_{ℓ=1}^{i−1} u*_ℓ(1), so every payoff vector of (A, u) is achievable in the composition. For the other direction, we need to show that for any δ ∈ {0, 1}^{n−1}, Σ_{ℓ=1}^{n−1} δ_ℓ u*_ℓ(1) ∈ H(A, u). The idea is that any nonconsecutive sum of the increments u*_ℓ(1) always lies in the interior of H(A, u), as illustrated in the example in Figure 5(b).

Lemma 6. The canonical decomposition is a decomposition.

Proof. See Appendix A.2.

Finally, Lemma 4, Lemma 5, and Lemma 6 immediately imply Theorem 2.

Theorem 2. Let (A*_1, u*_1), …, (A*_{n−1}, u*_{n−1}) be the canonical decomposition of (A, u), and let σ*_ℓ be a robustly optimal strategy for (A*_ℓ, u*_ℓ). Then:
1. V(P_1, …, P_m; (A, u)) = Σ_{ℓ=1}^{n−1} max_{j=1,…,m} V(P_j; (A*_ℓ, u*_ℓ)).
2. There exists σ* : Y → ∆A such that u(σ*(y)) ≥ Σ_{ℓ=1}^{n−1} u*_ℓ(σ*_ℓ(y)) for all y. Moreover, any such σ* is a robustly optimal strategy for (A, u).

Theorem 2 allows us to construct a robustly optimal strategy for any decision problem (A, u) in two steps:
1. For each (A*_ℓ, u*_ℓ), only one (the best) marginal experiment needs to be considered, and a robustly optimal strategy σ*_ℓ only needs to be measurable with respect to this experiment.
2. For each realization y, pick a (mixed) action σ*(y) ∈ ∆(A) such that u(σ*(y)) ≥ Σ_{ℓ=1}^{n−1} u*_ℓ(σ*_ℓ(y)).
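The two-step recipe is easy to mechanize. The sketch below is ours (it assumes the payoff vectors are already undominated and sorted as in the normalization above); it builds the canonical increments u*_ℓ(1) = u(a_{ℓ+1}) − u(a_ℓ), hands each binary subproblem to its best marginal experiment, and adds up the values, with the constant u(·, a_1), which the paper normalizes to (0, 0), added back explicitly:

```python
import numpy as np

def value_single(P, u):
    """Bayesian value of one experiment: best respond to each signal."""
    return sum(max(u[:, a] @ P[:, y] for a in range(u.shape[1]))
               for y in range(P.shape[1]))

def robust_value(payoffs, experiments):
    """Theorem 2, part 1: sum of best single-experiment values over the
    canonical binary subproblems, plus the constant term u(., a_1)."""
    total = float(np.sum(payoffs[0]))
    for lo, hi in zip(payoffs, payoffs[1:]):
        inc = np.array(hi) - np.array(lo)          # increment u_l^*(1)
        u_l = np.column_stack([np.zeros(2), inc])  # binary subproblem (0, 1)
        total += max(value_single(P, u_l) for P in experiments)
    return total

# Example 2's three-action problem with the experiments of Example 1:
payoffs = [(-1.0, 2.0), (1.0, 1.0), (2.0, -1.0)]
P1 = np.array([[0.9, 0.1], [0.5, 0.5]])
P2 = np.array([[0.5, 0.5], [0.9, 0.1]])
print(robust_value(payoffs, [P1, P2]))  # 2.6
```

The output 2.6 matches Example 1: the undominated actions of the investor's composed problem are exactly the three actions of Example 2, with P_1 best for the first increment and P_2 best for the second.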
The theorem features two interesting corollaries.

Corollary 1. V(P_1, …, P_m; (A, u)) = V({P_{j′}}_{j′≠j}; (A, u)) if and only if V(P_j; (A*_ℓ, u*_ℓ)) ≤ max_{j′≠j} V(P_{j′}; (A*_ℓ, u*_ℓ)) for all ℓ = 1, …, n − 1.

Corollary 1 describes when an additional marginal experiment robustly improves the agent's value, which happens if and only if it outperforms all other marginal experiments in at least one of the canonically decomposed problems.

Corollary 2. For any decision problem (A, u) with |A| = n, and any collection of experiments {P_j}_{j=1}^m, there exists a subset of marginal experiments {P_j}_{j∈S}, S ⊆ {1, …, m} with |S| ≤ n − 1, such that V(P_1, …, P_m; (A, u)) = V({P_j}_{j∈S}; (A, u)).

Corollary 2 implies that in any n-action decision problem, it is not beneficial to use more than n − 1 experiments. Theorem 1 can be viewed as the special case where n = 2.

Our previous analysis focuses on binary state decision problems. A natural question is whether those results can be extended to environments with more states. Unfortunately, when |Θ| ≥ 3, Theorem 1 no longer holds, as we will show in Example 3 below. Since Theorem 1 is the building block of our other results, the same methodology does not work for general state spaces. Nevertheless, using a duality approach, we show that the central idea of decomposition extends, in the sense that the original decision problem can still be decomposed into a collection of subproblems such that, in each subproblem, only one information source needs to be used. We explain at the end of this section why this decomposition result is weaker than what we have for the binary state environment.

Example 3. Suppose that there are three states θ_1, θ_2, θ_3. The marginal experiments are both binary, with respective signals x_1, x_2 and y_1, y_2, and are given by Table 1:

Table 1
P_X      x_1   x_2        P_Y      y_1   y_2
θ_1       1     0         θ_1       1     0
θ_2       1     0         θ_2       0     1
θ_3       0     1         θ_3       0     1

Intuitively, experiment P_X tells the agent whether the state is θ_3 or not, and experiment P_Y tells the agent whether the state is θ_1 or not. Of course, upon observing both experiments, the agent obtains perfect information: since both marginals are deterministic, the joint experiment is uniquely pinned down, and the three states generate three distinct signal pairs. So in any decision problem, the agent obtains the perfect information payoff. Let A = {1, 0} and suppose that the utilities are as follows:

u(θ, a = 1) = 𝟙(θ ∈ {θ_1, θ_3}) − 𝟙(θ = θ_2), u(θ, a = 0) = 0.

Then the agent's maxmin value from marginals P_X, P_Y is her perfect information payoff: 1 + 0 + 1 = 2. By using only one information source (either P_X or P_Y), the agent obtains a payoff of only 1: with P_X alone, for instance, a = 1 is strictly optimal after x_2 (which reveals θ_3), but after x_1 the gain in θ_1 and the loss in θ_2 exactly offset, so a = 0 is also a best response and the uninformative signal contributes nothing. The example illustrates that even in a binary action decision problem, the agent would like to use more than one information source, which draws a contrast with Theorem 1.

In what follows, we allow for a more general state space Θ with |Θ| < ∞. The notation we defined for the binary state environment naturally generalizes to a general finite state environment, so we use the same notation without redefining it. We first introduce the notion of a weak decomposition: a finite collection of decision problems (A_1, u_1), …, (A_k, u_k) is a weak decomposition of (A, u) if H(⊕_{ℓ=1}^k (A_ℓ, u_ℓ)) ⊆ H(A, u). That is, the polyhedron induced by a weak decomposition needs to be contained in the payoff polyhedron induced by (A, u). The following theorem characterizes the idea of decomposition in the general state environment.

Theorem 3. Fix a decision problem (A, u) and Blackwell experiments P_1, …, P_m. There exists a weak decomposition ((A*_1, u*_1), …, (A*_k, u*_k)) of (A, u) for which

V(P_1, …, P_m; (A, u)) = Σ_{ℓ=1}^k max_{j=1,…,m} V(P_j; (A*_ℓ, u*_ℓ)).

Proof. See Appendix A.3.
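Example 3 can be verified directly (a sketch, ours, using the reconstructed Table 1 above). Since both marginals are deterministic, the unique joint experiment can be written down explicitly and evaluated like any single experiment:

```python
import numpy as np

def value_single(P, u):
    """Bayesian value of one experiment: best respond to each signal."""
    return sum(max(u[:, a] @ P[:, y] for a in range(u.shape[1]))
               for y in range(P.shape[1]))

u = np.array([[0.0, 1.0],    # theta_1: payoffs of actions (0, 1)
              [0.0, -1.0],   # theta_2
              [0.0, 1.0]])   # theta_3

PX = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # x_2 iff theta_3
PY = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # y_1 iff theta_1

# Deterministic marginals pin down the joint; columns are the signal
# pairs ((x1,y1), (x1,y2), (x2,y1), (x2,y2)).
joint = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
print(value_single(PX, u), value_single(PY, u), value_single(joint, u))
# -> 1.0 1.0 2.0: both sources together beat any single source.
```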
Theorem 3 can be seen as a generalization of Theorem 2 from the binary state environment. In particular, when the state space is binary, we showed in the previous section that, by representing a decision problem equivalently as ⊕_{ℓ=1}^{n−1} (A*_ℓ, u*_ℓ) corresponding to the canonical decomposition, the constructed robustly optimal strategy in the latter decision problem indeed guarantees the payoff Σ_{ℓ=1}^{n−1} max_{j=1,…,m} V(P_j; (A*_ℓ, u*_ℓ)). Moreover, similar to Theorem 2, one can derive a robustly optimal strategy based on a weak decomposition in two steps:
1. For each (A*_ℓ, u*_ℓ), the agent chooses a strategy that uses only the best marginal experiment and best responds to it, denoted by σ*_ℓ(y).
2. For each realization y, the agent picks a (mixed) action σ*(y) ∈ ∆(A) such that u(σ*(y)) ≥ Σ_{ℓ=1}^k u*_ℓ(σ*_ℓ(y)).

The decomposition in Theorem 3 is weaker than that in Theorem 2 in the following aspects: 1. the decomposition here may depend on the marginal experiments, while the canonical decomposition in Theorem 2 depends only on the decision problem; 2. the decomposition is weak, so the payoff polyhedra of the original problem and the decomposed problem might not exactly coincide, though they lead to the same value; 3. each subproblem in the weak decomposition might contain more than two actions, while in the canonical decomposition of Theorem 2 every subproblem is a binary action problem.

Our main results in the binary state learning environment of Section 5 and Section 6 show that the robustly optimal strategy uses only a select few information sources, ignoring the signal realizations of all other information sources. Theorem 1 shows that when the decision problem involves just a binary choice, the robustly optimal strategy takes a particularly stark form in which the decision maker pays attention to only a single information source. On the other hand, in general decision problems involving more actions, Theorem 2 demonstrates that the robustly optimal strategy makes use of multiple information sources. In Theorem 4 we show that the robustly optimal strategy again takes the form of paying attention to only a single information source in any decision problem, when the state space is binary and the information sources are each individually sufficiently informative.

Let us again assume throughout this section that the state space is binary, i.e., θ ∈ Θ := {1, 2}. Consider any Blackwell experiment P with a corresponding signal space Y. Let P^t denote the Blackwell experiment with signal space Y^t, where P^t(· | θ) consists of t i.i.d. draws from P(· | θ): for all (y_1, …, y_t) ∈ Y^t and each θ ∈ {1, 2},

P^t(y_1, …, y_t | θ) = Π_{τ=1}^t P(y_τ | θ).

Because we want to study the setting in which each information source is individually very informative, we study the robustly optimal strategy of a decision maker who has access to the information sources P^t_1, …, P^t_m when t is large. To state our result, let us define the following quantity w(P) for a given Blackwell experiment P with signal space Y:

w(P) = min_{ν∈∆(Y)} max{ KL(ν, P(· | 1)), KL(ν, P(· | 2)) },

where KL(ν, ν′) = Σ_{y∈Y} ν(y) log(ν(y)/ν′(y)) is the Kullback-Leibler divergence between probability measures ν and ν′. Notice that w(P) assigns a positive real number to each Blackwell experiment P and coincides with the well-known Chernoff distance between the probability measures P(· | 1) ∈ ∆(Y) and P(· | 2) ∈ ∆(Y).
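Numerically, w(P) is easiest to obtain from the standard one-dimensional formula for the Chernoff distance, w(P) = −min_{s∈[0,1]} log Σ_y P(y | 1)^s P(y | 2)^{1−s}, which is equivalent to the minmax characterization above. A grid-search sketch (ours; it assumes all signal probabilities are strictly positive):

```python
import numpy as np

def chernoff_index(P, gridsize=10001):
    """w(P) = -min_{0<=s<=1} log sum_y P(y|1)^s P(y|2)^(1-s) (grid search)."""
    p, q = P[0], P[1]
    s = np.linspace(0.0, 1.0, gridsize)[:, None]
    mixed = np.log(np.sum(p ** s * q ** (1.0 - s), axis=1))
    return -mixed.min()

# Two illustrative experiments (numbers ours); the first separates the
# two states better, so it has the larger index.
P1 = np.array([[0.9, 0.1], [0.5, 0.5]])
P2 = np.array([[0.7, 0.3], [0.4, 0.6]])
print(chernoff_index(P1), chernoff_index(P2))
```

By Theorem 4 below, once each source is sampled many times, the experiment with the larger index is the only one the robustly optimal strategy attends to.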
Theorem 4. Suppose that Θ = {1, 2} and let P_1, …, P_m be a finite collection of Blackwell experiments. Suppose that w(P_1) > max{w(P_i) : i ≠ 1}. Then given any decision problem (A, u), there exists some t* such that for all t ≥ t*, V(P^t_1, …, P^t_m; (A, u)) = V(P^t_1; (A, u)).

Proof. Let (A*_1, u*_1), …, (A*_{n−1}, u*_{n−1}) be the canonical decomposition of (A, u). Then Moscarini and Smith [2002] show that for each (A*_ℓ, u*_ℓ) there exists some t*_ℓ such that for all t ≥ t*_ℓ,

V(P^t_1; (A*_ℓ, u*_ℓ)) = max_{j=1,…,m} V(P^t_j; (A*_ℓ, u*_ℓ)).

Now consider any t ≥ t* := max_ℓ t*_ℓ. By Theorem 2,

V(P^t_1, …, P^t_m; (A, u)) = Σ_{ℓ=1}^{n−1} max_{j=1,…,m} V(P^t_j; (A*_ℓ, u*_ℓ)) = Σ_{ℓ=1}^{n−1} V(P^t_1; (A*_ℓ, u*_ℓ)) = V(P^t_1; (A, u)). ∎

A.1 Proof of Lemma 3

Proof. Consider a collection of experiments {P_j}_{j=1}^m and their Blackwell supremum P : Θ → ∆Z. Since P Blackwell dominates P_j for all j, there exist garblings g_j : Z → ∆(Y_j), j = 1, …, m, such that for all y_j ∈ Y_j,

P_j(y_j | θ) = Σ_{z∈Z} g_j(y_j | z) P(z | θ).

Construct the following experiment P̄ : Θ → ∆(Y_1 × ⋯ × Y_m):

P̄(y_1, …, y_m | θ) = Σ_{z∈Z} Π_{j=1}^m g_j(y_j | z) P(z | θ).  (5)

Notice that Σ_{y_{−j}} P̄(y_1, …, y_m | θ) = Σ_{z∈Z} g_j(y_j | z) P(z | θ) = P_j(y_j | θ), so P̄ ∈ 𝒫(P_1, …, P_m). Moreover, (5) implies that P̄ is a garbling of P, so P Blackwell dominates P̄. Since every element of 𝒫(P_1, …, P_m) Blackwell dominates each P_j, the definition of the Blackwell supremum implies that P̄ Blackwell dominates P, so P̄ is equivalent to P and P̄ ∈ 𝒫(P_1, …, P_m). ∎

A.2 Proof of Lemma 6

Proof. We first show that (A*_1, u*_1), …, (A*_{n−1}, u*_{n−1}) is a weak decomposition. Suppose otherwise, so that there exists some (a*_1, …, a*_{n−1}) ∈ A*_1 × ⋯ × A*_{n−1} for which u* := u*_1(a*_1) + ⋯ + u*_{n−1}(a*_{n−1}) ∉ H(A, u). By Corollary 11.4.2 of Rockafellar [1970], there exists λ ∈ ℝ² \ {0} such that

λ · u* > sup_{v∈H(A,u)} λ · v.  (6)

Note that λ ≥ 0, since otherwise sup_{v∈H(A,u)} λ · v = +∞. Given the canonical decomposition, for any ℓ′ > ℓ,

λ · u*_{ℓ′}(1) ≤ 0 whenever λ · u*_ℓ(1) ≤ 0.

Let ℓ* = min{ℓ : λ · u*_ℓ(1) ≤ 0}, where we use the convention that min ∅ = n. Then

λ · u* ≤ Σ_{ℓ<ℓ*} λ · u*_ℓ(1) = λ · u(a_{ℓ*}).

But u(a_{ℓ*}) ∈ H(A, u), which contradicts Inequality (6). It remains to show that (A*_1, u*_1), …, (A*_{n−1}, u*_{n−1}) is an exact decomposition, but this is straightforward, since it suffices to show that {u(·, a) : a ∈ A} ⊆ H(⊕_{ℓ=1}^{n−1} (A*_ℓ, u*_ℓ)). Clearly this is the case, since for every action a_k ∈ A, u(a_k) = Σ_{ℓ=1}^{k−1} u*_ℓ(1). ∎

A.3 Proof of Theorem 3

We start by proving the following lemma, which will be useful in the proof.

Lemma 7. If (A_ℓ, u_ℓ)_{ℓ=1}^k is a weak decomposition of (A, u), then V(P_1, …, P_m; (A, u)) ≥ Σ_{ℓ=1}^k V(P_1, …, P_m; (A_ℓ, u_ℓ)).

Proof. Suppose P* is a solution to

min_{P∈𝒫(P_1,…,P_m)} max_{σ:Y→∆A} Σ_{θ∈Θ} Σ_{y∈Y} P(y | θ) u(θ, σ(y)).

By the minimax theorem,

V(P_1, …, P_m; (A, u)) = max_{σ:Y→∆A} Σ_{θ∈Θ} Σ_{y∈Y} P*(y | θ) u(θ, σ(y)).  (7)

For ℓ = 1, …, k, define σ*_ℓ as a solution to max_{σ_ℓ:Y→∆A_ℓ} Σ_{θ∈Θ} Σ_{y∈Y} P*(y | θ) u_ℓ(θ, σ_ℓ(y)). Then

Σ_{ℓ=1}^k V(P_1, …, P_m; (A_ℓ, u_ℓ)) ≤ Σ_{ℓ=1}^k max_{σ_ℓ:Y→∆A_ℓ} Σ_{θ∈Θ} Σ_{y∈Y} P*(y | θ) u_ℓ(θ, σ_ℓ(y)) = Σ_{ℓ=1}^k Σ_{θ∈Θ} Σ_{y∈Y} P*(y | θ) u_ℓ(θ, σ*_ℓ(y)).  (8)

From the definition of weak decomposition, there exists σ̃ : Y → ∆A such that u(θ, σ̃(y)) ≥ Σ_{ℓ=1}^k u_ℓ(θ, σ*_ℓ(y)) for all y and θ. Therefore,

Σ_{ℓ=1}^k Σ_{θ∈Θ} Σ_{y∈Y} P*(y | θ) u_ℓ(θ, σ*_ℓ(y)) = Σ_{θ∈Θ} Σ_{y∈Y} P*(y | θ) Σ_{ℓ=1}^k u_ℓ(θ, σ*_ℓ(y)) ≤ Σ_{θ∈Θ} Σ_{y∈Y} P*(y | θ) u(θ, σ̃(y)) ≤ max_{σ:Y→∆A} Σ_{θ∈Θ} Σ_{y∈Y} P*(y | θ) u(θ, σ(y)) = V(P_1, …, P_m; (A, u)).

The statement of the lemma follows immediately from (7) and (8). ∎

Proof of Theorem 3. Consider a robustly optimal strategy σ* : Y → ∆(A), so that

V(P_1, …, P_m; (A, u)) = min_{P∈𝒫(P_1,…,P_m)} Σ_{θ∈Θ} Σ_{y∈Y} P(y | θ) u(θ, σ*(y)).

By considering the dual of the above linear program (Kantorovich duality), we obtain:
V(P_1, …, P_m; (A, u)) = max_{φ_1:Θ×Y_1→ℝ, …, φ_m:Θ×Y_m→ℝ} Σ_{j=1}^m Σ_{θ∈Θ} Σ_{y_j∈Y_j} P_j(y_j | θ) φ_j(θ, y_j)

subject to Σ_{j=1}^m φ_j(θ, y_j) ≤ u(θ, σ*(y_1, …, y_m)) for all θ ∈ Θ and (y_1, …, y_m) ∈ Y.  (9)

Let φ*_1, …, φ*_m be a solution to the dual program. Define the collection of decision problems {(A_j, u_j)}_{j=1}^m such that A_j = Y_j and u_j = φ*_j. From the constraint (9), ((A_1, u_1), …, (A_m, u_m)) forms a weak decomposition of (A, u). Moreover, in every "subproblem" (A_j, u_j), by playing the strategy σ_j(y_1, …, y_j, …, y_m) = y_j for all (y_1, …, y_m), the agent achieves, for any P ∈ 𝒫(P_1, …, P_m), exactly a payoff of

Σ_{θ∈Θ} Σ_{(y_1,…,y_m)∈Y} P(y_1, …, y_m | θ) u_j(θ, σ_j(y_1, …, y_m)) = Σ_{θ∈Θ} Σ_{y_j∈Y_j} P_j(y_j | θ) φ*_j(θ, y_j),

which implies V(P_1, …, P_m; (A_j, u_j)) ≥ max_{ℓ=1,…,m} V(P_ℓ; (A_j, u_j)) ≥ Σ_{θ∈Θ} Σ_{y_j∈Y_j} P_j(y_j | θ) φ*_j(θ, y_j). Summing over all j = 1, …, m, we have

Σ_{j=1}^m V(P_1, …, P_m; (A_j, u_j)) ≥ Σ_{j=1}^m max_{ℓ=1,…,m} V(P_ℓ; (A_j, u_j)) ≥ Σ_{j=1}^m Σ_{θ∈Θ} Σ_{y_j∈Y_j} P_j(y_j | θ) φ*_j(θ, y_j) = V(P_1, …, P_m; (A, u)).  (10)

Now, from Lemma 7, Σ_{j=1}^m V(P_1, …, P_m; (A_j, u_j)) ≤ V(P_1, …, P_m; (A, u)), so every inequality in (10) must hold with equality, which implies

V(P_1, …, P_m; (A, u)) = Σ_{j=1}^m max_{ℓ=1,…,m} V(P_ℓ; (A_j, u_j)). ∎

References

Arieli, I., Babichenko, Y., and Smorodinsky, R. [2018]. Robust forecast aggregation.
Bertschinger, N. and Rauh, J. [2014]. The Blackwell relation defines no lattice.
Blackwell, D. [1953]. Equivalent comparisons of experiments. The Annals of Mathematical Statistics.
Carroll, G. [2017]. Robustness and separation in multidimensional screening.
de Oliveira, H. [2018]. Blackwell's informativeness theorem using diagrams.
Dinnes, J. et al. [2020]. Rapid, point-of-care antigen and molecular-based tests for diagnosis of SARS-CoV-2 infection.
Epstein, L. G. and Halevy, Y. [2019]. Ambiguous correlation. The Review of Economic Studies.
Gilboa, I. and Schmeidler, D. [1989]. Maxmin expected utility with a non-unique prior.
He, W. and Li, J. [2020]. Correlation-robust auction design. Working paper.
Levy, G. and Razin, R. [2020a]. Combining forecasts in the presence of ambiguity over correlation structures.
Levy, G. and Razin, R. [2020b]. The drowning out of moderate voices: a maximum likelihood approach to combining forecasts.
Moscarini, G. and Smith, L. [2002]. The law of large demand for information.
Rockafellar, R. T. [1970]. Convex Analysis.
Wald, A. [1950]. Statistical Decision Functions.