Richard Bradley, Franz Dietrich, Christian List
Aggregating causal judgments
Article (Accepted version) (Refereed)

Original citation: Bradley, Richard, Dietrich, Franz and List, Christian (2014) Aggregating causal judgments. Philosophy of Science, 81 (4). pp. 491-515. ISSN 0031-8248. DOI: 10.1086/678044
© 2014 Philosophy of Science Association
This version available at: http://eprints.lse.ac.uk/59613/
Available in LSE Research Online: May 2015

This document is the author's final accepted version of the journal article. There may be differences between this version and the published version. You are advised to consult the publisher's version if you wish to cite from it.

Aggregating Causal Judgments*

Richard Bradley (LSE), Franz Dietrich (CNRS & UEA), Christian List (LSE)

April 30, 2014

Abstract

Decision-making typically requires judgments about causal relations: we need to know the causal effects of our actions and the causal relevance of various environmental factors.
We investigate how several individuals' causal judgments can be aggregated into collective causal judgments. First, we consider the aggregation of causal judgments via the aggregation of probabilistic judgments, and identify the limitations of this approach. We then explore the possibility of aggregating causal judgments independently of probabilistic ones. Formally, we introduce the problem of causal-network aggregation. Finally, we revisit the aggregation of probabilistic judgments when this is constrained by prior aggregation of qualitative causal judgments.

1 Introduction

Decision making typically requires judgments about causal relations. Home owners need to know whether putting locks in their doors will make their houses more secure. Jurors need to know whether the accused is causally responsible for damages before they can assess whether he or she is legally responsible. Aid agencies need to know how the different projects they can invest in will affect the lives of those they are concerned about; and so on. Opinions about the nature and strength of causal relations often differ, even among experts. How to handle such diversity of opinion is the topic of this paper. We investigate the possibility of coherently aggregating different causal judgments into a single one that may be applied to the decision problem at hand.

The basic set-up of this aggregation problem is the following. Individuals make judgments about both the nature of the causal relations between the variables in some set $\mathbf{V} = \{V, W, \ldots\}$ and the probabilities of these variables taking certain values, unconditionally or conditionally on the values of other variables. The task is to construct a single aggregate judgment on the causal relations between the variables and the relevant probabilities in a way that preserves, as much as possible, the information contained in the individuals' judgments. For present purposes, we assume that individuals' judgments are coherent. More generally, one might allow (localized) incoherence in some of the individuals' judgments, or allow that individuals do not make judgments about all causal relations or all probabilities in question. Their judgments could be restricted to just certain variables relevant to the decision problem at hand, or further still, to just some subset of them or just one type of judgment: causal or probabilistic.

* Previous versions of this paper were presented at a Choice Group seminar at the LSE, 10/2006, the 2006 conference of the Philosophy of Science Association, Vancouver, 11/2006, and the 2nd Philosophy of Biology at Dolphin Beach workshop, Kioloa, NSW, 8/2007. We thank the seminar and conference participants as well as two anonymous referees for very helpful comments and suggestions.

The causal judgments of individuals could be represented in a number of different ways, but here we adopt the framework familiar from the work of Pearl [2000], Spirtes, Glymour, and Scheines [2000] (see also [1990]), and others, in which they are represented by Bayesian networks: directed acyclic graphs (DAGs) with associated conditional probabilities. We do not intend thereby to take a position on the nature of causal judgments, nor on the question of whether they can ultimately be analysed probabilistically.1 Anyone who holds the view that causal judgments are just features of probability judgments (for instance, that to judge that X causes Y is to hold certain conditional probability judgments, such as that the conditional probability of Y given X exceeds its unconditional probability) is free to regard the Bayesian network representations as adding no information to the underlying probability judgments. In principle, we could also study the aggregation of causal judgments in another framework, for instance by representing causal judgments as counterfactual beliefs of the right kind.
A DAG represents an individual's qualitative judgment of causal relevance and irrelevance between variables. Her quantitative judgment of causal dependence is reflected in the associated conditional probabilities for the values of these variables, given the values of any variables on which they are directly causally dependent. The individual's unconditional probabilities for the values of the given variables can then be computed from their conditional probabilities together with the individual's unconditional probabilities for the parent variables. Consider the following example, which we will use at various points in the discussion.

1 A probabilistic analysis may involve variables not included in the DAGs we consider.

Example: Predicting famine. An aid agency wishes to do some advance planning for its famine relief operations and consults several experts in order to determine the risk of famine in a particular region. All agree that the relevant variables are R: rainfall, Y: crop yields, P: political conflict, and of course F: famine. But they disagree both on the causal relations between the four variables and on the probabilities of the various values that these variables may take. All consider rainfall to be the main determinant of crop yield. However, while Expert 1 thinks that poor crop yield and disruptive political conflict are the main causes of famine, Expert 2 thinks that the causal influence of political conflict on famine is indirect, via the effect of the disruption of agricultural production on crop yields. Expert 3 considers the relationship between political conflict and famine to be more complicated still, with political conflict both causing famine directly, by disrupting food distribution, and indirectly, through the influence on crop yields. These three opinions are represented in Figure 1 by a set of DAGs.

[Figure 1: Expert causal judgments. Three DAGs over the variables R, Y, F, P; panels: Expert 1, Expert 2, Expert 3.]
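As a rough illustration of the representation just described, the three experts' qualitative judgments can be written down as edge sets, and an unconditional probability can be computed from a conditional one plus the parent's unconditional probability. This is only a sketch: the edge sets are read off the verbal description of the experts rather than off the figure, the helper names `parents` and `marginal_of_child` are hypothetical, and every number is an invented illustration value, not data from the paper.

```python
# Sketch only: edge sets transcribed from the verbal description of the
# three experts; all probabilities below are HYPOTHETICAL illustration values.

# Each DAG is a set of (cause, effect) pairs over R, Y, P, F.
EXPERT_DAGS = {
    "Expert 1": {("R", "Y"), ("Y", "F"), ("P", "F")},
    "Expert 2": {("R", "Y"), ("P", "Y"), ("Y", "F")},
    "Expert 3": {("R", "Y"), ("P", "Y"), ("Y", "F"), ("P", "F")},
}


def parents(dag, variable):
    """Return the variables judged directly causally relevant to `variable`."""
    return sorted(cause for cause, effect in dag if effect == variable)


def marginal_of_child(parent_prob, cond_prob):
    """Pr(child = 1) for a binary child with a single binary parent.

    parent_prob: Pr(parent = 1)
    cond_prob:   {parent_value: Pr(child = 1 | parent = parent_value)}
    """
    return parent_prob * cond_prob[1] + (1 - parent_prob) * cond_prob[0]


# Hypothetical numbers for Expert 1, for whom R is the only parent of Y.
pr_low_rain = 0.3                        # Pr(R = low)
pr_poor_yield_given = {1: 0.8, 0: 0.1}   # Pr(Y = poor | R = low), Pr(Y = poor | R not low)
pr_poor_yield = marginal_of_child(pr_low_rain, pr_poor_yield_given)

for name, dag in EXPERT_DAGS.items():
    print(name, "parents of F:", parents(dag, "F"))
print("Pr(Y = poor) =", round(pr_poor_yield, 2))
```

With more than one parent the same computation sums over all joint parent values; the single-parent case is used here only to keep the sketch short.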
The fact that individuals make both causal and probabilistic judgments raises the question of whether aggregation of both kinds of judgments should be conducted all at once or in two stages. In Section 2, we focus on what we call one-stage aggregation, in which only probability judgments are aggregated. This approach draws on the standard literature on probabilistic opinion pooling (as reviewed, e.g., by Genest and Zidek [1986]). It is motivated mainly by the thought that the probability judgments of individuals reflect their causal judgments in various ways and hence that the problem of causal judgment aggregation may be solved by constraining probability aggregation so as to preserve the causal information contained in probability judgments. Our verdict on this possibility, however, is largely negative. In Sections 3 to 5, we therefore pursue an alternative two-stage approach, aggregating first the qualitative causal judgments represented by the DAGs (Section 3) and then the quantitative probabilistic ones (Sections 4 and 5), on the assumption that a consensus about the causal relations between variables has been reached. Our analysis builds on results from the literature on binary judgment aggregation, which combines ideas from social choice theory with ideas from logic.2

2 The formal logic-based analysis of binary judgment aggregation was introduced by List and Pettit [2002], [2004] and, in generalized form, by Dietrich [2007]. For a survey, see List and Puppe [2009].

2 One-stage aggregation

The problem of aggregating causal judgments has not received much attention, at least in the form presented here, but there is a vast literature on aggregating expert opinions, mainly in statistics, and especially on aggregating expert probabilities. (As already mentioned, an excellent guide to that literature is the survey paper of Genest and Zidek [1986].)
In this section, we draw on this literature to examine the possibility of reasonable one-stage aggregation of several individuals' judgments. One-stage aggregation may be the only method available in cases in which individuals make no explicit causal judgments or their causal judgments are very incomplete. It is natural, moreover, for those holding a probabilistic view about causation to rely on this method. But one-stage aggregation may also be motivated by the less controversial thought that the causal judgments of individuals are reflected in (even if they are not reducible to) the relations between the individuals' unconditional and conditional probabilities for the relevant events. If this is so, then even on a non-reductionistic view about causal judgments one may hope that probability aggregation could be constrained in a manner which preserves the causal judgments implicit in probabilistic ones.

Broadly, there are three classical approaches to probability aggregation: linear pooling, geometric pooling, and supra-Bayesian approaches. The last approach is directed at a slightly different problem to ours, namely that of how an individual expert should modify his judgments in the light of the expressed judgments of other experts, and so we can set it aside. The other two approaches assume that the experts' opinions have reached an equilibrium state and that no further modification of their viewpoints will take place before the relevant decision has to be made.

Consider an opinion aggregation problem of the following form. A set of events is given (e.g., the event "high political conflict" or "low political conflict and famine"), and the task is to merge the probability judgments of individuals $1, \ldots, n$ (the "experts") on these events into an aggregate probability judgment on the events.3 So, we have to merge (individual) probability functions $\Pr_1, \ldots, \Pr_n$ into an (aggregate) probability function $\Pr$. Many aggregation rules are imaginable.
Formally, a probability aggregation rule is a function that assigns to each n-tuple $\langle \Pr_1, \ldots, \Pr_n \rangle$ (called a profile) of individual probability functions an aggregate probability function $\Pr$.

3 Events can be identified with subsets of a given set of possible worlds. In many formal results, the set of events considered (i.e., the domain of the individual probability functions $\Pr_1, \ldots, \Pr_n$ and the aggregate probability function $\Pr$) forms an algebra: the negation (complement) of any event is also an event, and the disjunction (union) of any two events is an event too.

Of the various possible aggregation rules, linear pooling stands out for a variety of formal and conceptual reasons (e.g., Aczél and Wagner [1980]; McConway [1981]; Lehrer and Wagner [1981]; and Dietrich and List [2007]). In particular, the following axiomatic argument can be given. Let us require the aggregation rule to satisfy two seemingly natural conditions:

Ind (Event-wise Independence) The aggregate probability of any given event X depends only on the individuals' probabilities of X (regardless of the individuals' probabilities of other events Y).4

ZP (Zero Preservation) The aggregate probability of any given event X is zero whenever all individuals give X zero probability.5

Applied to the event "famine", for instance, Zero Preservation implies that famine is assigned an aggregate probability of zero if all individual experts assign a probability of zero to it. Event-wise Independence implies that the aggregate probability of famine depends only on the probabilities that the individual experts assign to that event, not on the probabilities they assign to a certain level of crop yield, political conflict, etc. (This is not to deny, of course, that individuals form their judgments regarding famine in the light of their judgments on crop yield, political conflict, etc.)
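The two conditions can be checked numerically on linear pooling, understood as a weighted arithmetic average of the individual probability functions with non-negative weights summing to one. The following is a minimal sketch under stated assumptions: events are modelled as sets of possible worlds, the function names `linear_pool` and `prob` are hypothetical, and the two expert opinions are invented numbers chosen so that both experts rule out famine without conflict.

```python
def linear_pool(individual_probs, weights):
    """Weighted arithmetic average of probability functions over worlds.

    individual_probs: list of dicts mapping each world to its probability
    weights:          non-negative weights summing to one (one per individual)
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    worlds = individual_probs[0].keys()
    return {
        world: sum(w * pr[world] for w, pr in zip(weights, individual_probs))
        for world in worlds
    }


def prob(pr, event):
    """Probability of an event, identified with a set of worlds."""
    return sum(pr[world] for world in event)


# Hypothetical worlds (famine?, conflict?) and hypothetical expert opinions.
pr1 = {("F", "P"): 0.2, ("F", "noP"): 0.0, ("noF", "P"): 0.3, ("noF", "noP"): 0.5}
pr2 = {("F", "P"): 0.4, ("F", "noP"): 0.0, ("noF", "P"): 0.1, ("noF", "noP"): 0.5}
weights = [0.5, 0.5]
pooled = linear_pool([pr1, pr2], weights)

famine = {("F", "P"), ("F", "noP")}
famine_without_conflict = {("F", "noP")}

# Event-wise Independence: pooled Pr(famine) equals the weighted average of
# the experts' own Pr(famine), whatever they think about other events.
print(prob(pooled, famine), 0.5 * prob(pr1, famine) + 0.5 * prob(pr2, famine))
# Zero Preservation: an event every expert rules out keeps probability zero.
print(prob(pooled, famine_without_conflict))
```

Setting `weights = [1.0, 0.0]` turns the same function into dictatorial aggregation by the first expert.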
Perhaps surprisingly, the only aggregation rules satisfying these two conditions are linear pooling functions: the aggregate probability of any event X is a (possibly weighted) arithmetic average of the individual probabilities of X, i.e., $\Pr(X) = w_1 \Pr_1(X) + \cdots + w_n \Pr_n(X)$, where the weights $w_1, \ldots, w_n \geq 0$ add up to one and are the same for all events X (Aczél and Wagner [1980]; McConway [1981]).6 Examples of linear pooling functions are equal-weight averaging ($w_1 = \cdots = w_n = 1/n$) and dictatorial aggregation (some individual i has weight $w_i = 1$ and all others have weight 0).

The natural interpretation of these weights is in terms of judgmental competence, so that the choice of a particular linear pooling rule is dictated by considerations of the relative expertise of the individuals whose opinions are being sought. In this light, the fact that linear pooling rules assign weights to the opinions of individuals that are independent of the object of these opinions seems quite unsatisfactory, since individuals may be more or less expert on different kinds of issues and it would seem natural to vary the weights on their opinions to reflect this. The aid agency would do well to consult climatologists, agriculturalists, and political scientists to reach a balanced view on the causes of famine, but in doing so it would be reasonable for it to give more weight to the climatologists' probabilities for rainfall than to the political scientists', but more weight to the political scientists' probabilities for political conflict.7

4 Formally, $\Pr(X)$ is a function of $\Pr_1(X), \ldots, \Pr_n(X)$. This function may be a different one for different events X.
5 Formally, $\Pr(X) = 0$ if $\Pr_1(X) = \cdots = \Pr_n(X) = 0$.
6 This result requires that the set of events considered forms an algebra (see footnote 3) and contains at least three events apart from the contradiction (empty set of worlds) and the tautology (set of all worlds). For a generalization, see Dietrich and List [2007].

Our main concern here, however, is with the question of whether linear pooling functions satisfactorily respect the causal knowledge of the individual experts. An individual's causal judgments will be reflected in certain (unconditional or conditional) independencies in his or her probability judgments. For instance, if individual i believes that events X and Y do not causally affect each other but have a common cause Z (and have no other common causes, except for those that affect X and Y via Z), then he or she will take X and Y to be probabilistically independent given Z,8 because any probabilistic correlation between X and Y is "screened off" by conditionalising on Z. A minimal requirement of respecting causal judgments is that at least unanimously held causal judgments be reflected in the aggregate probability function $\Pr$. That is, $\Pr$ should display at least those (conditional) independencies that are supported by unanimous causal judgments. For example, if all individuals judge X and Y to be causally independent with common cause Z, then that independence judgment should be reflected in the aggregate probability function $\Pr$. This motivates the following condition on probability aggregation:

IP (Independence Preservation) For any given events X, Y, Z, if all individuals judge X and Y to be probabilistically independent given Z, then this conditional independence also holds under the aggregate probability function.9

Note that, by preserving all unanimous probabilistic independencies (conditional or unconditional), we may also preserve independencies that are not grounded in unanimous causal judgments.
For instance, it may be that all individuals judge X and Y to be independent given Z, but some do so on the grounds of judging that X indirectly causes Y through Z, others on the grounds of judging that Y indirectly causes X through Z, still others on the grounds of judging that X, Y, and Z are entirely causally disconnected. Even in this case of causal disagreement, Independence Preservation requires the preservation of probabilistic conditional independence. The purely probabilistic informational basis of one-stage aggregation does not allow us to distinguish between different motivations (causal or other) behind probabilistic independencies. Without explicit causal information, all we can do is to use Independence Preservation to preserve all unanimous causal judgments, at the cost of preserving even those conditional independencies that are not causally motivated.

7 See Bradley [2000] for further discussion of this issue. On problems with the assignment of differentiated expert rights, see also Dietrich and List [2008].
8 Formally, $\Pr_i(XY \mid Z) = \Pr_i(X \mid Z)\Pr_i(Y \mid Z)$.
9 Formally, if, for all individuals i, $\Pr_i(Z) > 0$ and $\Pr_i(XY \mid Z) = \Pr_i(X \mid Z)\Pr_i(Y \mid Z)$, then also $\Pr(Z) > 0$ and $\Pr(XY \mid Z) = \Pr(X \mid Z)\Pr(Y \mid Z)$.

It turns out, however, that Independence Preservation is violated by all linear pooling functions (unless some individual i gets maximal weight $w_i = 1$) and thus by all non-dictatorial probability aggregation rules satisfying Event-wise Independence and Zero Preservation. This fact, proven in Genest and Wagner [1984], can be illustrated using our earlier example.10 Suppose the aid agency consults a couple of experts in order to determine the risk of famine in a particular region and that both experts agree that famine is caused by a combination of drought (the event of rainfall R below some critical threshold) and political instability (the event of political conflict P above some critical threshold), which undermines local solutions to poor crop yields.
Furthermore, they agree that these two factors are both causally and probabilistically independent, at least in the short term. But they disagree on the probability of drought and of political instability. Since neither speaks with greater authority than the other, the aid agency calculates its probabilities for these events by taking the linear average of the judgments of the two experts.

Let D and I, respectively, denote the occurrence of drought and political instability in the region and DI their concurrence. Let $\Pr_1$, $\Pr_2$, and $\Pr$, respectively, be the probability functions of Expert 1, Expert 2, and the aid agency. Since pooling happens by averaging, the aid agency will assign the following probabilities:

\[ \Pr(D) = \frac{\Pr_1(D) + \Pr_2(D)}{2}, \qquad \Pr(I) = \frac{\Pr_1(I) + \Pr_2(I)}{2}, \]
\[ \Pr(DI) = \frac{\Pr_1(DI) + \Pr_2(DI)}{2} = \frac{\Pr_1(D)\Pr_1(I) + \Pr_2(D)\Pr_2(I)}{2}, \]

where the last identity uses the experts' judgments that D and I are independent. These independence judgments are preserved if and only if $\Pr(DI) = \Pr(D)\Pr(I)$, i.e., if and only if

\[ \frac{\Pr_1(D)\Pr_1(I) + \Pr_2(D)\Pr_2(I)}{2} = \frac{\Pr_1(D) + \Pr_2(D)}{2} \times \frac{\Pr_1(I) + \Pr_2(I)}{2}. \]

By multiplying both sides of this equation by 4, developing the product on the right-hand side, and simplifying, it follows that

\[ \Pr_1(D)\Pr_1(I) + \Pr_2(D)\Pr_2(I) = \Pr_1(D)\Pr_2(I) + \Pr_2(D)\Pr_1(I) \]
\[ \Leftrightarrow \quad \Pr_1(D)\bigl(\Pr_1(I) - \Pr_2(I)\bigr) = \Pr_2(D)\bigl(\Pr_1(I) - \Pr_2(I)\bigr) \]
\[ \Leftrightarrow \quad \bigl(\Pr_1(D) - \Pr_2(D)\bigr)\bigl(\Pr_1(I) - \Pr_2(I)\bigr) = 0. \]

The latter can hold only if $\Pr_1(D) = \Pr_2(D)$ or $\Pr_1(I) = \Pr_2(I)$; i.e., if the experts agree on the probability of drought or of political instability, which is not the case by assumption. So equal-weight linear pooling violates Independence Preservation. Similar violations can be constructed for non-equal weights (unless one individual i gets maximal weight $w_i = 1$).

10 Relatedly, Spirtes, Glymour, and Scheines [2000] observe that if we mix two or more probability distributions that each display certain conditional independence relations, the resulting mixture may fail to display those conditional independence relations. In particular, if we take two or more probability distributions that are each compatible with the same DAG (satisfying the causal Markov condition), their linear mixture may not be compatible with that DAG.

While we have focused on linear pooling as a way of aggregating probability judgments, the difficulty with preserving causal insights at the aggregate level is a very general one. Genest and Wagner [1984] have shown that Independence Preservation is violated by many (linear or non-linear) probability aggregation rules, including geometric averaging, the most prominent alternative to linear averaging. Thus the difficulty of preserving causal knowledge is not an artifact of requiring Event-wise Independence (a condition violated for instance by geometric averaging).

Genest and Wagner [1984] interpret this finding as evidence that Independence Preservation is not a reasonable condition. We would not like to go so far. In our view, those unanimous independence judgments that are grounded in unanimous causal judgments about the world should not be overruled. We take Genest and Wagner's impossibility finding not as a reason to abandon the goal of preserving judgments of independence, but as a reason to move to a two-stage approach that explicitly takes qualitative causal judgments into account.

3 Two-stage aggregation: the qualitative stage

Under our proposed two-stage approach to aggregation, qualitative causal judgments are aggregated first, and quantitative, probabilistic ones only subsequently. Furthermore, the latter are aggregated in a way that differs from standard probability aggregation, namely in a way that is constrained by the qualitative causal judgments formed at the first stage. This two-stage approach will satisfy a version of Independence Preservation restricted to unanimously held causal independencies.

As before, let $\mathbf{V} = \{V, W, \ldots\}$ be a (finite) non-empty set of variables.
In our example of the aid agency above, $\mathbf{V}$ contains the variables R (rainfall), Y (crop yields), P (political conflict), and F (famine). How can we represent qualitative judgments on how the variables in $\mathbf{V}$ are causally interrelated? Let us introduce a binary predicate symbol c to represent a causal relevance relation on $\mathbf{V}$, where, for any two variables V and W in $\mathbf{V}$, we write $VcW$ to mean that V is directly causally relevant to W. (For brevity, we speak of "causal relevance", but we mean "direct causal relevance".11) In the case of the aid agency, an expert who thinks that rainfall is causally relevant to crop yield whereas political conflict is not would hold that $RcY$ but not that $PcY$.

A causal relevance relation c is called acyclic if, for any finite sequence $V_1, V_2, \ldots, V_k$ of variables in $\mathbf{V}$, it is not the case that $V_1 c V_2, V_2 c V_3, \ldots, V_{k-1} c V_k$ and $V_k c V_1$. A causal relevance relation c induces a directed graph whose vertices are the variables in $\mathbf{V}$ and whose edges (arrows connecting vertices) are defined as follows: for any two variables V and W in $\mathbf{V}$, there is an edge from V in the direction of W if and only if $VcW$. This graph is a directed acyclic graph (DAG) if c is an acyclic relation.12

11 If we wanted to use our formal framework to capture indirect as well as direct causal relationships, we would have to invoke the transitive closure of the relation c.

A Bayesian network is a DAG with associated conditional probabilities: each variable in the graph is endowed with a conditional probability distribution given its parents in the graph. In this section, however, we set this quantitative information aside and focus on qualitative features of the DAG alone. In particular, we investigate how a group of individuals can arrive at an aggregate judgment on what the causal relevance relation c between the variables in $\mathbf{V}$ is.
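The acyclicity condition on a causal relevance relation can be made concrete with a small check. The sketch below assumes a relation is stored as a set of ordered pairs `(V, W)` standing for "V is directly causally relevant to W"; the function name `is_acyclic` and the depth-first search are illustrative choices, not part of the paper's formalism.

```python
def is_acyclic(relation):
    """True if the relation (a set of (V, W) pairs meaning V c W) has no cycle.

    Uses depth-first search; a self-pair (V, V) counts as a cycle of length 1.
    """
    successors = {}
    for cause, effect in relation:
        successors.setdefault(cause, set()).add(effect)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on the current path / finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in successors.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:      # nxt is already on the current path: a cycle
                return False
            if state == WHITE and not visit(nxt):
                return False
        colour[node] = BLACK
        return True

    # The filter is evaluated lazily, so nodes finished by an earlier visit
    # are skipped rather than explored twice.
    return all(visit(v) for v in list(successors) if colour.get(v, WHITE) == WHITE)


# The expert from the text who holds R c Y but not P c Y:
print(is_acyclic({("R", "Y")}))
# A relation containing the cycle Y -> F -> P -> Y:
print(is_acyclic({("R", "Y"), ("Y", "F"), ("F", "P"), ("P", "Y")}))
```

The recursive search is adequate for the handful of variables in the famine example; a very large variable set would call for an iterative traversal.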
Consider a group of n individuals, labelled $1, 2, \ldots, n$, each of whom holds a particular judgment on the nature of the causal relevance relation between the variables in $\mathbf{V}$. We write $c_i$ to denote the causal relevance relation according to individual i's judgment. A combination of causal relevance relations across the n individuals is called a profile and denoted $\langle c_1, c_2, \ldots, c_n \rangle$. A causal judgment aggregation rule is a function that assigns to each profile $\langle c_1, c_2, \ldots, c_n \rangle$ (in some domain of admissible profiles) a single aggregate causal relevance relation c.

To give some examples of causal judgment aggregation rules, consider the class of threshold rules. A threshold rule, with threshold k (where $1 \leq k \leq n$), assigns to each profile $\langle c_1, c_2, \ldots, c_n \rangle$ the causal relevance relation c defined as follows: for any two variables V and W in $\mathbf{V}$,

\[ VcW \;\Leftrightarrow\; \text{at least } k \text{ individuals have } V c_i W. \]

Examples of threshold rules are the majority rule ($k = \frac{n+1}{2}$), the union rule ($k = 1$) and the intersection (or unanimity) rule ($k = n$). Are these satisfactory causal judgment aggregation rules?

It is easy to see that each of these three rules has a considerable defect. The majority and union rules fail to ensure acyclicity of the aggregate causal relevance relation, even when all individuals hold acyclic such relations. To see this, suppose the aid agency consults three experts, with the following individual judgments. They all agree that rainfall is causally relevant to crop yields, but they disagree on the causal relations between the other variables. Expert 1 thinks that crop yields are causally relevant to famine, which is causally relevant to political conflict. Expert 2 thinks that famine is causally relevant to political conflict, which is causally relevant to crop yields. Expert 3 thinks that political conflict is causally relevant to crop yields, which is causally relevant to famine.
In consequence, the causal relevance relation generated by the majority rule violates acyclicity: the relation contains a cycle from crop yields to famine to political conflict to crop yields. It is obvious that the union rule has the same defect. The intersection (or unanimity) rule, by contrast, ensures acyclicity of the aggregate causal relevance relation, but may generate a sparse or even empty such relation, with few variables deemed causally relevant to any others, whenever there are disagreements between the experts.

12 Note that our definition of acyclicity also rules out cycles of length $k = 1$, i.e., we cannot have $VcV$ for any variable V.

Although threshold rules are particularly salient examples of causal judgment aggregation rules, they are by no means the only ones. So let us adopt an axiomatic approach and look for rules satisfying certain conditions.

UD (Universal Domain) The causal judgment aggregation rule accepts as admissible any logically possible profile of acyclic causal relevance relations.

AC (Acyclicity) The aggregate causal relevance relation is always acyclic.

UB (Unbiasedness) For any two variables V and W in $\mathbf{V}$, the aggregate judgment on whether V is causally relevant to W depends only on individual judgments on whether V is causally relevant to W (the independence requirement), and the aggregation rule is neutral between whether or not this is the case (the neutrality requirement).13

ND (Non-Dictatorship) There does not exist a fixed individual such that, for every admissible profile of causal relevance relations, the aggregate causal relevance relation is the one held by that individual.

Although these conditions may seem natural at first sight, they are mutually inconsistent.

Theorem 1 If $\mathbf{V}$ contains three or more variables, there exists no causal judgment aggregation rule satisfying UD, AC, UB, and ND.
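The threshold rules and the three-expert counterexample discussed before the axioms can be reproduced in a few lines. This is a sketch under the assumption that each individual relation is a set of `(V, W)` pairs; the names `threshold_rule` and `is_acyclic` are hypothetical, and the edge sets encode the three experts exactly as described in the text (R rainfall, Y crop yields, F famine, P political conflict).

```python
from collections import Counter


def threshold_rule(profile, k):
    """Aggregate relation: V c W iff at least k individuals have V c_i W."""
    support = Counter(pair for relation in profile for pair in relation)
    return {pair for pair, count in support.items() if count >= k}


def is_acyclic(relation):
    """Kahn-style check: repeatedly remove variables with no incoming edge."""
    edges = set(relation)
    nodes = {v for pair in edges for v in pair}
    while nodes:
        sources = {v for v in nodes if all(w != v for _, w in edges)}
        if not sources:
            return False   # every remaining variable has an incoming edge: a cycle
        nodes -= sources
        edges = {(v, w) for v, w in edges if v not in sources}
    return True


expert1 = {("R", "Y"), ("Y", "F"), ("F", "P")}
expert2 = {("R", "Y"), ("F", "P"), ("P", "Y")}
expert3 = {("R", "Y"), ("P", "Y"), ("Y", "F")}
profile = [expert1, expert2, expert3]
n = len(profile)

majority = threshold_rule(profile, (n + 1) // 2)
union = threshold_rule(profile, 1)
intersection = threshold_rule(profile, n)

print("individually acyclic:", all(is_acyclic(c) for c in profile))
print("majority acyclic:", is_acyclic(majority))
print("union acyclic:", is_acyclic(union))
print("intersection:", intersection)
```

Each expert's relation is acyclic on its own, yet the majority and union relations both contain the cycle from crop yields to famine to political conflict and back, while the intersection rule keeps only the one unanimously held judgment.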
This result follows from an impossibility theorem by Dietrich and List [2010] concerning the aggregation of binary judgments on logically connected propositions. Qualitative causal judgments in the sense investigated here are simply binary ("true" / "false") judgments on propositions of the form "variable V is (or is not) directly causally relevant to variable W", where different such propositions constrain each other via the acyclicity constraint on causal relevance. For example, the set of propositions {"V is directly causally relevant to W", "W is directly causally relevant to U", and "U is directly causally relevant to V"} is logically inconsistent relative to the acyclicity constraint. From the theory of judgment aggregation, we know that the aggregation of binary judgments on logically connected propositions is subject to a family of impossibility results broadly similar to Arrow's impossibility theorem on preference aggregation, as surveyed in List and Puppe [2009] and, more recently, List [2012]. Our present theorem belongs to this family of results. What are the possible escape routes from this impossibility?

13 Formally, for any V and W in $\mathbf{V}$ and any admissible profiles $\langle c_1, c_2, \ldots, c_n \rangle$ and $\langle c^*_1, c^*_2, \ldots, c^*_n \rangle$, if [for all i, $V c_i W$ if and only if not $V c^*_i W$] then [$V c W$ if and only if not $V c^* W$]. This formal statement is slightly weaker than the informal one in the main text but implies it under UD and AC.

The first route: relaxing universal domain. We may use a causal judgment aggregation rule that accepts, as admissible input, not all logically possible profiles of acyclic causal relevance relations, but only those that meet an additional structural condition: namely profiles which, informally speaking, reflect a certain amount of cohesion across different individuals' causal judgments.
The additional structural condition on profiles might be such that the majority rule, or perhaps some other threshold rule, never generates an aggregate causal relevance relation violating acyclicity. In this case, the majority rule or threshold rule in question could be employed on this restricted domain of admissible profiles. We consider two structural conditions of this kind.

Temporal-order restriction. Suppose the individuals agree on the temporal order in which the events captured by the variables in $\mathbf{V}$ occur. Suppose further they agree that a variable V can be causally relevant to another variable W only if V strictly precedes W in this temporal order. Call any profile of causal relevance relations that is consistent with some such agreement temporal-order restricted. Formally, a profile is temporal-order restricted if there exists some weak order of the variables in $\mathbf{V}$ (a reflexive, transitive, and connected binary relation on $\mathbf{V}$) such that, for every pair of variables V and W in $\mathbf{V}$, if some individual judges V to be causally relevant to W (i.e., some i holds $V c_i W$) then V strictly precedes W in that order. For any such profile, the causal relevance relation generated by any threshold rule is acyclic, no matter how low or high the threshold is. The temporal constraint on what causal relevance judgments are deemed admissible guarantees the absence of any causal cycles at both the individual and aggregate levels.

Unidimensional alignment. Another structural condition on profiles that ensures acyclical causal judgments at the aggregate level, here under the majority rule (or any threshold rule with a higher threshold), is unidimensional alignment (List [2003]; for generalizations, see Dietrich and List [2010]).
A profile of causal relevance relations is called unidimensionally aligned if the individuals can be linearly ordered from left to right such that, for each pair of variables V and W in $\mathbf{V}$, the individuals who hold that V is causally relevant to W (i.e., the individuals i with $V c_i W$) are all to the left or all to the right of those who hold that V is not causally relevant to W (i.e., the individuals i who do not have $V c_i W$) [fn 14]. For any unidimensionally aligned profile, the causal relevance relation generated by the majority rule is acyclic and coincides with the causal relevance relation held by the median individual with respect to the left-right ordering of the individuals. (Or, if the number of individuals is even, it coincides with the intersection of the causal relevance relations held by the two median individuals.)

Footnote 14. This allows that, for some pairs of variables, the individuals affirming causal relevance are to the left of those who do not, while for other pairs of variables the former are to the right of the latter.

It is an empirical question whether a group of experts, either before or after a period of joint deliberation, exhibits sufficient agreement in their causal judgments to meet the condition of temporal-order restriction or that of unidimensional alignment. The kind of temporal agreement required for temporal-order restriction seems empirically plausible at least in some situations.

The second route: relaxing acyclicity. A logically possible way to avoid the impossibility result of Theorem 1 is to give up the requirement that the aggregate causal relevance relation be acyclic. This, however, would constitute a major departure from the consensus on the nature of causal relations, which are widely held to be acyclic (Pearl [2000]).

The third route: relaxing unbiasedness. We may use a causal judgment aggregation rule that violates the condition of unbiasedness. There are two ways of relaxing this condition.

A neutrality relaxation.
If we relax the neutrality part of unbiasedness, there can exist pairs of variables V and W in $\mathbf{V}$ such that the aggregation rule is not neutral between whether or not V is causally relevant to W. Examples of causal judgment aggregation rules violating neutrality are threshold rules with any threshold k different from simple majority. It can be shown that a threshold rule is guaranteed to generate an acyclic causal relevance relation if and only if the threshold k exceeds $\frac{m-1}{m} n$, where m is the number of variables in $\mathbf{V}$.

Let us explain why this constraint on the threshold is sufficient to ensure acyclicity. Suppose, for a contradiction, that a threshold rule with a threshold k above $\frac{m-1}{m} n$ generates a cyclical causal relevance relation. There must then exist an admissible profile $\langle c_1, c_2, \ldots, c_n \rangle$ of individually acyclic causal relevance relations such that

$V_1 c V_2, \ldots, V_{m'-1} c V_{m'}$, and $V_{m'} c V_1$,

where c is the aggregate causal relevance relation and $V_1, V_2, \ldots, V_{m'}$ are distinct variables in $\mathbf{V}$, with $2 \le m' \le m$ [fn 15].

Footnote 15. Aggregate cycles of length 1 (where $V c V$ for some variable V in $\mathbf{V}$) could never occur under any threshold rule, since no individual i will have $V c_i V$ (assuming acyclicity at the individual level).

Given the definition of our threshold rule, there must be at least k individuals with $V_1 c_i V_2$; at least k individuals with $V_2 c_i V_3$; and so on. Let $N_1, N_2, \ldots, N_{m'}$ be the sets of individuals i with $V_1 c_i V_2$; $V_2 c_i V_3$; ...; $V_{m'} c_i V_1$, respectively. Since k exceeds $\frac{m-1}{m} n$, each of these sets must contain more than $\frac{m-1}{m} n$ individuals. But, for combinatorial reasons, any m or fewer subsets of size greater than $\frac{m-1}{m} n$ from a set of n individuals must have a non-empty intersection. For example, any two or fewer subsets of size greater than $\frac{1}{2} n$ must have a non-empty intersection; any three or fewer subsets of size greater than $\frac{2}{3} n$ must have a non-empty intersection; and so on.
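The combinatorial step just used can be spelled out by counting complements. The following short derivation is only a sketch of one way to see it; the notation $\overline{N_j}$ for the complement of $N_j$ within the set of n individuals is introduced here for illustration and is not part of the original argument.

```latex
% Each N_j has more than (m-1)n/m members, so its complement is small:
\[
  |N_j| > \frac{m-1}{m}\, n
  \quad\Longrightarrow\quad
  |\overline{N_j}| = n - |N_j| < \frac{n}{m}.
\]
% At most m such complements cannot cover all n individuals:
\[
  \Bigl|\, \bigcup_{j=1}^{m'} \overline{N_j} \,\Bigr|
  \;\le\; \sum_{j=1}^{m'} |\overline{N_j}|
  \;<\; m' \cdot \frac{n}{m}
  \;\le\; n
  \qquad (\text{since } m' \le m).
\]
% Hence some individual lies outside every complement, i.e. inside every N_j:
\[
  \bigcap_{j=1}^{m'} N_j
  \;=\; \overline{\bigcup_{j=1}^{m'} \overline{N_j}}
  \;\neq\; \emptyset .
\]
```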
Since $m' \le m$, this implies that there must exist at least one individual i who is contained in all of $N_1, N_2, \ldots, N_{m'}$, and he or she must then have

$V_1 c_i V_2, \ldots, V_{m'-1} c_i V_{m'}$, and $V_{m'} c_i V_1$.

But this contradicts individual acyclicity, which completes the argument.

Conversely, if the threshold k does not exceed $\frac{m-1}{m} n$, it becomes possible to construct an admissible profile $\langle c_1, c_2, \ldots, c_n \rangle$ of individually acyclic causal relevance relations such that, for some set of distinct variables $V_1, V_2, \ldots, V_{m'}$, each of $V_1 c V_2, \ldots, V_{m'-1} c V_{m'}$, and $V_{m'} c V_1$ is affirmed by k or more individuals. For such a profile, the intersection of the relevant sets $N_1, N_2, \ldots, N_{m'}$ is empty, and hence the presence of a cycle in the aggregate causal relevance relation does not conflict with acyclicity in the individual relations. Formally, our necessary and sufficient condition for acyclicity (namely a threshold k above $\frac{m-1}{m} n$) can be derived from a characterization of consistent (but possibly incomplete) quota rules in judgment aggregation (Dietrich and List [2007]; the present combinatorial argument builds on a result in List [2001], ch. 9). Note that if the set of variables $\mathbf{V}$ is infinite, only the intersection (or unanimity) rule guarantees acyclicity at the aggregate level. However, if $\mathbf{V}$ is finite, then a supermajority rule with a suitably high threshold is sufficient. A problem with this approach, as noted above, is that it may lead to sparse or even empty aggregate causal relevance relations unless the disagreement between experts is limited.

An independence relaxation. If we relax the independence part of unbiasedness, there can exist pairs of variables V and W in $\mathbf{V}$ such that the aggregate judgment on whether V is causally relevant to W depends not only on individual judgments on whether V is causally relevant to W but also on individual judgments involving other variables.
Examples of causal judgment aggregation rules violating independence are sequential priority rules (adapted from List [2004]) and distance-based rules (adapted from Pigozzi [2006] and Miller and Osherson [2009]).

Under a sequential priority rule, the different possible pairs of variables are considered one by one in a given order (which may be chosen, for example, by some criterion of epistemic priority). On each pair of variables V, W, the aggregate judgment is then determined as follows:

(i) If the question of whether V is causally relevant to W is constrained (in light of the acyclicity requirement) by the aggregate judgments on pairs of variables considered earlier in the given order, then the aggregate judgment on V's causal relevance to W is derived from those earlier constraints.

(ii) If it is not constrained in this way, then the aggregate judgment on V's causal relevance to W is made by applying some voting method, such as majority voting, to the individual judgments on V vis-à-vis W.

This approach guarantees acyclicity of the aggregate causal relevance relation, but at the expense of path-dependence: the order in which causal judgments are made on different pairs of variables may determine what the aggregate causal relevance relation will look like. An agenda setter on a committee of experts may strategically exploit this feature of the causal judgment aggregation rule by proposing an order of priority among different pairs of variables that is likely to give rise to aggregate causal judgments that he or she wants the committee to make.

Under a distance-based rule, we first define a distance metric between causal relevance relations. For instance, we could define the distance between two relations c and c' to be the number of ordered pairs of variables V, W on which c and c' disagree, i.e., $d(c, c') = |\{(V, W) \in \mathbf{V}^2 : V c W \not\Leftrightarrow V c' W\}|$. (This is the Hamming distance.)
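As a small illustration of the Hamming distance just defined, the following sketch counts the ordered pairs on which two relations disagree. Representing a causal relevance relation as a Python set of ordered pairs, and the function name `hamming_distance`, are assumptions made here for illustration only.

```python
from itertools import product


def hamming_distance(c, c_prime, variables):
    """Number of ordered pairs (V, W) on which the two relations disagree."""
    return sum(
        ((v, w) in c) != ((v, w) in c_prime)
        for v, w in product(variables, repeat=2)
    )


# Hypothetical example with three variables.
variables = ["V1", "V2", "V3"]
c = {("V1", "V2"), ("V2", "V3")}        # V1 -> V2 -> V3
c_prime = {("V1", "V2"), ("V1", "V3")}  # V1 -> V2 and V1 -> V3

# The relations disagree on (V2, V3) and on (V1, V3).
print(hamming_distance(c, c_prime, variables))  # 2
```

The distance is symmetric and equals zero exactly when the two relations coincide.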
We then define the aggregate causal relevance relation for any given profile $\langle c_1, c_2, \ldots, c_n \rangle$ as an acyclic causal relevance relation c that minimizes the total distance from the individual causal relevance relations, i.e., where $\sum_{i=1,\ldots,n} d(c, c_i)$ is minimal. Since there need not be a unique such distance-minimizing relation c, we may require an additional rule for breaking ties. Distance-based rules can be interpreted as generating compromise causal relevance relations.

In some cases, a rather significant departure from independence (as a property of the aggregation rule) may be desirable. Suppose, for instance, that all individuals agree that there is a causal path from $V_1$ to $V_2$, but different individuals disagree about the intermediate variables along this path. Some think that the path goes from $V_1$ to $V_3$ to $V_2$; others think it goes from $V_1$ to $V_4$ to $V_2$; still others think it goes from $V_1$ to $V_5$ to $V_2$; and so on. In such a case, no single causal link between any pair of variables is accepted by more than a small minority of the individuals. If we used a causal judgment aggregation rule satisfying independence, say a threshold rule with a majority or even sub-majority threshold, we could end up with an empty aggregate causal relevance relation here, without any causal links at all. This would fail to do justice to the fact that all individuals agree that $V_1$ is at least indirectly causally relevant to $V_2$. We do not offer a concrete proposal on how to handle such cases, but mention it in order to illustrate why a significant relaxation of independence may sometimes be justified [fn 16].

The fourth route: relaxing non-dictatorship. A final way to avoid the impossibility result of Theorem 1 is to allow the aggregate causal relevance relation to be determined by an antecedently fixed individual: a "dictator".
But since we are normally interested in the information contained in the causal judgments of more than one individual, this is not generally an attractive solution to our aggregation problem. Sometimes, however, it may be an acceptable compromise to appoint a trusted expert as the "dictator" for arriving at qualitative causal judgments, in the form of a DAG, while continuing "democratically" when it comes to determining the associated quantitative probability information at the second stage of our two-stage approach.

Footnote 16. We are grateful to an anonymous referee for raising this point.

Concluding remark. Which of the different possible escape routes from the impossibility result of Theorem 1 is compelling depends on details of the decision problem at hand, the nature of the disagreements between the experts, the level of trust we place in them, whether we are worried about possible agenda manipulation, and other factors. In the next section, we assume that through one of the identified routes (excluding that of relaxing acyclicity) a "consensus" on a causal relevance relation and thereby on a DAG has been achieved, and we turn to the question of how the associated conditional probabilities can be determined.

4 Preliminaries to the quantitative stage

We have analysed how a group can arrive at an aggregate judgment on the qualitative causal relations between variables. We now assume that such an aggregate causal judgment has been reached through one of the routes just discussed and suppose that the group seeks to make an aggregate probability judgment (about the variables taking various values) that is compatible with the given aggregate causal judgment.

In its most general form (ignoring for the moment the causal judgment), a probability judgment can be represented by a joint probability function over the variables in $\mathbf{V}$. For simplicity, we assume that each variable can take finitely, or countably infinitely, many possible values.
For example, we may distinguish between a particular number of possible levels of conflict. Let us label the variables $V_1, \ldots, V_m$. A joint probability function Pr assigns a probability $\Pr(v_1, \ldots, v_m) \ge 0$ to each combination $(v_1, \ldots, v_m)$ of values of these variables, where the sum of the probabilities is 1.

The joint probability $\Pr(v_1, \ldots, v_m)$ can be factorised into the product of conditional probabilities [fn 17]:

\[
\Pr(v_1, \ldots, v_m) = \Pr(v_1)\,\Pr(v_2 \mid v_1)\,\Pr(v_3 \mid v_2, v_1) \cdots \Pr(v_m \mid v_{m-1}, \ldots, v_1) = \prod_{j=1}^{m} \Pr(v_j \mid v_1, \ldots, v_{j-1}). \tag{1}
\]

Footnote 17. In this expression, the conditional probability $\Pr(v_j \mid v_1, \ldots, v_{j-1})$ can be derived from the joint probability function Pr via the formula $\Pr(v_j \mid v_1, \ldots, v_{j-1}) = \frac{\Pr(v_1, \ldots, v_j)}{\Pr(v_1, \ldots, v_{j-1})}$ (where $\Pr(v_1, \ldots, v_j)$ and $\Pr(v_1, \ldots, v_{j-1})$ are marginal probabilities derived from Pr), provided that $\Pr(v_1, \ldots, v_{j-1}) \neq 0$. If $\Pr(v_1, \ldots, v_{j-1}) = 0$, then $\Pr(v_j \mid v_1, \ldots, v_{j-1})$ can be viewed either as undefined or as a primitive not derived from the function Pr. Under both interpretations, the factorisation (1) is still possible even if some $\Pr(v_1, \ldots, v_{j-1})$ is zero, whatever value is substituted for $\Pr(v_j \mid v_1, \ldots, v_{j-1})$ (because some other factor on the right-hand side of (1) will be zero, as will be the left-hand side of (1)).

In our famine example, where $V_1, V_2, V_3, V_4$ are the levels of rainfall, crop yield, political conflict, and famine, we have $\Pr(v_1, v_2, v_3, v_4) = \Pr(v_1)\,\Pr(v_2 \mid v_1)\,\Pr(v_3 \mid v_1, v_2)\,\Pr(v_4 \mid v_1, v_2, v_3)$.

When is the probability judgment expressed by Pr compatible with a given causal judgment? Recall that a causal judgment takes the form of a particular directed acyclic graph (DAG) over the variables $V_1, \ldots, V_m$, with an arrow from $V_j$ to $V_k$ just in case $V_j$ is considered causally relevant to $V_k$ ($V_j c V_k$).
For any variable $V_j$, we write $\mathrm{PA}(V_j)$ to denote the list of $V_j$'s parent variables in the graph, and we write $\mathrm{pa}(V_j)$ to denote any list of values of these parent variables [fn 18]. For instance, suppose that the consensus DAG in our famine example is as shown in Figure 2: no variable is causally relevant to rainfall ($V_1$); only rainfall ($V_1$) is causally relevant to crop yield ($V_2$); only crop yield ($V_2$) is causally relevant to political conflict ($V_3$); but both crop yield ($V_2$) and political conflict ($V_3$) are causally relevant to famine ($V_4$). Then $\mathrm{PA}(V_1)$ contains no variable, $\mathrm{PA}(V_2)$ contains precisely $V_1$, $\mathrm{PA}(V_3)$ contains precisely $V_2$, and $\mathrm{PA}(V_4)$ contains both $V_2$ and $V_3$.

Footnote 18. So $\mathrm{pa}(V_j)$ is any instantiation of $\mathrm{PA}(V_j)$.

[Figure 2: An illustrative aggregate causal judgment in the famine example. The DAG has arrows $V_1 \to V_2$, $V_2 \to V_3$, $V_2 \to V_4$, and $V_3 \to V_4$.]

Without loss of generality, suppose the variables $V_1, \ldots, V_m$ are labelled such that those with no parent come first, those with a parent but no grandparent come next, those with a grandparent but no great-grandparent come thereafter, and so on. If the original labelling $V_1, \ldots, V_m$ does not have this property, we can simply relabel the variables appropriately and replace the factorisation (1) by one using the new labelling. So the parents of any variable $V_j$ come before $V_j$ [fn 19]. But of course not all of $V_1, \ldots, V_{j-1}$ need to be causally relevant to $V_j$. For instance, in our famine example $V_2$ but not $V_1$ is (directly) causally relevant to $V_3$. Since causally irrelevant variables should have no effect on $V_j$, the conditional probability $\Pr(v_j \mid v_1, \ldots, v_{j-1})$ should be insensitive to the non-parental values among $v_1, \ldots, v_{j-1}$. In other words, it should be sensitive only to the sublist $\mathrm{pa}(V_j)$ of $v_1, \ldots, v_{j-1}$. Formally,

\[
\Pr(v_j \mid v_1, \ldots, v_{j-1}) = \Pr(v_j \mid \mathrm{pa}(V_j)). \tag{2}
\]

Footnote 19. Formally, $\mathrm{PA}(V_j)$ is a sublist of $(V_1, \ldots, V_{j-1})$.

We say that the probability judgment Pr is compatible with the given aggregate causal judgment if identity (2) holds for every variable $V_j$ and every combination of values $v_1, \ldots, v_j$ with $\Pr(v_1, \ldots, v_{j-1}) \neq 0$. (This compatibility requirement is the ordered Markov condition, which is, in turn, equivalent to the parental Markov condition: any variable is independent of its non-descendants given its parents [fn 20].) The joint probability (1) then reduces to

\[
\Pr(v_1, \ldots, v_m) = \prod_{j=1}^{m} \Pr(v_j \mid \mathrm{pa}(V_j)). \tag{3}
\]

For instance, in our famine example, $\Pr(v_1, v_2, v_3, v_4) = \Pr(v_1)\,\Pr(v_2 \mid v_1)\,\Pr(v_3 \mid v_2)\,\Pr(v_4 \mid v_2, v_3)$.

Footnote 20. There are multiple equivalent ways to define "compatibility" of Pr with the DAG. In addition to the ordered Markov condition and the parental Markov condition, a third definition (chosen by Pearl) is given in terms of the validity of the decomposition (3). On the equivalence of these definitions, see Theorems 1.2.6 and 1.2.7 in Pearl [2000].

5 Two-stage aggregation: the quantitative stage

As we seek to reach an aggregate probability judgment that is compatible with the aggregate causal judgment, the probability function Pr should satisfy the decomposition (3). This requirement is usually violated by standard, one-stage probability aggregation, where the individual probability functions

\[
\mathrm{Pr}_1(v_1, \ldots, v_m), \ldots, \mathrm{Pr}_n(v_1, \ldots, v_m) \tag{4}
\]

are directly merged into an aggregate probability function $\Pr(v_1, \ldots, v_m)$. On our proposed two-stage approach, by contrast, Pr is explicitly constructed so as to meet the necessary decomposition requirement.

Let the aggregate causal relevance relation (the "consensus" DAG) be given, and consider the decomposition constraint (3) relative to that relation. The quantitative stage of our approach now consists in

(i) determining each factor of the decomposition, $\Pr(v_j \mid \mathrm{pa}(V_j))$, through separate probability aggregation, and

(ii) computing the joint probability function $\Pr(v_1, \ldots, v_m)$ as the product of these separately determined factors.
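Steps (i) and (ii) can be sketched in code for the famine example. This is a minimal illustration under stated assumptions, not a definitive implementation: every variable is taken to be binary, linear pooling with equal weights is used purely as one possible aggregation method for step (i), and the two experts' numbers as well as all names (`PARENTS`, `linear_pool`, `aggregate_factor`, `joint`) are hypothetical.

```python
from itertools import product

# Consensus DAG of the famine example: parents of each variable.
PARENTS = {"V1": (), "V2": ("V1",), "V3": ("V2",), "V4": ("V2", "V3")}
ORDER = ["V1", "V2", "V3", "V4"]  # parents precede their children
VALUES = (0, 1)  # assumption: two levels per variable (0 = low, 1 = high)


def linear_pool(probs, weights):
    """Weighted arithmetic mean of the individuals' probabilities."""
    return sum(w * p for w, p in zip(weights, probs))


def aggregate_factor(experts, weights, var, pa_values):
    """Step (i): pool Pr_i(var = 1 | pa_values) across individuals."""
    return linear_pool([e[var][pa_values] for e in experts], weights)


def joint(experts, weights, assignment):
    """Step (ii): multiply the pooled factors, as in decomposition (3)."""
    prob = 1.0
    for var in ORDER:
        pa_values = tuple(assignment[p] for p in PARENTS[var])
        p_high = aggregate_factor(experts, weights, var, pa_values)
        prob *= p_high if assignment[var] == 1 else 1.0 - p_high
    return prob


# Hypothetical inputs: each expert reports Pr_i(var = 1 | parent values).
expert_a = {
    "V1": {(): 0.3},
    "V2": {(0,): 0.2, (1,): 0.8},
    "V3": {(0,): 0.6, (1,): 0.1},
    "V4": {(0, 0): 0.5, (0, 1): 0.9, (1, 0): 0.05, (1, 1): 0.3},
}
expert_b = {
    "V1": {(): 0.5},
    "V2": {(0,): 0.4, (1,): 0.6},
    "V3": {(0,): 0.4, (1,): 0.3},
    "V4": {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.5},
}
experts, weights = [expert_a, expert_b], [0.5, 0.5]

# Sanity check: the aggregate joint probabilities sum to 1.
total = sum(
    joint(experts, weights, dict(zip(ORDER, combo)))
    for combo in product(VALUES, repeat=len(ORDER))
)
print(round(total, 10))
```

Because the joint is built as a product of the pooled factors, it satisfies decomposition (3) by construction, whatever pooling method is plugged into `linear_pool`.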
More formally, for every variable $V_j$ in $\mathbf{V}$ and every combination $\mathrm{pa}(V_j)$ of parental values, we merge the individual conditional probability functions

\[
\mathrm{Pr}_1(v_j \mid \mathrm{pa}(V_j)), \ldots, \mathrm{Pr}_n(v_j \mid \mathrm{pa}(V_j)) \tag{5}
\]

into an aggregate conditional probability function $\Pr(v_j \mid \mathrm{pa}(V_j))$. These separate aggregation exercises can each be performed, for example, by linear or geometric pooling. In our famine example, this involves

\[
\begin{array}{l}
\text{merging } \mathrm{Pr}_1(v_1), \ldots, \mathrm{Pr}_n(v_1) \text{ into } \Pr(v_1); \\
\text{for any fixed } v_1, \text{ merging } \mathrm{Pr}_1(v_2 \mid v_1), \ldots, \mathrm{Pr}_n(v_2 \mid v_1) \text{ into } \Pr(v_2 \mid v_1); \\
\text{for any fixed } v_2, \text{ merging } \mathrm{Pr}_1(v_3 \mid v_2), \ldots, \mathrm{Pr}_n(v_3 \mid v_2) \text{ into } \Pr(v_3 \mid v_2); \\
\text{for any fixed } v_2, v_3, \text{ merging } \mathrm{Pr}_1(v_4 \mid v_2, v_3), \ldots, \mathrm{Pr}_n(v_4 \mid v_2, v_3) \text{ into } \Pr(v_4 \mid v_2, v_3).
\end{array}
\tag{6}
\]

The present approach has several distinctive properties, to which we now turn.

Compatibility with causal judgments. The aggregate probability function Pr, given by (3), is automatically compatible with the aggregate causal relevance relation, represented by the appropriate DAG. In particular, Pr respects the causal Markov condition: any variable $V_j$ is probabilistically independent of all its causal non-descendants given its causal parents. In our famine example, Pr makes political conflict independent of rainfall conditional on crop yield [fn 21], and famine independent of rainfall conditional on crop yield and political conflict [fn 22]. The causally motivated conditional independencies are thus respected, whereas other conditional independencies may or may not arise. By contrast, standard one-stage probability aggregation does not generally produce an aggregate probability judgment that is consistent with any prior judgments of causal relevance.

Preservation of causal (conditional) independencies. What about the preservation of unanimously held independencies between variables (both conditional and unconditional ones)? Suppose, for example, that all individuals consider variables $V_j$ and $V_k$ probabilistically independent given $V_l$ [fn 23]. Does the aggregate probability judgment preserve this conditional independence?
As we have seen, for standard probability aggregation methods the answer is usually negative. Under our approach, by contrast, causal conditional independencies are preserved. To see why, suppose all individuals judge $V_j$ and $V_k$ to be probabilistically independent given $V_l$ because of a unanimous agreement that $V_j$'s only causal parent is $V_l$ and that $V_k$ is not a causal descendant of $V_j$. Then the aggregate probability judgment respects this independence: according to Pr, $V_j$ and $V_k$ are also probabilistically independent given $V_l$ [fn 24]. The reason is that, so long as a "reasonable" causal judgment aggregation rule is used at the first stage of our two-stage process, we will have arrived at an aggregate causal relevance relation that reflects the unanimous opinion on the causal relations between $V_j, V_k, V_l$; the second stage then leads to a probability function that is compatible with this aggregate causal relevance relation [fn 25].

Footnote 21. Formally, $\Pr(v_1, v_3 \mid v_2) = \Pr(v_1 \mid v_2)\,\Pr(v_3 \mid v_2)$.

Footnote 22. Formally, $\Pr(v_1, v_4 \mid v_2, v_3) = \Pr(v_1 \mid v_2, v_3)\,\Pr(v_4 \mid v_2, v_3)$.

Footnote 23. Formally, $\mathrm{Pr}_i(v_j, v_k \mid v_l) = \mathrm{Pr}_i(v_j \mid v_l)\,\mathrm{Pr}_i(v_k \mid v_l)$.

Footnote 24. Formally, $\Pr(v_j, v_k \mid v_l) = \Pr(v_j \mid v_l)\,\Pr(v_k \mid v_l)$.

Variable expert weights. In contrast to one-stage linear or geometric pooling of probabilities, our approach is compatible with the assignment of different weights to different experts' judgments so as to reflect their different levels of competence on the relevant issues. Once the consensus DAG for the causes of famine is given, for instance, greatest weight can be assigned to the climatologist's judgment in the aggregate probability for rainfall ($\Pr(v_1)$), to the agriculturalist's judgment in the aggregate conditional probability for crop yield, given a level of rainfall ($\Pr(v_2 \mid v_1)$), and to the political scientist's judgment for the aggregate conditional probability for political conflict, given crop yields ($\Pr(v_3 \mid v_2)$).
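The idea of factor-specific expert weights can be sketched as follows. This is only an illustration: linear pooling is used as one possible method, and all weights, probability judgments, and names (`WEIGHTS`, `JUDGMENTS`, `linear_pool`) are hypothetical values chosen for the example rather than anything proposed in the text.

```python
def linear_pool(probs, weights):
    """Weighted arithmetic mean; the weights are expected to sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * p for w, p in zip(weights, probs))


# Hypothetical weights for (climatologist, agriculturalist, political
# scientist), chosen separately for each factor of the consensus DAG.
WEIGHTS = {
    "rainfall": (0.8, 0.1, 0.1),
    "crop_yield_given_rainfall": (0.2, 0.7, 0.1),
    "conflict_given_crop_yield": (0.1, 0.2, 0.7),
}

# Hypothetical individual judgments of one probability per factor, e.g.
# Pr_i(high rainfall), Pr_i(high yield | high rainfall),
# Pr_i(high conflict | low yield).
JUDGMENTS = {
    "rainfall": (0.30, 0.50, 0.40),
    "crop_yield_given_rainfall": (0.90, 0.70, 0.60),
    "conflict_given_crop_yield": (0.50, 0.60, 0.80),
}

pooled = {f: linear_pool(JUDGMENTS[f], WEIGHTS[f]) for f in WEIGHTS}
for factor, value in pooled.items():
    print(factor, round(value, 3))
```

Each pooled factor is pulled towards the judgment of the expert weighted most heavily on that factor, while the other experts' opinions still contribute.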
In the limit, an aggregate judgment on the probability of famine might be constructed using only the consensus DAG and the judgments of the relevant expert on each variable. But as the literature on epistemic democracy shows, there can be advantages to consulting a range of opinions provided that all who are consulted are sufficiently competent. The two-stage method can instead be used to optimise the balance between competence and diversity of opinion by suitable assignment of weights in the aggregation of probabilities for each variable.

Footnote 25. Note that unanimously held conditional independencies that are not causal (i.e., which are not implied by the structure of the DAG, together with the Markov condition) are not generally preserved under our approach. However, in the important special case in which all individuals hold the same DAG (i.e., the causal structure is not in dispute) and satisfy faithfulness in relation to their probability judgments, there will not be any unanimously held independencies between variables (conditional or unconditional) that are not implied by the DAG, and hence all such unanimous independencies will be preserved in the aggregation (assuming the unanimous DAG is also the aggregate DAG). We are grateful to an anonymous referee for pressing this point.

Complexity reduction. Our two-stage approach subdivides an m-dimensional probability aggregation problem into several one-dimensional ones. Rather than aggregating joint probability functions over the vector $V_1, \ldots, V_m$ (of the form (4)), we aggregate conditional probability functions of a single variable $V_j$ (of the form (5)). But we face several such aggregation problems, namely one for each variable $V_j$ and each fixed combination of parent values $\mathrm{pa}(V_j)$. This is less demanding on the side of individual inputs, as long as the aggregate DAG is not too rich in causal connections.

To illustrate this complexity reduction, consider our famine example again, and suppose for simplicity that each variable can take only two values, i.e., there are only two levels of rainfall, two levels of crop yield, and so on. If we were to aggregate the joint probability functions $\mathrm{Pr}_i(v_1, v_2, v_3, v_4)$ directly, each individual would have to submit $2^4 - 1 = 15$ probability values (there are $2^4$ possible combinations of values $(v_1, v_2, v_3, v_4)$, but once the probabilities of $2^4 - 1$ of them are specified, the remaining probability is given by one minus the sum of the rest). Specifying any one of these 15 probabilities is hard in practice: what, for example, is the probability of a combination of high rainfall and low crop yield and low political conflict and high famine? Under our approach, by contrast, each individual has to submit only probabilities or conditional probabilities of singular events, like the probability of high rainfall or the conditional probability of high crop yield given low rainfall. The number of required probabilities is smaller than 15 in our example. Using (6), we can see that it equals

\[
\sum_{j=1}^{4} (\text{number of possible values of } V_j \text{ minus } 1) \times (\text{number of possible parent values } \mathrm{pa}(V_j)) = (2-1) \times 1 + (2-1) \times 2 + (2-1) \times 2 + (2-1) \times 2^2 = 1 + 2 + 2 + 4 = 9.
\]

Types of informational input. Our approach not only reduces the complexity of the aggregation problem; it also uses a different informational input, compared to one-stage probability aggregation. First, we use the additional information of the individuals' qualitative causal judgments (the information aggregated at the first stage of our two-stage process). Second, an interesting question arises about the nature of the probabilistic input used at the second stage.
Consider a variable $V_j$ with parents $\mathrm{PA}(V_j)$ in the aggregate causal relevance relation (DAG). Since that relation is the result of the aggregation of individual causal relevance relations, some individuals may not agree that the variables listed in $\mathrm{PA}(V_j)$ are the correct causal parents of $V_j$. They may think instead that not all of these variables are causally relevant to $V_j$ or that some other variables are relevant, despite not being included in $\mathrm{PA}(V_j)$. But then, what does such an individual's conditional probability $\mathrm{Pr}_i(v_j \mid \mathrm{pa}(V_j))$, the informational input at the second stage, represent? For instance, individual 1's causal relevance relation may be of the form $V_1 \to V_2 \to V_3$, while all other individuals' causal relevance relations may be of the form $V_1 \to V_2 \leftarrow V_3$, which might then also become the aggregate relation. Here, individual 1 disagrees with everyone else about both $\mathrm{PA}(V_2)$ and $\mathrm{PA}(V_3)$. How should we interpret individual 1's conditional probabilities $\mathrm{Pr}_1(v_2 \mid \mathrm{pa}(V_2))$ and $\mathrm{Pr}_1(v_3 \mid \mathrm{pa}(V_3))$ at the second stage of our two-stage aggregation process? Similarly, what is someone supposed to answer to the question "how probable is high political conflict given low crop yield?" if he or she actually thinks that famine rather than crop yield is causally relevant to political conflict?

There are at least three possible interpretations of an individual's conditional probabilities in such cases: an evidential, a causal, and a hypothetical one. We begin with a discussion of the first two interpretations. To give an informal example, suppose for a moment that, according to individual i's qualitative causal judgment, the variables in $\mathrm{PA}(V_j)$ are not causally relevant to $V_j$ but nonetheless probabilistically correlated with $V_j$.
Then, if $\mathrm{Pr}_i(v_j \mid \mathrm{pa}(V_j))$ represents an evidential conditional probability, its value is sensitive to $\mathrm{pa}(V_j)$ (by probabilistic dependence), whereas if it is understood as a causal conditional probability, its value does not depend on $\mathrm{pa}(V_j)$ (by causal independence). More generally, an evidential conditional probability represents an agent's belief, given a particular evidential supposition (here the supposition that the values of the variables in $\mathrm{PA}(V_j)$ are $\mathrm{pa}(V_j)$). A causal conditional probability represents an agent's belief, given a particular counterfactual supposition (its content again being that the values of the variables in $\mathrm{PA}(V_j)$ are $\mathrm{pa}(V_j)$). This causal conditional probability can be understood as resulting from supposing an external intervention in our system that sets the values of the variables $\mathrm{PA}(V_j)$ to $\mathrm{pa}(V_j)$. The two kinds of conditional probability take the same value if $\mathrm{PA}(V_j)$ consists of the correct causal parents according to individual i's qualitative causal judgment, but may differ in general.

Formally, in the evidential case, $\mathrm{Pr}_i(v_j \mid \mathrm{pa}(V_j))$ is a standard conditional probability, which can be derived from individual i's joint probability function over the variables using Bayes's rule [fn 26]. In the causal case, $\mathrm{Pr}_i(v_j \mid \mathrm{pa}(V_j))$ can be calculated as follows (and is sometimes denoted $\mathrm{Pr}_i(v_j \,\|\, \mathrm{pa}(V_j))$ or $\mathrm{Pr}_i(v_j \backslash \mathrm{pa}(V_j))$ to mark the difference; see also Pearl [2000]).

(i) Modify individual i's causal relevance relation by deleting relevance links from any variable to any of the variables in $\mathrm{PA}(V_j)$. So, the variables in $\mathrm{PA}(V_j)$ have no parents left (intuitively, they are set by an external intervention).

(ii) Modify the probability assignment to the variables $\mathrm{PA}(V_j)$ by letting them take the values $\mathrm{pa}(V_j)$ with probability one (unconditionally, since these variables no longer have any parents).

(iii) Relative to this new "post-intervention" Bayesian network, compute the probability that $V_j$ takes the value $v_j$ in the usual way.
This probability then coincides with the causally understood conditional probability $\mathrm{Pr}_i(v_j \mid \mathrm{pa}(V_j))$ (= $\mathrm{Pr}_i(v_j \,\|\, \mathrm{pa}(V_j))$) of the initial Bayesian network [fn 27].

Footnote 26. Provided that $\Pr(\mathrm{pa}(V_j)) \neq 0$.

Footnote 27. To be precise, this causal conditional probability measures the possibly indirect causal effect of the variables $\mathrm{PA}(V_j)$ on $V_j$, according to individual i's judgment. There may be such an effect even if none of the variables in $\mathrm{PA}(V_j)$ are directly causally relevant to $V_j$ according to individual i's DAG, since $V_j$ may depend on these variables indirectly. Note that $\mathrm{PA}(V_j)$ contains the parents of $V_j$ according to the aggregate DAG; these need not be parents of $V_j$ according to individual i's DAG. If we wanted to define a direct causal conditional probability of $v_j$, given $\mathrm{pa}(V_j)$, according to individual i's DAG, we would have to re-do the calculation described in steps (i) to (iii) with the set of variables $\mathrm{PA}(V_j)$ replaced by its subset consisting only of variables that are also parents of $V_j$ according to i's DAG. This subset may be empty, in which case the direct causal conditional probability of $v_j$, given $\mathrm{pa}(V_j)$, coincides with the unconditional probability of $v_j$.

Let us now turn to the third possible interpretation of the individuals' conditional probabilities submitted at the second stage of our two-stage aggregation process: the hypothetical interpretation. Here, individuals are asked to entertain the hypothesis that the aggregate causal relevance relation is correct and to express conditional probabilities based on this hypothesis. It is unclear, however, whether and how $\mathrm{Pr}_i(v_j \mid \mathrm{pa}(V_j))$ can be derived from the individuals' Bayesian networks. This raises a number of challenges for future research.

6 A final challenge

The first stage of our two-stage approach restricts the second by requiring the aggregate probability function to display certain conditional independencies mandated by the aggregate causal relevance relation. Roughly, the fewer causal links are accepted at the first stage, the more probabilistic independencies are enforced at the second stage. In the extreme case in which no variable is deemed causally relevant to any other variable, the second stage produces an aggregate probability judgment according to which every variable is probabilistically independent of every other. Accepting few causal connections has the advantage of reducing the complexity of the probability aggregation problem at the second stage but the potential disadvantage of over-restricting the admissible probability assignments. This restriction is problematic when the sparse set of accepted causal links between variables is not a result of the individuals believing in sparse causal links but a result of a causal judgment aggregation rule setting a high threshold for the acceptance of causal links.

We are thus faced with a trade-off between (i) the goal of reducing the complexity of the probability aggregation problem (achieved via a high threshold for accepting causal links between variables) and (ii) the goal of representing causal effects between variables when there are such effects (achieved via a low threshold for accepting causal links). We have argued that a high threshold for accepting causal links may help to prevent a cyclical aggregate causal relevance relation, whereas in other situations, particularly if the variables can be put into a temporal order, even a low threshold (perhaps lower than the majority threshold) guarantees acyclicity. We leave it as a challenge for future research to come up with causal judgment aggregation rules that perform well on both aspects of this trade-off: being neither too permissive nor too restrictive in accepting causal links while avoiding cyclical causal judgments.

References

[1980] Aczél, J., and C. Wagner. 1980. "A characterization of weighted arithmetic means." SIAM Journal on Algebraic and Discrete Methods 1: 259-260.

[2007] Bradley, R. 2007. "Reaching a Consensus." Social Choice and Welfare 29(4): 609-632.

[2007] Dietrich, F. 2007. "A generalised model of judgment aggregation." Social Choice and Welfare 28(4): 529-565.

[2007] Dietrich, F., and C. List. 2007. "Judgment aggregation by quota rules: majority voting generalized." Journal of Theoretical Politics 19(4): 391-424.

[2008] Dietrich, F., and C. List. 2008. "A liberal paradox for judgment aggregation." Social Choice and Welfare 31(1): 59-78.

[2010] Dietrich, F., and C. List. 2010. "The impossibility of unbiased judgment aggregation." Theory and Decision 68(3): 281-299.

[2010] Dietrich, F., and C. List. 2010. "Majority voting on restricted domains." Journal of Economic Theory 145(2): 441-466.

[2007] Dietrich, F., and C. List. 2007. "Opinion pooling on general agendas." Working paper, London School of Economics.

[1984] Genest, C., and K. Wagner. 1984. "Further Evidence against Independence Preservation in Expert judgment Synthesis." Technical Report 84-10, Dept. of Statistics and Actuarial Science, University of Waterloo.

[1986] Genest, C., and J. V. Zidek. 1986. "Combining Probability Distributions: A Critique and Annotated Bibliography." Statistical Science 1(1): 113-135.

[1981] Lehrer, K., and C. Wagner. 1981. Rational Consensus in Science and Society, Dordrecht: Reidel.

[2001] List, C. 2001. Mission Impossible? The Problem of Democratic Aggregation in the Face of Arrow's Theorem. DPhil thesis, Oxford University.

[2002] List, C., and P. Pettit. 2002. "Aggregating sets of judgments: an impossibility result." Economics and Philosophy 18(1): 89-110.

[2004] List, C., and P. Pettit. 2004. "Aggregating sets of judgments: two impossibility results compared." Synthese 140(1-2): 207-235.

[2003] List, C. 2003. "A Possibility Theorem on Aggregation over Multiple Propositions." Mathematical Social Sciences 45(1): 1-13; see also the corrigendum in Mathematical Social Sciences 52(1): 109-110 (2006).

[2004] List, C. 2004. "A Model of Path-Dependence in Decisions over Multiple Propositions." American Political Science Review 98(3): 495-513.

[2012] List, C. 2012. "The theory of judgment aggregation: An introductory review." Synthese 187(1): 179-207.

[2009] List, C., and C. Puppe. 2009. "Judgment aggregation: a survey." In Oxford Handbook of Rational and Social Choice, ed. P. Anand, C. Puppe, and P. Pattanaik, 457-482, Oxford: Oxford University Press.

[1981] McConway, K. J. 1981. "Marginalization and Linear Opinion Pools." Journal of the American Statistical Association 76(374): 410-414.

[2009] Miller, M. K., and D. Osherson. 2009. "Methods for distance-based judgment aggregation." Social Choice and Welfare 32(4): 575-601.

[2006] Pauly, M., and M. van Hees. 2006. "Logical constraints on judgment aggregation." Journal of Philosophical Logic 35(6): 569-585.

[2000] Pearl, J. 2000. Causality: Models, Reasoning and Inference. Cambridge: Cambridge University Press.

[2006] Pigozzi, G. 2006. "Belief merging and the discursive dilemma: an argument-based account to paradoxes in judgment aggregation." Synthese 152(2): 285-298.

[1990] Glymour, C., P. Spirtes, and R. Scheines. 1990. "Independence Relations Produced by Parameter Values in Causal Models." Philosophical Topics 18(2): 55-70.

[2000] Spirtes, P., C. Glymour, and R. Scheines. 2000. Causation, Prediction and Search, 2nd ed., Cambridge MA: MIT Press.

[1982] Wagner, C. 1982. "Allocation, Lehrer Models, and the Consensus of Probabilities." Theory and Decision 14(2): 207-220.

[1985] Wagner, C. 1985. "On the Formal Properties of Weighted Averaging as a Method of Aggregation." Synthese 62(1): 97-108.