Cosmic Confusions: Not Supporting versus Supporting Not-

John D. Norton1
Department of History and Philosophy of Science
Center for Philosophy of Science
University of Pittsburgh
For updates, see www.pitt.edu/~jdnorton

1 For helpful discussion, I thank Jeremy Butterfield, Eric Hatleback, Wayne Myrvold and participants at the conference “Philosophy of Cosmology: Characterising Science and Beyond,” St. Anne’s College, Oxford, September 20-22, 2009.

Bayesian probabilistic explication of inductive inference conflates neutrality of supporting evidence for some hypothesis H (“not supporting H”) with disfavoring evidence (“supporting not-H”). This expressive inadequacy leads to spurious results that are artifacts of a poor choice of inductive logic. I illustrate how such artifacts have arisen in simple inductive inferences in cosmology. In the inductive disjunctive fallacy, neutral support for many possibilities is spuriously converted into strong support for their disjunction. The Bayesian “doomsday argument” is shown to rely entirely on a similar artifact, for the result disappears in a reanalysis that employs fragments of inductive logic able to represent evidential neutrality. Finally, the mere supposition of a multiverse is not yet enough to warrant the introduction of probabilities without some factual analog of a randomizer over the multiverses.

1. Introduction

One cannot have any doubt of the many successes of the Bayesian project of explicating inductive inferences. Its successes have been widely and justly celebrated. What has received less attention are the limits of these successes. The purpose of this note is to describe one circumstance in which Bayesian analysis fails. This is the extreme case of complete neutrality of evidential support. The Bayesian system is unable to distinguish it cleanly from strongly disfavoring evidence. The system tries to represent this complete neutrality with a broadly spread probability measure that ends up assigning a very low probability to each possibility. The trouble is that this same very low value of probability is correctly used when that same possibility is strongly disfavored by the evidence or, equivalently, its negation is strongly favored. In short, for an hypothesis H, a Bayesian analysis conflates the cases of evidence not supporting H with evidence supporting not-H.

If one insists that probabilistic notions should be used in cases of evidential neutrality, one ends up assigning neutral support the formal properties of evidential disfavoring. Since evidential neutrality warrants fewer definite conclusions than does evidential disfavor, this conflation leads to spurious conclusions that are merely artifacts of a poor choice of inductive logic. My contention in this paper is that this conflation of neutral and disfavoring evidence has occurred repeatedly in philosophical and physical analyses in cosmology. Since cosmology often deals with problems of universal scope for which evidence is meager, it is rich in cases of neutral support and thus especially prone to the confusion. My purpose in this note is to elaborate the difference between neutral and disfavoring evidence, to show how non-probabilistic formal tools may be used to represent completely neutral evidential support, and to give examples of the conflation of neutral and disfavoring evidence in cosmology.
In the following, Section 2 will develop a simple example of neutral evidential support in cosmology in order to fix the notion more clearly. Section 3 will investigate how this neutrality can be represented formally. It will be argued that a probability measure represents degrees of favoring and disfavoring, but does not capture neutrality. Rather an inherently non-additive representation must be used for completely neutral support. Section 4 will show that misdescription of neutrality of support by a probability measure leads to the “inductive disjunctive fallacy,” in which disjunctions of neutrally supported possibilities are mistakenly judged as strongly supported. Illustrations in the literature include van Inwagen’s argument for why there is very probably something rather than nothing. Section 5 will sketch how the non-additive representation of completely neutral support can be incorporated into an alternative inductive logic. Section 6 will show that the implausible results of the Bayesian “doomsday argument” arise as an artifact of the inability of the Bayesian system to represent neutral evidential support. A reanalysis in an inductive logic that can express evidential neutrality no longer returns the implausible results. Section 7 will review how probabilistic representations can properly be introduced into cosmology. An ensemble provided by a multiverse is not enough. What is needed are some facts that specifically warrant probabilities. The difficulties of the “self-sampling assumption” arise because there are no such facts. Finally, the concluding Section 8 will suggest that mainstream cosmological theorizing is at risk of committing the same fallacies as sketched in earlier sections.

2. A Cosmological Case of Neutral Support

A clear example of complete neutrality of support in cosmology arises in a more extreme version of multiverse theory. There we may postulate other universes, disconnected from ours, but in which the same fundamental laws of physics obtain. In these other universes, the fundamental constants like h, c, G and the parameters of the standard model of particle physics have different values, but our supposition is that we have no indication at all of what those values might be. Even so, we can still know a lot about these other universes. Except in degenerate cases, they will admit wavelike propagations of electromagnetic radiation. If the various fundamental forces are appropriately balanced, they will harbor chemical elements like our own, with characteristic quantized atomic spectra. But what can we say of the values of the fundamental constants themselves? Our evidence tells us nothing. We have no reason at all to favor one set of values of Planck’s constant h over any other. The evidence is neutral.2

2 Comparing fundamental constants across universes requires that we also compare the units of measurement used. Readers who wish to avoid these complications should replicate the arguments of this paper using dimensionless quantities, such as the fine structure constant.

This case is to be distinguished from another multiverse theory in which we have disfavoring evidence for the same parameter. In this other multiverse theory, new universes are born from singularities through stochastic processes whose governing law, we shall suppose, provides a broadly spread probability distribution over the possible values of h. In this case, that h lies in any small interval of values is very improbable; our background evidence disfavors that small interval.
Correspondingly, we have strong evidence that the actualized value of h lies outside this interval. In the first multiverse theory, we simply have no support for the value of h to be in or not to be in some particular small interval of values. In the second, it is improbable that h lies in some small interval and probable that it lies outside it. We should not conflate the two cases. Should we try to represent the neutrality of the first theory by assigning a low probability to h lying in the interval, we have contradicted that neutrality. For that assignment forces a high probability on h lying outside the interval, an outcome for which we must now assign strong support. That high probability and the resulting near certainty is a spurious artifact of the use of the wrong inductive logic. It is the support that would arise in the second theory, in which the evidence strongly disfavors the small interval and thus strongly favors values outside that interval.

3. Representing Neutral Evidential Support

3.1 The Failure of a Probabilistic Representation

If a probability measure is able to represent degrees of evidential support at all, then a probability P(H|E) near unity must represent the case of evidence E providing strong support for the hypothesis H. It immediately follows from the additivity of probability measures

P(H|E) + P(not-H|E) = 1

that P(not-H|E) is close to zero. Since E favors H just to the extent that it disfavors the negation not-H, we must now conclude that, when P(not-H|E) is close to zero, evidence E strongly disfavors not-H. Reversing H and not-H, we can now conclude that, when P(H|E) is close to zero, evidence E strongly disfavors H. More generally, if there are n mutually exclusive and exhaustive outcomes A1, …, An, additivity requires

P(A1|B) + P(A2|B) + … + P(An|B) = 1

or, in other words, that the measure is normalized to unity. This normalization condition means that background evidence B can favor one outcome or set of outcomes only if it disfavors others. The additivity of probabilities is the mathematical expression of the complementary relationship of support and disfavoring.3 It leaves no place in the representation for neutrality. The standard device of representing neutrality with a broadly spread probability distribution merely assigns a very low probability to each possible outcome; that is the case of evidential disfavor, not neutrality.4

3 Conversely it has been argued (Norton 2007, Section 4.1) that the presumption that the range of values of degrees of support spans favoring to disfavoring leads us directly to an additive measure.

4 What of the popular device of representing neutrality by sets of probability measures? It has been argued in Norton (2007, Section 4.2; 2007a, Section 6) that this device fails for several reasons. The most serious is that it is an attempt to simulate an inherently non-additive logic with an additive measure, rather than to seek the logic directly.
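A worked example, with numbers chosen purely for illustration, may make the conflation vivid. Suppose we try to express neutrality over the value of h by a probability distribution spread uniformly over an interval of possible values, say 0 to 100 in some chosen unit. Then, for any subinterval of width 1, P(h lies in the subinterval | B) = 0.01 while P(h lies outside it | B) = 0.99. These are just the values we would assign if the background evidence strongly disfavored each such small subinterval and strongly favored its complement. The spread-out distribution has not expressed neutrality; it has merely attached disfavor to every small set of values.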
3.2 Representing Evidential Neutrality

How are we to represent evidential neutrality? The difficulty for the most general case is that the full spectrum of evidential support cannot simply be represented by the degrees of a one-dimensional continuum, such as the reals in [0,1]. The full spectrum forms a multi-dimensional space with, loosely speaking, disfavoring and neutrality proceeding in different directions. I know of no adequate theoretical representation of this space. However we can discern what a small portion of it looks like. Write [A|B] for the inductive support that proposition A is accorded by proposition B. The use of a new notation reminds us that these degrees of support need not be probabilities. Let us take the case of complete evidential neutrality. This extreme case can be captured by an essentially non-additive representation. The support accorded any contingent proposition A by the background B is just one fixed value that we write “I” (for indifference or ignorance) that figures in the distribution:

(CNS) Completely neutral support
[T|B] = 1 for all propositions T deductively entailed by B
[A|B] = I for all contingent propositions A
[F|B] = 0 for all propositions F that logically contradict B

The 1 and 0 of the two extreme cases are less interesting; this is merely the assigning of extreme values to propositions we know deductively to be true or false given B. The interesting part is that all contingent propositions, whose truth values are left undecided by B, are accorded the same neutral value I.

The quantity [A|B] of (CNS) should not be confused with other quantities that arise in Bayesian analyses.5 This quantity expresses the total support accorded to an outcome A by the background B. It is a function of two propositions, A and B, only. It is distinct from a relation of differential or incremental support: the support accorded A specifically by evidence E in the context of background B. This is a tertiary function of three propositions, A, E and B. In a Bayesian analysis, it is measured by comparing the posteriors and priors, P(A|E&B) and P(A|B), such as through a difference measure: P(A|E&B) - P(A|B); and the analysis may seek to express neutrality through the probabilistic independence of E and A when they are conditioned on the background B. My concern is not the differential evidential import of E, but that a probabilistic prior such as P(A|B) must fail to capture total neutrality of support.

5 I am grateful to Jonah Schupbach for raising this issue.

That (CNS) is the appropriate representation of completely neutral support has been argued at length in Norton (2008).6 I refer readers to it for a formally precise development. In the discussion below, I shall indicate informally how the result comes about. It comes from two independent invariance conditions, each of which yields the same outcome.

6 In Norton (2008), I describe neutral support as an “ignorance” distribution. In using that description, regrettably I succumbed to the subjective Bayesian’s insistence that inductive logic is really about degrees of belief, whereas I now think we must insist that it is about objective degrees of support, as do objective Bayesians. The intrusion of opinion must be resisted since it corrupts evidential relations of support and obscures the limits of applicability of Bayesianism.
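The formal difference between the two representations can be displayed in a small sketch. The following Python fragment is merely illustrative: the discretization of the possible values of h into a finite number of cells and the function names are conveniences of the sketch, not part of the argument above.

# Contrast a broadly spread probability measure with the completely neutral
# support (CNS) of the main text, over an illustrative discretization of the
# possible values of h into n cells.
n = 1000
cells = set(range(n))

def prob(subset):
    # Additive measure spread uniformly: P(subset|B) = |subset| / n.
    return len(set(subset)) / n

I = "I"
def cns(subset):
    # Non-additive assignment of (CNS): 1 for the proposition entailed by B,
    # 0 for the proposition contradicting B, and the same value I for every
    # contingent proposition.
    s = set(subset)
    if s == cells:
        return 1
    if not s:
        return 0
    return I

one_cell = {0}
print(prob(one_cell), prob(cells - one_cell))  # 0.001 and 0.999: disfavor and near-certainty
print(cns(one_cell), cns(cells - one_cell))    # I and I: neutrality, with nothing forced onto the complement

The probabilistic assignment reproduces the pattern of disfavoring; the (CNS) assignment accords the one neutral value to a small set of values and to its complement alike, which no additive measure can do.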
3.2.1 Invariance under Redescription (and the Principle of Indifference)

The principle of indifference asserts that, if the evidence bears equally on two outcomes, then the support accorded each by the evidence should be the same. This principle is so weak as to border on truism. It does have some strong consequences, however, if we allow that indifference and the resulting equality of evidential import persist when the outcome space is redescribed. The invariance of this indifference leads directly to (CNS). Take the multiverse example of Section 2. The evidence is completely neutral over different values of h. As a result it supports equally that h is in each of the intervals […] and in the complementary interval […]

[…]

The evidence is that the end, T, must come after t. Call this E. It now follows that the hypothesis of any T greater than t entails the evidence. Hence we can use the rule of conditionalization of Section 4 and infer that

[T1|E&B] = [T2|E&B] = I

That is, knowing that the end, T, must come after t, gives us no basis for discriminating among different end times T1 and T2. What should we do if we do want to incorporate the further information that some specific t is observed? A return to the Bayesian analysis will show us a way to proceed.

6.3 The Bayesian Analysis Again

The Bayesian analysis of Section 6.1 is only a fragment of a fuller Bayesian analysis. When we explore that fuller analysis, we find that the Bayesian analysis fails. Where it founders is on a requirement that the analysis should be insensitive to the units used to measure time. To see how this comes about, consider the posterior probability, as delivered by Bayes’ theorem:

p(T|t&B) = p(t|T&B) · p(T|B) / p(t|B) = (1/T) · p(T|B) / p(t|B)    for T > t.

What seems unknowable is the ratio of priors p(T|B)/p(t|B). It turns out, however, that the ratio must be a constant, independent of T (but not necessarily independent of t). This follows from the requirement that the analysis proceeds the same way no matter what system of units we use, whether we measure time in days or years. To assume otherwise would not be unreasonable. If, for example, the process is the life span of an oak tree, we know that its average life span is 400-500 years. With this time scale information in hand, we should expect a very different analysis of the time to death if our datum is that the oak is 100 days old or 100 years old. However that is a different problem; the doomsday problem as posed provides no information on the time scale and no grounds to analyze differently according to the unit used to measure time.

To proceed, we assume that there is a single probability density p(.|.) appropriate to the analysis, so that the problem is soluble at all; and, to capture the condition of independence from units of time, we assume that the same probability density p(.|.) is used whichever unit is used to measure time. This entails that the probability density p(.|.) is invariant under a linear rescaling of the times t and T (that, for example, corresponds to changing measurements in years to measurements in days):

t’ = At    T’ = AT

This is a familiar condition applied standardly to prior probability densities that are functions of some dimensioned quantity T. Such a probability density, it turns out, must be the “Jeffreys prior,” which is:13

p(T|t&B) = C(t)/T    for T > t

where C(t) is a constant, independent of T. The difficulty with this probability density in T is that it cannot be normalized to unity. The summed probability over all time T diverges:

∫_{T=t}^{∞} p(T|t&B) dT = ∫_{T=t}^{∞} C(t)/T dT = ∞

13 See, for example, Jaynes (2003, 382). The probability assigned to the small interval dT must be unchanged when we change units. That is: p(T|t&B)dT = p(T’|t’&B)dT’. Since T’ = AT, we have dT’/dT = A = T’/T, so that p(T|t&B)·T = p(T’|t’&B)·T’, from which the Jeffreys prior follows immediately.
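A quick numerical check of the invariance claim in footnote 13 may be helpful. The sketch below is only illustrative; the constant C(t) is set to 1 and the rescaling factor is an arbitrary stand-in for a change of units.

from math import log

def interval_weight(T1, T2, C=1.0):
    # Integral of C/T dT from T1 to T2 (with T2 > T1 > 0): the weight the
    # density C/T assigns to the interval of end times (T1, T2).
    return C * log(T2 / T1)

A = 365.25  # illustrative rescaling, e.g. from years to days
print(interval_weight(2.0, 10.0))          # interval expressed in one unit
print(interval_weight(2.0 * A, 10.0 * A))  # the same interval after rescaling: the same value

Because the weight depends only on the ratio T2/T1, it is unchanged under the rescaling t’ = At, T’ = AT; this is the sense in which the 1/T density is insensitive to the unit used to measure time.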
The Bayesian literature has learned to accommodate such improper behavior in prior probability distributions. The key requirement is that, on conditionalization, the improper prior probability distribution must return a normalizable posterior probability distribution. Here, however, the improper distribution is already the posterior distribution. So the failure is not merely a familiar failure of the Bayesian analysis to provide a suitable prior probability; it is its failure to be able to express a distribution of support over different times independent of units of measure.

The failure of normalization of probability is not easily accommodated. It immediately breaks connections with frequencies. While we may posit that ratios of the finite-valued probabilities are approximated by ratios of frequencies of the corresponding outcomes in the usual way, there is no comparable accommodation for outcomes with infinite probability. Their ratios are ill-defined. We may wish to proceed nonetheless, interpreting the unnormalized probabilities just as degrees of support in some variant inductive logic. The result is curious. Consider the degree of support assigned to the set of end times T in any finite interval T1 to T2:

P(T1 < T < T2) = ∫_{T1}^{T2} p(T|t&B) dT = ∫_{T1}^{T2} C(t)/T dT = C(t)·ln(T2/T1), which is finite.

The degree assigned to the set of end times greater than some nominated T2 is

P(T > T2) = ∫_{T2}^{∞} p(T|t&B) dT = ∫_{T2}^{∞} C(t)/T dT = ∞.

As a result, a finite degree is assigned to any finite interval of times; and, no matter how big a finite interval we take, an infinite degree is always assigned to the set of times that comes after. Since support must follow the infinite degree, all support is accrued by arbitrarily late times. No matter how large we take T2 to be, all support must be located on the proposition that the end time T comes after it. The standard doomsday argument assures us that, on a pairwise comparison, more support is accrued by the earlier time for doom. This extended analysis agrees with that. It adds, however, that, when we consider the support accrued by intervals of times, maximum possible support shifts to the latest possible times.
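The shift of support to arbitrarily late times can be illustrated with the same density. In the sketch below, C(t) is again set to 1 for convenience; only the comparison of magnitudes matters.

from math import log

C = 1.0  # stands in for C(t); its actual value does not affect the comparison

def degree(T1, T2):
    # Degree assigned by the density C/T to end times in the interval (T1, T2).
    return C * log(T2 / T1)

print(degree(2.0, 100.0))        # any finite interval receives a finite degree
for T_max in (1e3, 1e6, 1e12):
    print(degree(100.0, T_max))  # the degree beyond T2 = 100 grows without bound as T_max grows

However large the finite interval, the degree assigned to the times beyond it can be made larger still, which is the formal counterpart of the claim that maximum support migrates to the latest possible times.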
6.4 A Richer Analysis

The analysis of the last section shows two things: the unsustainability of the Bayesian analysis and the power of invariance requirements. Here is a way that invariance requirements can be used in a non-Bayesian analysis. We seek the degree of support [T1, T2|t] for an end time in the interval T1 to T2 given by the observation that the process has progressed to time t. We assume both T1 and T2 are greater than t.

The Bayesian analysis of Section 6.1 required that we know which of all possible clocks is the correct one in the sense that the likelihood of our observation is uniformly distributed over its time scale. Of course it is virtually impossible to know which is the right one. We somehow need to judge how the cosmos is distributing our moments of consciousness as observers. Are they distributed uniformly in time? Are they distributed uniformly over the volumes of expanding space? Are they distributed uniformly over all people; or weighted according to how long each person lives? Are they distributed uniformly over all people, or over all people and primates with advanced cognitive functions? Or is the distribution weighted to favor beings according to the degree of advancement of their cognitive functions? Let us presume that there is such a preferred clock in this analysis as well. In addition, we assume that we have no idea from our background knowledge which is the correct clock. As a result, we must treat all clocks the same. This condition is an invariance condition. The degrees of support assigned to various intervals of time must be unchanged as we rescale the clocks used to label the times.

A consequence of this invariance is that the degrees of support assigned to all finite intervals must be the same; that is, for any T2 > T1 > t and any other T4 > T3 > t, we will have14

[T1, T2|t] = [T3, T4|t] = I

This will still be the case if either interval is a proper subinterval of the other. In this regard, after conditionalization on t, we have a distribution with the properties of completely neutral support. For this reason, I give the single universal value the symbol “I”, as before.

14 To see this, consider any monotonic rescaling f of the clock with the properties: t’ = f(t) = t; T1’ = f(T1) = T3; and T2’ = f(T2) = T4. Since we have only relabeled the times, the degrees of support must be unchanged so that [T1, T2|t] = [T1’, T2’|t’]’ = [T3, T4|t]’. The prime on [.,.|.]’ indicates that we are using the rule for computing degrees of support pertinent to the rescaled clock. The invariance, however, tells us that both original and rescaled systems use the same rule, so that the two functions [.,.|.] and [.,.|.]’ are the same. Hence [T1, T2|t] = [T3, T4|t] as claimed.

That is, contrary to Bayesian analysis, learning that t has passed does not invest us with oracular powers of prognostication. On that evidence, we have no reason to prefer any finite time interval in the future over any other.15

15 This result does not automatically extend to intervals open to infinity. However it is clear that a minor alteration of the analysis will return [T1, ∞|t] = [T2, ∞|t] = I* for any T1 > t and T2 > t. It is plausible that some further condition will give us the stronger [T1, ∞|t] = [T1, T2|t], so that I* = I. However I do not think invariance conditions are able to force it.

7. Bringing Back Probabilities

There are many cases in which a probabilistic logic is the right one. To know which they are, we need to find a grounding in the facts of the particular case for the probabilities of the logic.16 The simplest case arises when the system is a stochastic one governed by physical chances, such as the decay of a radioactive atom. Then it is natural to conform strengths of support to the chances, for then strengths of support will agree with frequencies of success. A widely applicable example occurs if we assume that the errors entering into the measurement of a quantity arise in a pseudo-random manner. If they are small, independent and summed in accord with the antecedent conditions of the central limit theorem of probability theory, their pseudo-randomness warrants the use of a probabilistic bell curve to model the variations in the measured quantity.17

16 The material theory of induction (Norton, 2003, 2005) is an extension of this idea. It asserts that the warrant for an inductive inference is not a universal formal template, but a locally obtaining matter of fact.

17 The facts that warrant a probabilistic analysis need not be facts about physical probabilities. Imagine that one is at a racetrack placing bets with a “Dutch bookie” and that the constellation of assumptions surrounding the Dutch book arguments obtain. (See Howson and Urbach, 2006, Ch. 3.) These facts warrant one conforming one’s inductive reasoning with the probability calculus, but only as long as these facts obtain.
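As a small illustration of the central limit theorem point in the main text of this section, the following sketch sums many small, independent, uniformly distributed errors; the particular numbers of errors and trials are arbitrary choices made only for the illustration.

import random

random.seed(0)

def noisy_measurement(true_value=10.0, n_errors=100, scale=0.01):
    # The true value plus the sum of many small, independent, zero-mean errors.
    return true_value + sum(random.uniform(-scale, scale) for _ in range(n_errors))

samples = [noisy_measurement() for _ in range(10000)]
mean = sum(samples) / len(samples)
sd = (sum((x - mean) ** 2 for x in samples) / len(samples)) ** 0.5
within_two_sd = sum(abs(x - mean) <= 2 * sd for x in samples) / len(samples)
print(round(mean, 4), round(within_two_sd, 3))  # mean near 10.0; roughly 95% of samples within two standard deviations

The roughly 95% figure is the bell-curve signature; it is the pseudo-randomness of the summed errors, not any a priori principle, that earns the probabilistic model its keep here.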
7.1 A Mere Ensemble is Not Enough

In the cosmology literature, there are efforts to use the physical facts of the cosmology to ground the assigning of probabilities to the components of a multiverse.18 This is the right way to proceed, although there is always scope for the facts invoked to fall short of what is needed. An ensemble is like a deck of cards. We do not have a probability of 1/52 for the ace of hearts when we merely have a deck of cards. We must in addition shuffle it and deal a card. Without this randomizer, merely having neutral evidential support for all cards is insufficient to induce the probabilities.

18 For other examples of such efforts, see Tegmark et al. (2006) and Weinberg (2000).

The proposal developed in Gibbons et al. (1987), Hawking and Page (1988) and Gibbons and Turok (2008) supplies an ensemble but no analog of the randomizer. It employs a Hamiltonian formulation of the cosmological theories and derives its probabilities from the naturally occurring canonical measures in them. At first this seems promising since it is reminiscent of the natural measure of the Hamiltonian formulation of ordinary statistical physics. There, the association of a probability measure with the canonical phase space volume is underwritten by some expectation of a dynamics that is, in some sense, ergodic.19 That means that the system will spend roughly equal times in equal volumes of phase space, as it explores the full extent of the phase space. This behavior functions as a randomizer. It allows us to connect frequencies of occupation of a portion of the phase space with its phase volume, so that the familiar connection between frequencies and probabilities is recoverable.

19 It is merely an expectation but not an assurance, since a formal demonstration of the sort of behavior expected remains elusive.

In the Gibbons et al. proposal, however, such ergodic-like behavior is not expected. Over time, a single model will not explore a fuller part of the model space of all possible cosmologies. Rather, the proposal is justified by the remark (p. 736):

Giving the models equal weight corresponds to adopting Laplace’s ‘principle of indifference’, which claims that in the absence of any further information, all outcomes are equally likely.

If that truly is the basis of the proposal, then its basis does not warrant the assigning of probabilities. We have seen in Section 3.2 above that application of the principle of indifference may lead to the non-probabilistic representation of completely neutral support.
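The card analogy above can be made concrete in a few lines. The sketch is merely illustrative; the point is that the frequencies it reports issue from the shuffling procedure, the factual analog of a randomizer, and not from the mere list of fifty-two cards.

import random

random.seed(1)
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "spades", "diamonds", "clubs"]
deck = [(rank, suit) for suit in suits for rank in ranks]  # the mere ensemble: 52 cards

def deal_top_card(deck):
    # The randomizer: shuffle the deck and deal its top card.
    shuffled = deck[:]
    random.shuffle(shuffled)
    return shuffled[0]

trials = 52000
hits = sum(deal_top_card(deck) == ("A", "hearts") for _ in range(trials))
print(hits / trials)  # close to 1/52, but only because the deck was shuffled on each trial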
7.2 The Self-Sampling Assumption

A similar failing arises in connection with the “self-sampling assumption” of Bostrom (2002, Ch. 4, 5 and 9; 2002a; 2007). A very large or spatially infinite universe may harbor many observers and, prior to consideration of further evidence specific to our circumstances, we might ask which of these many we are. The background evidence considered is quite neutral on the matter. So the appropriate representation is that of completely neutral evidence, as described in Section 3.2 above. That representation provides no basis for a probabilistic analysis. One can impose a probabilistic analysis on the problem by stipulation. That is the effect of the self-sampling assumption. It enjoins us as follows (2007, 433):

One should reason as if one were a random sample from the set of all observers in one’s reference class.

Since sampling is probabilistic, we assign equal probability to the outcome that we are each of the many observers. Bostrom (2002, 618) stresses that these probabilities are not an adaptation of strengths of support to physical chances. “I am not suggesting that there is a physical randomization process, a cosmic fortune wheel as it were, that assigns souls to bodies in a stochastic manner. Rather we should think of these probabilities as epistemic” (original emphasis; see also Bostrom, 2002, 57 for similar remarks). However, if the probabilities are epistemic and thus implement an inductive logic, what grounds do we have for that logic being probabilistic? Bostrom continues to explain that he regards the assumption as a “kind of restricted indifference principle.” The principle of indifference, however, does not automatically warrant probabilities, but only equalities of inductive strength. As we saw in Section 3.2 above, if the indifference is extensive enough, the principle can directly preclude a probabilistic logic. Such preclusion arises when indifference persists over redescriptions.

This proves to be a problem for the self-sampling assumption. In forming our sampling distribution, should we be indifferent over all people? Over individual minutes experienced by people? Over groups of people? Over civilizations? Each choice gives a different probability measure. Bostrom (2002, 69-72) has identified this problem as the “reference class problem” and attempts a solution in subsequent chapters (Ch. 10-11). The attempt depends on the assumption that there is a single, correct reference class to be chosen and that poor choices can be eliminated by showing that they have undesirable consequences in a probabilistic analysis. Since the relevant evidence is sufficiently weak to allow indifference to persist over multiple descriptions, both assumptions are in error: that there is a unique, correct reference class, and that probabilistic reasoning is applicable.

Finally, Bostrom (2002, 51-58; 2002a; 2007, Sect. 24.2) urges that we must employ the self-sampling assumption to save Bayesian analysis of evidence from the following problem. A standard result of Bayesianism is that a good theory is rewarded epistemically for saying that the observed outcome of an experiment is very probable, whereas a poor theory is punished when it says that the outcome is improbable. Now, a poor theory can still allow the observed outcome to occur as a highly improbable fluctuation, so that its occurrence somewhere in a very big universe is all but assured. As a result, Bostrom believes, we cannot use our observation of the experimental outcome to reward the good theory and punish the poor one in an unsupplemented Bayesian analysis. Both theories allow the observation with high probability. We must invoke the self-sampling assumption to discount the high probability from the poor theory.

If Bostrom is right that the Bayesian analysis has to be saved by an incorrect representation of the inductive import of the evidence, then that seems good reason not to use a Bayesian analysis. The inductive import of the experiments does not have to be explicated by a Bayesian analysis, but only by an inductive logic that is properly adapted to the case at hand. Sometimes, as we saw above for the doomsday argument, a non-probabilistic inductive logic is called for. In this case, however, I do not believe that the problem Bostrom outlines is a challenge to Bayesian analysis.20

20 In brief, the Bayesian needs only that the poor theory makes the outcome of our instantiation of the experiment very unlikely, whereas the good theory makes it likely. These facts are deduced within the poor and good theories and no consideration of other observers who may perform the experiment is needed.
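A toy computation, with wholly invented numbers, shows how the reference class problem undercuts the stipulated probabilities: depending on whether we are indifferent over civilizations, over people, or over person-minutes, the self-sampling assumption delivers a different probability for the very same proposition.

# Invented figures for two civilizations, A and B, used only for illustration.
civilizations = {
    "A": {"people": 1_000, "minutes_per_person": 1_000_000},
    "B": {"people": 1_000_000, "minutes_per_person": 10_000},
}

# Probability that "I am in civilization A" under three choices of reference class.
p_by_civilization = 1 / len(civilizations)

total_people = sum(c["people"] for c in civilizations.values())
p_by_person = civilizations["A"]["people"] / total_people

total_minutes = sum(c["people"] * c["minutes_per_person"] for c in civilizations.values())
p_by_minute = civilizations["A"]["people"] * civilizations["A"]["minutes_per_person"] / total_minutes

print(p_by_civilization, round(p_by_person, 4), round(p_by_minute, 4))
# 0.5, 0.001, 0.0909: one proposition, three different "equal probability" assignments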
In sum, the self-sampling assumption imposes onto the problem a probabilistic representation stronger than the one warranted by the neutrality of the evidence, thereby risking that conclusions are artifacts of a poorly chosen logic.21

21 I pass over one lingering problem arising from the choice of the wrong inductive logic: standard approaches admit no probability measure that is uniform over a countable infinity of observers.

8. Conclusion

What the above analysis shows is that there are limits to Bayesian analysis. It is unable to separate neutral evidential support from disfavoring support. If we confuse the two by using a probability measure to represent neutral evidential support, we introduce artifacts into our results that merely reflect the poor choice of inductive logic. These artifacts were illustrated in cosmology in the cases of the inductive disjunctive fallacy and the doomsday argument. These examples are simplified and removed from mainstream cosmological theorizing. They were chosen for analysis precisely because of this simplicity. It gave us enough independent perspective to be able to untangle the faulty reasoning.

Are these same problems a concern in mainstream cosmological theorizing? It takes only a cursory review of the literature to see that they are. The multiverse literature has defined the “measure problem,” which is the problem of defining an additive measure over a set of multiverses. If defining an additive measure were merely a mathematical exercise in counting, then the problem would be benign. However it is not. The measure is supposed to reflect how much we expect the various multiverses to be actualized.22 In the conditions that largely prevail, our background evidence supplies completely neutral support for the actualization of each multiverse. Therefore, following the principal argument of this paper, an additive measure is simply the wrong structure.

22 Reviewing the articles collected in Carr (2007), for example, one finds probabilities appearing in full-blown Bayesian analyses, in casual mentions and much in between. The idea that assigning these probabilities is an arbitrary and even risky project appears often in the multiverse literature. See for example Aguirre (2007), Page (2007, 422), Tegmark (2007, 121-22).

This poor choice can cause problems. Our background theories provide no grounds for various cosmic constants to take the values they do. Non-inflationary cosmology provides no reason for us to expect the curvature of the spatial slices to be as close to zero as it is. Fundamental theories simply stipulate values for basic constants like h, c and G and give no prior reason for why they should have just the very values needed to enable our form of life. There is a sense that these surprising values demand explanation. What argument can support that sense? The background theories provide no grounds for the parameters to have those specific values. That is, they provide completely neutral evidence. It is easy and common to represent that neutrality by saying that the prior probability of any particular value is very small. However the redescription of neutrality by the term “low probability” brings connotations. A low probability event in physics is commonly one that is not to be expected. If it does happen, we normally seek an explanation.
By re-expressing neutral support as low probability, we have applied the wrong inductive logic. That brings artifacts. One is an unwarranted demand for explanation. My point is not that we need no explanation for these parameter values. Rather it is that we should look elsewhere for a justification of the need for explanation.

That raises a difficult question. We cannot insist that everything needs to be explained. Such insistence triggers an unsatisfiable infinite regress. Even if we explain why the parameters have the values they do, we would then need to explain why the equations in which they figure have the form they do; and so on indefinitely. We should surely grant that some things just are the way they are and no further explanation is needed. How do we divide those things that need explanation from those that do not? My sense is that there is little intrinsic to the things that marks them as in pressing need of explanation. Rather, it is a post hoc analysis. Once we find a successor theory, inflationary or anthropic, that can explain some formerly contingent aspect of the world, then we go back and see that aspect anew as one that urgently demanded explanation.

References

Aguirre, Anthony. 2007. “Making Predictions in a Multiverse: Conundrums, Dangers, Coincidences.” In Carr 2007, 367-86.
Bostrom, Nick. 2002. Anthropic Bias: Observation Selection Effects in Science and Philosophy. New York: Routledge.
Bostrom, Nick. 2002a. “Self-Locating Belief in Big Worlds: Cosmology's Missing Link to Observation.” Journal of Philosophy 99: 607-623.
Bostrom, Nick. 2003. “Are We Living in a Computer Simulation?” Philosophical Quarterly 53: 243-55.
Bostrom, Nick. 2007. “Observation Selection Theory and Cosmological Fine-Tuning.” In Carr 2007, 431-43.
Carr, Bernard, ed. 2007. Universe or Multiverse. Cambridge: Cambridge University Press.
Gibbons, G. W.; Hawking, S. W. and Stewart, J. M. 1987. “A Natural Measure on the Set of All Universes.” Nuclear Physics B281: 736-51.
Gibbons, G. W. and Turok, Neil. 2008. “Measure Problem in Cosmology.” Physical Review D77: 063516-1-12.
Hawking, S. W. and Page, Don N. 1988. “How Probable is Inflation?” Nuclear Physics B298: 789-809.
Howson, Colin and Urbach, Peter. 2006. Scientific Reasoning: The Bayesian Approach. 3rd ed. Chicago and La Salle, IL: Open Court.
Jaynes, E. T. 2003. Probability Theory: The Logic of Science. Cambridge: Cambridge University Press.
Norton, John D. 2003. "A Material Theory of Induction." Philosophy of Science 70: 647-70.
Norton, John D. 2005. "A Little Survey of Induction." In P. Achinstein, ed., Scientific Evidence: Philosophical Theories and Applications, 9-34. Baltimore: Johns Hopkins University Press.
Norton, John D. 2007. "Probability Disassembled." British Journal for the Philosophy of Science 58: 141-171.
Norton, John D. 2007a. "Disbelief as the Dual of Belief." International Studies in the Philosophy of Science 21: 231-252.
Norton, John D. 2008. "Ignorance and Indifference." Philosophy of Science 75: 45-68.
Norton, John D. manuscript. “Challenges to Bayesian Confirmation Theory.” Prepared for Prasanta S. Bandyopadhyay and Malcolm Forster (eds.), Philosophy of Statistics: Vol. 7, Handbook of the Philosophy of Science. Elsevier. http://www.pitt.edu/~jdnorton/
Norton, John D. manuscript a. "Deductively Definable Logics of Induction." http://www.pitt.edu/~jdnorton/
Olum, Ken D. 2004. “Conflict Between Anthropic Reasoning and Observation.” Analysis 64.1: 1-8.
Page, Don N. 2007. “Prediction and Tests of Multiverse Theories.” In Carr 2007, 411-29.
Tegmark, Max. 2007. “The Multiverse Hierarchy.” In Carr 2007, 99-125.
Tegmark, Max; Aguirre, Anthony; Rees, Martin J. and Wilczek, Frank. 2006. “Dimensionless Constants, Cosmology, and Other Dark Matters.” Physical Review D73: 023505-1-28.
Van Inwagen, Peter. 1996. “Why is There Anything At All?” Proceedings of the Aristotelian Society, Supplementary Volumes 70: 95-120.
Weinberg, Steven. 2000. “A Priori Probabilities of the Cosmological Constant.” Physical Review D61: 103505-1-4.
Williamson, Jon. 2009. “Philosophies of Probability.” In A. Irvine, ed., Philosophy of Mathematics, 493-534. Amsterdam: North-Holland.