Brit. J. Phil. Sci. 58 (2007), 141–171

Probability Disassembled

John D. Norton

ABSTRACT

While there is no universal logic of induction, the probability calculus succeeds as a logic of induction in many contexts through its use of several notions concerning inductive inference. They include Addition, through which low probabilities represent disbelief as opposed to ignorance; and Bayes property, which commits the calculus to a ‘refute and rescale’ dynamics for incorporating new evidence. These notions are independent, and it is urged that they be employed selectively according to the needs of the problem at hand. It is shown that neither is adapted to inductive inference concerning some indeterministic systems.

1 Introduction
2 Failure of demonstrations of universality
  2.1 Working backwards
  2.2 The surface logic
3 Framework
  3.1 The properties
  3.2 Boundaries
    3.2.1 Universal comparability
    3.2.2 Transitivity
    3.2.3 Monotonicity
4 Addition
  4.1 The property: disbelief versus ignorance
  4.2 Boundaries
5 Bayes property
  5.1 The property
  5.2 Bayes’ theorem
  5.3 Boundaries
    5.3.1 Dogmatism of the priors
    5.3.2 Impossibility of prior ignorance
    5.3.3 Accommodation of virtues
6 Real values
7 Sufficiency and independence
8 Illustrations
  8.1 All properties retained
  8.2 Bayes property only retained
  8.3 Induction without additivity and Bayes property
9 Conclusion

© The Author (2007). Published by Oxford University Press on behalf of British Society for the Philosophy of Science. All rights reserved. doi:10.1093/bjps/axm009 For Permissions, please email: journals.permissions@oxfordjournals.org Advance Access published on May 23, 2007

1 Introduction

No single idea about induction¹ has been more fertile than the idea that inductive inferences may conform to the probability calculus. For no other proposal has proven anywhere near as effective at synthesizing a huge array of disparate intuitions about induction into a simple and orderly system.
No single idea about induction has wrought more mischief than the insistence that all inductive inferences must conform to the probability calculus. For it has obliged probabilists to stretch their calculus to fit cases to which it is ill suited, and to devise many ingenious but ill-fated proofs of its universal applicability.

This article offers an alternative to this second idea. It is part of a larger project (Norton [2003a], [2005]) in which it is urged that there is no single logic of induction, but many varieties of logic, each adapted to particular contexts. The goal of the present article is to understand why the probability calculus works so well as a logic of inductive inference in the contexts in which it does, and to try to demarcate when it does not. To this end, the article draws on an extensive existing literature in presenting an axiom system for the probability calculus. However, unlike traditional axiomatizations, the goal is not to find the most parsimonious system. Instead, the individual axioms have been carefully selected so that each expresses an intuitively natural idea about inductive inference that can be used independently. As a result, the ideas are logically stronger than they would need to be were the only purpose to deduce the probability calculus. These ideas, as developed in Sections 3–6, are: Framework, Addition, Bayes property (= Narrowness + Multiplication) and Real values.

¹ The terms ‘induction’ and ‘inductive inference’ are used here in the broadest sense of any form of ampliative inference. They include more traditional forms of induction, such as enumerative induction and inference to the best explanation, which embody a rule of detachment, as well as confirmation theories, such as traditional Bayesianism or Hempel’s satisfaction criterion, which lack such a rule and merely display confirmatory relations between sentences. Since the assumptions of Framework (Section 3 below) lack a rule of detachment, the positive analysis of this article uses the latter approach.

Excepting Framework, these ideas are independent of one another. The Bayes property, for example, is responsible for the dynamics of conditionalization under Bayes’ theorem; it is independent of Addition and Real values, and may be invoked independently of them. The proposal of this article is that we should do just this. We should not assume that all these component notions apply in every context in which we may seek to use the probability calculus as a logic of induction. Rather, we should determine which, if any, apply in the context at hand and use only those. I will suggest that following this course will help us avoid problems associated with the application of the probability calculus to inductive inference.

How are we to decide which components apply in a given context? A principled basis is supplied by what I call elsewhere a ‘material theory of induction’ (Norton [2003a], [2005]). According to it, induction differs fundamentally from deduction in that inductive inferences are not licensed ultimately by universally applicable inference schemas into which particular content may be inserted. Rather, they are licensed by contingent facts. Since different facts obtain in different domains, we should expect different inductive inference forms to be applicable in different domains. If we are reasoning about stochastic systems governed by a theory with physical chances, the facts of that theory will likely license inductive inference forms involving the probability calculus. In domains in which different facts prevail, these forms may no longer be licensed.
Section 8 provides illustrations and argues that neither Addition nor Bayes property is licensed for inductive inferences concerning some indeterministic systems not governed by physical chances.²

2 Failure of demonstrations of universality

There have been numerous attempts to establish that the probability calculus is the universally applicable logic of induction. The best known are the Dutch book arguments, developed most effectively by de Finetti ([1937]), or those that recover probabilistic beliefs from natural presumptions about our preferences (Savage [1972]).³ Others proceed from natural suppositions about how relations of inductive support must behave, such as Jaynes ([2003], Ch. 2).

² The idea that one should investigate induction locally has been considered in the literature that gives a probabilistic analysis of induction, but without forgoing the idea that the probability calculus underwrites inductive inference even locally. For an entry to this literature, see Kyburg ([1976]).

³ Strictly speaking, these arguments purport to establish only that degrees of belief, as made manifest by a person’s preferences and behaviors, must conform to the probability calculus on pain of inconsistency. They become arguments for universality if we add some version of a view common in subjectivist interpretations: that degrees of belief are only meaningful insofar as they can be manifested in preferences and behaviors.

2.1 Working backwards

These demonstrations are ingenious and generally quite successful, in the sense that accepting their premises leads inexorably to the conclusion that probability theory governs inductive inference. That, of course, is just the problem. The conclusion is established only insofar as we accept the premises.
Since the conclusion makes a strong, contingent claim about our world, the demonstrations can only succeed if their premises are at least as strong factually.⁴ That makes them at least as fragile as the conclusion they seek to establish. Since they are usually created by the simple expedient of working backwards from the conclusion, they are often accepted just because we tacitly already believe the conclusion. For these reasons, all demonstrations of universality are fragile and are defeated by a denial of one or more of the premises. A few examples illustrate this general strategy for defeating the demonstrations.

Dutch book arguments are defeated simply by denying that some beliefs are manifested in dispositions to accept wagers. Or their results can be altered merely by adjusting the premises we will accept. Dutch book arguments commonly assume that there are wagers for which we are willing to accept either side. That assumption is responsible for the additivity of the degrees of belief the argument delivers. Its denial involves no incoherence in the ordinary sense. It just leads us to a calculus that is not additive (see Smith [1961]). Similarly, there is no logical inconsistency in harboring intransitive preferences. They will not, however, sustain a recovery of transitivity of beliefs in Savage’s ([1972], §3.2) framework, which is necessary for beliefs to be probabilistic.⁵ Finally, Jaynes ([2003], §2.1) proceeds from the assumption that the plausibility of A and B conditioned on C (written ‘(AB|C)’) must be a function of (B|C) and (A|BC) alone, from which he recovers the familiar product rule for probabilities, P(AB|C) = P(A|BC)P(B|C).
That this sort of functional relation must exist among plausibilities, let alone this specific one, is likely to be uncontroversial only for someone who already believes that plausibilities are probabilities, and who tacitly has in mind that we must eventually recover the product rule.⁶

⁴ There is no escape in declaring that good inductive inferences are, by definition, those governed by the probability calculus. For any such definition must conform with essentially the same facts, in that it must cohere with canonical inductive practice. Otherwise we would be free to stipulate any system we choose as the correct logic of inductive inference.

⁵ Savage’s framework harbors a circularity. In its barest form, it offers you a prize of $1, say, for each of the three acts f_A, f_B and f_C, if uncertain outcomes A, B or C happen, respectively. You prefer f_A to f_B just in case you think A more likely than B. So your preferences on f_A, f_B and f_C will be transitive just in case you already have transitive beliefs on the possibilities of A, B and C.

⁶ A simple illustration of an assignment of plausibilities that violates the functional dependence is ‘Plaus.’ It is generated by a probability measure P over propositions A, B, . . . as a coarsening, with only two intermediate values: Plaus(A|B) = ‘Low’ when 0 < P(A|B) < 1/2; and Plaus(A|B) = ‘High’ when 1/2 < P(A|B) < 1.

The fragility of these demonstrations is very similar to the failure of attempts to show that Euclid’s fifth postulate of the parallels is the only postulate admissible in geometry. These attempts started by denying Euclid’s fifth postulate in the context of the other postulates, and inferring from the denial some unusual geometric propositions that, we were to suppose, are incoherent. It was eventually realized in the nineteenth century that the denial of Euclid’s fifth postulate involved no inconsistency; it merely led us to different geometries.
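The counterexample in footnote 6 can be checked mechanically. The sketch below is an illustration added here, not part of the article's apparatus: the function names `plaus`, `product_rule_cases` and `is_functional` are hypothetical, and the two probability assignments are made up. Each case fixes P(B|C) and P(A|BC), computes P(AB|C) by the product rule, and coarsens all three values. In both cases (B|C) and (A|BC) come out ‘High’, yet (AB|C) is ‘Low’ in the first and ‘High’ in the second, so (AB|C) cannot be a function of (B|C) and (A|BC) alone.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def plaus(p):
    """Coarsen a probability into the two intermediate values of footnote 6.

    Only the open intervals (0, 1/2) and (1/2, 1) are labelled; the two
    values of the footnote do not cover the endpoints or exactly 1/2.
    """
    if 0 < p < HALF:
        return "Low"
    if HALF < p < 1:
        return "High"
    raise ValueError(f"{p} lies outside the two labelled intervals")

def product_rule_cases(assignments):
    """For each hypothetical (P(B|C), P(A|BC)) pair, return the coarsened
    triple (Plaus(B|C), Plaus(A|BC), Plaus(AB|C)), with the product rule
    P(AB|C) = P(A|BC) * P(B|C) supplying the third value."""
    triples = []
    for p_b_given_c, p_a_given_bc in assignments:
        p_ab_given_c = p_a_given_bc * p_b_given_c
        triples.append((plaus(p_b_given_c), plaus(p_a_given_bc),
                        plaus(p_ab_given_c)))
    return triples

def is_functional(triples):
    """True if equal inputs (first two entries) always give equal outputs."""
    seen = {}
    for b_c, a_bc, ab_c in triples:
        if seen.setdefault((b_c, a_bc), ab_c) != ab_c:
            return False
    return True

# Hypothetical assignments: 3/5 * 3/5 = 9/25 (< 1/2); 9/10 * 9/10 = 81/100 (> 1/2).
cases = product_rule_cases([(Fraction(3, 5), Fraction(3, 5)),
                            (Fraction(9, 10), Fraction(9, 10))])
print(cases)                 # [('High', 'High', 'Low'), ('High', 'High', 'High')]
print(is_functional(cases))  # False
```

Exact fractions are used so that no value lands on the boundary 1/2 through rounding.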
While I believe all these demonstrations fail to establish universality, they still have great value. For we learn from them that, in domains in which their premises hold, our inductive inferences must be governed by the probability calculus.

2.2 The surface logic

There is a second sort of argument for universality, suggested mostly indirectly by impressive catalogs of the success of Bayesian analysis at capturing our intuitions about inductive inference. All these intuitions so far have been captured by the probability calculus; so, the thought goes, we should expect this success to continue. In my view, the success is overrated and does not sustain the probability calculus as the unique logic of induction. In many cases, the success is achieved only by presuming enough extra hidden structures (priors, likelihoods, new variables, new spaces) until the desired intuition emerges. That does not mean that the logic on the surface is probabilistic, but only that this surface logic can be simulated with a more complicated, hidden structure that employs probability measures. Two examples will illustrate the concern.

Take Hempel’s original question of whether a nonblack nonraven confirms that all ravens are black. A probabilistic analysis gives an intuitively very comfortable result. But it only succeeds by adding a great deal of new structure to the original problem: populations with different distributions of ravens and black objects, and a presumption that we are sampling randomly from them. That changes the problem to a new one amenable to probabilistic analysis. (For a survey, see Earman [1992], §3.3.)

Consider ignorance, which, I argue below in Section 4.2, is not represented in an additive calculus. It may be introduced by associating beliefs with convex sets of probability measures. While additive measures were used to produce them, the sets themselves no longer conform to a logic with the formal property of Addition as defined below.
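The point about convex sets can be illustrated with a small sketch. It assumes, as one common choice that the text does not itself specify, that the degree a set assigns to an event is its lower probability, the minimum over the set; the three-outcome space and the helper names `lower` and `addition_violations` are hypothetical. The set used is the vacuous one (all probability measures on {a, b, c}), a standard way of modeling complete ignorance. Every event short of the whole space then gets lower probability 0, so the disjoint pairs ({a}, {b}) and ({a}, {b, c}) receive the same pair of degrees (0, 0) while their disjunctions receive 0 and 1. No operator ⊕ with [X ∨ Y|Z] = [X|Z] ⊕ [Y|Z] can reproduce both.

```python
from fractions import Fraction
from itertools import combinations

OUTCOMES = ("a", "b", "c")  # hypothetical three-outcome space

# Extreme points of the vacuous credal set: one point mass per outcome.
# Their convex hull is the set of all probability measures on OUTCOMES.
VACUOUS = [{o: Fraction(int(o == w)) for o in OUTCOMES} for w in OUTCOMES]

def lower(event, extreme_points=VACUOUS):
    """Lower probability: the minimum probability of `event` over the set.

    A linear functional attains its minimum over a convex hull at an
    extreme point, so it suffices to check the extreme points.
    """
    return min(sum(p[o] for o in event) for p in extreme_points)

def addition_violations(extreme_points=VACUOUS):
    """Return pairs of disjoint-event pairs that share input degrees
    (lower X, lower Y) but disagree on lower(X or Y)."""
    events = [frozenset(c) for r in range(1, len(OUTCOMES))
              for c in combinations(OUTCOMES, r)]
    first_seen = {}  # (lower X, lower Y) -> ((X, Y), lower(X | Y))
    violations = []
    for x in events:
        for y in events:
            if x & y:
                continue  # Addition concerns mutually contradictory X and Y
            key = (lower(x, extreme_points), lower(y, extreme_points))
            union = lower(x | y, extreme_points)
            if key not in first_seen:
                first_seen[key] = ((x, y), union)
            elif first_seen[key][1] != union:
                violations.append((first_seen[key][0], (x, y)))
    return violations

pair1, pair2 = addition_violations()[0]
for x, y in (pair1, pair2):
    print(sorted(x), sorted(y), lower(x), lower(y), lower(x | y))
# ['a'] ['b'] 0 0 0
# ['a'] ['b', 'c'] 0 0 1
```

Lower probability is only one way of reading degrees off a set of measures; the sketch shows that this particular reading, fed by additive measures, is not itself additive in the sense of A. Addition.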
Additive measures are merely the device used to generate a new system governed by a different surface logic. Once again, there is a geometric analogy. We can recover many non-Euclidean geometries by considering curved surfaces embedded in a higher-dimensional Euclidean space. That does not mean that Euclidean geometry is the universal geometry. It is not the geometry intrinsic to the surface. However, we learn that Euclidean geometry can be used as a tool to generate that geometry, as could other geometries.

3 Framework

The system of properties for confirmation relations to be described here draws on the extensive literature on axioms for the probability calculus already developed. See especially Cox ([1961]) and, for surveys, see Fine ([1973]) and Fishburn ([1986]).

3.1 The properties

The framework assumes a set of propositions A1, A2, . . . closed under the familiar Boolean operations ∼ (negation), ∨ (disjunction) and & (conjunction). Where the context calls for it, the set will be assumed to be closed under countable disjunction. The universal proposition is Ω = A1 ∨ A2 ∨ . . . Implication ⇒ is stronger than material implication; A ⇒ B means that propositions are so related⁷ that ∼A ∨ B must always be true; that is, ∼A ∨ B = Ω. The universal proposition, Ω, is implied by every proposition in the algebra and is always true. The proposition Ø implies every proposition and is always false. The symbol [A|B] represents the degree to which proposition B confirms proposition A. It is undefined when B is of minimum degree, which means that B = Ø or there is a C such that B ⇒ C and [B|C] = [Ø|C]. The relation [A|B] ≤ [C|D] (or equivalently [C|D] ≥ [A|B]) on these degrees is interpreted informally as ‘D confirms C at least as strongly as B confirms A.’ It satisfies:

F. Framework
F1. Partial order. The relation ≤ is a partial order. That is, for any admissible⁸ propositions A, B, C, D, E and F:
F1a. Reflexivity. [A|B] ≤ [A|B]
F1b. Antisymmetry. If [A|B] ≤ [C|D] and [A|B] ≥ [C|D], then [A|B] = [C|D]
F1c. Transitivity. If [A|B] ≤ [C|D] and [C|D] ≤ [E|F], then [A|B] ≤ [E|F]
Antisymmetry allows us to define < and > in the usual way.⁹ We also suppose:
F2. For all admissible propositions A and B:
F2a. [Ø|Ω] ≤ [A|B] ≤ [Ω|Ω]
F2b. [Ø|Ω] < [Ω|Ω]
F2c. [A|A] = [Ω|Ω] and [Ø|A] = [Ø|Ω]; and
F3. Universal comparability. For all admissible propositions A, B, C and D, [A|B] ≤ [C|D] or [A|B] ≥ [C|D]; and
F4. Monotonicity. For all admissible propositions A, B and C, if A ⇒ B ⇒ C, then [A|C] ≤ [B|C].

⁷ For example, if we associate propositions with the sets of worlds in which they are true, then A ⇒ B obtains just if A’s worlds are a subset of B’s.

⁸ Here and henceforth, ‘admissible’ precludes formation of the undefined [·|B], where B is of minimum degree.

3.2 Boundaries

While these properties are natural, they nonetheless have significant content, and it is far from clear that they will be applicable to all cases of inductive inference. Two properties are especially vulnerable, F3. Universal comparability and F1c. Transitivity, as possibly is F4. Monotonicity.

3.2.1 Universal comparability

We cannot presume, as Keynes ([1921], Ch. 3) correctly urged, that all degrees of confirmation are comparable. A tacit expectation of universal comparability is natural as long as we think of degrees of confirmation as real valued. The expectation rapidly evaporates once we use more complicated structures. Imagine, for example, that the degrees are real intervals in [0,1], with the size of the interval betokening something about the bearing of evidence. Take two intervals [0.01, 0.99] and [0.49, 0.51]. If they must be comparable, the only relation that respects the symmetry of dispositions about the midpoint 0.5 is that they are equal. But that contradicts the presumption that the size of the interval represents some sort of difference in the degrees of confirmation.
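The interval example can be made concrete. The article does not say how interval-valued degrees should be ordered, so the sketch below assumes one natural candidate, the componentwise order (one interval is at most another when both of its endpoints are), and its helper names are hypothetical. On a small sample of intervals this order passes the checks of F1 (reflexivity, antisymmetry, transitivity), yet the two intervals from the text are incomparable under it, so F3 fails.

```python
from itertools import product

def interval_leq(i, j):
    """Componentwise order on intervals (lo, hi): i <= j iff
    lo_i <= lo_j and hi_i <= hi_j (an assumed ordering, for illustration)."""
    return i[0] <= j[0] and i[1] <= j[1]

def is_partial_order(elements, leq):
    """Check F1a-F1c (reflexivity, antisymmetry, transitivity) on a finite sample."""
    reflexive = all(leq(a, a) for a in elements)
    antisymmetric = all(a == b for a, b in product(elements, repeat=2)
                        if leq(a, b) and leq(b, a))
    transitive = all(leq(a, c) for a, b, c in product(elements, repeat=3)
                     if leq(a, b) and leq(b, c))
    return reflexive and antisymmetric and transitive

def comparable(i, j, leq=interval_leq):
    """F3 demands that at least one of i <= j, j <= i holds."""
    return leq(i, j) or leq(j, i)

wide, narrow = (0.01, 0.99), (0.49, 0.51)  # the two intervals from the text
sample = [wide, narrow, (0.0, 0.5), (0.5, 1.0), (0.0, 1.0)]
print(is_partial_order(sample, interval_leq))  # True
print(comparable(wide, narrow))                # False
```

The wider interval has the lower left endpoint but the higher right endpoint, so neither dominates the other; any total ordering would have to add a comparison that the componentwise structure does not supply.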
⁹ [A|B] < [C|D] and [C|D] > [A|B] just in case [A|B] ≤ [C|D] but not [A|B] = [C|D].

However, even if degrees of confirmation are real valued, it does not follow that they are comparable. For two degrees to be comparable in the relevant sense, they must measure essentially the same thing. The mere fact that two scales employ real values is not enough to assure this. One hundred degrees Celsius on the mercury thermometer scale and on the ideal gas thermometer scale are equivalent, since they measure the same thing, temperature. They are neither equivalent to, nor less than, nor greater than one hundred degrees Baumé of specific gravity. Propositions can bear evidentially on one another in many ways, and the range of variation is sufficiently great that we surely cannot always presume comparability of the degrees, even if both are measured on the same numerical scale.

Consider the hypothesis H that the half-life of radioactive decay of Radium 221 is 30 seconds and the evidence E that some Radium 221 atom did decay in a time period of 30 seconds. The two degrees, [E|H] and [H|E], are very different. In the first, we take certain laws of physics, with their characteristic constants, as fixed and distribute belief over possibilities (decay in 30 seconds, decay in 40 seconds, etc.). Those laws provide physical chances for the possibilities, and the bearing of H on E is detailed for us completely as a matter of physical law.¹⁰ In the second, we take an experimental fact as fixed and must now distribute belief over the possibility of different half-lives for Radium 221. No physical law can fix the bearing of E on H, for now the range of possibilities must involve denial of physical laws; there is only one correct value for the half-life. Even exactly how we are to conceive that range is unclear. Will we try to hold all of physics fixed and just imagine different half-lives for Radium 221?
Or should we recall that the physical properties of Radium 221 are fixed by quantum physics and chemistry, so that differences in half-lives must be reflected in differences throughout those theories? And how should those differences be effected? As alterations just to fundamental constants like h and c? Or as alterations to Schrödinger’s equation itself? My point is not that we cannot answer these questions, but that answering them engages us in a very different project that is a mixture of science and speculative metaphysics. The way H bears on E in [E|H] is very different from the way E bears on H in [H|E].¹¹ So, if we expect the degrees of confirmation simply to measure the bearing of evidence, as an objectivist about probability like Keynes would, then we should not expect the two sets of degrees always to be comparable. A subjectivist about probabilities has no easy escape. Of course, the subjectivist simply supposes comparability and stipulates real-valued prior probabilities that lead to real values for both [E|H] and [H|E] upon conditionalization.

¹⁰ Or, more cautiously, Lewis’s ([1980]) ‘principal principle’ in effect enjoins us to endow our degrees of confirmation with the properties of a physical chance.

¹¹ Humphreys ([1985]) uses related illustrations to object to the propensity interpretation of probability. For example, if proposition S asserts that a person is a smoker and C that the person has an undiscovered lung cancer, then the causal propensity of a smoker to have an undiscovered lung cancer is expressed by the direct probability P(C|S). Yet, precisely because this causal propensity is unidirectional, the inverse probability P(S|C) does not express a causal propensity of people with undiscovered lung cancer to smoke.
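The asymmetry between [E|H] and [H|E] in the Radium 221 example can be made concrete on the side of the direct degree. Assuming the standard exponential decay law (textbook physics, not spelled out in the article), the chance that a given atom decays within t seconds, given half-life T, is 1 − 2^(−t/T). The sketch below evaluates this for the hypothesis H (T = 30 seconds); the function name `decay_chance` is hypothetical. Nothing analogous computes [H|E]: inverting the relation would require a prior distribution over half-lives, which is exactly what, on the argument above, no physical law supplies.

```python
def decay_chance(t_seconds, half_life_seconds):
    """Chance that a single atom decays within t_seconds, given its half-life,
    under the exponential decay law: 1 - 2**(-t / T)."""
    if t_seconds < 0 or half_life_seconds <= 0:
        raise ValueError("need t >= 0 and a positive half-life")
    return 1.0 - 2.0 ** (-t_seconds / half_life_seconds)

HALF_LIFE_H = 30.0  # the half-life asserted by hypothesis H, in seconds

# Direct degrees [E|H] for a few of the possibilities over which H distributes belief.
for t in (30.0, 40.0, 60.0):
    print(f"decay within {t:.0f} s given H: {decay_chance(t, HALF_LIFE_H):.4f}")
```

With t equal to the half-life the chance is exactly one half, which is just the definition of a half-life restated for a single atom.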
The hope is that the subjectivist’s assignments will eventually betoken something more than arbitrary numbers as the accumulation of evidence ‘washes out the priors’ and leads to a convergence of values for all subjectivists. If the very idea that the two degrees are comparable entered originally as a supposition without proper grounding, the convergence does not remove its arbitrariness. Oranges are not apples, even if we end up agreeing on how many apples make an orange.

3.2.2 Transitivity

The prevalence of real values for degrees of confirmation can also mislead us into expecting their transitivity universally. That expectation fades once we entertain the possibility that these degrees have more complicated structures.¹² For example, that some hypothesis H entails true evidence E is generally taken to confirm H. Some hypotheses, however, are routinely assessed as being more deserving of support if they manifest certain virtues in the context of the successful deduction. These virtues include simplicity, scope, fecundity and explanatory power, with the last engendering the account of induction known as ‘inference to the best explanation.’ So three hypotheses H1, H2 and H3 may score differently with regard to three virtues V1, V2 and V3. Allowing for three values, ‘High,’ ‘Medium’ and ‘Low,’ we may end up with the following assignments:¹³

Table 1. Intransitive degrees

          V1        V2        V3
[H1|E]    High      Medium    Low
[H2|E]    Medium    Low       High
[H3|E]    Low       High      Medium

Following a simple rule that the majority wins, [H1|E] > [H2|E], since [H1|E] outscores [H2|E] in two of three virtues. Similarly, [H2|E] > [H3|E] and [H3|E] > [H1|E], which violates transitivity. Indeed, if we assign equal importance to the three virtues and require a rule of comparison to rank solely on the basis of the values in the table, then any rule that yields [H1|E] > [H2|E] must also generate the intransitivity.
For there is a cyclic symmetry in the values, in that [H1|E] relates to [H2|E] in the same way as [H2|E] relates to [H3|E] and [H3|E] relates to [H1|E].

¹² The discussion of Section 4 below raises the possibility of degrees of confirmation with a two-dimensional structure, where lower degrees represent some mix of disbelief and ignorance.

¹³ These virtues are discussed further in Section 5.3.3 below.

3.2.3 Monotonicity

Monotonicity prohibits evidence from confirming a proposition more strongly than its deductive consequences. Yet, as Tversky and Kahneman ([1982]) showed in psychological experiments, people are easily led to violate this prohibition. If Linda is described appropriately, subjects will judge it more probable that she is a bank teller and a feminist than that she is a bank teller. Tversky and Kahneman interpret this to mean that people conflate probability with representativeness. Might there be a calculus of confirmation that violates monotonicity in that degrees of confirmation measure, in part, goodness of fit, in which Linda, the bank teller and feminist, would be a better fit to the evidence than Linda the bank teller? That could arise in a system of inductive inference with a rule of detachment that forces us to select among well-confirmed hypotheses, using quantities [H|E] as scores. On evidence E = ‘the coin did not fall heads,’ it may score H = ‘the coin fell tails’ higher than H′ = ‘the coin fell tails or on edge.’ For if we must choose just one hypothesis to detach from E, it would, in ordinary circumstances, be H and not H′, even though H entails H′.

4 Addition

4.1 The property: disbelief versus ignorance

The range of degrees of confirmation for some proposition A spans from the maximal [A|Ω] = [Ω|Ω] to the minimal [A|Ω] = [Ø|Ω]. Do these extreme values correspond to justification of complete belief in A and complete disbelief in A? Or do they correspond to complete belief in A and complete ignorance concerning A?
The signal feature of a probability measure is that it is an additive measure, and we shall see that this property derives from choosing the first option:

Underlying intuition of Addition: The range of degrees of confirmation spans justification of complete belief and complete disbelief.

This first option is characterized by a reciprocal relationship between degrees of confirmation for A and for its negation, ∼A. Complete disbelief in A corresponds to complete belief in ∼A. As the degree of confirmation [A|Ω] weakens from the maximum [Ω|Ω] that justifies complete belief, the degree of confirmation [∼A|Ω] must strengthen accordingly from the minimal [Ø|Ω] that justifies complete disbelief. We should expect, under the above intuition, that this reciprocal relation between degrees of confirmation will also hold when we divide any proposition B into two exhaustive and mutually exclusive logical parts, A&B and ∼A&B, and that it will obtain when we conditionalize on any background C. The map that takes us from [A&B|C] to [∼A&B|C] will, in general, differ according to [B|C], since the maximum degree that can be assigned to [A&B|C] or [∼A&B|C] is set by [B|C] under F4. Monotonicity. So there is a family of functions, f_[B|C](·). We express the above intuition by requiring:

A′. Addition. For any propositions A and B and any admissible C, there exists a function [∼A&B|C] = f_[B|C]([A&B|C]), where f is strictly increasing in [B|C]¹⁴ and strictly decreasing in [A&B|C].¹⁵

To convert this form of Addition into a more familiar one, we note that, since f is strictly increasing in [B|C], the function f can be inverted in this argument.¹⁶ That is, there exists a function g such that [B|C] = g([A&B|C], [∼A&B|C]), where g is strictly increasing in both [A&B|C] and [∼A&B|C]. This last function is presented in a more familiar way as an addition operator in a property equivalent to A′. Addition:

A. Addition. For any admissible proposition Z and mutually contradictory propositions X and Y, there exists an addition operator ⊕ such that [X ∨ Y|Z] = [X|Z] ⊕ [Y|Z], where ⊕ is strictly increasing in both [X|Z] and [Y|Z].

This second form justifies the name Addition, since it displays the sense in which the degree of confirmation of a proposition is fixed by the ‘adding up’ of degrees of confirmation of its logical parts. Properties that ⊕ must carry for compatibility with the Framework F. are readily deduced from the logical properties of propositions, such as X ∨ Y = Y ∨ X, U ∨ V ∨ W = (U ∨ V) ∨ W = U ∨ (V ∨ W), X ∨ Ø = X and X ∨ ∼X = Ω:

[X|Z] ⊕ [Y|Z] = [Y|Z] ⊕ [X|Z]
[U|Z] ⊕ [V|Z] ⊕ [W|Z] = ([U|Z] ⊕ [V|Z]) ⊕ [W|Z] = [U|Z] ⊕ ([V|Z] ⊕ [W|Z])
[X|Z] ⊕ [Ø|Z] = [X|Z]
[X|Ω] ⊕ [∼X|Ω] = [Ω|Ω]

¹⁴ That is, for each y, if x′ > x, then z′ > z, where z′ = f_x′(y) and z = f_x(y).

¹⁵ That is, for each x, if y′ > y, then z′ [A|C], then [A′|B] > [A|B]. It follows that f_[B|C]([A|C]) is strictly increasing in [A|C]. Therefore f_[B|C]([A|C]) is invertible in [A|C]. The inverse of this function, [A|C] = f⁻¹_[B|C]([A|B]), can be written in a more familiar way as a product operator

[A|C] = [A|B] ⊙ [B|C]

which must be strictly increasing in [A|B], since f_[B|C]([A|C]) is strictly increasing in [A|C].

¹⁸ That Bayesian inference depends on such a simple model is well recognized. See, for example, Hawthorne ([1993]).

¹⁹ That is, if we have propositions A ⇒ B ⇒ C and A′ ⇒ B′ ⇒ C′ where, for admissible B, B′, C and C′, [A|C] = [A′|C′] and [B|C] = [B′|C′], then [A|B] = f_[B|C],C([A|C]) = [A′|B′] = f_[B′|C′],C′([A′|C′]).

Figure 1. Conditionalization as rescaling degrees of confirmation.
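In the ordinary probability calculus, ⊙ is multiplication, and conditionalizing on B is the ‘refute and rescale’ operation: possibilities incompatible with B are set to zero, and the survivors are all scaled by the common factor 1/P(B). The sketch below assumes a hypothetical space of four worlds with made-up weights, takes the background C to be the universal proposition, and uses hypothetical helper names `prob` and `conditionalize`. It checks that conditionalization preserves the ratios among surviving worlds and that [A|C] = [A|B] ⊙ [B|C] holds with ⊙ read as multiplication, for A ⇒ B ⇒ C.

```python
from fractions import Fraction

def prob(weights, event):
    """Probability of an event (a set of worlds) under a weight assignment."""
    return sum((w for world, w in weights.items() if world in event), Fraction(0))

def conditionalize(weights, evidence):
    """Refute and rescale: worlds outside `evidence` drop to zero, and every
    surviving world is multiplied by the same factor 1 / P(evidence)."""
    p_evidence = prob(weights, evidence)
    if p_evidence == 0:
        raise ValueError("cannot conditionalize on evidence of probability zero")
    return {world: (w / p_evidence if world in evidence else Fraction(0))
            for world, w in weights.items()}

# Hypothetical prior over four worlds; the background C is all four worlds.
prior = {"w1": Fraction(1, 2), "w2": Fraction(1, 4),
         "w3": Fraction(1, 8), "w4": Fraction(1, 8)}
B = {"w1", "w2", "w3"}  # B rules out w4
A = {"w1"}              # A entails B

given_B = conditionalize(prior, B)
print(given_B)  # w4 refuted; w1 : w2 : w3 keep the prior ratio 4 : 2 : 1
# Product form [A|C] = [A|B] * [B|C]; both sides print as 1/2.
print(prob(prior, A), prob(given_B, A) * prob(prior, B))
```

Because every surviving world is scaled by the same factor, the rescaling is uniform in exactly the sense the text requires of ⊙.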
That the operator should also be strictly increasing in [B|C] for all values of [A|B] excepting [Ø|B] is the import of the requirement above that the redistribution be ‘uniform.’ An increase in [B|C], when [A|B] has the maximal value [B|B], is reflected by an exactly equal increase in [A|C], since [B|C] = [B|B] ⊙ [B|C]. An increase in [B|C], when [A|B] has the minimal value [Ø|B], is reflected by no change in [A|C], since then [Ø|C] = [Ø|B] ⊙ [B|C]. The requirement of uniformity amounts to asking that the increase in [A|C] for intermediate values of [A|B] should be uniformly interpolated between these two extreme values. Or it would amount to this if there were a way to represent ‘uniformly interpolated’ with the structures defined so far. But there is not. However, whatever it may amount to, minimally it must require some increase in [A|C] for all intermediate values of [A|B]. That is sufficient to support the strict increase of ⊙ in [B|C] unless [A|B] is [Ø|B]. Collecting these properties, we have:

M. Multiplication. For any proposition A and admissible propositions B and C such that A ⇒ B ⇒ C, there exists a multiplication operator ⊙ such that [A|C] = [A|B] ⊙ [B|C], where ⊙ is strictly increasing and thus invertible in both arguments (excepting [B|C], when [A|B] = [Ø|B]).

This operator is the analog of the normal product operator of the probability calculus, where, for these A, B and C, P(A|C) = P(A|B) · P(B|C). The two properties combined form:

B. Bayes Property. N. Narrowness and M. Multiplication.

We can readily deduce the expected rules from this combined property. The analog of the product rule of probability theory is

[A&B|C] = [A&B|B] ⊙ [B|C] = [A|B] ⊙ [B|C]   (1)

Combined with A. Addition, we have the analog of the rule of total probability

[A|C] = [A&B|C] ⊕ [A&∼B|C] = ([A|B] ⊙ [B|C]) ⊕ ([A|∼B] ⊙ [∼B|C])   (2)

5.2 Bayes’ theorem

The analog of Bayes’ theorem is derived in the usual way from the product rule.
For an hypothesis H and evidence E:

[H&E|Ω] = [H|E] ⊙ [E|Ω] = [E|H] ⊙ [H|Ω]   (3)

The terms can be labeled in the obvious way, in analogy with the usual probabilistic form of Bayes’ theorem, as: ‘posterior’ ([H|E]), ‘expectedness’ ([E|Ω]), ‘likelihood’ ([E|H]) and ‘prior’ ([H|Ω]). Since the operator ⊙ is strictly increasing and invertible in both arguments (excepting one case), the posterior [H|E] can be recovered by inverting ⊙, and the theorem can be used in the usual way to recover familiar intuitions. Other terms equal, the posterior [H|E] will have a maximum value when H ⇒ E, for then the likelihood [E|H] = [Ω|Ω], which is the maximum value.²⁰ Similarly, other factors equal, an increase in the prior [H|Ω] will lead to a corresponding increase in the posterior [H|E]. And an hypothesis that successfully entails evidence of lower expectedness [E|Ω] will have a higher posterior. This much, and many more familiar results like them, are recoverable without assuming A. Addition. If it is assumed, then a further form of Bayes’ theorem can be recovered by substituting for the expectedness using the rule (2):

[E|Ω] = ([E|H] ⊙ [H|Ω]) ⊕ ([E|∼H] ⊙ [∼H|Ω])

²⁰ The likelihood [E|H] = [E&H|H] by N. and, since H = E&H when H ⇒ E, we have [E|H] = [H|H], which is the maximal [Ω|Ω] by F2c.

5.3 Boundaries

While we may find the simplicity of the ‘refute and rescale’ dynamics appealing, that simplicity proves to be its fundamental limitation. The dynamics are sensitive only to entailment relations. As we shall see below, that forces the inductive character of the inferences to be inserted by our selection of priors. That burden overtaxes the priors, since they will also be called upon to represent initial states of ignorance at the same time as they must supply essential inductive content. And worse, that inductive content is decided in significant measure as a matter of stipulation.
For these reasons, prior probabilities have inevitably become the traditional locus of problems in probabilistic analysis; they are called upon to make up for the deficiencies of the ‘refute and rescale’ dynamics.

5.3.1 Dogmatism of the priors

It is well known in probabilistic analysis that once zero or unit probability has been assigned to an hypothesis’ prior probability, conditionalization on new evidence compatible with it cannot alter those probabilities. The same problem arises in a system with B. Bayes property. Unlike human reasoners, such a system will never be led inductively, by learning from experience, to alter judgments of maximum or minimum belief.

For any hypothesis H and evidence E, we have from Bayes’ theorem (3) the paired relations

$$\begin{aligned}
[H|E] \odot [E|\Omega] &= [E|H] \odot [H|\Omega] \\
[E|E] \odot [E|\Omega] &= [E|\Omega] \odot [\Omega|\Omega]
\end{aligned}$$

where the second relation arises from setting H = Ω and noting that [Ω|E] = [E|E] = [Ω|Ω] from N. and F2c. Even if H is not Ω, once we set the prior [H|Ω] to [Ω|Ω], compatibility of the paired relations forces the posterior [H|E] = [E|E] = [Ω|Ω]. A prior set to certainty is immovable inductively.

For any hypothesis H and any admissible evidence E, we have from the product rule (1) the paired relations

$$\begin{aligned}
[H\,\&\,E|\Omega] &= [H|E] \odot [E|\Omega] \\
[\varnothing|\Omega] &= [\varnothing|E] \odot [E|\Omega]
\end{aligned}$$

where the second relation arises from setting H = Ø. Even if H is not Ø, once we set the prior [H|Ω] = [Ø|Ω], it follows from F4. that [H&E|Ω] = [Ø|Ω]. Compatibility of the paired relations then forces the posterior [H|E] = [Ø|E] = [Ø|Ω]. A prior set to maximum disbelief is immovable inductively.

We can see how this last example arises directly from the excessive simplicity of the ‘refute and rescale’ dynamics. Those dynamics are sensitive only to the fact that H&E and Ø both entail the evidence E. So, if they are given the same priors, they must then have the same posteriors. Since Ø must remain at the minimal level of confirmation on any evidence, H&E is condemned to the same fate.
A more sophisticated dynamics would be able to recognize and exploit the difference between Ø vacuously entailing E and H&E entailing E.21

5.3.2 Impossibility of prior ignorance

We have seen that A. Addition precludes lower degrees of confirmation from representing ignorance as opposed to disbelief. It also turns out that B. Bayes property precludes priors that truly represent ignorance, and does so independently of A. Addition. To see this, note that the property entails that, for any propositions H and E, where [E|Ω] is not [Ø|Ω]:

$$[H\,\&\,E|\Omega] = [H|E] \odot [E|\Omega]$$

This relation is invertible in [H|E]. That is, [H|E] is fixed by the priors [H&E|Ω] and [E|Ω] (unless [E|Ω] is [Ø|Ω]). What this means is that the degree [H|E], whether high or low and in whatever precise measure, is already encoded in the prior [·|Ω]. The prior [·|Ω] amounts to a massive catalog of all possible relations of inductive support between all pairs of propositions. It must decide in advance just how we will redistribute support once we learn E, no matter what E may be (as long as [E|Ω] is not [Ø|Ω]).

There is a large literature devoted to ‘ignorance priors,’ ‘uninformative priors’ or ‘informationless priors’ in probability theory (Jaynes [2003], Ch. 12). It is generally recognized that these terms are misnomers; the priors are really only as uninformative as the probability calculus allows, and are typically tailored to being that uninformative about one particular fact, such as a parameter value. Were they really to achieve ignorance in the sense of a complete null state, the result would be a catastrophe for any system whose dynamics conforms to B. Bayes property.
For all the system can do is to take a prior already rich in inductive information and refine it by the dynamic of ‘refute and rescale.’

21 Analogously, the fixity of maximum support arises since the dynamics does not distinguish the trivial entailment E ⇒ E from the nontrivial H ⇒ E, where H is strictly stronger, logically, than E (that is, for some X, E = H ∨ X, where H&X is Ø).

5.3.3 Accommodation of virtues

An important limitation of the ‘refute and rescale’ dynamics is that it cannot differentially reward two hypotheses for their success in entailing the same true evidence. If hypotheses H1 and H2 entail the evidence E and we conditionalize on E, the resulting changes in degrees of confirmation will be the same for each. For, in this case, M. Multiplication becomes

$$[H_i|E] \odot [E|\Omega] = [H_i|\Omega]$$

If the two priors [H1|Ω] and [H2|Ω] agree, then so must the posteriors [H1|E] and [H2|E], because of the invertibility of the operator ⊙.

There is strong indication that this outcome renders systems with the B. Bayes property too insensitive to differences in the way hypotheses may entail evidence. The dogmatism of the priors above arose because the system is unable to distinguish the nonvacuous entailment of evidence E by some hypothesis from the vacuous entailment of E by the contradiction Ø. Some logics of induction, such as that illustrated in Section 8.3 below, must differentially reward hypotheses H1 and H2. Moreover, standard lore does not automatically accord equal confirmatory boosts to the two hypotheses H1 and H2. One is often favored over the other because the first entails the evidence in some virtuous way (with great simplicity or explanatory power), or because the second does so with some deficiency (it is ad hoc or grue-ified). Might there be some system of inductive inference that could distinguish some entailments as virtuous and others as deficient?
The principal obstacle is that the virtues, notably simplicity and explanatory power, are so poorly understood that even the outlines of such a system are obscure.

The problem of accommodating these virtues and vices in a probabilistic analysis is not new. (For helpful entries into this literature, see Howson and Urbach [1996], Ch. 7.) While the problem has been addressed with many ingenious stratagems, they must all come down to one idea only. The only way a system that conforms to B. Bayes property can differentiate H1 and H2 is to reward virtue with a high prior and punish vice with a low prior. The effect of this need is that any system conforming to B. Bayes property must urge that the standard lore is mistaken in distinguishing virtuous entailments. For example, the standard lore is that the success of an ad hoc hypothesis in entailing some remarkable evidence gives it no boost in confirmatory support, for the success is achieved unvirtuously by cooking the books.22 Under ‘refute and rescale’ dynamics, this same conclusion must be arrived at in a two-step calculation that must itself be cooked to yield the null outcome. It says, contrary to the lore, that the ad hoc hypothesis does accrue exactly as large a boost in confirmatory support as enjoyed by the hypothesis that virtuously entails the same evidence. However, the gains of that boost are exactly canceled by a prior that has been cooked to just the very low value needed.

22 Or at least this is clearly so for the ‘bad’ cases, such as the supposition of a creationist geology that the world was created in 4004 BC complete with the fossil record of all geological eras intact. See Howson and Urbach ([1996], pp. 154–7) for cases of ‘good’ ad hoc hypotheses that do deserve support.

While this stratagem of explicating virtues and vices in terms of high and low priors has had some notable successes in the probability literature, it faces a fundamental limitation.
The assigning of a prior is global; it is done once. Yet, in the lore, the import of virtues and vices is local and may differ as hypotheses are subject to evidential scrutiny in different contexts, which in turn may call for differing priors. For example, the wave theory of light gives an especially simple and elegant explanation of interference. Its account of the rule of stellar aberration, however, proves to be quite tortured once one looks at it closely, so much so that it was a major achievement of late 19th century electrodynamics to be able to show that the wave theory could accommodate the totality of the rule satisfactorily (Norton [forthcoming]). The situation reverses for a corpuscular theory of light. It gives a simple and elegant explanation of stellar aberration; but, insofar as Newton’s corpuscular theory was able to give any account of the interference effect of ‘Newton’s rings’ using his fits of easy reflection and refraction, it was certainly not virtuous.23 The one prior must somehow reward virtue in one context and punish vice in another.

Or we may be in a situation in which we cannot adjust priors to reward a virtue. In 1905, Einstein used his light quantum hypothesis to produce remarkably simple explanations of some of the observed properties of radiation. We should like to reward the light quantum hypothesis not just for entailing the evidence, but for explaining it virtuously. Yet, in 1905, after the nineteenth century overthrow of the corpuscular theory and the resounding success of the wave theory of light, any investigation of the properties of light must begin with a low prior on any corpuscular hypothesis.

Finally, there is a related problem arising directly from N. Narrowness. That property allows evidence E to support an hypothesis H only through support of a disjunctive part H1 that entails E. The other disjunctive parts are H2, H3, . . ., where H = H1 ∨ H2 ∨ H3 ∨ . . . and (H2 ∨ H3 ∨ . . .) & E = Ø.
They have no effect on the support accrued to H. The property N. denies that there can be a synergy between the disjunctive parts, such that we should assign a different boost to the entire hypothesis than to the part, or to different disjunctive hypotheses that share the same disjunctive part that entails the evidence. Yet such synergies seem to have a place in the lore of confirmation. Kepler’s hypothesis HKep, that Mars orbits the Sun in a particular ellipse, gains some support from the evidence E of Tycho’s observations of Mars and the Sun. N. Narrowness requires us to accord just the same support on evidence E to the disjunctive hypothesis Hdisj = HKep ∨ H2 ∨ H3 ∨ . . . ∨ Hn, where H2, H3, . . ., Hn are hypotheses asserting other trajectories. As long as the hypotheses disjoined in Hdisj form an inchoate set, this seems reasonable enough. However, at the level of accuracy of Tycho’s data,24 HKep is also a disjunctive part of another hypothesis. If we restrict Newton’s theory of gravitation to two masses, one the size of the Sun and the other Mars, the resulting hypothesis HNew predicts a large number of possible orbits.25 The hypothesis HNew is a disjunction of hypotheses asserting them. The set disjoined is far from inchoate; its members are uniquely picked out as the set of orbits that satisfy Newton’s inverse square law of gravity for these masses. In effect, the hypothesis HNew just asserts that the orbit of Mars conforms to Newton’s law.

23 To anticipate the rejoinder, I fully expect that this example and most others can be accommodated in a Bayesian system by adding in enough distinctions, variables, likelihoods and priors, just as Ptolemy’s geocentric system was able to accommodate any celestial motion by adding in enough epicycles and equants. That did not mean, however, that he had the right theory.
The natural intuition is that HNew somehow expresses a deeper truth than Hdisj, which merely disjoins HKep with a haphazard collection of alternatives. So we might expect that the synergistic disjunction in HNew deserves more support on the evidence than the inchoate disjunction of Hdisj. N. Narrowness prohibits us from rewarding HNew for this synergy among its parts; it requires that the evidence E support Hdisj and HNew equally. The disparity becomes more striking the larger we conceive the set of haphazardly chosen orbits disjoined in Hdisj to be. The usual strategy, of course, is to attempt to reward HNew in advance by assigning much greater priors to the disjunctive sets of hypotheses delimited by simple differential equations, such as appear in Newton’s theory. However, no assignment of priors can serve this end. As long as N. Narrowness is preserved, Hdisj and HNew must be accorded the same support on evidence E, whatever their priors.

Once we discard the idea that any calculus of inductive inference must conform to B. Bayes property, we can begin to reflect upon what a replacement rule may bring. It may reward synergies; it may differentially reward virtuous and unvirtuous entailment of evidence; it may not be so dogmatic that assignments of complete certainty and disbelief are immovable; and it may be rich enough to admit true null states as priors.

24 To simplify the example, I adopt the fiction that Tycho’s data picks out just one orbit from each disjunctive set, and I neglect the motion of the Sun around the Sun–Mars center of mass that is entailed by Newton’s theory.

25 It predicts many more than the countably many disjuncts presumed by F. Framework. To circumvent this difficulty, define HNew–Kep as hypothesizing all the orbits admitted by Newton’s theory in this case, excluding HKep. Then HNew retains the disjunctive form HNew = HKep ∨ HNew–Kep.
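The ‘refute and rescale’ consequences discussed in Section 5.3 can be checked in the special case where degrees of confirmation are ordinary probabilities over a finite set of worlds. The Python sketch below is illustrative only: the ten ‘orbits’, the evidence set E and the hypotheses H_KEP, H_DISJ and H_NEW are hypothetical stand-ins, not a model of Tycho’s data, and the helper names are invented for the example. It checks three claims: hypotheses whose conjunctions with E coincide receive equal posteriors whatever their priors (N. Narrowness); every hypothesis that entails E is boosted by the same factor (Section 5.3.3); and a prior of zero, or of one for its negation, cannot be moved by evidence (Section 5.3.1).

```python
from fractions import Fraction
import random


def prob(dist, proposition):
    """Probability of a proposition, represented as a set of worlds."""
    return sum((dist[w] for w in proposition), Fraction(0))


def conditionalize(prior, evidence):
    """'Refute and rescale': zero the worlds outside the evidence, rescale the rest."""
    total = prob(prior, evidence)
    if total == 0:
        raise ValueError("evidence has zero prior probability")
    return {w: (p / total if w in evidence else Fraction(0)) for w, p in prior.items()}


def random_prior(worlds, rng):
    """A strictly positive prior with random rational weights."""
    weights = {w: Fraction(rng.randint(1, 100)) for w in worlds}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


def zeroed(prior, hypothesis):
    """Renormalized copy of the prior that gives the hypothesis probability 0."""
    kept = {w: (Fraction(0) if w in hypothesis else p) for w, p in prior.items()}
    total = sum(kept.values())
    return {w: p / total for w, p in kept.items()}


# Hypothetical toy model: ten candidate orbits for Mars, labelled 0..9.
WORLDS = set(range(10))
E = {0, 9}            # evidence compatible with orbits 0 and 9 only
H_KEP = {0}
H_DISJ = {0, 1, 2}    # H_KEP disjoined with a haphazard set of alternatives
H_NEW = {0, 5, 6}     # H_KEP disjoined with a 'lawlike' set of alternatives

rng = random.Random(0)
for _ in range(100):
    prior = random_prior(WORLDS, rng)
    post = conditionalize(prior, E)
    # Narrowness: only the part of a hypothesis compatible with E counts,
    # so H_DISJ and H_NEW end up with the same posterior whatever their priors.
    assert prob(post, H_DISJ) == prob(post, H_NEW) == prob(post, H_KEP)
    # Equal boosts: every hypothesis entailing E is rescaled by the same 1/P(E).
    for h in ({0}, {9}):
        assert prob(post, h) == prob(prior, h) / prob(prior, E)

# Dogmatism: a prior of 0 stays 0, and its negation's prior of 1 stays 1.
post = conditionalize(zeroed(random_prior(WORLDS, rng), H_KEP), E)
assert prob(post, H_KEP) == 0
assert prob(post, WORLDS - H_KEP) == 1
```

Exact `Fraction` arithmetic is used so that the equalities hold exactly rather than up to floating-point rounding.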
6 Real values

The properties developed so far are necessary properties if degrees of confirmation are to be probabilities. They are not sufficient. They do not preclude value sets that cannot be mapped one-one onto a closed interval of reals in a way that preserves ranking. The traditional counter-example (Jeffrey [1961], pp. 19–20) is a family of hypotheses Hx,y, with real-valued parameters x and y, where [HX,Y|Ω] > [Hx,y|Ω] just in case X > x or, if X = x, Y > y. While ingenious ‘Archimedean Axioms’ have been devised to bridge the gap, none seems as illuminating, in terms of fundamental ideas about inductive inference, as the direct statement of the gap itself:

R. Real Values. For any admissible propositions A, A′, B and B′, the set of values possible for degrees of confirmation [A|B] can be mapped one-one onto a closed set of reals such that the mapped real values satisfy f([A|B]) > f([A′|B′]) just in case [A|B] > [A′|B′].

The obvious limitation of a system with this property is that it cannot accommodate inference problems that require larger value sets, such as infinitely great or infinitesimally small degrees (or at least not without nonstandard reals). We can readily contrive problems that require such extensions. For example, consider the problem of picking a real number in [0, 1] ‘at random.’ That the number is in the interval [0, 0.5] is finitely more probable than that it is in the interval [0, 0.4], which is infinitely more probable than that it is in the discrete set {0, 0.1, 0.2}, which is finitely more probable than that it is in the set {0, 0.1}.

7 Sufficiency and independence

The properties F. Framework, A. Addition, B. Bayes property and R. Real values are necessary if degrees of belief are to be probabilities. That they are sufficient follows from theorems in Aczel ([1966], pp. 319–24). That is, they are sufficient in the sense that, for each connected region26 of the set of propositions, there exists a rescaling of the real values assigned to the degrees by R.
Real values such that the rescaled values obey the probability calculus.

The independence of A. Addition, B. Bayes property and R. Real values from one another is obvious. That independence is important here, since it is urged that we should implement these properties selectively, according to the problem at hand. There is some further independence of A. Addition and B. Bayes property from F. Framework. The most interesting is their independence from F1c. Transitivity, for that shows that A. and B. may obtain not just when the degrees of belief are not reals, but also when they are not even partially ordered. The consistency of A. and B. with a nontransitive value set is demonstrated by displaying an example that has all three.27

26 A connected region is a set of propositions such that for any two propositions V and W in the set, there exist other propositions C1, C2, . . ., Cn in the set such that all of V&C1, C1&C2, . . ., W&Cn are in the set.

8 Illustrations

It is urged here that we should not seek the one, true combination of properties that yields the universally true logic of induction. Rather, in accord with the material theory of induction (Norton [2003a], [2005]), we should invoke in each domain just those properties warranted by the material facts prevailing in that domain. So each domain will prove to have its own characteristic logic of induction. Some illustrations follow.

8.1 All properties retained

If the circumstances are governed completely enough by stochastic, physical laws, we will have sufficient material facts to warrant all the properties that comprise the probability calculus. Imagine, for example, that we randomly sample an atom of naturally occurring uranium and seek evidence for its half-life. The evidence is that it does not undergo radioactive decay over the period of a week. To what degree does that evidence confirm each of the three half-lives possible for this atom?
The known distribution of isotopes in naturally occurring uranium fixes the physical chances for our sampling each of them. By atoms in natural uranium, they are: U-234, 0.0054%; U-235, 0.72%; and U-238, 99.275%. These chances fix our prior probabilities that the atom is the corresponding isotope with the characteristic half-life. Those half-lives are: U-234, 244,500 years; U-235, 703,800,000 years; and U-238, 4,468,000,000 years. These half-lives, in conjunction with the rule of radioactive decay, give the physical chances for each isotope persisting for a week without decay. These physical chances provide the likelihoods that figure in the obvious, fully probabilistic analysis.

27 Values are pairs (r, θ) of reals, where 0 ≤ r ≤ 1 and 0 ≤ θ < 360. The quantity θ behaves like an angle variable whose value always remains in [0, 360), so two θ’s are added or subtracted modulo 360 (written ‘+m’ and ‘−m’ respectively). The ranking is defined by (r′, θ′) > (r, θ) when r′ > r or, if r′ = r and neither is 0 or 1, when 0 < θ′ −m θ < 180. Also, for any 0 < r < 1, (r, θ′) = (r, θ) if θ′ −m θ = 180. This ranking is intransitive: (0.5, 0) > (0.5, 240) > (0.5, 120) > (0.5, 0). The maximum and minimum values are (1, θ) and (0, θ), where, for these two cases, all (1, θ) are taken to be the same and all (0, θ) are taken to be the same, for all θ values. The addition operator ⊕ is implemented as (r′, θ′) ⊕ (r, θ) = (r′ + r, θ′ +m θ). The multiplication operator ⊙ is (r′, θ′) ⊙ (r, θ) = (r′·r, θ′ +m θ) when neither r′ nor r is 1; (r′, θ′) ⊙ (1, θ) = (r′, θ′); and (1, θ′) ⊙ (r, θ) = (r, θ). The operator ⊕ is strictly increasing in both arguments, as is ⊙, excepting, in the latter case, when either argument is (0, θ).

The scenarios imagined in Dutch book arguments give us another case in which we would use the full probability calculus.
If we are in a casino, gambling, such that all the conditions in those scenarios obtain, then we ought to reason about the outcomes of the various games by means of the probability calculus.

These two examples illustrate how the notorious problem of selecting the right interpretation of the probability calculus28 is greatly ameliorated in a material theory of induction. The appropriate interpretation will vary from domain to domain; we are absolved from the impossible burden of finding the one, universally correct interpretation that fits every case. In the case of sampling uranium, all the propositions over which we reason are related by physical chances governed by the probability calculus, so we are able to set our degrees of confirmation by those chances. An objective interpretation will fit these probabilistic degrees best. They represent something like the relative frequency of truth among many physical systems relevantly similar to the present one. In the case of the casino, however, the degrees have a very different meaning. They are now internal accounting factors that, if employed in the appropriate way, have the pragmatic value of protecting us from harm.

8.2 Bayes property only retained

A slight adjustment of the uranium sampling problem above produces a problem in which we will dispense with A. Addition and R. Real values. Instead of sampling atoms, imagine that we are given N atoms of some radioactive element of unknown half-life that we wish to determine. Our evidence is that over time t, n of the N atoms decay. By presumption, we have no idea of the half-life of the element. That is, we have no idea of the size of the time constant τ in the rule of radioactive decay, which tells us that the physical chance of decay of one atom over time t is

$$c(t) = 1 - \exp(-t/\tau) \tag{4}$$

and τ relates to the half-life t1/2 as t1/2 = τ ln 2.
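Equation (4) and the relation t1/2 = τ ln 2 are simple enough to state in code. As a hedged illustration, the Python sketch below also applies them back to the uranium sampling problem of Section 8.1, where all properties of the probability calculus are retained: priors come from the quoted isotope abundances and likelihoods from the chance of no decay within a week. The function names and the 365.25-day year are assumptions of the sketch, not part of the text.

```python
import math

WEEK_IN_YEARS = 7 / 365.25  # assumption: a Julian year of 365.25 days

# (natural abundance as a fraction of atoms, half-life in years), as quoted in Section 8.1
ISOTOPES = {
    "U-234": (0.000054, 244_500),
    "U-235": (0.0072, 703_800_000),
    "U-238": (0.99275, 4_468_000_000),
}


def time_constant(half_life):
    """Invert t_half = tau * ln 2 to obtain the time constant tau."""
    return half_life / math.log(2)


def decay_chance(t, tau):
    """Eq. (4): chance that one atom decays within time t, 1 - exp(-t/tau).
    math.expm1 keeps precision when t/tau is tiny."""
    return -math.expm1(-t / tau)


def posterior_given_no_decay(t=WEEK_IN_YEARS):
    """Bayes' theorem over the three isotopes, conditioning on no decay within t."""
    joint = {name: abundance * (1.0 - decay_chance(t, time_constant(half_life)))
             for name, (abundance, half_life) in ISOTOPES.items()}
    total = sum(joint.values())  # also absorbs rounding in the quoted abundances
    return {name: p / total for name, p in joint.items()}


if __name__ == "__main__":
    for name, p in posterior_given_no_decay().items():
        print(name, p)
```

Even for U-234 the chance of decay within a week is below 10^-7, so the posterior is practically the normalized prior: a week without decay barely discriminates among the three half-lives.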
If we were to attempt a probabilistic analysis, our prior probability would be the uniform prior over all values of τ, from 0 to infinity. That is, the probability density

$$p(\tau) = \text{constant} > 0 \tag{5}$$

This is an ‘improper prior,’ since it cannot be normalized to unit total probability. Many statisticians have been tempted to use such priors, since they can yield useful results. Yet they have been tormented by them, since they violate the probability calculus and only sometimes yield normalizable posterior probabilities (Rosenkrantz [1981], §4*.2). We are inclined to employ an improper prior precisely because the material facts of the inference problem do not call for its additivity. So the material theory of induction allows us to dispense with additivity.

28 For recent discussion, see Gillies [2000]; Galavotti [2005]; Mellor [2005]. Gillies ([2000], Chs. 8, 9) describes and advocates a pluralism over interpretations of the probability calculus.

Compare this with the case of sampling uranium above. Our uncertainty over the half-lives resulted from a sampling process governed by physical chances. So, in conforming our degrees of confirmation to the chances, these degrees had to be additive. In the new problem, our uncertainty does not result from any physical process governed by chances. It is just plain ignorance. Therefore, as we saw in the discussion of A. Addition above, we should not require our prior beliefs to be additive probabilities and, therefore, need not be troubled by the impropriety of the prior (5).

Since the principal properties of A. Addition and B. Bayes property are independent, all other aspects can remain essentially the same. We will still learn about the half-lives from observed decays, governed by the physical chances of (4). So, as before, we will still expect the dynamics of confirmation to be governed by B. Bayes property, with the likelihoods provided by physical chances.
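The remark that improper priors only sometimes yield normalizable posteriors can be seen in this very problem. The Python sketch below is an illustration under stated assumptions (the function names and the choice N = 10, t = 1 are hypothetical): it multiplies the binomial chance of n decays among N atoms by the flat density (5) and integrates over ranges of large τ. For large τ the likelihood falls off roughly like τ^(−n), so the posterior mass is finite only when n ≥ 2; with n = 1 each further decade of τ adds about the same mass, while with n = 3 the contributions quickly become negligible.

```python
import math


def likelihood(tau, n, N, t):
    """Binomial chance of n decays among N atoms in time t, given time constant tau."""
    p = -math.expm1(-t / tau)  # single-atom decay chance 1 - exp(-t/tau), eq. (4)
    return math.comb(N, n) * math.exp(-(N - n) * t / tau) * p ** n


def flat_prior_mass(n, N, t, lo, hi, du=1e-3):
    """Unnormalized posterior mass on [lo, hi] under the flat 'prior' (5), i.e. the
    integral of likelihood(tau) d(tau). Substitutes u = ln(tau), so d(tau) = tau du,
    and applies the trapezoid rule in u."""
    a, b = math.log(lo), math.log(hi)
    steps = max(1, int((b - a) / du))
    h = (b - a) / steps
    total = 0.0
    for k in range(steps + 1):
        tau = math.exp(a + k * h)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * likelihood(tau, n, N, t) * tau
    return total * h


if __name__ == "__main__":
    # Mass contributed by successive decades of large tau, for N = 10 atoms and t = 1.
    for n in (1, 3):
        tails = [flat_prior_mass(n, 10, 1.0, 10.0 ** k, 10.0 ** (k + 1)) for k in (3, 4, 5)]
        print(n, tails)
```

The substitution u = ln τ is a design choice that lets one fixed step size cover many decades of τ evenly.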
To conform with the countable set of propositions supposed in F. Framework, we will replace the continuous range of τ values with a countable set of intervals. The propositions τi assert that the time constant τ for the element lies in the small interval iΔτ to (i + 1)Δτ, for small Δτ and i = 0, 1, 2, . . . Let E be the evidence that n of the N atoms decay in some fixed time t. Bayes’ theorem (3) asserts

$$[\tau_i|E] \odot [E|\Omega] = [E|\tau_i] \odot [\tau_i|\Omega] \tag{6}$$

For the remaining analysis, we will also dispense with R. Real values, since it is not needed for the outcome. To express our ignorance over the value of τ, we will assume the prior [τi|Ω] has some fixed value greater than [Ø|Ω], independent of τi. Similarly, we will assume only that the expectedness has some value [E|Ω] > [Ø|Ω].29 The physical chances of n decays among N atoms over time t when the time constant τ is within the interval τi are approximated arbitrarily well for small enough Δτ by:

$$c(E|\tau_i) = \frac{N!}{n!\,(N-n)!}\,\bigl(\exp(-t/\tau_i)\bigr)^{N-n}\bigl(1 - \exp(-t/\tau_i)\bigr)^{n}\,\Delta\tau \tag{7}$$

The likelihood [E|τi] will be set by these physical chances in the sense that [E|τi] will be a strictly increasing function of c(E|τi). It is a familiar result for the binomial expression of (7) that, for t, n and N fixed, c(E|τi) has a maximum value when exp(−t/τmax) = (N − n)/N. That is,

$$\tau_{\max} = \frac{t}{\ln\bigl(N/(N-n)\bigr)} \tag{8}$$

So the likelihood [E|τi] has a maximum at τmax. Finally, since the expression [τi|E] ⊙ [E|Ω] in Bayes’ theorem (6) is invertible in the posterior [τi|E], it follows that the posterior [τi|E] has a maximum at τmax. That is, on the evidence E, the time constant with the highest degree of confirmation is τmax. This is the result we would expect. For example, if N/2 atoms decay in time t, then we would expect t to be a good estimator of the half-life t1/2 = τ ln 2. In this case, (8) becomes t = τmax ln 2.

29 We choose values other than [Ø|Ω] for these two quantities to preserve the invertibility of ⊙.
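Because only the ranking of likelihoods matters, the location of the maximum in (8) can be checked by brute force over the countable set of intervals τi. The Python sketch below is a hedged illustration, not part of the original analysis: it uses ordinary real numbers as stand-ins for the likelihood ordering (the text itself dispenses with R. Real values), drops the width factor Δτ of (7), which cannot move the maximum, and its function names, grid width and upper limit are assumptions.

```python
import math


def log_chance(tau, n, N, t):
    """Log of the binomial chance in (7), omitting the constant width factor:
    n of N atoms decay within time t given time constant tau."""
    p_decay = -math.expm1(-t / tau)  # 1 - exp(-t/tau), eq. (4)
    return math.log(math.comb(N, n)) - (N - n) * t / tau + n * math.log(p_decay)


def tau_max(n, N, t):
    """Closed form (8): tau_max = t / ln(N / (N - n)); requires 0 < n < N."""
    return t / math.log(N / (N - n))


def tau_max_on_grid(n, N, t, d_tau=1e-3, tau_limit=20.0):
    """Score each interval [i*d_tau, (i+1)*d_tau) at its midpoint and return the best
    midpoint. With the same prior for every interval, the ranking of posteriors in
    (6) follows the ranking of likelihoods, so this is also the posterior maximum."""
    best_mid, best_score = None, -math.inf
    i = 0
    while (i + 0.5) * d_tau < tau_limit:
        mid = (i + 0.5) * d_tau
        score = log_chance(mid, n, N, t)
        if score > best_score:
            best_mid, best_score = mid, score
        i += 1
    return best_mid


if __name__ == "__main__":
    print(tau_max_on_grid(5, 10, 1.0), tau_max(5, 10, 1.0))
```

For n = N/2 both values come out close to 1/ln 2 ≈ 1.443 when t = 1, matching t = τmax ln 2. Since the likelihood’s maximizer does not depend on how τ is parametrized, rescoring the same chances on a grid in λ = 1/τ would pick out 1/τmax instead.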
It is also evident from its derivation that (8) is a maximum likelihood estimator of τ. Finally, the analysis can be repeated, replacing τ with a function of τ. For example, we can re-express the law of radioactive decay (4) using λ = 1/τ and adopt a prior indifferent to values of λ. We arrive at the same estimator τmax of (8).

8.3 Induction without additivity and Bayes property

The last example gave a principled reason for dispensing with the additivity of the prior. Otherwise the analysis was not so different from the familiar probabilistic one. That is so, since we were inferring inductively about propositions governed by physical chances. We recover an analysis that is very different from the familiar ones if we consider systems whose uncertainties are not governed by physical chances. Many of our physical theories, including Newtonian physics, allow indeterministic systems. These are systems whose present states do not fix their future states; moreover, and this is the key fact here, our physical theories provide no physical chances for the different futures. They tell us only that they are possible (Alper et al. [2000]; Norton [1999]).

One of the simplest Newtonian examples is ‘the dome,’ described more fully in Norton ([2003b], §3). A point mass sits motionless at the apex of a dome with circular symmetry and is able to slide frictionlessly over it. See Figure 2. If the shape of the surface is chosen appropriately, Newton’s equations admit solutions in which the mass remains at rest indefinitely at the apex, or in which the mass remains at rest for some arbitrary time T and then spontaneously accelerates in any radial direction.30

30 An appropriate shape is $h = (2/3g)\,r^{3/2}$, where r is the radial distance in the surface of the dome and h the vertical distance below the apex; g is the acceleration due to gravity. For a unit mass on the surface, Newton’s laws entail an outwardly directed acceleration field, $F = d^2r/dt^2 = r^{1/2}$.
This equation is solved by r(t) = 0, for all t; and by a spontaneous excitation at T: r(t) = 0, for t ≤ T, and $r(t) = (1/144)(t-T)^4$, for t ≥ T.

Figure 2. The dome: an indeterministic system.31

How are we to represent our uncertainty over the time T of spontaneous acceleration? The natural answer is to treat the problem as analogous to radioactive decay and assume that the timing of spontaneous acceleration is governed by the law of radioactive decay (4). The difficulty is that this law requires a time constant τ. A τ of a millisecond has a very different physical meaning, as regards whether the excitation is likely to happen soon, than does a τ of a millennium. Nothing in the physics supplies such a time constant. Indeed, any proper probability distribution must employ some sort of parameter to govern the rate at which the integrated distribution approaches unity for large times. Yet no parameters are provided by the physics. Or one may try an improper distribution that merely sets the probability of spontaneous acceleration proportional to the size of the time interval in question. That still goes beyond the physics, since it asserts that spontaneous motion is twice as probable in a time interval that is twice as large. The physics knows nothing of this. It merely asserts that spontaneous motion in both time intervals is possible. There is no notion of ‘twice as possible.’ The attempt to represent our uncertainty with a probability measure imposes structure on the problem that is not present in the physics. Therefore, according to a material theory of induction, we cannot use probabilities to represent our uncertainty. What should we use? The physics gives us the structure. It assigns three values to propositions concerning the system: impossible, possible and necessary.
Just as we conform our degrees of confirmation to physical chances when they are present, we should take these three as the values possible for our degrees of confirmation. We shall abbreviate them as ‘imp,’ ‘poss’ and ‘nec.’ They are assigned to the propositions E(T1,T2), with T1.

Norton, J. D. [2005]: ‘A Little Survey of Induction’, in P. Achinstein (ed.), Scientific Evidence: Philosophical Theories and Applications, Johns Hopkins University Press, pp. 9–34.

Norton, J. D. [forthcoming]: ‘Einstein’s Special Theory of Relativity and the Problems in the Electrodynamics of Moving Bodies that Led him to it’, in M. Janssen and C. Lehner (eds), Cambridge Companion to Einstein, Cambridge: Cambridge University Press.

Norton, J. D. [unpublished]: ‘Ignorance and Indifference’, available online at .

Rosenkrantz, R. D. [1981]: Foundations and Applications of Inductive Probability, Atascadero, CA: Ridgeview.

Savage, L. J. [1972]: The Foundations of Statistics, 2nd revised edn, New York: Dover.

Shafer, G. [1976]: A Mathematical Theory of Evidence, Princeton: Princeton University Press.

Smith, C. A. B. [1961]: ‘Consistency in Statistical Inference and Decision’, Journal of the Royal Statistical Society Series B, 23, pp. 1–25.

Tversky, A. and Kahneman, D. [1982]: ‘Judgments of and by Representativeness’, in D. Kahneman, P. Slovic and A. Tversky (eds), Judgment Under Uncertainty: Heuristics and Biases, Cambridge: Cambridge University Press, pp. 84–98.