Error Statistics and Duhem's Problem*

Gregory R. Wheeler†

Departments of Philosophy and Computer Science, University of Rochester

*Received May 1999; revised April 2000.

†Send requests for reprints to the author, Department of Philosophy, 534 Lattimore Hall, Rochester NY 14627; email: wheeler@philosophy.rochester.edu. Previous versions of this paper were presented at Cornell University, University of Rochester, M.I.T., and University of Lethbridge. The author wishes to thank Prasanta Bandypadhyay, Earl Conee, Heidi Dankosh, Joe Halpern, Deborah Mayo, and especially Henry Kyburg and an anonymous referee for their comments.

No one has a well developed solution to Duhem's problem, the problem of how experimental evidence warrants revision of our theories. Deborah Mayo proposes a solution to Duhem's problem en route to her more ambitious program of providing a philosophical account of inductive inference and experimental knowledge. This paper is a response to Mayo's Error Statistics (ES) program, paying particular attention to her response to Duhem's problem. It turns out that Mayo's purported solution to Duhem's problem is very significant to her project, for the epistemic license claimed by ES and the philosophical underpinnings of her account of experimental knowledge depend on this solution. By introducing the partition problem, I argue that ES fails to solve Duhem's problem and therefore fails to provide an adequate account of experimental knowledge.

1. Introduction. Duhem's problem arises when we have experimental evidence that is contrary to a theory's predictions. Given this situation, we have reason to believe that at least one of the statements of the theory plus auxiliaries is false: the conjunction of theoretical statements, the auxiliaries, and the experimental evidence statement is inconsistent. But do we have adequate grounds for determining which statement among the set is to blame? Pierre Duhem argued that such grounds are not found in laboratory notebooks per se but rather in the good sense of their authors. The very nature of experimental evidence renders its bearing on theory essentially opaque. So, according to Duhem, treating experimental evidence as if it determined which statement is to blame is mistaken. Most commentators have found this holistic account less than satisfactory. If we assume that decisions of this kind are rational and that experimental evidence plays a significant role in their being rational, then Duhem's problem is just the problem of determining how experimental evidence warrants assigning error to one statement but not another.

Recently Deborah Mayo (Mayo 1996b) has proposed a novel solution to Duhem's problem, a proposal that explicitly rejects the claim of evidential opacity and thus whose burden it is to show "that there are good grounds for localizing the bearing of evidence" (Mayo 1996b, 102). Mayo's proposal stems from her Error Statistics account of experimental knowledge (ES). Her idea is that in many experimental situations Duhem's problem can be resolved because not all alternative hypotheses are at once susceptible to revision.
We say "many" situations and not "all" since not every occasion of disconfirming evidence does Duhem's problem make: sometimes the reasonable thing to do is to collect more evidence. Mayo accounts for this easily enough by appealing to a fundamental distinction found in classical statistics between two kinds of cases: (i) those in which there are positive grounds for attributing the error to some statement h, and (ii) those in which there are inadequate grounds for attributing the error to statement h. So what we need to know is what it is to have good grounds for attributing error to a statement.

Mayo adopts Karl Popper's slogan that we learn the most about hypotheses which are severely tested. But rather than propose an update to Popper's falsificationism, Mayo instead grounds her notion of severity in classical statistics. Under ES, severity is a property of statistical method that ensures that a test of h is a good one. Curiously, however, severity "attaches to a particular hypothesis passed" (Mayo 1997a, 250), thereby granting the hypothesis epistemic warrant. By identifying "'having good evidence for h (or just having evidence for h)' and 'having a good test of h,'" Mayo then identifies "whether e counts as good evidence for h . . . [with] whether h has passed a good test with e" (Mayo 1996b, 179).1 An hypothesis h then is acceptable just to the extent that it passes a severe test. So of the cases that call for a solution to Duhem's problem we can expect each to satisfy this severity requirement.

1. I substitute 'h' for Mayo's 'H'.

What then constitutes a good (i.e., severe) test? According to Mayo, a severe test T is one such that "there is a very low probability that test procedure T would yield such a passing result, if hypothesis h is false" (Mayo 1997a, 248). So, in type (i) cases we need a good test that determines whether or not an auxiliary statement, A, is to blame for the contrary experimental result, e'. Given e', ES says we may consider a statement for blame only when a test is run which measures the probability of accepting the statement when it is false (i.e., measures the probability of committing a Type-I error). So, under ES a statement (hypothesis) h is shown to be in error as a result of e' only if the alternative hypothesis A has been shown to pass a severe test (Mayo 1996b, 108). Since often there is more than one alternative hypothesis, we may generalize ES's severity condition for type (i) cases as follows:

Severity Condition: Hypothesis h is shown to be in error as a result of e' only if: given the set of auxiliary statements Γ, all An ∈ Γ have been shown to pass severe tests.

Thus we have a necessary condition for the ES solution to Duhem's problem. In the next two sections I argue that ES cannot satisfy this condition. Hence, if it is not the case that all An have been shown to pass severe tests, then h is not shown to be in error as a result of e'. And if h is not shown to be in error as a result of e', then there are inadequate grounds for attributing the error to h. But any case in which there are inadequate grounds for attributing error to some hypothesis is a case of type (ii) and hence is not a solvable Duhem case. The upshot of our argument is that ES renders all cases type (ii) cases since it does not include an adequate account of when enough evidence is enough.
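For reference in what follows, the severity requirement and the condition above can be put schematically. The rendering and notation below are mine, not Mayo's; the semicolon marks the error-statistical, non-Bayesian reading of "if h is false":

```latex
\[
\textit{Severity:}\quad h \text{ passes a severe test } T \text{ with } e
\;\text{only if}\;
\Pr\bigl(T \text{ yields such a passing result}\,;\, h \text{ is false}\bigr)
\text{ is very low.}
\]
\[
\textit{Severity Condition:}\quad e' \text{ shows } h \text{ to be in error}
\;\Longrightarrow\;
\text{for every } A_n \in \Gamma,\ A_n \text{ has been shown to pass a severe test.}
\]
```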
2. The Partition Problem. Our interest in this section is to see why the ES solution to Duhem's problem is inadequate. We observed that if ES treats all cases as cases in which there are inadequate grounds for assigning error to a statement, then we are left wondering how experimental evidence warrants rejecting one statement instead of another. In other words, if ES treats all cases as type (ii) cases, then we are left precisely with Duhem's problem. So, what reasons do we have to think that ES treats all cases as type (ii) cases?

To begin, one may suspect that the severity condition invites a kind of third-man argument. We'll call this the testing regress. One could add auxiliaries indefinitely to a typical set Γ of given auxiliaries, thereby introducing an indefinite series of tests.2 Indeed, without restrictions on the auxiliary statements in Γ there are, in principle, an infinite number of tests. A recipe for inflating Γ in this manner is to randomly pick declarative sentences out of the language, without replacement, construct an auxiliary stating it doesn't affect the test hypothesis, and then test whether the factor(s) denoted by each such construction is responsible for an error. But, of course, if there are infinitely many tests then not all An ∈ Γ could be shown to pass a severe test. So there would be inadequate grounds for blaming h for e' and Duhem's problem would remain.

2. Mayo, adopting Suppes' (Suppes 1969) notion of a model, presents ES "as a series of conceptual representations or models ranging from the primary scientific hypothesis or questions . . . to the nitty gritty details of the generation and analysis of data" (Mayo 1996b, 128). Notice that this maneuver doesn't resolve the testing regress; nothing in Suppes' sketch of data models precludes there being an infinite series of such models, given the task Mayo assigns for them.

As formulated, the testing regress objection bears a similarity to what Mayo calls the alternative hypothesis objection, an objection she contends only bears against Popper's account of severe testing, not hers. For Popper, h passes a severe test with e only if all so-far-considered hypotheses have been tested and each entails ¬h. But this objection doesn't apply to ES, since for Mayo a severe test of h must, with high power, probe the ways that h can err, and need not test an alternative h'. Severity, then, is a property that is always assessed within some context or theory. As Mayo observes:

Satisfying the severity requirement demands that we make our questions appropriately small or local. . . . By using simple local contexts in which the assumptions may be shown to hold sufficiently, it is possible to ask one question at a time. (Mayo 1997a, 254. Italics added and deleted.)

Notice that these methods correspond to the goal of satisfying the experimental assumptions . . . Then there is an array of extraneous factors assumed to be either irrelevant to the effect of interest or satisfactorily controlled. The correctness of this assumption can, in principle, come up for questioning after-the-trial. (Mayo 1996b, 144)

Mayo's proposal then is this: localize test questions by sorting out what is relevant for testing from what is safe to assume is irrelevant. This will reduce the number of factors to a manageable size where ES can do its work, that is, where we can severely test single hypotheses. Popper's account fails, then, precisely because it does not accommodate the hierarchical structure of experimental inquiry.
Presented with an anomaly, the hypothetico-deductive method leaves a disjunction of negated statements: the negation of each An ∈ Γ, plus ¬h. ES works, we're told, because it imposes a structure on the statements in Γ, sorting them into relevant models simple enough for error statistical methods to work out a solution to Duhemian cases. So, the severity condition in play imposes a structure on Γ:

Structured Severity Condition: Hypothesis h is shown to be in error as a result of e' only if: given a finite set of relevant auxiliaries {R1, . . . , Rk} ⊆ Γ, all Rn ∈ {R1, . . . , Rk} have been shown to pass severe tests.

Underlying this proposal, however, is the claim that these structures rest on good grounds. In other words, the error probabilities that underpin localized experiments must themselves be tested or shown to hold, even if only in principle. This last claim is essential for establishing ES's normative-epistemic credentials and is the target of my criticisms. It is essential because if it turns out that this structure cannot be accounted for within ES, then the claim that evidence isn't opaque is undermined: rejecting statements becomes more than a matter of evidence and method, at least as those notions are construed and employed within ES.

The key then is the structure of Γ. According to Mayo, by demanding that each test be specific, we are forced to sort out what is relevant and, hence, what are likely sources of error. Each test then has a set of extraneous factors that we ignore or control for, and a smaller "relevant" set that we pay close attention to. This latter set of factors is just what our experiment is about; they are the target properties that are measured, examined, and from which we learn. For example, Adams and Laplace's test of the predicted acceleration δ of the moon involved a set of auxiliary statements, including:

A1: tidal friction is not sufficient to affect measured lunar acceleration more than δ+n;
A2: instrument X's margin of error is not sufficient to produce measurements of lunar acceleration more than δ−n;
A3: seasonal movements of migratory birds do not affect measured lunar acceleration more than δ±n;

and so on. These three statements are a subset of the set of auxiliaries Γ for Adams and Laplace's test. The factors in this example are four target properties: tidal friction (of some magnitude), instrument accuracy, lunar acceleration, and collective bird force.

To avoid the problems which beset Popper, Mayo's proposal is to assume that most of the auxiliaries in Γ are about properties that are irrelevant to the hypothesis under test (like bird force and, in the original experiment, tidal friction), and so may be ignored. This leaves a few auxiliaries that are controlled (like instrument error) and the test hypothesis involving the factors we are interested in. In the face of disconfirming evidence, we may have a hunch that the tidal friction auxiliary is a better hypothesis to reject than the bird-force auxiliary, but the promise of ES is the claim that there is an empirical method for justifying this preference; that is, that our decision is grounded by evidence and method alone.

But notice what is required to fulfill this promise. Each test carves up the auxiliaries An ∈ Γ into auxiliaries to test and auxiliaries to ignore. So, associated with a test is a particular partition of Γ, say the ith of infinitely many possible partitions, that fixes which auxiliaries are to be tested and which to be ignored.
The ignored auxiliaries make up that test's ceteris paribus condition. Fixing this partition is crucial to determining which of the An will be tested and, therefore, which auxiliaries are candidates for "bearing the evidence" e'. For ES's promise to hold it needs to test each of these partitions, or at least be able to in principle. But notice that after-trial testing of the partition's placement (i.e., which of the Γi is the right one) is not possible, even in principle, since to try invites the testing regress. That is, if Γ is infinite in breadth, then so too are there infinitely many possible divisions of auxiliaries into those to test and those to ignore. So, if Γ is infinite then ES cannot get started, even in principle. Since passing a severe test depends on selecting the right auxiliaries to test and the right ones to ignore, the ES version of Duhem's problem is simply this: on what grounds are we justified in fixing the partition of a test's auxiliaries one way rather than another? For convenience, let's call this the partition problem.

3. When Γ Is Finite. It is important to realize that the partition problem does not depend on Γ being an infinite set of statements. Even if we suppose that Γ is finite, the number of modalities generated by even toy experiments is sufficiently large to introduce the partition problem. To see this we'll look closely at an example Mayo borrows from R. A. Fisher.

Suppose there is a woman who claims to distinguish by taste alone whether tea or milk is added first to a mixture of tea and milk. Suppose we are interested in testing this claim. Let h be: Lady can discriminate order by taste, and let the null hypothesis h0 be: Lady cannot discriminate order by taste. Fisher reasoned that someone who failed to discriminate the order by taste would do no better than chance at determining the correct order of the mixture. Thus, the question at hand (i.e., whether she has the ability) is reduced to considering two hypotheses h and h0. The binomial chance model is assumed to accurately model the results of her failing to have the ability. So, h is confirmed or "warranted" to the extent that the experimental record of her correct guesses differs significantly from the results of flipping a fair coin. We infer whether her guesses are significantly different from flips of a fair coin from probability theory. Suppose we prepare 5 teacups for her to sample. This creates 2^5, or 32, possible outcomes. The probability of choosing the correct milk-tea order by chance in all 5 cups is just the probability of picking one of the 32 possible sequences, or .03125. This probability is the probability of committing a Type-I error, i.e., the probability of accepting h (rejecting the null h0) when h is false.3

3. In a five-cup case, let c be the number of cups classified correctly. The probability of guessing c correctly is given by (5 choose c)/2^5. We reject the null if c = 5, since the probability of c = 5 is 1/32 if h0 is true.
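The same arithmetic can be set out explicitly. The following minimal sketch (in Python; the code and the function name prob_correct are my own illustration, not Fisher's design or anything in Mayo's text) computes the probability of a passing result, all five cups classified correctly, when h is false:

```python
from fractions import Fraction
from math import comb

CUPS = 5  # five cups, each guessed milk-first or tea-first

def prob_correct(c, n=CUPS):
    """Probability of exactly c correct classifications out of n
    if the lady is merely guessing (each cup an independent 1/2 chance)."""
    return Fraction(comb(n, c), 2 ** n)

# Probability of the passing result (all five correct) when h is false,
# i.e., the Type-I error rate of the rule "reject h0 only if c = 5".
alpha = prob_correct(CUPS)
print(alpha, float(alpha))   # 1/32  0.03125

# On the severity requirement quoted above, the test counts as severe for h
# just because this probability of passing "by chance" is so low.
```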
But notice an assumption that we are making to get this far in the example. Mayo writes that "in order for the comparison offered by the statistical link in the experimental model to go through, the assumptions of the experimental model must hold sufficiently in the actual experiment" (Mayo 1996b, 136). One assumption that must hold is that the subject isn't tipped off by something other than the taste of the samples. Since we are measuring the lady's ability to discriminate by taste, our confidence that we are only exposed to a 1 in 32 chance of her making the right choice all five times and yet not having the ability to do it by taste turns on this assumption holding. So, the very idea of a severe test is predicated on the assumption that the power of our test is quite high. Yet, on what grounds do we know that it is?

As a first precaution, we might wish to randomize the order of the treatments so that we can avoid giving clues to the subject. We might begin by randomizing the order of the milk-tea mixture for each treatment, alternating between milk first and tea first. We may even wish to randomize (or standardize) the presentation of the cups too, in case there is an ordering of the cups' masses or rim thickness that correlates with the mixture order. Notice that what we are doing is controlling possible factors that may reduce our confidence in our probability assignment for Type-I errors. We've controlled for the possibility that the experimenter knowing the order of the mixture influences the subject's performance, and the possibility that a non-random order of the teacups may give a clue of the order to the subject, respectively. What we are articulating is the class of auxiliaries to test and those to let pass. Specifically, we are describing the class of controlled factors that have corresponding auxiliaries in some ith partition of Γ: (Γi). Hence, implicitly we are fixing a partition.

In designing our experiment we make judgments about what to include in the class of tested auxiliaries and what to push into our ceteris paribus condition. For instance, milk-tea mixture order and stirring are to be controlled for, randomizing the cups before presenting them to the subject might be a borderline case, and the make of the china most likely isn't considered a serious candidate at all. But how do we make these judgments about what to test and what to regard as extraneous? We might be tempted to cite our "good sense" or previous experience, if not for remembering Duhem's own solution to similar puzzles in the philosophy of experimental physics a century ago. That is, in so far as our previous experience can be codified into an empirical theory, we may ask on what grounds we accept it.

The upshot is that even if we treat Γ as finite for toy experiments like tea tasting we can easily inflate it to a size that demands a partition. Yet once Γ is partitioned, we then create a set of n partitions Γ1, . . . , Γn and once again are faced with the problem of determining which partition to settle on.

4. A Partial Solution? Mayo's solution to Duhem's problem fails because it depends on a given structure of the set of auxiliary statements that itself can't be justified by ES methods. Under ES we haven't good grounds to prefer one partition over another, and so haven't good grounds for considering contrary evidence to count against one hypothesis over another. But even though we don't have a full solution to Duhem's problem, we might wonder whether ES provides a "partial" solution to the problem. Suppose we simply accept a certain partition as a matter of convention. ES might provide us with a means to test a limited number of viable alternatives against the current partition, thereby giving us some empirical evidence that warrants selecting one over the other.
While not solving Duhem's problem, this conventionalist approach might account for inductive practices within some agreed upon domain of inquiry; we might find solace knowing whether we can empirically compare our current theory to at least some others and have an empirical basis for evaluating the merits of alternatives vis-à-vis the current partition.4

4. Larry Laudan sketches a similar approach for ES in (Laudan 1997). See also (Kyburg 1990).

Let's suppose we are given a particular partition of auxiliaries, Γi. What does accepting Γi tell us? Γi is a set of statements, after all, yet our interest lies in the factors those statements denote. Do we have the resources to compare Γi to Γj? It turns out that if we're to consider a revision, even when given a partition of Γ, we still must construct a test akin to testing all n partitions of Γ. Roughly speaking, to compare Γi and Γj we're forced to consider a test that eliminates any advantage accepting a particular structure of Γ gives. Simply accepting Γi doesn't provide us with enough information to effectively use ES to evaluate an alternative partition.

To see the problem let's return to the tea tasting example. To compare the ith and jth partition of Γ we first need to see what we know from starting with Γi. Suppose that the ith partition includes in the class of untested auxiliaries A4: Using city tap water is not correlated with the subject recording correct responses better than chance. In considering an alternative, Γj, how do we evaluate Γj if A4 ∈ Γi? Since the city-water factor is under consideration we might wish to subject A4 to test. Suppose we do and there are inadequate grounds for rejecting A4. May we then infer that the city-water factor is not statistically relevant to the lady's ability to respond correctly better than chance? No, not as ES stands. The reason is that to do so would be to conclude that the city-water factor is independent of all other factors and, hence, not a constituent in a multi-factor effect. By accepting A4 we accept that the city-water factor alone isn't correlated with the subject performing better than chance, not that it has no effect at all. But suppose circumstances are such that it does affect the subject's performance, but only if the experiment uses unpasteurized milk and the tea kettle is brought to a rolling boil before pouring. In such circumstances it isn't the case that the city-water factor is irrelevant to the subject's performance. Yet, without keen theoretical knowledge that extends beyond merely accepting Γi, ES methods alone couldn't help us with detecting this multi-factor effect. So long as we follow Mayo's recommendation of equating "having good evidence" with "having a good test," we are forced to test each auxiliary in Γ that pairs the city-water factor with all permutations of other factors. But this is unreasonable, for among the set of auxiliaries to test is a superstatistic that treats all factors as controlled and whose set of untested auxiliaries is the empty set. Leaving aside the computational expense and incomprehensible size of such a statement, such a test renders ES epistemologically vacuous: for, again, it strips ES of the structure it needs to target evidence.
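To get a feel for the scale of the problem, consider a toy count. The sketch below is mine (in Python), and the factor list beyond the few factors mentioned above is hypothetical; I read "all permutations of other factors" as every combination of the remaining factors, since order is irrelevant to which factors are involved. Even with only a dozen candidate factors, the auxiliaries pairing the city-water factor with some combination of the others already number in the thousands:

```python
from itertools import combinations

# Toy, partly hypothetical list of candidate factors for the tea-tasting case.
factors = ["city water", "unpasteurized milk", "rolling boil", "cup mass",
           "rim thickness", "stirring", "cup order", "make of china",
           "steeping time", "water temperature", "milk fat", "time of day"]

target = "city water"
others = [f for f in factors if f != target]   # 11 remaining factors

# Each auxiliary of the form "the city-water factor together with the factors
# in S is not correlated with above-chance performance" is fixed by a subset S
# of the remaining factors (S may be empty, giving A4 itself).
auxiliaries = [frozenset({target, *s})
               for r in range(len(others) + 1)
               for s in combinations(others, r)]

print(len(auxiliaries))   # 2**11 = 2048 factor combinations involving city water
```

Each additional candidate factor doubles this count, and nothing in ES itself tells us where the list of candidates stops.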
One might suggest that we group the auxiliaries into "stable factor sets" in an attempt to represent previous experience. Suppose we determine that our n-factor auxiliary is not statistically relevant, but we're curious about an n + 1 factor chain. Couldn't we cut down on the number of total statements by grouping together auxiliaries denoting factors that have proven steady losers? No; not as ES stands. Even if the auxiliary testing the set of factors {f1, f2, . . . , fn} is not statistically relevant to the subject reporting correct guesses, we are without "good grounds" to infer that {f1, f2, . . . , fn, fn+1} is also not statistically relevant without testing the auxiliary denoting that set. Notice, too, that "good grounds" isn't monotonic either. In other words, even if {f1, f2, . . . , fn} is found not statistically relevant we couldn't infer that {f2, f3, . . . , fn} isn't statistically relevant too without testing that factor chain. So long as good grounds is akin to a good test, we haven't a viable way to navigate the factor space and actually learn from error. The upshot is that accepting Γi doesn't amount to the kind of knowledge we need to guide our use of ES methods.

What is important to see is that we are forced not only to accept a ceteris paribus condition to test anything under ES, but must also rely on a robust knowledge base to direct those tests. The very idea of a severe test is predicated on having a very rich body of empirical knowledge that itself is warranted by means other than those provided by ES. To propose ES as an account of experimental knowledge, then, is to have things turned around. To the extent that ES solves Duhem's problem, even partially, it does so by relying heavily on a rich body of knowledge that can't be accounted for by ES methods. It is the great experimentalist who knows how to use her limited resources and theoretical knowledge to probe the factor space to maximize her chances of learning about the system under study. ES simply fails to give a philosophical account of how this is done.

5. The Problem for ES. What is philosophically attractive about ES is its epistemic promise. Mayo's account is proposed as an account of how experimental knowledge claims are warranted. The crux of ES is Mayo's notion of a severe test. But the ES notion of a severe test fails to do the epistemic work required for solving Duhem's problem. We direct the tools of ES in precisely the manner that Duhem's problem concerns. In the end, ES describes such inferences and fails to explain the grounds for our preferences. That we in fact reduce the size of the factor space and in fact seem to target evidence is not in dispute: what we want, and ES leaves wanting, is an account of how this is done.

In closing, note that this failure presents a pressing problem for ES in general. For Mayo appeals to a version of C. S. Peirce's thesis that inductive methods are "self-correcting" in order to justify ES methods.

By developing my view of Peirce's error-correcting justification of induction I will . . . be developing the justification I need for error statistical methods in science. The justification for these methods lies in their ability to control error probabilities, hence sustain learning from error, hence provide for the growth of experimental knowledge. (Mayo 1996b, 413)

Yet, necessary for Mayo's version of Peirce's thesis is that "the [test] method should be able to detect its own errors in the sense of checking its own assumptions . . . and it should be able to correct violations or 'subtract them out' in the analysis" (Mayo 1996b, 421). But this, of course, is precisely what ES cannot do.
The assumptions that distinguish good tests from bad are precisely those that cannot be checked by severe tests. In so far as an ES test identifies a statement to reject, it does so because of the wits of its designer, not the features of her test method.

REFERENCES

Ariew, Roger (1984), "The Duhem Thesis", British Journal for the Philosophy of Science 35: 313-325.
Ariew, Roger and P. Barker (eds.) (1996), Pierre Duhem: Essays in the History and Philosophy of Science. Indianapolis: Hackett Press.
Duhem, Pierre ([1906] 1954), The Aim and Structure of Physical Theory. Reprint. Translated by P. P. Wiener. Originally published as La Théorie Physique. Son Objet, et sa Structure (Paris: Marcel Rivière & Cie). Princeton: Princeton University Press.
Fisher, R. A. (1956), Statistical Methods and Scientific Inference. Edinburgh: Oliver and Boyd.
Grünbaum, Adolf (1960), "The Duhemian Argument", Philosophy of Science 27: 75-87.
Howson, Colin (1997), "A Logic of Induction", Philosophy of Science 64: 268-290.
Kyburg, Henry E., Jr. (1983), Epistemology and Inference. Minneapolis: University of Minnesota Press.
Kyburg, Henry E., Jr. (1990), "Theories as Mere Conventions", in Wade Savage (ed.), Scientific Theories. Minnesota Studies in the Philosophy of Science, vol. 14. Minneapolis: University of Minnesota Press, 158-174.
Kyburg, Henry E., Jr. (1997), "Combinatory Semantics", Computational Intelligence 13: 215-257.
Lakatos, Imre (1978), "The Methodology of Scientific Research Programmes", in John Worrall and G. Currie (eds.), Philosophical Papers, Vol. 1. Cambridge: Cambridge University Press.
Lakatos, Imre and A. Musgrave (eds.) (1970), Criticism and the Growth of Knowledge. Cambridge: Cambridge University Press.
Laudan, Larry (1997), "How about Bust? Factoring Explanatory Power Back into Theory Evaluation", Philosophy of Science 64: 306-316.
Mayo, Deborah G. (1996a), "Ducks, Rabbits, and Normal Science: Recasting the Kuhn's-eye View of Popper's Demarcation of Science", The British Journal for the Philosophy of Science 47: 271-290.
Mayo, Deborah G. (1996b), Error Statistics and the Growth of Experimental Knowledge. Chicago: University of Chicago Press.
Mayo, Deborah G. (1997a), "Severe Tests, Arguing from Error, and Methodological Underdetermination", Philosophical Studies 86: 243-266.
Mayo, Deborah G. (1997b), "Error Statistics and Learning from Error: Making a Virtue of Necessity", Philosophy of Science 64 (Proceedings): S195-S212.
Mayo, Deborah G. (1997c), "Duhem's Problem, the Bayesian Way, and Error Statistics, or 'What's Belief Got to Do with It?'", Philosophy of Science 64: 222-244.
Mayo, Deborah G. (1997d), "Response to Howson and Laudan", Philosophy of Science 64: 323-333.
Quine, Willard V. O. (1969), Ontological Relativity and Other Essays. New York: Columbia University Press.
Suppes, Patrick (1969), "Models of Data", in Patrick Suppes (ed.), Studies in the Methodology and Foundations of Science. Dordrecht: D. Reidel, 24-35.
Worrall, John (1993), "Falsification, Rationality, and the Duhem Problem", in John Earman, A. Janis, G. Massey, and N. Rescher (eds.), Philosophical Problems of the Internal and External Worlds: Essays on the Philosophy of Adolf Grünbaum. Pittsburgh: University of Pittsburgh Press.