Brit. ]. Phil. Sci. 40 (1989). 541-555 Printed in Great Britain Connectionism, modularity, and tacit knowledge MARTIN DAVIES ABSTRACT In this paper, I define tacit knowledge as a kind of causal-explanatory structure, mirroring the derivational structure in the theory that is tacitly known. On this definition, tacit knowledge does not have to be explicitly represented. I then take the notion of a modular theory, and project the idea of modularity to several different levels of description; in particular, to the processing level and the neurophysiological level. The fundamental description of a connectionist network lies at a level between the processing level and the physiological level. At this level, connectionism involves a characteristic departure from modularity, and a correlative absence of syntactic structure. This is linked to the fact that tacit knowledge descriptions of networks are only approximately true. A consequence is that strict causal systematicity in cognitive processes poses a problem for the connectionist programme. 1 Tacit knowledge 2 Modularity 3 Connectionism 4 Syntax 5 Tacit knowledge again 6 Conclusion I TACIT KNOWLEDGE It is natural to introduce the notion of tacit knowledge through Chomsky's work. In Aspects of the Theory of Syntax, he wrote ([1965], p. 8): Obviously, every speaker of a language has mastered and internalised a generative grammar [i.e. a system of rules] that expresses his knowledge of his language. This is not to say that he is aware of the rules of the grammar or even that he could become aware of them . . . This notion of tacit knowledge of the rules, principles, or generalizations of language recurs throughout his work; and several different pieces of terminology are used to express the same fundamental point. a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ 5 4 2 Martin Davies Thus ([1965], p. 8), 'what the speaker actually knows' is equated with the speaker's competence. Then ([1976], pp. 164-5), in order to sidestep what are argued to be irrelevant objections based on intuitive connections—for example, between knowledge and justified belief, and between competence and ability—the technical term cognize is introduced, and is explicitly linked with tacit knowledge ([1980], pp. 69-70): The particular things we know, we also cognize. . . . Furthermore, we cognize the system of mentally-represented rules from which the facts follow.. . . And finally we cognize the innate schematism, along with its rules, principles, and conditions. .. . Thus 'cognizing' is tacit or implicit knowledge . . . [C]ognizing has the structure and character of knowledge, but may be and in the interesting cases is inaccessible to consciousness. (See also [1988], pp. 9-12.) Ordinary speakers know—in the familiar everyday sense—and also cognize facts about, for example, what various complete sentences mean. In addition, they cognize—even though they do not know in the ordinary sense—the facts from which those first facts follow. We might think of the first facts as stated by the theorems of a systematic theory. If we continue to focus on facts about what complete sentences mean, the systematic theory will be a semantic theory. Then, the basic idea would be this. Ordinary speakers cognize and know the facts stated by these theorems. They also cognize—even if they do not know in the ordinary sense—the facts stated by the axioms from which the theorems are derived in the theory. If we think of the issue in these terms, then it is easy to raise a major question which confronts any friend of the notion of tacit knowledge. There will always be extensionally equivalent theories: distinct sets of axioms from which we can derive the same theorems about, say, the meanings of whole sentences. Given that fact, does it make any empirical sense to suppose that an ordinary speaker tacitly knows, or cognizes, or has internalized, one set of axioms, rather than an alternative set from which just the same theorems of the relevant kind can be derived? Does it make any sense to suppose that one theory is psychologically real, rather than another extensionally equivalent theory? This is essentially Quine's challenge [1972] to the empirical credentials of the notion of tacit knowledge. Following a suggestion of Evans [1981], I would aim to respond to this challenge by construing tacit knowledge as a certain kind of causal- explanatory structure which underlies, or is antecedent to, the pieces of knowledge that the speaker has concerning complete sentences. We can make the main idea clear enough if we follow Evans in considering two semantic theories for a very simple little language L. This language has just one hundred sentences, constructed out often names and ten predicates. The names are 'a', 'V 'f. and the predicates are 'F', 'G' ' 0 ' . a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ Connectionism, modularity, and tacit knowledge 543 Consequently, the sentences are 'Fa', 'Fb' 'Ff. 'Ga', 'Gb' '0;'. These sentences have meanings which—as we theorists can see from the outside— depend in a systematic way upon their construction. Thus, all sentences containing 'a' mean something about John; all sentences containing 'b' mean something about Harry; all sentences containing 'F' mean something about being bald; all sentences containing 'G' mean something about being happy; and so on. The two semantic theories that we are to consider are both theories of truth conditions for L. They assign just the same truth conditions to the sentences of L; but they differ in their internal or derivational structure. (We could just as well consider theories of meaning strictly so called; but theories of truth conditions have the advantage of familiarity.) The first theory. Ti, is the listiform theory. It simply has one hundred axioms, one specifying the truth condition of each sentence of the language. The axioms of Ti thus include: 'Fa' is true if and only if John is bald 'Ga' is true if and only if John is happy and so on. The second theory, T2 is a structured or articulated theory. It has an axiom assigning a semantic value to each name of the language; and likewise, an axiom for each predicate. For the name 'a', for example, we have 'a' denotes John and for the predicate 'F', for example, we have a sentence coupling a name with the predicate 'F' is true if and only if the object denoted by the name is bald. From the twenty axioms of T2, we can derive just the same truth condition specifications as those that can be derived trivially from the axioms of Ti. The two theories are extensionally equivalent; though they are not, of course, logically equivalent. Suppose that there is a speaker who uses the sentences of L. with the truth conditions which both theories agree in assigning. What evidence can we imagine having, which would incline us to attribute to that speaker tacit knowledge of the articulated theory T2, rather than merely of the listiform theory Ti? This is the question with which Quine's challenge confronts us. But more important than this evidential question is a constitutive one. What would it be for a speaker to have tacit knowledge of T2, rather than merely of T,? Evans himself gave a constitutive account of tacit knowledge in terms of dispositions ([1981], p. 328): a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ 544 Martin Davies I suggest that we construe the claim that someone tacitly knows a theory of meaning as ascribing to that person a set of dispositions—one corresponding to each of the expressions for which the theory provides a distinct axiom. He added that, for the account to work as intended, the notion of disposition must be understood 'in a full-blooded sense'. Given such an understanding (p. 330): the ascription of tacit knowledge of T 2 . . . involves the claim that there is a single state of the subject which figures in a causal explanation of.why he reacts in this regular way to all the sentences containing [a given expression]. Thus, according to Evans's account, ascription of tacit knowledge of T2 involves the attribution to the subject of twenty distinct dispositions, and twenty distinct causal explanatory states—one for each name and one for each predicate of the language. It is helpful to think of Evans's basic idea in the following way. In theory T2, but not in theory Ti, the derivations of truth condition specifying theorems for the sentences 'Fa' and 'Ga' involve a common factor: namely, the axiom for the name 'a'. Likewise, the derivations of theorems for the sentences 'Fa' and 'Fb' involve a common factor. For tacit knowledge of T2, and not merely of Ti, we require that where there is, in the theory, a derivational common factor there should be, in the speaker, a causal common factor. Roughly, for a speaker to have tacit knowledge of a particular articulated theory, there must be a causal- explanatory structure in the speaker which mirrors the derivational structure in the theory. This rough idea requires a number of refinements (Davies [1987]). But for present purposes, it is sufficient to observe two attractive features of any refinement of the basic idea. The first attractive feature is that there can certainly be empirical evidence for or against a particular kind of causal structure in a subject. If attributions of tacit knowledge are basically attributions of structures of causal-explanatory states, then such attributions make perfectly good empirical sense: and they can, in principle, be grounded in empirical evidence. Thus, we meet Quine's challenge. The second attractive feature is that the basic idea, and refinements of it. do not require that in order to have tacit knowledge of an articulated theory a speaker must conceptualize the axioms or rules of the theory. The basic idea leaves room for a distinction between tacit knowledge and propositional attitudes like belief (see Davies [1989]). In fact, the account does not require that there be any explicit represen- tations—doxastic or subdoxastic, personal or subpersonal—of the axioms or rules that are tacitly known. Tacit knowledge can be realized by the presence of a processor rather than the presence of a collection of representational states, provided that the processing exhibits the requisite causal structure. The fact a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ Connectionism, modularity, and tacit knowledge 545 that the account does not require explicit representation in no way trivializes it; for there is all the difference in the world between processing with a structure that mirrors the derivational structure in T2 and processing whose structure merely mirrors the derivational structure in TV For example, a processor with an autonomous component for each sentence of L would meet the latter condition, but not the former. 2 MODULARITY The notion of modularity can also be introduced via Chomsky's work. In Knowledge of Language [1986], he recommends distinguishing between internalized language—that is, I-language—and grammar. I-language is 'some element of the mind of the person who knows the language' (p. 22); a grammar, in contrast, is a theory of I-language. A grammar is not a cognitive structure; it is a linguist's theory. Now, Chomsky describes grammars as modular (p. 71); and his exposition of the current state of linguistic theory is under the heading 'Modules of grammar' (p. 160). In this use of the term, a module is a subtheory of a linguist's theory of I-language. If a particular grammar is a correct theory of the I-language of a speaker, then the language faculty of that speaker can be characterized—at one level of description—by that grammar. If the grammar is modular, then the language faculty that it characterizes can itself be said to be modular; it is an articulated information processing system. Thus Chomsky says ([1986], p. 204): The. general idea that the language faculty involves a precisely articulated computational system—fairly simple in its basic principles when modules are properly distinguished, but quite intricate in the consequences that are produced—seems reasonably well established. A module within the language faculty will be a subsystem that is characterized by a module of the grammar. For a grammar to be a correct theory of I-language is for it to be psychologically real, or tacitly known. And what this requires—according to Section 1—is that there should be causal common factors lying behind pieces of linguistic knowledge in the speaker, exhibiting a pattern that mirrors the way in which there are derivational common factors lying behind the pronouncements of the grammar. Thus far, we have two different notions of a module. We could label these notions. On the one hand, there are modules in the analytical sense: constituent subtheories of a theoretical characterization of a cognitive task. Chomsky's modules of grammar are modules in this sense. On the other hand, there are modules in the processing sense: constituent subsystems of a cognitive system. Fodor's modules (Fodor [1983]) are modules in this sense; although Fodorian a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ 546 Martin Davies modularity involves characteristics not required simply bymodularity in the processing sense. But there are more notions of modularity than just these two. For just as the processing sense of modularity is the result of projecting the analytical notion to the tacit knowledge level of description, so we can introduce notions of modularity at other levels of description too. Suppose, for example, that we have identified—at the level of theoretical characterization—two components C and D of some cognitive task. Suppose further that, as a matter of empirical fact, the cognitive system under study does perform the task in question by having inter alia component subsystems that carry out the subtasks C and D. This would be a highly non-trivial empirical fact. But it would leave open the further empirical question whether the parts of the brain that subserve the performance of task C are distinct from the parts of the brain that subserve the performance of task D. In fact, there are a number of more precise questions that we can ask when we move to the neurophysiological level of description. One which is of some importance is the question whether the geographical region of the brain implicated in task C overlaps, or is disjoint from, the region implicated in task D. The importance of this question is that, the more the respective regions overlap, the less likely it is that the brain in question could, in practice, be damaged in such a way as to disturb the performance of one task while leaving the performance of the other task intact. We thus have three different notions of modularity, belonging at three different levels of description and explanation. (These correspond very roughly with Marr's three levels: [1982], pp. 24-5.) There is a clear enough distinction between the analytical notion and the processing notion, even though they are closely related: one kind of modularity is a feature of theories, the other is a feature of systems. Both the processing notion and the neurophysiological notion specify an empirical feature of systems, but the distinction between these two notions is crucial nevertheless. For example, cognitive neuropsychology is the branch of cognitive psychol- ogy in which models of normal cognitive processes are evaluated in the light of data provided by observations of people with acquired disorders of cognition. The classical form of.argument in cognitive neuropsychology is from an observed double dissociation of deficits to a claim about modularity. The systems X and Y that are responsible for the performance of the tasks A and B are argued to be independent systems or separate modules, on the grounds that performance of each of the tasks can be impaired while performance of the other remains intact. The cognitive neuropsychologist infers modularity from findings of dissocia- tions. But he does not generally infer absence of modularity from the failure to observe dissociations. Rather, if dissociation between the performance of two a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ Connectionism, modularity, and tacit knowledge 547 tasks is not found, then the cognitive neuropsychologist considers two possible explanations. One possible explanation is that the cognitive model is incorrect; the two tasks A and B are really performed by a single integrated system. The other possible explanation is that, although there are indeed two independent information processing systems present, psychologically unimportant features of neurophysiology prevent the systems from being damaged separately. (On these issues, see Coltheart [1985].) The cognitive neuropsychologist is theorizing about modularity at the processing level; but his arguments are complicated by the fact that modularity at that level might not be matched by modularity at the neurophysiological level. 3 CONNECTIONISM The processing level—as we have so far characterized it—is a level at which the description of a system is an interpreted (semantic, cognitive, or content using) description. The interpreted description is cast in the same terms as the theoretical characterization of the task at the analytical level; this is particularly clear if we think of the interpreted description as a tacit knowledge description. In fact, the simple equation of the processing level with the level of tacit knowledge description is potentially misleading. A description at the tacit knowledge level specifies the information that the system draws upon. But a full description at the processing level should surely do more than that; it should specify, in addition, how the information is drawn upon. To the extent that the processing level is to be identified with Marr's level two—the level of the algorithm—the tacit knowledge level should be distinguished as a slightly higher level of description. (Peacocke [1986] labels it level 1-5.) What we really have is a hierarchy of levels of coarser and more detailed interpreted descriptions of the way in which the task is carried out. But as well as all these levels of interpreted description, there are also uninterpreted descriptions which are still different from descriptions at the physiological level. For example, those classical computational theorists who favour the symbol manipulation paradigm recognize a level of uninterpreted syntactic descrip- tion. This is not to say that every state which has an interpreted, or semantic, description also has a description as a representational state with a syntax. For a piece of tacit knowledge can be realized by the presence of a computational processor. But, what is insisted upon is that the representational states which constitute the domain of the processor should be syntactically structured states. Thus, Fodor says ([1987], p. 25): [The representational theory of mind] says that the contents of a sequence of attitudes that constitutes a mental process must be expressed by explicit a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ 548 Martin Davies tokenings of mental representations. But the rules that determine the course of the transformation of these representations . . . need not themselves ever be explicit. (C/, Fodor [1985], p. 95.) The friends of parallel distributed processing (PDP) also recognize a level of uninterpreted description which is quite distinct from the physiological level. At this level, the descriptions are in terms of activation at nodes or units, mediated by weights or strengths attached to connections between the units. Let us label this level of formal description of a connectionist system the network level. On the face of it, the availability of this level of uninterpreted description does not count against the validity of interpreted, or semantic, descriptions of a connectionist network. Indeed, just as the classical theorist recognizes representational states and computational processes as vehicles of semantic content, so too, the connec- tionist assigns semantic content to two kinds of patterns within networks. Some of the information in a system is realized by particular patterns of activation of the units (Smolensky [1988], p. 6): The entities in the [network] with the semantics of conscious concepts of the task domain are complex patterns of activity over many units. Each unit participates in many such patterns. And some of the information is realized by patterns of weights attached to connections (p. 13): Patterns of activity representing inputs are directly transformed (possibly through multiple layers of units) to patterns of activity representing outputs. The connections that mediate this transformation represent a form of task know- ledge .. . In a connectionist network, then, the bearers of semantic content are complex, structured items. Some philosophical discussions give the impression that such an interpreted description of a connectionist network is of, at most, heuristic significance; and that the advent of connectionism brings nearer the demise of content using explanations in cognitive psychology. But really, the issue of interpreted or content using description and the issue of connectionism should be regarded as orthogonal. There are four positions that a theorist might occupy. One quadrant is for those friends of symbol manipulation who insist on the role of content using descriptions (e.g. Fodor [1987]). A second quadrant is for friends of symbol manipulation who prescind from content (e.g. Stich [1983]). There is a third position that is occupied by enthusiasts for connectionism who would altogether eliminate appeal to semantic content (e.g. Churchland [1988]). And there is a fourth box/to be occupied by connectionists who insist that content using descriptions are essential for psychological theory. a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ Conmctionism, modularity, and tacit knowledge 549 The following remark by Smolensky ([1987], p. 101) seems to place him in that fourth box: the formal system is at a lower level than the level of semantic interpretation: the level of denotation is higher than the level of manipulation.... Both levels are essential: the lower level is essential for defining what the system is (in terms of activation passing) and the higher level is essential for understanding what the system means (in terms of the problem domain). But, whether or not any particular theorist clearly occupies the fourth quadrant, if we take Fodor's position as the canonical version of the symbol manipulation paradigm, then the appropriate comparison is with content using connectionism. So far then, we have no reason to deny that a connectionist system can have a true tacit knowledge description. Nor do we yet have any reason to deny that a connectionist system may exhibit modularity at the processing level, or the tacit knowledge level. For all that this latter requires is that the network should have a true tacit knowledge description cast in the terms of a modular theory (that is, a theory that is modular, in the analytical sense). It does not require that the articulation in the formal description of a network should exactly match the articulation of the original modular theory into subtheories. A system that is modular in the processing sense need not also be modular at the network level—the level of description in terms of units and connections—any more than it has to be modular at the physiological level. Indeed, it is not generally the case that a connectionist network is built from smaller component networks corresponding to the constituent subtheories of a modular theory. This is not to say that connectionism is committed to the extreme view that there is a single giant network, responsible for all cognitive processes. On the contrary, it is explicit in the work of PDP theorists that specific tasks may be assigned to distinct networks, and that this amounts to an element of modularity (Hinton, McClelland and Rumelhart [1986], p. 79): A system that uses distributed representations still requires many different modules for representing completely different kinds of thing at the same time. The distributed representations occur within these localized modules. For example, different modules would be devoted to things as different as mental images and sentence structures . . . There is a potentially misleading mention of 'localized modules' in this passage: we should not confuse modularity at the network level with the physiological notion of modularity. Rather, the point is that localization or physiological modularity, requires network modularity (though the converse does not hold). To the extent that there is evidence of neural localization of cognitive functions, this is still consistent with the connectionist programme; for that a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ 55O Martin Davies programme already includes an element of modularity at the network level. Here it is useful to employ the idea of nested modules, and of coarser and finer grains of modularity. At the analytical level, a theory may be composed of subtheories which themselves have a modular structure. Indeed, we could think in terms of a massive psychological theory, whose subject matter is the whole of cognition, and which is composed of relatively independent subtheories concerning particular cognitive functions. The idea of the language faculty 'with its specific properties, structure, and organization, one "module" of the mind' (Chomsky [1986], pp. 12-13) is a reflection at the processing level of this coarse grained analytical modularity. A component theory, concerning a particular aspect of cognition—such as language, or vision—may itself be modular; and we can pursue this finer grained modularity through the various levels of description of a cognitive system. The connectionist programme is committed to some coarse grained modularity at the network level; but it is not committed to modularity at the network level matching any finer grained modularity at the analytical level. (C/. the discussion of propositional modularity in Ramsey, Stich and Garon [to appear].) According to the story that we have told so far, connectionism's character- istic departure from modularity at the network level is compatible with descriptions of PDP systems as embodying tacit knowledge of modular theories. The remaining two sections of the paper will call that compatibility into question. 4 SYNTAX The formal articulation of a connectionist network need not—and typically does not—reflect the articulation in the interpreted description of the system at the tacit knowledge level or processing level. This is why connectionism departs from fine grained modularity at the network level of description. If we now focus on patterns of activation as vehicles of semantic content, then we can see another consequence of the mismatch between the tacit knowledge description and the network description. The articulation in the network description of a connectionist system is in general not the syntactic articulation that is characteristic of symbol manipulation. As we have already noted, the symbol manipulation paradigm does not require explicit representation of computational procedures. So the relevant issue is not whether patterns of weights amount to syntactic encodings of tacitly known rules. But symbol manipulation does require syntactic structure in the representational states that lie in the domain of those procedures. The mismatch between the interpreted description and the uninterpreted descrip- tion of a connectionist network promises a sharp characterization of the difference between the two programmes. For, in general, the connectionist a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ Connectionism, modularity, and tacit knowledge 551 analogues of syntactic representations—namely, patterns of activation—are structured, but not syntactically structured. Someone might object to this characterization of the difference. It might be said that the symbol manipulation paradigm can itself recognize levels of description lying between the level of uninterpreted, syntactic, description and the physiological level. Nothing so far shows that the level of formal description of a connectionist network is anything other than one such intermediate level. There are a number of correct points here. A pattern of activation in units is a structured item; and there is nothing in the idea of such a pattern, as such, which prevents it from having a syntactic description. What is more, the way in which one pattern of activation leads to another pattern at a later timecan be specified without reference to what the patterns of activation mean; it is precisely so specified in the formal description of the network. So, transitions between patterns of activation meet a familiar formality condition upon symbol manipulation. Furthermore, it is possible that a system for symbol manipulation should have a description—at a level lower than the syntactic level—as a network of connected units. But none of this adds up to an argument for regarding patterns of activation as such as syntactically structured. According to Fodor, syntax must meet three conditions. First, 'The syntax of a symbol is one of its higher-order physical properties' (Fodor [1987], p. 18). Second, syntax is systematically related to semantics. Third, syntax is a determinant of causal role (ibid. pp. 16-21). We can agree that the structure in a complex pattern of activation meets the first and third of these conditions. But, the constituent features of a pattern of activation—namely, specific levels of activation at individual units—need not, and typically do not, make a systematic contribution to the semantic content of the overall pattern; they are not like words in a natural language sentence. The articulation within a pattern of activation does not constitute a syntactic structure, so long as the interpreted description is afforded by the processing or tacit knowledge level. It is possible to introduce a different level of interpreted description, lining up more neatly with the uninterpreted, formal, network description. This level is sometimes called the subconceptual level; the processing level is then called the conceptual level. (See Smolensky [1988], p. 3. The terminology is not ideal since it may suggest that tacit knowledge involves conceptualization; see again Da vies [1989].) The main difference between these two levels of interpreted description is this. The concepts used in the conceptual level description are the primitive concepts deployed in the theoretical characterization of the task at the analytical level. In contrast, the description at the subconceptual level is in terms of microfeatures. A consequence of this semantic dimension shift (Smolensky's phrase: [1988], a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ 552 Martin Davies p. 11) between the conceptual and the subconceptual description is that, while the subconceptual interpreted description is a genuinely accurate semantic description of the operation of the network, the conceptual description is an approximation. This consequence calls for a modification to the idea that ,a pattern of excitation is straightforwardly a vehicle of semantic content. Suppose that we consider a family of states whose (conceptual level) interpreted descriptions have something in common. Perhaps, they are all states whose contents concern coffee. Or (recalling the language L in Section 1), they might all be states whose contents concern sentences containing the predicate ' F . The original idea about bearers of semantic content would suggest that the states in such a family involve a common subpattern of activation which has an interpreted description as being about coffee, or being about a sentence containing the predicate ' F . But really this is not so, as Smolensky ([1988], p. 17) makes explicit: These constituent subpatterns representing coffee in varying contexts are activity vectors that are not identical, but possess a rich structure of commonalities and differences (a family resemblance, one might say). Similarly, if a connectionist network were to perform the task of assigning a meaning (specified in some format) to each sentence of L, then the constituent subpatterns representing the presence of the predicate ' F in the varying contexts provided by the sentences 'Fa', 'Fb', and so on, would not be identical. The argument at the beginning of this section showed that the articulation within a pattern of activation does not constitute syntactic structure, given that the semantic description is cast in the same terms as the theory at the analytical level. Someone might have responded to that argument with the suggestion that we develop a level of syntactic description by taking certain subpatterns of activation to be the primitive syntactic items corresponding to the primitive concepts that are employed in the theory at the analytical level. Because patterns of activation are simply superimposed, this would have been a rather weak suggestion; it would not even preserve the idea of the order of constituents in a syntactically complex expression. But, in any case, we can now see that the suggestion would not work. For there is no single pattern of activation corresponding to each primitive concept; and so there are no candidates for the role of syntactic primitive. 5 TACIT KNOWLEDGE AGAIN We can now draw an important consequence for the attribution of tacit knowledge to connectionist systems. Recall, once again, the example in Section 1 and the two semantic theories Ti and T2. Suppose that a network that involves a dimension shift between its a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ Connectionism, modularity, and tacit knowledge 553 conceptual and subconceptual descriptions succeeds in assigning the correct truth conditions to all the sentences of L. In particular, it assigns the correct truth conditions to all the sentences containing the predicate 'F': 'Fa' is true iff John is bald, 'Fb' is true iff Harry is bald, and so on. Suppose too, that this network is not simply made up of a collection of completely autonomous subsystems, one for each of the sentences containing 'F'. Then we do not have a straightforward instance of tacit knowledge merely of the listiform theory T]. Nevertheless, it need not be the case that there is a single pattern of weights on connections which is a causal common factor in all these transitions from representations of sentences to representations of meanings. For a typical connectionist system, we shall be able to say only that the approximate equivalence of the patterns corresponding to the predicate 'F' in varying contexts results in a considerable overlap in the patterns of connection weights implicated in the several transitions. Consequently, it will not be strictly correct to attribute to the network tacit knowledge of the articulated semantic theory T2. Such an attribution will be, at best, approximately correct. This result generalizes. Typically, PDP systems do not strictly embody tacit knowledge of modular theories. It is not an accident that the absence of accurate tacit knowledge attributions, and the absence of syntactic structure, go in step here. Tacit knowledge does not have to be explicitly represented; it can be realized by the presence of a processor. But tacit knowledge is a matter of strict causal systematicity in the transitions mediated by that processor—causal systemati- city mirroring the derivational systematicity in the theory that is tacitly known. And the way to incorporate that causal systematicity is to provide, for the states which are inputs to the processor, a physical articulation or structure which is systematically related both to the interpreted descriptions of those states and to the causal transitions to which the states lead. Given the three conditions upon syntactic descriptions, what this amounts to is providing the input states with a syntax. (For arguments from causal systematicity of process to syntactically structured representations, see Fodor [1987], pp. 135-54, and Fodor and Pylyshyn [1988].) 6 CONCLUSION What prevents even the most rudimentary syntactic articulation in the states of a connectionist network is the dimension shift between the conceptual and subconceptual level. Given such a shift, the terms deployed in the theory at the analytical level do not figure in any accurate interpreted description of the network. It is arguable that the apparent empirical inadequacy of some connectionist a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ 554 Martin Davies models is attributable to this attempt to do without resources which are, in fact, crucial; namely, the categories used in the classical theoretical characteri- zation of the cognitive task in question. (For this issue, see Rumelhart and McClelland [1986] and Pinker and Prince [1988].) More generally, strict causal systematicity of the kind required for tacit knowledge presents a problem for the connectionist programme. For the absence of a syntactic level of description is characteristic of connectionism. But causal systematicity requires syntactically structured representational states.1 Philosophy Department Birkbeck College Malet Street London WCIE 7HX REFERENCES CHOMSKY. N. [1965]: Aspects of the Theory of Syntax. Cambridge, Massachusetts: MIT Press. CHOMSKY. N. [1976]: Reflections on Language. London: Fontana/Collins. CHOMSKY. N. [1980]: Rules and Representations. Oxford: Blackwell. CHOMSKY N. [ 1 9 8 6 ] : Knowledge of Language: Its Nature, Origin and Use. New York: Praeger. CHOMSKY, N. [1988]: Language and Problems of Knowledge. Cambridge. Massachusetts: MIT Press. CHURCHLAND, P. M. [1988]: 'On the nature of theories: A neurocomputational perspective', in Minnesota Studies in the Philosophy of Science. Volume 14. Minneapolis: University of Minnesota Press. COLTHEART, M. [1985]: 'Cognitive neuropsychology and the study of reading', in M. I. Posner and 0. S. M. Marin (eds.). Attention and Performance XI, pp. 3-27. London: Erlbaum. DAVIES. M. [198 7]: Tacit knowledge and semantic theory: Can a five per cent difference matter?' Mind, 96, pp. 4 4 1 - 6 2 . DAVIES, M. [1989]: 'Tacit knowledge and subdoxastic states', in A. George (ed.). Reflections on Chomsky, pp. 131-52. Oxford: Blackwell. EVANS, G. [1981]: 'Semantic theory and tacit knowledge', in Collected Papers, pp. 3 2 2 - 42. Oxford: Oxford University Press (1986). FODOR. J. [1983]: The Modularity of Mind. Cambridge, Massachusetts: MIT Press. FODOR, J. [1985]: 'Fodor's guide to mental representation: The intelligent Auntie's vade- mecum'. Mind, 94, pp. 77-100. FODOR, J. [1987]: Psychosemantics. Cambridge, Massachusetts: MIT Press. 1 An earlier version of this paper was written while I was visiting the Research School of Social Sciences. Australian National University in late 1987. and was presented at the conference Cognition et Connaissance held in Toulouse, in March 1988. I am grateful to Ned Block for comments on a more recent version. a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/ Connectionism, modularity, and tacil knowledge 555 FODOR, J. AND PYLYSHYN, Z. [1988]: 'Connectionism and cognitive architecture: A critical analysis', Cognition, 28, pp. 3 - 7 1 . HINTON, G. E., MCCLELLAND, J. L. AND RUMELHART. D. E. [1986]: 'Distributed representations', in D. E. Rumelhart, J. L. McClelland and the PDP Research Group, Parallel Distributed Processing. Volume 1. pp. 77-109. Cambridge, Massachusetts: MIT Press. MARR, D. [1982]: Vision. New York: W. H. Freeman and Co. PEACOCKE, C. [1986]: 'Explanation in computational psychology: Language, perception and level 1-5', Mind and Language. 1, pp. 1 0 1 - 2 3 . PINKER, S. AND PRINCE, A. [1988]: 'On language and connectionism: Analysis of a parallel distributed processing model of language acquisition', Cognition. 28, 7 3 - 193. QUINE, W. V. O. [1972]: 'Methodological reflections on current linguistic theory', in D. Davidson and G. Harman (eds.). Semantics of Natural Language, pp. 4 4 2 - 5 4 . Dordrecht: Reidel. RAMSEY, W., STICH, S. AND GARON, J. [to appear]: 'Connectionism, eliminativism, and the future of folk psychology'. RUMELHART, D. E. AND MCCLELLAND, J. L. [1986]: 'On learning the past tenses of English verbs', in J. L. McClelland, D. E. Rumelhart and the PDP Research Group. Parallel Distributed Processing, Volume 2, pp. 2 1 6 - 7 1 . Cambridge, Massachusetts: MIT Press. SMOLENSKY, P. [1987]: 'Connectionist Al. symbolic AI. and the brain'. Artificial Intelligence Review, 1, pp. 9 5 - 1 0 9 . SMOLENSKY, P. [1988]: 'On the proper treatment of connectionism'. Behavioural and Brain Sciences, 1 1 , pp. 1-74. STICH, S. [1983]: From Folk Psychology to Cognitive Science. Cambridge, Massachusetts, MIT Press. a t R a d cliffe S cie n ce L ib ra ry, B o d le ia n L ib ra ry o n A u g u st 7 , 2 0 1 1 b jp s.o xfo rd jo u rn a ls.o rg D o w n lo a d e d fro m http://bjps.oxfordjournals.org/