The Traditional Future: A Computational Theory of Library Research

Andrew Abbott

Andrew Abbott is Gustavus F. and Ann M. Swift Distinguished Service Professor in the Department of Sociology and the College at the University of Chicago; e-mail: a-abbott@uchicago.edu.

I argue that library-based research should be conceived as a particular kind of research system, in contrast to more familiar systems like standard social scientific research (SSSR). Unlike SSSR, library-based research is based on nonelicited sources, which are recursively used and multiply ordered. It employs the associative algorithms of reading and browsing as opposed to the measurement algorithms of SSSR. Unlike SSSR, it is nonstandardized, nonsequential, and artisanally organized, deriving crucial power from multitasking. Taken together, these facts imply that, as a larger structure, library-based research has a neural net architecture as opposed to the von Neumann architecture of SSSR. This architecture is probably optimal, given library-based research's chief aim, which is less finding truth than filling a space of possible interpretations. From these various considerations it follows that faster is not necessarily better in library-based research, with obvious implications for library technologization. Other implications of this computational theory of library research are also explored.

Among the most important questions raised by the current revolution in libraries is that of the effect of new library technologies on library-based scholarship as a whole. Surprisingly, there is almost no serious theoretical reflection on this topic. Most writers focus their attention only on the new techniques themselves: the research tasks that are newly possible or that can now be accomplished faster than ever before. No one asks whether there are sound theoretical reasons for thinking that faster is better or that the newly possible work will lead to improvement in library-based scholarship as an enterprise.1

Indeed, library-based scholarship as an overall enterprise has seen relatively little study. There is serious empirical study of various search strategies employed by library users with both physical and digital materials. There are articles studying the research habits of individual scholars in the library-based disciplines. And there are manuals teaching students and others how to do library-based research, at least in the minimal sense of finding information and expert opinion. But there is almost nothing—in the library literature at least—about how the myriad library-research activities of individual scholars actually come together into a corporate enterprise, much less about the possible overall effects of the revolution in academic libraries on the nature and quality of that enterprise. Indeed, it is precisely because we don't have a clear theory of how expert library-based scholarship is itself created, sifted, and maintained that our predictions about the effects of the digital revolution on library-based expert scholarship are for the most part hanging in midair.2

Nor is there much written about this topic by academics in the fields most affected by the revolution in libraries.
There are library research how-to manuals for graduate students in scattered fields, a fact that indicates that humanists as well as some social scientists do sometimes teach library research skills to their graduate students, introducing them to the critical reading of sources and to the major bibliographical and archival guides for their fields.3 But seasoned library researchers do not seem to write much about library research methods in general. There are no books on cutting-edge library methodology, no equivalents to journals like Sociological Methods and Research or The Journal of Economic Methodology where quantitative social scientists present their latest techniques. For example, one would expect historians to be among the most assiduous writers on library research methods. But of the 45 articles in JSTOR's 71-journal history collection whose abstracts contain the word "library" or "libraries," none is about the practice of library research. Lest it be thought that "library" is too general a term, there are only eight articles in the JSTOR history collection with the word "bibliography" in their abstracts, and only nine with both the words "reading" and "sources." None of these articles is explicitly about the creating of a bibliography or the reading of sources.

In part, this lack of attention may reflect the belief of historians (and correlatively of their colleagues in musicology, literature, and the other library-based research fields) that the more global parts of library research "methodology"—how to assemble sources, how to maintain records and files, how to assess which areas of a project need further library work, how to predict where hidden sources may be found, how and when to draw conclusions from material—are not really "library research" proper. These other things are taken to be disciplinary knowledge, practices to be taught in seminars and in the direct supervision of dissertations. They don't seem to belong to the library per se, but to "research design" or some other category.

Now, such an argument presumes a strict division of labor between librarians and scholars: scholars think up disciplinary questions and imagine the various kinds of information that can be assembled into new answers, while librarians array information and help the scholars search the array for what they need. But no experienced library-based researcher really believes in this division of labor. Questions, answers, sources, and information are simultaneously in play in a library research project. There is no separation of design and execution. So it remains surprising, given the importance of libraries to the library-based disciplines, that there is nowhere in them a body of theoretical or even empirical speculation about the nature of library-based scholarship as a general social form: how it is that each individual library project comes together into a whole and how it is that many such projects come together into something we call knowledge.

As for the sociologists, whose business it is to study such social forms, they too have said little. The sociologists of science have been almost completely preoccupied with the natural sciences and their laboratories, ignoring even the social sciences, much less the humanities.
Looking at sociology more broadly, the 56 articles in JSTOR's sociology section that have the words "library" or "libraries" in their abstracts include none that is about what we might call the sociology of advanced library research. There is simply no sociological writing about library research.4

This extraordinary disattention to the theory and practice of library research is all the more surprising given that there are quite a few theoretical reasons for expecting the present revolution in libraries to have very powerful effects on the scholarship accomplished in and through libraries. For one thing, electronic consortia like JSTOR have brought to nonelite universities vast holdings that used to be the privilege of the elite, a development that could raise or lower the average level of scholarship depending on our assumptions about the impact of an individual researcher's caliber on his output. For another, the vast increase of easily identifiable and retrievable material has swelled reference and citation lists, possibly making it much harder to reach consensus in subfields. For yet another, the huge decline in the cost of accessing materials has probably meant—on the classical two-factor model of production—that today's scholars spend more time accessing scholarship and less time reading it than did their predecessors, a change that could have large implications for scholarly quality. One could develop many such arguments.

Evaluating these hypotheses, however, is a difficult matter. First of all, we lack an agreed-upon outcome variable. What exactly do we mean by good library scholarship overall? Most measures of scholarly productivity at the individual level boil down to bean-counting, either of publication or of citations, and no one with in-depth knowledge of any substantive field thinks that either of these measures has much concept validity. But even if we were to have a valid outcome variable, we don't really have a theory of how advanced library research actually works. Yet such a theory is required if we are to make predictions about how changes in library technologies might actually affect scholarship overall. We do, to be sure, have some ideas about what scholars do in libraries as individual users. But we don't have a theory of how those activities are tied together to make a successful scholarly community. Most of the models for such processes, again, concern the natural sciences, where the Popperian, Kuhnian, and other models are familiar.5

In short, there is no truly formal or theoretical consideration of library research as an enterprise and, consequently, no sound basis on which to form a view of whether the current transformation of libraries is good or bad for scholarship. In this paper, I will undertake the first task in order to draw some conclusions about the second. I begin with a brief sketch of standard social scientific methods. By first discussing a reasonably well-known and well-thought-through system of research, I hope to establish what are the parts of a research system and what are the parameters that determine its functioning. With that framework in hand, I then turn to library research, which I define largely through its contrasts with this other, more familiar body of knowledge procedures, showing how it differs in sources, practices, structures, and aims. This discussion culminates in the argument that the two sets of research practices represent different forms of computation.
By pursuing this metaphor, I move the discussion onto neutral grounds to escape the usual polemics about libraries. The paper closes by drawing out the implications of such a computational theory of library research for the future of both library research and library policy.

A word of definition and clarification is useful before beginning. By the phrase "library research," I do not refer to all usage of the library, but only to advanced scholarly usage. Undergraduates may be the most common users of the library because of their huge numbers, but they do not need the immense holdings characteristic of scholarly libraries. And within "advanced scholarly usage," I am referring only to those branches of scholarship whose principal mode of production has been the use of library materials. I am thus talking for the most part about the humanities and the humanistic social sciences: scholars of the various languages and literatures, historians, musicologists, art historians, philosophers, and members of those branches of sociology, anthropology, and political science that draw heavily on library data (historical sociology, for example). Of course, scientists use libraries. But their main mode of production is not library research. I am here interested only in those branches of scholarship that rely heavily on libraries for their "data."

Because there is so little prior work, there is no way to avoid confusing the empirical and the normative in what follows. In part, this is a confusion inevitable in any writing about methods. We would not describe standard social scientific methods purely in terms of what social scientists do in practice, but rather in terms of what they ought to do in theory. At the same time, those of us who teach those methods know that in practice we have to teach our students not only precepts a good methodologist should follow in the abstract, but also empirical rules of thumb that can guide everyday practice. For library research, we lack the abstract precepts, making do at best with the empirical rules of thumb, and in most cases lacking even those: How big a bibliography is big enough, for example? (Or too big?) Indeed, one way of understanding what I am doing is to say that I am trying to provide the prescriptive theory—the "ought" theory—of library research by trying to theorize what library research actually "does," i.e., what it ought to do when it is—empirically—being a best version of itself. This may be confusing at times, but it is an inevitable concomitant of the early stages of inquiry.

Standard Social Science Methods

Let me begin by sketching a better-theorized body of research method, one that can serve as a foil against which to develop my concept of library research. I shall use standard social scientific methods for this purpose. Of course, the picture I draw here will be stark and unnuanced. But that is another price of thinking theoretically, at least at the outset. By standard research or standard methods I mean here methods as understood within the broad range of the quantitative social sciences. I will cover the basics of these research methods under three headings: Sources, Practices, and Structures.

To begin with sources. Standard social science elicits its data.6 This elicitation can be by surveys or by interviews.
It is most often active elicitation, although much social science is built on data that is either collected on a routine basis (like census data) or simply passively piled up as a part of record-keeping for commercial or other purposes. Data that is actively elicited is standardized and formalized in various ways: it can be selected according to the rules of sampling, for example, and it can be precoded via forced-choice instruments. After "cleaning," it can be used directly or further aggregated via data reduction techniques like factor analysis and clustering.

These gathered and prepared sources are then subject to the various practices of research. The data are first translated in terms of a set of concepts and measures, which have usually, indeed, governed much of the process of elicitation. These concepts and measures are typically widely shared across a literature, like the notion of stress in studies of social support or like the use of years-in-school as a variable to indicate education. Substantial subliteratures form around the task of improving these concepts and measures, an improvement that may mean better stability over time, or better portability across datasets, or greater plausibility in terms of theory.

Once couched in terms of shared concepts and indicators, the translated source data—which have now been redacted into variables—become subject to various methodologies. The majority of these methodologies in social science have the aim of expressing some one (dependent) variable as a function of the rest (independent variables), typically as a linear combination of them. The choice of methodology is to some extent determined by the nature of the dependent variable, although, conversely, that variable can usually be transformed to fit a preferred methodology—dichotomized, categorized, logit-transformed, and so on. These methodologies are for the most part completely routinized recipes for analysis; one "writes a model," puts the data in, and results come out. But, all the same, there is plenty of room for modifying these recipes through the handling of the various challenges that data always present to the stringent assumptions of the statistical techniques. Seemingly mechanical in theory, these methodologies nonetheless require a subtle and artful hand in practice.

The underlying logic of all of these practices—loosely but nonetheless strongly held by most people working in standard social science research—is a modified version of the Popperian model of conjectures and refutations.7 The scholarly intervention is regarded as making a plausible conjecture about the way the world is and then evaluating it against data. If the conjecture is not rejected, then because of its theoretical plausibility it can be added to and possibly reconciled with our stock of conjectures to this point. In a loose sense, that is, the basis of standard social scientific methods is a correspondence between our model of the world (that a group of independent variables determine a dependent one in a certain way) and the way the numbers fall out in practice. Adding a conjecture to our stock of conjectures is often a simple matter: by adducing a new conjecture, an article may set limits on the range of a causal relationship or, conversely, show its viability in new realms. The question of reconciliation with existing conjectures is a more vexed one, however.
New results are often inconclusive, and new methodologies can produce results not so much contradictory to earlier ones as incommensurable with them. The problem of reconciliation thus raises the question of how our standard research practices are embedded within a larger structure of research. How, that is, are researchers and their individual research chained together into a larger enterprise?

The first larger structure of standard research is the enormous corpus of data—both formally elicited and passively collected—that the social sciences have used over the years. Much of this is collected in places like the Bureau of the Census, the Inter-University Consortium for Political and Social Research (ICPSR), and the National Opinion Research Center (NORC) that maintain data archives. Other data remains in researchers' papers. Sometimes data is published in one form or another, in print or online. What is important for our present purposes is that, in the main, this corpus of data is not really systematized and ordered; there is no quantitative equivalent to the historians' National Union Catalogue of Manuscript Collections, for example. The main characteristic of the larger data structure of social science is this unordered, unsystematized quality. It is just a vast pile of used datasets.8

A second structural quality of the standard research world is specialization and division of labor. Division of labor can obtain at the level of the project; there can be interviewers and coders and analysts and PIs. But it can also obtain at the level of the discipline. There are specialists in sampling and in particular quantitative methodologies just as there are specialists in this or that research area. This is an obvious fact and requires no further comment.

A third structural quality of standard research is that it is to a considerable extent characterized by a sequential logic. Things have to happen in a certain order. You gather data before you analyze it. You validate your measures before you apply them. You select data with a certain question in mind. Beyond the level of the project, this sequentiality continues. Broad propositions tend to be succeeded by more specific and limited ones. Subparts of large questions must be resolved before attacks on the large questions can produce credible results. To be sure, quantitative social science as a general enterprise is typically advancing on many fronts simultaneously. But within particular research traditions, a sequential logic usually applies, as we see from the common belief in cumulativity. In any specific literature of standard research, early pieces are felt to be less specified, less methodologically careful, less definite. Later results are more specific, more rigorous, more defined. Within such traditions, indeed, later studies often self-consciously replicate earlier studies, even while extending or specifying them.

The final quality of standard research taken as a structure is its organization around a search for truth. As I noted earlier, "truth" here means in practice a correspondence between the way we predict the numbers to be, given our theoretical ideas, and the way the numbers actually are when we have gone out and measured the world. This means that standard research is ultimately a form of prediction and search.
The truth is thought to be out there in the real world (a, b, and c cause x), and our model is a hypothesis about what that truth is (maybe we think b and c cause x). We measure reality according to our model, and then reality tells us whether we found the truth or not (in this case, that we are a little off in our guess about where truth is). Standard methods are thus ultimately a formalized version of blind man's bluff; we make educated guesses about where the truth is and then get told whether our guesses are right or wrong. Fundamental to this game is our belief that the truth is somewhere out there in the world to be discovered. There is a "true state of affairs." Our inability to find it may be a problem, but the true state of affairs exists and can in principle be found.

One can disagree with various parts of this picture, and certainly one could make it much more precise. But overall it is an acceptable thumbnail sketch of how standard research operates in practice in the social sciences. Let me summarize it quickly. The sources of standard research works lie most often in actively elicited data, which is often standardized or concatenated in the process of being collected. The practices of standard research begin with the application of measures and terminologies that are standardized, widely shared (or, at least in principle, sharable), and usually fairly rigid and specified. They then continue with the application of routine methodological recipes that evaluate the conjectures of researchers by comparing them to the state of the real world. The recipes either accept or reject the conjectures. The larger structures of this standard research world comprise first the enormous collection of used data, which is not particularly systematized or ordered. They comprise second the qualities of sequentiality and division of labor. And they comprise third an overall organization of research around the search for a true state of affairs, which is taken to be "out there" in the real world, but possibly very difficult to find.

Library Research

Let me now turn to a similar analysis of library research. As I noted earlier, this is a much less organized and defined system of materials and practices. But we can characterize library research by looking again at sources, practices, and structures, using the sketch just given of standard methods as a guide to the analysis. If, as a result, library research seems a little too perfectly opposed to what I have called standard research, we can regard that as a heightening of differences for ease of comprehension, not as a claim that some awful chasm divides the two. In fact, they interpenetrate considerably.9 Recall, finally, from the introduction that the phrase "library research" means use of library materials by expert scholars and, in particular, by scholars in those disciplines for which use of library materials is the primary mode of intellectual production—historians, professors of literature, and so on.

The differences start at the beginning, with sources. Library research uses not elicited data, but recorded data—things in libraries. Some of this is passive records of the kind we have earlier seen: routine census data or annual reports of companies, governments, and other organizations.
But much of it is author-produced primary material of various types: novels, autobiographies, religious tracts, philosophical discourses, films, travelogues, ethnographic reports, and so on. What is important about all of this primary material is that it was not elicited by the researcher. It is simply there—created by its authors or originators and deposited one way or another in the library. In this sense, the only analogous material in standard research is passively collected quantitative data.

But this recorded primary material is only part of the data for library research. An immense portion of the sources of library research consists of prior library research (and indeed prior nonlibrary research as well). Moreover, library research uses this prior work in a very different way than does standard research. In standard research, previous work is of interest largely for its output—the conjectures that it authorized or rejected. In library research, prior research is used for all sorts of things in addition to its output. Indeed, it is often ground up into pieces: its primary data can be redefined and reused, its interpretations can be stolen and metamorphosed, its priorities deformed and redirected, its arguments ransacked for irrelevancies that are changed into major new positions. Although it is by custom called "secondary material," the prior research work recorded in the library is, to all intents and purposes, yet another form of primary data. We can label this peculiar and intensive use of prior research with a word from computer science. Library research, we can say, is recursive; it can operate on itself.

So the sources of library research are quite different from those of standard research: they are not elicited by researchers, and they are, in the sense just defined, to a considerable extent recursive. Moreover, the vast corpus of stuff that makes up the data of library researchers is ordered in a number of important ways. It is classified—not only by its author and publisher and date and other facts of provenance, but above all by its subject headings and, in particular, by the most important of these: the call number that gives it a physical location. Unlike the data of the standard researchers, the data of the library researcher is embodied in physical artifacts, a fact to which I shall return below. (This is of course changing at the moment, but we are considering the system as it has evolved to the present.)

But subject headings are not the only forms of classification and ordering in the library. To subject headings are added back-of-the-book indexes and bibliographic notes, subject bibliographies, encyclopedias and handbooks and other reference works, bibliographical guides, and so on. Most of this indexing and assembling is done by human minds, not by the concordance indexing that drives most of our current search engines.10 Indeed, an enormous amount of this indexing is implicit in the contents of the data artifacts themselves: one way of understanding any given book based on library research is as a kind of index to a particular set of other library materials.

In short, library research materials have an order imposed on them quite different from the order present in the elicited data of standard methods. In elicited data, the analyst imposes order on what he perceives to be mere human activity by applying certain accepted conventions of measurement and conceptualization.
And once a dataset is gathered and used, it goes on a stack of datasets that is not further ordered, classified, or indexed. (There are a few counterexamples, but they prove the rule.)

But no one would take the materials in a library as uncognized activity needing to be ordered by certain conventions of measurement and coding. Library materials are already cognized and ordered in dozens of ways. Each book is a particular selection of things by a human agent; and, beyond the books themselves, indexers have created dozens of mappings—by no means all the same—of myriad idiosyncratic subsets of the materials in the library. The sources for library research are, in short, fundamentally different from those of standard research, above all because of this huge amount of indexing and preorganization, which far surpasses the straightforward application of measurement and coding conventions that is characteristic of standard research. Let me underline that I am not speaking of one, single comprehensive order. There are multiple such orders, and deliberately so, an obvious contrast with the strain of standard research toward consistent definitions.

In summary, the sources of library research consist of recorded materials, which include prior library research (which thus can be used recursively) and which are ordered by a large number of multiple and cross-cutting indexes that govern myriads of subsets of their contents. It helps to have a simple term to refer to library materials: I shall call them "texts."

With these texts, library researchers undertake quite different practices than do their standard researcher colleagues. In the first place, library researchers to a great extent lack the well-defined and widely shared concepts and measures that are fundamental to the practices of the standard researchers. The only strictly defined terms in library work are those of certain established large-scale indexes, so-called controlled vocabularies; at the monograph and reference-book level, such controlled vocabularies are created de novo for each new artifact. And the steady shift of terminologies as language drifts inevitably over time—particularly with respect to more complex concepts—limits the efficacy of the major controlled vocabularies. Some library fields have highly specific and enduring terminologies, to be sure: musicology and historical linguistics are examples. But the vast majority of library research does not involve use of widely shared, well-defined, and stable concepts, nor any other idea of "measure" analogous to that in standard methods.

The chief practices of library scholars with texts are reading and browsing. It is these that are, in fact, the analogue of the standard researchers' measurement, since it is by reading and browsing that library research scholars extract what they want from texts. By pointing to reading and browsing as methodologies, I want to make them unfamiliar, less taken for granted. We need to see the exact analogy between a standard researcher, who "measures" the social world using a fairly limited vocabulary of shared concepts and indicators, and a library researcher, who browses or reads a text using his or her own—and possibly idiosyncratic—interpretive armamentarium.

To understand reading and browsing as the analogues of the measurement and methodology of standard research, it is useful to borrow language from computer science.
Measurement, in computer science terms, employs a fairly simple algorithm. A measurement algorithm takes social reality as input and returns a number or category. The shared—or at least (in principle) sharable—nature of the algorithm means that its output is independent of who runs it. Browsing and reading constitute this kind of "measurement" only in a very limited sense. To the extent that we think of a text as having a single fixed meaning, invariant with respect to any differences in the readers, reading the text should return that meaning. In such a case we could think of reading as pure measurement. But texts that have such fixed meaning almost never occur in natural language; they can exist only in things like computer programming that have perfectly controlled vocabulary and syntax. Most texts have multiple and ambiguous meanings, and no texts outside controlled languages have meanings that are invariant with respect to readers.

Reading and browsing—the two are simply different levels of the same thing—thus belong to a different family of algorithms than does measurement. They are association algorithms, in which input is taken from text and combined with reader-internal data to produce an output. They are thus inherently nonreplicable because of their dependence on data internal to the reader or browser. A useful way of imagining this is to think about the book-reader technology as compared with the site-surfer technology. In the site-surfer technology, hyperlinks are hard-coded into the page and direct every reader to specific preconnected pages. In the book-reader technology, hyperlinks are generated dynamically in the act of reading. They arise by the conjunction of knowledge in the mind of the reader with potential meanings in the body of the text. Such a system is obviously intensely dependent on the richness of prior knowledge in the minds of readers. And although we can, through things like general examinations, force a certain level of basic background knowledge into the minds of young scholar readers, there will remain quite large random differences in this background knowledge even between fairly closely comparable scholars. Consequently, there will be substantial variation in the outputs of the reading process even between two such scholars. Reading is thus profoundly different from measurement as a research practice, since the latter has replicability as one of its most important qualities.11

One can "read" with differing levels of attention to detail. Skimming is what we call reading when we pay very little attention to detail. Browsing is what we call reading when we disregard not so much the details of a text as its composed order. Browsing is an association algorithm that ignores the continuous order of the text or, more commonly, that is applied to things that are not continuous composed texts in the first place but that have other kinds of order built into them. One can browse a continuous text by flipping through it here and there, but one more often browses things that have an order that is not through-composition. One browses an index or a bibliography, which is ordered alphabetically by main topic and/or author. Or one browses a handbook or other reference work, which is ordered by main topics in some structural or functional relation to one another. Or one browses a shelf, which is ordered by call number.
In each case, that is, browsing brings together a prepared mind and a highly ordered source that is (usually) not a continuous text. To some extent, browsing is analogous to what are called hashing algorithms in searching systems; it takes large blocks of material and disregards their detailed order or inspects them on the basis of simple data checks. At other times, browsing operates via simple association of random elements in the object browsed with random elements in the reader's mind. Each random connection is associated with a probability that it will be useful, and those above a certain level are retained. What is central to all forms of browsing is thus the coming together of a highly organized but not necessarily continuous source object with an equally highly (but quite differently) organized mind. From this is expected to emerge a substantial collection of productive but random combinations.12

As I have noted, the role of internal knowledge in reading and browsing implies a crucial difference from the measurement that is their equivalent in standard methodology; they are not replicable. Two readers don't get quite the same output from reading a book, and there is no real attempt in library research fields to correct this by improving measures, controlling terminologies, and so on. There is thus no real equality between an English professor presenting a reading of a novel to a class and a sociology professor discussing quantitative indicators of education. The second is interested in and hopes to produce replicability. The first regards replicability as both unachievable and undesirable.

Another equally important difference between the "methods" of library research and those of standard research is that the former lack sequentiality. Even at the single text level, library researchers read straight through only rarely. While some library researchers read background sources straight through at the beginning of a project, it is much more common for a project to begin out of a variety of types of sources of varying levels of detail and relevance, which have been read in no particular order. There is no equivalent in a library research project to the Idea-Question-Data-Method-Result sequence of the standard research program. To be sure, even the latter is, in practice, something of a rationalization after the fact, but in library research there is no attempt to create even such an imposed, retrospective order. For example, there is no right order in which to read the original sources for a book on, say, the passing of the British Reform Bill of 1832, although of course any library researcher would find the major secondary source—J.R.M. Butler's magisterial book—on the first bibliographical pass.13 Should you read the Parliamentary debates first? or the private correspondence of Earl Grey? or the diaries of the important Tory magnates? It is possible that three quite different but equally important works could be written on the subject starting from those three different beginnings. The sequence does not matter. The rule of thumb in library research is usually to read most heavily—at any given time—in the area of the largest hole remaining in the argument. The result of that rule is that sources are read in wildly different orders in comparable projects.

But the lack of standardization and the lack of sequentiality do not exhaust the differences between library and standard research practices.
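Before turning to those further differences, the algorithmic contrast drawn above can be made concrete with a small illustrative sketch. The Python code below is a caricature rather than a description of any actual library system: the function names, the toy "reader knowledge," and the retention threshold are all invented for illustration. It contrasts a measurement algorithm, whose output is independent of who runs it, with an association algorithm, whose output depends on data internal to a particular reader and therefore varies from reader to reader and from reading to reading.

import random


def measure_education(years_in_school):
    """A caricature of a measurement algorithm: the same input yields the
    same category no matter which researcher applies the rule."""
    if years_in_school < 12:
        return "less than high school"
    if years_in_school < 16:
        return "high school or some college"
    return "college degree or more"


def associative_read(text_terms, reader_knowledge, retention_threshold=0.5, seed=None):
    """A caricature of an association algorithm: candidate connections arise
    where the text overlaps the reader's prior knowledge, and each connection
    is kept only if a chance 'usefulness' draw clears a threshold. Different
    readers, or different readings, therefore return different outputs."""
    rng = random.Random(seed)
    candidates = text_terms & reader_knowledge
    return {term for term in candidates if rng.random() > retention_threshold}


if __name__ == "__main__":
    # Measurement is replicable: any two applications of the rule agree.
    assert measure_education(14) == measure_education(14)

    # A toy "text" and two differently prepared "readers" (all invented).
    text = {"polio", "Easter Seals", "Anna, Illinois", "marital demography"}
    reader_a = {"polio", "Easter Seals", "occupational jurisdictions"}
    reader_b = {"Anna, Illinois", "marital demography", "cohort succession"}

    # Association is not replicable: output varies with the reader's internal
    # stock of knowledge and with the chance conjunctions of a given reading.
    print(associative_read(text, reader_a, seed=1))
    print(associative_read(text, reader_b, seed=2))

The point of the sketch is only the design contrast: the first function carries all of its knowledge in the shared rule, while the second gets its power from what the reader already knows, which is why no two runs over differently stocked minds can be expected to match.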
Library research is also different in that it is customarily artisanal. Each project is done by a single scholar. This obviously goes hand in hand with the lack of standardization. The unity that a project has is the unity of its researcher; since his is the mind that reads and interprets, his is the mind that browses, his is the mind that ultimately assembles readings and interpretations and browsings into a work of scholarship. Those of us who do this kind of work have all tried to use research assistants. And nearly all of us have given up on them except for those very narrow portions of projects where we can make use of fixed terminologies. No research assistant I have ever hired to compile a bibliography has come up with one half as good as the ones I can make for myself. They simply don't have the same contents in their minds and hence can't perform to my satisfaction the simple associative task that is creating a bibliography.

The downside of artisanality is familiar enough; it slows production. Historically, this has been one of the crucial forces driving the social sciences toward the research practices that I have here called standard research. But artisanality has an important upside, which is that it permits an extremely productive form of multitasking. It is best to show this with an example. In a recent library-based project I chose to do my own coding of the lives of every occupational therapist working in Illinois in 1956. And because the relevant source did not permit immediate extraction of exactly and only what I wanted, I was forced to scan large amounts of interesting but slightly irrelevant material: lives of occupational therapists in other states, aspects of career data that I wasn't coding, addresses, and so on. In the eight hours that I spent on the task, I let my browsing self run in the background like a virus checker. What it picked up—that is, what I acquired in addition to the coded careers that I wanted—were signs of crucial changes in the population of organizations employing occupational therapists, indications of a separate military career trajectory for occupational therapists, two possible hypotheses about the intersection of social class and occupational therapy, and a firm grasp on the marital demography of occupational therapists. Even if my research assistant hadn't wasted his own multitasking capabilities by listening to music (as he usually would, I am convinced), he doesn't know that Easter Seals—one of the common employers of occupational therapists—was a polio relief organization in the 1950s and that polio virtually disappeared as an American problem in that decade, two facts that, taken together, show that one of occupational therapy's crucial work jurisdictions was under threat. He doesn't know that, in 1956, there was a mental hospital in Anna, Illinois, which in a complicated way was the key to my marital pattern insight. I saw those things only because I had the requisite knowledge, left over from past projects or—in the polio case—simply from having lived through the period involved. It should at once be noted that another seasoned researcher might have seen other things and not these. But as we will see below, that doesn't, in fact, matter. What does matter is that because a single prepared mind does all the work in the typical library project, the prospects for productive multitasking are very, very high. This is all foregone in the standard research project with its often considerable division of labor.

So far, I have discussed the library research practices of reading and browsing, with their qualities of nonstandardization, nonsequentiality, and artisanality. And I have emphasized the multitasking permitted by artisanality. Reading and browsing, I have argued, are the analogues of conceptualization and measurement in standard research, which are—by contrast—based on standardization and sequentiality and which consequently permit, and indeed take advantage of, division of labor, both within projects and across projects.

What then is the library analogue of methodology proper—of regression, or log-linear analysis, or event history methods, the various statistical techniques of the standard researchers? And what is the equivalent of the logical foundation of standard research practices on conjectures and refutations? The quick answer is that there is no such analogue. There is no family of fixed recipes by which library scholars produce their final output. We can at best give a general name to the process by which library researchers assemble their various materials into written texts. I shall give that process the label of "colligation," a term of William Whewell.14 It denotes the inductive assemblage of a set of facts under a general conception of some kind. A classic example is Jacob Burckhardt's colligation of the various changes in thirteenth-century Italian city states under the heading of Renaissance. Whewell famously attempted a general theory of such induction, but it has had few followers and no successors.

Indeed, much of nineteenth-century German historiography aspired to a quite different theory of historical writing. According to Ranke's celebrated dictum, history was a matter of search and discovery, a finding out of what had actually happened—wie es eigentlich gewesen. This is exactly the model of standard research discussed earlier—not the imagination of a new whole out of diverse parts, but the discovery of a given truth out there in the world.15 If we list the kinds of colligations that are legitimate products of library research, we see at once that the practice of library researchers for the last century has followed Whewell rather than Ranke: the pursuit of a findable and fixed truth is not an accurate summary of or model for professional history or for any other of the library research disciplines. To be sure, one body of legitimate library research does consist of what I will call Rankean investigations: investigations aiming to exploit new primary sources and to add to our collection of known—that is, ordered and located—facts. An example would be a family reconstruction study of a particular English village. A much larger, second class of work is the rewriting or remaking of past colligations into newer shapes conforming to the ever-changing cultural norms and questions of the present. Ranke himself provides an example; he rewrote the earlier historiography of the Middle Ages as Marc Bloch was to rewrite him, Georges Duby to rewrite Bloch, and so on. Often, such works rest on Rankean investigations, but they put those new facts to even newer uses.
A third and even more adventurous class of work undertakes not reinterpretation but whole new colligations, pulling together old facts and interpretations into whole new "things." We see this in the rapid development of the concept of "women writers" over the last thirty years, which has not only driven reinterpretations of canonical writers like George Eliot and Edith Wharton but also led to Rankean investigations into writers hitherto ignored, like Mary Webb and Charlotte Yonge, all in the service of creating the new colligation of "women writers."

It is, to be sure, no news to anyone that there are not formal recipes for producing these three kinds of colligations: Rankean investigations, reinterpretations, and recolligations. There are not even clear genres for writing them: within history, for example, one can think of narratives that fall into all three of these classes. The same is true of biographies and quantitative works on historical topics. Nor is it clear that there is anything that corresponds to the conjectures and refutations logic underlying standard methods. There is a loose sense that library-based works should be organized around questions, but those questions can take many forms. It is perhaps better to say that there is a taste in library-based work, a taste for reinterpretation that is clever and insightful but at the same time founded in evidence and argument. I shall return to this problem of the criteria for successful colligation below.

Let me then summarize this discussion of the practices of library research before I go on to the notion of the larger structures of library research and their qualities. I have shown so far that the sources of library research are nonelicited, to a considerable extent recursive, and multiply ordered. I have shown that the basic practices of library research are associative algorithms like reading and browsing. This dependence on associative production implies the nonstandardization, nonsequentiality, and artisanality of library research practices and confers on them an especially powerful form of multitasking. I turn now to the larger structures and qualities of the library research enterprise.

The first larger structure of the library research world is, of course, the collection of all the sources—that is, libraries. I have already noted some of the characteristics of libraries as repositories of sources—their multiply ordered and recursive quality. I should also note here one particularly important physical quality of the library. Physical libraries contain records of their own past orderings, which indeed constitute one of the basic data types for the discipline of history. This is true not only of indexes and other orderings but, in a far more important way, of the physical artifacts or books themselves. A book is a representation of an interpretation at a given moment and cannot be modified by later interventions. There is no equivalent in the online world. Record copies could be created in principle, but they can easily be modified either through accident or malice. At present, they simply do not have the stability of print.16

Since library research is so dependent on associative rather than measurement algorithms, the second central structure of the library research world is the collection of prepared artisans: that is, the scholars in the several disciplines. Preparation is the central matter here.
The library research enterprise, taken as a whole, depends on having workers who are prepared for their task; otherwise, reading and browsing don't work. Library research depends on prepared minds—people who have passed laborious general exams that cram their heads full of facts and interpretations that will provide the steady flow of hyperlinks as they read texts in the library. Note the policy implication: that if we abolish these kinds of exams and memorization, we vastly decrease the overall power of library research as an enterprise. Sadly, some of this decrease has already taken place by means of various otherwise admirable reforms in graduate programs.

Finally, and most important, the artisanal research system characteristic of library research is massively parallel rather than sequential. This is perhaps the most profound difference between library and standard research. As we have seen, there is a pretty clear—if often imperfect—sequential logic to standard research. Both the individual research project and the cumulative research enterprise have as their ideal a process that is specifically progressive: a logical ordering of tasks in the individual research project and a cumulative ordering of results in the collective body of research projects taken together. Of course, there is some degree of parallelism in the standard research system. There are research projects on individual career changes, for example, going forward at the same time as are studies on the mobility of whole classes. But the underlying logic of the system, and certainly its ideals, are sequential.

This is emphatically not the case in library research. Nobody thinks that the great book on Jane Austen has to wait for the great book on Pride and Prejudice. These things are taken to be unorderable in principle, in the sense that one could imagine either one of these great books being written first and exercising a determining effect on the other. Indeed, a library research community would be disturbed if that were not the case. There is absolutely no order to the topics investigated in library research, and only in the most degenerate cases do we have the sequentiality of the standard methods. To be sure, some part of library research consists of Rankean investigations that have a certain kind of cumulativity, a kind of simple piling up of brute new facts. But very little of the action in the disciplines of history or of literature is merely about piling up facts. The action lies in using new facts to leverage new reinterpretations and radical recolligations. These last have no logical or cumulative order whatever, and, indeed, one of the standard gambits in library-based fields is precisely to overturn some implicit ordering of results in favor of some other possible ordering.

The importance of this parallel quality of library-based research can be seen if we think about it computationally. Standard research can best be imagined—in the ideal at least—as a species of classical von Neumann programming. It conceives of research as a MAIN program that has various subprograms contributing to it and called from it as the need arises. It presupposes well-defined terms that are sharable across program units. It allows—indeed encourages—specialization of subroutines and subcalculations. It is governed by an overall sequentiality.
And it aims, ultimately, at the successful performance of a fairly simply optimizable search task, which is the discovery of a truth that is taken to be out there in the world but hidden by various amounts of misinformation and randomness.

Library research is fundamentally quite different from this. It is a massively parallel system in which individual, largely uncoordinated processors are taking inputs idiosyncratically from other processors and from the stack of prior information and are then employing idiosyncratic knowledge and concepts of their own to turn out new outputs that in turn become inputs to other individual processors like themselves. It has no sequentiality, no subroutines, no common variables. And although the "Rankean investigations" portion of it may be optimizing the search for a truth that is assumed to be out there in the world but hidden by misinformation and randomness, the rest of it is doing something quite different.

There is a concept in computation for that kind of a computing architecture—the concept of a neural net. And once we recognize that library research as traditionally practiced has a neural net architecture, we are suddenly on very new ground. For one thing, this means that, contrary to widely held views, library research is every bit as "technological" a research system as is standard research. Neural nets are quite capable of performing all the basic tasks we expect computers to perform: most notably, they can remember and converge on and hence possibly discover patterns. You don't need an elaborate structure to discover patterns; you don't need accepted terms, conventional measures, and stable, recipe-based methodologies. You can do without sequentiality and—by implication—even cumulation altogether. You just need the right input-output weighting patterns for the individual nonsequential processors, dispensing thereby with common definitions and variables systemwide. And you need to strongly prepare the artisan-processors, loading them up with the stuff that will make all the materials they read come alive with blue hyperlinks.

So the first basic conclusion about library research is this: It is not a low-tech system designed for people who can't think rigorously. It's actually a quite high-tech computational architecture that relies heavily on well-trained individuals. That they work in what seem like random ways and random orders on the stack of prior knowledge and interpretations is just part of the architecture; it's not a desperate intellectual problem. You don't need replicability and cumulation and all that other apparatus of discovery. You need well-trained scholars, a strongly ordered stack of material, and a willingness to tolerate randomness.

The main structural quality of library research is, therefore, parallelism. And, as I have just noted, a parallel architecture can produce patterns as effectively as a sequential, von Neumann one. But can we say that the aim of library research is, in the last analysis, to search for true patterns out there in the world, as is the case with the standard research system? Other than for the "Rankean investigations" part of the library research system, I think the answer to this question is no, and that the real reason for the difference between the architectures of standard and library research is that the library research system does not really aim at the search for a truth out there in the world, but at something quite different.
This in turn will mean that optimizing the library research system is not the same as optimizing the standard research system and, in particular, that making the library research system "more efficient" will not necessarily improve its overall ability to do what we want it to do.

In general, the disciplines that sustain library research as their primary mode of research are not fields that are organized around the pursuit of a truth to which one comes closer and closer. The universe of possible interpretations of Pride and Prejudice is in principle infinite, as is even the universe of possible interpretations of Jane Austen as a biographical human being, even if the date when Jane Austen the biological individual died is something specific and finite that can be established as a matter of truth. Obviously, the discipline of English literature is more interested in those types of things to be said about Jane Austen that are infinite than in those that are finite. The specifiable date of her death is uninteresting compared to the infinitely evolving meanings of Pride and Prejudice. This does not mean that canons for rigorous thinking about the latter are not possible; any extensive reading of work in literary studies will persuade one quickly to the contrary. But the computational task of the algorithm that is literary research taken as a whole is not the task of finding, as efficiently as possible, the truth about Pride and Prejudice or even about Jane Austen. The task is rather something like "maximally filling the space of possible interpretations" or "not losing sight for too long of any given region of the space of possible interpretations"—or something like that. That is, the computational criterion we must optimize has something to do with comprehensiveness and richness rather than with rapidity of convergence.

A similar argument applies to all library research–based fields: literary studies, musicology, art history, history, and the library-based parts of sociology, political science, and anthropology. In all of them, the overall thing library researchers aim to optimize is not a "truth" but a richness and plenitude of interpretations. At any given time, one or another school may focus attention in one part of the space. But, in the long run, unvisited regions are always returned to cultivation, and plenitude is again and again achieved. In practice, this may look like rediscovering the wheel, but it is, I believe, the ideal of a set of disciplines whose focus is less on the true than on the meaningful.

Although specifying such a criterion of meaningfulness or plenitude is, of course, a long task, I should underscore my hypothesis that the reason library research takes the shape I have outlined here—a neural net of highly trained processors making local adjustments in the web of meaning—is that this is the optimal way to produce knowledge about the propagation of meaning in human systems. Indeed, we employ the same strategy when we study meaning "in the wild" rather than recorded in the library—that is, when we do anthropological ethnography.
Anthropology studies the same general form of data (unformed, raw experience) as does standard social science, but it does with that data not what standard research does but rather what library research does: it puts individual human processors (ethnographers) into raw unformed experience and asks them to return the results of their observations to a general stock of ethnographies, making it part of the input that will guide new ethnographies of the future.

The ultimate reason that a network of human processors does better in the pursuit of meaning than do divided-labor systems with shared vocabularies and recipe methods is that the extraordinary multiplicity of meaning cannot easily be captured by the rigidly limited vocabularies of variables in standard methods. We know, for example, that the young Bronislaw Malinowski met Edith Wharton—then nearing the end of her life—at her villa on the French Riviera in the late 1920s. That’s a Rankean fact. But one could imagine this fact appearing in any of a half-dozen library research–based books: a biography of Malinowski, a study of the American encounter with Europe, a work on the succession of cohorts in arts and literature, an analysis of the relation of anthropology and fiction, an examination of Wharton’s relations with men, and a study of the evolution of the French Riviera as a cultural center. None of these is the “right” colligation of that fact. Each of them pulls it in a different direction. If we want a body of research that will take up all these possibilities of interpretation, we cannot employ a method that strips out all but one meaning from the start. And that is exactly what standard research does.

The result of these different approaches in the two research systems is different patterns of knowledge development. Standard research tends to evolve in tightly organized literatures with widely shared conventions on variables, methods, and problems. As they try to take on a greater and greater range of problems, these literatures run up against other literatures coming into the same areas but with different conventions. There usually results a clash and a starting over. Standard literatures are thus somewhat self-limiting.17 By contrast, library research fields do not really have coherent literatures with extensive conventions. Their coverage of the space of possible works is much more interwoven and interpenetrating than that of the coherent, separable traditions of standard work. Works of library research are forgotten gradually and as individual works rather than being hemmed in and starved as whole traditions.

In short, the reason library research looks the way it does is not that we haven’t had the tools to be efficient about it but rather that library research aims to accomplish something rather different than does standard research. It is not interested in creating a model of reality based on fixed meanings and then asking observed reality to judge whether this model is right or wrong. It does not ultimately seek a correspondence between what it argues and a “real world.” Rather, it seeks to contribute to an evolving conversation about what human activity means. Its premise is that the real world has no inherent or single meaning but that it becomes what we make it. Individual works can best contribute to that conversation if they combine a coherence of individual vision with a tolerance of reinterpretation.
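The Malinowski–Wharton example can be restated in data-structure terms, with every field name invented for the illustration: once the encounter is coded into a fixed row of variables, only the meanings the coding scheme anticipated survive, while the retained source note can be pulled unchanged into any number of later colligations.

```python
# A small data-structure illustration (all field names are invented): the same
# Rankean fact, once coded into fixed variables versus kept as a reusable source.

source_note = ("The young Bronislaw Malinowski met Edith Wharton, then nearing "
               "the end of her life, at her villa on the French Riviera in the late 1920s.")

# Standard-research style: a row stripped to the variables chosen in advance.
coded_row = {"person_a": "Malinowski", "person_b": "Wharton",
             "decade": 1920, "contact": True}

# Library-research style: the note itself is retained, and different projects
# colligate it differently; none of these framings is "the" right one.
colligations = {
    "biography of Malinowski": [source_note],
    "the American encounter with Europe": [source_note],
    "anthropology and fiction": [source_note],
    "the Riviera as a cultural center": [source_note],
}

# The coded row answers only the questions its variables anticipated.
print("contact recorded:", coded_row["contact"])
# The retained note still supports framings nobody coded for.
print("framings still open:", len(colligations))
```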
Such works require solidity in themselves but must also facilitate their own reuse in other contexts. The system of knowledge so produced aims to find the largest possible universe of human meanings. On the way, it will turn up oceans of Rankean facts. But they are just a means to another end.

Implications

Having shown that library research is a “technological” (in the sense of cognitively legitimate) approach to thinking about human affairs, I would now like to turn to the implications of my argument for libraries and library research going forward.

The first, and by far the most important, of these is that because library research is not aimed at finding things, in particular at finding correspondences between models and the world, but rather is aimed at space-filling or some other criterion of plenitude, it is by no means clear that increasing the efficiency of library research will improve its overall quality. For example, we cannot automatically assume that increasing the speed of access to library materials by orders of magnitude has improved the quality of library-based research. This would follow at once if convergence on correspondence between model and reality were the aim of library research, but given that it is not, there is no necessary reason why faster should be better. In fact, given that browsing is reduced by efficient access, my argument implies that, other things being equal, faster is probably worse.

Indeed, this skepticism holds for many other technological improvements of library research. For example, an application of elementary economic theory tells us that the dramatic lowering of item access costs in terms of time—or, put another way, the dramatically increased productivity of a given unit of time spent looking for materials—has almost without question meant that library researchers devote more time to discovery and access (relative to reading) than did library researchers of decades ago.18 That a substantial decrease in reading would help library research seems most unlikely. (A back-of-the-envelope illustration of this reallocation appears in the sketch below.)

A second general implication involves the overall quality of the stack of prior material on which library researchers draw. Since on my model recursive use of prior material is central to new colligation in library research, a general lowering of stack quality—or an increasing inability to differentiate quality material in the stack—is a serious problem for library research. But a variety of forces, not all of them technological, have probably lowered the quality of this stack. The most powerful such force is vastly lowered barriers to entry, both from technologically mediated increases in the availability of sources and from productivity enhancers like canned software for statistical analysis and, increasingly, for the automated analysis of texts. On the reasonable assumption that the level of a scholar’s academic placement—in terms of access to traditional library resources—is not completely uncorrelated with his or her ability, lowered barriers to entry probably mean lowered quality. This mechanism is furthered by the proliferation of journals, which makes good work harder to find; by the decreasing resources invested in peer review, which makes standards hard to maintain; and by the expansion of academic publishing to earlier and earlier phases of the professional life course, which fills the library with intellectual juvenilia.
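Here is the back-of-the-envelope illustration of the access-cost point promised above. Every number in it is invented, and it shows nothing beyond the arithmetic: if the time-price of finding an item falls tenfold and researchers respond by retrieving more than ten times as many items, the share of a fixed weekly time budget devoted to discovery rises at reading’s expense.

```python
# Back-of-the-envelope arithmetic (every number is invented): a fixed weekly
# time budget split between discovery and reading, before and after a tenfold
# drop in the time-price of finding an item, assuming retrieval expands faster
# than its price falls.

def allocation(minutes_per_item, items_retrieved, weekly_minutes=2400):
    discovery = minutes_per_item * items_retrieved   # time spent finding things
    reading = weekly_minutes - discovery             # what is left for reading
    return discovery, reading

then = allocation(minutes_per_item=30, items_retrieved=20)   # card-catalog era (hypothetical)
now = allocation(minutes_per_item=3, items_retrieved=400)    # online era (hypothetical)

for label, (discovery, reading) in [("then", then), ("now", now)]:
    share = discovery / (discovery + reading)
    print(f"{label}: discovery {discovery} min, reading {reading} min, discovery share {share:.0%}")
```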
Understanding the full workings of this stack-degradation mechanism, however, requires a more detailed theory of library research than I can sketch here. A hidden assumption of the neural net analogy is that we can specify the “weights” assigned to prior scholarship and data by artisanal researchers as they take in material from the library. I have emphasized the importance of the interaction between external inputs and the input that comes from a scholar’s prior knowledge. But for a neural net to work, the ideas and interpretations that come out of this interaction must themselves be “weighted” as they are assembled into a new colligation; not all of them are of equal importance. For the library-research “nets” of the various disciplines to succeed in filling the space of possible interpretations and revisiting past interpretations on some finite basis, it is no doubt necessary that these weights be constrained to some range of values. These constraints are no doubt implicit in the tacit knowledge that disciplinary practitioners have in mind when they insist that library research can be taught only in seminars or direct dissertation supervision. They are also probably related to the phenomenon, universal in library-based disciplines, of selecting some somewhat arbitrary (and slowly changing) set of texts as a canon that will have special weight in the process of scholarship. Until we can specify more exactly how this tacit knowledge works to produce the mixture of rigor, traditionalism, innovation, and recolligation that characterizes the work of the library-based disciplines, we cannot directly predict the effect of decreasing stack quality on that knowledge. But it seems likely that its effects will be as problematic as they have been in the more “scientific” disciplines, which are affected by many of the same forces although via different particular mechanisms.19

A third general implication of my argument concerns the systematic loss of randomness. As anyone who has worked recently in optimization knows, stripping the randomness out of a computing system is a bad idea. Harnessing randomness is what optimization is all about today. (Even algorithms designed for convergence make extensive use of randomness, and it is clear that library research in particular thrives on it.) But it is evident that much of the technologization of libraries is destroying huge swaths of randomness. First, the reduction of access to a relatively small number of search engines, with fairly simple-minded indexing systems based on concordance indexing, has meant a vast decrease in the randomness of retrieval. Everybody who asks the same questions of the same sources gets the same answers. (Although this effect is, to be sure, undercut by the paradoxical fact that asking the same question of a source a few days later can produce dramatically different results because of altered algorithms, improved OCR readers, new data, and so on.) The centralization and simplification of access tools thus has major and dangerous consequences. This comes even through the reduction of temporal randomness. In major indexes without cumulations—the Readers’ Guide, for example—substantial randomness was introduced by the fact that researchers in different periods tended to see different references. With complete cumulations, that variation is gone. The same things always rise to the top.
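The retrieval half of this point can be made with a minimal sketch; it models no real search engine, and every quantity in it is invented. A deterministic ranking shows a thousand identical queries the identical handful of items, while even a small random perturbation of the relevance scores surfaces a far larger slice of the collection over the same thousand searches.

```python
import random

# A minimal sketch (it models no real search engine; all quantities are invented)
# of deterministic versus lightly randomized retrieval for one identical query
# asked a thousand times.

COLLECTION = [f"item_{i:03d}" for i in range(500)]
SCORES = {item: 1.0 / (rank + 1) for rank, item in enumerate(COLLECTION)}  # fixed "relevance"

def deterministic_top(k=5):
    # Everyone asking the same question sees exactly the same five items.
    return sorted(COLLECTION, key=lambda it: -SCORES[it])[:k]

def perturbed_top(rng, k=5, noise=0.05):
    # A small random perturbation of the scores lets different searches
    # surface different corners of the collection.
    return sorted(COLLECTION, key=lambda it: -(SCORES[it] + rng.gauss(0, noise)))[:k]

def distinct_items_surfaced(searches=1000, seed=0):
    rng = random.Random(seed)
    seen_det, seen_per = set(), set()
    for _ in range(searches):
        seen_det.update(deterministic_top())
        seen_per.update(perturbed_top(rng))
    return len(seen_det), len(seen_per)

if __name__ == "__main__":
    det, per = distinct_items_surfaced()
    print(f"distinct items ever surfaced: deterministic {det}, perturbed {per}")
```

The point is not that catalogs should add noise in exactly this way, only that determinism in retrieval has a measurable cost in what ever gets seen at all.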
As Merton’s theory predicts and as the Salganik-Watts experiments have shown, this rising to the top arises as much through arbitrary piling-on as through recognition of quality.20

Also disappearing is a huge amount of random variation introduced by the physical character of library artifacts. Books must be divided into pages, pages into lines. Each of these divisions creates a random emphasis—how many of us, thumbing through a dictionary, have been caught by two or three headwords before we get where we want, and been led thereby to some minor discovery! The same kinds of random emphases are created by physical shelving—the importance of books at (varying) eye heights, the importance of books at the ends of stacks that are visible from the corridor as one walks by, and so on. All of this, ultimately, disappears in the Googlification of the library. (It could be artificially imposed, to be sure, but the mistaken ethic of efficiency militates against it.) Yet, in fact, all of this randomization introduced by the physical nature of the artifacts is probably quite important in the computational architecture of library research. Indeed, it is physical proximity that produces the famous episodes of serendipity with which library researchers love to confute opponents of the physical library. But these stories, which emphasize the extraordinary nature of the one unique book pulled by accident off a nearby shelf, convey a mistaken impression. As I have argued earlier, browsing and the consequent production of serendipitous insight are a constant presence in library work, not an exceptional one. But that constant background browsing only works because the library is a highly ordered physical and indexed system that is cut by thousands of random cuts. It is this superposition of random cuts on a highly ordered substrate that makes library browsing so constantly productive.

This argument makes it clear why “efficient” search is actually dangerous. The more technology allows us to find exactly what we want, the more we lose this browsing power. Library research, as any real adept knows, consists in the first instance in knowing, when you run across something interesting, that you ought to have wanted to look for it in the first place. Library research is almost never a matter of looking for known items. But looking for known items is the true—indeed the only—glory of the technological library. The technological library thus helps us do something faster, but it is something we almost never want to do; furthermore, it strips us in the process of much of the randomness-in-order on which browsing naturally feeds. In this sense, the technologized library is a disaster. (I have tried to insist that my university library design its new remote access system for rarely used materials so that it delivers the wrong item one out of twenty times. My librarians are skeptical.)

There are other dangers in the shift to concordance and other simplified forms of indexing as opposed to human-based subject indexing. There is still no automated indexing system that compares with human indexing as a means of creating new meanings and connections. Concordance indexing is a blunt instrument indeed. Even the newer “word cloud” index systems have many pathologies, as was discovered thirty years ago when anthropologists like Roy D’Andrade first started using them.21 They’re very visual—which appeals to a new generation—but their actual connection with the meaning systems they index is often problematic.
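The bluntness of concordance indexing is easy to exhibit in miniature; the documents and subject terms below are invented. A concordance index can retrieve only the words that happen to occur in a text, while a human subject index assigns a few controlled-vocabulary terms that need not occur in the text at all and that already embody an act of interpretation.

```python
from collections import defaultdict

# A miniature, invented contrast between concordance indexing (the words that
# happen to occur in a text) and human subject indexing (a few controlled terms
# assigned by a reader, which need not occur in the text at all).

texts = {
    "doc1": "the meeting at the villa on the riviera",
    "doc2": "a villa garden and its meanings",
}

# Concordance ("keyword out of context") index: every word in the text.
concordance = defaultdict(set)
for doc, text in texts.items():
    for word in text.split():
        concordance[word].add(doc)

# Human-assigned subject terms from a controlled vocabulary.
subject_index = {
    "doc1": {"expatriate sociability", "anthropology and literature"},
    "doc2": {"landscape and memory"},
}

print("concordance query 'villa':", sorted(concordance["villa"]))
print("subject query 'anthropology and literature':",
      sorted(d for d, terms in subject_index.items() if "anthropology and literature" in terms))
```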
And unless such word-cloud systems are changed from passive clustering systems into actively intelligent ones, they are all subject to the problem noted above: they can deliver only the same set of things to whoever queries them similarly. That they do this quickly and effectively just means that much less randomness, that many fewer occasions for new insight. Of course, they do permit certain kinds of discoveries. Concordance-cloud techniques were used almost forty years ago to discover the order in which the works of Plato were written.22 But the order in which the works of Plato were written, although an interesting Rankean fact, is not the question that drives the publication of new books about Plato year after year.

The Plato story is a parable of the new library. It is indeed true that the new technologies enable us to do many things faster than ever before. It is indeed true that those technologies enable us to do some kinds of things that we have never done before. But neither of these things means that the current technology really revolutionizes library research. It is a wonderful new tool when well handled, but most of its direct effects on library research are mixed or deleterious. And the ideology behind much of it—that it will suddenly enable unskilled researchers to produce high-quality work—is simple anti-intellectualism.

Conclusion

I have argued two basic things in this paper. First, I have shown that library research is a fully legitimate form of inquiry, a computational architecture every bit as “scientific” as standard research with its more familiar design. Second, I have shown that, given what library research aims to do and how it actually works, most of the moves toward the technologization of library practices are either neutral or harmful to the enterprise as it has been conducted. But I do not wish to close with this doom-and-gloom scenario. I do think that library researchers have to defend their resources against the technologists, who have no idea of what library research is or what it aims for, and against the administrators, who see in the false technological argument an intellectual justification for the huge savings they hope to realize by decommissioning libraries. But, on the other hand, I take heart from the most important single statistic to emerge from my own 5,700-respondent survey of library use at my university library.23 We created an index of use of physical materials that included things like taking a book out, browsing the stacks, finding a useful book in the reference department, recalling a book, and so on. And we created another index of cutting-edge electronic use—consulting an online bibliographical tool, downloading data from a government data Web site, using an online reference system, and so on. And, much to everyone’s surprise, the correlation between these two things was not only substantial and positive at the group level—graduate students did both of these things much more than did undergraduates—but also at the individual level; in fact, the correlation was about 0.5. There is thus no evidence of substitution of one kind of use for another. Quite the reverse: among the young people using our library, it is the heavy physical users of the library who use electronic resources the most, and the heavy electronic users who use physical resources the most. What this says plainly is that there are heavy “research library” users and lighter, “study hall” users.
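For readers who want the mechanics rather than the finding, the sketch below shows, on purely synthetic data, the kind of computation involved: each respondent’s physical-use items and electronic-use items are summed into two indexes, which are then correlated at the individual level. The simulated numbers are not the survey’s data and carry no empirical weight.

```python
import random

# A sketch on synthetic data of the kind of computation described above; the
# simulated numbers carry no empirical weight. Each respondent gets a physical-use
# index and an electronic-use index (sums of yes/no items), and the two indexes
# are correlated at the individual level.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(42)

def respondent():
    engagement = rng.random()  # a latent tendency toward heavy library use
    physical = sum(rng.random() < engagement for _ in range(5))    # e.g., checkout, stacks browsing, ...
    electronic = sum(rng.random() < engagement for _ in range(5))  # e.g., online bibliography, data download, ...
    return physical, electronic

pairs = [respondent() for _ in range(1000)]
print("individual-level correlation:",
      round(pearson([p for p, _ in pairs], [e for _, e in pairs]), 2))
```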
The heavy users, that is, use whatever they can get their hands on; scholarship advances on the electronic and physical fronts at once. It’s obvious, once you think of it; a good student will pursue all means to success. It’s the bad students who take the easy way home. What this means for policy is very simple, if very expensive. If you are going to have a serious library research community, you have to have both a physical library and a technological one. The new technology is not a panacea—more a useful extension. While it provides wonderful benefits to the many scholars not lucky enough to work at universities with great physical collections, and while it enables some things never before possible, it does not in fact “revolutionize” library research. The technologized library may, of course, sweep all before it. But that victory would entail the loss of something far better than the technologized library can produce.

Notes

1. By “library research” in this paper I refer to research conducted by disciplinary experts (such as musicologists, literature professors, historians, and political scientists) that is primarily based on materials collected in libraries. I do not mean research about how libraries themselves function. The paper is, in fact, a theoretical essay about the latter topic (with respect to a particular group of users—disciplinary experts), but to address that topic I need a term for the ensemble of discipline-based expert research conducted in libraries. “Library-based scholarship” is correct but cumbersome. So I shall use the phrase “library research” throughout, opposing it implicitly to terms like “survey research” or “ethnography.”

2. There are only a handful of how-to volumes about library research in my university’s 7.5-million-volume library: perhaps a dozen simple manuals directed at college students as well as Thomas Mann’s more advanced Oxford Guide to Library Research (New York: Oxford University Press, 2005), which, although absolutely superb as a guide to identifying and finding materials, does not cover the library research process as a whole. There are, of course, many books designed to teach the reader how to look for expert information on some topic. Such books exemplify what is perhaps the dominant assumption behind much information thinking—that truth (or the “true” expert judgment) is out there somewhere in the library, and the task is to find it with minimum difficulty. Typical titles are Student Guide to Research in the Digital Age: How to Locate and Evaluate Information Sources and Find It Fast. Such books obviously offer no help in theorizing how it is that expert library workers create knowledge in the first place; they assume that knowledge exists ex ante, out there for the finding.
3. Examples are T.L. Martinson, Introduction to Library Research in Geography (1972); A.E. Simpson, Guide to Library Research in Public Administration (1976); R.K. Baker, Introduction to Library Research in French Literature (1978); L.F. Place et al., Aging and the Aged: An Annotated Bibliography and Library Research Guide; L.L. Richardson, Introduction to Library Research in German Studies (1984); S.E. Sebring, Introduction to Library Research in Women’s Studies (1985); and J.M. Weeks, Introduction to Library Research in Anthropology (1991). (The latter five are all in a Westview Press series of Guides to Library Research.) Generally aimed at advanced undergraduates and beginning graduate students, all of these books are in effect slimmed-down, specialized versions of the ALA Guide to Reference Books (which indeed is mentioned in all but one of them), usually coupled with some useful advice about the idiosyncrasies of the Library of Congress classification system and other indexing tools. The same is true of Downs’s old standard How to Do Library Research (Urbana: University of Illinois Press, 1966 and later editions). None of these is really a manual for an expert or even an advanced student, although, paradoxically, every one of them contains a far larger range of reference tools than would be in the working knowledge of even the greatest experts in the specialties involved. This paradox captures nicely the enormous difference between “finding information” and “doing research.”

4. If there is little about the method of library research, there is even less about its theory. At the time of writing (26 April 2007), a check of Google revealed five uses of the phrase “theory of library research.” One of them is in a tongue-in-cheek guide to the simplest forms of library usage for Duke University chemistry majors. The rest are references to the work of the present writer. There are twelve entries for “library research theory”; all appear to be artifacts combining the last words of one sentence with the first word of another—“...library research. Theory....” For a recent review of the sociology of science, see S. Sismondo, An Introduction to Science and Technology Studies (Malden, Mass.: Blackwell, 2004). The standard journal in the field is Social Studies of Science. The reader will scan it in vain for articles on library-based knowledge.

5. K.R. Popper, Conjectures and Refutations (New York: Basic, 1962); T.S. Kuhn, The Structure of Scientific Revolutions, 2nd ed. (Chicago: University of Chicago Press, 1970); Sismondo, An Introduction to Science and Technology Studies. According to the Popperian model (Popper 1962), science proposes conjectures, which are then tested against real-world data and either refuted or not. Knowledge at any given time is made up of non-refuted conjectures. Kuhn (1970) insists that “real-world data” are to some extent theory-defined, and that “paradigms” (bundles of theory, data, and practices) are not able to see their own refutation, as Popper’s theory requires. See Sismondo 2004 for an introduction to these theories.

6. I apologize to those for whom “data” must be plural. My usage here (singular for data seen collectively and plural for data seen as disparate facts) is standard in the social sciences at this point.

7. Popper, Conjectures and Refutations, 33–65.

8. There are attempts to change this at present, but they face enormous difficulties because of the incommensurability of datasets and, more important, of their internal structure. Generalized data archiving is at present in the same situation as were books around the time of the standardization of the LC classification; the metadata standards we seek at present are the equivalents of authoritative standards for descriptive bibliography.

9. Several “standard research” readers of this manuscript have objected that “of course we also have and do those things” (that is to say, the kinds of sources, practices, and structures I argue characterize library research). As an empirical statement, this is of course true. Standard researchers do plenty of pattern searching and random access and other things that I shall argue characterize library research. But these are not part of the ideal they teach their students, nor are they part of the organizing reality of their research programs or of the criteria by which they judge proposals when serving as members of funding panels. In those activities, they are quite clear about enforcing the formal picture given in the preceding section. In fact, then, their reason for claiming that “we do it too” is to assert overall jurisdiction over “scientific method” and to assert that their brand of it is the only one. It is the central assertion of this paper that that claim is false.
10. Keywords, in the classical sense, are a small number of (subject) index words that are assigned by a human coder to a particular text. They may or may not occur in that text, and they are, typically, part of a controlled vocabulary that enables the retrieval of effectively concentrated bibliographies. Since they are often assigned by authors themselves, they amount to authorial steering of future readers. Obviously, keyword indexing in this sense contains far more information for the scholar than does indexing by simple words that occur in a text, even when this latter is supplemented by quantity information. I use the name “concordance indexing” for this latter type of indexing by words in the text—which, confusingly, has been called keyword out of context (KWOC) indexing even while the original sense of “keyword” still survived. There is nothing “key” about the keywords in KWOC indexing. Calling concordance indexing “keyword indexing” is like calling oleomargarine butter.

11. D.W. King and C. Tenopir, “Using and Reading Scholarly Literature,” Annual Review of Information Science and Technology 34 (1999): 423–77; C. Tenopir and D.W. King, Towards Electronic Journals (Washington, D.C.: SLA, 2000); C. Tenopir and D.W. King, Communication Patterns of Engineers (Piscataway, N.J.: IEEE Press, 2004). There is an enormous and quite rich literature in information science on reading, much of it summarized in King and Tenopir (1999) and Tenopir and King (2000, 2004). The vast majority of it concerns scientists, engineers, and physicians, whose use of published information is radically different from that of the humanists and social scientists who are the library researchers here discussed. More disturbing, however, is that most of this work presupposes a theory of knowledge as independent bits of information, a theory that has been systematically dismantled by sociologists and philosophers of knowledge and science over the last fifty years. It applies only to that part of knowledge that consists of sheer facts, what will here be called Rankean facts (see footnote 15). As a model of more general knowledge systems, the “knowledge bits” theory is clearly inadequate.

12. R.E. Rice, M. McCreadie, and S.-J.L. Chang, Accessing and Browsing Information and Communication (Cambridge, Mass.: MIT Press, 2001). There is a substantial literature on browsing in information science. Its approach is generally more individualized and psychological than the approach taken here. Also, it does not generally focus on browsing by experts and therefore does not focus on the centrality of antecedent knowledge in the browsing process.

13. J.R.M. Butler, The Passing of the Great Reform Bill (London: Longmans, 1914).
14. W. Whewell, The Philosophy of the Inductive Sciences (London: Parker, 1847); C.B. McCullagh, “Colligation and Classification in History,” History and Theory 17 (1978): 267–84; A. Abbott, “Event Sequence and Event Duration,” Historical Methods 17 (1984): 192–204; E.O. Wilson, Consilience (New York: Knopf, 1998). The term “colligation” (from Whewell) did have a faint afterlife in the literature on the philosophy of history. The Whewellian approach has been more recently revived by E.O. Wilson.

15. L. Krieger, Ranke: The Meaning of History (Chicago: University of Chicago Press, 1977). The standard source on Ranke’s historiography is Krieger. His translation of what he calls “the most famous statement in all historiography” (1977: 4) is: “History has had assigned to it the task of judging the past, of instructing the present for the benefit of ages to come. The present study does not assume such a high office: it wants to show only what actually happened (wie es eigentlich gewesen).”

16. A computer science colleague has objected that checksums are widely used to verify the stability of computer files and that, in fact, computer files therefore have greater stability than texts. True enough. But that argument ignores the problem of the creation of a central and credible verifying authority for such checksums and the defense of such an authority from political and secret manipulation, one of the many hurdles to be surmounted before the online world can have the credibility and authority provided willy-nilly by the physicality of print. The issue can be seen as a more general ideological one. The ease of updating in the online world leads us all to indulge ourselves in the core fantasy of a society founded on the ideology of progress—that the present is always better than the past. There is, in fact, no a priori reason to think this is true (or false). But our devout belief in its truth leads us to rewrite the past with complete abandon. The physical nature of library artifacts prevents that.

17. A. Abbott, Chaos of Disciplines (Chicago: University of Chicago Press, 2001); A. Abbott, “Seven Types of Ambiguity,” in Time Matters (Chicago: University of Chicago Press, 2001).

18. Allen Renear (personal communication) has reported some empirical work confirming this prediction.

19. F. Rodell, “Goodbye to Law Reviews,” Virginia Law Review 23 (1936): 40–41. A classic case of the technology-induced degradation of a knowledge system is law reviews, which were overwhelmed with pseudo-scholarship in part because of citation indexing. Fred Rodell’s still-famous article has lost none of its sting in seventy years: “And then there is the probative or if-you’re-from-Missouri-just-look-at-this type [of footnote]. ... It is [this] probative footnote that is so often made up of nothing but a long list of cases that the writer has had some stooge look up and throw together for him. ... Any article that has to be explained or improved by being cluttered up with little numbers until it looks like the Acrosses and Downs of a crossword puzzle has no business being written.”

20. This is the effect nicknamed the “Matthew effect” by Merton in a famous article. The Salganik-Watts Web-based experimental studies look at teenagers’ piling-on to arbitrarily labeled “good bands.” R.K. Merton, “The Matthew Effect in Science,” Science NS 159 (1968): 56–63; M.J. Salganik, P.S. Dodds, and D.J. Watts, “Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market,” Science NS 311 (2006): 854–56.
21. M.L. Burton, “Mathematical Anthropology,” Annual Review of Anthropology 2 (1973): 189–99.

22. L.I. Boneva, “A New Approach to a Problem of Chronological Seriation Associated with the Works of Plato,” in Mathematics in the Archeological and Historical Sciences, ed. F.R. Hodson, D.G. Kendall, and P. Tautu (Edinburgh: Edinburgh University Press, 1971), 173–85.

23. A. Abbott, “The University Library,” Appendix to the Report of the Provost’s Task Force on the Future of the University Library, University of Chicago, 2006. Available online at www.lib.uchicago.edu/e/about/abbott-report.html and www.lib.uchicago.edu/e/about/abbott-appendix.html. [Accessed 29 September 2008.]