College & Research Libraries, May 1978

STEPHEN BULICK

Book Use as a Bradford-Zipf Phenomenon

The Bradford distribution is introduced and explained; its ambiguity, arising from two different conceptions, and its relationship to the Zipf distribution are discussed. The general situation that might be expected to yield such a distribution is described and shown to fit the interaction between book users and books available for use in a library. Circulation data from a major research library are then shown to follow the verbal rather than the graphical formulation of Bradford's law. Some possible implications are discussed.

Stephen Bulick is a doctoral candidate at the Graduate School of Library and Information Sciences, University of Pittsburgh. The work in this paper was supported in part under National Science Foundation grants SIS 75-11840 and DSI 75-11840 A02. Contributions and comments by Allen Kent, James G. Williams, and K. Leon Montgomery of the school and Ronald Schuetz of the Hillman Library are gratefully acknowledged.

IN HIS FAMOUS WORK Documentation, S. C. Bradford expressed an idea that has come to be called "Bradford's Law of Scattering" or simply "Bradford's Law." Bradford discovered a mathematical expression for the distribution of articles on a given subject in the scientific literature:¹

    If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $1 : n : n^2$

In this statement "n" is a constant particular to the subject involved. He also developed a graphic formulation of the law.² Unfortunately, as Vickery and later Wilkinson noticed, the two formulations are different.³,⁴ Because some researchers have used one formulation and some have used the other, confusion has developed. Whichever formulation is chosen, the idea to be preserved is that, given some coherent population of items, a small number will be very productive (in terms of selection, citation, or whatever), a greater number will be less productive, a still greater number will be even less productive, and so on until the population of items has been exhausted. There are, thus, "zones" of fairly constant total productivity but containing greater and greater numbers of individual items in each zone, i.e., the average productivity per item decreases in each succeeding zone. Furthermore, there is a direct relation between the cumulative productivity and the cumulative total of items. Though Bradford apparently felt that the zones were based on a theoretically derived nucleus, Vickery shows that Bradford's law holds for any number of zones of equal productivity.⁵

RELATION OF BRADFORD AND ZIPF LAWS

While originally proposed to describe the distribution of journal articles in the scientific literature and frequently used by librarians as an aid in preparing bibliographies and in maintaining journal coverage of various subject areas, Bradford's law seems applicable to other situations as well. Brookes points out that the Bradford distribution closely resembles a Zipf distribution with certain restrictions.⁶ A strictly linear Zipf distribution arises from a situation in which items are chosen from a restricted population of possible items. Gradually some items emerge, on a "success breeds success" basis, as most popular and continue to be chosen at a greater rate. The total population of "used" items increases as well, since some items continue to be chosen for the first time. With no restrictions as to the availability of the popular items, the distribution of "uses" vs. "items used" becomes linear on a semi-log scale. With such restrictions the graph curves at the bottom but rises to linearity. The Bradford distribution can thus be superimposed on the Zipf distribution and, in fact, becomes a special case of the Zipf distribution.

Leimkuhler has also noticed the close relationship between the Bradford and Zipf distributions.⁷ Since the "success breeds success" mechanism and the restriction on most popular items are common to many situations, one would expect the Bradford-Zipf distribution to apply in other areas. Brookes describes the general situation in which one might expect to encounter such a distribution:

    The Bradford-Zipf distribution can be expected to arise when selection is made of items, characterized by some common element, which are all equally open to selection for an equal period and subject to the "success-breeds-success" mechanism, but when the selection of a most popular group is also, but to a weaker extent, subject to restriction. It is thus a general law of concentration over an unrestricted range of items on which is superimposed a weaker law of dispersion over a restricted range of the most frequently selected items.⁸

Our notion is that this description fits the pattern of book use by library patrons and that Bradford's law should, therefore, describe that situation as well.

TWO FORMULATIONS OF BRADFORD-ZIPF

Wilkinson performs a valuable service in working out both forms of Bradford's law and then applying each to the same set of data. Her conclusion is that the graphical formulation yields better results.⁹ We shall borrow her expressions to show that the verbal formulation better describes library circulation data.

The underlying assumption of both forms of Bradford is the same: If items are ranked by productivity from high to low, there is a relationship between cumulative (and therefore relative) productivity and the logarithm of cumulative items and thus the rank.
The verbal formulation, used by Vickery¹⁰ and Leimkuhler,¹¹ is expressed as:

$$R(n) = j \log(n/t + 1)$$

where R(n) = cumulative productivity of the first n items and j and t are constants.

The graphical formulation, used by Brookes,¹² is expressed as:

$$R(n) = k \log(n/s)$$

where R(n) = cumulative productivity of the first n items and k and s are constants. Furthermore, $j \neq k$ and $t \neq s$.

APPLICATION TO LIBRARY CIRCULATION DATA

We now develop each of these expressions for library circulation data. Our items are books and monographs; our productivity is measured in "uses" or "transactions" of those items. All logarithms are to base e. Table 1 shows virtually complete external circulation data for calendar year 1972 from the Hillman Library at the University of Pittsburgh, a major research library and Association of Research Libraries member. (It should be noted that the circulation data exclude nonpatron transactions for binding, storage, mending, etc.) Items have been ranked according to frequency of use; columns are as follows:

A represents the number of times circulated.
B represents the number of individual items having the frequency in column A, e.g., seven items circulated fifteen or more times.
C cumulates column B, e.g., forty-six items circulated thirteen times or more.
D is column A times column B, i.e., the number of transactions (or circulations) accounted for by the items in column B, except for row 1, of course.
E cumulates column D, e.g., items that circulated 13 times or more accounted for 629 total transactions.

To ease the computational burden (and drop out a constant) we recast each of the two expressions to get F(x), the proportion of total productivity contained in the most productive fraction, x, of items.
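The recasting just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the computation used for this paper: the function names are hypothetical, and the values of N, t, and s are the ones derived from table 1 in the text that follows. The sketch shows that the scale constants j and k cancel when F(x) = R(xN)/R(N) is formed, so only t or s has to be estimated.

```python
import math


def r_verbal(n, j, t):
    """Verbal form: R(n) = j * ln(n/t + 1)."""
    return j * math.log(n / t + 1.0)


def r_graphical(n, k, s):
    """Graphical form: R(n) = k * ln(n/s); negative whenever n < s."""
    return k * math.log(n / s)


def f_verbal(x, total_items, t):
    """F(x) = R(xN)/R(N) for the verbal form; j cancels, so 1.0 is used."""
    return r_verbal(x * total_items, 1.0, t) / r_verbal(total_items, 1.0, t)


def f_graphical(x, total_items, s):
    """F(x) = R(xN)/R(N) for the graphical form; k cancels, so 1.0 is used."""
    return r_graphical(x * total_items, 1.0, s) / r_graphical(total_items, 1.0, s)


# Values quoted in the paper (N and the total from table 1; t and s as derived there).
N_ITEMS = 107_152
TOTAL_TRANSACTIONS = 210_717
T_CONST = 11_613
S_CONST = 6_080

if __name__ == "__main__":
    for x in (0.05, 0.10, 0.50, 0.90):
        verbal = f_verbal(x, N_ITEMS, T_CONST) * TOTAL_TRANSACTIONS
        graphical = f_graphical(x, N_ITEMS, S_CONST) * TOTAL_TRANSACTIONS
        print(f"x={x:.2f}  verbal={verbal:,.0f}  graphical={graphical:,.0f}")
```

Multiplying F(x) by the total of 210,717 transactions gives a predicted transaction count for the most productive fraction x. The graphical form goes negative whenever xN < s, which is why it gives no usable prediction for the smallest fractions.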
Verbal:

$$F(n/N) = R(n)/R(N), \quad \text{where } N = \text{total item count}$$
$$F(n/N) = \frac{j \log(n/t + 1)}{j \log(N/t + 1)}$$
$$x = n/N$$
$$F(x) = \frac{\log(xN/t + 1)}{\log(N/t + 1)}$$

Let $V = N/t$, then

$$F(x) = \frac{\log(Vx + 1)}{\log(V + 1)}$$

Graphical:

$$F(n/N) = \frac{k \log(n/s)}{k \log(N/s)}$$
$$x = n/N$$
$$F(x) = \frac{\log(xN/s)}{\log(N/s)}$$

Let $G = N/s$, then

$$F(x) = \frac{\log Gx}{\log G}$$

Constants for distributions of this type (gamma related) present a problem, since there is no strictly scientific way of determining what they should be. Both Wilkinson and Leimkuhler offer explanations.¹³,¹⁴ Wilkinson's method is used here although Leimkuhler's is similar. To derive the constants t and s, using data in table 1:

Let p items contribute cumulative S transactions
    q items contribute cumulative 2S transactions

From table 1:

p = 25,525        S = 105,359
q = 107,152       2S = 210,717

$$t = p^2 / (q - 2p) = 651{,}525{,}625 / (107{,}152 - 51{,}050) = 11{,}613$$
$$V = N/t = 9.227$$
$$s = p^2 / q = 651{,}525{,}625 / 107{,}152 = 6{,}080$$
$$G = N/s = 17.624$$

To obtain the actual productivity, in transactions, for any most productive fraction x of items, it is necessary to multiply 107,152, the total items population count, by the fraction, then look at table 1, interpolating as necessary, for the actual number of transactions produced by xN items, e.g.:

Let x = 0.10, then xN = 10,715. From column C of table 1 we find that 10,715 lies between 7,294 and 13,603. 7,294 items produce cumulative 44,522 transactions. We must account for the next 3,421 items (10,715 - 7,294), which will be drawn from the group used 4 times each, that is, the next 3,421 items yield 4 × 3,421 = 13,684 transactions. 44,592 + 13,684 = 58,276 transactions produced by 10,715 items. Interpolation is necessary because each rank contains many ties.

Applying each version of Bradford to the data in table 1 and interpolating as necessary to get the actual transaction counts yields the results shown in table 2.

TABLE 1
CIRCULATION DATA, 1972, HILLMAN LIBRARY, UNIVERSITY OF PITTSBURGH

A              B              C                  D                  E
No. of times   No. of items   Cumulative no. of  No. of trans-      Cumulative no.
circulated     circulated     items circulated   actions (A × B)    of transactions

15+                  7                  7              110                110
14                  12                 19              168                278
13                  27                 46              351                629
12                  37                 83              444              1,073
11                  62                145              682              1,755
10                 120                265            1,200              2,955
9                  281                546            2,529              5,484
8                  503              1,049            4,024              9,508
7                  965              2,014            6,755             16,263
6                1,859              3,873           11,154             27,417
5                3,421              7,294           17,105             44,522
4                6,309             13,603           25,236             69,758
3               11,756             25,359           35,268            105,026
2               23,898             49,257           47,796            152,822
1               57,895            107,152           57,895            210,717

TABLE 2
VALUES PREDICTED BY BRADFORD-ZIPF COMPARED WITH ACTUAL VALUES

Most productive  Items in that   Predicted transactions   Actual         Percent of error*
fraction (x)     fraction (xN)   verbal      graphical    transactions   verbal   graphical

0.01                  1,072        7,999        †              9,669      17.27       †
0.02                  2,143       15,349        †             17,037       9.91       †
0.03                  3,215       22,147        †             23,469       5.63       †
0.04                  4,286       28,471        †             29,482       3.43       †
0.05                  5,358       34,382        †             34,842       1.32       †
0.06                  6,429       39,931      4,101           40,197       0.66     89.80
0.07                  7,501       45,160     15,422           45,350       0.42     65.99
0.08                  8,572       50,103     25,229           49,634       0.94     49.17
0.09                  9,644       54,791     33,880           53,922       1.61     37.17
0.10                 10,715       59,253     41,617           58,276       1.68     28.59
0.20                 21,430       94,773     92,526           93,239       1.65      0.76
0.30                 32,146      120,228    122,301          118,600       1.37      3.12
0.40                 42,861      140,080    143,428          140,030       0.04      2.43
0.50                 53,576      156,354    159,816          157,141       0.50      1.70
0.60                 64,291      170,147    173,205          167,855       1.37      3.19
0.70                 75,006      182,115    184,527          178,571       1.98      3.34
0.80                 85,722      192,685    194,334          189,287       1.80      2.67
0.90                 96,437      202,150    202,984          200,002       1.07      1.49

Predicted values for verbal and graphical may contain 0.1% error due to rounding of logs. Values for xN and F(x) are rounded to nearest whole item or transaction.
verbal: $F(x) = \ln(Vx + 1) / \ln(V + 1)$, $V = N/t = 9.227$
graphical: $F(x) = \ln Gx / \ln G$, $G = N/s = 17.624$
* % error = |actual - predicted| / actual
† when xN < s, ln Gx < 0

CONCLUSIONS

As the data in table 2 show, the verbal formulation consistently produces a lower error percentage than the graphical and is, in fact, a good overall predictor. Book use, then, is a Bradford-Zipf phenomenon.

Figure 1 shows a graph of cumulative transactions against cumulative items on a semi-log scale for the real data shown in table 1. This is a standard Bradford-Zipf graph of the type used by Brookes and Wilkinson. It clearly shows the curvilinearity described by Brookes. If we accept Brookes' explanation (though not his formulation) of the Bradford-Zipf phenomenon, then the curving part of the line represents demand interference, i.e., items represented on this part of the line would have been used even more if they had somehow been more available. The ideal would be a straight line, meaning that every item had lived up to its use potential.

This offers support for the strategy of shortening loan periods for frequently circulated items. Other potential uses for a function that describes the distribution of book use over a collection might include core collection determination or the derivation of a marginal utility function describing the effect on total use of adding to a collection.

[Fig. 1. Bradford-Zipf Graph of Data in Columns C and E of Table 1. Vertical axis: cumulative transactions; horizontal axis: log e cumulative items.]

REFERENCES

1. S. C. Bradford, Documentation (London: Crosby Lockwood & Son, 1953), p.154.
2. Ibid., p.153.
3. B. C. Vickery, "Bradford's Law of Scattering," Journal of Documentation 4:198-203 (1948).
4. E. A. Wilkinson, "The Ambiguity of Bradford's Law," Journal of Documentation 28:122-30 (1972).
5. Vickery, "Bradford's Law of Scattering," p.202.
6. B. C. Brookes, "Bradford's Law and the Bibliography of Science," Nature 224:953-56 (1969).
7. F. F. Leimkuhler, "The Bradford Distribution," Journal of Documentation 23:197-207 (1967).
8. Brookes, "Bradford's Law and the Bibliography of Science," p.954.
9. Wilkinson, "The Ambiguity of Bradford's Law," p.124.
10. Vickery, "Bradford's Law of Scattering."
11. Leimkuhler, "The Bradford Distribution." Wilkinson shows that Leimkuhler uses the verbal formulation.
12. Brookes, "Bradford's Law and the Bibliography of Science." B. C. Brookes, "Numerical Methods of Bibliographic Analysis," Library Trends 22:18-43 (1973).
13. Wilkinson, "The Ambiguity of Bradford's Law."
14. Leimkuhler, "The Bradford Distribution."