College & Research Libraries • January 1981

GLORIA S. CLINE

Application of Bradford's Law to Citation Data

Gloria S. Cline is assistant director, University Libraries, the University of Southwestern Louisiana, Lafayette.

This study serves as a test of the two formulations of Bradford's law, verbal and graphical, using 5,628 citations to journal literature referenced in College & Research Libraries and Special Libraries, 1940 through 1974. The data are divided into seven five-year spans so that comparisons can be made between the calculated percentage errors for each journal during each five-year period. In addition, trends in citation patterns are identified. Neither the verbal nor the graphical formulation provides results that are clearly more consistent with the practical situation.

INTRODUCTION

The intellectual base of any discipline is revealed in its journal literature, which serves, among other things, as a vehicle for disseminating information, introducing innovations, and reporting the findings of research in the field. In recent years, bibliometric techniques have been used widely to identify the characteristics of the journal literature of many different subject areas, but most often researchers have concentrated the use of these techniques on the literature from various branches of science. Library periodical literature has seldom been subjected to such intense scrutiny, and when appraisals have been made, comments generally have been negative. In 1967, for example, Katz conducted a survey of the attitudes of library science faculty and students toward their professional literature and summarized their comments by saying, "Library literature is timid, rotten, unimaginative, vague, repulsive, and debased."¹ Moon described the literature as a "stream of garbage."² With these thoughts in mind, it seems obvious that librarians must look back at their literature and produce evaluative accounts of any changes that may have occurred in the principal library science journals to refute these negative statements. Bibliometric techniques present themselves as a key to objective evaluation.

Since the basic data for many of the bibliometric techniques are citations, the question immediately arises as to which of the library periodicals are suitable subjects for investigation. Until recently, the national general-interest journals such as Library Journal and American Libraries published articles having no references whatever. Others like Library Quarterly were thought to be too scholarly to have wide appeal. Appearing between the two extremes were College & Research Libraries (C&RL) and Special Libraries (SL), both of which had been published for many years, thus permitting retrospective analysis, had solid reputations, and had wide readership. Bibliometric techniques could certainly be applied to the citations listed in these journals in order to identify changes that had occurred through the years.

Although one of the leading bibliometric techniques, Bradford's law, lends itself to such a study, it will require some explication.³ First published in 1948, Bradford's law has been used to test the completeness of a bibliography, to describe the characteristics of various subject literatures,⁴ and even to describe the interaction between book users and books available for use in a library.⁵ Librarians generally relate the Bradford distribution to "zones" of productivity and appreciate the fact that the nuclear zone identifies the most productive journals in a subject area. Here in practical terms Bradford's law aids in selection and collection development. However, when one goes beyond this point to discuss the difference between the verbal and graphical formulations of Bradford's law, most librarians become confused, uncertain as to just what the difference is.
This study will attempt to explain the two formulations of Bradford's law, verbal and graphical, and then, utilizing Wilkinson's formulas, will test the two to determine which provides a better fit for citation data taken from C&RL and SL over the thirty-five-year period 1940 through 1974.⁶ By dividing the data into five-year spans, it will also be possible to identify changes in the literature published in C&RL and SL during this time.

BACKGROUND

Basically, Bradford's law states that a quantitative relation exists between journals and the papers they publish. The oft-quoted law, now recognized as the verbal formulation, represents Bradford's theory rather than his observations and reads:

  If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject, and several groups or zones containing the same number of articles as the nucleus, where the number of periodicals in the nucleus and succeeding zones will be 1 : a : a² ...⁷

In other words, only a small number of journals will be needed to supply the nucleus of papers on a given topic, assuming that the topic is a narrow scientific subject. Beyond the nucleus or first zone, however, the number of journals required to produce the same number of papers increases dramatically. For example, if two journals supply 300 articles on a topic, then four additional journals (a = 2) will be needed to supply the next 300 articles, and eight more journals the 300 articles after that.

When Bradford applied his formulation to bibliographies on lubrication and geophysics, he found that three zones of productivity resulted. Unfortunately, Bradford did not conclude his study by simply stating his law verbally, but instead went on to express it graphically using experimental data, without noticing that the graphical expression was not mathematically identical to the verbal formulation.
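The zone arithmetic of the verbal formulation can be made concrete with a short sketch. The Python below is an illustration only, under stated assumptions: the names `bradford_zones` and `cumulative_journals` are hypothetical helpers, not anything defined by Bradford or Wilkinson, and the multiplier a is taken as given rather than estimated from data.

```python
def bradford_zones(nucleus_journals, a, zones):
    """Journals needed in each successive zone under the verbal formulation.

    Hypothetical helper for illustration: zone k (k = 0 is the nucleus) is
    assumed to hold nucleus_journals * a**k journals, i.e. 1 : a : a^2 ...
    """
    return [nucleus_journals * a ** k for k in range(zones)]


def cumulative_journals(zone_sizes):
    """Running total of journals after each zone."""
    totals, running = [], 0
    for size in zone_sizes:
        running += size
        totals.append(running)
    return totals


# Nucleus of 2 journals, multiplier a = 2, three zones of equal article yield.
sizes = bradford_zones(2, 2, 3)
print(sizes)                       # [2, 4, 8]
print(cumulative_journals(sizes))  # [2, 6, 14]
```

Each zone is assumed to yield the same number of articles, so the number of journals needed per zone grows geometrically while the article count per zone stays flat.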
He plotted R(n) (cumulative total of relevant papers) against log n (natural logarithm of the total of productive journals) and found that the data revealed an elongated S-shaped curve, the general form of which is shown in figure 1. Part one of the curve, the initial concave portion, represents the higher density of the nuclear zone. Part two, the linear portion of the curve when data are plotted on a semilog scale, is equivalent to the Zipf distribution,⁸ hence the commonly used expression the Bradford-Zipf distribution. Part three, often called the Groos droop, shows the departure from linearity for higher values of n, the reason for which is not yet fully understood.⁹ Brookes thought that the droop was observed when there were omissions from the relevant literature.¹⁰ However, Praunlich and Kroll thought it was an intrinsic factor of the distribution.¹¹

In the years following the publication of Bradford's law, papers by eminent researchers such as Vickery, Brookes, and Leimkuhler contributed to an understanding of the Bradford distribution, though only a partial one, because these contributors did not interpret the law in mathematically identical terms. Vickery extended the verbal formulation to show that it applied to any number of zones of equal yield, not only to the three zones that Bradford had used for his data.¹² Later Leimkuhler expressed the verbal formulation mathematically as is shown in equation 1.¹³

\[
R(n) = j \log\left(\frac{n}{t} + 1\right) \qquad (n > n_m) \tag{1}
\]

where
R(n) = cumulative total of relevant papers found in the first n journals when all periodicals are ranked 1, 2, 3, ..., n in order of decreasing productivity;
n = cumulative number of journals producing R(n) relevant papers;
j and t = constants defined in terms of other variables; see equations (4) and (6); and
$n_m$ = the value of n beyond which the curve becomes linear.

[Fig. 1. General Form of Bradford Distribution. Vertical axis: R(n), rising to R(N); horizontal axis: n (log scale), running to N. Regions marked: nucleus (upper nonlinear region), linear region, and Groos droop.]

Still later, Brookes expressed the formula for the graphical version of Bradford's law beyond the nuclear zone and for N large as is shown in equation 2.¹⁴

\[
R(n) = N \log\left(\frac{n}{s}\right) \qquad (n > n_m) \tag{2}
\]

where
N = total number of journals estimated to contain articles relevant to the subject of the search; and
s = a constant calculated using experimental data.

For some time, it seemed that only Vickery had noted that the verbal and graphical formulations were not mathematically identical. Once the disparity between the two formulations was recognized, the question arose concerning which of the two was more practical to apply to empirical data. Wilkinson devised a comparative test between the two formulations utilizing the same bibliographic data for four different subjects (agricultural economics, muscle fiber, schistosomiasis, and mast cells).¹⁵ The test did not require calculation of the nucleus (region I in figure 1). Instead it utilized simple formulas for calculating N (the estimated total number of journals containing articles relevant to the subject of the search) and R(N) (the estimated total number of papers produced by N). Only p (number of journals) and S (the corresponding cumulative number of papers) had to be known in order to apply the formulas. Both p and S were obtained from a plot of the empirical data on semilog paper. Although the value of p could be chosen anywhere in the linear portion of the curve, the point at which the initial concave portion of the curve turned into the linear region (n = $n_m$) was arbitrarily chosen to be equal to p and was used in determining the corresponding value of S.
By identifying on the plot 2S papers, the corresponding number of journals required to supply 2S, called q, was ascertained. The values obtained for S, p, and q were then used to calculate N and R(N) for both the verbal and graphical expressions of Bradford's law (equations 3-6*). Wilkinson's test revealed that, for the data she considered, the graphical rather than the verbal formulation was more consistent with the practical situation.

Verbal Formulation

\[
N = \frac{S}{\log a}, \qquad \text{where } a = \frac{q - p}{p} \tag{3}
\]

\[
R(N) = \frac{S}{\log a} \log\left[\frac{S\,(a - 1)}{p \log a}\right] \tag{4}
\]

Graphical Formulation

\[
N = \frac{S}{\log \beta}, \qquad \text{where } \beta = \frac{q}{p} \tag{5}
\]

\[
R(N) = \frac{S}{\log \beta} \log\left[\frac{S\,\beta}{p \log \beta}\right] \tag{6}
\]

*Note that the discrepancy in equations 4 and 6, when they are compared with those given by Wilkinson, was attributed to a printer's error in the original article.

PURPOSE

The purpose of this investigation is threefold. First, the study serves as a test of the two formulations of Bradford's law, verbal and graphical, to determine which better fits citation data from two selected journals in library science. Wilkinson's formulas are used, thus permitting comparisons to be made, not only between the two library periodicals, but also with the results of her study. Two basic differences exist between the Wilkinson study and the current one. The first is that citations are used rather than a bibliography. This means that the physical significance of N as representing the exhaustive search of a complete bibliography does not apply here, since there is no restriction on the cumulative sum of citations that can be made. The second difference is that the citations were drawn from two library science periodicals, C&RL and SL, rather than from journals representing a narrow scientific subject. Thus, another purpose of this study is to test the appropriateness of applying Bradford's law to works in broad subject fields such as library science.
The final purpose of this investigation is to briefly identify any changes and trends in adherence to the Bradford distribution that may have occurred in library literature during the thirty-five-year period 1940 through 1974.

METHODS

The data base for this study consists of 5,628 citations to the journal literature referenced in the articles published in C&RL and SL from 1940 through 1974.¹⁶ The data were divided into seven five-year totals. C&RL began publication in December 1939; this single issue from 1939 is included in the study. The bibliographic information recorded for each citation included journal title and date. Journal titles were sorted in descending order of productivity.

Graphs were plotted on semilogarithmic paper, with the vertical axis representing the cumulative number of citations and the horizontal axis being the natural logarithm of the number of journals producing these citations. On the resulting graphs, the linear region was extended as a dashed line for use in calculating the deviation from the Bradford distribution. Great care was taken in ascertaining the slope of the extended portion of the linear region, because it was found that even slight inaccuracies strongly affected the percentage-error calculations.

Figures 2 and 3 represent the extreme cases encountered in this study, both of which by coincidence involved citation data from C&RL. Note that the dashed line in figure 2 deviates dramatically from the solid line, illustrating that the calculated number of papers was far greater than the observed number of papers. The percentage error here is 144.0 percent, the largest of the study. On the other hand, the dashed line in figure 3 deviates very little from the solid line and indicates only a small difference between the calculated and observed number of papers. This graph illustrates the smallest percentage error of the study, -0.7 percent. Note also in figures 2 and 3 that these curves represent Zipf distributions rather than true Bradford distributions, since there is essentially no nuclear zone present in either case. As East and Weyman pointed out, this is because citation data are used.¹⁷

[Fig. 2. Bradford Distribution of C&RL Cumulative Citation Data, 1940-44. Vertical axis: cumulative number of citations (0 to 600); horizontal axis: cumulative number of journals, n (log scale).]

[Fig. 3. Bradford Distribution of C&RL Cumulative Citation Data, 1955-59. Vertical axis: cumulative number of citations (0 to 600); horizontal axis: cumulative number of journals, n (log scale, 1 to 1000).]

RESULTS

Recall from the earlier discussion that the physical significance of N as representing the exhaustive search of a complete bibliography does not apply in the case of citations, since no restriction on the cumulative sum of citations can be made. This study, therefore, deals only with the estimated values of R(N). However, the value of N still represents a point on the extension of the linear portion of the curve and as such can be calculated using Wilkinson's formulas to provide values of R(N) for both the verbal and graphical expressions of the Bradford distribution.

Tables 1 and 2 present the data for this study for the seven five-year periods, 1940 through 1974. The data for C&RL given in table 1 can be read in this manner:

During the 1940-44 time span, two journals (p) produced a total of 86 papers (S). By plotting the data on semilog paper, it was found that 5.7 journals would be required to supply 172 papers (or 2S). For that entire five-year period, a total of 234 papers was actually cited. When the formula for the graphical expression was applied to the observed data, the estimated total of papers was found to be 391.1, a sum considerably larger than the observed total of 234.
Therefore, the percentage error was rather large, 67.1 percent. The verbal formulation provided an even larger estimated total of papers, 571, and, of course, a still greater percentage error, 144.0 percent.

In the tables note that the percentage errors for each set of citation data from C&RL and SL generally decrease in the latter years of the study, and rather dramatically for the C&RL data.

TABLE 1
COMPARISON OF RESULTS FOR TOTAL NUMBER OF PAPERS R(N) USING DATA FROM C&RL

                                      Graphical Formulation     Verbal Formulation
Time                       Observed   Estimated   Percentage    Estimated   Percentage
Period     p     q     S   Total      Total       Error         Total       Error
1940-44    2    5.7    86    234       391.1        67.1          571.0       144.0
1945-49    2    6.7   156    435       693.7        59.5          879.0       102.1
1950-54    1    3.0   105    391       540.8        38.3          760.5        94.5
1955-59    3   25.0   153    385       382.5        -0.7          390.7         1.5
1960-64    5   35.0   203    496       519.9         4.8          535.9         8.0
1965-69    3   12.3   297    993      1191.8        20.0         1368.6        37.8
1970-74    3   18.0   258    797       815.4         2.3          860.0         7.9

p = particular value of the cumulative number of journals, n, arbitrarily selected = $n_m$.
S = estimated cumulative number of papers, R(p), corresponding to n = p.
q = cumulative number of journals, n, required to supply 2S papers.
Percentage error (for both graphical and verbal) = (Estimated - Observed) / Observed × 100 (at corresponding N).

TABLE 2
COMPARISON OF RESULTS FOR TOTAL NUMBER OF PAPERS R(N) USING DATA FROM SL

                                      Graphical Formulation     Verbal Formulation
Time                       Observed   Estimated   Percentage    Estimated   Percentage
Period     p     q     S   Total      Total       Error         Total       Error
1940-44    1    2.9    40    175       176.2         0.7          251.0        43.4
1945-49    4   54.0    80    159       142.7       -10.3          142.9       -10.1
1950-54    4   29.0    50    119        96.5       -18.9           97.6       -18.0
1955-59    6   46.0    97    215       195.7        -9.0          198.2        -7.8
1960-64    3   16.4    82    251       216.1       -13.9          227.3        -9.5
1965-69    4   30.3   158    369       389.8         5.6          399.5         8.3
1970-74    3   14.0   170    609       567.9        -6.8          622.3         2.2

p = particular value of the cumulative number of journals, n, arbitrarily selected = $n_m$.
S = estimated cumulative number of papers, R(p), corresponding to n = p.
q = cumulative number of journals, n, required to supply 2S papers.
Percentage error (for both graphical and verbal) = (Estimated - Observed) / Observed × 100 (at corresponding N).

Whereas Wilkinson's results plainly indicated that, for her data, the graphical formulation more closely adhered to the practical situation, the present investigation reveals no such clear-cut picture. In all seven cases for the C&RL data, the graphical formulation does provide smaller percentage errors for estimated values of R(N). On the other hand, the verbal formulation provides smaller errors in five of seven cases for the SL data. For these sets of citation data, then, neither formulation appears to be distinctly superior to the other.

Wilkinson's study utilized complete bibliographies for four narrow scientific fields, while the current investigation uses citation data from two widely circulating library science periodicals. The question thus arises concerning which group of data more closely adheres to the Bradford distribution. Since, at the present time, there is no agreed-upon percentage of error that determines adherence to Bradford's law, one can state only that the smaller percentage errors indicate closer adherence. For Wilkinson's data, the errors ranged from 1 percent to slightly more than 5 percent in six of eight cases for both graphical and verbal formulations. The errors were much higher for the C&RL and SL data, with fewer than 50 percent of the cases calculated as about 10 percent or less. Only five of twenty-eight cases had errors of less than 5 percent.
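The estimated totals in tables 1 and 2 can be checked against equations 3 through 6. The following Python is a sketch rather than the procedure actually used in the study: the function names are hypothetical, and natural logarithms are assumed, since the article does not state which base was used for these estimates (natural logarithms do appear to reproduce the tabulated values).

```python
import math


def verbal_estimate(p, q, S):
    """Equations 3 and 4: N and R(N) under the verbal formulation."""
    a = (q - p) / p                       # Bradford multiplier
    N = S / math.log(a)                   # equation 3 (natural log assumed)
    R = N * math.log(N * (a - 1) / p)     # equation 4
    return N, R


def graphical_estimate(p, q, S):
    """Equations 5 and 6: N and R(N) under the graphical formulation."""
    beta = q / p
    N = S / math.log(beta)                # equation 5 (natural log assumed)
    R = N * math.log(N * beta / p)        # equation 6
    return N, R


def percentage_error(estimated, observed):
    """(Estimated - Observed) / Observed x 100, as in the table notes."""
    return (estimated - observed) / observed * 100


# C&RL, 1940-44 (table 1): p = 2, q = 5.7, S = 86, observed total = 234.
_, r_graphical = graphical_estimate(2, 5.7, 86)
_, r_verbal = verbal_estimate(2, 5.7, 86)
print(round(r_graphical, 1), round(percentage_error(r_graphical, 234), 1))
print(round(r_verbal, 1), round(percentage_error(r_verbal, 234), 1))
```

For this row the two estimates should come out close to the 391.1 and 571.0 reported in table 1, with percentage errors near 67 and 144. Because p, S, and q are read off a hand-drawn semilog plot, small differences in the last digit are to be expected.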
This gloomy picture is lightened, however, by a closer inspection of the percentage errors given in tables 1 and 2, which reveals an important fact: the higher errors generally occurred in the early years of the study, 1940 through 1954. This is particularly evident in the C&RL data. From 1955 on, in six of eight cases, the C&RL data had errors of less than 10 percent. This obvious trend toward smaller errors in the latter years of the study is not present in the SL data. Instead, smaller percentage errors exist throughout for both graphical and verbal formulations of SL data, with ten of fourteen cases having errors of about 10 percent or less. Thus, while the percentage errors for both formulations of Bradford's law and for both library journals overall were not as small as Wilkinson's, the majority are in the range of acceptability for the latter years of the study.

CONCLUSIONS

The purpose of this study was threefold: (1) to ascertain whether the graphical or verbal formulation of Bradford's law more closely adhered to the practical situation when applied to citation data; (2) to test the appropriateness of applying Bradford's law to works in broad subject fields such as library science rather than to data from narrow scientific fields; and (3) to identify any trends in adherence to the Bradford distribution that occurred in popular library literature during the thirty-five-year span, 1940 through 1974.

The results were mixed to such an extent that few firm conclusions can be reached. For the data analyzed, neither the verbal nor the graphical formulation of Bradford's law provided strong enough evidence to indicate its superiority for use with citations. While the graphical formulation provided smaller percentage errors for the citations from C&RL in all seven cases, the verbal formulation did so for the SL data in five of seven cases.
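The case counts quoted here can be tallied directly from the percentage errors in tables 1 and 2. The sketch below is illustrative only: the variable and function names are hypothetical, and "smaller error" is assumed to mean smaller absolute value.

```python
# Percentage errors transcribed from tables 1 and 2, in period order
# (1940-44 through 1970-74), as (graphical, verbal) pairs.
CRL_ERRORS = [(67.1, 144.0), (59.5, 102.1), (38.3, 94.5), (-0.7, 1.5),
              (4.8, 8.0), (20.0, 37.8), (2.3, 7.9)]
SL_ERRORS = [(0.7, 43.4), (-10.3, -10.1), (-18.9, -18.0), (-9.0, -7.8),
             (-13.9, -9.5), (5.6, 8.3), (-6.8, 2.2)]


def tally_smaller_error(pairs):
    """Count periods in which each formulation has the smaller absolute error."""
    graphical_wins = sum(1 for g, v in pairs if abs(g) < abs(v))
    verbal_wins = sum(1 for g, v in pairs if abs(v) < abs(g))
    return graphical_wins, verbal_wins


def count_below(pairs, threshold):
    """Count individual errors (both formulations) below a threshold."""
    return sum(1 for pair in pairs for e in pair if abs(e) < threshold)


print(tally_smaller_error(CRL_ERRORS))  # (7, 0)
print(tally_smaller_error(SL_ERRORS))   # (2, 5)
print(count_below(CRL_ERRORS[3:], 10))  # 6 of the 8 cases from 1955 on
```

Counts that depend on a loose cutoff such as "about 10 percent" are sensitive to where the line is drawn, so the threshold is left as a parameter.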
The obvious conclusion, then, is that further testing of citation data is needed, with the stipulation that the same methodology and formulas be used.

The appropriateness of Bradford's law as a test of data from a broad subject field like library science is another question addressed in this investigation. The evidence supports a positive response. In almost all cases for the SL data, the percentage errors were not so high that either the data or the test could be called invalid. The reasons for the consistently closer adherence of SL citations to Bradford's law are not known. It may have been that SL had a narrower subject coverage than did C&RL. On the other hand, the very large errors observed in the C&RL data for the first three time spans, 1940 through 1954, indicate that either the test is inappropriate or the data are invalid. The latter reason seems to be correct, since much smaller and more acceptable errors are observed after 1954. This fact may indicate that library literature as reflected in C&RL and SL changed or, at the very least, began to change into something more substantive and more worthy than "garbage."

Regardless of the reasons for the behavior of the citation data from C&RL and SL, however, the overall evidence indicates that Bradford's law can be applied with a relatively high degree of confidence to data from sources not generally considered to be "narrow" or "scientific," that is, sources such as library science periodicals.

REFERENCES

1. William Katz, "Publications," Drexel Library Quarterly 3:176 (1967).
2. Eric Moon, "The Library Press," Library Journal 94:4104-9 (1969).
3. S. C. Bradford, Documentation (Washington, D.C.: Public Affairs Press, 1950).
4. See, for example, W. Brace, "A Citation Analysis of Doctoral Dissertations in Library and Information Science, 1961-1970" (Ph.D. diss., Case Western Reserve Univ., 1975).
5. S. Bulick, "Book Use as a Bradford-Zipf Phenomenon," College & Research Libraries 39:215-19 (1978).
6. E. A. Wilkinson, "The Ambiguity of Bradford's Law," Journal of Documentation 28:122-30 (1972).
7. Bradford, Documentation, p.116.
8. G. K. Zipf, Human Behavior and the Principle of Least Effort (Reading, Mass.: Addison-Wesley, 1949).
9. O. V. Groos, "Bradford's Law and the Keenan-Atherton Data," American Documentation 18:46 (1967).
10. B. C. Brookes, "The Derivation and Application of the Bradford-Zipf Distribution," Journal of Documentation 24:247-65 (1968).
11. P. Praunlich and M. Kroll, "Bradford's Distribution: A New Formulation," Journal of the American Society for Information Science 29:51-55 (1978).
12. B. C. Vickery, "Bradford's Law of Scattering," Journal of Documentation 4:198-203 (1948).
13. F. F. Leimkuhler, "The Bradford Distribution," Journal of Documentation 23:197-207 (1967).
14. Brookes, "Derivation and Application."
15. Wilkinson, "The Ambiguity of Bradford's Law," p.125-26.
16. G. S. Cline, "A Bibliometric Study of Two Selected Journals in Library Science, 1940-1974" (D.L.S. diss., Univ. of Southern California, 1978).
17. H. East and A. Weyman, "A Study in the Source Literature of Plasma Physics," Aslib Proceedings 21:160-71 (1969).