Issues in Science and Technology Librarianship
Fall 2012
DOI:10.5062/F4ZS2TF7

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

[Refereed]

How Much Is Enough? Examining Computer Science and Civil Engineering Citation Data to Inform Collection Development and Retention Decisions in Three Large Canadian University Libraries

Michelle Spence
Reference & Instruction Librarian
University of Toronto
Toronto, Ontario, Canada
michelle.spence@utoronto.ca

Tara Mawhinney
Liaison Librarian
McGill University
Montreal, Quebec, Canada
tara.mawhinney@mcgill.ca

Eugene Barsky
Science and Engineering Librarian
University of British Columbia
Vancouver, British Columbia, Canada
eugene.barsky@ubc.ca

Copyright 2012, Michelle Spence, Tara Mawhinney, and Eugene Barsky. Used with permission.

Abstract

Science and engineering libraries have an important role to play in preserving the intellectual content in research areas of the departments they serve. This study employs bibliographic data from the Web of Science database to examine how much research material is required to cover 90% of faculty citations in civil engineering and computer science. Bearing in mind the importance of access to both current and past research, along with the issue of space in libraries, the study evaluates citations from one year's worth of research output from faculty in three prominent Canadian universities with departments in civil engineering and computer science (University of Toronto, University of British Columbia and McGill University) in order to best align collection development activities with science and engineering research needs. The findings for all three institutions combined show that 25 years of computer science literature are needed to cover 90% of researchers' citations, whereas 30 years of materials are needed for civil engineering. We also found that the citation data is not only discipline specific, but also location specific, and a one-size-fits-all approach is not appropriate when making collections and retention decisions.

Introduction

Science and engineering libraries across North America are reaching near full capacity with regard to space for print collections. As more and more collections move to electronic formats, there is also increased pressure to use library space for study and other purposes. Within this context, we wanted to have collection development and retention plans in place that were based on data for our individual institutions. In light of limited space for collections, it is important to know what differences exist between sub-disciplines in science and engineering, and to keep them in mind when allocating present and future space for print collections. This comparative study examines materials cited by faculty from three prominent Canadian universities across the country, University of Toronto (U of T), University of British Columbia (UBC) and McGill University (McGill), in order to best align collection development and retention plans with research needs. Each of these institutions has a library that specifically supports the needs of science and engineering researchers: U of T's Engineering & Computer Science Library; the Science and Engineering Branch within the Irving K. Barber Learning Centre at UBC; and Schulich Library of Science and Engineering at McGill. We examined one science subject (computer science) and one engineering subject (civil engineering) with the goal of comparing differences between fields and between institutions.

Our study uses methods previously employed by Barsky at UBC to examine how far back libraries need to retain materials in order to have 90% of the citations from research papers produced by their faculty members (Barsky 2012). We hypothesized that disciplines in science and engineering would vary in the length of time research material would be required. Barsky and Butkovich's earlier studies conclude that mathematics is a field where research materials need to be kept for at least 40 years in order to have 90% of what faculty are citing (Barsky 2012; Butkovich 2010). We suspected that faculty in civil engineering and computer science would be citing more recent materials than mathematics researchers. We were interested in comparing data between three different institutions to see if there were similarities and differences in the citation patterns of our respective faculty. From the Web of Science (WoS) data gathered for 2011, we were also able to calculate the average number of citations per paper in order to examine differences in citation patterns between these two disciplines and between our respective institutions. We chose WoS because it offers multidisciplinary coverage, indexes several format types such as journal articles, conference proceedings and book chapters, and is available at all our institutions.

Literature Review

The current study makes use of methods similar to those employed by Barsky (2012) and Butkovich (2010) with the advantage that we examine data from more than one institution and consider two subject areas that are not addressed in these two previous studies. The previous literature suggests that citation patterns differ significantly for various disciplines in science and engineering. Butkovich (2010) studied the fields of astronomy, chemistry, mathematics, physics, and statistics, and examined the three most recent publications from each of the faculty members at Pennsylvania State University. The fields Butkovich examined varied from 90% of citations being 20 years old or less in astronomy, to 45 years old or less in mathematics (Butkovich 2010). Barsky (2012) used a more systematic method in that rather than considering the three most recent publications from faculty members as found in WoS and on their personal web sites, he examined citations from within the same time frame (the most recent year's worth of data) from a single source (WoS) to analyze all publications from faculty members in the department of Mathematics at UBC. He found that 40 years of research materials are needed to have 90% of what faculty members are citing. The current study employs Barsky's (2012) method to examine two different fields -- civil engineering and computer science -- to determine how they compare to research in other areas.

Relating to engineering and specifically civil engineering, Curtis (2011) studied references found in a sample of journal articles from six of the top civil engineering journals, as determined in the Journal Citation Reports, and found that 90% of citations were 25 years old or less. Curtis' study gives citation patterns for civil engineering as a whole, whereas our study gives more precise data for our specific institutions, rendering it more useful for our collection development purposes than the Curtis study. There are advantages and disadvantages to each method of conducting citation analysis, whether general for a subject area or specific to a certain institution. As Curtis notes when discussing findings from previous research on civil engineering theses and dissertations, the authors' use of data gathered at their individual institutions renders "their studies valuable for their particular institutions, but raises questions as to whether results found there are applicable to academic literature on the discipline more generally" (Curtis 2011). A similar study to Curtis (2011) was conducted by Musser and Conkling (1996) who did not limit their study to civil engineering but instead took a general sample of engineering journals, one from each of 16 sub-disciplines in engineering. They studied 4,780 references from 212 articles and noted that "ninety percent of the references were less than twenty-six years old" (Musser & Conkling 1996) with large variations depending on the format type of the specific reference cited. Like Curtis (2011), the advantage of the Musser and Conkling study is being able to develop general citation patterns for engineering as a whole. However, it does not provide specific details about sub-disciplines of engineering, such as civil engineering or mechanical engineering, that may have unique citation patterns. 

The current study's approach has the advantages of gathering local-level data which is useful for making individual collection development decisions, analyzing two specific disciplines within science and engineering for differences in citation patterns, and comparing data across three institutions to determine if broader trends in citation patterns emerge.

Another way the current study contributes to the field is by focusing on faculty publications in civil engineering and computer science. Many past studies conduct citation analysis of student theses and dissertations in engineering, specifically civil engineering (Gadd et al. 2010; Kirkwood 2009; Williams & Fletcher 2006; Fuchs et al. 2006), and one study compares theses and dissertations from several branches of engineering as well as computer science (Eckel 2009). The findings from civil engineering journals differ somewhat from those of researchers studying civil engineering theses and dissertations, such as Gadd, Kirkwood, Williams and Fletcher, Fuchs and Eckel. Williams and Fletcher's (2006) study of 250 master's theses in various fields of engineering submitted from 2000 to 2004 at Mississippi State University determined that 80% of references were 15 years old or younger for all fields of engineering and 14 years old or younger for civil engineering. Eckel's (2009) study of 120 computer science and engineering theses and dissertations written from 2002 to 2006 at Western Michigan University's College of Engineering and Applied Sciences found that doctoral students make use of older sources than do master's students. He hypothesized that expectations of completing a comprehensive literature review and demonstrating a thorough knowledge of the research literature in their field would be higher for doctoral students than for master's students, which would lead to doctoral students using older sources (Eckel 2009). One might presume that faculty would possess an even greater familiarity with the research in their respective fields and, as a result, make use of even older sources than their students do, and previous research suggests that this appears to be the case for faculty in civil engineering (Curtis 2011).

In conducting citation analysis of faculty publications in computer science for collection development and retention purposes, the current study focuses on an area where there is little previous research. In his study of 382 computing journals from the Journal Citation Reports, Sjøberg (2010) found a citing half-life of 7.5 years for computer science, or in other words that "half of the citations are more than seven years old" (Sjøberg 2010). When compared to other fields, he found that computer science journals have a slightly shorter citing half-life than engineering journals (Sjøberg 2010), and he concludes that, despite misconceptions, "research in computing does not become obsolete more quickly than research in other disciplines" (Sjøberg 2010). The current study builds on previous work by gathering specific data on how many years of computer science literature are needed in order to retain 90% of what faculty are citing, and comparing it to citation patterns of faculty in civil engineering.

Methods

A number of studies have posed the research question: what does the library need to satisfy 90% of users' needs (Basile & Smith 1970; Butkovich 2010)? While it is quite understandable that the median age for physical science and, especially, engineering references would be relatively recent, a much larger date range is required to fill 90% of cited references (Butkovich 2010).

For relative ease of quantitative analysis, we decided to work with the most comprehensive sets of one year's worth of research output from the relevant departments (Civil Engineering and Computer Science) at McGill, UBC and U of T. We retrieved all papers published by these departments in 2011 and indexed in the WoS database (see an example for the Civil Engineering search for the University of British Columbia in Figure 1). The following collections are included in our WoS subscription: Science Citation Index Expanded (SCI-EXPANDED), 1899-present; Social Sciences Citation Index (SSCI), 1956-present; Arts & Humanities Citation Index (A&HCI), 1975-present; Conference Proceedings Citation Index-Science (CPCI-S), 1990-present; and Conference Proceedings Citation Index-Social Science & Humanities (CPCI-SSH), 1990-present. All searches were performed during February 2012.


Figure 1: An example of the Civil Engineering search for the University of British Columbia publications in 2011

We exported the data from the WoS database into text files for each of the two disciplines at each institution. We then imported the text files into Microsoft Excel and used the CR fields (the references cited by each article) for our analysis. Using Microsoft Word, we delimited the fields of each reference so that they would display in separate columns in Excel. We then used Excel functions, e.g., =FLOOR(C2,5)/5, to sort the citations into 5-year intervals.
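The binning step can also be sketched outside Excel. The following Python fragment is a minimal illustration, not the workflow we used: it assumes each cited reference is a comma-separated string with the year as its second field, and the sample references and the helper names (cited_year, five_year_interval, interval_label) are hypothetical. The interval index matches what =FLOOR(age,5)/5 returns for a non-negative age.

```python
import re

PUBLICATION_YEAR = 2011  # the citing papers in the study were published in 2011


def cited_year(reference):
    """Return the first four-digit field between commas, or None if undated."""
    match = re.search(r",\s*(\d{4})\s*,", reference)
    return int(match.group(1)) if match else None


def five_year_interval(age):
    """Same result as Excel's =FLOOR(age,5)/5: 0 for 0-4 years, 1 for 5-9, ..."""
    return age // 5


def interval_label(age):
    """Human-readable label for the 5-year interval containing this age."""
    start = 5 * five_year_interval(age)
    return f"{start}-{start + 4} years"


# Hypothetical references in an assumed "Author, Year, Source, ..." layout.
sample_references = [
    "Smith J, 2009, J STRUCT ENG, V135, P100",
    "Lee K, 1984, CAN J CIVIL ENG, V11, P50",
    "Anonymous, REPORT ON BRIDGES",
]

for reference in sample_references:
    year = cited_year(reference)
    if year is None:
        print(reference, "-> No date")
    else:
        age = PUBLICATION_YEAR - year
        print(reference, "->", age, "years old,", interval_label(age))
```

References without a recognizable year fall into a "No date" group, which corresponds to the "No date" rows in Tables 1 and 2.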

Results and Discussion

We found a median age of eight years and an average age of 12 years for the combined civil engineering citations from all three universities. The cited resources ranged in age from 0 to 124 years, the oldest being Voigt's Theoretische Studien über die Elasticitätsverhältnisse der Krystalle. Table 1 shows the combined citation data distributions broken into five-year intervals.

Table 1: Civil engineering age distributions, percentages and cumulative percentages for UBC, McGill and U of T combined

5-year distributions | Number of citations | Percent | Cumulative percent
0-4 years | 1,364 | 27% | 27%
5-9 years | 1,451 | 29% | 56%
10-14 years | 845 | 17% | 73%
15-19 years | 448 | 9% | 82%
20-24 years | 265 | 5% | 87%
25-29 years | 172 | 3% | 90%
30-34 years | 148 | 3% | 93%
35-39 years | 96 | 2% | 95%
40-44 years | 53 | 1% | 96%
45-49 years | 52 | 1% | 97%
50 years or more | 87 | 2% | 99%
No date | 26 | 1% | 100%
Total | 5,007 | 100% | 100%

We can see from Table 1 that although it would only take 10 years to cover 56% of the cited literature, it would take approximately 30 years to cover 90%.

The computer science data tells a slightly different story. Despite the larger range of cited works compared to civil engineering (0-306 years of age, the oldest resource cited being Leibniz's New Essays on Human Understanding), the median age was only seven years, and the average age 10.6 years for the combined data from the three universities. Table 2 shows the combined computer science citation data distributions broken into 5-year intervals.

Table 2: Computer science age distributions, percentages and cumulative percentages for UBC, McGill and U of T combined

5-year distributions | Number of citations | Percent | Cumulative percent
0-4 years | 2,618 | 32% | 32%
5-9 years | 2,581 | 32% | 64%
10-14 years | 1,244 | 15% | 79%
15-19 years | 560 | 7% | 86%
20-24 years | 341 | 4% | 90%
25-29 years | 198 | 2% | 92%
30-34 years | 171 | 2% | 94%
35-39 years | 112 | 1% | 95%
40-44 years | 69 | 1% | 96%
45-49 years | 62 | 1% | 97%
50 years or more | 134 | 2% | 99%
No date | 80 | 1% | 100%
Total | 8,170 | 100% | 100%

We can see from Table 2 that the length of time needed to cover 90% of the cited literature in computer science also differs from that for civil engineering. At 10 years, about 64% of the citations have been covered, and 90% of the citations are covered at 25 years.
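The coverage figures can be recomputed from the citation counts in Tables 1 and 2. The sketch below is illustrative only; the function names and the whole_percent switch are our own assumptions. It accumulates the interval counts, divides by the table total (which includes the 50-years-or-more and undated citations), and reports the first interval whose cumulative share reaches 90%. Cumulative values computed this way can differ by a point from the tables, which appear to add up rounded row percentages.

```python
# Counts for the intervals 0-4, 5-9, ..., 45-49 years, copied from Tables 1 and 2.
CIVIL = [1364, 1451, 845, 448, 265, 172, 148, 96, 53, 52]
COMPUTER = [2618, 2581, 1244, 560, 341, 198, 171, 112, 69, 62]
CIVIL_TOTAL = 5007      # includes 87 citations aged 50+ and 26 undated
COMPUTER_TOTAL = 8170   # includes 134 citations aged 50+ and 80 undated


def cumulative_shares(counts, total):
    """Return (years of literature, cumulative share) for each 5-year interval."""
    running = 0
    shares = []
    for index, count in enumerate(counts):
        running += count
        shares.append((5 * (index + 1), running / total))
    return shares


def years_to_cover(counts, total, target_percent=90, whole_percent=True):
    """Years of literature needed before the cumulative share meets the target."""
    for years, share in cumulative_shares(counts, total):
        percent = 100 * share
        if whole_percent:
            percent = round(percent)  # compare at the precision shown in the tables
        if percent >= target_percent:
            return years
    return None  # target only reached within the 50+ or undated groups


print("Civil engineering:", years_to_cover(CIVIL, CIVIL_TOTAL), "years")
print("Computer science:", years_to_cover(COMPUTER, COMPUTER_TOTAL), "years")
```

At whole-percent precision this reproduces the 30-year and 25-year figures. For computer science the cumulative share at 25 years is about 89.9% before rounding, so the 25-year figure depends on rounding to whole percentages or on leaving the 80 undated references out of the denominator.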

These results also differ from those found by Barsky (2012) who reported on the field of mathematics, and Butkovich (2010) and Williams and Fletcher (2006) who reported on a variety of different disciplines. This demonstrates that the citation data is discipline specific, and a one-size-fits-all approach with regard to different disciplines is not appropriate when making collections and retention decisions.

The data can be further broken out by university. Table 3 shows the median and average age, as well as the number of years it takes to cover 90% of the citations, for civil engineering, while Table 4 shows the same measures for computer science.

Table 3: Median age, average age, and number of years to cover 90% of citations, in years, by institution for civil engineering citations

Institution | Median | Average | Years to cover 90% of citations
UBC | 10 | 13.9 | 32
McGill | 8 | 12.5 | 29
U of T | 7 | 10.5 | 23

Table 4: Median age, average age, and number of years to cover 90% of citations, in years, by institution for computer science citations

Institution | Median | Average | Years to cover 90% of citations
UBC | 7 | 9.5 | 20
McGill | 8 | 12.3 | 30
U of T | 7 | 10 | 22

The differences between our three institutions in both the civil engineering and computer science citations demonstrate that a one-size-fits-all approach should not apply to collections and retention decisions for different institutions as there are not only differences between disciplines, but also differences across institutions when considering the same discipline. A more inclusive approach to collection development and retention is necessary, since generalizations cannot be made based on the current data.
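For readers who wish to derive the same three measures from their own list of citation ages, the following Python sketch shows one way to do so. It is an illustration under stated assumptions rather than our exact procedure: the sample ages are hypothetical, and "years to cover 90%" is taken to mean the smallest age A such that at least 90% of the citations are A years old or younger.

```python
import math
import statistics


def years_to_cover(ages, target_percent=90):
    """Smallest age A with at least target_percent of citations aged A or less."""
    ordered = sorted(ages)
    # Number of citations that must be covered to reach the target share.
    needed = math.ceil(target_percent * len(ordered) / 100)
    return ordered[needed - 1]


def summarize(ages):
    """Median age, average age (one decimal) and years to cover 90% of citations."""
    return {
        "median": statistics.median(ages),
        "average": round(statistics.mean(ages), 1),
        "years_to_cover_90": years_to_cover(ages),
    }


# Hypothetical ages (in years) of ten cited references; not data from this study.
sample_ages = [1, 2, 3, 4, 6, 7, 9, 12, 20, 41]
print(summarize(sample_ages))
```

In the sample, a single 41-year-old reference raises the average well above the median without changing the 90% figure, which is the same pattern of median below average seen in Tables 3 and 4.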

We also analyzed the number of citations per article for each university, and overall for both civil engineering and computer science. The results are displayed in Tables 5 and 6.

Table 5: Number of civil engineering citations in each article, by institution and total

Institution | Articles | Citations | Citations/article
UBC | 47 | 1,512 | 32.2
McGill | 38 | 1,269 | 33.4
U of T | 71 | 2,227 | 31.4
Total | 156 | 5,008 | 32.1

Table 6: Number of computer science citations in each article, by institution and total

Institution | Articles | Citations | Citations/article
UBC | 57 | 1,872 | 32.8
McGill | 75 | 2,452 | 32.7
U of T | 102 | 3,857 | 37.8
Total | 234 | 8,181 | 35.0

The largest outlier is the number of citations per article in computer science at U of T, which was about 15% higher than the figures for UBC and McGill. Overall, our results for the number of citations per article for both civil engineering and computer science are higher than previously reported numbers. For example, Curtis (2011) reported 27.81 citations per article for civil engineering articles taken in May 2008 from journals ranked highly in Journal Citation Reports. Musser and Conkling (1996) reported 22.55 citations per article for all engineering disciplines in scholarly journal articles. More research into the number of citations per article is needed to determine if the difference is based on location, institution, changes over time (i.e., are faculty members citing more articles in their research than they were previously, as Curtis (2011) indicates?) or other factors.
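The citations-per-article figures in Tables 5 and 6, and the size of the U of T difference, follow from simple division. The short Python sketch below reproduces the arithmetic from the article and citation counts in the two tables; the function and variable names are illustrative only.

```python
# (articles, citations) per institution, copied from Tables 5 and 6.
ARTICLES_AND_CITATIONS = {
    "civil engineering": {"UBC": (47, 1512), "McGill": (38, 1269), "U of T": (71, 2227)},
    "computer science": {"UBC": (57, 1872), "McGill": (75, 2452), "U of T": (102, 3857)},
}


def citations_per_article(articles, citations):
    """Average number of cited references per article, to one decimal place."""
    return round(citations / articles, 1)


def discipline_summary(by_institution):
    """Citations per article for each institution plus the pooled total."""
    rates = {
        name: citations_per_article(articles, citations)
        for name, (articles, citations) in by_institution.items()
    }
    total_articles = sum(articles for articles, _ in by_institution.values())
    total_citations = sum(citations for _, citations in by_institution.values())
    rates["Total"] = citations_per_article(total_articles, total_citations)
    return rates


def percent_higher(value, baseline):
    """How much larger value is than baseline, as a percentage."""
    return round(100 * (value / baseline - 1), 1)


for discipline, data in ARTICLES_AND_CITATIONS.items():
    print(discipline, discipline_summary(data))

cs = discipline_summary(ARTICLES_AND_CITATIONS["computer science"])
print("U of T vs UBC:", percent_higher(cs["U of T"], cs["UBC"]), "% higher")
print("U of T vs McGill:", percent_higher(cs["U of T"], cs["McGill"]), "% higher")
```

The pooled total is computed from summed articles and citations rather than by averaging the three institutional rates, so larger departments carry more weight in the combined figure.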

Conclusions

In this study, we wanted to determine the age of the citations that researchers at three prominent Canadian academic institutions use in their papers. Within this context, we wanted to consider what implications the age of citations would have for collection development and retention plans within our individual institutions. For instance, the data gathered for UBC will be used to determine which materials from civil engineering and computer science will be kept in the library and which will be moved to off-site storage. We were also interested in what differences exist between sub-disciplines in science and applied sciences, so that we can keep them in mind when allocating present and future space for print collections.

We hypothesized that disciplines in science and engineering would vary in the length of time research material would be required. Barsky and Butkovich's earlier studies conclude that mathematics is a field where research materials need to be kept for at least 40 years in order to have 90% of what faculty are citing (Barsky 2012; Butkovich 2010). We suspected that faculty in civil engineering and computer science would be citing more recent materials than mathematics researchers. We were interested in comparing data between three prominent Canadian institutions to see if there were similarities and differences in the citation patterns of our respective faculty. Using WoS data gathered for 2011, we were also able to calculate the average number of citations per paper in order to examine differences in citation patterns between these two disciplines and between our respective institutions.

We could see that civil engineering and computer science researchers in our three institutions use significantly newer research than was reported in the literature for mathematics (Barsky 2012). Our findings for civil engineering are quite similar to previous findings in engineering, although the studies from Curtis and from Musser and Conkling show that 90% of citations are approximately 25 years old or less, whereas the current study shows that 90% of citations are 30 years old or less. As for computer science, we saw that overall we need 25 years of literature to cover 90% of the materials that our researchers are citing. These findings are good news in that they provide evidence that we may be able to move collections older than 30 years to off-site and/or storage locations and thus allocate space for other purposes in our three institutions, while continuing to meet the research needs of faculty. We could also see that there are differences in citation patterns between our institutions, which reflect the unique characteristics and foci of the programs in each university. When comparing the number of citations per article from the three institutions, we see differences in the data gathered in this study between our institutions, and also differences from data gathered in other studies. This demonstrates that the citation data is not only discipline specific, but also location specific, and a one-size-fits-all approach is not appropriate when making collections and retention decisions.

One limitation of the current study is that we do not differentiate between the format type of citations (e.g., journal articles, books, book chapters, conference proceedings, etc.), although the previous literature suggests that there are significant differences in the length of time different format types are cited by authors. For instance, Eckel (2009) found that the average age of conference papers cited in theses is much shorter than the average age of monographs. This is one area to consider for future research.

We are happy to report that the method we employed is easy to implement at any institution and is not very time-consuming. In future studies, we hope to supplement the current data with additional sources of data such as detailed analysis of theses and dissertations, a detailed analysis of where our own faculty members are publishing, OCLC collection analysis tools to compare collections, and electronic journal usage statistics in order to further examine our diverse users' needs.

References

Barsky, E. 2012. Four decades of materials are used by researchers in mathematics: evaluating citations' age and publication types in mathematical research. Science & Technology Libraries 31(3):315-319. Available from: http://www.tandfonline.com/doi/abs/10.1080/0194262X.2012.705139

Basile, V.A. & Smith, R.W. 1970. Evolving the 90 percent pharmaceutical library. Special Libraries 61(2):81-86.

Butkovich, N.J. 2010. How much space does a library need?: justifying collections space in an electronic age. Issues in Science and Technology Librarianship (62). [Internet]. [Cited June 13, 2012]. Available from: http://www.istl.org/10-summer/refereed1.html

Curtis, S.A. 2011. Informing collection development through citation examination of the civil engineering research literature. In: ASEE Annual Conference and Exposition, Conference Proceedings. [Internet]. [Cited September 12, 2012]. Available from: http://www.asee.org/public/conferences/1/papers/48/view

Eckel, E.J. 2009. The emerging engineering scholar: a citation analysis of theses and dissertations at Western Michigan University. Issues in Science and Technology Librarianship (56). [Internet]. [Cited June 13, 2012]. Available from: http://www.istl.org/09-winter/refereed.html

Fuchs, B.E., Thomsen, C.M., Bias, R.G. & Davis Jr, D.G. 2006. Behavioral citation analysis: toward collection enhancement for users. College and Research Libraries 67(4):304-324.

Gadd, E., Baldwin, A. & Norris, M. 2010. The citation behaviour of civil engineering students. Journal of Information Literacy 4(2):37-49.

Kirkwood, P. 2009. Using engineering theses and dissertations to inform collection development decisions especially in civil engineering. In: ASEE Annual Conference and Exposition, Conference Proceedings. [Internet]. [Cited September 12, 2012]. Available from: {https://peer.asee.org/4570}

Musser, L.R. & Conkling, T.W. 1996. Characteristics of engineering citations. Science and Technology Libraries 15(4):41-49.

Sjøberg, D.I.K. 2010. Confronting the myth of rapid obsolescence in computing research. Communications of the ACM 53(9):62-67.

Williams, V.K. & Fletcher, C.L. 2006. Materials used by master's students in engineering and implications for collection development: a citation analysis. Issues in Science and Technology Librarianship (45). [Internet]. [Cited June 13, 2012]. Available from: http://www.istl.org/06-winter/refereed1.html
