Using Data Visualization to Examine an Academic Library Collection



Jannette L. Finch and Angela R. Flenner

Jannette L. Finch is Librarian and Angela R. Flenner is Systems Librarian at College of Charleston; e-mail: 
finchj@cofc.edu, flennera@cofc.edu. Jannette L. Finch is interested in information design and the effect of 
technology on student learning, online learning and teaching, effective teaching through experiential learning 
activities, visualizing data, and assessment and planning. Angela R. Flenner is interested in interoperability 
of data among proprietary and open-source systems and using metadata to improve access and preservation 
of library resources. The authors wish to thank Katina Strauch, MLS, Assistant Dean for Technical Services 
and Collection Development, Addlestone Library, College of Charleston, and Caroline Hunt, PhD, Professor 
Emerita, English Department, College of Charleston, for their valuable suggestions. © 2016 Jannette L. Finch and 
Angela R. Flenner, Attribution-NonCommercial (http://creativecommons.org/licenses/by-nc/3.0/) CC BY-NC.

The authors generated data visualizations to compare sections of the 
library book collection, expenditures in those areas, student enrollment in 
majors and minors, and number of courses. The visualizations resulting 
from the entered data provide an excellent starting point for conversa-
tions about possible imbalances in the collection and point to areas that 
are either more developed or less developed than is needed to support 
the major and minor areas of study at the university. The methodology 
used should offer a template to follow for others wishing to examine their 
collection and may prove valuable for adjusting expenditures, suggesting 
service opportunities or for marketing pieces of the collection that had 
been hidden before graphical analysis.

Gathering and displaying data in visual representations helps inform the 
brain faster and more effectively than reading textual lines of information. 
“One picture is worth a thousand words” is the cliché we use to describe 
this phenomenon. The classic works of both Edward Tufte and of Information 
Science professor and scientist Katy Börner provide beautiful examples of what 
excellent design principles applied to information and data may graphically reveal. 
Visualizations provide “overviews about general patterns and trends” and allow 
discovery of “hidden structures.”1 

Edward Tufte, professor emeritus of Yale University and a well-known advocate 
and creator of elegant graphical display of complex data, explains that “graphics reveal 
data.” Tufte asserts that the most “effective way to describe, explore, and summarize 
a set of numbers is to look at pictures of those numbers.”2 

Used in libraries, information gathered into graphical impressions can reveal pat-
terns hidden in lines of text. Xu et al. remind us that, “[i]n the context of large-scale 
and heterogeneous collections, the different layers of information cannot be easily 
comprehended if presented linearly and sequentially, and there is a risk of getting 
buried in details or lost in generalities.”3

doi:10.5860/crl.77.6.765 crl15-833


766  College & Research Libraries November 2016

Visualizations of library data have been used to:
• reveal relationships among subject areas for users.
• illuminate circulation patterns.
• suggest titles for weeding.
• analyze citations and map scholarly communications.
Future emphasis, as suggested by Eden, could be in replicating whole libraries in 
3D printouts, making predictions of growth and space easier to visualize.4

Definition of Terms
As defined by Börner et al., the broad concept of visualization “refers to the design 
of the visual appearance of data objects and their relationships.” Börner explains that 
well-designed visualizations improve our interaction with large volumes of data, pro-
viding comprehension, understanding and “revealing relations otherwise not noticed.”5

To decide what kind of graphical display is appropriate to reveal the data analyzed 
in this study, the authors used categorizations suggested by Börner and Polley in their 
2014 text, Visual Insights: A Practical Guide to Making Sense of Data. Börner and Polley 
suggest units of analysis as Meso/Local, containing 101 to 10,000 records. The units of 
analysis examined for this study are the purchases for one year, the number of courses 
offered in each major and minor, student enrollment for one year, circulation since 
purchase, and the expenditures for books supporting each department. Each unit of 
data analyzed can be described as topical, asking “what.”6

• What is the number of courses offered in each major and minor? 
• What is expended in each subject area? 
• What is the size of the physical collection in each subject area?
• What is student enrollment in each area?
• What is the circulation in specific areas for one year?
Börner and Polley describe a graph as the most common visualization used to 
examine Meso/Local topical data. Within the context of graphing visualizations, we 
display the results as circular visualizations. Further explanation of the visualizations 
can be found in the methodology section.

Literature Review: Collection Building and Budgeting
The library in this study supports a liberal arts and sciences curriculum of undergraduate 
and limited master’s programs, and a student population of about 12,500. Like 
many libraries, the library that is the focus of this study does not have rigid criteria 
for ordering materials and setting budgets. A 2013 study by Catalano and Caniano 
finds that, when libraries examine the collection and expenditures, there is little formal 
rationale for allocating funds.7

Presently, for the library in this study, there is no set formula to determine budget 
except an initial expense to support new majors ($2,000) and new minors ($1,000). Once 
a year, subject liaisons are asked to justify budget increases or decreases. The Collection 
Development team also looks at past ordering and spending history. As new services 
such as patron-driven acquisition become available, some of the firm order budget is 
redirected to support those efforts. Of course, the book budget is only available after 
the serials costs are met.8 

When academic libraries use deliberate methods to allocate funds, Catalano and 
Caniano find that most libraries use the following five methods: percentage-based, 
weighted multiple-variable, factor or regression analysis, historical spending plus use 
percentage of new formula, and circulation-based statistics.9

A 2007 study by Canepi includes a thorough literature review of various methods of 
allocating funds. Canepi acknowledges that all libraries allocate funds but vary in their 
approach. Out of seventy-five different formulas used by libraries, Canepi pulls a final 
total of twenty-three formulas. The top four most frequently used factors in Canepi’s 
study are student enrollment, cost of materials, circulation use, and number of faculty.10

Other studies call for the application of “more rigorous statistical methods” to 
create a “more equitable balance across departments,” for using ROI techniques to 
assess institutional value, or for making use of objective data in decisions regarding 
collection development.11 

In Rick Anderson’s article, “Collections 2021,” Anderson states that libraries, if they 
are to survive, must rethink their collecting and service strategies in radical and possibly 
scary ways and do so sooner rather than later. Anderson predicts that, in the 
next ten years, the “idea of collection” will be overhauled in favor of “dynamic access 
to a virtually unlimited flow of information products.”12

The library collection of today is changing, affected by many factors, such as demand-
driven acquisitions, access, streaming media, interdisciplinary coursework, ordering 
enthusiasm, new areas of study, political pressures, vendor changes, and the individual 
faculty member following a focused line of research. If libraries do not allocate based 
on data, there could be subjective distribution of funds, affecting the perception of 
fairness and damaging the library’s reputation on campus.13

As described by Blake and Schleper, when librarians think “more and more about 
the cost of information,” new opportunities appear based on findings grounded in 
real data analysis. Knievel, Wicht, and Connaway suggest that subject librarians may 
see opportunities in looking more closely at the relatively unexplored “intersection 
of circulation, interlibrary loan, and holdings.” Many studies are starting to examine 
using circulation and patron use data to support service, tying in with instructional 
outreach. Morrisey reminds us that collections data can inform decisions regarding 
services. Select databases that are heavily used or high-circulation areas may suggest 
a change in staffing concentrations or opportunities for outreach. Finnel et al. propose 
that reference transactions may point to scholarly conversations that are taking place 
both for students and faculty. Using data analysis on the local level may illuminate 
indicators of quality in much the same way the Leiden Ranking (http://www.leidenranking.com/) 
indicates scientific impact and scholarly collaborations worldwide.14

Literature Review: Using Visualizations to Address Library Problems
Much of the current research concerning library data visualization efforts addresses 
digital library collections, most often the interface and user environment. Two major 
sources for visualizations within libraries include the entire January and February 2005 
issue of Library Technology Reports and Sage’s journal, Information Visualization. The 
2005 Library Technology Reports issue addresses 2D and 3D visualizations and includes 
practical applications, resources, organizations, and a short bibliography. Information 
Visualization offers many examples of data visualization crossing 
multiple subjects.

In a 1999 visualization article, Beagle defines the difference between graphical rep-
resentations of environments and knowledge visualization, which generates graphi-
cal representations of meaningful relationships among retrieved files or objects. In a 
2003 work, Beagle applies data visualization to a digital library collection to foster the 
serendipitous discovery enjoyed by many while wandering physical stacks. Beagle’s 
depiction of the collection, called “Scholastica,” is built on VisualNet, draws on LC 
subject area holdings, and depicts the relative size of the holdings in each 
class. Visually available to patrons is the type of material: print book, video, or e-book.15

Also working in the area of visualizing collections, we find the work of Zang, 
Junliang, and Mostafa, who use concepts and clustering to produce graphs of what is 
available in a document collection. Major subtopics appear in the document collection 
as concept clusters.16

Pousman, Stasko, and Mateas describe the emphasis on using interactive visual 
models as attempting to provide amplified cognition and “deep insight for expert user 
populations.” Along with user behavior and information seeking, many library data 
visualization studies address citation analyses.17 Included in this focus is the important 
work of Katy Börner, who is a major influence in the visualization field. Börner’s work 
within library literature concerns many areas of interest, including distinguishing pat-
terns in scientific communication through citation analysis.

Other research that diverges from digital collections to analyze the physical library 
collection focuses on usage statistics and collection analysis. Lima describes how student 
Syed Reza Ali mapped transaction data from the Seattle Public Library to illuminate 
circulation trends. As reported by Brown and Stowers, knowledge gained from visu-
alizing the physical collection is used most often to support assessment, decisions on 
cancellations, and proposals about which items to store remotely or to weed.18

A few studies use the term “mapping” the collection. Bailey suggests analyzing a 
collection by constructing a matrix of prominent authors, keywords, and public figures 
within a particular subject area.19

The visualizations produced in this study will provide a snapshot of the current 
collection, with room for further analysis as gaps appear. The authors hope to gain 
insight through looking at graphical representation of the number of physical books 
and a small number of e-books purchased in a single year in each collection area and 
expenditures in those areas, compared to the number of course hours offered, which 
reflect the number of students enrolled.

This study’s primary focus is not on circulation numbers, although the authors 
provide some visualizations of circulation for one year, compared with expenditures 
and student enrollment. There are many variables in examining circulation, which 
may offer opportunity for a separate study. As stated by Bradford, 

Very often, a circ is not just a circ. Does that number include renewals or is it just 
first-time circulation? Those numbers can be, and often are, significantly different. 
Are you comparing items with different loan lengths? If your DVDs circulate for 
three days or one week, take that into account when comparing them to books 
that may circulate for three or four weeks.20 

Literature Review: Tools
There are myriad tools available, described in the literature in the context in which they 
are used. An entire issue of Library Journal (March 2005) names tools for text cluster-
ing, topical browsing, and information mapping. In citation analysis studies, Dunne 
et al. name other visualization tools such as CiteSpace, Network Workbench, and the 
SocialAction Network analysis tool. For visualizations and chart making, Chapman and 
Woodbury describe open source products Protovis (http://vis.stanford.edu/protovis/), 
Highcharts (www.highcharts.com), Google Chart API (http://code.google.com/apis/chart/), 
and Microsoft Excel.21 Other major players in the field and inspiring examples 
can be found on the site www.infovis.net.

Some tools are bundled with library products already owned. For example, Watters 
reports that WorldCat has an Identity Map that can be used for relationships among 
subjects, authors, and characters. Bradford offers tools for collection analysis that are 
bundled with other common library products: collectionHQ from Baker & Taylor, 
Decision Center from Innovative Interfaces, Inc., and Intota Assessment.22 


Word clouds appear in the literature23 as a means of analyzing social media but, in a more 
limited library setting, could be used to examine user searching behavior. 

A study by Zang and Mostafa focuses on semantic relationships between words 
and describes the concept of a digital library. In the literature, there are many studies 
addressing visual interfaces and digital libraries, an area that is outside the scope of 
this study. However, looking at studies like Xu et al. suggests even more tools to use 
beyond the scope of digital collections.24 

Exhaustive lists of data visualization tools include:
• the DIRT Directory (http://dirtdirectory.org/categories/visualization) 
• Kathy Schrock’s educating through infographics (www.schrockguide.net/infographics-as-an-assessment.html)
• Dataviz list of online tools (www.improving-visualisation.org/case-studies/id=5)
Visualization tools explored for this study include Plotly, Microsoft Excel, the Python 
programming language, and D3.js, a JavaScript library for creating documents based 
on data.25 Because the process should be easily replicated without special knowledge 
of a programming language, the authors generated some visualizations using Tableau 
Public©, which is freely available. Tableau charts are easily customizable using 
drag-and-drop, which allows flexible and intuitive generation of visualizations. Tableau 
accepts both text and Excel files. 

Plotly, a free online data visualization tool, was explored with limited success. The 
need to know Python programming is the disadvantage in using Plotly. The advantage 
of using Plotly is the interactive visualizations that result, making engagement with the 
data very dynamic. Plotly is also social: the program is web-based, and visualizations 
may be shared among the community for insight and feedback.

In the end, the authors found most success generating data bubbles using Microsoft 
Excel (version 2010), which is probably familiar to the widest audience and requires no 
special programming skills. In using Excel, the authors could plot multiple variables in 
various combinations: department name, books purchased within a year, expenditures, 
course hours, student enrollment, and circulation since purchase. An excellent tutorial 
by Eugene O’Loughlin, National College of Ireland, is very helpful in composing the 
charts and is found here: https://youtu.be/4FyImh2G7N0.26

Methodology
For this study, data on the number of course hours by major and undergraduate en-
rollment by major was retrieved from the institution’s Office of Institutional Research, 
Planning, and Information Management website. The authors chose to include semes-
ters for one academic year: 2013–2014.

The input data on purchases come from the library catalog and from records held 
by the Collection Development department. Collection Development provided a list of 
titles purchased from firm order, approval plan, and demand-driven acquisition (DDA) 
budgets, separated by fund code. The authors totaled firm order, approval plan, and 
DDA by fund code to get a total number of purchases by fund for Fiscal Year (FY) 2014. 

It is important to note that the numbers provided by the Collection Development 
department include both print books and a small number of DDA books. The e-book 
collection included consists of DDA and firm orders purchased for perpetual access. The 
number is very small, too small to skew the results. 

Not included in the study are e-books purchased as part of a large subscription 
package such as e-brary or EBSCO e-books. The institution subscribes to seven dif-
ferent platforms with major e-book holdings and about six more with smaller e-book 
holdings. E-book collections cover many subject areas. It was thought that examining 
the e-book collections by subject would not add to this study, for several reasons. E-book 
use is calculated primarily at the chapter or page level rather than the title level, 
some books allow full-title downloads while others do not, and download statistics are 
only generated if the entire book is downloaded, leaving out partial viewings. At this 
time, e-book usage is calculated so differently that it was not included in this study.27 

The purchase data also excludes databases and journals, which are purchased from 
a different budget and are often interdisciplinary in subject. 

Aligning library fund codes with the majors offered, the authors compiled the data 
into one large Excel spreadsheet. The authors manually entered the corresponding 
fund code in the column next to the major, then used Excel’s VLOOKUP function to 
bring the data into one sheet. A few special discretionary funds were excluded because 
they did not correspond to a major. These exclusions were very small funds, less than 
1 percent of the total purchases. The figure for Expenditures is taken from the total 
firm orders, books from the approval plan, and DDA. DDA purchases were a pilot 
program in 2014 and comprised a small part of the total purchases.
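
For readers who prefer a scripted alternative to Excel, the lookup step described above might be sketched as follows. This is a minimal illustration and not the authors’ workflow: the majors, fund codes, and purchase counts are invented sample values, and the names major_to_fund, purchases_by_fund, and lookup_purchases are hypothetical. A dictionary lookup plays the role of an exact-match VLOOKUP, and funds that correspond to no major are reported separately so their share of purchases can be checked.

```python
# Invented sample data for illustration only; not figures from the study.
major_to_fund = {
    "English": "ENGL",
    "History": "HIST",
    "Mathematics": "MATH",
}
purchases_by_fund = {
    "ENGL": 412,
    "HIST": 390,
    "MATH": 58,
    "DISC1": 5,  # hypothetical discretionary fund with no corresponding major
}


def lookup_purchases(major_to_fund, purchases_by_fund):
    """Return {major: purchases}, similar to an exact-match VLOOKUP on fund code."""
    combined = {}
    for major, fund in major_to_fund.items():
        # A fund code with no purchase record yields None (Excel would show #N/A).
        combined[major] = purchases_by_fund.get(fund)
    return combined


combined = lookup_purchases(major_to_fund, purchases_by_fund)

# Funds that match no major are left out of the combined sheet.
unmatched_funds = sorted(set(purchases_by_fund) - set(major_to_fund.values()))
excluded_share = (
    sum(purchases_by_fund[f] for f in unmatched_funds)
    / sum(purchases_by_fund.values())
)

print(combined)
print(unmatched_funds, f"{excluded_share:.2%}")
```

Reporting the excluded share makes it easy to confirm that the omitted funds remain a small fraction of total purchases.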

Three lists were generated to get raw data: 
• List 1 includes firm book order records with paid date covering FY 2013–2014. No fund was specified. 
• List 2 includes all bibliographic records attached to the List 1 orders. 
• List 3 is composed of all items attached to the bibliographic records from List 2, limited to item type = books and status = available.
Exported from List 1 are bibliographic record number, fund code, and price. Exported 
from List 3 are bibliographic record number and total circulation. Both text files generated 
were imported into Excel, then combined using the VLOOKUP function to pull 
the circulation figures into a column in List 1. A pivot table was used to summarize 
the data, as seen in figure 1. The values from the pivot table were copied into a new 
Excel sheet for editing. The export, initially using Tableau, was performed several 
times, as the authors encountered varied results caused by items held by 
Special Collections, which are in-house only and don’t circulate. Another variable that 
obscures an accurate picture of expenditures is that some disciplines buy fewer, 
more expensive books, while others purchase inexpensive books in larger quantities.
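
The combine-and-summarize step can also be sketched in code, as a hedged alternative to VLOOKUP and a pivot table. Everything specific here is an assumption made for illustration: the column names (bib_record, fund_code, price, total_circulation), the record numbers, and the values are invented and do not come from the study. One design choice differs from Excel: VLOOKUP returns only the first matching row, whereas this sketch sums circulation across all items attached to a bibliographic record.

```python
import csv
import io
from collections import defaultdict

# Invented sample exports; column names and values are assumptions.
list1_text = """bib_record,fund_code,price
b1001,ENGL,25.00
b1002,ENGL,40.00
b1003,HIST,60.00
b1004,MATH,95.50
"""
list3_text = """bib_record,total_circulation
b1001,3
b1002,0
b1003,5
"""


def read_rows(text):
    """Parse delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Circulation per bibliographic record, summed over attached items.
circ_by_bib = defaultdict(int)
for row in read_rows(list3_text):
    circ_by_bib[row["bib_record"]] += int(row["total_circulation"])

# Pivot-style summary: one entry per fund code.
summary = defaultdict(lambda: {"titles": 0, "spent": 0.0, "circulation": 0})
for row in read_rows(list1_text):
    fund = summary[row["fund_code"]]
    fund["titles"] += 1
    fund["spent"] += float(row["price"])
    # Orders with no matching item (for example, non-circulating copies) count as 0.
    fund["circulation"] += circ_by_bib.get(row["bib_record"], 0)

for code in sorted(summary):
    print(code, summary[code])
```

Treating unmatched orders as zero circulation is itself a judgment call; items that cannot circulate could instead be excluded before summarizing.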

The authors looked at the data for 2013 first, then compared data from 2014 using 
the same methods to see if similar patterns emerged.

FIGURE 1
Pivot Table Summarizes Data


Some departments didn’t align with fund codes, so that represents some extra work 
in areas such as Environmental Geoscience and Astronomy. It could be that other dis-
ciplines, like Environmental Studies, are close enough to provide adequate coverage, 
but that opens up another area for research. 

In other cases, to simplify the data bubbles, the authors chose to limit the data included. For example, 
course hours for Biochemistry were missing, so that department was not included. 
Any omissions can be explored more thoroughly in the future.

Findings/Discussion
Looking at the data reveals more questions, much as in archaeological excavation, 
good detective work, or the research process. The three-dimensional data bubble visualizations 
offer a starting point for discussing the collections in support of the curriculum 
and what is expended in each area. The visualizations provide greater comprehension 
than the two-dimensional “flatland” of the spreadsheets, in which valuable questions 
and insights are lost in the columns and rows of data. A screenshot of a portion of the 
Excel spreadsheet containing library fund codes, course hours, books purchased, per-
centages, expenditures, and enrollment in each discipline is seen in figure 2.28 

Using data visualization instead of a spreadsheet, figure 3 offers a much more vibrant 
depiction of books purchased within one year, expenditures and course hours for most 
of the schools. The data bubbles are easy to understand at a glance. A large school not 
included in figure 3 is the School of Education, Health and Human Performance, because 
of Excel’s limitations in displaying that much data in one chart. The School of Education, 
Health and Human Performance is compared with the School of the Arts in figure 6.

FIGURE 2
Screenshot of an Excel Spreadsheet Containing Library Fund Codes, Course Hours, Books Purchased, Percentages, Expenditures and Enrollment 


FIGURE 3
Data Bubbles Representing Number of Books Purchased, X-Axis Showing Expenditures & Y-Axis is Course Hours for School of Humanities & Social Sciences, School of Sciences & Mathematics, School of Business, and School of the Arts, 2013–2014

FIGURE 4
Data Bubbles Representing Number of Books Purchased, X-Axis Showing Expenditures & Y-Axis is Course Hours for School of Humanities & Social Sciences and School of Sciences & Mathematics, 2013–2014

Figure 4 offers several visualizations that ignite opportunities for discussion. For 
example, math’s course hours are huge, which was unexpected, although explained 
by high enrollments for general education requirements and increasing 
math and statistics requirements from other courses.29 Math’s small physical collection 
is probably typical of many universities. However, a closer look at support for 
that department is warranted. The undergraduates fulfilling requirements may not 
be conducting research, but do the faculty teaching the high number of classes need 
additional support or specialized databases for their research?

Further study of figure 4 suggests that communication and psychology may benefit 
from discussion about an increase in book budget. They both have small collections and 
fewer expenditures but are large majors. On the other hand, religious studies has a large 
collection but low course numbers and low expenditures. What is the reason? Active 
ordering? Bargain books? Increased communication with the library and departmental 
liaisons and with the Collection Development team is needed to answer these questions.

In figure 4, there is a small bubble near the origin that is only partially shown, with 
a small budget and no course hours. This bubble represents the special discretionary 
fund of a faculty member with relatively narrow and rare research interests. This 
scholar has ordered twenty books, but their collection is growing and represents a 
unique niche the library can advertise to other scholars conducting similar research. 
As Steele suggests, data visualization is a useful tool to reveal these collection oddities 
and perhaps provide a marketing opportunity for libraries. The library could highlight 
the collection for Interlibrary Loan or begin a miniconference for visiting scholars.30

Other topics of interest illustrated in figure 4 are the large collections for English and 
history. These areas don’t have the largest enrollments or course hours, but they have 
huge collections and healthy expenditures. When university budgets are threatened 
and shrinking, a look at circulation statistics can justify the numbers in these two areas 
if there is any challenge.

As seen in figure 4 and figure 5, any area in the top left quadrant of the charts needs 
review. Are the collection needs of the subjects that have high course hours and healthy 
student enrollments being met? If not by expenditures, then are there other resources 
not included in the figures? Could funds from areas with high expenditures but low 
course hours be redirected to support low budget departments?
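
The quadrant review described above could be automated in a few lines. The sketch below rests on assumptions that are not in the study: the department names and values are invented, the function name top_left_quadrant is hypothetical, and the quadrant boundaries are taken to be the medians of each axis (the article does not define where the quadrants divide). With expenditures on the x-axis and course hours on the y-axis, the top left quadrant holds departments with below-median expenditures and above-median course hours.

```python
from statistics import median

# Invented sample values for illustration only.
departments = [
    {"name": "Dept A", "expenditures": 2000.0, "course_hours": 9000},
    {"name": "Dept B", "expenditures": 15000.0, "course_hours": 8000},
    {"name": "Dept C", "expenditures": 12000.0, "course_hours": 1500},
    {"name": "Dept D", "expenditures": 3000.0, "course_hours": 1000},
    {"name": "Dept E", "expenditures": 2500.0, "course_hours": 7000},
    {"name": "Dept F", "expenditures": 20000.0, "course_hours": 500},
]


def top_left_quadrant(rows):
    """Return names with below-median expenditures and above-median course hours."""
    x_mid = median(r["expenditures"] for r in rows)  # assumed quadrant boundary
    y_mid = median(r["course_hours"] for r in rows)  # assumed quadrant boundary
    return [
        r["name"]
        for r in rows
        if r["expenditures"] < x_mid and r["course_hours"] > y_mid
    ]


flagged = top_left_quadrant(departments)
print(flagged)
```

A list like this is only a prompt for conversation with liaisons; other thresholds, such as budget targets set by the Collection Development team, may be more meaningful than medians.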

FIGURE 5
Data Bubbles Representing Number of Books Purchased, X-Axis Showing Expenditures & Y-Axis is Course Hours for School of Business and School of the Arts, 2013–2014


FIGURE 6
Data Bubbles Representing Number of Books Purchased, X-Axis Showing Expenditures & Y-Axis is Course Hours for School of Education, Health, and Human Performance and School of the Arts, 2013–2014

FIGURE 7
Data Bubbles Representing Student Enrollment by Discipline, X-Axis Showing Expenditures & Y-Axis is Book Circulation Since Purchase for School of Humanities & Social Sciences and School of Sciences & Mathematics, 2013–2014

Also of interest are circulation numbers. In figure 7 and figure 8, the size of the data 
bubbles represents student enrollment, and physical book circulation since purchase 
is graphed against expenditures. Again, it is easy to imagine conversations taking place 
about the significance of departments that have healthy student enrollment and robust 
circulation but small expenditures, or conversely, areas in which healthy expenditures 
are occurring, with very little circulation of materials and low student enrollment. In 
figure 8, the data bubbles overlap too much when scaled to 100 to be of much use, but 
they still provide important clues about the appropriateness of the collection. Right 
away, we can see economics and finance, accounting and film studies as outliers 
that need discussion and possible attention. Using Excel, the data fields may be 
adjusted or changed on the fly during a meeting to foster meaningful conversation 
about implications.

By looking at data visualized in different combinations, library collection develop-
ment teams can clearly compare important considerations in collection management: 
expenditures and purchases, circulation, student enrollment, and course hours. Library 
staff and administrators can make funding decisions or begin dialog based on data 
free from political pressure or from the influence of the squeakiest wheel in a depart-
ment. The visual depiction of information revealed in data bubbles represents an 
opportunity for conversation among collection development teams, subject liaisons, 
and other interested parties.

Implications for Future Research
Future research areas call for experimenting with different data visualizations using 
alternate tools or in additional areas. An obvious first step is to try to compare the size 
of the book collection for each area beyond purchases for a single year. While looking 
at the collection for a single year may hint at supporting subject areas, more definitive 
gathering of collection numbers is needed. Looking at the entire collection using data 
visualization may provide a new way of performing collection assessments. Compar-
ing the collection with circulation figures may be used with other variables to suggest 
weeding decisions. Libraries may look at other items, such as DVDs, and determine 
how they circulate. An examination of e-book usage statistics, if possible given the 
many variables discussed earlier, may reveal interesting trends. Could patterns of 
interlibrary loan requests of materials be easily understood through data visualization, 
suggesting solutions for lending? 

FIGURE 8
Data Bubbles Representing Student Enrollment by Discipline, X-Axis Showing Expenditures & Y-Axis is Book Circulation Since Purchase for School of Business and School of the Arts, 2013–2014


Once the visualization tool is selected, the data are gathered and cleaned up, and a workflow 
is created and the process delineated, data combinations may be studied as needed. 
For example, what patterns might appear when figures are compared from Interli-
brary Loan, patron use, and instructional sessions? How about amount allotted in 
discretionary funds compared to expenditures for new majors? What happens when 
the collections for majors and minors are compared with the collections of popular 
interdisciplinary subjects? 

Conclusion
The need for examining collection data clearly extends beyond simply buying materials 
to support curriculum or to meet the requirements of the most vocal faculty. Accurate 
visualizations of library data suggest avenues for staffing and service, resource expen-
ditures, scholarly relationships and instructional outreach as well as opportunities for 
excellent collection development.

Groups to help with data visualization include The Office for Creative Research, 
found at http://o-c-r.org/abstract/. This group includes Jer Thorp, data artist. A description 
of Thorp should sound familiar, as it also describes librarians and information 
scientists. In a National Geographic interview, Thorp states that his biggest thrill comes 
from engaging with a completely new topic. 

“I get to become a little expert in a lot of different things,” he says. “We work on 
projects that are in all kinds of categories and all types of subject areas, and we really 
make an effort to become as educated about all of them as we can.”31

Librarians resemble data artists like Thorp, and we can benefit from becoming better 
versed in the tools of data analysis. The call for librarians to become more comfortable 
with data is echoed in the literature. Brown and Stowers suggest justifying collections 
expenditures with data analysis, which is important because collection expenditures are “second 
only to personnel in the library’s budget.” Morrisey agrees that a “thorough 
data analysis will let you know how people are using your book collections and may 
inform you as to adjusting collections dollars among the disciplines.”32

The changing landscape of collection development calls for a more accurate, unbiased, 
and objective view of library holdings, using a combination of data-gathering methods to 
give an overall picture of the strength or weakness of the collection.33

In creating data visualizations that are clearly understood at a glance, without 
extravagant explanation, librarians will be able to have meaningful conversations 
resulting in free and impartial decision making.

Notes

 1. Katy Börner, Chaomei Chen, and Kevin W. Boyack, “Visualizing Knowledge Domains,” 
in Annual Review of Information Science and Technology, vol. 37, ed. B. Cronin (Medford, N.J.: 
Information Today, Inc., 2003), 209, doi:10.1002/aris.1440370106.

 2. Edward R. Tufte, The Visual Display of Quantitative Information, 2nd ed. (Cheshire, Conn.: 
Graphics Press, 2001).

 3. W. Xu, M. Esteva, S.D. Jain, and V. Jain, “Interactive Visualization for Curatorial Analysis of 
Large Digital Collection,” Information Visualization 13 (2013): 159–83, doi:10.1177/1473871612473590. 

 4. Brad Eden, “Practical Applications of 2D and 3D Information Visualization for Information 
Organizations,” Library Technology Reports (2005), available online at 
https://journals.ala.org/ltr/article/view/4599/5427 [accessed November 6, 2014].

 5. Börner, Chen, and Boyack, “Visualizing Knowledge Domains,” 209. 
 6. Katy Börner and David Polley, Visual Insights: A Practical Guide to Making Sense of Data (Cambridge, London: MIT Press, 2014), 7.
 7. Amy J. Catalano and William T. Caniano, “Book Allocations in a University Library: An Evaluation of Multiple Formulas,” Collection Management 38 (2013): 192–212, doi:10.1080/01462679.2013.792306.



 8. Katina Strauch, e-mail message to authors, December 3, 2014.
 9. Catalano and Caniano, “Book Allocations in a University Library,” 5, 193.
10. Kitti Canepi, “Fund Allocation Formula Analysis: Determining Elements for Best Practices in Libraries,” Library Collections, Acquisitions, and Technical Services 31 (2007): 12–24, doi:10.1016/j.lcats.2007.03.002.

11. George Stachokas and Tim Gritten, “Adapting to Scarcity: Developing an Integrated Allocation Formula,” Collection Management 38 (2013): 33–50, doi:10.1080/01462679.2012.730495; Denise Pan, Gabrielle Wiersma, Leslie Williams, and Yem S. Fong, “More Than a Number: Unexpected Benefits of Return on Investment Analysis,” Journal of Academic Librarianship 39 (2013): 566–72, doi:10.1016/j.acalib.2013.05.002; Robin Bradford, “Getting Data Right,” Library Journal 139 (2014): 26.

12. Rick Anderson, “Collections 2021: The Future of the Library Collection Is Not a Collection,” 
Serials 24 (2011): 211–16.

13. Katina Strauch, e-mail message to authors, June 17, 2015; Canepi, “Fund Allocation Formula 
Analysis,” 2.

14. Julie C. Blake and Susan P. Schleper, “From Data to Decisions: Using Surveys and Statistics to Make Collection Management Decisions,” Library Collections, Acquisitions, and Technical Services 28 (2004): 460–64, doi:10.1016/j.lcats.2004.09.002; Jennifer E. Knievel, Heather Wicht, and Lynn Silipigni Connaway, “Use of Circulation Statistics and Interlibrary Loan Data in Collection Management,” College & Research Libraries (2006): 35–50; Locke Morrisey, “Data-Driven Decision Making in Electronic Collection Development,” Journal of Library Administration 50 (2010): 283–90, doi:10.1080/01930821003635010; Joshua Finnel and Walt Fontane, “Reference Question Data Mining: A Systematic Approach to Library Outreach,” Reference & User Services Quarterly 49 (2010): 278–86.

15. Donald Beagle, “Visualization of Metadata,” Information Technology and Libraries (1999): 
192–99; Donald Beagle, “Visualizing Keyword Distribution across Multi-Disciplinary c-Space,” 
D-Lib Magazine 9 (2003), doi:10.1045/june2003-beagle.

16. Junliang Zhang and Javed Mostafa, “Information Retrieval by Semantic Analysis and Visualization of the Concept Space of D-Lib® Magazine,” D-Lib Magazine 8 (2002), doi:10.1045/october2002-zhang.

17. Zachary Pousman, John T. Stasko, and Michael Mateas, “Casual Information Visualization: 
Depictions of Data in Everyday Life,” IEEE Transactions on Visualization and Computer Graphics 13 
(2007): 1145–52; V. Nikolaevich, G. Nikolai, and A. Mazov, “Detection of Information Requirements 
of Researchers Using Bibliometric Analyses to Identify Target Journals,” Information Technology 
and Libraries (2013): 66–77.

18. Manuel Lima, Visual Complexity: Mapping Patterns of Information (New York: Princeton Architectural Press, 2011), 211; Jeanne M. Brown and Eva D. Stowers, “Use of Data in Collections Work: An Exploratory Survey,” Collection Management 38 (2013): 143–62, doi:10.1080/01462679.2013.763742.

19. Lea Bailey, “Does Your Library Reflect the Hispanic Culture? A Mapping Analysis,” Library 
Media Connection (2009): 20–24.

20. Bradford, “Getting Data Right,” 26.
21. Cody Dunne, Ben Shneiderman, Robert Gove, Judith Klavans, and Bonnie Dorr, “Rapid Understanding of Scientific Paper Collections: Integrating Statistics, Text Analytics, and Visualization,” Journal of the American Society for Information Science and Technology 63 (2012): 2351–69; Joyce Chapman and David Woodbury, “Leveraging Quantitative Data to Improve a Device-Lending Program,” Library Hi Tech 30 (2012): 210–34, doi:10.1108/07378831211239924.

22. A. Watters, “Visualization of the Week: Visualizing the Library Catalog,” Radar: Insight, Analysis, and Research about Emerging Technologies (Aug. 2011), available online at http://radar.oreilly.com/2011/08/visualization-of-the-week-visu.html [accessed 14 July 2014]; Bradford, “Getting Data Right,” 26.

23. H. Andrew Schwartz et al., “Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach,” PloS One 8, ed. Tobias Preis (2013): e73791, doi:10.1371/journal.pone.0073791.

24. Junliang Zhang and Javed Mostafa, “Information Retrieval by Semantic Analysis and Visualization of the Concept Space of D-Lib® Magazine,” D-Lib Magazine 8 (2002), doi:10.1045/october2002-zhang; Xu, Esteva, Jain, and Jain, “Interactive Visualization.”

25. “DHO:Discovery (fionnachtain),” last modified n.d., http://discovery.dho.ie/discover.php.
26. “How to Draw and Format a Basic Bubble Chart in Excel 2010,” YouTube video, 7:34, posted by Eugene F.M. O’Loughlin (Apr. 5, 2013), https://youtu.be/4FyImh2G7N0.
27. Michelle Sellars and Lindsay Barnett, e-mail message to authors, December 10, 2015.
28. Edward R. Tufte, Envisioning Information (Cheshire, Conn.: Graphics Press, 1990).
29. Robert J. Mignone, e-mail message to authors, June 12, 2015.
30. Kirstin Steele, “Visualizing the Value of Library Content,” Bottom Line: Managing Library Finances 26 (2013): 14–17, doi:10.1108/08880451311321537.
31. R. Schleeter, “Data Artist: Jer Thorp,” National Geographic Education (2013), available online at http://education.nationalgeographic.com/education/news/data-artist-jer-thorp/?ar_a=1 [accessed 6 November 2014].

32. Jeanne M. Brown and Eva D. Stowers, “Use of Data in Collections Work: An Exploratory Survey,” Collection Management 38 (2013): 143–62, doi:10.1080/01462679.2013.763742; Locke Morrisey, “Data-Driven Decision Making in Electronic Collection Development,” Journal of Library Administration 50 (2010): 283–90, doi:10.1080/01930821003635010.

33. Julie C. Blake and Susan P. Schleper, “From Data to Decisions: Using Surveys and Statistics 
to Make Collection Management Decisions,” Library Collections, Acquisitions, and Technical Services 
28 (2004): 460–64, doi:10.1016/j.lcats.2004.09.002.