Content-Based Information Retrieval and Digital Libraries

Gary (Gang) Wan and Zao Liu

This paper discusses the applications and importance of content-based information retrieval technology in digital libraries. It outlines the general retrieval process and analyzes current examples in four areas of the technology. Content-based information retrieval has been shown to be an effective way to search the multimedia documents that are increasingly stored in digital libraries. As a strong complement to traditional text-based information retrieval technology, content-based information retrieval will be a significant trend in the development of digital libraries.

After several decades of development, digital libraries are no longer a myth. In fact, some general digital libraries, such as the National Science Digital Library (NSDL) and the Internet Public Library, are widely known and used. Advances in computer technology make it possible to include a colossal amount of information in various formats in a digital library. In addition to traditional text-based documents such as books and articles, other types of materials, including images, audio, and video, can also be easily digitized and stored. Therefore, how to retrieve and present this multimedia information effectively through the interface of a digital library has become a significant research topic.

Currently, there are three methods of retrieving information in a digital library. The first and easiest is free browsing: a user pages through a collection and looks for the desired information. The second, the most popular technique used today, is text-based retrieval: textual information (the full text of text-based documents and/or the metadata of multimedia documents) is indexed so that a user can search the digital library with keywords or controlled terms. The third is content-based retrieval, which enables a user to search multimedia information in terms of the actual content of an image, audio clip, or video (Marques and Furht 2002). Content features that have been studied so far include color, texture, size, shape, motion, and pitch.

While some may argue that text-based retrieval techniques are good enough to locate desired multimedia information as long as it is assigned proper metadata or tags, words are often insufficient to describe what is in a person's mind. Consider two examples. A patron comes to a public library with a picture of a rare insect. Without expertise in entomology, the librarian will not know where to start if only a text-based information retrieval system is available. With the help of content-based image retrieval, however, the librarian can upload the digitized image of the insect to an online digital image library of insects, and the system will retrieve similar images with detailed descriptions of the insect. Similarly, a patron has a segment of music audio about which he or she knows nothing but wants to learn more. Using a content-based audio retrieval system, the patron can retrieve similar audio clips with detailed information from a digital music library, then listen to them to find an exact match. This is much easier than searching a text-based music search system. It is clearly helpful if users can search such non-textual information by style and feature.
In addition, the growth of the World Wide Web brings new challenges to traditional text-based information retrieval. While today's Web-based digital libraries can be accessed around the world, users with different language and cultural backgrounds may not be able to conduct effective keyword searches of these libraries. Content-based information retrieval techniques can greatly increase the accessibility of these digital libraries, which is probably a major reason the field has become a hot research area in the past decade. Ideally, a content-based information retrieval system would understand multimedia data semantically, including the objects it contains and the categories to which it belongs, so that a user could submit semantic queries and retrieve matched results. However, extracting high-level, semantic features of multimedia information remains a great difficulty for current computer technology. Most projects still focus on lower-level features such as color, texture, and shape.

Simply put, a typical content-based information retrieval system works in this way: First, for each multimedia file in the database, certain feature information (e.g., color, motion, or pitch) is extracted, indexed, and stored. Second, when a user composes a query, the feature information of the query is computed as feature vectors. Finally, the system compares the similarity between the feature vectors of the query and those of the multimedia data, and retrieves the best-matching records. If the user is not satisfied with the retrieved records, he or she can refine the results by selecting those most relevant to the query and repeating the search with the new information. This process is illustrated in figure 1.

Figure 1. The general process of content-based information retrieval

The following sections examine existing content-based information retrieval techniques for the most common information formats in digital libraries (image, audio, and video), as well as their limitations and trends.

Gary (Gang) Wan (gwan@tamu.edu) is a Science Librarian and Assistant Professor, and Zao Liu (zliu@tamu.edu) is a Distance Learning Librarian and Assistant Professor at Sterling C. Evans Library, Texas A&M University, College Station, Texas.

■ Content-based image retrieval

A large number of content-based image retrieval (CBIR) systems have been proposed in the last few years, either building on prior work or exploring novel directions. One similarity among these systems is that most perform feature extraction as the first step in the process, obtaining global image features such as color, shape, and texture (Datta et al. 2005).

One of the best-known CBIR systems is query by image content (QBIC), developed by IBM. It uses several different features, including color, sketches, texture, shape, and example images, to retrieve images from image and video databases. Since its launch in 1995, the QBIC model has been employed by quite a few digital libraries and collections. One recent adopter is the State Hermitage Museum in Russia (www.hermitage.ru), which uses QBIC for its Web-based digital image collection. Users can find artwork images by selecting colors from a palette or by sketching shapes on a canvas. The user can also refine existing search results by requesting all artwork images with similar visual attributes.
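To make the query-by-color idea concrete, the following is a minimal sketch of the general process from figure 1 applied to color: a color histogram is extracted from every image offline, the same feature is extracted from the query, and images are ranked by histogram distance. This is an illustrative sketch only, not IBM's QBIC implementation; the file names, bin count, and distance measure are assumptions, and it relies on the NumPy and Pillow packages.

```python
import numpy as np
from PIL import Image  # assumed dependency: Pillow

def color_histogram(path, bins=8):
    """Extract a normalized RGB color histogram as the image's feature vector."""
    rgb = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(rgb, bins=(bins, bins, bins), range=((0, 256),) * 3)
    hist = hist.flatten()
    return hist / hist.sum()  # normalize so image size does not matter

def search(query_path, collection_paths, top_k=5):
    """Rank collection images by color similarity to the query image."""
    query = color_histogram(query_path)
    index = {p: color_histogram(p) for p in collection_paths}  # built offline in practice
    # L1 distance between histograms; smaller means more similar color schema
    ranked = sorted(index, key=lambda p: np.abs(index[p] - query).sum())
    return ranked[:top_k]

# Hypothetical usage: find artworks whose color schema resembles the query
# print(search("query.jpg", ["monet.jpg", "rothko.jpg", "vermeer.jpg"]))
```

In a production system the histograms would be computed once and stored in an index, exactly as the first step of figure 1 describes; only the query vector is computed at search time.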
The following screenshots demonstrate how a user can do a content-based image search with QBIC technology. In figure 2.1, the user chooses a color from the palette and composes the color schema of the artwork he or she is looking for. Figure 2.2 shows the artwork images that match the query schema.

Figure 2.1. A user query

Figure 2.2. The search results for this query

Another example of a digital library that has incorporated CBIR technology is the National Science Foundation's International Digital Library Project (www.memorynet.org), which comprises several image collections. The information retrieval system for these collections includes both a traditional text-based search engine and a CBIR system called SIMPLIcity (Semantics-sensitive Integrated Matching for Picture Libraries), developed by Wang et al. (2001) of Pennsylvania State University. From the front page of these image collections, a user can choose to display a random group of images (figure 3.1). Below each image is a "similar" button; clicking it shows the user a group of images containing objects similar to those in the selected image (figure 3.2). By providing feedback to the search engine this way, the user can find images of desired objects without knowing their names or descriptions.

Figure 3.1. A group of random images in the collection

Figure 3.2. CBIR results

Simply put, SIMPLIcity segments each image into small regions, extracts several features (such as color, location, and shape) from these regions, and classifies the regions into semantic categories (such as textured/nontextured and graph/photograph). When computing the similarity between the query image and the images in the database, all of these features are considered and integrated, and the best-matching results are retrieved (Wang et al. 2001).

Similar applications of CBIR technology in digital libraries include the University of California–Berkeley's Digital Library Project (http://bnhm.berkeley.edu), the National STEM Digital Library (ongoing), and Virginia Tech's anthropology digital library, ETANA (ongoing).

While these feature-based approaches have been explored for years, an emerging research direction in CBIR is automatic concept recognition and annotation. Ideally, such a system can discover the concepts that an image conveys and assign a set of metadata to it, thus allowing image search through the use of text. A trusted automatic concept recognition and annotation system would be a good solution for large data sets. However, the semantic gap between computer processors and human brains remains the major challenge in developing a robust system of this kind (Datta et al. 2005).

A recent example of efforts in this field is Li and Wang's (2006) ALIPR (Automatic Linguistic Indexing of Pictures - Real Time, http://alipr.com) project. Through a Web interface, users can search images in several different ways. They may run text searches and provide feedback to the system to find similar images. They may also upload an image, in which case the system performs concept analysis and generates a set of annotations, or tags, automatically, as shown in figure 4. The system then retrieves images from the database that are visually similar to the uploaded image. During automatic annotation, if the user does not think the tags given by the system are suitable, he or she can input other tags to describe the image. This is also the "training" process for the ALIPR system.

Figure 4. ALIPR's automatic annotation feature
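ALIPR itself annotates through statistical models of concepts; a much simpler way to convey the flavor of annotation-by-retrieval is nearest-neighbor tag transfer, sketched below. This is a simplified stand-in for ALIPR's actual algorithm, and it assumes that feature vectors and user-supplied tags for an indexed collection already exist; all names here are hypothetical.

```python
import numpy as np

def suggest_tags(query_vec, index_vecs, index_tags, k=10, top_n=5):
    """Suggest tags for a new image by transferring tags from its k most
    visually similar indexed images (a toy substitute for ALIPR's
    statistical concept models)."""
    # Euclidean distance from the query to every indexed feature vector
    dists = np.linalg.norm(index_vecs - query_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    # Vote: tags attached to closer neighbors count more
    scores = {}
    for rank, i in enumerate(nearest):
        for tag in index_tags[i]:
            scores[tag] = scores.get(tag, 0.0) + 1.0 / (1 + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical usage with a tiny indexed collection:
# index_vecs = np.array([...])            # one feature vector per image
# index_tags = [["sky", "sea"], ["cat"]]  # tags supplied by earlier users
# print(suggest_tags(new_image_vec, index_vecs, index_tags))
```

In such a scheme, user corrections can simply be appended to the stored tag lists, which is the spirit of the "training" feedback loop described above.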
Since CBIR is the major research area with the longest history in content-based information retrieval, there are many models, products, and ongoing projects in addition to the above examples. As image collections become a significant part of digital libraries, more attention is being paid to the possibility of providing content-based image search as a complement to existing metadata search.

■ Content-based audio retrieval

Compared with CBIR, content-based audio retrieval (CBAR) is relatively new, and fewer research projects on it can be found. In general, existing CBAR approaches start from content analysis of audio clips, for example by extracting basic audio elements such as duration, pitch, amplitude, brightness, and bandwidth (Wold et al. 1996). Because of the great difficulty of recognizing audio content, research in this area is less mature than that in content-based image and video retrieval. Although no CBAR system has yet been implemented by a digital library, quite a few projects provide good prototypes or directions.

One good example is Zhang and Kuo's (2001) research project on audio classification and retrieval. The prototype system is composed of three stages: coarse-level audio segmentation, fine-level classification, and audio retrieval. In the first stage, audio signals are semantically segmented and classified into several basic types, including speech, music, song, speech with music background, environmental sounds, and silence. Some physical audio features, such as the energy function, the fundamental frequency, and the spectral peak tracks, are examined in this stage. In the second stage, further classification is conducted within every basic type. Features are extracted from the time-frequency representation of the audio signals to reveal subtle differences of timbre and pattern among different classes of sounds. Based on these differences, the coarse-level segments obtained in stage one can be classified into narrower categories; for example, speech can be differentiated into the voices of men, women, and children. Finally, in the retrieval stage, two approaches, query-by-keyword and query-by-example, are employed. The query-by-keyword approach resembles a traditional text-based search system. The query-by-example approach is similar to content-based image retrieval systems, where an image can be searched by color, texture, and histogram; here, audio clips can be retrieved by distinct features such as timbre, pitch, and rhythm. In this way, a user may choose from a given list of features, listen to the retrieved samples, and modify the input feature set to get closer to the desired results. Zhang and Kuo's prototype is a typical and classic CBAR system; it is relatively mature and could be used by large digital audio libraries. A sketch of the kind of low-level features examined in the first stage follows.
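As an illustration of the physical features involved in coarse-level segmentation, the sketch below computes two classic ones, short-time energy and zero-crossing rate, which are commonly used to separate speech, music, and silence. It is a generic illustration of such features, not Zhang and Kuo's implementation; the frame size and thresholds are assumptions chosen for the example.

```python
import numpy as np

def frame_features(signal, frame_len=1024):
    """Compute short-time energy and zero-crossing rate (ZCR) per frame.
    Very low energy suggests silence; high ZCR often indicates noisy or
    unvoiced, speech-like sound; low ZCR with steady energy suggests music."""
    n_frames = len(signal) // frame_len
    feats = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = float(np.mean(frame ** 2))
        # fraction of consecutive samples whose sign flips
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zcr))
    return feats

def coarse_label(energy, zcr, silence_thresh=1e-4):
    """Toy three-way coarse classification from the two features."""
    if energy < silence_thresh:
        return "silence"
    return "speech-like" if zcr > 0.15 else "music-like"

# Hypothetical usage on one second of 16 kHz audio:
# signal = np.random.randn(16000)  # stand-in for a real PCM buffer
# labels = [coarse_label(e, z) for e, z in frame_features(signal)]
```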
More recently, Li et al. (2003) proposed a new feature extraction method designed particularly for music genre classification, named Daubechies Wavelet Coefficient Histograms (DWCHs). DWCHs capture the local and global information of music signals simultaneously by computing histograms of their wavelet coefficients. Like other CBAR strategies, this method divides the process of music genre classification into two steps: feature extraction and multi-class classification. The signal information representing the music is extracted first, and then an algorithm is used to assign genre labels based on the features of that representation. Since the decomposition of an audio signal produces a set of subband signals at different frequencies corresponding to different characteristics, Li et al. (2003) built their feature extraction on this decomposition: the music signal is decomposed first, a histogram of each subband is constructed, and the energy of each subband is computed, so that the characteristics of the music are represented by these subbands. One finding of this research is that the methodology, combined with advanced machine learning techniques, significantly improved the accuracy of music genre classification (Li et al. 2003). It therefore could potentially be used by the digital music libraries that have been widely developed in the past several years. A sketch of the subband-histogram idea appears below.
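The following sketch shows the general shape of such a feature extractor: a Daubechies wavelet decomposition followed by a per-subband histogram and energy. It is a simplified reading of the DWCH idea, not the authors' code; the wavelet choice, decomposition depth, and bin count are assumptions, and it relies on the PyWavelets package.

```python
import numpy as np
import pywt  # assumed dependency: PyWavelets

def dwch_like_features(signal, wavelet="db8", level=4, bins=16):
    """Decompose a music signal into wavelet subbands, then describe each
    subband by a normalized coefficient histogram plus its energy."""
    subbands = pywt.wavedec(signal, wavelet, level=level)
    feature_vector = []
    for coeffs in subbands:
        hist, _ = np.histogram(coeffs, bins=bins, density=True)
        energy = float(np.mean(coeffs ** 2))
        feature_vector.extend(hist.tolist() + [energy])
    return np.array(feature_vector)

# The resulting fixed-length vector can feed any multi-class classifier
# (the second step of the genre-classification pipeline), for example:
# from sklearn.svm import SVC
# clf = SVC().fit(training_vectors, genre_labels)
```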
■ Content-based video retrieval

Content-based video retrieval (CBVR) is a more recent research topic than CBIR and CBAR, partly because digitization technology for video appeared later than that for image and audio. As digital video websites such as YouTube and Google Video become more popular, how to retrieve desired video clips effectively is a great concern. Searching by features of video, such as motion and texture, can be a good complement to the traditional text-based search method.

One of the earliest examples is the VideoQ system developed by Chang et al. (1997) of Columbia University (www.ctr.columbia.edu/VideoQ), which allows a user to search video based on a rich set of visual features and spatio-temporal relationships. The video clips in the database are stored as MPEG files. Through a Web interface, the user can formulate a query scene as a collection of objects with different attributes, including motion, shape, color, and texture. Once the user has formulated the query, it is sent to a query server, which contains several databases for different content features. On the query server, the similarities between the features of each object specified in the query and those of the objects in the database are computed; a list of video clips is then retrieved based on their similarity values. For each of these clips, key-frames are dynamically extracted from the video database and returned to the browser, with the matched objects highlighted. The user can interactively view a matched video clip by simply clicking on its key-frame, whereupon the corresponding clip is extracted from the video database (Chang et al. 1997). Figures 5.1 and 5.2 show an example of a visual search through the VideoQ system.

Figure 5.1. The user composes a query

Figure 5.2. Search results for the sample query

Many other CBVR projects also examine these content features and try to find more efficient ways to retrieve data. A recent example is Wang et al.'s (2006) VFerret, a content-based similarity search tool for continuous archived video. The VFerret system segments video data into clips and extracts both visual and audio features as metadata; a user can then run a metadata search or a content-based search to retrieve desired video clips. In the first stage, a simple segmentation method splits the archived digital video into five-minute clips. The system then extracts twenty image frames evenly from each five-minute clip for visual feature extraction, and splits the audio channel of each clip into twenty individual fifteen-second segments for audio feature extraction. In the second stage, both audio and visual features are extracted: for visual features, color is used as the content feature; for audio features, the 154 audio features originally used by Ellis and Lee (2004) to describe audio segments are computed. For each fifteen-second segment, the visual feature vector extracted from the sample image and the audio feature vector extracted from the corresponding audio segment are combined into a single feature vector. In the retrieval stage, the user submits a video clip as the query; its feature vector is computed and compared with those of the clips in the database, and the most similar clips are retrieved (Wang et al. 2006). A sketch of such a combined audio-visual vector appears below.
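The key data-structure decision in a system like this is how to fuse two modalities into one searchable vector. The sketch below shows one common way to do it, per-modality normalization followed by weighted concatenation; this is an assumed, generic scheme, not VFerret's exact formula, and the weight parameter is invented for the example.

```python
import numpy as np

def combine_features(visual_vec, audio_vec, audio_weight=0.5):
    """Fuse a visual and an audio feature vector into one vector so a single
    nearest-neighbor search covers both modalities. Each part is L2-normalized
    first so neither modality dominates merely because of its scale or length."""
    v = visual_vec / (np.linalg.norm(visual_vec) + 1e-12)
    a = audio_vec / (np.linalg.norm(audio_vec) + 1e-12)
    return np.concatenate([(1 - audio_weight) * v, audio_weight * a])

def most_similar(query_vec, clip_vecs, top_k=5):
    """Return indices of the stored clip vectors closest to the query vector."""
    dists = np.linalg.norm(clip_vecs - query_vec, axis=1)
    return np.argsort(dists)[:top_k]

# Hypothetical usage for one fifteen-second segment:
# segment_vec = combine_features(color_histogram_vec, audio_154_vec)
```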
Similar projects in this area include Carnegie Mellon University's Informedia Digital Video Library (www.informedia.cs.cmu.edu) and MUVIS at Finland's Tampere University of Technology (http://muvis.cs.tut.fi/index.html).

■ Content-based information retrieval for other digital formats

With the advance of digitization technology, the content and formats of digital libraries are much richer than before. They are no longer limited to text, image, audio, and video; new formats of digital content are emerging, and digital libraries of 3-D objects are a good example.

Since 3-D models have arbitrary topologies and cannot easily be "parameterized" using a standard template, as 2-D forms can (Bustos et al. 2005), content-based 3-D model retrieval is a more challenging research topic than retrieval of the other multimedia formats discussed earlier. So far, four types of solutions have been identified: primitive-based, statistics-based, geometry-based, and view-based (Bimbo and Pala 2006). Primitive-based solutions represent 3-D objects with a basic set of parameterized primitive elements; the parameters control the shape of each primitive element and fit it to a part of the model. Statistics-based approaches create and measure shape descriptions based on statistical models. Geometry-based methods use geometric properties of the 3-D object and their measures as global shape descriptors. View-based solutions represent the 3-D object's shape with a set of 2-D views of the model and descriptors of their content (Bimbo and Pala 2006).

Another novel example is Moustakas et al.'s (2005) project on 3-D model search using sketches. In this experimental system, a vector of geometrical descriptors for each 3-D model is calculated during the feature extraction stage. In the retrieval stage, a user initially uses one of the sketching interfaces (such as the virtual reality interface or an air mouse) to sketch a 2-D contour of the desired 3-D object. The 2-D shape is recognized by the system, and a sample primitive is automatically inserted into the scene. Next, the user defines other elements that cannot be described by the 2-D contour, such as the height of the object, and manipulates the 2-D contour until it reaches its target position. The final query is formed after all the primitives have been inserted. Finally, the system computes the similarities between the query model and each 3-D model in the database and renders the best-matching records.

An online demonstration is available for SCULPTEUR (www.sculpteurweb.org), a European project designed specifically for a 3-D digital museum collection. From its Web-based search interface, a user can choose to do a metadata search or a content-based search for a 3-D object. The search strategy is somewhat similar to that of some CBIR systems: the user can upload a 3-D model in VRML format, then select a search algorithm (such as similar color or texture) to perform a search within a digital collection of 3-D models. As 3-D computer visualization is used in a growing variety of areas, more research projects are focusing on content-based information retrieval techniques for this new multimedia format.

■ Conclusion

There is no doubt that content-based information retrieval technology is an emerging trend in digital library development and will be an important complement to traditional text-based retrieval technology. The ideal content-based retrieval system would semantically understand the information in a digital library and render the most desirable data to users. However, machine understanding of semantic information remains a great difficulty. Therefore, most current research projects, including those discussed in this paper, deal with the understanding and retrieval of lower-level, physical features of multimedia content. Certainly, as related disciplines such as computer vision and artificial intelligence continue to develop, more research will be done on retrieval based on higher-level features.

In addition, the growing variety of multimedia content in digital libraries has brought many new challenges. For instance, 3-D models have become important components of many digital libraries and museums. Content-based retrieval technology can be a good direction for this type of content, since the shapes of 3-D objects are often found more effectively when the user can compose the query visually. New content-based approaches need to be developed for these novel formats.

Furthermore, most CBIR projects today tend to be Web-based, whereas many projects in the 1990s were built as client applications. These Web-based tools will have a significant influence on digital libraries and repositories, as most of them are also Web-based. Particularly in the age of Web 2.0, some large digital repositories, such as Flickr for images and YouTube and Google Video for video, are changing people's daily lives. The implementation of content-based retrieval will be a great benefit to millions of users.

Since the purpose of content-based retrieval is to provide better search aids to end users, it is extremely important to focus on actual user needs and on how well users can use these new search tools. It is surprising how little usability testing has been done for most CBIR projects. Such testing should be incorporated into future research before content-based retrieval is widely adopted.

Bibliography

Bimbo, A., and P. Pala. 2006. Content-based retrieval of 3-D models. ACM Transactions on Multimedia Computing, Communications, and Applications 2, no. 1: 20–43.

Bustos, B., et al. 2005.
Feature-based similarity search in 3-D object databases. ACM Computing Surveys 37, no. 4: 345–387.

Chang, S., et al. 1997. VideoQ: An automated content based video search system using visual cues. In Proceedings of the 5th ACM International Conference on Multimedia, E. P. Glinert et al., eds. New York: ACM.

Datta, R., et al. 2005. Content-based image retrieval: Approaches and trends of the new age. In Proceedings of the 7th International Workshop on Multimedia Information Retrieval, in conjunction with the ACM International Conference on Multimedia, H. Zhang, J. Smith, and Q. Tian, eds. New York: ACM.

Ellis, D., and K. Lee. 2004. Minimal-impact audio-based personal archives. In Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences (CARPE), J. Gemmell et al., eds. New York: ACM.

Li, T., et al. 2003. A comparative study on content-based music genre classification. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, C. Clarke et al., eds. New York: ACM.

Li, J., and J. Wang. 2006. Real-time computerized annotation of pictures. In Proceedings of the 14th Annual ACM International Conference on Multimedia, K. Nahrstedt et al., eds. New York: ACM.

Marques, O., and B. Furht. 2002. Content-Based Image and Video Retrieval. Norwell, Mass.: Kluwer.

Moustakas, K., et al. 2005. MASTER-PIECE: A multimodal (gesture+speech) interface for 3D model search and retrieval integrated in a virtual assembly application. Proceedings of the eNTERFACE: 62–75.

Wang, J., et al. 2001. SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Transactions on Pattern Analysis and Machine Intelligence 23, no. 9: 947–963.

Wang, Z., et al. 2006. VFerret: Content-based similarity search tool for continuous archived video. In Proceedings of the 3rd ACM Workshop on Continuous Archival and Retrieval of Personal Experiences, K. Maze et al., eds. New York: ACM.

Wold, E., et al. 1996. Content-based classification, search, and retrieval of audio. IEEE MultiMedia 3, no. 3: 27–36.

Zhang, T., and C. Kuo. 2001. Content-Based Audio Classification and Retrieval for Audiovisual Data Parsing. Norwell, Mass.: Kluwer.