
Metadata Effectiveness in Internet 
Discovery: An Analysis of Digital 
Collection Metadata Elements and 
Internet Search Engine Keywords

Le Yang

Le Yang is Assistant Librarian at Texas Tech University; e-mail: le.yang@ttu.edu. © 2016 Le Yang, Attribution-
NonCommercial (http://creativecommons.org/licenses/by-nc/3.0/) CC BY-NC.

This study analyzed digital item metadata and keywords from Internet 
search engines to learn what metadata elements actually facilitate dis-
covery of digital collections through Internet keyword searching and how 
significantly each metadata element affects the discovery of items in a 
digital repository. The study found that keywords from Internet search 
engines matched values in eight metadata elements and resulted in land-
ing visits to the digital repository. Findings of the study indicate that three 
specific metadata elements are effective in enhancing discoverability of 
digital collections through Internet search engines, including Dublin Core 
metadata elements Title, Description, and Subject.

In recent decades, the rapid increase in the number of digital repositories 
has called for in-depth research on metadata quality evaluations. As Ann 
Windnagel claimed, the success of a digital repository depends largely 
on the quality of its metadata.1 Discussions about metadata evaluation 
and evaluation criteria have lasted for more than a decade. Relevant issues including 
metadata definition, primary purpose, and functionality perspectives have all been 
examined. Although several evaluation standards of metadata quality exist, there is 
no one answer to the question of how to evaluate metadata quality.2

In the literature, discoverability is the issue that receives the most attention in 
several metadata evaluation standards. As items are put into digital repositories 
instead of the catalog, they move out of the controlled environment of the library. The 
metadata records, instead of just having to work within a system search, also have to 
be effective for external indexing by search engines in order to be found. Challenges 
of deploying metadata to the Internet to facilitate information discovery and retrieval 
have persisted for nearly twenty years; however, few studies have been conducted 
to test the effectiveness of metadata on online resource discovery by search engines. 
No relevant studies on metadata effectiveness between digital collections and search 
engines are found in the current literature.

This article aims to examine the implemented metadata of a digital collection and 
to evaluate the metadata effectiveness on digital resource discovery by Internet search 
engines. Using Google Analytics on Texas Tech University Libraries’ digital collections, 
this study extracts the digital content that is frequently visited by search engine traffic, 
analyzes the valid keywords that are entered in search engines, and compares keywords 
against implemented values of metadata elements. Results of the study show that 
keywords from Internet search engines find matched values in certain metadata elements 
in the digital collections. The findings indicate that certain metadata elements are more 
effective in facilitating discovery of digital collections on Internet search engines, and 
those creating metadata records should pay extra attention to these elements.

doi:10.5860/crl.77.1.7 crl15-722
College & Research Libraries January 2016

Literature Review
Since the 1990s, a variety of digital libraries have been established to store and provide 
access to digital resources. Deploying cataloging rules for digital resources has led to 
favoring metadata as the best means of describing and discovering resources on the Web.3 

Mehdi Safari claimed that, as a result of the exponential growth of information 
resources on the Internet, metadata expanded beyond the traditional environment to 
address comprehensive issues involving effective resource description and discovery.4 
Jung-Ran Park emphasized that the rapid proliferation of digital repositories had called 
for research on metadata quality evaluation.5 F.O. Ininkaye, A.B.C. Robert, and B.A. 
Ojokoh echoed that the rapid increase of digital repositories motivated research on 
metadata quality and measurement.6 Windnagel stressed the importance of metadata 
quality in the usability of a digital repository and asserted that the success of a digital 
repository depends in large part on the quality of its metadata.7

Metadata is a heavily used term for which different definitions have been offered. In 
general, it may be defined as structured data about data, which “characterizes source 
data, describes their relationships, and supports the discovery and effective use of 
source data.”8 Alternatively, metadata can be defined as a structured set of elements 
that describe the information resources for the purpose of identification, discovery, 
and use of information.9 According to National Information Standards Organization 
(NISO), metadata is structured information that describes, explains, locates, or other-
wise makes easier to retrieve, use, or manage information resources.10

The conversation about metadata quality has lasted for more than a decade and has 
focused primarily on determining how general quality criteria might be established 
in an environment where diversity of metadata formats coexist.11 In 1997, William E. 
Moen, Erin L. Stewart, and Charles R. McClure summarized 23 evaluation criteria from 
literature on metadata quality and provided an in-depth discussion through examina-
tion of metadata records of 42 federal agencies. This comprehensive study specified four 
major sets of criteria including completeness, accuracy, consistency, and currency.12 In 
2004, Thomas R. Bruce and Diane I. Hillmann stressed that completeness of metadata 
could be measured by connections of individual objects to the parent collections, which 
reflect the functional purpose of metadata in resource discovery and use.13 In 2009, 
Park examined the state of the art of metadata quality evaluation and summarized that 
the quality of metadata reflects the degree to which the metadata performs the core 
bibliographic functions including discovery, use, provenance, currency, authenticity, 
and administration. In other words, Park concluded that the principal purpose of 
metadata is to a large degree related to that of traditional online library catalogs and 
databases in finding, identifying, selecting, and obtaining items.14

Discoverability has been a key issue in standards of metadata quality evaluation. 
As Sevim McCutcheon noted, ambitious digitization projects have made digitized books 
and articles increasingly discoverable and accessible to web users via keyword search-
ing.15 Tyler E. Phelps pointed out that the challenge of bringing metadata-based order 
to the Internet to facilitate information retrieval has persisted for nearly twenty years.16




The major purpose of any metadata, said Warwick Cathro, is to facilitate and improve 
the retrieval of information for information analysis and predictive decision-making 
processes.17 Stuart Weibel et al. specified that resource discovery is the most pressing 
need that metadata can satisfy, thus believing that a simpler metadata scheme, such 
as Dublin Core, is desired because such a schema presents the minimum number of 
metadata elements required to facilitate resource discovery on the Internet.18 In the 
report, NISO stressed that an important reason for creating descriptive metadata is to 
facilitate discovery of relevant information.19 Descriptive metadata provides a way to 
facilitate easy searching, retrieval, and management of information resources.20

Safari expressed concerns about metadata effectiveness, asking whether metadata 
provides a basis for increased effectiveness of retrieval by search engines.21 While 
there have been studies done to evaluate search engines, as Safari noted, few studies 
have been done to test the effectiveness of metadata on resource discovery by search 
engines. Moreover, these studies focus primarily on web page resources instead of 
digital collections, and research conducted by different scholars at different times 
has yielded inconsistent results.22

One study of how embedded metadata affects retrieval of web pages was 
conducted by Thomas P. Turner and Lise Brackbill, who found that the use of keyword 
meta tags significantly improved the retrievability of a web page.23 Jin 
Zhang and Alexandra Dimitroff also claimed that, since metadata techniques have 
been applied to various formats of digital resources, metadata should make Internet 
resources more organized, informative, searchable, and accessible, and the precision 
of information retrieved by search engines should improve substantially.24 In their 
research, Zhang and Dimitroff found that websites with Dublin Core metadata were 
returned by search engines significantly higher and faster in the search list than those 
sites without. Such findings demonstrate that implementation of metadata effectively 
improves the visibility of resources in search engine results lists.25

In 2001, Robin Henshaw and Edward J. Valauskas studied the effectiveness of Dublin 
Core metadata to see if the use of metadata enhanced information retrieval in a suite 
of specific search engines, and the results suggested that metadata did not play a sig-
nificant role in increasing the discovery of the resources.26 In a later study, based on 
the statistical analysis of six search engines, Safari found a similar result, concluding 
that using Dublin Core metadata elements did not improve the retrieval ranking of the 
web pages.27 In other words, deployment of Dublin Core metadata elements in web 
page resources did not affect retrievability of the resources, nor was it an impact factor 
for resource discovery on the web. Safari ascribed the result to the lack of consensus 
on implementing Dublin Core, suggesting that the metadata schema was not widely 
accepted and used by search engines.28

In addition to studies of the effectiveness of metadata in the discoverability of online 
resources, the rating of metadata elements is another aspect of metadata quality 
evaluation. Yen Bui and Jung-Ran Park studied repository 
metadata for quality evaluation and found that five essential metadata elements, 
regardless of the type of metadata schema, are Descriptor, Title, Subject, Type, and 
Identifier.29 In 2012, Phelps surveyed 236 national library websites and found that the 
five most common Dublin Core elements were Date, Title, Language, Creator, and 
Subject.30 In a more recent study of metadata usage in three math and science digital 
repositories, Windnagel found that the frequently used Dublin Core elements were 
Identifier, Title, Description, and Contributor, which were also required or strongly 
recommended during implementation.31 None of the research mentioned, however, 
connected the rankings of metadata elements with discoverability by Internet 
search engines.




Research Questions
Although discussions on metadata quality and evaluation criteria have lasted for almost 
two decades, few evaluations have been conducted. Instead of evaluating metadata of 
digital collections, existing research primarily evaluated the effectiveness of metadata on Web 
page resource discovery by search engines. Research on metadata ranking mainly 
considered metadata usage in digital collections. None of these studies established a meaningful 
connection between metadata element rankings and discoverability. 

To find out how implemented metadata elements facilitate discovery of digital col-
lections by search engines, this study employed a method to compare the values of 
metadata elements against keywords used by patrons in Internet search engines. By 
doing this, the study was able to evaluate the effectiveness of metadata by examining 
which implemented metadata elements have the highest keyword matching rates.

Research Question 1: Which metadata elements in the digital collection contain values that 
match the keywords patrons use when searching in Internet search engines?

Research Question 2: What is the keyword matching rate of each metadata element? Which 
metadata element is the most effective in facilitating discovery of digital collec-
tions by the Internet search engines?

Methodology
Texas Tech University Libraries use DSpace as the main platform for the digital and 
institutional repository. The repository contained 46 digital collections and a total of 
29,705 digital items as of July 31, 2014, according to the built-in statistics tool. The 
majority of the digital repository was electronic theses and dissertations, digitized 
documents and images, authenticated architecture materials, government documents 
collections, and research papers from other institutions on campus. Except for the one 
collection developed specifically for the Architecture Library using a combination of 
VRA Core and Dublin Core, the rest of digital collections were all implemented with 
Dublin Core metadata schema only. This study does not discuss the selection of metadata 
schemas or express a preference among them. Values of metadata elements in all associated 
items, regardless of the metadata schema used, were examined and compared 
with the search keywords.

The researcher used Google Analytics in this study for data retrieval, 
organic search sorting, and keyword analysis. Organic search is keyword searching 
behavior performed through all unpaid search engine mediums.32 Google Analytics 
records traffic from the major search engines, the search terms that were used, and links 
to the pages that the search terms pulled up. Google Analytics records this information 
only if the search resulted in users visiting the repository.

Differentiation of search engines was not discussed in this study; thus, organic 
searching keywords brought by different Internet search engines, including Google, 
Yahoo, Bing, and others, were treated equally. The selected time range for data extrac-
tion from the digital repository was between August 1, 2013, and July 31, 2014. Texas 
Tech University Libraries completed content migration to DSpace in July 2013; the 
selected time range provided a whole year’s data for the research.

According to figure 1, during the selected time period the total count of visit sessions 
to the digital repository was 73,341, and organic search accounted for 59 percent of 
the total, or 43,016 sessions. The data demonstrated that search engine traffic, 
in the selected time frame, was the main traffic source of the digital repository and 
provided a representative sample pool for the study. Traffic referred by Web domains, 
including Google Scholar, was categorized as referral traffic and sent 20,763 
visits (28%) to the digital repository in the selected time frame. Visits made by typing 
a URL into the browser or through bookmark tools were classified as direct traffic, which 
sent 8,815 visits (12%). Social media such as Facebook and Twitter made a 1 percent 
contribution to the traffic sources.

The total number of keywords that were searched during the selected time 
period and successfully helped the patron navigate to the digital repository was 
22,559, all of which were associated with one or multiple landing page URLs. The 
keywords’ association with landing page URLs indicated that the keywords searched 
in Internet search engines all resulted in landings on the digital repository. Google 
Analytics sorted the 22,559 keywords by search frequency by default, but 
it also provided other means of sorting, such as by the alphabetical 
order of keywords.

The ideal sample size for a population of 22,559 should be 378, with a 95 percent 
Confidence Level and a ±5 percent Margin of Error. Table 1 summarizes the basic 
parameters of the research.

FIGURE 1
Traffic Sources to Digital Repository (August 1, 2013–July 31, 2014) 
[Pie chart. Organic Search: 43,016 (59%); Referral: 20,763 (28%); Direct: 8,815 (12%); Social: 744 (1%); Other: 3 (0%). Total Visits: 73,341.]

TABLE 1
Parameters of the Research

Parameter                               Value     Notation
Keyword Population                      22,559    Population = 22,559
Confidence Level                        95%       Z = 1.96
Margin of Error (Confidence Interval)   ±5%       c = 0.05
Standard Deviation                      0.5       p = 0.5
Sample Size                             377.74    Sample Size = 378
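
The article does not state the formula behind the sample size. As a minimal sketch, assuming Cochran's formula with a finite population correction was used (an assumption, though it does reproduce the 377.74 in table 1), the calculation can be written as follows. The function name `sample_size` is illustrative only.

```python
import math


def sample_size(population: int, z: float = 1.96,
                margin: float = 0.05, p: float = 0.5) -> float:
    """Sample size via Cochran's formula with a finite population correction.

    Assumption: the article does not name its formula; this is one
    standard choice that matches the parameters listed in table 1.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Shrink n0 to account for the known, finite population size.
    return n0 / (1 + (n0 - 1) / population)


n = sample_size(22_559)
print(round(n, 2), math.ceil(n))
```

Rounding the result up to the next whole keyword gives the sample of 378 used in the study.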




To guarantee that samples were selected at random, the researcher used Random 
Integer Generator33 to generate 378 random numbers ranging from 1 to 22,559, then 
sorted the keywords in Google Analytics by alphabetical order and chose each num-
bered keyword by using the random numbers. The researcher followed the associated 
URLs with those keywords to each landing page in the digital repository and compared 
the keywords against the values in all metadata elements of that item.
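
The selection step can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the tool the researcher used: the placeholder keyword list is hypothetical, and `random.sample` draws positions without replacement, whereas the article does not say whether the online generator could return duplicate numbers.

```python
import random


def pick_sample(keywords, sample_size, seed=None):
    """Pick keywords at random positions of the alphabetically sorted list."""
    rng = random.Random(seed)
    ordered = sorted(keywords)  # alphabetical order, as in the study
    # 1-based positions, mirroring "random numbers ranging from 1 to 22,559".
    positions = sorted(rng.sample(range(1, len(ordered) + 1), sample_size))
    return [(pos, ordered[pos - 1]) for pos in positions]


# Hypothetical stand-in for the 22,559 keywords exported from Google Analytics.
placeholder_keywords = [f"keyword {i}" for i in range(22_559)]
sample = pick_sample(placeholder_keywords, 378, seed=2014)
print(len(sample), sample[0])
```

Passing a fixed `seed` makes the draw repeatable, which helps when the sample has to be revisited later.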

When a valid string in the keyword matched a text string in a metadata element, 
the metadata element was recorded in an Excel table as having matched the search keyword 
once. A valid string here was defined as a readable and meaningful word or phrase, 
regardless of language. For example, the 20,530th keyword (the number was retrieved 
from the 378 random numbers) in this study was “the space syntax methodology 翻译,” 
where “syntax methodology” and “翻译” were both regarded as valid words and 
compared against the metadata. 

Although DSpace provides full-text indexing for PDF files,34 this research did not 
include or analyze the text content in associated PDF files. Moreover, subfields of a 
qualified metadata element were counted under the main element. For example, an abstract of a 
digital item in the metadata element dc.description.abstract was examined and 
compared against the keywords, and was recorded in the results as dc.description.
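
The comparison in the study was carried out by hand. An automated approximation might look like the sketch below, which assumes a case-insensitive substring test (the article does not specify the exact matching rule) and takes the valid strings as already identified by a person. The item record and the function names are hypothetical.

```python
def main_element(field: str) -> str:
    """Roll a qualified field such as dc.description.abstract up to dc.description."""
    return ".".join(field.split(".")[:2])


def matched_elements(valid_strings, metadata):
    """Return the main elements whose values contain any valid string."""
    matched = set()
    for field, value in metadata.items():
        text = value.casefold()
        # Assumption: a match is a case-insensitive substring hit.
        if any(s.casefold() in text for s in valid_strings):
            matched.add(main_element(field))
    return matched


# Hypothetical item record; not taken from the repository.
item = {
    "dc.title": "The Space Syntax Methodology in Campus Design",
    "dc.description.abstract": "This thesis applies space syntax methodology to a campus.",
    "dc.creator": "Doe, Jane",
}
found = matched_elements(["syntax methodology", "翻译"], item)
print(sorted(found))
```

Each keyword contributes at most one match per main element, which mirrors recording an element "once" per keyword.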

After visiting each landing page and completing metadata-keyword comparison, the 
researcher recorded the results in an Excel table for further data sorting, calculating, 
and figure drawing. By doing this, the researcher was able to identify which metadata 
elements contained values that were matched with keywords and to calculate the 
matching rate of each matched metadata element. 

Results
The researcher examined each of the 378 sampled keywords in Google Analytics and visited 
the associated landing pages in the digital repository. Of the samples, 377 were valid 
keywords associated with valid landing pages, meaning that these keywords, searched through 
Internet search engines, matched values of certain metadata elements in the digital repository 
and resulted in visits to the associated landing pages. One keyword, however, was associated 
with an invalid URL directing to a page containing “Page Not Found” information. 
It is possible that this particular digital item was deleted from the digital repository 
after the visit was recorded by Google Analytics. However, one invalid sample in a 
sample of 378 from a population of 22,559 did not significantly affect the margin 
of error, and the result of the research was still sound. Table 2 summarizes the corrected 
information of the research.

Examining the 377 keyword samples, the research identified that in total six Dublin 
Core metadata elements and two VRA Core metadata elements contained values that 
matched the keywords from the Internet search engines. As shown in table 3, the six 
Dublin Core metadata elements were Title, Creator, Contributor, Description, Subject, 
and Identifier. The two VRA Core metadata elements were Agent and Location.

TABLE 2
Corrected Parameters of the Research

Parameter                                       Value
Keyword Population                              22,559
Sample Size                                     378
Valid Samples (Keywords with Valid URLs)        N=377
Invalid Samples (Keywords with Invalid URLs)    1
Confidence Level                                95%
Margin of Error (Confidence Interval)           ±5%
Standard Deviation                              0.5

Some of the metadata elements matched the keywords from the Internet search 
engines more frequently than others. Some keywords found matched values in only 
one metadata element, while others matched multiple elements. Regardless 
of the overlapping matches, Title was the metadata element that had the most keyword 
matches. Out of the sample, 279 (74.01%) keywords matched the values of dc.title 
(see table 3). The findings indicate that most keywords that were searched through the 
Internet search engines discovered matched values in the Title element in the digital 
repository and resulted in visits to the relevant items. 

As shown in table 3, the metadata element bringing the second most landing visits 
from the Internet search engines was Description, which received 208 (55.17%) keyword 
matches out of 377. The Subject element matched its values with 79 keywords (20.95%). 

Not many author name–related keywords brought landing visits from the Internet 
search engine, according to table 3. The Creator element only had 13 matches, and 
the Contributor element only had five. The Identifier element received two keyword 
matches, but the researcher noticed that these two values contained citation information 
with author names, which indeed matched the keywords. Two VRA Core elements 
had only three keyword matches in total, two of which, however, had overlapping 
matches with the values in dc.title.

TABLE 3
Metadata Elements, Matched Keywords, & Matching Rate (N=377)

Metadata Schema   Metadata Element   XML Element in Digital Repository   Matched Keywords   Matching Rate
Dublin Core       Title              dc.title                            279                74.01%
Dublin Core       Creator            dc.creator                          13                 3.45%
Dublin Core       Contributor        dc.contributor                      5                  1.33%
Dublin Core       Description        dc.description                      208                55.17%
Dublin Core       Subject            dc.subject                          79                 20.95%
Dublin Core       Identifier         dc.identifier                       2                  0.53%
VRA Core          Agent              vra.workagent                       2                  0.53%
VRA Core          Location           vra.worklocation                    1                  0.27%

FIGURE 2
Frequency of Metadata Elements Discovering Matched Keywords (N=377) 
[Combined chart of Matched Keywords (axis 0 to 300) and Matching Rate (axis 0.00% to 80.00%) for dc.title, dc.description, dc.subject, dc.creator, dc.contributor, dc.identifier, vra.workagent, and vra.worklocation; the plotted values are those listed in table 3.]




Figure 2 illustrates how frequently each metadata element matched values with 
the keywords from Internet search engines. As seen in figure 2, three metadata 
elements, dc.title, dc.description, and dc.subject, contained the most matched values 
with keywords from Internet search engines and caused most landing visits to the 
digital repository. The findings indicate that these three Dublin Core metadata 
elements played significant roles in facilitating discovery of digital collections through 
Internet search engines.

For the researcher to draw a more precise conclusion on the findings, table 4 pres-
ents how each metadata element performed in the study by considering parameters 
and deviations. Regarding Dublin Core’s Title element, for example, the researcher is 
now 95 percent certain that, among the total search engine traffic to Texas Tech Uni-
versity Libraries’ digital repository, between 69.62 percent (74.01% – 4.39%) and 78.40 
percent (74.01% + 4.39%) of keywords from Internet search engines discovered the 
matched values in the metadata element dc.title in the digital repository and resulted 
in landing visits.

Similar conclusions can also be made for the rest of the metadata elements. At a confidence 
level of 95 percent, between 50.19 and 60.15 percent of keywords from Internet 
search engines discovered matched values in the metadata element dc.description, and 
between 16.88 and 25.02 percent of keywords did so in dc.subject. Table 4 also shows 
that the matching rate for dc.creator was 3.45 percent (±1.83%) and for dc.contributor was 1.33 
percent (±1.15%). The remaining three elements had matching rates so small that a 
confidence interval was not considered applicable. 
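
The article does not give the formula for these margins of error. As a sketch, assuming a normal approximation for a proportion with a finite population correction (an assumption, though it agrees with the reported margins to two decimal places), the values can be recomputed from the match counts. The function name `margin_of_error` is illustrative only.

```python
import math


def margin_of_error(matches: int, sample: int, population: int,
                    z: float = 1.96) -> float:
    """Margin of error for a sample proportion, with finite population correction."""
    p = matches / sample
    standard_error = math.sqrt(p * (1 - p) / sample)
    # Correction for sampling without replacement from a finite population.
    fpc = math.sqrt((population - sample) / (population - 1))
    return z * standard_error * fpc


counts = {"dc.title": 279, "dc.description": 208, "dc.subject": 79,
          "dc.creator": 13, "dc.contributor": 5}
for element, matches in counts.items():
    rate = matches / 377
    moe = margin_of_error(matches, 377, 22_559)
    print(f"{element}: {rate:.2%} ± {moe:.2%}")
```

The interval for each element is then the matching rate minus and plus its margin, for example 74.01% ± 4.39% for dc.title.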

The analysis between metadata elements and keywords so far has not removed 
the overlapping matches from the calculations. The researcher also found that some 
keywords had matched values in multiple metadata elements, while some keywords 
matched only one metadata element or even none (see table 5).

According to table 5, ten keywords did not match values of any 
metadata element. Investigating how these searches resulted in landing visits, 
the researcher found that they were actually referred from Google Scholar and that those 
keywords contained a combination of digits and letters, such as the 17,461st keyword 
“related:zsti6noyraqj:scholar.google.com/.” These visits referred by Google Scholar 
ideally should be regarded as Referral Traffic35 as defined in Google Analytics. However, 
Google Analytics mistakenly regarded them as organic search engine traffic.

TABLE 4
Deviation of Each Metadata Element (N=377; Confidence Level=95%; Z=1.96; Population=22,559)

Metadata Element    Matched Keywords   Matching Rate   Margin of Error
dc.title            279                74.01%          ±4.39%
dc.description      208                55.17%          ±4.98%
dc.subject          79                 20.95%          ±4.07%
dc.creator          13                 3.45%           ±1.83%
dc.contributor      5                  1.33%           ±1.15%
dc.identifier       2                  0.53%           N/A
vra.workagent       2                  0.53%           N/A
vra.worklocation    1                  0.27%           N/A




More than half of the keywords (51.72%, 195 keywords) found matched values in exactly 
one metadata element, 123 keywords (32.63%) in two metadata elements, and 48 
(12.73%) in three. Table 5 also shows that a keyword from the Internet search engines 
matched at most four metadata element values in the digital repository. To find out 
how well each metadata element performed in keyword matching, either by itself 
or in combination with others, this research analyzed the findings in more detail.

Table 6 shows that, among the 195 keywords that discovered matched values in 
only one metadata element, the Dublin Core element dc.title had the most matched 
keywords, with 112 (57.44% of 195 and 29.71% of 377). The finding evidenced that 
the Title element of Dublin Core by itself, regardless of the overlapping matches in 
table 3, played a critical role in attracting keywords from Internet search engines. 
The Dublin Core element dc.description received the second most matched key-
words, with 65 (33.33%; 17.24%). These two Dublin Core elements were the most 
effective metadata elements in facilitating discovery when removing the influence 
of overlapping match. 

Of particular note is the Dublin Core Subject element dc.subject. Compared with 
table 3, where dc.subject had 79 matched keywords (20.95% of 377) because of overlapping 
matches, the same metadata element matched only three keywords (1.54%; 0.80%) by 
itself. For complete information on all metadata elements, see table 6.
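
Each single-element count is reported against two denominators: the 195 keywords that matched exactly one element (n) and the full valid sample of 377 (N). The short sketch below recomputes both rates from the counts in table 6; the helper name `rates` is illustrative only.

```python
# Keywords that matched exactly one metadata element (counts from table 6).
single_matches = {
    "dc.title": 112, "dc.creator": 9, "dc.contributor": 5,
    "dc.description": 65, "dc.subject": 3, "dc.identifier": 0,
    "vra.workagent": 1, "vra.worklocation": 0,
}
n = sum(single_matches.values())  # keywords matching exactly one element
N = 377                           # all valid sampled keywords


def rates(count: int) -> tuple[str, str]:
    """Matching rate within the single-element group and within the full sample."""
    return f"{count / n:.2%}", f"{count / N:.2%}"


for element, count in single_matches.items():
    within_group, within_sample = rates(count)
    print(f"{element}: {count} ({within_group} of n={n}; {within_sample} of N={N})")
```

The first rate shows an element's share among single-element matches, while the second keeps it comparable with the overall rates in table 3.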

TABLE 5
Number of Metadata Elements & Matched Keywords (N=377)

Number of Metadata Elements   Matched Keywords   Matching Rate
0                             10                 2.65%
1                             195                51.72%
2                             123                32.63%
3                             48                 12.73%
4                             1                  0.27%

TABLE 6
1 Metadata Element & Matched Keywords (n=195; N=377)

Metadata Element    Matched Keywords   Matching Rate (n=195)   Matching Rate (N=377)
dc.title            112                57.44%                  29.71%
dc.creator          9                  4.62%                   2.39%
dc.contributor      5                  2.56%                   1.33%
dc.description      65                 33.33%                  17.24%
dc.subject          3                  1.54%                   0.80%
dc.identifier       0                  0.00%                   0.00%
vra.workagent       1                  0.51%                   0.27%
vra.worklocation    0                  0.00%                   0.00%




Figure 3 presents an overview of how each metadata element 
independently performed with matched keywords. Compared with figure 2, 
it shows that, when overlapping matches are excluded from the analysis, 
metadata elements dc.title and dc.description were still of significance in facilitating 
discovery of digital items through Internet search engines. 

Table 7 presents the keywords that matched values in two metadata elements in 
combination. Metadata elements dc.title, dc.description, and dc.subject were 
still key factors in facilitating discovery because these three metadata elements, 
in combination with each other, had the most matched keywords from Internet search 
engines.

Table 8 displays the keywords that matched values in three metadata elements in 
combination. The data show that metadata elements dc.title, dc.description, 
and dc.subject, combined together, had the most matched keywords from Internet 
search engines.

Table 9 records the only keyword that discovered matched values from four metadata 
elements. Not surprisingly, metadata elements dc.title, dc.description, and dc.subject 
were, again, all among them.

FIGURE 3
1 Metadata Element & Matched Keyword (n=195; N=377)
[Combined chart of Matched Keywords (axis 0 to 120) and Matching Rate (axis 0.00% to 70.00%, for both n=195 and N=377) for dc.title, dc.description, dc.creator, dc.contributor, dc.subject, vra.workagent, dc.identifier, and vra.worklocation; the plotted values are those listed in table 6.]

TABLE 7
2 Metadata Elements & Matched Keywords (n=123; N=377)

Metadata Element   And Metadata Element   Matched Keywords   Matching Rate (n=123)   Matching Rate (N=377)
dc.title           dc.description         89                 72.36%                  23.61%
dc.title           dc.subject             26                 21.14%                  6.90%
dc.title           dc.creator             1                  0.81%                   0.27%
dc.title           vra.workagent          1                  0.81%                   0.27%
dc.title           vra.worklocation       1                  0.81%                   0.27%
dc.description     dc.subject             5                  4.07%                   1.33%




Discussion
The research results show that the randomly selected keywords, searched through Internet 
search engines, discover matched values in six Dublin Core metadata elements 
and two VRA Core metadata elements in the digital repository. The six Dublin Core 
metadata elements are Title, Description, Subject, Creator, Contributor, and Identifier; 
the VRA Core metadata elements are Agent and Location. Among these eight metadata 
elements, Dublin Core elements Title, Description, and Subject contain values that have 
the most matched keywords and result in the most landing visits to the digital repository. The 
finding indicates that these three metadata elements play the most significant roles in 
facilitating discovery of digital items by Internet search engines. 

The Dublin Core metadata element dc.title in total has 279 matched keywords and 
a 74.01 percent matching rate. Considering the margin of error, there is 95 percent 
confidence that, among the total search engine traffic to the digital repository, between 69.62 and 
78.40 percent of keywords from the Internet search engines match the value 
of dc.title and result in landing visits. Similar conclusions can be drawn for the other 
two Dublin Core metadata elements, Description and Subject. The range of keyword 
matching rate for dc.description is between 50.19 and 60.15 percent, and for dc.subject 
the range of matching rate is between 16.88 and 25.02 percent.

Dublin Core elements Creator and Contributor seem to play minor roles in facilitating 
discovery. At the same 95 percent confidence level, the range of matching rate for 
metadata element dc.creator is between 1.62 and 5.28 percent, and for dc.contributor 
is between 0.18 and 2.48 percent. Although the Dublin Core element Identifier has 
two matched keywords, the two metadata values contain citation information and the 
matched keywords are the author names. Moreover, the matching rate is too small 
for a confidence interval, which suggests that keywords finding 
matched values in the Identifier element are rare. 

If overlapping matches are removed from the analysis, however, and only the independent
matching rate of each metadata element is considered, the picture changes slightly.
Independently, dc.subject has only three matched keywords, and its matching rate
(1.54%; 0.80%) is lower than that of dc.creator (4.62%; 2.39%) and dc.contributor
(2.56%; 1.33%) (see figure 3). Granted, the Subject element cannot be ignored, because
in combination with other elements it still accounts for up to 25.02 percent of matched
keywords from Internet search engines (see figure 2). The two Dublin Core elements
dc.title and dc.description consistently have significantly higher matching rates than
the others (57.44%; 29.71% for dc.title and 33.33%; 17.24% for dc.description).
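
The distinction between overlapping and independent matches can be illustrated with a small sketch. The keyword records below are invented for illustration and are not the study's data, and the function name `count_matches` is hypothetical. A keyword counts toward an element's overall matches whenever it matches that element, and toward its independent matches only when that element is the sole match.

```python
from collections import Counter

# Hypothetical records: each sampled keyword mapped to the set of metadata
# elements whose values it matched. Invented data, for illustration only.
sample = {
    "keyword 1": {"dc.title", "dc.description"},
    "keyword 2": {"dc.title"},
    "keyword 3": {"dc.title", "dc.subject"},
    "keyword 4": {"dc.description"},
    "keyword 5": {"dc.creator"},
    "keyword 6": set(),  # matched no metadata value
}


def count_matches(records):
    """Return (overall, independent) match counts per metadata element."""
    overall = Counter()
    independent = Counter()
    for elements in records.values():
        # Overall: every element the keyword matched gets credit.
        overall.update(elements)
        # Independent: credit only when exactly one element matched.
        if len(elements) == 1:
            independent.update(elements)
    return overall, independent


overall, independent = count_matches(sample)
sample_size = len(sample)
for element in sorted(overall):
    overall_rate = 100 * overall[element] / sample_size
    independent_rate = 100 * independent[element] / sample_size
    print(f"{element}: overall {overall_rate:.2f}%, independent {independent_rate:.2f}%")
```

In this toy sample, dc.subject appears only alongside dc.title, so it has an overall match but no independent one. This is the same pattern that lowers the independent rate of dc.subject in the results above.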

Conclusion
The role of metadata in facilitating the discovery of online resources has been a key
issue in discussions of metadata quality evaluation. A few early studies tested the
correlation between the metadata of Web resources and Internet search engines, but
they yielded contrasting conclusions. Although a few quality-ranking studies of
metadata exist, no research was found in the literature on how effectively digital
collection metadata supports discovery through Internet search engines.

To learn which metadata elements in digital collections effectively facilitate the
discovery of resources through Internet keyword searching, and how significant a role
each element plays, the researcher designed a method for this study. Using data
retrieved from Texas Tech University Libraries’ digital repository in DSpace, the
researcher determined an ideal sample size for representing the whole population of
organic search keywords and search engine traffic. The researcher then compared each
randomly selected keyword against the associated digital item’s metadata values to
determine the matching rate for each metadata element.



18  College & Research Libraries January 2016

Based on the results, it can be concluded that the three most significant metadata
elements for enhancing the discovery of digital repository items through Internet search
engines are Title, Description, and Subject, which in this case are the Dublin Core
elements dc.title, dc.description, and dc.subject. The results remind librarians that,
when creating metadata for digital items, they should pay particular attention to
these three elements if the institution aims to attract more traffic from Internet
search engines. For example, a metadata librarian can enter more detailed and precise
information in the Description field, which can enhance the discoverability of the
digital items in search engines. Moreover, the results can also lead to a
reclassification within metadata schemas that groups metadata elements by their
functionality. Based on the findings, for example, scholars can place the three Dublin
Core elements in a searchability or discoverability group. Reclassifying metadata
elements by functionality can help professionals better understand the function of each
specific element when implementing metadata.

Academic libraries have invested much time and effort in developing digital collections
and institutional repositories, so insight into how these digital assets are discovered
by search engines is both necessary and helpful. The findings of this research fill a
gap in the study of how effectively metadata facilitates the discovery of digital
repository items through Internet search engines, and they contribute to the literature
on metadata quality and metadata evaluation by pointing out a new direction for
relevant research. Further studies, however, are still needed to address the limitations
of this research.

The data for this study were retrieved from only one digital repository. Further research
needs to be conducted on more digital repositories to see whether a similar conclusion
can be reached. A more specific analysis of targeted metadata elements can also be
conducted in future research to reach more precise findings. For example, researchers
can design a study that uses designated keywords and metadata element values, so as to
test the keywords and observe whether they match and lead to the targeted metadata
elements. A comparison study can also be carried out to test the correlation of the same
values in different metadata elements. In doing so, the researcher can observe whether
retrievability is affected when a designated metadata value is placed in one metadata
element instead of another. Such a comparison study should also include both digital
items without PDFs and digital items with PDFs, to gauge the effects of the PDF full-text
indexing features provided by the digital repository system and the search engines.

Notes

 1. Ann Windnagel, “The Usage of Simple Dublin Core Metadata in Digital Math and Science 
Repositories,” Journal of Library Metadata 14, no. 2 (2014): 77–102.

 2. Diane I. Hillmann, “Metadata Quality: From Evaluation to Augmentation,” Cataloging & 
Classification Quarterly 46, no. 1 (2008): 65–80; R.J. Robertson, “Metadata Quality: Implications for 
Library and Information Science Professionals,” Library Review 54, no. 5 (2005): 295–300.

 3. Matthew Beacom, Crossing a Digital Divide: AACR2 and Unaddressed Problems of Networked 
Resources, available online at http://eric.ed.gov/?id=ED454861 [accessed 15 August 2014]; Ann 
Huthwaite, “AACR2 and Its Place in the Digital World: Near-Term Solutions and Long-Term 
Direction” (Nov. 2000), available online at http://files.eric.ed.gov/fulltext/ED454865.pdf [ac-
cessed 15 August 2014]; Amy K. Weiss and Timothy V. Carstens, “The Year’s Work in Cataloging, 
1999,” Library Resources & Technical Services 45, no. 1 (2001): 47–58; Mehdi Safari, “Search Engines 
and Resource Discovery on the Web: Is Dublin Core an Impact Factor?” Webology 2, no. 2 (2005), 
available online at www.webology.org/2005/v2n2/a13.html [accessed 15 August 2014].

 4. Safari, “Search Engines and Resource Discovery on the Web.”

 5. Jung-Ran Park, “Metadata Quality in Digital Repositories: A Survey of the Current State
of the Art,” Cataloging & Classification Quarterly 47, no. 3/4 (2009): 213–28.




 6. F.O. Isinkaye, A.B.C. Robert, and B.A. Ojokoh, “An Evaluation of Metadata Integrity in 
Textual Documents,” Journal of Library Metadata 12, no. 1 (2012): 1–14.

 7. Windnagel, “The Usage of Simple Dublin Core Metadata,” 77–78.

 8. Kathleen Burnett, Kwong Bor Ng, and Soyeon Park, “A Comparison of the Two Traditions
of Metadata Development,” Journal of the American Society for Information Science 50, no. 13 (1999):
1209–17.

 9. Kuang-Hwei Lee–Smeltzer, “Finding the Needle: Controlled Vocabularies, Resource Dis-
covery, and Dublin Core,” Library Collections, Acquisitions, and Technical Services 24, no. 2 (2000): 
205–15.

 10. National Information Standards Organization, “Understanding Metadata” (Bethesda, Md.: 
NISO Press, 2004), available online at www.niso.org/publications/press/UnderstandingMetadata.
pdf [accessed 15 August 2014].

 11. Hillmann, “Metadata Quality,” 67.

 12. William E. Moen, Erin L. Stewart, and Charles R. McClure, “The Role of Content Analysis
in Evaluating Metadata for the U.S. Government Information Locator Service (GILS): Results
from an Exploratory Study” (1997), available online at
http://digital.library.unt.edu/ark:/67531/metadc36312/ [accessed 15 August 2014].

 13. Thomas R. Bruce and Diane I. Hillmann, “The Continuum of Metadata Quality: Defining, 
Expressing, Exploiting,” in Metadata in Practice, eds. D. Hillmann and E.L. Westbrooks (Chicago: 
American Library Association, 2004).

 14. Park, “Metadata Quality in Digital Repositories,” 215.

 15. Sevim McCutcheon, “Keyword vs Controlled Vocabulary Searching: The One with the
Most Tools Wins,” The Indexer 27, no. 2 (2009): 62–65.

 16. Tyler E. Phelps, “An Evaluation of Metadata and Dublin Core Use in Web-based
Resources,” Libri 62, no. 4 (2012): 326–35.

 17. Warwick Cathro, “Metadata: An Overview” (1997), available online at
www.nla.gov.au/openpublish/index.php/nlasp/article/view/1019/1289 [accessed 15 August 2014].

 18. Stuart Weibel, Jean Godby, Eric Miller, and R. Daniel, “OCLC/NCSA Metadata Workshop
Report” (1995), available online at http://ci.nii.ac.jp/naid/10011891419 [accessed 15 August 2014].

 19. NISO, “Understanding Metadata.”

 20. Isinkaye, Robert, and Ojokoh, “An Evaluation of Metadata Integrity in Textual Documents.”

 21. Safari, “Search Engines and Resource Discovery on the Web.”

 22. Ibid.

 23. Thomas P. Turner and Lise Brackbill, “Rising to the Top: Evaluating the Use of the HTML
Meta Tag to Improve Retrieval of World Wide Web Documents through Internet Search
Engines,” Library Resources & Technical Services 42, no. 4 (1998): 258–71.

 24. Jin Zhang and Alexandra Dimitroff, “Internet Search Engines’ Response to Metadata Dublin 
Core Implementation,” Journal of Information Science 30, no. 4 (2004): 310–20.

 25. Ibid.

 26. Robin Henshaw and Edward J. Valauskas, “Metadata as a Catalyst: Experiments with
Metadata and Search Engines in the Internet Journal, First Monday,” Libri 51, no. 2 (2001): 86–101.

 27. Safari, “Search Engines and Resource Discovery on the Web.”

 28. Ibid.

 29. Yen Bui and Jung-Ran Park, “An Assessment of Metadata Quality: A Case Study of the
National Science Digital Library Metadata Repository” (2006), available online at
www.academia.edu/download/31045967/bui_2006.pdf [accessed 15 August 2014].

 30. Phelps, “An Evaluation of Metadata and Dublin Core Use in Web-based Resources.”

 31. Windnagel, “The Usage of Simple Dublin Core Metadata.”

 32. Google, “Traffic Source Dimensions,” available online at
https://support.google.com/analytics/answer/1033173?hl=en [accessed 15 August 2014].

 33. Random Integer Generator, available online at
www.random.org/integers/?num=378&min=1&max=22559&col=10&base=10&format=html&rnd=new
[accessed 15 August 2014].

 34. DuraSpace, “Configure Full Text Indexing,” available online at
https://wiki.duraspace.org/display/DSPACE/Configure+full+text+indexing [accessed 15 August 2014].

 35. Google, “Referral Traffic,” available online at
https://support.google.com/analytics/answer/1247839?hl=en&ref_topic=1631856 [accessed 15 August 2014].