EDITORIAL BOARD THOUGHTS

Seeing through Vocabularies
Kevin Ford

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2020
https://doi.org/10.6017/ital.v39i2.12367

Kevin Ford (kevinford@loc.gov) is Librarian, Linked Data Specialist in the Library of Congress’s
Network Development and MARC Standards Office. He works on the Library’s Bibframe Initiative,
and similar projects, such as MADS/RDF, and is a member of the ITAL Editorial Board. The ideas
and opinions expressed here are those of the author and do not necessarily reflect those of his
employer.

“Ontologies” are popular in library land. “Vocabularies” are popular too, but it seems that the
library profession prefers “ontologies” over “vocabularies” when it comes to defining classes and
properties that attempt to encapsulate some realm of knowledge. Bibframe, MADS/RDF, BIBO,
PREMIS, and FRBR are well-known “ontologies” in use in the library community.1 They were
defined either by librarians or to be used mainly in the library space, or both. SKOS, FOAF, Dublin
Core, and Schema are well-known “vocabularies.”2 They are used widely by libraries, though none
were created by librarians or specifically for library use. In all cases, those ontologies and
vocabularies were created for the very purpose of publication for broader use, which is one of the
primary objectives behind creating one: to define a common set of metadata elements to facilitate
the description and sharing of data within a group or groups of users.

Ontologies and vocabularies are common when working with RDF (Resource Description
Framework), a very simple data model in which information is expressed as a series of triple
statements, each consisting of three parts: a subject, a predicate, and an object. The types of
ontologies and vocabularies referred to here are in fact defined using RDF—Thing A is a Class and
Thing Z is a Property. Those using any given ontology or vocabulary employ the defined classes
and properties to further describe their Things, for lack of a better word.

It is useful to provide an example. The first block of triples below represents Class and Property
definitions in RDF Schema (RDFS), which provides some very basic means to define classes and
properties and some relationships between them, such as the domains and ranges for properties.
The second block is instance data.

ontovoc:Book rdf:type rdfs:Class
ontovoc:authoredBy rdf:type rdf:Property
ontovoc:authorOf rdf:type rdf:Property

ex:12345 rdf:type ontovoc:Book
ex:12345 ontovoc:authoredBy ex:abcde

ontovoc:Book is defined as a Class and ontovoc:authoredBy is defined as a Property. Using
those declarations, it is then possible to assert that ex:12345, which is an identifier, is of type
ontovoc:Book and was authored by ex:abcde, an identifier for the author. Is the first block (the
definitions) an “ontology” or a “vocabulary”? Putting the question aside for now, air quotes (in
this case, literal quotes) have been employed around “ontologies” and “vocabularies” to suggest
that these are more terms of art than technical distinctions, though it must also be acknowledged
that there is a technical distinction to be made.

Ontologies in the RDF space frequently, if not always, use classes and properties from the Web
Ontology Language (known as OWL) to define a specific realm’s classes and properties and how
they relate to each other within that realm of knowledge. This is because OWL is a more
expressive definition language than basic RDFS. Using OWL, and considering the example above,
ontovoc:authoredBy could be defined as an inverse of ontovoc:authorOf.

ontovoc:authoredBy owl:inverseOf ontovoc:authorOf

In this way, and given the little instance data above (the two triples that begin ex:12345), it is
then possible to infer the following bit of knowledge:

ex:abcde ontovoc:authorOf ex:12345
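To make the inference concrete, here is a minimal Python sketch. It assumes triples are held as plain (subject, predicate, object) tuples of prefixed names rather than in a real RDF library, and it reuses the illustrative ontovoc: and ex: names from the example. Only the owl:inverseOf rule is applied, in both directions, because an inverse relationship holds either way round.

```python
# Minimal sketch: triples as (subject, predicate, object) tuples of prefixed names.
# ontovoc: and ex: are the illustrative prefixes from the example, not a real vocabulary.
Triple = tuple[str, str, str]

definitions: set[Triple] = {
    ("ontovoc:Book", "rdf:type", "rdfs:Class"),
    ("ontovoc:authoredBy", "rdf:type", "rdf:Property"),
    ("ontovoc:authorOf", "rdf:type", "rdf:Property"),
    ("ontovoc:authoredBy", "owl:inverseOf", "ontovoc:authorOf"),
}

instance_data: set[Triple] = {
    ("ex:12345", "rdf:type", "ontovoc:Book"),
    ("ex:12345", "ontovoc:authoredBy", "ex:abcde"),
}


def infer_inverses(schema: set[Triple], data: set[Triple]) -> set[Triple]:
    """Return the new triples entailed by owl:inverseOf declarations in `schema`."""
    # An inverse relationship works in both directions, so record each pair both ways.
    pairs = {(p, q) for (p, rel, q) in schema if rel == "owl:inverseOf"}
    pairs |= {(q, p) for (p, q) in pairs}
    # For every statement (s p o) where p has an inverse q, derive (o q s).
    derived = {(o, q, s) for (s, p, o) in data for (p2, q) in pairs if p == p2}
    return derived - data  # keep only knowledge that was not already stated


print(infer_inverses(definitions, instance_data))
# {('ex:abcde', 'ontovoc:authorOf', 'ex:12345')}
```

Nobody asserted the authorOf triple; it falls out of the single owl:inverseOf declaration. A production system would use an RDF library and an OWL reasoner instead of hand-written set comprehensions.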

Now that the owl:inverseOf triple/declaration has been added to the definitions, it’s worth re-asking:
Do the definitions represent an “ontology” or a “vocabulary”?

A purist might answer “not an ontology,” but only because those statements have not been
combined in a document, which itself has been given a URI and declared to be an owl:Ontology.
That’s the actual OWL Class that says, “This is an OWL Ontology.” But let’s say those statements
had been added to a document published at a URI and declared to be an owl:Ontology. Is it an
ontology now? Perhaps in a strict sense the answer is “yes.” But in a practical sense few would
view those four declarations, wrapped neatly in a document that has been given a URI and called
an Ontology, as an “ontology.” It doesn’t quite rise to the occasion, because “ontologies” almost
always have a broader scope and employ more formal semantics. That makes the label, more often
than not, a term of art rather than a real technical distinction.

Yet, based on the same narrow definition (a published document declaring itself to be an
owl:Ontology) combined with a far more extensive set of class and property definitions with
defined relationships between them, it is possible to describe FOAF as an ontology.3 But it is
widely known as, and understood as, a “vocabulary.” (There is also an experimental version of
Schema as OWL.4)

And that gets to the crux of the issue in many ways. Putting aside the technical distinction that can
be argued to identify something as an “ontology” versus a “vocabulary,” there are non-technical
semantics at work here—what was earlier described as a “term of art”—about when, how, and
why something is deemed an “ontology” versus a “vocabulary.” The library community appears to
think of its creations as “ontologies” and not “vocabularies,” even when the documentation
tends to avoid the word “ontology.” For example, the opening sentence of the Bibframe and
MADS/RDF documentation very clearly introduces each as a “vocabulary,” as does FRBR in RDF.5
On the surface they may be presented as “vocabularies,” which they are of course, but despite this
prominent self-declaration they are not seen in the same light as FOAF or Schema but instead as
something more exacting, which they also are. It is worth contemplating why they are viewed
principally as “ontologies” and to examine whether this has been beneficial. Perhaps the ideas
behind designating something a “vocabulary” are, in fact, more in line with the way libraries
operate, whereas “ontologies” represent an ideal (and who doesn’t set their sights on the ideal?),
striving toward which only exposes shortcomings and sows confusion.

The answer to “why” is historical and probably derives from a combination of lofty thinking,
traditional standards practices, and good ol’ misunderstanding. Traditional standards practices
favor more formal approaches. Libraries’ decades-long experience with XML and XML Schema
contributed significantly to this mindset. XML Schema provides a way to describe the precise
construction of an XML document and it can then be used to validate the XML document. XML
Schema defines what elements and attributes are permitted in the XML document and frequently
dictates their order. It can further constrain the values of an element or attribute to a select list of
options. In many ways, XML Schema was the very expression of metadata quality control.
Librarians swooned. With the right controls and technology in place, it was impossible to produce
poor, variable metadata.
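The closed-world, validation mindset that XML Schema encourages can be sketched in a few lines. The checker below is a hypothetical stand-in, not XSD and not any library’s schema API: it uses Python’s standard xml.etree.ElementTree to enforce a fixed element order and an enumerated list of language codes (a small sample assumed for illustration), and it reports anything else as an error.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules standing in for an XML Schema (this is not XSD itself).
EXPECTED_ELEMENTS = ["title", "name", "language"]       # permitted elements, in this order
APPROVED_LANGUAGE_CODES = {"eng", "fre", "ger", "spa"}  # sample enumerated values


def validate(record_xml: str) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    root = ET.fromstring(record_xml)
    problems = []
    # Closed world: exactly these child elements, in exactly this order.
    tags = [child.tag for child in root]
    if tags != EXPECTED_ELEMENTS:
        problems.append(f"expected elements {EXPECTED_ELEMENTS}, found {tags}")
    # Controlled value: the language must come from the approved list.
    code = root.findtext("language")
    if code is not None and code not in APPROVED_LANGUAGE_CODES:
        problems.append(f"language {code!r} is not an approved code")
    return problems


good = ("<record><title>Moby Dick</title><name>Melville, Herman</name>"
        "<language>eng</language></record>")
bad = ("<record><name>Melville, Herman</name><title>Moby Dick</title>"
       "<language>English</language></record>")

print(validate(good))      # []
print(len(validate(bad)))  # 2
```

Anything outside the rules is flagged rather than accepted, which is the behavior librarians came to associate with a schema.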

In the case of semantic modelling, OWL is certainly a more formal approach. It’s founded in
description logics whose expressions take the form of occult-like mathematics, at least as viewed
by a librarian with a humanities background. OWL can be used to declare domains and ranges for
properties. One can also designate a property as a Datatype Property, meaning it takes a literal
value, such as a string or a date, or an Object Property, which means it will reference another
RDF resource as its object. But these declarations are actually more about inferencing—deriving
information by applying the ontology against some instance data—and not about restrictions,
constraints, or validation. To be clear, there are ways to apply restrictions in OWL—“wine can be
either red or white”—but this is a form of advanced OWL modelling that is not well understood
and not often implemented, and virtually never in ontologies designed by librarians. Conversely,
indicating a domain for a property, for example, is easy, relatively straightforward, and seductive
because it gives the appearance that the property can only be used with resources of a specific
class. Consider: The domain of ontovoc:authoredBy is ontovoc:Book. That does not mean
that ontovoc:authoredBy can only be used with an ontovoc:Book resource. It means that
whatever resource uses ontovoc:authoredBy must therefore be an ontovoc:Book. Defining
that domain for that property is not restricting its use only to books; it allows one to derive the
additional knowledge that the thing it is used with must be a book even if it doesn’t identify itself
as one. This may seem like a subtle distinction, or even like tortured logic; if it does, that
may suggest that one’s point of view, one’s mindset, favors constraints, restrictions, and
validations.
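The difference between inference and validation can be sketched in Python, again treating triples as plain tuples. infer_domains applies the RDFS domain rule (rule rdfs2 in the RDF 1.1 Semantics); check_domains is a hypothetical closed-world check of the kind a validation mindset expects, which RDFS itself does not define. The resource ex:67890 and the class ex:Map are invented for illustration.

```python
Triple = tuple[str, str, str]

schema: set[Triple] = {("ontovoc:authoredBy", "rdfs:domain", "ontovoc:Book")}

# Hypothetical data: a map (not a book) described with ontovoc:authoredBy.
data: set[Triple] = {
    ("ex:67890", "rdf:type", "ex:Map"),
    ("ex:67890", "ontovoc:authoredBy", "ex:abcde"),
}


def domains_of(schema: set[Triple]) -> set[tuple[str, str]]:
    """Collect (property, class) pairs from rdfs:domain declarations."""
    return {(p, c) for (p, rel, c) in schema if rel == "rdfs:domain"}


def infer_domains(schema: set[Triple], data: set[Triple]) -> set[Triple]:
    """RDFS reading: if P has domain C and (s P o) holds, then s rdf:type C."""
    derived = {(s, "rdf:type", c)
               for (s, p, _o) in data
               for (p2, c) in domains_of(schema) if p == p2}
    return derived - data


def check_domains(schema: set[Triple], data: set[Triple]) -> set[Triple]:
    """Validation reading (NOT RDFS semantics): flag subjects not already typed C."""
    return {(s, p, c)
            for (s, p, _o) in data
            for (p2, c) in domains_of(schema)
            if p == p2 and (s, "rdf:type", c) not in data}


print(infer_domains(schema, data))  # {('ex:67890', 'rdf:type', 'ontovoc:Book')}
print(check_domains(schema, data))  # {('ex:67890', 'ontovoc:authoredBy', 'ontovoc:Book')}
```

Under RDFS the map simply acquires a second type, Book, and nothing is rejected. Only the hypothetical check_domains behaves like the restriction a domain declaration appears to impose.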

And that’s OK. That’s library training and conditioning, completely reinforced in our daily work.
It’s what has been taught in library schools for decades and practiced by library professionals
even longer. Names should be entered “last name, first name” and any middle initial, if known,
included. The data in this field should only be a three-character language code from this approved
list of language codes. These rules and the consistency resulting from these rules are what make
library data so often very high quality. Google loves MARC records from our community for this
very reason.

When creating a model or metadata scheme with an eye to data quality, and wishing to exert strong
control at the definition level, librarians naturally gravitate to a more formal means of defining
a model, especially one that seems to promise constraints. So, despite these models describing
themselves at a high level as vocabularies, the models themselves employ a considerable amount of
OWL at the technical level, which becomes the focus of any users wishing to implement the model.
Users comprehend these models as something more than a vocabulary
and therefore view the model through this more complex lens. Unfortunately, because OWL is
poorly understood (sometimes by creators and sometimes by users, and sometimes by both), this
leads to various problems. On the one hand, creators and users believe there are technical
restrictions or constraints where there are, in fact, none. When this happens, the “constraint” is
either identified as a problem (“Consider removing the range for this property”) or—and this is
more damaging—the property (read: model/vocabulary/ontology) is avoided. Even when it is
recognized that the “constraint” is not a real restriction (just a means to infer knowledge), forging
ahead can generate new issues. With a domain or range declaration, for example, using the property
regardless can result in inaccurate, imprecise, or simply undesirable inferences. Most of the
currently open “issues” (about 50 at the time of writing) about Bibframe follow a basic pattern: 1)
there is a declaration about this Property or this Class that makes it difficult to use because of how
it has been defined with OWL; 2) we cannot really use it presently because it would cause
potential inferencing issues; 3) consider altering the OWL definitions.6 Pursuing an (OWL)
ontology, while formal and seemingly comforting because it feels a little like constraining the
metadata schema, can result in confusion and a lack of adoption. Given that vocabularies and
ontologies are developed and published to encourage users to describe their data in a way that
fosters wide consumption by others, this is unfortunate to say the least.
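The inaccurate inferences mentioned above can be illustrated with a range declaration. Assume, hypothetically, that ontovoc:authoredBy were given the range ontovoc:Person, and that a library records a corporate author, ex:org1 (also invented), typed as ex:Organization. Applying the RDFS range rule (rdfs3) in this minimal Python sketch does not reject the data; it concludes that the organization is a person.

```python
Triple = tuple[str, str, str]

# Hypothetical declaration: the range of ontovoc:authoredBy is ontovoc:Person.
schema: set[Triple] = {("ontovoc:authoredBy", "rdfs:range", "ontovoc:Person")}

# Hypothetical data: a book whose author is an organization (a corporate body).
data: set[Triple] = {
    ("ex:12345", "ontovoc:authoredBy", "ex:org1"),
    ("ex:org1", "rdf:type", "ex:Organization"),
}


def infer_ranges(schema: set[Triple], data: set[Triple]) -> set[Triple]:
    """RDFS range rule: if P has range C and (s P o) holds, then o rdf:type C."""
    ranges = {(p, c) for (p, rel, c) in schema if rel == "rdfs:range"}
    derived = {(o, "rdf:type", c)
               for (_s, p, o) in data
               for (p2, c) in ranges if p == p2}
    return derived - data


print(infer_ranges(schema, data))  # {('ex:org1', 'rdf:type', 'ontovoc:Person')}
```

Unless Organization and Person are also declared disjoint (with owl:disjointWith), a reasoner reports no inconsistency at all; the undesirable triple just becomes part of the data. That is why the usual remedy is to change the definition rather than the data.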

It is notable that SKOS, FOAF, Dublin Core, and Schema have very different scopes and potentially
much wider user bases than the more library-specific ontologies (Bibframe, MADS/RDF, BIBO,
etc.). There is something to be learned here: the smaller the domain, the more effective an
ontology might be; the larger the universe, the better suited a more general approach may be. It is
further true that FOAF, Dublin Core, and Schema define specific domains and ranges for many of their
properties, but they have strived for clarity and simplicity. The creators of Schema, for example,
eschewed the formal semantics behind RDFS and OWL and redefined domain and range to better
match their needs and (perhaps unexpectedly) most users’ automatic understanding.7 What is
generally true is that each of the “vocabularies” approached the creation and definition of its
model so as to minimize the use of formal semantics, and promoted this as a feature. In this way,
they limited or removed altogether the actual or psychological barriers to adoption. Their offering
was more accessible, less fussy. Bearing in mind the differences in scale and scope, they have been
rewarded with a wider adopter base and passionate advocates.
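By contrast, a Schema.org-style hint can be read as advisory. schema:author, schema:rangeIncludes, schema:Person, and schema:Organization are real Schema.org terms (values of schema:author are expected to be a Person or an Organization); the data and the warning-only checker are a hypothetical sketch of how a consumer might use such hints, not anything Schema.org prescribes. Nothing is inferred and nothing is rejected.

```python
Triple = tuple[str, str, str]

# Schema.org-style hints: several acceptable value types, none of them enforced.
hints: set[Triple] = {
    ("schema:author", "schema:rangeIncludes", "schema:Person"),
    ("schema:author", "schema:rangeIncludes", "schema:Organization"),
}

# Hypothetical data: one typed corporate author, one untyped author.
data: set[Triple] = {
    ("ex:12345", "schema:author", "ex:org1"),
    ("ex:org1", "rdf:type", "schema:Organization"),
    ("ex:67890", "schema:author", "ex:xyz"),
}


def advisory_warnings(hints: set[Triple], data: set[Triple]) -> list[str]:
    """Return human-readable warnings; never adds triples and never rejects data."""
    expected: dict[str, set[str]] = {}
    for p, rel, c in hints:
        if rel == "schema:rangeIncludes":
            expected.setdefault(p, set()).add(c)
    types: dict[str, set[str]] = {}
    for s, p, o in data:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    warnings = []
    for s, p, o in sorted(data):
        # Warn only when the value has none of the expected types.
        if p in expected and not (types.get(o, set()) & expected[p]):
            warnings.append(f"{s} {p} {o}: expected one of {sorted(expected[p])}")
    return warnings


print(advisory_warnings(hints, data))
```

The corporate author passes because Organization is among the expected types, while the untyped author produces a single warning rather than an inferred type or a rejected record.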

The decision to create a “vocabulary” or an “ontology” is both a technical one and a political one,
and the two must be in alignment. It’s a mindset and it is a statement. It is entirely possible to define the
model at a technical level using OWL, making it by definition an ontology, but to have it be
perceived, and used, as a vocabulary because it is flexible and not strictly defined. Likewise, it is
not enough to call something a vocabulary if it is, in reality, a model burdened with formal
semantics that is nonetheless expected to be adopted and used widely. If the objective is to fashion a
(pseudo?) restrictive metadata set with rules that inform its use, and which is strongly bonded
with a specific community, develop an “ontology,” but recognize that this may result in confusion
and lack of uptake. If, however, the desire is to cultivate a metadata element set that is flexible,
readily usable, and positioned to grow in the future because it employs fewer rules and formal
semantics, create a “vocabulary.” That’s really what is being communicated when we encounter
ontologies and vocabularies. Interestingly, the political difference between “vocabulary” and
“ontology” appears, in fact, to be understood by librarians: library models self-identify as
“vocabularies.” But once past those introductory remarks, the truth is exposed quickly in the
widespread use of OWL, revealing beyond doubt that it is not a flexible, accommodating
vocabulary but a strictly defined model. To dispense with the air quotes: as librarians we’re
creating ontologies and calling them vocabularies. We really want to be creating vocabularies that
are ontologies in name only.

ENDNOTES

1 “Bibframe Ontology,” Library of Congress, accessed May 21, 2020,
http://id.loc.gov/ontologies/bibframe.html; “MADS/RDF (Metadata Authority Description
Schema in RDF),” Library of Congress, accessed May 21, 2020,
http://id.loc.gov/ontologies/madsrdf/v1.html; “Bibliographic Ontology Specification,” The
Bibliographic Ontology, accessed May 21, 2020, http://bibliontology.com/; “PREMIS 3
Ontology,” PREMIS Editorial Committee, accessed May 21, 2020,
http://id.loc.gov/ontologies/premis3.html; Ian Davis and Richard Newman, “Expression of
Core FRBR Concepts in RDF,” accessed May 21, 2020, https://vocab.org/frbr/.

2 Alistair Miles and Sean Bechhofer, editors, “SKOS Simple Knowledge Organization System
Reference,” W3C, accessed May 21, 2020, https://www.w3.org/TR/skos-reference/; Dan
Brickley and Libby Miller, “FOAF Vocabulary Specification 0.99,” accessed May 21, 2020,
http://xmlns.com/foaf/spec/; “DCMI Metadata expressed in RDF Schema Language,” Dublin
Core™ Metadata Initiative, accessed May 21, 2020,
https://www.dublincore.org/schemas/rdfs/; “Welcome to Schema.org,” Schema.org, accessed
May 21, 2020, http://schema.org/.

3 “FOAF Ontology,” xmlns.com, accessed May 21, 2020, http://xmlns.com/foaf/spec/index.rdf.

4 See “OWL” at “Developers,” schema.org, accessed May 21, 2020,
https://schema.org/docs/developers.html.

5 See “Bibframe Ontology” and “MADS/RDF (Metadata Authority Description Schema in RDF)”
above.

6 “Issues,” Bibframe Ontology at GitHub, accessed May 21, 2020,
https://github.com/lcnetdev/bibframe-ontology/issues.

7 R.V. Guha, Dan Brickley, and Steve Macbeth, “Schema.org: Evolution of Structured Data on the
Web,” ACM Queue 15, no. 9 (December 15, 2015): 14,
https://dl.acm.org/ft_gateway.cfm?id=2857276&ftid=1652365&dwn=1.
