Teaching Linked Open Data using Bibliographic Metadata

RESEARCH PAPER

CORRESPONDING AUTHOR: Terhi Nurmikko-Fuller, Centre for Digital Humanities Research, Australian National University, Canberra, Australia. terhi.nurmikko-fuller@anu.edu.au

KEYWORDS: Linked Open Data; bibliographic metadata; pedagogy; participant evaluations

TO CITE THIS ARTICLE: Nurmikko-Fuller, T. (2022). Teaching Linked Open Data using Bibliographic Metadata. Journal of Open Humanities Data, 8: 6, pp. 1–11. DOI: https://doi.org/10.5334/johd.60

TERHI NURMIKKO-FULLER

ABSTRACT

This paper describes LD4DH, the Linked Data for Digital Humanities: Publishing, Querying, and Linking on the Semantic Web workshop at the Digital Humanities Oxford Summer School. It describes the general structure of the workshop and how it has changed over the last seven years (2015 to 2021), and evaluates the differences between in-person delivery in 2018–2019 and the online mode in 2020–2021. Discussion is centred on the description of the data as well as the illustration of the processes, methods, and software used throughout the workshop. The paper concludes with a summary of participant evaluations, and reflects on the opportunities and challenges of teaching Linked Open Data to a mixed cohort of predominantly Humanities researchers and professionals from the cultural heritage sector.

1 INTRODUCTION

The Linked Data for Digital Humanities: Publishing, Querying, and Linking on the Semantic Web (henceforth, LD4DH [1]) workshop has formed part of the proceedings of the Digital Humanities Oxford Summer School (DHOxSS) since 2012. I am an alumna of the workshop myself, having attended it as a participant in its first iteration. At that time in my academic career, LD4DH was a space to acquire essential practical skills for implementing Linked Open Data (LOD). Having since become the convener and tutor of the same workshop, I now find it an annual highlight: an opportunity to fully immerse myself in the methodology, to discuss research, and to engage with diverse groups of researchers, academics, and GLAM (galleries, libraries, archives, and museums) sector professionals.

The workshop aims to provide participants with an understanding of the theories behind LOD as an information publication paradigm, and then to build on that foundation with practical, hands-on activities. This deliberate pedagogical structure reflects the insight that neither the use and implementation of digital methods, nor the critical evaluation of the projects and platforms developed with those methods, can be taught exclusively in abstract terms (Brier, 2012). In recognition of the role of collaboration and co-authoring in digital humanities (DH) research (Needham & Haas, 2019), workshop participants are encouraged to work together and communicate openly as a group.

Since 2015, I have taught LD4DH with John Pybus and Graham Klyne, both from the Oxford e-Research Centre at the University of Oxford. The success of our delivery of the workshop has rested not only on our friendship or our common interest in LOD, but also on our differences in interests, expertise, and academic backgrounds.
This diversity within the tutor group enables us to discuss each topic from different perspectives. There is no guarantee of unanimous agreement, and that gives the learners access to a greater diversity of ideas. We can thus more confidently cater for the needs and intellectual preferences of diverse cohorts. In recognition of the challenges of the course, and the role that a pleasant and supportive learning environment can play in successful information- and skills-acquisition (Imlawi & Gregg, 2014), there has been a deliberate attempt to create a jovial and friendly atmosphere. Humour is used to promote openness between the teachers and the learners, and to make acronyms and concepts more memorable. An example of this is theme-specific clothing, such as a pair of golden trousers worn in homage to the query language SPARQL [2] (pronounced "sparkle") and a skirt with owls, in reference to the Web Ontology Language (OWL). [3]

2 STRUCTURE OF THE WEEK

LD4DH is a five-day workshop. Each day follows the same three-session pattern, with a different topic: a 90-minute theory session followed by a two-hour practical one. The final hour of the day is a lecture by a guest speaker discussing the use of LOD in their area of research, often their own project (Table 1). Although there has been some flexibility in the list of speakers, in most years the topics covered numismatics (Prof Andrew Meadows, University of Oxford, speaking about Nomisma.org [4]), digital musicology (Dr Kevin Page, University of Oxford, reporting on a range of projects), and digital libraries (Prof Stephen Downie, University of Illinois Urbana-Champaign, summarising the work of the HathiTrust Research Center [5]), as well as geolocation and digital mapping (Dr Valeria Vitale, now at the Alan Turing Institute in London, and Chair of the Pelagios Network [6]). Other colleagues who have contributed to LD4DH include Dr Daniel Bangert (talking about the JazzCats project [7]), Dr Paula Granados García (from the Open University), who gave a summary of her experience as an alumna of the workshop, as well as Dr Athanasios Velios and Prof Donna Kurtz, who have spoken on OxLOD [8] and its precursor projects, such as CLAROS [9], respectively.

[1] The acronym is the name of the Slack channel (https://ld4dh-dhoxss.slack.com) and the Twitter hashtag (#LD4DH) for this workshop.
[2] https://www.w3.org/TR/rdf-sparql-query/
[3] https://www.w3.org/OWL/
[4] http://nomisma.org/
[5] https://www.hathitrust.org/htrc
[6] https://pelagios.org/
[7] http://jazzcats.cdhr.anu.edu.au/

The common thread throughout these diverse projects is the illustration of the practical use of LOD in the Humanities. All speakers favour Open licensing for data and software, promote collaboration, and have developed tools that enable users to engage with their data without the need to learn programming. These talks strongly support the philosophy of LD4DH, and serve to provide a complementary and enriching context for the learners.

Over the years, LD4DH has undergone several tweaks, rearrangements, and changes. The most recent version has been a response to the COVID-19 crisis, and the move to an entirely online delivery.
The workshop became an hour-long lecture on the fundamentals of LOD for a general audience, followed by an afternoon hands-on session for those who had opted to enrol in it. The latter is delivered by Dominic Oldman and Diana Tanase, both of the British Museum, using ResearchSpace. [10] At the time of writing, expectations are high for an in-person event in 2022, which would see a return to the pre-COVID-19 mode of delivery of the LD4DH workshop.

[8] https://www.glam.ox.ac.uk/oxford-linked-open-data-pilot
[9] https://eng.ox.ac.uk/claros/
[10] http://researchspace.org/
[11] The project workset viewer is available at https://eeboo.oerc.ox.ac.uk/
[12] The project has been reported on by Page, K. and Willcox, P. in the 2015 project report, available from https://www.ideals.illinois.edu/bitstream/handle/2142/79017/ElEPH%C3%A3T%20final%20report-with_appendix-20150615.pdf
[13] https://textcreationpartnership.org/tcp-texts/eebo-tcp-early-english-books-online/

Table 1. The daily structure of the workshop. Each day consists of a theory session, a hands-on session (praxis), and a talk by a guest speaker.

Monday: 9:00–10:00 Registration; 10:30–12:00 Introduction to LD4DH; LUNCH; 13:30–15:30 Introduction to LOD; 16:00–17:00 Guest Speaker (Numismatics)
Tuesday: 9:00–10:30 Ontologies (theory); 11:00–13:00 Ontologies (praxis); LUNCH; 14:30–15:30 Guest Speaker (Musicology)
Wednesday: 9:00–10:30 Producing RDF (theory); 11:00–13:00 Producing RDF (praxis); LUNCH; 14:30–15:30 Guest Speaker (Libraries)
Thursday: 9:00–10:30 SPARQL (theory); 11:00–13:00 SPARQL (praxis); LUNCH; 14:30–15:30 Guest Speaker (Alumna)
Friday: 9:00–10:30 British Museum (intro, theory); 11:00–13:00 British Museum (praxis); LUNCH; 14:30–15:30 Guest Speaker (Museums)

3 DATA

Since 2016, the workshop has centred on the data of the ElePHãT [11] (Early English Print in HathiTrust, Linked Semantic Worksets Prototype) project. [12] This prototype (which was funded through the Andrew W. Mellon Foundation's Workset Creation for Scholarly Analysis project award) combines bibliographic metadata from two very different types of collections: the behemoth HathiTrust Digital Library (HTDL), and the rather more boutique Early English Books Online Text Creation Partnership (EEBO-TCP [13]). The aim of the ElePHãT project was to see whether two digital library collections (which at a distance appeared to share similarities, but on closer inspection had many idiosyncratic features) could be bridged at the metadata level (Page, Nurmikko-Fuller, Cole, & Downie, 2017). Both the HTDL and EEBO-TCP are aggregators: the EEBO-TCP contains information from some 150 sources, while the number of institutions (each of which contains a multitude of collections and sources) that form the HathiTrust is closer to 250. The considerable variation both within and between these two large projects is evident from the metadata.

Although the data for the ElePHãT project thus consisted of two aggregated datasets, for the purposes of the LD4DH workshop the focus has been exclusively on the data from the EEBO-TCP. The reasons for this are two-fold: first, throughout the project, the HTDL data was modelled and provided by the HathiTrust Research Centre (HTRC) team, whilst the EEBO-TCP data was worked on by scholars at Oxford, meaning the team at Oxford had the opportunity to gain
familiarity with that dataset, thus making it easier to work with in the context of the workshop. Second, of the HTDL data, 66% remains subject to copyright restrictions, limiting access and use (Jett, Nurmikko-Fuller, Cole, Page, & Downie, 2016). The EEBO-TCP data, on the other hand, consisted of 25,000 records which became publicly available in 2015.

The EEBO-TCP data has a number of idiosyncrasies resulting from the combination of historical data and the processes of aggregation. This manifests in the dataset containing several categories for the same concept, e.g. discrete ID numbers and titles. As the data is derived from historical sources, it contains a significant quantity of complex and messy details that do not sit comfortably within modern metadata categories (see, for example, the "Publisher" column in Table 2, which provides a sample of the dataset).

Rather than provide the learners with the entire, rather complex TEI P5 XML [14] files, a simplified .CSV version of the data has been used for the hands-on activities at LD4DH. These tabular datasets were initially generated as part of the workflow for the ElePHãT project using a set of Python scripts to pull out the data for author, publication place, publisher, date, six distinct ID numbers, and three separate titles. The various .CSV files, the Python scripts and the custom-built project ontology EEBOO are all available from the project GitHub page. [15]

It is worth noting that the data wrangling at the stage of generating the .CSV files did not involve any semantics. This is significant, as the match between the modern metadata category and the data contained within it is not always exact. For example, the "Imprint" category (displayed in Table 2 as "Publisher") contains a large degree of additional information about the historical printing process, its funding model, and even geographical location, as these details were recorded in the original historical text. An example of this is the record A00648/STC 10783/ESTC S114801. The imprint data contains information about the individual carrying out the printing ("G.Eld"); the individual commissioning the print ("Roger Barnes"); the location of the shop that sold it ("S. Dunstans Church-yard"); and the name of the street of said shop ("Fleet Street"). Indeed, so rich is this information that in 2016 we carried out an investigation into the extraction of specific details from this data category using natural language processing (Khan, Nurmikko-Fuller, & Page, 2016). Due to the dataset's internal richness and diversity, many learners opt to engage in some degree of data wrangling themselves, although it is possible to complete the workflow process without an additional step of data tidying (beyond the minting of URIs).

[14] That is to say, information that was captured as XML, in adherence to the Text Encoding Initiative's (or TEI's) P5 guidelines. For more information about the TEI's P5 guidelines, see https://tei-c.org/guidelines/p5/
[15] https://github.com/oerc-elephat/preprocessed-elephant
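The project's own preprocessing scripts are the ones linked above. Purely as an illustration of the general shape of that step, the sketch below pulls a few header fields out of TEI P5 XML files and writes them into a CSV file; the file locations, the choice of fields, and the assumption that the original imprint details sit under sourceDesc/biblFull are mine for the example, and this is not a description of the ElePHãT code.

```python
# Illustrative sketch only, not the ElePHaT preprocessing scripts: pull a few
# bibliographic fields from TEI P5 headers into a CSV file. File locations and
# the assumption about the header layout are assumptions for this example.
import csv
import glob
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

def text_of(parent, xpath):
    """Return the text content of the first matching element, or an empty string."""
    el = parent.find(xpath, TEI) if parent is not None else None
    return "".join(el.itertext()).strip() if el is not None else ""

def header_fields(path):
    """Extract a handful of fields describing the original imprint from one TEI file."""
    root = ET.parse(path).getroot()
    # TCP headers typically describe the original imprint inside sourceDesc/biblFull.
    bibl = root.find(".//tei:sourceDesc/tei:biblFull", TEI)
    return {
        "author": text_of(bibl, ".//tei:author"),
        "pubplace": text_of(bibl, ".//tei:pubPlace"),
        "publisher": text_of(bibl, ".//tei:publisher"),
        "date": text_of(bibl, ".//tei:date"),
    }

with open("eebo_tcp_sample.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.DictWriter(out, fieldnames=["author", "pubplace", "publisher", "date"])
    writer.writeheader()
    for path in sorted(glob.glob("eebo-tcp/*.xml")):  # hypothetical folder of TEI files
        writer.writerow(header_fields(path))
```

The resulting rows resemble the sample shown in Table 2: one column per modern metadata category, with all the historical richness of the imprint squeezed into a single "publisher" cell.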
AUTHOR | PUBPLACE | PUBLISHER | DATE | ID0 | ID1 | ID2
Fennor, William | London : | Barnes, and are sold at his shop in S. Dunstans Church-yard in Fleetstreet | 1615 | A00648 | STC 10783 | ESTC S114801
Bacon, Francis, 1561–1626 | London : | Printed [by Richard Field] for Felix Norton and are to be sold in Pauls Church-yard at the signe of the Parrot | 1604 | A01003 | STC 1111 | ESTC S104433
Forser, Edward, 1553?–1630 | London : | Printed by B. A[lsop] for Nathaniel Butter, and are to be sold at his shop, at the Pyed Bull, neere Saint Austens Gate | 1624 | A01075 | STC 11189 | ESTC S119405
Bacon, Francis, 1561–1626 | London : | Printed by I. Okes, for Humphrey Mosley, at the Princes Armes in Pauls Church-Yard | 1638 | A01446 | STC 1157 | ESTC S100504
Anonymous | Londini : | In officina Iohannis Haviland | 1626 | A01639 | STC 1177 | ESTC S115271
Godwin, Francis, 1562–1633 | In Vtopia [i.e. London?] | [J.Bill] | 1629 | A01809 | STC 11944 | ESTC S118694
Godwin, Francis, 1562–1633 | [Oxford] : | J. Barnes | 1603] | A01812 | STC 11948 | ESTC S118380

Table 2. A sample of the project data showing the information categories captured in the .CSV file. The punctuation marks serve to capture uncertainty about the date; the "Publisher" column cells each contain several data points.

Many participants also opt to spend some of their free time engaged in additional research around the subject, merging their aim of technological up-skilling (during the workshop) with their existing sense of needing or desiring to understand the data prior to working with it. Many workshop participants opt to separate out surname and first name, or at the very least, the date and the name (see the "Author" column in Table 2). Although the punctuation marks (such as the colon and square brackets) are meaningful as indicators of ambiguity for the library professionals who created and painstakingly curated the original EEBO-TCP data, these additional characters can be problematic in later stages of the RDF production workflow. For this reason, many learners choose to edit these characters out, essentially applying a reductionist approach to simplify the messy historical data into the categories of modern information representation systems.
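A minimal sketch of that kind of tidying is given below. It is illustrative only: the column names follow the sample sketch above rather than the project's actual files, and the splitting rules are deliberately crude; real records need far more care.

```python
# Illustrative only: a crude clean-up of the kind many participants script for
# themselves. Column names and splitting rules are assumptions, and the approach
# is deliberately reductionist (it discards the librarians' uncertainty markers).
import csv
import re

def split_author(raw):
    """Split 'Bacon, Francis, 1561-1626' into (surname, forename, dates)."""
    parts = [p.strip() for p in raw.split(",")]
    surname = parts[0] if parts else ""
    forename = parts[1] if len(parts) > 1 else ""
    dates = parts[2] if len(parts) > 2 else ""
    return surname, forename, dates

def strip_markers(value):
    """Remove square brackets, trailing colons and stray whitespace."""
    return re.sub(r"[\[\]]", "", value).rstrip(" :").strip()

with open("eebo_tcp_sample.csv", newline="", encoding="utf-8") as src, \
     open("eebo_tcp_tidy.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["surname", "forename", "dates", "pubplace", "date"])
    for row in reader:
        surname, forename, dates = split_author(row["author"])
        writer.writerow([surname, forename, dates,
                         strip_markers(row["pubplace"]),
                         strip_markers(row["date"])])
```

The trade-off is exactly the one described above: the tidied values are easier to turn into RDF, but the colons, brackets and question marks that recorded the cataloguers' uncertainty are lost.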
4 THEORY

Each LD4DH workshop combines theory and praxis. The former includes aspects and considerations that are relevant to the entire DH community, far beyond the scope of the niche group of researchers who choose to use the LOD methodology in their research. A major component of this theoretical part of the workshop is to equip the learners with enough information regarding the pragmatics, the challenges, and the opportunities presented by LOD that they can critically evaluate the method for its strengths and weaknesses. This in turn enables the participants to make an informed decision as to whether or not to engage with it in their research beyond DHOxSS. It is not always the right tool. There are no silver bullets.

Another aspect of the theoretical component is jargon-busting. These sessions are crucial in establishing a shared vocabulary, facilitating communication, and supporting engagement with the material. They also enable participants to engage in meaningful conversations with other members of the community of LOD practitioners, and help to boost confidence in using the appropriate terminology when discussing their project, and their technical needs, with their colleagues and the IT service provision at their home institutions.

We introduce core concepts such as the Five Star Linked Open Data Standard; [16] the idea of knowledge graphs; [17] and the RDF triple. [18] All the examples of data as RDF that the participants encounter in the workshop are expressed in one specific syntax, .TTL (pronounced "turtle", and one of several possible options, the most common alternatives currently being JSON-LD and RDF-XML), [19] to provide learners with a sense of consistency between examples, but specific activities during the week also enable them to learn about the possibility of using different syntaxes for representing RDF.

[16] https://www.w3.org/2011/gld/wiki/5_Star_Linked_Data
[17] https://www.ontotext.com/knowledgehub/fundamentals/what-is-a-knowledge-graph/
[18] https://www.w3.org/TR/rdf11-primer
[19] https://www.w3.org/TR/turtle/
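By way of a concrete illustration (a sketch for this paper rather than the workshop's own teaching material), the snippet below expresses one EEBO-TCP record as a handful of triples and serialises the same graph first as Turtle and then as JSON-LD. The example namespace and the Dublin Core properties are choices made here for brevity, not the vocabularies used in class.

```python
# A sketch, not the workshop's material: one EEBO-TCP record as RDF, serialised
# in two syntaxes. The ex: namespace is an assumption; rdflib 6+ is assumed so
# that the JSON-LD serialiser is available out of the box.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC, RDF

EX = Namespace("http://example.org/eebo/")

g = Graph()
g.bind("dc", DC)
g.bind("ex", EX)

record = EX["A00648"]                      # a minted URI for the record
g.add((record, RDF.type, EX.Book))
g.add((record, DC.creator, Literal("Fennor, William")))
g.add((record, DC.date, Literal("1615")))
g.add((record, DC.identifier, Literal("STC 10783")))

# The same graph, two serialisations: the content is identical, only the syntax differs.
print(g.serialize(format="turtle"))
print(g.serialize(format="json-ld"))
```

The point the in-class exercise makes is the same regardless of tooling: the triples are the data, and Turtle, JSON-LD, and RDF-XML are merely different ways of writing them down.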
Among the participants at each iteration of LD4DH, there has been a small minority who attend for reasons other than wanting to learn how to use LOD in their research. These include industry representatives, and the occasional "scout" – those who had been sent by their superiors to find out what LOD is all about. To address their needs (as well as help those participants interested primarily in the research potential of this method), the LD4DH lesson materials contain information to help them engage with the IT support services at their own institutions. These cover the practical considerations of setting up a LOD project that go beyond issues like researcher aims and institutional policies (both of which I discuss at length in Nurmikko-Fuller, in press), such as the need for a server (and a person to manage that server!), the process by which to decide which triplestore is best for the project, and so on.

We also compare LOD to markup languages (such as XML) and standard relational databases. These discussions can help those who have prior experience of either of the two alternatives to quickly visualise the differences between them and RDF.

The theoretical component of the workshop also discusses vexed ethical issues associated with the use of this digital methodology. One of these (and arguably the one that is easiest for all of us to relate to at a personal level) is the enormous potential it has to invade individual privacy. At the core of the Linked Data paradigm lies a potential privacy crisis: the very promise that this method can unlock knowledge by bringing information from disparate but complementary datasets together. When dealing with data such as historical records, this is of immense benefit: scholars are able to create much more comprehensive pictures of the past by bringing together information from several different sources. But as I argue in my forthcoming book (Nurmikko-Fuller, in press), what if this technology is uncritically applied to us? Many of us categorise information in specific places, and recoil from the idea that a third party would have access to all that information simultaneously. Imagine finding out someone else was accessing our financial information, health records, employment history, and social media habits. Even if information is held in separate databases with different details removed to anonymise the data, Linked Data has the potential to bring all these fragments together, thus effectively removing any and all anonymisation.

The discussion of the theoretical foundations is supported, enriched, and diversified with daily hands-on activities, exposing the participants to both theory and praxis. The activities constitute the structural backbone of the workshop, and are arranged to follow the order of an RDF production workflow.

5 ACTIVITIES

There are two types of workflow that structure LD4DH. At the macro-level, there is the RDF production process, which gives the workshop its cohesion. At the micro-level, each hands-on session has its own specific workflow, with task-appropriate software and session-specific learning objectives. Throughout the week, participants focus on the same dataset, but move from familiarising themselves with the data (as illustrated in Table 2 above) to modelling the content (converting .CSV to .TTL). Towards the end of the week, they progress from RDF production to writing SPARQL queries. This workflow has been reported on in the context of specific projects (Nurmikko-Fuller, Bangert, & Abdul-Rahman, 2017; Nurmikko-Fuller, Bangert, Dix, Weigl, & Page, 2018), so I will limit the discussion to an outline of the activities to illustrate the learning objectives that bring the workshop together.

There are four discrete tasks, as summarised in Table 3. The first task requires participants to engage with data represented as RDF through a non-SPARQL endpoint (the so-called Follow-Your-Nose approach to information discovery). In the past, this activity has focused on the use of the Pubby UI; [20] however, from 2022 onwards the plan is to use the four different UIs available for DBpedia: [21] DBpedia's own resource page, [22] the OpenLink Faceted Browser, [23] the OpenLink Structured Data Editor, [24] and the LodLive Browser. [25] This activity is most useful to those participants who have prior knowledge of database design and management, as it helps them to start thinking about and understanding the notion of information captured in RDF as an interconnected graph. As part of this session, participants also practice converting between different syntaxes of RDF such as Turtle, RDF-XML, and JSON-LD using EasyRDF. [26]

[20] https://github.com/cygri/pubby
[21] The homepage for DBpedia is at https://dbpedia.org, but the dropdown list for the three UIs is best accessed through a page for a resource. An example of such a page might be https://dbpedia.org/page/Oxford.
[22] As above, for Oxford.
[23] For example https://dbpedia.org/describe/?uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FOxford
[24] https://osde.demo.openlinksw.com/#/editor?uri=http:%2F%2Fdbpedia.org%2Fdata%2FOxford.ttl&view=statements
[25] http://en.lodlive.it/?http%3A%2F%2Fdbpedia.org%2Fresource%2FOxford
[26] https://www.easyrdf.org/converter

DAY | TASK | SOFTWARE
Monday | Follow-Your-Nose approach | DBpedia interfaces, EasyRDF
Tuesday | Design and implement ontologies | pen and paper, Protégé
Wednesday | Producing instance-level RDF | Web-Karma, Blazegraph
Thursday | Using triplestores and SPARQL | SPARQL Playground, Blazegraph
Friday | Exploring the British Museum's collections | ResearchSpace

Table 3. The assigned tasks for each of the five days of LD4DH. The middle column lists the task for each day; the right-hand column lists the specific software used for each task.
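For readers who want a non-browser flavour of the Monday activity described above, the short sketch below dereferences a DBpedia URI and prints a few of its outgoing statements. It assumes DBpedia's content negotiation is reachable and uses rdflib purely for illustration; in the workshop itself this exploration is done through the web interfaces listed in the footnotes.

```python
# Rough command-line analogue of the Follow-Your-Nose exercise (assumes DBpedia
# is reachable and serving RDF via content negotiation; not a workshop script).
from rdflib import Graph, URIRef

oxford = URIRef("http://dbpedia.org/resource/Oxford")

g = Graph()
g.parse("http://dbpedia.org/resource/Oxford")   # dereference the URI, load the returned RDF

# Print a few outgoing statements. Any object that is itself a URI can be
# dereferenced in turn; that hop-by-hop browsing is "following your nose".
for predicate, obj in list(g.predicate_objects(oxford))[:10]:
    print(predicate, obj)
```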
The second task is to develop an ontological structure. As part of this process, participants must first familiarise themselves with the data (as illustrated in Table 2). They are also asked to formulate the research questions their ontological structure would be able to answer, and to spend some time reading over the scope notes or specifications of existing ontologies, so as to establish an overview of how this type of data has been represented by others. In the case of the ElePHãT project data, learners focus almost exclusively on bibliographic metadata ontologies such as Bibframe, [27] FaBio, [28] FRBRoo, [29] MODS/RDF, [30] MADS/RDF, [31] and Schema.org. [32] They can also opt to incorporate aspects of these ontologies into their own ontological model, and to examine other vocabularies, schemas, and ontologies such as FOAF. [33] The activity includes a "show and tell" session where learners take turns to present their ontological models for peer review by the others in the class.

Most participants are eager to engage with the praxis part of each day. Engaging with software and producing results or finding answers creates a sense of doing DH. For the ontology development stage, however, the "software" of choice is pen and paper. Drawing and redrawing concepts is a lengthy, iterative process, and throughout the session the learners discuss and change aspects such as the types of information category. Some examples might include: Is "Person" sufficiently detailed? Do we need "Author" and "Publisher" as different types of People? But what about when either is an institution? Should authors and publishers instead be modelled as types of "Agent"? The aim is to create a schema-level representation of the data – to define the possible information categories, and the relationships between those categories, that are present in the dataset.

It is only once a consensus of sorts has been reached – normally under increasing time pressure as the end of the workshop approaches – that the participants progress to the implementation stage. This is done using Protégé. [34] It was chosen for two reasons: first, it is a popular tool, used across various disciplines beyond the Humanities, Arts, and Social Sciences; and second, the point-and-click UI means that users do not need to acquire additional (potentially distracting) programming skills to complete this stage of the workflow. Once complete, the ontological model is exported from Protégé as a .TTL file. This syntax of RDF is selected because it is most suitable for use in the next stage of the workflow.
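The exported schema is itself just RDF. As a hedged illustration of the kind of modelling decision debated in the "show and tell" (and emphatically not the project's EEBOO ontology), the snippet below declares Author and Publisher as subclasses of a broader Agent class and serialises the result as Turtle; in the workshop the same structure would be built through Protégé's interface rather than in code.

```python
# Illustrative only, not the EEBOO ontology: a tiny schema reflecting the
# "Author and Publisher as types of Agent" discussion, serialised as Turtle.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/eebo-schema/")

g = Graph()
g.bind("ex", EX)

# Declare the classes.
for cls in (EX.Agent, EX.Author, EX.Publisher, EX.Book):
    g.add((cls, RDF.type, OWL.Class))

# Author and Publisher are both kinds of Agent (people or institutions).
g.add((EX.Author, RDFS.subClassOf, EX.Agent))
g.add((EX.Publisher, RDFS.subClassOf, EX.Agent))

# A property linking books to the agents responsible for them.
g.add((EX.hasAuthor, RDF.type, OWL.ObjectProperty))
g.add((EX.hasAuthor, RDFS.domain, EX.Book))
g.add((EX.hasAuthor, RDFS.range, EX.Author))
g.add((EX.hasAuthor, RDFS.label, Literal("has author")))

print(g.serialize(format="turtle"))
```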
The third step is to combine the original dataset (available as a .CSV and illustrated in Table 2) and the ontology (exported as a .TTL) to produce instance-level RDF. The tool of choice at DHOxSS for this part of the process has been Web-Karma, [35] a free and Open tool from the University of Southern California. Like Protégé, this tool has a point-and-click UI, which makes the mapping between the .CSV and the .TTL file possible without the need for programming. It provides a visualisation of the resulting graph, which is a clear and convenient way to check the progress and accuracy of the data mapping, and to fix any possible errors.

The fourth and final stage of the process is two-fold. First, participants upload their RDF into a triplestore, and second, they learn to explore the new knowledge graph using the SPARQL Protocol and RDF Query Language (SPARQL). [36] A conscious decision has been made not to explore the protocol aspect, focusing exclusively on the query language. Our decision reflects existing advice (DuCharme, 2013) that describes the protocol as "rules for how a client program and a SPARQL processing server exchange SPARQL queries and results. These rules are … mostly an issue for SPARQL processor developers". Given that the LD4DH workshops are not for SPARQL processor developers, the protocol is not covered.

[27] https://www.loc.gov/bibframe/
[28] https://sparontologies.github.io/fabio/current/fabio.html
[29] http://www.cidoc-crm.org/frbroo/home-0
[30] https://www.loc.gov/standards/mods/modsrdf/
[31] https://www.loc.gov/standards/mads/rdf/
[32] https://schema.org/
[33] http://xmlns.com/foaf/spec/
[34] https://protege.stanford.edu/
[35] https://usc-isi-i2.github.io/karma/
[36] SPARQL is a recursive acronym, meaning that the 'S' in SPARQL stands for "SPARQL"! It combines a Protocol and a Query Language for RDF, but for the purposes of LD4DH, we have chosen to focus on the query language aspect exclusively.

Anecdotally, most participants have appeared to exhibit the most uncertainty and lack of confidence when asked to engage with SPARQL. They seemed to regard it as the most technical of the tasks. Undoubtedly this was due in part to a lack of readily available WYSIWYG (What You See Is What You Get) UIs or graphical user interfaces (GUIs), which would make the task less daunting by hiding the code behind a more familiar search box. The solution was the introduction of the SPARQL Playground [37] into the curriculum: this provided participants with simple and easy-to-read examples that allowed them to build up their familiarity with SPARQL in a step-by-step process.

For a number of years, the triplestore of choice was Virtuoso, [38] based on two factors: first, it, like the other tools encountered in the context of LD4DH, was at one time a free tool (in the sense of both gratis and libre [39]); second, it was the triplestore of choice for the original ElePHãT project. This benefited the workshop, as the teaching staff were familiar with the triplestore and its idiosyncrasies. In 2018, the decision was made to switch to Blazegraph. [40] It emerged as the triplestore of choice because it remains free and open, as well as being relatively intuitive to manage.
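To give a sense of what the Thursday queries look like, the sketch below loads a few instance-level triples and runs a SPARQL SELECT over them. The data and vocabulary reuse the illustrative examples from earlier in this paper rather than the workshop's actual model, and in class the queries are run against Blazegraph rather than in a script.

```python
# A sketch of the Thursday step: load some instance-level RDF and query it with
# SPARQL. The data and vocabulary reuse this paper's illustrative examples, not
# the workshop's EEBOO model; in class the query runs against Blazegraph instead.
from rdflib import Graph

turtle_data = """
@prefix ex: <http://example.org/eebo/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

ex:A00648 a ex:Book ; dc:creator "Fennor, William" ; dc:date "1615" .
ex:A01003 a ex:Book ; dc:creator "Bacon, Francis, 1561-1626" ; dc:date "1604" .
ex:A01446 a ex:Book ; dc:creator "Bacon, Francis, 1561-1626" ; dc:date "1638" .
"""

g = Graph()
g.parse(data=turtle_data, format="turtle")

query = """
PREFIX ex: <http://example.org/eebo/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?book ?date
WHERE {
  ?book a ex:Book ;
        dc:creator "Bacon, Francis, 1561-1626" ;
        dc:date ?date .
}
ORDER BY ?date
"""

# List every book in the sample graph attributed to Bacon, ordered by date.
for book, date in g.query(query):
    print(book, date)
```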
By the end of Thursday afternoon, the learners have completed an entire workflow for converting tabular data into a knowledge graph. They have familiarised themselves with the data; produced an ontological model to enable them to represent the information within that dataset in a meaningful way, and to answer their desired research questions; produced instance-level RDF; successfully uploaded those triples into a triplestore; and completed SPARQL queries over them. In many ways, Thursday represents the pinnacle of the DHOxSS experience: it is the day where the challenges are the greatest, the frustrations the deepest, and the euphoria of success the highest.

The week culminates in a day-long exploration of British Museum data. Friday's activities are primarily focused on applying the skills that have been acquired throughout the week, as opposed to up-skilling in a new area or technical ability. In 2018 and 2019, the LD4DH workshop concluded with a hands-on practical session (a mini-workshop, of sorts) exploring the British Museum's collection using the ResearchSpace tool. [41] The day is largely run by Dominic Oldman and Diana Tanase of the British Museum, and presents the learners with an opportunity to apply their knowledge to a genuine, real-world LOD project, and to see how much they have learnt in the course of the week. They are also able to assess which skills they find most useful, relevant, and worth developing further. We also provide links and suggestions for additional tools (such as OpenRefine [42] for tidying data) and publications (DuCharme, 2013; Wood, Zaidman, Ruth, & Hausenblas, 2013; Van Hooland & Verborgh, 2014), as well as solving idiosyncratic problems (usually connected to projects the learners are working on outside of the LD4DH workshop). The afternoon trip to the Royal Oak [43] is, of course, purely optional, but well attended.

6 EVALUATIONS

There are two things that most learners at LD4DH have in common: they have identified LOD as a methodology that they are interested in or that might have value for them, and yet they very rarely have any prior experience of the methodology (see, for example, the independent reports from 2017 [44] and 2018 [45]).

[37] https://sparql-playground.sib.swiss/
[38] https://virtuoso.openlinksw.com/
[39] For those interested in the topics of Open Access and Open Source, the Wikipedia article on gratis and libre provides a succinct and easy-to-read summary of the differences between the two, and their application to intellectual property, computer code, and other relevant outputs: https://en.wikipedia.org/wiki/Gratis_versus_libre
[40] https://blazegraph.com/
[41] Note that this refers specifically to the ResearchSpace tool (https://researchspace.org/), and not to the British Museum's defunct original SPARQL endpoint (which at one time was available from http://collection.britishmuseum.org/). At the time of writing, the latter has been inaccessible for at least half a decade, and has never been used for the exercises at LD4DH.
[42] https://openrefine.org/
[43] The pub on Woodstock Road which, according to signage within the building, was in the 1770s "a desolate spot".
The workshop presents an opportunity for up-skilling, but it is also an intense experience.

Participant feedback was available for four years, 2018–2021 inclusive. This provides a balanced set: two years when the workshop was taught in person in Oxford, and two when it was delivered online. The number of respondents was small, but increased annually: only three of the 20 participants filled in the feedback form in 2018. This increased to six in 2019, eleven in 2020, and twelve in 2021, resulting in just 32 responses across four years (the size of the group is largely capped at 20 due to the limitations of available teaching space, but some additional students attend each year). The questions also changed with the move to the online medium: for 2018 and 2019 there is no data regarding the country of origin of the participants, but the corresponding information from 2020–2021 ("the COVID-years") shows a spread of eleven different countries (the Netherlands, UK, Italy, India, Canada, Portugal, Germany, China, Switzerland, Mexico, and Spain). Participants from earlier years (although this is not captured in the survey data explicitly) are known to include those from at least France, Austria, Norway, and Sweden, as well as the UK. None of the feedback forms collected demographic details of participants such as age or gender, focusing instead on levels of professional development and domain. With just one exception (a software researcher), all participants across all years were either academics (including students, early career, mid-career, and late career) or GLAM-sector professionals. Of the 32 respondents, nine described themselves as falling into more than one category, most frequently as being both researchers and practitioners – this is not surprising given the nature of DH more generally and the interest in and uptake of LOD specifically.

With such a small number of respondents (32) it is difficult to draw conclusions of any statistical significance. Having said that, the respondents were a heterogeneous group and, albeit self-selected, in some respects they could be seen as a maximum diversity sample. At the very least, they offer impressions worth noting. The feedback was overwhelmingly positive, with most critical feedback reflecting the challenges of having to move to the online delivery method at short notice in 2020. Throughout the four years, almost all aspects of the Linked Data workshop were categorised as either "good" or "excellent" (only two aspects were rated "satisfactory" and none as "poor"). Qualitatively, the comments from participants include phrases such as "inspiring" (2018), "brilliant", "excellent", "great job" (2019), "successful event" (2020), and "extremely well moderated and extraordinarily well organised", "…just great. It is very helpful…", "Very interesting talks and very good overall experience", and "Excellent Organisation. Wonderful you get to have the presentations" (2021).
In 2018, the benefit of having several tutors was highlighted in particular: "The workshop was inspiring and challenging (in the best way). Terhi, John, and Graham were so generous with their time and knowledge. I enjoyed having their different points of view. I have already recommended the summer school to several colleagues".

Two aspects of the workshop received criticism in subsequent years. The first highlights the importance of expectation management. Unfortunately, the survey data does not cover those iterations of the workshop which took place in the University of Oxford's own facilities: in those years, the software was pre-installed on desktop machines in a small computer lab. In 2018 and 2019, participants were asked to arrive at the Summer School with the software pre-installed on their personal devices; not all participants complied with this requirement in either year. This represented a major challenge for the organisers: at least one tutor, and often more than one, had to shift their attention from teaching content or explaining tasks to the whole group in order to troubleshoot problems arising from an individual machine. In some cases participants with institutional laptops had limited access rights, preventing software installation and/or the downloading of prerequisite libraries. Other participants attended with a tablet rather than a laptop; others refused to power up their machines with an alternative operating system from the USB stick they had been given. Participant expectation can also present some challenges: there is an underlying assumption that the tutors of the workshop are also experts at installing the necessary software on any and all machines, regardless of operating system, prerequisites, or administrative restrictions. Feedback from 2018 illustrates this point: "The software tools did not work and I felt this could have been sorted out in advance". [46]

[44] https://dhh.uni.lu/2017/07/12/dhoxss-2017-linked-open-data/
[45] https://www.hirmeos.eu/2018/08/07/discovering-linked-open-data-at-the-digital-humanities-at-oxford-summer-school/

The main difference between 2018–2019 and the COVID-years was the move to online delivery. In an era of Zoom-fatigue and relentless online delivery, we may be quick to categorise the latter as less preferable. The participant feedback paints a more complex picture, however. In 2018–2019, there was negative feedback about the physical room and conditions in which the workshop took place, an aspect of in-person teaching which we may be prone to forget: "Room was physically uncomfortable and layout did not suit the style of workshop, hence the low score for learning environment" (2018); "The teaching was excellent but the size of the room for the number of attendants and facilitators was not appropriate. It was very difficult to move around the room, see presentations from various angles of the room, for facilitators to communicate to attendees in small groups/individually and for us to break out to do group work.
The unexpected heat wave made this even more unbearable" (2019); and "Our room was way too small for the amount of people, at times it was very loud" (2019).

The sudden move to online delivery for LD4DH in 2020 elicited negative feedback on some aspects of the workshop, in particular the hands-on element. It was inevitably more difficult to provide a seamless experience without the necessary time to develop the appropriate mode of delivery: "The Linked Data workshop felt like another lecture and was not really hands-on" and "The interactive workshop I attended seemed completely unprepared for teaching in an online environment" (2020). We learnt our lesson, and, perhaps also reflecting evolving attitudes as to the benefits of online learning, the feedback in 2021 was very positive: "the theoretical part of the morning sessions connected perfectly to better understand the practical part of the afternoon workshops. Congratulations!" (2021).

A very welcome result of the move to the online medium was that it opened the workshop up to an international audience (as illustrated by the inclusion of participants, for the first time, from China and India) as well as to at least one neurodivergent attendee: "…Format worked well. As someone who is autistic, aspects of this worked better than in person. It would be great if there was a way to make your next in person event more accessible to neurodivergent participants by including some hybrid elements from the online event. You might see if a few neurodivergent people in DH could make specific suggestions to help" (2020). Future iterations of LD4DH will seek to find ways to replicate some of these successes and affordances, and to continue to cater for the needs of diverse cohorts.

7 CONCLUSION

Thinking back to my experience as a participant of the LOD workshop in 2012 has provided an opportunity to stand back and evaluate how it has evolved during my time as a lecturer, and how it meets the needs of those who participate today. None of the lecturers wore sparkly trousers, for one. Software develops and changes, new projects emerge, and some of the conceptual and philosophical debates remain the same. What else is different? How has our pedagogy evolved? Has the market shifted, or has the typical participant changed? I believe that the feedback from the participants, with all its caveats, shows that the approaches we have applied to LD4DH have been successful in meeting and even exceeding the expectations of our diverse cohort of students.

But the workshop is only one part of the much greater experience of DHOxSS itself. The unique and undoubtedly strongest asset of the Summer School is that it brings together some of the very best of the DH community. It creates an international, open, and dynamic learning environment for participants, providing ample opportunities for up-skilling, knowledge transfer, and networking. All these aspects have contributed to its success. And so, as is so often the case with examples of successes in DH, at the core here too is the most important thing that makes the Summer School what it is: the people.

[46] Please note that in 2018 the participants were asked to arrive at the workshop with the prerequisite software already installed.

TO CITE THIS ARTICLE: Nurmikko-Fuller, T. (2022). Teaching Linked Open Data using Bibliographic Metadata. Journal of Open Humanities Data, 8: 6, pp. 1–11.
DOI: https://doi.org/10.5334/johd.60

Published: 10 March 2022

COPYRIGHT: © 2022 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/. Journal of Open Humanities Data is a peer-reviewed open access journal published by Ubiquity Press.

COMPETING INTERESTS

The author has no competing interests to declare.

AUTHOR CONTRIBUTIONS

Conceptualization, Formal Analysis, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing.

AUTHOR AFFILIATION

Terhi Nurmikko-Fuller, orcid.org/0000-0002-0688-3006, Centre for Digital Humanities Research, Australian National University, Canberra, Australia

REFERENCES

Brier, S. (2012). Where's the pedagogy? The role of teaching and learning in the Digital Humanities. In M. K. Gold (Ed.), Debates in the Digital Humanities (pp. 390–401). Minneapolis, MN: University of Minnesota Press. DOI: https://doi.org/10.5749/minnesota/9780816677948.003.0038

DuCharme, B. (2013). Learning SPARQL: querying and updating with SPARQL 1.1. Sebastopol, CA: O'Reilly Media.

Imlawi, J., & Gregg, D. (2014). Engagement in online social networks: The impact of self-disclosure and humor. International Journal of Human-Computer Interaction, 30(2), 106–125. DOI: https://doi.org/10.1080/10447318.2013.839901

Jett, J., Nurmikko-Fuller, T., Cole, T. W., Page, K., & Downie, J. S. (2016). Enhancing scholarly use of digital libraries: A comparative survey and review of bibliographic metadata ontologies. In Proceedings of the 16th ACM/IEEE Joint Conference on Digital Libraries (pp. 35–44). Newark, NJ: ACM. DOI: https://doi.org/10.1145/2910896.2910903

Khan, N. J., Nurmikko-Fuller, T., & Page, K. (2016). BABY ElEPHãT: Building an analytical bibliography for a prosopography in Early English imprint data. In IConference 2016 Proceedings. Urbana, IL: iSchools. DOI: https://doi.org/10.9776/16588

Needham, J., & Haas, J. C. (2019). Collaboration adventures with primary sources: Exploring creative and digital outputs. The Journal of Interactive Technology and Pedagogy, 14(9). Retrieved from https://jitp.commons.gc.cuny.edu/collaboration-adventures-with-primary-sources-exploring-creative-and-digital-outputs/

Nurmikko-Fuller, T. (in press). Linked Data for Digital Humanities. Oxford, UK: Routledge.

Nurmikko-Fuller, T., Bangert, D., & Abdul-Rahman, A. (2017). All the things you are: Accessing an enriched musicological prosopography through JazzCats. In Proceedings of the international conference of Digital Humanities (pp. 554–556). Montreal, Canada: Alliance of Digital Humanities Organizations. Retrieved from https://dh2017.adho.org/abstracts/305/305.pdf

Nurmikko-Fuller, T., Bangert, D., Dix, A., Weigl, D., & Page, K. (2018). Building prototypes aggregating musicological datasets on the Semantic Web. Bibliothek Forschung und Praxis, 42(2), 206–221. DOI: https://doi.org/10.1515/bfp-2018-0025

Page, K., Nurmikko-Fuller, T., Cole, T. W., & Downie, J. S. (2017). Building worksets for scholarship by linking complementary corpora. In Proceedings of the international conference of Digital Humanities (pp. 319–321). Montreal, Canada: Alliance of Digital Humanities Organizations. Retrieved from https://dh2017.adho.org/abstracts/606/606.pdf
Van Hooland, S., & Verborgh, R. (2014). Linked Data for libraries, archives and museums: How to clean, link and publish your metadata. London, UK: Facet.

Wood, D., Zaidman, M., Ruth, L., & Hausenblas, M. (2013). Linked Data: Structured data on the Web. New York, NY: Manning Publications.