Perseids: Experimenting with Infrastructure for Creating and Sharing Research Data in the Digital Humanities


Almas, B 2017 Perseids: Experimenting with Infrastructure for Creating and Sharing 
Research Data in the Digital Humanities. Data Science Journal, 16: 19, pp. 1–17, 
DOI: https://doi.org/10.5334/dsj-2017-019

PRACTICE PAPER

Perseids: Experimenting with Infrastructure for 
Creating and Sharing Research Data in the Digital 
Humanities
Bridget Almas
Perseids Project, Tufts University, 5 the Green, Medford, MA 02155, US
balmas@gmail.com

The Perseids project provides a platform for creating, publishing, and sharing research data, 
in the form of textual transcriptions, annotations and analyses. An offshoot and collaborator 
of the Perseus Digital Library (PDL), Perseids is also an experiment in reusing and extending 
existing infrastructure, tools, and services. This paper discusses infrastructure in the domain of 
digital humanities (DH). It outlines some general approaches to facilitating data sharing in this 
domain, and the specific choices we made in developing Perseids to serve that goal. It concludes 
by identifying lessons we have learned about sustainability in the process of building Perseids, 
noting some critical gaps in infrastructure for the digital humanities, and suggesting some 
implications for the wider community.

Keywords: infrastructure; digital humanities; data sharing; interoperability; research data

Overview
The Perseids project provides a platform for creating, publishing, and sharing research data, in the form of 
textual transcriptions, annotations and analyses. An offshoot and collaborator of the Perseus Digital Library 
(PDL), Perseids is also an experiment in reusing and extending existing infrastructure, tools, and services. 

This paper discusses infrastructure in the domain of digital humanities (DH). It outlines some general 
approaches to facilitating data sharing in this domain, and the specific choices we made in developing 
Perseids to serve that goal. It concludes by identifying lessons we have learned about sustainability in the 
process of building Perseids, noting some critical gaps in infrastructure for the digital humanities, and 
 suggesting some implications for the wider community.

General Approaches
What constitutes infrastructure, and how does it facilitate data sharing in the domain of DH in general and in the 
Perseids project in particular? According to Mark Parsons, Secretary General of the Research Data Alliance 
(RDA), infrastructure can be defined as ‘the relationships, interactions and connections between people, 
technologies, and institutions that help data flow and be useful’ (Parsons 2015).

In the realm of DH, any of the following might be considered infrastructure: original digital collections, 
linked data providers, general purpose and domain-specific platforms, content management systems 
(CMSs), virtual research environments (VREs), online tools and services, repositories and service providers, 
 aggregators and portals, APIs and standards. Table 1 provides some specific examples of these in the DH and 
digital classics (DC) communities, illustrating the diversity and breadth of infrastructure in this community.

Enabling data sharing includes ensuring that data objects have persistent, resolvable identifiers; providing 
descriptive and structural metadata; providing licensing and access information; and using standard data 
formats and ontologies. The recent W3C recommendation ‘Data on the Web Best Practices’ (Loscio et al. 
2016) describes many such strategies, including providing version history, provenance information, and data 
quality information.

Infrastructure type | Examples in DH and DC
Original digital collections | PDL, Papyri.info, NINES, Digital Latin Library, Coptic Scriptorium, Roman de La Rose
Linked data providers and gazetteers | Pleiades, PeriodO, Syriaca.org, VIAF, Getty, Trismegistos, DBPedia
General purpose platforms, CMS, VREs, tools and services | Omeka, MediaWiki, Heurist, TextGrid, Voyant, Mirador, CollateX, JUXTA, Neatline
Domain-specific platforms, CMS, VREs, tools and services | Perseids, Recogito, Symogih, PECE
Repositories and service providers | CLARIN, DARIAH, EUDAT, MLA Commons/CORE, HumaNum, Hathi Trust Research Center, California Digital Library
Aggregators and portals | Europeana, Digital Public Library of America, HuNi, EHRI
APIs and standards | IIIF, OA, TEI, OAUTH, Shibboleth/SAML, CTS

Table 1: Examples of infrastructure in digital humanities and digital classics.

Above and beyond this, ensuring that adequate editorial and/or peer review has taken place before data 
is shared is often an important criterion for data sharing in the humanities.

Background
Perseids evolved to fill a critical need of the digital classics community of scholars and students (Bodard 
and Romanello 2016): infrastructure that supports textual transcription, annotation, and analysis at a large 
scale, with review, in both scholarly and pedagogical contexts. Such infrastructure would give us the ability 
to work with text-centric publications containing a variety of different data types, and would include:

• stable, persistent identifiers for all publications;
• a versioned, collaborative editing environment;
• the ability to extend the environment with data type-specific behaviors and tools;
• customized review workflows.

Perseids is, in part, a successor to a prior ambitious, but ultimately unsuccessful, infrastructure effort in 
the humanities, Project Bamboo (Dombrowski 2014). One of the aims of Project Bamboo was to develop a 
Service Oriented Architecture (SOA) that could serve a wide variety of use cases and requirements for textual 
analysis and humanities research. This accorded with the goal of the PDL: to begin to decouple the many 
services making up the Perseus 4 application, so that they could be recombined and reused to build new 
applications (Almas 2015). The PDL’s contribution to Bamboo included development (and implementation) 
of service APIs for morphological analysis and syntactic annotation. These services, intended to be shared 
on the Bamboo Services Platform, reused code from two main sources: the PDL’s web application and the  
Alpheios Project’s reading environment, and were designed to be easily extended to serve additional  
languages and use cases. They provided essential functionality for textual analysis and annotation.

At the same time, we separately began investigating a scalable solution for engaging undergraduate 
students in the production of original transcriptions and translations of medieval Latin manuscripts 
and Greek epigraphy. This work was inspired by, and involved reuse of architecture and tools from, 
two major projects in digital classics: the Homer Multitext and Papyri.info (Almas and Beaulieu 2013).

One thing that prevented Bamboo from succeeding was the assumption that scholars would be willing 
to give up their domain-specific tools and services for a more general infrastructure to which everyone 
would contribute (Dombrowski 2015). Humanities use cases at the time appeared too diverse for that, and 
technologies were moving very fast. It is unclear whether Bamboo could have succeeded, but the 
project ended before we could develop a critical component needed for our own use cases: a platform for 
managing the data and scholarly workflow that would allow for full peer and professorial review. 

Perseids took up in part where Bamboo left off, but with a more modest goal of providing infrastructure 
for our own specific set of use cases. We reused the services we built for Bamboo in Perseids, and also reused 
an existing piece of infrastructure from another project, the Son of SUDA Online (SoSOL), to fill the role of 
managing the data and review workflows. 

Drawing on the experiences of Bamboo, we decided that Perseids would support a looser coupling of 
 existing tools and services. One goal of infrastructure is to connect what already works, adding value and 
capacity without reinventing solutions. Our development approach for Perseids was thus based on three 
principles:

1. data interoperability;
2. flexibility and agility;
3. tool interoperability.

We wanted not only to support our scholarly workflows, but also to be sure that the outputs would be fully 
sharable and preservable.

Perseids currently serves an active user base, averaging between one and two thousand sessions by at least 
five hundred unique users per month during the academic year. The majority of these users come from six 
active DH communities: Tufts, the University of Nebraska at Lincoln, the College of Letters and Science of 
the Sao Paulo State University, the University of Leipzig, the University of Lyon, and the University of 
Zagreb. Several external projects also connect to Perseids’s tools and review workflow via its API.

Functionality
Use Cases
Perseids offers functionality for creation, curation and review of texts, translations and annotations. It 
enables its users to:

1. Create and edit a new text transcription.
2. Edit an existing text transcription.
3. Create and edit a new text translation.
4. Edit an existing text translation.
5. Create and edit a new commentary annotation.
6. Create and edit a new treebank[1] annotation.
7. Create and edit a new text alignment[2] annotation. 
8. Ingest and edit simple annotation data from external sources.
9. Create and edit simple annotations on texts.

The process of creating a publication on Perseids involves workflows fulfilling one or more of these use cases 
(Figure 1).

Workflows
A workflow, in this context, is a series of actions carried out by a user to achieve some goal. In a typical 
workflow on Perseids the user creates a publication containing one or more of the supported data types. 
She uses an editing tool appropriate to the data type to edit and curate her work and then submits it to a 
review board for acceptance. For example, she may choose to create and edit a treebank annotation using 
the Arethusa editing tool (Figure 2).

If the work is being done in the context of a pedagogical assignment, the review board is likely to be made 
up of the professor and teaching assistants for the class. If the work is being done in the context of a specific 
project or community, the review board will be composed of peers or expert members of an editorial team 
(Figures 3 and 4).

The ability to support peer-review functionality is a distinguishing feature of the Perseids infrastructure, 
and an important driving factor behind the architectural decision to build it upon the SoSOL platform. As we 
discuss further below, a common driver for external projects to integrate with Perseids is to take advantage 
of the flexible review workflow features it offers.
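
The submit-and-review cycle described in this section can be pictured as a small state machine. The following is a minimal sketch, not the SoSOL implementation: the class names, the status values, and the simple vote-threshold rule are all assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, Set


class Status(Enum):
    EDITING = "editing"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    RETURNED = "returned"


class ReviewBoard:
    """Hypothetical board: a set of member names and a vote threshold."""

    def __init__(self, members: Set[str], threshold: int) -> None:
        self.members = members
        self.threshold = threshold


class Submission:
    """Hypothetical publication moving through an assumed review workflow."""

    def __init__(self, title: str, board: ReviewBoard) -> None:
        self.title = title
        self.board = board
        self.status = Status.EDITING
        self.votes: Dict[str, bool] = {}

    def submit(self) -> None:
        if self.status is not Status.EDITING:
            raise ValueError("only a publication being edited can be submitted")
        self.status = Status.SUBMITTED

    def vote(self, member: str, accept: bool) -> None:
        if self.status is not Status.SUBMITTED:
            raise ValueError("votes are only accepted on submitted publications")
        if member not in self.board.members:
            raise PermissionError(f"{member} is not on the review board")
        self.votes[member] = accept  # a member may change an earlier vote
        accepts = sum(self.votes.values())
        rejects = len(self.votes) - accepts
        if accepts >= self.board.threshold:
            self.status = Status.APPROVED
        elif rejects >= self.board.threshold:
            self.status = Status.RETURNED
```

In a pedagogical setting the board members in this sketch would be the professor and teaching assistants; in a project setting they would be peers or editors. Only the membership and threshold change, not the workflow.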

[1] Annotation of morphology, syntax and sentence structure.
[2] N-to-N word-level alignment across two texts.



Figure 1: The Perseids home screen, showing a variety of data types and actions.

Figure 2: Annotating a Treebank in Arethusa.

Figure 3: Perseids user interface – voting on a publication. 



Figure 4: Perseids review workflow.

Architecture
The Perseids architecture (Figures 5–7) supports these workflows through a complex sequence of 
interactions between its core components, hosted tools and services, third-party applications and platforms, 
and external identity providers and content repositories.

SoSOL is the core of the Perseids platform. It is a Ruby on Rails application, built on top of a Git repository, 
that provides an open-access, version-controlled, multi-author, web-based editing environment that supports 
working with collections of related data objects as publications. SoSOL was developed for the Papyri.info site 
by the Integrating Digital Papyrology project, a multi-institution project aimed at supporting interoperability 
between five different digital papyrological resources (Baumann 2013), and is now maintained jointly by the 
Duke Collaboratory for Classics Computing and the Perseids project.

A Git repository provides versioning support for all documents, annotations and other related objects 
managed on the platform. SoSOL also provides additional functionality on top of Git’s, including document 
validation, templates for document creation, review boards, and communities. It uses a relational 
database (MySQL) to store information about document status and to track the activity of users, boards, 
and communities. SoSOL uses the OpenID and Shibboleth/SAML protocols to delegate responsibility for 
user authentication to social or institutional identity providers (IdPs). Social IdPs are supported 
through a third-party gateway, currently Janrain Engage.

The Perseids deployment of SoSOL incorporates the Canonical Text Services (CTS) protocol. The CTS 
 specification defines an API protocol and a URN syntax for identifying and retrieving text passages via 
machine-actionable, canonical identifiers (Smith and Blackwell 2012). To support CTS, as well as provide 
features such as tokenization of texts, the Perseids deployment of SoSOL delegates some functionality to 
external databases and services.
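
To make the URN syntax concrete, the following minimal sketch splits a CTS URN into its colon-separated components. It is an illustration only, not the CapiTainS or SoSOL code: the `CtsUrn` class and `parse_cts_urn` function are hypothetical names, and the sketch checks only the top-level structure (`urn:cts:namespace:work[:passage]`), not the finer validation rules of the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    namespace: str
    work: str                # dot-separated: textgroup[.work[.version[.exemplar]]]
    passage: Optional[str]   # citation or range such as "1.1-1.10"; None if absent

    @property
    def textgroup(self) -> str:
        return self.work.split(".")[0]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN into namespace, work hierarchy, and optional passage."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work = parts[2], parts[3]
    if not namespace or not work:
        raise ValueError(f"CTS URN is missing a namespace or work: {urn!r}")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace, work, passage)


# Illustrative URN in the Perseus style; any CTS URN of the same shape would do.
example = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1-1.10")
print(example.namespace, example.textgroup, example.passage)
```

Because the identifier carries the namespace, work hierarchy, and passage reference itself, a client can decide which text and which span it needs before contacting any particular server.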

The SoSOL application itself provides lightweight user interfaces for creating and editing documents and 
annotations, but in order to support an open-ended set of different editing and annotation activities, we rely 
on integrations with external web-based tools for editing and annotating. These integrations are enabled by 
API interactions between the tools and the SoSOL application.

The Perseids Client Applications component acts as a broker between the end-user, the SoSOL platform, 
external repositories and services, and the web-based editing and annotation tools.[3] Built on the Python 
Flask framework, this component implements a client-side workflow for the creation of new annotations of 
text passages identified by CTS URN. It uses the CTS abstraction libraries from the CapiTainS infrastructure 
for CTS URN resolution and processing. The Nemo browsing interface uses the same libraries; it offers a 
discovery interface for identifying texts to annotate and an anchoring point for front-end annotation tools 
and visualizations. 

[3] The Perseids Client Applications were co-developed by Perseids and The Humboldt Chair for Digital Humanities at the University 
of Leipzig.



Figure 5: Perseids infrastructure and ecosystem.

Figure 6: Perseids core components.

A recent addition to the platform is a Flask-based GitHub Proxy Service, which enables us to send data 
directly to external GitHub repositories after it has been through the review workflow.[4] (See the ‘Tool 
Interoperability’ section below for further details on these scenarios.) 

The role that each component of the architecture and ecosystem plays in supporting the workflow is 
described in the ‘Tool Interoperability’ section below.

[4] Development of this component was supported by an NEH-funded collaboration with the Syriaca.org project.


Information Model
Data publications produced on Perseids are collections of related data objects of different types. The SoSOL 
information model was designed for this type of publication. The “Publication” is the container for a 
collection of data objects belonging to a parent abstract class of “Identifier.” Different object types are 
implemented as derivations of the “Identifier” class, which add type-specific behaviors and properties, such 
as schema validation rules. Figure 8 shows how this design applies in Perseids.
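
The container-and-subclass pattern described here can be sketched as follows. This is a simplified Python illustration, not the SoSOL (Ruby on Rails) code: only the names `Publication` and `Identifier` come from the text, while the two subclasses and their stand-in validation rules are assumptions chosen to keep the example self-contained.

```python
import json
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod
from typing import List


class Identifier(ABC):
    """Abstract parent of every data object held in a publication."""

    def __init__(self, name: str, content: str) -> None:
        self.name = name
        self.content = content

    @abstractmethod
    def validate(self) -> bool:
        """Type-specific validation rule supplied by each subclass."""


class TextTranscription(Identifier):
    def validate(self) -> bool:
        # Stand-in for schema validation: only checks XML well-formedness.
        try:
            ET.fromstring(self.content)
        except ET.ParseError:
            return False
        return True


class CommentaryAnnotation(Identifier):
    def validate(self) -> bool:
        # Stand-in rule: the annotation must be a JSON object with a target.
        try:
            data = json.loads(self.content)
        except ValueError:
            return False
        return isinstance(data, dict) and "target" in data


class Publication:
    """Container for a collection of related data objects."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.identifiers: List[Identifier] = []

    def add(self, identifier: Identifier) -> None:
        self.identifiers.append(identifier)

    def is_valid(self) -> bool:
        return all(item.validate() for item in self.identifiers)
```

Adding support for a new data type in this sketch means adding one subclass with its own `validate` method; the `Publication` container does not change.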

However, Perseids publications can also be thought of as research objects (Bechhofer et al. 2013), where 
the object of the research is a passage or passages of canonically identifiable text. Figure 9 shows our 
original vision for a CTS-focused publication on Perseids.[5]

Tool Interoperability
Decoupling data creation tools from the sources and destinations of the data was a key part of our design 
approach. APIs and standards are critical components of infrastructure, and integration and sharing require 
that data be retrievable from and persistable to any source (Hilton 2014).

Perseids offers an API for Create, Read, Update, and Delete (CRUD) operations on all data types supported 
by the platform. API clients can authenticate using the OAuth 2.0 protocol (Hardt 2012); co-hosted tools 
also have the option of using a shared session cookie. These approaches enable integration with specific tools 
and services, such as the Arethusa Annotation Framework and the Alpheios Alignment Editor, as well as 
external projects such as Sematia (Vierros and Henriksson 2016) and the Syriaca.org Gazetteer (Figure 10).
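
As an illustration of how an external client might call such an API once it holds an OAuth 2.0 access token, the sketch below builds (but does not send) an authenticated update request using the standard bearer-token header. The base URL, the `/items/{id}` path, and the XML content type are hypothetical placeholders, not the documented Perseids routes.

```python
import urllib.request

# Hypothetical base URL; the real Perseids API routes are not given in this paper.
API_BASE = "https://perseids.example.org/api/v1"


def build_update_request(item_id: str, document: str, access_token: str) -> urllib.request.Request:
    """Build an Update request that carries an OAuth 2.0 bearer token."""
    return urllib.request.Request(
        url=f"{API_BASE}/items/{item_id}",
        data=document.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/xml",
        },
    )


request = build_update_request("123", "<TEI/>", "example-token")
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
```

The token itself would be obtained beforehand through one of the OAuth 2.0 grant flows; a co-hosted tool relying on a shared session cookie would skip the `Authorization` header.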

Perseids also uses external APIs to pull data from other infrastructures. We use the Canonical Text Services 
URN protocol and API (Smith and Blackwell 2012) to identify and retrieve textual transcription, translation, 
and annotation targets (Figures 11 and 12).
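
A CTS retrieval call is an ordinary HTTP GET whose query string names the request type and the URN. The sketch below shows how a client might assemble such URLs. The endpoint is a hypothetical placeholder, and the `request` and `urn` parameter names reflect our reading of the CTS protocol rather than any particular server's documentation.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the address of an actual CTS service.
CTS_ENDPOINT = "https://cts.example.org/api/cts"


def cts_request_url(request: str, urn: Optional[str] = None, endpoint: str = CTS_ENDPOINT) -> str:
    """Build the URL for a CTS request such as GetCapabilities or GetPassage."""
    params = {"request": request}
    if urn is not None:
        params["urn"] = urn
    return f"{endpoint}?{urlencode(params)}"


print(cts_request_url("GetCapabilities"))
print(cts_request_url("GetPassage", "urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1-1.10"))
```

A client would typically call `GetCapabilities` first to list the available texts, then `GetPassage` for the specific passage to transcribe, translate, or annotate.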

[5] The vision in Figure 9 has largely been implemented, with the exception of the CITE collections server component. We now expect this 
function to be filled by an implementation of a multidisciplinary Collections API we are working on as part of the Research Data 
Alliance’s Research Data Collections Working Group.

Figure 7: Perseids hosted tools and services.


Figure 8: Perseids information model.

Figure 9: Perseids publication as a CTS focused research object.



Figure 10: Creating and submitting a publication from an external application using OAuth2.

Figure 11: Sequence of API interactions for creating and editing a CTS-focused annotation template using 
the Perseids Client Apps and a locally hosted editing tool.

We also offer a lightweight URL-based API which lets individual scholars and smaller projects, particularly 
those without the time or skills to develop client software, pull their own data in or integrate Perseids with 
their application. Professors such as Robert Gorman at the University of Nebraska at Lincoln (Gorman and 
Gorman, forthcoming) are using this feature to produce templates for new annotations, which they publish on 
their university Learning Management Systems (LMS). They then include links to Perseids in their syllabi that 
instruct Perseids to pull the templates from the LMS to create a new annotation publication (Figure 13). 



Figure 12: Using the Perseids Client Apps to create a new translation alignment annotation in Perseids for 
editing via the Alpheios Alignment Editor. Texts available for use are populated via a call to the CTS API.

Figure 13: Sequence of actions for creating a publication from an LMS-hosted syllabus and annotation 
template.

Other applications such as Digital Athenaeus use Perseids’s URL API to offer links to Perseids with specific 
content already identified for transcribing, translating, or annotating (Figures 14 and 15).

We also implemented a workflow for Marie-Claire Beaulieu’s Journey of the Hero course which allows 
 students to use the Hypothes.is annotation tool to annotate named entities and social networks of 
 mythological characters from Smith’s Dictionary of Greek Names. This workflow uses the Hypothes.is API to 
pull the annotations into Perseids for review and publication (Figure 16).
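
The pull step of such a workflow might look like the sketch below, which builds a query against the public Hypothes.is search endpoint and reduces a response to the fields a reviewer would need. This is an assumption-laden illustration, not the Perseids implementation: the annotated page address and the sample response are invented for the example, and the response shape (a `rows` list whose entries carry `user`, `text`, and `tags`) reflects our understanding of the Hypothes.is API.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

HYPOTHESIS_SEARCH = "https://api.hypothes.is/api/search"


def search_url(page_uri: str, limit: int = 50) -> str:
    """Build a search URL for annotations made on one web page."""
    return f"{HYPOTHESIS_SEARCH}?{urlencode({'uri': page_uri, 'limit': limit})}"


def summarize(response_text: str) -> List[Dict[str, object]]:
    """Keep only the user, text, and tags of each annotation in a response."""
    payload = json.loads(response_text)
    return [
        {"user": row.get("user"), "text": row.get("text", ""), "tags": row.get("tags", [])}
        for row in payload.get("rows", [])
    ]


# Invented sample payload, shaped like a search response, for illustration only.
SAMPLE_RESPONSE = json.dumps({
    "total": 1,
    "rows": [
        {
            "user": "acct:student@hypothes.is",
            "text": "Named entity: a mythological character",
            "tags": ["person"],
            "uri": "https://example.org/dictionary/entry",
        }
    ],
})

print(search_url("https://example.org/dictionary/entry"))
print(summarize(SAMPLE_RESPONSE))
```

Fetching the URL with an HTTP client and passing the body to `summarize` would yield the records that are then turned into a publication for review.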


Figure 14: Sequence of actions for creating a CTS targeted text annotation publication from a link from 
Digital Athenaeus.

Figure 15: Screenshot of the Digital Athenaeus interface (at http://www.digitalathenaeus.org) showing the 
links to Annotate in Perseids.

Figure 16: Perseids Hypothes.is workflow.


The Perseids/EAGLE integration uses a combination of both of these pull strategies: links from EAGLE to 
Perseids identify a resource on the EAGLE site, and trigger a callback to the EAGLE MediaWiki API to pull 
metadata and data from that resource into new translation publications on Perseids (Figures 17 and 18). 

We also use external APIs to push data to external repositories. For the EAGLE project integration, Perseids 
uses the MediaWiki API to publish data to the EAGLE repository once it has passed through a review workflow. 
Through a new NEH-funded collaboration with the Syriaca.org project, we have developed a service which 
allows us to push data to external GitHub repositories at the end of the review workflow (see Figure 4, 
Step 5b). Eventually we would like to be able to support pushing data to any external API endpoint. 
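
One way a service could push a reviewed file to a GitHub repository is through the GitHub REST API's repository contents endpoint, which accepts a base64-encoded file and a commit message. The sketch below builds (but does not send) such a request. It is not the Perseids GitHub Proxy Service itself; the function name, repository names, and default branch are assumptions for illustration.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_github_put(owner: str, repo: str, path: str, content: bytes,
                     message: str, token: str, branch: str = "master",
                     sha: Optional[str] = None) -> urllib.request.Request:
    """Build a request that creates or updates one file in a GitHub repository."""
    body = {
        "message": message,                                      # commit message
        "content": base64.b64encode(content).decode("ascii"),  # file content, base64
        "branch": branch,
    }
    if sha is not None:
        body["sha"] = sha  # blob SHA of the existing file, needed when updating
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/contents/{path}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )


# Hypothetical repository and file names.
request = build_github_put("example-org", "reviewed-data", "data/place-1.xml",
                           b"<TEI/>", "Add reviewed record", "example-token")
print(request.get_method(), request.full_url)
```

Keeping the push behind a small proxy of this kind means the review platform itself needs no knowledge of the destination repository's layout beyond a path and a commit message.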

Figure 17: Perseids/EAGLE workflow.

Figure 18: Screenshot of the EAGLE Portal (http://www.eagle-network.eu/wiki) showing a link to edit a 
translation in Perseids.


Designing for Flexibility and Agility
From the outset, we have taken an agile approach to development of Perseids. While we do not use official 
sprints and strictly scheduled iterations, we approach planning in short increments, guided by a long-term 
vision and goals. In addition, we aim to deploy features to users as quickly as possible, so that we can get 
feedback from them. We do this not only for internal-facing features, but also to prototype new integrations 
with external services and projects. This flexibility allows us to try many things, keeping those that work and 
prove to be useful and deprecating those that do not. 

To support this approach, we could not commit to a specific set of hardware requirements in advance, 
as we needed the flexibility to extend and reduce resources used as development proceeded. We therefore 
chose to budget for cloud-based resources on the Amazon Web Services (AWS) platform rather than using 
university IT resources. Full ownership and control over our infrastructure allowed us to experiment with 
features and integrations that otherwise would not have been possible; however, it did have some  drawbacks 
and unexpected costs. These are described in the ‘Sustainability’ section below.

Standards for Data
Data Interoperability
A strategic principle in our development is to take steps to ensure data interoperability through the use of 
stable identifiers and standard formats. 

We use CTS URNs to identify both texts and annotation targets. These URNs can be considered stable 
identifiers, but do not quite qualify as persistent identifiers, as they are not universally resolvable or 
guaranteed to be available. Other identifier systems, such as Handles (Sun et al. 2003), are designed for 
persistence, and one approach we might take in the future would be to map CTS URNs to Handles 
(Almas and Schroeder 2016). In the absence of this piece of infrastructure, the CTS URNs do 
provide stable, machine-actionable identifiers that are technology independent. 

We also use other types of stable identifiers within our annotations and texts, including the URIs published 
by the Pleiades Gazetteer. We are working towards ensuring that any data published by the platform has 
a persistent  identifier as well. We are therefore participating in the Research Data Alliance’s Research Data 
Collections working group to develop a multidisciplinary, collections-based approach to data management that  
supports persistent identifiers for the collections themselves, and for the items within a collection. 

We also strive to use standard data formats and ontologies for our data and to validate all objects against 
these. The primary data format standards supported on the platform include the TEI EpiDoc schema for 
textual transcriptions and translations, the Open Annotation data model for annotations, the ALDT/ALGT 
schemas for treebank data, the Alpheios Alignment Scheme for translation alignments, and the SNAP 
ontology for social network annotations.
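
To show how stable identifiers and a standard format combine, the sketch below builds a small Open Annotation document in JSON-LD that links a passage, identified by a CTS URN, to a gazetteer URI. It is a hedged illustration rather than a copy of Perseids output: the annotation identifier is hypothetical, the place URI is merely Pleiades-style, and the context is reduced to the single `oa` namespace prefix.

```python
import json
from typing import Dict


def identify_place(annotation_id: str, cts_urn: str, place_uri: str) -> Dict[str, object]:
    """Build an Open Annotation that identifies a text passage with a place."""
    return {
        "@context": {"oa": "http://www.w3.org/ns/oa#"},
        "@id": annotation_id,
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:identifying"},
        "oa:hasTarget": {"@id": cts_urn},   # what is annotated: a text passage
        "oa:hasBody": {"@id": place_uri},   # what is said about it: a place
    }


annotation = identify_place(
    "https://example.org/annotations/1",                     # hypothetical identifier
    "urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:2.546",    # illustrative passage
    "https://pleiades.stoa.org/places/579885",               # illustrative place URI
)
print(json.dumps(annotation, indent=2))
```

Because both the target and the body are plain identifiers, the annotation stays meaningful outside the platform that produced it, which is the point of the interoperability principle.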

Provenance and Preservation
Incorporating provenance information in our publications is an important enabling factor for data 
sharing. We have taken steps in this direction, for example by supporting the Shibboleth/SAML protocol for 
authentication on Perseids in order to be able to ensure a chain of authority for university repository 
systems. We have also included provenance information for tokenization services and tools in our annotation 
documents, and have explored models for more comprehensive approaches (Almas, Berti, et al. 2013). 
However, capturing and recording provenance information reliably across a diverse ecosystem of tools and 
services is difficult, and we need general-purpose solutions that we can reuse. As articulated by Padilla 
(2016): “A researcher should be able to understand why certain data were included and excluded, why  certain 
transformations were made, who made those transformations, and at the same time a researcher should 
have access to the code and tools that were used to effect those transformations. Where gaps in the data are 
native to the vagaries of data production and capture, as is the case with web archives, these nuances must 
be effectively communicated.” We recognize that we fall short of meeting these goals currently and aim to 
do a more complete job of this in the future.

It is also very important to us that the research data produced with Perseids be preserved. However, our 
data models and approach to publications are constantly evolving, which makes coordinating with the university 
library to preserve this data challenging, as the data does not necessarily fit the data models the library is 
already able to support. As a publicly available and open infrastructure, we also have many users from many 
institutions across the world, and it is not clear what responsibility Tufts, the university hosting the 
infrastructure, should have for data created by external users. We mitigate this with Perseids by providing links 
that users can use to access and download their data, and encouraging them to take responsibility for 
publishing and preserving it on their own. We continue to explore general models such as the Research 
Object (Belhajjame et al. 2015) or BagIt, which will enable users to export data in a format that is ready to 
store in a repository. Another question is that of software preservation (Rios 2016). As the Perseids software 
is under active development, it is difficult to keep the code for digital publications up to date with all the 
underlying services providing the data (Rios and Almas 2016). We need to plan better for this preservation, 
including taking into account the need to represent interdependencies between visualizations and the 
underlying services and software (Lagos and Vion Dury 2016).
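
As an indication of what a repository-ready export could involve, the sketch below writes a minimal BagIt bag: a `data/` payload directory, a `bagit.txt` declaration, and a SHA-256 payload manifest. It is a simplified illustration of the BagIt layout, not an existing Perseids export feature; the function names and sample file are hypothetical, and optional elements such as `bag-info.txt` and tag manifests are omitted.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def make_bag(bag_dir: Path, files: Dict[str, bytes]) -> Path:
    """Write payload files plus the required BagIt declaration and manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for name, payload in sorted(files.items()):
        (data_dir / name).write_bytes(payload)
        digest = hashlib.sha256(payload).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8")
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, relative_path = line.split(None, 1)
        actual = hashlib.sha256((bag_dir / relative_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    bag = make_bag(Path(tmp) / "publication-bag", {"transcription.xml": b"<TEI/>"})
    print(verify_bag(bag))
```

The manifest lets a receiving repository confirm that the export arrived intact, without needing to know anything about the platform that produced it.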

Sustainability
Human and Governance Factors
We have learned much about infrastructure building throughout the course of this project. The technical 
hurdles to interoperability and sharing are usually much less difficult to overcome than the social, 
funding, and governance hurdles. Even where there was a clear interest in interoperability and it was 
technically possible, we sometimes failed to implement or sustain an integration because doing so was not in 
the funded mandate of the partner project. This was the case for us with the Recogito application from the 
Pelagios Project. But even where explicit funding support does not exist, interoperability can still succeed 
if one project can fill a key gap in another, and if there are people willing to champion the effort to 
ensure its success. One example was our integration with the EAGLE project, in which Perseids provides a 
review workflow for EAGLE. It was implemented without being a funded deliverable for either project, but it 
remains to be seen if we can sustain it indefinitely. This is an area where more formal governance structures, 
such as those offered by larger research infrastructures such as CLARIN and DARIAH (Lossau 2012), could be 
useful. The key challenge for the community is to encourage and support ad-hoc collaborations to get initial 
solutions working, and then move from there to more formal agreements to ensure sustainability. 

Hardware and Software Factors
Laura Mandell talks about the various models being considered for where and how to position DH, and 
points out that the question of how to support diverse infrastructure needs is still unsolved (Dinsman 2016). 
A second lesson we have learned from our experience on Perseids is that for development of interoperable 
infrastructure to succeed and be sustainable, we need better collaborative models for working with our 
university Information Technology departments and libraries. We knew we needed the flexibility to change 
our hardware requirements as we developed, and to deploy new code and services quickly to support 
rapid prototyping. Managing our own infrastructure allowed us to develop and try out new solutions more 
rapidly than we would have been able to if we had to go through university policies and procedures, but it 
also involved a lot of extra system administration work we had not anticipated, leaving us with a somewhat 
over-complicated infrastructure at the end of the first phase of the project. Accordingly, in the second 
phase we built in funding for a devops consultant, who helped us move to a fully configuration-managed 
system, so that the Perseids platform can be deployed easily by others and sustained for the long term. This 
is a critical characteristic for software-related infrastructure: in order for it to be reproducible by 
others, building and deploying it must be automated. In hindsight, having such consultancy from the outset 
would have been beneficial; collaboration between developers and the IT staff responsible for deploying and 
sustaining software is a more viable model than throwing code ‘over the wall’ at the end of a project (Arundel 2016). 
As cloud computing becomes increasingly cost-efficient, and new models of deployment, such as container-
based solutions, are introduced, there is a need for models in which university IT departments can partner 
with projects to provide expertise and facilities (for example, private cloud or container infrastructure, or 
extending university infrastructure to the public cloud). 

Conclusion
With Perseids, we have explored an agile approach to infrastructure development, emphasizing reuse of 
both software and data. This has been successful on many levels. Reuse of existing infrastructure 
components leads to collaborations which increase the chances of sustainability, such as the joint maintenance 
of the SoSOL application. Agile approaches to prototyping cross-project integration also benefit all parties 
involved. However, transitioning to more formal governance models and increased engagement with host 
institutions will be essential to longer term success. 



Acknowledgements
The author wishes to thank her colleagues, John Arundel, Frederik Baumgardt, Marie-Claire Beaulieu and 
Thibault Clérice for contributing their energy and ideas in reviewing this paper.

Competing Interests
The author has no competing interests to declare.

Author Information
Bridget Almas is the software architect and co-director of the Perseids Project at Tufts University. Bridget has 
worked in software development since 1994, in roles which have covered the full spectrum of the software 
development life cycle, focusing since 2007 on the fields of digital humanities and pedagogy. Bridget served 
as an elected member of the Technical Advisory Board of the Research Data Alliance (RDA) from 2013 to 2015, 
is currently co-chair of the Research Data Collections Working Group and the Data Fabric Interest Group, 
and acts as liaison between the Alliance of Digital Humanities Organizations (ADHO) and RDA.

References
Almas, B 2015 The Road to Perseus 5 – why we need infrastructure for the digital humanities. Blog post on the 
Perseus Updates Blog (18 May 2015). Available at: 
http://sites.tufts.edu/perseusupdates/2015/05/18/the-road-to-perseus-5-why-we-need-infrastructure-for-the-digital-humanities/.

Almas, B and Beaulieu, M-C 2013 Developing a New Integrated Editing Platform for Source Documents 
in Classics. Literary and Linguistic Computing, 28: 493–503. DOI: https://doi.org/10.1093/llc/fqt046

Almas, B, Berti, M, Choudhury, S, Dubin, D, Senseney, M and Wickett, K 2013 Representing Humanities 
Research Data Using Complementary Provenance Models. In Building Global Partnerships - RDA 
Second Plenary Meeting, Washington, D.C.: RDA. Available at: 
https://www.rd-alliance.org/filedepot_download/694/158.

Almas, B and Schroeder, C T 2016 Applying the Canonical Text Services Model to the Coptic SCRIPTORIUM. 
Data Science Journal, 15: 13. DOI: http://doi.org/10.5334/dsj-2016-013

Arundel, J 2016 Build bridges not walls: devops is about empathy and collaboration. Available at: 
http://bitfieldconsulting.com/bridges-not-walls.

Baumann, R 2013 The Son of Suda Online. In: Dunn, S and Mahoney, S (Eds.) The Digital Classicist 2013. 
Offprint from BICS Supplement-122. London: The Institute of Classical Studies University of London,  
pp. 91–106.

Bechhofer, S, Ainsworth, J, Bhagat, J, Buchan, I, Couch, P, Cruickshank, D, Delderfield, M, Dunlop, I, 
Gamble, M, Goble, C, Michaelides, D, Missier, P, Owen, S, Newman, D, De Roure, D and Sufi, S 2013 
Why Linked Data is Not Enough for Scientists. Future Generation Computer Systems, 29(2): 599–611. DOI: 
https://doi.org/10.1016/j.future.2011.08.004

Belhajjame, K, Zhao, J, Garijo, D, Gamble, M, Hettne, K, Palma, R, Mina, E, Corcho, O, Gómez-Pérez, J M, 
Bechhofer, S, Klyne, G and Goble, C 2015 (May) Using a suite of ontologies for preserving workflow-
centric research objects. Journal of Web Semantics, 32: 16–42. DOI: 
https://doi.org/10.1016/j.websem.2015.01.003

Bodard, G and Romanello, M (eds.) 2016 Digital Classics Outside the Echo-Chamber. London: Ubiquity 
Press.

Dinsman, M 2016 (April 24) The Digital in the Humanities: An Interview with Laura Mandell - Los Angeles 
Review of Books. Available at: 
https://lareviewofbooks.org/article/digital-humanities-interview-laura-mandell/.

Dombrowski, Q 2014 What Ever Happened to Project Bamboo? Literary and Linguistic Computing, 29(3): 
326–339. DOI: https://doi.org/10.1093/llc/fqu026

Gorman, R and Gorman, V forthcoming Approaching questions of text reuse in Ancient Greek using  
computational syntactic stylometry. Open Linguistics Topical Issue on Treebanking and Ancient  
Languages.

Hardt, D (ed.) 2012 The OAuth 2.0 Authorization Framework, RFC 6749. Available at: 
http://www.rfc-editor.org/info/rfc6749. DOI: https://doi.org/10.17487/RFC6749

Hilton, J L 2014 Enter Unizin. EDUCAUSE Review, 49(5).

Lagos, N and Vion-Dury, J-Y 2016 (September 13–16) Digital Preservation Based on Contextualized 
Dependencies. Doc Eng. Available at: 
http://www.xrce.xerox.com/content/download/93294/1307736/file/2016-031.pdf.

Loscio, B F, Burle, C and Calegari, N 2016 (30 August) W3C. 2016 Data on the Web Best Practices. W3C 
Candidate Recommendation. Available at: https://www.w3.org/TR/2016/CR-dwbp-20160830/.

Lossau, N 2012 An Overview of Research Infrastructures in Europe - and Recommendations to LIBER. LIBER 
Quarterly, 21(3–4): 313–329. DOI: https://doi.org/10.18352/lq.8028

Padilla, T 2016 Humanities Data in the Library: Integrity, Form, Access. D-Lib Magazine, 22(3/4). DOI: 
https://doi.org/10.1045/march2016-padilla

Parsons, M 2015 (22 September) e-Infrastructures & RDA for data intensive science. Available at: 
https://rd-alliance.org/sites/default/files/attachment/Infrastructures,%20relationship,%20trust%20and%20RDA_MarkParsons.pdf.

Rios, F 2016 The Pathways of Research Software Preservation: An Educational and Planning Resource for 
Service Development. D-Lib Magazine, 22(7/8). DOI: https://doi.org/10.1045/july2016-rios

Rios, F and Almas, B 2016 Preserving Digital Scholarship in Perseids: An Exploration. Blog Post. DOI: 
https://doi.org/10.5281/zenodo.159569

Smith, N and Blackwell, C W 2012 Four URLs, limitless apps: Separation of concerns in the Homer Multitext 
architecture. In Donum natalicium digitaliter confectum Gregorio Nagy septuagenario a discipulis collegis 
familiaribus oblatum: A Virtual Birthday Gift Presented to Gregory Nagy on Turning Seventy by His 
Students, Colleagues, and Friends. Boston: The Center of Hellenic Studies of Harvard University.

Sun, S, Lannom, L and Boesch, B 2003 Handle System Overview, RFC 3650. Available at: 
http://www.rfc-editor.org/info/rfc3650. DOI: https://doi.org/10.17487/RFC3650

Vierros, M and Henriksson, E 2016 Preprocessing Greek Papyri for Linguistic Annotation. Hal-01279493. 
Preprint. Available at: https://hal.archives-ouvertes.fr/hal-01279493.

Projects, Websites, Software
Alpheios [WWW Document] n.d. Available at: http://alpheios.net/ (accessed 9.29.16).

Alpheios Alignment Editor [Software] n.d. Available at: 
https://github.com/alpheios-project/alignment-editor (accessed 9.29.16).

Arethusa [Software] n.d. Available at: https://github.com/alpheios-project/arethusa (accessed 9.29.16).

CapiTainS [WWW Document] n.d. Available at: http://capitains.github.io/ (accessed 9.29.16).

Digital Athenaeus - A digital edition of the Deipnosophists of Athenaeus of Naucratis [WWW 
Document] n.d. Available at: http://digitalathenaeus.org/ (accessed 9.29.16).

EpiDoc Guidelines 8.22 [WWW Document] n.d. Available at: http://www.stoa.org/epidoc/gl/latest/ 
(accessed 9.29.16).

Flask (A Python Microframework) [WWW Document] n.d. Available at: http://flask.pocoo.org/ (accessed 
11.8.16).

flask-github-proxy: Github proxy to push resource to github [Software] n.d. Available at: 
https://github.com/PonteIneptique/flask-github-proxy (accessed 9.29.16).

Hypothes.is [WWW Document] n.d. Available at: https://hypothes.is/ (accessed 9.29.16).

Journey of the Hero [WWW Document] n.d. Available at: http://perseids.org/sites/joth/#index (accessed 
9.29.16).

Morphological Analysis Service Contract Description - v1.1.1 [WWW Document] n.d. Available at: 
https://wikihub.berkeley.edu/display/pbamboo/Morphological+Analysis+Service+Contract+Description+-+v1.1.1 
(accessed 9.29.16).

OpenID Authentication 2.0 - Final [WWW Document] n.d. Available at: 
http://openid.net/specs/openid-authentication-2_0.html (accessed 11.3.16).

Pleiades Gazetteer [WWW Document] n.d. Available at: https://pleiades.stoa.org/ (accessed 9.29.16).

RECOGITO [WWW Document] n.d. Available at: http://pelagios.org/recogito (accessed 9.29.16).

Research Data Collections WG [WWW Document] n.d. Available at: 
https://rd-alliance.org/groups/pid-collections-wg.html (accessed 9.29.16).

Sematia [WWW Document] n.d. Available at: http://sematia.hum.helsinki.fi (accessed 9.29.16).

Shibboleth [WWW Document] n.d. Available at: https://shibboleth.net/ (accessed 11.3.16).

Standards for Networking Ancient Prosopographies [WWW Document] n.d. Available at: 
https://snapdrgn.net/ontology (accessed 9.29.16).

Syntactic Annotation Service Contract Description - v1.1.1 [WWW Document] n.d. Available at: 
https://wikihub.berkeley.edu/display/pbamboo/Syntactic+Annotation+Service+Contract+Description+-+v1.1.1 
(accessed 9.29.16).

Syriaca.org: The Syriac Reference Portal [WWW Document] n.d. Available at: http://syriaca.org/ (accessed 
9.29.16).

The Ancient Greek and Latin Dependency Treebank by PerseusDL [WWW Document] n.d. Available at: 
https://perseusdl.github.io/treebank_data/ (accessed 9.29.16).

The BagIt File Packaging Format (V0.97) [WWW Document] n.d. Available at: 
https://tools.ietf.org/html/draft-kunze-bagit-08 (accessed 9.29.16).

How to cite this article: Almas, B 2017 Perseids: Experimenting with Infrastructure for Creating and Sharing 
Research Data in the Digital Humanities. Data Science Journal, 16: 19, pp. 1–17, DOI: 
https://doi.org/10.5334/dsj-2017-019

Submitted: 10 November 2016       Accepted: 17 March 2017       Published: 18 April 2017

Copyright: © 2017 The Author(s). This is an open-access article distributed under the terms of the Creative 
Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and 
reproduction in any medium, provided the original author and source are credited. See 
http://creativecommons.org/licenses/by/4.0/.
 

OPEN ACCESS Data Science Journal is a peer-reviewed open access journal published by Ubiquity Press.
