Theadok - Theaterdokumentation
Collecting metadata of performances
Klaus Illmayer
IFTR 2018 – Belgrade
Slides are licensed, if not stated otherwise, as CC BY 4.0 (Creator: Klaus Illmayer): https://creativecommons.org/licenses/by/4.0/

What is Theadok?
● Find out at: https://theadok.at
● Aim of Theadok
○ Collecting metadata of performances
○ Current focus: performances of dance and theatre in Austria between 1945 and 2001
○ Gathering new data
● History of Theadok
○ Material on performances (esp. theatre reviews) has been collected since the establishment of the Department of Theatre Studies at Vienna University in 1943
○ In the 1970s: first attempts to create a database out of this material (and also to connect with other collections at other departments)
○ Around 2000: OpenTheadok, a first web version of Theadok (based on the CD "50 years of theatre in Austria")
○ Since 2015: redesign of data model and website

Frontpage of Theadok: https://theadok.at (all screenshots taken on July 11, 2018)

Frontpage of Theadok (scrolling down): https://theadok.at

Example of a search result: Searching for "radovic"
https://theadok.at/search_thd?search_api_fulltext=radovic

Dataset on a person found by the search: https://theadok.at/person/82719 (see identifier and references to authority files)

Dataset on a person (scrolling down): https://theadok.at/person/82719 (additional information comes from the authority file GND; connections to works that are present in Theadok are also shown)
GND: http://www.dnb.de/EN/Standardisierung/GND/gnd_node.html

Dataset on a work where the person was involved: https://theadok.at/work/33378

Dataset on a work (scrolling down): https://theadok.at/work/33378 (person is author of the work; see also the relation to a performance of the work that is registered
in Theadok)

Dataset on a performance related to the work: https://theadok.at/performance/4722 (see the fields in the group "Relations")

Dataset on a performance (scrolling down): https://theadok.at/performance/4722

Data model (simplified)

Comparable projects
There are quite a few performance-focused databases; here are some lists:
● Nic Leonhardt: Digital Humanities and the Performing Arts: Building Communities, Creating Knowledge, 2014. https://f-origin.hypotheses.org/wp-content/blogs.dir/1944/files/2014/09/Nic-Leonhardt_DH-and-the-Performing-Arts_June-2014.pdf
● Vincent Baptist: Inventory of European Performing Arts Data Projects, 2017. https://public.tableau.com/profile/v.baptist#!/vizhome/InventoryofEuropeanPerformingArtsDataProjects_0/InventoryofEuropeanPerformingArtsDataProjects
● Klaus Illmayer et al.: Zotero group “Digital Humanities in Theatre, Film, and Media Studies”, 2016-ongoing. https://www.zotero.org/groups/494335/digital_humanities_in_theatre_film_and_media_studies?
Collections of performance metadata have long existed on paper (fruitful next step: get them into databases).

Why another performance database?
It is based on an older database → this implies a specific data model
Lack of tools/databases that can be shared/used easily → maybe better to have different instances and domains of databases
Paradigm change: connecting databases is more important than using the same database

Connecting data, e.g.
between Theadok and the archive at the Department of Theatre, Film, and Media Studies at Vienna University
● Different domains
● Theadok: performance oriented
● tfm Archive: material oriented
● Example of such a connection: Anatol ("has material in archive") https://theadok.at/performance/15893

Example of a connection between databases: https://theadok.at/performance/15893 (look at Materials, "has material in archive")

Connection between databases: related material to a performance in the tfm archive
https://archiv-tfm.univie.ac.at/record-set/680

Connection between databases (scrolling down): https://archiv-tfm.univie.ac.at/record-set/680 (see "is associated with" in the "Concept/Thing relations" group)

What to collect in a performance database?
Different possibilities: material on performances / metadata on performances / etc.
Theadok collects metadata because:
- it has done so before
- it focuses on structured data, connecting entities
- it specializes in metadata, but references other sources via links
Metadata on performances as research data
Difficulties: data quality, establishing references, connecting via identifiers; tools for (semi-)automatic connection are necessary on both sides

Performance metadata as research data
● Theater researchers need to understand (meta)data on performance as research data
● Such data should be converted into digital structured data …
… and it should be shared with others
● Theadok as a platform that enables researchers to put their data into digital collections and (let others) re-use it
● Enrich this data with additional information, research results, data from other sources
● Combine theater research methods with digital methods

Research data life cycle
● How to gather research data and how to further work on this data, e.g. collections in Theadok.
● Example of a research data life cycle: UK Data Archive life cycle model
Copyright of graphic and related text: University of Essex, University of Manchester and Jisc
https://www.ukdataservice.ac.uk/manage-data/lifecycle

FAIR data principles
● Work in progress: applying the FAIR data principles to Theadok, https://www.go-fair.org/fair-principles/
Graphic by SangyaPundir (Own work) [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons: https://commons.wikimedia.org/wiki/File%3AFAIR_data_principles.jpg

Ongoing effort to enable interoperability, e.g. how to connect data of IbsenStage with Theadok?
Example: Search for the Austrian "Burgtheater" in IbsenStage
https://ibsenstage.hf.uio.no/search?searchwords=burgtheater&type=venues&restrictyear=&submit=Search

Enabling interoperability: Search for "Burgtheater" in Theadok
The two databases hold different information; it would be interesting to combine them
https://theadok.at/search_thd?search_api_fulltext=burgtheater

Enabling interoperability: Performances of works by Ibsen at Burgtheater (IbsenStage): https://ibsenstage.hf.uio.no/pages/venue/14147

Enabling interoperability: Performances of Gespenster by Ibsen at Burgtheater (Theadok): https://theadok.at/stage/50296 (choosing Operator "Contains" and Title "Gespenster")

Enabling interoperability: Details on a performance of Ibsen's Gespenster at Burgtheater (IbsenStage): https://ibsenstage.hf.uio.no/pages/event/88351

Enabling interoperability: Details on a performance of Ibsen's Gespenster at Burgtheater (Theadok): https://theadok.at/performance/11649 (see "same as" in group "Relations"). The two datasets also differ in the details they record on the performance. Linking datasets is a first step; the next step would be to share data.
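The "same as" relation between the two performance datasets can be used to line up their fields and see where the details differ. The following is a minimal Python sketch under stated assumptions: the `Record` class, the field names (`title`, `venue`), and the placeholder values are hypothetical and do not reflect the actual Theadok or IbsenStage schemas. Only the two URLs and the title "Gespenster" come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """A dataset from one database (hypothetical structure)."""
    uri: str
    fields: dict[str, str]
    same_as: list[str] = field(default_factory=list)


def combine(a: Record, b: Record) -> dict[str, dict[str, str]]:
    """Collect the fields of two linked records, keyed by field name and source URI."""
    if b.uri not in a.same_as and a.uri not in b.same_as:
        raise ValueError("records are not linked by a 'same as' relation")
    combined: dict[str, dict[str, str]] = {}
    for record in (a, b):
        for name, value in record.fields.items():
            combined.setdefault(name, {})[record.uri] = value
    return combined


def differing_fields(combined: dict[str, dict[str, str]]) -> list[str]:
    """Names of fields for which the sources give different values."""
    return sorted(
        name for name, by_source in combined.items()
        if len(set(by_source.values())) > 1
    )


# Placeholder values only; the real datasets hold other fields and values.
theadok = Record(
    uri="https://theadok.at/performance/11649",
    fields={"title": "Gespenster", "venue": "<venue as recorded in Theadok>"},
    same_as=["https://ibsenstage.hf.uio.no/pages/event/88351"],
)
ibsenstage = Record(
    uri="https://ibsenstage.hf.uio.no/pages/event/88351",
    fields={"title": "Gespenster", "venue": "<venue as recorded in IbsenStage>"},
)

combined = combine(theadok, ibsenstage)
print(differing_fields(combined))
```

Keeping each value keyed by its source URI preserves provenance, so a later step of sharing data does not have to decide prematurely which database is "right".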
Theadok as part of a digital infrastructure
How to introduce such an infrastructure for theater research projects?
● We need to establish better communication between databases!
● Use of APIs (Application Programming Interfaces): for getting data in a structured, machine-readable format and for (semi-)automatic analysis and data exchange
● Use of vocabularies (see: https://vocabs.acdh.oeaw.ac.at/en/): agreements on the terms, concepts, and methods used; they need not be the same, but we need to formalize similarities (see also: Jonathan Bollen: Data Models for Theatre Research: People, Places, and Performance. In: Theatre Journal 68, 2016, 615-632, DOI: https://doi.org/10.1353/tj.2016.0109)
● We need identifier services for performance data (like DOI for documents), to find and connect similar data sets (see also: Miguel Escobar Varela and Nala H. Lee: Language documentation: a reference point for theatre and performance archives? In: International Journal of Performance Arts and Digital Media, 2018, DOI: https://doi.org/10.1080/14794713.2018.145342)
● Documentation and sharing of the data models of databases is crucial. Abstractions of data models as ontologies are helpful: they can help to map data between domains, e.g. the Swiss Performing Arts Data Model: https://old.datahub.io/dataset/spa-data

Better communication via APIs, e.g.
Theadok API with JSON output

Connect research data infrastructure on different levels
Example of connecting research data infrastructures: PARTHENOS
See: http://www.parthenos-project.eu/
"PARTHENOS aims at strengthening the cohesion of research in the broad sector of Linguistic Studies, Humanities, Cultural Heritage, History, Archaeology and related fields through a thematic cluster of European Research Infrastructures, integrating initiatives, e-infrastructures and other world-class infrastructures, and building bridges between different, although tightly, interrelated fields. PARTHENOS will achieve this objective through the definition and support of common standards, the coordination of joint activities, the harmonization of policy definition and implementation, and the development of pooled services and of shared solutions to the same problems."
PARTHENOS gives support to research communities, but it also needs research communities to share their data.

What to do next?
● Infrastructure for an identifier service/authority file for entities that are specific to theater research
● Establish and discuss vocabularies
● Build up mappings to the databases that allow such connections
● Recommendations for data models, technical solutions
● Best practices, especially methods and use cases that a database should be able to handle
● Support for connecting datasets between databases
● Linked open data endpoints for complex queries
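As a sketch of what the JSON output on the API slide makes possible: once person datasets are available as JSON, their authority-file references can serve as shared identifiers for connecting databases. The payload below is hypothetical. The field names (`authority_refs`, `works`) are assumptions, not the actual Theadok API schema, and the GND identifier is a placeholder. Only the two theadok.at URLs are taken from the slides.

```python
import json

# Hypothetical payload; field names are assumptions, not the Theadok API schema.
SAMPLE = """
{
  "id": "https://theadok.at/person/82719",
  "type": "person",
  "authority_refs": [
    {"source": "GND", "uri": "https://d-nb.info/gnd/PLACEHOLDER"}
  ],
  "works": ["https://theadok.at/work/33378"]
}
"""


def authority_index(records):
    """Map each authority URI to the ids of the local records that reference it."""
    index = {}
    for rec in records:
        for ref in rec.get("authority_refs", []):
            index.setdefault(ref["uri"], []).append(rec["id"])
    return index


record = json.loads(SAMPLE)
index = authority_index([record])
for authority_uri, local_ids in index.items():
    print(authority_uri, "->", local_ids)
```

Two databases that each build such an index over their own records could find candidate matches by intersecting the authority URIs, which is one way an identifier service would support (semi-)automatic connection.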