RESEARCH ARTICLE

The ‘assertive edition’
On the consequences of digital methods in scholarly editing for historians

Georg Vogeler

Published online: 11 May 2019
© Springer Nature Switzerland AG 2019

Abstract
The paper describes the special interest among historians in scholarly editing and the resulting editorial practice, in contrast to the methods applied by purely philological textual criticism. The interest in historical ‘facts’ suggests methods whose goal is to create formal representations of the information conveyed by the text in structured databases. This can be achieved with RDF representations of statements extracted from the text, by automatic information extraction methods or by hand. The paper suggests the use of embedded RDF representations in TEI markup, following the practice in several recent projects, and it concludes with a proposal for a definition of the ‘assertive edition’.

Keywords Digital scholarly edition · History · RDF (Resource Description Framework) · Semantic web · Historical documents · Critical edition · TEI (Text Encoding Initiative)

1 Introduction

The approach to scholarly editing used by historians differs from the approach used by literary scholars. Both historians and literary scholars share an interest in a good text created by textual criticism, as texts are the main sources on which historians draw in their constructions of narratives about history. Nevertheless, historians can have a slightly different approach to text: linguistic and physical aspects are considered mere intermediaries to the information conveyed by the text. Historians consider the content of the text ‘data’, and they want to use this data in their research to gain knowledge about the past.
The circumstances under which archival documentation, a major type of text with which historians work, was created support this perception of text: people recorded administrative activities in text to preserve information about these activities for contemporary but absent clerks or for future clerks. In other words, they stored data in texts written on paper. In pre-digital editorial practice this can lead to decisions which are unacceptable to literary scholars, such as paraphrasing parts of the text.

I will try to show that in digital scholarly editing the approach to editing used by historians can be reconciled with methods of textual scholarship. I suggest calling this combined method ‘assertive editing’ to avoid the impression that this method can only be used by historians. The method of assertive editing is not defined by disciplinary interests but by an interest in one facet of text: the information recorded. In terms of Patrick Sahle’s text wheel (Sahle 2013:III,45-49), the assertive edition is the editorial practice dedicated to the ‘text as content’ perspective. In what follows I will usually contrast this ‘content’ with the ‘text’ as pure transcription and the result of text-critical work.

Georg Vogeler (georg.vogeler@uni-graz.at), Zentrum für Informationsmodellierung, Universität Graz, Graz, Austria
International Journal of Digital Humanities (2019) 1:309–322, https://doi.org/10.1007/s42803-019-00025-5

2 Contributions to the assertive edition

Assertive editing is fed by two streams in pre-digital and early digital scholarship. The ideas of content-oriented navigation, the possibility of multiple forms of representation, and extensive historical commentary are drawn from pre-digital editorial practice in historical research.
I will try to show this by presenting three major German printed historical editorial series: the Monumenta Germaniae Historica (MGH), the Records of the Early Modern Imperial Diet (“Reichstagsakten”), and the Official Minutes from the Imperial Chancellery (“Akten der Reichskanzlei”: Bundesarchiv 1982).

2.1 Pre-digital contributions

To facilitate navigation and reception, the editions in the MGH Diplomata series prepend abstracts of the legal core to each document. This is common practice in European charter editions, and it was codified by a committee of historical editors under the direction of Robert-Henri Bautier in 1974 (Bautier 1976:13, 17). More recent Diplomata series editions dedicate a paragraph in the introduction of each document to the historical context (e.g. the charters of Emperor Frederic II by Koch 2002-2017). Some MGH editions of historiographical texts indicate in the margins the year to which the current text refers (e.g. Georg Waitz’s edition of the Historia Danorum Roskildensi, 1892:21-26). This helps readers find the events in which they are interested.

Of course, abstracts serve more purposes than simple navigation. In the editions of the Imperial Diets, abstracts replace some of the documents (e.g. Heil 2014, 87-91). Dietmar Heil describes the interest of the editors: “The priority is … philological authenticity, but optimal accessibility” (2015, 29, trans. Georg Vogeler). They reduce historical orthography and change punctuation when it deviates from the modern syntactical analysis of the text (Heil 2015, 29-31). Editors of correspondence have also considered this approach (Steinecke 1982).

Editorial work in contemporary history is defined by the selection of significant material and the contextualization of the text. The editors of the Minutes of the Cabinet of the German Federal Government (Bundesarchiv 1982), for instance, explain their selection by relevance of content, discarding as irrelevant, for example, the agenda at the head of each minute, invitations, and their attachments. This content-oriented approach can be found in other editorial principles of this edition. Orthographic and syntactic errors, for instance, are emended without notice. Single entries start with a heading, the persons present, and the place and time of the meeting, not as a verbatim copy but as an extract created by the editors.

The Minutes also serve as an example of the third element of pre-digital editorial practice. They add extensive notes on the subjects of the meetings to each transcript with the aim of making the texts understandable. This kind of annotation is not specific to this one edition, but is generally recommended in historical editing (Cullen 1981; Stevens and Burg 1997, 157). The edition of the Minutes of the Bundeskabinett serves primarily to illuminate government decisions, rather than their wording. Similarly, many MGH editors add extensive comments on the historical context, e.g. in the pre-publication of anonymous continuations of Frutolf 1101-1106 (Marxreiter 2018).

These approaches have been directly transferred into electronic editions. The idea of facilitating the understanding of the text accepts translations as a way of editing. This leads to solutions like David Postles’ online representation of Stubbington medieval records (Postles 2011), which gives the text in a translation of the original Latin. This is not an isolated practice, as P.D.A. Harvey discusses in his introduction to historical translation as a method in editing (2001, 31-32). From a historian’s point of view, a translation is a sensible solution, as it facilitates the use of the document. It would not satisfy the research interests of textual scholars.

Paul D.A.
Harvey argues that the edition of historical records can be reduced to a calendar of abstracts when the original or photocopies of the records are easily accessible (2001, 56-59). Several projects follow an approach of this kind. The Records of the Swiss Foreign Office (Zala et al. 1978–2018) replaces the transcription with images. This calendar plus image approach is also used by the Soundtoll registers (Veluwenkamp and van der Woude 2009; Gøbel 2010) and by Peter Rauscher and his colleagues in the Donauhandel project (2008-2018). Both create databases with structured information directly from the source and link it to images of the source.

2.2 Early digital contributions

Historians’ interest in the ‘facts’ and the dominance of sociological approaches to history from the 1960s to the 1980s led them to create ‘databases’ of historical information (Boonstra et al. 2004). A famous example of this approach is the Online Catasto of 1427 (Herlihy et al. 2002), an online edition created by R. Burr Litchfield and Anthony Molho based upon David Herlihy and Christiane Klapisch-Zuber’s project Census and Property Survey of Florentine Dominions in the Province of Tuscany, 1427-1480 (1978; Herlihy 1964; Herlihy 1967). The data keeps close to the source, copying the information on wealth recorded for each taxable household in the city (as it is found in the initial tax declarations of 1427 plus additions and adjustments made in 1428 and 1429).

Seeing historical records as an accidental medial solution to preserve and process information, one could consider this database a simple change in recording medium, not in the information itself. Yet the needs of the recording medium require substantial changes in the recording method. Herlihy and Klapisch-Zuber had to create new encodings and had to break the text rigorously into table columns.
In the end, the database tries to recreate the information recorded by the Florentine officials, addressing three essential questions: who had to pay what amount of taxes for which kind of property. Philological editors certainly cannot consider this database an edition. The encoders did not copy family names, given names, and patronymics letter by letter, but standardized them and truncated them when they went beyond ten letters.

Historians were well aware of the modifications that database encoding made to the original records. In the 1970s, however, digital scholarly editing was not yet developed enough to provide a solution. The concept of scholarly editing does not even appear in the more recent book on Historical Information Science by Lawrence McCrank (2002). At the time, computing methods in the historical sciences chiefly meant the production of relational databases and spreadsheets.

In the 1980s, Manfred Thaller proposed a historical database system that kept closer to the original source (1980, 1988, 1992, 1993). He developed the Clio database as a ‘source oriented’ database. It would reduce the amount of encoding and transformation of the source that was customary at the time. Clio kept as much information from the source as possible by allowing for hierarchical organization of information, better representation of incomplete data, and integration of alternatives and comments. This source-oriented database approach is clearly a type of editorial work, combining text from the source with interpretation by and for historians. At the same time, a philologist would regret the lack of a full transcription.

3 Digital editions and facts

Digital scholarly editing has developed since the days of Clio and has built upon the methods developed for the MGH, the Reichstagsakten, and the Akten der Reichskanzlei.
The assertive edition developing out of these strands is something between pure textual representations and well-formed databases structured around specific research questions. No edition yet calls itself an assertive edition, but many bear features that fit the definition put forward here. A selection may be found by searching Patrick Sahle’s catalogue for “general subject area: history” (Sahle 2008–2017). Browsing through the projects on the list, one can identify four major questions:

1. Which interface elements are typical for an assertive edition?
2. How can we use automatic information extraction processes in the scholarly edition?
3. Is semantic markup (provided by the TEI) sufficient?
4. How can we integrate the Web of Data (the ‘Semantic Web’) into scholarly editions?

3.1 Interface elements

Editions like the letters of Alfred Escher (Jung 2012–2018), the Acta Pacis Westfalica (Lanzinner and Braun 2014), and the Diplomatic Correspondence of Thomas Bodley 1585-1597 (Adams 2011) offer avenues of access to the text beyond the pre-existing textual structure. Typically, tools include indices of persons, places, and subject keywords. Other entry points to the texts show better what an assertive scholarly edition would concern itself with: APW, for instance, gives access via a timeline of events, a calendar of relevant dates, and a map. Indeed, indices of persons, places, and events as well as calendars and maps are fast becoming default components of historical digital editions.

Additional fact-oriented interface elements seem to depend more on the type of documents edited: rich prosopographical information, such as that found in correspondence, suggests using network visualisations, for instance in the diplomatic correspondence of Thomas Bodley (Adams 2011, visualisations). Economic information suggests the use of bar charts to visualize income and expenditure, as in the case of the edition of the municipal accounts of Basel 1535-1611 (Burghartz 2015, Konten).
The latter builds upon the source-oriented database approach advocated by Manfred Thaller by allowing the user to select entries from the accounts and collect them in a ‘data basket’ (Burghartz 2015, databasket). This allows the user to perform basic arithmetic operations and download the results as a spreadsheet. Finally, semantic networks like those used in Burkhardt Source (Ghelardi et al. 2015) hold some general promise, but for the moment they remain isolated solutions for single projects.

3.2 Information extraction

The user interfaces, of course, are only the surface of the edition. How does one harvest information? What form does the information take as digital data? Which models relate the information to the transcription?

One approach to data harvesting from texts is automatic information extraction. Computational linguists have been working on this since the 1950s. Their goal is to reduce free prose text to answers to the question “Who did what to whom and when?” and to represent these answers in a structured way. A typical information extraction pipeline starts with generic Natural Language Processing steps and then uses Named Entity Recognition to mark up the words representing persons, locations, organizations, temporal data, and quantifying data. The pipeline then relates these entities to one another, building connections between them. These connections can take the form of predicates in sentences, coreferences by pronouns, etc. The possible relationships can be inferred from external knowledge about the domain, like dates of birth and death for people mentioned in a text, or they can result from semantic roles, as inferred from the predicate of a sentence. The task is very domain-specific, as it depends on what type of information is considered relevant. A typical task for historical research could be event extraction, which is already applied to automatic news analysis (see Grishman 2015 for a general introduction).
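The pipeline described above can be illustrated with a deliberately small sketch. It is not taken from any of the projects cited here: the gazetteers, the verb list, the sample names, and the `Statement` record are hypothetical stand-ins for the trained models and authority files a real project would need, and the regular expressions only cover four-digit years and two currency words. The sketch merely shows how recognised entities can be related to one another to answer “Who did what to whom and when?”.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical gazetteers and predicate list, invented for this illustration.
PERSONS = {"Hans Müller", "Anna Weber"}
PLACES = {"Basel", "Graz"}
VERBS = {"paid", "sold", "granted"}

YEAR_RE = re.compile(r"\b(1[0-9]{3})\b")                # years 1000-1999
AMOUNT_RE = re.compile(r"\b(\d+)\s+(florins|pounds)\b")  # quantity plus unit


@dataclass(frozen=True)
class Statement:
    """Structured answer to 'who did what to whom, where and when'."""
    subject: Optional[str]
    predicate: Optional[str]
    obj: Optional[str]
    place: Optional[str]
    year: Optional[int]
    amount: Optional[Tuple[int, str]]


def find_entities(sentence: str, gazetteer: Set[str]) -> List[str]:
    """Step 1 (toy named entity recognition): gazetteer hits in text order."""
    hits = [(sentence.find(name), name) for name in gazetteer if name in sentence]
    return [name for _, name in sorted(hits)]


def extract(sentence: str) -> Statement:
    """Step 2 (toy relation building): first person acts on the second."""
    persons = find_entities(sentence, PERSONS)
    places = find_entities(sentence, PLACES)
    year = YEAR_RE.search(sentence)
    amount = AMOUNT_RE.search(sentence)
    words = re.findall(r"\w+", sentence)
    predicate = next((w for w in words if w in VERBS), None)
    return Statement(
        subject=persons[0] if persons else None,
        predicate=predicate,
        obj=persons[1] if len(persons) > 1 else None,
        place=places[0] if places else None,
        year=int(year.group(1)) if year else None,
        amount=(int(amount.group(1)), amount.group(2)) if amount else None,
    )


if __name__ == "__main__":
    print(extract("In 1535 Hans Müller paid 12 florins to Anna Weber in Basel."))
```

The assumption that the first person named is the agent and the second the recipient is exactly the kind of shortcut that fails on real historical prose, which is why the domain knowledge and semantic role analysis mentioned above are needed in practice.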
Recent projects dealing with US foreign affairs records have applied this approach to transcripts of archival documents. They take the historical records as source data without any intermediate scholarly processing. Using OCR to create a digital representation of the text, scholars then apply distant reading methods like topic modelling or information extraction to this corpus (e.g. Kaufmann 2014–2018). Gao et al. (2017) have even used the electronic texts of the cables from the 1970s for their computer-based analysis.

The aim of implementing this approach in scholarly editions would be to create a reliable text with classical textual criticism and to extract from this text the information for historians. Existing information extraction methods are built for modern texts, and thus either they have to be modified to be applicable to historical texts, or historical texts have to be modified to come closer to modern texts. Piotrowski (2012) has described the many challenges in this task. Some progress has been made in the handling of variants in historical language, for instance by Bryan Jurish (2008, 2010, 2011, 2013) or Kestemont et al. (2017). However, most of the problems remain to be solved. Scholarly editors still have to rely on their own competence and on human labour for the introduction of substantial knowledge about what people in the past wrote in their texts.

3.3 TEI and semantic markup

The problems computers still have with historical languages led to the decision to create manually annotated texts. Digital editions use the Extensible Markup Language (XML) to add semantic markup to texts. This is made possible in particular by the strong connection between the community maintaining the guidelines of the Text Encoding Initiative (TEI) and the community of digital scholarly editors.
TEI provides semantic annotation for many phenomena interesting to historians: names of persons, locations, or organizations can be encoded as , temporal expressions as and