The charter is the foundational document that describes the rationale, goals, plan of work, resources needed, terms and conditions, and outcomes of a Center for Digital Humanities at Princeton (hereafter CDH) project. Charters are written by core members of a project team in a series of planning meetings taking place over the course of a month. The planning process is intensive and collaborative, and it requires substantial input from everyone on a team. Charters serve as formalized agreements among all team members on such crucial questions as scope, technical design, infrastructural needs, and success criteria.

A draft of each project charter is peer-reviewed by all CDH staff, and optionally by additional partners or stakeholders, at a “design review” before the start of project work. The draft is circulated at least one week before the review for an open comment period. Questions and concerns from this period may be raised at the design review. Project teams have two weeks after the design review to address any issues raised and make any requested changes. Project work only begins (and funds are released) once the charter has been finalized and signed by the Project Director (PI) and the CDH Faculty Director. Charters are amended as necessary throughout the project lifecycle to document major changes and to note when the “Built by CDH” Software Warranty and the “Built by CDH” Long Term Service Agreement take effect, and they serve as part of the CDH project archive.

CDH charters and their planning documents exist in several forms as we have refined them over the years and tailored them to the several types of projects we have supported. For more about CDH project management, including the charter process, visit: https://cdh.princeton.edu/research/project-management/

Cite this document: Belcher, Wendy Laura, Rebecca Sutton Koeser, Rebecca Munson, Gissoo Doroudian, and Meredith Martin. CDH Project Charter — Princeton Ethiopian Miracles of Mary 2019-20. Center for Digital Humanities at Princeton. 2019. http://doi.org/10.5281/zenodo.3359178

PEMM Charter (2019-20)

Part I: Project Overview

Stories have been told for almost two millennia about the Virgin Mary, the mother of Christ, and the miracles she has performed for the faithful who call upon her name. One of the most important collections of such folktales is the body of over 700 Ethiopian Marian miracles, written from the 1300s through the 1900s, in the ancient African language of Gəˁəz (also known as classical Ethiopic). These story collections, called the Täˀammərä Maryam (The Miracles of Mary), are central not only to the ancient church liturgy of Ethiopia, but to the daily felt and religious life of 50 million Ethiopians and Eritreans. Princeton University has in its Firestone Library one of the largest and finest collections of Marian miracle manuscripts anywhere in the world outside of Ethiopia, with over 130 codices and hundreds of textual amulets. Worldwide, at least 100,000 Täˀammərä Maryam manuscripts exist, some with just a handful of stories, some with hundreds, and many with different versions of the same stories.

While the Täˀammərä Maryam is one of the most important African archives of texts, basic information about it and its stories is lacking; as a result, scholars can authoritatively state almost nothing about them. How many are there? When was each written? What themes do they have? Have these African stories grown and changed across regions, languages, and periods? The Princeton Ethiopian Miracles of Mary (PEMM) project will collect and collate information about these hundreds of stories across hundreds of manuscripts as the basis for an open access resource that will enable researchers and Ethiopian community members around the world to conduct in-depth research on this vital corpus.
Wendy Laura Belcher, professor of African literature in the departments of Comparative Literature and African American Studies, will serve as the project’s Principal Investigator (PI). PEMM was begun with a Center for Digital Humanities at Princeton (CDH) Dataset Curation grant.

Description and Objectives

With the guidance of the CDH, the PEMM project team will collect and collate data about hundreds of Marian miracle stories in hundreds of Ethiopian manuscripts. Our aim is to enable computational analysis of this vital corpus of African folktales and to generate answers about their number, dating, origin, provenance, themes, recensions, translations, sources, placement, and diachronic change. The CDH and PEMM will design a robust data structure to migrate, store, connect, validate, and query the data. We will discuss a preliminary web interface with sample data visualizations that will make that data available to scholars in the United States, Europe, and Ethiopia; this interface may not be possible in AY20.

Relevant Resources and Projects

PEMM builds on the work of previous scholars, using resources created by others. The most important for PEMM are the Macomber Handlist and the finding aids of the Princeton University Rare Books and Special Collections. We hope to reference the manuscripts in the British Library. We will also make use of and share data with both the Oxford Cantigas de Santa Maria (CSM) database and the University of Hamburg Beta Maṣāḥǝft (Hamburg BM) Database with the intention of avoiding overlap as much as possible. For complete information about what has been collected and catalogued thus far about the Täˀammərä Maryam, please see Appendix A.

Research Questions

We are collecting and collating data in AY20 in order to answer three main research questions.
● How many Ethiopian Marian miracle tales are there? Scholars have not been able to arrive at an accurate count of the Ethiopian Marian tales despite a century of labor on the issue. One scholar says 540, another says 643; a current database project (the Hamburg BM) has likely identified over 700. Meanwhile, Princeton has some stories in its manuscripts that appear on none of those lists. PEMM has access to what those scholars did not have: thousands of digitized manuscripts (instead of dozens) and sophisticated ways of curating and analyzing the data in those manuscripts (see below).
● What are the themes of the Ethiopian Marian miracle tales? Macomber’s 1980s catalog provided keywords for some of the tales, but many of the terms were dated, insufficient, and inconsistently applied. By enhancing the dataset with better keywords, and by refining, standardizing, and consistently applying keywords from a controlled vocabulary, PEMM will give scholars access to an accurate dataset of tale themes and ways of studying how tales correlate with those keywords. We will build a controlled vocabulary by combining and refining the vocabularies of Macomber’s handlist, the Hamburg BM, and the Oxford CSM, and by consulting the Index of Medieval Art.
● What is the origin of each Ethiopian Marian miracle tale? No one has clearly established which of the tales were originally from Europe or the Middle East. Some say only 33 of them, others say 75 of them, but no one has done the work to be certain. This matters not only to Ethiopianists, but to scholars working on the European Marian tales. Our work correlating the Ethiopian Marian tales with the tales in the Oxford CSM database may enable scholars to discern patterns across, and to analyze, indigenous, European, and Middle Eastern Marian tales.
Project Significance

Significance for African and Literary Studies

PEMM is a historic project with a range of scholarly contributions:
● Makes a disciplinary contribution: These folktales about the Virgin Mary are rich repositories of cultural knowledge and literary practice, providing a matchless comparative literature site to study tales across continents, languages, and periods. Comparative literature remains a largely Eurocentric discipline and the Marian miracle tales have seldom been studied outside of their European iterations. PEMM provides a useful corrective to such limited approaches and does so through pairing two innovative comparative literature methodologies: distant reading and world literature.
● Fills a scholarly gap: African literature in general, and eight centuries of Ethiopian written literature in particular, are criminally understudied. PEMM will enable more scholars to do more research on African literature. It will also bring greater global visibility to this vital corpus through a web interface.
● Serves an underserved community: The number of digital humanities projects that focus on African literature is minuscule. Indeed, perhaps the only other initiative is the Programme in African Digital Humanities, 2018–2023, at the South African universities of Cape Town, Pretoria, Stellenbosch, Western Cape and the Witwatersrand, which “aims to examine the current forms and practices of reading and digital publishing in order to encourage and support self-directed, digital literary enquiries in the South African humanities environments.” PEMM is part of increasing the number of digital humanities projects that focus on Africa.
  ○ For instance, the 2019 annual Digital Humanities conference in Utrecht decided to have an Africa focus, working to give funding for Africans to attend and noting that African DH projects “cover the spectrum of DH topics in a somewhat different way than elsewhere.” However, only one panel was about Africa: “African Languages And Digital Humanities: Challenges And Solutions” (which has a linguistics focus). Only one of the paper presentations seemed to be about Africa. Isabelle Alice Zaugg’s “Global Language Justice in the Digital Sphere: The Ethiopic Case” is “an instrumental case study of Unicode inclusion and the development of supports for the Ethiopic script and its languages.” Thus, the DH focus seems to be on methods, underscoring the need for a project like PEMM that focuses on literature.
● Collects scattered information in one place. Ethiopian literature has been the subject of study for some centuries. There are large repositories of Ethiopian manuscripts inside and outside of Ethiopia, and massive cataloging and digitizing projects have been underway for the past sixty years. But little of this information is available online, in one place, in English, for computational analysis.
● Focuses on literature. Most projects that focus on Ethiopian manuscripts do not attend to literature. They are linguistic or philological in nature, focusing on manuscripts as material objects, and tend to prioritize biblical books. PEMM is focused on African stories and their themes, providing a useful corrective to the overemphasis on influence and apparatus and underemphasis on African thought and creativity in Ethiopian studies.
● Increases information about and access to stories. Of the tens of thousands of Täˀammərä Maryam in existence, only a few hundred have been cataloged, and only a few dozen of those have been cataloged with any detail; that is, naming the exact tales in that manuscript.
PEMM will increase the number of cataloged Täˀammərä Maryam manuscripts.
● Provides foundation for Belcher’s book. Belcher works on Ethiopian literature in general and has a book in progress on Täˀammərä Maryam, titled Ladder of Heaven: The Miracles of the Virgin Mary in Ethiopian Literature and Art. It is a book of literary analysis, which will appear with many gorgeous illuminations of the tales from Princeton’s manuscripts. Given the dearth of information about these stories, PEMM will provide a necessary basis for the writing of this book.

Significance for Digital Humanities

The CDH’s approach to this particular project will also serve to make existing tools and approaches more robust and more useful to Digital Humanities researchers who do not have the support of a development team.
● Google Sheets as a simplified relational database. We will develop and document a model for working with Google Sheets as a simplified relational database, and explore the possibilities offered by an exportable static site based on relational data exported from Google Sheets. Working with spreadsheets and Google Sheets is obviously not new for data work or for Digital Humanities. [1] However, it seems clear that there is a need for a data curation and management solution that sits somewhere between a spreadsheet and a relational database. [2] By applying CDH Development & Design Team skills and expertise, we will push these technologies forward in a way that will benefit others, including those doing data curation and graduate students working on their own Digital Humanities projects. Our approach will include structuring the data across multiple sheets as a simplified relational database, considering the spreadsheet as a user interface, providing enhanced functionality via scripting, and documenting the data structure and the implementation. We will also write scripts to automate data export, data validation, and reporting; these tools have the potential to be generalized for wider use.
● Furthers DH work with static website technologies. Using static website technologies, as championed by the Minimal Computing working group, is also not new for Digital Humanities, although it is new for a CDH sponsored project supported by the Development & Design Team. Our commitment to development best practices and documentation will help further work being done by others to make static sites more accessible to scholars. In addition, for this project the possibility of sharing the results of our work through alternate means is particularly appealing.
● Takes advantage of new initiatives. Scholars have access to only a fraction of Ethiopian manuscripts, as most are in remote monasteries. Only a few have been digitized; some estimates put the number as low as 10 percent of all Ethiopian manuscripts. Fortunately, over the last decade, we have seen a huge push to digitize, as Ethiopian manuscripts are globally recognized as an important endangered archive. PEMM takes advantage of this newly available archive.
● Strengthens partnerships with scholarly communities in Oxford and Hamburg by establishing a model for sharing data across different projects using different technologies.

[1] For example, see Matthew Lincoln’s post on using Google Sheets as part of a Getty data migration project. https://matthewlincoln.net/2018/03/26/best-practices-for-using-google-sheets-in-your-data-project.html
[2] See for instance the popularity of products like Airtable or the existence of projects like NodeGoat.

Audiences

The audiences for this project are multiple and overlapping.
● Scholarly audiences for the data: Scholars of Ethiopian literature (mostly scholars in Europe and North America, Ethiopian and non-Ethiopian) will find this information useful for doing their own research. In particular, three existing comparative projects will be interested in the data we produce: the Hamburg BM project, the Oxford CSM Database, and the Miracula Mariae project.
● Scholarly audiences for a public interface: Clerical scholars (including Ethiopian clerics); non-clerical scholars (scholars of Marian miracles); and students (undergraduate and graduate).
● Non-scholarly audiences for a public interface: Ethiopian priests interested in sources and themes for writing sermons.

Project Team

Project Director (a.k.a. Project PI): Professor Wendy Belcher
● Leads and champions project
● Learns enough about project’s technical components to be able to describe them on a basic level
● Oversees, participates in, and delegates project work
● Attends regular project meetings (approximately twice a month during periods of active development)
● Supervises and guides Project Manager; keeps open line of communication, responds to emails and questions in a timely manner; gives advance notice of any disruptions or PI unavailability
  ○ Support for PM may take the form of writing and commenting on charter, helping with project team coordination, approving project workflows, approving project publicity, or other duties TBD in consultation with PM, Technical Lead, and CDH Project Manager and Project Coordinator.
● If necessary, attends quarterly check-in with Technical Lead
● If necessary, submits quarterly data progress report to CDH
● Participates in acceptance testing on software development work
● Responsible for approving the work and progress of the data team
● Responsible for project budget, including overseeing payment of students
● Responsible for final project summary of accomplishments

Project Manager: Evgeniia Lambrinaki
● Maintains regular communication with team members, partners, and groups engaged in project work
● Helps to design and implement project workflows with PI approval
● Schedules and facilitates project check-in meetings (including creating an agenda); captures meeting notes
● Tracks progress on project goals and outcomes and communicates with CDH on project progress and/or issues
● Prepares and updates project documentation
● Responsible for overseeing the day-to-day work of the data team
● Responsible for acceptance testing on software features
● Responsible for project publicity (project page, blog) with PI approval

Technical Lead: Rebecca Sutton Koeser
● Oversees design and implementation of project’s technical aspects
● Acts as main CDH decision maker on project
● If necessary, modifies decisions about software tools and approach in order to more efficiently and effectively complete the project
● If needed, holds quarterly check-in meeting with PIs
● Has authority to make project decisions if PIs are unavailable
● Responsible for technical documentation at the conclusion of project

CDH Project Manager: Gissoo Doroudian
● Helps manage and coordinate development and design work
● Serves as a resource and point of contact for Project Manager
● Attends and helps create agenda for project meetings (with Project Manager)
● Helps design and implement project workflows
● Supplies periodic updates on project status and progress
● Decides in collaboration with Project Manager who will document meetings

User Experience (UX) Designer: Gissoo Doroudian
● Collaborates on data structure and architecture for project data and advises on configuration and customization for data entry user experience
● Recommends the appropriate types of data visualizations to try with the project data (diagrams/maps)
● Thinks through and conducts user research on access for target audiences specific to this project
● Consults on collaborative and iterative design for content structure and website architecture
● Helps iteratively design a usable and accessible interface

CDH Project Coordinator: Rebecca Munson
● Advises CDH Project Manager on coordinating development and design work
● Serves as a resource and point of contact for CDH Project Manager
● Advises on project workflows
● Supplies periodic updates on project status and progress
● When applicable, attends and helps to document project team meetings

CDH Developers: Rebecca Sutton Koeser, Nick Budak
● Develop or consult on data architecture and implementation
● Contribute to and review custom software developed for the project
● Document custom software and data architecture
● Write automated tests for custom software
● Experiment with and assess potential project technologies (e.g. Jekyll, Wax, Hugo) and advise Technical Lead on decisions
● Prototype and implement custom data visualizations
● Provide consultation and training on tools such as OpenRefine to empower Project Director and other project team members to work with the project data

Data Team (Student Researchers)
● Have skills in at least one of these languages: French, Italian, Amharic, Gəˁəz
● Catalog uncatalogued Täˀammərä Maryam manuscripts using Macomber handlist identifiers and Gəˁəz incipits

Budget

[Budget available upon request.]

Part II: Grant Year 2019-2020 Plans - Data

Data Status

Types of Data and Storage Format

The data are currently in structured text files, PDFs, Google Docs and Sheets, XML, and MS Word.

Past FY19 Data Work

Macomber Handlist cleanup. In 2018-2019, Belcher and Lambrinaki converted the handlist from a PDF (a scan of a hand-typed manuscript with many hand emendations in pen) into a structured text file. They cleaned up the file in Sublime Text, but some errors remain because the converted text was badly garbled.
This file will be converted into a Google Sheet titled “Macomber Canonical Stories.” Here is the structure of the text file:
● MAC###: Macomber Marian Miracle Canonical Story identifier (some of these stories are various parts of the same story, so the identifier for a story may be something like MAC034A)
● Title: English Title from Macomber
● Text: Secondary source that discusses this particular Canonical Story
● English Translation: Translation of that story as it appears in one or two manuscripts (unfortunately, this is not currently a standard category in the structured file because most entries don’t have English translations; not sure if this needs to be made standard in all entries before transferring to Google Sheets)
● PEth: Shelf number, beginning folio, and ending folio for where this particular Canonical Story appears in Princeton’s RBSC Täˀammərä Maryam manuscripts (shelfmark and folios need splitting out). This field is often empty.
● EMIP: Shelf number, beginning folio, and ending folio for where this particular Canonical Story appears in the EMIP digital repository. This field is often empty.
● MSS: Shelf number and beginning folio only for where this particular Canonical Story appears in other repositories. Macomber uses abbreviations for these (see list)
● EMML: Shelf number and beginning folio only for where this particular Canonical Story appears in the HMML digital library.
● Keywords: Keywords from Macomber handlist with a few additional ones that Belcher and Lambrinaki came up with (problematic and needs to be updated, or categories missing)
● Incipit: The current text in this field is unusable and will be deleted. It will be globally replaced with Brown’s list of incipits, matched up using the Macomber identifiers.

Earlier data work. Some earlier work was done correlating tales and manuscripts and themes in Excel (such as matching Princeton manuscripts to Macomber based on data from the Princeton finding aid), but those files are now out of date.

Create a data structure. We designed a preliminary data structure for Canonical Stories and Story Instances, as well as for manuscripts and incipits.

Planned FY20 Data Work

Over the next year, we will:
● Migrate data. Project data (namely, Macomber’s handlist) currently managed as a structured text file will be migrated to Google Sheets by the CDH team, structured in multiple sheets based on planned data architecture (see Technical Design Plan section on Preliminary Data Structure, below) with data validation to share information across sheets.
● Enhance data. Project data will be enhanced by the CDH team by importing data provided by other teams (e.g., those from the Hamburg BM project and Oxford CSM database).
● Match data. Use Oxford CSM story data to identify Canonical Stories that came out of Europe and the Middle East, not Ethiopia (to distinguish between foreign and indigenous).
● Configure data validation. Connect different sets of data without duplicating information (i.e., Gəˁəz Marian miracle manuscripts and Gəˁəz Marian miracle tales).
● Develop a controlled vocabulary. Wendy and Evgeniia will work with the controlled vocabularies of Macomber, Oxford CSM, Hamburg BM, and the Index of Medieval Art to develop a controlled vocabulary for PEMM Canonical Stories.
● Develop a simple incipit tool (or outsource). Create a tool for searching Macomber’s standardized incipits so that research assistants can catalog manuscripts, preferably a dialog box in Google Sheets. Two challenges are homophones and recensions. The tool for searching must account for homophones, treating certain fidəl letters as exchangeable (say ሀ and ኅ) because scribes easily substitute one for the other.
Also, when searching incipits, research assistants must be careful about using proper noun searches. That is, a Canonical Story may give a name for the main character (e.g., Barok from Finqe [Phoenicia]), but a Story Instance in a particular manuscript may refer to him only as a “deacon” or as a “sinner.”

Data Standards and Capture Procedures

● Controlled vocabulary. We are developing and using controlled vocabulary lists (see above).
● Abbreviations. We are developing and will use abbreviations for repositories (e.g., BN not Bibliotheque Nationale). The data currently in the structured text file uses codes for repositories holding Ethiopic manuscripts. The repository abbreviations will be used to generate brief repository records in the new Google Sheets data structure, which will be used to document current locations of materials.
● Organize data. We use Google Drive and Google Docs and Sheets to organize the project. After migration to Google Sheets, data will be stored in multiple sheets of a single document in order to allow data validation and autocomplete-style lookups for related data. A regular, automated export will be set up to export Google Sheets data to a GitHub repository. We use Slack to communicate about the project.
● Validate data. In some instances, we have multiple researchers typing the same data, or cataloging the same manuscripts, to check accuracy.
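The homophone-tolerant incipit search described under Planned FY20 Data Work can be illustrated with a small sketch. It is written in Python purely for illustration (the charter anticipates Google Apps Script), and the four letter groups below are an assumption based on commonly cited Gəˁəz homophone series, not the project's own list. The sketch folds each variant consonant row onto a base row while preserving the vowel order, so ኀ matches ሀ and ኅ matches ህ. The function names and sample strings are hypothetical.

```python
# Hypothetical homophone groups (an assumption, not the project's list).
# Each tuple gives the first Unicode code point of a consonant row; every
# later row in the tuple is folded onto the first one.
HOMOPHONE_ROWS = [
    (0x1200, 0x1210, 0x1280),  # ሀ, ሐ, ኀ
    (0x1230, 0x1220),          # ሰ, ሠ
    (0x12A0, 0x12D0),          # አ, ዐ
    (0x1338, 0x1340),          # ጸ, ፀ
]
ORDERS = 7  # the seven basic vowel orders in each row


def build_fold_table():
    """Map every variant syllable to the same vowel order in its base row."""
    table = {}
    for base, *variants in HOMOPHONE_ROWS:
        for variant in variants:
            for order in range(ORDERS):
                table[variant + order] = chr(base + order)
    return table


FOLD_TABLE = build_fold_table()


def fold_homophones(text):
    """Replace homophone letters with their base form; leave other text alone."""
    return text.translate(FOLD_TABLE)


def search_incipits(query, incipits):
    """Return ids of incipits containing the query, ignoring homophone spelling.

    `incipits` maps a story identifier to its incipit text.
    """
    needle = fold_homophones(query)
    return [
        story_id
        for story_id, incipit in incipits.items()
        if needle in fold_homophones(incipit)
    ]


if __name__ == "__main__":
    # Invented sample strings, not real incipits from the handlist.
    sample = {"MAC-A": "ወሀሎ አሐዱ ብእሲ", "MAC-B": "ሰላም"}
    print(search_incipits("ወሐሎ", sample))
```

Folding both the query and the stored incipits to one spelling keeps the lookup a plain substring match, which would be simple to port to a spreadsheet script. It does not address the recension and proper-noun problems noted above.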
Grant Year Objectives – Data

Main Outcomes/Deliverables:
● Dataset of Canonical Stories (publishable and citable)
● Dataset of 100+ MSS cataloged using list of Canonical Stories (publishable and citable)
● Import relevant data from Princeton Ethiopian manuscripts to Google Sheets
● Documentation of the new data structure
● Documentation of data entry workflow & processes [3]
● Cataloging of the MSS to provide to Hamburg BM project
● Linking to Hamburg BM IDs
● Export data from Google Sheets (possibly as XML) to share with Hamburg BM

Possibly in Scope – Data:
● Linking PEMM Canonical Stories to other projects by cross-linking identifiers from the Hamburg BM and Oxford CSM database (European stories) projects
● Linking to Index of Medieval Art
● Linking MSS records to Princeton University Libraries (PUL) digital editions in DPUL (Treasures of the Manuscript Division)
● Linking MSS records to EMML, if the digital edition is available

Out of Scope – Data:
● Cannibal of Qəmər project with The Textual History of the Ethiopic Old Testament (THEOT)
  ○ 90 MSS with the story, carefully selected from different times and regions, are being typed up to run through the software to compare versions. This won’t be done until October or November, but then all of that data will be available. It may provide interesting information to help with this work, but it is out of scope for this phase.
● Describing characters, objects, places, and other subject matter that occur in stories
● Annotating PUL digitized images or crowd-sourcing

[3] CDH developers expect to write code to do the migration, but the code itself is not a deliverable since it is a means to an end and not something we are likely to reuse or generalize.
Project Needs – Data
● Research assistants with the language skills (either reading-level or at the level of recognizing the letters)
● Data structure for Google Sheets (see Technical Design Plan section on Preliminary Data Structure, below)
● Incipit tool
● Structured text file migrated into Google Sheets with customized data validation and formatting, including:
  ○ autocomplete on incipits with homophone search functionality, for matching particular stories to Canonical Stories
  ○ data validation for field types as appropriate, e.g. numeric values or sequential / increasing numbers for folio numbers within a manuscript
● Regular check-ins with CDH staff including dev team and project management time
● Training and support to query the data (e.g., OpenRefine training)

Concerns – Data:

Risks
● Possible overlaps with Hamburg BM; we will use their data where possible, and hope to supply ours back to them to be incorporated, but want to avoid duplication of effort. In some cases we may use their data for checking and comparison. We have consulted them about this concern and they are willing to assist.
● Handling Fidel in Google Sheets and in search. One challenge is that the data will use Latin letters with diacritics (about 20 Unicode characters, like ə, ṭ, ṣ, ś, ä, ə, w, ḍ, ǧ, ḥ, ḫ, ḵ, č, ñ), as well as over 300 Ethiopic fidəl characters (also available in Unicode, like ወ፣ቀ፣ም፣ት).
● Different types of software that researchers use to handle and input Fidel
● Finding students with necessary language skills
● Changes with the scale of data may make Google Sheets unusable

Interdependencies
● Sharing / linking to data with other projects
● Manuscript access (digital or otherwise)
  ○ Firestone Library began digitizing its Gəˁəz manuscripts, prioritizing the Täˀammərä Maryam. All ten are now digitized and online in Digital PUL.
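The folio validation listed under Project Needs (numeric values and increasing folio numbers within a manuscript) can be sketched outside the spreadsheet as follows. This is a minimal Python illustration, not the project's validation rules. It assumes folios are written as a number with an optional recto/verso suffix (for example 12r or 12v), treats a missing suffix as recto, and uses hypothetical function names.

```python
import re

# Assumed folio notation: digits followed by an optional "r" or "v".
FOLIO_PATTERN = re.compile(r"^(\d+)([rv]?)$")


def folio_key(folio):
    """Turn a folio such as '12v' into a sortable (number, side) pair."""
    match = FOLIO_PATTERN.match(folio.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized folio: {folio!r}")
    number, side = match.groups()
    # A missing side is treated as recto in this sketch.
    return (int(number), 1 if side == "v" else 0)


def check_folio_order(rows):
    """Check story rows from one manuscript, listed in manuscript order.

    `rows` is a list of (story_id, start_folio, end_folio) tuples.
    Returns a list of human-readable problems; an empty list means no issues.
    """
    problems = []
    previous_start = None
    for story_id, start, end in rows:
        try:
            start_key, end_key = folio_key(start), folio_key(end)
        except ValueError as error:
            problems.append(f"{story_id}: {error}")
            continue
        if end_key < start_key:
            problems.append(f"{story_id}: ends before it starts")
        if previous_start is not None and start_key < previous_start:
            problems.append(f"{story_id}: starts before the previous story")
        previous_start = start_key
    return problems


if __name__ == "__main__":
    print(check_folio_order([("a", "1r", "2v"), ("b", "2v", "4r")]))
```

A check like this could run on the regular export to GitHub, so that cataloging errors are reported without blocking data entry in the sheet itself.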
Data security considerations
● Structured text file is currently stored and shared via Dropbox; a copy has been added to the PEMM Google Team Drive for backup.
● This project does not include any personal or sensitive data.

Data management plan
● After data migration, Google Sheets will be the primary canonical data source; GitHub will be a secondary source for backup and experimentation.

Long-term preservation plan
● A released version of data exported from Google Sheets will be deposited with Zenodo or another repository for long-term secure storage, and also to make it citable.
● Where appropriate, data will be exported and shared with other relevant projects.

Future Plans

Future Data Work
● Write précis of Canonical Stories. A huge and difficult task will be writing short summaries of the 700+ indigenous Canonical Stories. Only those with an excellent understanding of Amharic, French, or Gəˁəz will be able to do this work. Maybe 100 can be done from stories available in English translation. Or, perhaps, if the keywords are good enough, no précis is needed?
● Track word length of Story Instances. This is a way of getting at the possibility of different recensions.
● Tag Canonical Stories with keywords. Another difficult task will be using the controlled vocabulary list to better tag Canonical Stories. Macomber did tag 642 of them with keywords, but many remain and his list can be improved. Only those with a good level of Gəˁəz and English will be able to do this work.
● Identify new Canonical Stories. We need to give new identifier numbers, titles, themes, and incipits for Canonical Stories not in Macomber. We will use Hamburg BM identifiers where possible, but may need to do this a bit ourselves.
● Translate into Amharic. Translate titles, keywords, and website into Amharic.
● Design and write static website. This can be done in the last year.
● Compare Cannibal of Qəmər transcripts.
Once Steve, Jeremy, Jonah, and Ashlee finish typing up all 90 versions of the Cannibal of Qəmər tale, we will do computational analysis.

Eventually, PEMM would like to answer the following question:
● How did individual Ethiopian Marian miracle tales change over time and region? PEMM is conducting a textual history of just one of the tales, called the Cannibal of Qəmər. We already know it has three quite different recensions, but we are trying to determine how, where, and when the tale differs. With new dendrogram comparative software, we can begin to establish recensions and compare them statistically. To do this, we are collaborating with The Textual History of the Ethiopic Old Testament (THEOT) Project, which has worked out the methods and workflow necessary to carry out textual histories of Gəˁəz texts.

Part III: Grant Year Plans - Interface

Grant Year Objectives – Interface

Main Outcomes/Deliverables:
● Preliminary interface (prototype):
  ○ Feed from Google Sheets → GitHub
  ○ Simple data viz including map
  ○ Showcasing images from PUL MSS via IIIF
  ○ Lightweight search and browse
  ○ Article or post to accompany visualizations

Possibly in Scope – Interface:
● Support for multilingual site capacity
● Preliminary web interface design

Project Needs – Interface:
● Geographic data to generate a map
● Script to pull data from Google Sheets into static website

Concerns – Interface:

Risks
● Front-facing deliverables are dependent on progress in data work
● Working with Fidel and Amharic (multilingualism)
● Community push-back on making data on certain stories accessible and visible
● Prototyping with static website technologies may have more constraints than we expect; CDH development team does not have substantial experience with static site technology
● Changes with the scale of manuscript data may impact static site performance
● If we shift from a static site to a database driven site in a future phase of the project, there are likely to be changes in the site
structure and architecture Interdependencies ● Hamburg BM project ● Working with PUL IIIF + digitized content, but not annotating ● The Textual History of the Ethiopic Old Testament (THEOT) Project ● Rights on existing code for searching fidel Future Plans – Frontend: ● Distribution of static site with data and images on thumb drives ● Assess prototype and data work to determine how to expand, e.g. database driven site ● Apply to follow up Research Partnership grant for next phase of project development Part IV: Technical Design Plan Data In this phase of PEMM, we will migrate the data from the semi-structured text file into a Google Sheets spreadsheet comprised of multiple structured and related sheets with data validation configured to automatically connect data between different sheets within a single Google Sheets spreadsheet. The data for this project is highly relational and certainly could be implemented as a relational database, but given the phase of this project and the amount of data work still to be done, we chose Google Sheets. Sheets supports edits by multiple concurrent users and tracks versions, and working within a spreadsheet will allow project team members to manage and query the data more easily without developer intervention. We will model the data as if for a relational database (see Preliminary Data Model), but implement it in Google Sheets with an eye towards data entry usability and efficiency rather than a fully normalized database structure. As a first step, we will prototype the Google Sheets structure based on the new data architecture and determine the appropriate data validation, formatting, and any other configuration that is useful and necessary. This will also give us a chance to experiment with Fidel characters to make sure we can get everything working as expected. 
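As a rough illustration of what modeling spreadsheet data "as if for a relational database" can involve, the sketch below checks that rows in one sheet only point at identifiers that exist in another sheet. The sheet names, column names, and sample identifiers are hypothetical placeholders chosen for this example, not the project's actual structure.

```python
"""Sketch: relational-style link checks over rows exported from related sheets.

All sheet names, column names, and identifiers are hypothetical placeholders.
"""


def find_broken_links(rows, column, valid_ids):
    """Return the rows whose value in `column` is not among `valid_ids`."""
    return [row for row in rows if row.get(column) not in valid_ids]


# Hypothetical rows, shaped like the dicts csv.DictReader would yield
# for each sheet after export.
manuscripts = [{"manuscript_id": "MS1"}, {"manuscript_id": "MS2"}]
stories = [{"story_id": "S1"}, {"story_id": "S2"}]
story_occurrences = [
    {"manuscript_id": "MS1", "story_id": "S1", "folio_start": "23r"},
    {"manuscript_id": "MS2", "story_id": "S9", "folio_start": "4v"},
]

manuscript_ids = {row["manuscript_id"] for row in manuscripts}
story_ids = {row["story_id"] for row in stories}

# Every occurrence should reference a known manuscript and a known story.
bad_manuscripts = find_broken_links(story_occurrences, "manuscript_id", manuscript_ids)
bad_stories = find_broken_links(story_occurrences, "story_id", story_ids)
```

Within Google Sheets itself, dropdown-style data validation plays the same role at entry time; a check like this one is only useful once the data has left the spreadsheet.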
We expect to write one piece of custom code in Google Apps Script to support homophone searching in Fidel for an incipit lookup [4], which will enable project team members to match stories in a manuscript with Canonical Stories from the Macomber catalog based on unique words or phrases. Steve Delamarter has already created a prototype incipit lookup and demoed it to the team. We could purchase it, but we prefer to design and implement something simple based on project needs. This will give us in-house expertise to maintain and support the tool, and we can iteratively refine it if necessary. We also plan to document it as an example for others using Google Sheets for dataset work. If implementing the incipit search proves to be more difficult than anticipated, we will consult with Garry Jost or purchase the prototype.

[4] The public search interface for the Hamburg BM project (https://betamasaheft.eu/as.html) includes an option for homophone searching, and their help text includes a list of orthographic variants. We will use their implementation as a reference and their list of characters as a starting point for our implementation.

Once the Project Director has agreed to the data structure and tested and accepted Google Sheets functionality, CDH developers will write a script to parse the structured text file and convert it into multiple CSV files for import into Google Sheets, which will create preliminary records for Manuscripts, Canonical Stories (i.e. those cataloged by Macomber), and Story Instances (a story as it occurs in a manuscript). Records for archival repositories that hold these manuscripts will be added manually to the spreadsheet by project researchers, since there are only a small number of them and the structured text file does not supply the needed information. After the data is migrated into Google Sheets, the spreadsheet will become the canonical data source for the project, and the project researchers can begin working on the data.
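The homophone searching planned for the incipit lookup can be pictured as folding letters that sound alike to a single form before comparing text. The sketch below is in Python rather than Google Apps Script, and its list of letter series is an assumption for illustration only; it does not cover every orthographic variant and is not the list the project or Hamburg BM uses.

```python
"""Sketch: fold Gəˁəz homophone letters to one form before matching incipits."""

# (variant series start, canonical series start) as Unicode code points.
# Each series covers the seven vowel orders of one consonant.
# ASSUMPTION: this list is illustrative, not the project's actual list.
HOMOPHONE_SERIES = [
    (0x1210, 0x1200),  # ሐ series -> ሀ series
    (0x1280, 0x1200),  # ኀ series -> ሀ series
    (0x1220, 0x1230),  # ሠ series -> ሰ series
    (0x12D0, 0x12A0),  # ዐ series -> አ series
    (0x1340, 0x1338),  # ፀ series -> ጸ series
]

# Translation table mapping each variant code point to its canonical one.
FOLD_TABLE = {
    variant + order: canonical + order
    for variant, canonical in HOMOPHONE_SERIES
    for order in range(7)
}


def fold_homophones(text):
    """Replace each variant letter with its canonical homophone."""
    return text.translate(FOLD_TABLE)


def incipit_contains(incipit, query):
    """True if `query` occurs in `incipit` once homophones are folded."""
    return fold_homophones(query) in fold_homophones(incipit)
```

With this folding, a search typed as ሰናይት would match an incipit that spells the word ሠናይት, because both are reduced to the same form before comparison.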
After the migration is complete, CDH developers will write a script to generate a regular, automatic export of the Google Sheets as CSV and/or JSON, which will be added to a GitHub repository. The data in the GitHub repository will serve as both a versioned backup and as a data source for querying, visualization, and a prototype interface; it may also eventually be used to publish a citable version of the data via Zenodo or a similar service. The export will be powered by the “publish to web” functionality available in Google Sheets, if it is sufficient; otherwise it will be implemented with an existing Python Google API client to access the data. Additionally, if the Google APIs allow it without too much difficulty [5], we will use revision information to credit the project team members who have made edits to the data as co-authors of the commit using the GitHub co-author syntax [6], as a way of making the contributions of project team members a visible part of the record of the data. We may also implement continuous validation and reporting on the data in GitHub, making use of continuous integration tools that are usually applied to software code in order to automate regular data validation.

[5] Documentation on the Google Drive API indicates this should be possible (https://developers.google.com/drive/api/v3/reference/revisions), but it’s unclear how difficult it is.
[6] See https://help.github.com/en/articles/creating-a-commit-with-multiple-authors

Interface – Prototype Website

If time permits, we will develop a prototype website as proof of concept which will allow us to experiment, try new technologies, and get familiar with the data and with working with Fidel characters. The ultimate goal is to know enough to decide how to proceed in the next phase of the project.

The GitHub data repository generated from the Google Sheets data will be used as a starting point for experimentation, creating a prototype static site [7] which could allow the project team to browse and search the data, and will give the development team a chance to become familiar with the data and with working with Fidel characters. We have chosen to work with static site technology because it should allow for quick prototyping and experimentation based on the data from the Google Sheets without making a heavy investment in a particular technology stack for the next phase of the project. We hope to experiment with the following static site technologies:
● Jekyll (https://jekyllrb.com/; implemented in Ruby and commonly in use for Digital Humanities projects)
● Hugo (https://gohugo.io/; implemented in Go; newer and more powerful than Jekyll)
● Gatsby (https://gatsbyjs.org/; implemented in JavaScript)
● Wax (http://marii.info/projects/wax; software for generating exhibit sites with IIIF, spreadsheets, and Jekyll)
● Elasticlunr.js (http://elasticlunr.com/; browser-based searching)

The prototype website will be implemented with a responsive design that supports mobile devices by choosing an existing theme, which will allow us to focus on the more innovative aspects of the project. Creating a site that is usable, accessible, and welcoming to the diverse audiences for this project will require user research, but because this is a prototype website we may begin that research during the project year to guide later phases of the project.

The static site will be hosted on GitHub Pages as we prototype. If we determine we want a Princeton URL for the prototype site before the end of the current grant phase, we will request a hostname and possibly a virtual machine from OIT.
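The co-author crediting described in the Data plan above relies on GitHub's convention of `Co-authored-by:` trailer lines at the end of a commit message, separated from the summary by a blank line. The sketch below only builds such a message; the function name, the sample editors, and the assumption that names and email addresses can be recovered from spreadsheet revision information are all hypothetical.

```python
"""Sketch: build a commit message crediting data editors as co-authors."""


def build_commit_message(summary, editors):
    """Return a commit message with one Co-authored-by trailer per editor.

    `editors` is an iterable of (name, email) pairs; duplicates
    (by email) are credited only once.
    """
    trailers = []
    seen = set()
    for name, email in editors:
        if email in seen:
            continue
        seen.add(email)
        trailers.append(f"Co-authored-by: {name} <{email}>")
    if not trailers:
        return summary
    # A blank line must separate the summary from the trailer lines.
    return summary + "\n\n" + "\n".join(trailers)


# Hypothetical editors; in practice these would come from revision data.
message = build_commit_message(
    "Update data export from Google Sheets",
    [
        ("Researcher One", "one@example.com"),
        ("Researcher Two", "two@example.com"),
        ("Researcher One", "one@example.com"),
    ],
)
```

GitHub attributes a co-author by matching the email address in the trailer to an account, so this only works for editors whose addresses are known to GitHub.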
Alongside the static site development, dependent on data delivery, CDH developers and UX Designers will contribute to data visualizations and maps as appropriate to help answer the project research questions. These may or may not be part of the static site; they may be included in an essay to be published at the end of this phase of the project.

If time allows, CDH developers will experiment with internationalization, with the hope of making the prototype site available in both English and Amharic. This is not only something we’re technically interested in, but also something we feel ethically challenged and compelled to do, based on Wendy Belcher’s comment that no Ethiopic manuscript materials or data are currently available in Amharic. There are existing solutions for multilingual sites implemented with Jekyll in the Digital Humanities space, notably The Programming Historian.

[7] Any static site code committed to GitHub will be put in a separate repository from the data to allow the data to be easily deposited with Zenodo without including static site software or content.

We are also interested in leveraging minimal computing principles and techniques to provide a version of the static site as a standalone package, including project data and any image content with permissions that allow redistribution, to be shared and distributed via alternate means, such as USB drive or inexpensive hardware [8].

[8] For example, see Ed Summers’s post about building an offline static site with React: https://inkdroid.org/2018/01/10/offline-react/

Preliminary Data Model

Part V: Deliverable Timeline

Summer 2019 (by September 1):
Data:
● Documentation & diagram describing data structure
● Google Sheets spreadsheet that implements data structure
● Data from Sublime Text file migrated to Google Sheets
Interface:
● None

Fall 2019 (through December 31):
Data:
● Incipit lookup with phonetic searching
● PUL finding aid data added to existing Google Sheets
● BM data added to existing Google Sheets
● Oxford CSM data added to existing Google Sheets
● Trained student researchers (at least orient to MSS and project, orient to Google Sheets; more if data structure is ready)
● Automated feed from Google Sheets to a GitHub repository
● Continuous integration for data validation on GitHub (nice to have)
● 40 Story Instances catalogued
Interface:
● None

Spring 2020 (through the end of the grant period):
Data:
● Data visualizations
● 20 manuscripts catalogued
Interface:
● Prototype static site, if sufficient data is available and time permits

Part VI: Grant Year Wrap-up

This section is completed after the grant year concludes. It describes the goals reached and outcomes of the project, and explains major changes and discrepancies with planned work. Continuing projects will include this in the charter for their subsequent project phase. Completed projects must provide it to the CDH Project Coordinator within one month of the conclusion of the grant period.

Part VII: Agreement

Project pause policy

To ensure that all projects receive sufficient and equitable development time, time-sensitive queries and requests must be addressed within two weeks of the initial (email) request. The PI is responsible for communication with the development team.
If the PI does not respond to a task that has been indicated as time-sensitive by the CDH team within two weeks of the initial request, further project development will be paused until the project can be reasonably integrated back into the CDH development schedule.

Rights, Permissions, and Attribution

Site content and data will both be licensed under Creative Commons Attribution 4.0 International (CC-BY 4.0). If any of the datasets consist solely of factual data where authorship cannot be claimed, they will be licensed as CC0. Any software developed by CDH that merits release will be licensed under Apache 2.0. The Technical Lead will fill out an invention disclosure form to gain approval from the Office of Technology Licensing to release the code. Before approval is granted, the code will be owned by the Trustees of Princeton University.

Web Presence and Project Publicity

The PI will create a project page on the CDH website, keeping it up-to-date and accurate during the grant year. The Project Manager will submit at least two blog posts per year, to be published on the CDH website. The schedule for publishing blog posts will be determined in consultation with CDH staff. In the case of a public site launch, or similar event, the PI and PM will work closely with CDH and Princeton University Library staff as needed on publicity, communication, and outreach. Currently, the PI has a web page for the project at https://wendybelcher.com/african-literature/pemmproject/

Credit

All team members will be credited on the project’s website and CDH project page. The project’s website will include a sponsorship statement (indicating the CDH as well as any other supporting groups, departments, agencies) and will include a citation statement indicating how the project assets should be cited. The site will also list and link to other projects that contributed data.
Project PI ___________________________________________________________

CDH Faculty Director ___________________________________________________________

Date:

Appendix A: Relevant Resources and Projects

Data

Currently, the project data exists in seven separate, uncorrelated formats:
1. Macomber Handlist of Marian Miracles in the Ethiopian tradition (with identifiers for each of 642 Canonical Stories, translations and analyses, keywords, and shelfmarks and folios for Story Instances in about 200 manuscripts).
2. Brown’s list of incipits for Macomber’s Canonical Stories (typed up in fidäl and available in Google doc or sheet; checked by Hamburg BM for accuracy; used to catalog stories in manuscripts).
3. Oxford CSM database for controlled vocabulary for themes (700+ terms, which need to be cleaned up and merged with the Macomber Handlist themes, which need to be updated; available in text file).
4. Princeton RBSC finding aid for its ten Täˀammərä Maryam manuscripts (not yet cataloged with the Macomber Handlist identifiers; available in EAD XML). The project will also reference data and images from PUL digitized editions of relevant Ethiopic manuscripts held by PUL, managed by PUL and provided via IIIF Presentation and Image APIs.
5. Hamburg BM identifiers and cataloging data for about 75 Täˀammərä Maryam manuscripts (cataloged with the Macomber Handlist identifiers; available in TEI XML).
6. Delamarter’s list of 90 manuscripts across six centuries (1400s, 1500s, 1600s, 1700s, 1800s, 1900s) and five regions (north, south, central, east, west Ethiopia) (in Google Sheets, uncataloged) at EMIP / HMML.
7. Archives of manuscripts from EMML, Bibliotheque Nationale, and the British Library (none cataloged with the Macomber Handlist).

Archives and Databases

Lots of information has been collected about the Täˀammərä Maryam, but it is spread across hundreds of obscure print catalogs, French and German articles, Italian books, and Gəˁəz monasteries.
Each uses different numbering systems and tale titles, almost none have keywords, and many catalogs do not enumerate the tales in a manuscript, simply stating that something is a Täˀammərä Maryam and moving on. Thus, a crucial aspect of PEMM will be collating information using the following archives and databases.

Macomber Handlist

The most important PEMM resource is William Macomber’s unpublished handlist of 642 Marian miracles, based on his study of 100+ manuscripts, including each story’s title, translations, themes, and incipit (the unique first sentence of each story; used with medieval manuscripts as an identifier, as they have no titles). (This was quite an extraordinary accomplishment, before digital work in the humanities was common. It gives PEMM a huge leg up.)
● Macomber, William F. n.d. [1980s]. [Handlist of] The [Ethiopian] Miracles of Mary. Collegeville, MN: Hill Monastic Museum and Library, St. John's Abbey and University.

Lombardi Handlist

Chiara Lombardi disagrees with Macomber and thinks there are only 530 Canonical Marian Miracle Tales. Where available, and depending on time, we may include her identifiers in addition to Macomber’s.
● Lombardi, Chiara. 2009. "Il Libro etiopico dei Miracoli di Maria (The Ethiopic Miracles of the Blessed Virgin)." BA thesis, Archeology, Università di Napoli.
● Lombardi, Sabrina. 2010. "Miracoli di Maria." MA thesis, Anthropology, Corso di laurea in lettere, Università di Firenze.

Archive Catalogs

Extant catalogs of the Täˀammərä Maryam manuscripts are an indispensable source for PEMM. Macomber worked with about fifteen (see list of abbreviations). While most catalogs do not use Macomber’s numbering, these catalogs often provide manuscript dates, provenance, and folios of each story. The most important catalogs are as follows:

1. Princeton University Rare Books and Special Collections. Belcher and Qesis Melaku catalogued the RBSC’s ten Täˀammərä Maryam manuscripts, although they did not use Macomber’s numbering for their stories. All ten of these manuscripts have been digitized and are available online.
a. Treasures of the Manuscripts Division, Ethiopic manuscripts https://dpul.princeton.edu/msstreasures/catalog?f%5Breadonly_collections_ssim%5D%5B%5D=Ethiopic+Manuscripts&q=
b. Melaku Terefe, and Wendy Laura Belcher. 2009. Princeton Collections of Ethiopic Manuscripts, 1600s-1900s: Finding Aid. Princeton, NJ: Princeton University Library, Department of Rare Books and Special Collections, Manuscripts Division. https://findingaids.princeton.edu/collections/C0776

2. Hill Museum and Manuscript Library (HMML), St. John’s Abbey and University, Collegeville, MN. This is a digital library; that is, it archives microfilms and digital images of manuscripts that are held elsewhere. A huge project of HMML in the 1960s and 1970s was the Ethiopian Microfilm Manuscript Library (EMML), which microfilmed 8,000 manuscripts in monasteries and churches in Ethiopia. William Macomber, author of the handlist, was one of the directors for this project, and his handlist used many EMML manuscripts. HMML also hosts other digital collections of Ethiopian manuscripts, such as the Ethiopian Manuscript Imaging Project (EMIP, by Stephen Delamarter). At least 22 Täˀammərä Maryam manuscripts are currently available for free online, but HMML has over 535 Täˀammərä Maryam manuscripts awaiting transfer from microfilm to online digital form. In general, they only digitize microfilm if a client wants access to it and pays for it. Anyone can access the microfilm for free, but only on site, in Minnesota. Belcher is in correspondence with them about gaining digital access to more of their manuscripts. Developing a relationship with HMML may be useful to PEMM more broadly, as they have digital copies of over 250,000 handwritten manuscripts in many languages from around the world.
a. One can search parts of this collection at ​https://www.vhmml.org/readingRoom/ 3. British Library​.​ This archive has eighteen of the most splendid ​Täˀammərä Maryam manuscripts in existence, most looted from the royal scriptorium. Manuscripts from the center of power, from the clerics of the royal house, will be most useful for establishing the canon of stories, as they would be the ones to make it. The regions will have their own interesting trends; and perhaps have their own standard collections. Both are interesting, but tracking what is canon is vital. This collection has been catalogued and recently digitized, but has not used Macomber’s numbering. a. http://www.bl.uk/manuscripts/BriefDisplay.aspx?source=advanced b. Wright, William. 1877. ​Catalogue of the Ethiopic Manuscripts in the British Museum Acquired since the Year 1847​. London: British Museum. 4. UCLA Library. ​ ​Belcher and Qesis Melaku also catalogued UCLA’s collection, but before a big acquisition. It has very few ​Täˀammərä Maryam ​manuscripts, however. a. http://digital2.library.ucla.edu/viewItem.do?ark=21198/zz0009gx3x 5. Catholic University of America, Institute of Christian Oriental Research (ICOR). ​Its catalog of its Ethiopic manuscripts is not yet online, but it has 12 Täˀammərä Maryam ​ manuscripts. a. Weiner Codex 233 – EMIP 2175 (17 ​th​ cent.?): 178 miracles b. Weiner Codex 260 – EMIP 2259 (1930–1974): 5 miracles c. Weiner Codex 308 – EMIP 2340 (20​th​ cent.): 11 miracles d. Weiner Codex 335 – EMIP 2370 (18 ​th​ cent.): 80 miracles e. Weiner Codex 364 – EMIP 2399 (20​th​ cent.): 31 miracles f. Weiner Codex 366 – EMIP 2401 (20​th​ cent.): 81 miracles g. Weiner Codex 370 – EMIP 2405 (19​th​ cent.): 3 miracles h. Weiner Codex 395 – EMIP 2660 (20​th​ cent.): 3 miracles i. Weiner Codex 403 – EMIP 2668 (20​th​ cent.): 36 miracles j. Weiner Codex 428 – EMIP 2716 (18 ​th​ cent.): 3 miracles k. Weiner Codex 449 – EMIP 2737 (18 ​th​ / 19​th​ cent.): 34 miracles l. 
Weiner Codex 463 – EMIP 3238 (19​th​ cent.?): 45 miracles 6. Other Archives. ​ Many other archives exist in Europe and North America, including the Bibliotheque Nationale, the Vatican Library, St Petersburg Library, and so on. a. Platt, Thomas Pell. 1823. ​A Catalogue of the Ethiopic Biblical Manuscripts in the Royal Library of Paris, and in the Library of the British and Foreign Bible Society: Also Some Account of Those in the Vatican Library at Rome, to which are Added, Specimens of Versions of the New Testament Into the Modern Languages of Abyssinia​. London: R. Watts. 23 Oxford Cantigas de Santa Maria (CSM) database This database, created by Stephen Parkinson of the Oxford University Centre for the Study of the Cantigas de Santa Maria​ (manuscript) and PI for the Oxford CSM database launched in 2005, is the most authoritative on the world-wide collection of Marian miracle stories, but based largely on manuscript collections in Europe. The Oxford CSM database is “designed to give access to a vast range of information relevant to the processes of collection, composition and compilation” of the Marian miracle stories. It provides a fully searchable electronic version of Poncelet’s list of Marian miracles (​Index miraculorum B.V. Mariae quae saec. VI-XV latine conscripta sunt ​); brief descriptions of all European Marian miracles; and a controlled vocabulary list for Marian miracle story themes. The principal investigator is Stephen Parkinson, who is eager to share any data he has, in return for better information about the Ethiopian Marian miracles for his database. This database will help us to identify which stories originated outside of Ethiopia and to develop our own controlled vocabulary theme list. ● http://csm.mml.ox.ac.uk/ Beta Maṣāḥǝft (Hamburg BM) Database This database project is hosted by the Hiob Ludolf Centre for Ethiopian Studies at the Universität Hamburg in Germany and with Prof. Alessandro Bausi as the principal investigator. 
It is a very long-term project (2016–2040) with large German government funding to create a “virtual research environment” for collecting and managing data about “the predominantly Christian manuscript tradition of the Ethiopian and Eritrean Highlands.” They are also using Macomber’s Handlist to catalog Marian miracle stories, although they are not currently tagging themes. They are also collating data on manuscripts (date, provenance, total folios), developing controlled vocabulary lists for people and places, and arriving at their own identifier for Canonical Stories (since Macomber missed some). They are open access and so we will be exchanging information as much as we can: the Hamburg BM giving PEMM its data in accessible forms and PEMM giving the Hamburg BM whatever data we create.
● Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea (Schriftkultur des christlichen Äthiopiens und Eritreas: eine multimediale Forschungsumgebung)
● “See the section on contributing and reusing data below.”

Miracula Mariae project

This comparative project, with principal investigators Ewa Balicka Witakowska and Anthony John Lappin, is also called “Miracles of the Virgin: Medieval Short Narratives Between Languages and Cultures.” Begun in 2015, it will compare six to ten individual Marian miracle stories across many languages and regions as a way of studying transmission in image and text (in Arabic, Armenian, Croatian, Dutch, Ethiopic, French, Georgian, Greek, Hungarian, Latin, Middle English, Old Icelandic, Old Swedish, Polish, Italian, Romanian, Slavonic, South Slavonic, Spanish, Syriac, and Ukrainian). It will also compare sociological and psychological aspects of the stories in different cultural contexts. The eventual aim is a database, but there is no evidence that work on it has begun yet.
“The overall tradition [of Marian miracle stories] offers a rich and vast body of literature, which, in its totality, has not been studied, and whose intertextuality offers a number of interesting problems and resources for further study. … The project will seek to analyse this complex set of interrelated traditions from three successive standpoints. The ​first will consider manuscript transmission and the physical distribution of miracle-tales; the second ​will compare collections and versions, in order to understand the cultural pressures that led to variation and re-elaboration of a set number of miracle-tales; and the ​third ​will look at the resulting texts from a narratological point of view, and aim to establish the limits and development of a story within a primarily manuscript culture. The first stage of the project will be a​ text-critical study of a selected number of miracles from the core collection, tracing their development across manuscripts, enabling sub-families and recensions to be established, and allowing the evolution of the collections to be precisely identified.” ● https://hildefonsus.wordpress.com Index on Medieval Art This Princeton institute may have useful controlled vocabularies. Appendix B. Planned FY20 Data Steps Step 1. Create PEMM Canonical Stories Dataset. ​The Macomber handlist will be used as the basis for a ​Google Sheet titled PEMM Canonical Stories Dataset.​ It will have more data than the Macomber one. Below is its format (with invented data for one story to give a sense for the appearance of the data): ● PEMM miracle tale ​identifier​: MAC0007 ● Hamburg BM identifier: LIT3640Miracle ● Lombardi miracle tale identifier: 53 ● Oxford CSM miracle tale identifier: 5 [From BM] ● Macomber ​title​ ​of Marian miracle tale: The monk of Dabra Qalǝmon who did not fast. 
● Lombardi/ Cerulli title of Marian miracle tale: Il monaco che non ha digiunato ● Dillmann title of Marian miracle tale: Monachi non ieiunant ● EMIP title of Marian miracle tale: The monk who did not fast ● Oxford CSM title of Marian miracle tale: none ● Colin title of Marian miracle tale: Le moine qui n'a pas jeûné ● Tsegaye title of Marian miracle tale: ያልበሰበው መነኩሴ ● Budge miracle tale ​edition/text ​[translator, title, page]: Budge, ​Hundred ​, item 86; Budge, Miracles, p. 42. 25 ● Tsegaye miracle tale edition [translator, title, page]: None ● Tasfā Giyorgis edition [translator, title, page]: Tasfā Giyorgis ​TM​, item 77, page 274-275. ● English ​translation​ ​of tale [translator, title, page]: Budge, ​Hundred ​, p. 87. ● French translation of tale [translator, title, page]: Colin, ​TM​, p. 10 ● Amharic translation of tale [translator, title, page]: Tsegaye, ​TM ​, p. 402 ● Italian summary of tale [translator, title, page]: Cerulli, Il libro, p. 166. ● Princeton Ethiopic ​manuscripts ​with the tale and its beginning and ending ​folios​: 47 (23r-24v) ● Other repositories known manuscripts with the repository name, shelfmark, and beginning (and sometimes ending) folios: G-7; ZBNE 60-29; 61-27; CRA 52-35; 54 (91r); 55 (91r); SBLE 32-28; BM 2-31; 3-36; VLVE 267 (56v); 298 (23r, 50v); SALE 23-27; 43-29; LUE 30-40; 32-28; CBS-28; CCBE 951-28; AECE 1 (30r). ● EMIP (Ethiopian Manuscript Imaging Project) digitized manuscripts with the tale and their beginning and ending folios: [to come] ● EMML (Ethiopian Microfilmed Manuscript Library) digitized manuscripts with the tale and its beginning and ending folios: 2275 (194r); 6938 (29v); 5520 (31v); 2378 (20v); 6640 (49r); 2060 (195r); 2066 (31v, 135v); 2802 (22v); 6196 (81r); 7543 (17v). 
● Keywords/themes (controlled vocabulary list TBD):
● Story length (how many characters or words): 1,250
● Story number of total stories in manuscript (order): 231
● Story precis (100 words or fewer):
● Story translation (if in public domain):
● Story Instance illustrations no.: 4
● Story Instance illustrations characters: farmer, abbot
● Story Instance illustrations objects: bow and arrow
● Story Instance illustrations dating (if later): same
● Incipit 1 with manuscript shelfmark (imported from Brown): ወሀሎ፡ በደብረ፡ ቅዱስ፡ ዐቢይ፡ አባ፡ ሳሙኤል፡ ዘቀልሞን፡ ቤተ፡ ክርስቲያን፡ ሠናይት፡ በስመ፡ እግዝእትነ፡… ወኮነ፡ ውስተ፡ ዛቲ፡ ቤተ፡ ክርስቲያን፡ ስዕል፡ ዐቢይ፡ ወመንክር፡ (6938).
● Incipit 2 with manuscript shelfmark: ወሀሎ፡ አሐዱ፡ ብእሲ፡ መነኮስ፡ በደብረ፡ አባ፡ ሳሙኤል፡ ዘቀልሞን፡ ወያፈቅራ፡ ለእግዝእትነ፡ ማርያም፡ ወያነብብ፡ ወትረ፡ ተአምኆተ፡ መልአክ፡ ሌሊተ፡ ወመዐልተ። ወዝንቱሰ፡ ብእሲ፡ ኢይጸውም፡ ወኢይጼሊ፡ ወይትሜሰል፡ ከመ፡ አብድ፡ ወእንቡዝ፡ (2378).

Macomber’s abbreviations for repositories are as follows (note that some repositories appear twice because Macomber is using catalogues of collections, not the actual collections):
● AECE = Abbaye d'En Calcat, Dourgne, France (but microfilmed for HMML)
● CBS = manuscript of the Berlin Staatsbibliothek (described by E. Cerulli)
● CCBE = manuscripts of the Chester Beatty Library (described by E. Cerulli)
● CF = manuscripts of the Biblioteca Nazionale of Florence (described by E. Cerulli)
● CL = manuscript of the Academy of Sciences of Leningrad (described by E. Cerulli)
● CRA = manuscripts of the d'Abbadie Collection of the Bibliotheque Nationale in Paris (described by Conti Rossini)
● DULE = Ethiopian manuscripts of the Duke University Library, Durham, North Carolina (but microfilmed for HMML)
● EMML = Ethiopian Manuscript Microfilm Library, of Hill Monastic Manuscript Library (HMML), St. John's Abbey and University, Collegeville, Minnesota
● G = manuscript of the Biblioteca Giovardiana in Veroli (described by E. Cerulli)
● GBAE = manuscripts of the Biblioteca Ambrosiana in Milan (described by S.
Grebaut) ● GVE = manuscripts of the Vatican Library (described by S. Grebaut and E. Tisserant) ● HBS = manuscripts of the Staatsbibliothek in Berlin (described by E. Hammerschmidt) ● LUE= Ethiopian manuscripts of the Uppsala University Library (described by O. Lofgren) ● SALE= Ethiopian manuscripts of the Conti Rossini and Caetani Collections of the Accademia Nazionale dei Lincei in Rome (described by S. Strelcyn) ● SBLE = Ethiopian manuscripts of the British Library (described by S. Strelcyn) ● SGE = Manuscripts of the Griaule Collection of the Bibliotheque Nationale in Paris (described by S. Strelcyn) ● SWE = Ethiopian manuscripts of the Seabury-Western Theological Seminary (described by W. F. Macomber) ● VLVE = Ethiopian manuscripts of the Vatican Library (described by A. Van Lantschoot) ● WBLE = Ethiopian manuscripts of the British Library (described by W. Wright) ● ZBNE = Ethiopian manuscripts of the Bibliotheque Nationale in Paris (described by Zotenberg) Step 2. Create keyword fields for PEMM Canonical Stories Dataset. ​ ​When consultants who read Gəˁəz start cataloging stories, they will need access to the controlled vocabularies through dropdown menus for many keyword/theme fields. Although this particular type of cataloging task will not be done until FY21, we want to be aware of the types of fields we will need in the future. For now, the dataset will have fields for characters and settings; we can add others later. Below are the possible fields. 
● Story themes
● Story main human character, including proper noun [e.g., Barbara, Simon]; profession [abbess, beggar, wife]; nation/town; status [noble, royal, commoner]; gender; age [infant, child, teenager, young adult, middle age adult, the old]; type [protagonist, antagonist, and/or generally bad, generally good, and/or evil, sinning believer, good nonbeliever, saintly]; religion [Muslim, Jew, Christian, pagan]; type of conflict [against self, against society/group, against another, against nature]; problem/challenge/conflict [lame, blind, castration, deaf, poor, disbelief, disease, exile, false accusation, famine, childlessness, away from home, pregnant]; sin [man-eater, adultery, jealousy, arson, blasphemy, drunk, frivolity, heresy]; virtue [celibacy, chastity, belief in Mary, doing something for Mary, fasting]; threat [hell, ambush, discovery, death, drought, drowning, hanging]; activity [plowing, bathing, childbirth, dream, fall, travelling]; body part [ear, hands, penis]
● Story human character 2 [same as above]
● Story human character 3 [same as above]
● Story human character 4 [same as above]
● Story human character 5 [same as above]
● Story human character 6 [same as above]
● Story human character 7 [same as above]
● Story other characters (nonacting): wife, children, servants, friends
● Story human characters group(s): monks, children, Muslims, Cistercians, family, enemies, demons
● Story Divine Character 1: Mary
● Story Divine Character 2: angels, demons, Christ, Holy Spirit
● Story plot (maybe): travelling away from home, committing a sexual sin, healing
● Story emotions: hate, envy, terror
● Story animal(s): dog, dragon, birds, frog
● Story food(s): bread, grain, honey, beer
● Story four elements: water, earth, air, fire
● Story Mary mechanism: icon, vision, apparition, milk, hand, baptism, her belt, fragrance
● Story national setting/location: Ethiopia, Egypt, Israel, Syria, Cyprus, France/Europe/Farang
● Story province setting/location:
Gojjam, Tigray
● Story town/village setting/location:
● Story landscape setting/location: mountain, sea, lake, farm, field, bridge, cave, garden, heaven
● Story building setting/location: monastery, church, home, castle, boat, furnace, gallows
● Story religious rite: baptism, prayer, burial, confession, Easter, eucharist
● Story texts: Gospel of John, Hail Mary
● Story religious objects: icon, Bible
● Story domestic objects: gourd, table, candle
● Story fighting objects: bow, arrows, sword
● Story other objects: alms, bell
● Story sources/intertextuality: seems to be in relation with foreign story
● Story origin: France, Germany, England or Europe

Step 3. Create PEMM Manuscripts Dataset. The Macomber handlist will be used as the basis for a Google Sheet titled PEMM Täˀammərä Maryam Manuscripts Dataset. The sheet will hold more data than Macomber's handlist does. It will be important to include information on manuscript dating and region, where available. Below is its format (with invented data for one manuscript to give a sense of how the data will look):
● Manuscript title: Täˀammərä Maryam
● PEMM Ms No.: PEMM 105
● Others’ Ms No.: BN Ms. No. 23
● Manuscript original repository: Dabra Libanos, Ethiopia
● Manuscript provenance (lat., long.): 9.712177, 38.848075
● Manuscript current repository: BN (Bibliotheque Nationale)
● Manuscript total no. of folios: 405
● Manuscript total no. of pages: 202
● Manuscript total no. of images: 202
● Manuscript total no. of stories: 103
● Manuscript century: 15.25
● Manuscript date range (if available): 1517-1543
● Manuscript illustrations no.: 28
● Manuscript illustrations size: 25 full page; 3 quarter page

Step 4. Create controlled vocabularies for PEMM Canonical Stories Dataset.
We need to develop a controlled vocabulary of our own because (1) Macomber’s is outdated (e.g., it uses “Moslem” instead of “Muslim”); (2) Hamburg BM’s is designed for the Ethiopian environment, but not the Marian miracles specifically; (3) Oxford CSM’s was not well controlled, so it needs to be cleaned up; and (4) the Index’s doesn’t account for the Ethiopian environment specifically. Wendy and Evgeniia plan to combine all four controlled vocabulary sets and then comb through them to remove redundancies, create cross references, and identify hierarchies (e.g., we have “oxen” but also “animal” and the first is a type of the second). For instance, since catalogers may not think of the exact same word, we should have cross references (e.g., “angels” and “divine messengers [angels]”). With tightly controlled vocabulary lists, we can do better analysis of story themes.

Step 5. Mark beginning of incipits of each Story Instance in each manuscript. The research assistants will not know where the incipits are for each Story Instance. Those with an excellent level of Gəˁəz will need to mark up manuscripts so that research assistants using the incipit tool and matching incipits don’t have to read for the beginning of the incipit on the page.

Step 6. Catalog manuscripts. The biggest task will be increasing the number of Story Instances in the Google Sheet. With a bigger data set, we will be able to better answer the research questions. The research assistants will use the Incipit tool to catalog manuscripts first, giving each Story Instance a Canonical Story identifier number and then marking their degree of certainty that the incipits match. If they are not confident, someone who reads Gəˁəz will follow up and check their work.

Alternate Tasks

The following tasks are currently out of scope, but they might come into scope if we run into problems with students doing cataloging.
● Write précis of Canonical Stories.
A huge and difficult task will be writing short summaries of the 700+ indigenous Canonical Stories. Only those with an excellent understanding of Amharic, French, or Gəˁəz will be able to do this work. Maybe 100 can be done from stories available in English translation. Or, perhaps, if the keywords are good enough, no précis is needed?
● Track word length of Instantiation Stories. This is a way of getting at the possibility of different recensions.
● Tag Canonical Stories with keywords. Another difficult task will be using the controlled vocabulary list to better tag Canonical Stories. Macomber did tag 642 of them with keywords, but many remain and his list can be improved. Only those with a good level of Gəˁəz and English will be able to do this work.
● Identify new Canonical Stories. We need to give new identifier numbers, titles, themes, and incipits for Canonical Stories not in Macomber. We will use Hamburg BM identifiers where possible, but may need to create some ourselves.
● Translate into Amharic. Translate titles, keywords, and website into Amharic.
● Design and write static website. This can be done in the last year.
● Compare Cannibal of Qemer transcripts. Once Steve, Jeremy, Jonah, and Ashlee finish typing up all 90 versions of the Cannibal of Qemer tale, we will do computational analysis.

For the précis specifically, the following sources could be used:
● Amharic translations (currently out of scope):
○ Täsfa Giyorgis, ed. 1931. Täˀammərä Maryam bä-Gəˁəz ənna bä-Amarəñña [The Miracles of Mary in Gəˁəz and Amharic: 111 Miracles]. Addis Ababa, Ethiopia.
○ Täsfa Gäbrä Śəllase, ed. 1996. Täˀammərä Maryam bä-Gəˁəz ənna bä-Amarəñña [The Miracles of Mary in Gəˁəz and Amharic: Part Two: 402 Miracles]. Addis Ababa, Ethiopia: Täsfa Gäbrä Śəllase Printing Press.
○ Täsfa Gäbrä Śəllase, ed. 1994. Sǝdsa Arattu Täˀammərä Maryam [Sixty-four Miracles of Mary]. Addis Ababa, Ethiopia: Täsfa Gäbrä Śəllase Printing Press.
○ Täsfa Gäbrä Śəllase, ed. 1968. Täˀammərä Maryam bä-Gəˁəz ənna bä-Amarəñña [The Miracles of Mary in Gəˁəz and Amharic: Part One: 270 Miracles]. Addis Ababa, Ethiopia: Täsfa Gäbrä Śəllase Printing Press.
● English translations (currently out of scope):
○ Budge, E. A. Wallis. 1900. The Miracles of the Blessed Virgin Mary, and the Life of Hannâ (Saint Anne), and the Magical Prayers of 'Aheta Mîkâêl: The Ethiopic Texts Edited with English Translations Etc. 2 vols, Lady Meux Manuscripts Nos. 2-5. London: W. Griggs.
○ Budge, E. A. Wallis, ed. 1933. One Hundred and Ten Miracles of Our Lady Mary. London: Oxford University Press, H. Milford.
○ Zärˀa Yaˁqob. 1992. The Mariology of Emperor Zärˀa Yaˁəqob of Ethiopia: Texts and Translations. Edited and translated by Getatchew Haile. Rome, Italy: Pontificium Institutum Studiorum Orientalium.
● French translations (currently out of scope):
○ Colin, Gérard. 2004. Le livre éthiopien des miracles de Marie (Taamra Mâryâm). Paris: Les Editions du Cerf.
● Italian translations (currently out of scope):
○ Cerulli, Enrico. 1943. Il libro etiopico dei Miracoli di Maria e le sue fonti nelle letterature del medio evo latino. Rome: G. Bardi.
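
Illustration for Step 4. The cross references and hierarchies described in Step 4 could be represented roughly as follows. This is a minimal sketch, not a description of any existing PEMM tool: the names CROSS_REFERENCES, BROADER_TERM, PREFERRED_TERMS, normalize, and with_broader_terms are hypothetical, and the few terms shown are only the examples mentioned in Step 4. Treating “divine messengers” as a variant that points to “angels” is an assumption about how that cross reference would be recorded.

```python
from typing import Dict, List, Set

# Hypothetical sketch: a tiny controlled vocabulary built only from the
# examples named in Step 4. A real list would be far larger.

# Preferred terms that catalogers are allowed to record.
PREFERRED_TERMS: Set[str] = {"Muslim", "angels", "oxen", "animal"}

# Cross references: variant or outdated term (lowercase) -> preferred term.
CROSS_REFERENCES: Dict[str, str] = {
    "moslem": "Muslim",            # outdated spelling used by Macomber
    "divine messengers": "angels",  # assumed direction of the cross reference
}

# Hierarchy: narrower term -> broader term ("oxen" is a type of "animal").
BROADER_TERM: Dict[str, str] = {"oxen": "animal"}


def normalize(term: str) -> str:
    """Return the preferred term for a cataloger's entry, or raise ValueError."""
    key = term.strip().lower()
    by_lowercase = {preferred.lower(): preferred for preferred in PREFERRED_TERMS}
    if key in by_lowercase:
        return by_lowercase[key]
    if key in CROSS_REFERENCES:
        return CROSS_REFERENCES[key]
    raise ValueError(f"{term!r} is not in the controlled vocabulary")


def with_broader_terms(term: str) -> List[str]:
    """Return the preferred term followed by each broader term above it."""
    current = normalize(term)
    chain = [current]
    while current in BROADER_TERM:
        current = BROADER_TERM[current]
        chain.append(current)
    return chain
```

With a structure like this, an entry of “Moslem” would be stored as “Muslim”, and a story tagged “oxen” could also be found by a search for “animal”. Terms outside the list are rejected rather than silently added, which is what keeps the vocabulary tightly controlled.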