Developing Collaborative Best Practices for Digital Humanities Data Collection: A Case Study

Rachel Di Cresce, Information Technology Services, University of Toronto Libraries, rachel.dicresce@utoronto.ca
Julia King, Department of English and Drama, University of Toronto, jlm.king@utoronto.ca

Abstract

This case study explores the data management practices of medieval manuscript scholars working on the Digital Tools for Manuscript Study project at the University of Toronto. We chose this user group, despite their highly domain-specific praxis, because the data challenges they face while doing digital humanities work are representative of the wider community. Our goal is to rethink how librarians can best assist researchers within a digital humanities centered environment. This paper first explores how data is conceived in the DH context and what insights can be drawn for data management. Next, focus shifts to the key characteristics of data collection and post-processing activities carried out by manuscript scholars during repository visits. Parallels are drawn between manuscript scholars' practices and those of other humanities disciplines. Finally, the implications for information professionals are explored and best practices for assisting digital humanists are defined. In particular, community engagement in the process is stressed throughout, as the authors believe it is necessary for success. The best practices are in no way exhaustive, and they are intended to be broadly applicable to a range of disciplines within the digital humanities and to librarians. Future work will involve validating a new data management approach informed by this study by testing it in the field.
Keywords: Data Management, Digital Humanities, Manuscripts, Scholarly Needs, Best Practices, Knowledge Organization

Data and the Humanist

Following the scientific method, a researcher poses a hypothesis, collects data, and tests the data against that hypothesis to declare it true or false. The data collected is often measurable in some way; it has gone through a rigorous experimentation process, been approved by ethics boards, been repeated hundreds of times to ensure little variation, and been published as evidence to shore up a hypothesis. Data in the scientific model is meant to be uniform: if the research is sound and the experiment has been set up properly, performing the same experiment twice should yield similar, if not identical, data. Data that lends itself to measurability, like numbers, computerized data, or facts, is valued by the sciences, and this conception of visible and tangible data has shaped our modern understanding of numbers, charts, sets, and tables as more related to laboratory experimentation than to humanistic study. But what of humanities data? Unlike scientific studies, which seek to repeat answers to confirm their truth, humanistic inquiry takes a question and answers it in several different ways. A simple question can have multiple answers, and the value of a good research question is that it can produce a variety of responses; compare this to the value of repeatable scientific data. How do you manage data that comes out of humanistic inquiry when it is not as mathematically measurable and regular as scientific data? How do humanists view and manage their own research output, and do they conceive of it as manageable data? To truly attend to humanists' data management needs, it is important to understand these questions and look for answers within the community.
One method of understanding the variety of data available to humanists is to recognize the different kinds of data humanities research can produce. For example, Research Data Canada refers to the "Knowledge Map of Information Studies" study, which, among other things, collected 130 definitions of data formulated by forty-five scholars (Zins 2007). Within it, all data, regardless of format or medium, are recognized. Research Data Canada's broad definition of research data reads:

Facts, measurements, recordings, records, or observations about the world collected by scientists and others, with a minimum of contextual interpretation. Data may be any format or medium taking the form of writings, notes, numbers, symbols, text, images, films, video, sound recordings, pictorial reproductions, drawings, designs or other graphical representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing algorithms, or statistical records. (Research Data Canada 2017)

Humanities researchers produce most, if not all, of these types of data. The multimedia aspect of humanities research is only part of the complex puzzle of how to organize data management. One must understand the theoretical underpinnings of humanities research and the data it produces in order to appreciate the often much smaller and more nuanced data sets of humanist scholars and the unique nature of humanist inquiry. The following excerpt, from a professor of Digital Medieval Studies, explores this phenomenon:

Humanities' data has depth in small universes. Our material has the capacity to unfold inwards, as it were, to disclose layer upon layer of insights and connections, within a comparatively tiny amount of data--almost an inverse matryoshka, as it were, where each inner doll is bigger and more complex than the one encasing it.
(Bolintineanu 2016)

Humanities data requires a level of inference and analysis that diverges from scientific inquiry. It is changeable, shaped by everything from the tools used to analyse or present it to the scholars who attempt to interpret it. This is why traditional understandings of data seem foreign or unfit for use in a humanities context. Perhaps Posner put it best in stating, "When you call something data, you imply that it exists in discrete, fungible units; that it is computationally tractable; that its meaningful qualities can be enumerated in a finite list; that someone else performing the same operations on the same data will come up with the same results. This is not how humanists think of the material they work with" (Posner 2015). In our case, whether digital or traditional humanities research is concerned, the data produced often poses challenges to the information professional. Simply applying scientific understanding and practices to the field of humanities data management ignores the theoretical underpinnings of humanities research. Even when tools or analytical techniques from the sciences can be fit into a humanities-shaped mold, disagreement exists about their appropriateness:

[DH visualization tools borrowed from the sciences] carry with them assumptions of knowledge as observer-independent and certain, rather than observer co-dependent and interpretative. […] To begin, the concept of data as a given has to be rethought through a humanistic lens and characterized as capta, taken and constructed. (Drucker 2011)

None of this implies, however, a unified understanding of what constitutes data within the realm of scientific research and beyond (Funari 2014 or 2015?). Definitions abound, each with its own inclusions and focus, even among scholars of the same university department (Whitmire, Boock, and Shutton 2015).
It has been shown that academic institutions, federal funding agencies, and regulatory bodies all define 'data' uniquely (Joshi and Krag 2010). For example, the Tri-Council Agencies of Canada, made up of the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Social Sciences and Humanities Research Council (SSHRC), provide a definition of data in their policies for all grant-funded projects. The agencies note that research data "include observations about the world that are used as primary sources to support scientific and technical inquiry, scholarship and research-creation, and as evidence in the research process" (Tri-Agency Statement of Principles on Digital Data Management 2016). A more agnostic definition, from ISO/IEC 2382:2015 (2015), defines data as "a reinterpretable representation of information in a formalized manner, suitable for communication, interpretation, or processing." But even these definitions of data, rooted in scientific modes of understanding research, cloud how humanities scholars interpret their own research. They do little to bring the humanities or social sciences, which tend not to think of their findings as tractable, finite, or identically reproducible, into the realm of research data. In an effort to be more succinct, and to align ourselves more closely with humanistic data theory, we wish to present one more definition: data is "units of information observed, collected, or created in the course of research" (Erway et al. 2013). Importantly, Erway's definition presumes no scientific inquiry, quantitative analysis, or identically reproducible results. From here, we are better placed to understand the data management needs of digital humanities scholars.

Research Data Management

As with all projects, it is imperative to invest in a data management strategy in the digital humanities.
As early as 1968, researchers were concerned that "librarians are less than ever before keepers of books; they are coming to be managers of data" (Hays 1968, 5). More recently, literary scholars have become concerned with the 'computational turn', or the increasing reliance on computer science techniques to perform humanities research. This is necessarily different from the concept of the digital humanities, but it is responsible for what Manovich has termed the 'cultural analytics paradigm', whereby one assumes that the "big data" created by twenty-first-century cultural production is vast and therefore unknowable (Hall 2013). Research data management, however, encompasses all aspects of creating, housing, maintaining, and retiring data (O'Reilly et al. 2012) and therefore makes these vast amounts of data knowable, sortable, and manageable. The data lifecycle, although originally conceived for science data, is also applicable to humanities data management and can provide helpful guidelines for structuring a data management plan. The California Digital Library defines the data life cycle as having eight steps: plan, collect, assure, describe, preserve, discover, integrate, and analyze (Strasser et al. 2012). By managing these steps, standardized and usable data is created; housed in a way that is stable, searchable, and findable; maintained through various switches of file formats, permutations, and manipulations; and retired to an archive in a sustainable fashion. Although the many permutations of Manovich's big data may seem unknowable, research data management makes them knowable and searchable. Managing data created during the course of (digital) humanities research requires that the data manager pay attention to the special landscape which researchers navigate to create, conceptualize, and analyze their data. Humanities research data management is, as Awre et al.
(2015) point out, an example of Rittel and Webber's (1973) 'wicked problem': a problem that is seen differently by different stakeholders. As opposed to a 'tame problem', for which there exists one answer ("How do I execute a search strategy on the library catalogue?", for example), a wicked problem has multiple solutions that are neither true nor false, only better or worse. As Awre et al. point out, the first step in reckoning with managing any amount of research data is to recognize the complexity of the problem. Keeping this necessary complexity in mind, it becomes obvious that individual projects require an individualized plan, and, to that end, we have used the experience of one particular humanities research data problem as a lens through which to view the subject.

Method

Rimmer et al. point out that when designing digital resources for humanities scholars, "we need to better understand their research experiences and practices" (2008, 1378). The same principle extends to designing digital humanities data management strategies, and the research experiences and practices of scholars heavily informed the work of this project. The case study arose out of collaborative work on the Digital Tools for Manuscript Study project, based jointly out of the University of Toronto Libraries and the Centre for Medieval Studies, to create modular, interoperable tools for scholars using digital medieval manuscripts. The project pairs a set of development outcomes with a scholarly counterpart to demonstrate the capabilities of the tools. One tool we wish to extend and improve upon, in particular, is called VisColl (Porter 2013). VisColl is designed to generate digital visualizations of the binding structure and physical makeup of a medieval manuscript.
These digital visualizations are known to scholars as 'collation diagrams' and are of immense importance to scholars interested in the method and context of the creation of medieval codices, as well as their afterlife. Traditionally, collation diagrams are produced by hand: the scholar carefully analyzes the binding of each section of pages in a manuscript (known as a 'quire'), producing diagrams of the quire's structure and developing what is known as a collation statement. VisColl is intended to make this process easier and more robust. Scholars want to use VisColl to produce multiple visualizations and statements of extant Canterbury Tales manuscripts. Data collected by researchers will need to interact with the VisColl tool, which, in turn, will need to interpret and represent the data. As such, from the outset, we recognized the need for a research data management strategy to streamline collection processes. We not only felt that this was essential to the success of the overall project, but we also saw an opportunity for progress in the world of digital humanities data management. Two researchers (referred to as Researcher A and Researcher B) were sent overseas to visit multiple archives and libraries to examine several manuscripts. Instruction came from the lead scholar only; no prior input was given to the researchers by an information professional. From speaking with medieval scholars across several institutions prior to this research trip, it became very apparent that, even among specialties, there is no standard data collection practice shared by scholars. As the digital humanities continue to grow and develop in current and new fields, practices most likely will not be standardized across or among disciplines. Upon their return, the researchers were interviewed separately about their experiences.
At the same time, we examined the data files, both analog and digital, and developed basic organizational spreadsheets into which the researchers were to insert their data. The spreadsheets were created in order to get a good understanding of what raw data we were dealing with while creating a preliminary organizational scheme and preparing for data transfer to our collation tool. Throughout the post-collection process we kept in close contact with the researchers to ensure that our assumptions and ideas were valid and representative of their experiences. Our findings from this experience are discussed in the following section.

Discussion

How can we as library professionals best aid humanities scholars in the area of data management? We operated under the assumption that the data collected by researchers would be input into a collation tool and used to develop a scholarly argument. By analyzing the data produced by the researchers and speaking with them about their process, we recognized four key findings that characterize a researcher's approach to manuscript study and provide a roadmap for information professionals: the influence of time, the universality of pre-data collection practices, a reliance on mixed media data collection, and personalized information management.

i. Time: Scholars have very limited time to work with physical manuscripts. Any implemented data management processes must be cognizant of this.

Time was by far the most influential factor for researchers during the data collection process. One researcher's ideal data collection process was described simply as "more time". During the research visit, most of the items had not been digitized, meaning that if information was missed or questions remained, the researcher could not easily refer to the manuscript once back home.
In addition, researchers must operate within the fixed hours of the library or archive they visit, resulting in their having an average of between six and eight hours per manuscript per day. Given the size and complexity of many manuscripts, certain texts required more time to analyze than others. This in turn affected research processes, data collection, and data management. Researcher B, for example, stated these timeframes were "not really enough time to study a manuscript. It's just enough for collation and notes on interesting things". The more time researchers are given, the more information and detail they can collect. Both researchers stated that they spent twice as much time post-processing their data as they spent with a manuscript. This is significant because it frames the way in which the researchers think of their work in repositories. Researcher B had even less time than normal when looking at certain select manuscripts, which affected the type and quality of data they were able to collect. Both researchers described their time as being dominated by taking notes, as quickly as possible, about what they felt were the most important aspects of a manuscript. Researcher B stated, "If I know I'm running out of time, I take as many pictures as I can and hope they are sufficient later on". It seems, in this instance, that work done in a repository often entails collecting information that is interesting, or has the potential to be interesting in the future, and relying on later information processing to make sense of the data that was gathered. The development of scholarly connections and arguments often happens far away from the material in question. Ideally, any data management approach we develop for these scholars must not require excessive time. For this reason, any alteration to their research process must be minimal or we risk non-adoption or misuse.
It should be noted that, from speaking with other manuscript scholars, there are instances where time may be less of a challenge (e.g., when a scholar is interested in one specific manuscript, or in a few which are all housed at the same repository), but, for the most part, time is of the essence. Researchers want to spend their time examining a manuscript and opt for whichever collection method they feel is the fastest. In a broader context, all scholars operate under similar constraints and preferences. Digital tools and their associated workflows need to feel natural and fit easily into the current research process, because if they do not they are a waste of valuable time (Antonijevic 2015).

ii. Pre-Data Collection Preparation: Researchers conduct basic to very in-depth research about their objects of interest prior to a repository visit. This should be the stage of intervention for information professionals, in which clarity of research purpose has been reached and time is not a stressor.

Both researchers engaged in pre-visit preparation for this and other projects. Other researchers with whom we have spoken over the last few months indicate that they follow the same practice. Actions range from checking bibliographic cataloguing records to reading previous scholarship about the manuscripts. The researchers seek out an understanding of the research that has already been completed on the object, note items of interest, and identify areas where research may be lacking. These preparatory practices are closely related to time limitations. As one researcher pointed out, "I prep in advance, try to figure out how much time each manuscript will take me, especially with a limited amount of time in an archive". If there is time, or the research goal is very well articulated, researchers tend to think about organization, even in an abstract way, prior to their visit.
For example, Researcher B cobbled together checklists they had come across over years of study, and Researcher A gathered information with which to compare their findings to the scholarly canon. Every trip teaches them something new about their data collection process, and they recognize holes in their preparation that affect results. What is interesting, however, is that as they reflected back on their collection processes they consistently identified tactics well known to information professionals. For example, without using the precise information science terms, the researchers recognized controlled vocabularies, pre-defined categories, improved workflows, tracked tags, and systematic file-naming as beneficial to their research. One researcher stated, "I wish I had thought about my categories prior to visits so my notes would have been organized and efficient". Ultimately, this discussion was prompted not by the potential to reduce post-collection work on the part of the information professional, but by the potential for the researcher to save time in the archive. In the researcher's mind, better quality data does not mean less post-processing; the goal is to decrease the inconsistency of data collection. Researchers lamented notes that became less clear depending on situational factors, and information deemed nonessential is often left out only to be missed later. It is their belief that, with a more structured process, the frequency of these occurrences will decrease. The often serendipitous nature of manuscript work is a concern for information managers and researchers alike. Researchers truly never know exactly what they will see when looking at a manuscript: their intention to study one aspect may be completely pushed aside upon the discovery of something unexpected. As with most research, what is fascinating to one researcher may not be worth a second glance from another.
One simply cannot control for all of the possible variabilities in manuscripts and the whims of human nature. Any data management plans constructed prior to archival visits must reflect the potentially unstructured path of inquiry. Any attempt at imposing an immovably rigid system risks serving only a few users and will ensure non-adoption by the many others who do not trust the system or are not able to adapt their research practices around it.

iii. Mixed Media: Manuscript researchers tend to produce a multitude of both digital and analog files during their visits.

This is not a trait solely of manuscript scholars; humanists of all disciplines subscribe to a "fusion of digital and 'pen and paper' practices" (Antonijevic 2015). Manuscript scholars rely heavily on do-it-yourself images regardless of available digital surrogates. Based on responses from our two researchers, photos are often taken of details which were not caught in the digitization process, when something is too difficult to describe quickly, when something is an example of a particular phenomenon, when a feature looks interesting, or, as a last resort, to gather as much information as possible before running out of time. Researcher A even took a video of a part of a binding structure which differed so much from the standard that they wanted to consult with colleagues about it later. Due to their volume, and the difficulty of tracking, organizing, and storing them, photos are a particular problem. Researchers often spend a lot of time naming image files and linking them to their notes in some way. These files are often kept in greater disarray than others, with non-descriptive file names and non-standardized tags. Alongside images and photo data, researchers create textual notes about the manuscript they are examining. One researcher took notes entirely in analog form while the other started with analog but switched to digital when they felt it was not efficient.
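The image-file disarray described above suggests one small, low-cost intervention. As a purely hypothetical sketch (the naming pattern, its fields, and the example values are our own assumptions, not a practice reported by the researchers), a systematic file name could encode repository, shelfmark, folio, and subject at the moment a photo is filed:

```python
import re
from datetime import date

def image_filename(repository, shelfmark, folio, subject, seq, taken=None):
    """Build a sortable, descriptive image file name.

    Pattern (an assumption, not a standard):
    <repository>_<shelfmark>_<folio>_<subject>_<seq>_<YYYY-MM-DD>.jpg
    """
    taken = taken or date.today()
    parts = [repository, shelfmark, folio, subject, f"{seq:03d}", taken.isoformat()]
    # Normalize each part: lowercase, with runs of spaces or punctuation
    # collapsed to single hyphens so names stay filesystem-safe.
    clean = [re.sub(r"[^a-z0-9-]+", "-", p.lower()).strip("-") for p in parts]
    return "_".join(clean) + ".jpg"

# Hypothetical example: a quire-boundary photo of a Corpus Christi manuscript.
name = image_filename("CCCC", "MS 144", "f8v", "quire boundary", 12, date(2017, 5, 2))
# → "cccc_ms-144_f8v_quire-boundary_012_2017-05-02.jpg"
```

Even this much structure would make photos sortable by repository and shelfmark and linkable to notes without per-file manual effort, while leaving the vocabulary of the "subject" field entirely to the researcher.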
Other researchers we talked to also report a mix of analog and digital notes depending on the individual scholar's preference, subject matter, and experience. Often, certain items are interesting but not easily expressed digitally. For example, a collation statement, such as the one for Cambridge, Corpus Christi College MS 144, which is notated I8-VII8 VIII8(+1), is easier to write down manually than to enter into a text document because of the superscript notation. The preferred method for collecting digital notes is Microsoft Excel or Word, whereas analog notes tended to have a loose structure of organization, such as charts, columns, and sub-headings, that was unique to the researcher. Finally, researchers often create drawings of manuscript structures, either manually or digitally. These collation diagrams are essential to the researcher and are most easily produced by hand. Often, the structure of a binding will reveal oddities of book production or call into question the textual content of a manuscript. These diagrams are referred to countless times throughout research and used in publications. They are made most commonly with pencil and paper, but digital collation tools are becoming more usable. One researcher was able to visualize a binding pattern by creating a digital collation in Excel while keeping the data neat and organized.

iv. Personalized Information Management: All manuscript researchers create their own personalized approach to study, which is reflected in every aspect of their personal information management practices.

Both the interviews and our analysis of the raw data collected by the researchers made very apparent that each researcher develops their own idiosyncratic data management system. There was a lack of standardized vocabulary, the researchers disagreed on what labels to put on their data, and their organization grew organically as their data was produced.
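A collation statement like the one quoted above is itself compactly structured data. As a minimal, purely illustrative sketch (assuming one simplified ASCII rendering of the notation, not any scheme actually used by VisColl or the project), such a statement can be expanded into sortable per-quire records:

```python
import re

# Assumed simplified notation: tokens like "I8" (quire I, 8 leaves),
# ranges like "I8-VII8" (quires I through VII, 8 leaves each), and an
# optional "(+n)" marking n added leaves, e.g. "VIII8(+1)".
ROMAN = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100}

def roman_to_int(numeral):
    """Convert a Roman numeral such as 'VIII' to an integer."""
    total, prev = 0, 0
    for ch in reversed(numeral):
        value = ROMAN[ch]
        total = total - value if value < prev else total + value
        prev = max(prev, value)
    return total

def parse_collation(statement):
    """Expand a statement like 'I8-VII8 VIII8(+1)' into quire records."""
    # Tolerate a space before "(+n)", as in "VIII8 (+1)".
    statement = statement.replace(" (", "(")
    quires = []
    for token in statement.split():
        m = re.fullmatch(
            r'([IVXLC]+)(\d+)(?:-([IVXLC]+)\d+)?(?:\(\+(\d+)\))?', token)
        if not m:
            raise ValueError(f"unrecognized token: {token!r}")
        start, leaves, end, added = m.groups()
        first = roman_to_int(start)
        last = roman_to_int(end) if end else first
        for n in range(first, last + 1):
            quires.append({'quire': n, 'leaves': int(leaves),
                           'added': int(added or 0)})
    return quires

records = parse_collation("I8-VII8 VIII8(+1)")
# Quires I-VII: 8 leaves each; quire VIII: 8 leaves plus 1 added leaf.
```

Real collation formulas vary considerably between scholars, which is precisely the standardization problem discussed in this section; the sketch only illustrates that, once a notation is agreed upon, the handwritten shorthand maps cleanly onto tool-readable records.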
This idiosyncrasy presents a series of problems. The creation of a standardized vocabulary is quite difficult within the field. "Things like how to record a manuscript's quire formulas are pretty standard, but the words we use are all over the place," said Researcher A. For example, describing the cover of a manuscript can take many forms: one scholar might refer to the "boards", another might call it a "cover", and another might lump it in with the general description "binding". As Researcher B points out, "This is why pictures and diagrams are very useful as they can transcend the vagaries of language." More difficult is the phenomenon of the organic development of a data management style. Researcher A commented, "Because I was collecting a whole pile of data and I wasn't sure what I would find I put everything into tiny categories; I started to refine a better system as I went through. By that point I had missed earlier data." Because of the restrictions of different repositories, it is difficult to return and retrieve the missing data. However, when asked if there was a particular feature of their data management system that they did not like, the researcher responded, "No, because if there was, I would change it. I wouldn't know [I didn't like a feature] until I found the magic work around difference." This individualization of research processes makes it extremely difficult for the information professional to create a pre-defined research procedure. Since each researcher has arrived at a method by testing different strategies to find what works for them and what does not, they will often be resistant to strategies deemed appropriate for the group which they have personally found ineffective.

Problems for the Information Professional

For the information professional, then, creating a data management strategy can be difficult.
For those who want data that is sortable and easily malleable, creating Microsoft Excel tables or asking for checklists to be completed might clash with a researcher's desire to take more photographs that cannot be sorted, or to take notes by hand with a more organic information structure. Time is always a factor in these decisions, as it puts further constraints on a data management plan. At some point in the research process, data collected on these trips will need to take on a digital form. Whether for analysis, preservation, sharing, or publication, all data will go through transformations to facilitate use. Given this inevitable outcome, information professionals need to work with scholars to identify a suitable point of intervention while communicating the benefit of such actions. Our desire for order, through standardization, structure, and schemas, often runs opposite to the more nuanced, organic, and personalized work of individual humanists. Terminology, itself sometimes a subject of scholarly argument, changes depending on the era of study or the background of the researcher. Since humanities research is often given to individual study, it leads to individual practices and vocabularies. As such, dreams of standardized workflows, or even of a taxonomy of vocabulary terms, are fairly unrealistic in this climate. A compounding factor is the uniqueness of the material of study itself. No two manuscripts are exactly the same, nor are the scholars who look at them. Attempting to predict every scenario, oddity, or change of interest is impossible.

Best Practices

A result of this study has been the development of general best practices that will serve the manuscript scholarly community and the greater digital humanities community simultaneously. In the near future, we plan to test our ideas in the field with the same subjects to determine whether the approach holds value.
As Antonijevic states, "although generic tools have better potential to meet research needs of a broader set of humanists, there is also space for a smaller-scale and more experimental tool building" (97). Our hope is that by creating best practices that work within the context of our manuscript-based research project, these smaller-scale tools will have broader application to the wider digital humanities environment. The first practice is to work with scholars during the planning phase of the data life cycle. Information professionals should promote early planning as both beneficial to the overall research process and compliant with university and funding agency requirements. Our researchers valued preparation, with one noting, "I think the main thing is the more prep-work beforehand to be honest." Scholars can lay out expectations, create resources that are mutually agreeable to both the scholar and the information professional, and address any concerns before reaching the repository. Information managers can and should create basic tables or checklists at this time to ensure that data is standardized, sortable, and searchable. The second practice, and perhaps the most important, is to follow a community approach to data management solutions. Information professionals should incorporate scholars during planning and use their insights to develop solutions. Providing them with a taxonomy or with rigid, generalized rules does little to encourage scholars to make use of them, regardless of benefit. By working in a more interdisciplinary way, however, information managers can borrow from different research communities of practice to fit researchers' needs. For example, a field like archaeology, with its marriage of scientific and artistic practices, could be used as a reference point for humanities data management practices.
"In archaeology," writes Antonijevic, "there is no real distinction between digital and non-digital tools" (49).

Finally, the third practice is to develop an approach that aligns as closely as possible with scholarly practice. In her ethnographic study, Antonijevic observes, "humanities scholars envision tools that would enable seamless and multidimensional flow of research activities from one phase to another and back, across multi sided and multimedia corpora" (95). Indeed, our study participants imagined a futuristic world in which data collected in a library could be immediately organized, tagged, and connected to related information with little intervention. The first step in this direction is careful consideration of the data and the processes that surround it. The easier it is to incorporate protocols into research, the more likely scholars are to make use of them, and the greater the potential for data sharing, long-term preservation, and reuse.

Conclusion

Based on our findings, we are beginning to develop an approach for the next stage of our research. Though still in the preliminary planning stage, we hope to produce the beginnings of an ontology that allows flexible changes to its collection and structure; a formalized checklist outlining the essential data to be collected; and a template, in both analog and digital form, that will add structure to researchers' notes and facilitate the use of tools later in the research cycle. All of this will be developed and vetted in close consultation with researchers to ensure their cooperation and our mutual success. The data will then be usable throughout our wider digital humanities project, and the structures and workflows that we develop for data collection and curation can be reused in future digital humanities projects.
This next stage will serve to validate the tools we create for digital manuscript scholars and also to test our framework against the wider field of digital humanities. As the digital humanities grow and adapt to new environments and applications, research data practices will come under necessary review. Although humanities scholars have always 'managed' their data, in that they track their research and use their own organizational systems, incorporating digital tools changes the way this process unfolds. In short, digital humanities research necessitates an approach perhaps more in line with the standardized scientific approach than with the traditionally individualized nature of humanist inquiry. As information professionals, we need to understand these differences and reconcile them with current research data management practices. We must challenge our traditional notions of research data management by placing ourselves within the context of different fields and theories. Information professionals are well suited to this role since we understand both the potential and the limitations afforded by different data sets and practices. Ultimately, we must understand and accommodate both the digital and the humanities in our own work. Future efforts in the realm of DH data management will only be successful if we stake out a path in which both sides of the digital humanities coin are recognized and considered.

References

Abbas, June. 2010. "Structures for Organizing Knowledge: Exploring Taxonomies, Ontologies, and Other Schemas". New York, NY: Neal-Schuman Publishers.
Antonijevic, Smiljana. 2015. "Amongst Digital Humanists: An Ethnographic Study of Digital Knowledge Production". New York, NY: Palgrave Macmillan.
Awre, Chris, et al. 2015. "Research Data Management as a 'Wicked Problem'". Library Review: 356-371.
Baofu, Peter. 2008.
"The Future of Information Architecture: Conceiving a Better Way to Understand Taxonomy, Network and Intelligence". Michigan: Chandos.
Bolintineanu, Alexandra. 2017. "DH History and Data". Lecture at Woodsworth College, CCR199H1S, Introduction to Spatial Digital Humanities, January.
Briney, Kristin. 2015. "Data Management for Researchers: Organize, Maintain and Share Your Data for Research Success". Exeter, UK: Pelagic Publishing.
Crompton, C., Lane, R. J., and Siemens, R. G. 2016. "Doing Digital Humanities: Practice, Training, Research". New York, NY: Routledge.
Drucker, Johanna. 2011. "Humanities Approaches to Graphical Display". Digital Humanities Quarterly 5(1). Retrieved from: http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html
Erway, R. et al. 2013. "Starting the Conversation: University-wide Research Data Management Policy". Retrieved from: http://www.oclc.org.myaccess.library.utoronto.ca/content/dam/research/publications/library/2013/2013-08.pdf
Funari, Maura. 2015. "Research Data and Humanities: A European Context". Italian Journal of Library and Information Science 5(1): 209-236.
Goven, Abigail and Raszewski, Rebecca. 2016. "The Data Life Cycle Applied to Our Own Data". Journal of the Medical Library Association 103(1): 40-44.
Hall, Gary. 2013. "Toward a Postdigital Humanities: Cultural Analytics and the Computational Turn to Data-Driven Scholarship". American Literature 85(4): 781-809.
Hays, David G. 1968. "Data Management in the Humanities".
Library, Information Science & Technology Abstracts, EBSCOhost (accessed October 4, 2016). Retrieved from: http://www.dtic.mil/dtic/tr/fulltext/u2/668752.pdf
Heuser, Ryan and Le-Khac, Long. 2011. "Learning to Read Data: Bringing out the Humanistic in the Digital Humanities". Victorian Studies 54(1): 79-86.
ISO/IEC 2382:2015. 2015. "Information Technology: Vocabulary". Retrieved from: https://www.iso.org/obp/ui/#iso:std:iso-iec:2382:ed-1:v1:en
Joshi, Margi and Krag, Sharon S. 2010. "Issues in Data Management". Science and Engineering Ethics 16: 743-748.
Kanare, Howard M. 1985. "Writing the Laboratory Notebook". Washington, D.C.: American Chemical Society.
Krier, Laura and Strasser, Carly A. 2014. "Data Management for Libraries". Library and Information Technology Association. Chicago: Neal-Schuman Publishers.
O'Reilly, Kelley et al. 2012. "Improving University Research Value: A Case Study". SAGE Open 2(3). https://doi.org/10.1177/2158244012452576
Porter, Dorothy. 2013. "VisColl: Visualizing Physical Manuscript Collation". Retrieved from: https://github.com/leoba/VisColl
Posner, Miriam. 2015, June 25. "Humanities Data: A Necessary Contradiction". Retrieved from: http://miriamposner.com/blog/humanities-data-a-necessary-contradiction/
Posner, Miriam. 2016, April 19. "Data Trouble: Why Humanists Have Problems with Datavis, and Why Anyone Should Care". Retrieved from: https://www.youtube.com/watch?v=sW0u1pNQNxc&t=209s
Research Data Canada. 2017. "Original RDC Glossary".
Retrieved from: https://www.rdc-drc.ca/glossary/original-rdc-glossary/
Richardson, Julie and Hoffman-Kim, Diane. 2010. "The Importance of Defining 'Data' in Data Management Policies". Science and Engineering Ethics 16: 749-751.
Rimmer, J., C. Warwick, A. Blandford, J. Gow and G. Buchanan. 2008. "An Examination of the Physical and Digital Qualities of Humanities Research". Information Processing and Management 44: 1374-1392.
Rittel, Horst W. J. and Webber, Melvin M. 1973. "Dilemmas in a General Theory of Planning". Policy Sciences 4: 155-169.
Strasser, Carly; Cook, Robert; Michener, William; and Budden, Amber. 2012. "Primer on Data Management: What You Always Wanted to Know". UC Office of the President: California Digital Library. Retrieved from: https://escholarship.org/uc/item/7tf5q7n3
Tri-Agency Statement of Principles on Digital Data Management. 2016. Retrieved from: http://www.science.gc.ca/eic/site/063.nsf/eng/h_83F7624E.html
Whitmire, A. L., Boock, M., and Sutton, S. C. 2015. "Variability in Academic Research Data Management Practices". Program 49(4): 382-407.
Zins, C. 2007. "Conceptual Approaches for Defining Data, Information, and Knowledge". Journal of the Association for Information Science and Technology 58(4): 479-493. doi:10.1002/asi.20508