• Encoding a "box" of code is not simply a technical matter. It has social dynamics, including influencing how people make sense of culture and texts. Sometimes the technical and the social are at odds.
• Still, the shift from syntagm-focused print text to paradigm-oriented digital text need not imply "dumbing down" a text (e.g., all links go to denotations in the dictionary) or determining how a reader interprets it. (See Module 1 on curious relationships. Here, it might be productive to think of Stein's style through "boundary" or "hybrid" objects, like the Scythian Lamb. Often, the words in her poetry fit, quite purposefully, in multiple categories simultaneously. Her writing's wonderfully monstrous.)

What Now?: Applications
• Select a specific paradigm for reading "A BOX"—a rule for reading, if you will. This will be your generative constraint for encoding your interpretation into the poem. For an example, let's look at William Gass's etymology cluster for the poem.
• Now let's review an example of a page written in XHTML. Note how the text is written in nested "boxes." Again, the boxes, once opened, must be closed. (A minimal sketch of such a page appears after this list.)
• And let's review an example page in CSS. Note how CSS stylizes the boxes written in XHTML.
• In Notepad, practice encoding "A BOX" in XHTML and CSS (in two separate files) for the web. In the XHTML, include, at a minimum, the <html>, <head>, <title>, and <body> tags and elements. In your CSS, stylize the XHTML <body> and at least one other element.
• After your encoding, in the XHTML file, please write a sentence or two explaining what your generative constraint for encoding was.
• How did encoding the text influence your interpretation of it? How did that interpretation manifest in the encoding? How would your encoding influence how a reader interprets the poem?
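Here is that minimal sketch of the two files. Everything specific to it—the file names (box.html and box.css), the class name, and the styling choices—is an illustrative assumption rather than a requirement, and the poem's opening words stand in for whatever passage you encode.

A possible box.html:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- The head box opens, holds its contents, and closes. -->
    <title>A BOX, encoded</title>
    <link rel="stylesheet" type="text/css" href="box.css" />
  </head>
  <body>
    <!-- Boxes nest: body sits inside html, and each p sits inside body. -->
    <p class="redness">Out of kindness comes redness . . .</p>
  </body>
</html>

A possible box.css:

/* The CSS stylizes the boxes written in the XHTML. */
body { background-color: #ffffff; font-family: serif; }
p.redness { color: red; } /* one way of making a generative constraint visible */

Note that every opened box is closed, and that the CSS never redraws the boxes; it only stylizes what the XHTML has already nested.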
What's Next?: Modules Ahead
• Collecting Idea Pockets, Do You Believe in Angels?

What to Consider during Future Modules
• For a module in the near future, you'll start thinking about refashioning a print-based project you've already started. How might paradigms and syntagms play a role in this refashioning?

Stillman, in Paul Auster's City of Glass:
My brilliant stroke has been to confine myself to physical things, to the immediate and tangible. My motives are lofty, but my work now takes place in the realm of the everyday. That's why I'm so often misunderstood. But no matter. I've learned to shrug these things off. . . . You see, I am in the process of inventing a new language.

Learning Outcomes for the Module
• Understand how new media can be integrated into collecting information for, and collaborating in, digital humanities research projects.
• Practice some basics in WordPress and Google Books, Maps, & Reader.

The Paris arcades (iron and glass structures popular in the 1820s and 1830s) are, according to Walter Benjamin (1892–1940) in The Arcades Project:
• "a center of commerce in luxury items" (3)
• "a world in miniature" (Illustrated Guide to Paris, qtd. in the text, 3)
• "buildings that serve transitory purposes" (4)
The collector and collecting play prominent roles in the arcades.

Benjamin on collecting:
• "What is decisive in collecting is that the object is detached from all its original functions in order to enter into the closest conceivable relation to things of the same kind" (204).
• "Collecting is a form of practical memory, and of all the profane manifestations of 'nearness' it is the most binding" (205).
• "The true method of making things present is to represent them in our space (not to represent ourselves in their space)" (206).
• "The collector dreams his way not only into a distant or bygone world but also into a better one—one in which, to be sure, human beings are not better provided with what they need than in the everyday world, but in which things are freed from the drudgery of being useful" (9).

For The Arcades Project, Benjamin's method is collecting: snippets of writing put into juxtaposition, pockets of ideas that are contrived. (See Module 1 on contrivances, hybrid objects, and practicality, as well as Module 2 on association blocks and paradigms.) This method corresponds with the form of Benjamin's book (see the hard copy), not to mention his research practices. In a way, Benjamin gave theory a new language, with his dictionary of collections.

Implications for blogging and digital humanities research projects in this class
• Research as Wunderkammer-making (see Module 1)
• Relevance of the everyday to academic research and new media
• The habit of documenting work (archive it now, arrange it later, delete nothing)
• Articulating thoughts through paradigms first, then organizing the syntagms (e.g., compiling things before making a claim ("X causes Y"), rather than making a claim and finding the evidence to "fill it in" or support it) (see Module 2 on paradigms and syntagms)
• Embracing a type of experimentation in your academic work—as you collect, being open to change, flexibility, and failure and avoiding the "theory hammer," where everything in sight becomes a nail
• Class blog: a collaborative collection of microcontent in a networked space, which offers juxtapositions across our individual collections
• Conjecturing (per Willard McCarty in Humanities Computing): "a collecting or throwing together of particulars" in an attempt to make sense of them (47)

What Now?: Applications
• Log in to the class blog. (I'll give you your username and password.)
• Post your first entry, categorized under "introductions" and tagged as you find appropriate. Before you publish it:
o Introduce yourself to the class in whatever way you wish.
o Provide a link to your XHTML and CSS exercise (which should be at students.washington.edu/[yourUWnetID]/chid498/).
o Include an image of the book or text you encoded in your exercise. (If you can't find one, then tell me. We'll think of something relevant.)
Of note, all images on the blog must be 400 pixels or less in width. You can always use a program to shrink them accordingly.
• When you are finished, I will also show you how to post a video. Of note, all videos on the blog must be 200 pixels or less in width.
• Now log in to the class Google account ("mappingthedigitalhumanities"):
o Note how a majority of our online class content is aggregated at iGoogle. Peruse it to see what's there.
o In Google Books, add a book that you'll likely be using this quarter or that you think is relevant to the class.
o In Google Maps, add something (e.g., a comment, an image, or a video) to the class map. We'll also have to decide by what standards we'll be collaborating to map the campus this quarter.
o Time permitting, in Google Reader, add a relevant snippet from the web.
• How is each of these a form of collecting? Of research? Of everyday life?
• How is each of these a form of collaboration? And what kind of collaboration, exactly? Consider other ways you've collaborated that might differ from what we're doing here.

What's Next?: Modules Ahead
• Do You Believe in Angels?, Oh How Reductive

What to Consider during Future Modules
• When you have so much to collect for a given research project, how do you refine your options? Data, but how to gather it?

From Steve Tomasula's The Book of Portraiture:
The unexamined life is not worth living —Socrates, and www.homecams.com, the site that lets you see inside 1,024 private homes….

Learning Outcomes for the Module
• Explore media differences between print and digital texts and the implications of these differences for remediation and intermediation projects.
• Examine the distinctions between "remediation" and "intermediation" through some examples.

Let's take a look at an animation of the first newsreel from John Dos Passos's The 42nd Parallel in tandem with a digitized version of it and its print version. Now, let's unpack the relations between these three "versions" of the text through two terms: remediation and intermediation.

Per Jay David Bolter and Richard Grusin, remediation is
• "the representation of one medium in another" (45)
• nearly synonymous with "'repurposing': to take a 'property' from one medium and reuse it in another" (45)

Per N. Katherine Hayles, intermediation
• names the "complex transactions between bodies and texts as well as between different forms of media" (7)
• includes "interactions between systems of representations, particularly language and code, as well as interactions between modes of representation, particularly analog and digital" (33)
• "denotes mediating interfaces connecting humans with the intelligent machines that are our collaborators in making, storing, and transmitting informational processes and objects" (33)

How do the two terms offer different readings of our three versions of Dos Passos? Consider what they emphasize (e.g., "medium," "representation," "bodies," and "collaborators"). To help us along, we might consider what Hayles, in a different text, says are the characteristics of computer-mediated text.
It
• is "layered" (e.g., a layer of text on a screen and a code layer) (163)
• "tends to be multimodal" (e.g., including "text, images, video, and sound") (164)
• exists such that "storage is separate from performance" (e.g., store files on a server in Seattle, read them in Santiago) (164)
• "manifests fractured temporality" (e.g., the reader does not control "how quickly the text becomes readable") (164)

Implications for Your Digital Humanities Project
When thinking of "remediating" or "intermediating" print, the characteristics of computer-mediated text should factor into what remediation or intermediation will afford—how either invites or pressures certain readings and engagements. (See Module 1 on curious relationships and Module 3 on the class blog as a collection.)

What Now?: Applications
• Check out Marsha's Throne Angels! As a parody of old-school, low-tech personal web pages, what media does it remediate? How does it achieve humor in this remediation?
• In the above line, what happens to our interpretations when we revise "remediating" and "remediation" to "intermediating" and "intermediation"?

What's Next?: Modules Ahead
• Oh How Reductive, Making Swervy Things

What to Consider during Future Modules
• How might these angels, not to mention these distinctions between intermediation and remediation, inform your project? Which of the two terms do you prefer? Why?

From Marianne Moore's "The Student":
"When will your experiment be finished?" "Science is never finished."
And from her "People's Surroundings":
there is something attractive about a mind that moves in a straight line—

Learning Outcomes for the Module
• Explore the implications of "reduction" and classification in digital humanities research.
• Consider ways you might use specific data elements to methodically reduce the primary text(s) in your research project.

Franco Moretti is a cartographer of sorts. He makes literary maps, with a science. In Graphs, Maps, Trees, he writes: "What do literary maps do . . . First, they are a good way to prepare a text for analysis. You choose a unit—walks, lawsuits, luxury goods, whatever—find its occurrences, place them in space . . . or in other words: you reduce the text to a few elements, and abstract them from the narrative flow, and construct a new, artificial object . . .
And with a little luck, these maps will be more than the sum of their parts: they will possess 'emerging' qualities, which were not visible at the lower level" (53). Literary maps also afford what Moretti calls a "distant reading," "where distance is however not an obstacle, but a specific form of knowledge: fewer elements, hence a sharper sense of their overall interconnection. Shapes, relations, structures. Forms. Models" (1).

To flesh out "distant reading," let's look at a couple of examples (1 and 2) from Moretti's Atlas of the European Novel: 1800–1900. What's mapped? What's not?

One catch: how to avoid assuming that a distant reading fully accounts for its territory. Alfred North Whitehead called this slippage "misplaced concreteness." Abstractions such as maps—in their richness and utility—are used to explain the territory. They come to be treated as objectifying media that always generate reliable results (e.g., facts from maps) or uniform products (e.g., the same houses from a single blueprint). As Matthew Fuller observes: "The ruse of concrete misplacedness, of an ideally isolatable element, produces its offspring—but they are unruly" (104). Frankenstein's creature animates this very unruliness (e.g., the uncontrollable monster of science), as does Stein's poetry (e.g., "a rose is a rose is a rose," where the definition of a rose is historically and culturally dependent). (See Module 2.) So does the still image of Astaire's unruly movement; he looks positioned in the still shot, but photography needn't give us the illusion that this event is isolatable and easily repeated. (I certainly couldn't pull it off.) Consider, too, syntagms from Module 2. This shot of Astaire is in a sequence of shots. What comes before and after is crucial.

Abstraction here is not what Ezra Pound means when he writes (in Poetry, 1913), "Go in fear of abstractions." Pound's on a different register. For him, the idea is to avoid writing in imprecise language what someone else already wrote precisely. Treat the thing directly. Use the exact word. For Whitehead and Moretti, abstractions are quite useful for collecting elements and showing their relations. It is when they are understood as the causes that produce homogeneous territories that misplaced concreteness occurs. (Consider, too, Nietzsche on how the cause is generated after the effect.)
Implications of the reductive method for your digital humanities project:
• Textual/literary maps are not only geographical maps. Think broadly about how to map the space of your text(s) (e.g., places in a novel, recurrence of concepts in a poem, publication dates in a genre/corpus).
• Novel questions, complex issues, and creativity can emerge from reduction and classification. (Consider Oulipo!) In fact, reduction and classification can help generate interpretations you may have never considered. Moretti writes, "I had found a problem for which I had absolutely no solution. And problems without a solution are exactly what we need . . . we are used to asking only those questions for which we already have an answer" (26).
• Reduction is a practical way of narrowing rich research projects, of keeping them simple. It forces you not only to isolate elements of the text, but also to articulate how you isolated them and how you are assessing/quantifying them.
• Distant reading runs contrary (in some ways) to "close reading" in the humanities. Keep this in mind. How will some audiences object to the distant reading you're conducting?

What Now?: Applications
• In your clusters, work together so that each student selects three data elements that reduce the primary text(s) of her/his project. These elements would ostensibly lead to a textual mapping.
• On the blog, list your three elements and address three things about each: (1) what kind of interpretation would it afford? (2) what of importance might it ignore?
(3) how does it relate to—or join—the other two elements?

What's Next?: Modules Ahead
• Making Swervy Things, Mapping in Stakes

What to Consider during Future Modules
• How does the kind of map you ultimately produce influence your choice of data elements, and vice versa?

From Donna Haraway's Modest_Witness@Second_Millennium.FemaleMan_Meets_OncoMouse: Feminism and Technoscience:
In Greek, trópos is a turn or a swerve; tropes mark the nonliteral quality of being and language. Metaphors are tropes, but there are many more kinds of swerves in language and in worlds. Models, whether conceptual or physical, are tropes in the sense of instruments built to be engaged, inhabited, lived.

Learning Outcomes for the Module
• Consider the implications of modeling for humanities research through examples from the Google Visualization API.
• Become familiar with how digital models enable the organization of difference and patterns.
• Explore some possible options for modeling the data from your own project.

According to Willard McCarty, a model is "either a representation of something for purposes of study, or a design for realizing something new" (24). These two understandings of models correspond with Clifford Geertz's "denotative 'model of', such as a grammar describing the features of a language, and an exemplary 'model for', such as an architectural plan" (24). Here, models relate to maps. McCarty suggests that, like modeling, mapping "can be either of or for a domain, either depicting the present landscape or specifying its future—or altering how we think about it, e.g., by renaming its places. A map is never entirely neutral, politically or otherwise" (33). (For more, see his "Modeling: A Study in Words and Meanings.")

McCarty also suggests that there are two features of modeling as a practice:
• Take knowledge for granted and just start modeling. Eventually, meaningful surprise occurs when the model generates an occurrence that cannot be explained (e.g., something is where it shouldn't be), or when the model fails to generate the expected occurrence (e.g., something isn't where it should be) (25-26). Both of these examples could also be called "contrivances," or the bringing about of unintended events. (See Module 1 on knowledge production, curiosity, and the Wunderkammer.)
• Perceive the manipulability of information. Models are repeatedly altered and must be interactive (26). Digital models are arguably more flexible, interactive, and manipulable than print ones.

How, then, does a map become Haraway's nonliteral swervy thing, or Geertz's "model for"? How might it alter common perceptions of history, of landscape, of culture, of literature? Or how might it become a vehicle for humor or political action? (We're really going to unpack these questions in the next module.)

Implications for your digital humanities research projects
Modeling entails the
• Introduction of, and interaction between, media layers (e.g., the spreadsheet, the motion chart, the notes, the text, and the essay) in the stages of research and collecting data. (See Module 4 on intermediation and remediation, and Module 3 on collecting and conjecturing.)
• Mobilization of theory through what McCarty calls "the continual process of coming to know by manipulating things" (28). In other words, the swervy thing is also a theory thing: it's a material object (that has force and is used by people in certain ways) and a concept repeatedly put into action.
• Integration of quantitative approaches and classifications into critical approaches to history, culture, and literature.
• "Distant reading" of texts and discovering a problem without a solution. (See Moretti's comments in Module 5.)
• Challenges of:
o (1) Synthesizing various modes of perceiving, storing, and transmitting information,
o (2) Selecting the most effective data elements (for a swervy thing),
o (3) Finding the most persuasive model for your audience(s) and purpose(s), and
o (4) Determining whether you are representing information ("model of") or designing for the realization of the new ("model for").

What Now?: Applications
• Check out the Google Visualization API gallery. Scroll through the options (e.g., motion chart, geomap, and annotated time line) with your project in mind.
• For each that interests you, look (at least) at the examples provided, the data format, and the configuration options. Considering the aims of your project, as well as your elements (from Module 5), do any of the visualizations work for you? Why or why not?
• As a class, we'll work through an example motion chart using a spreadsheet as a data source. (A rough sketch of such a page appears after this list.)
• When we are finished, on the blog, respond to the following in your own entry:
o (1) Given this cursory look at modeling, what obstacles do you foresee?
o (2) For your project, are you more invested in modeling for or modeling of? Why?
o (3) How do the visualizations affect your perception of your elements (from Module 5)? What might need to change from that last module?
o (4) What other kinds of visualizations or models would you like to work with in class?
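For reference, here is a rough sketch of the kind of page the motion chart exercise might produce, using the Google Visualization API's JavaScript loader. The data—two invented texts, two invented years, and made-up counts for two hypothetical data elements—is illustrative only; in class, the rows would come from our shared spreadsheet rather than being typed in by hand.

<html>
  <head>
    <script type="text/javascript" src="http://www.google.com/jsapi"></script>
    <script type="text/javascript">
      // Load the Visualization API and the motion chart package.
      google.load('visualization', '1', {packages: ['motionchart']});
      google.setOnLoadCallback(drawChart);

      function drawChart() {
        // A motion chart expects a string column (the entity), a time
        // column, and one or more numeric columns (your data elements).
        var data = new google.visualization.DataTable();
        data.addColumn('string', 'Text');
        data.addColumn('number', 'Year');
        data.addColumn('number', 'Places Named');      // hypothetical element
        data.addColumn('number', 'Letters Exchanged'); // hypothetical element
        data.addRows([
          ['Text A', 1900, 10, 2],  // invented values, for illustration only
          ['Text A', 1910, 14, 3],
          ['Text B', 1900, 4, 7],
          ['Text B', 1910, 6, 12]
        ]);

        var chart = new google.visualization.MotionChart(
          document.getElementById('chart_div'));
        chart.draw(data, {width: 600, height: 300});
      }
    </script>
  </head>
  <body>
    <div id="chart_div" style="width: 600px; height: 300px;"></div>
  </body>
</html>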
What's Next?: Modules Ahead
• Mapping in Stakes, What's Data?

What to Consider during Future Modules
• Soon, you'll be submitting data for your project. Regardless of whether you are modeling for or modeling of, how will you make your data interesting, and how will it be organized? What audience(s) do you have in mind, and what matters to them?

Protagonist, from Ralph Ellison's Invisible Man:
All things, it is said, are duly recorded—all things of importance, that is. But not quite, for actually it is only the known, the seen, the heard and only those events that the recorder regards as important that are put down, those lies his keepers keep their power by. . . . Where were the historians today? And how would they put it down?

Learning Goals for the Module
• Become familiar with some critical approaches to technology and how to apply one or two of those approaches to your own project, especially to how you are gathering data.
• Determine—through examples and an assessment of your data elements—how those critical approaches might help you increase the stakes of your project.

What or who a map excludes, as well as what or who it enables, are arguably its most important aspects. Often, humanities research projects attend to how objects, such as maps, function in certain social or cultural domains—how, for example, maps render invisible certain people, places, and events, and how to change existing maps or create new ones accordingly. Indeed, maps are ways of writing and classifying history, of putting it down. A question, then, is how to recognize what's missing from your own work, why what's missing matters, and how to revise, if need be.

Before we start there, let's look at an example mapping project, "Queering the Map: The Productive Tensions of Colliding Epistemologies," by Michael Brown and Larry Knopp. Here's the abstract from their article:

"Drawing on and speaking to literatures in geographic information systems (GIS), queer geography, and queer urban history, we chronicle ethnographically our experience as queer geographers using GIS in an action-research project. We made a map of sites of historical significance in Seattle, Washington, with the Northwest Lesbian and Gay History Museum Project. We detail how queer theory/activism and GIS technologies, in tension with one another, made the map successful, albeit imperfect, via five themes: colliding epistemologies, attempts to represent the unrepresentable, productive pragmatics, the contingencies of facts and truths, and power relations. This article thus answers recent calls in the discipline for joining GIS with social-theoretical geographies, as well as bringing a spatial epistemology to queer urban history, and a cartographic one to queer geography."

With this project as a case study, how might "Queering the Map" have emerged from different critical approaches to the map as a technology? Below are five possible approaches, broadly framed and adapted from Roel Nahuis and Harro van Lente's "Where Are the Politics? Perspectives on Democracy and Technology."

• Intentionalist: How is a map (as an artifact representing the values of mapmakers and specific social groups) a materialization of power and authority?
• Proceduralist: How is mapping (as a set of social practices with rules and agreed-upon guidelines) a negotiation between interested groups? And whom do these groups represent?
• Actor-Network: How is the map (as an artifact that affords and forbids certain actions) the result of a struggle between forces or programs, and how does it affect people's actions on a local level?
• Interpretivist: How are the map (as a text with multiple meanings) and the mapmaker (as a participant with certain investments) influencing and influenced by the discourse in which they are embedded?
• Performative: How is the setting of mapping practices (as activities influenced by particular biases) enabling people to act the way that they do, and what other approaches to the setting would somehow surprise or lay bare biased mapping practices?

As a class, let's unpack these approaches a bit. Then, in your clusters, you can decide—in the context of the "Queering the Map" case study—which two critical approaches you find most relevant. After you chat and blog (with one entry per group) about your decisions, we'll reconvene and discuss.

Implications for your digital humanities research projects
• Digital projects that are motivated by and well aware of their specific critical approaches to technology will be more persuasive—they will have higher stakes—than those projects where the critical approach is loosely articulated or even nonexistent.
• Critical approaches to technology allow digital humanities projects to do more than simply "represent" information in new forms (e.g., digitize print texts). They allow them to produce new knowledge.
• Note how these five critical approaches relate to Module 6 (on modeling "of" and "for") and Module 1 (on emergent media and knowledge production).
• Selecting one or two of the approaches above and mobilizing them in your own work might be a way of focusing your project.
• These critical approaches affect both how projects are theorized and how they are practiced (e.g., your project as an idea and your project as a process of gathering and organizing data).

What Now?: Applications
• Return to your data elements from Module 5 and to your workflow. In your own blog entry, please respond to the following questions:
o How, if at all, are your data elements emerging from one or several of the critical approaches listed above, and to what effects? If they don't appear to be emerging from one of these approaches, then explain why you think that is the case.
o If you were to revise your data elements along the lines of one of these approaches, then what would change? (For example, would you cut an element? Add one? Revise them so that they relate differently? Change how they are worded?)
• Time permitting, let's discuss your entries in your clusters and as a class.

What's Next?: Modules Ahead
• What's Data?, Close Reading

What to Consider during Future Modules
In the next module, you'll be gathering data based upon the data elements you selected in your workflow. Given this module, what kind of data do you expect?
How might you make that data more interesting? Riskier? More provocative?

From Linda Nagata's Limit of Vision:
Virgil squeezed his eyes shut, wondering if they ever would have the power to heal death. The human body was a machine; he knew that. He had looked deep into its workings, all the way down to the level of cellular mechanics, and there was no other way to interpret the processes there than as the workings of an intricate, beautiful, and delicate machine. Machines, though, could be repaired. They could be rebuilt, copied, and improved—and sometimes it seemed inevitable that all of that would soon be possible for the human machine too.

Learning Outcomes for the Module
• Understand how data elements (as categorizations of data) are imbricated in material practices, which are associated with actual people and places.
• Consider the importance of scope in assessing your data.
• Learn some "textured" language for assessing your own data and data sources.

In Nanovision, Colin Milburn writes about how nanotechnologists and nanoscientists can "fashion their work as a mapping practice, an effort to contain novel territory within a representational topography that is pictorial, rhetorical, and numerical all at the same time—a 'data map,' a visual rendering, and a descriptive survey of the landscape that transforms its various physical properties into property as such" (65-66).

Put broadly, nanovision—for instance, a researcher's ability to see objects and bodies at the atomic level—translates the microscopic world into a landscape to be explored, mapped, and territorialized: to visualize it, give it a language, and quantify it. The world as we know it is rendered strange through a new scale. For one, bodies and objects behave differently when we zoom in, when we use technologies such as scanning tunneling microscopes to see what the human eye cannot. What's more, if we can now map what we cannot see with the naked eye, then we can also start to manipulate and shape it. In short, the nanoworld becomes a world of new affordances and possibilities. And as Milburn points out: "Indeed, a vocabulary of western exploration and 'Manifest Destiny' plays a powerful epistemic role in nanoscience research" (67). Expand vision? Expand human control and domain over the world (67). (Martin Jay, among others, refers to this as "ocularcentrism.") Perhaps a video spells it out better. Let us see.

Implications for your digital humanities research projects
• With maps, we tend to think of how to make things that are larger than us (e.g., the whole world) smaller than us (e.g., a map of the world). Yet nanotechnology demonstrates how mapping is really a matter of scope—of expanding our scale (e.g., applicability) and range (e.g., breadth) of knowledge, whether that means seeing the entire world or seeing the minute inner workings of the body. The scope of your data (and not necessarily the amount of it) is thus always something to consider.
Of course, thinking big isn't always the best option, and your acute knowledge of your project's scope—of why you are setting its scale and range the way that you are—will only enhance how persuasive audiences find it.
• While nanotechnologies afford us increasing freedom (e.g., of choice, of movement), freedom is not the same as control. For instance, our bodies still function in ways we cannot see, let alone grasp. Increased access to information about them does not imply that all material problems will be easily remedied. Put another way, political issues cannot be resolved technologically. (See Wendy Chun and Module 7 here.) Persuasive digital projects often recognize that knowledge does not exist in objects, bodies, technologies, or information alone, but rather in the material relationships between them. (Some refer to these relationships as ecologies.)

What Now?: Applications
• For this module, I asked you to bring in some data. More specifically, I asked you to actually cut up your print project—to cut into print, gather what you need, and consequently cut out the rest. I also asked you to arrange your data according to your data elements. Now, with that arranged data in front of you, let's ask the following questions of what we'll call your data's "texture." These metaphors, borrowed in part from Sorting Things Out by Bowker and Star, will be a means of reminding ourselves of your data's materiality and its scope. Comparable to how nanotechnologists speak of carbon nanotubes, let's speak of your data as threads:
o How "thick" of a thread is it? (That is, how well does it account for the range of possibilities suggested by your data elements?)
o How "durable" of a thread is it? (That is, how would it hold up to critique? To what critical approaches (see Module 7) is it accountable?)
o How "tightly or loosely woven" is it? (That is, how broadly or narrowly does it describe the place, people, or things it's describing?)
o How well are your data sets "knotted" or "tied" together? (That is, how do they relate, and how do they contradict/complement each other?)
• With these questions in mind, please, in your own entry, blog about miscellany. But by "miscellany," I'm being quite specific. After conducting the above material assessment of your data's scope:
o What do you think you "cut out" from the data sources and archive you've been working with? What's in the remnants? In "zooming in" on specific elements of the text, what did your nanovision occlude, and to what effects on your project? Especially consider how tightly or loosely woven the data is.
o What are the limits of your data sources and archive? Their limits of vision? Do you need to look to more texts? Why, or why not? Especially consider the thickness and durability of your threads.
o Now that you have some data, how, if at all, did the data elements (as constraints) help you gather data that surprised you? Put another way, what, if anything, did you think you had under control and all mapped out that, in fact, you do not? Especially consider the ties and knots across your data sets. If you were not surprised, then why?

What's Next?: Modules Ahead
• Close Reading, Assessing Your Project

What to Consider during Future Modules
• In the near future, you'll be producing a data model, which is essentially an abstraction of how you are organizing and processing your data. In composing such an abstraction, what are some ways to remind yourself of your data's texture? Of its material embeddedness and implications? Good luck, humans.

From The Verbal Icon, by W.K. Wimsatt and Monroe C. Beardsley:
One must ask how a critic expects to get an answer to the question about intention. How is he to find out what the poet tried to do? If the poet succeeded in doing it, then the poem itself shows what he was trying to do.

Learning Outcomes for the Module
• Understand what might be some critiques of "distant reading" and how to engage those critiques.
• Collaboratively annotate a text that has been popular in the class thus far and see what collaborative annotation affords.
• Recognize some possible tensions between "distant reading" and "close reading" and articulate why that tension is productive.

Put this possibility on the table: For the entire quarter, you've been compiling data on an author's entire corpus—let's say Virginia Woolf's. More specifically, you're studying what places are referenced in her novels, and you're locating those places, together with relevant quotes from the texts, on a single map. When the quarter's finished, it's quite possible that you haven't read—in its entirety—a single book by Virginia Woolf.

My first suggestion? Read a book by Virginia Woolf. My next suggestion? Consider what someone (e.g., a literary critic, a fan of Woolf) would value as "close reading," where careful attention is paid to the words and ideas of a text (and often just the text alone). Select passages of the text are then scrutinized in a work of criticism. (You've likely done this, no?)

Actually, for this module, let's conduct a close reading of a text that's been popular in the class. For now—of course, subject to change—I'll go with Martin Heidegger's "The Question Concerning Technology," first published in 1954. I select it primarily because it's essentially a canonical (or ubiquitous) text as far as the culture, philosophy, history, and sociology of technology are concerned. Regardless of the text (which should be only a chapter or an article), we'll go through it, in class, line by line, and annotate it using Microsoft Word. I'll then circulate that annotated text for your future reference. During the module, it might not be a bad idea for a number of us to play the role of transcriber, taking down the annotations, in the margins, as they emerge. After all, transcription is a matter of interpretation, and it's labor-intensive. Switching up transcribers will thus give people breaks and generate a broader range of experiences and questions during the exercise.
Once the text is annotated, we'll ask what we've learned from the close reading and how it differs, if at all, from the work you've been doing all quarter.

Implications for your digital humanities research projects
• Distant readings are often, fairly enough, critiqued as ignoring the principles and benefits of close reading. While assessing your project and speaking to it, keeping these critiques in mind is a smart practice.
• Rather than eschewing close reading for distant reading (or vice versa), a more complex response is to note how the two differ, to what effects, and why. For instance, a literary historian might be more invested in a distant reading, while a New Critic might be more invested in a close reading. Both afford distinct and (when done persuasively) equally important readings.
• If you've been asked to conduct close readings in the past, then you might consider how your project for this class has shaped your learning and humanities research differently.
• Collaboratively annotating a text, where a screen and document are shared, is one digital humanities practice that highlights how subordinating individual investments toward a shared goal (e.g., annotating a text as a group and collectively determining the benefits of close reading) becomes the vehicle for mutual, technology-focused learning. (See Chris Kelty here.)

What Now?: Applications
• As a class, we'll create a document that puts our annotated text into conversation with your individual projects. In so doing, we should draw upon each of them for evidence and address the following:
o What does a distant reading afford humanities research, especially digital humanities research? How?
o What does a close reading afford humanities research, especially digital humanities research? How?
o How are the two approaches coextensive or complementary? In tension?
o How, if at all, do computers and new media figure into the above questions?
• If we have time, then you should, in your own blog entry, respond to this exercise with your own thoughts. Things to consider: What concerns do you have about distant reading? How, if at all, is it at odds with other ways you've practiced reading and criticism? What approach(es) do you prefer, and why?

What's Next?: Modules Ahead
• Assessing Your Project

What to Consider during Future Modules
• For the last module, you'll be thinking through how to assess your project. How might this conversation between close and distant reading figure into your assessment? By focusing, perhaps, on what your project is not doing, what have you learned about what it is doing persuasively?

Walter Benjamin, in "Theses on the Philosophy of History":
Thinking involves not only the flow of thoughts, but their arrest as well.

While it's tempting to spend the balance of the quarter aggregating data and piling on media, I say we stop for a second and start building things. But!
This one’s not the whole idea. It’s a thought piece. And it should consist of the following: • As a field of study, what you think the digital humanities does, • How you think its practitioners do what they do, and • Initial and interesting ideas for at least one digital humanities project that you could develop this quarter. At least one. By “you,” I mean you in particular. Be selfish, people. How you shape this information is up to you. You can essay, diagram, video, draw . . . The medium is not the matter. Pick what you prefer. However, you should figure this in: your medium will influence how you (and your audience) create and think through a message. (Consider “remediation” and “intermediation” from Module 4, as well as “syntagms” and “paradigms” from Module 2.) And remember: A thought piece is a riff. The point is to conjecture. Speculate. Toss out a rich idea or two or three, and later we’ll talk about making the whole thing happen. Outcomes Your thought piece should: • Demonstrate a general understanding of how Modules 1 through 4 relate to the digital humanities as a field and a set of practices (e.g., apply some of the concepts from the modules, think through how to use new media for new forms of scholarship, or unpack the distinctions between print and digital texts). • Give your audience (that is, your 498 peers and me) a sense of why your project(s) would be filed under “digital humanities” and what’s interesting—provocative, even—about your idea(s). Before and during the process, consider: • Reviewing the visualization/diagram of the class (in the syllabus). What’s familiar? What isn’t? • Giving the class modules another gander. What appeals? What confounds? • Looking back at some of your old work from other classes. What have you written on? Studied? What do you care about? What’s curious, and what could be developed? Conversation Coming Soon Your thought piece is due—on the class blog (embedded, via a link, or as text)—before class on Wednesday, April 15th. It will serve as a vehicle for conversation during your first conference with me. Which is to say: I’ll attend to it before we meet. That way, we don’t start cold. I swear. The thought piece will be graded on the 4.0 scale, and it can be revised once. It’s part of your individual project grade. If you have problems with the blog, then let me know. http://books.google.com/books?id=AFJ7dvSdXPgC&dq=illuminations&ei=Q4e6Sd6_O5WWkATG6oiCDA&pgis=1 http://mappingthedigitalhumanities.org/wp-content/uploads/2009/01/module-2.pdf http://mappingthedigitalhumanities.org/wp-content/uploads/2009/01/module-4.pdf Ishmael, in Herman Melville’s Moby Dick: God keep me from ever completing anything. . . . Oh, Time, Strength, Cash, and Patience! Michel Eyquem de Montaigne in “Of Cannibals”: I am afraid our eyes are bigger than our bellies and that we have more curiosity than capacity. We grasp at all, but catch nothing but wind. You’ve made a thought piece. We’ve talked about it. Now it’s time to sketch out what—aside from time, strength, cash, and patience—is needed to put a thought in motion. Of course, a thought moving isn’t a thought complete. Keep that pithy line in mind as you respond to this prompt. Or, to contextualize: The goal for the quarter isn’t to finish a research project; it’s to build one worth developing in the future. Recall Shelley Jackson, from Module 1: “there can be no final unpacking.” Determine, then, what you can grasp—what’s feasible—between now and June-ish. How practical, especially for humanists. 
Let’s give such practicality a name: “needs assessment.” However! As opposed to the image below, your “needs” here won’t simply be downloaded for regurgitation later. You’ll have to come up with them on your own, with some guidelines. As with the first prompt, the medium is yours. But please respond to the following: • What do you want from your emerging project? Or, what is your objective, and what’s motivating it? • What do you need (e.g., knowledge, experience, materials, and practice) to pull everything off? Or, to return to Moretti and Module 5 for a sec: For now, what knowledge are you taking for granted? • Where are you going for evidence or data? That is, what texts will you be working with? Outcomes Your needs assessment should be: • Specific, pointing to the particular knowledge you need and want (e.g., XHTML, GIS, literature review, and media theory/history) and what materials you should have (e.g., software, time, and books). • More refined and focused than your thought piece. (If the thought piece was about broad possibilities, then your needs assessment is about concrete ones.) • A way of responding to your first conference with me. (Reference our conversation and expound upon it.) • Aware that its audience consists of your peers and me. (Feel free to use names or speak to particular bits from class.) Before and during the process, consider: • What is realistic for a quarter? • How do you avoid reinventing the wheel? What did you learn from another course or project that could be developed and re/intermediated? • When the spring’s finished, what kind of project will be most useful for you? Think before and beyond now. http://books.google.com/books?id=cYKYYypj8UAC&dq=moby+dick&ei=Boq6SbL7K5r6kASbl6yJCA http://mappingthedigitalhumanities.org/wp-content/uploads/2009/01/module-1.pdf http://mappingthedigitalhumanities.org/wp-content/uploads/2009/01/module-5.pdf http://mappingthedigitalhumanities.org/wp-content/uploads/2009/01/module-4.pdf Mapping the Digital Humanities Assignment 2, Page 2 __________________________ Critiques Soonish Your needs assessment is due—on the class blog (embedded, via a link, or as text)—before class on Monday, April 20th. You will share it during in-class critiques. During those critiques, you’ll also respond to your peers’ assessments. The needs assessment will be graded on the 4.0 scale, and it can be revised once. It’s part of your individual project grade. If you still have problems with the blog, then let’s talk. I might need to revise or address something. Carl von Clausewitz in On War: Everything in strategy is very simple, but that does not mean that everything is very easy. Fair enough, Carl, but that doesn’t mean we can’t at least try to make things a tad easier, right? Despite the fact that plans and thoughts and needs and life are all subject to change, sketching out an agenda, through some simple elements, is rarely a bad idea. The key is—to borrow from Chris Kelty—“planning in the ability to plan out; an effort to continuously secure the ability to deal with surprise and unexpected outcomes” (12). So how about what we’ll call a “workflow”? Again, the medium is yours, but please transmit the following: • What is your research question? (Try one that starts with “how.”) • What are the data elements for your project? (We have already discussed these in class; and, if all’s on par, then you should have already drafted them.) 
• How are you animating these elements (e.g., through what medium—a motion chart, a geomap, a timeline—are you shaping information)?
• What do you expect to emerge from this animation (e.g., what will the information look like, how will the audience interpret it, or what might you learn from it)?
• Ultimately, what are you going to do with it (e.g., how will it influence your current work, how might you use it in other classes, how will it persuade audiences, or how will it change the ways in which you perceive the text(s) you're working with)?

Outcomes
Your workflow should:
• Be driven by a concrete and provocative research question, which emerges from your responses to Prompts 1 and 2.
• Be very specific about the data elements you are using. Name them. List them out.
• Be very specific about the kind of animation you are using, including some knowledge of how that animation allows you and your audience to produce knowledge—or how that animation is a "swervy thing."
• Demonstrate that you are aware of why you are using your particular data elements and animation and what the implications of your decisions might be (e.g., what are the benefits and deficits, or the hot ideas worth some risk and the not-so-hot possibilities that are deterring you).
• Be aware that its audience consists of your peers and me.

Before and during the process, consider:
• How your digital project—through computational animation—demands a different mode of thought than, say, writing a paper. How might you take advantage of this difference? What does it afford?
• What options you have for animation, what you are most comfortable with, and—again, again, again—what seems feasible for a quarter.
• In your previous work, what terms or concepts pop up most often, which ones interest you the most, which ones you'd rather do without, and how those terms would translate in a computational approach.

Toward Making Animation Matter
Your workflow is due—on the class blog (embedded, via a link, or as text)—before class on Monday, April 27th. In class, we'll get theoretical and address the "stakes" of your animation and data elements, or how you can make them matter and for whom.

The workflow will be graded on the 4.0 scale, and it can be revised once. It's part of your individual project grade. Keep me posted with questions and quibbles.

From Dyeth, in Samuel R. Delany's Stars in My Pocket Like Grains of Sand:
Someone once pointed out to me that there are two kinds of memory (I don't mean short- and long-term, either): recognition memory and reconstruction memory.
The second is what artists train; and most of us live off the first—though even if we're not artists we have enough of the second to get us through the normal run of imaginings.

A constant challenge in academic work, then, is to model something that reshapes the material with which you and others are already familiar—to re-construct and re-imagine history, culture, texts, territories, and places through new paradigms, without simply recognizing them as what you already know, using the same blueprints, strategies, and maps as before. To produce a contrivance. To project a world and animate it. To swerve. I'm not saying it's easy. It's not. But give it a whirl. You've thought about your project (in your Thought Piece), assessed its possibilities (in your Needs Assessment), made it elemental (in your Work Flow), and speculated on what might happen come June (during in-class workshops). Now's the time to give people the classification system for your information collecting and some results—that is, your data model and some data.

This time around, the medium isn't yours. Sorry. Please complete the data model worksheet. However, when you provide your data, you can choose the medium. For instance, feel free to use a spreadsheet, provide copies of a log, or complete the table I provide at the end of the worksheet.

Outcomes
Your data model should be:
• Extremely specific, providing your audience with exact details for each of your data elements, following the form provided, and leaving no necessary field blank.
• A cogent means of giving a reader who is not familiar with your project a sense of how you are collecting and organizing your data.

Your elaboration on your data model should be:
• A mobilization of terms and concepts from class (e.g., classification, paradigms, re/intermediation, collecting, affordance, intent, procedures, bias, discourse, animation, and distant reading), putting them to work in the context of your project.
• Concrete and situated in your project. Abstract language should be avoided. Responses to each question should be based on examples from and exact instances in your project.
• Aware of the limits and benefits of the decisions you are making and how those decisions will affect your target audience and your own learning. Remember: you can't do everything, but you should be able to account for how you are mapping your project.

Your data should be:
• Well-organized and specific, based upon the framework outlined in your data model.
• Sufficient to—at this juncture in your project—allow you to make some preliminary findings based upon your research. (However, the data does not need to be complete. You might still be in the process of collecting more. In the worksheet, I require three rows of data. I recommend collecting much more, if possible. For some projects, twenty to forty rows will be necessary.)

http://books.google.com/books?id=ngHQ_ZghbbYC&printsec=frontcover&dq=stars+in+my+pocket&source=gbs_summary_r&cad=0#PPA183,M1
http://mappingthedigitalhumanities.org/wp-content/uploads/2009/04/data-model.doc

Before and during the process, consider:
• What you expect to emerge from your animation at the quarter's end. How do those expectations resonate with your data model?
• Returning to what you churned out in response to Prompts 1 through 3. What's your trajectory, collector?
• How, broadly speaking, this approach to humanities work relates to your previous coursework and experiences, and to what effects.
• Revisiting the modules and contacting me and/or your peers with any questions you have about the terms and concepts used.

Another Review Coming Soon
Your data model worksheet is due—on the class blog (embedded, via a link, or as text)—before class on Monday, May 11th. During that class, your worksheet will be peer reviewed, and I will grade your worksheet based on that peer review. The data model will be graded on the 4.0 scale, and it can be revised once. It's part of your individual project grade. Hope all's coming along well. As always, let me know about your concerns.

http://mappingthedigitalhumanities.org/?page_id=235

From Hervé Le Tellier's "All Our Thoughts": I think the exact shade of your eyes is No. 574 in the Pantone color scale.

Ah . . . the abstract: the oh so academic act of summarizing work that's often still in progress. Your project's not finished, you're still not sure if everything coheres, and the thing's so deep you can't dare reduce it to a single paragraph. I know this. I don't particularly enjoy writing abstracts, either. But abstracts are necessary beasts. Aside from giving your readers a quick snapshot of your research, they also force you to articulate—in a precise fashion and in exact terms—what, exactly, you are up to. To the details, then. Your abstract should include:
• The aim of your project and its motivation/purpose,
• Your research question (although it does not need to be articulated as a question),
• Your method (how you did what you did),
• Your results (what you learned),
• The implications of your results (or why your research matters), and
• The trajectory of your project (what you plan to do with it in the future).

This one should be in words. Despite Blake's abstract of humans (above-right), we're going with the industry standard here.

Outcomes
Your abstract should:
• Be no more than three hundred words.
• Be one concise and exact paragraph.
• Include a title for your project, three keywords for it, and a one-sentence tagline describing it. (The keywords and tagline are not part of the three-hundred-word limit.)
• Be written for educated, non-expert audiences (e.g., academic types who might not be familiar with the digital humanities) and avoid jargon.
• Summarize your work as it stands, instead of becoming an idea hike into unventured regions (that is, avoid speculations).
• Mobilize terms and concepts from the class, again, for educated, non-expert audiences.
• Demonstrate, through clear language, how your project's motivation, question, method, results, and trajectory are related.
• Follow the form below on page two.

Before and during the process, consider:
• How your data model is one way of thinking through your method.
• Returning to your response to Prompt 3, which asked you for your research question, and to Prompt 2, which asked you what you want from your project.
• Module 7 (on making your project matter) and how it speaks to your project's motivation and the implications of your results.
• How to write for people who would have absolutely no clue what, exactly, the digital humanities is.
• How terms common in the course thus far (e.g., paradigm, syntagm, model, distant reading, remediation, and intermediation) might be helpful when articulating your project.
• When terms should be defined.
Contextualizing the Thing
Your abstract is due—on the class blog (attached as a Word document)—before class on Wednesday, May 20th. On May 27th, we'll consider how to integrate your abstract into the presentation of your project. An abstract is nothing without what it's abstracting. The abstract will be graded on the 4.0 scale, and it can be revised once. It's part of your individual project grade. If you need help condensing, then let me know.

Form for the Abstract
Project Title
Your Name, Your Major
Tagline
Three keywords
Body of abstract (300 words, one paragraph)

Examples
View some sample abstracts (which do not necessarily follow the format and outcomes for this prompt, but are nevertheless good references).

http://mappingthedigitalhumanities.org/wp-content/uploads/2009/01/assignment-3.pdf
http://mappingthedigitalhumanities.org/wp-content/uploads/2009/01/assignment-2.pdf
http://mappingthedigitalhumanities.org/wp-content/uploads/2009/01/module-7.pdf
http://mappingthedigitalhumanities.org/wp-content/uploads/2009/04/userguide.pdf
http://www.sccur.uci.edu/sampleabstracts.html

From DJ Spooky's Rhythm Science: As George Santayana said so long ago, "Those who cannot remember the past are condemned to repeat it." That one's scenario. But what happens when the memories filter through the machines we use to process culture and become software—a constantly updated, always turbulent terrain more powerful than the machine through which it runs? Memory, damnation, and repetition: That was then, this is now. We have machines to repeat history for us. . . . The circuitry of the machines is the constant in this picture; the software is the embodiment of infinite adaptability, an architecture of frozen music unthawed.

Reflection, reflection, reflection. Instructors often like the word. I'm not sure it fits here, though. The purpose of this project assessment isn't for you to ruminate on whether you're good enough or smart enough. We know you are, and people like you. It's for you to articulate what—over the course of the quarter—ultimately emerged from your project and what you think of it. The thing began as an idea. You then converted it into an agenda, with a model, compiling pieces of data, and ultimately animating those pieces. That said, I hope you collected something you're happy with. The project goal was for you to think through "generative constraints" as strict as computation and data models to produce provocative questions, new knowledge, and reconfigurations of literature, culture, and history. After all, the hardware of history needn't determine its interpretation, and the wiring of culture is never neutral. Infinite adaptability.

With that adaptability in mind, please unpack this list, without, of course, the brazen assumption that your unpacking is final. The quarter just so happens to be over. (And I'm really sad about that.)
• How—for better and for worse—does your animation project differ from an academic paper (especially one intended for print)? What does it ask of audiences and to what effects?
• How does your project produce new knowledge, and about what?
• Considering the brevity of a quarter, how was your project a success? What did you learn from it? What will others?
• How could you improve your project? What do you want to continue learning from it?
• How, if at all, do you plan on developing (or using) your project in the future? Do you plan to circulate it to others or make it public?
Why or why not?

Unless you are going for writing credit, I've decided to let you choose the medium or media here. You can make—or blend together—video, a website, audio, word docs, or what-have-you. Be creative. Just do me two favors:
1. With your assessment, include three outcomes upon which I should assess your project and your assessment of it. Those outcomes should include references to your method for collecting data, your awareness of your own bias/intent/procedures, your project's design, and how your project produces knowledge (instead of just re-presenting known information).
2. Provide me with your final animation project. Upload it to the blog, provide a link, or the like. (See more below.)

Outcomes
By focusing on your project as a process, your project assessment should:
• Be composed for educated, non-expert audiences (e.g., academic types who might not be familiar with the digital humanities).
• Demonstrate your understanding of the digital humanities as a field, using material from the class when appropriate.
• Reference specific aspects of your project and draw upon it for evidence.
• Exhibit critical approaches to your own project (e.g., show that you know how you did what you did, what worked, and how you could have done things differently).
• If applicable, include a works cited page of texts quoted, paraphrased, or the like.

Before and during the process, consider:
• Returning to your responses to all prompts. How has your project—and your framing of it—changed since then?
• Returning to the course syllabus and assessing what you've learned in the class since day one of the quarter.
• Returning to the user's guide for CHID 498.
• Circulating a draft assessment to me and your peers. (Use the blog!)
• How to write for people who would have absolutely no clue what, exactly, the digital humanities is.
• Doing something that will keep you interested. It's finals week, in spring, just before summer, y'all.

This One Will Not Be Revised
Your project assessment and final portfolio are due—on the class blog (filed under your name)—by the end of the day, Wednesday, June 10th. Here's what (ideally) should be uploaded to your author page on the blog:
• Mapping 1,
• Thought Piece (First Draft and Revision, if applicable),
• Needs Assessment (First Draft and Revision, if applicable),
• Work Flow (First Draft and Revision, if applicable),
• Mapping 2,
• Data Model (First Draft and Revision, if applicable),
• Abstract (First Draft and Revision, if applicable),
• Animation (all versions, including the one presented on June 3rd),
• Project Assessment, and
• Anything else you think is relevant.

As a reminder, here's how your work in 498 will be graded:
• Class participation (30% of the grade)
• Blogging and collaborative mapping (20% of the grade)
• HTML quiz (5% of the grade)
• Final exhibition (5% of the grade)
• Individual project (40% of the grade)
These five components of the class will each be graded on a 4.0 scale and then, for your final grade, averaged according to the percentages I provide above.
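If you'd like to see exactly how that averaging works, here's a minimal sketch in Python; the component grades are invented for illustration (they're nobody's actual marks):

    # Weights from the list above; grades are hypothetical 4.0-scale values.
    weights = {"participation": 0.30, "blogging": 0.20, "quiz": 0.05, "exhibition": 0.05, "project": 0.40}
    grades = {"participation": 3.8, "blogging": 3.4, "quiz": 4.0, "exhibition": 4.0, "project": 3.6}

    # Multiply each component grade by its weight, then sum.
    final = sum(weights[k] * grades[k] for k in weights)
    print(f"Final grade: {final:.2f}")  # about 3.66 with these numbers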
And here's how the portfolio is graded:
• Thought piece (10% of portfolio, can be revised once after it's graded),
• Needs assessment (10% of portfolio, can be revised once after it's graded),
• Work flow (10% of portfolio, can be revised once after it's graded),
• Data model (15% of portfolio, can be revised once after it's graded),
• Abstract (15% of portfolio, can be revised once after it's graded), and
• Final prototype and assessment (40% of portfolio, cannot be revised after it's graded).

See me with questions! Have a rad summer break, people. It's been a pleasure, and—to reiterate—make this last bit interesting. After all, CHID 498 was, from the get-go, an experiment.
Dombrowski, Q 2020 Preparing Non-English Texts for Computational Analysis. Modern Languages Open, 2020(1): 45, pp. 1–9. DOI: https://doi.org/10.3828/mlo.v0i0.294

ARTICLE – DIGITAL MODERN LANGUAGES

Preparing Non-English Texts for Computational Analysis
Quinn Dombrowski
Stanford University, US
qad@stanford.edu

Most methods for computational text analysis involve doing things with "words": counting them, looking at their distribution within a text, or seeing how they are juxtaposed with other words. While there's nothing about these methods that limits their use to English, they tend to be developed with certain assumptions about how "words" work – among them, that words are separated by a space, and that words are minimally inflected (i.e. that there aren't a lot of different forms of a word). English fits both of these assumptions, but many languages do not. This tutorial covers major challenges for doing computational text analysis caused by the grammar or writing systems of various languages, and ways to overcome these issues.

Introduction
Most methods for computational text analysis involve doing things with 'words': counting them, looking at their distribution within a text or seeing how they are juxtaposed with other words. While there's nothing about these methods that limits their use to English, they tend to be developed with certain assumptions about how 'words' work – among them, that words are separated by a space, and that words are minimally inflected (i.e. that there aren't a lot of different forms of a word). English fits both of these assumptions, but many languages do not.

Depending on the text analysis method, a sufficiently large corpus (on the scale of multiple millions of words) may sufficiently minimize issues caused by inflection, for instance at the level commonly found in Romance languages. But for many highly inflected Slavic and Finno-Ugric languages, Arabic, Quechua, as well as historical languages such as Latin and Sanskrit, repetitions of what you think of as a 'word' will be obscured to algorithms with no understanding of grammar, when that word appears in different forms, due to variation in the number, gender or case in which that word occurs. To make it possible for an algorithm to count those various word forms as the same 'word', you need to modify the text before running the analysis. Likewise, if you're working with Japanese or Chinese, which don't typically separate words with spaces, you need to artificially insert spaces between 'words' before you can get any meaningful result. For example, 'I went to Kansai International Airport' is written in Japanese as 関西国際空港に行きました. The lack of spaces between words means that tools dependent on spaces to differentiate (and then count) words will treat this entire sentence as a single 'word'. Segmentation – the process of adding spaces – is not always an obvious or straightforward process; on one hand, it's easy to separate 'to' and 'went' from the
name of the airport (関西国際空港 'Kansai International Airport' に 'to' 行きました 'went'), but depending on what sorts of questions you are attempting to answer with the analysis, you may want to further split the proper name to separate the words 'international' and 'airport', so that they can be identified as part of a search, or contribute to instances of those words in the corpus: 関西 'Kansai' 国際 'international' 空港 'airport' に 'to' 行きました 'went'.

Goals
This tutorial covers major challenges to doing computational text analysis caused by the grammar or writing systems of various languages, and offers ways to overcome these issues. This often involves using a programming language or tool to modify the text – for instance by artificially inserting spaces between every word for languages such as Chinese that aren't regularly written that way, or replacing all nouns and verbs with their dictionary form in highly inflected languages such as Finnish. In both of these situations, the result is a text that is less easy to parse for a human reader. Removing inflection may have the effect of making it impossible to decipher the meaning of the text: if a language has relatively flexible word order, removing cases renders it impossible to differentiate subjects and objects (e.g. who loved whom). But for some forms of computational text analysis, the 'meaning' of any given sentence (as readers understand it) is less important; instead, the goal is to arrive at a different kind of understanding of a text using some form of word frequency analysis. By modifying a text so that its 'words' are more clearly distinguishable using the same conventions as found in English (spaces, minimal word inflection etc.), you can create a text derivative that is specifically intended for computation and will lead to much more interpretable computational results than if you give the algorithm a form of the text intended for human readers. While this lesson provides pointers to code and tools for implementing changes to the text in order to adapt it for computation, the landscape of options is evolving quickly and you should not feel limited to those presented here.

Audience
Text analysis methods are most commonly used in research contexts, and frequently appear as part of 'an introduction to digital humanities' and similar courses and workshops. While these courses are taught worldwide, the example texts are, most often, in English, and the application of these text analysis methods may not be as straightforward for students working in other languages. This tutorial is intended for instructors of such workshops, to help them be better informed about the challenges and needs of students working in other languages and to provide them with pointers for how to troubleshoot issues that may arise.

For instructors of modern languages, text analysis methods can also have a place in intermediate to advanced language courses (see Cro & Kearns). For instance, while many digital humanities researchers now use more nuanced methods than word clouds, they can still be employed in a language pedagogy context to provide a big-picture visualization of word frequency – starting with the generic and obvious (prepositions, articles, pronouns etc.) and becoming more and more related to the content of the text as students apply and refine a stopword list (a list of words that should be removed prior to doing the word counts and generating the visualization).
Depending on the text, even a word cloud may make visible the impact of inflection, as it may contain multiple forms of a given 'word', which can spur discussion about what constitutes a 'word'. Intuitively, we think of saber ('to know' in Spanish) as the 'same word' as sé 'I know', sabemos 'we know', sabía 'knew' and so on, but what do we gain and lose if we treat them as 'different words', the way a computer would by default?

Text encoding
Text encoding – or how the low-level information about how each letter/character is actually stored on a computer – is important when working with any text that involves characters beyond unaccented Latin letters, numerals and a small number of punctuation marks.[1] It may be tempting to think languages that use the Latin alphabet are safe from a particular set of challenges faced by other writing systems when it comes to computational text analysis. In reality though, many writing systems that use the Latin alphabet include at least a few letters with diacritics (e.g. é, ñ, or ż), and these letters cause the same issues as a non-Latin alphabet, albeit on a smaller scale. While a text in French, Spanish or Polish may be decipherable even if all of these characters are mangled (e.g. ma□ana for mañana is unlikely to cause confusion, and even a less obvious case such as a□os for años is often distinguishable by context), issues with text encoding may cause bigger problems later in your analysis – including causing code to not run at all. For languages with a non-Latin alphabet, text encoding problems will render a text completely unreadable and must be resolved before doing anything at all with the text. Unicode (UTF-8) encoding is the best option when working with text in any language, but particularly non-English languages.

What is Unicode?
Unicode is the name of a computing industry standard for encoding and displaying text in all writing systems of the world. While there are scripts that are not yet part of Unicode as of 2020 (including Demotic and some Egyptian hieroglyphs), researchers affiliated with the Unicode consortium have done a tremendous amount of work starting in the late 1980s to differentiate characters (graphemes, the smallest units of a writing system) versus glyphs (variant renderings of a character, which look a little different but have the same meaning) for the world's writing systems, and assign unique code points to each character. With some writing systems – including Chinese and various medieval scripts – the decision of what constitutes a character as opposed to a glyph is at times controversial. Scholars who disagree with previous decisions or who feel that they have identified a character that is not represented in Unicode can put forward proposals for additions to the standard. While the Unicode consortium that shapes the development of the standard is primarily made up of large tech companies, scholars and researchers play a significant role in shaping decision-making at the language level (Anderson).

Why is Unicode important?
Before Unicode was widely adopted, there were many other standards that developed and were deployed in language-specific contexts. Windows-1251 is an encoding system that was widely used for Cyrillic and is still used on 11% of websites with .ru (Russian) domain names (W3Techs). A competing, but less common, Cyrillic encoding for Russian was KOI8-R, and a similar one, KOI8-U, was used for Ukrainian.
For Japanese, you may still encounter websites using Shift JIS encoding. For Chinese, you can find two major families of encoding standards prior to Unicode, Guobiao and Big5. A major advantage of Unicode, compared to these other encoding standards, is that it makes it possible to seamlessly read text in multiple languages and alphabets. Previously, if you had a bilingual parallel edition of a text on a single webpage with languages that used two different writing systems, you would have to toggle between multiple text encodings – reducing one side of the text, then the other, to gibberish as you switched between them.

If you work in a language with a non-Latin alphabet, odds are good that you'll encounter text that doesn't use Unicode encoding at some point in your work. Long-running digital text archives, in particular, are likely candidates for not having migrated to Unicode. If you try to open a text file using the wrong kind of encoding, you won't see text in the alphabet you're expecting to see, but rather a kind of gibberish that will soon become familiar. (For instance, Windows-1251 Cyrillic looks like Latin characters with diacritics: "Äîñòîåâñêèé Ôåäîð Ìèõàéëîâè÷. Ïðåñòóïëåíèå è íàêàçàíèå" for "Достоевский Федор Михайлович. Преступление и наказание" – Dostoevsky Fyodor Mikhailovich. Crime and Punishment.)

Making sure your text uses Unicode encoding
Most computational text analysis tools and code assume that the input text(s) use UTF-8 (Unicode) encoding. If the input text is not in UTF-8, you may get an error message, or the tool may provide an 'analysis' of the unreadable gibberish (Figure 1).

Figure 1. Voyant 'analysis' of Windows-1251 encoded Russian text.[2]

It is not obvious what encoding a text file uses: that information isn't included in the file properties available on Windows or Mac. There isn't even an easy way to write Python code to reliably detect a file's encoding. However, most plain text editors have some way to open a text file using various encodings until you find one that renders the text readable, as well as some way to save a text file with UTF-8 encoding. A plain text editor is software that natively reads and writes .txt files, without adding in its own additional formatting (which Notepad does in Windows). Atom is a cross-platform (Windows/Mac/Linux) plain text editor that you can install if you don't already have a preferred editor.[3] There are numerous packages (add-ons) for Atom that provide additional functionality. One of these is called convert-file-encoding.[4] Download and install this add-on following the instructions in the Atom documentation.[5] Once you've installed the convert-file-encoding package, open your text file in Atom. By default, Atom tries to open everything as UTF-8. If everything displays correctly, your file already uses Unicode encoding. If the text is gibberish, go to Edit > Select encoding, and choose a possible candidate encoding. The encodings are listed in Atom by what languages they cover, so you can try different options for your language if you're not sure. Once your text appears normally, go to Packages > Convert to encoding and select UTF-8. Then save your file.

[1] Note that 'encoding' here refers to the comparatively low-level technical process of standardizing which bits represent which letters in various alphabets. This is a different use of the term than the 'encoding' in the Text Encoding Initiative (TEI), https://tei-c.org, which captures structural and/or semantic features of text in a potentially machine-readable way.
[2] Voyant Tools, https://voyant-tools.org/.
[3] Atom is available for download at https://atom.io/.
[4] The convert-file-encoding package is available at https://atom.io/packages/convert-file-encoding.
[5] Atom documentation is available at https://flight-manual.atom.io/using-atom/sections/atom-packages/.
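If you would rather script the conversion than click through an editor, a minimal Python sketch along these lines can work. The file names are placeholders, and, as noted above, there is no fully reliable automatic detection: the loop simply shows you a sample decoded with each candidate encoding so that you, a human reader, can judge which one is right.

    # Try a few candidate encodings and print a sample of each attempt.
    candidates = ["utf-8", "cp1251", "koi8-r", "shift_jis"]
    for enc in candidates:
        try:
            with open("mystery.txt", encoding=enc) as f:
                print(enc, "->", f.read(200))
        except UnicodeDecodeError:
            print(enc, "-> could not decode")

    # Once you've identified the right encoding (say, Windows-1251),
    # read the file with it and re-save the text as UTF-8.
    with open("mystery.txt", encoding="cp1251") as f:
        text = f.read()
    with open("mystery_utf8.txt", "w", encoding="utf-8") as f:
        f.write(text)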
Segmentation
For Chinese and Japanese text, you need to segment your text, or artificially insert spaces between 'words', before you can use it for computational text analysis. For Chinese, some scholars treat every character as a 'word'. This destroys compounds but is more predictable than using a segmenter. For both Chinese and Japanese, segmenters work best when the text does not contain a lot of jargon or highly specialized vocabulary, or non-standard orthography (e.g. Japanese children's writing, which often uses the hiragana syllabary where a fully literate adult would use kanji).

Stanford NLP (natural language processing) provides a Chinese segmenter[6] with algorithms based on two different segmentation standards.[7] For Japanese, segmentation is available through the mecab software.[8] Rakuten MA is a Javascript-based segmenter that supports Chinese and Japanese.[9] There is also a Python implementation, Rakuten MA Python.[10] If you have trouble with mecab but aren't comfortable writing Python code yourself, there's a Jupyter Notebook available for segmenting Japanese.[11] See the Programming Historian tutorial 'Introduction to Jupyter Notebooks' (Dombrowski et al.) for a description of Jupyter Notebooks and how to use them.

[6] The Stanford NLP segmenter can be downloaded at https://nlp.stanford.edu/software/segmenter.shtml.
[7] This Chinese part-of-speech tagger tutorial begins with a step-by-step guide to segmenting with the Stanford NLP segmenter: https://github.com/quinnanya/dlcl204/blob/master/chinese/pos_chinese.md.
[8] Mecab can be downloaded at https://taku910.github.io/mecab/.
[9] Rakuten MA is available at https://github.com/rakuten-nlp/rakutenma.
[10] Rakuten MA Python is available at https://github.com/ikegami-yukino/rakutenma-python.
[11] The Jupyter notebook for running Rakuten MA Python is available at https://github.com/quinnanya/japanese-segmenter.
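For a feel of what segmentation looks like in code, here is a small Python sketch using two packages that are alternatives to the tools footnoted above: jieba for Chinese and fugashi (a wrapper around MeCab) for Japanese. These two packages are my suggestions rather than the tutorial's own examples, and both must be installed first (fugashi also needs a dictionary package such as unidic-lite).

    import jieba                 # Chinese segmenter
    from fugashi import Tagger   # MeCab-based Japanese tokenizer

    # Chinese: jieba.cut() yields the 'words' it identifies;
    # joining them with spaces produces a segmented text.
    zh = "我去了关西国际机场"  # 'I went to Kansai International Airport'
    print(" ".join(jieba.cut(zh)))

    # Japanese: the tagger splits the article's example sentence;
    # each token's .surface attribute is the word as written.
    ja = "関西国際空港に行きました"
    tagger = Tagger()
    print(" ".join(word.surface for word in tagger(ja)))

How the proper name 関西国際空港 gets split (or not) will depend on the tool and its dictionary, which is exactly the interpretive choice discussed above.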
Stopwords
Stopwords are words that are filtered out as the first step of text analysis. Many tools have a configuration option where you can define which words should be treated as stopwords.[12] Stopword removal is essential for some methods (including word clouds and topic modelling), to avoid having your results flooded with articles, copulas, prepositions and the like. Other methods, such as word vectors (which analyse words in their context as a way to explore semantic relationships within large corpora), rely on stopwords for important information about the semantic value of words, and stopwords should be retained in the text.

Stopwords are language specific, and more nuanced use of stopwords can involve text-specific lists that also exclude things like character names (which are likely to occur with high frequency, but that frequency may or may not be meaningful depending on your research question). If you're using a tool that supports the use of stopword lists, you should check to make sure that a default, almost certainly English, stopword list isn't being applied to your non-English text.

Some tools provide reasonable built-in stopword lists for multiple languages. Voyant offers generally reasonable lists for thirty-four languages, along with a combined 'multilingual' setting, and an option for defining your own list. These lists are not identical: the Russian list includes the words for many numbers (including пятьдесят 'fifty'), the Spanish list has no numbers but does include various forms of emplear 'use', and the Czech list includes no numbers whatsoever but does have a number of words related to news (e.g. články 'articles'), hinting at the domain and context of its origins. Is it the right thing to do to eliminate written-out numbers from a Russian text, or any references to 'articles' in a Czech text? It all depends on what you're trying to learn from the text analysis. Students should examine – and, if necessary, modify liberally – any stopword list before applying it to their text. If you're a digital humanities instructor, be careful about uncritically recommending stopword lists for languages you can't read yourself. As an initial vetting step, at least run any list you find through Google Translate first, and read through it. There are many resources online that aggregate stopword lists for any number of languages, without considering that many of those lists were developed for very particular use cases, and might, for instance, remove all words about computers, along with the more-expected prepositions.

Your stopword list should be influenced by other changes you make to your text. In general, stopword lists are all lower case, due to the lower-casing that is typically part of the text analysis process. If you lemmatize your text (as described below), you won't need to include every possible form of pronouns: just the lemma. If you don't plan to lemmatize your text before the stopword list is applied, you'll need to work through every number, gender and/or case of undesired pronouns, adjectives, verbs and so forth, to ensure they are all excluded. Remember, these methods are matching, character-for-character, what you put on the list, and including the dictionary form of a word does not by extension include all conjugations, declensions or other variant forms.

[12] See the settings for the Topic Modeling Tool (https://senderle.github.io/topic-modeling-tool/documentation/2018/09/27/optional-settings.html) or the general purpose text exploration environment Voyant (https://voyant-tools.org/docs/#!/guide/stopwords).

Lower-casing
Capital letters and lower-case letters, in bicameral writing systems (those that have the concept of capitalization, unlike Japanese, Hebrew, Georgian or Korean), are different characters from the point of view of text analysis algorithms. Dad, dad and Sad are all treated as separate words, where the latter two are both parsed as having a different first letter from the first. To address this issue, texts are commonly 'lower-cased', or converted to all lower-case characters, before they are further processed with stopword removal or used for analysis. Most text analysis tools (e.g. with graphical user interfaces, like Voyant and the Topic Modeling Tool) handle this automatically, even for non-Latin alphabets. If you're writing analysis code yourself, don't forget this step.
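If you are writing that code yourself, lower-casing and stopword filtering together amount to only a few lines of Python. The tiny Spanish stopword list here is invented for illustration; a real list would be far longer and would deserve the critical review described above.

    # A toy stopword list; real ones are much longer and need vetting.
    stopwords = {"el", "la", "los", "las", "de", "y", "que"}

    text = "El ingenioso hidalgo Don Quijote de la Mancha"
    words = text.lower().split()   # lower-case first, so 'El' matches 'el'
    content_words = [w for w in words if w not in stopwords]
    print(content_words)   # ['ingenioso', 'hidalgo', 'don', 'quijote', 'mancha']

Note that str.lower() also handles non-Latin bicameral scripts such as Cyrillic.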
Lower-casing Capital letters and lower-case letters, in bicameral writing systems (those that have the con- cept of capitalization, unlike Japanese, Hebrew, Georgian or Korean), are different characters from the point of view of text analysis algorithms. Dad, dad and Sad are all treated as separate words, where the latter two are both parsed as having a different first letter from the first. To address this issue, texts are commonly ‘lower-cased’, or converted to all lower-case characters, before they are further processed with stopword removal or used for analysis. Most text analy- sis tools (e.g. with graphical user interfaces, like Voyant and the Topic Modeling Tool) handle this automatically, even for non-Latin alphabets. If you’re writing analysis code yourself, don’t forget this step. Punctuation removal What we easily recognize as punctuation is just another character from the point of view of most algorithms. This leads to problems when the following are all treated as different ‘words’: • cats • “cats • “cats, • (cats) • cats! • cats!! • cats?! • cats. Some tools automatically remove punctuation as part of pre-processing, some tools include punctuation on the stopwords list and others require you to remove it from the text yourself. Dombrowski: Preparing Non-English Texts for Computational Analysis Art. 45, page 7 of 9 For tools that remove punctuation automatically, you should check to make sure that all the punctuation present in your language is being removed successfully. Punctuation removal may be based on English, so punctuation not found in English (such as « » or 「 」, the Russian and Japanese quotation marks, respectively) may not be included. Running the text through a tokenizer algorithm (such as the one provided by the Stanford NLP library for Python, which currently supports fifty-three languages) can also separate punctuation from text, but may make other changes you haven’t anticipated. For instance, in English, a contrac- tion like ‘she’s’ gets split into two ‘words’, she and ’s, which is a reasonable choice reflecting the word’s origins, but can lead to initial confusion when you discover the ‘word’ ’s in the results of your analysis. Lemmatizing If you’re working with a highly inflected language (i.e. if your language has multiple gram- matical cases, or a complex verbal system where different persons and numbers have dif- ferent forms), you may need to lemmatize your text to get meaningful results from any text analysis method. Lemmatization attempts to convert the word forms actually found in a text into their dictionary form. For languages with less inflection (including Romance languages), many scholars don’t feel the need to lemmatize because some methods, such as topic mod- elling, end up successfully clustering together different forms of a word, even given a small amount of variation. It could be a worthwhile activity with students to compare text analysis results with and without lemmatization for these languages. A lot of work goes into developing NLP code for lemmatizing text, and not all lemmatizers perform equally well on all kinds of text: the informal language of tweets and the formal lan- guage of newspapers are different, to say nothing of literary and historical language. English is by far the best-resourced language, given the longstanding academic and commercial inter- est in improving NLP tools for at least modern English. Many languages lack effective lem- matizers, or any lemmatizers at all. 
Lemmatizing
If you're working with a highly inflected language (i.e. if your language has multiple grammatical cases, or a complex verbal system where different persons and numbers have different forms), you may need to lemmatize your text to get meaningful results from any text analysis method. Lemmatization attempts to convert the word forms actually found in a text into their dictionary form. For languages with less inflection (including Romance languages), many scholars don't feel the need to lemmatize because some methods, such as topic modelling, end up successfully clustering together different forms of a word, even given a small amount of variation. It could be a worthwhile activity with students to compare text analysis results with and without lemmatization for these languages.

A lot of work goes into developing NLP code for lemmatizing text, and not all lemmatizers perform equally well on all kinds of text: the informal language of tweets and the formal language of newspapers are different, to say nothing of literary and historical language. English is by far the best-resourced language, given the longstanding academic and commercial interest in improving NLP tools for at least modern English. Many languages lack effective lemmatizers, or any lemmatizers at all. If there's no lemmatizer for the language that you want to work with, another possibility is to look for a stemmer. Stemmers are a shortcut to the same fundamental goal as lemmatizers: reducing variation within a text, in order to more effectively group similar words. Rather than replacing the word forms in a text with the proper dictionary form, a stemmer looks for patterns of letters to chop off at the beginning and/or end of words, to get to something similar to (but often distinct from) the root of the word. Stemmers don't effectively handle suppletive word forms (e.g. 'children' as a plural of 'child'), or other word forms that diverge from the usual grammatical 'rules', but they may work well enough to reduce overall variation in the word forms present in a text, if no lemmatizer is available. The truncated forms produced by a stemmer may, however, be harder to recognize and connect back to the original form when you're looking at the results of your analysis.

The current state-of-the-art (whatever state that may be) for lemmatizing most languages is usually not available through an easy-to-use tool: you should expect to use the command line and/or write code. As a few illustrative examples:
• For Russian, Yandex (the major Russian search engine) has released software called MyStem for lemmatizing Russian.[13] A wrapper is available that makes this code usable in Python, PyMyStem.[14] (A minimal usage sketch follows this list.)
To even begin making sense of the output of com- putational text analysis, it is important to understand how the input text was processed, and to take precautions to ensure that default settings derived from English were not applied to languages with very different grammar or orthography. Fortunately, there is a growing community of scholars working on computational text anal- ysis, and other digital humanities methods, as applied to languages other than English. For scholars working with digital humanities methods, a community has begun to form around the mailing list and resources posted on the Multilingual DH website (https://www.multilin- gualdh.org), which is applying to become a special interest group of the Alliance of Digital Humanities Organizations. These resources, and their applications to digital humanities research as well as language pedagogy, continue to be refined, and self-identified ‘newcom- ers’ are welcome and encouraged to join the conversation. Author Information Quinn Dombrowski supports digitally-facilitated research in the Division of Literatures, Cultures & Languages at Stanford University in the USA. In addition to working on digital humanities projects for a wide variety of non-English languages, Quinn serves on the Global 15 Eustagger-lite is available at http://ixa2.si.ehu.es/eustagger/. 16 KoNLPy is available at http://konlpy.org/en/latest/, along with a tutorial for how to use it for text pre-process- ing at https://lovit.github.io/nlp/2019/01/22/trained_kor_lemmatizer/. 17 The Classical Languages Toolkit is available at http://cltk.org/. 18 At the same time, see this discussion about attempts to decompose characters into radicals as if the radicals were lemmas: https://www.quora.com/Does-the-Chinese-language-have-concepts-of-lemmatization-and-stemming- just-as-English-has. http://cltk.org https://www.multilingualdh.org https://www.multilingualdh.org http://ixa2.si.ehu.es/eustagger/ http://konlpy.org/en/latest/ https://lovit.github.io/nlp/2019/01/22/trained_kor_lemmatizer/ http://cltk.org/ https://www.quora.com/Does-the-Chinese-language-have-concepts-of-lemmatization-and-stemming-just-as-English-has https://www.quora.com/Does-the-Chinese-language-have-concepts-of-lemmatization-and-stemming-just-as-English-has Dombrowski: Preparing Non-English Texts for Computational Analysis Art. 45, page 9 of 9 Outlook::DH executive board and leads Stanford’s Textile Makerspace. Quinn’s publications include “What Ever Happened to Project Bamboo?” about the failure of a digital humanities cyberinfrastructure initiative, “Drupal for Humanists”, and “Crescat Graffiti, Vita Excolatur: Confessions of the University of Chicago” about library graffiti. References Anderson, Deborah. The Script Encoding Initiative, the Unicode Consortium, and the Character Encoding Process. Signa nr. 6 April 2004. https://www.signographie.de/cms/upload/pdf/ SIGNA_Anderson_SEI_1.0.pdf. Accessed 30 January 2020. Ataman, Duygu, Matteo Negri, Marco Turchi and Marcello Federico. ‘Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English’. Prague Bulletin of Mathematical Linguistics, vol. 108, no. 1, 2017, pp. 331–42. DOI: https://doi. org/10.1515/pralin-2017-0031 Cro, Melinda A. and Sarah K. Kearns. ‘Developing a Process-Oriented, Inclusive Pedagogy: At the Intersection of Digital Humanities, Second Language Acquisition, and New Litera- cies’. Digital Humanities Quarterly, vol. 14, no. 1, 2020. http://www.digitalhumanities. org/dhq/vol/14/1/000443/000443.html. 
Accessed 30 April 2020. DOI: https://doi. org/10.46430/phen0087 Dombrowski, Quinn, Tassie Gniady and David Kloster. Introduction to Jupyter Notebooks. The Programming Historian. 12 December 2019. https://programminghistorian.org/en/les- sons/jupyter-notebooks. Accessed 30 January 2020. Ezeiza, Nerea, Iñaki Alegria, Jose Maria Arriola, Ruben Urizar and Itziar Aduriz. ‘Combining Stochastic and Rule-Based Methods for Disambiguation in Agglutinative Languages’. Pro- ceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, vol. 1, 1998, pp. 380–4. DOI: https://doi.org/10.3115/980845.980909 Kim, Hyunjoong. 말뭉치를 이용한 한국어 용언 분석기 (Korean Lemmatizer), 22 January 2019. https://lovit.github.io/nlp/2019/01/22/trained_kor_lemmatizer/. Accessed 30 January 2020. Mao, Lei. ‘Byte Pair Encoding’. Lei Mao’s Log Book, 2019. https://leimao.github.io/blog/Byte- Pair-Encoding/. Accessed 30 January 2020. W3Techs. Distribution of character encodings among websites that use .ru. Updated 30 January 2020. https://w3techs.com/technologies/segmentation/tld-ru-/character_ encoding. Accessed 30 January 2020. How to cite this article: Dombrowski, Q 2020 Preparing Non-English Texts for Computational Analysis. Modern Languages Open, 2020(1): 45 pp. 1–9. DOI: https://doi.org/10.3828/mlo.v0i0.294 Published: 28 August 2020 Copyright: © 2020 The Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/. OPEN ACCESS Modern Languages Open is a peer-reviewed open access journal published by Liverpool University Press. https://www.signographie.de/cms/upload/pdf/SIGNA_Anderson_SEI_1.0.pdf https://www.signographie.de/cms/upload/pdf/SIGNA_Anderson_SEI_1.0.pdf Introduction Goals Audience Text encoding What is Unicode? Why is Unicode important? Making sure your text uses Unicode encoding Segmentation Stopwords Lower-casing Punctuation removal Lemmatizing Conclusion Author Information References Figure 1
Ergonomic Design of a Main Control Room of Radioactive Waste Facility Using Digital Human Simulation

Baekhee Lee (1), Yoon Chang (2), Kihyo Jung (3), Ilho Jung (4), and Heecheon You (1)
(1) Division of Mechanical and Industrial Engineering, Pohang University of Science and Technology, Pohang, South Korea
(2) Department of Production System, LG Electronics, Pyeongtaek, South Korea
(3) School of Industrial Engineering, University of Ulsan, Ulsan, South Korea
(4) Department of Nuclear, Power & Energy Plant Division, Hyundai Engineering, Seoul, South Korea

The present study evaluated a preliminary main control room (MCR) design of a radioactive waste facility using the JACK® digital human simulation system. Four digital humanoids (5th, 50th, 95th, and 99th percentiles) were used in the ergonomic evaluation. The first three were selected to represent 90% of the target population (Korean males aged 20 to 50 years), and the last to reflect the secular trend of stature over the next 20 years in South Korea. The preliminary MCR design was assessed by checking its compliance with ergonomic guidelines specified in NUREG-0700 and by conducting an in-depth ergonomic analysis of a digital prototype of the MCR design with the digital humanoids in terms of postural comfort, reachability, visibility, and clearance. For identified design problems, appropriate design changes and their validity were examined using JACK. The revised MCR design suggested in the present study would contribute to effective and safe operation of the MCR as well as to operators' health in the workplace.

INTRODUCTION
A radioactive waste facility (RWF) is a facility for managing radioactive waste, which is usually a by-product of nuclear power generation and of other applications of nuclear fission or nuclear technology, such as research and medicine. Most radioactive waste in South Korea has been stored in temporary facilities at nuclear power plants (NPPs), so the Korean government planned to establish an RWF in Gyeongju by the year 2012, considering the projected saturation of these temporary facilities (KRMC, 2009). The main control room (MCR) of the RWF needs to be designed with ergonomics in mind at the initial design stage, both for effective monitoring by operators and for reduction of development cost. Hwang et al. (2009) analyzed three usability issues (the operating interface of the displays and controls in the MCR, the usability of procedures, and the layout of the MCR) through ergonomic evaluation of an MCR. Ku et al. (2007) evaluated the MCRs of several NPPs (units 1-4 of the Kori NPP and units 1-2 of the Yeonggwang NPP) by applying an ergonomic evaluation checklist as part of the periodic safety review (PSR). Evaluating an MCR after it has been developed is useful for identifying design improvements; on the other hand, developing an improved MCR at that point requires considerable time and cost. Ergonomic evaluation at the initial design stage is therefore needed for effective MCR design and development.

Digital human simulations (DHS) using humanoids have been used for ergonomic design of workplaces. Lee et al. (2005) and Park et al. (2008) carried out ergonomic evaluations using DHS and analyzed design improvements of an overhead crane and a helicopter cockpit, respectively (Figure 1).
Ergonomic design and evaluation using virtual mockups in DHS at the initial design stage has been recommended as a useful way to reduce development time and cost (Chaffin, 2005; You, 2007).

Figure 1. Ergonomic evaluation using digital human simulation: (a) overhead crane, (b) helicopter cockpit

The present study evaluated preliminary designs of the MCR of the RWF and analyzed design improvements. 3D virtual mockups of the MCR of the RWF were developed for use in DHS. We used JACK® for DHS and generated four representative human models (5th, 50th, 95th, and 99th percentiles) based on the anthropometric data of Size Korea (2004) and the secular trend of stature over the next 20 years. The preliminary designs of the MCR of the RWF were evaluated in terms of four ergonomic aspects (postural comfort, reachability, visibility, and clearance) and analyzed to identify the design components needing improvement and the direction of that improvement.

METHODS

Representative Human Models
Four representative human models were generated for ergonomic evaluation using DHS, chosen to achieve an accommodation percentage of 90% (5th ~ 95th percentiles) for the target population and to reflect the secular trend of stature over the next 20 years. The target population, consisting of males aged 20 to 50, was determined in consideration of workforce planning in the MCR of the RWF.
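As an illustration of how such percentile-based models are derived, here is a short Python sketch with a synthetic stature sample standing in for the Size Korea data (which is not reproduced here); the mean and spread are chosen only to roughly echo the values reported below.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic stature sample (cm); the real study used n = 1,992 measurements.
    stature = rng.normal(loc=170.2, scale=6.0, size=1992)

    for p in (5, 50, 95, 99):
        print(f"{p}th percentile stature: {np.percentile(stature, p):.1f} cm")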
Generated representative human models Reference Posture for Evaluation The present study established an operators’ monitoring posture referring to existing studies related to computer workstation postures for DHS evaluation as shown in Figure 4. The existing studies observed and analyzed reference postures at computer workstation (ANSI/HFES, 2007; Chaffin and Andersson, 1984; Grandjean et al., 1983; Salvendy, 1987). In this study, the reference posture for evaluation as shown in Figure 4 was chosen considering the operator’s posture, similar to postures at a computer workstation, for monitoring tasks in the MCR of the RWF. For example, the degree of shoulder abduction was determined as 13°, which is a median degree provided by Chaffin and Andersson (1984)’s recommended range (0 ~ 25°). 90º 95º 13º 80º10º 35º 13º (a) Side view (b) Front view Figure 4. Reference posture of operators in the MCR Ergonomic Evaluation Criteria The present study established a relationship matrix between four ergonomic evaluation criteria and seven design components in the MCR (Table 1). Ergonomic evaluation criteria were determined as postural comfort, reachability, visibility, and clearance which were used in the existing DHS studies (Bowman, 2001; Nelson, 2001; Park et al., 2008). Selected ergonomic evaluation criteria were selectively applied with target design components. For example, Table 1 shows that console being seated by the operator was evaluated using postural comfort and clearance, and large display panel (LDP) providing information about the RWF was analyzed using postural comfort and visibility. The design components of the MCR of the RWF were evaluated using NUREG-0700 design guideline. NUREG- 0700 design guideline (O’Hara et al., 2002) provides ergonomic design parameters of each design component in the NPP. For example, according to NUREG-0700, the console’s clearance should provide adequate height, depth, and knee clearance for the 5th to 95th percentile adults, LDP’s visibility should permit operators at the consoles a full view of all display panels, and LCD’s vertical viewing angle of visibility should not be more than 20° above and 40° below the operator’s horizontal line of sight. PROCEEDINGS of the HUMAN FACTORS and ERGONOMICS SOCIETY 56th ANNUAL MEETING - 2012 1913 Table 1. Relationship matrix between ergonomic evaluation criteria and design components (O: related, X: not related) No. Design component Postural comfort Reach- ability Visib- ility Clear- ance 1 Console O X O O 2 Large display panel (LDP) O X O X 3 LCD O X O X 4 Security access control sub- console O O X X 5 CCTV master control rack O O X X 6 Main fire control panel O O X X 7 Printers O O X X RESULTS In this study, we show ergonomic evaluation results of three major design components (console, LDP, and LCD) of the MCR of the RWF. Console’s minimum clearance which was analyzed as 1.6 ~ 6 cm for 4 humanoids was adequately evaluated in terms of the NUREG-0700. Minimum clearance was calculated as the least distance between operator’s leg and console. The more body sizes of humanoid increase, the more clearance of console decrease. For example, Figure 5 shows that 95th and 99th percentile’s minimum clearance were 3.5 cm and 1.6 cm respectively. (a) 95th percentile (b) 99th percentile Figure 5. Clearance of the console for operator’s upper leg LCD’s vertical gaze range (VGR) was analyzed satisfying with the NUREG-0700 design guideline. 
The LCD's VGR was calculated as the humanoid's vertical viewing angle when the humanoid, in the reference posture, looked at the top and bottom of the LCD. For example, as shown in Figure 6, the LCD VGRs of the 5th and 95th percentiles (5th percentile: -29 ~ 1°; 95th percentile: -34 ~ -4°) satisfied the range of -40 ~ 20° recommended by the NUREG-0700 design guideline. The LDP's VGR, by contrast, could cause postural discomfort when operators monitor for a long time, because it lies above the horizontal line of sight (0°). The LDP's VGR was calculated with the humanoid in the reference posture looking at the top and bottom of the LDP, which is mounted above the 125-cm-high LCD. For example, as shown in Figure 7, the 5th percentile's LDP VGR (2 ~ 23°) is formed entirely above the top of the LCD (Figure 7.a).
Figure 6. Vertical gaze analysis, LCD: (a) 5th percentile, (b) 50th percentile, (c) 95th percentile, (d) 99th percentile
Figure 7. Vertical gaze analysis, LDP (125 cm): (a) 5th percentile, (b) 50th percentile, (c) 95th percentile, (d) 99th percentile
The LDP VGRs of all humanoids (-1 ~ 23°) met the NUREG-0700 design guideline that the LDP should permit operators at the consoles a full view (Figure 7). However, the current LDP design, whose VGR lies above the horizontal line (0°), could cause fatigue and postural discomfort during long monitoring tasks according to existing studies of recommended display gaze ranges (-26 ~ -2°, Grandjean et al., 1983; -56 ~ -1°, Kim et al., 1991; -40 ~ 20°, O'Hara et al., 2002). To improve the LDP's VGR by lowering the LDP, the analysis showed that the LCD's height would have to be lowered along with it. It was found that the LDP's VGR could be improved by reducing the LDP's height, but interference between the LDP's and the LCD's VGRs could then appear, as shown in Figure 8. To resolve this interference, we designed a groove recessed into the console, as shown in Figure 9. When the LDP's height was reduced to 115 cm by means of the 10-cm-deep LCD installation groove, the LDP's VGR improved to -3 ~ 19° (Figure 10). As a result, the improved LDP VGR is lower than the existing one (-1 ~ 23°); for example, the 5th percentile's LDP VGR improved from 2 ~ 23° to 0 ~ 19°. Meanwhile, the LCD's VGR (-31 ~ 2.5°) still satisfied the NUREG-0700 design guideline (-40 ~ 20°) in the improved design.
Figure 8. Vertical gaze interference between LCD and LDP
Figure 9. Installation groove of the LCD in the console: (a) LCD installation groove, (b) installed LCD
Figure 10. Vertical gaze analysis, improved LDP (115 cm): (a) 5th percentile, (b) 50th percentile, (c) 95th percentile, (d) 99th percentile
The LDP's horizontal gaze range (HGR) was found to satisfy the NUREG-0700 design guideline that the operator's HGR should be within 30° of the center of the LDP. The MCR of the RWF is planned to be staffed by an operator (operating the 7 consoles on the left) and a supervisor (operating the 3 consoles on the right), as shown in Figure 11. The LDP's HGR was calculated as the horizontal gaze interval when the operator and the supervisor each monitored the LDP's left and right points from the center of the LDP. The operator's and the supervisor's HGRs were 12 ~ 27° and 14 ~ 26°, respectively, according to their assigned console positions.
Figure 11. Horizontal gaze analysis, LDP: (a) operator, (b) supervisor
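Since several of the results above amount to checking a computed gaze range against a guideline band, a small helper like the following could automate the comparison. The thresholds are the NUREG-0700 values quoted in the text; the code structure is illustrative and is not the software used in the study.

```python
# NUREG-0700 bands quoted in the text (degrees, relative to the horizontal line of sight).
GUIDELINES = {
    "LCD_VGR": (-40.0, 20.0),  # vertical viewing angle for the LCD
    "LDP_HGR": (0.0, 30.0),    # horizontal gaze within 30 deg of the LDP center
}

def check(criterion: str, observed: tuple) -> bool:
    lo, hi = GUIDELINES[criterion]
    ok = lo <= observed[0] and observed[1] <= hi
    print(f"{criterion}: {observed} within ({lo}, {hi})? {ok}")
    return ok

check("LCD_VGR", (-29.0, 1.0))  # 5th percentile result reported above
check("LDP_HGR", (12.0, 27.0))  # operator result reported above
```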
DISCUSSION
The present study analyzed the preliminary design of the MCR of the RWF through ergonomic evaluation in a digital environment using JACK, with reference to the NUREG-0700 design guideline. The evaluation of the MCR of the RWF considered four ergonomic aspects (postural comfort, reachability, visibility, and clearance), the NPP design guidelines provided by NUREG-0700, and references on ergonomic computer workstation design. For the design components identified through digital human simulation as needing improvement, ergonomic solutions were developed and evaluated to analyze their effects. The improved preliminary design from this study can contribute to the future MCR design of the RWF.
The present study applied representative human models when creating humanoids in JACK, taking Korean anthropometric characteristics and the secular trend of stature into account. Three representative human models were generated, based on the demographic characteristics of the operators in the MCR of the RWF, to accommodate 90% (5th ~ 95th percentiles) of males aged 20 to 50 in Size Korea (2004). Additionally, one representative human model at the 99th percentile was generated for the next 20 years, reflecting the secular trend of operator stature based on Korean stature data from 1979 to 2004.
The present study used estimates of three anthropometric variables (hand breadth, head length, and thumb-tip reach) provided by JACK; these variables are highly correlated with other variables. JACK generates a humanoid from 27 body dimensions, and any dimensions not provided as input are estimated automatically. The present study conducted a post hoc analysis of the 3 missing anthropometric variables through stepwise regression (p_in = 0.05, p_out = 0.1) against the other 24 anthropometric variables, using US Army anthropometric data (Gordon et al., 1988). The resulting regression equations for the 3 missing variables had high adjusted coefficients of multiple determination (adj. R² = 52%, hand breadth; 83%, head length; 84%, thumb-tip reach).
The present study established the reference posture for evaluation based on the computer workstation postures reported in existing studies. However, the reference posture in the MCR may differ from recommended postures at a computer workstation (which has only one display), because more than two displays (LDP and LCD) are installed in the MCR. Consideration of the monitoring tasks for both the LDP and the LCD may therefore be needed for a more appropriate evaluation of the MCR of the RWF.
ACKNOWLEDGMENTS
This research was supported by Korea Power Engineering Company (KOPEC).
REFERENCES
ANSI/HFES (2007). Human Factors Engineering of Computer Workstations. California, USA: Human Factors and Ergonomics Society.
Arcaleni, E. (2006). Secular trend and regional differences in the stature of Italians, 1854-1980. Economics and Human Biology, 4, 24-38.
Bielicki, A. and Szklarska, A. (1999). Secular trends in stature in Poland: national and social class-specific. Annals of Human Biology, 26(3), 251-258.
Bowman, D. (2001). Using digital human modeling in a virtual heavy vehicle development environment. In Chaffin, D. B. (Ed.), Digital Human Modeling for Vehicle and Workplace Design. Warrendale, PA: SAE International.
Chaffin, D. B. (2005).
Improving digital human modeling for proactive ergonomics in design. Ergonomics, 48(5), 478-491.
Chaffin, D. B. (2001). Digital Human Modeling for Vehicle and Workplace Design. Pennsylvania, USA: SAE International.
Chaffin, D. B. and Andersson, G. (1984). Occupational Biomechanics (2nd ed.). New York, USA: Wiley-Interscience.
Gordon, C. C., Bradtmiller, B., Churchill, T., Clauser, C., McConville, J., Tebbetts, I. and Walker, R. (1988). 1988 Anthropometric Survey of US Army Personnel: Methods and Summary Statistics (Technical Report NATICK/TR-89/044). US Army Natick Research Center: Natick, MA.
Grandjean, E. (1987). Ergonomics in Computerized Offices. Philadelphia, USA: Taylor & Francis.
Grandjean, E., Hunting, W. and Pidermann, M. (1983). VDT workstation design: Preferred settings and their effects. Human Factors, 25, 161-175.
Hedge, A. and Powers, J. A. (1995). Wrist postures while keyboarding: Effects of a negative slope keyboard system and full motion forearm supports. Ergonomics, 38, 508-517.
Hwang, S.-L., Liang, S.-F. M., Liu, T.-Y. Y., Yang, Y.-J., Chen, P.-Y. and Chuang, C.-F. (2009). Evaluation of human factors in interface design in main control rooms. Nuclear Engineering and Design, 239, 3069-3075.
Kim, C., Lee, N., Jang, M., and Kim, J. (1991). Research on Ergonomic Design and Evaluation Technology for VDT Workstation. Korea Research Institute of Standards and Science.
Korea Radioactive Waste Management Corporation (KRMC) (2009). Radioactive Waste. Retrieved August 21, 2009 from http://www.krmc.or.kr.
Ku, J., Jang, T., Lee, J., and Lee, Y. (2006). A review of human factors criteria for the main control room MMI in nuclear power plants. In Proceedings of the 2006 Fall Conference of the Ergonomics Society of Korea.
Lee, S., Kwon, O., Park, J., Cho, Y., Lee, M., You, H., and Han, S. (2005). Development of a Workload Assessment Model for Overhead Crane Operation. In Proceedings of the 2005 Fall Conference of the Ergonomics Society of Korea.
NASA (2006). Man-system integration standards. Retrieved September 22, 2009 from http://msis.jsc.nasa.gov/Volume1.htm.
National Institute of Advanced Industrial Science and Technology (AIST) (2006). Secular change in Japan. Retrieved January 11, 2009, from http://www.dh.aist.go.jp/research/centered/anthropometry/secular.php.en.
Nelson, C. (2001). Anthropometric Analyses of Crew Interfaces and Component Accessibility for the International Space Station. In Chaffin, D. B. (Ed.), Digital Human Modeling for Vehicle and Workplace Design. Warrendale, PA: SAE International.
O'Hara, J. M., Brown, W. S., Lewis, P. M. and Persensky, J. J. (2002). Human-System Interface Design Review Guidelines (DC 20555-0001). U.S. Nuclear Regulatory Commission, Office of Nuclear Regulatory Research.
Park, J., Jung, K., Lee, W., Kang, B., Lee, J., Eom, J., Park, S., and You, H. (2008). Development of an Ergonomic Assessment Method of Helicopter Cockpit using Digital Human Simulation. In Proceedings of the 2008 Spring Conference of the Ergonomics Society of Korea.
Padez, C. and Johnston, F. (1999). Secular trends in male adult height 1904-1996 in relation to place of residence and parent's educational level in Portugal. Annals of Human Biology, 26(3), 287-298.
National Center for Health Statistics (2004). Hyattsville, Maryland, 1995.
Size Korea (2004). Statistics of Korean anthropometry. Retrieved September 26, 2009 from http://sizekorea.kats.go.kr.
You, H. (2007). Digital Human Model Simulation for Ergonomic Design of Tangible Products and Workplaces.
In Proceedings of the 2007 Fall Conference of the Ergonomics Society of Korea.
CITY, UNIVERSITY OF LONDON, MSC LIBRARY SCIENCE
INM 380 LIBRARIES & PUBLISHING IN AN INFORMATION SOCIETY, ERNESTO PRIEGO
MAY 2017, ASSIGNMENT OPTION 3
IDENTIFY THE MAIN WAYS IN WHICH TRANSFORMATIONS IN PUBLISHING ARE CHANGING THE WAY PEOPLE DO RESEARCH. WHAT ARE THE RELATIONSHIPS BETWEEN PUBLISHING AND DIGITAL SCHOLARSHIP? AND WHAT DO THESE RELATIONSHIPS MAKE POSSIBLE? WHAT ARE SOME CHALLENGES AND OPPORTUNITIES FOR PUBLISHERS AND/OR LIBRARIES IN THE CONTEXT OF THE NEW DEVELOPMENTS IN DIGITAL SCHOLARSHIP?
WORD COUNT 3499, INCLUDING TITLES; ESTIMATED READING TIME: 18 MIN
Mariana Strassacapa Ou
Publishing as Sharing: OBSERVATIONS FROM ORAL HISTORY PRACTICES IN THE DIGITAL HUMANITIES
Despite the evident general feeling that we experience an information deluge in our daily lives, whether ours is an 'information society' is a subject of great debate. The term implies that 'information' is the very defining aspect of today's society, rather than 'agriculture', for example (Bawden & Robinson, 2012); it also implies that at some point in the twentieth century a revolution took place, one that replaced a previous 'industrial society' with the current 'information society' as it fundamentally disrupted the technologies and cultural practices of human communication. Even though I am not convinced by the idea that we live in a 'new' kind of society, and rather prefer interpretations that identify all the continuities of modernist and capitalist developments through the last century, it is undeniable that in recent decades transformations in mediated communication have accelerated the production and dissemination of information enormously, increasing the complexity of the ways people interact (Borgman et al., 2008). The widespread use of the Internet and the World Wide Web through cheap, personal digital computing devices is largely responsible for these profound transformations; the term 'digital', originally applied as synonymous with discrete electronic processing techniques, came to refer to anything related to computers, from electronics to social descriptors (digital divides, digital natives) to emerging fields of inquiry (digital art, digital physics) (Peters, 2016). 'Digital scholarship' fits the latter category; according to Christine Borgman, it 'encompasses the tools, services, and infrastructure that support research in any and all fields of study' (2013). Clearly this is a quite broad definition, but it does express the essential idea that scholarly practices and research opportunities have been widened through many new forms of support. As I will argue here, a leading force defining digital scholarship has been the generalisation, in the digital milieu, of publishing as sharing.
'Sharing' as the new rhetoric of publishing
In the book Digital Keywords: A Vocabulary of Information Society & Culture, Nicholas John scrutinises the term 'sharing' in the meanings it has recently acquired through use in the digital realm.
Non-metaphorically, John explains, to share is to divide, and since at least the sixteenth century it has referred to the distribution of scarce resources; recently, though, it has also been attributed a more abstract communicative dimension: 'a category of speech, a type of talk, characterised by the qualities of openness and honesty, and commonly associated with the values and virtues of trust, reciprocity, equality, and intimacy, among others'; it has become 'the model for a digitally based readjustment of our interactions with things (sharing instead of owning) and with others' (John, 2016). Furthermore, 'sharing' would also mean a positive attitude with regard to future society; John talks in terms of the promise of sharing:
The promise of sharing is at least twofold. On the one hand, there is the promise of honest and open (computer-mediated) communication between individuals; the promise of knowledge of the self and of the other based on the verbalisation of our inner thoughts and feelings. On the other hand, there is the promise of improving what many hold to be an unjust state of affairs in the realms of both production and consumption; the promise of an end to alienation, exploitation, self-centred greed, and breathtaking wastefulness. (John, 2016)
Publishing after the digital boom, and specifically after the Internet and the World Wide Web took over a large share of our usual communication routines, has, I argue, a meaning that is becoming more and more intertwined with that of 'sharing' as discussed here. Digital publishing and 'sharing' are intertwined in that both follow a 'distributive logic' that is more sustainable than, and an alternative to, capitalist models of production and consumption (John, 2016); publishing has had its definition widened, along with its actors and subjects, and, just like 'sharing', it 'plays heavily on interpersonal relations, promising to introduce you to your neighbours, for instance, or to reinstate the sense of community that has been driven out by, say, the alienation supposedly typical of modern urban life' (John, 2016): it is now part of everybody's daily activities, and not just a specialised profession.
This new notion of 'publishing as sharing' accords with the new paradigm of openness in digital scholarship. Publishing processes had to be readapted, some of them radically, both to developments in digital technologies and to pervasive digital 'sharing'; when it comes to academic publishing and research practices, that means 'open scholarship', as in making your research data available in a repository for consultation and reuse; 'open access', as in making academic articles in digital journals, which would previously have been charged for, free to read; and 'open dissemination', as in the idea behind institutional websites like the Oxford University Research Archive (two screenshots below), a friendly, searchable repository of research outputs, including many open-access articles.
In this essay, I use the debates on Oral History in the Digital Humanities to support the presentation of some of the relationships between publishing and digital scholarship and their implications, as well as challenges and opportunities that should concern those involved in both publishing and library & information science.
NEW STANDARDS IN ORAL HISTORY: widening scholarship practices through digital publishing
The transformations in scholarship brought about by the universe of digital possibilities and the World Wide Web abound, but few fields have been impacted as much as oral history. In the introduction to Oral history in the digital humanities: voice, access, and engagement (Boyd & Larson, 2014), the authors provide an overview of the developments in oral history and highlight how they were heavily influenced by the changing recording technologies of the last decades; if affordable and accessible new analogue technologies helped establish oral history as a compelling methodology for historical research in the 1960s, the transcription of the audio recordings still posed a great challenge from the library/archival perspective: as text, transcripts were considered a more efficient form of communication than the recording, easier to go through when looking for specific bits of information; 'without the transcript, the archive might have no more information about an oral history interview on its shelves beyond a name, a date, and the association with a particular project', and oral history collections (of cassettes) were always under the threat of obscurity, with no prospect of use or discovery (Boyd & Larson, 2014). Digital technologies, however, came to solve not only these problems but, with the World Wide Web, also to give new and wider meanings to access; as the authors point out, 'Digital technologies posed numerous opportunities to explore new models for automating access and providing contextual frameworks to encourage more meaningful interactions with researchers as well as with community members represented by a particular oral history project'. In this essay, I present the main changes in publishing after the 'digital shift' (publishing = sharing) as we can identify them in oral history's new practices of research and dissemination:
1 • the 'democratic spirit'
Boyd & Larson talk about a 'democratic spirit' found in both oral history and the digital humanities as 'the sense that the materials created, shared, generated, or parsed belong to everyone—not just to the educated or the well-to-do, but to those outside the university walls as well as those within'. Indeed, oral historians are obviously interested in history from the 'bottom up', the kind that can be found and captured in common people's voices, and are thus characterised by adopting a more 'democratic' approach to historical inquiry, one that assumes collective participation in the creation of materials; in combination with the digital humanities, this inclusion of people in the creation process extends also to people's access to these materials (Boyd & Larson, 2014); oral history's 'democratic' values and preconditions are enhanced by, and find fertile ground in, digital publishing. As we can read in the Founding Statement of The Journal for MultiMedia History of the University at Albany, a website that used to publish oral history collections:
[it is] because so much of what we were doing as professional historians seemed so isolating that we wanted to 'get out on the Web', to reach not only academicians, but an entire universe of interested readers.
We wanted to bring serious historical scholarship and pedagogy under the scrutiny of amateurs and professionals alike, to utilise the promise of digital technologies to expand history's boundaries, merge its forms, and promote and legitimate innovations in teaching and research that we saw emerging all around us. (Zahavi & Zelizer, 1998)
I understand this 'democratic spirit', as Boyd & Larson put it, as a manifestation of one of the transitions in authorship in the digital realm, 'From Intellectual Property to the Gift Economy', suggested by Kathleen Fitzpatrick in her book Planned obsolescence: publishing, technology, and the future of the academy. If academics and publishers are to restore scholarly communication's origins and work towards genuinely open practices of producing and sharing academic content, she argues, then scholars must embrace Creative Commons licenses for their work, 'thus defining for themselves the extent to which they want future scholars to be able to reuse and remix their texts, thereby both protecting their right to be credited as the author of their texts and contributing to a vibrant intellectual commons that will genuinely "promote the Progress of Science and useful Arts"' (Fitzpatrick, 2011; citing the U.S. Constitution).
Oral history research output has always been a complicated type of material in terms of authorship, ownership, and rights; whole collections cannot be made accessible because of copyright issues, e.g. when the interviewer has died without leaving behind any documentation on the matter. But online, it is becoming more common to apply CC licenses to oral history interviews through the interviewees' consent forms; in the words of one oral historian, this approach 'clearly keeps the copyright in the hands of the oral history interview participant, but allows us to freely share the recording and transcript on our open-access public history website and library repository, where individuals and organisations may copy and circulate it, with credit to the original source' (Simpson, 2012). The 'democratic' solution seems to be already available to academics, but the challenge now is to promote the CC license as such; the academic and librarian Jane Secker seems to be on the right track when she frames 'copyright literacy' as closely related to information literacy and of concern to everyone who 'owns a device with access to the internet' (Secker, 2017).
2 • 'share your story': authorship, collaboration, crowdsourcing
Co-authorship in interviewing projects is nothing new, but collaborative work tends to become the norm when we consider oral history as related to, and part of, the digital humanities. If oral history has always been distinct from other practices in the humanities, in that it often carries a certain complexity with regard to authorship (who is the author of an interview: the interviewer, the interviewee, or both? Or neither?), this complexity has been successfully embraced in the digital realm. With crowdsourced websites like StoryCorps.org and AntiEvictionMappingProject.net (below), anyone is encouraged to 'share their story' and take part as an author of a larger narrative, composed of the collection of stories that assemble a shifting, growing whole.
Furthermore, as an oral history collection is published online and becomes a website, new roles arise which can arguably be said to correspond to that of an author: 'While there are always two (and sometimes more) participants in the initial recording of an oral history, I would argue that there are three primary players in the presentation and preservation of a digital oral history once it has been recorded—the oral historian, the collection manager, and the Information Technology (IT) specialist. These three roles may, in some programs, actually be represented by the same person, but there are specific concerns and responsibilities particular to each' (Schneider, in Boyd & Larson, 2014). In that sense, oral history is indeed in conformity with the basis of the digital humanities, understood in contrast to the essentially mono-authorial and monographic traditional processes and outputs of research in the humanities; as The DH Manifesto 2.0 states: 'Digital Humanities = Co-creation' (The Digital Humanities Manifesto 2.0, 2009; in Boyd & Larson, 2014).
This is not to say that the digital humanities have not been disruptive to previous practices in the humanities; on the contrary, it appears that the sciences have found continuity and enhancement of their procedures and methods in the digital realm, given that, as Gross & Harmon argue, in the sciences 'collaboration was already flourishing; the Internet greatly facilitated it, among not only networked scientists from around the globe but also armies of citizen-scientists participating through websites like GalaxyZoo' (Gross & Harmon, 2016). Knowledge in the humanities, in contrast, the authors argue, builds up as 'a chain of individual achievements. Even in the 21st century, collaboration in the humanities, though more common than previously, is not common at all. When it does occur, only two scholars are usually involved. There is a sense that these achievements ought to be individual.' The humanities seem to be lagging behind the sciences in embracing the web's possibilities, as we can see from some online journals: The Oral History Review by Oxford Academic, for example, presents no audio recording files or any other interactive feature, just the traditional authorial text article as a PDF. Institutional digital publishing in the humanities would greatly benefit from more 'digital' explorations of content and linking, but that obviously involves difficult changes in well-established mindsets and practices with regard to the notion of the strong individual author and the conventional, recognition-providing, text-based academic journal article.
3 • 'archive everything'
A habit that is being abandoned, thanks to the possibilities of digital archiving and storage, is that of discarding the audio recordings of oral history once they have been transcribed. Now, researchers are not only able to keep the audio recordings and their many versions and editions, but can also house and organise the interview collections using digital repositories and content management systems like CONTENTdm, and enhance access to the interviews with OHMS (Oral History Metadata Synchronizer), which connects search terms with the online audio or video (website screenshot below) (Boyd & Larson, 2014). Usability and discoverability issues are being sorted out by the 'archive everything' (Giannachi, 2016) trend that comes with publishing-as-sharing practices.
The 'archive everything' new paradigm is becoming such a norm in digital scholarship that Fitzpatrick talks about a 'database-driven scholarship', referring to new kinds of research questions made possible through the online availability of collections of digital objects (Fitzpatrick, 2014). Nyhan & Flinn also mention a 'rubric' in the present research agenda of the digital humanities as one that looks back at humanities questions long asked and attempts to ask them in new ways, and to identify new questions that could not be conceived of or explored before (Nyhan & Flinn, 2016); academic digital datasets, databases and archives are largely responsible for, and enablers of, these new opportunities. Gross & Harmon use a prize-winning monograph as an example of how current possibilities help 'historians see anew': Pohlandt-McCormick's research on the Soweto uprising uses 'photographs and official documents as an archive that can supplement, even interrogate the traditional historical archive. Her monograph contains 743 images and reproductions of some 200 written documents in all, a trove hard to imagine in a conventional book. These images and documents are reproduced in an "Archive" in her e-book, and select ones are integrated into the text and hyperlinked to supplementary information.' (Gross & Harmon, 2016).
Of course, database and archival academic websites are not just products of research; they are increasingly made available as opportunities for other researchers to develop new inquiries from them. That is one of the ideas behind making research data accessible as a requirement in journal publications; Gross & Harmon cite Science's stated policy as now typical: 'As a condition of publication, authors must agree to make available all data necessary to understand and assess the conclusions of the manuscript to any reader of Science'. With 'archive everything' practices and the emergence of digital collections of data and documents comes the increasing significance of the activity of curation, meaning 'making arguments through objects as well as words, images, and sounds' (Digital Humanities Manifesto 2.0, 2009). For Fitzpatrick, curation relates to another shift in authorship, which she identifies as 'from originality to remix':
CONCLUSION academic publishing should be about sharing Layers of London is a project being undertaken in the University of London’s Institute of Historical Research, funded by the Heritage Lottery Fund; It ‘will bring together, for the first time, digitised heritage assets provided by key partners across London including: the British Library, London Metropolitan Archives, Historic England, The National Archives, MOLA. These will be linked in an innovative new website which will allow you to create and interact with many different layers of London’s history from the Romans to the present day. The layers include historic maps, images of buildings, films as well as information about people who have lived and worked in London over the centuries.’ (screenshot below) (Layers of London, 2017). It is still being developed at this moment, but it is working hard on its dissemination, as ‘a major element of the project will be work with the public at borough level and city-wide, through crowd-sourcing, volunteer, schools and internship programmes. Everyone is invited to contribute material to the project by uploading materials relating to the history of any place in London. This may be an old photograph, a collection of transcribed letters, or the results of local research project’ (Layers of London, 2017). So, instead of an individual historical research on London mapping that would traditionally be published as textual product, Layers of London is an open, funded website being built in an academic institution as platform for voluntary contributions; it has a blog, a twitter account, and instead of an ‘author’, a team of director, development officer, administrator, and digital mapping advisor. It represents all shifts in authorship as proposed by Fitzpatrick: ‘from product to process’; ‘from individual to collaborative’; ‘from originality to remix’; ‘from intellectual property to the gift economy’; and ‘from text to… something more’ (Fitzpatrick, 2011); and just like contemporary oral history projects, its success will be ‘measured by metrics pertaining to accessibility, discovery, engagement, usability, reuse, and … impact on both community and scholarship.’ (Boyd & Larson, 2014). As an open digital humanities work that fully embraces the possibilities of the web, however, it faces all the challenges that this kind of academic digital publication today usually does, including the recognition that it might even count as academic research. Fitzpatrick points out: ‘The key, as usual, will be convincing ourselves that this mode of work counts as work—that in the age of the network, the editorial or curatorial labor of bringing together texts and ideas might be worth as much as, perhaps even more than that, production of new texts.’ (Fitzpatrick, 2011). This ‘convincing ourselves’ effort involves the difficult task of rethinking university practices and the academic career, which simply cannot afford to shy away from the disruptive impact of digital publishing as sharing. The humanities in special has been trying to work itself out with the digital humanities; according to Nyhan & Flinn, another ‘rubric’ of the DH ‘has a distinct activist mission in that it looks at structures, relationships and processes that are typical of the modern university (for example, publication practices, knowledge creation and divisions between certain categories of staff and faculty) and questions how they may be reformed, re-explored or re-conceptualised.’ (Nyhan & Flinn, 2016). 
It must be a concern and responsibility of the university to establish and guarantee academic publishing as sharing, addressing today’s unsustainable models of publishing and embracing the shifting, more open forms of scholarly communication and research; I agree with Fitzpatrick: ‘Publishing the work of its faculty must be reconceived as a central element of the university’s mission.’ (Fitzpatrick, 2011). Librarians have significant roles to perform on this mission; the web is not a library, but librarians can help ensure it is used in its full potential: as a world wide networked communication system. And can help to let publishing be about sharing. REFERENCES Antieviction Mapping Project: Documenting the dispossessions and resistance of SF Bay Area residents, (2014-2017). Home. [online] Available at: http://www.antievictionmap.com/#/we-are-here-stories-of- displacement-and-resistance/ [Accessed 02 May 2017]. Bawden, D. and Robinson, L. (2012). Introduction to Information Science. London: Facet. Borgman, C. (2013). Digital scholarship and digital libraries: past, present, and future. Keynote Presentation, 17th International Conference on Theory and Practice of Digital Libraries, Valletta, Malta. Available at: http:// works.bepress.com/borgman/273/ [Accessed 01 May 2017]. Borgman, C., Abelson, H., Dirks, L., Johnson, R., Koedinger, K., Linn, M., … Szalay, A. (2008). Fostering Learning in the Networked World: The Cyberlearning Opportunity and Challenge. National Science Foundation. Available at: https://www.nsf.gov/pubs/2008/nsf08204/nsf08204.pdf [Accessed 01 May 2017]. Boyd, D. and Larson, M. (2014) Introduction. In: Boyd. D. and Larson, M., eds., Oral history and digital humanities: voice, access, and engagement. New York: Palgrave Macmillan US. The Digital Humanities Manifesto. (2009). [online] Available at: http://manifesto.humanities.ucla.edu/ 2009/05/29/the-digital-humanities-manifesto-20/ [Accessed 04 May 2017]. Dougherty, J. and Simpson, C. (2012). Who owns oral history? a creative commons solution. In: Boyd, D., Cohen, Rakerd, S. and D. Rehberger, eds., Oral history in the digital age. Institute of Library and Museum Services. Available at: http://ohda.matrix.msu.edu/2012/06/a-creative-commons-solution/ [Accessed 02 May 2017]. http://www.antievictionmap.com/#/we-are-here-stories-of-displacement-and-resistance/ http://www.antievictionmap.com/#/we-are-here-stories-of-displacement-and-resistance/ http://works.bepress.com/borgman/273/ http://works.bepress.com/borgman/273/ https://www.nsf.gov/pubs/2008/nsf08204/nsf08204.pdf http://manifesto.humanities.ucla.edu/2009/05/29/the-digital-humanities-manifesto-20/ http://manifesto.humanities.ucla.edu/2009/05/29/the-digital-humanities-manifesto-20/ http://ohda.matrix.msu.edu/2012/06/a-creative-commons-solution/ Giannachi, G. (2016). Archive everything: mapping the everyday. Cambridge, Massachusetts: The MIT Press. Fitzpatrick, K. (2011). Planned obsolescence: publishing, technology, and the future of the academy. New York: New York University Press. Gross, A. and Harmon, J. (2016). The Internet revolution in the sciences and humanities. 1st ed. New York: Oxford University Press. John, N. (2016). Sharing. In: Peters, B., ed., Digital Keywords: A Vocabulary of Information Society & Culture. Princeton: Princeton University Press. The Journal for MultiMedia History, (2000, 2001). Current issue. [online] Available at: http://www.albany.edu/ jmmh/ [Accessed 01 May 2017]. Layers of London, (2017). Home. 
[online] Available at: https://layersoflondon.blogs.sas.ac.uk [Accessed 03 May 2017]. Nyhan, J. and Flinn, A. (2016). Computation and the humanities: towards an oral history of digital humanities. Springer Open. DOI 10.1007/978-3-319-20170-2 Oral History Metadata Syncronizer: enhance access for free, (2017). Home. [online] Available at: http:// www.oralhistoryonline.org [Accessed 01 May 2017]. Oxford University Research Archive, (2008). Home. [online] Available at: https://ora.ox.ac.uk [Accessed 03 May 2017]. Pohlandt-McCormick, H. (2002). ‘I saw a nightmare…’ Doing violence to memory: the Soweto uprising, June 16, 1976. [online] Columbia University Press and Gutenberg-e. Available at: http://www.gutenberg-e.org/pohlandt- mccormick/index.html [Accessed 03 May 2017]. Secker, J. (2017). Digital, information or copyright literacy for all? [Blog] Libraries, Information Literacy and E- learning: reflections from the digital age. Available at: https://janesecker.wordpress.com/2017/02/08/digital- information-or-copyright-literacy-for-all/ [Accessed 01 May 2017]. Schneider, W. (2014). Oral history in the age of digital possibilities. In: Boyd. D. and Larson, M., eds., Oral history and digital humanities: voice, access, and engagement. New York: Palgrave Macmillan US. StoryCorps. (2003). Stories. [online] Available at: https://storycorps.org/listen/ [Accessed 03 May 2017]. http://www.albany.edu/jmmh/ http://www.albany.edu/jmmh/ https://layersoflondon.blogs.sas.ac.uk http://www.oralhistoryonline.org http://www.oralhistoryonline.org https://ora.ox.ac.uk http://www.gutenberg-e.org/pohlandt-mccormick/index.html http://www.gutenberg-e.org/pohlandt-mccormick/index.html https://janesecker.wordpress.com/2017/02/08/digital-information-or-copyright-literacy-for-all/ https://janesecker.wordpress.com/2017/02/08/digital-information-or-copyright-literacy-for-all/ https://storycorps.org/listen/
work_afxycoctbrbcpkbkn7gub2qnpa ---- Raemy_Schneider_VKKS2019_quidproquo_AssigningPID_Art_Design 07.06.2019Raemy & Schneider 07.06.2019 ASSIGNING PERSISTENT IDENTIFIERS TO ART AND DESIGN ENTITIES Julien A. Raemy & René Schneider Fourth Swiss Congress for Art History, VKKS, Mendrisio Quid pro quo: linked data in art history research 07.06.2019Raemy & Schneider 1. Introduction to Persistent identifiers (PIDs) 2. Cool URIs and PIDs 3. The rationale and main results of the ICOPAD project 4. ICOPAD possible follow-up project: INCIPIT Agenda 1. INTRODUCTION TO PIDS 07.06.2019Raemy & Schneider Persistent identifiers (PID) A persistent identifier is a long-lasting and biunique reference to a digital resource. It usually has two parts: 1. A unique identifier (to ensure the provenance of a digital resource) 2. A location for the resource over time (to ensure that the identifier resolves to the correct location) https://www.slideshare.net/AustralianNationalDataService/fsci-persistent-identifiers https://www.slideshare.net/AustralianNationalDataService/fsci-persistent-identifiers 07.06.2019Raemy & Schneider In order to… https://www.interserver.net/tips/kb/404-error-fix/ - Create long lasting (not permanent) access - Avoid error messages https://www.interserver.net/tips/kb/404-error-fix/ 07.06.2019Raemy & Schneider PIDs are essential and indispensable to create fair data. F1 Principle: (meta)data are assigned a globally unique and eternally persistent identifier FAIRness http://www.dit.ie/dsrh/data/fairdata/ http://www.dit.ie/dsrh/data/fairdata/ 07.06.2019Raemy & Schneider § Publications § Data § Persons § Organisations § Citations and more: (antibodies, fictious characters, places, plants, e-books, …) PID ≠ PID 07.06.2019Raemy & Schneider « Persistence is not dependant on the identifier itself, but on legal, organisational and technical infrastructure ». (Hakala 2005) Persistence 07.06.2019Raemy & Schneiderhttp://andrew.treloar.net/research/diagrams/recording-to-archiving-architecture.jpg http://andrew.treloar.net/research/diagrams/recording-to-archiving-architecture.jpg 2. COOL URIS AND PIDS 07.06.2019Raemy & Schneider § Cool URIs don’t change: https://www.w3.org/Provider/Style/URI (Tim Berners-Lee, 1998) § Cool URIs for the Semantic Web: https://www.w3.org/TR/cooluris/ (W3C Interest Group Note, 2008) Cool URIs https://www.w3.org/Provider/Style/URI https://www.w3.org/TR/cooluris/ 07.06.2019Raemy & Schneider PIDs and cool URIs (Bazzanella, Bortoli, Bouquet 2013) Feature PIDs Cool URIs Resolver YES NO Authority YES NO Naming authorities YES NO Level of trust HIGH LOW Policies YES NO Persistence YES NO Actionability of IDs Partially YES Uniqueness YES NO Content change NO YES Content negotiation NO YES Cross linkage NO YES Effort for implementation HIGH LOW Costs for users Potentially HIGH LOW Sustainability issues MANY FEW Identified entities Mainly digital objects Everything Bridge metadata NO YES 07.06.2019Raemy & Schneider Motivation § SARI § ICOPAD PID LOD – cool URIs 07.06.2019Raemy & Schneider PIDs and LOD at the BnF 07.06.2019Raemy & Schneiderhttps://gallica.bnf.fr/ark:/12148/btv1b10542304w/f13.item https://gallica.bnf.fr/ark:/12148/btv1b10542304w/f13.item 3. 
3. THE RATIONALE AND MAIN RESULTS OF THE ICOPAD PROJECT
ICOPAD Project: § Identités de confiance pour les données de l'art et du design (ICOPAD) – June 2017 to December 2018 o Haute école de gestion de Genève (HEG-GE) – instigator and project manager o Zentralbibliothek Zürich (ZB) / Zurich Central Library o Zürcher Hochschule der Künste (ZHdK) / Zurich University of the Arts o Schweizerisches Institut für Kunstwissenschaft (SIK-ISEA) / Swiss Institute for Art Research o Goal: feasibility of a suitable PID model (prototype) o Requirements and workflow – linking research data and Linked Data by means of PIDs o Dedicated to the disciplines of art, design, and digital humanities, to derive conjectures o Transferability of the model to other disciplines (https://campus.hesge.ch/id_bilingue/projekte/icopad/index_fr.asp)
Swiss PID Landscape [figure; only recoverable label: ark]
ICOPAD use cases from our project partners:
Institution  Data set types/entities                                                      Needs
SIK-ISEA     Artists, artworks, dictionary entries                                        Diverse PIDs and links to normed data
ZB           Digital surrogates                                                           Fine level of granularity
ZHdK         Artists, artworks, events, films, glossary entries, projects, research data  Further development of applications such as eMuseum and Medienarchiv
Approaches (C(doi) denotes the content a DOI resolves to; a = ark, an Archival Resource Key provided by the California Digital Library):
DOI: C(doi) = x
DOI + 1: C(doi) = a
DOI + n: C(doi) = {x_1, x_2, …, x_n}
DOI + 1 + LD: C(doi) = a → owl:sameAs {x_1, x_2, …, x_n}
Archival Resource Key (ARK): § ARK identifiers are free § ARKs are built on a completely different theoretical model, consisting of a decentralised and domain-agnostic (i.e. DNS-agnostic) approach § ARKs make it easy to use LOD on top of them § ARKs can effortlessly be combined with other specifications such as the International Image Interoperability Framework (IIIF) canonical URI syntax
Solution approaches [figure: diagram of a Swiss PID Hub; recoverable labels: ark service to create PID, service request, ark request, DaSCH, Uni Bas, CDL existing ark service, multitude of PIDs, ark service to create own PID, attribution service if NOT DOI @ ETH | FORS AND if NOT data archived @ DaSCH]
Conclusion: o PIDs are a key element of the research data management process and should be assigned to entities as early as possible o Trusted identity, FAIR data o PIDs for the Semantic Web are possible o BnF platforms o A large variety of PIDs → DOIs are not sufficient o Most interesting complement: ARKs (LOD, free, decentral, granularity, etc.) o Need for an infrastructure/service in Switzerland o A national hub that can mint ARKs
4. ICOPAD POSSIBLE FOLLOW-UP PROJECT: INCIPIT
INCIPIT: § Infrastructure nationale d'un complément pour les identifiants pérennes, interopérables et traçables (INCIPIT) § Project submission (August 2019) § 3 phases: 1. Attribution service (by the end of 2019) – ArODES 2. Fusion of ArODES and SONAR (2020) 3. Creation of a Hub (2021). Partners welcome (see you at Bits and Bites)!
Julien A. Raemy, Research and Teaching Assistant in Information Science, julien.raemy@hesge.ch
René Schneider, Full Professor of Information Science, rene.schneider@hesge.ch
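To make the 'DOI + 1 + LD' approach from section 3 concrete, here is a small sketch that bridges a DOI to an ARK with owl:sameAs using the rdflib library. The two identifiers are real-looking examples taken from this deck, but the pairing itself is purely illustrative, not an assertion made by the project.

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

g = Graph()

# Hypothetical pairing: a DOI-identified resource declared equivalent to an ARK.
doi = URIRef("https://doi.org/10.2218/ijdc.v8i1.246")
ark = URIRef("https://gallica.bnf.fr/ark:/12148/btv1b10542304w")

# 'DOI + 1 + LD': one complementary identifier, bridged via owl:sameAs.
g.add((doi, OWL.sameAs, ark))

print(g.serialize(format="turtle"))
```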
Bibliography
• BAZZANELLA, Barbara, BORTOLI, Stefano and BOUQUET, Paolo, 2013. Can persistent identifiers be cool? International Journal of Digital Curation. 14 June 2013. Vol. 8, no. 1, p. 14–28. DOI 10.2218/ijdc.v8i1.246.
• BERMÈS, Emmanuelle, 2006. Des identifiants pérennes pour les ressources numériques : l'expérience de la BnF [online]. Paris, France: Bibliothèque nationale de France. [Accessed 20 May 2019]. Available from: https://web.archive.org/web/20181006042857/http://www.bnf.fr/documents/ark_presentation_bermes_2006.pdf
• ESPASANDIN, Kate, JAQUET, Aurélie, LEFORT, Lise and SCHNEIDER, René (dir.), 2018. TRMASID 14: Panorama et modélisation d'identifiants pérennes pour la création d'identités de confiance [online]. Genève, Suisse: Haute école de gestion de Genève. [Accessed 20 May 2019]. Available from: https://doc.rero.ch/record/309479
• EU. DIRECTORATE-GENERAL FOR RESEARCH AND INNOVATION, 2018. KI-06-18-206-EN-N: Turning FAIR into reality. Final Report and Action Plan on FAIR Data [online]. Brussels, Belgium. [Accessed 20 May 2019]. Available from: https://doi.org/10.2777/1524
• HILSE, Hans-Werner and KOTHE, Jochen, 2006. Implementing persistent identifiers: overview of concepts, guidelines and recommendations. London: CERL. ISBN 978-90-6984-508-1.
• LA TRIBUNE DES ARCHIVISTES, 2018. Choisir des URL persistantes pour la mise en ligne de sa base de données : ARK pas à pas... La Tribune des Archivistes [online]. 21 October 2018. [Accessed 20 May 2019]. Available from: http://latribunedesarchives.blogspot.com/2018/10/choisir-des-url-persistantes-pour-la.html
• MEADOWS, Alice, 2017. PIDapalooza – the open festival for persistent identifiers. Insights. 8 November 2017. Vol. 30, no. 3, p. 161–164. DOI 10.1629/uksg.393.
• NICHOLAS, Nick, WARD, Nigel and BLINCO, Kerry, 2009. A policy checklist for enabling persistence of identifiers. D-Lib Magazine [online]. January 2009. Vol. 15, no. 1/2. [Accessed 20 May 2019]. DOI 10.1045/january2009-nicholas. Available from: http://www.dlib.org/dlib/january09/nicholas/01nicholas.html
• PEYRARD, Sébastien, KUNZE, John A. and TRAMONI, Jean-Philippe, 2014. The ARK Identifier Scheme: Lessons Learnt at the BnF and Questions Yet Unanswered. International Conference on Dublin Core and Metadata Applications. 8 October 2014. P. 83–94.
• PRONGUÉ, Nicolas and RAEMY, Julien A., 2017. Revue de la littérature : identifiants pérennes (PID), Linked Data, Données de la recherche [online]. Carouge, Suisse: Haute école de gestion de Genève. [Accessed 20 May 2019]. Available from: https://campus.hesge.ch/id_bilingue/projekte/icopad/doc/Prongue_Raemy_Revue_Litterature_2017.pdf
• RAEMY, Julien A., 2018. Identifiants pérennes (PID) : Processus d'obtention, mapping et approches d'attribution, modélisation, glossaire [online]. Carouge, Suisse: Haute école de gestion de Genève. [Accessed 20 May 2019]. Available from: https://campus.hesge.ch/id_bilingue/projekte/icopad/doc/Raemy_PID_Processus_Approches_Modelisation_2018.pdf
• SCHNEIDER, René and RAEMY, Julien A., 2019a. Résultats du projet ICOPAD. ID Bilingue [online]. February 2019. [Accessed 20 May 2019]. Available from: https://campus.hesge.ch/id_bilingue/projekte/icopad/results_fr.html
• SCHNEIDER, René and RAEMY, Julien A., 2019b. Towards Trusted Identities for Swiss Researchers and their Data. 14th International Digital Curation Conference (IDCC) [online]. Melbourne, Australia. 6 February 2019. [Accessed 20 May 2019]. Available from: https://doi.org/10.5281/zenodo.2415995
• VAN DE SOMPEL, Herbert, KLEIN, Martin and JONES, Shawn M., 2016. Persistent URIs Must Be Used To Be Persistent. arXiv:1602.09102 [cs] [online]. 29 February 2016.
[Accessed 20 May 2019]. Available from: http://arxiv.org/abs/1602.09102
Ergonomic Assessment for DHM Simulations Facilitated by Sensor Data
Available online at www.sciencedirect.com (ScienceDirect). Procedia CIRP 41 (2016) 702–705. doi: 10.1016/j.procir.2015.12.098
48th CIRP Conference on MANUFACTURING SYSTEMS - CIRP CMS 2015
Dan Gläser, Lars Fritzsche, Sebastian Bauer, Vipin Jayan Sylaja (imk automotive GmbH, 09128 Chemnitz, Germany)
2212-8271 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of the scientific committee of the 48th CIRP Conference on MANUFACTURING SYSTEMS - CIRP CMS 2015.
Abstract
The digital factory with its innovative tools is gaining importance, not only in experimental but also in productive domains. One of these tools is the digital human model (DHM). In the field of production, the focus of using DHMs lies in the planning and evaluation of processes and products in terms of plausibility, productivity and ergonomics. Up to now, ergonomic assessments within DHM simulations have been mostly limited to static evaluations of reachability and postures. INTERACT is a running R&D project working on the main weak points of DHM software tools. The industry-driven requirements are mainly the reduction of input effort, the increase of movement quality, and a quick and intuitive way to create simulation variants in a workshop environment. The utilization of sensor data to create high-quality simulations is another point of development. Next to the addressed improvements in productivity and plausibility, these latest advancements also enable automatic ergonomic assessments, including process-oriented standards like EAWS, OCRA and the NIOSH lifting index. The inclusion of these standards will allow a more holistic ergonomic assessment and therewith expand the fields of application in the industrial environment. This paper gives an insight into the latest developments and the performance of current implementations of automatic ergonomic assessment within digital human models.
Keywords: Ergonomic assessment for DHM simulations facilitated by sensor data
1. Introduction
The interactive nature and the flexibility are the main advantages of digital simulations. Especially in the environment of process planning for manual work tasks, where the classic methods have used paper boxes as mock-ups and string to plan body postures and walking paths, the advantages of a virtual environment become clear. The creation of process variations within seconds, the exchange of objects in the workplace, or the shifting of tasks from one worker to another are just a few of many examples. In addition, software systems possess the ability to measure precisely when it comes to path lengths, times or joint angles. Thus, the full incorporation of ergonomic assessment methods into DHM software tools may improve evaluation efficiency, objectivity and validity. Nevertheless, the simulation of manual processes and the ergonomic assessment of these processes haven't been used widely in the past. The simulation of manual manufacturing processes has been very time-consuming work, since body postures and the
motions in between had to be defined at the level of individual limbs and joints. The massive time effort that was needed has hindered the digital human model as a technology from becoming the intuitive and interactive tool it could be. The INTERACT approach focuses explicitly on these weaknesses, to raise the digital human model to a higher level of intuitiveness and interactivity. This paper focuses mainly on the ergonomic assessment functions of the INTERACT software prototype. In the following, the three included assessment methods EAWS, NIOSH lifting index and OCRA are described, followed by the methodology and the implementation of the corresponding software modules.
2. Methodology
The automatic ergonomic assessment with the previously mentioned methods EAWS, NIOSH lifting index and OCRA requires a certain amount of information about the process: - Body postures - Handled loads - Forces applied to the body
These parameters have to be analyzed discretely, so that they can be assigned to one another at every point in the process. The body posture is retrieved by measuring joint angles and/or distances between joints, limbs and body landmarks, as required by the relevant ergonomic assessment method. The information on handled loads is retrieved from the geometry data, which include the mass of the geometry in use. Whether a load in the scene is being handled is derived from an 'attached'/'detached' flag for the right hand, the left hand or both hands. The forces are measured and interactively assigned to the process through sensor data. This can be done in advance of the simulation or interactively in the workshop environment. In addition, it is possible to assign forces manually to individual process steps. The three methods also allow 'extra points' to be defined for special ergonomic risks, such as throwback, sitting on hanging surfaces, walking on sticky floors, etc.
3. Ergonomic assessment modules
3.1. EAWS
The Ergonomic Assessment Work Sheet (EAWS) [1] is a widely used method in the German automotive industry. It is based on a holistic analysis of the work process, considering all executed work tasks in the context of a whole working day. EAWS is separated into 5 modules, which are assessed separately. One module is related to body postures, which are assessed as static (duration > 4 sec.) or dynamic (freq. > 2/min.). A posture is only assessed if no significant force (> 40 N) or load (> 3 kg) is applied to the worker during its occurrence; if a relevant force or load occurs, the related parts of the process are assessed with the corresponding modules. Another module addresses the extra points, which cannot be quantified (or at least not easily) within a 'standard' assessment. The last module is related to upper limb movements at high frequencies. This module results in an extra index, which is displayed separately. Due to its complex nature and its focus on body parts that are relatively difficult to observe, such as the wrist, this module isn't used widely.
Fig. 1. Skeleton of the INTERACT avatar
Fig. 2. Graphic representation of hand location
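The posture module's static/dynamic distinction described above lends itself to a simple rule: a posture event counts as static if held longer than 4 s, as dynamic if it recurs more than twice per minute, and it is routed to the force/load modules when a significant force or load is present. The following is only a schematic sketch of that screening logic, not the INTERACT implementation; the event structure is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class PostureEvent:
    name: str            # e.g. "trunk bent forward"
    duration_s: float    # how long the posture was held
    freq_per_min: float  # how often the posture recurs
    force_n: float       # external force applied during the posture
    load_kg: float       # handled load during the posture

def eaws_posture_screen(ev: PostureEvent) -> str:
    """Route a posture event to the EAWS module that should assess it."""
    if ev.force_n > 40 or ev.load_kg > 3:
        return "assess via force/load module"  # posture module not applicable
    if ev.duration_s > 4:
        return "static posture"
    if ev.freq_per_min > 2:
        return "dynamic posture"
    return "not assessed"

print(eaws_posture_screen(PostureEvent("trunk bent", 6.0, 0.5, 0.0, 0.0)))  # static posture
```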
3.3. OCRA

The OCRA system is a set of tools enabling different levels of risk assessment based on the desired specificity, variability, and objectives [3]. As mentioned above, it is part of ISO 11228. OCRA consists of three modules: the OCRA Mini-Checklist, the OCRA Checklist, and the OCRA Index. For an automatic assessment, the OCRA Index is the one that is used, because only the Index is developed to quantify the work-related exposure and risks on a detailed level. Like the NIOSH lifting index, the OCRA index is a quotient: the number of actual technical actions (ATA) divided by the number of recommended technical actions (RTA). The definition of technical actions is shown below (see Fig. 4). Both are calculated by a number of multipliers covering the number of repetitive tasks per shift, force exertion, posture, recovery, and the additional multiplier.

Fig. 4. Technical actions in OCRA

4. Results

All assessment tools have been analyzed with regard to the quantification and measurement of their input parameters. The current prototypes of the assessment tools contain only those parameters which are measurable within the INTERACT prototype's functionality. There is still a number of additional parameters which have to be put in manually, since they are not assigned to the process or the geometry yet. Some of these additional parameters are the coupling between hand and object during load handling, temperatures, or vibration. The workflow for the development and implementation of the tools has been the same for all three methods: method analysis and preparation, GUI draft, program flow chart, implementation, and validation through a test scenario.

4.1. EAWS

Apart from the extra points, EAWS has been transferred to a fully automated assessment tool. The body postures are assessed in every frame of the simulation. The loads are retrieved from the masses which are assigned to the handled geometry, while forces are assigned to tasks via sensor data in the workshop. The results are displayed through the INTERACT GUI (see Fig. 5). On the right, the overall score is displayed, with the distribution of points into the several assessment modules: posture, action forces, load handling, and extra points. The EAWS result is ranked in the three categories green (0-25 pts.), yellow (25-50 pts.), and red (>50 pts.), which indicate low risk, intermediate risk, or an urgent need for adaptation of the working conditions, respectively. In the left part of the GUI, several detailed representations of the individual modules (posture, forces, loads) can be displayed according to the requirements of the user.

Fig. 5. GUI of the EAWS module
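The traffic-light ranking is a simple threshold mapping. A minimal sketch follows; the function name is illustrative, and since the paper's bands overlap at 25 and 50 points, assigning the boundary values to the lower category is an assumption of this sketch:

```python
def eaws_risk_category(score: float) -> str:
    """Map an overall EAWS score to the traffic-light rating shown in the
    INTERACT GUI; thresholds are taken from the paper."""
    if score <= 25:
        return "green"   # low risk
    if score <= 50:
        return "yellow"  # intermediate risk
    return "red"         # urgent need for adaptation of the working conditions
```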
4.2. NIOSH lifting index

The NIOSH lifting index can be processed almost fully automatically, apart from the coupling multiplier between hands and objects. In the long term, this parameter can be assigned directly to the geometry as meta-information. With this further improvement, the NIOSH lifting index will be available as an automatic assessment tool. It has to be mentioned that the NIOSH lifting index shows several weaknesses as a holistic assessment tool, since it only assesses lifting and lowering tasks and comes with a number of restrictions. For example, a switch of hands, sitting down, tool handling, and other tasks are not allowed to be assessed.

4.3. OCRA

The OCRA method is suitable for an automatic assessment in principle, but there are several challenges coming with it. Not every technical action is irrevocably defined, which makes it difficult to determine them explicitly. For the identification of 'putting in/pulling out' it is necessary to be able to differentiate them from a simple 'moving'. For the technical action 'start-up' the software has to know whether a tool is manual or automatic and whether it requires the pressing of a start button or not. There are concepts for solving these problems, since most of the required information can be assigned either to objects or to processes in the future, but the current INTERACT prototype does not allow all of the required features to be implemented. Nevertheless, there is a tool ready for a semi-automatic OCRA assessment, which requires some manual input (see Fig. 6).

Fig. 6. GUI draft for the OCRA assessment tool

5. Conclusion and discussion

With the automatic assessment based on the three process-oriented ergonomic assessment tools EAWS, NIOSH lifting index, and OCRA, INTERACT makes a big contribution to promoting the work with digital human models for the ergonomic evaluation of processes in manufacturing. While all methods show the ability to be used automatically in a virtual environment, there are still problems to solve. Some parameters which are required by the methods are not part of the current virtual representations of products and processes. Properties like surface conditions, temperatures, or vibrations are not assigned to virtual objects yet. The INTERACT project strengthens the idea that the focused goals of higher efficiency, objectivity, and validity in ergonomic assessment can be achieved with digital human modelling in the near future.

References

[1] Schaub K, Caragnano G, Britzke B, Bruder R. The European Assembly Worksheet. Theoretical Issues in Ergonomics Science. 2012. DOI: 10.1080/1463922X.2012.678283
[2] ISO 11228-1:2007
[3] ISO 11228-3:2007
work_ajlebeewlbelvegajsgyamggoq ---- Toward Sustainable Growth: Lessons Learned Through the Victorian Women Writers Project

Borgo, Mary Elizabeth. 2017. "Toward Sustainable Growth: Lessons Learned Through the Victorian Women Writers Project." Digital Studies/Le champ numérique 7(1): 4, pp. 1–8, DOI: https://doi.org/10.16995/dscn.276

Mary Elizabeth Borgo, Department of English, Indiana University, US (meborgo@umail.iu.edu)

Abstract: This case study offers strategies for TEI-based projects with limited funding. By focusing on the needs of our volunteers, the Victorian Women Writers Project has developed truly collaborative relationships with the project's partners. Contributions to the project's resources have grown out of digital humanities survey courses, literature classes, and independent work. The paper concludes with a brief sketch of our efforts to support continued work by rethinking our social media outreach and our online presence.

Keywords: TEI encoding; feminist DH; sustainability

By age 20, Juliana Horatia Ewing had published her first children's story in the Monthly Packet, "A Bit of Green" (1895). It features a selfish child who learns Christian charity by visiting with his father's patients, and exhibits all of the hallmarks of a typical Victorian children's story. While this sentimental tale was unremarkable in its time, this short piece launched Ewing's extraordinary career. As the founder and editor of Aunt Judy's Magazine, Ewing became one of the most dynamic and influential children's authors of her time.
Her most enduring story, "The Brownies," even inspired a new division of the Girl Scouts. Ewing's work, among other rare and often out-of-print texts, has found a new audience through the Victorian Women Writers Project. Since its founding in 1995, the archive has supported feminist literary studies through its innovative approach to preserving nineteenth-century texts. By working alongside groundbreaking projects like Orlando and the Women Writers Project among many others, over 200 texts have been encoded according to TEI-P5 guidelines. We continue to add more texts, critical introductions, scholarly annotations, and biographies with each passing year. But if the authors in our archive are any indication of our future, the next 20 years will be even more spectacular. In order to ensure the project's sustained growth, the VWWP has been developing new types of partnerships. While there is no "one size fits all" solution to developing sustainable projects, the following case study offers a broad spectrum of approaches for encoding initiatives that rely heavily on the work of unpaid contributors. By assessing their needs, we have become better prepared to create and support mutually beneficial partnerships. This learning process has shed light on logistical difficulties inherent in collaborative encoding projects, ultimately inspiring a more student-centered approach.

Our first step toward sustainable growth was to identify potential contributors who had some familiarity with coding and with nineteenth-century texts. Since its inception, our project has been the result of close partnerships between faculty, students, and librarians at Indiana University. Perry Willett, then Head of Library Electronic Text Resource Service (LETRS), founded the project in 1995 after being approached by an undergraduate, Felix Jung, who requested additional resources to study Victorian poetry, a genre dominated by women. Through close collaboration with Donald Gray from the English department, the founders identified, encoded, and launched new digital editions of rare materials authored by women that had been largely overlooked in subscription-based services. After lying fallow for a few years, the project was revived in 2007 by Angela Courtney, IU's English Literature Librarian, and Michelle Dalmau, then Digital Projects Librarian. Their outreach efforts ultimately resulted in one of the first Digital Humanities courses taught at Indiana University in the fall of 2010. Co-teacher Joss Marsh, a Victorianist, and Adrianne Wadewitz, then a graduate student at IU, transformed the VWWP into a powerful pedagogical tool. Encoding texts for the project as part of course objectives gave students the opportunity to practice traditional editorial skills alongside emergent methodologies in the digital humanities (for more information about the project's founding and development, see Courtney et al. 2015). As a student in this course, I saw first-hand how digital preservation projects can lead to exponential professional growth, particularly at a graduate student level. Learning how to code through the VWWP gave me the advanced TEI skills needed for digital preservation projects. This experience laid the groundwork for building my own digital projects and contributing to others. By incorporating digital resource-building into my writing process, I have created publicly accessible versions of my dissertation research.
This aspect of my work has made me a more competitive candidate for travel funding and research grants. When I assumed the role of managing editor of the VWWP in the spring of 2011, I did not yet know how formative digital humanities would be for my own approach to nineteenth-century literature, but I was (and still am) passionate about helping undergraduate and graduate students professionalize through their work with the VWWP.

Since students have been a key facet of the project's growth, we then looked for resources which would help us to expand our partnerships with students at a graduate level. Our research included identifying relevant models for classroom engagement. Many successful projects deliberately target the classroom as the primary site of contributions. The Victorian Web and the Map of Early Modern London, for example, include entries written as part of daily class objectives. Graduate-level digital humanities courses taught at IU since the fall of 2010 include the VWWP, the Swinburne Project, and the Chymistry of Isaac Newton as part of a more general survey of DH projects. The courses taught in the fall of 2014 and 2015 used Scalar to preserve the classes' work. Yet, the first class was a bit of an outlier in its focus on editorship and on TEI. By nature, digital humanities survey courses have little room for extended TEI-encoding projects. Since most students enroll in these courses without prior knowledge of XML encoding and TEI guidelines, it is difficult to devote a significant portion of the class to technical training. Learning how to encode seemed to be the biggest logistical challenge for graduate volunteers. When coupled with the fact that most graduate students are also juggling teaching responsibilities and dissertations, devoting time to learning a coding language seems like a daunting task. Until there are institutional changes to dissertation criteria, it's difficult to convince graduate students to engage with digitization projects as an extension of their research because this kind of work is not needed to graduate. IU has taken steps toward changing this perception by modifying the language requirement of the Ph.D. to include code. Positioning TEI as a language prepares Ph.D. candidates like myself to engage with a broader range of critical work, much in the same way that one would grapple with criticism in German or French. As a language, TEI also shapes the way that an encoder interacts with the texts. In my own work, looking for place names has made me more attuned to the role of space in shaping narrative. Encoding creates an experience of close-reading a text that both prepares the text for digital publication and generates new interpretations of nineteenth-century material.
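To make concrete what "looking for place names" involves once a text is encoded, here is a hypothetical sketch (not the VWWP's actual tooling, and the file name is invented) of how the placeName elements produced during TEI encoding can later be queried for spatial analysis:

```python
from collections import Counter
from lxml import etree

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

def place_name_counts(path: str) -> Counter:
    """Count occurrences of <placeName> in a TEI P5 document, a first step
    toward studying the role of space in shaping a narrative."""
    tree = etree.parse(path)
    names = (el.xpath("string()").strip()
             for el in tree.iterfind(".//tei:placeName", namespaces=TEI_NS))
    return Counter(name for name in names if name)

# e.g. place_name_counts("ewing_brownies.xml").most_common(10)
```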
In order to better support work that combined editing with encoding, we had to cater encoding tasks to fit the requirements and time constraints of the classroom. This was a particularly daunting undertaking since many of the books in our current workflow span over 200 pages. With the help of teaching workshops offered through the Women Writers Project (Northeastern University) and the Digital Humanities Summer Institute (University of Victoria), we developed different strategies for sharing the work of encoding. In some cases, encoders complete only a portion of the text; while this is well-suited for short-term projects, it's challenging to maintain a level of continuity between each part (and among all of the texts in the repository). Since our encoders have found it easier to work with a whole text, we are gravitating toward adding shorter texts into our digitization workflow and toward dividing encoding tasks into phases. Having several encoders make multiple passes through a text increases chances for peer review and thus reduces the number of errors in the encoding.

As we worked on strategies to market encoding tasks to graduate students, we also considered expanding contributions to the project that did not require encoding. While this move does not help us expand our collection of TEI-encoded texts, it allows us to develop partnerships with undergraduate students and to increase our outreach efforts. Much to our delight, we were able to partner with Chris Hokanson at Judson College in the spring of 2012 in order to add supplemental scholarly material to the archive. As part of an undergraduate course on Victorian women's writing, Hokanson asked students to write brief scholarly biographies for authors in the collection. These submissions were then edited and encoded by the project's managers.

The greatest challenge that we face during the next phase of the project's development is not a logistical problem but an ethical one. Since the VWWP is, and will continue to be, an open-access resource, we lack the revenue generated by subscriptions. To further complicate matters, encoding a 300-page Victorian novel or writing a scholarly introduction to an obscure tract on suffrage requires a significant amount of time, energy, and expertise. We are morally obligated to compensate our contributors for their time, especially since their work requires advanced technical skills and knowledge of the subject material, but we are unable to financially reimburse the project's partners and thus must rely on the good-will of contributors. The citizen science model provides one way to address this issue. By simplifying tasks, projects like Science Gossip and Ancient Lives broaden the range of potential contributors. Because many hands make light work, labor-intensive projects like transcription can be accomplished in a fraction of the time. More importantly, these projects reward volunteer efforts by positioning contributors as shareholders in the final product. Clearly articulating the goals of the project gives citizen-scientists a better sense of how these small-scale tasks contribute to our understanding of history. Citizen-science projects have helped us to re-evaluate our classroom model. As Emily Murphy and Shannon Smith have argued, teacher-apprentice models lend student projects focused structure, but they risk reinforcing traditional hierarchies rather than giving students opportunities to join the DH community (2015). The VWWP encourages its students to become what Murphy and Smith describe as the
Performing both encoding and editorial tasks has allowed junior scholars to actively participate in conversations about encoding best-practices and archive-building. Though our most dynamic periods of growth have stemmed from close partnerships with faculty, the opportunities to teach TEI encoding through the VWWP ’s texts are too few at IU to sustain the project’s continued growth. In light of limited course offerings, we have explored options that extend beyond the classroom model. Contributors working independently of a class have allowed us to extend our pool of contributors beyond IU. These long-distance partners have revealed the need for more streamlined project guidelines and for continued support in the form of regular meetings to maintain momentum. For our particular project, contributors must find their work professionally and intellectually rewarding. Locating and digitizing texts which intersect with our contributor’s research interests attracts a broader spectrum of students. One of our most recent collaborators, Rachel Philbrick (Brown University), has been encoding Victorian classical scholarship as an extension of her dissertation research on ancient Greek literature. Since most of the graduate student encoders will be entering the job market soon, they are concerned that their contribution won’t “count” as a publication. We have been working to create a more robust editorial review in order to add weight to their work with the project. Furthermore, we are developing surveys to track how the website is being used so that we can build stronger partnerships with those actively using the collection. We’ve also discussed at length how we can preserve the ownership and self-direction integral to the “citizen-scholar” model in non-encoding based tasks, Borgo: Toward Sustainable Growth 7 particularly at the undergraduate level. These strategies stem from undergraduate student-driven research projects. By offering students the option to work with the VWWP as part of professional writing courses, we’ve been working with undergraduates from marketing, business, and events management to create outreach events and internships. Thanks to Rachel Sharp, Evan Garthus, and Katelyn Kass, we will be hosting the 21st birthday party for the VWWP in Spring 2017. Research performed by two other groups have shown that students are looking for social media marketing experience. In response to this need, we will be offering a social media internship where students tweet, develop blog posts, and design marketing campaigns. Increasing our social media presence will help us to reach potential collaborators and identify projects with similar thematic foci. By identifying our contributors’ needs and finding models for sustainable growth, the VWWP has been developing new methods to expand TEI-based projects with limited funding. Catering project tasks to fulfill the professional and pedagogical objectives of our contributors has created partnerships which benefit volunteers and the project. As we move forward, we will continue to explore ways to support collaboration through coursework, through independent efforts, and through our online presence. In the years to come, we hope to attract an even more diverse range of contributors in order to foreground underrepresented voices in Victorian studies and digital scholarship. Competing Interests Mary is the Managing Editor of the Victorian Women Writers Project. There are no other competing interests. 
References

Courtney, Angela, Arianne Hartsell-Grundy, et al. 2015. "Second Time Around; or, The Long Life of the Victorian Women Writers Project." In Digital Humanities in the Library: Challenges and Opportunities for Subject Specialists, 263–75. Chicago: ACRL.

Ewing, Juliana Horatia. 1895. "A Bit of Green." In Melchior's Dream, 118–33. London: Society for Promoting Christian Knowledge.

Murphy, Emily, and Shannon Smith. 2015. "'Productive Failure' for Undergraduates: How to Cultivate Undergraduate Belonging and Citizenry in the 'Digital Humanities.'" Digital Pedagogy Institute – Improving the Student Experience. University of Toronto Scarborough, August 20.
work_alynjplhunbqnmppz32wm5fojm ---- What Ever Happened to Project Bamboo?

Quinn Dombrowski, Research IT, UC Berkeley, Berkeley, CA 94720, USA

Literary and Linguistic Computing, Vol. 29, No. 3, 2014, pp. 326–339. doi:10.1093/llc/fqu026

Abstract

This paper charts the origins, trajectory, development, challenges, and conclusion of Project Bamboo, a humanities cyberinfrastructure initiative funded by the Andrew W. Mellon Foundation between 2008 and 2012. Bamboo aimed to enhance arts and humanities research through the development of infrastructure and support for shared technology services. Its planning phase brought together scholars, librarians, and IT staff from a wide range of institutions, in order to gain insight into the scholarly practices Bamboo would support, and to build a community of future developers and users for Bamboo's technical deliverables. From its inception, Bamboo struggled to define itself clearly and in a way that resonated with scholars, librarians, and IT staff alike. The early emphasis on a service-oriented architecture approach to supporting humanities research failed to connect with scholars, and the scope of Bamboo's ambitions expanded to include scholarly networking, sharing ideas and solutions, and demonstrating how digital tools and methodologies can be applied to research questions. Funding constraints for Bamboo's implementation phase led to the near-elimination of these community-oriented aspects of the project, but the lack of a shared vision that could supersede the individual interests of partner institutions resulted in a scope around which it was difficult to articulate a clear narrative. When Project Bamboo ended in 2012, it had failed to realize its most ambitious goals; this article explores the reasons for this, including technical approaches, communication difficulties, and challenges common to projects that bring together teams from different professional communities.

1 Introduction

Project Bamboo was a humanities cyberinfrastructure initiative funded by the Andrew W. Mellon Foundation between 2008 and 2012, in order to enhance arts and humanities research through the development of infrastructure and support for shared technology services. In 2008, the Mellon Foundation funded a joint proposal for UC Berkeley and the University of Chicago to conduct a planning process that would gather feedback from scholars, librarians, and IT staff from a wide range of institutions, and build a community of future developers and users for Bamboo's technical deliverables.
Where project staff anticipated 200 attendees representing 75 institutions, over 600 ultimately participated, representing more than 115 institutions.1

This article charts the origins, trajectory, development, challenges, and conclusion of Project Bamboo, from its initial funding through the months immediately following its conclusion. The article is an expansion of the author's presentation at Digital Humanities 2013, with the goal of providing background and context for further discussion within the digital humanities community about lessons that can be learned from this project.

Material for this article has been drawn from a number of sources, most prominently the public Bamboo wikis,2 supplemented by the author's own memory, that of colleagues, and email records.3 While this article largely deals with the facts of Project Bamboo, a layer of interpretation is inevitable, particularly as pertains to the factors contributing to the project's failure to realize its most ambitious goals. The conclusions drawn are the author's own, and neither a product of consensus among the participants nor an official statement on behalf of Project Bamboo, the University of Chicago, UC Berkeley, or the Mellon Foundation.

2 Origins

In the mid-2000s, discussions about cyberinfrastructure emerged in higher education IT circles, including EDUCAUSE and the Coalition for Networked Infrastructure. Future Bamboo project co-director Chad Kainz, then the senior director of Academic Technologies within the University of Chicago's central IT unit, saw a role for cyberinfrastructure, and what would come to be known as cloud computing, in addressing the following issues he had encountered while supporting digital humanities projects: (1) at least two-thirds of the time spent on typical humanities technology projects was spent on developing the technology rather than focusing on the scholarship, (2) many of the projects centered on either 'yet another database' or 'yet another website', and (3) the technologies that were ultimately created for the projects in question had been developed before, but for different contexts, thus 'reinventing the wheel' (Kainz, 2010).

At the 2006 EDUCAUSE Seminar on Academic Computing, Kainz discussed support for digital humanities with Chris Mackie, at that time an Associate Program Officer for the Research in Information Technology (RIT) program at the Mellon Foundation. For Mackie, the issues that Kainz identified also led to frustrations for funding agencies: foundation funds were being directed toward the development of software that would likely not be reused and the creation and presentation of data that could spread no further than a single Web site or database, rather than substantively furthering humanities scholarship. Mackie encouraged Kainz to partner with David Greenbaum, the UC Berkeley Director of Data Services and future Bamboo co-director, to initiate a Mellon-funded project that would address these issues. Based on feedback from Mackie, Kainz and Greenbaum revised an initial technology development proposal into a community-driven technology planning project.
3 Bamboo planning project proposal

The Bamboo Planning Project proposal identified five key communities whose participation was seen as crucial for the project's success: humanities researchers, computer science researchers, information scientists, librarians, and campus technologists.4 Anticipating—if understating—the root of many of the challenges that would arise in the workshops, the proposal noted that '[e]ach community has distinctive practices, lingo, assumptions, and concerns; and clearly there is much diversity within each community as well' (Project Bamboo, 2008, p. 6). The proposal drew extensively on information and examples shared by 50 representatives of these five communities at UC Berkeley who attended an all-day focus group at the Townsend Center for the Humanities in November 2007. Perspectives from University of Chicago faculty and staff also contributed to the view of the then-current landscape of digital humanities depicted in the proposal. While both UC Berkeley and the University of Chicago are leading research institutions with strong programs in the humanities and a number of longstanding digital humanities projects (e.g. ARTFL at the University of Chicago, and the Sino-Tibetan Etymological Dictionary and Thesaurus at UC Berkeley), these projects were more the exception than the norm, and faculty members at these institutions were not highly involved in the leadership of large digital humanities
The Bamboo planning proposal charted a direct path from the expression of scholarly practices6 within and across disciplines (in the first workshop) to systematizing those practices into defined schol- arly workflows that could be used ‘to derive com- monalities and unique requirements related to practices, functions, barriers, needs, and existing and potential transformations at the disciplinary level’, to developing ‘a community-endorsed tech- nology services roadmap for scholarship’, along with organizational, staffing, and partnership models to support those services. It anticipated that ‘arts and humanities scholars [would] begin to shape technology options by questioning impacts of potential technological choices, clarifying misin- terpreted goals and ultimately co-determining a roadmap of goals to pursue, tools to provide, platforms on which to run, and architecture to use’ (Project Bamboo, 2008, p. 24). SOA would play an increasingly prominent role as the work- shops progressed.7 Between the workshops, participants would pro- pose pilot projects that would be undertaken by Bamboo program staff. These pilot projects would ‘be based on industry-accepted practice and open standards for a services-oriented architecture’ and would ‘present . . . a tangible expression of how ser- vices can function . . . facilitate understanding and critique . . . our process, as well as clarify our seman- tics and goals’ (Project Bamboo, 2008, p. 28). According to the plan, by the end of the Bamboo Planning Project, the initial group of 200 partici- pants from 75 institutions would be narrowed down to 30 participants from the 15 institutions that would move ahead with implementing a robust, scalable web services framework and a set of services that aligned with scholarly practice in the humanities, as defined by participating scholars. In reality, this plan changed dramatically when faced with the interests and priorities of actual humanities scholars. 4 Bamboo planning workshops One of the hallmark traits of the Bamboo planning workshops was their flexibility—on more than one occasion, plans and agendas that had been painstak- ingly prepared over weeks were discarded and com- pletely rewritten after a frustrating morning session. This began with the first iteration of workshop 1 (held in Berkeley, 28–30 April 2008). After high- level presentations on Bamboo, its approach, and its methodology, participants were asked to name abstracted scholarly practices (as verbþdirect object), provide a description, identify applicable domains, cluster those practices, and then repeat the process for emerging scholarly practices, while scribes filled in an Excel spreadsheet template with different tabs for each exercise. Faculty participants were particularly turned off by the technical jargon in the presentations (including ‘services’, as com- monly understood by IT staff), and the program staff’s pushing for immediately abstracting Q. Dombrowski 328 Literary and Linguistic Computing, Vol. 29, No. 3, 2014 . `` '' service-oriented architecture ( ) - , Service-oriented architecture . ... .. … ... 28---30, `` '' verbþdirect object ‘scholarly practices’ instead of facilitating a conversation about what scholars do. The spreadsheet was emblematic of the disconnect between the plan for workshop 1 and what scholars believed was needed, as it was unable to capture the narrative of their discussions. 
By the second day of the workshop, the exercises took on a less rigidly structured form, and this informed the process used with greater success in the subsequent three iterations of workshop 1.8 At the time, the incident at the first workshop 1 was largely interpreted as a tactical misstep, rather than the beginnings of a challenge to the entire premise and planned approach of Project Bamboo. After the completion of the workshop 1 series (28 April–16 July 2008), work continued as defined in the proposal: program staff aggregated the notes taken during the workshop 1 meetings, and distilled from that material a set of ‘themes of scholarly prac- tice’9 to present at workshop 2 (15–18 October 2008). Program staff also prepared and presented an introduction to SOA in the context of Bamboo, intended to link the themes of scholarly practice to the planning for future technical development that would be the focus of subsequent workshops. This approach to workshop 2 backfired. While developing the themes of scholarly practice, pro- gram staff had created accounts for over 400 work- shop 1 participants on the project wiki, anticipating that they would actively contribute to the process of theme distillation. The minimal uptake (six con- tributors, each making a few edits) was interpreted as a consequence of humanists being unaccustomed to using a wiki for scholarly discussions, com- pounded by the unintuitive interface of the Confluence wiki platform. In person, however, it quickly became clear that what scholars found unin- tuitive was the program staff’s approach of present- ing their livelihood back to them as a set of ‘scholarly practices’. Already frustrated by the seem- ingly purposeless decontextualization and misrepre- sentation of scholarship in the humanities, many workshop 2 attendees were not disposed to attempt to make sense of the technical language and the ‘wedding cake’ diagram used to present the SOA component of the project. In heated Q&A sessions, some participants went so far as to challenge the legitimacy of a cyberinfrastructure initiative for the humanities led by IT staff rather than by humanists themselves. During workshop 2, it became clear that ‘com- munity design’ could not simply mean that the community would deliberate the details of a web services framework. The community had spoken and made it clear that continuing to emphasize SOA would alienate the very members of the com- munity Bamboo was intended to benefit most: the scholars themselves. While a web services frame- work would continue to play an important role in the project, it was represented in only one or two of the six working groups 10 established at workshop 2. The other groups focused on topics drawn from the themes of scholarly practice, with the exception of ‘Stories’ (later renamed ‘Scholarly Narratives’), a last-minute addition to address concerns about the decontextualization inherent in the process of iden- tifying themes of scholarly practice. Participants were allowed to choose the working group in which they would participate, but the program staff strove to balance group membership, so that IT staff were not the only participants in Shared Services, librarians were not the only participants in the Tools & Content Partners, etc. 
Professional homogeneity within working groups would have made the discussions easier, but mixing up the membership was seen as a productive step toward developing a single community that bridged professional divides, with a shared vision informed by a diverse range of perspectives.

After workshop 2, working groups focused on specific needs, opportunities, and challenges for Bamboo in relation to their working group topic. Working group findings were presented and discussed at workshop 3,11 held 12–14 January 2009, along with a straw proposal outline12 and straw consortial model.13 The straw proposal outline introduced the idea that the Bamboo Implementation Project would be a 7–10 year endeavor that would need to be split into two phases. The straw proposal outline did not attempt to prioritize the foci of the different working groups, treating them all as part of the first phase (2010–2012). The resulting highly ambitious scope drew criticism from workshop attendees, who also noted the lack of specifics about what exactly Bamboo would do, and the lack of defined criteria for success.14

At workshop 4 (16–18 April 2009), the Bamboo staff presented a more detailed articulation of a 'Bamboo Program Document',15 which outlined the 7–10 year vision and defined the activities to be carried out in the first development phase. The major activities for Bamboo were divided into three areas, with the first two major areas slated for implementation in the first phase:16

(1) The Forum
    (a) Scholarly Network
    (b) Scholarly Narratives
    (c) Recipes (workflows)
    (d) Tools and Content Guide
    (e) Other Educational and Curricular Materials
    (f) Bamboo Community Environment(s)
(2) The Cloud
    (a) Services Atlas
    (b) Bamboo Exchange
    (c) Shared Services Lifecycle
    (d) Tool and Application Alignment Partnerships
    (e) Content Interoperability Partnerships
(3) Bamboo Labs
    (a) Diversity, Innovation, and Labs
    (b) Ecosystem of Projects and Initiatives
    (c) Structure (Explore, Plan, and Build)
    (d) Liaisons
    (e) Governance

While the workshop discussion draft of the program document had already benefited from two rounds of asynchronous feedback from participants, concerns remained about the lack of specificity in each of these areas.17 However, this did not hinder participants from expressing their enthusiasm for the areas of work proposed for the first phase of development. Grouped by institution, participants voted on each sub-area of the 'Forum' and the 'Cloud', to indicate interest (none/low/medium/high/potential leadership).18 Every topic except Tools and Content Guide had at least one potential leader, and Content Interoperability (CI) Partnerships, Services Atlas, and Scholarly Network all received a significant number of 'high' votes.

Workshop 5 (17–19 June 2009) featured presentations of demonstrator projects19 and discussions of the draft Bamboo Implementation Proposal20 intended to be submitted to the Mellon Foundation that fall. The proposal, as discussed at the workshop, had the following major areas of work:21

(1) Scholarly Networking—comprising the earlier Scholarly Networking and Bamboo Exchange from the program document.
(2) Bamboo Atlas—comprising Scholarly Narratives, Recipes (workflow), Tool and Content Guide, Educational and Curricular Materials, and Services Atlas from the program document.22
(3) Bamboo Services Platform—the major area of technical development for the project, comprising Tool and Application Alignment Partnerships, CI Partnerships, and Shared Services Lifecycle from the program document.

At workshop 5, the participants (comprising 43% arts and humanities faculty, 41% technologists, and 12% 'content partners', primarily librarians and archivists) were asked to vote (yes/no/abstain) on these areas of work. Participants overwhelmingly voted yes on all three,23 while a handful of abstainers continued to voice strong concerns about scope,24 particularly with regards to the Bamboo Atlas.

5 Bamboo implementation proposal

During the summer and fall of 2009, the Bamboo program staff engaged in an iterative feedback process with Chris Mackie from the Mellon Foundation on the proposal that developed out of workshop 5. The program staff intended to submit the proposal to the Mellon Foundation by the end of 2009, for consideration at the Mellon Board meeting in March 2010, with work beginning shortly thereafter. Instead, an organizational restructuring at the Mellon Foundation in December 2009 brought Bamboo proposal development to a halt. In this restructuring, the Mellon Foundation merged the RIT program that funded Bamboo into the
Even as the project’s scope contracted through the elimination of almost all of the community-oriented aspects, it expanded in other ways. Two new areas of work that had previ- ously received minimal attention were ‘work spaces’—virtual research environments intended to provide basic content management capabilities and/ or access to the tools on the services platform—and planning and design work for Corpora Space, ‘applications that will allow scholars to work on dispersed digital corpora using a broad range of powerful research tools and services’ (Project Bamboo, 2010, p. 11). Corpora Space was to be built on top of the Bamboo infrastructure during a subsequent technical development phase. In the Bamboo implementation proposal, UC Berkeley alone served as managing partner, with nine other universities contributing to the project: Australian National University, Indiana University, Northwestern, Tufts, University of Chicago, Univer- sity of Illinois—Urbana-Champaign, University of Maryland, Oxford, and University of Wisconsin— Madison. The University of Chicago PI for the Bamboo Planning Proposal, vice president and CIO Greg Jackson, left that institution in August 2009, followed by Chad Kainz, Bamboo Planning Project co-director, a year later. None of the Chi- cago-based staff who were actively involved in the management of the planning process reprised those roles in the implementation phase. In addition, UC Berkeley hired a new project manager, and had to develop new relationships with staff at the Univer- sities of Wisconsin and Maryland who took on areas of the project that Chicago had previously managed. These staffing changes led to a loss of the project’s organizational memory, which had particularly negative consequences for the message and tone of the project’s communication with scholarly communities. 6 Bamboo technology project It remains difficult to articulate succinctly what Project Bamboo was, without either resorting to barely informative generalities (‘humanities cyberin- frastructure, particularly for working with textual corpora’) or a list of the areas of work. The project struggled to identify a coherent vision that neatly encapsulated all the work being done in the name of Bamboo, or to clearly describe what future state the work would collectively realize. The lack of a shared vision was compounded by the staffing model for the different areas. Most institutions focused on one area or subarea, giving them little exposure to the work going on elsewhere in the pro- ject. Unlike the planning project working groups, What Ever Happened to Project Bamboo? Literary and Linguistic Computing, Vol. 29, No. 3, 2014 331 . six s `` '' `` '' `` '' -- -- -- -- a year later . - where membership represented a mix of scholars, technologists, and librarians, the different areas of the Bamboo technology project were each staffed by the ‘usual suspects’—technologists focusing on shared services and work spaces, librarians focusing on interoperability, and scholars focusing on Corpora Space. This arrangement helped lead to a sense of mutual mistrust among the different groups26—not atypical in project development,27 but corrosive nonetheless. Effective communication with scholarly and pro- fessional communities was never one of Project Bamboo’s greatest strengths. Even during the plan- ning project, most activity took place on a public wiki whose complex organization was a barrier to access. 
The news feed on the project Web site had always been updated sporadically, but the complete lack of updates to the public Web site between August 2010 and April 2011—a period including the first 6 months of the 18 month technology project—fueled confusion and doubt about what, if anything, Bamboo was doing. Once periodic com- munication resumed in April 2011 with the launch of a new rebranded Web site, the lack of a clear shared vision became more apparent, as did the challenges of having such a widely distributed pro- ject team; some areas of the project received much more visibility than others. Outside observers’ com- bined uncertainty and lack of agreement about what Bamboo was doing were detrimental to the project’s reputation, to the point where it became a source of concern for the project staff and Mellon Foundation alike. Nonetheless, a considerable amount of technical development and planning work took place under the auspices of the Bamboo Implementation Project between 2010 and 2012. Major accomplishments included the following: � Development of identity and access management (IAM) services,28 which also made possible ac- count linking (e.g. of a user’s university and Google accounts). � Development of a CI hub29 that normalized texts using the Bamboo Book Model.30 � Development of utility and scholarly services,31 and their deployment along with IAM services on a centrally hosted Bamboo Services Platform.32 � Investigation of HUBzero, Alfresco ECM, and the OpenSocial API as platforms for ‘work spaces’ or research environments for scholars33 that could be integrated with the Bamboo Services Platform. � Partnering with the long-running Digital Research Tools (DiRT) wiki to develop Bamboo DiRT (http://dirt.projectbamboo.org), which would serve as Bamboo’s ‘Shared Tools and Services Information Registry’. � The Corpora Space design process, where huma- nities scholars and tool developers conceptua- lized a set of applications that would allow scholars to work on dispersed digital corpora using a broad range of powerful research tools and services.34 7 The end of Project Bamboo Between December 2011 and December 2012, the UC Berkeley Bamboo program staff drafted two nearly complete proposals for a second development phase. The first, written in partnership with teams at the University of Wisconsin and the University of Maryland, directly followed from the Corpora Space planning process. The proposal was abandoned in June 2012, after it became clear that insufficient re- sources would be available. When the Mellon Foundation’s technical review of Bamboo empha- sized Bamboo’s place as an infrastructure project (rather than an application development project), Berkeley started over on a new proposal in that spirit. The new version, developed with a team from Tufts, focused on extending the infrastructure and demonstrating its utility through a ‘Classical philology reference implementation’. On 13 December 2012, days before the anticipated final submission, the Mellon Foundation declined to move ahead with inviting the Bamboo proposal, citing the project’s track record of failing to define itself or achieve adoption for its code, the fact that it had not retained its partners, as well as dissatisfac- tion with the proposal itself. The Mellon Foundation requested that the team bring the pro- ject to a close, with an eye toward making the pro- ject’s legacy visible to and usable by others. Q. Dombrowski 332 Literary and Linguistic Computing, Vol. 29, No. 
3, 2014 `` '' , w six , , - project's , Content Interoperability ( ) , - `` '' DiRT ( ) http://dirt.projectbamboo.org `` '' - `` '' 13, s Between January and March 2013, the remaining Bamboo staff worked with partners to develop and publish a documentation wiki that would serve as a sort of ‘reliquary’ for the project, alongside the code repository, issue tracker, the archived Web site, email lists, and social media accounts. Respecting the Mellon Foundation’s preferences, the Bamboo staff never publicly announced that Bamboo was over. Word simply spread informally and un- evenly35 beyond the notification of project partners, until the day when the Web site was replaced by the reliquary. 8 Bamboo’s afterlife Some of the components of Bamboo are still in use in other contexts. 8.1 Perseids The Perseids project at the Perseus Digital Library (http://www.perseus.tufts.edu/hopper/) integrates a variety of open-source tools and services to provide a platform for collaborative editing and annotation of classical texts and related objects. An instance of the Bamboo Services Platform is deployed as part of Perseids to provide access to the Tufts Morphology and Annotation Services, and the supporting Cache and Notification Services developed at Berkeley. Under new funding from the Mellon Foundation, Perseids developers will be exploring approaches, including those offered by Bamboo IAM compo- nents, for enabling the platform to better support cross-project and cross-institution collaboration. In addition, the Perseus Digital Library is currently exploring the viability of the Bamboo IAM infra- structure to support a centralized user model for the Perseus ecosystem of distributed applications and services. 8.2 CIFER Designs and technologies for account linking (part of Bamboo’s IAM work) have become the acknowl- edged basis of several items on the development roadmap for Community Identity Framework for Education and Research (CIFER, http://www.cifer project.org/), a collaborative effort across a large number of research institutions and consortia to provide an ‘agile, comprehensive, federation- and cloud-ready IAM solution suite’. 8.3 DiRT directory In October 2013, the Mellon Foundation funded a proposal for additional work on Bamboo DiRT, which would be rebranded as the DiRT directory. This new project included the development of an API that will facilitate data sharing with other digital humanities directories and community sites, includ- ing DHCommons (http://dhcommons.org) and the Commons-In-A-Box (http://commonsinabox.org/) platform, which powers sites such as the MLA Commons (http://commons.mla.org/). The DiRT directory continues to thrive as a community- driven project. 9 Conclusion Project Bamboo began with the ambitious dream of advancing arts and humanities research through the development of shared technology services. Conscious of the challenges for humanities cyberin- frastructure identified in the 2006 Our Cultural Commonwealth report (Unsworth et al., 2006) (e.g. ephemerality, copyright, and conservative academic culture), the Bamboo program staff identified those issues as out-of-scope for Bamboo after workshop 1,36 but they continued to impact the project none- theless (e.g. copyright as the fundamental motivat- ing force behind IAM work). 
Prior work on social science infrastructure development suggests that Bamboo's mode of engagement (bringing together people from the scholarly, technology, and library communities after Bamboo already had a conceptual and technical trajectory, while nonetheless expecting 'participatory design') would be a source of tension. Indeed, the wide range of responses to the initial technology-oriented proposal put Bamboo in a bind. Technologists and some librarians tended to see it as important and necessary, while many scholars felt that their needs lay elsewhere entirely. Changing scholars' minds would not be quick; as noted in Ribes and Baker (2007), 'conceptual innovation is an extended process: one cannot simply make claims about the importance of . . . [e.g. cyberinfrastructure] and expect immediate meaningful community uptake'. Accommodating the interests of all three groups would necessarily mean a broader scope, but additional supporters could bring with them additional resources to make such a scope possible. It also seemed more promising than the alternative of creating a new group of like-minded technologists and librarians who would move forward with an SOA-focused development effort without focusing on scholarly outreach and adoption. In retrospect, doing so might have led the project to greater technical success, but it is arguable whether taking such an approach from the start was even a real option, given Bamboo's public commitment to a 'community design process'.

From the early planning workshops to the Mellon Foundation's rejection of the project's final proposal attempt, Bamboo was dogged by its reluctance and/or inability to define itself concretely. In the early days, avoiding a concrete definition was motivated by a desire for the project to remain flexible and responsive to its community. The tendency toward generality persisted long after it had ceased being adaptive, even after it became a source of criticism. An infrastructure project like Bamboo could be expected to name the tools and corpora it would integrate as a way to be more concrete, but it became apparent that very few of the tools in use by digital humanists at that time were being refactored to fit the model Bamboo was architected to support (i.e. scholarly web services running on professionally managed servers). If 'true infrastructures only begin to form when locally constructed, centrally controlled systems are linked into networks and internetworks governed by distributed control and coordination processes' (Edwards et al., 2007), the shortage of locally constructed systems with wide scholarly uptake that were technically compatible with Bamboo was problematic [37].

The work done in the Bamboo technology project was pitched as laying the infrastructure for top-to-bottom support for working with textual corpora. Bamboo would support a complete scholarly workflow, from accessing and ingesting texts from repositories, to analyzing and curating them using scholarly web services, all within an environment that facilitated collaboration. This vision was complicated by the decision to include integration with three different research environment systems, each with a distinct approach and feature set.
This choice was partly pragmatic (allowing partners to focus on whatever platform their institution had already invested in [38]) and partly in keeping with Bamboo's philosophy (the infrastructure was intended to be flexible, not tied to any one user-facing platform).

Flexibility and scalability were part of the early value proposition for Bamboo, and they remained influential considerations in the architecture and development of the infrastructure. However, the infrastructure was architected in such a way that it was difficult to complete and release stand-alone components that could be tested and used while other parts were incomplete. As a result, it was nearly impossible to create demonstrator projects that scholars or digital humanities developers could try out and that potential funders could evaluate. Demonstrator projects could have effectively and concretely shown that Bamboo was producing something useful, or provided an opportunity for feedback at a stage where it could have been incorporated productively. The technical team and the scholarly team had very different perspectives on what was needed, which led to frustration and communication failures on both sides. Consequently, the technical team relied on hypothetical scholarly use cases. Given the emphasis placed on the importance of communication between technical and nontechnical communities in the literature on cyberinfrastructure development (e.g. Freeman, 2007), addressing this communication breakdown should have been a higher priority. The extensive development time required for infrastructure components, without opportunities to confirm that the components successfully fulfilled real needs, might have proven even more problematic had Bamboo continued.

The resources allocated to Bamboo were significantly smaller than the amounts provided to similarly scoped infrastructure projects in the sciences. Bamboo's struggle to produce value within these constraints was made more challenging by a failure to differentiate between needs essential to the humanities and those unique to the humanities. It is crucial in the long run for scholars to be able to work with texts in access-restricted repositories, but the prerequisite IAM infrastructure represents a common need across all universities. Seeing that existing consortia dedicated to working on this problem would not have a solution ready in time for Bamboo to adopt, it might have been wiser for Bamboo to redefine its initial scope to include only free-access textual repositories, allowing it to demonstrate success by sidestepping the encumbrance of copyright identified by Our Cultural Commonwealth. While Bamboo's IAM work did make significant technical contributions, it came at the cost of diverting limited resources from other areas of the project, and became a 'reverse salient' (Edwards et al., 2007) for the entire Bamboo infrastructure.

Deferring a decision on Bamboo's sustainability plan and operational model until the second phase of development was consequential on multiple fronts. From a technical angle, it risked path dependency problems: the best technology choices for a centrally run enterprise-level platform may have made it considerably harder for individual universities to run the platform under a different model.
From the social perspective, postponing decisions about what 'membership' would mean, how much it would cost, and what it would provide made it difficult for institutions to assess whether they would be 'winners' or 'losers' (Edwards et al., 2007) if Bamboo succeeded. While Bamboo program staff saw Bamboo as freeing up local staff to provide more hands-on consulting about the application of scholarly tools (rather than spending time configuring and managing locally run tools and environments), some groups were concerned that university administration might see those staff as redundant in the face of Bamboo, and lay them off rather than transition them to new kinds of faculty support. Particularly for the liberal arts colleges that had participated in the planning project, there was no way to engage with Bamboo to increase one's chances of ending up a 'winner', other than joining an occasional invite-only 'community' conference call. Given the expansive scope of Bamboo's other deliverables, it was unrealistic for Bamboo program staff to have additionally taken on the work of establishing a sustainability plan during the first phase of technical development. Still, deferring or constraining the scope of some of the technical work (e.g. reducing the number of work space platforms) in order to redirect resources toward determining a viable operational and membership model before the second phase of development might have made more institutions willing to invest in Bamboo.

Perhaps the greatest impediment to Bamboo's success was the lack of a shared vision among project leaders, development teams, and communications staff. In the beginning, Bamboo had multi-university, cross-professional teams whose members faced challenges in communication and culture but helped one another understand Bamboo's goals in more nuanced ways. During the development phase, teams were formed on the basis of profession and institution, each one working according to its own status quo, with little connection to a bigger picture. The Bamboo planning project asked participants 'what's in it for you?', an important consideration often overlooked in consortial efforts. But without a shared vision to counterbalance the pull of self-interest, a complex, multi-faceted project like Bamboo becomes little more than a funding umbrella for individual initiatives. As the likelihood of those initiatives intersecting in a coherent way decreases, project messaging becomes muddled, and the resulting decrease in public confidence and comprehension can jeopardize a project's continued existence.

Brett Bobley, director and CIO of the Office of Digital Humanities at the National Endowment for the Humanities, offered his own interpretation of and eulogy for Bamboo at Digital Humanities 2013, which may serve as a fitting conclusion here. He suggested that, if nothing else, Bamboo brought together scholars, librarians, and technologists at a crucial moment for the emergence of digital humanities. The conversations that ensued may not have been what the Bamboo program staff expected, but they led to relationships, ideas, and plans that have blossomed in the years that followed (e.g. DiRT and the TAPAS project), even as Bamboo itself struggled to find a path forward.

References

Dombrowski, Q. and Denbo, S. (2013). TEI and Project Bamboo.
Journal of the Text Encoding Initiative, 5. http://jtei.revues.org/787 (accessed 12 November 2013).

Edwards, P., Jackson, S., Bowker, G., and Knobel, C. (2007). Understanding infrastructure: dynamics, tensions, and design. Report from 'History & Theory of Infrastructure: Lessons for New Scientific Cyberinfrastructures', Designing Cyberinfrastructure for Collaboration and Innovation. http://cyberinfrastructure.groups.si.umich.edu//UnderstandingInfrastructure_FinalReport25jan07.pdf (accessed 30 April 2014).

Freeman, P. (2007). Is 'designing' cyberinfrastructure - or, even, defining it - possible? Designing Cyberinfrastructure for Collaboration and Innovation. http://cyberinfrastructure.groups.si.umich.edu//OECD-Freeman-V2-2.pdf (accessed 30 April 2014).

Kainz, C. (2010). The engine that started Project Bamboo. Friday Sushi. http://fridaysushi.com/2010/01/30/the-engine-that-started-project-bamboo (accessed 12 November 2013).

Project Bamboo. (2008). Bamboo Planning Project: an arts and humanities community planning project to develop shared technology services for research. Grant proposal to the Andrew W. Mellon Foundation. http://dx.doi.org/10.7928/H6J10129 (accessed 12 November 2013).

Project Bamboo. (2010). Bamboo technology proposal (Public). Grant proposal to the Andrew W. Mellon Foundation. http://dx.doi.org/10.7928/H6D798B1 (accessed 12 November 2013).

Ribes, D. and Baker, K. (2007). Modes of Social Science Engagement in Community Infrastructure Design. In Steinfield, C., Pentland, B. T., Ackerman, M., and Contractor, N. (eds), Communities and Technologies 2007. London: Springer, pp. 107-30.

Terras, M. (2008). Bamboozle. Melissa Terras' Blog. http://melissaterras.blogspot.com/2008/05/bambooozle.html (accessed 12 November 2013).

Unsworth, J., Courant, P., Fraser, S. et al. (2006). Our Cultural Commonwealth: The Report of the American Council of Learned Societies Commission on Cyberinfrastructure for Humanities and Social Sciences. American Council of Learned Societies. http://www.acls.org/cyberinfrastructure/cyber.htm (accessed 30 April 2014).

Notes

1 Despite later impressions to the contrary, early participation in Bamboo was open to any interested college or university (http://web.archive.org/web/20080706131357/http://projectbamboo.org/colleges-universities), museum or library (http://web.archive.org/web/20080706131442/http://projectbamboo.org/museums-libraries), or organization, society, or agency (http://web.archive.org/web/20080706131346/http://projectbamboo.org/organizations-societies-agencies) that could pay for their own travel and lodging. The university- and library-oriented calls for participation mentioned the possibility of 'limited travel support' that could be arranged on a case-by-case basis; in practice, Bamboo covered lodging for participating teams during the nights of the workshops.

2 As of November 2013, archived versions of the Bamboo Planning Project wiki (http://dx.doi.org/10.7928/H6RN35SK) and Bamboo Technology Project wiki (http://dx.doi.org/10.7928/H6MW2F28) are hosted at UC Berkeley.

3 Project Bamboo was one of the first initiatives the author was involved in when employed by the Academic Technologies group of central IT at the University of Chicago, shortly after leaving a Ph.D. program in the humanities and while concurrently pursuing an MLIS degree.
The author was a member of Bamboo’s core program staff throughout the planning process; while she was minimally engaged in the early stages of Bamboo’s implementation phase, by 2011 she was involved in both development and planning, and in 2012 she again joined the program staff at UC Berkeley, where she is still employed. 4 Later prose would reduce this number to three by col- lapsing the distinction between information scientists and librarians and eliminating computer science re- searchers. The latter group was barely represented in the attendees of workshop 1, let alone subsequent workshops. 5 One representative example, from a 2008 blog post entitled ‘Bamboozle’ (which also exemplifies the unfor- tunate wordplay on the project’s name that persisted throughout its duration): . . .an interesting proposal to sort out What Needs To Be Done to aid scholars in using computa- tional power and tools in their research. But there is very little evidence that they have done their homework to what efforts have gone into this before, and no mention of the digital huma- nities community/communities (such as Alliance of Digital Humanities Organizations (ADHO); Q. Dombrowski 336 Literary and Linguistic Computing, Vol. 29, No. 3, 2014 , among others http://jtei.revues.org/787 http://fridaysushi.com/2010/01/30/the-engine-that-started-project-bamboo http://fridaysushi.com/2010/01/30/the-engine-that-started-project-bamboo http://cyberinfrastructure.groups.si.umich.edu//UnderstandingInfrastructure_FinalReport25jan07.pdf http://cyberinfrastructure.groups.si.umich.edu//UnderstandingInfrastructure_FinalReport25jan07.pdf http://cyberinfrastructure.groups.si.umich.edu//UnderstandingInfrastructure_FinalReport25jan07.pdf http://cyberinfrastructure.groups.si.umich.edu//OECD-Freeman-V2-2.pdf http://cyberinfrastructure.groups.si.umich.edu//OECD-Freeman-V2-2.pdf http://dx.doi.org/10.7928/H6J10129 http://dx.doi.org/10.7928/H6J10129 http://dx.doi.org/10.7928/H6D798B1 http://melissaterras.blogspot.com/2008/05/bambooozle.html http://melissaterras.blogspot.com/2008/05/bambooozle.html http://www.acls.org/cyberinfrastructure/cyber.htm http://www.acls.org/cyberinfrastructure/cyber.htm http://web.archive.org/web/20080706131357/http://projectbamboo.org/colleges-universities http://web.archive.org/web/20080706131357/http://projectbamboo.org/colleges-universities http://web.archive.org/web/20080706131357/http://projectbamboo.org/colleges-universities http://web.archive.org/web/20080706131442/http://projectbamboo.org/museums-libraries http://web.archive.org/web/20080706131442/http://projectbamboo.org/museums-libraries http://web.archive.org/web/20080706131442/http://projectbamboo.org/museums-libraries http://web.archive.org/web/20080706131346/http://projectbamboo.org/organizations-societies-agencies http://web.archive.org/web/20080706131346/http://projectbamboo.org/organizations-societies-agencies `` '' http://dx.doi.org/10.7928/H6RN35SK http://dx.doi.org/10.7928/H6RN35SK http://dx.doi.org/10.7928/H6MW2F28 `` '' they've Association for Literary and Linguistic Computing (ALLC); Association for Computers and the Humanities (ACH); Society for Digital Humanities/Société pour l’étude des médias inter- actifs (SDH/SEMI); Text Encoding Initiative (TEI)) and the hundreds of scholars already tread- ing this path or trying to deal with the concerns raised in the proposal (Terras, 2008). 
6 Scholarly practice as defined by Bamboo: 'For example, authoring might be considered a scholarly practice that is comprised of many component tasks; these tasks may include a literature review, documenting citations, acquiring peer review, etc.' (Project Bamboo, 2008, p. 27).

7 The stated goal of workshop 2 was to ratify the findings of a report on scholarly practice written on the basis of feedback from the first workshop, and to 'aggregate the initial list of component tasks required to complete these practices along with desired automation capabilities' (Project Bamboo, 2008, p. 29). As a requirement for attending the second workshop, each institution had to send 'at least one arts and humanities scholar and one enterprise-level technologist with, if possible, either serious interest in or experience with Services-Oriented Architecture (SOA)' (Project Bamboo, 2008, p. 28). In workshop 3, 'a professional SOA consultant will train participants to leverage our task lists by converting them to services. We will then attempt to describe scholarly practices as a sequence of identified service capabilities (in comparison, at the end of the previous workshop scholarly practices were described as a set of component tasks)' (Project Bamboo, 2008, p. 30). In workshop 4, participants would 'assign some type of initial grouping of scholarly practices, and prioritization as to the order in which services should be developed' (Project Bamboo, 2008, p. 31), and begin discussing organizational issues for a Bamboo consortium and requirements for being a partner institution in the next phase; these topics would also serve as the focus for the fifth and final workshop.

8 At workshops 1b (Chicago, 15-17 May), 1c (Paris, 9-10 June), and 1d (Princeton, 14-16 July), there were six exercises:

(1) Initial impressions: What do you hope Bamboo will accomplish? What questions do you have regarding Bamboo? We are gathering together representatives from a range of backgrounds (scholars, libraries, IT staff, presses, and funding agencies) around the theme of how technology can better serve arts and humanities research. Based on what you have heard at the table and read from the proposal, what one or two questions, observations, and hopes would your table like to share with the group?

(2) Exploring scholarly practice: As a researcher, librarian, IT professional, computer scientist, etc., during a really good day, term, research cycle, etc., what productive things do you do in relation to humanities research?

(3) Common and uncommon: What are common themes that have emerged from your exploration of scholarly practices? Based on your discussion of scholarly practices, what are two themes that piqued the curiosity of those at your table, or are uncommon? What makes these themes common and uncommon?

(4) Unpacking a commonality: What discrete practices are involved in this theme? What outstanding issues need to be addressed in regard to this theme?

(5) Unpacking the uncommon: For whom/which disciplines or areas of study is this theme helpful? What discrete practices are involved in this theme? What outstanding issues need to be addressed in regard to this theme?

(6) Identify future scholarly practices/magic wand: When you look at new hires or up-and-coming graduate students, what practices do they use that are different from yours? If you had a magic wand, what would make your day, term, research cycle, etc. more productive in relation to research?
9 See http://dx.doi.org/10.7928/H6H41PBV for a list of the themes that were identified.

10 Education (professional development of faculty and staff around digital tools and methodologies for teaching and research), Institutional Support (identifying service models and articulating the scope and value proposition of Bamboo), Scholarly Networking (evaluating existing social networking and Virtual Research Environment platforms for potential adoption by Bamboo), Shared Services (comprising much of the original SOA vision), and Tools & Content Partners (identifying models and standards for tool and content discovery and integration). See http://dx.doi.org/10.7928/H6CC0XM4 for more information about working groups, and links to the wiki pages of individual working groups.

11 The agenda and notes for workshop 3 are available at http://dx.doi.org/10.7928/H67P8W9K.

12 Slides from the implementation proposal presentation and notes on the discussion that followed are available at http://dx.doi.org/10.7928/H63X84K7.

13 Slides from the consortial model presentation and notes on the discussion that followed are available at http://dx.doi.org/10.7928/H6057CVT.

14 These criticisms emerged in the discussion of the proposal: 'Focused on value proposition; really needs to start saying what it is. Need to be more specific concrete things on the table. Lots of things involving text processing. For this to have clearly perceived value—need to start saying what those things are. Also some consensus that just from social perspective begins to be important to go back home after receiving funding to go to these things, "here's what we're going to do"' (Table 10); 'Finiteness of resources, and realities of what have to be accomplished. Have to tell stories about people who could put resources in. Need more finite sense of what is involved. A little concerned that we haven't had that focusing-in phase.' (Table 12); 'Need to iterate - if Bamboo is ambitious, will fail over and over. Will succeed only if there's a sustainability model that will allow for tweaking and redesigning' (Table 13). See http://dx.doi.org/10.7928/H63X84K7.

15 All released versions of the Bamboo Program Document are available at http://dx.doi.org/10.7928/H6VD6WCJ.

16 For full descriptions of each of these areas, see http://dx.doi.org/10.7928/H6QN64N6.

17 Notes are available on the discussions about the Forum (http://dx.doi.org/10.7928/H6KW5CXG), Cloud (http://dx.doi.org/10.7928/H6G44N6G), and Labs (http://dx.doi.org/10.7928/H6BG2KW2).

18 See http://dx.doi.org/10.7928/H66Q1V5R for full results and discussion notes.

19 Notes on these presentations are available at http://dx.doi.org/10.7928/H62Z13FD. A larger list of demonstrators is available in the Demonstrator Report: http://dx.doi.org/10.7928/H6Z60KZ1. Dombrowski and Denbo (2013) includes a discussion of some of the challenges that the 'NYX/Barlach bibliography' project encountered when attempting to demonstrate a service for processing TEI.

20 All versions of the draft implementation proposal are available at http://dx.doi.org/10.7928/H6TD9V75. Version 0.5 was discussed at workshop 5.
21 A more thorough description of the areas of work in version 0.5 of the draft Bamboo Implementation Proposal can be found at http://dx.doi.org/10.7928/H6PN93HT. There was originally a fourth area of work, 'Bamboo Community', a repackaging of 'Bamboo community environments' from the program document. Participants largely agreed that this should be treated not as an area of work but as a component of the larger section on community and governance. As a result, this section was not put up for a vote.

22 In response to feedback from workshop 5, the Scholarly Networking area of work was merged with the Bamboo Atlas, and this combined entity was renamed the 'Bamboo Commons'.

23 See http://dx.doi.org/10.7928/H6JW8BS3 for full results.

24 'Direction of Bamboo Atlas is fine, but I have big reservations about the scope, both as it was described in original document and fear discussions haven't narrowed scope at all'; '[W]hen you're reading texts or doing markup, when you find a place that doesn't make sense, it's a place of interest but also a place where if you slice/dice differently, problem goes away. Atlas is a confusing chunk—what's in it, what does it do, trying to tease it out, etc. Not clear exactly what the atlas does; pieces of it that one has associated with it are useful. Not trying to eliminate what it's doing. But might make it cleaner to take pieces of Atlas (esp. ones that have to do with Bamboo users) and move to scholarly networking, and rename the whole thing.' See http://dx.doi.org/10.7928/H6JW8BS3.

25 This was reported publicly in the Chronicle of Higher Education: http://chronicle.com/blogs/wiredcampus/in-potential-blow-to-open-source-software-mellon-foundation-closes-grant-program/19519. On 7 January, the following message was posted to the 'News' section of the Project Bamboo Web site: 'On 5 January 2010, the Chronicle of Higher Education published on its blog an article regarding recent changes at the Mellon Foundation and in particular, the closure of the Research in Information Technology (RIT) program. Although the planning project had been supported by RIT, the changes have had a minimal impact on Bamboo. At the end of December, both the University of California, Berkeley, and the University of Chicago were contacted by the Foundation, and Bamboo was smoothly migrated into the Scholarly Communications program. In short, the transition has gone well, and we look forward to working with Scholarly Communications into the future.' (http://web.archive.org/web/20101231171544/http://projectbamboo.org/news?page=2)

26 This frequently manifested itself in the concern that the scholars would be unable to design sufficiently scalable applications, and that the technologists
would spend inordinate amounts of resources on systems with minimal scholarly utility. These concerns were never raised through official channels, but had a real presence in informal conversations among members of each professional group.

27 This topic often arose over the course of the planning project workshops. Some examples: 'sees huge gulf between librarians/faculty and technologists; so here is an opportunity to communicate with each other' (Ex 1, 1b-B); 'hope bamboo moves beyond the usual conversation between humanities scholars and digital technology, i.e. "What do you want?", "What can you do?" Also troubled by formula of service, that digital technology folk and librarians are there just to "service" the humanities faculty; should be a partnership of equals, both have research goals they want to pursue' (Ex 1, 1b-D); 'Libraries, Publishing and Faculty are not talking. IT in the background. Efficiency and Effectiveness are not entirely a humanities priority.' (Ex 1, 1b-E); 'Humanities and IT people have different definitions of Effectiveness v Efficiency? Humanities has "productive inefficiency"' (Ex 1, 1b-E). See http://quinndombrowski.com/projects/project-bamboo/data/building-partnerships-between-it-professionals-and-humanists for more quotes from the planning project workshops that refer to this phenomenon.

28 For further information about Bamboo's IAM work, see http://dx.doi.org/10.7928/H6F769GD.

29 For more information about the architecture and implementation of the CI hub, see http://dx.doi.org/10.7928/H69G5JRP.

30 See http://dx.doi.org/10.7928/H65Q4T1C for a description of the Bamboo Book Model, including its implementation through a CMIS binding. The Bamboo Book Model is also discussed in Dombrowski and Denbo (2013).

31 See http://dx.doi.org/10.7928/H61Z4291 for a list of service APIs that were developed by Bamboo.
32 By proxying access through the Bamboo Services Platform, remotely running scholarly services could take advantage of IAM and utility services (e.g. result set caching and notification) hosted on the Platform. See http://dx.doi.org/10.7928/H6X63JTN for more about the architecture, development, and invocation of centrally hosted Bamboo services.

33 See http://dx.doi.org/10.7928/H6SF2T3B for details about the type and extent of integration accomplished for each platform.

34 See http://dx.doi.org/10.7928/H6NP22C0 for information about the design process.

35 During this transition period, the author received an email from a Bamboo planning project participant inquiring after upcoming opportunities for his liberal arts institution to become more involved. Even a few months after the Project Bamboo Web site was replaced, at Digital Humanities 2013, the author fielded multiple questions about the status of Bamboo.

36 An 'Advocacy' working group was discussed at workshop 2 (http://dx.doi.org/10.7928/H6RF5RZJ), but participants were concerned that it failed to make a clear distinction between the self-promotion necessary for Bamboo's adoption and advocacy with regard to larger issues facing digital humanities, such as those laid out in Our Cultural Commonwealth. Ultimately, a working group was not formed around this topic after workshop 2; the key issues for Bamboo in this area were reframed as 'principles for leadership', and explicitly put on hold (http://dx.doi.org/10.7928/H6MS3QNJ).

37 The Bamboo program staff members were aware that, in 2008, a good deal of scholarly functionality was only available as desktop software (e.g. Juxta) or as systems that required complex installation (e.g. PhiloLogic). They anticipated that software development in digital humanities would evolve toward a web services model, following trends in enterprise software development. Some tools have moved in this direction: Juxta released a web service in 2012 (http://www.juxtasoftware.org/on-the-juxta-beta-release-and-taking-collation-online/), and PhiloLogic 4 includes web services (http://dx.doi.org/10.7928/H6H12ZX4). However, as of 2014, scholarly tools are still not expected to be delivered as web services, and a great deal of work is done using stand-alone web applications such as Voyant Tools (http://voyant-tools.org/), or locally run packages such as MALLET (http://mallet.cs.umass.edu/).

38 The modest duration of these institutional commitments came into conflict with the longer development, deployment, and support timelines for a large cyberinfrastructure initiative. While the level of Bamboo infrastructure integration for HUBzero came closest to achieving the vision of the 'work space', by 2012 the University of Wisconsin-Madison was moving away from supporting HUBzero. Work was underway to port the integration code to Drupal (which had been selected as the 'work space' platform for the second phase of technical development) when Bamboo was shut down.
Killer Applications in Digital Humanities

Patrick Juola
Duquesne University
Pittsburgh, PA 15282
UNITED STATES OF AMERICA
juola@mathcs.duq.edu

August 31, 2006

Abstract

The emerging discipline of "digital humanities" has been plagued by a perceived neglect on the part of the broader humanities community. The community as a whole tends not to be aware of the tools developed by DH practitioners (as documented by the recent surveys by Siemens et al.), and tends not to take seriously many of the results of scholarship obtained by DH methods and tools. This paper argues for a focus on deliverable results in the form of useful solutions to common problems that humanities scholars share, instead of simply new representations. The question to address is what needs the humanities community has that can be dealt with using DH tools and techniques, or equivalently what incentive humanists have to take up and to use new methods. This can be treated in some respects like the computational quest for the "killer application": a need of the user group that can be filled, and by filling it, create an acceptance of that tool and the supporting methods/results. Some definitions and examples are provided both to illustrate the idea and to support why this is necessary. The apparent alternative is the status quo, where digital research tools are brilliantly developed, only to languish in neglect and disuse.

1 Introduction

"The emerging discipline of digital humanities". . . . Arguably, "digital humanities" has been emerging for decades, without ever having fully emerged. One of the flagship journals of the field, Computers and the Humanities, has published nearly forty volumes without having established the field as a mainstream subdiscipline. The implications of this are profound: tenure-track opportunities for DH specialists are rare, publications are not widely read or valued, and, perhaps most seriously in the long run, the advances made are not used by mainstream scholars.

This paper analyzes some of the patterns of neglect, the ways in which mainstream humanities scholarship fails to value and participate in the digital humanities community. It further suggests one way to increase the profile of this research: by focusing on the identification and development of "killer" applications (apps), computer applications that solve significant problems in the humanities in general.

2 Patterns of Neglect

2.1 Patterns of participation

A major indicator of the neglect of digital humanities as a humanities discipline is the lack of participation, particularly by influential or high-impact scholars. As an example, the flagship (or at least, longest running) journal in the field of "humanities computing" is Computers and the Humanities, which has been published since the 1960s. Despite this, the impact of this journal has been minimal. The Journal Citation Reports database suggests that for 2005, the impact factor of this journal (defined by the JCR help pages (http://jcrweb.com/www/help/hjcrgls2.htm, accessed June 15, 2006) as "the number of current citations to articles published in the two previous years divided by the total number of articles published in the two previous years") is a relatively low 0.196. (This is actually a substantial improvement from 2002's impact factor of 0.078.) In terms of averages from 2002-4, CHum was the 6494th most cited journal out of a sample of 8011, scoring in only the 20th percentile.
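Stated as a formula (the notation here is mine, not JCR's): let A_{y-1} and A_{y-2} be the sets of articles the journal published in the two previous years, and let c_y(S) be the number of year-y citations to the articles in a set S. The quoted definition is then

```latex
\mathrm{IF}_{y} = \frac{c_{y}\left(A_{y-1} \cup A_{y-2}\right)}{\lvert A_{y-1} \rvert + \lvert A_{y-2} \rvert}
```

so CHum's 2005 figure of 0.196 amounts to roughly one citation for every five articles published in 2003-4.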
By contrast, the most influential journal in the field of "computer applications," Bioinformatics, scores above 3.00; Computational Linguistics scores at 0.65; the Journal of Forensic Science at 0.75. Neither Literary and Linguistic Computing, Text Technology, nor the Journal of Quantitative Linguistics even made the sample. In other words, scholars tend not to read, or at least cite, work published under the heading of humanities computing.

Do they even participate? In six years of publication (1999-2004; volumes 33-38), CHum published 101 articles, with 205 different authorial affiliations (including duplicates) listed. Who are these authors, and do they represent high-profile and influential scholars? The unfortunate answer is that they do not appear to. Of the 205 affiliations, only 5 are from "Ivy League" universities, the single most prestigious and influential group of US universities. Similarly, of the 205 affiliations, only sixteen are from universities recognized by US News and World Report [USNews, 2006] as having one of the top 25 departments in any of the disciplines of English, history, or sociology. Only two affiliations are among the top ten in those disciplines. While it is of course unreasonable to expect any group of American universities to dominate a group of international scholars, the conspicuous and almost total absence of faculty and students from top-notch US schools is still important. Nor is this absence confined to US scholars; only one affiliation from the top 5 Canadian doctoral universities (according to the 2005 Maclean's ranking) appears. (Geoff Rockwell has pointed out that the Maclean's rankings do not necessarily identify the "best" research universities in Canada, and that a better list of elite research universities would be the so-called "Group of 10" or G-10 schools. Even with this list, only three papers (two from Alberta, one from McMaster) appear.) Australian elite universities (the Go8) are slightly better represented: three affiliations from Melbourne, one from Sydney. Only in Europe is there broad participation from recognized elite universities such as the LERU. The English-speaking LERU universities (UCL, Cambridge, Oxford, and Edinburgh) are all represented, as are the universities of Amsterdam, Leuven, Paris, and Utrecht, despite the language barrier. However, students and faculty from Harvard, Yale, Berkeley, Toronto, McGill, and Adelaide (in many cases, the current and future leaders of the fields) are conspicuously absent.

Perhaps the real heavyweights are simply publishing their DH work elsewhere, but are still a part of the community? A study of the 118 abstracts accepted to the 2005 ACH/ALLC conference (Victoria) shows that only 7 included affiliations from universities in the "top 10" of the USNews ranking. Only two came from universities in the "top 5" of the Maclean's ranking, and only 6 from Ivies. (Four of those six were from the well-established specialist DH program at Brown, a program unique among Ivies.)
A similar analysis shows low participation among the 151 abstracts at the 2006 DH conference (Paris). The current and future leaders seem not to participate in the community, either.

School                        Papers (2005)              Papers (2006)
USNews Top 10                 7                          4
Harvard                       -                          -
Cal-Berkeley                  1                          1
Yale                          -                          -
Princeton                     1                          -
Stanford                      1                          2
Cornell                       -                          -
Chicago                       -                          -
Columbia                      1                          -
Johns Hopkins                 -                          -
UCLA                          -                          -
Penn                          -                          -
Michigan-Ann Arbor            2                          -
Wisconsin-Madison             -                          -
UNC-Chapel Hill               1                          1
Maclean's top 5               2                          3
McGill                        -                          -
Toronto                       1 (3 authors)              1
Western                       -                          1
UBC                           1                          1
Queen's                       -                          -
Ivies not otherwise listed    4                          6
Brown                         4 (one paper, 2 authors)   6
Dartmouth                     -                          -

Table 1: Universities included for analysis of 2005 ACH/ALLC and 2006 DH proceedings

2.2 Tools and awareness

People who do not participate in a field cannot be expected to be aware of the developments it creates, an expectation sadly supported by recent survey data. In particular, [Siemens et al., 2004, Toms and O'Brien, 2006] reported on a survey of "the current needs of humanists" and announced that, while over 80% of survey respondents use e-text and over half use text analysis tools, they are not even aware of "commonly available tools such as TACT, WordCruncher and Concordancer." The tools of which they are aware seem to be primarily common Microsoft products such as Word and Access.

This lack of awareness is further supported by [Martin, 2005] (emphasis mine):

    Some scholars see interface as the primary concern; [electronic] resources are not designed to do the kind of search they want. Others see selection as a problem; the materials that databases choose to select are too narrow to be of use to scholars outside of that field or are too broad and produce too many results. Still others question the legitimacy of the source itself. How can an electronic copy be as good as seeing the original in a library? Other, more electronically oriented scholars see the great value of accessibility of these resources, but are unaware of the added potential for research and teaching. The most common concern, however, is that scholars believe they would use these resources if they knew they existed. Many are unaware that their library subscribes to resources or that universities are sponsoring this kind of research.

Similarly, [Warwick, 2004a] describes the issues involved with the Oxford University Humanities Computing Unit (HCU). Despite its status as an "internationally renowned centre of excellence in humanities computing,"

    [P]ersonal experience shows that it was extremely hard to convince traditional scholars in Oxford of the value of humanities computing research. This is partly because so few Oxford academics were involved in any of the work the HCU carried out, and had little knowledge of, or respect for, humanities computing research. Had there been a stronger lobby of interested academics who had a vested interest in keeping the centre going because they had projects associated with it, perhaps the HCU could have become a valued part of the humanities division. That it did not demonstrates the consequences of a lack of respect for digital scholarship amongst the mainstream.

3 Killer Apps and Great Problems

One possible reason for this apparent neglect is a mismatch of expectations between the expected needs of the audience (market) for the tools and the community's actual needs. A recent paper [Gibson, 2005] on the development of an electronic scholarly edition of Clotel may illustrate this. The edition itself is a technical masterpiece, offering, among other things, the ability to compare passages among the various editions and even to track word-by-word changes. However, it is not clear who among Clotel scholars will be interested in using this capacity or this edition; many scholars are happy with their print copies and the capacities print grants (such as scribbling in the margins or reading on a park bench). Furthermore, the nature of the Clotel edition does not lend itself well either to application to other areas or to further extension.
The knowledge gained in the process of annotating Clotel does not appear to generalize to the annotation of other works (certainly, no general consensus has emerged about "best practices" in the development of a digital edition, and the various proposals appear to be largely incompatible and even incomparable). The Clotel edition is essentially a service offered to the broader research community in the hope that it will be used, and runs a great risk of becoming simply yet another tool developed by the DH specialists to be ignored. Quoting further from [Martin, 2005]:

    [Some scholars] feel there is no incentive within the university system for scholars to use these kinds of new resources.

Let alone, one might add, to create them. This paper argues that for a certain class of resources, there should be no need for an incentive to get scholars to use them. Digital humanities specialists should be in a unique position both to identify the needs of mainstream humanities scholars and to suggest computational solutions that the mainstream scholars will be glad to accept.

3.1 Definition

The wider question to address, then, is what needs the humanities community has that can be dealt with using DH tools and techniques, or equivalently what incentive humanists have to take up and to use new methods. This can be treated in some respects like the computational quest for the "killer application": a need of the user group that can be filled, and by filling it, create an acceptance of that tool and the supporting methods/results. Digital humanities needs a "killer application."

"Killer application" is a term borrowed from the discipline of computer science. In its strictest form, it refers to an application program so useful that users are willing to buy the hardware it runs on just to have that program. One of the earliest examples of such an application was the spreadsheet, as typified by VisiCalc and Lotus 1-2-3. Having a spreadsheet made business decision-making so much easier (and more accurate and profitable) that businesses were willing to buy the computers (Apple IIs or IBM PCs, respectively) just to run spreadsheets. Gamers by the thousands have bought Xbox gaming consoles just to run Halo. A killer application is one that will make you buy, not just the product itself, but also invest in the necessary infrastructure to make the product useful.

For digital humanities, this term should be interpreted in a somewhat broader sense. Any intellectual product (a computer program, an abstract tool, a theory, an analytic framework) can and should be evaluated in terms of the "affordances" [Gibson, 2005, Ruecker and Devereux, 2004] it creates. In this framework, an "affordance" is simply "an opportunity for action" [Ruecker and Devereux, 2004]; spreadsheets, for instance, create opportunities to make business decisions quickly on the basis of incomplete or hypothesized data, while Halo creates the opportunity for playing a particular game. Ruecker provides a framework for comparing different tools in terms of their "affordance strength," essentially the value offered by the affordances of a specific tool. In this broader context, a "killer app" is any intellectual construct that creates sufficient affordance strength to justify the effort and cost of accepting, not just the construct itself, but the supporting intellectual infrastructure.
It is a solution sufficiently interesting to, by itself, retrospectively justify looking at the problem it solves: a Great Problem that can both empower and inspire.

Three properties appear to characterize such "killer apps". First, the problem itself must be real, in the sense that other humanists (or the public at large) should be interested in the fruits of its solution. For example, the organizers of a recent NSF summit on "Digital Tools for the Humanities" identified several examples of the kinds of major shifts introduced by information technology in various areas. In their words,

    When information technology was first applied [to inventory-based businesses], it was used to track merchandise automatically, rather than manually. At that time, the merchandise was stored in the same warehouses, shipped in the same way, depending upon the same relations among producers and retailers as before [...]. Today, a revolution has taken place. There is a whole new concept of just-in-time inventory delivery. Some companies have eliminated warehouses altogether, and the inventory can be found at any instant in the trucks, planes, trains, and ships delivering sufficient inventory to re-supply the consumer or vendor — just in time. The result of this is a new, tightly interdependent relationship between suppliers and consumers, greatly reduced capital investment in "idle" merchandise, and dramatically more responsive service to the final consumer.

A killer application in scholarship should be capable of effecting similar change in the way that practicing scholars do their work. Only if the problem is real can an application solving it be a killer. The Clotel edition described above appears to fail under this property precisely because only specialists in Clotel (or in 19th-century or African-American literature) are likely to be interested in the results; a specialist in the Canterbury Tales will not find her work materially affected.

Second, the problem must get buy-in from the humanities computing community itself, in that humanities computing specialists will be motivated to do the actual work. The easiest and probably cheapest way to do this is for the process of solution itself to be interesting to the participating scholars. For example, the compiling of a detailed and subcategorized bibliography of all references to a given body of work would be of immense interest to most scholars; rather than having to pore through dozens of issues of thousands of journals, they could simply look up their field of interest. (This is, in fact, very close to the service that Thomson Scientific provides with the Social Science Citation Index, or that Penn State provides with CiteSeer.) The problem is that though the product is valuable, the process of compiling it is dull, dreary, and unrewarding. There is little room for creativity, insight, and personal expression in such a bibliography. Most scholars would not be willing to devote substantial effort (perhaps several years of full-time work) to a project with such minimal reward. (By contrast, the development of a process to automatically create such a bibliography could be interesting and creative work.) The process of solving interesting problems will almost automatically generate papers and publications, draw others into the process of solving it, and create opportunities for discussion and debate.
We can again compare this to the publishing opportunities for a bibliography: is "my bibliography is now 50% complete" a publishable result?

Third, the problem itself must be such that even a partial solution or an incremental improvement will be useful and/or interesting. Any problem that meets the two criteria above is unlikely to submit to immediate solution (otherwise someone would probably already have solved it). Similarly, any such problem is likely to be sufficiently difficult that solving it fully would be a major undertaking, beyond the resources that any single individual or group could likely muster. On the other hand, being able to develop, deploy, and use a partial solution will help advance the field in many ways. The partial solution, by assumption, is itself useful. Beyond that, researchers and users have an incentive to develop and deploy improvements. Finally, the possibility of supporting and funding incremental improvements makes the project more likely to attract funding, and enhances the status of the field as a whole.

3.2 Some historical examples

To more fully understand this idea of a killer app, we should first consider the history of scholarly work, and imagine the life of a scholar c. 1950. He (probably) spends much of his life in the library, reading paper copies of journal articles and primary sources to which he (or his library) has access, taking detailed notes by hand on index cards, and laboriously writing drafts in longhand which he will revise before finally typing (or giving to a secretary to type). His new ideas are sent to conferences and journals, eventually to find their way into the libraries of other scholars worldwide over a period of months or years. Collaboration outside of his university is nearly unheard-of, in part because the process of exchanging documents is so difficult.

Compare that with the modern scholar, who can use a photocopier or scanner to copy documents of interest and write annotations directly on those copies. She can use a word processor (possibly on a portable computer) both to take research notes and to extend those notes into articles; she has no need to write complete drafts, can easily rearrange or incorporate large blocks of text, and can take advantage of the computer to handle "routine" tasks such as spelling correction, footnote numbering, bibliography formatting, and even pagination. She can directly incorporate the journal's formatting requirements into her work (so that the publisher can legitimately ask for "camera-ready" manuscripts as a final draft), eliminating or reducing the need both for typists and typesetters. She can access documents from the comfort of her own office or study via an electronic network, and use advanced search technology to find and study documents that her library does not itself hold. She can similarly distribute her own documents through that same network and make them available to be found by other researchers. Her entire work-cycle has been significantly changed (for the better, one hopes) by the availability of these computational resources.

We thus have several historical candidates for what we are calling "killer apps": xerographic reproduction and scanning, portable computing (both arguably hardware instead of software), word processing and desktop publishing (including subsystems such as bibliographic packages and spelling checkers), networked communication such as email and the Web, and search technology such as Google.
These have all clearly solved significant issues in the way humanities research is generally performed (i.e. met the first criterion). In Ruecker's terms, they have all created "affordances" of the sort that no modern scholar would choose to forego. The amount of research work (journals, papers, patents, presentations, and books) devoted to these topics suggests that researchers themselves are interested in solving the problems and improving the technologies, in many cases incrementally (e.g., "how can a search engine be tuned to find documents written in Thai?"). Of course, for many of these applications, the window of opportunity has closed, or at least narrowed. A group of academics is unlikely to have the resources to build and deploy a product competing with Microsoft and/or Google. On the other hand, the very fact that humanities scholars are something of a niche market may open the door to incremental killer apps based upon (or built as extensions to) mainstream software: applications focused specifically on the needs of practicing scholars. The next section presents a partial list of some candidates that may yield killer applications in the foreseeable future. Some of these candidates are taken from my own work, some from the writings of others.

3.3 Potential current killer apps

3.3.1 Back of the Book Index Generation

Almost every nonfiction book author has been faced with the problem of indexing. For many, this will be among the most tedious, most difficult, and least rewarding parts of writing the book. The alternative is to hire a professional indexer (perhaps a member of an organization such as the American Society of Indexers, www.asindexing.org) and pay a substantial fee, which simply shifts the uncomfortable burden to someone else, but does not substantially reduce it.

A good index provides much more than the mere ability to find information in a text. The Clive Pyne book indexing company (http://www.cpynebookindexing.com/what_makes_a_good_index.htm, accessed 5/31/2006) lists some aspects of what a good index provides. According to them, "a good index:

• provides immediate access to the important terms, concepts and names scattered throughout the book, quickly and efficiently;
• discriminates between useful information on a subject, and a passing mention;
• has headings which are concise, accurate and unambiguous reflecting the contents and terminology used in the text;
• has sufficient cross-references to connect related terms;
• anticipates how readers will search for information;
• reveals the inter-relationships of topics, concepts and names so that the reader need not read the whole index to find what they are looking for;
• provides terminology which might not be used in the text, but is the reference point that the reader will use for searching through the index;
• can make the difference between a book and a very good book"

A traditional back-of-the-book (BotB) index is a substantial intellectual accomplishment in its own right. In many ways, it is an encapsulated and stylized summary of the intellectual structure of the book itself. "A good index is an objective guide to the text, a link between the author's ideas and the reader. It should be a road map that leads readers to every relevant idea without frustrating detours and dead ends." (Kim Smith, http://www.smithindexing.com/whyprof.html, accessed 5/31/2006.) And it is specifically not just a concordance or a list of terms appearing in the document. It is thus surprising that a tedious task of such importance has not yet been computerized.
This is especially surprising given the effectiveness of search engines such as Google at "indexing" the unimaginably large volume of information on the Web. However, the tasks are subtly different; a Google search is not expected to show knowledge of the structure of the documents or the relationships among the search terms. As a simple example, a phrasal search on Google (May 31, 2006) for "a good index" found, as expected, several articles on back-of-the-book indexing. It also found several articles on financial indexing and index funds, and a scholarly paper on glycemic control as measured ("indexed") by plasma glucose concentrations. A good text index would be expected to identify these three subcategories, to group references appropriately, and to offer them to the reader proactively as three separate subheadings. A good text index is not simply a search engine on paper, but an intellectual précis of the structure of the text.

This is therefore an obvious candidate for a killer application. Every humanities scholar needs such a tool. Indeed, since chemistry texts need indexing as badly as history texts do, scholars outside of the humanities also need it. Unfortunately, not only does it not (yet) exist, but it isn't even clear at this writing what properties such a tool would have. Thus there is room for fundamental research into the attributes of indices as a genre of text, as well as into the fundamental processes of compiling and evaluating indices and their expression in terms of algorithms and computation. I have presented elsewhere [Juola, 2005, Lukon and Juola, 2006] a possible framework to build a tool for the automatic generation of such indices. Without going into technical detail, the framework identifies several important (and interesting) cognitive/intellectual tasks that can be independently solved in an incremental fashion. Furthermore, this entire problem clearly admits of an incremental solution, because a less-than-perfect index, while clearly improvable, is still better than no index at all, and any time saved by automating the more tedious parts of indexing will still be a net gain to the indexer. Thus all three components of the definition of killer app given above are present, suggesting that the development of such an indexing tool would be beneficial both inside and outside the digital humanities community.

3.3.2 Annotation tools

As discussed above, one barrier to the use of E-texts and digital editions is the current practices of scholars with regard to annotation. Even when documents are available electronically, many researchers (myself included) will often choose to print them and study them on paper. Paper permits one not only to mark text up and to make changes, but also to make free-form annotations in the margins, to attach PostIt notes in a rainbow of colors, and to share commentary with a group of colleagues. Annotation is a crucial step in recording a reader's encounter with a text, in developing an interpretation, and in sharing that interpretation with others.
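To make concrete what a digital annotation tool would have to record, here is a minimal sketch of an annotation structure. The class design and field names are illustrative assumptions of the present edition, not drawn from any of the projects discussed; they loosely echo the anchor-plus-body pattern later standardised in the W3C Web Annotation model.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Target:
    """Anchors an annotation to a span of a digital document."""
    document_uri: str            # the annotated resource (text, image, film clip...)
    start: Optional[int] = None  # character offset, frame, or timestamp start
    end: Optional[int] = None    # ...and end; None means the whole document

@dataclass
class Annotation:
    """One reader's free-form response to a target: a marginal note,
    a 'doodle', an attached media file, or a colour-coded highlight."""
    target: Target
    creator: str
    body_text: str = ""                             # textual commentary
    media: List[str] = field(default_factory=list)  # URIs of .wav, animation, etc.
    tags: List[str] = field(default_factory=list)
    shared_with: List[str] = field(default_factory=list)  # colleagues, groups

# A reviewer's comment anchored to characters 1040-1102 of an article,
# in place of the cumbersome "page 7, line 12" convention:
note = Annotation(
    target=Target("http://example.org/journal/article17.pdf", 1040, 1102),
    creator="reviewer-2",
    body_text="Claim needs a citation.",
    tags=["citation-needed"],
)
print(note.body_text, "on", note.target.document_uri)

The point of the sketch is that the anchor (target) and the response (body) are separable: the same anchoring mechanism can support text, sound, or video bodies, which is what would take such a tool beyond the capacities of print.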
The recent IATH Summit on Digital Tools for the Humanities [IATH Summit, 2006] identified this process of annotation and interpretation as a key process underlying humanistic scholarship, and specifically discussed the possible development of a tool for digital annotation, a "highlighter's tool," that would provide the same capacities of annotation of digital documents, including multimedia documents, that print provides. The flexibility of digital media means, in fact, that one should be able to go beyond the capacities of print — for example, instead of doodling a simple drawing in the margin of a paper, one might be able to "doodle" a Flash animation or a .wav sound file.

Discussants identified at least nine separate research projects and communities that would benefit from such a tool. Examples include "a scholar currently writing a book on Anglo-American relations, who is studying propaganda films produced by the US and UK governments and needs to compare these with text documents from on-line archives, coordinate different film clips, etc."; "an add-on tool for readers (or reviewers) of journal articles," especially of electronic journal systems (the current system of identifying comments by page and line number, for example, is cumbersome for both reviewers and authors); and "an endangered language documentation project that deals with language variation and language contact," where multilingual, multialphabet, and multimedia resources must be coordinated among a broad base of scholars. Such a tool has the potential to change the annotation process as much as the word processor has changed the writing and publication process.

Can community buy-in be achieved? There is certainly room for research and for incremental improvements, both in defining the standards and capacities of the annotations and in expanding those capacities to meet new requirements as they evolve. For example, early versions of such a project would probably not be capable of handling all forms of multimedia data; a research-quality prototype might simply handle PDF files and sound, but not video. It is not clear that the community support is available for building early, simple versions. Although "a straw poll showed that half of [the discussants] wanted to build this kind of tool, and all wanted to use it" [IATH Summit, 2006], responding to a straw poll is one thing and devoting time and resources is another altogether; it is not clear that any software development on this project has yet happened. However, given the long-term potential uses and research outcomes from this kind of project, it clearly has the potential to be a killer application.

3.3.3 Resource exploration

Another issue raised at the summit is that of resource discovery and exploration. The huge amount of information on the Web is, of course, a tremendous resource for all of scholarship, and companies such as Google (especially with new projects such as Google Images and Google Scholar) are excellent at finding and providing access. On the other hand, "such commercial tools are shaped and defined by the dictates of the commercial market, rather than the more complex needs of scholars" [IATH Summit, 2006]. This raises issues about access to more complex data, such as textual markup, metadata, and data hidden behind gateways and search interfaces. Even where such data is available, it is rarely compatible from one database to another, and it is hard to pose questions that take advantage of the markup.
In the words of the summit report:

What kinds of tools would foster the discovery and exploration of digital resources in the humanities? More specifically, how can we easily locate documents (in multiple formats and multiple media), find specific information and patterns in across [sic] large numbers of scholarly disciplines and social networks? These tasks are made more difficult by the current state of resources and tools in the humanities. For example, many materials are not freely available to be crawled through or discovered because they are in databases that are not indexed by conventional search engines or because they are behind subscription-based gates. In addition, the most commonly used interfaces for search and discovery are difficult to build upon. And, the current pattern of saving search results (e.g., bookmarks) and annotations (e.g., local databases such as EndNote) on local hard drives inhibits a shared scholarly infrastructure of exploration, discovery, and collaboration.

Again, this has the potential to effect significant change in the day-to-day working life of a scholar, by making collaborative exploration and discovery much more practical and rewarding, possibly changing the culture by creating a new "scholarly gift economy in which no one is a spectator and everyone can readily share the fruits of their discovery efforts." "Research in the sciences has long recognized team efforts. . . . A similar emphasis on collaborative research and writing has not yet made its way into the thinking of humanists." But, of course, what kind of discovery tools would be needed? What kind of search questions should be supported? How can existing resources such as lexicons and ontologies be incorporated into the framework? How can it take advantage of (instead of competing with) existing commercial search utilities? These questions illustrate many of the possible research avenues that could be explored in the development of such an application. Jockers' idea of "macro lit-o-nomics (macro-economics for literature)" [Jockers, 2005] is one approach that has been suggested for developing useful analysis from large datasets; Ruecker and Devereux [Ruecker and Devereux, 2004] and their "Just-in-Time" text analysis is another. In both projects, the researchers showed that interesting conclusions could be drawn by analyzing the large-scale results of automatically-discovered resources and looking at macro-scale patterns of language and thought.

3.3.4 Automatic essay grading

The image of a bleary-eyed teacher, bent over a collection of essays far past her bedtime, is a traditional one. Writing is a traditional and important part of the educational process, but most instructors find the grading of essays to be time-consuming, tedious, and unrewarding. This applies regardless of the subject; essays on Shakespeare are not significantly more fun to grade than essays on the history of colonialism. The essay grading problem is one reason that multiple choice tests are so popular in large classes. We thus have another potential "killer app," an application to handle the chore of grading essays without interfering with the educational process.

Several approaches to automatic essay grading have been tried, with reasonable but not overwhelming success. At a low enough level, essay grading can be done successfully just by looking at aspects of spelling, grammar, and punctuation, or at stylistic continuity [Page, 1994].
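A crude illustration of this low-level, surface-feature approach might look like the sketch below. The feature set and weights are invented placeholders for the purposes of this edition, not Page's actual model; the point is that such a scorer measures fluency proxies while remaining entirely blind to factual accuracy or argument.

import re

def surface_score(essay: str) -> float:
    """Score an essay on surface features only: length, average sentence
    length, and vocabulary variety. The weights are arbitrary illustrations."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    if not words or not sentences:
        return 0.0
    n_words = len(words)
    avg_sentence_len = n_words / len(sentences)
    type_token_ratio = len({w.lower() for w in words}) / n_words
    # Arbitrary weighted combination, capped to a 0-100 scale.
    score = 0.02 * n_words + 2.0 * avg_sentence_len + 50.0 * type_token_ratio
    return min(score, 100.0)

print(surface_score("The heart pumps blood. It has four chambers."))

An essay consisting of well-formed nonsense would score as highly as a correct one, which is exactly the limitation that semantic approaches such as the one described next attempt to address.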
Foltz [Foltz et al., 1999] has also shown good results by comparing semantic coherence (as measured, via Latent Semantic Analysis, from word co-occurrences) with that of essays of known quality:

LSA's performance produced reliabilities within the range of their comparable inter-rater reliabilities and within the generally accepted guidelines for minimum reliability coefficients. For example, in a set of 188 essays written on the functioning of the human heart, the average correlation between two graders was 0.83, while the correlation of LSA's scores with the graders was 0.80. . . . In a more recent study, the holistic method was used to grade two additional questions from the GMAT standardized test. The performance was compared against two trained ETS graders. For one question, a set of 695 opinion essays, the correlation between the two graders was 0.86, while LSA's correlation with the ETS grades was also 0.86. For the second question, a set of 668 analysis of argument essays, the correlation between the two graders was 0.87, while LSA's correlation to the ETS grades was 0.86. Thus, LSA was able to perform near the same reliability levels as the trained ETS graders.

Beyond simply reducing the workload of the teacher, this tool has many other uses. It can be used, for example, as a method of evaluating a teacher for consistency in grading, or for ensuring that several different graders for the same class use the same standards. More usefully, perhaps, it can be used as a teaching adjunct, by allowing students to submit rough drafts of their essays to the computer and re-write until they (and the computer) are satisfied. This will also encourage the introduction of writing into the curriculum in areas outside of traditional literature classes, and especially into areas where the faculty themselves may not be comfortable with the mechanics of teaching composition. Research into automatic essay grading is an active area among text categorization scholars and computer scientists for the reasons cited above [Valenti et al., 2003].

From a philosophical point of view, though, it is not clear that this approach to essay grading should be acceptable. A general-purpose essay grader can do a good job of evaluating syntax and spelling, and even (presumably) grade "semantic coherence" by counting whether an acceptable percentage of the words are close enough together in the abstract space of ideas. What such a grader cannot do is evaluate factual accuracy or provide discipline-specific information. Furthermore, the assumption that there is a single grade that can be assigned to an essay, irrespective of context and course focus, is questionable. Here is an area where a problem has already been identified, applications have been and continue to be developed, uptake by a larger community is more or less guaranteed, but the input of humanities specialists is crucially needed to improve the service quality provided.

4 Discussion

The list of problems in the preceding section is not meant to be either exclusive or exhaustive, but merely to illustrate the sort of problems for which killer apps can be designed and deployed. Similarly, the role for humanities specialists to play will vary from project to project – in some cases, humanists will need to play an advisory role to keep a juggernaut from going out of control (as might be needed with the automatic grading), while in others, they will need to create and nurture a software project from scratch.
The list, however, shares enough to illustrate both the underlying concept and its significance. In other words, we have an answer to the question "what?" — what do I mean by a "killer application," what does it mean for the field of digital humanities, and, as I hope I have argued, what can we do to address the perennial problem of neglect by the mainstream.

An equally important question, of course, is "how?" Fortunately, there appears to be a window opening, a window of increased attention and available research opportunities in the digital humanities. The IATH summit cited above [IATH Summit, 2006] is one example, but there are many others. Recent conferences such as the first Text Analysis Developers Alliance (TADA), in Hamilton (2005), the Digital Tools Summit for Linguistics in East Lansing (2006), the E-MELD Workshops (various locations, 2000–6), the Cyberinfrastructure for Humanities, Arts, and Social Sciences workshop at UCSD (2006), and the recent establishment of the Working Group on Community Resources for Authorship Attribution (New Brunswick, NJ; 2006) illustrate that digital scholarship is being taken more seriously. The appointment of Ray Siemens in 2004 as the Canada Research Chair in Humanities Computing is another important milestone, marking perhaps the first recognition by a national government of the significance of Humanities Computing as an acknowledged discipline.

Perhaps most important in the long run is the availability of funding to support DH initiatives. Many of the workshops and conferences described above were partially funded by competitively awarded research grants from national agencies such as the National Science Foundation. The Canadian Foundation for Innovation has been another major source of funding for DH initiatives. But perhaps the most significant development is the new (2006) Digital Humanities Initiative at the (United States) National Endowment for the Humanities. From the website[4]:

NEH has launched a new digital humanities initiative aimed at supporting projects that utilize or study the impact of digital technology. Digital technologies offer humanists new methods of conducting research, conceptualizing relationships, and presenting scholarship. NEH is interested in fostering the growth of digital humanities and lending support to a wide variety of projects, including those that deploy digital technologies and methods to enhance our understanding of a topic or issue; those that study the impact of digital technology on the humanities – exploring the ways in which it changes how we read, write, think, and learn; and those that digitize important materials, thereby increasing the public's ability to search and access humanities information.

[4] http://www.neh.gov/grants/digitalhumanities.html, accessed 6/18/2006.

The list of potentially supported projects is large:
• apply for a digital humanities fellowship (coming soon!)
• create digital humanities tools for analyzing and manipulating humanities data (Reference Materials Grants, Research and Development Grants)
• develop standards and best practices for digital humanities (Research and Development Grants)
• create, search, and maintain digital archives (Reference Materials Grants)
• create a digital or online version of a scholarly edition (Scholarly Editions Grants)
• work with a colleague on a digital humanities project (Collaborative Research Grants)
• enhance my institution's ability to use new technologies in research, education, preservation, and public programming in the humanities (Challenge Grant)
• study the history and impact of digital technology (Fellowships, Faculty Research Awards, Summer Stipends)
• develop digitized resources for teaching the humanities (Grants for Teaching and Learning Resources)

Most importantly, this represents an agency-wide initiative, and thus illustrates the changing relationship between the traditional humanities and digital scholarship at the very highest levels.

Of course, just as windows can open, they can close. To ensure continued access to this kind of support, the supported research needs to be successful. This paper has deliberately set the bar high for "success," arguing that digital products can and should result in substantial uptake and effect significant changes in, as NEH put it, "how we read, write, think, and learn." The possible problems discussed earlier are an attempt to show that we can effect such changes. But the most important question, of course, is "should we?"

"Why?" Why should scholars in the digital humanities try to develop this software and make these changes? The first obvious answer is simply one of self-interest as a discipline. Solving high-profile problems is one way of attracting the attention of mainstream scholars and thereby getting professional advancement. Warwick [Warwick, 2004b] illustrates this in her analysis of the citations of computational methods, and the impact of a single high-profile example. Of all articles studied, the only ones that cited computational methods did so in the context of Don Foster's controversial attribution of "A Funeral Elegy" to Shakespeare.

The Funeral Elegy controversy provides a case study of circumstances in which the use of computational techniques was noticed and adopted by mainstream scholars. The paper argues that a complex mixture of a canonical author (Shakespeare) and a star scholar (Foster) brought the issue to prominence. . . . The Funeral Elegy debate shows that if the right tools for textual analysis are available, and the need for, and use of, them is explained, some mainstream scholars may adopt them. Despite the current emphasis on historical and cultural criticism, scholars will surely return in time to detailed analysis of the literary text. Therefore researchers who use computational methods must publish their results in literary journals as well as those for humanities computing specialists. We must also realize that the culture of academic disciplines is relatively slow to change, and must engage with those who use traditional methods. Only when all these factors are understood and are working in concert, may computational analysis techniques truly be more widely adopted.

Implicit in this, of course, is the need for scholars to find results that are publishable in mainstream literary journals as well as to do the work resulting in publication, the two main criteria of killer apps.
On a less selfish note, the development of killer applications will improve the overall state of scholarship as a whole, without regard to disciplinary boundaries. While change for its own sake may not necessarily be good, solutions to genuine problems usually are. Creating the index to a large document is not fun — it requires days or weeks of painstaking, detailed labor that few enjoy. The inability to find or access needed resources is not a good thing. By eliminating artificial or unnecessary restrictions on scholarly activity, scholars are freed to do what they really want to do — to read, to write, to analyze, to produce knowledge, and to distribute it.

Furthermore, the development of such tools will in and of itself generate knowledge, knowledge that can be used not only to generate and enhance new tools but to help understand and interpret the humanities more generally. Software developers must be long-term partners with the scholars they serve, but digital scholars must also be long-term partners, not only with the software developers, but with the rest of the discipline and its emerging needs. In many cases, the digital scholars are uniquely placed to identify and to describe the emerging needs of the discipline as a whole. With a foot in two camps, the digital scholars will be able to speak to the developers about what is needed, and to the traditional scholars about what is available as well as what is under development.

5 Conclusion

Predicting the future is always difficult, and predicting the effects of a newly-opened window is even more so. But recent developments suggest that digital humanities, as a field, may be at the threshold of a new series of significant developments that can change the face of humanities scholarship and allow the "emerging discipline of humanities computing" finally to emerge. For the past forty years, humanities computing has more or less languished in the background of traditional scholarship. Scholars lack incentive to participate in (or even to learn about) the results of humanities computing. This paper argues that DH specialists are placed to create their own incentives by developing applications with sufficient scope to materially change the way humanities scholarship is done. I have suggested four possible examples of such applications, knowing well that many more are out there. I believe that by actively seeking out and solving such Great Problems – by developing such killer apps – scholarship in general and digital humanities in particular will be well served.

References

[Foltz et al., 1999] Foltz, P. W., Laham, D., and Landauer, T. K. (1999). Automated essay scoring: Applications to educational technology. In Proceedings of EdMedia '99.
[Gibson, 2005] Gibson, M. (2005). Clotel: An electronic scholarly edition. In Proceedings of ACH/ALLC 2005, Victoria, BC, Canada. University of Victoria.
[IATH Summit, 2006] IATH Summit (2006). Summit on digital tools for the humanities: Report on summit accomplishments.
[Jockers, 2005] Jockers, M. (2005). XML aware tools — catools. Presentation at Text Analysis Developers Alliance, McMaster University, Hamilton, ON.
[Juola, 2005] Juola, P. (2005). Towards an automatic index generation tool. In Proceedings of ACH/ALLC 2005, Victoria, BC, Canada. University of Victoria.
[Lukon and Juola, 2006] Lukon, S. and Juola, P. (2006). A context-sensitive computer-aided index generator. In Proceedings of DH 2006, Paris. Sorbonne.
[Martin, 2005] Martin, S. (2005). Reaching out: What do scholars want from electronic resources? In Proceedings of ACH/ALLC 2005, Victoria, BC, Canada. University of Victoria.
[Page, 1994] Page, E. B. (1994). Computer grading of student prose using modern concepts and software. Journal of Experimental Education, 62:127–142.
[Ruecker and Devereux, 2004] Ruecker, S. and Devereux, Z. (2004). Scraping Google and Blogstreet for Just-in-Time text analysis. Presented at CaSTA-04, The Face of Text, McMaster University, Hamilton, ON.
[Siemens et al., 2004] Siemens, R., Toms, E., Sinclair, S., Rockwell, G., and Siemens, L. (2004). The humanities scholar in the twenty-first century: How research is done and what support is needed. In Proceedings of ALLC/ACH 2004, Gothenburg. U. Gothenburg.
[Toms and O'Brien, 2006] Toms, E. G. and O'Brien, H. L. (2006). Understanding the information and communication technology needs of the e-humanist. Journal of Documentation, (accepted/forthcoming).
[USNews, 2006] USNews (2006). U.S. News and World Report: America's best graduate schools (social sciences and humanities).
[Valenti et al., 2003] Valenti, S., Neri, F., and Cucchiarelli, A. (2003). An overview of current research on automated essay grading. Journal of Information Technology Education, 2:319–330.
[Warwick, 2004a] Warwick, C. (2004a). No such thing as humanities computing? An analytical history of digital resource creation and computing in the humanities. In Proceedings of ALLC/ACH 2004, Gothenburg. U. Gothenburg.
[Warwick, 2004b] Warwick, C. (2004b). Whose funeral? A case study of computational methods and reasons for their use or neglect in English studies. Presented at CaSTA-04, The Face of Text, McMaster University, Hamilton, ON.
Pre-print of: Dunn, S., & Hedges, M. (2013). Crowd-sourcing as a Component of Humanities Research Infrastructures. International Journal of Humanities and Arts Computing, 7(1), 147-169. https://doi.org/10.3366/ijhac.2013.0086
Crowd-sourcing as a Component of Humanities Research Infrastructures

Stuart Dunn, Mark Hedges
Centre for e-Research, Department of Digital Humanities, King's College London, 26-29 Drury Lane, London, UK
mark.hedges@kcl.ac.uk, stuart.dunn@kcl.ac.uk

Abstract: Crowd-sourcing, the process of leveraging public participation in or contribution to a project or activity, is relatively new to academic research, but is becoming increasingly important as the Web transforms collaboration and communication and blurs the boundaries between the academic and non-academic worlds. At the same time, digital research methods are entering the mainstream of humanities research, and there are a number of initiatives addressing the conceptualisation and construction of research infrastructures for the humanities. This paper examines the place of crowd-sourcing activities within such initiatives, presenting a framework for describing and analysing academic humanities crowd-sourcing, and using this framework of 'primitives' as a basis for exploring potential relationships between crowd-sourcing and humanities research infrastructures.

Keywords: crowd-sourcing, research infrastructures, citizen science, scholarly primitives, typology.

Introduction

Crowd-sourcing,[1] the process of leveraging public participation in or contribution to a project or activity, is relatively new to academic research, and even more so to the humanities. However, at a time when the Web is transforming the way in which people collaborate and communicate, and is blurring boundaries between the spaces inhabited by the academic and non-academic worlds, it has never been more important to examine the role that public communities are beginning to play in academic humanities research. At the same time, digital research methods are starting to enter the mainstream of humanities research, and there are a number of initiatives addressing the conceptualisation and construction of research infrastructures that would support a shift from ad hoc projects and centres to an environment that is more integrated and sustainable.
Such an environment will inevitably be distributed, integrating knowledge, services and people in a loosely-coupled, collaborative 'digital social marketplace'.[2] The question naturally arises as to where crowd-sourcing activities fit within this framework. More specifically, what contributions can public participants, and the communities to which they belong, make to a humanities research infrastructure, and conversely how can these participants and communities, and the academic researchers who make use of the knowledge and effort that they contribute, benefit from such participation? To begin to address these questions is one of the aims of this paper.

The paper is organised as follows: we begin by describing the context in which the work was carried out, and the methodology used. We then review a number of existing terminologies and typologies for crowd-sourcing and related concepts, and follow this with an analysis of the main motivations for engaging with crowd-sourcing, from both the volunteer's and the academic's points of view. Finally, we build upon this by presenting the outline of a framework for describing and analysing academic humanities crowd-sourcing projects, and use this framework of 'primitives' as a basis for exploring the potential relationships between various forms of crowd-sourcing activity and humanities research infrastructures.

Background and Methodology

The research described in this paper was mostly carried out as part of the Crowd-sourcing Scoping Study project (Ref. AH/J01155X/1), which ran for nine months from February-November 2012, and was funded by the Arts and Humanities Research Council as part of its Connected Communities programme. The study's methodology had four main components:
• a literature review covering academic humanities research that has incorporated crowd-sourcing, research into crowd-sourcing as a method, and less formal outputs such as blogs and project websites.
• two workshops facilitating discussion between, respectively, humanities academics who have used crowd-sourcing, and contributors to crowd-sourcing projects;
• an online survey of contributors to crowd-sourcing projects, exploring their backgrounds, histories of participating in such projects, and motivations for doing so;
• interviews with academics and contributors.

The study does not claim to be comprehensive: there are bound to be important projects, publications, individuals and activities that have been omitted, and there is a strong UK and Anglophone focus on the activities studied. In particular, while the survey was widely publicised, it was self-selecting and makes no claim to being statistically representative; it functioned rather as a means of gathering qualitative information about contributors' backgrounds and motivations.

Crowd-sourcing and related concepts

The term crowd-sourcing was coined in a Wired article by Jeff Howe,[3] in which he draws a parallel between reducing labour costs by outsourcing to cheaper countries, and utilising 'the productive potential of millions of plugged-in enthusiasts'. In an academic context, the term has developed from an economic focus to an information focus, in which this productive potential is used to achieve research aims. However, the term is problematic and requires further analysis. It is first necessary to distinguish crowd-sourcing from some related concepts.
It is broader and less easy to define than 'citizen science', which is commonly understood to refer to activities whereby members of the public undertake well-defined and (individually) small-scale tasks as part of larger-scale scientific projects.[4] Another related concept is the 'Wisdom of Crowds',[5] which holds that large-scale collective decision-making can be superior to that of individuals, even experts. Although academic crowd-sourcing can be about decision-making, the decisions involved are rarely as neatly packageable as those implied in the world of business, where the 'good' or 'bad' nature of a decision can be evaluated on the basis of profitability.[6] Such collective decision-making also lacks the elements of collaboration around activities conceived and directed for a common purpose that characterise crowd-sourcing as commonly understood.

Another important distinction is that between crowd-sourcing and 'social engagement'.[7] According to Holley, social engagement involves 'giving the public the ability to communicate with us and each other', and is 'usually undertaken by individuals for themselves and their own purposes', whereas crowd-sourcing 'uses social engagement techniques to help a group of people achieve a shared, usually significant, and large goal by working collaboratively together as a group'. Holley also notes that crowd-sourcing is likely to involve more effort, and implies a level of commitment and participation that goes beyond casual interest, whereas social engagement is an extension of the kinds of online activities – Tweeting, commenting – that millions do on a daily basis anyway.

In one way, this aligns crowd-sourcing with 'citizen science'. Indeed, Wiggins and Crowston develop this theme by highlighting a distinction between citizen science and community science, and stating as a key ingredient of the former that it is not self-organising and 'does not represent peer production ... because the power structure of these projects is usually hierarchical'.[8] A fundamental aspect of citizen science is thus that the goal is defined by a particular person or group (almost always as part of a professional academic undertaking), and the participants (recruited through an open call) provide some significant effort towards achieving that goal. However, the different intellectual traditions of the sciences and the humanities embrace, and are embraced by, different kinds of non-academic community. Indeed, as Trevor Owens has noted, most successful crowd-sourcing activities in the humanities and cultural sectors are not really about crowds at all, in the sense of 'large anonymous masses of people', but are about 'participation from interested and engaged members of the public'.[9] While a crowd-sourcing project may have the capacity for involving large numbers of people, in many cases only a few contributors end up being actively engaged, and these contribute a large percentage of the work. While there may be a centralised recruitment process, at this level the body of contributors is self-organising and self-selecting.

A number of attempts have been made to identify the key characteristics, or to formulate a typology, of crowd-sourcing and related activities.
Estellés-Arolas and González-Ladrón-de-Guevara identify eight characteristics, distilled from 32 distinct definitions identified in the literature: the crowd; the task at hand; the recompense obtained; the crowdsourcer or initiator of the crowdsourcing activity; what is obtained by the crowdsourcing process; the type of process; the call to participate; and the medium.[10] This extremely processual definition is comprehensive in identifying stages that map easily to business processes. For the humanities, the 'type of process' is both more significant and more problematic, given the great diversity of processes in the creation of humanities research material.

A more task-oriented approach is taken by Wiggins and Crowston,[11] who construct a typology for 'citizen science' activities, identifying five areas of application: Action, Conservation, Investigation, Virtual, and Education. The factors that lead to an activity being assigned to a category are multivariate, and the identification of the categories was based on whether there is an occurrence in a category or not, rather than frequency of those occurrences. The coverage is therefore extremely broad; 'Action', for example, covers self-organising citizen groups that use web technologies to achieve a common purpose, often to do with campaigns on local issues. Moreover, the use of the word 'science' (at least in the usual Anglophone sense) confines the activities reviewed (in terms of both the methods and the content) to a particular epistemic bracket, which inevitably excludes some aspects of humanities research.

One widely-quoted set of definitions for citizen science projects was presented by Bonney et al.[12] This divided the field into three broad categories: contributory projects, in which members of the public, via an open call, contribute along lines that are tightly defined and directed by scientists; collaborative projects, which have a central design but to which members of the public contribute data, and may also help to refine project design, analyze data, or disseminate findings; and co-created projects, which are designed by scientists and members of the public working together and for which at least some of the public participants are actively involved in the scientific process. This approach shares important characteristics with the 'task type' described below, in that it is rooted in the complexity of the task, and the amount of initiative and independent analysis required to make a contribution.

The Galleries, Libraries, Archives and Museums (hereafter GLAM) sectors have in particular seen efforts to develop crowd-sourcing typologies. One such typology has been proposed by Mia Ridge in a blog post,[13] and includes the following categories: Tagging, Debunking (i.e. correcting/reviewing content), Recording a personal story, Linking, Stating preferences, Categorizing, and Creative responses. Again, these categories imply a processual approach, concerning the type of task being carried out, and are potentially extensible across different types of online and physical content and collections. Another typology from the GLAM domain was developed by Oomen and Aroyo.[14]
Their categories include Correction and Transcription, defined as inviting users to correct and/or transcribe outputs of digitisation processes (a category that Ridge's 'Debunking' partially, but not entirely, covers); Contextualisation, or adding contextual knowledge to objects, by constructing narratives or creating User Generated Content (UGC) with contextual data; Complementing Collections, which is the active pursuit of additional objects to be included in a collection; Classification, defined as the gathering of descriptive metadata related to objects in a collection (Ridge's 'Tagging' is a subset of this); Co-curation, which is using inspiration/expertise of non-professional curators to create (Web) exhibits (somewhat analogous to the co-created projects of Bonney et al., but more task-oriented); and Crowdfunding, or the collective cooperation of people who pool their money and other resources together to support efforts initiated by others.[15] Ridge explicitly rejects crowdfunding as a component of crowd-sourcing.[16]

These typologies from the GLAM world perhaps represent best the different crowd-sourcing activities examined by the study, although such lists of categories do not reflect fully the complexity of the situations encountered. Instead, we propose a typology that is orientated along four distinct, although interdependent, facets, as described in 'Crowd-sourcing and research infrastructures' below.

Motivations

Motivations of participants

Overview

Most studies have concluded that crowd-sourcing contributors typically do not have a single motivation; our own survey indicated overwhelmingly (79%) that the contributors who responded have both personal and altruistic motivations. However in many cases it is possible to identify a dominant motivating factor, which is almost always concerned directly with the activity's subject area. In an analysis of 207 forum posts and interview responses for example, the Galaxy Zoo project found that the top motivations were an interest in astronomy (39%), a desire to contribute (13%) and a concern with the vastness of the universe (11%).[17] A study of volunteers for the Florida Fish and Wildlife Conservation Commission's Nesting Beach Survey found that concern for turtle conservation was the overwhelming motivating factor.[18] Moreover, studies of the motivations of the contributors to academic crowd-sourcing projects have emphasised personal interest in the subject area concerned, and the opportunities provided to exercise that interest and to engage with people who share it, without material benefit. Such interest is usually concerned with the outcome, but it can also be in the process, or some combination of both. For example, in her 2009 assessment of volunteers to the TROVE project, Holley notes that 'a large proportion was family history researchers', who were highly motivated and had 'a sense of responsibility towards other genealogists to help not only themselves but other people where possible'.[19] In general, it may be said that research into crowd-sourcing motivations suggests a clear primary, although not exclusive, focus on the subject or activity area, and that motivations can be personal or altruistic, and extrinsic or intrinsic.
Rewards

For the most part, crowd-sourcing projects do not reward their contributors directly in material or professional terms, and conversely contributors to crowd-sourcing projects are not subject to discipline (in either sense) or sanction in the way that members of conventionally-configured research projects are. Indeed, it is clear that the motivations of participants in academic crowd-sourcing tend to be intrinsic to the activity. However, we may regard more indirect benefits as constituting a form of reward: the fulfilment of an interest in the subject; personal gains such as skills, experience or knowledge; some form of status; or a feeling of gratification.

In our survey, contributors mentioned a number of skills gained, including general IT competencies, such as editing wikis and using Skype for distributed collaboration, as well as specialised skills such as TEI encoding. Many contributors gained domain knowledge, for example through the opportunity to edit historical documents (ships' histories) resulting from participation in the Old Weather project. This project showed that the domain interests of the participants can differ from those of the project team, which in this case is solely interested in those parts of the documents being transcribed that relate to climate history,[20] whereas several contributors became interested in the histories of individual ships, and in addressing niches of history that had been hitherto unexplored. Participants can also pick up a basic grounding in research methods of collation, synthesis and analysis in the area of interest to them.

Less concrete benefits also function as rewards. It was frequently noted that some form of 'feedback loop', through which a participant is informed that their contributions were correct and valuable, is a very important motivating factor for engaging with crowd-sourcing projects, and conversely that a lack of feedback can be very frustrating and discouraging to the participant. Feedback also plays a key role in building a sense of community, and making participants feel that they have a stake in the project. For complex tasks, feedback may also be a necessary part of improving volunteers' work practices, as in Transcribe Bentham.[21] This feedback can be immediate and specific to an individual contribution – for example, participants in the British Library's Georeferencer project (BLG)[22] could see the results of their work immediately – or it can be deferred and cumulative, for example by means of rankings. Contributors may receive various 'social' rewards, for example through rankings, increased standing in the crowd-sourcing community, or (in the case of Galaxy Zoo) being credited and named in publications. Similarly, contributors may be subjected to social sanctions, such as banning (e.g. removal of pages or blocking of accounts on Wikipedia), which can adversely affect their reputation and enjoyment, and may even in rare cases reflect on their professional standing.

As well as simple feedback interactions between the project and an individual user, the ability to interact with other participants, for example via a project forum, is an extremely important motivation. Such project-based social networks are used both for 'exchanging chit-chat' and for discussing and sharing information on the practical and technical issues raised, and can foster a sense of community among the participants that can extend beyond the immediate activities of the project itself.
A good example of this is the Old Weather forum,[23] which contains exchanges among participants that are indicative of a high degree of collaborative, communal working in addressing problems that arise during the process. The importance of forums was also noted by participants in Transcribe Bentham and British Library Georeferencer.

Gamification

Some approaches have emphasised the importance of tasks being enjoyable, and have focused on the development of games for crowd-sourcing of different kinds. Prestopnik and Crowston discuss the role of games, and in particular possible approaches to creating an application for crowd-sourced natural history taxonomy classification using design science.[24] The Bodiam Castle project provides an example of the potential for games in the context of archaeological analysis of buildings, although this had a greater emphasis on visualisation than on competition.[25] However, Prestopnik and Crowston also note that 'gamification' can act as a disincentive to contributors who have expert knowledge or deep interest in the subject.[26] Gamification can also be a barrier for users who simply want to engage with the assets or processes in question, and can trivialise the process of acquiring or processing data.[27] In their analysis of The Bird Network project, in which participants gathered data about the use of bird-boxes by birds and shared it with the scientific team, Brossard et al. note that participants' interest in ornithology was likely to overshadow awareness of scientific process,[28] and thus stymie efforts by the Lab to contribute to scientific awareness and education.[29]

Competition

Although very few participants in our survey admitted to being motivated by competition with each other, among those who attended our workshop, competition featured strongly as a factor, although this should be qualified by the fact that those present tended to be 'super contributors', who are likely to feel more competitive than those in the 'long tail' of the crowd. For many projects it is possible to track individual participants' contributions and to acquire statistics on contributions, and in such cases projects can establish 'leader boards' indicating which participants have made the biggest contributions (in whatever terms the project is using). For example, the British Library's Georeferencer project displayed the handles of the users who processed the most maps, and the 'winner' was invited to meet the Library's head of cartography. The Old Weather project also encouraged competition by assigning roles to contributors based on the number of pages transcribed. However, in order for competition to be a significant motivating factor, the tasks and their outcomes must be sufficiently quantifiable to allow mutual comparison; matters can become complex when tasks are not comparable directly. For example, in BLG some maps were more complex than others, and the team felt that this affected the meaningfulness of comparing the effort needed to georeference them. Where more creative or interpretive outputs are being created, this lack of commensurability is a still greater issue, and there may even be conflicts between outputs; simple rankings seem inappropriate to such scenarios. In any case, the encouragement of competition should not be at the cost of alienating potential participants who are not by nature competitive, nor of favouring speed and volume at the expense of quality and care.
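One way of mitigating the commensurability problem just described is to weight contributions by task difficulty rather than counting raw totals. The sketch below is a minimal illustration of that idea only; the weighting scheme and names are assumptions of this edition, not a description of how BLG or Old Weather actually computed their rankings.

from collections import defaultdict

def leaderboard(contributions):
    """Rank volunteers by difficulty-weighted contribution totals.

    `contributions` is a list of (volunteer, task_difficulty) pairs, where
    difficulty might reflect, e.g., the complexity of a map being
    georeferenced. Raw counts reward speed and volume; weighting gives
    credit for harder tasks.
    """
    totals = defaultdict(float)
    for volunteer, difficulty in contributions:
        totals[volunteer] += difficulty
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

log = [("anna", 1.0), ("anna", 3.5), ("ben", 1.0), ("ben", 1.0), ("ben", 1.0)]
for rank, (name, score) in enumerate(leaderboard(log), start=1):
    print(rank, name, round(score, 1))

Here "anna" outranks "ben" despite completing fewer items, because one of her tasks was substantially harder; even so, such a scheme still captures only quantity and difficulty, not quality of work.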
Indeed, competition can be defined not just in this quantitative sense; volunteers may compete to produce more high-quality work, although in the absence of metrics this can amount to competing only against oneself. Note also that competition is not incompatible with a sense of common purpose; for example, Old Weather participants often 'feel like part of the ship' on which they are working.

Motivations of academics

At least part of the success of Galaxy Zoo and other Zooniverse projects is that they catered to clear and present academic needs. In the case of Galaxy Zoo itself, the assets – photographs of galaxies – were far too numerous to be examined individually by any research team, and the task – the classification of those galaxies – was not one that could be performed by computer software, although for the most part could be carried out by a person without specialist expertise.[30] Quite simply, this is work that could not have been carried out without large-scale public engagement and participation.

Most cases where humanities academics have engaged with crowd-sourcing have been driven by specific research questions or the need for a particular resource. For example, the Transcribe Bentham project was motivated by the fact that 40,000 folios of Bentham's work were untranscribed, and thus these valuable primary sources were inaccessible to people researching eighteenth or nineteenth century thought.[31] BLG was motivated by the desire to make its map collections more searchable and thus more exploitable. In Old Weather, researchers were motivated by the desire to be able to use information contained within the assets to explore historic weather patterns, although these motivations may not necessarily be shared by the participants.[32] Although the research motivations are various, the key characteristic leading the project to use crowd-sourcing is that each involves tasks that a computer could not carry out, and that a research team could do only with prohibitively large resources. Note, however, that during the initial six-month testing period of the project, the rate of volunteer transcription compared unfavourably with that of professional researchers,[33] possibly due to the complexity of the material and the difficulty of Bentham's handwriting. There was also an extremely high moderation overhead, with significant staff time needed to validate the outputs and provide feedback to the contributors. Since then, the volunteer transcription rate has improved significantly, so there is potential for avoiding significant costs in the future.[34] However, this example can serve as a warning against assumptions that crowd-sourcing provides free labour.

Other researchers, particularly those in the GLAM sector, see crowd-sourcing as a means of filling gaps in the coverage of their collections,[35] as it can be an effective way of obtaining information about assets (or the assets themselves) to which only certain members of the public have access, for example through personal or family connections. However, in order to be usable for academic purposes, a degree of curation is required, and this may involve expert input. It is clear that public engagement and community building is frequently an unintentional by-product of crowd-sourcing projects. In some cases it is seen as an explicit motivation, with the aim of encouraging public engagement with scholarly archives and research, and thus increasing the broader impact of academic research activities.[36]
Crowd-sourcing and research infrastructures

A conceptual framework for crowd-sourcing

One of the outcomes of our study is a typology for crowd-sourcing in the humanities, which brings together the earlier work cited in Section 2 with the experiences and processes uncovered during the study. It does not seek to provide an alternative set of categories specifically for the humanities, in competition with those considered above. Rather, we propose a model for describing and understanding crowd-sourcing projects in the humanities by analysing them in terms of four key facets – asset type, process type, task type, and output type – and of the relationships between them, and in particular by observing how the applicable categories in one facet are dependent on those in other facets. The four facets and their interactions may be summarised as follows:

• A process is composed of tasks through which an output is produced by operating on an asset. It is conditioned by the kind of asset involved, and by the questions that are of interest to project stakeholders (both organisers and volunteers) and can be answered, or at least addressed, using information contained in the asset.
• An asset refers to the content that is, in some way, transformed as a result of processing by a crowd-sourcing activity.
• A task is an activity that a project participant undertakes in order to create, process or modify an asset (usually a digital asset). Tasks can differ significantly as regards the extent to which they require initiative and/or independent analysis on the part of the participant, and the difficulty with which they can be quantified or documented. The task types were identified with the aim of categorising this complexity, and are listed below in approximately increasing order.
• The output is what is produced as the result of applying a process to an asset. Outputs can be tangible and/or measurable, but we make allowance also for intangible outcomes, such as awareness or knowledge.

The study identified a set of categories under each facet; these are based for the most part on an examination of existing crowd-sourcing practice, so it is to be expected that the lists will be extended and/or challenged by future work. Detailed descriptions of each category may be found in the report by Dunn and Hedges;[37] in the rest of this paper, we examine the framework specifically in relation to humanities research infrastructures.

From crowd-sourcing primitives to research infrastructures

Rather than attempting to map the elements of this crowd-sourcing framework to specific infrastructures or infrastructural components, we note instead that it may be thought of as a framework of 'primitives', in a sense analogous to that of 'scholarly primitives'. Scholarly primitives may be defined as 'basic functions common to scholarly activity across disciplines',[38] and they provide a conceptual framework for classifying scholarly activities. Given the diversity of humanities research, it is not surprising that there are various sets of candidates – in addition to Palmer et al. there are, for example, Unsworth,[39] Benardou et al.[40] and Anderson et al.[41] – and such a structure has in particular been used as a framework for conceptualising and developing infrastructure for supporting humanities research.[42]
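As a way of suggesting how such a framework of primitives might be operationalised in an infrastructure context, the sketch below models a crowd-sourcing activity as a combination of the four facets. The class design and the example values are illustrative assumptions of this edition only; they abbreviate, rather than reproduce, the study's typology.

from dataclasses import dataclass
from typing import List

@dataclass
class Process:
    name: str            # e.g. "transcribing", "collaborative tagging"
    tasks: List[str]     # the task types the process is composed of

@dataclass
class CrowdsourcingActivity:
    """Describes an activity in terms of the four facets: a process,
    composed of tasks, produces an output by operating on an asset."""
    asset_type: str      # what is transformed, e.g. "ships' log-books"
    process: Process
    output_type: str     # e.g. "transcribed text", "enhanced text"

# Old Weather, described (approximately) in these terms:
old_weather = CrowdsourcingActivity(
    asset_type="digitised ships' log-books",
    process=Process(name="transcribing",
                    tasks=["reading handwriting", "entering observations"]),
    output_type="transcribed text",
)
print(old_weather.process.name, "->", old_weather.output_type)

A shared, machine-readable description of this kind is the sort of thing an infrastructure could use to register, discover and compose crowd-sourcing services, which is one reading of what it means to treat the facets as primitives.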
From crowd-sourcing primitives to research infrastructures

Rather than attempting to map the elements of this crowd-sourcing framework to specific infrastructures or infrastructural components, we note instead that it may be thought of as a framework of ‘primitives’, in a sense analogous to that of ‘scholarly primitives’. Scholarly primitives may be defined as ‘basic functions common to scholarly activity across disciplines’, 38 and they provide a conceptual framework for classifying scholarly activities. Given the diversity of humanities research, it is not surprising that there are various sets of candidates – in addition to Palmer et al. there are, for example, Unsworth, 39 Benardou et al. 40 and Anderson et al. 41 – and such a structure has in particular been used as a framework for conceptualising and developing infrastructure for supporting humanities research. 42 The process facet in particular may be regarded as providing a set of primitives in this sense, and the output type ‘composite digital collection with multiple meanings’ may in particular be regarded as a form of humanities ‘research object’, in the sense used by Bechhofer et al. 43 and Blanke and Hedges. 44 Of course, the categorisation into primitives described above is quite different to those in the works cited; this is only to be expected, as it represents the activities of quite different stakeholders, namely interested members of the public rather than professional scholars (although of course one person can play different roles in different circumstances). In particular, there is a greater emphasis on creating or enhancing digital assets in some way, rather than using these assets in research (although again these activities can overlap). For the remainder of this paper, we will look in more detail at each of the process types in turn, using specific examples examined by the study, with a view to seeing how crowd-sourcing can contribute effectively to humanities research infrastructures.

COLLABORATIVE TAGGING

Collaborative tagging may be regarded as crowd-sourcing the organisation of information assets by allowing users to attach tags to those assets. Tags can be based on existing controlled vocabularies, but are more usually derived from free text supplied by the users themselves. Such ‘folksonomies’ are distinguished from deliberately designed knowledge organisation systems by the fact that they are self-organising, evolving and growing as contributors add new terms. It is possible to extract more formal vocabularies from folksonomies. 45 Collaborative tagging can result in two concrete outcomes: it can make a corpus of information assets searchable using keywords applied by the user pool, and it can highlight assets that have particular significance, as evidenced by the number of repeat tags they are accorded by the pool. Research in this area has examined the patterns and information that can be extracted from folksonomies. Golder found that patterns generated by collaborative tagging are, on the whole, extremely stable, meaning that minority opinions can be preserved alongside more highly replicated, and therefore mainstream, concentrations of tags. 46 Other research has shown that user-assigned tags in museums may be quite different from vocabulary terms assigned by curators, and that relating tags to controlled vocabularies can be very problematic, 47 although it could be argued that this allows works to be addressed from a different perspective than that of the museum’s formal documentation. In any case, such approaches to knowledge organisation are likely to play a significant part in the organisation of humanities data in the future. An example is the BBC’s YourPaintings project, 48 developed in collaboration with the Public Catalogue Foundation, which has amassed a collection of photographs of all paintings in public ownership in the UK. The public is invited to apply tags to these, which both improves discovery and enables the creation of an aggregation of specialised knowledge. A more complex example is provided by the Prism project. 49 Collaborative tagging typically assumes that the assets being tagged are themselves stable and clearly identifiable as distinct objects; Prism, by contrast, allowed readers to highlight significant areas within a text and apply tags to them, and thus build up a collective interpretation of the text. Unlike many humanities crowd-sourcing activities, such as transcribing texts according to well-defined procedures, which have identifiable completions, interpretation can go on indefinitely, and there are no right or wrong answers.
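The two concrete outcomes described above, keyword search and salience through repeat tagging, can be illustrated in a few lines of code. This is a generic sketch of the technique, not the implementation of YourPaintings or any other project, and the tag data is invented:

```python
from collections import Counter, defaultdict

# Invented (asset_id, tag) pairs of the kind a tagging project collects.
tags = [
    ("painting-17", "ship"), ("painting-17", "storm"), ("painting-17", "ship"),
    ("painting-42", "portrait"), ("painting-42", "dog"), ("painting-17", "ship"),
]

# Outcome 1: an inverted index makes the corpus keyword-searchable.
index = defaultdict(set)
for asset, tag in tags:
    index[tag].add(asset)
print(index["ship"])            # {'painting-17'}

# Outcome 2: repeat tags signal assets of particular significance to the pool.
salience = Counter(tags)
print(salience.most_common(1))  # [(('painting-17', 'ship'), 3)]
```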
LINKING

Linking covers the identification and documentation of relationships (usually typed) between individual assets. Most commonly, this takes the form of linking via semantic tags, where the tags describe binary relationships, in which case it is analogous to collaborative tagging. In principle, this could also include the identification of n-ary relationships.

TRANSCRIBING

Transcribing is currently one of the most prominent areas of humanities crowd-sourcing, as it can be used to address a fundamental problem with digitisation, namely the difficulty of rendering handwriting into machine-readable form using current technology. Typically, such transcription requires the human eye and, in many cases, human interpretation. In terms of our typology, the output of a transcribing process will typically be transcribed text. Two projects have contributed significantly to this prominence: Old Weather (OW) and Transcribe Bentham (TB). OW involved the transcription of ships’ log-books held by The National Archives, in order to obtain access to the weather observations they contain, information that is of major significance for climate research. 50 TB encouraged volunteers to transcribe and engage with unpublished manuscripts by the philosopher and reformer Jeremy Bentham, by rendering them into text marked up using TEI XML. 51 The collaborative model needed for successful crowd-sourced transcription depends on the complexity of the source material. Complex material, such as that in these two cases, requires a high level of support, whether from the project team or a participant’s peers. Simpler material is likely to require less support; for example, when transcribing the more structured data found in family records, 52 the information (text or integers) to be transcribed is presented to the user in small segments – e.g. names, dates, addresses – and transcription requires different cognitive processes that are less dependent on interaction with peers and experts. Note that this category includes marked-up transcriptions, e.g. using TEI XML, as well as simple transcription of characters. There will be a point, however, at which the addition of semantic mark-up goes beyond mere transcription and counts as a form of collaborative tagging or linking; in such cases the output will typically be enhanced text.

CORRECTING/MODIFYING CONTENT

While content is increasingly ‘born digital’, projects for digitising analogue material abound. Many mass-digitisation technologies, such as Optical Character Recognition (OCR) and speech recognition, can be error-prone, and any such enterprise needs to factor in quality control and error correction, which can make use of crowd-sourcing. The TROVE project, which produced OCR-ed scans of newspapers from the National Library of Australia, is an excellent example of this. 53 The volume of digitised material precluded the corrections being undertaken by the Library’s own staff, and using uncorrected text would have significantly reduced the benefits of digitisation, as search capability would have been very restricted. Another potential application in this category is the correction of automated transcriptions of recorded speech, as such transcription is currently highly error-prone, with error rates of 30% or more. 54
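One standard way of reconciling corrections from many volunteers is redundancy with a majority vote: collect several independent corrections of the same passage and accept the reading most of them agree on. This is a generic quality-control strategy, not TROVE’s documented workflow; the OCR line and the submitted corrections below are invented:

```python
from collections import Counter

def aggregate(corrections):
    """Pick the majority reading for one OCR line; return None on a tie."""
    counts = Counter(corrections)
    (best, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:   # no clear majority: escalate to a moderator
        return None
    return best

# Three volunteers independently correct the same garbled line.
ocr_line = "Tlie ship arrived iu port"
submitted = [
    "The ship arrived in port",
    "The ship arrived in port",
    "The ships arrived in port",
]
print(aggregate(submitted))   # "The ship arrived in port"
```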
RECORDING AND CREATING CONTENT

Processes in this category frequently deal with ephemera and intangible cultural heritage. The latter covers any cultural manifestation that does not exist in tangible form; typically, crowd-sourcing is used to document such heritage through a set of processes and tasks, resulting in some form of tangible output. The importance of preserving intangible cultural heritage has been recognised by UNESCO, 55 and the ways in which it can be documented and curated by distributed communities are an important area for future research. Frequently this takes the form of a cultural institution soliciting memories from the communities it serves, for example the Tenbury Wells Regal Cinema’s Memory Reel project. 56 Such processes can incorporate a form of editorial control or post hoc digital curation, and their outputs can be edited into more formal publications. Another example is the Scottish Words and Place-names (SWAP) project, 57 which gathered words in Scots, determining which words were in current use and where/how they were used, with the ultimate aim of offering selected words for inclusion in the Scottish Language Dictionaries resource. 58 Candidate words were gathered via the project website as well as via social media – Facebook in particular was an important venue for developing conversations around the material – and words that the project felt were suitable were passed to lexicographers for further scrutiny. By ephemera, we understand cultural objects that are tangible, but are at risk of loss because of their transitory nature, for example home videos or personal photographs. 59 There are a number of projects addressing such assets, for example the Europeana 1914-1918 project, 60 which is collecting digitised personal artefacts relating to the First World War. The ubiquity of the Web, and of access to content creation and digitisation technologies, has led to the creation of non-professionally curated online archives. These have a clear role to play in enriching, augmenting and complementing collections held by memory institutions, and in developing curatorial narratives independent from those of library and archive professionals. 61 Processes in this category are also likely to have elements of the ‘social engagement’ model, in terms of Holley’s distinction. 62

COMMENTING, CRITICAL RESPONSES AND STATING PREFERENCES

Processes of this type are likely to count as crowd-sourcing only if there is some specific purpose around which people come together. One example of this is the Shakespeare’s Global Communities project, 63 which captured audience responses to the 2012 World Shakespeare Festival, with the aim of investigating how ‘social networking technologies reshape the ways in which diverse global communities connect with one another around a figure such as Shakespeare’. 64 The question provides a focus for the activity, which, although not itself producing an academic output, provides a dataset for addressing research questions on the modern reception of Shakespeare. Appropriately managed blogs can provide a platform for focused scholarly interactions of this type. For example, a review by Sonia Massai of King Lear on the Year of Shakespeare site attracted controversial responses, leading to an exchange about critical methods as well as content. 65 What differentiates such exchanges from amateur blogging is the scholarly focus and context provided by the project, and its proactive directing of content creation.
The project thus provides a tangible link between the crowd and the subject.

CATEGORISING

Categorising involves assigning assets to predefined categories; it differs from collaborative tagging in that the latter is unconstrained.

CATALOGUING

Cataloguing – or the creation of structured, descriptive metadata – is a more open-ended process than categorising, but is nevertheless constrained to following accepted metadata standards and approaches. It frequently includes categorising as a sub-activity, e.g. classification by Library of Congress subject headings. Cataloguing is a time- and resource-consuming process for many GLAM institutions, and crowd-sourcing has been explored as a means of addressing this. For example, the What’s the Score project at the Bodleian investigated a cost-effective approach to increasing access to music scores from its collections through a combination of rapid digitisation and crowd-sourced descriptive metadata. 66 Cataloguing is related to contextualising, as ordering, arraying and describing assets will also make explicit some of their context.

CONTEXTUALISING

Contextualising is typically a more broadly conceived activity than the related process types of cataloguing or linking, and it involves enriching an asset by adding to it, or associating with it, other relevant information or content.

GEOREFERENCING

Georeferencing is the process of establishing the location of unreferenced geographical information in terms of a modern coordinate system such as latitude and longitude. Georeferencing can be used to enrich geospatial assets – datasets or texts, including maps, gazetteers or travelogues, that refer to locations on the earth’s surface – that do not include such explicit information. A major example of crowd-sourcing activity in this area is the British Library Georeferencer project, which aimed to ‘geo-enable’ historical maps in its collections by asking participants to assign spatial coordinates to digitised map images, a task that would have been too labour-intensive for Library staff to undertake themselves. Once georeferenced, the digitised maps are searchable geographically due to the inclusion of latitude and longitude coordinates in the metadata. 67

MAPPING

Mapping (in the sense of this typology) refers to the process of creating a spatial representation of some information asset(s). This could involve the creation of map data from scratch, but could also be applied to the spatial mapping of concepts, as in a ‘mind map’. The precise sense will depend on the asset type to which mapping is being applied. There is an important distinction between maps and related geospatial assets created by expert organisations, such as the Ordnance Survey, and those created by community-based initiatives. The former may carry the authority of a governmental imprimatur and official endorsement. However, the recent emergence of crowd-sourced geospatial assets – a product of the recent global growth in the ownership of hand-held devices with the ability to record location using GPS 68 – has led to the emergence of resources such as OpenStreetMap, 69 which has in turn led to a discussion about the reliability of such resources. In general, it has been found that OpenStreetMap in particular is extremely reliable, 70 but that the specifications for such resources must be carefully defined. 71 The impact of OpenStreetMap on the cartographic community generally has been noted. 72 The importance of mapping as a means of conveying spatial significance means that this kind of asset is particularly open to different discourses, and possibly conflicting narratives. The digital realm, with its capacity to accommodate multiple, diverse contributions and interpretations, holds great potential for such material. 73
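The georeferencing process type described above lends itself to a concrete sketch. In essence, a participant supplies control points, pairs matching a pixel on the scanned map to a latitude/longitude, and an affine transform can then be fitted by least squares so that any pixel on the sheet becomes searchable geographically. This is an illustration of the general technique, not the Georeferencer project’s actual pipeline, and the control points and coordinates below are invented:

```python
import numpy as np

# Volunteer-supplied control points: (pixel_x, pixel_y) -> (lon, lat).
pixels = np.array([(120, 80), (900, 95), (150, 700), (880, 690)], dtype=float)
coords = np.array([(-3.21, 55.95), (-3.05, 55.95),
                   (-3.21, 55.90), (-3.06, 55.90)], dtype=float)

# Fit lon = a*x + b*y + c and lat = d*x + e*y + f by least squares.
A = np.column_stack([pixels, np.ones(len(pixels))])
params, *_ = np.linalg.lstsq(A, coords, rcond=None)

def pixel_to_lonlat(x, y):
    """Map any pixel on the scanned sheet to geographic coordinates."""
    return tuple(np.array([x, y, 1.0]) @ params)

print(pixel_to_lonlat(500, 400))  # an interpolated (lon, lat) pair
```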
TRANSLATING

This covers the translation of content from one language to another. In many cases, a crowd-sourced translation will require a strongly collaborative element if it is to be successful, given the semantic interdependencies that can occur between different parts of a text. However, in cases where a large text can be broken up naturally into smaller pieces, a more independent mode of work may be possible; an example is Suda On-Line, 74 which is translating the entries in a 10th-century Byzantine lexicon/encyclopaedia. A more modern, although non-academic, example is the phenomenon of ‘fansubbing’, where enthusiasts provide subtitles for television shows and other audiovisual material. 75

Conclusions

One of the main conclusions of our study is that research involving humanities crowd-sourcing can best be framed and understood through an analysis in terms of four fundamental facets – asset type, process type, task type, and output type – and of the relationships between them. Depending on the activity in question, and what it aims to do, some categories, or indeed some facets, will have primacy. Outputs might be original knowledge, or they might be more ephemeral and difficult to identify; however, considering the processes of both knowledge and resource creation as comprising these four facets gives a meaningful context to every piece of research, publication and activity we have uncovered in the course of this review. We hope the lessons and good practice we have identified here will, along with this typology, contribute to the development of new kinds of humanities crowd-sourcing in the future. Significantly, we have determined that most humanities scholars who have used crowd-sourcing as part of some research activity agree that it is not simply a form of ‘cheap labour’ for mass digitisation or resource enhancement; indeed, in a narrowly cost-benefit sense it does not always compare well with more conventional mechanisms of digitisation. In this sense, it has truly left its economic roots, as defined by Howe (2006), behind. The creativity, enthusiasm and alternative foci that communities outside the academy can bring to academic research constitute a resource that is now ripe for tapping, and the examples above illustrate the rich variety of forms that this tapping can take. We have noted the similarity between some aspects of our typology and the concept of the ‘scholarly primitive’, which has proved valuable in humanities e-research for providing a conceptual framework of fundamental building blocks for describing scholarly activities and modelling putative research infrastructures for the humanities. We have used this relationship to investigate how crowd-sourcing activities falling under various process types can contribute effectively to such research infrastructures.

Acknowledgements and additional information

A list of the projects investigated by the study, and a description of the survey (including the questions and a summary of the results), may be found in Appendices B and A respectively of Dunn and Hedges (2012).
The project website is at http://humanitiescrowds.org, and additional information (in ‘raw’ form) from the workshops organised as part of the study may be found at http://humanitiescrowds.org/wp-uploads/2012/09/workshop_report1.pdf. We are very grateful to all those who have shared their knowledge and experience with us during the study, and in particular those who agreed to be interviewed, or participated in the workshops, or provided feedback on the project report.

1 We follow the convention of hyphenating ‘crowd-sourcing’; other authors use ‘crowdsourcing’ or ‘crowd sourcing’. In quotations, we preserve the original form.
2 T. Blanke, M. Bryant, M. Hedges, A. Aschenbrenner and M. Priddy, ‘Preparing DARIAH’, 7th IEEE International Conference on e-Science, Stockholm, Sweden (2011), 158-165, http://dx.doi.org/10.1109/eScience.2011.30.
3 J. Howe, ‘The rise of crowdsourcing’, Wired, 14.06 (2006), http://www.wired.com/wired/archive/14.06/crowds.html.
4 J. Silvertown, ‘A new dawn for citizen science’, Trends in Ecology & Evolution, 24, No. 9 (2009), 467-71; D. P. Anderson, J. Cobb, E. Korpela, M. Lebofsky and D. Werthimer, ‘SETI@home: an experiment in public-resource computing’, Communications of the ACM, 45, Issue 11 (2002), 56-61.
5 J. Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter than the Few (2004).
6 D. Brabham, ‘Crowdsourcing as a model for problem solving: an introduction and cases’, Convergence: The International Journal of Research into New Media Technologies, 14, Issue 1 (2008), 75-90.
7 R. Holley, ‘Crowdsourcing: how and why should libraries do it?’, D-Lib Magazine, 16, No. 3/4 (2010), http://www.dlib.org/dlib/march10/holley/03holley.html.
8 A. Wiggins and K. Crowston, ‘From conservation to crowdsourcing: a typology of citizen science’, System Sciences (HICSS), 44th Hawaii International Conference (2011), http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5718708.
9 http://www.trevorowens.org/2012/05/the-crowd-andthe-library
10 E. Estellés-Arolas and F. González-Ladrón-de-Guevara, ‘Towards an integrated crowdsourcing definition’, Journal of Information Science, 38, No. 2 (2012), 189-200.
11 A. Wiggins and K. Crowston, ‘From conservation to crowdsourcing: a typology of citizen science’.
12 R. Bonney, H. Ballard, R. Jordan, E. McCallie, T. Phillips, J. Shirk and C. Wilderman, Public Participation in Scientific Research: Defining the Field and Assessing Its Potential for Informal Science Education, Center for Advancement of Informal Science Education, Washington, D.C. (2009), http://caise.insci.org/uploads/docs/PPSR%20report%20FINAL.pdf.
13 http://openobjects.blogspot.co.uk/2012/06/frequently-asked-questions-about.htm
14 J. Oomen and L. Aroyo, ‘Crowdsourcing in the cultural heritage domain: opportunities and challenges’, Proceedings of the 5th International Conference on Communities and Technologies (2011), 138-149, http://www.cs.vu.nl/~marieke/OomenAroyoCT2011.pdf.
15 A. Agrawal, C. Catalini and A. Goldfarb, ‘The geography of crowdfunding’, NET Institute Working Paper Series, 10-8 (2011), 1-57, http://ssrn.com/abstract=1692661.
16 http://openobjects.blogspot.co.uk/2012/06/frequently-asked-questions-about.htm
17 M. J. Raddick, G. Bracey, P. L. Gay, C. J. Lintott, P. Murray, K. Schawinski, A. S. Szalay and J. Vandenberg, ‘Galaxy Zoo: exploring the motivations of citizen science volunteers’, Astronomy Education Review, 9 (2010), http://aer.aas.org/resource/1/aerscz/v9/i1/p010103_s1.
18 B. M. Bradford and G. D. Israel, ‘Evaluating volunteer motivation for sea turtle conservation in Florida’, Agricultural Education (2004), 1-9.
19 R. Holley, Many Hands Make Light Work: Public Collaborative OCR Text Correction in Australian Historic Newspapers, National Library of Australia (2009), http://www.nla.gov.au/ndp/project_details/documents/ANDP_ManyHands.pdf.
20 http://crowds.cerch.kcl.ac.uk/wp-uploads/2012/04/Brohan.pdf
21 T. Causer, J. Tonra and V. Wallace, ‘Transcription maximized; expense minimized? crowdsourcing and editing The Collected Works of Jeremy Bentham’, Literary and Linguistic Computing, 27, Issue 2 (2012), 1-19. Similar conclusions were drawn by the authors of the current article, based on their interviews with staff and volunteers from the Old Weather project and the British Library’s Georeferencer project.
22 http://www.bl.uk/maps/
23 http://forum.oldweather.org
24 N. R. Prestopnik and K. Crowston, ‘Gaming for (citizen) science: exploring motivation and data quality in the context of crowdsourced science through the design and evaluation of a social-computational system’, Proceedings of the “Computing for Citizen Science” workshop at the 7th IEEE eScience Conference (2011), http://crowston.syr.edu/sites/crowston.syr.edu/files/gamingforcitizenscience_ver6.pdf.
25 http://crowds.cerch.kcl.ac.uk/wp-uploads/2012/04/Masinton.pdf
26 N. R. Prestopnik and K. Crowston, ‘Gaming for (citizen) science: exploring motivation and data quality in the context of crowdsourced science through the design and evaluation of a social-computational system’ (2011).
27 See http://blog.tommorris.org/post/3216687621/im-not-an-experience-seeking-user-im-a for a combative assertion of this position.
28 D. Brossard, B. Lewenstein and R. Bonney, ‘Scientific knowledge and attitude change: the impact of a citizen science project’, International Journal of Science Education, 27, Issue 9 (2005), 1029-1121.
29 D. J. Trumbull, R. Bonney, D. Bascom and A. Cabral, ‘Thinking scientifically during participation in a citizen-science project’, Science Education, 84, Issue 2 (1999), 265-275.
30 C. J. Lintott, K. Schawinski, A. Slosar, K. Land, S. Bamford, D. Thomas, M. J. Raddick, R. Nichol, A. Szalay, D. Andreescu, P. Murray and J. Vandenberg, ‘Galaxy Zoo: morphologies derived from visual inspection of galaxies from the Sloan Digital Sky Survey’, Monthly Notices of the Royal Astronomical Society, 389, Issue 3 (2008), 1179-1189.
31 http://humanitiescrowds.org/wp-uploads/2012/04/Causer.pdf
32 http://humanitiescrowds.org/wp-uploads/2012/04/Brohan.pdf
33 T. Causer, J. Tonra and V. Wallace, ‘Transcription maximized; expense minimized? crowdsourcing and editing The Collected Works of Jeremy Bentham’ (2012).
34 T. Causer and V. Wallace, ‘Building a volunteer community: results and findings from Transcribe Bentham’, Digital Humanities Quarterly, 6, No. 2 (2012), http://www.digitalhumanities.org/dhq/vol/6/2/000125/000125.html.
35 M. Terras, ‘Digital curiosities: resource creation via amateur digitisation’, Literary and Linguistic Computing, 25, No. 4 (2010), 425-438, doi:10.1093/llc/fqq019.
36 M. Moyle, J. Tonra and V. Wallace, ‘Manuscript transcription by crowdsourcing: Transcribe Bentham’, Liber Quarterly – The Journal of European Research Libraries, 20, Issue 3/4 (2011).
37 S. Dunn and M. Hedges, ‘Crowd-sourcing scoping study: engaging the crowd with humanities research’, Arts and Humanities Research Council report (2012), http://humanitiescrowds.org/wp-uploads/2012/12/Crowdsourcing-connected-communities.pdf.
38 C. L. Palmer, L. C. Teffeau and C. M. Pirmann, ‘Scholarly information practices in the online environment: themes from the literature and implications for library service development’ (2009).
39 J. Unsworth, ‘Scholarly primitives: what methods do humanities researchers have in common, and how might our tools reflect this’, ‘Humanities Computing, Formal Methods, Experimental Practice’ Symposium, King’s College London (2000), http://people.lis.illinois.edu/~unsworth/Kings.5-00/primitives.html.
40 A. Benardou, P. Constantopoulos, C. Dallas and D. Gavrilis, ‘Understanding the information requirements of arts and humanities scholarship’, International Journal of Digital Curation, 5, No. 1 (2010), 18-33.
41 S. Anderson, T. Blanke and S. Dunn, ‘Methodological commons: arts and humanities e-science fundamentals’, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 368, No. 1925 (2010), 3779-3796.
42 T. Blanke and M. Hedges, ‘Scholarly primitives: building institutional infrastructure for humanities e-science’, Future Generation Computer Systems, 29, Issue 2 (2013), 654-661, http://dx.doi.org/10.1016/j.bbr.2011.03.031.
43 S. Bechhofer, I. Buchan, D. De Roure, P. Missier, J. Ainsworth, J. Bhagat, P. Couch, D. Cruickshank, M. Delderfield, I. Dunlop, M. Gamble, D. Michaelides, S. Owen, D. Newman, S. Sufi and C. Goble, Future Generation Computer Systems, 29, Issue 2 (2013), 599-611, http://dx.doi.org/10.1016/j.future.2011.08.004.
44 T. Blanke and M. Hedges, ‘Scholarly primitives: building institutional infrastructure for humanities e-science’ (2013).
45 H. Lin and J. Davis, ‘Computational and crowdsourcing methods for extracting ontological structure from folksonomy’, The Semantic Web: Research and Applications, Lecture Notes in Computer Science, 6089 (2010), 472-477, DOI:10.1007/978-3-642-13489-0_46.
46 S. Golder, ‘Usage patterns of collaborative tagging systems’, Journal of Information Science, 32, Issue 2 (2006), 198-208.
47 J. Trant, Tagging, Folksonomy, and Art Museums: Results of steve.museum’s Research (2009), http://conference.archimuse.com/blog/jtrant/stevemuseum_research_report_available_tagging_fo; J. Trant, D. Bearman and S. Chun, ‘The eye of the beholder: steve.museum and social tagging of museum collections’, Proceedings of the International Cultural Heritage Informatics Meeting (ICHIM07), Toronto, Canada (2007).
48 http://www.bbc.co.uk/arts/yourpaintings/
49 http://www.scholarslab.org/category/praxis-program/
50 P. Brohan, R. Allan, J. E. Freeman, A. M. Waple, D. Wheeler, C. Wilkinson and S. Woodruff, ‘Marine observations of old weather’, Bulletin of the American Meteorological Society, 90, Issue 2 (2009), 219-230.
51 T. Causer, J. Tonra and V. Wallace, ‘Transcription maximized; expense minimized? crowdsourcing and editing The Collected Works of Jeremy Bentham’ (2012).
52 For example, http://www.familysearch.org
53 R. Holley, Many Hands Make Light Work: Public Collaborative OCR Text Correction in Australian Historic Newspapers (2009).
54 M. Wald, ‘Crowdsourcing correction of speech recognition captioning errors’, Proceedings of the International Cross-Disciplinary Conference on Web Accessibility – W4A ’11 (2011), http://eprints.soton.ac.uk/272430/1/crowdsourcecaptioningw4allCRv2.pdf.
55 R. Kurin, ‘Safeguarding intangible cultural heritage in the 2003 UNESCO convention: a critical appraisal’, Museum International, 56, Issue 1-2 (2004), 66-77.
56 http://www.regaltenbury.org.uk/memory-reel/
57 C. Hough, E. Bramwell and D. Grieve, Scots Words and Place-Names Final Report, JISC (2011), http://www.jisc.ac.uk/media/documents/programmes/digitisation/swapfinalreport.pdf. See also http://swap.nesc.gla.ac.uk/.
58 http://www.scotsdictionaries.org.uk/
59 This usage differs from the standard usage of the term by museums.
60 http://www.europeana1914-1918.eu/en/contributor
61 M. Terras, ‘Digital curiosities: resource creation via amateur digitisation’ (2010).
62 R. Holley, ‘Crowdsourcing: how and why should libraries do it?’ (2010).
63 www.yearofshakespeare.com
64 http://humanitiescrowds.org/wp-uploads/2012/09/workshop_report1.pdf
65 http://bloggingshakespeare.com/year-of-shakespeare-king-lear-at-the-almeida
66 http://www.whats-the-score.org; http://scores.bodleian.ox.ac.uk
67 C. Fleet, K. C. Kowal and P. Pridal, ‘Georeferencer: crowdsourced georeferencing for map library collections’, D-Lib Magazine, 18, No. 11/12 (2012), http://www.dlib.org/dlib/november12/fleet/11fleet.html.
68 M. Goodchild, ‘Editorial: citizens as voluntary sensors: spatial data infrastructure in the world of Web 2.0’, International Journal of Spatial Data Infrastructures Research, 2 (2007), 24-32.
69 http://www.openstreetmap.org/
70 M. Haklay and P. Weber, ‘OpenStreetMap: user-generated street maps’, Pervasive Computing, IEEE, 7, Issue 7 (2008), 12-18; M. Haklay, ‘How good is volunteered geographical information? A comparative study of OpenStreetMap and Ordnance Survey datasets’, Environment and Planning B: Planning and Design, 37, Issue 4 (2010), 682-703.
71 C. Brando and B. Bucher, ‘Quality in user generated spatial content: a matter of specifications’, Proceedings of the 13th AGILE International Conference on Geographic Information Science, Guimarães, Portugal (2010), 1-8.
72 S. Chilton, ‘Crowdsourcing is radically changing the geodata landscape: case study of OpenStreetMap’, Proceedings of the 24th International Cartographic Conference, Santiago, Chile (2009), http://w.icaci.org/files/documents/ICC_proceedings/ICC2009/html/nonref/22_6.pdf.
73 C. Fink, ‘Mapping together: on collaborative implicit cartographies, their discourses and space construction’, Journal for Theoretical Cartography, 4 (2011), 1-14; M. Graham, ‘Neogeography and the palimpsests of place: Web 2.0 and the construction of a virtual earth’, Tijdschrift voor Economische en Sociale Geografie, 101, Issue 4 (2010), 422-436.
74 http://www.stoa.org/sol/
75 J. D. Cintas and P. M. Sanchez, ‘Fansubs: audiovisual translation in an amateur environment’, Journal of Specialised Translation, 6 (2006), 37-52.

Figure 1: Typology framework

Table 1: Process Types – Collaborative tagging; Linking; Correcting/modifying content; Transcribing; Recording and creating content; Commenting, critical responses and stating preferences; Categorising; Cataloguing; Contextualising; Mapping; Georeferencing; Translating.

Table 2: Asset Types – Geospatial; Text; Numerical or statistical information; Sound; Image; Video; Ephemera and intangible cultural heritage.

Table 3: Task Types – Mechanical; Configurational; Editorial; Synthetic; Investigative; Creative.

Table 4: Output Types – Original text; Transcribed text; Corrected text; Enhanced text; Transcribed music; Metadata; Structured data; Knowledge/awareness; Funding; Synthesis; Composite digital collection with multiple meanings.
Digital Humanities’ Shakespeare Problem

Laura Estill, Department of English, St. Francis Xavier University; P.O. Box 5000, Antigonish, NS B2G 2W5, Canada; lestill@stfx.ca

Received: 28 January 2019; Accepted: 23 February 2019; Published: 4 March 2019

Abstract: Digital humanities has a Shakespeare problem; or, to frame it more broadly, a canon problem. This essay begins by demonstrating why we need to consider Shakespeare’s position in the digital landscape, recognizing that Shakespeare’s prominence in digital sources stems from his cultural prominence. I describe the Shakespeare/not Shakespeare divide in digital humanities projects and then turn to digital editions to demonstrate how Shakespeare’s texts are treated differently from his contemporaries—and often isolated by virtue of being placed alone on their pedestal. In the final section, I explore the implications of Shakespeare’s popularity for digital humanities projects, some of which exist solely because of Shakespeare’s status. Shakespeare’s centrality to the canon of digital humanities reflects his reputation in wider spheres such as education and the arts. No digital project will offer a complete, unmediated view of the past, or, indeed, the present. Ultimately, each project implies an argument about the status of Shakespeare, and we—as Shakespeareans, early modernists, digital humanists, humanists, and scholars—must determine what arguments we find persuasive and what arguments we want to make with the new projects we design and implement.

Keywords: digital humanities; Shakespeare; early modern drama; literary canon; English literature; Renaissance

1. Introduction

Digital humanities has a Shakespeare problem; or, to frame it more broadly, a canon problem. Too many digital projects and sites focus on Shakespeare alone. Some sites highlight Shakespeare to the exclusion of other writers; other projects set their bounds at Shakespeare and “not Shakespeare”. Digital humanities’ Shakespeare problem both stems from and reifies Shakespeare’s centrality to the canon of English literature. While this problem is, indeed, a digital humanities problem, it is also a problem in the arts and humanities more generally. Shakespeare is one of the few writers regularly featured in single-author undergraduate courses (alongside, perhaps, Chaucer, Milton, and Austen, albeit to a lesser extent). Shakespeare’s works are so often produced on the twenty-first century stage that American Theatre excludes Shakespeare from their annual list of top-produced American plays in order to “make more room on our list for everyone and everything else” (Tran 2018). Digital humanities has often been heralded as the solution to the canonicity problem, but that is a great burden that it cannot bear alone. This essay begins by demonstrating why we need to consider Shakespeare’s position in the digital landscape, recognizing that Shakespeare’s prominence in digital sources stems from his cultural prominence. I describe the Shakespeare/not Shakespeare divide in digital humanities projects and then turn to digital editions to demonstrate how Shakespeare’s texts are treated differently from his contemporaries—and often isolated by virtue of being placed alone on their pedestal. In the final section, I explore the implications of Shakespeare’s popularity for digital humanities projects, some of which exist solely because of Shakespeare’s status.
Shakespeare’s centrality to the canon of digital humanities reflects his reputation in wider spheres such as education and the arts. No digital project will offer a complete, unmediated view of the past, or, indeed, the present. Ultimately, each digital humanities project presents an argument about the status of Shakespeare, and we—as Shakespeareans, early modernists, digital humanists, humanists, and scholars—must determine what arguments we find persuasive and what arguments we want to make with the new projects we design and implement. Although the definition of digital humanities (and perhaps even the definition of Shakespeare) is subject to disagreement, for this essay, I limit my scope to digital humanities resources for pedagogy and research. This excludes games such as Richard III Attacks! (P. 2015), online performances such as Such Tweet Sorrow (Silbert 2010), and social media hashtags like #ShakespeareSunday. Cultural studies often informs New Media Shakespeare scholarship to show Shakespeare’s continued prominence online (see O’Neill 2015 for an overview): consider recent issues of Shakespeare Quarterly (Rowe 2010) and Borrowers and Lenders (Calbi and O’Neill 2016) on this topic. Stephen O’Neill (2018), drawing on Douglas Lanier’s notion of “Shakespearean rhizomatics” (Lanier 2014), equates “Our contemporary Shakespeares” to “digital Shakespeares”, describing both as “fully rhizomatic in their extraordinary and seemingly endless flow of relations.” Christy Desmet suggests that we need to encounter all digital Shakespeares (both digital humanities and new media) through the lens of Ian Bogost’s “alien phenomenology” (Bogost 2012), considering “material objects and networks as models for posthuman relations” (Desmet 2017, p. 5). Although Digital Humanities and New Media are often paired, for the purpose of this essay it is useful to differentiate the two: new media endeavors that participate in or create digital culture versus digital humanities projects that announce themselves as contributing to our general and scholarly knowledge. This article focuses on digital humanities projects for two reasons: first, as one way of limiting the scope of the “seemingly endless flow of relations” in Digital Shakespeares, and second, because the majority of digital humanities projects exist primarily to educate rather than to entertain. Digital humanities projects provide the resources we use to study and teach the early modern period: digital editions, bibliographies, digitizations, catalogs, and more. Often, digital humanities projects are expanded from earlier print resources: consider, for instance, the online English Short Title Catalogue (British Library 2006) and its print antecedents, the short-title catalogs by Pollard and Redgrave (1926) and Donald Goddard Wing (1945). Nondigital scholarly resources frequently skew towards Shakespeare; even the library catalogs we use to access archival resources are not neutral and emphasize Shakespeare above his contemporaries (Estill 2019a).
Many digital humanities resources replicate this Shakespeare-centric focus, and, as such, misrepresent the materials they provide or offer a skewed perspective on early modern literature, theatre, and culture. Biased sources can only lead to biased scholarship; and while some professors will be able to see the biases of the sites they visit, many students will not. This is particularly problematic because, as Christie Carson and Peter Kirwan explain, “Students are some of the key ‘users’ of digital Shakespeare” (Carson and Kirwan 2014a, p. 244). It has been well documented that major digital literary studies projects often focus on canonical authors. There is excellent work on the biases of digital humanities projects, particularly in relation to the status of women writers (see, for instance, Wernimont and Flanders 2010; Mandell 2015; Bergenmar and Leppänen 2017) and the canon of American literature (Earhart 2012; Price 2009), yet comparatively few scholars have critiqued how digital humanities overrepresents perhaps the most canonical figure in all of English literature: Shakespeare. “Shakespeare and Digital Humanities” has been and continues to be a fruitful area of research, with special issues of Shakespeare (Galey and Siemens 2008), the Shakespearean International Yearbook (Hirsch and Craig 2014), RiDE: Research in Drama Education (Bell et al., forthcoming), and this issue of Humanities. The prevalence of digital humanities tools in Shakespeare teaching and research leads Carson and Kirwan to wonder, “are all Shakespeares digital now?” (Carson and Kirwan 2014a, p. 240). The questions less often asked are: when we focus on Shakespeare(s) in our digital projects, what is excluded by our Shakespeare-centrism? And how does that shape how we access and understand early modern drama? Digital Shakespeare studies often focuses on Shakespeare’s place in the digital world, without questioning why he is given such primacy or considering the ramifications of his continued canonization.
Another way digital humanities has been announced to recover noncanonical writers is by projects that digitize on a large scale. Julia Flanders (2009) explains: It is now easier, in some contexts, to digitize an entire library collection than to pick through and choose what should be included and what should not: in other words, storage is cheaper than decision-making. The result is that the rare, the lesser known, the overlooked, the neglected, and the downright excluded are now likely to make their way into digital library collections, even if only by accident. Indeed, it is the decision-making where Shakespeare too often gets pulled artificially to the fore: sometimes even in the foundational decisions about project scope. The next section of the essay explores how single authors are represented in small-scale digital resources versus large-scale digital resources, thinking about them in terms of labor, funding, and project scope. 2. The Shakespeare/Not Shakespeare Divide in Digital Humanities Resources There is a lopsidedness to early modern online resources: some, such as the English Short Title Catalog (ESTC; British Library 2006) and the Database of Early English Playbooks (DEEP; Lesser and Farmer 2007) deliver breadth of coverage that is, due to their large scope, necessarily shallow; others, such as The Shakespeare Quartos Archive (Bodleian Library 2009) or MIT’s Global Shakespeares (Donaldson 2009), provide deep coverage of a much narrower topic. Both approaches are needed to support different avenues of early modern scholarship, but, the latter, I contend, too often begins and ends with Shakespeare. The logistical reasons for these very different kinds of projects (broad coverage versus deep coverage) are readily apparent. The notion of “Shakespeare” offers a convenient scope and bounds for a given project. Many projects that include detailed metadata, extensive editorial annotation or encoding, expensive-to-create facsimiles, or streaming media center on the work of a single author. The Pulter Project (Knight and Wall 2018), for instance, is an example of a new project that focuses on a single author, and, indeed, a single manuscript, in order to offer a hypertext edition with multiple layers of editorial intervention, linked related texts, and comparative viewing options. The Digital Cavendish Project (Moore and Tootalian 2013) offers a range of ways to interact with Margaret Cavendish’s life and texts: site visitors can explore Margaret Cavendish’s social network, search the bibliography-in-progress of Cavendish scholarship, and make use of reference works such as a list of Cavendish’s printers and booksellers and a spreadsheet locating all known copies of Cavendish’s early publications. We can imagine extending these projects by adding another analysis section, another manuscript, or even another individual author. However, to extend these projects by any order of magnitude, by say, covering all seventeenth-century women writers or all previously unpublished Humanities 2019, 8, 45 4 of 16 manuscript poetry would be to undertake significant amounts of labor and would require both time and money. These single-author projects are the fruits of detailed scholarly attention: they are “boutique” digital projects. In their discussion of archival practices, Mark A. Greene and Dennis Meissner position “boutique digitization” at the far end of the continuum from “‘Googlization’ (ultra-mass digitization)” (Greene and Meissner 2005, p. 196). 
The former, boutique projects, require “extraordinary attention to the unique properties of each artifact” (Conway 2010, p. 76). While Greene, Meissner, and Conway focus on archival digitization projects, the continuum also applies to digital humanities projects, many of which include digitized elements alongside other interventions: transcriptions, editorial apparatus, bibliographic resources, and so forth. The Shakespeare Quartos Archive (Bodleian Library 2009) is an example of extraordinary attention to primary sources: the site’s goal is to “reproduce at least one copy of every edition of William Shakespeare’s plays printed in quarto before the theatres closed in 1642.” Where possible, however, they include digitizations of as many copies of each Shakespeare quartos as possible. Their prototype offers thirty-two quartos of Hamlet (from Q1–Q5), carefully digitized and painstakingly encoded.1 With their attention to primary sources, the Shakespeare Quartos Archive project argues that scholars must pay attention to copy-specific details. The Shakespeare Quartos Archive text encoding highlights different marginalia in each copy, the binding, and even the library ownership stamps.2 While the Shakespeare Quartos Archive can be used as an exemplar of a “boutique” project, it is not the labor of a single scholar. This project emerged from the collaboration of multiple major institutions, including, most notably, the Bodleian Library of the University of Oxford, the British Library, the University of Edinburgh Library, the Folger Shakespeare Library, the Huntington Library, and the National Library of Scotland. The project was made possible by major grant funding from the United States’s National Endowment for the Humanities (NEH) and the United Kingdom’s Joint Information Systems Committee (JISC). The well-supported Shakespeare Quartos Archive raises another reason for author-centric approaches, namely, existing funding models. As Jamie “Skye” Bianco (2012) explains, “digital humanities is directly linked to the institutional funding that privileges canonical literary and historiographic objects and narratives” (see also Price 2009). In her review, Desmet unpacks the project’s “rationale for a focus on Shakespeare’s quartos” (Desmet 2014, p. 143): the rarity and fragility of the material objects; their locations in libraries around the world; and the lack of Shakespearean manuscript texts. This rationale, while a compelling argument for why we need to digitize and encode all early modern play quartos, hardly touches on why Shakespeare is the focus of the project. We lack authorial manuscripts of many plays by many playwrights. The Shakespearean focus of the Shakespeare Quartos Archive is taken for granted. It is hard to imagine the Ford Quartos Archive receiving much enthusiasm from funders, despite the fact that John Ford’s plays are still edited, anthologized, taught, and performed today. There are many ongoing editorial projects focused on individual early modern playwrights, such as Oxford University Press’s The Complete Works of John Marston (Butler and Steggle, forthcoming); yet to imagine digitizing and encoding all known early printings of Marston’s work for a Marston Quartos Archives seems far-fetched, and the notion of turning to even less canonical playwright—say, the Glapthorne Quartos Archive—hardly bears thinking about. Shakespeare sells. Shakespeare’s name is itself a valuable commodity (Hodgdon 1998; McLuskie and Rumbold 2014; Olive 2015). 
Digital project 1 Just as Digital Humanities has a Shakespeare problem, Shakespeare studies has a Hamlet problem, although the prominence of Hamlet in Shakespeare studies, both digital and otherwise, is a topic for another essay. For evidence of Hamlet’s prominence, see Bernice W. Kliman et al.’s HamletWorks (Kliman et al. 2004) and Estill, Klyve, and Bridal (Estill et al. 2015). 2 The Shakespeare Quartos Archive uses the Text Encoding Initiative (TEI) for their XML (eXtensible Markup Language), which includes elements such as Belinda Barnet is Lecturer in Media and Communications at Swinburne University, Melbourne. Prior to her appointment at Swinburne she worked at Ericsson Australia, where she managed the development of 3G mobile content services and developed an obsession with technical evolution. Belinda did her PhD on the history of hypertext at the University of New South Wales, and has research interests in digital media, digital art, convergent journalism and the mobile internet. She has published widely on new media theory and culture. Authored for DHQ; migrated from original DHQauthor format This article describes the evolution of the design of Vannevar Bush's Memex, tracing its roots in Bush's earlier work with analog computing machines, and his understanding of the technique of associative memory. It argues that Memex was the product of a particular engineering culture, and that the machines that preceded Memex — the Differential Analyzer and the Selector in particular — helped engender this culture, and the discourse of analogue computing itself. Can we say that technical machines have their own genealogies, their own evolutionary dynamic? Since the early days of Darwinism, analogies have been drawn between biological evolution and the evolution of technical objects and systems. It is obvious that technologies change over time; we can see this in the fact that technologies come in generations; they adapt and adopt characteristics over time, Inventors learn by experience and experiment, and they learn by watching other machines work in the form of technical prototypes. They also copy and Can we say that technical machines have their own genealogies, their own evolutionary dynamic? It is my contention that we can, and I have argued elsewhere that in order to tell the story of a machine, one must trace the path of these transferrals, paying particular attention to technical In this article I will be telling the story of particular technical machine – Vannevar Bush’s Memex. Memex was an electro-mechanical device designed in the 1930’s to provide easy access to information stored associatively on microfilm. It is often hailed as the precursor to hypertext and the web. Linda C. Smith undertook a comprehensive citation context analysis of literary and scientific articles produced after the 1945 publication of Bush's article on the device, The social and cultural influence of Bush’s inventions are well known, and his political role in the development of the atomic bomb are also well known. What is not so well known is the way the Memex came about as a result of both Bush’s earlier work with analog computing machines, and his understanding of the Bush transferred technologies directly from the Analyzer and also the Selector into the design of Memex. I will trace this transfer in the first section. He also transferred an electro-mechanical model of human associative memory from the nascent science of cybernetics, which he was exposed to at MIT, into Memex. 
We will explore this in the second section. In both cases, we will be paying particular attention to the structure and architecture of the technologies concerned. The idea that technical artefacts evolve in this way, by the transfer of both technical innovations (for example, microfilm) and techniques (for example, association as a storage technique), was popularised by the French technology historian Bertrand Gille. I will be mobilising Gille’s theories here as I trace the evolution of the Memex design.

We will begin with Bush’s first analogue computer, the Differential Analyzer. The Differential Analyzer was a giant, electromechanical gear-and-shaft machine which was put to work during the war calculating artillery ranging tables and the profiles of radar antennas. In the late 1930s and early 1940s, it was among the most important computing machines in the world, and copies of it were built at research centres in the US and abroad. However, by the spring of 1950, the Analyzer was gathering dust in a storeroom — the project had died. Why did it fail? Why did the world’s most important analogue computer end up in a back room within five years? This story will itself be related to why Memex was never built; research into analogue computing technology in the interwar years, and the Analyzer in particular, contributed to the rise of digital computing: it demonstrated that machines could automate higher mathematics, and it trained the generation of engineers who would go on to build digital machines.

The decade between the Great War and the Depression was a bull market for engineering research. Of particular interest to the engineers was the Carson equation for transmission lines. This was a simple equation, but it required intensive mathematical integration to solve. So the equation was transferred to an electro-mechanical device: the Product Integraph. Many of the early analogue computers that followed Bush’s machines were designed to automate existing mathematical equations. This particular machine physically mirrored the equation itself. It incorporated the use of a mechanical integrator, a wheel-and-disc mechanism whose rotation traced the integral. A second version of this machine incorporated two wheel-and-disc integrators, and it was a great success. Bush observed the success of the machine, and particularly the later incorporation of the two wheel-and-disc integrators, and decided to make a larger one, with more integrators and a more general application than the Carson equation. By the fall of 1928, Bush had secured funds from MIT to build a new machine. He called it the Differential Analyzer, after an earlier device proposed by Lord Kelvin which might externalise the calculus and solve differential equations mechanically.

As Bertrand Gille observes, a large part of technical invention occurs by transfer, whereby the functioning of a structure is analogically transposed onto another structure, or the same structure is generalised outwards to new applications. In engineering science, there is an emphasis on working prototypes, on machines that demonstrably work. Watching the Analyzer work did more than just teach people about the calculus. It also taught people about what might be possible for mechanical calculation. Its technical processes remained the same. It was an analogue device, and it literally turned around a central analogy: the rotation of the wheel shall be the area under the graph (and thus the integral). The Analyzer directly mirrored the task at hand; there was a mathematical transparency to it which at once held observers captive and promoted, in its very workings, the analogue ethos: the machine as a mirror of its mathematical task.
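That central analogy can be mimicked numerically. In the physical machine, a small wheel rests on a rotating disc at a distance y(x) from the disc’s centre, so each small turn of the disc advances the wheel in proportion to y; the accumulated rotation is the integral. The toy simulation below is my own illustration of that behaviour, not a reconstruction of Bush’s engineering:

```python
import math

def wheel_and_disc(y, x0, x1, dx=1e-4):
    """Accumulate wheel rotation as the disc turns: rotation ~ integral of y dx.

    y      -- function giving the integrand (the wheel's offset from disc centre)
    x0, x1 -- range of the independent variable (the disc's rotation)
    """
    rotation = 0.0
    x = x0
    while x < x1:
        rotation += y(x) * dx   # each small disc turn advances the wheel by y*dx
        x += dx
    return rotation

# Integrating cos from 0 to pi/2 should turn the wheel by ~1.0, i.e. sin(pi/2).
print(wheel_and_disc(math.cos, 0.0, math.pi / 2))  # ~0.99997
```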
Many of the Analyzers built in the 1930s were built using military funds. The creation of the first Analyzer, and Bush’s growing reputation as a builder of calculating machines, brought him to the attention of the military. This paper has been arguing that the Analyzer helped engender a particular engineering culture, and the discourse of analogue computing itself. In 1935, the Navy came to Bush for advice on machines to crack coding devices like the new Japanese cipher machines. There were three new technologies emerging at the time which handled information: photoelectricity, microfilm and digital electronics. Bush transferred these three technologies to the new design. This decision was not pure genius on his part; they were perfect analogues for a popular conception of how the brain worked at the time. The scientific community at MIT were developing a pronounced interest in man-machine analogues, and although Claude Shannon had not yet published his information theory, it was already being formulated, and there was much discussion around MIT about how the brain might process information in the manner of an analogue machine. Bush thought and designed in terms of analogies between brain and machine, electricity and information. This was also the central research agenda of Norbert Wiener and Warren McCulloch, both at MIT, who were at the time laying the foundations of what would become cybernetics.

Bush called this machine the Comparator — it was to do the hard work of comparing text and letters for the humble human mind. Like the analytic machines before it and all other technical machines being built at the time, this was an analogue device; it directly mirrored the task at hand on a mechanical level. In this case, it directly mirrored the operations of a human clerk comparing strings of characters. But immediately, there were problems in its development. Technical objects often depart from their fabricating intention: sometimes because they are put to unforeseen uses, and sometimes because the technology itself imposes its own limits. By this time, Bush had also started work on the Memex design. He transferred much of the architecture from the Comparator, including photoelectrical components, an optical reader and microfilm. In tune with the times, Bush had developed a fascination for microfilm in particular as an information storage technology, and although it had failed to work properly in the Comparator, he wanted to try it again. It would appear as the central technology in the Rapid Selector and also in the Memex design. In the 1930s, many believed that microfilm would make information universally accessible and thus spark an intellectual revolution. Microfilm promised vast amounts of storage in a very small space.

Corporate funding was secured for the Selector by pitching it as a microfilm machine to modernise the library. Bush considered the Selector as a step towards the mechanised control of scientific information, which was of immediate concern to him as a scientist. According to him, the fate of the nation depended on the effective management of these ideas lest they be lost in a brewing data storm. Progress in information management was not only inevitable, it was essential. But as Burke writes, the technology of microfilm and the tape-scanners began to impose their technical limitations. The Selector’s scanning station was similar to that used in the Comparator. But in the Selector, the card containing the code of interest to the researcher would be stationary. Bush and others associated with the project found that the machine could not reliably make its selections at the speeds intended. Solutions were suggested (among them slowing down the machine, and checking abstracts before they were used). In the evolution of any machine, there will be internal limits generated by the behaviour of the technology itself; Gille calls these endogenous limits.

The Analyzer, meanwhile, was being used extensively during WWII for ballistic analysis and calculation. Wartime security prevented its public announcement until 1945, when it was hailed by the press as a great mechanical brain. What happened?
The reasons the Analyzer fell into disuse were quite different to the Selector's; its limits were not internal to the technology but exogenous. The war had "ushered in a variety of new computation tasks, in the field of large-volume data analysis and real-time operation, which were beyond the capacity of the Rockefeller instrument". I do not have the space here to trace the evolution of digital computing at this time in the US and the UK; excellent accounts have already been written elsewhere. It is important to understand, however, that Bush was not a part of this revolution. He had not been trained in digital computation or information theory, and knew little about the emerging field of digital computing. As he later admitted: "The trend had turned in the direction of digital machines, a whole new generation had taken hold. If I mixed with it, I could not possibly catch up with new techniques, and I did not intend to look foolish."

He was immersed in a different technical system: analogue machines interpreted mathematics in terms of mechanical rotations, treated storage and memory as a physical "holding" of information, and drew their answers as curves. They directly mirrored the operations of the calculus. Warren Weaver expressed his regret over the passing of analogue machines and the Analyzer in a letter to the director of MIT's Center of Analysis: "It seems rather a pity not to have around such a place as MIT a really impressive Analogue computer; for there is a vividness and directness of meaning of the electrical and mechanical processes involved ... which can hardly fail, I would think, to have a very considerable educational value." (Weaver, cited in Owens 1991, 5) Neither man had expected that computer science would so quickly overtake Bush's project (Weaver and Caldwell, cited in Owens 1991).

The passing away of analogue computing was the passing away of an ethos: machines as mirrors of mathematical tasks. But Bush and Memex remained in the analogue era; in all versions of the Memex essay, his goal remained the same: he sought to develop a machine that mirrored and recorded the patterns of the human brain. Technological evolution moves faster than our ability to adjust to its changes; more precisely, it moves faster than the discourses and habits of thought that grow up around particular machines.

We now turn to Bush's fascination with, and exposure to, new models of human associative memory gaining currency in his time. Bush thought and designed his machines in terms of biological-mechanical analogues; he sought a symbiosis between "natural" human thought and his thinking machines. "There is another revolution under way," he wrote, "and it is far more important and significant than [the industrial revolution]. It might be called the mental revolution."

As Nyce and Kahn observe, in all versions of the Memex essay (1939, 1945, 1967), Bush begins his thesis by explaining the dire problem we face in confronting the great mass of the human record, criticising the way information was then organised: "Our ineptitude at getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can only be found in one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path. The human mind does not work that way. It operates by association. With one item in grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain." (Bush 1939, 1945, 1967)

These paragraphs were important enough that they appeared verbatim in all versions of the Memex essay — 1939, 1945 and 1967. Across the versions, the machine grew more "intelligent" and the other machines (the Cyclops Camera, the Vocoder) disappeared; these paragraphs, however, remain a constant. Given this fact, Nelson's assertion that the major concern of the essay was to point out the artificiality of systems of indexing, and to propose the associative mechanism as a solution for it, should not be "ignored" by commentators. Association was more "natural" than other forms of indexing — more human. This is why it was revolutionary.

Which is interesting, because Bush's model of mental association was itself technological. The mind "snapped" between allied items, an unconscious movement directed by the trails themselves, trails of brain or of machine; Bush marvelled at the "speed of action" in the retrieval process from neuron to neuron: "my brain runs rapidly — so rapidly I do not fully recognize that the process is going on." Bush's model of human associative memory was an electro-mechanical one — a model that was being keenly developed by Claude Shannon, Warren McCulloch and Walter Pitts at MIT, and would result in the McCulloch-Pitts neuron: the neuron as a "mechanical switching" element (a term omitted from the later versions). Bush himself held that "a great deal of our brain cell activity is closely parallel to the operation of relay circuits", and that one can explore this parallelism "…almost indefinitely" (November 6, 1944; cited in Nyce and Kahn 1991, 62).

In the 1930s and 1940s, the popular scientific conception of mind and memory was a mechanical one. An object or experience was perceived, transferred to the memory-library's receiving station, and then "installed in the memory-library for all future reference" as memories, sheets of data. Memory "stored" information across the neural substrate, in some instances creating further connections, via minute electrical vibrations. According to Bush, memories that were not accessed regularly suffered from this neglect by the conscious mind and were prone to fade; the pathways of the brain, its indexing system, needed constant electrical stimulation to remain strong. This was the problem with the neural network: items are not fully permanent, memory is transitory.

According to Manuel De Landa, there was also a widespread faith in biological-mechanical analogues at the time as models to boost human functions. The military had been attempting to develop technologies which mimicked and subsequently replaced human faculties for many years. This reciprocal modelling took "the image of the machine as the basis for the understanding of man", and vice versa, writes Harold Hatt in his book on cybernetics; scientists "looked for and worked from parallels they saw between neural structure and process and computation". Bush explicitly worked with such methodologies — in fact, he not only thought with and in these terms, he built technological projects with them. The first step was locating the "process" or nature of thought itself; the second step was transferring this process to a machine. So there is a double movement within Bush's work: the location of a "natural" human process within thought, a process which is already machine-like, and the subsequent refinement and modelling of a particular technology on that process. Technology should depart from nature, it should depart from an extant human process: this saves us so much work. If this is done properly, "[it] should be possible to beat the mind decisively in the permanence and clarity of the items resurrected from storage". The machine would take over the "pick-and-shovel" work of the human mind: "The future means of implementing thought are … fully as worthy of attention by one who wonders what comes next as are new ways of extracting natural resources, or of killing men."

So Memex was first and foremost an extension of human memory and the associative movements that the mind makes through information: a mechanical analogue to an already mechanical model of memory.
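The McCulloch-Pitts neuron mentioned above is easy to state in code. The sketch below is a generic illustration of the published 1943 model (binary inputs, a firing threshold, and absolute inhibition), not of anything Bush built; the function names are invented for the example.

```python
def mp_neuron(excitatory, inhibitory, threshold):
    """A McCulloch-Pitts unit: fires (returns 1) when enough excitatory
    inputs are active and no inhibitory input is active. In the 1943
    model, inhibition is absolute: one inhibitory signal blocks firing.
    """
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# Two-input logic gates, in the spirit of McCulloch and Pitts' examples.
AND = lambda a, b: mp_neuron([a, b], [], threshold=2)
OR = lambda a, b: mp_neuron([a, b], [], threshold=1)
print(AND(1, 1), AND(1, 0), OR(1, 0))  # 1 0 1
```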
Bush transferred this idea into information management: Memex was distinct from traditional forms of indexing not so much in its mechanism or content, but in the way it organised information based on association. The design did not spring from the ether, however; the first Memex design incorporates the technical architecture of the Rapid Selector and the methodology of the Analyzer — the machines Bush was assembling at the time. In his autobiography, Bush recalled the Comparator as taking over the "drudge" work of cryptography, and he rightly saw it as the first electronic data-processing machine. Nyce and Kahn maintain that Bush took this methodology from the Rapid Selector: "The Memex-like machine proposed in Bush's 1937 memo to Weaver shows just how much [the Selector] and the Memex have in common. In the rapid selector, low-level mechanisms for transporting 35mm film, photo-sensors to detect dot patterns, and precise timing mechanisms combined to support the high-order task of information selection. In Memex, photo-optic selection devices, keyboard controls, and dry photography would be combined … to support the process of the human mind."

The idea of creating a machine to aid the mind did not belong to Bush, nor did the technique of integral calculus (or association, for that matter); he was, however, arguably the first person to externalise this technology on a grand scale. Observing the success of the Analyzer, he selected from the existing technologies of the time and made a case for how they should develop in the future. The difference, of course, was that Bush's proposed Memex would access information stored on microfilm by association rather than by conventional indexing. As Professor of Engineering at MIT (and, after 1939, President of the Carnegie Institute in Washington), Bush was in a unique position: he had access to a pool of ideas, techniques and technologies which the general public, and engineers at other, smaller schools, did not. Bush had a more "global" view of the combinatory possibilities and the technological lineage. Bush himself admitted this; in fact, he believed that engineers and scientists were the only people who could properly judge such possibilities. The public showed "so little true discrimination", being "wont to visualize scientific triumphs as" finished achievements "before they are even ready, even as they are being hatched in the laboratory"; only the engineer could distinguish the workable from the wishful.

Memex was a future technology. It was originally proposed as a desk at which the user could sit, equipped with two "slanting translucent screens" upon which material would be projected for convenient reading, and a "set of buttons and levers" which the user could depress to search the information using an electrically-powered optical recognition system. If the user wished to consult a certain piece of information, "he [tapped] its code on the keyboard, and the title page of the book promptly appear[ed]". "The matter of bulk [was] well taken care of" by this technology: "only a small part of the interior is devoted to storage, the rest to mechanism". It looked like an "ordinary" desk, except that it had screens and a keyboard attached to it. To add new information to the microfilm file, a photographic copying plate was also provided on the desk, but most of the Memex contents would be "purchased on microfilm ready for insertion".

The 1945 Memex design also introduced the concept of "trails", a concept derived from contemporary work in neuronal storage-retrieval networks: a method of connecting information by linking units together in a networked manner, similar to hypertext paths. The process of making trails was called "trailblazing", and was based on a mechanical provision "whereby any item may be caused at will to select immediately and automatically another". Items could thus be "gathered together from widely separated sources and bound together to form a new book". "This is the essential feature of the Memex. The process of tying two items together is the important thing."
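As a reading aid, trails can be sketched as a tiny data structure: named, ordered sequences of links over stored items. This is a loose illustration of the concept only, not a reconstruction of the microfilm mechanism; the class and the example items (echoing the bow-and-arrow scenario from Bush's 1945 essay) are invented here.

```python
class Memex:
    """Toy sketch of Bush's 'trails': named, ordered sequences of
    links tying stored items together in the order they were joined."""

    def __init__(self):
        self.items = {}   # item code -> document text
        self.trails = {}  # trail name -> ordered list of item codes

    def add_item(self, code, text):
        self.items[code] = text

    def tie(self, trail, code):
        """Trailblazing: append an item to a trail, so that one item
        selects 'immediately and automatically another'."""
        self.trails.setdefault(trail, []).append(code)

    def follow(self, trail):
        """Replay a trail, as if its items were 'bound together to
        form a new book'."""
        return [self.items[c] for c in self.trails.get(trail, [])]

mx = Memex()
mx.add_item("T1", "Treatise on the Turkish bow")
mx.add_item("E1", "Article on the elasticity of materials")
mx.tie("bow-and-arrow", "T1")
mx.tie("bow-and-arrow", "E1")
print(mx.follow("bow-and-arrow"))
```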
The Memex described in the later essays went further still: it would build and strengthen trails of association even when the owner was not there. In modern terminology, such a machine is called an intelligent "agent", a concept we shall discuss later in this work; technology has not yet reached Bush's vision for adaptive associative indexing. Bush pointed to Shannon's "mechanical mouse" as just such a self-adaptable machine: "A striking form of self adaptable machine is Shannon's mechanical mouse. Placed in a maze it runs along, butts its head into a wall, turns and tries again, and eventually muddles its way through. But, placed again at the entrance, it proceeds through without error making all the right turns." This was in line with Bush's conception of technical machines as mechanical teachers in their own right. It was a proposal of an active symbiosis between machine and human memory. In our interview, Engelbart claimed it was Bush's concept of a "co-evolution" between humans and machines, and also his conception of our human augmentation system, which inspired him.

In this vision, the machine "remolds" the human mind; it remolds the trails of the user's brain "as one lives and works in close interconnection with a machine": "For the trails of the machine become duplicated in the brain of the user, vaguely as all human memory is vague, but with a concomitant emphasis by repetition, creation and discard … as the cells of the brain become realigned and reconnected, better to utilize the massive explicit memory which is its servant."

Paradoxically, Bush also retreats from this close alignment of memory and machine. In the later essays, he felt the need to demarcate a purely human realm of thought from technics, a realm uncontaminated by technics. The machine "can touch those subtle processes of mind, its logical and rational processes", and alter them; but the logical and rational processes which the machine connected with were human. One of the major themes in these essays is the division of labour between mind and machine: "Two mental processes the machine can do well: first, memory storage and recollection, and this is the primary function of the Memex; and second, logical reasoning, which is the function of the computing and analytical machines." Machines can remember better than human beings can — their trails do not fade, their logic is never flawed. Both of the "mental processes" Bush locates above take place within human thought; they are forms of internal, "repetitive" thought. "Creativity" he reserved as the realm of thought that exists beyond technology: "How far can the machine accompany and aid its master along this path? Certainly to the point at which the master becomes an artist, reaching into the unknown with beauty and versatility, erecting on the mundane thought processes a thing of beauty … this region will always be barred to the machine."

Bush had always been obsessed with memory and technics, as we have explored. But near the end of his career, when machine memory promised to outlast and outperform its human original, he needed a "boundary" between them, between what is personal and belongs to the human alone, and what can be handed over to the machine.

In all versions of the Memex essay, the machine was to serve as a personal memory support. It was not a public archive: users would build their own collections, purchasing items on microfilm or dropping them into their archive via an electro-optical scanning device. In the later adaptive Memex, these trails fade out if not used, and if much in use, the trails become emphasized.
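That last mechanism (reinforcement through use, fading through neglect) can be mimicked in a few lines. The sketch below is a free interpretation of the adaptive Memex; the decay constants are arbitrary, and nothing here corresponds to an actual specification by Bush.

```python
class AdaptiveTrail:
    """Sketch of the later, adaptive Memex: each link carries a weight
    that decays over time and is reinforced every time it is used."""

    DECAY = 0.9       # per time step, unused links fade
    REINFORCE = 1.0   # boost added each time a link is followed
    FORGOTTEN = 0.05  # below this weight, a link effectively vanishes

    def __init__(self):
        self.weights = {}  # (from_item, to_item) -> strength

    def follow(self, src, dst):
        """Using a link emphasizes it."""
        key = (src, dst)
        self.weights[key] = self.weights.get(key, 0.0) + self.REINFORCE

    def tick(self):
        """One step of forgetting: every link fades; faint ones vanish."""
        self.weights = {k: w * self.DECAY
                        for k, w in self.weights.items()
                        if w * self.DECAY >= self.FORGOTTEN}

trail = AdaptiveTrail()
trail.follow("bows", "elasticity")
for _ in range(30):  # a month of neglect
    trail.tick()
print(trail.weights)  # the unused link has faded away: {}
```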
Current hypertext technologies are not quite so private, and tend to emphasise systems which are public rather than personal in nature and that privilege the static record over adaptivity. Bush's concept of a "personal" machine to amplify the mind also flew in the face of the emerging paradigm of human-computer interaction that reached its peak in the late 1950s and early 1960s, which held computers to be rarefied calculating machines used only by qualified technicians in white lab coats, in air-conditioned rooms, at many degrees of separation from the user. "After the summer of 1946," writes Ceruzzi, "computing's path, in theory at least, was clear"; this paradigm was so entrenched that the very idea of a free interaction between users and machines as envisioned by Bush was viewed with hostility by the academic community.

In all versions of the essay, Memex remained profoundly uninfluenced by the paradigm of digital computing. As we have explored, Bush transferred the concept of machine learning from Shannon, but not information theory; he transferred neural and memory models from the cybernetic community, but not digital computation. The analogue computing discourse Bush and Memex created never "mixed" with digital computing: "While the pioneers of digital computing understood that machines would soon accelerate human capabilities by doing massive calculations, Bush continued to be occupied with extending, through replication, human mental experience."

Consequently, the Memex redesigns responded to the advances of the day quite differently to how others were responding at the time. By 1967, for example, great advances had been made in digital memory techniques. As far back as 1951, the Eckert-Mauchly division of Remington Rand had turned over the first commercially produced stored-program digital computer, the UNIVAC, to the US Census Bureau. Delay lines stored 1,000 words as acoustic pulses in tubes of mercury, and reels of magnetic tape which stored invisible bits were used for bulk memory. This was electronic digital technology, and it did not mirror or seek to mirror "natural" processes in any way. It steadily replaced the most popular form of electro-mechanical memory from the late 1940s and early 1950s: drum memory, a large metal cylinder which rotated rapidly beneath a mechanical head, where information was written across the surface magnetically.

Bush, however, remained enamoured of physical recording and inscription. His 1959 essay proposes using organic crystals to record data by means of phase changes in molecular alignment: "[I]n Memex II, when a code on one item points to a second, the first part of the code will pick out a crystal, the next part the level in this, and the remainder the individual item." "The brain does not operate by reducing everything to indices and computation," Bush wrote.

Memex became an image of potentiality for Bush near the end of his life. In the later essays, he writes in a different tone entirely: Memex was an image he would bequeath to the future, a gift to the human race. The 1967 essay takes "a long look ahead"; "the time has come to try it again": "No memex could have been built when that article appeared. In the quarter-century since then, the idea has been with me almost constantly, and I have watched new developments in electronics, physics, chemistry and logic to see how they might help bring it to reality." That "day has come far closer" "in the interval since that paper" first appeared.

For most of his professional life, Bush had been concerned with augmenting human memory and preserving information that might be lost to human beings. He had occasionally written about this project as a larger idea which would boost "the entire process by which man profits by his inheritance of acquired knowledge". Near the end of his life, he thought of Memex as more than just an aid to individual memory; the "ultimate [machine] is far more subtle than this". What it promised was "immortality in a machine", a "longevity" of thought over the individual human mind: "Can a son inherit the memex of his father, refined and polished over the years, and go on from there? In this way can we avoid some of the loss which comes when oxygen is no longer furnished to the brain of the great thinker, when all the patterns of neurons so painstakingly refined become merely a mass of protein and nucleic acid? Can the race thus develop leaders, of such power and intellect, and such forces of conviction, that the world can be saved from its follies? This is an objective of far greater importance than the conquest of disease, even than the conquest of mental aberrations."

Bush died on June 30, 1974. The image of Memex has been passed on beyond his death, and it continues to inspire a host of new machines and technical instrumentalities. But Memex itself has never been built; it exists only on paper, in technical interpretation and in memory. All we have of Memex are the words Bush assembled around it in his lifetime and the drawings created by the artists who illustrated the essays. Had it been built, the "use function" of the machine would itself have changed as it demonstrated its own potentials; the object would have invented itself independently of the outlines Bush cast on paper. This never happened. Memex has instead entered the intellectual capital of new media as an image of potentiality.
the phrase in witness 1 is contained by one element, while the phrase in witness 2 is contained by two elements. However, structural variation does not only occur across documents: when an author indicates the start of a new chapter or paragraph by inserting a metamark of some sort, this is arguably a form of structural intradocumentary variation.

To summarise, we can distinguish different forms of textual variance. Variation can occur on the level of the text characters (linguistic or semantic variation) and on the structure of the text (sentences, paragraphs, etc.). Furthermore, we distinguish between intradocumentary variation (within one witness) and interdocumentary variation (across witnesses). Arguably all forms are relevant for textual scholarship, but taking them into account when processing and comparing texts has both technical and conceptual consequences. These consequences have been discussed extensively elsewhere (Bleeker et al. 2018) and will be briefly repeated in section 5 below. The main goal of the present article is to focus on the question of visualisation. Assuming we have a software program that compares texts in great detail, including structural information and in-witness revisions, how can we best visualise its output? First and foremost, the additional information (structural and linguistic, intradocumentary and interdocumentary) needs to be visualised in an understandable way. The visualisations can be useful for a wide range of research objectives, such as (1) finding a change in markup indicating structural revision like sentence division, (2) presenting the different paths through one witness and the possible matches between tokens from any path, (3) examining complex revisions, like a deletion within a deletion within an addition, and (4) studying patterns of revision. This raises the question: is it even possible or desirable to decide on one visualisation? Is there one ultimate visualisation that reflects the dynamic, temporal nature of the textual object(s) by demonstrating both structural and linguistic variation on an intradocumentary and interdocumentary level? The existing field of Information Visualisation can certainly offer inspiration, but simply adopting its methods and techniques will not suffice, since it deals primarily with objects which are "self-identical, self-evident, ahistorical, and autonomous" (Drucker 2012), adjectives which could hardly be applied to literary texts.

4 Existing Visualisations of collation results

Let us consider the various existing visualisations of collation output and explore to what extent they address the conditions outlined above. We can distinguish roughly five types of visualisation: alignment tables, parallel segmentation, synoptic viewers, variant graphs, and phylogenetic trees or "stemmata". A small example, the collation of two fragments from Woolf's A Sketch of the Past (holograph MS-VW-SoP and typescript TS1-VW-SoP), serves to illustrate the effect of the visualisations:

Witness 1 (MS-VW-SoP): with the boat train arriving, people talking loudly, chains being dropped, and the screws the beginning, and the steamer suddenly hooting

Witness 2 (TS1-VW-SoP): with the boat train arriving; with people quarrelling outside the door; chains clanking; and the steamer giving those sudden stertorous snorts

These two small fragments are transcribed in plain text format and subsequently collated with the software program CollateX.
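For readers who want to reproduce this step, the collation can be run with the Python port of CollateX roughly as follows. This is a minimal sketch assuming the Python collatex package and its documented plain-witness API; option names may differ across versions.

```python
# pip install collatex  (the Python port of CollateX)
from collatex import Collation, collate

collation = Collation()
collation.add_plain_witness(
    "MS-VW-SoP",
    "with the boat train arriving, people talking loudly, chains being "
    "dropped, and the screws the beginning, and the steamer suddenly hooting")
collation.add_plain_witness(
    "TS1-VW-SoP",
    "with the boat train arriving; with people quarrelling outside the door; "
    "chains clanking; and the steamer giving those sudden stertorous snorts")

# Plain-text alignment table; segmentation=False keeps one token per cell.
table = collate(collation, layout="vertical", segmentation=False)
print(table)
```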
Unless indicated otherwise, the result of this collation forms the basis for the visualisation examples below.

4.1 Alignment table

An alignment table presents the text of the witnesses in linear sequence (either horizontally or vertically), making it well suited to studying the relationships between witnesses at a detailed level, but less so to acquiring an overview of patterns in revision. Note that "aligned tokens" are not necessarily the same as "matching tokens": two tokens may be placed above each other because they are at the same relative position between two matches, even though they do not constitute a match. For this reason, alignment tables often have additional markup (e.g. colours) to differentiate between matches and merely aligned tokens. The arrangement of the tokens is also one of the advantages of an alignment table: it shows at first glance the variation between tokens at the same relative position. In other words, this representation indicates tokens which match on a semantic level, such as synonyms or fragments with similar meanings, like "talking loudly" and "quarrelling outside the door" (Fig. 2). Ongoing research into the potential of an alignment table visualisation to explore intradocumentary variation (see Bleeker et al. 2017, visualisations created by Vincent Neyt) focuses on increasing the amount of information in an alignment table by incorporating intradocumentary variation in the cells. The alignment table in Fig. 3 shows that witness 1 (Wit1) contains several paths; matching tokens are displayed in red.

4.2 Synoptic viewers

A synoptic edition contains a visual representation of the collation results from the perspective of one witness, where the variants are indicated by means of a system of signs or diacritical marks. In contrast to an alignment table, a synoptic overview is better suited to examining overall patterns of variation. The following paragraphs discuss two ways of presenting textual variation synoptically: parallel segmentation and an inline apparatus. Both are skeuomorphic in character, in the sense that they mimic the analogue examination and presentation of textual variants. This characteristic should not necessarily be considered negative, however, precisely because it is a tried and tested instrument for textual research.

4.2.1 Parallel segmentation

The term "parallel segmentation" may be confusing, as it is also the name of the (TEI) encoding for a critical apparatus. In this context, parallel segmentation is used to describe the visualisation of textual variation in a side-by-side manner, often with the corresponding segments linked to one another. The quantity of online, open source tools for a parallel segmentation visualisation suggests that it is a popular way of studying textual variation (e.g. the Versioning Machine, the Edition Visualisation Technology (EVT) project, and the visualisation of Juxta Commons). As Fig. 4 shows, parallel segmentation entails presenting the witnesses as reading texts in separate panels which can be read vertically (per witness) or horizontally (interdocumentary variation across witnesses). Colours indicate the matching and non-matching segments. To be clear: this parallel segmentation visualisation concerns the presentation of variance; it is not a collation method in and of itself. The segments are encoded by the editor, for instance using the TEI app and rdg elements.
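Since the discussion above distinguishes matching from merely aligned tokens, it may help to see how collation output can be post-processed for presentation. The sketch below again assumes the Python collatex package and the JSON shape its documentation describes (a "table" holding one row of cells per witness); the match-marking logic is our own illustration, not a feature of CollateX itself.

```python
import json
from collatex import Collation, collate

collation = Collation()
collation.add_plain_witness("Wit1", "the steamer suddenly hooting")
collation.add_plain_witness("Wit2", "the steamer giving those sudden snorts")

# JSON output decouples collation from presentation: the same result can
# feed an alignment table, a synoptic view, or colour highlighting.
result = json.loads(collate(collation, output="json", segmentation=False))

for column in zip(*result["table"]):  # one column per aligned position
    texts = ["".join(tok["t"] for tok in cell).strip() if cell else "-"
             for cell in column]
    mark = "match" if len(set(texts)) == 1 and "-" not in texts else "variant"
    print(f"{mark:8} {' | '.join(texts)}")
```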