Published by: #EWAVirtual Conference Organisers, Maynooth University Arts and Humanities Institute
Edited by Sharon Healy, Michael Kurzmeier, Helena La Pina and Patricia Duffe
DOI: http://doi.org/10.5281/zenodo.4058013
This work is licensed under a Creative Commons Attribution 4.0 International License https://creativecommons.org/licenses/by/4.0/
Programme Committee:
Co-Chairs
Sharon Healy, PhD Candidate and IRC Scholar in Digital Humanities, Maynooth University
Michael Kurzmeier, PhD Candidate and IRC Scholar in Digital Humanities/Media Studies, Maynooth University
#EWAVirtual Coordinators
Rebecca O’Neill, MA Historical Archives, Maynooth University
Helena La Pina, MA Historical Archives, Maynooth University
Programme Coordinator
Maria Ryan, Web archivist at the National Library of Ireland (NLI Web Archive)
Treasurer
Dr Joseph Timoney, Head of Department of Computer Science, Maynooth University
PR/Outreach
Julian Carr, MA Geography (Urban Studies), Maynooth University
Committee
Dr Martin Maguire, History/Digital Humanities, Dundalk Institute of Technology
Dr Thomas Lysaght, Deputy Head of Department of Computer Science, Maynooth University
Gavin MacAllister, Historian in Residence, Irish Military War Museum
Bernadette McKevitt, MA International Peace Studies, Trinity College Dublin
Table of Contents
Introduction
Welcome from Sharon Healy and Michael Kurzmeier, Conference Co-Chairs
#EWAVirtual Keynotes
#EWAVirtual Programme
#EWAVirtual Abstracts
Session 1: Archiving Initiatives
Session 2: Collaborations
Session 3: Archiving Initiatives (Lightning Round)
Session 4: Research Engagement & Access
Session 5: Archiving Initiatives
Session 6: Social Science & Politics
Session 7: Collaborations & Teaching
Session 8: Research of Web Archives
Session 9: Research Approaches
Session 10: Culture & Sports
Session 11: Research (Lightning Round)
Session 12: Youth & Family
Session 13: Source Code and App Histories
Session 14: AI and Infrastructures
Session 15: WARC and OAIS
Session 16: Web Archives as Scholarly Dataset
Session 17: An Irish Tale / Scéal Éireannach
Introduction
Engaging with Web Archives: ‘Opportunities, Challenges and Potentialities’ (#EWAVirtual), 21-22 September 2020, Maynooth University Arts and Humanities Institute, Co. Kildare, Ireland.
Maynooth University Arts and Humanities Institute are delighted to be hosting the first international EWA conference, which aims to:
● Raise awareness of the use of web archives and the archived web for research and education across a broad range of disciplines and professions in the Arts, Humanities, Social Sciences, Political Science, Media Studies, Information Science, Computer Science and more;
● Foster collaborations between web archiving initiatives, researchers, educators and IT professionals;
● Highlight how the development of the internet and the web is intricately linked to the history of the 1990s.
What is Web Archiving?
Following the pioneering efforts of the Internet Archive in 1996, national libraries and cultural heritage organisations quickly realised the need to preserve information and content that was born on the web.
It was this awareness that gave rise to technologies, specifically web crawler programmes, used for web archiving. According to the International Internet Preservation Consortium, ‘Web archiving is the process of collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use.’ Due to serious concerns about the loss of web-born heritage, there has been a continuous growth of web archiving initiatives across the globe.
Why should we care?
Take Ireland as an example. The first connection to the Internet as we know it (via TCP/IP) went live in Trinity College Dublin in June 1991. The first web server and website in Ireland can be traced back to 1991/92 in University College Cork (CURIA project), and other websites followed in 1993 from IONA Technologies, TCD Maths, IEunet, and the University of Limerick. The growth of Irish websites was slow at first, but this changed by the end of 1995 due to international developments in browser technology and the growth of internet service providers in Ireland (see TechArchives, How the internet came to Ireland; David Malone, Early Irish Web Stuff).
https://help.archive.org/hc/en-us/categories/360000553851-The-Wayback-Machine http://netpreserve.org/web-archiving/ https://en.wikipedia.org/wiki/List_of_Web_archiving_initiatives https://curia.ucc.ie/oldmenu.html https://techarchives.irish/how-the-internet-came-to-ireland-1987-97/ https://www.maths.tcd.ie/~dwmalone/early-web.html
THERE ARE SIMILAR SCENARIOS AROUND THE WORLD
As researchers begin to negotiate and write the history of their countries for the 1990s, whether it is social, cultural, political or even economic, it seems inevitable that they will also need to consider their histories of IT – in terms of how the introduction of the internet and the WWW began to infiltrate the fabric of life, work and play.
The archived web is now an object of study in many countries, and a great deal of work has already been done to build research infrastructures and networks. But more needs to be done to promote awareness of the availability of web archives, and of how they can be used as research resources in the future. Certainly, much more needs to be done to incorporate web archives as resources in education, and to establish how their use can be taught. International literature using web archives for research and historical inquiry is growing; yet the question of how to use the archived web effectively for qualitative and quantitative research remains open, and how to integrate web archives into teaching is a path yet to be explored. Furthermore, existing web archiving efforts find it hard to exchange knowledge and take on larger projects, partly due to a lack of opportunities for exchange between disciplines and educators.
The EWA organisers would also like to extend their sincerest thanks and appreciation to the following organisations and institutions for their kind support and efforts to make this conference event possible:
● Maynooth University Arts and Humanities Institute
● Maynooth University, Department of Sociology
● Maynooth University, Department of Media Studies
● Maynooth University, Department of Computer Science
● Maynooth University, Department of History
● National Library of Ireland, Web Archive
● TechArchives, Ireland
● University College Cork, Digital Arts & Humanities
● University College Dublin, School of History
● AGREXIS AG
https://www.maynoothuniversity.ie/arts-and-humanities-institute https://www.maynoothuniversity.ie/sociology https://www.maynoothuniversity.ie/media-studies https://www.maynoothuniversity.ie/computer-science https://www.maynoothuniversity.ie/history https://www.nli.ie/en/web_archive.aspx https://techarchives.irish/ https://www.ucc.ie/en/dah/
https://www.ucd.ie/history/ https://www.agrexis.com/
If you require more information or have any questions, please feel free to email us: ewaconference@gmail.com
Follow us on Twitter:
● @EWAConf
● @MU_AHI
● #EWAVirtual
https://twitter.com/EWAConf https://twitter.com/MU_AHI
Welcome from Sharon Healy and Michael Kurzmeier
#EWAVirtual 2020 Conference Co-Chairs
On behalf of the organising committee of the first international Engaging with Web Archives conference, we would like to welcome all delegates to Maynooth University Arts and Humanities Institute for what we hope will be a stimulating event within the realms of engaging with web archives and web archiving activities. We are proud to announce that this is the first web archive conference of its kind ever to be held in Ireland, and the first virtual conference to be held at Maynooth University in 2020. The programme contains 35 paper presentations and 2 distinguished keynote speakers. We are delighted to extend a warm welcome to the two keynote speakers: Prof. Niels Brügger of Aarhus University, Denmark; and Prof. Jane Winters of the School of Advanced Study, University of London, UK. #EWAVirtual brings together speakers who are historians, digital humanists, media scholars, social scientists, information and IT professionals, computer scientists, data consultants, librarians and archivists from Ireland, the United Kingdom, Europe, Canada, and the United States. To all the speakers, we appreciate your kindness, support and patience when the initial conference, scheduled for spring 2020, was postponed, and your continued enthusiasm, cooperation and collaboration when we announced it would become a virtual event. We are also indebted to the Chairs of each session. Each one volunteered their services enthusiastically to ensure the smooth running of the conference. We are grateful for the tireless efforts of the organising committee.
Its dedication, from reviewing papers to handling the logistics of the original physical conference, and then finding the motivation and spirit to reorganise the event as a virtual conference, is greatly appreciated. To all at Maynooth University and the band of volunteers, we appreciate your time, talent, and storyboard of ideas. Without your support and dedication, this conference would not be possible. A special shoutout goes to Professor Thomas O’Connor and Ann Donoghue from Maynooth University Arts and Humanities Institute. Their unfailing support, advice and kind assistance were invaluable throughout the entire process of planning both EWA conferences (from the physical to the virtual). Also, to all our sponsors and supporters, we appreciate all your encouragement, sound advice and uplifting messages. In particular, we are grateful for the year-long encouragement and support of the committed staff at the National Library of Ireland. To all the speakers, guests, volunteers, chairs and attendees, we thank you. Together we have all played a part in the transformation of #EWA20 into #EWAVirtual.
All the Best
Sharon & Michael
#EWAVirtual KEYNOTES
Professor Niels Brügger
The variety of European web archives - potential effects for future humanities research
The aim of this keynote is to open up a discussion of how the great variety of European web archives may affect future humanities research based on the archived web as a source. The keynote is divided into two main sections. First, the different forms of web archiving in Europe are briefly mapped, with a focus on which countries have a web archive, their archiving strategies, and their access conditions. Second, it discusses how this state of affairs may affect transnational research projects that span several web archives.
The case of the national Danish web domain is used as a stepping stone to evaluate to what extent such a study can be replicated in other European countries, thus enabling transnational comparisons.
-------------------------------------------------------------------------------------------
Niels Brügger is a Professor in Media Studies, Head of NetLab, part of the Danish Digital Humanities Lab, and head of the Centre for Internet Studies at Aarhus University in Denmark. He is a Coordinator of the European network RESAW, a Research Infrastructure for the Study of Archived Web Materials, and the managing editor of the international journal Internet Histories: Digital Technology, Culture and Society. Professor Brügger has initiated the research projects “Probing a Nation’s Web Domain – the Historical Development of the Danish Web” (2014-) and “the history of dr.dk, 1996-2006” (2007-), and co-initiated the research infrastructure project NetLab (2012-17) within the Digital Humanities Lab. His research interests are the history of the Internet as a means of communication, and Digital Humanities, including archiving the Internet as well as the use of digital research tools. Other interests include media theory and the Internet, and the relation between the two, with a view to (re)evaluating the status and relevance of existing media theories and methods. Recent publications include:
● The Historical Web and Digital Humanities, eds. N. Brügger and D. Laursen (Routledge, 2019);
● The SAGE Handbook of Web History, eds. N. Brügger and I. Milligan (SAGE, 2019);
● The Archived Web: Doing History in the Digital Age (MIT Press, 2018);
● Web 25: Histories from the first 25 years of the World Wide Web, ed. Niels Brügger (New York: Peter Lang, 2017).
https://www.tandfonline.com/loi/rint20 https://www.routledge.com/The-Historical-Web-and-Digital-Humanities/Brugger-Laursen/p/book/9781138294318 https://uk.sagepub.com/en-gb/eur/the-sage-handbook-of-web-history/book252251 https://mitpress.mit.edu/books/archived-web https://www.amazon.com/Web-25-Histories-Digital-Formations/dp/1433132699
Professor Jane Winters
Web archives as sites of collaboration
Openness to collaboration has been one of the defining characteristics of web archiving and web archive studies from the outset. The challenges posed by the archiving and preservation of born-digital data, including web archives, are simply too great to be solved by individuals or single organisations. This keynote will present some of the partnerships which have moved the field forward in the past decade, suggest some new avenues for collaboration in the future, and consider how the required knowledge and skills can be developed within universities and the cultural heritage sector to ensure that current web archiving initiatives are sustainable.
-------------------------------------------------------------------------------------------
Jane Winters is a Professor of Digital Humanities and Pro-Dean for Libraries in the School of Advanced Study at the University of London. She is responsible for developing digital humanities and has led or co-directed a range of digital projects, including most recently Big UK Domain Data for the Arts and Humanities; Digging into Linked Parliamentary Metadata; Traces through Time: Prosopography in Practice across Big Data; the Thesaurus of British and Irish History as SKOS; and Born Digital Big Data and Approaches for History and the Humanities.
Professor Winters is a Fellow and Councillor of the Royal Historical Society, and a member of RESAW (Research Infrastructure for the Study of the Archived Web), the Academic Steering & Advocacy Committee of the Open Library of Humanities, the Advisory Board of the European Holocaust Research Infrastructure, the Advisory Board of Cambridge Digital Humanities, and the UK UNESCO Memory of the World Committee. Jane’s research interests include digital history, born-digital archives (particularly the archived web), big data for humanities research, peer review in the digital environment, text editing and open access publishing. Recent publications include:
● ‘Giving with one hand, taking with the other: e-legal deposit, web archives and researcher access’, in Electronic Legal Deposit: Shaping the Library Collections of the Future, eds. Paul Gooding and Melissa Terras (London: Facet Publishing, 2019);
● ‘Negotiating the born digital: a problem of search’, Archives and Manuscripts, 47:4 (2019);
● ‘Negotiating the archives of UK web space’, in The Historical Web and Digital Humanities: the Case of National Web Domains, eds. Niels Brügger and Ditte Laursen (London: Routledge, 2019);
http://www.facetpublishing.co.uk/title.php?id=303779&category_code=10#.XY-rY0ZKjIV https://www.tandfonline.com/doi/abs/10.1080/01576895.2019.1640753?journalCode=raam20 https://www.routledge.com/The-Historical-Web-and-Digital-Humanities/Brugger-Laursen/p/book/9781138294318
● ‘Web archives and (digital) history: a troubled past and a promising future?’ in The SAGE Handbook of Web History, eds.
Niels Brügger and Ian Milligan (SAGE Publications Ltd., 2019)
https://us.sagepub.com/en-us/nam/the-sage-handbook-of-web-history/book252251
#EWAVirtual Programme
DAY ONE: 21 September 2020
9.45 (IRE) / 10.45 (CEST) WELCOME
Professor Tom O’Connor, Director of Maynooth University Arts and Humanities Institute
Michael Kurzmeier, #EWAVirtual Co-Chair (Maynooth University)
10.00 (IRE) / 11.00 (CEST) KEYNOTE
Chair: Joanna Finegan (National Library of Ireland)
Professor Niels Brügger, Aarhus University: The variety of European web archives - potential effects for future humanities research
11.00 (IRE) / 12.00 (CEST) Session 1: Archiving Initiatives
Chair: Jason Webber (UK Web Archive, British Library)
● Maria Ryan (National Library of Ireland): The National Library of Ireland’s Web Archive: preserving Ireland’s online life for tomorrow
● Sara Day Thomson (University of Edinburgh): Developing a Web Archiving Strategy for the Covid-19 Collecting Initiative at the University of Edinburgh
● Dr. Kees Teszelszky (KB – National Library of the Netherlands): Internet for everyone: the selection and harvest of the homepages of the oldest Dutch provider XS4ALL (1993-2001)
12.00 (IRE) / 13.00 (CEST) Session 2: Collaborations
Chair: Patricia Duffe (Maynooth University)
● Dr. Brendan Power (The Library of Trinity College Dublin): Leveraging the UK Web Archive in an Irish context: Challenges and Opportunities
● Sarah Haylett & Patricia Falcao (Tate): Creating a web archive at Tate: an opportunity for ongoing collaboration
12.40 (IRE) / 13.40 (CEST) Session 3: Archiving Initiatives (lightning round)
Chair: Rebecca O’Neill (Maynooth University)
● Rosita Murchan (Public Record Office of Northern Ireland): PRONI Web Archive: A Collaborative Approach
● Inge Rudomino & Marta Matijević (Croatian Web Archive, National and University Library in Zagreb – NSK): An overview of 15 years of experience in archiving the Croatian web
● Robert McNicol (Kenneth Ritchie Wimbledon Library): The UK Web Archive and Wimbledon: A Winning Combination
14.00 (IRE) / 15.00 (CEST) Session 4: Research Engagement & Access
Chair: Chris Beausang (Maynooth University)
● Dr. Peter Mechant; Sally Chambers; Eveline Vlassenroot (Ghent University); Friedel Geeraert (KBR – Royal Library and the State Archives of Belgium): Piloting access to the Belgian web-archive for scientific research: a methodological exploration
● Sharon Healy (Maynooth University): Awareness and Engagement with Web Archives in Irish Academic Institutions
14.40 (IRE) / 15.40 (CEST) / 09:40 (EDT) Session 5: Archiving Initiatives
Chair: Sara Day Thomson (University of Edinburgh)
● Anisa Hawes (Independent Curatorial Researcher): Archiving 1418-Now using Rhizome’s Webrecorder: observations and reflections
● Nicole Greenhouse (New York University Libraries): Managing the Lifecycle of Web Archiving at a Large Private University
15.30 (IRE) / 16.30 (CEST) Session 6: Social Science & Politics
Chair: Dr. Claire McGinn (Institute of Art, Design and Technology, Dún Laoghaire)
● Benedikt Adelmann MSc & Dr. Lina Franken (University of Hamburg): Thematic web crawling and scraping as a way to form focussed web archives
● Andrea Prokopová (Webarchiv, National Library of the Czech Republic): Metadata for social science research
● Dr. Derek Greene (University College Dublin): Exploring Web Archive Networks: The Case of the 2018 Irish Presidential Election
16.10 (IRE) / 17.10 (CEST) / 11:10 (EDT) Session 7: Collaborations & Teaching
Chair: Dr. Joseph Timoney (Maynooth University)
● Olga Holownia (International Internet Preservation Consortium): IIPC: training, research, and outreach activities
● Dr. Juan-José Boté (Universitat de Barcelona): Using web archives to teach and opportunities on the information science field
16.50 (IRE) / 17.50 (CEST) / 10:50 (CST) Session 8: Research of Web Archives
Chair: Sally Chambers (Ghent Centre for Digital Humanities, Ghent University)
● Bartłomiej Konopa (State Archives in Bydgoszcz; Nicolaus Copernicus University): Web archiving – professionals and amateurs
● Prof. Lynne M. Rudasill & Dr. Steven W. Witt (University of Illinois at Urbana-Champaign): Opportunities for Use, Challenges for Collections: Exploring Archive-It for Sites and Synergies
DAY TWO: 22 September 2020
9.45 (IRE) / 10.45 (CEST) WELCOME
Michael Kurzmeier, EWA Co-Chair (Maynooth University)
10.00 (IRE) / 11.00 (CEST) KEYNOTE
Chair: Maria Ryan (National Library of Ireland)
Professor Jane Winters, School of Advanced Study, University of London: Web archives as sites of collaboration
11.00 (IRE) / 12.00 (CEST) Session 9: Research Approaches
Chair: Jason Webber (UK Web Archive, British Library)
● Dr. Peter Webster (Independent Scholar, Historian and Consultant): Digital archaeology in the web of links: reconstructing a late-90s web sphere
● Michael Kurzmeier (Maynooth University): Web defacements and takeovers and their role in web archiving
11.40 (IRE) / 12.40 (CEST) Session 10: Culture & Sport
Chair: Gavin Mac Allister (Irish Military War Museum)
● Dr. Philipp Budka (University of Vienna; Free University Berlin): MyKnet.org: Traces of Digital Decoloniality in an Indigenous Web-Based Environment
● Helena Byrne (British Library): From the sidelines to the archived web: What are the most annoying football phrases in the UK?
12.30 (IRE) / 13.30 (CEST) Session 11: Research (lightning round)
Chair: Dr Julie Brooks (School of History, University College Dublin)
● Caio de Castro Mello Santos & Daniela Cotta de Azevedo Major (School of Advanced Study, University of London): Tracking and Analysing Media Events through Web Archives
● Dr. Eamonn Bell (Trinity College Dublin): Reanimating the CDLink platform: A challenge for the preservation of mid-1990s Web-based interactive media and net.art
● Hannah Connell (King’s College London; British Library): Curating culturally themed collections online: The Russia in the UK Special Collection, UK Web Archive
14.00 (IRE) / 15.00 (CEST) / 9.00 (EST) Session 12: Youth & Family
Chair: Dr. Lina Franken (University of Hamburg)
● Katie Mackinnon (University of Toronto): DELETE MY ACCOUNT: Ethical Approaches to Researching Youth Cultures in Historical Web Archives
● Dr. Susan Aasman (University of Groningen): Changing platforms of ritualized memory practices. Assessing the value of family websites
14.40 (IRE) / 15.40 (CEST) Session 13: Source code and app histories
Chair: Prof. David Malone (Hamilton Institute, Maynooth University)
● Dr. Anne Helmond (University of Amsterdam) & Fernando van der Vlist (Utrecht University): Platform and app histories: Assessing source availability in web archives and app repositories
● Dr. Janne Nielsen (Aarhus University): Exploring archived source code: computational approaches to historical studies of web tracking
15.30 (IRE) / 16.30 (CEST) / 10.30 (EST) Session 14: AI and Infrastructures
Chair: Dr. Juan-José Boté (Universitat de Barcelona)
● Mark Bell; Tom Storrar; Dr. Eirini Goudarouli; Pip Willcox (The National Archives, UK); David Beavan; Dr. Barbara McGillivray; Dr. Federico Nanni (The Alan Turing Institute): Cross-sector interdisciplinary collaboration to discover topics and trends in the UK Government Web Archive: a reflection on process
● Dr. Jessica Ogden (University of Southampton) & Emily Maemura (University of Toronto): A tale of two web archives: Challenges of engaging web archival infrastructures for research
16.10 (IRE) / 17.10 (CEST) Session 15: WARC and OAIS
Chair: Kieran O’Leary (National Library of Ireland)
● Consultative Committee for Space Data Systems (CCSDS), Data Archive Interoperability (DAI) Working Group; Michael W. Kearney III; David Giaretta; John Garrett; Steve Hughes: What’s missing from WARC?
16.45 (IRE) / 17.45 (CEST) / 08:45 (PDT) Session 16: Web Archives as Scholarly Dataset
Chair: Michael Kurzmeier (Maynooth University)
● Dr. Helge Holzmann & Mr. Jefferson Bailey (Internet Archive): Web Archives as Scholarly Dataset to Study the Web
17.15 (IRE) / 18.15 (CEST) An Irish Tale / Scéal Éireannach
17.45 (IRE) / 18.45 (CEST) The Future of EWA
Sharon Healy & Michael Kurzmeier (Maynooth University)
#EWAVirtual Abstracts
Session 1: Archiving Initiatives
The National Library of Ireland's Web Archive: preserving Ireland's online life for tomorrow
Maria Ryan (National Library of Ireland)
Keywords: Collection development, national domains, web archives, research, datasets
ABSTRACT
The National Library of Ireland (NLI) was founded in 1877 and its mission remains the same today: to collect, protect and make available the memory of Ireland. The library cares for a collection of over ten million physical items, including manuscripts, photographs, prints and drawings and an extensive ephemera collection. In the 21st century, the NLI is working towards meeting the challenges of the digital world: collecting, preserving and providing access to a born-digital record of Irish life.
This presentation aims to examine the NLI web archive and highlight its importance to the documentation of Irish society and culture. In 2011, the general and presidential elections provided the catalyst for a pilot web archiving project. Following the success of this project, the NLI focused on establishing the web archiving programme by archiving political, cultural and social websites, capturing a record of elections, budgets, the decade of commemorations and historic events such as the 2015 marriage referendum. In 2016, the NLI appointed its first full-time web archivist and launched a significant promotional drive around the 2016 commemorative project ‘Remembering 1916, Recording 2016’. In 2017, the NLI also undertook a domain crawl of the Irish web, allowing it to capture a wider range of websites and greater amounts of data than the selective web archive. The 2017 crawl encompassed the entire Irish top-level domain (.ie) and other relevant websites that could be recognised as being hosted in Ireland but outside the .ie domain. It also used language detection software to identify Irish-language websites outside the national domain. The crawl amounted to almost 40 TB of unique data, which is preserved in the NLI. However, due to legislative restrictions, this data cannot be made available to researchers. In the past nine years, the NLI web archive has grown and developed into an established collecting strand in the NLI. Workflow development and a comprehensive collecting strategy have seen the web archive grow and mature. The NLI has embarked upon new opportunities for collaboration and research. Collaboration is at the heart of the NLI's values, and it has helped us broaden our collections and provide datasets to new researchers. The future of research lies largely in born-digital archives. The social, political and historical researchers of the future will require a record of 21st-century Ireland.
In other words, they will need web archives. This presentation will explore how the NLI is dedicated to building an Irish web archive that will document Irish life for decades to come.
Biography: Maria Ryan is an assistant keeper and web archivist at the National Library of Ireland. A qualified archivist, she is co-chair of the IIPC training working group and a member of the NLI's diversity and inclusion committee.
Developing a Web Archiving Strategy for the Covid-19 Collecting Initiative at the University of Edinburgh
Sara Day Thomson (Digital Archivist, Centre for Research Collections, University of Edinburgh)
Keywords: Covid-19, web archiving strategy, challenges, opportunities for collaboration, web archive collections
ABSTRACT
In this talk, the Digital Archivist at the University of Edinburgh will discuss the process (so far) of developing a strategy for capturing and preserving web-based submissions to the Collecting Covid-19 Initiative. She will also present plans for using this process as a springboard to develop wider institutional web archiving programmes. In April, the Centre for Research Collections (CRC) put out an open call for members of the university community to submit materials that document their experiences of the Covid-19 pandemic and lockdown [1]. Depositors are invited to submit their digital records using a web form embedded on the university website [2]. At the time of the open call, the CRC did not have an established web archiving programme. Therefore, a new strategy had to be developed in response to the influx of web-based submissions (and other relevant web pages identified by the collecting team). This strategy also had to address the Initiative's identified concerns: speedy deployment, but also handling sensitive material, understanding potential research uses, and balancing metadata requirements with low-barrier submission requirements.
The project team is now in the early stages of a partnership with the UK Web Archive through the National Library of Scotland. The CRC team will curate a special collection for the Collecting Covid-19 Initiative using the UKWA’s infrastructure and guidance. Recognising some of the limitations of this approach, the Digital Archivist will supplement the Collecting Covid-19 collection with manual captures using open-source (OS) tools, such as Conifer / Webrecorder Desktop and TAGS. To make the most of this strategy, the Digital Archivist has invited the project team to view these steps as a pilot study for wider web archiving programmes. This pilot will include an evaluation of methods for:
● gathering and analysing user needs and requirements
● choosing an approach, either collaboration with the UKWA or OS tools
● training both staff and researchers to capture web content as part of their work
● outreach to the wider university community to raise awareness of web archiving and of available archived web resources
Currently, the focus is on finding a robust and reliable way to capture, curate, and preserve web-based submissions to the Covid-19 Collecting Initiative. However, in the coming months, the Digital Archivist hopes to lay the groundwork for next steps. First and foremost, she aims to host a series of focus groups (potentially virtual) with key researchers, in collaboration with the Research Data Support team, to better understand research needs and to raise the profile of available archived web content.
References:
[1] University of Edinburgh, Staff News, ‘Covid-19 experiences to be documented’, https://www.ed.ac.uk/news/students/2020/covid-19-experiences-to-be-documented
[2] University of Edinburgh, Collecting Covid-19 Initiative, https://www.ed.ac.uk/information-services/library-museum-gallery/crc/collecting-covid-19-initiative
Biography: Sara Day Thomson is Digital Archivist at the University of Edinburgh, where she looks after the management and preservation of digital materials across collections. She joined the University from the Digital Preservation Coalition, where she was Research Officer, supporting the development of new methods and technologies to ensure long-term access to digital data. She reconvened the DPC’s Web Archiving and Preservation Working Group, a forum for organisations to share experiences in archiving web content. She also contributed to the development of the IIPC & DPC Beginner Web Archiving Training materials and is the author of Preserving Social Media, a DPC Technology Watch Report.
Internet for everyone: the selection and harvest of the homepages of the oldest Dutch provider XS4ALL (1993-2001)
Dr. Kees Teszelszky (Koninklijke Bibliotheek - National Library of the Netherlands)
Keywords: web archiving, web archaeology, web incunables, homepages, early web
ABSTRACT
“Web incunables” can be defined as those websites which were published in the first stage of the World Wide Web, between 1990 and 1998. The early sites of the nineties were made at the start of publishing texts on the web and mark the frontier between analogue prints on paper and digital publications on the web. The first Dutch homepage and web incunable was put online in 1993, the same year that XS4ALL (“Access for All”), one of the oldest Dutch internet providers, started to offer its services to customers. This provider was founded that year by hackers and techno-anarchists.
After its launch in May 1993, XS4ALL attracted a large group of creative Dutch internet pioneers, who built at least 10,000 homepages between 1993 and 2001, a large part of which are still online in some form. The remaining homepages can be considered the most interesting born-digital Dutch heritage collection still online and waiting to be studied. As XS4ALL promoted and facilitated the building of these sites, early web designers, artists, activists, writers and scientists eagerly experimented with the possibilities of the new medium in content, design and functionality. Because XS4ALL was seen not so much as a company but more as a community, many customers have remained faithful to this provider to this day. As a result, a large number of homepages of the early Dutch web can still be found at this provider. This heritage is, however, in danger. Dutch telephone company KPN took over XS4ALL in 1998 and announced in January 2019 that it would discontinue the brand in the near future. This is why the Koninklijke Bibliotheek - National Library of the Netherlands (KB-NL) started a web archiving project the same year to identify and rescue as many web incunables and early homepages still hosted by this provider as possible. This project was generously sponsored by SIDN-fonds and Stichting Internet4ALL. This paper describes the method and first results of KB-NL's ongoing pilot research project on internet archaeology and web incunables. It concerns the web archiving of a selection of web incunables, published on the Dutch web before 2001, which mirror the development of Dutch online culture on the web.
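The project maps early Dutch digital culture through link analysis of the harvested homepages. The Python sketch below illustrates one simple form of such an analysis under stated assumptions; it is not KB-NL's actual method. It assumes hyperlinks have already been extracted as (source URL, target URL) pairs, treats each `/~username/` directory on a provider host as a separate site (an assumed convention for personal homepages of the period), and ranks sites by the number of distinct other sites linking to them. The function names and the example user names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def site_of(url: str) -> str:
    """Map a URL to a 'site' key: the host, plus /~user for personal homepages.

    Assumption: early provider homepages live under /~username/ on a shared
    host, so each user directory is treated as its own site.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    first_segment = parts.path.lstrip("/").split("/", 1)[0]
    if first_segment.startswith("~"):
        return f"{host}/{first_segment}"
    return host


def site_in_degree(links):
    """Rank sites by how many distinct other sites link to them.

    `links` is an iterable of (source_url, target_url) pairs; links that
    stay within a single site are ignored.
    """
    inbound = defaultdict(set)
    for source, target in links:
        src, dst = site_of(source), site_of(target)
        if src and dst and src != dst:
            inbound[dst].add(src)
    # Highest in-degree first; ties broken alphabetically for stable output.
    return sorted(((site, len(srcs)) for site, srcs in inbound.items()),
                  key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Placeholder links; the user names are invented for illustration.
    sample = [
        ("http://www.xs4all.nl/~alice/", "http://www.xs4all.nl/~bob/links.html"),
        ("http://www.xs4all.nl/~carol/index.html", "http://www.xs4all.nl/~bob/"),
        ("http://www.xs4all.nl/~bob/", "http://www.xs4all.nl/~bob/page2.html"),
        ("http://www.xs4all.nl/~bob/", "http://www.example.org/"),
    ]
    print(site_in_degree(sample))
    # → [('www.xs4all.nl/~bob', 2), ('www.example.org', 1)]
```

Counting distinct linking sites, rather than raw links, keeps a single heavily interlinked homepage from dominating the ranking.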
I will describe the methods and sum up the experiences with selecting and harvesting homepages and with mapping Dutch digital culture online through link analysis of this collection. I will also discuss the characteristics of web materials and archived web materials, among others the first Dutch interactive 3D house, a virtual metro line for the digital city of Amsterdam, the “Stone Age Computer” and the first Dutch online literary magazine. I will also explain how these various materials (harvested websites, metadata, link clouds, context information) can be used in future research on the history of the Dutch web. Biography: Kees Teszelszky (b. 1972) is a historian and curator of the digital collections at the Koninklijke Bibliotheek - National Library of The Netherlands. He graduated from the University of Leiden (Political Science, 1999) and from the University of Amsterdam (East European Studies, 1998), and obtained his PhD at the University of Groningen (Cultural History, 2006). He has been involved in research on web archiving and born-digital sources since 2012. His present research covers the selection, harvest and presentation of born-digital sources at the KB. He is currently involved in projects on internet archaeology in the Netherlands, mapping the Frisian and Dutch national web domain, online news and the historic sources of our post-truth era. Session 2: Collaborations Leveraging the UK Web Archive in an Irish context: Challenges and Opportunities Dr Brendan Power (The Library of Trinity College Dublin) Keywords: web archives, collaboration, legal deposit, 1916 Easter Rising ABSTRACT This paper will discuss a project to curate an archive of websites undertaken by The Library of Trinity College Dublin. The context for this project was the UK legal deposit environment, in which the six Legal Deposit Libraries (LDLs) work together to help preserve the UK’s knowledge and memory.
In 2013 the legal deposit remit was extended to include non-print, electronically published material, which means the LDLs may now capture and archive any freely available websites that are published or hosted in the UK. This happens in the Legal Deposit UK Web Archive, with the British Library providing the technical and curatorial infrastructure, and all LDLs contributing at both the strategic and planning level and through curating themed collections. In this paper I will present a case study which demonstrates how The Library of Trinity College Dublin has explored the challenges and opportunities of utilising the research potential of this vast new resource. The 1916 Easter Rising collection was a collaborative project in 2015/2016 between The Library of Trinity College Dublin (University of Dublin), the Bodleian Libraries (University of Oxford), and the British Library. The project aimed to identify, collect, and preserve websites that contribute to an understanding of the 1916 Easter Rising, in order to enable critical reflection on both the Rising itself and how it was commemorated in 2016. The project was a test case for effective collaboration between libraries in multiple jurisdictions, helping to explore how themed, curated web archive collections can promote the potential of web archives to a wider audience. The presentation will review the project and outline the challenges and opportunities that emerged as it progressed. In particular, it will highlight the challenges that arose from working across multiple jurisdictions, and the implications of different legislative frameworks for archive curation and collection building. Biography: Brendan Power is Digital Preservation Librarian at The Library of Trinity College Dublin. He holds a BA from Dublin City University, an MPhil and PhD in History from Trinity College, the University of Dublin, and an MLIS from University College Dublin.
A former Postdoctoral Research Fellow at Trinity College Dublin, he acted as the Web Archive Project Officer on the 1916 Easter Rising Web Archive and has previously published on this project. Creating a web archive at Tate: an opportunity for ongoing collaboration Sarah Haylett (Tate) Patricia Falcao (Tate) Keywords: web archives, net art, digital preservation, web-based art, archives ABSTRACT In the year 2000, Tate commissioned the first of fifteen net artworks for the then newly launched Tate website, Tate Online, which was devised as the fifth gallery. The commissioned artworks were meant to attract and challenge visitors to this still new online space. Initially these works were closely entwined with the main website and were highlighted on its front page, but as the number of works grew and Tate Online changed focus, they were grouped together under the Intermedia Art microsite alongside contextualising texts, a programme of events and podcasts. The Intermedia website still exists online, but it has not been updated since 2012 and sits on an outdated server that will eventually have to be decommissioned. Tate does not archive its own website; as Tate is a public body, this is carried out by The National Archives’ UK Government Web Archive. That archive holds a significant number of captures of the Intermedia website, but it does not consistently capture its interactive content, which was a key feature of several of the commissioned artworks. Because of these gaps and the missing contextual information, there is no representative or effective archived version of the Intermedia website or of the artworks. As part of the Andrew W.
Mellon Foundation-funded project Reshaping the Collectible: When Artworks Live in the Museum, a team of interdisciplinary researchers is examining the history of the Net Art commissioning programme and the strategies to preserve the artworks and website, as well as looking to build Tate’s capacity to collect internet art. The project is also an opportunity to go beyond the artwork collection and consider the same set of issues from the perspective of institutional records and the Tate Archive. Developments in digital preservation, web archiving and, more specifically, small-scale web recording and emulation meant that this was the perfect moment to undertake extensive captures and documentation of the Intermedia Art website and individual artworks as they exist now. This has included extensive discussion with the artists, who continue to host the works on their own servers. This paper will present the different but complementary perspectives of Tate’s Archive and Time-Based Media Conservation as they have worked together to understand the intricacies of documenting, conserving and maintaining the integrity and accessibility of web-based art and its online records in the contemporary art museum. It will discuss the tools and methodology used to archive the website and the plans to make it available as Tate’s first website archived as a public record. Biographies: Patricia Falcao is a Time-based Media Conservator with a broad interest in the preservation of the digital components of contemporary artworks. She has worked at Tate since 2008 and currently works on the acquisition of media artworks into the Collection. She also collaborates with Tate’s Research Department on the Reshaping the Collectible project, looking at the preservation of websites in Tate’s context, and works with Tate’s Technology team to continue developing Tate's strategy for the preservation of high-value digital assets.
Patricia completed her MA at the University of the Arts in Bern with a thesis on risk assessment for software-based artworks. She continues to develop research in this field in her role as a Doctoral Researcher in the AHRC-funded Collaborative Doctoral Program between Tate Research and the Computing Department at Goldsmiths College, University of London. Her research focuses on the practices of software-based art preservation in collections, by artists and in the gaming industry. Sarah Haylett is a professional archivist; she received her MA in Archives and Records Management from UCL in 2014. She joined Tate in June 2018, having previously worked at Zaha Hadid Architects, The Photographers’ Gallery and with private collectors. As a member of the Reshaping the Collectible: When Artworks Live in the Museum project team, she has research interests rooted in the relationship between archival and curatorial theory and in how, beyond a culture of compliance, Tate’s record keeping can be more intuitive for research and collecting practice. She is very interested in sites of archival creation and intention, and how these are represented in artistic practice and the contemporary art museum. Session 3: Archiving Initiatives (Lightning Round) PRONI Web Archive: A collaborative approach Rosita Murchan (Public Record Office of Northern Ireland - PRONI) Keywords: Collaborations, challenges, resources, permissions, partnerships ABSTRACT The Public Record Office of Northern Ireland web archive has been building its collection of websites for almost ten years, focusing initially on capturing the websites of our local councils and government departments and those deemed historically or culturally important to Northern Ireland. However, unlike the UK and Ireland, Northern Ireland does not have legal deposit status, and as a result we are sometimes limited in what we can capture.
As the web archive has grown and evolved organically over the years, with more and more requests for websites to be archived, PRONI has had to address the issue of gaining permissions (and of capturing sites without any legal deposit legislation) and to consider how we can continue to grow our collection with the limited resources available to us. One of the ways in which we are able to expand the scope of the collection is through collaborations, not only with other institutions such as the British Library, which allow us to capture sites that would usually be outside our remit, but also by working in partnership with other sections within our organisation. This short presentation will look in more depth at PRONI’s work with the web archive, the strategies we have used to build it, our collaborative projects, and the challenges and obstacles we face as we continue to grow. Biography: Rosita Murchan has worked with the Public Record Office for two years and has been working solely on the web archive for one year. An overview of 15 years of experience in archiving the Croatian web Inge Rudomino (Croatian Web Archive, National and University Library in Zagreb – NSK) Marta Matijević (Croatian Web Archive, National and University Library in Zagreb – NSK) Keywords: legal deposit, Croatian Web Archive, web archiving, open access, online publication ABSTRACT The National and University Library in Zagreb (NSK) began archiving the Croatian web in 2004, in collaboration with the University of Zagreb University Computing Centre (SRCE), when the Croatian Web Archive (HAW) was established. The basis for archiving the web was the Law on Libraries (1997), which made online publications subject to legal deposit. To harvest the web, HAW uses three different approaches: selective harvesting, .hr domain harvesting and thematic harvesting.
From 2004 to 2010, HAW was based solely on selective harvesting, in which each resource is selected for archiving according to established Selection Criteria. Each title has a full bibliographic description and is retrievable in the library's online catalogue, providing the end user with a high-quality archived copy. Special care is given to news portals, which are archived daily. A URN:NBN identifier is assigned to each title and archived copy to ensure permanent access, which is of great importance for future citation. Since 2010, HAW has conducted annual .hr domain crawls and periodically harvests websites related to topics and events of national importance. HAW’s primary task is to ensure that harvested resources are preserved in their entirety, in their original format and with all their accompanying functionality. The majority of harvested content is open access. The poster will present fifteen years of experience at the National and University Library in Zagreb (NSK) in managing web resources, with an emphasis on selective, domain and thematic harvesting, as well as a new website design with new functionality. Biographies: Inge Rudomino: Senior librarian at the Croatian Web Archive, National and University Library in Zagreb (Croatia). Graduated in Information Sciences (Librarianship), Faculty of Philosophy, University of Zagreb. From 2001 to 2007 worked as a cataloguer in the Department for Cataloguing Foreign Publications at the National and University Library in Zagreb. Since 2007 has worked at the Croatian Web Archive on tasks which include identification, selection, cataloguing, archiving, maintaining the Croatian Web Archive, communication with publishers, and promotion. Publishes articles in Croatian and in conference proceedings in the field of web archiving. Marta Matijević, MA, is a librarian at the Croatian Web Archive, National and University Library in Zagreb.
She graduated in Library and Information Science from the Faculty of Humanities and Social Sciences in Osijek in 2016. From 2016 to 2018 she worked in academic and school libraries. Since 2019 she has worked at the Croatian Web Archive on identification, selection, cataloguing, archiving, maintaining the Archive, communication with publishers and promotion. Her interests are web archiving and information theory, and she has published papers in these fields. The UK Web Archive and Wimbledon: A Winning Combination Robert McNicol (Kenneth Ritchie Wimbledon Library, Wimbledon Lawn Tennis Museum) Keywords: Tennis, Sport, Collaboration, Heritage, Preservation ABSTRACT Since January 2019, the Kenneth Ritchie Wimbledon Library, the world's largest tennis library, has been collaborating with the British Library on a web archiving project. The Wimbledon Library is curating the Tennis subsection of the UK Web Archive Sports Collection. The UK Web Archive aims to collect every UK website at least once per year, and it also works with subject specialists to curate collections of websites on specific subjects. The ultimate aim is for the Tennis collection to contain all UK-based tennis-related websites. This will include websites relating to tournaments, clubs, players and governing bodies. It will also include social media feeds of individuals or organisations involved with tennis in the UK. We have already collected the Twitter feeds of all male and female British players with a world ranking. We have also archived Wimbledon’s own digital presence, including the award-winning Wimbledon.com, which celebrates its 25th anniversary in 2020. In addition, we have archived Wimbledon’s social media accounts, including those belonging to the Museum and the Wimbledon Foundation, and its international digital presence in the form of the Wimbledon page on Weibo, a Chinese social media site. This falls within the scope of the project because, although the page is not in English, it is based in the UK.
The collaboration is mutually beneficial. For a small, specialist library such as ours, there are many advantages to having a partnership with the British Library. Equally, the UK Web Archive benefits from our specialist expertise in curating their Tennis collection. In many ways, a project like this one is perfect for Wimbledon. Although our history and heritage are at the heart of everything we do, we’re always innovating and striving to improve as well. That’s why this project, which involves using the latest technology to preserve tennis history, is so exciting for us. This presentation will explain why the Kenneth Ritchie Wimbledon Library wanted to get involved in web archiving and how the collaboration with the UK Web Archive came about, and will give an overview of what has been collected so far. Biography: Since March 2016 I have worked as the Librarian of the Kenneth Ritchie Wimbledon Library, which is part of the Wimbledon Lawn Tennis Museum. Prior to this, I had a long career as a media librarian, mostly working in sport. From 2008 to 2016 I was Sport Media Manager at BBC Scotland in Glasgow. Before that, I worked for the BBC in London and Aberdeen, and briefly for ITV Sport and Sky Sports. I studied History at the University of Glasgow and Information and Library Studies at the University of Strathclyde. Session 4: Research Engagement & Access Piloting access to the Belgian web-archive for scientific research: a methodological exploration Dr. Peter Mechant (Ghent University) Sally Chambers (Ghent University) Eveline Vlassenroot (Ghent University) Friedel Geeraert (KBR - Royal Library and the State Archives of Belgium) Keywords: research use of web archives, web-archiving, digital humanities, born-digital collections, digital research labs ABSTRACT The web is fraught with contradiction.
On the one hand, the web has become a central source of information in everyday life and therefore holds the primary sources of our history, created by a large variety of people (Milligan, 2016; Winters, 2017). On the other hand, much less importance is attached to its preservation, meaning that potentially interesting sources for future (humanities) research are lost. Web archiving is therefore a direct result of the computational turn and has a role to play in knowledge production and dissemination, as demonstrated by a number of publications (e.g. Brügger & Schroeder, 2017) and research initiatives related to the research use of web archives (e.g. https://resaw.eu/). However, conducting research and answering research questions based on web archives, in short ‘using web archives as a data resource for digital scholars’ (Vlassenroot et al., 2019), demonstrates that this so-called ‘computational turn’ in the humanities and social sciences (i.e. the increased incorporation of advanced computational research methods and large datasets into disciplines which have traditionally dealt with considerably more limited collections of evidence) indeed requires new skills and new software. In December 2016, a pilot web-archiving project called PROMISE (PReserving Online Multiple Information: towards a Belgian StratEgy) was funded. The aim of the project was to (i) identify current best practices in web-archiving and apply them to the Belgian context, (ii) pilot Belgian web-archiving, (iii) pilot access to (and use of) the pilot Belgian web archive for scientific research, and (iv) make recommendations for a sustainable web-archiving service for Belgium. As the project moves towards its final stages, the project team is focusing on the third objective, namely how to pilot access to the Belgian web archive for scientific research.
The aim of this presentation is to discuss how the PROMISE team approached piloting access to the Belgian web-archive for scientific research, including: a) reviewing how existing web-archives provide access to their collections for research, b) assessing the needs of researchers based on a range of initiatives focussing on research use of web-archives (e.g. RESAW, BUDDAH, WARCnet, IIPC Research Working Group, etc.), and c) exploring how the five personas created as part of the French National Library’s Corpus project (Moiraghi, 2018) could help us understand how different types of academic researchers might use web archives in their research. Finally, we will introduce the emerging Digital Research Lab at the Royal Library of Belgium (KBR), part of a long-term collaboration with the Ghent Centre for Digital Humanities (GhentCDH), which aims to facilitate data-level access to KBR’s digitised and born-digital collections and could potentially provide the solution for offering research access to the Belgian web-archive. Bibliography Brügger, N. & Schroeder, R. (Eds.). (2017). The web as history: Using web archives to understand the past and present. London: UCL Press. Milligan, I. (2016). Lost in the infinite archive: the promise and pitfalls of web archives. International Journal of Humanities and Arts Computing, 10(1), 78-94. https://doi.org/10.3366/ijhac.2016.0161 Moiraghi, E. (2018). Le projet Corpus et ses publics potentiels: Une étude prospective sur les besoins et les attentes des futurs usagers [Rapport de recherche]. Bibliothèque nationale de France. ⟨hal-01739730⟩ Winters, J. (2017). Breaking into the mainstream: demonstrating the value of internet (and web) histories. Internet Histories, 1(1-2), 173-179. https://doi.org/10.1080/24701475.2017.1305713 Vlassenroot, E., Chambers, S., Di Pretoro, E., Geeraert, F., Haesendonck, G., Michel, A., & Mechant, P. (2019). Web archives as a data resource for digital scholars.
International Journal of Digital Humanities, 1(1), 85-111. https://doi.org/10.1007/s42803-019-00007-7 Biographies: Dr Peter Mechant holds a PhD in Communication Sciences from Ghent University (2012). Since joining the research group mict (www.mict.be), Peter has mainly worked on research projects related to e-government (open and linked data), smart cities, online communities and web archiving. As a senior researcher, he is currently involved in managing projects and project proposals at European, national and regional level. Sally Chambers is Digital Humanities Research Coordinator at the Ghent Centre for Digital Humanities, Ghent University, Belgium, and National Coordinator for DARIAH in Belgium. She is one of the instigators of an emerging Digital Research Lab at KBR, Royal Library of Belgium, as part of a long-term collaboration with the Ghent Centre for Digital Humanities. This lab will facilitate data-level access to KBR’s digitised and born-digital collections for digital humanities research. Her role in PROMISE relates to research access to and use of Belgium’s web-archive. Eveline Vlassenroot holds a Bachelor's degree in Communication Sciences (Ghent University) and graduated in 2016 with a Master's in Communication Sciences, specialising in New Media and Society (Ghent University). After completing additional courses in Information Management & Security at Thomas More Mechelen (KU Leuven), she joined imec-mict-Ghent University in September 2017. She participates in the PROMISE project (Preserving Online Multiple Information: towards a Belgian StratEgy), where she is researching international best practices for preserving and archiving online information. She is also involved in several projects with the Flemish government regarding data standards, the governance of interoperability standards and linked open data.
Friedel Geeraert is a researcher at KBR (Royal Library) and the State Archives of Belgium, where she works on the PROMISE project, which focuses on the development of a Belgian web archive at the federal level. Her role in the project includes comparing and analysing best practices regarding the selection of, and provision of access to, the information and data to be archived, and making recommendations for the development of a long-term and sustainable web archiving service in Belgium. Reimagining Web Archiving as a Realtime Global Open Research Platform: The GDELT Project Dr. Kalev Hannes Leetaru (The GDELT Project) Keywords: GDELT Project; realtime; research-first web archive; news homepages ABSTRACT The GDELT Project (https://www.gdeltproject.org/) is a realization of the vision I laid out at the opening of the 2012 IIPC General Assembly (https://blogs.loc.gov/thesignal/2012/05/a-vision-of-the-role-and-future-of-web-archives-conclusions-and-the-role-of-archives/) for the transformation of web archives into open research platforms. Today GDELT is one of the world’s largest global open research datasets for understanding human society, spanning 200 years in 152 languages across almost every country on earth. Its datasets span text, imagery, spoken word and video, enabling fundamentally new kinds of multimodal analyses, and reach deeply into local sources to reflect the richly diverse global landscape of events, narratives and emotions. At its core, GDELT in the web era is essentially a realtime, production, research-centered web archive focused on global news (defined as sources used to inform societies, both professional and citizen-generated). It continually maps the global digital news landscape in realtime across countries, languages and narrative communities, acting both as an archival facilitator (providing a live stream of every URL it discovers to organizations including the Internet Archive for permanent preservation) and as a research platform.
In contrast to the traditional post-analytic workflow most commonly associated with web archival research, in which archives are queried, sampled and analyzed after creation, GDELT focuses on realtime analysis, processing every single piece of content it encounters through an ever-growing array of standing datasets and APIs spanning rules-based, statistical and neural methodologies. Native analysis of 152 languages is supported, while machine translation is used to live-translate everything it monitors in 65 languages, enabling language-independent search and analysis. Twin global crawler and computational fleets are distributed across 24 data centers in 17 countries, leveraging Google Cloud’s Compute Engine and Cloud Storage infrastructures, coupled with its ever-growing array of AI services and APIs, underpinning regional ElasticSearch and bespoke database and analytic clusters, all feeding into petascale analytic platforms like BigQuery and Inference API for at-scale analyses. This massive global-scale system must operate entirely autonomously, scale to support enormous sudden loads (such as during breaking disasters) and function within an environment in which both the structure (rendering and transport technologies) and the semantics (evolving language use) are in a state of perpetual and rapid change. Traditional web archives are not always well aligned with the research questions of news analysis, which often require fixed time guarantees and a greater emphasis on areas like change detection and agenda setting. Thus, GDELT includes numerous specialized news-centric structural datasets, including the Global Frontpage Graph, which catalogs more than 50,000 major news homepages every hour on the hour, totaling nearly a quarter trillion links over the last two years to support agenda-setting research.
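The Global Frontpage Graph records the links found on news homepages at hourly intervals. As a rough, hypothetical illustration (not GDELT's code), the sketch below uses only Python's standard library to turn one homepage snapshot into rows of (crawl time, homepage, link position, absolute target URL). Keeping each link's position on the page is our own assumption about what agenda-setting analyses might need; fetching, hourly scheduling and storage are left out, and the example URLs are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags, resolved to absolute URLs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the homepage URL.
                self.links.append(urljoin(self.base_url, value))


def frontpage_links(homepage: str, html: str, crawled_at: str):
    """Return one (crawled_at, homepage, position, target) row per link, in page order."""
    parser = LinkExtractor(homepage)
    parser.feed(html)
    parser.close()
    return [(crawled_at, homepage, pos, target)
            for pos, target in enumerate(parser.links)]


if __name__ == "__main__":
    snapshot = ('<a href="/world/story-1">Lead story</a> '
                '<p><a href="https://other.example.org/x">Elsewhere</a></p>')
    for row in frontpage_links("https://news.example.com/", snapshot, "2020-09-21T12:00:00Z"):
        print(row)
```

Running the same extraction on every hourly snapshot and comparing the rows over time is one way to see which stories move up, down or off a homepage.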
The Global Difference Graph recrawls every article after 24 hours and after one week with fixed time guarantees to generate a 152-language realtime news editing dataset cataloging stealth editing and silent deletions. Structural markup is examined and embedded social media posts are cataloged as part of its Global Knowledge Graph. A vast distributed processing pipeline performs everything from entity extraction and emotional coding to state-of-the-art (SOTA) language modeling and claims and relationship mapping. Images are extracted from each article and analyzed by Cloud Vision, enabling analysis of the visual landscape of the web. Datasets ranging from quotations, geography, relationships and emotions to entailment and dependency extracts are all computed and output in realtime, operating on either native or translated content. In essence, GDELT doesn’t just crawl the open web; it processes everything it sees in realtime to create a vast archive of rich realtime research datasets. This firehose of data feeds into downloadable datasets and APIs to enable realtime interactive analyses, while BigQuery enables at-scale explorations of limitless complexity, including one-line terascale graph construction and geographic analysis and full integration with the latest neural modeling approaches. Full integration with Compute Engine (GCE), Cloud Storage (GCS) and BigQuery couples realtime analysis of GDELT’s rich standing annotations with the ability to interactively apply new analyses, including arbitrarily complex neural modeling at scale. This means that GDELT is able both to provide a standing set of realtime annotations over everything it encounters and to support traditional post-facto analysis at the effectively infinite scale of the public cloud.
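The Global Difference Graph recrawls each article at fixed offsets (24 hours and one week after it is first seen) to catalogue stealth edits and silent deletions. A minimal sketch of that idea follows, assuming the article text has already been extracted and that a page which no longer returns the article is reported as `None`. The status labels and the use of `difflib.SequenceMatcher` as a similarity measure are our own choices for illustration, not a description of GDELT's pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from difflib import SequenceMatcher
from typing import List, Optional

# Fixed recrawl offsets taken from the abstract: 24 hours and one week.
RECRAWL_OFFSETS = (timedelta(hours=24), timedelta(weeks=1))


def recrawl_times(first_seen: datetime) -> List[datetime]:
    """Return the fixed-time recrawl schedule for an article."""
    return [first_seen + offset for offset in RECRAWL_OFFSETS]


@dataclass
class Change:
    status: str        # "unchanged", "edited" or "deleted" (labels are hypothetical)
    similarity: float  # 1.0 means identical text, 0.0 means nothing in common


def compare_snapshots(original: str, recrawled: Optional[str]) -> Change:
    """Classify the difference between the first capture and a recrawl.

    `recrawled` is None when the URL no longer returns the article
    (for example an HTTP 404), which is treated as a silent deletion.
    """
    if recrawled is None:
        return Change("deleted", 0.0)
    if recrawled == original:
        return Change("unchanged", 1.0)
    return Change("edited", SequenceMatcher(None, original, recrawled).ratio())


if __name__ == "__main__":
    for when in recrawl_times(datetime(2020, 9, 21, 12, 0)):
        print(when.isoformat())
    print(compare_snapshots("Minister resigns", "Minister resigns amid protests").status)
```

Fixing the recrawl offsets in advance, rather than recrawling opportunistically, is what makes edit rates comparable across outlets and languages.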
From mapping global conflict and modeling global narratives to providing the data behind one of the earliest alerts of the COVID-19 pandemic, GDELT showcases what a research-first web archive is capable of and how to leverage the full power of the modern cloud in transforming web archives from cold storage into realtime open research platforms. Biography: Dr. Kalev Hannes Leetaru, one of Foreign Policy Magazine's Top 100 Global Thinkers of 2013, founded the open data GDELT Project. From 2013 to 2014 he was the Yahoo! Fellow in Residence of International Values, Communications Technology & the Global Internet at Georgetown University's Edmund A. Walsh School of Foreign Service, where he was also an Adjunct Assistant Professor, as well as a Council Member of the World Economic Forum's Global Agenda Council on the Future of Government. His work has been profiled in the press of more than 100 nations, and in 2011 The Economist selected his Culturomics 2.0 study as one of just five science discoveries deemed the most significant developments of 2011. Kalev’s work focuses on how innovative applications of the world's largest datasets, computing platforms, algorithms and mind-sets can reimagine the way we understand and interact with our global world. More on his latest projects can be found on his website at https://www.kalevleetaru.com/ or https://blog.gdeltproject.org. Session 5: Archiving Initiatives Archiving 1418-Now using Rhizome’s Webrecorder: observations and reflections Anisa Hawes (Independent Curatorial Researcher and Web Archivist) Keywords: web archiving tools, social media, curation, process, Webrecorder ABSTRACT This paper explores the challenges of archiving https://www.1418now.org.uk/ and its associated social media profiles (Twitter, Instagram, and YouTube) using Rhizome’s Webrecorder.
These web collections form an integral part of the Imperial War Museum’s record of the 14-18Now WW1 Centenary Art Commissions programme and represent a recognition that essential facets of many of the Commissions would otherwise be absent from the archive. Immediate public responses to Jeremy Deller’s modern memorial event We're Here Because We're Here, for example, played out in the contemporary context of Web 2.0. Many people who encountered the memorial directly were moved to share their reflections on social media. Many others encountered the event indirectly: via messages, images, and videos which circulated on social networking platforms. In this way, the online sphere became an expanded site of public participation and experience. Meanwhile, imprinted engagement metrics and appended comments threads provided unprecedented curatorial insight into the artwork's impact and reach. Webrecorder is a free, open-source web archiving tool developed by Rhizome. It enables high-fidelity capture of complex, interactive web pages, including social media sites. Written from the point of view of a curatorial researcher, this paper includes insights into the web archiving process and workflow. Combining work-in-progress screenshots and reflections extracted from my log notes, I’ll explain how I have utilised Webrecorder’s automation features and scripted behaviours alongside manual, action-by-action capture to build a rich collection, tackling the challenge of archiving both in-detail and at-scale. Biography: Anisa Hawes is an independent curatorial researcher and web archivist based in London, UK. As an embedded researcher at the Victoria and Albert Museum (2015-18) her work investigated how digital tools and software environments have altered design practice; and how the web and social media have produced new, participatory poster forms––such as memes which are appropriated as they circulate.
Collaborating with Rhizome and the British Library/UK Web Archive, she tested web archiving technologies to capture digital objects in the context of the platforms where they are created and encountered, whilst developing a framework of curatorial principles to support digital collecting. Managing the Lifecycle of Web Archiving at a Large Private University Nicole Greenhouse (New York University Libraries) Keywords: workflows, accessioning, description, quality assurance, context ABSTRACT New York University Libraries has been archiving websites since 2007. The collection, developed using the Archive-It service, consists of websites related to Labor and Left movements, the New York City downtown arts scene, contemporary composers, and university websites, totaling approximately 5,000 websites and 13 terabytes of data. In 2018, I was hired as the first permanent structural archivist whose sole role is to manage the web archiving program. During this first year, it was important to the Archival Collections Management department in the NYU Libraries to incorporate web archiving into the department's broader workflows, as well as to manage the day-to-day work that comes with web archiving, including capture, website submissions, quality assurance, and access and description. This presentation will discuss how we have developed a database to manage capture and quality assurance, as well as the ongoing project to accession recently added websites and create consistent description across all of the archived websites. The database allows us to track the lifecycle of each archived website and to take advantage of the scoping and quality assurance tools provided by Archive-It while working around the service’s limitations.
The presentation will conclude with an overview of our descriptive practices: creating accession records that track why curators and archivists add websites to the collection, and updating finding aids to provide more contextual description that goes beyond Dublin Core and is in line with the department’s descriptive policies, so as to create transparent, standards-compliant description in the context of the Special Collections’ analog collections. By creating records that put the web archives in the context of the rest of the collections, NYU is able to promote the use of the archived websites. Biography: Nicole Greenhouse is the Web Archivist in the Archival Collections Management department at New York University Libraries. Nicole received her MA in Archives and Public History at NYU. She has previously worked at the Winthrop Group, the Center for Jewish History, and the Jewish Theological Seminary on a variety of analog and digital archives projects. She is currently the Communications Manager for the Web Archiving Section of the Society of American Archivists. Session 6: Social Science & Politics Thematic web crawling and scraping as a way to form focussed web archives Benedikt Adelmann MSc (University of Hamburg) Dr. Lina Franken (University of Hamburg) Keywords: web crawling, scraping, thematic focussed web archives, discourse analysis ABSTRACT For humanities and social science research on the contemporary, the web and web archives are growing in relevance. Yet few thematically based collections of websites are available. To find out about ongoing online discussions, web crawling and scraping are needed whenever a larger collection is to be generated as a corpus for further exploration. Within the study presented here, we focus on the acceptance of telemedicine and its challenges.
For the discourse analysis conducted (Keller 2005), the concept of telemedicine is often discussed within the broader field of digital health systems, with only a few relevant statements within single texts. Therefore, a large corpus is needed to identify relevant stakeholders and discourse positions and to go into the details of text passages: big data turns into small data and has to be filtered (see Koch/Franken 2019). Thematic web crawling and scraping (Barbaresi 2019: 30) is a major facilitator of these steps. Web crawling has to start from a list of so-called seed URLs, which in our case refer to the main pages of websites of organizations (e.g. health insurance companies, doctors’ or patients’ associations) known to be involved in the topic of interest. From these seed URLs, our crawl explores the network structure expressed by the (hyper)links between webpages in a breadth-first manner (see Barbaresi 2015: 120ff. for an overview of web crawling practices). It is able to handle content with the MIME types text/html, application/pdf, application/x-pdf and text/plain. Content text is extracted and linguistically pre-processed: tokenization, part-of-speech tagging, and lemmatization (reduction of word forms to their base forms). If the lemmatized text contains at least one of a set of pre-defined keywords (see Adelmann et al. 2019 for this semantic-field-based approach), the original content of the webpage (HTML, PDF etc.) is saved, as well as the results of the linguistic pre-processing. (Hyper)links from HTML pages are followed if they refer to (other) URLs of the same host. If, and only if, the HTML page is a match, links are also followed when their host is different. We employ some heuristics to correct malformed URLs and to detect a variety of non-trivial URL equivalences when testing whether a URL has already been visited by the crawler.
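The link-following rules described above can be sketched as a small breadth-first crawler. This is not the published hermA-Crawler code: `fetch` is a hypothetical callback returning `(mime, text, links)` or `None`, the lower-cased word set in `lemmas` is a crude stand-in for real tokenization, part-of-speech tagging and lemmatization, and the URL normalization rules are illustrative assumptions rather than the project's actual heuristics.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlsplit, urlunsplit

# MIME types the abstract says the crawler can handle.
ACCEPTED_MIME = {"text/html", "application/pdf", "application/x-pdf", "text/plain"}


def normalize(url):
    """Illustrative normalization: lower-case host, drop fragment and default port, empty path -> '/'."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    default_port = {"http": 80, "https": 443}.get(parts.scheme)
    netloc = host if parts.port in (None, default_port) else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path or "/", parts.query, ""))


def lemmas(text):
    """Crude stand-in for linguistic pre-processing: the set of lower-cased word tokens."""
    return {token.lower() for token in re.findall(r"\w+", text)}


def crawl(seeds, fetch, keywords, max_pages=1000):
    """Breadth-first thematic crawl; returns {url: matched keywords} for saved pages."""
    visited, queue = set(), deque()
    for seed in seeds:
        url = normalize(seed)
        if url not in visited:
            visited.add(url)
            queue.append(url)
    saved, fetched = {}, 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        fetched += 1
        page = fetch(url)
        if page is None or page[0] not in ACCEPTED_MIME:
            continue
        mime, text, links = page
        matched = sorted(keywords & lemmas(text))
        if matched:
            saved[url] = matched  # the real crawler also stores content and pre-processing output
        if mime != "text/html":
            continue  # only HTML pages have their links followed
        host = urlsplit(url).hostname
        for href in links:
            target = normalize(urljoin(url, href))
            if target in visited:
                continue
            # Same-host links are always followed; other hosts only from matching pages.
            if urlsplit(target).hostname == host or matched:
                visited.add(target)
                queue.append(target)
    return saved


if __name__ == "__main__":
    # Toy, in-memory "web" standing in for real HTTP fetches (hypothetical sites).
    web = {
        "https://insurer.example/": ("text/html", "Our services", ["/telemedicine", "https://other.example/"]),
        "https://insurer.example/telemedicine": ("text/html", "Telemedicine consultations", ["https://doctors.example/"]),
        "https://doctors.example/": ("text/html", "Telemedicine for patients", []),
        "https://other.example/": ("text/html", "Telemedicine news", []),
    }
    # other.example is never queued: it is linked only from a non-matching page.
    print(crawl(["https://insurer.example"], web.get, {"telemedicine"}))
```

Keeping cross-host links out of `visited` until they are actually queued means a host skipped from a non-matching page can still be reached later from a matching one.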
For saved pages, the crawler records the accessed URL, the date and time of access, and other metadata, including the matched keywords. URLs that were only visited (but not saved) are recorded without metadata, as are the links found between them. The script is published as hermA-Crawler (Adelmann 2019). When using focussed web archives formed in this way, it is easy to apply different approaches such as topic modelling (Blei 2012) or sentiment analysis (D'Andrea et al. 2015) to a larger base in order to support discourse analysis with digital humanities approaches. References: Adelmann, Benedikt; Andresen, Melanie; Begerow, Anke; Franken, Lina; Gius, Evelyn; Vauth, Michael: Evaluation of a Semantic Field-Based Approach to Identifying Text Sections about Specific Topics. In: Book of Abstracts DH2019. https://dh2019.adho.org/wp-content/uploads/2019/04/Short-Papers_23032019.pdf. Adelmann, Benedikt: hermA-Crawler. https://github.com/benadelm/hermA-Crawler. Barbaresi, Adrien: Ad hoc and general-purpose corpus construction from web sources. Doctoral dissertation, Lyon, 2015. Barbaresi, Adrien: The Vast and the Focused: On the need for thematic web and blog corpora. In: Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-7), Cardiff, 2019. DOI: https://doi.org/10.14618/ids-pub-9025 Blei, David M.: Probabilistic topic models. Surveying a suite of algorithms that offer a solution to managing large document archives. In: Communications of the ACM 55 (2012), pp. 77–84. D'Andrea, Alessia; Ferri, Fernando; Grifoni, Patrizia; Guzzo, Tiziana: Approaches, Tools and Applications for Sentiment Analysis Implementation. In: International Journal of Computer Applications 125 (2015), pp. 26–33. DOI: 10.5120/ijca2015905866. Keller, Reiner: Analysing Discourse. An Approach from the Sociology of Knowledge. In: Forum: Qualitative Social Research Volume 6, No. 3, Art. 32 (2005).
DOI: http://dx.doi.org/10.17169/fqs-6.3.19 Koch, Gertraud; Franken, Lina: Automatisierungspotenziale in der qualitativen Diskursanalyse. Das Prinzip des „Filterns“. In: Sahle, Patrick (ed.): 6. Tagung des Verbands Digital Humanities im deutschsprachigen Raum e.V. (DHd 2019). Digital Humanities: multimedial & multimodal. Universitäten zu Mainz und Frankfurt, March 25 to 29, 2019. Book of Abstracts, pp. 89–91. DOI: 10.5281/zenodo.2596095 Biographies: Benedikt Adelmann is a computer scientist at the University of Hamburg. Lina Franken is a cultural anthropologist at the University of Hamburg. Together, they are working within the collaborative research project “Automated modelling of hermeneutic processes – The use of annotation in social research and the humanities for analyses on health (hermA)”. See https://www.herma.uni-hamburg.de/en.html. Metadata for social science research Andrea Prokopová (Webarchiv, National Library of the Czech Republic) Keywords: web archiving, metadata, big data, social sciences, data mining ABSTRACT The Czech web archive of the National Library of the Czech Republic (Webarchiv), in operation since 2000, is one of the oldest in Europe. It is therefore able to provide methodological support to new web archives, and it also holds a large amount of harvested data. However, the data itself cannot be provided because of copyright; what can be used is the metadata of harvested web resources. Two years ago, sociologists from the Academy of Sciences of the Czech Republic showed interest in the data for their research. This started their cooperation with the Czech web archive and also with the Technical University in Pilsen. These three institutions are currently working together on the development of the Centralized Interface for the Web content and Social Networks Data Mining.
The datasets that researchers prepare on their own using the interface can be used for various kinds of data analysis and for interpreting social trends and changes in the Internet environment. In the first phase of the project, a basic analysis of the content of the web archive took place. This revealed that the web archive contains nearly 9.5 billion unique digital objects. These can be text, image, audio and video objects, or other digital objects (software, scripts, etc.). The analysis provided accurate information on the number of objects in the Webarchive at its current size. The next phase was the programming work itself. A prototype of the search engine already exists and is undergoing internal testing. Bibliography: BRÜGGER, Niels and Niels Ole FINNEMANN, 2013. The Web and digital humanities: Theoretical and methodological concerns. Journal of Broadcasting & Electronic Media [online]. 2013, pp. 66–80. ISSN 1550-6878. Available from: http://thelecturn.com/wp-content/uploads/2013/07/The-web-and-digital-humanities-Theoretical-and-Methodological-Concerns.pdf KVASNICA, Jaroslav, Marie HAŠKOVCOVÁ and Monika HOLOUBKOVÁ. Jak velký je Webarchiv? E-zpravodaj Národní knihovny ČR [online]. Prague: Národní knihovna ČR, 2018, 5(5), 6–7 [cited 2020-01-22]. Available from: http://text.nkp.cz/o-knihovne/zakladni-informace/vydane-publikace/soubory/ostatni/ez_2018_5.pdf KVASNICA, Jaroslav, Andrea PROKOPOVÁ, Zdenko VOZÁR and Zuzana KVAŠOVÁ. Analýza českého webového archivu: Provenience, autenticita a technické parametry. ProInflow [online]. 2019, 11(1) [cited 2020-01-22]. DOI: 10.5817/ProIn2019-1-2. ISSN 1804-2406. Available from: http://www.phil.muni.cz/journals/index.php/proinflow/article/view/2019-1-2 Webarchiv: O Webarchivu [online]. Prague, 2015 [cited 2020-01-22]. Available from: https://www.webarchiv.cz/cs/o-webarchivu Biography I work as a data analyst at the Czech Webarchive and also on a project called Centralized Interface for the Web content and Social Networks Data Mining.
Our goal is to provide datasets of metadata to scientists from the humanities, especially sociologists, for their future research and data analysis. Webarchiv is a part of the National Library of the Czech Republic. We harvest and archive all web sources on the Czech national domain. I study library studies and information science at Masaryk University, so I currently work in my field. I am a typical bookworm with a creative soul and a passion for photography. Exploring Web Archive Networks: The Case of the 2018 Irish Presidential Election Dr. Derek Greene (University College Dublin) Keywords: web archives, network analysis, data analysis, case study ABSTRACT The hyperlink structure of the Web can be used not only for search, but also to analyse the associations between websites. By representing large collections of web pages as a link network, researchers can apply existing methodologies from the field of network analysis. For web archives, we can use these methods to explore their content, potentially identifying meaningful historical trends. In recent years the National Library of Ireland (NLI) has selectively archived web content covering a variety of political and cultural events of public interest. In this work, we analyse an archive of websites pertaining to the 2018 Irish Presidential Election. The original archive consists of a total of 57,065 HTML pages retrieved in 2018. From this data we extracted all links appearing in these pages and mapped each link to a pair of domains. For our case study, we focus only on domain pairs in which the source and target domains are distinct, yielding 28,555 relevant domain pairs. Next, we created a directed weighted network representation. In this network, each node is a unique domain. Each edge from node A to node B indicates that there are one or more links in the pages on domain A pointing to domain B. Each edge also has a weight, indicating the number of links between the two domains.
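The abstract does not name the tooling used. A minimal sketch of the link-to-domain-network step follows, assuming links are available as (source URL, target URL) pairs and that a "domain" is the URL's hostname with any leading `www.` removed; that grouping rule, and the names `domain` and `build_domain_network`, are assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def domain(url: str) -> str:
    """Hostname of `url`, lower-cased, with a leading 'www.' removed (an assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_domain_network(links):
    """Build a directed weighted domain network from (source_url, target_url) pairs.

    Returns (nodes, edges), where edges maps (domain_a, domain_b) to the number of
    links from pages on domain_a to pages on domain_b. Links within one domain are dropped.
    """
    edges = Counter()
    for source_url, target_url in links:
        a, b = domain(source_url), domain(target_url)
        if a and b and a != b:  # keep only pairs with distinct source and target
            edges[(a, b)] += 1
    nodes = {d for pair in edges for d in pair}
    return nodes, edges
```

In this representation, `sum(edges.values())` corresponds to the number of relevant domain pairs, while `len(nodes)` and `len(edges)` correspond to the node and weighted-edge counts; the resulting edge list could then be handed to any graph library for layout and visualisation.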
This yielded a network with 263 nodes and 284 weighted directed edges. Using network diagrams generated from this data, we can visualise the link structure around the sites used to promote each presidential candidate, and how they relate to one another. This work highlights the potential insights which can be gained by using network analysis to explore web archives. These include the possible impact on collection development in the NLI selective web archive and the further study of the archived Irish web. Biography: Dr. Derek Greene is Assistant Professor at the UCD School of Computer Science and Research Investigator at the SFI Insight Centre for Data Analytics. He has over 15 years’ experience in AI and machine learning, with a PhD in Computer Science from Trinity College Dublin. He is involved in a range of interdisciplinary projects which involve applying machine learning methods in fields such as digital humanities, smart agriculture, and political science. Session 7: Collaborations & Teaching IIPC: training, collecting, research, and outreach activities Dr. Olga Holownia (International Internet Preservation Consortium / British Library) Keywords: web archiving, web archiving training, collaborative collections, Covid-19 web archive collections, web archiving resources ABSTRACT The International Internet Preservation Consortium (IIPC) was founded in 2003 in acknowledgement of “the importance of international collaboration for preserving Internet content for future generations”. Over the years, the IIPC members have worked together on multiple technical, curatorial, and educational activities. They have developed standards and supported open source web archiving tools and software. The annual General Assembly (GA) and Web Archiving Conference (WAC) have provided a forum for exchanging knowledge and forging new collaborations not only within the IIPC but also within the wider web archiving community and beyond.
This talk will give an update on the most recent activities, including the IIPC-funded projects as well as initiatives led by the working groups (training, collecting, and research), all of which fall under membership engagement and outreach overseen by the IIPC Portfolios. One of the key initiatives this year has been the “Novel Coronavirus (Covid-19) outbreak” transnational collection coordinated by the IIPC Content Development Group and organised in partnership with the Internet Archive. Seven months after the collection was launched in February 2020, over 9,000 sites from over 140 countries and over 160 top-level domains had been made available through Archive-It. We have also been publishing blog posts documenting the IIPC members’ efforts at capturing and archiving web content related to the pandemic within their national domains. This year also saw the publication of training materials designed and produced by the IIPC Training Working Group in partnership with the Digital Preservation Coalition. The first module, comprising eight sessions, is aimed at curators, policy makers and managers, or anyone who would like to learn the basics of web archiving, including what web archives are, how they work, and how web archive collections are curated. The programme helps participants acquire basic skills in capturing web archive content, and also teaches them how to plan and implement a web archiving programme. In terms of research activities, alongside work on the repository of web archiving resources at the University of North Texas (UNT) Digital Library and on enhancing the metadata in the Zotero bibliography, we have been promoting the outcomes of the IIPC-funded projects through a series of webinars organised by the Research Working Group. Among the funded projects are a set of introductory Jupyter Notebooks developed by Tim Sherratt, the creator of the GLAM Workbench, and LinkGate, a tool for graph visualisation of web archives aided by an inventory of use cases.
The former project was led by the UK Web Archive based at the British Library, in partnership with the Australian and the New Zealand web archives; the latter is a collaboration between Bibliotheca Alexandrina and the National Library of New Zealand. References About IIPC: https://netpreserve.org/about-us IIPC Working Groups: https://netpreserve.org/about-us/working-groups IIPC Projects: https://netpreserve.org/projects IIPC General Assembly and Web Archiving Conference: https://netpreserve.org/general-assembly IIPC collections in the UNT Digital Library: https://digital.library.unt.edu/explore/partners/IIPC IIPC members’ COVID-19 collections: https://netpreserveblog.wordpress.com/tag/covid-19-collection “Novel Coronavirus (Covid-19) outbreak” collaborative collection: https://archive-it.org/collections/13529 Biography Olga Holownia is Programme and Communications Officer based at the British Library. She manages the communications and provides support to the programmes of the International Internet Preservation Consortium (netpreserve.org). Her key projects include the organisation of the annual IIPC General Assembly and Web Archiving Conference as well as associated training and events. She is a co-chair of the IIPC Research Working Group. Using Web Archives to Teach and Opportunities in the Information Science Field
Dr. Juan-José Boté (Universitat de Barcelona) Keywords: digital preservation, teaching, web archives, emulator, archiving software ABSTRACT Web archives are a useful tool for teaching different subjects to students: not only history but also courses such as digital preservation, information architecture, or metadata structures. The digital preservation of web archives offers a unique set of challenges when teaching students about information science. The first is teaching search strategies. Web archives have specific search tools, and it is necessary to develop search strategies before beginning any search. For instance, one of the main challenges for students is learning how to look for information across collections or to look for a precise website. Secondly, in addition to search strategies, students need to learn how to find and use old software to run images, videos, or other informational content. Part of the search process includes checking whether the archived software was commercial and whether it can be used for free, perhaps with some limitations. Running old software downloaded from web archives sometimes also requires emulators. Emulators are not always available in web archives, in which case students must take a further step in order to run the old software. In addition, when students set up archiving software, it is useful for them to know how it works. Testing the possibilities of archiving software is often limited to small scenarios because of the constraints of the course. Exposure to archiving software would permit students to learn the process of building small collections or creating new datasets of archived websites. In this paper I explore different uses of web archives as a teaching resource in the information science field, which is especially helpful in a digital preservation course.
Biography: Juan-José Boté is Assistant Professor at Universitat de Barcelona, where he is also the coordinator of the Postgraduate Program on Social Media Content. His research is focused on digital preservation and cultural heritage. Session 8: Research of Web Archives Web archiving - professionals and amateurs Bartłomiej Konopa (State Archives in Bydgoszcz; Nicolaus Copernicus University) Keywords: web archives, professional web archiving, amateurish web archiving, ArchiveTeam, comparative study ABSTRACT Web archiving can be defined as "any form of deliberate and purposive preserving of web material" (Brügger, 2011). That broad definition allows us to divide web archiving on numerous levels and to distinguish many types of it. One possible distinction is between professional and amateurish archiving. Professional archives can be understood as big projects led mainly by national libraries, which employ experts and have strict regulations, for example the UK Web Archive and the Danish Netarkivet. They are interested in national Webs and mainly preserve resources from one ccTLD in routine and repeatable crawls. Sometimes these archives build special collections, but very often these are predictable and related to "real world" events, for instance national elections. On the other side, initiatives such as ArchiveTeam, which are open to Internet users and do not have rigorous rules, can be recognised as amateurish. They react to what is happening on the Web, observe endangered websites and services, and try to preserve them. Their actions are spontaneous and one-off, but precisely aimed at the resources that would otherwise be lost. Both sides are trying to preserve web resources because they consider them digital heritage that needs to be saved for future generations. However, despite the shared goal, professional and amateurish archives visibly differ in the way they function and in the materials they are interested in, as described above.
The paper will search for these differences and analyse their influence on how and what will be archived, and then made available to those who want to experience and research the past Web. To reach this goal, the author will compare the UK Web Archive and Netarkivet with ArchiveTeam. The main sources of information about these projects will be papers, news and their websites. The most important elements of this study will be selection policy and criteria, scope, frequency and methods of archiving, and access rules. It will show differences in thinking about the Web, its borders, and ways of preserving and sharing this digital heritage. These factors will also have an impact on what resources will be available for later studies. Biography: Bartłomiej obtained his master's degree in archival science in 2007; he is currently a senior archivist at the State Archives in Bydgoszcz and a PhD student at the Nicolaus Copernicus University in Toruń (Poland). He is preparing a doctoral dissertation on Web archives, which are his main research interest. He collaborated with the web archiving lab "webArch", a pioneering project to popularize this issue in Poland. Session 9: Research Approaches Digital archaeology in the web of links: reconstructing a late-90s web sphere Dr Peter Webster (Independent Scholar, Historian and Consultant) Keywords: web spheres, method, link graphs, link analysis, reconstruction ABSTRACT As interest in Web history has grown, so has the understanding of the archived Web as an object of study. But there is more to the Web than individual objects and sites. This paper is an exercise in understanding a particular ‘web sphere’. Niels Brügger defines a web sphere as ‘web material … related to a topic, a theme, an event or a geographic area’ (Brügger 2018). I posit a distinction between ‘hard’ and ‘soft’ web spheres, defined in terms of the ease with which their boundaries may be drawn, and the rate at which those boundaries move over time.
Examples of hard web spheres are those of organisations that have clear forms of membership or association: e.g. the websites of the individual members of the European Parliament. The study of ‘soft’ web spheres tends to present additional difficulties, since topics or themes are harder to define when they are not expressed in institutional terms. The definition of ‘European politics’ may be contested in ways that ‘membership of the European Parliament’ may not. I present a method of reconstructing just such a soft web sphere, much of which is lost from the live web and exists only in the Internet Archive: the web estate of conservative Christian campaign groups in the UK in the 1990s and early 2000s. The historian of the late 1990s has a problem. The vast bulk of content from the period is no longer on the live web; there are few, if any, indications of what has been lost, since there is no inventory of the 1990s Web against which to check; and of the content that was captured by the Internet Archive, only a superficial layer is exposed to full-text search, while the bulk may only be retrieved by searching for the URL. We do not know what was never archived, and in the archive it is difficult to find what we might want, since there is no means of knowing the URL of a lost resource. We need, then, to understand the archived Web using only the technical data about itself that it can be made to disclose. This method of web sphere reconstruction is based not on page content but on the relationships between sites, i.e., the web of hyperlinks. The method is iterative, involving the computational interrogation of large datasets from the British Library and the close examination of individual archived pages, along with the use of printed and other non-digital sources. It builds upon recent studies which explore the available primary sources from outside the Web from which it may be reconstructed (Nanni 2017; Teszelszky 2019; Ben-David 2016; Ben-David 2019).
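The abstract does not spell out the iteration, so the following is only one hypothetical reading of a link-based reconstruction: a snowball expansion over a host-level link graph. It starts from seed hosts (for example, ones identified from printed sources), repeatedly adds hosts that receive links from at least `min_inlinks` current members, and uses an `accept` callback as a stand-in for the close examination of archived pages. The threshold, the callback and the function name `expand_sphere` are all assumptions, not the author's method.

```python
from collections import Counter


def expand_sphere(links, seeds, min_inlinks=2, max_rounds=10, accept=lambda host: True):
    """Iteratively grow a set of hosts from `seeds` over a host-level link graph.

    links: dict mapping a host to the *set* of hosts it links to.
    A host joins the sphere when at least `min_inlinks` current members link to it
    and `accept(host)` (standing in for manual close examination) approves it.
    """
    sphere = set(seeds)
    for _ in range(max_rounds):
        inlinks = Counter()
        for member in sphere:
            for target in links.get(member, ()):
                if target not in sphere:
                    inlinks[target] += 1  # values are sets, so each member counts once per target
        new_hosts = {h for h, n in inlinks.items() if n >= min_inlinks and accept(h)}
        if not new_hosts:
            break  # nothing left to add: the sphere has stabilised
        sphere |= new_hosts
    return sphere
```

Requiring several in-links from existing members is a conservative choice: it keeps one-off links (to portals or news sites, say) from pulling unrelated hosts into a soft sphere, while the `accept` step leaves the final judgement to the historian.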
It develops my earlier work in which the method was applied to smaller, less complex spheres (Webster 2017; Webster 2019). References: Ben-David, Anat. 2016. What does the Web remember of its deleted past? An archival reconstruction of the former Yugoslav top-level domain. New Media and Society 18, 1103-1119. https://doi.org/10.1177/1461444816643790 Ben-David, Anat. 2019. National web histories at the fringe of the Web: Palestine, Kosovo and the quest for online self-determination. In: The Historical Web and Digital Humanities: the Case of National Web domains, eds Niels Brügger & Ditte Laursen, 89-109. London: Routledge. Brügger, Niels. 2018. The archived Web: Doing history in the digital age. Cambridge, MA: MIT Press. Nanni, Federico. 2017. Reconstructing a website’s lost past: methodological issues concerning the history of Unibo.it. Digital Humanities Quarterly 11. http://www.digitalhumanities.org/dhq/vol/11/2/000292/000292.html Teszelszky, Kees. 2019. Web archaeology in The Netherlands: the selection and harvest of the Dutch web incunables of provider Euronet (1994–2000). Internet Histories 3, 180-194, DOI: 10.1080/24701475.2019.1603951 Webster, Peter. 2017. Religious discourse in the archived web: Rowan Williams, archbishop of Canterbury, and the sharia law controversy of 2008. In: The Web as History, eds Niels Brügger & Ralph Schroeder, 190-203. London: UCL Press. Webster, Peter. 2019. Lessons from cross-border religion in the Northern Irish web sphere: understanding the limitations of the ccTLD as a proxy for the national web. In: The Historical Web and Digital Humanities: the Case of National Web domains, eds Niels Brügger & Ditte Laursen, 110-23. London: Routledge. Biography: Dr Peter Webster is an independent scholar and consultant, and founder and managing director of Webster Research and Consulting (UK). He has published widely on the use of Web archives for contemporary history.
Web defacements and takeovers and their role in web archiving Michael Kurzmeier (Maynooth University) Keywords: defaced websites; hacktivism; cybercrime archives; Geocities; web archives ABSTRACT This paper will provide insight into the archiving and utilization of defaced websites as ephemeral, non-traditional web resources. Web defacements as a form of hacktivism are rarely archived and are thus mostly lost to systematic study. When they do find their way into web archives, it is often as a by-product of a larger web archiving effort rather than as the result of a targeted effort. Aside from large collections such as Geocities, which during a crawl might pick up a few hacked pages, there also exists a small scene of community-maintained cybercrime archives that archive hacked websites, some of which were hacked in a hacktivist context. By examining sample cases of cybercrime archives, the paper will show the ephemerality of their content and introduce a framework for analysis. As more and more of our daily communication happens digitally, marginalized and counter-public groups have often used new media to overcome real-world limitations. This phenomenon can be traced back to the early days of the Web. This paper will provide an overview of defacements on the web and show the role web archives play in understanding these phenomena. Web defacements are ephemeral content and as such are especially prone to link rot and deletion. They can provide not only information on the history of a single web page; they can also be seen as artifacts of a struggle for attention. Contextualized with metadata and the original page, defacements can help restore such lost histories. The current state, however, is that only a few collections are still online, with only one collection still accepting new material and none in a condition to be used for academic research.
Finding relevant defacements in collections like those mentioned is a challenge, especially since there is little conformity in content, language and layout among the people who hack websites. The paper will introduce different methodological approaches for identifying defacements and related pages. Biography: Michael Kurzmeier is a fourth-year PhD candidate in Digital Humanities and recipient of the Irish Research Council Postgraduate Scholarship. His research interest is the intersection between technology and society. His PhD thesis investigates the use of hacktivism as a tool of political expression. The research is grounded in an understanding of a contested materiality of communication, in which hacktivism is one method to occupy contested space. Michael is working with Kylie Jarrett (MU Media Studies) and Orla Murphy (UCC Digital Humanities). ORCID: https://orcid.org/0000-0003-4925-5197. Session 10: Culture & Sport MyKnet.org: Traces of Digital Decoloniality in an Indigenous Web-Based Environment Dr. Philipp Budka (University of Vienna; Free University Berlin) Keywords: MyKnet.org, indigenous web-based environment, digital decoloniality, internet history, anthropology ABSTRACT This paper discusses traces of digital decoloniality (e.g., Deem 2019) by exploring the history of the indigenous web-based environment MyKnet.org. By considering the cultural and techno-social contexts of First Nations' everyday life in Northwestern Ontario, Canada, and by drawing on ethnographic fieldwork (e.g., Budka 2015, 2019), it critically reviews theoretical accounts and conceptualizations of change and continuity that have been developed in an anthropology of media and technology (e.g., Postill 2017). In so doing, it examines how techno-social change and cultural continuity can be conceptualized in relation to each other and in the context of (historical) processes of digital decoloniality.
In 1994, the tribal council Keewaytinook Okimakanak (KO) established the Kuh-ke-nah Network (KO-KNET) to connect indigenous people in Northwestern Ontario's remote communities through and to the internet. At that time, local telecommunication infrastructure was almost non-existent. KO-KNET started with a simple bulletin board system that developed into a community-controlled ICT infrastructure, which today includes landline and satellite broadband internet as well as internet-based mobile phone communication. Moreover, KO-KNET established services that became widely popular among the local indigenous communities, such as the web-based environment MyKnet.org. MyKnet.org was set up in 1998 exclusively for First Nations people to create and maintain personal homepages within a cost-free and commercial-free space on the web. Particularly between 2004 and 2008, MyKnet.org was extremely popular, for two main reasons. First, MyKnet.org enabled people to establish and maintain social relationships across spatial distance in an infrastructurally disadvantaged region. They communicated through the homepages’ communication boxes and linked their homepages to the pages of family members and friends, thus creating a “digital directory” of indigenous people in Northwestern Ontario. Second, MyKnet.org contributed to different forms of cultural representation and identity construction. Homepage producers used the service to represent and negotiate their everyday lives by displaying and sharing pictures, music, texts, website layouts, and artwork. During fieldwork in Northwestern Ontario (2006-2008), many people told me stories about their first MyKnet.org websites in the early 2000s and how they evolved. People vividly described how their homepages were designed and structured and which other websites they were linked to.
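Checking such recollections against archived copies of the homepages can be done by hand in the Internet Archive's Wayback Machine, but listing captures can also be scripted. The following is a minimal Python sketch of that step, not the author's procedure: it assumes the Internet Archive's public CDX server at `https://web.archive.org/cdx/search/cdx` with its `url`, `output=json`, `from` and `to` parameters, and it parses a made-up sample response instead of making a network request.

```python
import json
from urllib.parse import urlencode

# Internet Archive CDX server (assumed endpoint and parameters, as publicly documented).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url, from_year=None, to_year=None):
    """Build a CDX query that lists captures of `url` as JSON rows."""
    params = {"url": url, "output": "json"}
    if from_year is not None:
        params["from"] = str(from_year)
    if to_year is not None:
        params["to"] = str(to_year)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(text):
    """Convert a CDX JSON response (first row = field names) into dicts."""
    if not text.strip():
        return []
    rows = json.loads(text)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, row)) for row in records]


def replay_url(capture):
    """Wayback Machine address for replaying a single capture."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"


# Made-up response for illustration only; no network request is made here.
SAMPLE = (
    '[["urlkey","timestamp","original","mimetype","statuscode","digest","length"],'
    '["org,myknet)/","20050301000000","http://www.myknet.org/","text/html","200","EXAMPLEDIGEST","1234"]]'
)

for capture in parse_cdx_json(SAMPLE):
    print(capture["timestamp"], replay_url(capture))
```

Restricting `from` and `to` to the years informants talk about (for example 2004 to 2008) keeps the capture list small enough to read alongside interview notes.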
To deepen my interpretation and understanding of these stories, I used the Internet Archive's Wayback Machine to recover archived versions of these websites whenever possible. The Wayback Machine thus became an important methodological tool for my research into the decolonial history of MyKnet.org and related practices and processes of techno-social change and cultural continuity. References: Budka, P. (2019). Indigenous media technologies in “the digital age”: Cultural articulation, digital practices, and sociopolitical concepts. In S. S. Yu & M. D. Matsaganis (Eds.), Ethnic media in the digital age (pp. 162-172). New York: Routledge. Budka, P. (2015). From marginalization to self-determined participation: Indigenous digital infrastructures and technology appropriation in Northwestern Ontario's remote communities. Journal des Anthropologues, 142-143(3), 127–153. Deem, A. (2019). Mediated intersections of environmental and decolonial politics in the No Dakota Access Pipeline movement. Theory, Culture & Society, 36(5), 113–131. Postill, J. (2017). The diachronic ethnography of media: From social changing to actual social changes. Moment. Journal of Cultural Studies, 4(1), 19–43. Biography: Philipp Budka is a Lecturer in the Department of Social and Cultural Anthropology, University of Vienna, and in the M.A. program Visual and Media Anthropology at the Free University Berlin. His research areas include digital anthropology and ethnography, the anthropology of media and technology, and visual culture and communication. He is the co-editor of Ritualisierung – Mediatisierung – Performance (Vienna University Press, 2019) and Theorising Media and Conflict (Berghahn Books, in press). His research has also been published in journals and books such as Journal des Anthropologues, Canadian Journal of Communication and Ethnic Media in the Digital Age (Routledge, 2019). From the sidelines to the archived web: What are the most annoying football phrases in the UK?
Helena Byrne (British Library) Keywords: Football, Annoying Football Phrases, Shine, UK Web Archive, Web Archive Case Study ABSTRACT As news and TV coverage of football has increased in recent years, there has been growing interest in the type of language and phrases used to describe the game. Online, there have been numerous news articles, blog posts and lists on public internet forums about the most annoying football clichés. However, all these lists focus on the men’s game, and finding a similar list for women’s football online was very challenging. Only by posting a tweet with a survey asking the public “What do you think are the most annoying phrases to describe women’s football?” was I able to collate an appropriate sample to work through. The absence of any such list in a similar format highlights the issue of gender inequality online, which reflects wider society. I filtered a sample of the phrases from men’s and women’s football to find the top five most annoying phrases. I then ran these phrases through the UK Web Archive Shine interface to determine their popularity on the archived web. The UK Web Archive Shine interface was first developed in 2015, as part of the Big UK Domain Data for the Arts and Humanities project. This presentation will assess how useful the Trends function of the Shine interface is for determining the popularity of a sample of selected football phrases on the UK web from 1996 to 2013. The Shine interface searches across 3,520,628,647 distinct records from the .uk domain, captured from January 1996 to 6 April 2013. This paper goes through the challenges of using the Shine interface to determine what the most annoying football phrases on the archived UK web are. The example highlights how working with this resource differs from working with digitised publications and what strategies can be employed to gain meaningful answers to research questions.
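Because a domain crawl generally captures far more pages in later years than in 1996, raw hit counts for a phrase tend to rise over time regardless of its popularity, so comparing phrases across years calls for normalising by the number of documents captured each year. The Python sketch below shows that normalisation on invented counts. It assumes, rather than confirms, that a Trends-style view reports yearly shares in this way; the phrase labels and numbers are hypothetical placeholders, not results from this study, and ranking by the mean yearly share is just one possible choice.

```python
def relative_frequency(matches_per_year, totals_per_year):
    """Share of each year's captured documents that contain a phrase.

    matches_per_year: {year: documents matching the phrase}
    totals_per_year:  {year: all documents captured that year}
    Years with no captured documents are skipped to avoid dividing by zero.
    """
    return {
        year: matches_per_year.get(year, 0) / total
        for year, total in sorted(totals_per_year.items())
        if total > 0
    }


def rank_phrases(phrase_matches, totals_per_year):
    """Order phrases by their mean yearly share, highest first."""
    scores = {}
    for phrase, matches in phrase_matches.items():
        shares = relative_frequency(matches, totals_per_year)
        scores[phrase] = sum(shares.values()) / len(shares) if shares else 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented counts for two placeholder phrases (not data from this study).
totals = {2000: 1000, 2001: 2000}
phrases = {"phrase A": {2000: 10, 2001: 10}, "phrase B": {2001: 40}}
for phrase, score in rank_phrases(phrases, totals):
    print(f"{phrase}: mean yearly share {score:.4f}")
```

Averaging yearly shares weights every year equally; weighting by the number of captured documents, or comparing peak years, would be equally defensible choices.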
It is hoped that the findings from this study will be of interest to the footballing world but, more importantly, will encourage further research in sports and linguistics using the UK Web Archive. References: Byrne, Helena. (2018). What do you think are the most annoying phrases to describe women’s football? https://footballcollective.org.uk/2018/05/18/what-do-you-think-are-the-most-annoying-phrases-to-describe-womens-football/ (Accessed August 26, 2018) Jackson, Andrew. (2016). Introducing SHINE 2.0 – A Historical Search Engine. Retrieved from: http://blogs.bl.uk/webarchive/2016/02/updating-our-historical-search-service.html (Accessed August 26, 2018) Biography: Helena Byrne is the Curator of Web Archives at the British Library. She was the Lead Curator on the IIPC CDG 2018 and 2016 Olympic and Paralympic collections. Helena completed her Master’s in Library and Information Studies at UCD in 2015. Previously she worked as an English language teacher in Turkey, South Korea and Ireland. Session 11: Research (Lightning Round) Tracking and Analysing Media Events through Web Archives Caio de Castro Mello Santos (School of Advanced Study, University of London) Daniela Cotta de Azevedo Major (School of Advanced Study, University of London) Keywords: Digital Humanities; Media Events; Web Archives; Discourse Analysis ABSTRACT Over the last two decades, media outlets have grown more reliant on online platforms to spread news and ideas. Web Archives are a valuable tool for analysing the recent past as well as the present social and political context. However, using Web Archives for research can be challenging due to the amount of data and its access limits. This project aims to develop mechanisms to extract, process and analyse data in order to provide scholars with a model to explore the impact of massive media events over the last couple of decades.
Two events have been taken as case studies: the London 2012 and Rio 2016 Olympics, and the European Parliamentary Elections from 2004 to 2019. Regarding the Olympics, we aim to understand how online media have described the legacies of the London 2012 and Rio 2016 Olympics and how the choices made by gatekeepers (news editors, journalists) influence the narrative about the consequences of both events. The study of the media coverage of the European elections, in turn, can shed light on how political concepts such as nationalism and integration affect European public opinion and its attitudes towards the European institutions. Given the geographical and temporal range of these projects, we will focus on different yet complementary web archiving initiatives such as the Internet Archive, the UK Web Archive and Arquivo.pt. This project is being developed as part of the Cleopatra Training Network under a PhD in Digital Humanities. The research therefore combines traditional methods, such as discourse analysis through qualitative close reading, with quantitative computational methods through distant reading. This approach aims to provide examples of how to apply this type of data to the interpretative methodologies of the Social Sciences. Biographies: Daniela Major: Early Stage Researcher at the School of Advanced Study. Her doctoral project is on the media coverage of the European Elections 2004-2019. She holds a Master of Letters in Intellectual History from the University of St Andrews and is a former research fellow at Arquivo.pt. Caio Mello: Early Stage Researcher at the School of Advanced Study, University of London. Journalist with a master’s in communication (UFPE – Brazil). Former research fellow at the Center for Advanced Internet Studies (CAIS - Germany). Reanimating the CDLink platform: A challenge for the preservation of mid-1990s Web-based interactive media and net.art Dr.
Eamonn Bell (Trinity College Dublin) Keywords: compact disc, Web, preservation, music, interactive multimedia ABSTRACT The Voyager Company realised the creative and commercial potential of mixed-mode CD-ROMs as the platform par excellence for interactive multimedia. Starting in 1989 with the release of a HyperCard-based interactive listening guide for Beethoven's Symphony No. 9, Voyager tightly integrated rich multimedia, hyperlinked text, and high-quality audiovisual recordings into over 50 software releases for Mac and PC well into the late 1990s. Consolidating their expertise in computer-controlled optical media with LaserDiscs, Voyager developed AudioStack: a set of extensions for the HyperCard environment that allowed fine-grained software control of high-fidelity audio stored on conventional optical media. AudioStack led to a cross-platform technology designed for use on the Web called CDLink, comprising CD-ROM controller drivers, extensions for Macromedia Shockwave and the plain-text Voyager CDLink Control Language. CDLink enabled and inspired commercial ventures and amateur productions alike, such as Sony Music's short-lived ConnecteD experiment, the small but dedicated community of fan sites that published time-synced lyric pages alongside hyperlinked commentaries for popular records, and even experimental sonic net.art in Mark Kolmar's Chaotic Entertainment (1996). As Volker Straebel (1997) has pointed out, Kolmar's work used CDLink files to probabilistically remix and loop the contents of the user's own CD collection in code, evincing creative tactics similar to those of contemporary experimental musicians and sound artists. Owing to the mostly obsolete hardware and software dependencies of the CDLink platform and the challenges posed by the fading born-digital traces of the mid-1990s Web, CDLink-dependent artifacts are difficult to preserve and make accessible.
I summarise the above-mentioned developments that culminated in CDLink and describe the challenges of preserving Kolmar's artwork and making it available to future audiences, as well as those of the larger so-called "extended CD" ecosystem, which flourished during this decade. Biography: Eamonn Bell is a Research Fellow at the Department of Music, Trinity College Dublin. His current research focus is on the cultural history of the digital Audio CD format told from a viewpoint between musicology and media studies. In 2019, Eamonn was awarded a Government of Ireland Postdoctoral Fellowship in support of this two-year project, ‘Opening the “Red Book”’. He holds a doctorate in music theory from Columbia University (2019), where he wrote a dissertation on the early history of computing in the analysis of musical scores. He also holds a bachelor's degree in music and mathematics from Trinity College Dublin (2013). His research engages the history of digital technology as it relates to musical production, consumption, and criticism in the twentieth century. Curating culturally themed collections online: The 'Russia in the UK' Special Collection, UK Web Archive Hannah Connell (King’s College London; British Library) Keywords: Curatorship, diaspora, media, community, web archiving ABSTRACT The researcher-curated special collection Russia in the UK is part of the UK Web Archive, hosted by the British Library. This collection comprises a selection of websites created for and by the Russian-speaking population in the UK. This paper will explore the challenges of creating and maintaining web archival collections. I will discuss the difficulties of determining the parameters of this special collection. Alongside the impact of the single curatorial voice in shaping a collection, this paper will address the ways in which the legal and technical infrastructure underlying web archiving affects the shape of a collection.
I will examine how the decision-making process behind curating and expanding this collection encourages reflection on the specific cultural context of Russian migration to the UK and complicates the notion of a culturally themed diaspora collection. The Russia in the UK special collection is public but still growing. This collection is valuable to researchers both as a resource for further research and as a means of questioning research practices. The practice of creating and maintaining a special collection such as Russia in the UK influences the shape of the collection and the online representation of the diasporic community it reflects. This paper will examine how the ongoing process of research and selection can be broadened to include new curators. I will discuss the ways in which a broader community can be involved in the curation process and the future development of this special collection. Biography: Hannah is undertaking an AHRC-funded collaborative PhD studentship with the British Library and King’s College London exploring interwar migration from Russia through Russian-language émigré publishing. The selection of content for the UKWA ‘Russia in the UK’ special collection forms part of this research, reflecting the ways in which diasporic communities continue to preserve and contribute to a shared identity through new forms of media today. Session 12: Youth & Family DELETE MY ACCOUNT: Ethical Approaches to Researching Youth Cultures in Historical Web Archives Katie Mackinnon (University of Toronto) Keywords: web history, web archives, research ethics, youth cultures, 1990s web ABSTRACT Over the past 25 years the web has become an “unprecedentedly rich primary source…it is where we socialise, learn, campaign and shop. All human life, as it were, is vigorously there” (Winters, 2017).
Web archives, as an increasingly important resource for writing social, cultural, political, economic, and legal histories, pose new challenges for historians, who must learn how to “navigate this sea of digital material” (Milligan, 2012). Throughout these past few decades, young people have been a focus of research on digital cultures and participation (Turkle, 1995; Kearney, 2006; Scheidt, 2006; Ito et al., 2010; boyd, 2014; Vickery, 2017; Watkins et al., 2018). The early web communities of GeoCities that are available on the Internet Archive are a unique and incredibly fruitful resource for studying youth participation in the early web (Milligan, 2017) in a way that gives youth voices autonomy and agency. New challenges emerge when applying computational methodologies and tools to youth cultures in historical web archives at scale. This paper considers the challenges of: 1) researching and writing about young people divulging personal details about their lives without the possibility of informed consent; 2) accurately contextualizing web pages within wider online communities; and 3) engaging with the socio-political climates young people were experiencing as they explored the Web, focusing on the intersections of race, gender, sexuality, class, geography, and cultural and social pressures. The EU’s “Right to be Forgotten” (2014) and GDPR (2018) call into question the regularity with which young people become “data subjects” through their proximity to social networking sites, whether through family, friends or themselves. Young people’s data is subject to commodification, surveillance, and archiving without consent. Researchers engaging with historical web material have a responsibility to develop better practices of care.
This paper further develops frameworks to ethically research young people’s historical web content in digital archives that account for the sensitive nature of web materials (Adair, 2018; Eichhorn, 2019), the lack of consent protocols available to historical web researchers (AoIR IRE 3.0, 2019), and the ways in which computational methods and big data research attempts often fail to anonymize data (Brügger & Milligan, 2018). Web history research puts living human subjects at the forefront of historical research, something historians are not particularly well versed in. This paper surveys ethical approaches to internet and web archive research (Lomborg, 2018; Schäfer & Van Es, 2017; Whiteman, 2012; Weltevrede, 2016), identifies gaps in the study of historical web youth cultures and suggests next steps. Works Cited: Adair, Cassius. (2019). “Delete Yr Account: Speculations on Trans Digital Lives and the Anti-Archival.” Digital Research Ethics Collaboratory. http://www.drecollab.org/ Brügger, Niels and Ian Milligan. (2018). The SAGE Handbook of Web History. London: Sage. Bruckman, Amy, Kurt Luther, and Casey Fiesler. 2015. “When Should We Use Real Names in Published Accounts of Internet Research?,” in Eszter Hargittai and Christian Sandvig (eds) Digital research confidential: the secrets of studying behavior online. Cambridge, Mass: MIT Press. DiMaggio, P., E. Hargittai, C. Celeste and S. Shafer. (2004). “Digital inequality: From unequal access to differentiated use.” In Social Inequality, ed. K. Neckerman. Russell Sage Foundation. Eichhorn, Kate (2019). The end of forgetting: growing up with social media. Cambridge, Mass: Harvard University Press. franzke, a.s., Bechmann, A., Zimmer, M. & Ess, C.M. (2019) Internet Research: Ethical Guidelines 3.0, Association of Internet Researchers, www.aoir.org/ethics. Ito et al. (2010). Hanging Out, Messing Around, and Geeking Out: Kids Living and Learning with New Media. MIT Press. Jenkins, H., M. Ito, and d. boyd.
(2016). Participatory Culture in a Networked Era: A Conversation on Youth, Learning, Commerce, and Politics. Polity. Kearney, M. C. (2006). Girls Make Media. Routledge. Kearney, M. C. (2007). “Productive spaces: Girls’ bedrooms as sites of cultural production.” Journal of Children and Media, 1, 126-141. Lincoln, S. (2013). “‘I’ve Stamped My Personality All Over It’: The Meaning of Objects in Teenage Bedroom Space.” Space and Culture, 17(3), 266–279. Lomborg, Stine. (2018). “Ethical Considerations for Web Archives and Web History Research,” in SAGE Handbook of Web History, eds. Niels Brügger and Ian Milligan. Milligan, Ian. (2017). “‘Pages by Kids, For Kids’: Unlocking Childhood and Youth History through Web Archived Big Data,” in The Web as History, eds. Niels Brügger and Ralph Schroeder, UCL Press. Schäfer, Mirko Tobias, and Karin Van Es. (2017). The datafied society: studying culture through data. Amsterdam University Press. Scheidt, L. A. (2006). “Adolescent diary weblogs and the unseen audience,” in Digital Generations: Children, Young People, and New Media, ed. D. Buckingham and R. Willett. Erlbaum. Skelton, T. and Valentine, G. (1998). Cool Places: Geographies of Youth Cultures. Routledge. Turkle, Sherry. (1995). Life on the Screen: Identity in the Age of the Internet, Simon and Schuster. van Dijck, José, Thomas Poell, and Martijn de Waal. (2018). The Platform Society: Public Values in a Connective World. New York: Oxford University Press. Vickery, J. R. (2017). Worried about the wrong things: Youth, risk, and opportunity in the digital world. Cambridge, MA: MIT Press. Watkins, S. C. et al. (2018). The Digital Edge: How Black and Latino Youth Navigate Digital Inequality. NYU Press. Weltevrede, Esther. (2016). Repurposing digital methods. The research affordances of platforms and engines. PhD Dissertation, University of Amsterdam. Whiteman, Natasha. (2012). “Ethical Stances in (Internet) Research,” in Undoing Ethics, by Natasha Whiteman, 1–23.
Boston, MA: Springer US, 2012. Winters, Jane. (2017) “Breaking in to the mainstream: demonstrating the value of internet (and web) histories,” Internet Histories, 1:1-2, 173-179. Biography: Katherine (Katie) Mackinnon is a Ph.D. candidate at the University of Toronto in the Faculty of Information. She researches web histories, including early uses of the internet by young people in the 1990s, through a case study of the popular website ‘GeoCities’. She is particularly interested in using web archives to conduct historical work, focusing on youth expressions of identity and community within their specific socio-political contexts. Changing platforms of ritualized memory practices. Assessing the value of family websites Dr. Susan Aasman (University of Groningen) Keywords: web archives, vernacular culture, amateur media, web archaeology, technologies of memory ABSTRACT In this presentation I want to introduce research on current personal digital archival practices, as they have shifted from private spaces to more public platforms. I would especially like to discuss the value of concrete everyday practices of storing and sharing multimodal family records on family websites of the late 1990s and early 2000s. In addition, I will address the vulnerability of these archival practices by introducing the case of a particular family website hosted by the well-known Dutch provider XS4ALL, which will close its service permanently. Although the National Library of the Netherlands (KB) has started to collect XS4ALL websites, when it comes to selecting and preserving online personal archives, there is still a need to raise awareness about these deeply meaningful memory practices. For one, these types of memory staging practices have a history much older than the history of the web suggests; they belong to a longue durée history of technologies of memory production and distribution.
At the same time, understanding these family-oriented websites as designed in the 1990s and early 2000s gives us an excellent opportunity to understand the specificities of the shift from private to public, and from analogue to digital. This research is part of a larger agenda that addresses the urgent issue of the long-term preservation of amateur media and how technological, political, social and cultural factors influence how we appraise and archive the often ephemeral expressions of amateur media. Digital material in particular poses multiple challenges, one of them being the sustainability of many forms and formats of amateur media. The challenge is a task shared by public cultural heritage institutions, commercial parties, scholars and individuals alike. The archival strategies and the choices of what to keep and what to delete may resonate for decades to come. The presentation will argue that the complexities and contradictions that characterize present-day amateur media culture are mirrored by and reproduced in the complexities and contradictions of archiving digital memories. There are no simple solutions and no simple guidelines, as amateur media archives, whether personal or collective, analogue or digital, have been caught up in ethically, emotionally, commercially and politically contested areas and bear the burden of being technological, material, and personal. Biography: Dr. Susan Aasman is associate professor at the Centre for Media and Journalism Studies and Director of the Centre of Digital Humanities at the University of Groningen (NL). Her field of expertise is media history, with a particular interest in amateur film and documentaries, digital cultures and digital archives, web history and digital history. She was a senior researcher in the research project ‘Changing Platforms of Ritualised Memory Practices: The Cultural Dynamics of Home Movie Making’.
Together with Annamaria Motrescu-Mayes, she is the co-author of Amateur Media and Participatory Culture: Film, Video and Digital Media (Routledge 2019). Recently she started working on web archival and web historical projects. She co-edited, together with Kees Teszelszky and Tjarda de Haan, a special issue on Web Archaeology for the journal TMG/Journal for Media History (https://www.tmgonline.nl/). Session 13: Source Code and App Histories Platform and app histories: Assessing source availability in web archives and app repositories Dr. Anne Helmond (University of Amsterdam) Fernando van der Vlist (Utrecht University) Keywords: platforms, apps, web historiography, web archiving, app archiving ABSTRACT In this presentation, we discuss the research opportunities for historical studies of apps and platforms by focusing on their distinctive characteristics and material traces. We demonstrate the value and explore the utility and breadth of web archives and software repositories for building corpora of archived platform and app sources. Platforms and apps notoriously resist archiving due to their ephemerality and continuous updates. As a result of rapid release cycles that enable developers to develop and deploy their code very quickly, large web platforms such as Facebook and YouTube change continuously, overwriting their material presence with each new deployment. Similarly, the pace of mobile app development and deployment is only growing, with each new software update overwriting the previous version. As a consequence, their histories are being overwritten with each update rather than written and preserved. In this presentation, we consider how one might write the histories of these new digital objects despite such challenges. When thinking about how platforms and apps are archived today, we contend that we need to consider their specific materiality.
With the term materiality, we refer to the material form of those digital objects themselves as well as the material circumstances of those objects that leave material traces behind, including developer resources and reference documentation, business tools and product pages, and help and support pages. We understand these contextual materials as important primary sources through which digital objects such as platforms and apps write their own histories with web archives and software repositories. We present a method to assess the availability of these archived web materials for social media platforms and apps across the leading web archives and app repositories. Additionally, we conduct a comparative source set availability analysis to establish how, and how well, various source sets are represented across web archives. Our preliminary results indicate that despite the challenges of social media and app archiving, many material traces of platforms and apps are in fact well preserved. The method is not only useful for building corpora of historical platform or app sources but also potentially valuable for identifying significant omissions in web archives and for guiding future archiving practices. We showcase how researchers can use web archives and repositories to reconstruct platform and app histories, and narrate the drama of changes, updates, and versions. Biographies: Anne Helmond is an assistant professor of New Media and Digital Culture at the University of Amsterdam. Her research interests include software studies, platform studies, app studies, digital methods, and web history. Fernando van der Vlist is a PhD candidate at Utrecht University and a research associate with the Collaborative Research Centre “Media of Cooperation” at the University of Siegen. His research interests include software studies, digital methods, social media and platform studies, app studies, and critical data studies.
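The source set availability analysis described by Helmond and van der Vlist can be pictured as a set comparison between a list of source URLs (developer pages, help pages and so on) and the captures each archive holds. The Python sketch below is a minimal illustration under stated assumptions, not the authors' method: it assumes the per-archive capture lists have already been collected, uses a deliberately crude URL normalisation, and the archive names and URLs are hypothetical.

```python
def normalize(url):
    """Crude comparison key (assumption): drop scheme, leading 'www.', trailing '/'."""
    u = url.strip()
    for prefix in ("https://", "http://"):
        if u.startswith(prefix):
            u = u[len(prefix):]
            break
    if u.startswith("www."):
        u = u[len("www."):]
    return u.rstrip("/")


def availability(source_set, archive_captures):
    """For each archive, the share of source URLs with at least one capture."""
    sources = {normalize(u) for u in source_set}
    if not sources:
        return {}
    return {
        archive: len(sources & {normalize(u) for u in captured}) / len(sources)
        for archive, captured in archive_captures.items()
    }


def missing_everywhere(source_set, archive_captures):
    """Source URLs that no archive holds: candidate omissions."""
    held = set()
    for captured in archive_captures.values():
        held.update(normalize(u) for u in captured)
    return sorted({normalize(u) for u in source_set} - held)


# Hypothetical source set and capture lists, for illustration only.
sources = [
    "https://developers.example.com/docs/",
    "http://www.example.com/business",
    "example.com/help",
    "https://example.com/product",
]
captures = {
    "archive_a": {"developers.example.com/docs", "http://example.com/help/"},
    "archive_b": {"https://www.example.com/business"},
}
print(availability(sources, captures))
print(missing_everywhere(sources, captures))
```

Keeping the normalisation in one function makes it easy to tighten later (for example, treating query strings or case-sensitive paths differently), which matters because mismatched URL forms can masquerade as omissions.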
Exploring archived source code: computational approaches to historical studies of web tracking Dr. Janne Nielsen (Aarhus University) Keywords: archived source code; computational approaches; historical studies; web tracking ABSTRACT This paper presents different ways of examining archived source code to find traces of tracking technologies in web archives. Several studies have shown a prolific use of tracking technologies to collect data about web users and their behaviour on the web (e.g. Altaweel, Good & Hoofnagle, 2015; Roesner, Kohno & Wetherall, 2012; Ayenson, Wambach, Soltani, Good & Hoofnagle, 2011; see also the review of existing tracking methods in Bujlow, Carela-Espanol, Lee & Barlet-Ros, 2017). Tracking is used for a multitude of purposes, from authorisation and personalisation through web analytics and optimisation to targeted advertising and social profiling. The extent of web tracking and the magnitude of data collected by powerful companies like Facebook and Google have caused concerns about privacy and consent. To better understand the spread of tracking and the possible implications of the practices involved, it is important to study the development leading up to today. Most studies of web tracking examine the current web, but to study the historical development of tracking, we can turn to web archives. The distinctive nature of the archived web as "reborn digital" (Brügger, 2018) means that a study using archived web must always address the specific characteristics of this source and the associated methodological issues (Brügger, 2018; Masanès, 2006; Schneider & Foot, 2004), but a study of tracking technologies in the archived web poses additional, new methodological challenges. Tracking technologies are part of what could be called the environment of a website (cf. Helmond, 2017), but they are not part of what is usually considered the 'content' that web archives aim to collect and preserve (Rogers, 2013).
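To make the idea of finding tracking traces in archived source code concrete, here is a minimal Python sketch that scans archived HTML for a few tracker signatures and a 1×1 image heuristic for web beacons, then counts matching pages per year. The signature list and labels are hypothetical, illustrative choices and not the study's method; a real analysis would need a curated, time-aware list and knowledge of how the archive's harvesting settings changed over time.

```python
import re
from collections import Counter

# Hypothetical, illustrative signatures; not an exhaustive or dated list.
SIGNATURES = {
    "google-analytics": re.compile(r"google-analytics\.com", re.IGNORECASE),
    "doubleclick": re.compile(r"doubleclick\.net", re.IGNORECASE),
    "facebook-connect": re.compile(r"connect\.facebook\.net", re.IGNORECASE),
    "js-cookie-access": re.compile(r"document\.cookie", re.IGNORECASE),
}

# Web beacon heuristic: an <img> tag declared with width=1 and height=1,
# in either attribute order (two lookaheads confined to the same tag).
BEACON = re.compile(
    r"""<img\b(?=[^>]*\bwidth=["']?1\b)(?=[^>]*\bheight=["']?1\b)""",
    re.IGNORECASE,
)


def detect_trackers(html):
    """Return the set of tracker labels whose traces appear in one page."""
    found = {name for name, pattern in SIGNATURES.items() if pattern.search(html)}
    if BEACON.search(html):
        found.add("web-beacon")
    return found


def trackers_by_year(pages):
    """pages: iterable of (year, html) pairs taken from archived captures.

    Returns {year: Counter(label -> number of pages showing that trace)}.
    """
    by_year = {}
    for year, html in pages:
        by_year.setdefault(year, Counter()).update(detect_trackers(html))
    return by_year
```

Counting pages rather than raw matches keeps one page with many tracker calls from dominating a year; dividing by the number of archived pages per year would then give comparable shares across years.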
Tracking can also depend on technologies that are often difficult to archive (e.g. content based on JavaScript, Flash or similar). Nonetheless, it is still possible to find traces of tracking technologies in web archives. One approach, inspired by the work of Helmond (2017), is to study the archived source code of websites. This paper presents a study of tracking technologies on the Danish web from 2006 to 2015 as it has been archived in the Danish national web archive Netarkivet. The study experiments with computational methods to map the development of different tracking technologies (e.g. HTTP cookies and web beacons). The paper discusses the main methodological challenges of the study and shows that in-depth knowledge of the specific archive, and of the changes in its archiving strategies and settings over time, is necessary for such a study. References: Altaweel, I., Good, N., & Hoofnagle, C. J. 2015. “Web Privacy Census”. Technology Science. Ayenson, M. D., Wambach, D. J., Soltani, A., Good, N., & Hoofnagle, C. J. 2011. “Flash Cookies and Privacy II: Now with HTML5 and ETag Respawning”. SSRN, July 29. Brügger, N. 2018. The Archived Web: Doing History in the Digital Age. Cambridge: MIT Press. Bujlow, T., Carela-Espanol, V., Lee, B.-R., & Barlet-Ros, P. 2017. “A Survey on Web Tracking: Mechanisms, Implications, and Defenses”. Proceedings of the IEEE, 105(8), 1476–1510. Helmond, A. 2017. “Historical website ecology: Analyzing past states of the web using archived source code”. In N. Brügger (Ed.), Web 25: histories from the first 25 years of the World Wide Web (pp. 139–155). New York: Peter Lang. Masanès, J. 2006. “Web Archiving: Issues and Methods”. In J. Masanès (Ed.), Web Archiving (pp. 1–53). Springer. Roesner, F., Kohno, T., & Wetherall, D. 2012. “Detecting and Defending Against Third-Party Tracking on the Web”. Presented at the 9th USENIX Symposium on Networked Systems Design and Implementation. Rogers, R. 2013. Digital methods. Cambridge: MIT Press. Schneider, S. M.
& Foot, K. A. 2004. “The Web as an Object of Study”. New Media & Society, 6(1), 114–122. Biography: Janne Nielsen is an Assistant Professor, PhD, in Media Studies and a board member of the Centre for Internet Studies at Aarhus University. She is part of DIGHUMLAB, where she is head of LARM.fm (a community and research infrastructure for the study of audio and visual materials) and part of NetLab (a community and research infrastructure for the study of internet materials). Her research interests include media history, cross media, web historiography, web archiving, web tracking, privacy and consent. Session 14: AI and Infrastructures Cross-sector interdisciplinary collaboration to discover topics and trends in the UK Government Web Archive: a reflection on process Mark Bell (The National Archives, UK) Tom Storrar (The National Archives, UK) David Beavan (The Alan Turing Institute) Dr. Eirini Goudarouli (The National Archives, UK) Dr. Barbara McGillivray (The Alan Turing Institute) Dr. Federico Nanni (The Alan Turing Institute) Pip Willcox (The National Archives, UK) Keywords: Discovery, Machine Learning, Collaboration, Machine Assisted Exploration, Scale ABSTRACT This paper discusses a collaboration between The National Archives and The Alan Turing Institute to use artificial intelligence technologies to enable the navigation and comprehension of the UK Government Web Archive (UKGWA) at scale. The National Archives is the official archive of the UK government, holding over 1,000 years of history. Since 1996, The National Archives has been archiving UK government websites and social media output, which are publicly accessible through the UKGWA. Users of the UKGWA can browse sites or use the very effective full-text search service to find content in over 350 million documents (and counting). Search relies on keyword matching and is most effective when combined with domain knowledge, but most of our users lack such knowledge.
There is currently no way to view the UKGWA as a whole or to group similar material together. Research into UKGWA users indicates that they expect an “intuitive” search experience that allows them to navigate this massive dataset, with searches surfacing relevant results. That type of search experience requires resource-intensive data engineering and natural language processing methods that can handle a high volume of queries, neither of which is currently available. With The Alan Turing Institute, the national institute for data science and AI, we proposed a Data Study Group (DSG) to bring together experts from across and beyond academia to work on a data challenge for a week. Held in December 2019, the challenge focused on the discoverability of the UKGWA, applying advanced machine learning and natural language processing approaches to tasks such as creating a subject matter overview of the archive, machine assisted exploration, and identifying the emergence, growth, and decay of topics over time. This talk will explain the challenges we face in exploring, understanding, analysing and interpreting the UKGWA; will focus on the collaboration between The National Archives and The Alan Turing Institute; and will present the work of selecting and preparing data prior to the challenge, as well as the process and outcomes of the challenge week itself: what went well, what didn’t, and what surprised us. We will also discuss next steps and how we will seek to implement the outcomes of this collaboration. This will include the challenges of turning a complex research prototype developed in a technical environment into something that can be practically integrated into the UKGWA interface to meet the needs of, and be understood by, our users. We would welcome the thoughts of conference participants on this work to date, including on how it can be made useful to researchers, web archives, and their users.
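The task of identifying the emergence, growth, and decay of topics over time can be illustrated, in a much simplified form, by tracking the share of documents per year that mention a given term. This sketch is a keyword-frequency stand-in, not the machine learning or natural language processing approaches applied during the Data Study Group, and the documents and terms are invented for illustration.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def term_trends(docs: Iterable[Tuple[int, str]],
                terms: Iterable[str]) -> Dict[str, Dict[int, float]]:
    """For each term, the share of documents in each year that mention it."""
    wanted: List[str] = [t.lower() for t in terms]
    totals: Counter = Counter()
    hits: Dict[str, Counter] = defaultdict(Counter)
    for year, text in docs:
        totals[year] += 1
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        for term in wanted:
            if term in tokens:
                hits[term][year] += 1
    return {t: {y: hits[t][y] / totals[y] for y in sorted(totals)} for t in wanted}


def summarise(trend: Dict[int, float]) -> Dict[str, Optional[int]]:
    """First year in which a term appears and the year in which its share peaks."""
    present = [year for year, share in trend.items() if share > 0]
    if not present:
        return {"emerged": None, "peak": None}
    peak = max(present, key=lambda year: trend[year])
    return {"emerged": min(present), "peak": peak}


# Invented example documents: (year, text).
docs = [
    (2012, "Flood warnings issued for the south west"),
    (2013, "Flood defence funding announced"),
    (2013, "Budget statement published"),
    (2014, "Budget and spending review"),
]
for term, trend in term_trends(docs, ["flood", "budget"]).items():
    print(term, summarise(trend))
# flood {'emerged': 2012, 'peak': 2012}
# budget {'emerged': 2013, 'peak': 2014}
```

Decay shows up as a falling share in the years after the peak, as with "flood" here (1.0, then 0.5, then 0.0).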
Biographies: Mark Bell is Senior Digital Researcher at The National Archives. He has worked as a researcher on the AHRC-funded project Traces Through Time, on which he developed statistical methodologies for record linkage, and on the EPSRC-funded ARCHANGEL project, which explored the use of Distributed Ledger Technology to provide trust in archived born-digital material. Mark’s research interests cover a broad range of areas including Handwritten Text Recognition, Crowdsourcing, applications of Machine Learning to archival processes, and of course the challenges of working with large-scale web archives. Tom Storrar is the Head of Web Archiving at The National Archives. He has led the Web Archive team for over 10 years, transforming the way that web archiving is performed. Tom has spoken at a number of international conferences about the challenges of web archiving. As well as handling the day-to-day challenges of maintaining the archive, he has defined collection policies around web pages, social media accounts, and even code repositories, and has managed the migration to cloud-based archiving. David Beavan is Senior Research Software Engineer – Digital Humanities in the Research Engineering Group (also known as Hut 23) at The Alan Turing Institute. He has been working in the Digital Humanities (DH) for over 15 years, working collaboratively and applying cutting-edge computational methods to explore new humanities challenges. He is Co-Investigator for two Arts and Humanities Research Council (AHRC) funded projects, Living with Machines and Chronotopic Cartographies; is Co-organiser of the Humanities and Data Science Turing Interest Group; and is Research Engineering's challenge lead for Data Science for Science (and also humanities) and Urban Analytics. Eirini Goudarouli is a member of the Research Team at The National Archives. Her current research interests include digital humanities and digital archives.
She is particularly interested in bringing together methods and theories from a range of disciplines that could contribute to rethinking digital, archival and collection-based research. Eirini is the Co-Investigator of the International Research Collaboration Network in Computational Archival Science (IRCN-CAS), funded by the Arts and Humanities Research Council. Barbara McGillivray is Turing Research Fellow at The Alan Turing Institute and the University of Cambridge. She has always been passionate about how Sciences and Humanities can meet. She completed a PhD in Computational Linguistics at the University of Pisa in 2010 after a degree in Mathematics and one in Classics from the University of Florence (Italy). Before joining the Turing, she was a language technologist in the Dictionary division of Oxford University Press and a data scientist in the Open Research Group of Springer Nature. Federico Nanni is a Research Data Scientist at The Alan Turing Institute, working as part of the Research Engineering Group, and a visiting fellow at the School of Advanced Study, University of London. He completed a PhD in History of Technology and Digital Humanities at the University of Bologna, focusing on the use of web archives in historical research, and has been a post-doc in Computational Social Science at the Data and Web Science Group of the University of Mannheim. He also spent time as a visiting researcher at the Fondazione Bruno Kessler and the University of New Hampshire, working on Natural Language Processing and Information Retrieval. Pip Willcox is Head of Research at The National Archives. She has a background in digital editing and book history, focussing first on encoding medieval manuscripts and later on early modern printed books. More recently she has worked on projects linking collections and semantic web technologies, and social machines.
She has developed a framework for an experimental humanities, using digital simulation to close-read and explicate interpretation of the archive. Her focus for the past several years has been on multidisciplinary engagement with collections, enabling digital research and innovation. A tale of two web archives: Challenges of engaging web archival infrastructures for research Jessica Ogden (University of Southampton) Emily Maemura (University of Toronto) Keywords: national web archives, researcher engagement, infrastructure studies ABSTRACT Web archives (WAs) are a key source for historical web research, and recent anthologies provide examples of their use by scholars from a range of disciplines (Brügger, 2017; Brügger, 2018; Brügger & Schroeder, 2017). Much of this work has drawn on large-scale collections, with a particular focus on the use of national web domain collections (Brügger & Laursen, 2019; Hockx-Yu, 2016). This previous work demonstrates how WAs afford new scholarship opportunities, yet little work has addressed how researcher engagement is affected by the complexity of WA collection and curation. Further research has begun to address the impact of specific organizational settings where technical constraints interact with policy frameworks and the limitations of resources and labour (Dougherty & Meyer, 2014; Hockx-Yu, 2014; Maemura et al., 2018; Ogden et al., 2017). Here, we extend this work to consider how these factors influence subsequent engagement, and to investigate the very real barriers researchers face when using WAs as a source for research. This paper explores the challenges of researcher engagement from the vantage point of two national WAs: the UK Web Archive at the British Library, and Netarkivet at the Royal Danish Library. We compare and contrast our experiences of undertaking WA research at these institutions.
Our personal interactions with the collections are supplemented by observations of practice and interviews with staff, in an effort to investigate the circumstances that shape the ways researchers use WAs. We compare these two national WAs along several dimensions, including: the legal mandates for collection; the ontological decisions that drive practices; the affordances of tools and technical standards; everyday infrastructural maintenance and labour; and the ways in which all of the above construct the interfaces through which WAs are researched. Our approach explores the materiality of WA data across these two sites to acknowledge the generative capabilities of web archiving and reinforce an understanding that these data are not given or ‘natural’ (Gitelman, 2013). We highlight how the sociotechnical infrastructure of web archiving shapes researcher access, the types of questions asked, and the methods used. Here, access is conceived of not only in terms of ‘open’ versus ‘closed’ data, but as a spectrum of possibilities that orientates researchers to particular ways of working with data, whilst often decontextualising the data from the circumstances of their creation. We question which kinds of digital research are afforded by national WAs, particularly when scoping collection boundaries by country-code top-level domains (ccTLDs) creates ‘artificial geographic boundaries’ (Winters, in press). Through this process we recognise and centre the assumptions about collection and use that are embedded in these research infrastructures, to facilitate a discussion of how they both enable and foreclose particular forms of engagement with the Web’s past. Bibliography: Brügger, N. (2018). The Archived Web: Doing History in the Digital Age. Cambridge, MA: MIT Press. Brügger, N. (Ed.). (2017). Web 25: histories from the first 25 years of the World Wide Web. New York: Peter Lang. Brügger, N., & Laursen, D. (Eds.). (2019).
The historical web and digital humanities: The case of national web domains. Abingdon: Routledge. Brügger, N., & Schroeder, R. (Eds.). (2017). The Web as History: Using Web Archives to Understand the Past and the Present. London: UCL Press. Retrieved from http://oapen.org/download?type=document&docid=625768 Dougherty, M., & Meyer, E. T. (2014). Community, tools, and practices in web archiving: The state-of-the-art in relation to social science and humanities research needs. Journal of the Association for Information Science and Technology, 65(11), 2195–2209. https://doi.org/10.1002/asi.23099 Gitelman, L. (Ed.). (2013). “Raw data” is an oxymoron. Cambridge, Massachusetts; London, England: The MIT Press. Hockx-Yu, H. (2014). Access and Scholarly Use of Web Archives. Alexandria: The Journal of National and International Library and Information Issues, 25(1), 113–127. https://doi.org/10.7227/ALX.0023 Hockx-Yu, H. (2016). Web Archiving at National Libraries: Findings of Stakeholders’ Consultation by the Internet Archive. Internet Archive. Retrieved from https://archive.org/details/InternetArchiveStakeholdersConsultationFindingsPublic Maemura, E., Worby, N., Milligan, I., & Becker, C. (2018). If These Crawls Could Talk: Studying and Documenting Web Archives Provenance. Journal of the Association for Information Science and Technology, 69(10), 1223–1233. https://doi.org/10.1002/asi.24048 Ogden, J., Halford, S., & Carr, L. (2017). Observing Web Archives: The Case for an Ethnographic Study of Web Archiving. In Proceedings of the 2017 ACM on Web Science Conference (pp. 299–308). Troy, New York, USA: ACM Press. https://doi.org/10.1145/3091478.3091506 Winters, J. (in press, 2019). Giving with one hand, taking with the other: E-legal deposit, web archives and researcher access. In P. Gooding & M. Terras (Eds.), Electronic Legal Deposit: Shaping the library collections of the future. London: Facet Publishing.
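The point about ccTLD-based scoping can be made concrete with a minimal sketch of a scope rule that keeps a URL only if its host ends in a given country-code top-level domain. The rule and the URLs are hypothetical simplifications; national web archives may also apply other criteria to collect relevant material outside their ccTLD. The example shows how such a rule would exclude a hypothetical Danish band's site registered under .com, while including any .dk host.

```python
from urllib.parse import urlsplit


def in_cctld_scope(url: str, cctld: str) -> bool:
    """Naive scope rule: keep a URL only if its host is under the given ccTLD."""
    # hostname is lower-cased and excludes any port; drop a trailing root dot.
    host = (urlsplit(url).hostname or "").rstrip(".")
    suffix = cctld.lower().lstrip(".")
    return host == suffix or host.endswith("." + suffix)


seeds = [
    "https://www.example.dk/",           # hypothetical Danish site under .dk
    "https://example-danish-band.com/",  # hypothetical Danish band on a generic TLD
    "http://www.example.co.uk/news",     # hypothetical UK site
]
print([u for u in seeds if in_cctld_scope(u, "dk")])
# ['https://www.example.dk/']
```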
Biography: Jessica Ogden, University of Southampton; jessica.ogden@soton.ac.uk Jessica Ogden is a PhD Candidate based in Sociology and the Web Science Centre for Doctoral Training at the University of Southampton. Jessica’s research focuses on the politics of data, web archiving and digital data scholarship. Emily Maemura, University of Toronto; e.maemura@mail.utoronto.ca Emily Maemura is a PhD candidate at the University of Toronto’s Faculty of Information (iSchool). Her research focuses on web archiving, including approaches and methods for working with web archives data and research collections, and capturing diverse perspectives of the internet as an object and/or site of study. Session 15: WARC and OAIS What’s missing from WARC? (Consultative Committee for Space Data Systems (CCSDS), Data Archive Interoperability (DAI) Working Group) Mr. Michael W. Kearney III Sponsored by Google, Huntsville, Alabama, USA. Mr. John Garrett Garrett Software, Columbia, Maryland USA Mr. David Giaretta PTAB Ltd, Dorset, UK. Mr. Steve Hughes Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA Keywords: OAIS; WARC; CCSDS; HTML; MIME ABSTRACT This presentation will explain why the WARC format, by itself, is not adequate to preserve websites. As a brief justification of the claim: a WARC file essentially captures the information sent from a website, but by itself this is not enough for long-term preservation, for the following reasons. Right now, suitable Web browsers are readily available that can deal with current websites, supporting HTML standards but often making guesses about how to display important but badly constructed web pages. In the future these will not necessarily be available. More importantly, websites not only display pages but also serve files for download. The WARC file may show a MIME type of “application/vnd.ms-excel”, which is a hint to the web browser to use MS Excel to show a spreadsheet.
But what do the columns mean? For example, a column labelled “speed” may seem easy to understand, but a speed of 10 mm/hour is very different from a speed of 10 miles/second. The WARC file does not provide enough information. The presentation will also explain how WARC can be supplemented, using the long-term preservation practices of OAIS, to fix these problems. Biographies: Mike Kearney is an engineering graduate of the University of Kentucky. He worked for NASA for 34 years in Systems Engineering and Technology positions, including chairmanship of the international standards body CCSDS, until retiring from NASA in 2015. He now works with the non-profit Space Infrastructure Foundation and volunteers his time for Google, which sponsors his attendance at Digital Preservation forums. David Giaretta has led the development of standards in digital preservation (ISO 14721), in particular the audit and certification of repositories (ISO 16363 and 16919), and has developed practical and coherent solutions and services to help repositories seeking ISO certification while adding value to their holdings. Steve Hughes is a Principal Computer Scientist at the National Aeronautics and Space Administration (NASA) Jet Propulsion Laboratory. He has three decades of experience with NASA’s official archive for Solar System Exploration science data, the Planetary Data System, and is chief architect of the archive’s information architecture, which is based on principles from the Open Archival Information System (OAIS) Reference Model (ISO 14721) and the ISO/IEC 11179 Metadata Registry (MDR) standard. He is a member of the Primary Trusted Digital Repository Accreditation Board (PTAB) and an associate member of the Jet Propulsion Laboratory’s Center for Data Science and Technology, a virtual center for research, development and operations of data-intensive and data-driven science systems.
He was awarded the NASA Exceptional Public Service Medal for exceptional service to NASA science missions and data archives, architecting and implementing data-intensive systems, information models, and ontologies for three decades. John Garrett is an engineering graduate of Missouri University of Science and Technology and a Computer Science graduate of Johns Hopkins University. He spent 25 years working as a contractor for NASA’s National Space Science Data Archive, including many years representing its needs and interests while developing digital preservation standards. He was instrumental in developing the OAIS Reference Model and continues to help lead the CCSDS DAI efforts to develop OAIS-related standards and standards for certifying Trustworthy Digital Repositories. Background on the CCSDS DAI Working Group: CCSDS is the Consultative Committee for Space Data Systems. It was formed in 1982 to develop data and communications interoperability standards for data systems (flight and ground) used in space missions. While CCSDS is organized by space agencies, it is inclusive of non-space organizations, industry and academia. CCSDS consists of about 22 working groups, one of which is the Data Archive Interoperability (DAI) WG. The DAI WG is focused on long-term digital preservation archives. With extensive support from non-space-industry organizations (national archives and libraries from various countries, academia, other industry domains, etc.), the DAI WG developed the Reference Model for OAIS. Due to its wide applicability, OAIS has been broadly adopted outside the space industry. CCSDS and DAI standards are formally adopted and published by ISO, as CCSDS functions as ISO TC20/SC13. The DAI has published many standards that support OAIS and that are applicable to some space-related archives as well as to other “generic” preservation archives globally.
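To illustrate the argument of the WARC and OAIS abstract, the following minimal sketch reads one hand-built, uncompressed WARC "response" record. It can recover the HTTP Content-Type (here application/vnd.ms-excel), but nothing in the record says what unit a "speed" column uses; that information has to come from elsewhere. The REPRESENTATION_INFO mapping is a hypothetical stand-in for OAIS Representation Information, not a CCSDS-defined format, and for real WARC files a dedicated library such as warcio is a better choice than this hand-rolled parser.

```python
from typing import Dict, Optional, Tuple


def parse_headers(block: bytes) -> Tuple[str, Dict[str, str]]:
    """Split a CRLF-delimited header block into its first line and a field dict."""
    lines = block.decode("utf-8").split("\r\n")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            fields[name.strip().lower()] = value.strip()
    return lines[0], fields


def http_content_type(record: bytes) -> str:
    """Return the HTTP Content-Type stored inside one uncompressed WARC
    'response' record (not the record's own application/http type)."""
    warc_head, _, rest = record.partition(b"\r\n\r\n")
    _, warc_fields = parse_headers(warc_head)
    http_message = rest[: int(warc_fields["content-length"])]
    http_head, _, _ = http_message.partition(b"\r\n\r\n")
    _, http_fields = parse_headers(http_head)
    return http_fields.get("content-type", "")


# Hypothetical OAIS-style Representation Information kept alongside the WARC:
# the WARC record itself says nothing about what "speed" is measured in.
REPRESENTATION_INFO = {
    "http://example.org/data.xls": {"columns": {"speed": {"unit": "mm/hour"}}},
}


def column_unit(uri: str, column: str) -> Optional[str]:
    return REPRESENTATION_INFO.get(uri, {}).get("columns", {}).get(column, {}).get("unit")


# A hand-built, stripped-down record (real records also carry WARC-Record-ID,
# WARC-Date and other fields, and are usually gzip-compressed).
payload = b"<spreadsheet bytes>"
http = (b"HTTP/1.1 200 OK\r\nContent-Type: application/vnd.ms-excel\r\n"
        b"Content-Length: " + str(len(payload)).encode() + b"\r\n\r\n" + payload)
record = (b"WARC/1.0\r\nWARC-Type: response\r\n"
          b"WARC-Target-URI: http://example.org/data.xls\r\n"
          b"Content-Type: application/http; msgtype=response\r\n"
          b"Content-Length: " + str(len(http)).encode() + b"\r\n\r\n" + http + b"\r\n\r\n")

print(http_content_type(record))                            # application/vnd.ms-excel
print(column_unit("http://example.org/data.xls", "speed"))  # mm/hour
```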
Session 16: Web Archives as Scholarly Dataset Web Archives as Scholarly Dataset to Study the Web Dr. Helge Holzmann (Internet Archive) Jefferson Bailey (Internet Archive) Keywords: data processing, extraction, derivation, access, research ABSTRACT The Internet Archive (IA) has been archiving broad portions of the global web for over 20 years. This historical dataset, currently totaling over 20 petabytes of data, offers unparalleled insight into how the web has evolved over time. Part of this collecting effort has included the ability to support large-scale computational research efforts analyzing this collection. This presentation will give an update on efforts within IA to support computational use of its web archive, approaching this topic through a description of both program and technical development efforts. Web archives give us the opportunity to process the web as if it were a dataset, which can be searched, analyzed and studied, both temporally and retrospectively. However, web data has some very specific traits that raise new challenges when providing services based on the information it contains. Our Web Data Engineering efforts are tackling these challenges in order to discover, identify, extract and transform archival web data into meaningful information for our users and partners, hiding the complexity and abstracting away technical details. Engineering has traditionally been the systematic application and combination of existing methods to build a desired system or thing. Data Engineering differs in that engineering here does not refer to creating something, but to transforming data so that it is more useful for what should be achieved. As part of this, new tools and processes are developed to accomplish this transformation more effectively, as well as more efficiently in terms of resources and time.
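As a small illustration of the kind of derivation described above, turning raw capture records into a more useful summary, the sketch below counts successful captures per year and MIME type from capture-index lines. It assumes space-separated lines in the field order of the Wayback CDX server's default output (urlkey, timestamp, original, mimetype, statuscode, digest, length). The sample lines are invented, and the sketch does not describe IA's own pipelines.

```python
from collections import Counter
from typing import Dict, Iterable


def mime_by_year(cdx_lines: Iterable[str], only_ok: bool = True) -> Dict[str, Counter]:
    """Count captures per year and MIME type from CDX-style lines.

    Assumed field order: urlkey timestamp original mimetype statuscode digest length.
    """
    derived: Dict[str, Counter] = {}
    for line in cdx_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        timestamp, mimetype, status = fields[1], fields[3], fields[4]
        if only_ok and status != "200":
            continue  # keep only successful (HTTP 200) captures
        derived.setdefault(timestamp[:4], Counter())[mimetype] += 1
    return derived


sample = [
    "org,example)/ 20050101000000 http://example.org/ text/html 200 AAAA 1234",
    "org,example)/logo.gif 20050102000000 http://example.org/logo.gif image/gif 200 BBBB 567",
    "org,example)/ 20150101000000 http://example.org/ text/html 301 CCCC 89",
    "org,example)/ 20150202000000 http://example.org/ text/html 200 DDDD 1500",
]
for year, counts in sorted(mime_by_year(sample).items()):
    print(year, dict(counts))
# 2005 {'text/html': 1, 'image/gif': 1}
# 2015 {'text/html': 1}
```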
The talk will outline different computational research services for historical web archive data, along with technical challenges, novel developments and opportunities, as well as considerations to make when working with this unique dataset, including:
● Researcher support scenarios
● Data limitations, affordances, and complexities
● Extraction, derivation, and access methods
● Infrastructure requirements
● Relevant tools and technologies
● Collection development and augmentation
In covering these topics through the lens of specific collaborations between IA and computational researchers performing large-scale analysis of web archives, this presentation will illuminate issues and approaches that can inform the implementation of similar programs at other web archiving institutions, and will help researchers interested in data mining web collections better understand the possibilities of studying web archives and the types of services they can expect to encounter when pursuing this work. This overview is meant to showcase the latest achievements and upcoming data services from the Internet Archive's web archiving and data services group. Details about how we and our systems work will be presented, together with APIs and programming libraries that are ready to use and new features that are expected soon. Biographies: Helge Holzmann is Web Data Engineer at Internet Archive. Helge started working for the Archive in August 2018. Before that, he earned his Master's in Computer Science and worked as a researcher in Germany, pursuing a PhD on efficient access methods for web archives, which resulted in publications at different conferences and journals, including TPDL, JCDL, BigData, SIGIR, WWW as well as the International Journal on Digital Libraries. He is passionate about big data, especially if there’s a temporal aspect to it, and is glad to contribute to a non-profit organization that holds one of the biggest collections of free data in the world.
In addition to creating innovative services by deriving new value from this unique dataset, Helge is happy to support libraries and institutions interested in accessing the data, as a consultant located in Europe. Jefferson Bailey is Director of Web Archiving & Data Services at Internet Archive. Jefferson joined Internet Archive in Summer 2014 and manages Internet Archive's web archiving services, including Archive-It, used by over 650 institutions to preserve the web, as well as domain-scale and contract harvesting and indexing services. He works closely with partner institutions on collaborative technology development, computational research support, and data services. He is PI on multiple grants focused on systems interoperability, data-driven research use of web archives, and digital preservation initiatives. He was Chair of the Steering Committee of the International Internet Preservation Consortium (IIPC) until 2019. Session 17: An Irish Tale / Scéal Éireannach Born-digital displaced records: The disappearance of the GAA websites Helena La Pina (Maynooth University) Keywords: Irish culture; GAA; archived websites; born-digital displaced records ABSTRACT This year, the author completed an MA in Historical Archives at Maynooth University, and produced a thesis titled ‘Displaced archives, and the core components in the debates surrounding repatriation’. The thesis draws on secondary literature in archival science, information/records management, and interdisciplinary scholarship to investigate the dilemmas associated with displaced archives. During the thesis research process, the author discovered that there was a limited amount of scholarship dealing with the displacement of electronic records, and a scarcity of scholarship regarding the displacement of born-digital records. This presentation aims to open a discussion on how archived websites might also be understood as displaced born-digital records.
In doing so, the author discusses a research study which explores the presence of the Gaelic Athletic Association (GAA) web heritage in the Internet Archive’s Wayback Machine. Danielson (cited in Winn, 2015) offers an interpretation of displaced archives as ‘archival materials that have been lost, seized, requisitioned, confiscated, purchased under duress, or otherwise gone astray’. Inkster (1983) proffers that a displaced or misplaced document falls into one of three categories: the document is missing, the document is estray (the legal term for a document not in the possession of its owner), or the document is fugitive. The Society of American Archivists (SAA) defines fugitive as connoting ‘materials that are not held by the designated archives or library charged with their preservation.’ Displaced archives are also referred to as misplaced archives, expatriated archives, seized archives, archives in exile, and migrated archives (Inkster, 1983; Garaba, 2011; Winn, 2015). However, as Garaba argues, whatever term is used to describe displaced records, and for whatever reason, the fundamental fact remains: they are not where they should be. In this presentation, the author provides an analysis of the official GAA website, as archived in the Wayback Machine within a certain timeframe. It also covers, on the periphery, other ‘unofficial’ archived GAA websites. While chronicling the important role the GAA has played in Irish society, the author observes which dates were used for capturing, and why the randomness of captures is not calibrated with end-of-season competitions like the All-Ireland final. The author discusses how the disappearance of GAA websites from the live web fits the description of a missing cultural record. The author also highlights how the capture of GAA websites in the Wayback Machine offers an interpretation of the born-digital displaced record, in so far as the record is not where it should be.
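The question of whether captures line up with end-of-season events such as the All-Ireland final can be examined with a small calculation: for each year, the number of days between the event and the nearest capture. The sketch assumes capture timestamps in the 14-digit YYYYMMDDhhmmss form used by the Wayback Machine; the timestamps and event dates are placeholders, not actual GAA capture data or final dates.

```python
from datetime import date, datetime
from typing import Dict, List


def parse_wayback_timestamp(ts: str) -> date:
    """Convert a 14-digit YYYYMMDDhhmmss capture timestamp to a date."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").date()


def days_to_nearest_capture(timestamps: List[str],
                            events: Dict[int, date]) -> Dict[int, int]:
    """For each year's event date, the gap in days to the closest capture in that
    same year; years with no captures at all are left out of the result."""
    captures = [parse_wayback_timestamp(ts) for ts in timestamps]
    gaps: Dict[int, int] = {}
    for year, event in events.items():
        same_year = [c for c in captures if c.year == year]
        if same_year:
            gaps[year] = min(abs((c - event).days) for c in same_year)
    return gaps


# Placeholder capture timestamps and event dates, for illustration only.
sample_timestamps = ["20050312101500", "20050720080000", "20060105120000"]
events = {2005: date(2005, 9, 1), 2006: date(2006, 9, 1)}
print(days_to_nearest_capture(sample_timestamps, events))  # {2005: 43, 2006: 239}
```

Large gaps in a given year would indicate that captures were not timed around the event, which is the kind of pattern the study describes.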
References: Garaba, Francis (2010) An investigation into the management of the records and archives of former liberation movements in east and southern Africa held by national and private archival institutions (PhD Dissertation, University of KwaZulu-Natal, South Africa, 2010) (https://researchspace.ukzn.ac.za/xmlui/handle/10413/1495) Inkster, Carole M. (1983) Geographically misplaced archives and manuscripts: problems and arguments associated with their restitution, Archives and Manuscripts, 11(2), pp 113-124 (https://publications.archivists.org.au/index.php/asa/article/view/7559) Society of American Archivists, Dictionary of archival terminology (https://dictionary.archivists.org/entry/fugitive.html). Winn, Samantha R. (2015) Ethics of access in displaced archives, Provenance, Journal of the Society of Georgia Archivists, 33(1), pp 6-13 (http://digitalcommons.kennesaw.edu/provenance/vol33/iss1/5) Biography: Helena La Pina recently completed an MA in Historical Archives at Maynooth University. Her thesis, titled ‘Displaced archives, and the core components in the debates surrounding repatriation’, investigates the dilemmas associated with displaced archives within the context of archival practices, and the justifications, rationales, and challenges for repatriation. Recording Ireland's technology heritage: Lessons learned John Sterne (TechArchives project, Ireland) Keywords: IT Histories; technology heritage ABSTRACT At its public launch in June 2016 the TechArchives project reached out to people with experience of past generations of information technology in Ireland and asked them to record personal testimonies. This work is continuing.
As the project evolved, however, it became more concerned about the limited quantity and quality of historic material. It is therefore developing processes and methods to locate, catalogue and preserve digital evidence of significant actions and events. Biography: John Sterne is the founder of the TechArchives project. In the past he worked as a researcher, author, reporter and editor.