Static Search: An Archivable and Sustainable Search Engine for the Digital Humanities

Joseph Takeda (@joeytakeda) (Digital Humanities Innovation Lab at Simon Fraser University)
Martin Holmes (Humanities Computing and Media Centre at the University of Victoria)

Project Endings
● Static Websites
  – No server-side dependencies
  – Strictly XHTML + CSS + Vanilla JS
  – Best chance for preservation 50 years + (we hope)

But...
● Researchers require robust searching mechanisms, including:
  – Keyword search
  – Exact phrase search
  – Filtered search (by date, by author, +++)

Why not use...?
● Not particularly reliable
● External dependency = technical debt
● Too many documents to put in one index

So we built our own
● https://github.com/projectEndings/staticSearch

Demonstration applications
● https://johnkeats.uvic.ca/search.html
● https://dvpp.uvic.ca/search.html

How it works

You have:
● A collection of HTML files
● All well-formed & valid XHTML
[Diagram: HTML Collection]

You add:
● An HTML search page containing a special div:
[Diagram: Search page HTML]

You add:
● Some metadata elements to support search filters:
[Diagram: HTML Collection]

You create:
● A configuration file (XML) to specify the features and constraints for your search engine

[Diagram, built up in stages: Configuration XML and HTML Collection; XSLT Files; JSON Tokens; JSON Filters; Search page HTML]

JSON token file

  {
    "token": "unprofit",
    "instances": [
      {
        "docId": "pom_2025_ithe_old_man_of_hoy",
        "docUri": "poems/goodwords/1870/pom_2025_ithe_old_man_of_hoy.html",
        "score": 1,
        "contexts": [
          {
            "form": "unprofitably",
            "context": "…whole that day was spent <mark>unprofitably<\/mark>.",
            "weight": 1,
            "pos": 400
          }
        ]
      },
      {
        "docId": "pom_8733_john_and_joan_canto_ii",
        "docUri": "poems/blackwoods/1820/pom_8733_john_and_joan_canto_ii.html",
        "score": 1,
        "contexts": [
          {
            "form": "unprofitable",
            "context": "…too much ap- propriated unto <mark>unprofitable<\/mark> jocularities and facetiousness. Craving licence,…",
            "weight": 1,
            "pos": 164
          }
        ]
      }
    ]
  }

Filter JSON

  {
    "filterId": "ssBool2",
    "filterName": "Unsigned",
    "ssBool2_1": {
      "value": "true",
      "docs": [
        "poems/chambers_series/1867/pom_7594_the_husbands_request.html",
        "poems/alltheyearround/1875/pom_4271_the_hourglass.html",
        "poems/alltheyearround/1879/pom_4507_in_the_conservatory.html",
        "poems/blackwoods/1843/pom_9963_jolly_father_joe.html",
        "poems/blackwoods/1829/pom_10314_the_watchmans_lament.html"
      ]
    }
  }

On the website...
● ...of the thousands of JSON files...
● ...the search page JS retrieves only the ones it needs for the search you do:

[Diagram, built up in stages: Search page HTML: "unprofitable" → JS stem → "unprofit" → retrieve unprofit.json → compile → filter → display]

Next Steps
● Wildcard searches (*, ?, [uv])
● Pluggable stemmers for different languages / dialects

Resources
● Get the code: https://github.com/projectEndings/staticSearch
● Read the documentation: https://projectEndings.github.io/staticSearch

Thanks!
● HCMC, University of Victoria
● DHIL, Simon Fraser University
● Social Sciences and Humanities Research Council
● DHSI, Lindsey Seatter, Arun Jacob
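
Appendix: a sketch of the client-side lookup

The lookup described under "On the website..." (stem the query, retrieve the matching JSON file, compile, filter, display) can be sketched in vanilla JS roughly as follows. This is a minimal illustration under stated assumptions, not the staticSearch code itself: toyStem is a toy suffix stripper standing in for a real stemmer, baseUrl and all function names are hypothetical, and the data shapes are simply those of the token and filter JSON examples shown in the slides.

```javascript
// Toy stand-in for a real stemmer (an assumption for illustration only).
function toyStem(word) {
  return word.toLowerCase().replace(/(ably|able)$/, "");
}

// Turn a query string into the unique stems whose JSON files are needed.
function stemsForQuery(query) {
  const words = query.split(/\s+/).filter(Boolean);
  return [...new Set(words.map(toyStem))];
}

// Collect the documents listed under one value of a filter file.
// Non-object properties such as filterId and filterName are skipped.
function docsForFilterValue(filterFile, value) {
  const allowed = new Set();
  for (const entry of Object.values(filterFile)) {
    if (entry && typeof entry === "object" && entry.value === value) {
      entry.docs.forEach((doc) => allowed.add(doc));
    }
  }
  return allowed;
}

// Merge already-loaded token files into one result list, best score first.
// allowedDocs is an optional Set of docUri values; omit it for no filtering.
function compileResults(tokenFiles, allowedDocs) {
  const byDoc = new Map();
  for (const tokenFile of tokenFiles) {
    for (const inst of tokenFile.instances) {
      if (allowedDocs && !allowedDocs.has(inst.docUri)) continue;
      const hit = byDoc.get(inst.docUri) ||
        { docUri: inst.docUri, score: 0, contexts: [] };
      hit.score += inst.score;
      for (const c of inst.contexts) hit.contexts.push(c.context);
      byDoc.set(inst.docUri, hit);
    }
  }
  return [...byDoc.values()].sort((a, b) => b.score - a.score);
}

// Browser-side wrapper: fetch only the files this query needs.
// baseUrl (the folder holding the token files) is a hypothetical parameter.
async function search(query, baseUrl, allowedDocs) {
  const responses = await Promise.all(
    stemsForQuery(query).map((stem) => fetch(baseUrl + stem + ".json"))
  );
  const found = responses.filter((r) => r.ok); // a missing file means no hits
  const tokenFiles = await Promise.all(found.map((r) => r.json()));
  return compileResults(tokenFiles, allowedDocs);
}
```

In this sketch a search for "unprofitable" requests a single file, unprofit.json, however many token files the site holds. Keeping compileResults separate from the fetching code means the merge and filter steps can be exercised without a network connection.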