Software Development at Royal Danish Library
A peephole into the life of the software development department at the Royal Danish Library

Which type bug?
A light tale of bug hunting an Out Of Memory problem with SolrCloud. The setup and the problem: At the Royal Danish Library we provide full text search for the Danish Netarchive. The heavy lifting is done in a single … Continue reading →

Touching encouraged (an ongoing story)
Ongoing experiments with a large touch screen providing access to cultural heritage material Continue reading →

DocValues jump tables in Lucene/Solr 8
Lucene/Solr 8 is about to be released. Among a lot of other things, it brings LUCENE-8585, written by yours truly with a heap of help from Adrien Grand. LUCENE-8585 introduces jump-tables for DocValues, is all about performance and brings speed-ups … Continue reading →

Faster DocValues in Lucene/Solr 7+
This is a fairly technical post explaining LUCENE-8374 and its implications for Lucene, Solr and (qualified guess) Elasticsearch search and retrieval speed. It is primarily relevant for people with indexes of 100M+ documents. Teaser: We have a Solr setup for … Continue reading →

Prebuilt Big Data Word2Vec dictionaries
Prebuilt and trained Word2Vec dictionaries ready for use. Two different prebuilt big data Word2Vec dictionaries have been added to LOAR (Library Open Access Repository) for download. These dictionaries are built from the text of 55,000 e-books from Project Gutenberg … Continue reading →

SolrWayback software bundle has been released
The SolrWayback software bundle can be used to search and play back archived webpages in WARC format. It is an out-of-the-box solution with an index workflow, Solr, a Tomcat web server and a free text search interface with playback functionality. … Continue reading →

Visualising Netarchive Harvests
An overview of website harvest data is important for both research and development operations in the netarchive team at Det Kgl. Bibliotek. In this post we present a recent frontend visualisation widget we have made. From the SolrWayback Machine … Continue reading →

SolrWayback Machine
Another ‘Google innovation week’ at work has produced the SolrWayback Machine. It works similarly to the Internet Archive: Wayback Machine (https://archive.org/web/) and can be used to show harvested web content (WARC files). The Danish Internet Archive has over 20 billion harvested … Continue reading →

juxta – image collage with metadata
Creating large collages of images to give a bird’s eye view of a collection seems to be gaining traction. Two recent initiatives: The New York Public Library has a very visually pleasing presentation of public domain digitizations, but with a … Continue reading →

Automated improvement of search in low quality OCR using Word2Vec
This abstract has been accepted for Digital Humanities in the Nordic Countries 2nd Conference, http://dhn2017.eu/ In the Danish Newspaper Archive[1] you can search and view 26 million newspaper pages. The search engine[2] uses OCR (optical character recognition) from scanned pages … Continue reading →