Fedora Migration Paths and Tools Project Update: July 2021

This is the latest in a series of monthly updates on the Fedora Migration Paths and Tools project – please see the previous post for a summary of the work completed up to that point. This project has been generously funded by the IMLS.

We completed some final performance tests and optimizations for the University of Virginia pilot. Both the migration to their AWS server and the Fedora 6.0 indexing operation were much slower than anticipated, so the project team tested a number of optimizations, including:

  1. Adding more processing threads
  2. Increasing the size of the server instance
  3. Using a separate and larger database server
  4. Using locally attached flash storage

Fortunately, these improvements made a big difference: for example, ingest speed increased from 6.8 resources per second to 45.6 resources per second, roughly a 6.7-fold improvement. In general, this means that institutions with specific performance targets can meet them through a combination of parallel processing and increased computational resources. Feedback from this pilot has led to revisions of the migration guide, performance improvements in migration-utils, additional options in the aws-deployer tool, and better error handling in the migration-validator.
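To put the throughput figures above in planning terms, here is a minimal Python sketch that computes the speedup and estimates total ingest time at each rate. The two rates come from the pilot; the repository size of one million resources is a hypothetical round number chosen only for illustration, and the function names are our own. The estimate assumes the rate stays constant for the whole run, which real migrations may not achieve.

```python
# Rates reported for the University of Virginia pilot (resources per second).
BASELINE_RPS = 6.8
OPTIMIZED_RPS = 45.6

# Hypothetical repository size, used only for illustration.
EXAMPLE_RESOURCE_COUNT = 1_000_000


def speedup(before_rps: float, after_rps: float) -> float:
    """Return how many times faster the new rate is than the old one."""
    if before_rps <= 0:
        raise ValueError("before_rps must be positive")
    return after_rps / before_rps


def hours_to_ingest(resource_count: int, resources_per_second: float) -> float:
    """Estimate ingest duration in hours, assuming a constant rate."""
    if resources_per_second <= 0:
        raise ValueError("resources_per_second must be positive")
    return resource_count / resources_per_second / 3600


if __name__ == "__main__":
    factor = speedup(BASELINE_RPS, OPTIMIZED_RPS)
    before = hours_to_ingest(EXAMPLE_RESOURCE_COUNT, BASELINE_RPS)
    after = hours_to_ingest(EXAMPLE_RESOURCE_COUNT, OPTIMIZED_RPS)
    print(f"Speedup: {factor:.1f}x")
    print(f"Estimated hours at baseline rate: {before:.1f}")
    print(f"Estimated hours at optimized rate: {after:.1f}")
```

An institution with a fixed migration window could run the same arithmetic in reverse: divide its resource count by the available seconds to find the rate it needs to reach.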

The Whitman College team has begun their production migration using Islandora Workbench. Initial benchmarking has shown that running Workbench from the production server rather than locally on a laptop achieves much better performance, so this is the recommended approach. The team is working collection by collection, using CSV files for the ingest and a tracking spreadsheet to record when each collection has been ingested and is ready to be tested. They have also developed a Quality Control checklist to make sure everything is working as intended. We anticipate doing detailed checks on the first few collections and spot checks for subsequent collections.
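As a rough illustration of the tracking approach described above, the following Python sketch reads a tracking sheet exported as CSV and assigns a check level to each collection. This is not the Whitman team's actual spreadsheet or tooling: the column names, the sample rows, and the cutoff of three collections for detailed checks are all assumptions made for the example.

```python
import csv
import io

# Assumption: how many of the first collections receive a detailed check.
DETAILED_CHECK_COUNT = 3

# Hypothetical tracking sheet; column names and rows are illustrative only.
SAMPLE_TRACKING_CSV = """collection,status
collection-a,tested
collection-b,ingested
collection-c,pending
collection-d,pending
"""


def plan_checks(rows, detailed_count=DETAILED_CHECK_COUNT):
    """Pair each collection with a check level based on its position.

    The first `detailed_count` collections get a "detailed" check;
    all later ones get a "spot" check.
    """
    plan = []
    for position, row in enumerate(rows):
        level = "detailed" if position < detailed_count else "spot"
        plan.append((row["collection"], row["status"], level))
    return plan


rows = list(csv.DictReader(io.StringIO(SAMPLE_TRACKING_CSV)))

if __name__ == "__main__":
    for name, status, level in plan_checks(rows):
        print(f"{name}: status={status}, check={level}")
```

Keeping the status and the check level in one place makes it easy to see at a glance which collections still need attention before sign-off.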

As we near the end of the pilot phase of the grant work, we are focused on documentation for the migration toolkit. We plan to complete a draft of this documentation over the summer and then share it with the broader community for feedback. We will organize meetings in the fall to give community members further opportunities to comment on the toolkit and suggest improvements.