MarcEdit 7.3.x/7.5.x (beta) Updates

By reeset / On / In MarcEdit

Versions are available at: https://marcedit.reeset.net/downloads

Information about the changes:

If you are using 7.x, this will prompt for an update as normal. 7.5.x is the beta build; please be aware that I expect to release updates to this build weekly and also expect to find some issues.

Questions, let me know.

–tr

MarcEdit 7.5.x/MacOS 3.5.x Timelines

By reeset / On / In MarcEdit

I sent this to the MarcEdit Listserv to provide info about my thoughts around timelines related to the beta and release.  Here’s the info.

Dear All,

As we are getting close to Feb. 1 (when I’ll make the 7.5 beta build available for testing), I wanted to provide information about the update process going forward.

Feb. 1:

  1. The MarcEdit 7.5 download will be released. This will be a single build that includes both the 32- and 64-bit builds and their dependencies, and it can be installed with or without admin rights.
    1. I expect to be releasing new builds weekly – with the goal of taking the beta tag off the build no later than April 1.
  2. MarcEdit 7.3.x
    1. I’ll be providing updates for 7.3.x until 7.5 comes out of beta. These will fold in some changes (mostly bug fixes) when possible.
  3. MarcEdit MacOS 3.2.x
    1. I’ll be providing updates for MacOS 3.2.x until 3.5 is released and out of beta.
  4. MarcEdit MacOS 3.5.x Beta
    1. Once the MarcEdit 7.5.x beta is out, I’ll be looking to push a 3.5.x beta by mid-to-late Feb. Again, the idea is to take the beta tag off by April (assuming I meet the beta timeline).

March 2021

  1. MarcEdit MacOS 3.5.x beta will be out and active (likely with weekly builds)
  2. MarcEdit 7.5.x beta – testing will be assessed to determine how long the beta process continues (with April 1 as the end date)
  3. MarcEdit 7.3.x – Updates continue
  4. MarcEdit MacOS 3.2.x – updates continue

April 2021

  1. MarcEdit 7.5.x comes out of Beta
  2. MarcEdit 7.3.x is deprecated
  3. MarcEdit MacOS 3.5.x beta assessed – the end date is April 15 if the above timelines are met

May 2021

  1. MarcEdit MacOS 3.5.x is out of beta
  2. MarcEdit MacOS 3.2.x is deprecated

Let me know if you have questions.

MarcEdit 7.5 Change/Bug Fix list

By reeset / On / In MarcEdit

* Updated: 1/20

Change: Allow the OS to manage the supported Security Protocol types.

Change: Remove com.sun dependency related to dns and httpserver

Change: Changed AppData Path

Change: First install automatically imports settings from MarcEdit 7.0-2.x

Change: Field Count – simplify UI (consolidate elements)

Change: 008 Windows — update help urls to oclc

Change: Generate FAST Headings — update help urls

Change: .NET changes thread stats queuing. Updating thread processing on forms:

* Generate FAST Headings

* Batch Process Records

* Build Links

* Main Window

* RDA Helper

* Delete Selected Records

* MARC Tools

* Check URL Tools

* MARCValidator

Change: XML Function List — update process for opening URLs

Change: Z39.50 Preferences Window – update process for opening URLs

Change: About Windows — new information, updated how version information is calculated.

Change: Catalog Calculator Window — update process for opening URLs

Change: Generate Call Numbers — update process for opening URLs

Change: Generate Material Formats — update process for opening URLs

Change: Tab Delimiter — remove context windows

Change: Tab Delimiter — new options UI

Change: Tab Delimiter — normalization changes

Change: Remove Old Help HTML Page

Change: Remove old Hex Editor Page

Change: Updated Hex Editor to integrate into main program

Change: Main Window — remove custom scheduler dependency

Change: UI Update to allow more items

Change: Main Window — new icon

Change: Main Window — update process for opening URLs

Change: Main Window — removed context menus

Change: Main Window — Upgrade changes to new executable name

Change: Main Window — Updated the following menu Items:

* Edit Linked Data Tools

* Removed old help menu item

* Added new application shortcut

Change: OCLC Bulk Downloader — new UI elements to correspond to new OCLC API

Change: OCLC Search Page — new UI elements to correspond to new OCLC API

Change: Preferences — Updates related to various preference changes:

* Hex Editor

* Integrations

* Editor

* Other

Change: RDA Helper — update process for opening URLs

Change: RDA Helper — Opening files for editing

Change: Removed the Script Maker

Change: Templates for Perl and VBScript included

Change: Removed Find/Search XML in the XML Editor and consolidated in existing windows

Change: Delete Selected Records: Exposed the form and controls to the MarcEditor

Change: Sparql Browser — update process for opening URLs

Change: Sparql Browser — removed context menus

Change: TroubleShooting Wizard — Added more error codes and kb information to the Wizard

Change: UNIMARC Utility — controls change, configurable transform selections

Change: MARC Utilities — removed the context menu

Change: First Run Wizard — new options, new agent images

Change: XML Editor — Delete Block Addition

Change: XML Editor — XQuery transform support

Change: XML Profile Wizard — option to process attributes

Change: MarcEditor – Status Bar control doesn’t exist in .NET 5.0; the control has changed.

Change: MarcEditor — Improved Page Loading

Change: MarcEditor — File Tracking updated to handle times when the file opened is a temp record

Change: MarcEditor — removed ~7k of old code

Change: MarcEditor — Added Delete Selected Records Option

Change: Removed helper code used by Installer

Change: Removed Office2007 menu formatting code

Change: Consolidated Extensions into new class (removed 3 files)

Change: Removed calls Marshalled to the Windows API — replaced with Managed Code

Change: OpenRefine Format handler updated to capture changes between OpenRefine versions

Change: MarcEngine — namespace update to 75

Bug Fix: Main Window — corrects process for determining version for update

Bug Fix: Main Window — Updated image

Bug Fix: When doing first run, wizard not showing in some cases.

Bug Fix: Main Window — Last Tool used sometimes shows duplicates

Bug Fix: RDA Helper — $e processing

Bug Fix: RDA Helper — punctuation in the $e

Bug Fix: XML Profile Wizard — When the top element is selected, it’s not viewed for processing (which means not seeing element data or attribute data)

Bug Fix: MarcEditor – Page Processing corrected to better handle invalidly formatted data

MarcEdit 7.5 Updates

By reeset / On / In MarcEdit

Current list of MarcEdit 7.5 general updates. I’ll be walking through many of these changes in a webinar on 1/15.

  • Significant Changes:
    • Targeted Framework: .NET 5.0 (What’s new in .NET 5 | Microsoft Docs)
    • XML Wizard Changes
      • Support for Attribute-based mapping (extends previous entity based mapping)
    • Linked Data Components updated
    • SPARQL Components Updated
    • Linked Data Rules File Format Updates
      • Multiple rule blocks for the same field number allowed
      • Allows URIs to be redirected to fields other than the one being evaluated
    • Delimited Text Translator
      • Ability to add custom mnemonic replacements (any {000$a} option allowed)
      • No longer a standalone program
        • Now a part of main MarcEdit app
        • Command-Line options integrated into the MarcEdit Command-Line options
    • OCLC Updates
      • Shift to Metadata Search API (removes reliance on old search API)
      • New indexes available for query
        • Examples: catalog date, catalog org, etc.
    • UNIMARC Tools
      • Shifted to MARC Flavors
        • Allows users to add new translations
        • Configuration-based
      • New Chinese MARC 2 MARC21 translation added
  • Incremental Changes
    • Upgrade Wizard updates
    • RDA Helper
      • 100$e updated to assess both the $e and $4
      • Updates related to updated RDA
      • Will assess only bibliographic records if bibliographic, authority, or holdings records are included in the same file.
    • UI Updates
      • Updated Home Screen, with additional apps added to the main screen
      • Integration with the command-line tool
    • XML Editor
      • Record specific tools
        • Delete Block
      • XQuery Processing Option
    • Installer
      • Exposed the extensions to the install process
      • Embedded necessary dependencies (.NET) in the installer
      • Single Installer (32 or 64 bit)
    • Linked Data
      • New Collections
        • Wikidata for example
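The RDA Helper note above says the tool will only assess bibliographic records when a file mixes record types. One common way to tell these apart is MARC 21 Leader position 06 (type of record): `z` marks authority records, and `u`, `v`, `x`, or `y` mark holdings records. The sketch below assumes that approach; it is not MarcEdit’s code, and the `RecordTypeFilter` class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: decides whether a MARC 21 record is bibliographic
// by looking at Leader/06 (type of record).
public static class RecordTypeFilter
{
    // Leader/06 codes defined for the MARC 21 bibliographic format.
    // Authority records use 'z'; holdings records use 'u', 'v', 'x', or 'y'.
    private const string BibliographicCodes = "acdefgijkmoprt";

    public static bool IsBibliographic(string leader)
    {
        // A MARC leader is 24 characters; treat anything shorter as unusable.
        if (leader == null || leader.Length < 24)
            return false;
        return BibliographicCodes.IndexOf(leader[6]) >= 0;
    }

    // Keeps only bibliographic records, given a way to read each record's leader.
    public static IEnumerable<T> BibliographicOnly<T>(IEnumerable<T> records, Func<T, string> getLeader) =>
        records.Where(r => IsBibliographic(getLeader(r)));
}
```

Reading a single leader byte keeps the check cheap enough to run on every record before any RDA processing starts.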

MarcEdit 7.5 Update Status

By reeset / On / In Uncategorized

I’m planning to make testing versions of the new MarcEdit instance available to a handful of testers in mid-Dec., and more broadly around the first of the year. The migration from .NET 4.7.2 to .NET 5 was more significant than I would have thought, and it includes a number of swapped default values, so I’ve been hunting down behavior changes. Currently, the following updates have been completed.

    • Framework used: .NET 5.0
    • RDA Helper: 100$e process modified. Added criteria to $e generation. Previously, if a $e was already present, a new $e wasn’t added. Now, if a $e or $4 is present, a $e won’t be generated.
    • RDA Helper: Changes related to RDA updates
    • Added new elements to the new window programs for pinning
    • XML Editor: Delete Block element added
    • XML Editor: XQuery processing option
    • If a set of records includes bibliographic and authority records, the RDA Helper will skip the authority records
    • Updated Installation Wizard (allows migration of 6.x and 7.x content into the tool)
    • Updating OCLC Integration to use new Metadata API Search
    • Delimited Text Translator — added ability to use custom mnemonic replacements
    • Delimited Text Translator — no longer a stand alone program
      • Now part of the main MarcEdit app
      • Command-line options folded into the MarcEdit app
    • [in process] linked data rules file version 2
      • Enhancements to the rules file schema
–tr
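The 100$e change listed above reduces to a simple rule: only generate a relator term ($e) when the field carries neither a $e nor a $4. A minimal sketch of that check, assuming the field is available in MarcEdit’s mnemonic (.mrk) form, where a literal dollar sign is escaped as {dollar} so `$` always starts a subfield; the `RelatorCheck` class and `ShouldGenerateRelator` method are hypothetical names, not MarcEdit’s API.

```csharp
using System;
using System.Collections.Generic;

public static class RelatorCheck
{
    // Returns the subfield codes in a mnemonic field line such as
    // "=100  1\$aSmith, John,$eauthor." ('\' is a blank indicator).
    public static List<char> SubfieldCodes(string mnemonicField)
    {
        var codes = new List<char>();
        // Skip "=TAG  " (6 characters) and the two indicator positions.
        for (int i = 8; i < mnemonicField.Length - 1; i++)
        {
            if (mnemonicField[i] == '$')
                codes.Add(mnemonicField[i + 1]);
        }
        return codes;
    }

    // Rule from the post: do not generate a $e if a $e or a $4 is already present.
    public static bool ShouldGenerateRelator(string mnemonicField)
    {
        var codes = SubfieldCodes(mnemonicField);
        return !codes.Contains('e') && !codes.Contains('4');
    }
}
```

Checking for $4 as well as $e avoids adding a relator term to fields that already carry a relator code.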

Changes to System.Diagnostics.Process in .NET Core

By reeset / On / In Uncategorized

In .NET Core, one of the changes that caught me by surprise is the change related to starting processes. In the .NET Framework, you can open a website, file, etc. just by using the following:

System.Diagnostics.Process.Start(path);

However, in .NET Core – this won’t work.  When trying to open a file, the process will fail – reporting that a program isn’t associated with the file type.  When trying to open a folder on the system, the process will fail with a permission error unless the application is running with administrator permissions (which you don’t want to be doing).  The change is related to a change in a property default – specifically:

System.Diagnostics.ProcessStartInfo.UseShellExecute

In the .NET Framework, this property is set to true by default. In .NET Core, it is set to false. The difference here probably makes sense: .NET Core is meant to be more portable, and you do need to change this value on some systems. To fix this, I’d recommend removing any direct calls to Process.Start(path) and routing them through functions like these:

```csharp
// Opens a URL through the shell, falling back to a direct process start if the shell call fails.
public static void OpenURL(string url)
{
    var psi = new System.Diagnostics.ProcessStartInfo
    {
        FileName = url,
        // .NET Core/.NET 5 default this to false; the .NET Framework defaulted to true.
        UseShellExecute = true
    };
    try
    {
        System.Diagnostics.Process.Start(psi);
    }
    catch
    {
        psi.UseShellExecute = false;
        System.Diagnostics.Process.Start(psi);
    }
}

// Opens a file (optionally with arguments) or a folder through the shell,
// falling back to a direct process start if the shell call fails.
public static void OpenFileOrFolder(string spath, string sarg = "")
{
    var psi = new System.Diagnostics.ProcessStartInfo
    {
        FileName = spath,
        UseShellExecute = true
    };

    // Arguments are only passed when opening a file, never a folder.
    System.IO.FileAttributes attr = System.IO.File.GetAttributes(spath);
    bool isDirectory = (attr & System.IO.FileAttributes.Directory) == System.IO.FileAttributes.Directory;
    if (!isDirectory && !string.IsNullOrWhiteSpace(sarg))
    {
        psi.Arguments = sarg;
    }

    try
    {
        System.Diagnostics.Process.Start(psi);
    }
    catch
    {
        psi.UseShellExecute = false;
        System.Diagnostics.Process.Start(psi);
    }
}
```
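Because .NET Core and .NET 5 also run outside Windows, a shell-execute call is not the only option there. A common pattern on macOS and Linux is to hand the target to the desktop’s opener program (`open` on macOS, `xdg-open` on most Linux desktops). The sketch below only builds the `ProcessStartInfo`, so the platform choice can be inspected without launching anything. The `OpenHelper` class and its methods are hypothetical, and the assumption that `xdg-open` is installed may not hold on every Linux system.

```csharp
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class OpenHelper
{
    // Builds (but does not start) a ProcessStartInfo that opens a URL, file,
    // or folder with the platform's default handler.
    public static ProcessStartInfo BuildStartInfo(string target)
    {
        if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            // On Windows, let the shell resolve the file association or URL handler.
            return new ProcessStartInfo { FileName = target, UseShellExecute = true };
        }

        // On macOS use "open"; elsewhere assume a freedesktop "xdg-open" is available.
        string opener = RuntimeInformation.IsOSPlatform(OSPlatform.OSX) ? "open" : "xdg-open";
        var psi = new ProcessStartInfo { FileName = opener, UseShellExecute = false };
        // ArgumentList (.NET Core 2.1+) takes care of quoting paths that contain spaces.
        psi.ArgumentList.Add(target);
        return psi;
    }

    // Starts the handler and releases the Process object right away.
    public static void Open(string target) => Process.Start(BuildStartInfo(target))?.Dispose();
}
```

Separating "build" from "start" also makes the platform logic easy to unit test.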

Since this vexed me for a little bit – I’m putting this here so I don’t forget.

tr

MarcEdit 7.5/MarcEdit Mac 3.5 Work

By reeset / On / In MarcEdit

Every year, around this time, I try to dedicate significant time to any large project work that may have been percolating around MarcEdit. This year will be no different. Over the past 4 months, I’ve been working on moving MarcEdit away from the .NET 4.7.2 Framework to .NET Core 3.1. There are a lot of reasons for looking at this, the most important being that this is the direction Microsoft is taking the framework: a move to unify the various .NET development platforms to make distribution and maintenance easier. Well, with the release of .NET 5 this Nov., all the tools I need to officially make this transition are now in place.

So, over the next two months, I’ll be working on shifting MarcEdit away from Framework 4.7.2 and to .NET 5. I believe this will be possible. I only have concerns about two libraries that I rely on, and if I have to, both are open source, so I can look at potentially spending time helping the project maintainers target a non-Framework build. My hope is to have a working version of MarcEdit using .NET 5 by Thanksgiving that I can start unit testing and testing locally.

Of course, with this change, I’ll also have to change the installer process. The reason is that this transition will remove the need to have .NET installed on one’s machine. One of the changes to the framework is the ability to publish self-contained applications, allowing for faster startup and lower memory usage. This is something I’m excited about, as I currently move slowly when updating build frameworks due to the need to have these frameworks installed locally. By removing that dependency, I’m hoping to be able to take advantage of changes to the C# language that make programming easier and more efficient, while also allowing me to remove some of the workaround code I’ve had to develop to account for bugs or limitations in previous frameworks.

Finally, this change is going to simplify a lot of cross platform development – and once the initial transition has occurred, I’ll be spending time working on expanding the MarcEdit MacOS version.  There are a couple of areas where this program still lacks parity in relation to the Windows version, and these changes will give me the opportunity to close many of these gaps. 

–tr

MarcEdit: Identifying Invalid UTF-8 Data in MARC Records

By reeset / On / In MarcEdit

The fifth circle, illustrated by Stradanus

Ah Dante – if only he had been a librarian. I’m almost certain that had the Divine Comedy been written by a cataloger, character encodings, and those that mangle them, would definitely make an appearance. I can almost see the story in my head. Our wayward traveler is confused when our guide, Virgil, comments on the unholy mess that libraries, vendors, and tool writers in general have made of implementing UTF-8 across the library spectrum, and Virgil takes us to the 5th circle of hell, filled with broken characters and undefined character boxes. But spend any time working in metadata management today, and you’ll run into mixed Unicode normalizations, the false equivalency of ISO-8859-1 and UTF-8 (especially among vendors that serve Western European markets), lackluster font development, and applications and programming languages that quietly and happily mangle UTF-8 data as a matter of course. Suddenly you can see why we might make a stop at the lake of fire and eternal damnation.

Within MarcEdit, one of the hardest things the application does is attempt to correct and normalize character encodings across the various known code points. This isn’t super easy, especially since our MARC forepersons made that fateful decision to create MARC-8, a 100% imaginary character encoding only (kind of) supported within the library community and its software. These kinds of decisions, and the desire to maintain legacy compatibility, have haunted our metadata and made working with it immensely complicated. Sometimes these complications can be managed; other times, they are so gruesomely mangled that Brutus himself would cry yield. That’s what this new option will attempt to help remediate.

Through the years, I’ve often helped individuals come up with a wide variety of ways to identify invalid UTF-8 characters that litter library records.  Sometimes, this can be straightforward, but more often, it’s not.  To that end, I’ve attempted to provide a couple of tools that will hopefully help to identify and support some kind of remediation for catalogers haunted by the specter of bad data.

Identification

The first enhancement comes in the MARCValidator.  When validating a record against the rules file, the tool will automatically attempt to determine if UTF-8 data (if present) found within a record is valid.  If not, the information will be presented as a warning – identifying the field, record number, and data where the invalid data was identified.

The idea is that identifying invalid UTF-8 record data within the validator will empower catalogers looking to take a more active role in rooting out bad diacritical data before a record is loaded into the catalog and made available to the public.
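For readers who want to run a similar check outside the validator, one standard technique is to decode the raw field bytes with a strict UTF-8 decoder that throws on malformed sequences instead of silently substituting U+FFFD. This is a minimal sketch of that technique, not MarcEdit’s validator code; the `Utf8Check` class name is hypothetical.

```csharp
using System;
using System.Text;

public static class Utf8Check
{
    // throwOnInvalidBytes: true makes the decoder raise DecoderFallbackException
    // instead of replacing malformed sequences with U+FFFD.
    private static readonly UTF8Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns true if the bytes form well-formed UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        try
        {
            StrictUtf8.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

For example, the MARC-8 sequence for an accented "e" (combining acute 0xE2 followed by 0x65) is rejected, because 0xE2 starts a three-byte UTF-8 sequence that 0x65 cannot continue.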

Removing bad data

In addition to identification, I’ve added three new options for dealing with invalid character data.

Delete Subfields

Added to the Edit Subfield Utility – I’ve included an option to evaluate and delete a subfield if invalid character set data is encountered.
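A rough sketch of what a subfield-level delete does, assuming the record is in mnemonic (.mrk) form and that the caller supplies the test for "invalid" data. The `SubfieldCleaner` class and its method are hypothetical, and the simple replacement-character test used in the usage note is only a stand-in for MarcEdit’s own evaluation.

```csharp
using System;
using System.Linq;

public static class SubfieldCleaner
{
    // Removes every subfield whose data fails the isInvalid test from a mnemonic
    // field line such as "=245  10$aTitle$bSubtitle". Control fields (=001 to =009)
    // have no subfields and are returned unchanged.
    public static string DeleteInvalidSubfields(string field, Func<string, bool> isInvalid)
    {
        if (field.Length < 8 || string.CompareOrdinal(field, 1, "010", 0, 3) < 0)
            return field;

        string head = field.Substring(0, 8);          // "=TAG  " plus two indicators
        string[] parts = field.Substring(8).Split('$');
        // parts[0] is whatever precedes the first '$' (normally empty);
        // each later part is a subfield code followed by its data.
        var kept = parts.Skip(1)
                        .Where(p => p.Length > 0 && !isInvalid(p.Substring(1)))
                        .Select(p => "$" + p);
        return head + parts[0] + string.Concat(kept);
    }
}
```

Usage: `DeleteInvalidSubfields("=245  10$aCaf\uFFFD$bsubtitle", s => s.IndexOf('\uFFFD') >= 0)` keeps only the `$b`. If every subfield is removed, the field is left with no data, which is where a field-level delete would take over.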

Delete Fields

Added to the Add/Delete Field Utility – I’ve included an option to evaluate and delete a field if invalid character set data is encountered.

Delete Records

Added to the Delete Records tool within the MarcEditor – I’ve included an option to delete a record if a field or field group has been identified as having invalid character set data.  Additionally, this tool will create a second file in the same directory as the file being processed, that will contain the deleted records in a file structured as: [name of original file]_bad_yyyyMMddhhmmss.mrk
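The backup file name can be built with standard path and date formatting calls. This sketch follows the pattern as written in the post; note that in .NET format strings `hh` is the 12-hour clock, so a 3 PM timestamp and a 3 AM timestamp produce the same digits, while `HH` would give a 24-hour value. Whether MarcEdit uses `hh` or `HH` internally is an assumption here, and `BadRecordFile.BuildPath` is a hypothetical name.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class BadRecordFile
{
    // Builds "[name of original file]_bad_yyyyMMddhhmmss.mrk" in the same
    // directory as the file being processed ("hh" = 12-hour clock).
    public static string BuildPath(string sourcePath, DateTime timestamp)
    {
        string directory = Path.GetDirectoryName(Path.GetFullPath(sourcePath)) ?? "";
        string name = Path.GetFileNameWithoutExtension(sourcePath);
        string stamp = timestamp.ToString("yyyyMMddhhmmss", CultureInfo.InvariantCulture);
        return Path.Combine(directory, name + "_bad_" + stamp + ".mrk");
    }
}
```

With `records.mrk` and a timestamp of 2020-01-02 15:04:05, the file name comes out as `records_bad_20200102030405.mrk`.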

Caveat Emptor

Hopefully, the above sounds useful. I think it will be. There have been many times when I wished I had these tools readily at my fingertips. If only it were this easy. I believe I mentioned above... encodings are difficult. The Unicode specification is constantly changing, and identifying invalid characters is definitely more art than science in many cases. There are tools and established algorithms, and I use these approaches. I’m also leveraging a method in the .NET Framework, CharUnicodeInfo.GetUnicodeCategory, which attempts to take a character and determine its character classification. When a character isn’t classified, that’s usually a good indicator that it’s not valid. This process won’t catch everything, but it will hopefully provide a good starting point for users vexed by these issues and in need of a tool in their toolbox to attempt to remediate them.
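A sketch of the classification idea, assuming .NET Core 3.0 or later (MarcEdit 7.5 targets .NET 5), where `CharUnicodeInfo.GetUnicodeCategory(int)` accepts a full code point. Besides unassigned code points (`UnicodeCategory.OtherNotAssigned`), it also flags unpaired surrogates and U+FFFD, the replacement character a lossy decoder leaves behind; those two extra checks are additions for illustration, not a description of MarcEdit’s logic, and `CharacterCheck` is a hypothetical name. What counts as "unassigned" depends on the Unicode version built into the runtime.

```csharp
using System.Globalization;

public static class CharacterCheck
{
    // Returns true if the text contains something that usually signals bad data:
    // an unpaired surrogate, the U+FFFD replacement character, or a code point
    // that Unicode has not assigned (OtherNotAssigned).
    public static bool HasSuspectCharacters(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            int codePoint;
            if (char.IsHighSurrogate(text[i]) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                codePoint = char.ConvertToUtf32(text[i], text[i + 1]);
                i++; // skip the low surrogate of the pair
            }
            else if (char.IsSurrogate(text[i]))
            {
                return true; // unpaired surrogate
            }
            else
            {
                codePoint = text[i];
            }

            if (codePoint == 0xFFFD ||
                CharUnicodeInfo.GetUnicodeCategory(codePoint) == UnicodeCategory.OtherNotAssigned)
                return true;
        }
        return false;
    }
}
```

Combining marks (such as U+0301) and private-use characters have their own categories, so they pass this check; catching those would need separate rules.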

Conclusion

My hope is that these new options will give catalogers a little more control and insight into their records – specifically given how invisible character encoding issues often are.  And maybe too, by shedding light on this most vexing of issues, I can buy myself a little less time in cataloging purgatory as I’m sure there will come a point, somewhere, sometime, where my own contributions to keeping MARC alive and active will be held to account.

These new options will show up in MarcEdit and MarcEdit Mac in versions 7.2.210 (Windows) and 3.2.100 (Mac).

Questions, let me know.

–tr

[1] The fifth circle, illustrated by Stradanus (https://en.wikipedia.org/wiki/Inferno_(Dante)#/media/File:Stradano_Inferno_Canto_08.jpg)

MarcEdit 7.2.200

By reeset / On / In MarcEdit

I’ve worked on a number of updates this weekend; here is the list:

UI Changes

I’ve removed the quick links on the front page, and changed this to a list of selectable topics.  This will make it easier for me to add to this list.

I’ve added a new Quick Access button to the top ribbon. At this point, it isn’t configurable; I will work to make it configurable later.

These Quick Access items have been added to the Marc Tools window – with the removal of the old quick links as well.

Network Changes

MarcEdit uses .NET 4.7.2. Internally, the tool has traditionally used the HttpWebRequest class. Using this class directly has been deprecated, with the preferred method shifting to the HttpClient class in the System.Net.Http assembly. This object is thread-safe and works natively with the System.Threading.Tasks structure. It also has the benefit of letting me have .NET gracefully support older TLS standards, which isn’t the default. By default, .NET selects the default TLS version utilized by the operating system and disables older standards. This is problematic, and these changes will give me more control over which TLS versions are supported and how fallback is handled. This required updating 9 assemblies.
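A minimal sketch of the kind of control described above, using the `HttpClientHandler.SslProtocols` property (available in .NET Framework 4.7.1+ and .NET Core): `SslProtocols.None` defers to the operating system’s defaults, while an explicit set can re-enable older protocols for a fallback attempt. The `TlsClientFactory` name and the choice of protocols are illustrative assumptions, not MarcEdit’s actual configuration; TLS 1.0 and 1.1 are obsolete and newer .NET versions mark those enum values obsolete, so they should only be enabled deliberately.

```csharp
using System.Net.Http;
using System.Security.Authentication;

public static class TlsClientFactory
{
    // Builds a handler that either defers to the OS default TLS versions or,
    // for a fallback attempt, explicitly allows older protocols as well.
    public static HttpClientHandler CreateHandler(bool allowLegacyTls)
    {
        var handler = new HttpClientHandler();
        handler.SslProtocols = allowLegacyTls
            ? SslProtocols.Tls12 | SslProtocols.Tls11 | SslProtocols.Tls // explicit list including legacy
            : SslProtocols.None;                                          // let the OS choose
        return handler;
    }

    // HttpClient is meant to be created once and reused; its request methods are thread-safe.
    public static HttpClient CreateClient(bool allowLegacyTls) =>
        new HttpClient(CreateHandler(allowLegacyTls), disposeHandler: true);
}
```

A caller could first try a client built with `allowLegacyTls: false` and only retry with `true` when the handshake fails, which keeps legacy protocols off for servers that do not need them.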

MarcEditor Changes

Bug Fix:  When Opening mrc records into the MarcEditor, a memory leak can occur with large files.  I’ve corrected this.

Bug Fix: MarcEdit uses a custom control that allows the tool to select the most current version of the RichText library when showing the MarcEditor. In .NET 4.7, there appears to have been a behavior change, in that the names used to register classes in Windows need to be all uppercase. If they weren’t, an error would be thrown when mixing the enhanced control and the .NET Framework’s default RichTextBox control (which uses the older RichText library). For example, if the enhanced control internally used RichEdit5W and then the RichTextBox was used, the program would throw an error. This wasn’t a problem in MarcEdit, because I only use the enhanced control, but users who create plugins against MarcEdit may experience issues. The correction is to use uppercase text to normalize the class names, as now used by .NET 4.7+ (Example: RICHEDIT5W).

Z39.50/SRU Changes

Enhancement: Cleaned up some code related to how records display inside the Results Viewer when pulling non-MARC data.

Validate Headings

Behavior Change: Check $a Only with Subjects. When working with 60x or 610 fields, this setting doesn’t work like folks might expect. This is because names often include additional information that must be provided, or false variants can be noted. When working with 60x or 610 data, the program will now include all subfields used when validating the 1xx fields and update data with variants accordingly. When $a isn’t selected, the tool will utilize all fields noted as used for validation in the rules file. This is a behavior change, but likely more in line with the expectations that I’m guessing most folks have when using the $a option.

Behavior Change: When changing variants – it appears that multiple $a’s would be placed.  I’m not sure if there was a change on the source record side or not – so instead, I just updated the code to ensure that the tool validated specific data before making updates.

–tr

Build New Field Changes

By reeset / On / In MarcEdit

** Updated: Official Help page in the KB: https://marcedit.reeset.net/build-new-field

This isn’t going to meet all the use cases I’ve seen – but this should address the most common question that comes up – the ability to have the build new field generate multiple fields.

The process will be based on the presence or absence of a new element in the pattern: a variable marker that MarcEdit uses internally to hold an internal variable.

Example:

=040  \\$aMiU$cMiU

=040  \\$aBDS$beng$cBDS$dOCLCQ$dABCU

=041  \\$aengrusger

=043  \\$ae-gx---$ae-uk---$an-us---

=090  \\$aTK1005$b(INTERNET) $c[UK.]

Say we have these fields, and the pattern I want to create is a 999 field. I want to create a new 999 field for each 040$a, but I would also like the 090$a to be part of the pattern.

The new pattern would look like this:

=999  \\$a{040$a[x]} : {090$a}

This pattern would generate the following results:

=999  \\$aMiU : TK1005

=999  \\$aBDS : TK1005

If I changed the pattern to:

=999  \\$a{040$a} : {090$a}

The program falls back to use the current functionality (only one field is created).

Please note, you cannot ask for a specific 040 to be used (outside of using find/reg functions inside the pattern) – the data inside the [x] isn’t an integer you can set.  It is a value that indicates to MarcEdit that the subfield should be tracked and multiple fields are desired.

The [x] syntax works both after the subfield and after the field number, with data being scoped based on the location of the [x]. Any value other than [x] will likely produce inconsistent results. The [x] bracket is a reserved element within the field that indicates multiple field generation is desired and tells the program to tokenize the marked data.

Finally, the tool places data in the index range of the new field being generated. So, consider this example:

=040  \\$aMiU$cMiU

=040  \\$aBDS$beng$cBDS$dOCLCQ$dABCU

=041  \\$aengrusger

=043  \\$ae-gx---$ae-uk---$an-us---

=090  \\$aTK1005$b(INTERNET) $c[UK.]

If I used the following pattern:

=999  \\$a{040$a[x]} : {090$a[x]}

The expected results would be:

=999  \\$aMiU : TK1005

=999  \\$aBDS :

Why? Because the tool slots values marked with the multi-field value [x] into the same field groups. Since only one 090$a exists, the tool only updates the field group to which it belongs. However, if I had the following data:

=040  \\$aMiU$cMiU

=040  \\$aBDS$beng$cBDS$dOCLCQ$dABCU

=041  \\$aengrusger

=043  \\$ae-gx---$ae-uk---$an-us---

=090  \\$aTK1005$b(INTERNET) $c[UK.]

=090  \\$aG24211$b(INTERNET)

And used this pattern:

=999  \\$a{040$a[x]} : {090$a[x]}

I would expect the following result:

=999  \\$aMiU : TK1005

=999  \\$aBDS : G24211

Again – internally, MarcEdit is creating tokens of data with the [x] and placing them within the same scope.  So, the tool would create new fields, placing data within the same scope onto the new fields.
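The slotting behavior in these examples can be modeled in a few lines: the number of generated fields is the length of the longest [x]-marked list, each [x] value goes into the field at the same position (or nothing, if that list is shorter), and unmarked elements repeat in every field. This is a simplified model of the described results, not MarcEdit’s tokenizer; the `PatternPart` and `MultiFieldBuilder` names are hypothetical, pattern parsing is left out, and using the first occurrence for an unmarked element is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// One piece of a pattern: literal text, or the values of a source subfield,
// optionally marked [x] for multi-field generation.
public sealed class PatternPart
{
    public string Literal { get; }
    public IReadOnlyList<string> Values { get; }
    public bool Multi { get; }

    private PatternPart(string literal, IReadOnlyList<string> values, bool multi)
    { Literal = literal; Values = values; Multi = multi; }

    public static PatternPart Text(string s) => new PatternPart(s, Array.Empty<string>(), false);
    public static PatternPart Unmarked(params string[] values) => new PatternPart(null, values, false);
    public static PatternPart Marked(params string[] values) => new PatternPart(null, values, true);
}

public static class MultiFieldBuilder
{
    public static List<string> Build(IReadOnlyList<PatternPart> parts)
    {
        // No [x] part: one field (existing behavior). Otherwise one field per
        // token of the longest [x] list.
        int count = parts.Where(p => p.Multi).Select(p => p.Values.Count).DefaultIfEmpty(1).Max();
        var fields = new List<string>();
        for (int i = 0; i < count; i++)
        {
            var sb = new StringBuilder();
            foreach (var p in parts)
            {
                if (p.Literal != null) sb.Append(p.Literal);
                else if (p.Multi) sb.Append(i < p.Values.Count ? p.Values[i] : ""); // same slot only
                else sb.Append(p.Values.Count > 0 ? p.Values[0] : "");              // unmarked: first value
            }
            // Trim trailing whitespace left behind when a slot is empty.
            fields.Add(sb.ToString().TrimEnd());
        }
        return fields;
    }
}
```

Feeding it the 040$a values (MiU, BDS) marked [x] and the 090$a values (TK1005, G24211) marked [x] reproduces the two 999 fields from the last example.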

I started making these changes with the last update – and have finished updating the tokenization algorithms so that the tracking of the data is correct.  I’ll be turning this new option on with the next update – and across both the Windows and Mac version.

Since the presence of the [x] is necessary to turn on the multi-field generation, any existing patterns within tasks shouldn’t be impacted by the changes.  They will work as they had previously.  Only patterns with the new [x] structure will activate the new processing logic.