CATQC AND SHELF-READY MATERIAL | JAY, SIMPSON, AND SMITH 41

CatQC and Shelf-Ready Material: Speeding Collections to Users While Preserving Data Quality

Michael Jay, Betsy Simpson, and Doug Smith

Michael Jay (emjay@ufl.edu) is Information Technology Expert, Software Unit, Information Technology Department; Betsy Simpson (betsys@uflib.ufl.edu) is Chair, Cataloging and Metadata Department; and Doug Smith (dougsmith@uflib.ufl.edu) is Head, Copy Cataloging Unit, Cataloging and Metadata Department, George A. Smathers Libraries, University of Florida, Gainesville.

Libraries contract with vendors to provide shelf-ready material, but is it really shelf-ready? It arrives with all the physical processing needed for immediate shelving, then lingers in back offices while staff conduct item-by-item checks against the catalog. CatQC, a console application for Microsoft Windows developed at the University of Florida, builds on OCLC services to get material to the shelves and into the hands of users without delay and without sacrificing data quality. Using standard C programming, CatQC identifies problems in MARC record files, often applying complex conditionals, and generates easy-to-use reports that do not require manual item review.

A primary goal behind improvements in technical service workflows is to serve users more efficiently. However, the push to move material through the system faster can result in shortcuts that undermine bibliographic quality. Developing safeguards that maintain sufficiently high standards without sacrificing productivity is the modus operandi for technical service managers. The implementation of OCLC’s WorldCat Cataloging Partners (WCP, formerly PromptCat) and Bibliographic Record Notification services offers an opportunity to retool workflows to take advantage of automated processes to the fullest extent possible, but it also requires some backroom creativity to ensure that access to material remains adequate.
Literature review

Quality control has traditionally been viewed as a central aspect of cataloging operations, either as part of item-by-item handling or as part of manual and automated authority maintenance. How this activity has been applied to outsourced cataloging was the subject of a survey of academic libraries in the United States and Canada. A total of 19 percent of libraries in the survey indicated that they forgo quality control of outsourced copy, primarily for government documents records. However, most respondents reported that they review records for errors. Of that group, 50 percent focus on access points, 30 percent check a variety of fields, and a significant minority (20 percent) look at all data points. Overall, the libraries expressed satisfaction with the outsourced cataloging using the following measures of quality supplied by the author: accuracy, consistency, adequacy of access points, and timeliness.1 At the inception of OCLC’s PromptCat service in 1995, Ohio State University Libraries participated in a study to test similar quality control criteria with the stated goals of improving efficiency and reducing copyediting. The results were so favorable that the author speculated that PromptCat would herald a future where libraries can “reassess their local practices and develop greater confidence in national standards so that catalog records can be integrated into local OPACs with minimal revision and library holdings can be made available in bibliographic databases as quickly as possible.”2 Fast forward a few years, and the new incarnation of PromptCat, WCP, is well on its way to fulfilling this dream. In a recent investigation conducted at the University of Arkansas Libraries, researchers concluded that error review of copy supplied through PromptCat is necessary, but the error rate does not warrant discontinuance of the service.
The benefits in terms of time savings far outweigh the effort expended to correct errors, particularly when the focus of the review is to correct errors critical to user access. While the researchers examined a wide variety of errors, a primary consideration was series headings, particularly given the problems cited in previous studies and noted in the article.3 With the 2006 announcement by the Library of Congress (LC) that it would curtail its practice of providing controlled series access, the cataloging community voiced great concern about the effect of that decision on user access.4 The Arkansas study determined that “the significant number of series issues overall (even before LC stopped performing series authority work) more than justifies our concern about providing series authority control for the shelf-ready titles.” Approximately one third of the outsourced copy across the three record samples studied had a series, and, of that group, 32 percent needed attention, predominantly in the form of authority record creation with associated analysis and classification decisions.5

The overwhelming consensus among catalogers is that error review is essential. As far as can be determined, an underlying premise behind such efforts is that the review is done with the book in hand. But could there be a way to satisfy these concerns without the book in hand? Certainly, validation tools embedded in library management systems provide protections whether records are manually entered or batchloaded, and outsourced authority maintenance services (for those who can use them) offer further control.
But a customizable tool that allows libraries to target specific needs, both standards-based and local, without relying on item-by-item handling can contribute to the economy of scale demanded by an environment with shrinking budgets and fewer staff available for manual bibliographic scrutiny. If that tool is viewed as part of a workflow stream involving local error detection at the receiving location as well as enhancement at the network level (i.e., OCLC’s Bibliographic Record Notification service), then it becomes an important step in freeing catalogers to turn their attention to other priorities, such as digitized and hidden collections.

42 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009

Local setting and workflow

The George A. Smathers Libraries at the University of Florida encompass six branches that address the information needs of a diverse academic research campus with close to fifty thousand undergraduate and graduate students. The Technical Services Division, which includes the Acquisitions and Licensing Department and the Cataloging and Metadata Department, acquires and catalogs approximately forty thousand items annually. Seeking ways to minimize the handling of incoming material, the departments developed a workflow in 2006 that made it possible to send shelf-ready material directly to the branches after check-in against the invoice. Shelf-ready items currently represent approximately 30 percent of the Libraries’ purchased monographic resources.
By combining WCP record loads with vendor-supplied shelf-ready processing, the Libraries have significantly reduced the time from receipt to shelf, because it is no longer necessary to send the bulk of the shipments to Cataloging and Metadata. Exceptions to this practice include specific categories of material that require individual inspection. The vendor is asked to include a flag in books that fall into many of these categories:

• any nonprocessed book or book without a spine label
• books with spine labels that have numbering after the date (e.g., vol. 4, no. 2)
• books with CDs or other formats included
• books with loose maps
• atlases
• spiral-bound books
• books that have the words “annual,” “biennial,” or a numeric year in the title (these may be a serial add to an existing record or part of a series that will be established during cataloging)

To facilitate a post-receipt record review for those items not sent to Cataloging and Metadata, Acquisitions and Licensing runs a local programming tool, CatQC, which reports records containing attributes that Cataloging and Metadata has determined require closer examination. Figure 1 shows an example report; the reports are viewed in the Mozilla Firefox browser. Copy catalogers rotate responsibility for checking the report and revising records when necessary. Retrieval of the physical piece is necessary only in the 1 percent of cases where the item needs to be relabeled.

Figure 1. An Example Report from CatQC

CatQC report

CatQC analyzes the content of the WCP record file and identifies records with particular bibliographic coding that signals potential problems:

1. encoding levels 2, 3, 5, 7, E, J, K, M
2. 040 with non-English subfield b
3. 245 fields with subfields h, n, or p
4. 245 fields with subfields a or b that contain numerals
5. 245 fields with subfields a or b that contain red-flag keywords
6. 246 fields
7. 490 fields with first indicator 0
8. 856 fields without subfield 3
9. 6xx fields with second indicators 4, 5, 6, or 7

In the discussion that follows, the numbers in parentheses after each heading refer to the criteria in this list.

Minimal-level copy (1)
The library’s WCP profiles, currently in place for three vendors, are set up to accept all OCLC encoding levels. With such a wide-open plan, it is important to catch minimal-level copy to ensure that appropriate access points exist and are coded correctly. The library encounters these less-than-full encoding levels infrequently.

Parallel records (2)
CatQC identifies foreign library records that are candidates for parallel record treatment by indicating in the report whether the 040 has a non-English subfield b. The report also includes the 936 field, if present, to alert catalogers that a parallel record is available.

Volume sets (3, 4, 5)
The library does not generally analyze the individual volumes of multipart monographic sets (i.e., volume sets) even when the volumes have distinctive titles. These volumes are added to the collection under the title of the set. The June 2006 decision by LC to produce individual volume records when a distinctive title exists caused concern about the integrity of the Libraries’ existing open volume set records. Because such records typically have enumeration in subfield n, and sometimes subfield p, of the 245 field, the program searches for instances of those subfields. In addition, the program detects the presence of numerals in the 245 and keywords such as “volume,” “part,” and “number” as well as common abbreviations of those words (e.g., v. or vol.).

Serial vs. monograph treatment (4, 5)
Titles owned by the library and classified as serials are sometimes ordered inadvertently as monographs, resulting in the delivery of a monographic record. A similar problem occasionally arises with new titles. By detecting numerals, keywords, or the presence of one or more of these subfields in the 245 field, we can quickly scan a list of records with these characteristics. Of course, most of the records detected by CatQC are false hits because of the broad scope of the search; however, it takes only a few minutes to scan through the record list.

Non-print formats (3)
The library does not receive records for any format other than print through WCP. Consequently, the presence of a subfield h in the 245 field is a good signal that there may be a problem with the record.

Alternate titles (6)
Alternate titles can be an important access point for library users. Sometimes text that properly belongs in subfield i of the 246 field (e.g., “at head of title”) is placed in subfield a in front of the alternate title. This adversely affects user access to the title through browse searching. CatQC checks for and reports the presence of a 246 field so that the cataloger can quickly confirm that it is coded correctly.

Untraced series (7)
As a Program for Cooperative Cataloging (PCC) participant, the library opted to follow PCC practice and continue to trace series despite LC’s 2006 decision to treat all series statements in newly cataloged records as untraced. Because some libraries chose to follow LC, there has been an overall increase in the use of untraced series statements across all record encoding levels. To address this issue, CatQC searches all WCP records for 490 fields with first indicator 0. Catalogers check the authority files for the series and make any necessary changes to the records. This is by far the most frequent correction made by catalogers.
Links (8)
To tell users about the nature of the URLs displayed in the catalog, catalogers ensure that explanatory text is recorded in subfield 3 of the 856 field. CatQC looks for 856 fields that lack subfield 3 and displays each such field in the report as a hyperlink. The cataloger adds the appropriate text (e.g., full text) as needed.

Subject headings with second indicators 4, 5, 6, and 7 (9)
The CatQC report reviewed by catalogers includes subject headings with second indicator 4. When these headings duplicate headings already on the record, catalogers delete them from our local system. When the headings are not duplicates, the catalogers change the second indicator 4 to 0. Typically, 6xx fields with second indicators 5, 6, and 7 contain non-English headings based on foreign thesauri. These headings can conflict with LC headings and, in some cases, match cross-references in LC authority records. The resulting split files are not only confusing to patrons but also add to the number of reported errors that require authority maintenance. For these reasons, our policy is to delete these headings from our local system. CatQC detects the presence of second indicators 5, 6, or 7 and creates a modified file with those headings removed, with one exception: a heading with second indicator 7 and a subfield 2 of “nasat,” which indicates that the heading is taken from the National Aeronautics and Space Administration thesaurus, is not removed because the local preference is to retain “nasat” headings.

Library-specific issues
CatQC also resolves local problems when needed. For example, when more than one LC call number was present on a record, the WCP spine manifest sent to the vendor used to contain the second call number, which was affixed to the item. When the WCP records were loaded into the library’s catalog, the first call number populated the holding.
As a result, there was a discrepancy between the spine label on the book and the call number in the catalog. Prior to generating the report, CatQC found multiple instances of call numbers in the records in the WCP file and created a modified file with the call numbers reordered so that the correct call number was used on the holding when the record was loaded.

Previously, the library’s OPAC did not display the text in subfield 3 of the 856 field, which specifies the type of material covered by the link, so to users it appeared that every link led to a full-text resource. This was particularly troublesome for records with LC links to tables of contents, publisher descriptions, contributor information, and sample text. To prevent user frustration, CatQC was programmed to move the links on the WCP records to 5XX fields. When the OPAC interface improved and this programming was no longer necessary, CatQC was revised.

Analysis

To see how well CatQC and OCLC’s Bibliographic Record Notification service were meeting our goal of maintaining high-quality bibliographic control, sixty-three reports were randomly selected from the 171 reports generated by CatQC between October 2007 and April 2008. CatQC found no problems in twelve (19 percent) of the selected reports. These twelve were not used in the analysis, leaving fifty-one CatQC reports with at least one potential problem flagged for review. On average, 35.6 percent of the records in the sampled reports were flagged as requiring review by a cataloger. An average of thirteen possible problems was detected per report. Of these, 55 percent required at least some attention from the cataloger. The action required varied from simply checking the text of a field displayed in the report (e.g., 246 fields) to bringing up the record in Aleph and editing the bibliographic record (e.g., verifying and correcting series headings or eliminating unwanted subject headings).
Why the relatively high rate of false positives (45 percent)? To minimize missed serials and volumes belonging to sets, CatQC is designed to err on the side of caution. Two of the criteria listed earlier were responsible for the vast majority of the false positives: 245 fields with subfields a or b that contain numerals, and 245 fields with subfields a or b that contain red-flag keywords. Clearly, if every record with a numeral in the 245 is flagged, many hits will be generated that are not actual problems. The list of keywords was purposefully designed to be extensive. For example, “volume,” “vol.,” and “v.” are all triggers causing a record to be flagged. Therefore a bibliographic record containing the phrase “Volume Cost Profit Analysis” in the 245 field would be flagged as a potential problem.

At first glance, a report filled with so many false positives may seem inefficient and burdensome for catalogers to use; however, this is largely mitigated by the clear display format. The programmer worked closely with the Copy Cataloging Unit staff to develop a user-friendly report format. Each record is framed separately, making it easy to distinguish from adjoining records. Potential problems are highlighted in red lettering, immediately alerting catalogers to what the potential problem might be. Whenever a potential problem is found, the text of the entire field appears in the report so that catalogers can see quickly whether the field triggering the flag is an actual problem. It takes a matter of seconds to glance through the 245 fields of half a dozen records to see whether the numeral or keyword detected is a problem. The catalogers who work with these reports estimated that it took them between two and three hours per month both to review the files and to make corrections to bibliographic records.
A second component of bibliographic quality maintenance is OCLC’s Bibliographic Record Notification service. This service compares newly upgraded OCLC records with records held by the library and delivers the upgraded records to the library. Because CatQC flags records with encoding levels 2, 3, 5, 7, E, J, K, and M, it was possible to determine whether these records had, in fact, been upgraded in OCLC. In the sample, thirty-three records were flagged because of their encoding level. As of August 2008, 21.2 percent of these records had not been upgraded in OCLC, and 45.5 percent had been upgraded. The remaining 33.3 percent of the records were manually loaded by catalogers in Copy Cataloging. These typically are records for items brought to Copy Cataloging by Acquisitions and Licensing because they meet one or more of the criteria for individual inspection discussed previously. When catalogers search OCLC and find that the received record has not been upgraded, they search for another matching record. A third of the time, a record of higher quality than the one received is found in OCLC and exported to the catalog. Why the better record is not harvested initially is not clear. It is possible that at the time the records were harvested both records were of equivalent quality and by chance one was enhanced rather than the other. In none of these cases had the originally harvested record been upgraded (these cases are not reflected in the 21.2 percent of records not upgraded). Encoding level 8 records are excluded from CatQC reports. Because upgrades of this type of copy are relatively quick, the library decided to rely solely on the Bibliographic Record Notification service for them.

Technical specifications

CatQC is a console application for Windows. Written in standard C, it is designed to be portable to multiple operating systems with little modification.
No graphical interface was developed because (a) the users are satisfied with the current operating procedure and (b) the treatment of the records is predefined as a matter of local policy. The user opens a command console (cmd.exe), types catqc followed by a space and the name of the MARC file, and presses Enter. CatQC generates the corrected file, analyzes the modified file, and creates the XML report. It then moves the report to a reviewing folder on a file server across the LAN and indicates to the user that it is terminating. Modifications require action by a programmer; the user cannot choose from a list of options. Benefits include a 100 KB file size and a processing speed of approximately 1,000 records per second. No quantitative analysis of processing speed has yet been done, but to the user the entire process seems nearly instantaneous.

The genesis of the project was the programmer’s interest in the record structure of MARC files, prompted by the use of earlier local automation tools. The project was speculative. The first experiment contained the programming structure that would become CatQC. One record is read into memory at a time, and another array holds individual MARC fields. Conceptually, each record is divided into three portions (leader, directory, and dataset) when the need arises to build an edited record. Initially there was no editing, only the production of the report.

The generation of strict, valid XML is a significant aspect of CatQC. An original document type was created, along with a corresponding Cascading Style Sheet. The reports are viewable by anyone with an XML-capable browser, whether through a file server, a Web server, or e-mail. (The current version of Internet Explorer does not fully support the style sheet syntax.) This continues to be convenient for the report reviewers because they do not have to be client application operators.
See appendix A for an excerpt of a document instance and appendix B for the document type definition.

CatQC is not currently a generalized tool such as MarcEdit, a widely used MARC editing utility that provides a standard array of basic capabilities: field counting, field and subfield deletion (with certain conditional checks), field and subfield additions, field swapping and text replacement, and file conversion to and from various formats such as MARCXML and Dublin Core as well as between MARC-8 and UTF-8 encodings.6 MarcEdit continues to grow and does offer programmability that relies on Windows Script Host. This requires the user either to learn VBScript or to use the wizards offered by MarcEdit.

The CatQC development goal was to create a report, viewable through a LAN or the Internet, that alerts a group of catalogers to potential problems with specific records, often illustrating those problems. Although it might have been possible to use a combination of MarcEdit capabilities and local programming to achieve this goal, it likely would have been a more cumbersome route, particularly taking into consideration the multidimensional conditionals desired. It was deemed easier to write a program that addresses local needs directly in a language already familiar to the programmer. As CatQC evolved, it was modified to identify more potential problems, to perform more logical comparisons, and to edit the files as necessary before generating the reports.

CatQC addresses a particular workflow directly and provides one solution. It is procedural as opposed to event-driven or object-oriented. With version 1.3, the generic functions were extracted into marclib 1.0, a Common Object File Format (COFF) library. Functions specific to the local workflow remain in CatQC. The program is freely available to interested libraries on request from the authors.
As of this writing, the University of Florida plans to distribute this utility under the GNU General Public License version 3 (see www.opensource.org/licenses/gpl-3.0.html) while retaining copyright.

Conclusion

CatQC provides catalogers an easy way to check the bibliographic quality of shelf-ready material without the book in hand. As a result, throughput time from receipt to shelf is reduced, and staff can focus data review on problem areas: those affecting access or interfering with local processes. Some of the issues addressed by CatQC are of concern to all libraries, while others reflect local preferences. The program could be easily modified to conform to those preferences. Automation tools such as CatQC are of key importance to libraries seeking ways to streamline workflows to the benefit of users.

References and notes

1. Vinh-The Lam, “Quality Control Issues in Outsourcing Cataloging in United States and Canadian Academic Libraries,” Cataloging & Classification Quarterly 40, no. 1 (2005): 101–22.
2. Mary M. Rider, “PromptCat: A Projected Service for Automatic Cataloging—Results of a Study at the Ohio State University Libraries,” Cataloging & Classification Quarterly 20, no. 4 (1995): 43.
3. Mary Walker and Deb Kulczak, “Shelf-Ready Books Using PromptCat and YBP: Issues to Consider (An Analysis of Errors at the University of Arkansas),” Library Collections, Acquisitions, & Technical Services 31, no. 2 (2007): 61–84.
4. “LC Pulls Plug on Series Authority Records,” Cataloging & Classification Quarterly 43, no. 2 (2006): 98–99.
5. Walker and Kulczak, “Shelf-Ready Books.”
6. For more information about MarcEdit, see http://oregonstate.edu/~reeset/marcedit/html/index.php.

Appendix A. CatQC Document Instance Excerpt

WCP File Analysis: 201 records analyzed.
Record: 71
OCLC Number: 243683394
Timestamp: 20080824000000.0
245: 10 |a Difference algebra /|c Levin Alexander.
245 h   245 n   245 p   numerals   keywords
490: 0 |a Algebras and applications ;|v v. 8
. . .

Appendix B. CatQC Document Type Definition