id sid tid token lemma pos 12235 1 1 Evaluating evaluate VBG 12235 1 2 the the DT 12235 1 3 Impact Impact NNP 12235 1 4 of of IN 12235 1 5 the the DT 12235 1 6 Long Long NNP 12235 1 7 - - HYPH 12235 1 8 S S NNP 12235 1 9 upon upon IN 12235 1 10 18th 18th JJ 12235 1 11 - - HYPH 12235 1 12 Century century NN 12235 1 13 Encyclopedia Encyclopedia NNP 12235 1 14 Britannica Britannica NNP 12235 1 15 Automatic Automatic NNP 12235 1 16 Subject Subject NNP 12235 1 17 Metadata Metadata NNP 12235 1 18 Generation generation NN 12235 1 19 Results result VBZ 12235 1 20 ARTICLES articles NN 12235 1 21 Evaluating evaluate VBG 12235 1 22 the the DT 12235 1 23 Impact Impact NNP 12235 1 24 of of IN 12235 1 25 the the DT 12235 1 26 Long Long NNP 12235 1 27 - - HYPH 12235 1 28 S S NNP 12235 1 29 upon upon IN 12235 1 30 18th 18th JJ 12235 1 31 - - HYPH 12235 1 32 Century century NN 12235 1 33 Encyclopedia Encyclopedia NNP 12235 1 34 Britannica Britannica NNP 12235 1 35 Automatic Automatic NNP 12235 1 36 Subject Subject NNP 12235 1 37 Metadata Metadata NNP 12235 1 38 Generation generation NN 12235 1 39 Results result VBZ 12235 1 40 Sam Sam NNP 12235 1 41 Grabus Grabus NNP 12235 1 42 INFORMATION INFORMATION NNP 12235 1 43 TECHNOLOGY TECHNOLOGY NNP 12235 1 44 AND and CC 12235 1 45 LIBRARIES LIBRARIES NNP 12235 1 46 | | NNP 12235 1 47 SEPTEMBER SEPTEMBER NNP 12235 1 48 2020 2020 CD 12235 1 49 https://doi.org/10.6017/ital.v39i3.12235 https://doi.org/10.6017/ital.v39i3.12235 NN 12235 1 50 Sam Sam NNP 12235 1 51 Grabus Grabus NNP 12235 1 52 ( ( -LRB- 12235 1 53 smg383@Drexel.edu smg383@Drexel.edu NNP 12235 1 54 ) ) -RRB- 12235 1 55 is be VBZ 12235 1 56 an an DT 12235 1 57 Information Information NNP 12235 1 58 Science Science NNP 12235 1 59 PhD phd NN 12235 1 60 Candidate candidate NN 12235 1 61 at at IN 12235 1 62 Drexel Drexel NNP 12235 1 63 University University NNP 12235 1 64 ’s ’s NNP 12235 1 65 College College NNP 12235 1 66 of of IN 12235 1 67 Computing Computing NNP 12235 1 68 and and 
CC 12235 1 69 Informatics Informatics NNP 12235 1 70 , , , 12235 1 71 and and CC 12235 1 72 Research Research NNP 12235 1 73 Assistant Assistant NNP 12235 1 74 at at IN 12235 1 75 Drexel Drexel NNP 12235 1 76 ’s ’s POS 12235 1 77 Metadata Metadata NNP 12235 1 78 Research Research NNP 12235 1 79 Center Center NNP 12235 1 80 . . . 12235 2 1 This this DT 12235 2 2 article article NN 12235 2 3 is be VBZ 12235 2 4 the the DT 12235 2 5 2020 2020 CD 12235 2 6 winner winner NN 12235 2 7 of of IN 12235 2 8 the the DT 12235 2 9 LITA LITA NNP 12235 2 10 / / SYM 12235 2 11 Ex Ex NNP 12235 2 12 Libris Libris NNP 12235 2 13 Student Student NNP 12235 2 14 Writing Writing NNP 12235 2 15 Award Award NNP 12235 2 16 . . . 12235 3 1 © © NNP 12235 3 2 2020 2020 CD 12235 3 3 . . . 12235 4 1 ABSTRACT ABSTRACT NNP 12235 4 2 This this DT 12235 4 3 research research NN 12235 4 4 compares compare VBZ 12235 4 5 automatic automatic JJ 12235 4 6 subject subject JJ 12235 4 7 metadata metadata NN 12235 4 8 generation generation NN 12235 4 9 when when WRB 12235 4 10 the the DT 12235 4 11 pre-1800s pre-1800s , 12235 4 12 Long long JJ 12235 4 13 - - HYPH 12235 4 14 S S NNP 12235 4 15 character character NN 12235 4 16 is be VBZ 12235 4 17 corrected correct VBN 12235 4 18 to to IN 12235 4 19 a a DT 12235 4 20 standard standard NN 12235 4 21 < < XX 12235 4 22 s s NNP 12235 4 23 > > XX 12235 4 24 . . . 12235 5 1 The the DT 12235 5 2 test test NN 12235 5 3 environment environment NN 12235 5 4 includes include VBZ 12235 5 5 entries entry NNS 12235 5 6 from from IN 12235 5 7 the the DT 12235 5 8 third third JJ 12235 5 9 edition edition NN 12235 5 10 of of IN 12235 5 11 the the DT 12235 5 12 Encyclopedia Encyclopedia NNP 12235 5 13 Britannica Britannica NNP 12235 5 14 , , , 12235 5 15 and and CC 12235 5 16 the the DT 12235 5 17 HIVE HIVE NNP 12235 5 18 automatic automatic JJ 12235 5 19 subject subject JJ 12235 5 20 indexing indexing NN 12235 5 21 tool tool NN 12235 5 22 . . . 
12235 6 1 A a DT 12235 6 2 comparative comparative JJ 12235 6 3 study study NN 12235 6 4 of of IN 12235 6 5 metadata metadata NN 12235 6 6 generated generate VBN 12235 6 7 before before IN 12235 6 8 and and CC 12235 6 9 after after IN 12235 6 10 correction correction NN 12235 6 11 of of IN 12235 6 12 the the DT 12235 6 13 Long Long NNP 12235 6 14 - - HYPH 12235 6 15 S S NNP 12235 6 16 demonstrated demonstrate VBD 12235 6 17 an an DT 12235 6 18 average average NN 12235 6 19 of of IN 12235 6 20 26.51 26.51 CD 12235 6 21 percent percent NN 12235 6 22 potentially potentially RB 12235 6 23 relevant relevant JJ 12235 6 24 terms term NNS 12235 6 25 per per IN 12235 6 26 entry entry NN 12235 6 27 omitted omit VBN 12235 6 28 from from IN 12235 6 29 results result NNS 12235 6 30 if if IN 12235 6 31 the the DT 12235 6 32 Long Long NNP 12235 6 33 - - HYPH 12235 6 34 S S NNP 12235 6 35 is be VBZ 12235 6 36 not not RB 12235 6 37 corrected correct VBN 12235 6 38 . . . 12235 7 1 Results result NNS 12235 7 2 confirm confirm VBP 12235 7 3 that that IN 12235 7 4 correcting correct VBG 12235 7 5 the the DT 12235 7 6 Long long JJ 12235 7 7 - - HYPH 12235 7 8 S S NNP 12235 7 9 increases increase VBZ 12235 7 10 the the DT 12235 7 11 availability availability NN 12235 7 12 of of IN 12235 7 13 terms term NNS 12235 7 14 that that WDT 12235 7 15 can can MD 12235 7 16 be be VB 12235 7 17 used use VBN 12235 7 18 for for IN 12235 7 19 creating create VBG 12235 7 20 quality quality NN 12235 7 21 metadata metadata NN 12235 7 22 records record NNS 12235 7 23 . . . 
12235 8 1 A a DT 12235 8 2 relationship relationship NN 12235 8 3 is be VBZ 12235 8 4 also also RB 12235 8 5 demonstrated demonstrate VBN 12235 8 6 between between IN 12235 8 7 shorter short JJR 12235 8 8 entries entry NNS 12235 8 9 and and CC 12235 8 10 an an DT 12235 8 11 increase increase NN 12235 8 12 in in IN 12235 8 13 omitted omitted JJ 12235 8 14 terms term NNS 12235 8 15 when when WRB 12235 8 16 the the DT 12235 8 17 Long Long NNP 12235 8 18 - - HYPH 12235 8 19 S S NNP 12235 8 20 is be VBZ 12235 8 21 not not RB 12235 8 22 corrected correct VBN 12235 8 23 . . . 12235 9 1 INTRODUCTION introduction VB 12235 9 2 The the DT 12235 9 3 creation creation NN 12235 9 4 of of IN 12235 9 5 subject subject JJ 12235 9 6 metadata metadata NN 12235 9 7 for for IN 12235 9 8 individual individual JJ 12235 9 9 documents document NNS 12235 9 10 is be VBZ 12235 9 11 long long RB 12235 9 12 known know VBN 12235 9 13 to to TO 12235 9 14 support support VB 12235 9 15 standardized standardized JJ 12235 9 16 resource resource NN 12235 9 17 discovery discovery NN 12235 9 18 and and CC 12235 9 19 analysis analysis NN 12235 9 20 by by IN 12235 9 21 identifying identify VBG 12235 9 22 and and CC 12235 9 23 connecting connect VBG 12235 9 24 resources resource NNS 12235 9 25 with with IN 12235 9 26 similar similar JJ 12235 9 27 aboutness aboutness NN 12235 9 28 .1 .1 . 
12235 9 29 In in IN 12235 9 30 order order NN 12235 9 31 to to TO 12235 9 32 address address VB 12235 9 33 the the DT 12235 9 34 challenges challenge NNS 12235 9 35 of of IN 12235 9 36 scale scale NN 12235 9 37 , , , 12235 9 38 automatic automatic JJ 12235 9 39 or or CC 12235 9 40 semi semi JJ 12235 9 41 - - JJ 12235 9 42 automatic automatic JJ 12235 9 43 indexing indexing NN 12235 9 44 is be VBZ 12235 9 45 frequently frequently RB 12235 9 46 employed employ VBN 12235 9 47 for for IN 12235 9 48 the the DT 12235 9 49 generation generation NN 12235 9 50 of of IN 12235 9 51 subject subject JJ 12235 9 52 metadata metadata NN 12235 9 53 , , , 12235 9 54 particularly particularly RB 12235 9 55 for for IN 12235 9 56 academic academic JJ 12235 9 57 articles article NNS 12235 9 58 , , , 12235 9 59 where where WRB 12235 9 60 the the DT 12235 9 61 abstract abstract JJ 12235 9 62 and and CC 12235 9 63 title title NN 12235 9 64 can can MD 12235 9 65 be be VB 12235 9 66 used use VBN 12235 9 67 as as IN 12235 9 68 surrogates surrogate NNS 12235 9 69 in in IN 12235 9 70 place place NN 12235 9 71 of of IN 12235 9 72 indexing index VBG 12235 9 73 the the DT 12235 9 74 full full JJ 12235 9 75 text text NN 12235 9 76 . . . 12235 10 1 When when WRB 12235 10 2 automatically automatically RB 12235 10 3 generating generate VBG 12235 10 4 subject subject JJ 12235 10 5 metadata metadata NN 12235 10 6 for for IN 12235 10 7 historical historical JJ 12235 10 8 humanities humanity NNS 12235 10 9 full full JJ 12235 10 10 texts text NNS 12235 10 11 that that WDT 12235 10 12 do do VBP 12235 10 13 not not RB 12235 10 14 have have VB 12235 10 15 an an DT 12235 10 16 abstract abstract JJ 12235 10 17 , , , 12235 10 18 anachronistic anachronistic JJ 12235 10 19 typographical typographical JJ 12235 10 20 challenges challenge NNS 12235 10 21 may may MD 12235 10 22 arise arise VB 12235 10 23 . . . 
12235 11 1 One one CD 12235 11 2 key key JJ 12235 11 3 challenge challenge NN 12235 11 4 is be VBZ 12235 11 5 that that IN 12235 11 6 presented present VBN 12235 11 7 by by IN 12235 11 8 the the DT 12235 11 9 historical historical JJ 12235 11 10 “ " `` 12235 11 11 Long Long NNP 12235 11 12 - - HYPH 12235 11 13 S s NN 12235 11 14 ” " '' 12235 11 15 < < XX 12235 11 16 ſ ſ NNP 12235 11 17 > > XX 12235 11 18 . . . 12235 12 1 In in IN 12235 12 2 order order NN 12235 12 3 to to TO 12235 12 4 account account VB 12235 12 5 for for IN 12235 12 6 these these DT 12235 12 7 idiosyncrasies idiosyncrasy NNS 12235 12 8 , , , 12235 12 9 there there EX 12235 12 10 is be VBZ 12235 12 11 a a DT 12235 12 12 need need NN 12235 12 13 to to TO 12235 12 14 understand understand VB 12235 12 15 the the DT 12235 12 16 impact impact NN 12235 12 17 that that WDT 12235 12 18 they -PRON- PRP 12235 12 19 have have VBP 12235 12 20 upon upon IN 12235 12 21 the the DT 12235 12 22 automatic automatic JJ 12235 12 23 subject subject JJ 12235 12 24 indexing indexing NN 12235 12 25 output output NN 12235 12 26 . . . 
12235 13 1 Addressing address VBG 12235 13 2 this this DT 12235 13 3 challenge challenge NN 12235 13 4 will will MD 12235 13 5 help help VB 12235 13 6 librarians librarian NNS 12235 13 7 and and CC 12235 13 8 information information NN 12235 13 9 professionals professional NNS 12235 13 10 to to TO 12235 13 11 determine determine VB 12235 13 12 whether whether IN 12235 13 13 or or CC 12235 13 14 not not RB 12235 13 15 they -PRON- PRP 12235 13 16 will will MD 12235 13 17 need need VB 12235 13 18 to to TO 12235 13 19 correct correct VB 12235 13 20 the the DT 12235 13 21 Long Long NNP 12235 13 22 - - HYPH 12235 13 23 S s NN 12235 13 24 when when WRB 12235 13 25 automatically automatically RB 12235 13 26 generating generate VBG 12235 13 27 subject subject JJ 12235 13 28 metadata metadata NN 12235 13 29 for for IN 12235 13 30 full full JJ 12235 13 31 - - HYPH 12235 13 32 text text NN 12235 13 33 pre-1800s pre-1800s '' 12235 13 34 documents document NNS 12235 13 35 . . . 12235 14 1 The the DT 12235 14 2 problem problem NN 12235 14 3 of of IN 12235 14 4 the the DT 12235 14 5 Long Long NNP 12235 14 6 - - HYPH 12235 14 7 S S NNP 12235 14 8 in in IN 12235 14 9 Optical Optical NNP 12235 14 10 Character Character NNP 12235 14 11 Recognition Recognition NNP 12235 14 12 ( ( -LRB- 12235 14 13 OCR OCR NNP 12235 14 14 ) ) -RRB- 12235 14 15 for for IN 12235 14 16 digital digital JJ 12235 14 17 manuscript manuscript NN 12235 14 18 images image NNS 12235 14 19 has have VBZ 12235 14 20 been be VBN 12235 14 21 discussed discuss VBN 12235 14 22 for for IN 12235 14 23 decades.2 decades.2 NNP 12235 14 24 Many many JJ 12235 14 25 scholars scholar NNS 12235 14 26 have have VBP 12235 14 27 researched research VBN 12235 14 28 methods method NNS 12235 14 29 for for IN 12235 14 30 correcting correct VBG 12235 14 31 the the DT 12235 14 32 Long- Long- NNP 12235 14 33 S S NNP 12235 14 34 through through IN 12235 14 35 the the DT 12235 14 36 use use NN 12235 14 37 of of IN 12235 14 38 rule rule NN 
12235 14 39 - - HYPH 12235 14 40 based base VBN 12235 14 41 algorithms algorithm NNS 12235 14 42 or or CC 12235 14 43 dictionaries.3 dictionaries.3 NN 12235 14 44 While while IN 12235 14 45 the the DT 12235 14 46 problem problem NN 12235 14 47 of of IN 12235 14 48 the the DT 12235 14 49 Long Long NNP 12235 14 50 - - HYPH 12235 14 51 S S NNP 12235 14 52 is be VBZ 12235 14 53 well well RB 12235 14 54 - - HYPH 12235 14 55 known know VBN 12235 14 56 in in IN 12235 14 57 the the DT 12235 14 58 digital digital JJ 12235 14 59 humanities humanity NNS 12235 14 60 community community NN 12235 14 61 , , , 12235 14 62 automatic automatic JJ 12235 14 63 subject subject JJ 12235 14 64 metadata metadata NN 12235 14 65 generation generation NN 12235 14 66 for for IN 12235 14 67 a a DT 12235 14 68 large large JJ 12235 14 69 corpus corpus NN 12235 14 70 of of IN 12235 14 71 pre-1800s pre-1800 NNS 12235 14 72 documents document NNS 12235 14 73 is be VBZ 12235 14 74 rare rare JJ 12235 14 75 , , , 12235 14 76 as as IN 12235 14 77 is be VBZ 12235 14 78 research research NN 12235 14 79 about about IN 12235 14 80 the the DT 12235 14 81 application application NN 12235 14 82 and and CC 12235 14 83 evaluation evaluation NN 12235 14 84 of of IN 12235 14 85 existing exist VBG 12235 14 86 automatic automatic JJ 12235 14 87 subject subject JJ 12235 14 88 metadata metadata NN 12235 14 89 generation generation NN 12235 14 90 tools tool NNS 12235 14 91 on on IN 12235 14 92 18th 18th JJ 12235 14 93 - - HYPH 12235 14 94 century century NN 12235 14 95 documents document NNS 12235 14 96 in in IN 12235 14 97 real real JJ 12235 14 98 - - HYPH 12235 14 99 world world NN 12235 14 100 information information NN 12235 14 101 environments environment NNS 12235 14 102 . . . 
12235 15 1 The the DT 12235 15 2 impact impact NN 12235 15 3 of of IN 12235 15 4 the the DT 12235 15 5 Long Long NNP 12235 15 6 - - HYPH 12235 15 7 S S NNP 12235 15 8 upon upon IN 12235 15 9 automatic automatic JJ 12235 15 10 subject subject JJ 12235 15 11 metadata metadata NN 12235 15 12 generation generation NN 12235 15 13 results result NNS 12235 15 14 for for IN 12235 15 15 pre-1800s pre-1800 NNS 12235 15 16 texts text NNS 12235 15 17 has have VBZ 12235 15 18 not not RB 12235 15 19 been be VBN 12235 15 20 extensively extensively RB 12235 15 21 explored explore VBN 12235 15 22 . . . 12235 16 1 The the DT 12235 16 2 research research NN 12235 16 3 presented present VBN 12235 16 4 in in IN 12235 16 5 this this DT 12235 16 6 paper paper NN 12235 16 7 addresses address NNS 12235 16 8 this this DT 12235 16 9 need need NN 12235 16 10 . . . 12235 17 1 The the DT 12235 17 2 paper paper NN 12235 17 3 reports report NNS 12235 17 4 results result VBZ 12235 17 5 from from IN 12235 17 6 basic basic JJ 12235 17 7 statistical statistical JJ 12235 17 8 analysis analysis NN 12235 17 9 and and CC 12235 17 10 visualization visualization NN 12235 17 11 using use VBG 12235 17 12 the the DT 12235 17 13 Helping Helping NNP 12235 17 14 Interdisciplinary Interdisciplinary NNP 12235 17 15 Vocabulary Vocabulary NNP 12235 17 16 Engineering Engineering NNP 12235 17 17 ( ( -LRB- 12235 17 18 HIVE HIVE NNP 12235 17 19 ) ) -RRB- 12235 17 20 tool tool NN 12235 17 21 automatic automatic JJ 12235 17 22 mailto:smg383@Drexel.edu mailto:smg383@Drexel.edu NNS 12235 17 23 INFORMATION INFORMATION NNP 12235 17 24 TECHNOLOGY TECHNOLOGY NNP 12235 17 25 AND and CC 12235 17 26 LIBRARIES library NNS 12235 17 27 SEPTEMBER SEPTEMBER NNP 12235 17 28 2020 2020 CD 12235 17 29 EVALUATING evaluate VBG 12235 17 30 THE the DT 12235 17 31 IMPACT impact NN 12235 17 32 OF of IN 12235 17 33 THE the DT 12235 17 34 LONG long JJ 12235 17 35 - - HYPH 12235 17 36 S s NN 12235 17 37 | | NNP 12235 17 38 GRABUS grabus NN 12235 17 
39 2 2 CD 12235 17 40 subject subject JJ 12235 17 41 indexing indexing NN 12235 17 42 results result NNS 12235 17 43 , , , 12235 17 44 before before RB 12235 17 45 and and CC 12235 17 46 after after IN 12235 17 47 the the DT 12235 17 48 correction correction NN 12235 17 49 of of IN 12235 17 50 the the DT 12235 17 51 historical historical JJ 12235 17 52 Long Long NNP 12235 17 53 - - HYPH 12235 17 54 S S NNP 12235 17 55 in in IN 12235 17 56 the the DT 12235 17 57 3rd 3rd JJ 12235 17 58 edition edition NN 12235 17 59 of of IN 12235 17 60 the the DT 12235 17 61 Encyclopedia Encyclopedia NNP 12235 17 62 Britannica Britannica NNP 12235 17 63 . . . 12235 18 1 Background background NN 12235 18 2 work work NN 12235 18 3 was be VBD 12235 18 4 conducted conduct VBN 12235 18 5 over over IN 12235 18 6 the the DT 12235 18 7 Summer Summer NNP 12235 18 8 and and CC 12235 18 9 Fall Fall NNP 12235 18 10 of of IN 12235 18 11 2019 2019 CD 12235 18 12 , , , 12235 18 13 and and CC 12235 18 14 the the DT 12235 18 15 research research NN 12235 18 16 presented present VBN 12235 18 17 was be VBD 12235 18 18 conducted conduct VBN 12235 18 19 during during IN 12235 18 20 Winter Winter NNP 12235 18 21 2020 2020 CD 12235 18 22 . . . 
12235 19 1 The the DT 12235 19 2 work work NN 12235 19 3 was be VBD 12235 19 4 motivated motivate VBN 12235 19 5 by by IN 12235 19 6 current current JJ 12235 19 7 work work NN 12235 19 8 on on IN 12235 19 9 the the DT 12235 19 10 “ " `` 12235 19 11 Developing develop VBG 12235 19 12 the the DT 12235 19 13 Data Data NNP 12235 19 14 Set Set NNP 12235 19 15 of of IN 12235 19 16 Nineteenth Nineteenth NNP 12235 19 17 - - HYPH 12235 19 18 Century Century NNP 12235 19 19 Knowledge Knowledge NNP 12235 19 20 ” " '' 12235 19 21 project project NN 12235 19 22 , , , 12235 19 23 a a DT 12235 19 24 National National NNP 12235 19 25 Endowment Endowment NNP 12235 19 26 for for IN 12235 19 27 the the DT 12235 19 28 Humanities Humanities NNPS 12235 19 29 collaborative collaborative JJ 12235 19 30 project project NN 12235 19 31 between between IN 12235 19 32 Temple Temple NNP 12235 19 33 University University NNP 12235 19 34 ’s ’s POS 12235 19 35 Digital Digital NNP 12235 19 36 Scholarship Scholarship NNP 12235 19 37 Center Center NNP 12235 19 38 and and CC 12235 19 39 Drexel Drexel NNP 12235 19 40 University University NNP 12235 19 41 ’s ’s POS 12235 19 42 Metadata Metadata NNP 12235 19 43 Research Research NNP 12235 19 44 Center Center NNP 12235 19 45 . . . 
12235 20 1 The the DT 12235 20 2 grant grant NN 12235 20 3 is be VBZ 12235 20 4 part part NN 12235 20 5 of of IN 12235 20 6 a a DT 12235 20 7 larger large JJR 12235 20 8 project project NN 12235 20 9 , , , 12235 20 10 Temple Temple NNP 12235 20 11 University University NNP 12235 20 12 ’s ’s POS 12235 20 13 “ " `` 12235 20 14 19th 19th JJ 12235 20 15 - - HYPH 12235 20 16 Century Century NNP 12235 20 17 Knowledge Knowledge NNP 12235 20 18 Project Project NNP 12235 20 19 , , , 12235 20 20 ” " '' 12235 20 21 which which WDT 12235 20 22 is be VBZ 12235 20 23 digitizing digitize VBG 12235 20 24 four four CD 12235 20 25 historical historical JJ 12235 20 26 editions edition NNS 12235 20 27 of of IN 12235 20 28 the the DT 12235 20 29 Encyclopedia Encyclopedia NNP 12235 20 30 Britannica.4 Britannica.4 NNP 12235 20 31 The the DT 12235 20 32 next next JJ 12235 20 33 section section NN 12235 20 34 of of IN 12235 20 35 this this DT 12235 20 36 paper paper NN 12235 20 37 presents present NNS 12235 20 38 background background NN 12235 20 39 covering cover VBG 12235 20 40 the the DT 12235 20 41 historical historical JJ 12235 20 42 Encyclopedia Encyclopedia NNP 12235 20 43 Britannica Britannica NNP 12235 20 44 data datum NNS 12235 20 45 , , , 12235 20 46 the the DT 12235 20 47 automatic automatic JJ 12235 20 48 subject subject JJ 12235 20 49 metadata metadata NN 12235 20 50 generation generation NN 12235 20 51 tool tool NN 12235 20 52 used use VBN 12235 20 53 for for IN 12235 20 54 this this DT 12235 20 55 project project NN 12235 20 56 , , , 12235 20 57 a a DT 12235 20 58 brief brief JJ 12235 20 59 background background NN 12235 20 60 of of IN 12235 20 61 “ " `` 12235 20 62 the the DT 12235 20 63 Long Long NNP 12235 20 64 - - HYPH 12235 20 65 S S NNP 12235 20 66 Problem problem NN 12235 20 67 , , , 12235 20 68 ” " '' 12235 20 69 and and CC 12235 20 70 the the DT 12235 20 71 distribution distribution NN 12235 20 72 of of IN 12235 20 73 encyclopedia encyclopedia JJ 12235 20 74 entry 
entry NN 12235 20 75 lengths length NNS 12235 20 76 in in IN 12235 20 77 the the DT 12235 20 78 3rd 3rd JJ 12235 20 79 edition edition NN 12235 20 80 . . . 12235 21 1 The the DT 12235 21 2 background background NN 12235 21 3 section section NN 12235 21 4 will will MD 12235 21 5 be be VB 12235 21 6 followed follow VBN 12235 21 7 by by IN 12235 21 8 research research NN 12235 21 9 objectives objective NNS 12235 21 10 and and CC 12235 21 11 method method NN 12235 21 12 supporting support VBG 12235 21 13 the the DT 12235 21 14 analysis analysis NN 12235 21 15 . . . 12235 22 1 Next next RB 12235 22 2 , , , 12235 22 3 the the DT 12235 22 4 results result NNS 12235 22 5 are be VBP 12235 22 6 presented present VBN 12235 22 7 , , , 12235 22 8 demonstrating demonstrate VBG 12235 22 9 prevalence prevalence NN 12235 22 10 of of IN 12235 22 11 terms term NNS 12235 22 12 omitted omit VBN 12235 22 13 from from IN 12235 22 14 the the DT 12235 22 15 automatic automatic JJ 12235 22 16 subject subject JJ 12235 22 17 metadata metadata NN 12235 22 18 generation generation NN 12235 22 19 results result NNS 12235 22 20 if if IN 12235 22 21 the the DT 12235 22 22 Long Long NNP 12235 22 23 - - HYPH 12235 22 24 S S NNP 12235 22 25 is be VBZ 12235 22 26 not not RB 12235 22 27 corrected correct VBN 12235 22 28 to to IN 12235 22 29 a a DT 12235 22 30 standard standard JJ 12235 22 31 small small JJ 12235 22 32 < < XX 12235 22 33 s s NNP 12235 22 34 > > XX 12235 22 35 character character NN 12235 22 36 , , , 12235 22 37 as as RB 12235 22 38 well well RB 12235 22 39 as as IN 12235 22 40 the the DT 12235 22 41 impact impact NN 12235 22 42 of of IN 12235 22 43 encyclopedia encyclopedia NNS 12235 22 44 entry entry NN 12235 22 45 length length NN 12235 22 46 upon upon IN 12235 22 47 these these DT 12235 22 48 results result NNS 12235 22 49 . . . 
12235 23 1 The the DT 12235 23 2 results result NNS 12235 23 3 are be VBP 12235 23 4 followed follow VBN 12235 23 5 by by IN 12235 23 6 a a DT 12235 23 7 contextual contextual JJ 12235 23 8 discussion discussion NN 12235 23 9 , , , 12235 23 10 and and CC 12235 23 11 a a DT 12235 23 12 conclusion conclusion NN 12235 23 13 that that WDT 12235 23 14 highlights highlight VBZ 12235 23 15 key key JJ 12235 23 16 findings finding NNS 12235 23 17 and and CC 12235 23 18 identifies identify VBZ 12235 23 19 future future JJ 12235 23 20 research research NN 12235 23 21 . . . 12235 24 1 BACKGROUND background NN 12235 24 2 Indexing index VBG 12235 24 3 for for IN 12235 24 4 the the DT 12235 24 5 19th 19th JJ 12235 24 6 - - HYPH 12235 24 7 Century Century NNP 12235 24 8 Knowledge Knowledge NNP 12235 24 9 Project Project NNP 12235 24 10 The the DT 12235 24 11 19th 19th JJ 12235 24 12 - - HYPH 12235 24 13 Century Century NNP 12235 24 14 Knowledge Knowledge NNP 12235 24 15 Project Project NNP 12235 24 16 , , , 12235 24 17 an an DT 12235 24 18 NEH NEH NNP 12235 24 19 - - HYPH 12235 24 20 funded fund VBN 12235 24 21 initiative initiative NN 12235 24 22 at at IN 12235 24 23 Temple Temple NNP 12235 24 24 University University NNP 12235 24 25 , , , 12235 24 26 is be VBZ 12235 24 27 fully fully RB 12235 24 28 digitizing digitize VBG 12235 24 29 four four CD 12235 24 30 historical historical JJ 12235 24 31 editions edition NNS 12235 24 32 of of IN 12235 24 33 the the DT 12235 24 34 Encyclopedia Encyclopedia NNP 12235 24 35 Britannica Britannica NNP 12235 24 36 ( ( -LRB- 12235 24 37 the the DT 12235 24 38 3rd 3rd JJ 12235 24 39 , , , 12235 24 40 7th 7th JJ 12235 24 41 , , , 12235 24 42 9th 9th JJ 12235 24 43 , , , 12235 24 44 and and CC 12235 24 45 11th 11th NN 12235 24 46 ) ) -RRB- 12235 24 47 . . . 
12235 25 1 The the DT 12235 25 2 long long JJ 12235 25 3 - - HYPH 12235 25 4 term term NN 12235 25 5 goal goal NN 12235 25 6 of of IN 12235 25 7 the the DT 12235 25 8 project project NN 12235 25 9 is be VBZ 12235 25 10 to to TO 12235 25 11 analyze analyze VB 12235 25 12 the the DT 12235 25 13 evolving evolve VBG 12235 25 14 conceptualization conceptualization NN 12235 25 15 of of IN 12235 25 16 knowledge knowledge NN 12235 25 17 across across IN 12235 25 18 the the DT 12235 25 19 19th 19th JJ 12235 25 20 century.5 century.5 . 12235 25 21 The the DT 12235 25 22 3rd 3rd JJ 12235 25 23 edition edition NN 12235 25 24 of of IN 12235 25 25 the the DT 12235 25 26 Encyclopedia Encyclopedia NNP 12235 25 27 Britannica Britannica NNP 12235 25 28 ( ( -LRB- 12235 25 29 1797 1797 CD 12235 25 30 ) ) -RRB- 12235 25 31 is be VBZ 12235 25 32 the the DT 12235 25 33 earliest early JJS 12235 25 34 edition edition NN 12235 25 35 being be VBG 12235 25 36 digitized digitize VBN 12235 25 37 for for IN 12235 25 38 this this DT 12235 25 39 project project NN 12235 25 40 . . . 12235 26 1 The the DT 12235 26 2 3rd 3rd JJ 12235 26 3 edition edition NN 12235 26 4 consists consist VBZ 12235 26 5 of of IN 12235 26 6 18 18 CD 12235 26 7 volumes volume NNS 12235 26 8 , , , 12235 26 9 with with IN 12235 26 10 a a DT 12235 26 11 total total NN 12235 26 12 of of IN 12235 26 13 14,579 14,579 CD 12235 26 14 pages page NNS 12235 26 15 , , , 12235 26 16 and and CC 12235 26 17 individual individual JJ 12235 26 18 entries entry NNS 12235 26 19 ranging range VBG 12235 26 20 from from IN 12235 26 21 four four CD 12235 26 22 to to IN 12235 26 23 over over IN 12235 26 24 150,000 150,000 CD 12235 26 25 words word NNS 12235 26 26 . . . 
12235 27 1 For for IN 12235 27 2 each each DT 12235 27 3 individual individual JJ 12235 27 4 entry entry NN 12235 27 5 , , , 12235 27 6 researchers researcher NNS 12235 27 7 at at IN 12235 27 8 Temple Temple NNP 12235 27 9 have have VBP 12235 27 10 created create VBN 12235 27 11 individual individual JJ 12235 27 12 TEI tei NN 12235 27 13 - - HYPH 12235 27 14 XML xml NN 12235 27 15 files file NNS 12235 27 16 from from IN 12235 27 17 the the DT 12235 27 18 OCR OCR NNP 12235 27 19 output output NN 12235 27 20 . . . 12235 28 1 In in IN 12235 28 2 order order NN 12235 28 3 to to TO 12235 28 4 enrich enrich VB 12235 28 5 accessibility accessibility NN 12235 28 6 and and CC 12235 28 7 analysis analysis NN 12235 28 8 across across IN 12235 28 9 this this DT 12235 28 10 digital digital JJ 12235 28 11 collection collection NN 12235 28 12 , , , 12235 28 13 The the DT 12235 28 14 Knowledge Knowledge NNP 12235 28 15 Project Project NNP 12235 28 16 will will MD 12235 28 17 be be VB 12235 28 18 adding add VBG 12235 28 19 controlled control VBN 12235 28 20 vocabulary vocabulary JJ 12235 28 21 subject subject JJ 12235 28 22 headings heading NNS 12235 28 23 into into IN 12235 28 24 the the DT 12235 28 25 TEI tei NN 12235 28 26 headers header NNS 12235 28 27 of of IN 12235 28 28 each each DT 12235 28 29 encyclopedia encyclopedia JJ 12235 28 30 entry entry NN 12235 28 31 XML xml NN 12235 28 32 file file NN 12235 28 33 . . . 
12235 29 1 Considering consider VBG 12235 29 2 the the DT 12235 29 3 size size NN 12235 29 4 of of IN 12235 29 5 this this DT 12235 29 6 corpus corpus NN 12235 29 7 , , , 12235 29 8 both both CC 12235 29 9 in in IN 12235 29 10 terms term NNS 12235 29 11 of of IN 12235 29 12 entry entry NN 12235 29 13 length length NN 12235 29 14 and and CC 12235 29 15 number number NN 12235 29 16 of of IN 12235 29 17 entries entry NNS 12235 29 18 , , , 12235 29 19 automatic automatic JJ 12235 29 20 subject subject JJ 12235 29 21 metadata metadata NN 12235 29 22 generation generation NN 12235 29 23 will will MD 12235 29 24 be be VB 12235 29 25 required require VBN 12235 29 26 for for IN 12235 29 27 the the DT 12235 29 28 creation creation NN 12235 29 29 of of IN 12235 29 30 this this DT 12235 29 31 metadata metadata NN 12235 29 32 . . . 12235 30 1 The the DT 12235 30 2 Knowledge Knowledge NNP 12235 30 3 Project Project NNP 12235 30 4 will will MD 12235 30 5 employ employ VB 12235 30 6 controlled control VBN 12235 30 7 vocabularies vocabulary NNS 12235 30 8 to to TO 12235 30 9 replace replace VB 12235 30 10 or or CC 12235 30 11 complement complement NN 12235 30 12 naturally naturally RB 12235 30 13 extracted extract VBN 12235 30 14 keywords keyword NNS 12235 30 15 for for IN 12235 30 16 this this DT 12235 30 17 process process NN 12235 30 18 . . . 
12235 31 1 Using use VBG 12235 31 2 controlled control VBN 12235 31 3 vocabularies vocabulary NNS 12235 31 4 adheres adhere NNS 12235 31 5 to to TO 12235 31 6 metadata metadata VB 12235 31 7 semantic semantic JJ 12235 31 8 interoperability interoperability NN 12235 31 9 best good JJS 12235 31 10 practices practice NNS 12235 31 11 , , , 12235 31 12 ensures ensure VBZ 12235 31 13 representation representation NN 12235 31 14 consistency consistency NN 12235 31 15 , , , 12235 31 16 and and CC 12235 31 17 helps help VBZ 12235 31 18 to to TO 12235 31 19 bypass bypass VB 12235 31 20 linguistic linguistic JJ 12235 31 21 idiosyncrasies idiosyncrasy NNS 12235 31 22 of of IN 12235 31 23 these these DT 12235 31 24 18th 18th JJ 12235 31 25 and and CC 12235 31 26 19th 19th JJ 12235 31 27 Century Century NNP 12235 31 28 primary primary JJ 12235 31 29 source source NN 12235 31 30 materials material NNS 12235 31 31 . . . 12235 32 1 6 6 CD 12235 32 2 We -PRON- PRP 12235 32 3 selected select VBD 12235 32 4 two two CD 12235 32 5 versions version NNS 12235 32 6 of of IN 12235 32 7 the the DT 12235 32 8 Library Library NNP 12235 32 9 of of IN 12235 32 10 Congress Congress NNP 12235 32 11 Subject Subject NNP 12235 32 12 Headings Headings NNPS 12235 32 13 ( ( -LRB- 12235 32 14 LCSH LCSH NNP 12235 32 15 ) ) -RRB- 12235 32 16 as as IN 12235 32 17 the the DT 12235 32 18 controlled control VBN 12235 32 19 vocabularies vocabulary NNS 12235 32 20 for for IN 12235 32 21 this this DT 12235 32 22 project project NN 12235 32 23 . . . 
12235 33 1 LCSH LCSH NNP 12235 33 2 was be VBD 12235 33 3 selected select VBN 12235 33 4 due due IN 12235 33 5 to to IN 12235 33 6 its -PRON- PRP$ 12235 33 7 relational relational JJ 12235 33 8 thesaurus thesaurus NN 12235 33 9 structure structure NN 12235 33 10 , , , 12235 33 11 multidisciplinary multidisciplinary JJ 12235 33 12 nature nature NN 12235 33 13 , , , 12235 33 14 and and CC 12235 33 15 continued continue VBN 12235 33 16 prevalence prevalence NN 12235 33 17 in in IN 12235 33 18 digital digital JJ 12235 33 19 collections collection NNS 12235 33 20 due due JJ 12235 33 21 to to IN 12235 33 22 its -PRON- PRP$ 12235 33 23 expressiveness expressiveness NN 12235 33 24 and and CC 12235 33 25 status status NN 12235 33 26 as as IN 12235 33 27 the the DT 12235 33 28 largest large JJS 12235 33 29 general general JJ 12235 33 30 indexing indexing NN 12235 33 31 vocabulary.7 vocabulary.7 PDT 12235 33 32 In in IN 12235 33 33 addition addition NN 12235 33 34 to to IN 12235 33 35 the the DT 12235 33 36 headings heading NNS 12235 33 37 from from IN 12235 33 38 the the DT 12235 33 39 2018 2018 CD 12235 33 40 edition edition NN 12235 33 41 of of IN 12235 33 42 LCSH LCSH NNP 12235 33 43 , , , 12235 33 44 headings heading NNS 12235 33 45 from from IN 12235 33 46 the the DT 12235 33 47 1910 1910 CD 12235 33 48 LCSH LCSH NNP 12235 33 49 are be VBP 12235 33 50 also also RB 12235 33 51 implemented implement VBN 12235 33 52 in in IN 12235 33 53 order order NN 12235 33 54 to to TO 12235 33 55 provide provide VB 12235 33 56 a a DT 12235 33 57 more more RBR 12235 33 58 multi multi JJ 12235 33 59 - - JJ 12235 33 60 faceted faceted JJ 12235 33 61 representation representation NN 12235 33 62 , , , 12235 33 63 using use VBG 12235 33 64 temporally temporally RB 12235 33 65 - - HYPH 12235 33 66 relevant relevant JJ 12235 33 67 terms term NNS 12235 33 68 that that WDT 12235 33 69 may may MD 12235 33 70 have have VB 12235 33 71 been be VBN 12235 33 72 removed remove VBN 12235 33 73 from from 
IN 12235 33 74 the the DT 12235 33 75 contemporary contemporary JJ 12235 33 76 LCSH LCSH NNP 12235 33 77 . . . 12235 34 1 The the DT 12235 34 2 tool tool NN 12235 34 3 applied apply VBD 12235 34 4 for for IN 12235 34 5 this this DT 12235 34 6 process process NN 12235 34 7 is be VBZ 12235 34 8 HIVE HIVE NNP 12235 34 9 , , , 12235 34 10 a a DT 12235 34 11 vocabulary vocabulary JJ 12235 34 12 server server NN 12235 34 13 and and CC 12235 34 14 automatic automatic JJ 12235 34 15 indexing indexing NN 12235 34 16 application application NN 12235 34 17 . . . 12235 35 1 8 8 CD 12235 35 2 HIVE hive RB 12235 35 3 allows allow VBZ 12235 35 4 the the DT 12235 35 5 user user NN 12235 35 6 to to TO 12235 35 7 upload upload VB 12235 35 8 a a DT 12235 35 9 digital digital JJ 12235 35 10 text text NN 12235 35 11 or or CC 12235 35 12 URL URL NNP 12235 35 13 , , , 12235 35 14 select select VB 12235 35 15 one one CD 12235 35 16 or or CC 12235 35 17 more more JJR 12235 35 18 controlled control VBN 12235 35 19 vocabularies vocabulary NNS 12235 35 20 , , , 12235 35 21 and and CC 12235 35 22 performs perform VBZ 12235 35 23 automatic automatic JJ 12235 35 24 subject subject JJ 12235 35 25 indexing indexing NN 12235 35 26 through through IN 12235 35 27 the the DT 12235 35 28 mapping mapping NN 12235 35 29 of of IN 12235 35 30 naturally naturally RB 12235 35 31 extracted extract VBN 12235 35 32 keywords keyword NNS 12235 35 33 to to IN 12235 35 34 the the DT 12235 35 35 available available JJ 12235 35 36 controlled control VBN 12235 35 37 vocabulary vocabulary JJ 12235 35 38 terms term NNS 12235 35 39 . . . 
12235 36 1 HIVE HIVE NNP 12235 36 2 was be VBD 12235 36 3 initially initially RB 12235 36 4 launched launch VBN 12235 36 5 as as IN 12235 36 6 an an DT 12235 36 7 IMLS IMLS NNP 12235 36 8 linked link VBN 12235 36 9 open open JJ 12235 36 27 vocabulary vocabulary NN 12235 36 28 and and CC 12235 36 29 indexing indexing NN 12235 36 30 demonstration demonstration NN 12235 36 31 project project NN 12235 36 32 in in IN 12235 36 33 2009 2009 CD 12235 36 34 . . . 12235 37 1 Since since IN 12235 37 2 that that DT 12235 37 3 time time NN 12235 37 4 , , , 12235 37 5 HIVE HIVE NNP 12235 37 6 has have VBZ 12235 37 7 been be VBN 12235 37 8 further further RB 12235 37 9 developed develop VBN 12235 37 10 , , , 12235 37 11 with with IN 12235 37 12 the the DT 12235 37 13 addition addition NN 12235 37 14 of of IN 12235 37 15 more more JJR 12235 37 16 controlled controlled JJ 12235 37 17 vocabularies vocabulary NNS 12235 37 18 , , , 12235 37 19 user user NN 12235 37 20 interface interface NN 12235 37 21 options option NNS 12235 37 22 , , , 12235 37 23 and and CC 12235 37 24 the the DT 12235 37 25 RAKE rake NN 12235 37 26 keyword keyword NN 12235 37 27 extraction extraction NN 12235 37 28 algorithm algorithm NNP 12235 37 29 . . . 
12235 38 1 The the DT 12235 38 2 RAKE rake NN 12235 38 3 keyword keyword NN 12235 38 4 extraction extraction NN 12235 38 5 algorithm algorithm NNP 12235 38 6 has have VBZ 12235 38 7 been be VBN 12235 38 8 selected select VBN 12235 38 9 for for IN 12235 38 10 this this DT 12235 38 11 project project NN 12235 38 12 after after IN 12235 38 13 a a DT 12235 38 14 comparison comparison NN 12235 38 15 of of IN 12235 38 16 topic topic NNP 12235 38 17 relevance relevance NNP 12235 38 18 precision precision NN 12235 38 19 scores score NNS 12235 38 20 for for IN 12235 38 21 three three CD 12235 38 22 keyword keyword NNP 12235 38 23 extraction extraction NN 12235 38 24 algorithms.9 algorithms.9 CD 12235 38 25 The the DT 12235 38 26 Long Long NNP 12235 38 27 - - HYPH 12235 38 28 S S NNP 12235 38 29 Problem problem NN 12235 38 30 Early early RB 12235 38 31 in in IN 12235 38 32 our -PRON- PRP$ 12235 38 33 metadata metadata NN 12235 38 34 generation generation NN 12235 38 35 efforts effort NNS 12235 38 36 , , , 12235 38 37 we -PRON- PRP 12235 38 38 discovered discover VBD 12235 38 39 that that IN 12235 38 40 the the DT 12235 38 41 3rd 3rd JJ 12235 38 42 edition edition NN 12235 38 43 of of IN 12235 38 44 the the DT 12235 38 45 Encyclopedia Encyclopedia NNP 12235 38 46 Britannica Britannica NNP 12235 38 47 employs employ VBZ 12235 38 48 the the DT 12235 38 49 historical historical JJ 12235 38 50 Long Long NNP 12235 38 51 - - HYPH 12235 38 52 S. S. 
NNP 12235 39 1 Originating originate VBG 12235 39 2 in in IN 12235 39 3 early early JJ 12235 39 4 Roman roman JJ 12235 39 5 cursive cursive NN 12235 39 6 script script NN 12235 39 7 , , , 12235 39 8 the the DT 12235 39 9 Long Long NNP 12235 39 10 - - HYPH 12235 39 11 S S NNP 12235 39 12 was be VBD 12235 39 13 used use VBN 12235 39 14 in in IN 12235 39 15 typesetting typesetting NN 12235 39 16 up up RP 12235 39 17 through through IN 12235 39 18 the the DT 12235 39 19 18th 18th JJ 12235 39 20 century century NN 12235 39 21 , , , 12235 39 22 both both CC 12235 39 23 with with IN 12235 39 24 and and CC 12235 39 25 without without IN 12235 39 26 a a DT 12235 39 27 left left JJ 12235 39 28 crossbar crossbar NN 12235 39 29 . . . 12235 40 1 By by IN 12235 40 2 the the DT 12235 40 3 end end NN 12235 40 4 of of IN 12235 40 5 the the DT 12235 40 6 18th 18th JJ 12235 40 7 century century NN 12235 40 8 , , , 12235 40 9 the the DT 12235 40 10 Long Long NNP 12235 40 11 - - HYPH 12235 40 12 S S NNP 12235 40 13 fell fall VBD 12235 40 14 out out IN 12235 40 15 of of IN 12235 40 16 use use NN 12235 40 17 with with IN 12235 40 18 printers.10 printers.10 NNP 12235 40 19 As as IN 12235 40 20 outlined outline VBN 12235 40 21 by by IN 12235 40 22 lexicographers lexicographer NNS 12235 40 23 of of IN 12235 40 24 the the DT 12235 40 25 17th 17th JJ 12235 40 26 and and CC 12235 40 27 18th 18th JJ 12235 40 28 centuries century NNS 12235 40 29 , , , 12235 40 30 the the DT 12235 40 31 rules rule NNS 12235 40 32 for for IN 12235 40 33 using use VBG 12235 40 34 the the DT 12235 40 35 Long Long NNP 12235 40 36 - - HYPH 12235 40 37 S S NNP 12235 40 38 were be VBD 12235 40 39 frequently frequently RB 12235 40 40 vague vague JJ 12235 40 41 , , , 12235 40 42 complicated complicated JJ 12235 40 43 , , , 12235 40 44 inconsistent inconsistent JJ 12235 40 45 over over IN 12235 40 46 time time NN 12235 40 47 , , , 12235 40 48 and and CC 12235 40 49 varied vary VBD 12235 40 50 according accord VBG 12235 40 
51 to to IN 12235 40 52 language language NN 12235 40 53 ( ( -LRB- 12235 40 54 English English NNP 12235 40 55 , , , 12235 40 56 French french JJ 12235 40 57 , , , 12235 40 58 Spanish spanish JJ 12235 40 59 , , , 12235 40 60 or or CC 12235 40 61 Italian Italian NNP 12235 40 62 ) ) -RRB- 12235 40 63 . . . 12235 41 1 11 11 CD 12235 41 2 These these DT 12235 41 3 rules rule NNS 12235 41 4 specified specify VBD 12235 41 5 where where WRB 12235 41 6 in in IN 12235 41 7 a a DT 12235 41 8 word word NN 12235 41 9 the the DT 12235 41 10 Long Long NNP 12235 41 11 - - HYPH 12235 41 12 S S NNP 12235 41 13 should should MD 12235 41 14 be be VB 12235 41 15 used use VBN 12235 41 16 instead instead RB 12235 41 17 of of IN 12235 41 18 a a DT 12235 41 19 short short JJ 12235 41 20 < < XX 12235 41 21 s s NNP 12235 41 22 > > XX 12235 41 23 , , , 12235 41 24 whether whether IN 12235 41 25 it -PRON- PRP 12235 41 26 is be VBZ 12235 41 27 capitalized capitalize VBN 12235 41 28 , , , 12235 41 29 where where WRB 12235 41 30 it -PRON- PRP 12235 41 31 may may MD 12235 41 32 be be VB 12235 41 33 used use VBN 12235 41 34 in in IN 12235 41 35 proximity proximity NN 12235 41 36 to to IN 12235 41 37 apostrophes apostrophe NNS 12235 41 38 , , , 12235 41 39 hyphens hyphen NNS 12235 41 40 , , , 12235 41 41 and and CC 12235 41 42 the the DT 12235 41 43 letters letter NNS 12235 41 44 < < XX 12235 41 45 f f XX 12235 41 46 > > XX 12235 41 47 , , , 12235 41 48 < < XX 12235 41 49 b b NNP 12235 41 50 > > XX 12235 41 51 , , , 12235 41 52 < < XX 12235 41 53 h h NNP 12235 41 54 > > XX 12235 41 55 , , , 12235 41 56 and and CC 12235 41 57 < < XX 12235 41 58 k k NNP 12235 41 59 > > XX 12235 41 60 ; ; : 12235 41 61 and and CC 12235 41 62 whether whether IN 12235 41 63 it -PRON- PRP 12235 41 64 is be VBZ 12235 41 65 used use VBN 12235 41 66 as as IN 12235 41 67 part part NN 12235 41 68 of of IN 12235 41 69 a a DT 12235 41 70 compound compound NN 12235 41 71 word word NN 12235 41 72 or or CC 12235 41 73 
abbreviation.12 abbreviation.12 NNP 12235 41 74 This this DT 12235 41 75 is be VBZ 12235 41 76 further far RBR 12235 41 77 complicated complicate VBN 12235 41 78 by by IN 12235 41 79 the the DT 12235 41 80 inclusion inclusion NN 12235 41 81 of of IN 12235 41 82 the the DT 12235 41 83 half half JJ 12235 41 84 - - HYPH 12235 41 85 crossbar crossbar NN 12235 41 86 , , , 12235 41 87 which which WDT 12235 41 88 occasionally occasionally RB 12235 41 89 results result VBZ 12235 41 90 in in IN 12235 41 91 two two CD 12235 41 92 consequences consequence NNS 12235 41 93 : : : 12235 41 94 ( ( -LRB- 12235 41 95 a a LS 12235 41 96 ) ) -RRB- 12235 41 97 The the DT 12235 41 98 Long Long NNP 12235 41 99 - - HYPH 12235 41 100 S S NNP 12235 41 101 may may MD 12235 41 102 be be VB 12235 41 103 interpreted interpret VBN 12235 41 104 by by IN 12235 41 105 OCR OCR NNP 12235 41 106 as as IN 12235 41 107 an an DT 12235 41 108 < < XX 12235 41 109 f f XX 12235 41 110 > > XX 12235 41 111 , , , 12235 41 112 and and CC 12235 41 113 < < XX 12235 41 114 b b NNP 12235 41 115 > > XX 12235 41 116 and and CC 12235 41 117 < < XX 12235 41 118 f f NNP 12235 41 119 > > XX 12235 41 120 may may MD 12235 41 121 be be VB 12235 41 122 interpreted interpret VBN 12235 41 123 by by IN 12235 41 124 OCR OCR NNP 12235 41 125 as as IN 12235 41 126 a a DT 12235 41 127 Long Long NNP 12235 41 128 - - HYPH 12235 41 129 S. s. 
NN 12235 42 1 Figure figure NN 12235 42 2 1 1 CD 12235 42 3 shows show VBZ 12235 42 4 an an DT 12235 42 5 example example NN 12235 42 6 from from IN 12235 42 7 the the DT 12235 42 8 3rd 3rd JJ 12235 42 9 edition edition NN 12235 42 10 entry entry NN 12235 42 11 on on IN 12235 42 12 Russia Russia NNP 12235 42 13 , , , 12235 42 14 in in IN 12235 42 15 which which WDT 12235 42 16 the the DT 12235 42 17 original original JJ 12235 42 18 text text NN 12235 42 19 specifies specify VBZ 12235 42 20 “ " `` 12235 42 21 of of IN 12235 42 22 ” " '' 12235 42 23 ( ( -LRB- 12235 42 24 line line NN 12235 42 25 1 1 CD 12235 42 26 in in IN 12235 42 27 top top JJ 12235 42 28 figure figure NN 12235 42 29 ) ) -RRB- 12235 42 30 , , , 12235 42 31 yet yet CC 12235 42 32 the the DT 12235 42 33 OCR OCR NNP 12235 42 34 output output NN 12235 42 35 has have VBZ 12235 42 36 interpreted interpret VBN 12235 42 37 the the DT 12235 42 38 character character NN 12235 42 39 as as IN 12235 42 40 a a DT 12235 42 41 Long Long NNP 12235 42 42 - - HYPH 12235 42 43 S. s. 
NN 12235 43 1 The the DT 12235 43 2 Long Long NNP 12235 43 3 - - HYPH 12235 43 4 S S NNP 12235 43 5 may may MD 12235 43 6 also also RB 12235 43 7 occasionally occasionally RB 12235 43 8 be be VB 12235 43 9 interpreted interpret VBN 12235 43 10 by by IN 12235 43 11 the the DT 12235 43 12 OCR OCR NNP 12235 43 13 as as IN 12235 43 14 a a DT 12235 43 15 lower- lower- JJ 12235 43 16 case case NN 12235 43 17 < < XX 12235 43 18 l l NNP 12235 43 19 > > XX 12235 43 20 , , , 12235 43 21 such such JJ 12235 43 22 as as IN 12235 43 23 the the DT 12235 43 24 “ " `` 12235 43 25 univerlity univerlity NN 12235 43 26 of of IN 12235 43 27 Dublin Dublin NNP 12235 43 28 ” " '' 12235 43 29 in in IN 12235 43 30 the the DT 12235 43 31 3rd 3rd JJ 12235 43 32 edition edition NN 12235 43 33 entry entry NN 12235 43 34 on on IN 12235 43 35 Robinson Robinson NNP 12235 43 36 ( ( -LRB- 12235 43 37 The the DT 12235 43 38 most most RBS 12235 43 39 Rev Rev NNP 12235 43 40 Sir Sir NNP 12235 43 41 Richard Richard NNP 12235 43 42 ) ) -RRB- 12235 43 43 . . . 
12235 44 1 These these DT 12235 44 2 complications complication NNS 12235 44 3 and and CC 12235 44 4 inconsistencies inconsistency NNS 12235 44 5 are be VBP 12235 44 6 challenges challenge NNS 12235 44 7 when when WRB 12235 44 8 developing develop VBG 12235 44 9 Python Python NNP 12235 44 10 rules rule NNS 12235 44 11 for for IN 12235 44 12 correcting correct VBG 12235 44 13 the the DT 12235 44 14 Long Long NNP 12235 44 15 - - HYPH 12235 44 16 S S NNP 12235 44 17 in in IN 12235 44 18 an an DT 12235 44 19 automated automate VBN 12235 44 20 way way NN 12235 44 21 , , , 12235 44 22 and and CC 12235 44 23 even even RB 12235 44 24 preexisting preexist VBG 12235 44 25 scripts script NNS 12235 44 26 will will MD 12235 44 27 need need VB 12235 44 28 to to TO 12235 44 29 be be VB 12235 44 30 adapted adapt VBN 12235 44 31 for for IN 12235 44 32 individual individual JJ 12235 44 33 use use NN 12235 44 34 with with IN 12235 44 35 a a DT 12235 44 36 particular particular JJ 12235 44 37 corpus corpus NN 12235 44 38 . . . 12235 45 1 Figure figure NN 12235 45 2 1 1 CD 12235 45 3 . . . 
12235 46 1 Example example NN 12235 46 2 from from IN 12235 46 3 the the DT 12235 46 4 3rd 3rd JJ 12235 46 5 edition edition NN 12235 46 6 entry entry NN 12235 46 7 on on IN 12235 46 8 Russia Russia NNP 12235 46 9 , , , 12235 46 10 comparing compare VBG 12235 46 11 the the DT 12235 46 12 original original JJ 12235 46 13 use use NN 12235 46 14 of of IN 12235 46 15 a a DT 12235 46 16 letter letter NN 12235 46 17 < < XX 12235 46 18 f f XX 12235 46 19 > > XX 12235 46 20 in in IN 12235 46 21 “ " `` 12235 46 22 of of IN 12235 46 23 ” " '' 12235 46 24 to to IN 12235 46 25 the the DT 12235 46 26 OCR OCR NNP 12235 46 27 output output NN 12235 46 28 of of IN 12235 46 29 the the DT 12235 46 30 same same JJ 12235 46 31 passage passage NN 12235 46 32 , , , 12235 46 33 which which WDT 12235 46 34 mistakenly mistakenly RB 12235 46 35 interprets interpret VBZ 12235 46 36 the the DT 12235 46 37 character character NN 12235 46 38 as as IN 12235 46 39 a a DT 12235 46 40 Long Long NNP 12235 46 41 - - HYPH 12235 46 42 S. s. 
NN 12235 46 60 Despite despite IN 12235 46 61 the the DT 12235 46 62 transition transition NN 12235 46 63 away away RB 12235 46 64 from from IN 12235 46 65 the the DT 12235 46 66 Long Long NNP 12235 46 67 - - HYPH 12235 46 68 S S NNP 12235 46 69 towards towards IN 12235 46 70 the the DT 12235 46 71 end end NN 12235 46 72 of of IN 12235 46 73 the the DT 12235 46 74 18th 18th JJ 12235 46 75 century century NN 12235 46 76 , , , 12235 46 77 the the DT 12235 46 78 3rd 3rd JJ 12235 46 79 edition edition NN 12235 46 80 of of IN 12235 46 81 the the DT 12235 46 82 Encyclopedia Encyclopedia NNP 12235 46 83 Britannica Britannica NNP 12235 46 84 ( ( -LRB- 12235 46 85 published publish VBN 12235 46 86 in in IN 12235 46 87 1797 1797 CD 12235 46 88 ) ) -RRB- 12235 46 89 implements implement VBZ 12235 46 90 the the DT 12235 46 91 Long Long NNP 12235 46 92 - - HYPH 12235 46 93 S S NNP 12235 46 94 throughout throughout IN 12235 46 95 , , , 12235 46 96 with with IN 12235 46 97 approximately approximately RB 12235 46 98 100,594 100,594 CD 12235 46 99 instances instance NNS 12235 46 100 of of IN 12235 46 101 the the DT 12235 46 102 Long Long NNP 12235 46 103 - - HYPH 12235 46 104 S S NNP 12235 46 105 in in IN 12235 46 106 the the DT 12235 46 107 OCR OCR NNP 12235 46 108 output output NN 12235 46 109 . . . 
12235 47 1 When when WRB 12235 47 2 performing perform VBG 12235 47 3 metadata metadata NN 12235 47 4 generation generation NN 12235 47 5 with with IN 12235 47 6 the the DT 12235 47 7 HIVE HIVE NNP 12235 47 8 tool tool NN 12235 47 9 on on IN 12235 47 10 the the DT 12235 47 11 OCR OCR NNP 12235 47 12 output output NN 12235 47 13 for for IN 12235 47 14 an an DT 12235 47 15 entry entry NN 12235 47 16 , , , 12235 47 17 the the DT 12235 47 18 Long Long NNP 12235 47 19 - - HYPH 12235 47 20 S S NNP 12235 47 21 is be VBZ 12235 47 22 most most RBS 12235 47 23 often often RB 12235 47 24 interpreted interpret VBN 12235 47 25 by by IN 12235 47 26 the the DT 12235 47 27 automatic automatic JJ 12235 47 28 metadata metadata NN 12235 47 29 generation generation NN 12235 47 30 tool tool NN 12235 47 31 as as IN 12235 47 32 an an DT 12235 47 33 < < XX 12235 47 34 f f XX 12235 47 35 > > XX 12235 47 36 , , , 12235 47 37 which which WDT 12235 47 38 can can MD 12235 47 39 result result VB 12235 47 40 in in IN 12235 47 41 ( ( -LRB- 12235 47 42 a a DT 12235 47 43 ) ) -RRB- 12235 47 44 inaccurate inaccurate JJ 12235 47 45 keyword keyword NN 12235 47 46 extraction extraction NN 12235 47 47 ( ( -LRB- 12235 47 48 e.g. e.g. 
RB 12235 47 49 , , , 12235 47 50 Russians→ russians→ JJ 12235 47 51 Ruffians Ruffians NNPS 12235 47 52 ) ) -RRB- 12235 47 53 , , , 12235 47 54 and and CC 12235 47 55 ( ( -LRB- 12235 47 56 b b NN 12235 47 57 ) ) -RRB- 12235 47 58 when when WRB 12235 47 59 mapping mapping NN 12235 47 60 extracted extract VBD 12235 47 61 keywords keyword NNS 12235 47 62 to to IN 12235 47 63 controlled controlled JJ 12235 47 64 vocabulary vocabulary JJ 12235 47 65 terms term NNS 12235 47 66 , , , 12235 47 67 essential essential JJ 12235 47 68 topics topic NNS 12235 47 69 could could MD 12235 47 70 be be VB 12235 47 71 unidentifiable unidentifiable JJ 12235 47 72 , , , 12235 47 73 and and CC 12235 47 74 HIVE HIVE NNP 12235 47 75 will will MD 12235 47 76 subsequently subsequently RB 12235 47 77 omit omit VB 12235 47 78 them -PRON- PRP 12235 47 79 from from IN 12235 47 80 the the DT 12235 47 81 results result NNS 12235 47 82 because because IN 12235 47 83 they -PRON- PRP 12235 47 84 can can MD 12235 47 85 not not RB 12235 47 86 be be VB 12235 47 87 mapped map VBN 12235 47 88 to to IN 12235 47 89 controlled control VBN 12235 47 90 vocabulary vocabulary JJ 12235 47 91 terms term NNS 12235 47 92 . . . 
12235 48 1 Figure figure NN 12235 48 2 2 2 CD 12235 48 3 provides provide VBZ 12235 48 4 a a DT 12235 48 5 truncated truncated JJ 12235 48 6 view view NN 12235 48 7 of of IN 12235 48 8 Long long JJ 12235 48 9 - - HYPH 12235 48 10 S S NNP 12235 48 11 words word NNS 12235 48 12 in in IN 12235 48 13 the the DT 12235 48 14 3rd 3rd JJ 12235 48 15 edition edition NN 12235 48 16 entry entry NN 12235 48 17 on on IN 12235 48 18 Rum Rum NNP 12235 48 19 , , , 12235 48 20 which which WDT 12235 48 21 are be VBP 12235 48 22 subsequently subsequently RB 12235 48 23 removed remove VBN 12235 48 24 from from IN 12235 48 25 the the DT 12235 48 26 pool pool NN 12235 48 27 of of IN 12235 48 28 automatically automatically RB 12235 48 29 extracted extract VBN 12235 48 30 keywords keyword NNS 12235 48 31 when when WRB 12235 48 32 performing perform VBG 12235 48 33 the the DT 12235 48 34 automatic automatic JJ 12235 48 35 subject subject JJ 12235 48 36 indexing indexing NN 12235 48 37 sequence sequence NN 12235 48 38 in in IN 12235 48 39 HIVE HIVE NNP 12235 48 40 . . . 
12235 49 1 Using use VBG 12235 49 2 keyword keyword NNP 12235 49 3 extraction extraction NN 12235 49 4 algorithms algorithm NNS 12235 49 5 that that WDT 12235 49 6 are be VBP 12235 49 7 largely largely RB 12235 49 8 dependent dependent JJ 12235 49 9 upon upon IN 12235 49 10 term term NN 12235 49 11 frequencies frequency NNS 12235 49 12 , , , 12235 49 13 automatic automatic JJ 12235 49 14 subject subject NN 12235 49 15 indexing indexing NN 12235 49 16 for for IN 12235 49 17 an an DT 12235 49 18 entry entry NN 12235 49 19 on on IN 12235 49 20 Rum Rum NNP 12235 49 21 may may MD 12235 49 22 be be VB 12235 49 23 substantially substantially RB 12235 49 24 hindered hinder VBN 12235 49 25 when when WRB 12235 49 26 meaningful meaningful JJ 12235 49 27 and and CC 12235 49 28 frequently frequently RB 12235 49 29 occurring occur VBG 12235 49 30 words word NNS 12235 49 31 such such JJ 12235 49 32 as as IN 12235 49 33 sugar sugar NN 12235 49 34 , , , 12235 49 35 and and CC 12235 49 36 yeast yeast NN 12235 49 37 are be VBP 12235 49 38 removed remove VBN 12235 49 39 . . . 12235 50 1 Figure figure NN 12235 50 2 2 2 CD 12235 50 3 . . . 12235 51 1 Examples example NNS 12235 51 2 of of IN 12235 51 3 the the DT 12235 51 4 Long Long NNP 12235 51 5 - - HYPH 12235 51 6 S S NNP 12235 51 7 in in IN 12235 51 8 the the DT 12235 51 9 3rd 3rd JJ 12235 51 10 edition edition NN 12235 51 11 Encyclopedia Encyclopedia NNP 12235 51 12 Britannica Britannica NNP 12235 51 13 entry entry NN 12235 51 14 on on IN 12235 51 15 Rum Rum NNP 12235 51 16 . . . 
12235 52 1 Using use VBG 12235 52 2 this this DT 12235 52 3 example example NN 12235 52 4 entry entry NN 12235 52 5 , , , 12235 52 6 the the DT 12235 52 7 automatic automatic JJ 12235 52 8 subject subject JJ 12235 52 9 indexing indexing NN 12235 52 10 results result NNS 12235 52 11 were be VBD 12235 52 12 compared compare VBN 12235 52 13 using use VBG 12235 52 14 Python Python NNP 12235 52 15 , , , 12235 52 16 to to TO 12235 52 17 determine determine VB 12235 52 18 which which WDT 12235 52 19 terms term NNS 12235 52 20 only only RB 12235 52 21 appear appear VBP 12235 52 22 when when WRB 12235 52 23 the the DT 12235 52 24 Long Long NNP 12235 52 25 - - HYPH 12235 52 26 S S NNP 12235 52 27 has have VBZ 12235 52 28 been be VBN 12235 52 29 corrected correct VBN 12235 52 30 to to IN 12235 52 31 the the DT 12235 52 32 standard standard NN 12235 52 33 < < XX 12235 52 34 s s NNP 12235 52 35 > > XX 12235 52 36 . . . 12235 53 1 The the DT 12235 53 2 comparison comparison NN 12235 53 3 showed show VBD 12235 53 4 that that IN 12235 53 5 16 16 CD 12235 53 6 total total JJ 12235 53 7 terms term NNS 12235 53 8 no no RB 12235 53 9 longer long RBR 12235 53 10 appeared appear VBD 12235 53 11 in in IN 12235 53 12 the the DT 12235 53 13 results result NNS 12235 53 14 when when WRB 12235 53 15 the the DT 12235 53 16 Long Long NNP 12235 53 17 - - HYPH 12235 53 18 S S NNP 12235 53 19 was be VBD 12235 53 20 not not RB 12235 53 21 corrected correct VBN 12235 53 22 to to IN 12235 53 23 a a DT 12235 53 24 standard standard NN 12235 53 25 < < XX 12235 53 26 s s NNP 12235 53 27 > > XX 12235 53 28 : : : 12235 53 29 ten ten CD 12235 53 30 terms term NNS 12235 53 31 using use VBG 12235 53 32 the the DT 12235 53 33 2018 2018 CD 12235 53 34 LCSH LCSH NNP 12235 53 35 , , , 12235 53 36 and and CC 12235 53 37 six six CD 12235 53 38 terms term NNS 12235 53 39 using use VBG 12235 53 40 the the DT 12235 53 41 1910 1910 CD 12235 53 42 LCSH LCSH NNP 12235 53 43 . . . 
12235 54 1 These these DT 12235 54 2 omitted omit VBN 12235 54 3 results result NNS 12235 54 4 included include VBD 12235 54 5 the the DT 12235 54 6 terms term NNS 12235 54 7 sugar sugar NN 12235 54 8 and and CC 12235 54 9 yeast yeast NN 12235 54 10 . . . 12235 55 1 The the DT 12235 55 2 next next JJ 12235 55 3 section section NN 12235 55 4 will will MD 12235 55 5 discuss discuss VB 12235 55 6 the the DT 12235 55 7 encyclopedia encyclopedia NN 12235 55 8 entry entry NN 12235 55 9 word word NN 12235 55 10 count count NN 12235 55 11 for for IN 12235 55 12 this this DT 12235 55 13 corpus corpus NN 12235 55 14 , , , 12235 55 15 and and CC 12235 55 16 the the DT 12235 55 17 possible possible JJ 12235 55 18 impact impact NN 12235 55 19 that that IN 12235 55 20 this this DT 12235 55 21 may may MD 12235 55 22 have have VB 12235 55 23 upon upon IN 12235 55 24 automatic automatic JJ 12235 55 25 subject subject JJ 12235 55 26 indexing indexing NN 12235 55 27 between between IN 12235 55 28 corrected corrected JJ 12235 55 29 and and CC 12235 55 30 uncorrected uncorrecte VBD 12235 55 31 Long Long NNP 12235 55 32 - - HYPH 12235 55 33 S S NNP 12235 55 34 instances instance NNS 12235 55 35 . . . 12235 56 1 Encyclopedia encyclopedia NN 12235 56 2 Entry entry NN 12235 56 3 Lengths length VBZ 12235 56 4 Consistent Consistent NNP 12235 56 5 with with IN 12235 56 6 other other JJ 12235 56 7 Encyclopedia Encyclopedia NNP 12235 56 8 Britannica Britannica NNP 12235 56 9 editions edition NNS 12235 56 10 in in IN 12235 56 11 the the DT 12235 56 12 18th 18th JJ 12235 56 13 and and CC 12235 56 14 19th 19th JJ 12235 56 15 centuries century NNS 12235 56 16 , , , 12235 56 17 the the DT 12235 56 18 encyclopedia encyclopedia JJ 12235 56 19 entries entry NNS 12235 56 20 in in IN 12235 56 21 the the DT 12235 56 22 3rd 3rd JJ 12235 56 23 edition edition NN 12235 56 24 vary vary VBP 12235 56 25 substantially substantially RB 12235 56 26 in in IN 12235 56 27 length length NN 12235 56 28 . . . 
12235 57 1 A a DT 12235 57 2 convenience convenience NN 12235 57 3 sample sample NN 12235 57 4 of of IN 12235 57 5 3,849 3,849 CD 12235 57 6 3rd 3rd JJ 12235 57 7 edition edition NN 12235 57 8 entries entry NNS 12235 57 9 ranging range VBG 12235 57 10 in in IN 12235 57 11 length length NN 12235 57 12 from from IN 12235 57 13 2 2 CD 12235 57 14 to to IN 12235 57 15 202,848 202,848 CD 12235 57 16 words word NNS 12235 57 17 demonstrated demonstrate VBD 12235 57 18 an an DT 12235 57 19 arithmetic arithmetic JJ 12235 57 20 mean mean NN 12235 57 21 of of IN 12235 57 39 826.60 826.60 CD 12235 57 40 , , , 12235 57 41 and and CC 12235 57 42 a a DT 12235 57 43 median median JJ 12235 57 44 word word NN 12235 57 45 count count NN 12235 57 46 of of IN 12235 57 47 71 71 CD 12235 57 48 . . . 12235 58 1 As as IN 12235 58 2 shown show VBN 12235 58 3 in in IN 12235 58 4 figure figure NN 12235 58 5 3 3 CD 12235 58 6 , , , 12235 58 7 this this DT 12235 58 8 indicates indicate VBZ 12235 58 9 a a DT 12235 58 10 significant significant JJ 12235 58 11 skew skew NN 12235 58 12 towards towards IN 12235 58 13 shorter short JJR 12235 58 14 entry entry NN 12235 58 15 lengths length NNS 12235 58 16 . . . 
12235 59 1 For for IN 12235 59 2 the the DT 12235 59 3 vast vast JJ 12235 59 4 majority majority NN 12235 59 5 of of IN 12235 59 6 encyclopedia encyclopedia JJ 12235 59 7 entries entry NNS 12235 59 8 in in IN 12235 59 9 this this DT 12235 59 10 corpus corpus NN 12235 59 11 , , , 12235 59 12 a a DT 12235 59 13 low low JJ 12235 59 14 total total JJ 12235 59 15 word word NN 12235 59 16 count count NN 12235 59 17 may may MD 12235 59 18 impact impact VB 12235 59 19 the the DT 12235 59 20 degree degree NN 12235 59 21 of of IN 12235 59 22 Long long JJ 12235 59 23 - - HYPH 12235 59 24 S s NN 12235 59 25 impact impact NN 12235 59 26 for for IN 12235 59 27 automatic automatic JJ 12235 59 28 subject subject JJ 12235 59 29 indexing indexing NN 12235 59 30 results result NNS 12235 59 31 , , , 12235 59 32 given give VBN 12235 59 33 the the DT 12235 59 34 importance importance NN 12235 59 35 of of IN 12235 59 36 term term NN 12235 59 37 availability availability NN 12235 59 38 and and CC 12235 59 39 frequency frequency NN 12235 59 40 for for IN 12235 59 41 keyword keyword NNP 12235 59 42 extraction extraction NNP 12235 59 43 algorithms algorithm NNS 12235 59 44 . . . 12235 60 1 Figure figure NN 12235 60 2 3 3 CD 12235 60 3 . . . 12235 61 1 Scatterplot scatterplot NN 12235 61 2 of of IN 12235 61 3 word word NN 12235 61 4 count count NN 12235 61 5 for for IN 12235 61 6 a a DT 12235 61 7 convenience convenience NN 12235 61 8 sample sample NN 12235 61 9 of of IN 12235 61 10 3,849 3,849 CD 12235 61 11 3rd 3rd NNP 12235 61 12 Edition Edition NNP 12235 61 13 Encyclopedia Encyclopedia NNP 12235 61 14 Britannica Britannica NNP 12235 61 15 entries entrie VBZ 12235 61 16 . . . 
12235 62 1 Large large JJ 12235 62 2 - - HYPH 12235 62 3 scale scale NN 12235 62 4 metadata metadata NN 12235 62 5 generation generation NN 12235 62 6 requires require VBZ 12235 62 7 time time NN 12235 62 8 , , , 12235 62 9 labor labor NN 12235 62 10 , , , 12235 62 11 and and CC 12235 62 12 resources resource NNS 12235 62 13 , , , 12235 62 14 and and CC 12235 62 15 it -PRON- PRP 12235 62 16 becomes become VBZ 12235 62 17 more more RBR 12235 62 18 costly costly JJ 12235 62 19 when when WRB 12235 62 20 accounting account VBG 12235 62 21 for for IN 12235 62 22 the the DT 12235 62 23 complications complication NNS 12235 62 24 of of IN 12235 62 25 correcting correct VBG 12235 62 26 the the DT 12235 62 27 Long Long NNP 12235 62 28 - - HYPH 12235 62 29 S S NNP 12235 62 30 for for IN 12235 62 31 a a DT 12235 62 32 particular particular JJ 12235 62 33 corpus corpus NN 12235 62 34 . . . 12235 63 1 Library library NN 12235 63 2 and and CC 12235 63 3 information information NN 12235 63 4 professionals professional NNS 12235 63 5 working work VBG 12235 63 6 with with IN 12235 63 7 digital digital JJ 12235 63 8 humanities humanity NNS 12235 63 9 resources resource NNS 12235 63 10 will will MD 12235 63 11 need need VB 12235 63 12 to to TO 12235 63 13 understand understand VB 12235 63 14 the the DT 12235 63 15 impact impact NN 12235 63 16 of of IN 12235 63 17 correcting correct VBG 12235 63 18 or or CC 12235 63 19 not not RB 12235 63 20 corrected correct VBN 12235 63 21 the the DT 12235 63 22 Long Long NNP 12235 63 23 - - HYPH 12235 63 24 S S NNP 12235 63 25 in in IN 12235 63 26 the the DT 12235 63 27 corpus corpus NN 12235 63 28 before before IN 12235 63 29 designating designate VBG 12235 63 30 resources resource NNS 12235 63 31 and and CC 12235 63 32 developing develop VBG 12235 63 33 a a DT 12235 63 34 protocol protocol NN 12235 63 35 for for IN 12235 63 36 generating generate VBG 12235 63 37 the the DT 12235 63 38 automatic automatic JJ 12235 63 39 or or CC 12235 63 40 semi 
semi JJ 12235 63 41 - - JJ 12235 63 42 automatic automatic JJ 12235 63 43 metadata metadata NN 12235 63 44 for for IN 12235 63 45 full full JJ 12235 63 46 - - HYPH 12235 63 47 text text NN 12235 63 48 resources resource NNS 12235 63 49 . . . 12235 64 1 This this DT 12235 64 2 includes include VBZ 12235 64 3 understanding understanding NN 12235 64 4 whether whether IN 12235 64 5 or or CC 12235 64 6 not not RB 12235 64 7 the the DT 12235 64 8 length length NN 12235 64 9 of of IN 12235 64 10 each each DT 12235 64 11 individual individual JJ 12235 64 12 document document NN 12235 64 13 will will MD 12235 64 14 affect affect VB 12235 64 15 the the DT 12235 64 16 degree degree NN 12235 64 17 of of IN 12235 64 18 Long long JJ 12235 64 19 - - HYPH 12235 64 20 S s NN 12235 64 21 impact impact NN 12235 64 22 upon upon IN 12235 64 23 the the DT 12235 64 24 results result NNS 12235 64 25 . . . 12235 65 1 This this DT 12235 65 2 challenge challenge NN 12235 65 3 , , , 12235 65 4 and and CC 12235 65 5 issues issue NNS 12235 65 6 reviewed review VBN 12235 65 7 above above RB 12235 65 8 , , , 12235 65 9 are be VBP 12235 65 10 in in IN 12235 65 11 the the DT 12235 65 12 research research NN 12235 65 13 presented present VBN 12235 65 14 below below RB 12235 65 15 . . . 
12235 66 1 OBJECTIVES OBJECTIVES NNP 12235 66 2 The the DT 12235 66 3 overriding override VBG 12235 66 4 goal goal NN 12235 66 5 of of IN 12235 66 6 this this DT 12235 66 7 work work NN 12235 66 8 is be VBZ 12235 66 9 to to TO 12235 66 10 determine determine VB 12235 66 11 the the DT 12235 66 12 prevalence prevalence NN 12235 66 13 of of IN 12235 66 14 omitted omit VBN 12235 66 15 terms term NNS 12235 66 16 in in IN 12235 66 17 automatic automatic JJ 12235 66 18 subject subject JJ 12235 66 19 indexing indexing NN 12235 66 20 results result NNS 12235 66 21 when when WRB 12235 66 22 the the DT 12235 66 23 Long Long NNP 12235 66 24 - - HYPH 12235 66 25 S S NNP 12235 66 26 is be VBZ 12235 66 27 not not RB 12235 66 28 corrected correct VBN 12235 66 29 in in IN 12235 66 30 the the DT 12235 66 31 3rd 3rd JJ 12235 66 32 edition edition NN 12235 66 33 entries entry NNS 12235 66 34 of of IN 12235 66 35 the the DT 12235 66 36 Encyclopedia Encyclopedia NNP 12235 66 37 Britannica Britannica NNP 12235 66 38 . . . 12235 67 1 Research research NN 12235 67 2 questions question NNS 12235 67 3 : : : 12235 67 4 1 1 LS 12235 67 5 . . . 12235 68 1 What what WP 12235 68 2 is be VBZ 12235 68 3 the the DT 12235 68 4 average average JJ 12235 68 5 number number NN 12235 68 6 of of IN 12235 68 7 terms term NNS 12235 68 8 that that WDT 12235 68 9 are be VBP 12235 68 10 omitted omit VBN 12235 68 11 from from IN 12235 68 12 automatic automatic JJ 12235 68 13 subject subject JJ 12235 68 14 indexing indexing NN 12235 68 15 results result NNS 12235 68 16 when when WRB 12235 68 17 the the DT 12235 68 18 Long Long NNP 12235 68 19 - - HYPH 12235 68 20 S S NNP 12235 68 21 is be VBZ 12235 68 22 not not RB 12235 68 23 corrected correct VBN 12235 68 24 to to IN 12235 68 25 a a DT 12235 68 26 standard standard NN 12235 68 27 < < XX 12235 68 28 s s XX 12235 68 29 > > XX 12235 68 30 ? ? . 12235 69 1 2 2 LS 12235 69 2 . . . 
12235 70 1 How how WRB 12235 70 2 does do VBZ 12235 70 3 the the DT 12235 70 4 encyclopedia encyclopedia NN 12235 70 5 entry entry NN 12235 70 6 length length NN 12235 70 7 affect affect VBP 12235 70 8 the the DT 12235 70 9 number number NN 12235 70 10 of of IN 12235 70 11 terms term NNS 12235 70 12 that that WDT 12235 70 13 are be VBP 12235 70 14 omitted omit VBN 12235 70 15 when when WRB 12235 70 16 the the DT 12235 70 17 Long Long NNP 12235 70 18 - - HYPH 12235 70 19 S S NNP 12235 70 20 is be VBZ 12235 70 21 not not RB 12235 70 22 corrected correct VBN 12235 70 23 to to IN 12235 70 24 a a DT 12235 70 25 standard standard NN 12235 70 26 < < XX 12235 70 27 s s XX 12235 70 28 > > XX 12235 70 29 ? ? . 12235 71 1 This this DT 12235 71 2 analysis analysis NN 12235 71 3 will will MD 12235 71 4 approach approach VB 12235 71 5 these these DT 12235 71 6 goals goal NNS 12235 71 7 by by IN 12235 71 8 performing perform VBG 12235 71 9 a a DT 12235 71 10 comparative comparative JJ 12235 71 11 analysis analysis NN 12235 71 12 of of IN 12235 71 13 automatic automatic JJ 12235 71 14 subject subject JJ 12235 71 15 indexing indexing NN 12235 71 16 results result NNS 12235 71 17 to to TO 12235 71 18 determine determine VB 12235 71 19 the the DT 12235 71 20 number number NN 12235 71 21 of of IN 12235 71 22 terms term NNS 12235 71 23 that that WDT 12235 71 24 are be VBP 12235 71 25 omitted omit VBN 12235 71 26 from from IN 12235 71 27 the the DT 12235 71 28 results result NNS 12235 71 29 when when WRB 12235 71 30 the the DT 12235 71 31 Long Long NNP 12235 71 32 - - HYPH 12235 71 33 S S NNP 12235 71 34 is be VBZ 12235 71 35 not not RB 12235 71 36 corrected correct VBN 12235 71 37 to to IN 12235 71 38 a a DT 12235 71 39 standard standard JJ 12235 71 40 letter letter NN 12235 71 41 < < XX 12235 71 42 s s NNP 12235 71 43 > > XX 12235 71 44 . . . 
12235 72 1 Basic basic JJ 12235 72 2 descriptive descriptive JJ 12235 72 3 statistics statistic NNS 12235 72 4 are be VBP 12235 72 5 generated generate VBN 12235 72 6 to to TO 12235 72 7 determine determine VB 12235 72 8 central central JJ 12235 72 9 tendency tendency NN 12235 72 10 . . . 12235 73 1 The the DT 12235 73 2 quantity quantity NN 12235 73 3 of of IN 12235 73 4 terms term NNS 12235 73 5 omitted omit VBN 12235 73 6 are be VBP 12235 73 7 then then RB 12235 73 8 compared compare VBN 12235 73 9 with with IN 12235 73 10 encyclopedia encyclopedia JJ 12235 73 11 INFORMATION INFORMATION NNP 12235 73 12 TECHNOLOGY technology NN 12235 73 13 AND and CC 12235 73 14 LIBRARIES library NNS 12235 73 15 SEPTEMBER SEPTEMBER NNP 12235 73 16 2020 2020 CD 12235 73 17 EVALUATING evaluate VBG 12235 73 18 THE the DT 12235 73 19 IMPACT impact NN 12235 73 20 OF of IN 12235 73 21 THE the DT 12235 73 22 LONG long JJ 12235 73 23 - - HYPH 12235 73 24 S S NNP 12235 73 25 | | NNP 12235 73 26 GRABUS grabus NN 12235 73 27 6 6 CD 12235 73 28 entry entry NN 12235 73 29 word word NN 12235 73 30 counts count NNS 12235 73 31 . . . 12235 74 1 These these DT 12235 74 2 objectives objective NNS 12235 74 3 were be VBD 12235 74 4 shaped shape VBN 12235 74 5 by by IN 12235 74 6 collaboration collaboration NN 12235 74 7 between between IN 12235 74 8 Drexel Drexel NNP 12235 74 9 University University NNP 12235 74 10 ’s ’s POS 12235 74 11 Metadata Metadata NNP 12235 74 12 Research Research NNP 12235 74 13 Center Center NNP 12235 74 14 and and CC 12235 74 15 Temple Temple NNP 12235 74 16 University University NNP 12235 74 17 ’s ’s POS 12235 74 18 Digital Digital NNP 12235 74 19 Scholarship Scholarship NNP 12235 74 20 Center Center NNP 12235 74 21 . . . 
12235 75 1 The the DT 12235 75 2 next next JJ 12235 75 3 section section NN 12235 75 4 of of IN 12235 75 5 this this DT 12235 75 6 paper paper NN 12235 75 7 will will MD 12235 75 8 report report VB 12235 75 9 on on IN 12235 75 10 methods method NNS 12235 75 11 and and CC 12235 75 12 steps step NNS 12235 75 13 taken take VBN 12235 75 14 to to TO 12235 75 15 address address VB 12235 75 16 these these DT 12235 75 17 objectives objective NNS 12235 75 18 . . . 12235 76 1 METHODS METHODS NNP 12235 76 2 We -PRON- PRP 12235 76 3 approached approach VBD 12235 76 4 this this DT 12235 76 5 research research NN 12235 76 6 by by IN 12235 76 7 performing perform VBG 12235 76 8 a a DT 12235 76 9 comparative comparative JJ 12235 76 10 analysis analysis NN 12235 76 11 of of IN 12235 76 12 subject subject JJ 12235 76 13 metadata metadata NN 12235 76 14 generated generate VBD 12235 76 15 both both CC 12235 76 16 before before IN 12235 76 17 and and CC 12235 76 18 after after IN 12235 76 19 the the DT 12235 76 20 correction correction NN 12235 76 21 of of IN 12235 76 22 the the DT 12235 76 23 historical historical JJ 12235 76 24 Long Long NNP 12235 76 25 - - HYPH 12235 76 26 S S NNP 12235 76 27 in in IN 12235 76 28 the the DT 12235 76 29 3rd 3rd JJ 12235 76 30 edition edition NN 12235 76 31 of of IN 12235 76 32 the the DT 12235 76 33 Encyclopedia Encyclopedia NNP 12235 76 34 Britannica Britannica NNP 12235 76 35 . . . 12235 77 1 The the DT 12235 77 2 HIVE HIVE NNP 12235 77 3 tool tool NN 12235 77 4 was be VBD 12235 77 5 used use VBN 12235 77 6 to to TO 12235 77 7 automatically automatically RB 12235 77 8 generate generate VB 12235 77 9 the the DT 12235 77 10 subject subject JJ 12235 77 11 metadata metadata NN 12235 77 12 . . . 
12235 78 1 Descriptive descriptive JJ 12235 78 2 statistics statistic NNS 12235 78 3 were be VBD 12235 78 4 applied apply VBN 12235 78 5 , , , 12235 78 6 and and CC 12235 78 7 visualizations visualization NNS 12235 78 8 produced produce VBN 12235 78 9 from from IN 12235 78 10 the the DT 12235 78 11 results result NNS 12235 78 12 were be VBD 12235 78 13 also also RB 12235 78 14 examined examine VBN 12235 78 15 to to TO 12235 78 16 identify identify VB 12235 78 17 trends trend NNS 12235 78 18 . . . 12235 79 1 Figure figure NN 12235 79 2 4 4 CD 12235 79 3 . . . 12235 80 1 The the DT 12235 80 2 30 30 CD 12235 80 3 Encyclopedia Encyclopedia NNP 12235 80 4 Britannica Britannica NNP 12235 80 5 3rd 3rd JJ 12235 80 6 edition edition NN 12235 80 7 Encyclopedia Encyclopedia NNP 12235 80 8 Britannica Britannica NNP 12235 80 9 entries entry NNS 12235 80 10 randomly randomly RB 12235 80 11 selected select VBN 12235 80 12 for for IN 12235 80 13 this this DT 12235 80 14 study study NN 12235 80 15 , , , 12235 80 16 sorted sort VBN 12235 80 17 in in IN 12235 80 18 ascending ascend VBG 12235 80 19 order order NN 12235 80 20 by by IN 12235 80 21 their -PRON- PRP$ 12235 80 22 word word NN 12235 80 23 counts count NNS 12235 80 24 . . . 12235 81 1 The the DT 12235 81 2 protocol protocol NN 12235 81 3 for for IN 12235 81 4 performing perform VBG 12235 81 5 this this DT 12235 81 6 research research NN 12235 81 7 involved involve VBD 12235 81 8 the the DT 12235 81 9 following following JJ 12235 81 10 steps step NNS 12235 81 11 : : : 12235 81 12 1 1 CD 12235 81 13 . . . 12235 82 1 Compile compile VB 12235 82 2 a a DT 12235 82 3 sample sample NN 12235 82 4 for for IN 12235 82 5 testing testing NN 12235 82 6 : : : 12235 82 7 1.1 1.1 CD 12235 82 8 . . . 
12235 83 1 A a DT 12235 83 2 random random JJ 12235 83 3 sample sample NN 12235 83 4 of of IN 12235 83 5 30 30 CD 12235 83 6 encyclopedia encyclopedia NN 12235 83 7 entries entry NNS 12235 83 8 was be VBD 12235 83 9 identified identify VBN 12235 83 10 from from IN 12235 83 11 a a DT 12235 83 12 convenience convenience NN 12235 83 13 sample sample NN 12235 83 14 of of IN 12235 83 15 entries entry NNS 12235 83 16 that that WDT 12235 83 17 comprise comprise VBP 12235 83 18 the the DT 12235 83 19 letter letter NN 12235 83 20 S S NNP 12235 83 21 volumes volume NNS 12235 83 22 of of IN 12235 83 23 the the DT 12235 83 24 3rd 3rd JJ 12235 83 25 edition edition NN 12235 83 26 . . . 12235 84 1 The the DT 12235 84 2 entries entry NNS 12235 84 3 range range VBP 12235 84 4 , , , 12235 84 5 in in IN 12235 84 6 length length NN 12235 84 7 , , , 12235 84 8 from from IN 12235 84 9 6 6 CD 12235 84 10 to to IN 12235 84 11 6,114 6,114 CD 12235 84 12 words word NNS 12235 84 13 . . . 12235 85 1 The the DT 12235 85 2 median median JJ 12235 85 3 word word NN 12235 85 4 count count NN 12235 85 5 for for IN 12235 85 6 entries entry NNS 12235 85 7 in in IN 12235 85 8 this this DT 12235 85 9 sample sample NN 12235 85 10 is be VBZ 12235 85 11 99 99 CD 12235 85 12 words word NNS 12235 85 13 . . . 12235 86 1 1.2 1.2 CD 12235 86 2 . . . 12235 87 1 The the DT 12235 87 2 sample sample NN 12235 87 3 of of IN 12235 87 4 terms term NNS 12235 87 5 selected select VBN 12235 87 6 for for IN 12235 87 7 this this DT 12235 87 8 study study NN 12235 87 9 and and CC 12235 87 10 their -PRON- PRP$ 12235 87 11 respective respective JJ 12235 87 12 word word NN 12235 87 13 counts count NNS 12235 87 14 are be VBP 12235 87 15 visualized visualize VBN 12235 87 16 in in IN 12235 87 17 figure figure NN 12235 87 18 4 4 CD 12235 87 19 . . . 12235 88 1 1.3 1.3 CD 12235 88 2 . . . 
12235 89 1 For for IN 12235 89 2 each each DT 12235 89 3 entry entry NN 12235 89 4 , , , 12235 89 5 the the DT 12235 89 6 Long long JJ 12235 89 7 - - HYPH 12235 89 8 S S NNP 12235 89 9 terms term NNS 12235 89 10 in in IN 12235 89 11 the the DT 12235 89 12 original original JJ 12235 89 13 XML xml NN 12235 89 14 file file NN 12235 89 15 were be VBD 12235 89 16 extracted extract VBN 12235 89 17 to to IN 12235 89 18 a a DT 12235 89 19 list list NN 12235 89 20 . . . 12235 90 1 2 2 LS 12235 90 2 . . . 12235 91 1 Perform perform VB 12235 91 2 automatic automatic JJ 12235 91 3 subject subject JJ 12235 91 4 indexing indexing NN 12235 91 5 sequence sequence NN 12235 91 6 upon upon IN 12235 91 7 entries entry NNS 12235 91 8 to to TO 12235 91 9 generate generate VB 12235 91 10 lists list NNS 12235 91 11 of of IN 12235 91 12 terms term NNS 12235 91 13 : : : 12235 91 14 2.1 2.1 CD 12235 91 15 . . . 12235 92 1 Using use VBG 12235 92 2 the the DT 12235 92 3 2018 2018 CD 12235 92 4 and and CC 12235 92 5 1910 1910 CD 12235 92 6 versions version NNS 12235 92 7 of of IN 12235 92 8 the the DT 12235 92 9 LCSH LCSH NNP 12235 92 10 . . . 12235 93 1 2.2 2.2 CD 12235 93 2 . . . 12235 94 1 With with IN 12235 94 2 fixed fix VBN 12235 94 3 maximum maximum JJ 12235 94 4 subject subject JJ 12235 94 5 heading heading NN 12235 94 6 results result NNS 12235 94 7 set set VBN 12235 94 8 to to IN 12235 94 9 40 40 CD 12235 94 10 : : : 12235 94 11 20 20 CD 12235 94 12 maximum maximum JJ 12235 94 13 terms term NNS 12235 94 14 returned return VBD 12235 94 15 with with IN 12235 94 16 the the DT 12235 94 17 2018 2018 CD 12235 94 18 LCSH LCSH NNP 12235 94 19 , , , 12235 94 20 and and CC 12235 94 21 20 20 CD 12235 94 22 maximum maximum JJ 12235 94 23 terms term NNS 12235 94 24 returned return VBD 12235 94 25 with with IN 12235 94 26 the the DT 12235 94 27 1910 1910 CD 12235 94 28 LCSH LCSH NNP 12235 94 29 . . . 12235 95 1 2.3 2.3 LS 12235 95 2 . . . 
12235 96 1 Before before IN 12235 96 2 Long Long NNP 12235 96 3 - - HYPH 12235 96 4 S S NNP 12235 96 5 correction correction NN 12235 96 6 and and CC 12235 96 7 after after IN 12235 96 8 Long Long NNP 12235 96 9 - - HYPH 12235 96 10 S S NNP 12235 96 11 correction correction NN 12235 96 12 , , , 12235 96 13 using use VBG 12235 96 14 the the DT 12235 96 15 Oxygen Oxygen NNP 12235 96 16 XML XML NNP 12235 96 17 Editor Editor NNP 12235 96 18 TEI TEI NNP 12235 96 19 to to IN 12235 96 20 TXT TXT NNP 12235 96 21 transformation transformation NN 12235 96 22 . . . 12235 97 1 INFORMATION INFORMATION NNP 12235 97 2 TECHNOLOGY technology NN 12235 97 3 AND and CC 12235 97 4 LIBRARIES library NNS 12235 97 5 SEPTEMBER SEPTEMBER NNP 12235 97 6 2020 2020 CD 12235 97 7 EVALUATING evaluate VBG 12235 97 8 THE the DT 12235 97 9 IMPACT impact NN 12235 97 10 OF of IN 12235 97 11 THE the DT 12235 97 12 LONG long JJ 12235 97 13 - - HYPH 12235 97 14 S S NNP 12235 97 15 | | NNP 12235 97 16 GRABUS grabus NN 12235 97 17 7 7 CD 12235 97 18 3 3 CD 12235 97 19 . . . 12235 98 1 Perform perform VB 12235 98 2 outer outer JJ 12235 98 3 join join VBP 12235 98 4 on on IN 12235 98 5 Python Python NNP 12235 98 6 Data Data NNP 12235 98 7 Frames Frames NNPS 12235 98 8 , , , 12235 98 9 between between IN 12235 98 10 terms term NNS 12235 98 11 generated generate VBN 12235 98 12 when when WRB 12235 98 13 the the DT 12235 98 14 Long Long NNP 12235 98 15 - - HYPH 12235 98 16 S S NNP 12235 98 17 has have VBZ 12235 98 18 been be VBN 12235 98 19 corrected correct VBN 12235 98 20 vs. vs. IN 12235 98 21 terms term NNS 12235 98 22 generated generate VBN 12235 98 23 when when WRB 12235 98 24 the the DT 12235 98 25 Long Long NNP 12235 98 26 - - HYPH 12235 98 27 S S NNP 12235 98 28 has have VBZ 12235 98 29 not not RB 12235 98 30 been be VBN 12235 98 31 corrected correct VBN 12235 98 32 . . . 
12235 99 1 The the DT 12235 99 2 resulting result VBG 12235 99 3 left left JJ 12235 99 4 outer outer JJ 12235 99 5 join join NN 12235 99 6 list list NN 12235 99 7 displays display VBZ 12235 99 8 terms term NNS 12235 99 9 that that WDT 12235 99 10 are be VBP 12235 99 11 omitted omit VBN 12235 99 12 from from IN 12235 99 13 the the DT 12235 99 14 automatic automatic JJ 12235 99 15 indexing indexing NN 12235 99 16 results result NNS 12235 99 17 if if IN 12235 99 18 the the DT 12235 99 19 Long Long NNP 12235 99 20 - - HYPH 12235 99 21 S S NNP 12235 99 22 is be VBZ 12235 99 23 not not RB 12235 99 24 corrected correct VBN 12235 99 25 to to IN 12235 99 26 a a DT 12235 99 27 standard standard JJ 12235 99 28 small small JJ 12235 99 29 < < XX 12235 99 30 s s NNP 12235 99 31 > > XX 12235 99 32 . . . 12235 100 1 The the DT 12235 100 2 quantity quantity NN 12235 100 3 of of IN 12235 100 4 terms term NNS 12235 100 5 omitted omit VBN 12235 100 6 are be VBP 12235 100 7 recorded record VBN 12235 100 8 for for IN 12235 100 9 comparison comparison NN 12235 100 10 . . . 12235 101 1 4 4 LS 12235 101 2 . . . 12235 102 1 Analysis analysis NN 12235 102 2 : : : 12235 102 3 Descriptive descriptive JJ 12235 102 4 statistics statistic NNS 12235 102 5 were be VBD 12235 102 6 generated generate VBN 12235 102 7 to to TO 12235 102 8 determine determine VB 12235 102 9 central central JJ 12235 102 10 tendency tendency NN 12235 102 11 for for IN 12235 102 12 the the DT 12235 102 13 number number NN 12235 102 14 and and CC 12235 102 15 percentage percentage NN 12235 102 16 of of IN 12235 102 17 words word NNS 12235 102 18 omitted omit VBN 12235 102 19 when when WRB 12235 102 20 the the DT 12235 102 21 Long Long NNP 12235 102 22 - - HYPH 12235 102 23 S S NNP 12235 102 24 is be VBZ 12235 102 25 not not RB 12235 102 26 corrected correct VBN 12235 102 27 . . . 
12235 103 1 The the DT 12235 103 2 quantity quantity NN 12235 103 3 of of IN 12235 103 4 terms term NNS 12235 103 5 omitted omit VBN 12235 103 6 are be VBP 12235 103 7 also also RB 12235 103 8 visualized visualize VBN 12235 103 9 in in IN 12235 103 10 a a DT 12235 103 11 continuous continuous JJ 12235 103 12 scatterplot scatterplot NN 12235 103 13 with with IN 12235 103 14 the the DT 12235 103 15 corresponding correspond VBG 12235 103 16 word word NN 12235 103 17 counts count NNS 12235 103 18 , , , 12235 103 19 to to TO 12235 103 20 demonstrate demonstrate VB 12235 103 21 that that IN 12235 103 22 the the DT 12235 103 23 quantity quantity NN 12235 103 24 of of IN 12235 103 25 terms term NNS 12235 103 26 omitted omit VBN 12235 103 27 when when WRB 12235 103 28 the the DT 12235 103 29 Long Long NNP 12235 103 30 - - HYPH 12235 103 31 S S NNP 12235 103 32 is be VBZ 12235 103 33 not not RB 12235 103 34 corrected correct VBN 12235 103 35 seems seem VBZ 12235 103 36 to to TO 12235 103 37 relate relate VB 12235 103 38 to to IN 12235 103 39 the the DT 12235 103 40 length length NN 12235 103 41 of of IN 12235 103 42 the the DT 12235 103 43 document document NN 12235 103 44 being be VBG 12235 103 45 automatically automatically RB 12235 103 46 classified classify VBN 12235 103 47 . . . 
12235 104 1 RESULTS RESULTS NNP 12235 104 2 The the DT 12235 104 3 results result NNS 12235 104 4 report report VBP 12235 104 5 the the DT 12235 104 6 prevalence prevalence NN 12235 104 7 of of IN 12235 104 8 omitted omitted JJ 12235 104 9 terms term NNS 12235 104 10 when when WRB 12235 104 11 the the DT 12235 104 12 Long Long NNP 12235 104 13 - - HYPH 12235 104 14 S S NNP 12235 104 15 is be VBZ 12235 104 16 not not RB 12235 104 17 corrected correct VBN 12235 104 18 to to IN 12235 104 19 a a DT 12235 104 20 standard standard NN 12235 104 21 < < XX 12235 104 22 s s NNP 12235 104 23 > > XX 12235 104 24 , , , 12235 104 25 as as RB 12235 104 26 well well RB 12235 104 27 as as IN 12235 104 28 a a DT 12235 104 29 visualization visualization NN 12235 104 30 of of IN 12235 104 31 the the DT 12235 104 32 number number NN 12235 104 33 of of IN 12235 104 34 terms term NNS 12235 104 35 omitted omit VBN 12235 104 36 as as IN 12235 104 37 they -PRON- PRP 12235 104 38 relate relate VBP 12235 104 39 to to IN 12235 104 40 the the DT 12235 104 41 encyclopedia encyclopedia NNS 12235 104 42 entry entry NN 12235 104 43 length length NN 12235 104 44 . . . 
12235 105 1 For for IN 12235 105 2 each each DT 12235 105 3 of of IN 12235 105 4 the the DT 12235 105 5 30 30 CD 12235 105 6 sample sample NN 12235 105 7 entries entry NNS 12235 105 8 automatically automatically RB 12235 105 9 indexed index VBN 12235 105 10 with with IN 12235 105 11 HIVE HIVE NNP 12235 105 12 , , , 12235 105 13 a a DT 12235 105 14 fixed fix VBN 12235 105 15 maximum maximum JJ 12235 105 16 number number NN 12235 105 17 of of IN 12235 105 18 40 40 CD 12235 105 19 entries entry NNS 12235 105 20 were be VBD 12235 105 21 returned return VBN 12235 105 22 : : : 12235 105 23 a a DT 12235 105 24 maximum maximum NN 12235 105 25 of of IN 12235 105 26 20 20 CD 12235 105 27 terms term NNS 12235 105 28 using use VBG 12235 105 29 the the DT 12235 105 30 2018 2018 CD 12235 105 31 LCSH LCSH NNP 12235 105 32 , , , 12235 105 33 and and CC 12235 105 34 a a DT 12235 105 35 maximum maximum NN 12235 105 36 of of IN 12235 105 37 20 20 CD 12235 105 38 terms term NNS 12235 105 39 using use VBG 12235 105 40 the the DT 12235 105 41 1910 1910 CD 12235 105 42 LCSH LCSH NNP 12235 105 43 . . . 12235 106 1 As as IN 12235 106 2 seen see VBN 12235 106 3 in in IN 12235 106 4 table table NN 12235 106 5 1 1 CD 12235 106 6 , , , 12235 106 7 central central JJ 12235 106 8 tendency tendency NN 12235 106 9 is be VBZ 12235 106 10 measured measure VBN 12235 106 11 using use VBG 12235 106 12 the the DT 12235 106 13 arithmetic arithmetic JJ 12235 106 14 mean mean NNP 12235 106 15 and and CC 12235 106 16 median median NNP 12235 106 17 , , , 12235 106 18 along along IN 12235 106 19 with with IN 12235 106 20 the the DT 12235 106 21 standard standard JJ 12235 106 22 deviation deviation NN 12235 106 23 and and CC 12235 106 24 range range NN 12235 106 25 . . . 
12235 107 1 The the DT 12235 107 2 average average JJ 12235 107 3 number number NN 12235 107 4 of of IN 12235 107 5 terms term NNS 12235 107 6 omitted omit VBN 12235 107 7 from from IN 12235 107 8 an an DT 12235 107 9 entry entry NN 12235 107 10 ’s ’s POS 12235 107 11 results result NNS 12235 107 12 is be VBZ 12235 107 13 6.73 6.73 CD 12235 107 14 , , , 12235 107 15 and and CC 12235 107 16 the the DT 12235 107 17 average average JJ 12235 107 18 percentage percentage NN 12235 107 19 of of IN 12235 107 20 terms term NNS 12235 107 21 omitted omit VBN 12235 107 22 from from IN 12235 107 23 an an DT 12235 107 24 entry entry NN 12235 107 25 ’s ’s POS 12235 107 26 results result NNS 12235 107 27 is be VBZ 12235 107 28 26.51 26.51 CD 12235 107 29 percent percent NN 12235 107 30 , , , 12235 107 31 with with IN 12235 107 32 the the DT 12235 107 33 2018 2018 CD 12235 107 34 and and CC 12235 107 35 1910 1910 CD 12235 107 36 editions edition NNS 12235 107 37 of of IN 12235 107 38 LCSH LCSH NNP 12235 107 39 performing perform VBG 12235 107 40 at at IN 12235 107 41 similar similar JJ 12235 107 42 rates rate NNS 12235 107 43 . . . 12235 108 1 The the DT 12235 108 2 full full JJ 12235 108 3 results result NNS 12235 108 4 are be VBP 12235 108 5 displayed display VBN 12235 108 6 in in IN 12235 108 7 appendix appendix NNP 12235 108 8 A. A. NNP 12235 109 1 Table table NN 12235 109 2 1 1 CD 12235 109 3 . . . 
12235 110 1 Measures measure NNS 12235 110 2 of of IN 12235 110 3 centrality centrality NN 12235 110 4 , , , 12235 110 5 standard standard JJ 12235 110 6 deviation deviation NN 12235 110 7 , , , 12235 110 8 range range NN 12235 110 9 , , , 12235 110 10 and and CC 12235 110 11 percentage percentage NN 12235 110 12 for for IN 12235 110 13 quantity quantity NN 12235 110 14 of of IN 12235 110 15 terms term NNS 12235 110 16 omitted omit VBN 12235 110 17 when when WRB 12235 110 18 the the DT 12235 110 19 Long Long NNP 12235 110 20 - - HYPH 12235 110 21 S S NNP 12235 110 22 is be VBZ 12235 110 23 not not RB 12235 110 24 corrected correct VBN 12235 110 25 to to IN 12235 110 26 a a DT 12235 110 27 standard standard NN 12235 110 28 < < XX 12235 110 29 s s NNP 12235 110 30 > > XX 12235 110 31 , , , 12235 110 32 rounded round VBD 12235 110 33 to to IN 12235 110 34 the the DT 12235 110 35 hundredth hundredth JJ 12235 110 36 . . . 12235 111 1 For for IN 12235 111 2 each each DT 12235 111 3 entry entry NN 12235 111 4 , , , 12235 111 5 a a DT 12235 111 6 maximum maximum NN 12235 111 7 of of IN 12235 111 8 40 40 CD 12235 111 9 terms term NNS 12235 111 10 were be VBD 12235 111 11 returned return VBN 12235 111 12 : : : 12235 111 13 20 20 CD 12235 111 14 using use VBG 12235 111 15 2018 2018 CD 12235 111 16 LCSH LCSH NNP 12235 111 17 and and CC 12235 111 18 20 20 CD 12235 111 19 using use VBG 12235 111 20 1910 1910 CD 12235 111 21 LCSH LCSH NNP 12235 111 22 . . . 12235 112 1 The the DT 12235 112 2 total total JJ 12235 112 3 results result NNS 12235 112 4 returned return VBD 12235 112 5 varies varie NNS 12235 112 6 according accord VBG 12235 112 7 to to IN 12235 112 8 the the DT 12235 112 9 entry entry NN 12235 112 10 length length NN 12235 112 11 . . . 12235 113 1 These these DT 12235 113 2 totals total NNS 12235 113 3 are be VBP 12235 113 4 reported report VBN 12235 113 5 in in IN 12235 113 6 appendix appendix NNP 12235 113 7 B. B. 
NNP 12235 114 1 ( ( -LRB- 12235 114 2 N= N= NNP 12235 114 3 30 30 CD 12235 114 4 entries entry NNS 12235 114 5 . . . 12235 114 6 ) ) -RRB- 12235 115 1 For for IN 12235 115 2 each each DT 12235 115 3 entry entry NN 12235 115 4 in in IN 12235 115 5 the the DT 12235 115 6 sample sample NN 12235 115 7 , , , 12235 115 8 the the DT 12235 115 9 results result NNS 12235 115 10 in in IN 12235 115 11 appendix appendix NNP 12235 115 12 A a DT 12235 115 13 display display NN 12235 115 14 the the DT 12235 115 15 total total JJ 12235 115 16 words word NNS 12235 115 17 omitted omit VBN 12235 115 18 when when WRB 12235 115 19 the the DT 12235 115 20 Long Long NNP 12235 115 21 - - HYPH 12235 115 22 S S NNP 12235 115 23 is be VBZ 12235 115 24 not not RB 12235 115 25 corrected correct VBN 12235 115 26 , , , 12235 115 27 the the DT 12235 115 28 number number NN 12235 115 29 of of IN 12235 115 30 2018 2018 CD 12235 115 31 LCSH LCSH NNP 12235 115 32 terms term NNS 12235 115 33 omitted omit VBN 12235 115 34 , , , 12235 115 35 the the DT 12235 115 36 number number NN 12235 115 37 of of IN 12235 115 38 1910 1910 CD 12235 115 39 LCSH LCSH NNP 12235 115 40 terms term NNS 12235 115 41 omitted omit VBD 12235 115 42 , , , 12235 115 43 and and CC 12235 115 44 the the DT 12235 115 45 encyclopedia encyclopedia NNS 12235 115 46 entry entry NN 12235 115 47 word word NN 12235 115 48 count count NN 12235 115 49 . . . 
12235 116 1 Figure figure NN 12235 116 2 5 5 CD 12235 116 3 visualizes visualize VBZ 12235 116 4 the the DT 12235 116 5 total total JJ 12235 116 6 number number NN 12235 116 7 of of IN 12235 116 8 terms term NNS 12235 116 9 omitted omit VBN 12235 116 10 for for IN 12235 116 11 each each DT 12235 116 12 entry entry NN 12235 116 13 when when WRB 12235 116 14 the the DT 12235 116 15 Long Long NNP 12235 116 16 - - HYPH 12235 116 17 S S NNP 12235 116 18 is be VBZ 12235 116 19 not not RB 12235 116 20 corrected correct VBN 12235 116 21 , , , 12235 116 22 demonstrating demonstrate VBG 12235 116 23 an an DT 12235 116 24 increase increase NN 12235 116 25 in in IN 12235 116 26 terms term NNS 12235 116 27 omitted omit VBN 12235 116 28 for for IN 12235 116 29 entries entry NNS 12235 116 30 with with IN 12235 116 31 lower low JJR 12235 116 32 word word NN 12235 116 33 counts count NNS 12235 116 34 . . . 12235 117 1 These these DT 12235 117 2 results result NNS 12235 117 3 are be VBP 12235 117 4 broken break VBN 12235 117 5 down down RP 12235 117 6 by by IN 12235 117 7 vocabulary vocabulary NN 12235 117 8 used use VBN 12235 117 9 in in IN 12235 117 10 figure figure NN 12235 117 11 6 6 CD 12235 117 12 , , , 12235 117 13 demonstrating demonstrate VBG 12235 117 14 that that IN 12235 117 15 both both DT 12235 117 16 vocabularies vocabulary NNS 12235 117 17 used use VBD 12235 117 18 to to TO 12235 117 19 generate generate VB 12235 117 20 these these DT 12235 117 21 results result NNS 12235 117 22 indicate indicate VBP 12235 117 23 a a DT 12235 117 24 significant significant JJ 12235 117 25 increase increase NN 12235 117 26 in in IN 12235 117 27 omitted omitted JJ 12235 117 28 terms term NNS 12235 117 29 for for IN 12235 117 30 shorter short JJR 12235 117 31 entries entry NNS 12235 117 32 . . . 
12235 118 1 Column1 Column1 NNP 12235 118 2 Both both DT 12235 118 3 Vocabularies vocabulary NNS 12235 118 4 2018 2018 CD 12235 118 5 LCSH LCSH NNP 12235 118 6 1910 1910 CD 12235 118 7 LCSH LCSH NNP 12235 118 8 Average Average NNP 12235 118 9 , , , 12235 118 10 Terms term NNS 12235 118 11 Omitted omit VBN 12235 118 12 6.73 6.73 CD 12235 118 13 3.67 3.67 CD 12235 118 14 3.07 3.07 CD 12235 118 15 Median Median NNP 12235 118 16 , , , 12235 118 17 Terms term NNS 12235 118 18 Omitted omit VBD 12235 118 19 5 5 CD 12235 118 20 3 3 CD 12235 118 21 2 2 CD 12235 118 22 Standard Standard NNP 12235 118 23 Deviation deviation NN 12235 118 24 6.53 6.53 CD 12235 118 25 3.84 3.84 CD 12235 118 26 3.17 3.17 CD 12235 118 27 Range Range NNP 12235 118 28 , , , 12235 118 29 Terms term NNS 12235 118 30 Omitted omit VBD 12235 118 31 0 0 CD 12235 118 32 - - SYM 12235 118 33 24 24 CD 12235 118 34 0 0 CD 12235 118 35 - - SYM 12235 118 36 13 13 CD 12235 118 37 0 0 CD 12235 118 38 - - HYPH 12235 118 39 11 11 CD 12235 118 40 Average Average NNP 12235 118 41 Percentage percentage NN 12235 118 42 , , , 12235 118 43 Omitted Omitted NNP 12235 118 44 Terms term NNS 12235 118 45 26.51 26.51 CD 12235 118 46 % % NN 12235 118 47 27.51 27.51 CD 12235 118 48 % % NN 12235 118 49 24.28 24.28 CD 12235 118 50 % % NN 12235 118 51 Median median JJ 12235 118 52 Percentage Percentage NNP 12235 118 53 , , , 12235 118 54 Omitted Omitted NNP 12235 118 55 Terms term NNS 12235 118 56 22.36 22.36 CD 12235 118 57 % % NN 12235 118 58 20.00 20.00 CD 12235 118 59 % % NN 12235 118 60 19.09 19.09 CD 12235 118 61 % % NN 12235 118 62 INFORMATION information NN 12235 118 63 TECHNOLOGY technology NN 12235 118 64 AND and CC 12235 118 65 LIBRARIES library NNS 12235 118 66 SEPTEMBER SEPTEMBER NNP 12235 118 67 2020 2020 CD 12235 118 68 EVALUATING evaluate VBG 12235 118 69 THE the DT 12235 118 70 IMPACT impact NN 12235 118 71 OF of IN 12235 118 72 THE the DT 12235 118 73 LONG long JJ 12235 118 74 - - HYPH 12235 118 75 S s NN 12235 
118 76 | | NNP 12235 118 77 GRABUS grabus NN 12235 118 78 8 8 CD 12235 118 79 Figure figure NN 12235 118 80 5 5 CD 12235 118 81 . . . 12235 119 1 Number number NN 12235 119 2 of of IN 12235 119 3 automatic automatic JJ 12235 119 4 subject subject JJ 12235 119 5 indexing indexing NN 12235 119 6 terms term NNS 12235 119 7 that that WDT 12235 119 8 are be VBP 12235 119 9 omitted omit VBN 12235 119 10 when when WRB 12235 119 11 the the DT 12235 119 12 Long Long NNP 12235 119 13 - - HYPH 12235 119 14 S S NNP 12235 119 15 is be VBZ 12235 119 16 not not RB 12235 119 17 corrected correct VBN 12235 119 18 to to IN 12235 119 19 a a DT 12235 119 20 standard standard NN 12235 119 21 < < XX 12235 119 22 s s XX 12235 119 23 > > XX 12235 119 24 as as IN 12235 119 25 compared compare VBN 12235 119 26 by by IN 12235 119 27 encyclopedia encyclopedia NNS 12235 119 28 entry entry NN 12235 119 29 word word NN 12235 119 30 count count NN 12235 119 31 . . . 12235 120 1 Figure figure NN 12235 120 2 6 6 CD 12235 120 3 . . . 
12235 121 1 Number number NN 12235 121 2 of of IN 12235 121 3 automatic automatic JJ 12235 121 4 subject subject JJ 12235 121 5 indexing indexing NN 12235 121 6 terms term NNS 12235 121 7 that that WDT 12235 121 8 are be VBP 12235 121 9 omitted omit VBN 12235 121 10 when when WRB 12235 121 11 the the DT 12235 121 12 Long Long NNP 12235 121 13 - - HYPH 12235 121 14 S S NNP 12235 121 15 is be VBZ 12235 121 16 not not RB 12235 121 17 corrected correct VBN 12235 121 18 to to IN 12235 121 19 a a DT 12235 121 20 standard standard NN 12235 121 21 < < XX 12235 121 22 s s XX 12235 121 23 > > XX 12235 121 24 as as IN 12235 121 25 compared compare VBN 12235 121 26 by by IN 12235 121 27 encyclopedia encyclopedia NNS 12235 121 28 entry entry NNP 12235 121 29 word word NN 12235 121 30 count count NN 12235 121 31 , , , 12235 121 32 separated separate VBN 12235 121 33 by by IN 12235 121 34 controlled control VBN 12235 121 35 vocabulary vocabulary JJ 12235 121 36 version version NN 12235 121 37 . . . 12235 122 1 INFORMATION INFORMATION NNP 12235 122 2 TECHNOLOGY technology NN 12235 122 3 AND and CC 12235 122 4 LIBRARIES library NNS 12235 122 5 SEPTEMBER SEPTEMBER NNP 12235 122 6 2020 2020 CD 12235 122 7 EVALUATING evaluate VBG 12235 122 8 THE the DT 12235 122 9 IMPACT impact NN 12235 122 10 OF of IN 12235 122 11 THE the DT 12235 122 12 LONG long JJ 12235 122 13 - - HYPH 12235 122 14 S S NNP 12235 122 15 | | NNP 12235 122 16 GRABUS grabus NN 12235 122 17 9 9 CD 12235 122 18 DISCUSSION discussion NN 12235 122 19 The the DT 12235 122 20 analysis analysis NN 12235 122 21 above above IN 12235 122 22 presents present VBZ 12235 122 23 measures measure NNS 12235 122 24 of of IN 12235 122 25 centrality centrality NN 12235 122 26 for for IN 12235 122 27 quantity quantity NN 12235 122 28 of of IN 12235 122 29 terms term NNS 12235 122 30 omitted omit VBN 12235 122 31 if if IN 12235 122 32 the the DT 12235 122 33 Long Long NNP 12235 122 34 - - HYPH 12235 122 35 S S NNP 12235 122 36 is be VBZ 
12235 122 37 not not RB 12235 122 38 corrected correct VBN 12235 122 39 to to IN 12235 122 40 a a DT 12235 122 41 standard standard NN 12235 122 42 < < XX 12235 122 43 s s NNP 12235 122 44 > > XX 12235 122 45 prior prior RB 12235 122 46 to to IN 12235 122 47 automatic automatic JJ 12235 122 48 subject subject JJ 12235 122 49 indexing indexing NN 12235 122 50 using use VBG 12235 122 51 HIVE HIVE NNP 12235 122 52 , , , 12235 122 53 as as RB 12235 122 54 well well RB 12235 122 55 as as IN 12235 122 56 a a DT 12235 122 57 visualization visualization NN 12235 122 58 to to TO 12235 122 59 represent represent VB 12235 122 60 the the DT 12235 122 61 relationship relationship NN 12235 122 62 between between IN 12235 122 63 encyclopedia encyclopedia NNP 12235 122 64 entry entry NN 12235 122 65 word word NN 12235 122 66 count count NN 12235 122 67 and and CC 12235 122 68 number number NN 12235 122 69 of of IN 12235 122 70 terms term NNS 12235 122 71 omitted omit VBN 12235 122 72 . . . 12235 123 1 Although although IN 12235 123 2 researchers researcher NNS 12235 123 3 have have VBP 12235 123 4 identified identify VBN 12235 123 5 challenges challenge NNS 12235 123 6 with with IN 12235 123 7 the the DT 12235 123 8 Long Long NNP 12235 123 9 - - HYPH 12235 123 10 S S NNP 12235 123 11 and and CC 12235 123 12 have have VBP 12235 123 13 focused focus VBN 12235 123 14 a a DT 12235 123 15 great great JJ 12235 123 16 deal deal NN 12235 123 17 on on IN 12235 123 18 the the DT 12235 123 19 technologies technology NNS 12235 123 20 and and CC 12235 123 21 methods method NNS 12235 123 22 used use VBN 12235 123 23 to to TO 12235 123 24 correct correct VB 12235 123 25 it -PRON- PRP 12235 123 26 , , , 12235 123 27 there there EX 12235 123 28 is be VBZ 12235 123 29 still still RB 12235 123 30 limited limited JJ 12235 123 31 work work NN 12235 123 32 on on IN 12235 123 33 looking look VBG 12235 123 34 at at IN 12235 123 35 the the DT 12235 123 36 results result NNS 12235 123 37 of of IN 12235 123 
38 not not RB 12235 123 39 correcting correct VBG 12235 123 40 the the DT 12235 123 41 Long long JJ 12235 123 42 - - HYPH 12235 123 43 S S NNP 12235 123 44 character character NN 12235 123 45 when when WRB 12235 123 46 performing perform VBG 12235 123 47 an an DT 12235 123 48 automatic automatic JJ 12235 123 49 subject subject JJ 12235 123 50 indexing indexing NN 12235 123 51 sequence sequence NN 12235 123 52 . . . 12235 124 1 This this DT 12235 124 2 research research NN 12235 124 3 demonstrated demonstrate VBD 12235 124 4 an an DT 12235 124 5 average average NN 12235 124 6 of of IN 12235 124 7 6.73 6.73 CD 12235 124 8 potentially potentially RB 12235 124 9 relevant relevant JJ 12235 124 10 terms term NNS 12235 124 11 omitted omit VBN 12235 124 12 from from IN 12235 124 13 automatic automatic JJ 12235 124 14 indexing indexing NN 12235 124 15 results result NNS 12235 124 16 when when WRB 12235 124 17 the the DT 12235 124 18 Long Long NNP 12235 124 19 - - HYPH 12235 124 20 S S NNP 12235 124 21 is be VBZ 12235 124 22 not not RB 12235 124 23 corrected correct VBN 12235 124 24 , , , 12235 124 25 accounting account VBG 12235 124 26 for for IN 12235 124 27 an an DT 12235 124 28 average average NN 12235 124 29 of of IN 12235 124 30 26.51 26.51 CD 12235 124 31 percent percent NN 12235 124 32 of of IN 12235 124 33 the the DT 12235 124 34 total total JJ 12235 124 35 results result NNS 12235 124 36 , , , 12235 124 37 with with IN 12235 124 38 an an DT 12235 124 39 approximately approximately RB 12235 124 40 equal equal JJ 12235 124 41 distribution distribution NN 12235 124 42 of of IN 12235 124 43 omitted omit VBN 12235 124 44 terms term NNS 12235 124 45 across across IN 12235 124 46 the the DT 12235 124 47 two two CD 12235 124 48 controlled control VBN 12235 124 49 vocabulary vocabulary JJ 12235 124 50 versions version NNS 12235 124 51 used use VBN 12235 124 52 . . . 
12235 125 1 When when WRB 12235 125 2 the the DT 12235 125 3 quantity quantity NN 12235 125 4 of of IN 12235 125 5 terms term NNS 12235 125 6 omitted omit VBN 12235 125 7 is be VBZ 12235 125 8 visualized visualize VBN 12235 125 9 using use VBG 12235 125 10 a a DT 12235 125 11 continuous continuous JJ 12235 125 12 scatterplot scatterplot NN 12235 125 13 , , , 12235 125 14 the the DT 12235 125 15 results result NNS 12235 125 16 also also RB 12235 125 17 demonstrated demonstrate VBD 12235 125 18 a a DT 12235 125 19 significant significant JJ 12235 125 20 increase increase NN 12235 125 21 in in IN 12235 125 22 omitted omitted JJ 12235 125 23 terms term NNS 12235 125 24 for for IN 12235 125 25 shorter short JJR 12235 125 26 entries entry NNS 12235 125 27 , , , 12235 125 28 with with IN 12235 125 29 longer long JJR 12235 125 30 entries entry NNS 12235 125 31 less less RBR 12235 125 32 affected affected JJ 12235 125 33 . . . 12235 126 1 These these DT 12235 126 2 results result NNS 12235 126 3 reflect reflect VBP 12235 126 4 the the DT 12235 126 5 impact impact NN 12235 126 6 of of IN 12235 126 7 term term NN 12235 126 8 frequency frequency NN 12235 126 9 and and CC 12235 126 10 total total JJ 12235 126 11 word word NN 12235 126 12 count count NN 12235 126 13 in in IN 12235 126 14 keyword keyword NNP 12235 126 15 extraction extraction NNP 12235 126 16 and and CC 12235 126 17 automatic automatic JJ 12235 126 18 subject subject JJ 12235 126 19 indexing indexing NN 12235 126 20 , , , 12235 126 21 with with IN 12235 126 22 longer long JJR 12235 126 23 documents document NNS 12235 126 24 having have VBG 12235 126 25 a a DT 12235 126 26 greater great JJR 12235 126 27 pool pool NN 12235 126 28 of of IN 12235 126 29 total total JJ 12235 126 30 terms term NNS 12235 126 31 from from IN 12235 126 32 which which WDT 12235 126 33 to to TO 12235 126 34 identify identify VB 12235 126 35 key key JJ 12235 126 36 terms term NNS 12235 126 37 . . . 
12235 127 1 Considering consider VBG 12235 127 2 the the DT 12235 127 3 complexities complexity NNS 12235 127 4 and and CC 12235 127 5 similarities similarity NNS 12235 127 6 of of IN 12235 127 7 the the DT 12235 127 8 typographical typographical JJ 12235 127 9 characters character NNS 12235 127 10 in in IN 12235 127 11 the the DT 12235 127 12 original original JJ 12235 127 13 manuscript manuscript NN 12235 127 14 , , , 12235 127 15 the the DT 12235 127 16 OCR OCR NNP 12235 127 17 output output NN 12235 127 18 process process NN 12235 127 19 for for IN 12235 127 20 this this DT 12235 127 21 corpus corpus NNP 12235 127 22 occasionally occasionally RB 12235 127 23 mistakes mistake VBZ 12235 127 24 the the DT 12235 127 25 letters letter NNS 12235 127 26 < < XX 12235 127 27 s s NNP 12235 127 28 > > XX 12235 127 29 , , , 12235 127 30 < < XX 12235 127 31 f f NNP 12235 127 32 > > XX 12235 127 33 , , , 12235 127 34 < < XX 12235 127 35 r r NNP 12235 127 36 > > XX 12235 127 37 , , , 12235 127 38 and and CC 12235 127 39 < < XX 12235 127 40 l l NNP 12235 127 41 > > XX 12235 127 42 . . . 12235 128 1 As as IN 12235 128 2 a a DT 12235 128 3 result result NN 12235 128 4 , , , 12235 128 5 an an DT 12235 128 6 occasional occasional JJ 12235 128 7 Long long JJ 12235 128 8 - - HYPH 12235 128 9 S s NN 12235 128 10 word word NN 12235 128 11 in in IN 12235 128 12 this this DT 12235 128 13 study study NN 12235 128 14 did do VBD 12235 128 15 not not RB 12235 128 16 originally originally RB 12235 128 17 contain contain VB 12235 128 18 an an DT 12235 128 19 < < XX 12235 128 20 s s XX 12235 128 21 > > XX 12235 128 22 ( ( -LRB- 12235 128 23 e.g. e.g. RB 12235 128 24 , , , 12235 128 25 sor sor NN 12235 128 26 instead instead RB 12235 128 27 of of IN 12235 128 28 for for IN 12235 128 29 ) ) -RRB- 12235 128 30 . . . 
12235 129 1 Correction correction NN 12235 129 2 of of IN 12235 129 3 these these DT 12235 129 4 Long long JJ 12235 129 5 - - HYPH 12235 129 6 S S NNP 12235 129 7 OCR OCR NNP 12235 129 8 errors error NNS 12235 129 9 requires require VBZ 12235 129 10 the the DT 12235 129 11 development development NN 12235 129 12 of of IN 12235 129 13 a a DT 12235 129 14 dictionary dictionary NN 12235 129 15 - - HYPH 12235 129 16 based base VBN 12235 129 17 script script NN 12235 129 18 . . . 12235 130 1 An an DT 12235 130 2 additional additional JJ 12235 130 3 complication complication NN 12235 130 4 of of IN 12235 130 5 this this DT 12235 130 6 research research NN 12235 130 7 is be VBZ 12235 130 8 that that IN 12235 130 9 the the DT 12235 130 10 corrected correct VBN 12235 130 11 OCR ocr NN 12235 130 12 output output NN 12235 130 13 for for IN 12235 130 14 the the DT 12235 130 15 encyclopedia encyclopedia JJ 12235 130 16 entries entry NNS 12235 130 17 still still RB 12235 130 18 contains contain VBZ 12235 130 19 a a DT 12235 130 20 few few JJ 12235 130 21 errors error NNS 12235 130 22 not not RB 12235 130 23 related relate VBN 12235 130 24 to to IN 12235 130 25 the the DT 12235 130 26 Long Long NNP 12235 130 27 - - HYPH 12235 130 28 S S NNP 12235 130 29 , , , 12235 130 30 which which WDT 12235 130 31 will will MD 12235 130 32 prevent prevent VB 12235 130 33 the the DT 12235 130 34 mapping mapping NN 12235 130 35 of of IN 12235 130 36 the the DT 12235 130 37 term term NN 12235 130 38 to to IN 12235 130 39 any any DT 12235 130 40 controlled controlled JJ 12235 130 41 vocabulary vocabulary NN 12235 130 42 term term NN 12235 130 43 ( ( -LRB- 12235 130 44 e.g. e.g. 
RB 12235 130 45 , , , 12235 130 46 in in IN 12235 130 47 the the DT 12235 130 48 entry entry NN 12235 130 49 on on IN 12235 130 50 Sepulchre Sepulchre NNP 12235 130 51 , , , 12235 130 52 the the DT 12235 130 53 OCR OCR NNP 12235 130 54 output output NN 12235 130 55 for for IN 12235 130 56 the the DT 12235 130 57 term term NN 12235 130 58 Palestine Palestine NNP 12235 130 59 was be VBD 12235 130 60 Palestinc Palestinc NNP 12235 130 61 ) ) -RRB- 12235 130 62 . . . 12235 131 1 These these DT 12235 131 2 results result NNS 12235 131 3 are be VBP 12235 131 4 specific specific JJ 12235 131 5 to to IN 12235 131 6 this this DT 12235 131 7 particular particular JJ 12235 131 8 corpus corpus NN 12235 131 9 of of IN 12235 131 10 3rd 3rd NNP 12235 131 11 edition edition NN 12235 131 12 Encyclopedia Encyclopedia NNP 12235 131 13 Britannica Britannica NNP 12235 131 14 entries entrie VBZ 12235 131 15 , , , 12235 131 16 but but CC 12235 131 17 it -PRON- PRP 12235 131 18 is be VBZ 12235 131 19 very very RB 12235 131 20 likely likely JJ 12235 131 21 that that IN 12235 131 22 testing test VBG 12235 131 23 another another DT 12235 131 24 set set NN 12235 131 25 of of IN 12235 131 26 pre-1800s pre-1800s `` 12235 131 27 documents document NNS 12235 131 28 containing contain VBG 12235 131 29 the the DT 12235 131 30 Long Long NNP 12235 131 31 - - HYPH 12235 131 32 S S NNP 12235 131 33 would would MD 12235 131 34 also also RB 12235 131 35 illustrate illustrate VB 12235 131 36 that that IN 12235 131 37 for for IN 12235 131 38 best good JJS 12235 131 39 results result NNS 12235 131 40 with with IN 12235 131 41 any any DT 12235 131 42 algorithm algorithm NN 12235 131 43 or or CC 12235 131 44 tool tool NN 12235 131 45 , , , 12235 131 46 the the DT 12235 131 47 Long Long NNP 12235 131 48 - - HYPH 12235 131 49 S S NNP 12235 131 50 needs need VBZ 12235 131 51 to to TO 12235 131 52 be be VB 12235 131 53 corrected correct VBN 12235 131 54 . . . 
12235 132 1 The the DT 12235 132 2 results result NNS 12235 132 3 are be VBP 12235 132 4 also also RB 12235 132 5 specific specific JJ 12235 132 6 to to IN 12235 132 7 the the DT 12235 132 8 two two CD 12235 132 9 versions version NNS 12235 132 10 of of IN 12235 132 11 the the DT 12235 132 12 LCSH LCSH NNP 12235 132 13 used use VBD 12235 132 14 , , , 12235 132 15 both both CC 12235 132 16 the the DT 12235 132 17 1910 1910 CD 12235 132 18 LCSH LCSH NNP 12235 132 19 and and CC 12235 132 20 the the DT 12235 132 21 2018 2018 CD 12235 132 22 LCSH LCSH NNP 12235 132 23 , , , 12235 132 24 which which WDT 12235 132 25 are be VBP 12235 132 26 available available JJ 12235 132 27 in in IN 12235 132 28 the the DT 12235 132 29 HIVE HIVE NNP 12235 132 30 tool tool NN 12235 132 31 . . . 12235 133 1 The the DT 12235 133 2 1910 1910 CD 12235 133 3 version version NN 12235 133 4 is be VBZ 12235 133 5 key key JJ 12235 133 6 for for IN 12235 133 7 the the DT 12235 133 8 time time NN 12235 133 9 period period NN 12235 133 10 being be VBG 12235 133 11 studied study VBN 12235 133 12 , , , 12235 133 13 and and CC 12235 133 14 the the DT 12235 133 15 2018 2018 CD 12235 133 16 , , , 12235 133 17 more more RBR 12235 133 18 contemporary contemporary JJ 12235 133 19 to to IN 12235 133 20 today today NN 12235 133 21 , , , 12235 133 22 has have VBZ 12235 133 23 supported support VBN 12235 133 24 additional additional JJ 12235 133 25 analysis analysis NN 12235 133 26 on on IN 12235 133 27 the the DT 12235 133 28 impact impact NN 12235 133 29 of of IN 12235 133 30 the the DT 12235 133 31 Long Long NNP 12235 133 32 - - HYPH 12235 133 33 S. S. 
NNP 12235 134 1 Both both DT 12235 134 2 of of IN 12235 134 3 these these DT 12235 134 4 vocabularies vocabulary NNS 12235 134 5 are be VBP 12235 134 6 important important JJ 12235 134 7 to to IN 12235 134 8 the the DT 12235 134 9 larger large JJR 12235 134 10 19th 19th JJ 12235 134 11 - - HYPH 12235 134 12 Century Century NNP 12235 134 13 Knowledge Knowledge NNP 12235 134 14 Project Project NNP 12235 134 15 . . . 12235 135 1 It -PRON- PRP 12235 135 2 should should MD 12235 135 3 be be VB 12235 135 4 noted note VBN 12235 135 5 that that IN 12235 135 6 while while IN 12235 135 7 the the DT 12235 135 8 LCSH LCSH NNP 12235 135 9 is be VBZ 12235 135 10 updated update VBN 12235 135 11 weekly weekly RB 12235 135 12 , , , 12235 135 13 we -PRON- PRP 12235 135 14 were be VBD 12235 135 15 limited limited JJ 12235 135 16 to to IN 12235 135 17 what what WP 12235 135 18 is be VBZ 12235 135 19 available available JJ 12235 135 20 via via IN 12235 135 21 the the DT 12235 135 22 HIVE HIVE NNP 12235 135 23 tool tool NN 12235 135 24 , , , 12235 135 25 and and CC 12235 135 26 any any DT 12235 135 27 discrepancies discrepancy NNS 12235 135 28 that that WDT 12235 135 29 may may MD 12235 135 30 be be VB 12235 135 31 found find VBN 12235 135 32 with with IN 12235 135 33 the the DT 12235 135 34 2020 2020 CD 12235 135 35 LCSH LCSH NNP 12235 135 36 will will MD 12235 135 37 very very RB 12235 135 38 likely likely RB 12235 135 39 have have VB 12235 135 40 a a DT 12235 135 41 minimal minimal JJ 12235 135 42 effect effect NN 12235 135 43 upon upon IN 12235 135 44 metadata metadata NN 12235 135 45 generation generation NN 12235 135 46 results result NNS 12235 135 47 . . . 
12235 136 1 It -PRON- PRP 12235 136 2 should should MD 12235 136 3 be be VB 12235 136 4 noted note VBN 12235 136 5 that that IN 12235 136 6 the the DT 12235 136 7 2020 2020 CD 12235 136 8 LCSH LCSH NNP 12235 136 9 will will MD 12235 136 10 be be VB 12235 136 11 incorporated incorporate VBN 12235 136 12 into into IN 12235 136 13 HIVE HIVE NNP 12235 136 14 soon soon RB 12235 136 15 and and CC 12235 136 16 can can MD 12235 136 17 be be VB 12235 136 18 explored explore VBN 12235 136 19 in in IN 12235 136 20 future future JJ 12235 136 21 research research NN 12235 136 22 . . . 12235 137 1 CONCLUSION conclusion NN 12235 137 2 AND and CC 12235 137 3 NEXT NEXT NNP 12235 137 4 STEPS STEPS NNP 12235 137 5 The the DT 12235 137 6 objective objective NN 12235 137 7 of of IN 12235 137 8 this this DT 12235 137 9 research research NN 12235 137 10 was be VBD 12235 137 11 to to TO 12235 137 12 determine determine VB 12235 137 13 the the DT 12235 137 14 impact impact NN 12235 137 15 of of IN 12235 137 16 correcting correct VBG 12235 137 17 the the DT 12235 137 18 Long Long NNP 12235 137 19 - - HYPH 12235 137 20 S s NN 12235 137 21 in in IN 12235 137 22 pre-1800s pre-1800s , 12235 137 23 documents document NNS 12235 137 24 when when WRB 12235 137 25 performing perform VBG 12235 137 26 an an DT 12235 137 27 automatic automatic JJ 12235 137 28 metadata metadata NN 12235 137 29 generation generation NN 12235 137 30 sequence sequence NN 12235 137 31 using use VBG 12235 137 32 keyword keyword NNP 12235 137 33 extraction extraction NNP 12235 137 34 and and CC 12235 137 35 controlled control VBN 12235 137 36 vocabulary vocabulary JJ 12235 137 37 mapping mapping NN 12235 137 38 . . . 
12235 138 1 This this DT 12235 138 2 was be VBD 12235 138 3 accomplished accomplish VBN 12235 138 4 by by IN 12235 138 5 performing perform VBG 12235 138 6 an an DT 12235 138 7 automatic automatic JJ 12235 138 8 subject subject JJ 12235 138 9 indexing indexing NN 12235 138 10 sequence sequence NN 12235 138 11 using use VBG 12235 138 12 the the DT 12235 138 13 HIVE HIVE NNP 12235 138 14 tool tool NN 12235 138 15 , , , 12235 138 16 followed follow VBN 12235 138 17 by by IN 12235 138 18 a a DT 12235 138 19 basic basic JJ 12235 138 20 statistical statistical JJ 12235 138 21 analysis analysis NN 12235 138 22 to to TO 12235 138 23 determine determine VB 12235 138 24 the the DT 12235 138 25 quantity quantity NN 12235 138 26 of of IN 12235 138 27 terms term NNS 12235 138 28 omitted omit VBN 12235 138 29 from from IN 12235 138 30 the the DT 12235 138 31 results result NNS 12235 138 32 when when WRB 12235 138 33 the the DT 12235 138 34 Long Long NNP 12235 138 35 - - HYPH 12235 138 36 S S NNP 12235 138 37 is be VBZ 12235 138 38 not not RB 12235 138 39 corrected correct VBN 12235 138 40 to to IN 12235 138 41 a a DT 12235 138 42 standard standard NN 12235 138 43 < < XX 12235 138 44 s s NNP 12235 138 45 > > XX 12235 138 46 . . . 
12235 139 1 The the DT 12235 139 2 number number NN 12235 139 3 of of IN 12235 139 4 omitted omit VBN 12235 139 5 terms term NNS 12235 139 6 was be VBD 12235 139 7 also also RB 12235 139 8 compared compare VBN 12235 139 9 with with IN 12235 139 10 the the DT 12235 139 11 encyclopedia encyclopedia NNS 12235 139 12 entry entry NN 12235 139 13 word word NN 12235 139 14 count count NN 12235 139 15 and and CC 12235 139 16 visualized visualize VBN 12235 139 17 to to TO 12235 139 18 demonstrate demonstrate VB 12235 139 19 a a DT 12235 139 20 significant significant JJ 12235 139 21 increase increase NN 12235 139 22 in in IN 12235 139 23 omitted omitted JJ 12235 139 24 terms term NNS 12235 139 25 for for IN 12235 139 26 shorter short JJR 12235 139 27 INFORMATION information NN 12235 139 28 TECHNOLOGY technology NN 12235 139 29 AND and CC 12235 139 30 LIBRARIES library NNS 12235 139 31 SEPTEMBER SEPTEMBER NNP 12235 139 32 2020 2020 CD 12235 139 33 EVALUATING evaluate VBG 12235 139 34 THE the DT 12235 139 35 IMPACT impact NN 12235 139 36 OF of IN 12235 139 37 THE the DT 12235 139 38 LONG long JJ 12235 139 39 - - HYPH 12235 139 40 S S NNP 12235 139 41 | | NNP 12235 139 42 GRABUS grabus NN 12235 139 43 10 10 CD 12235 139 44 encyclopedia encyclopedia NN 12235 139 45 entries entry NNS 12235 139 46 . . . 12235 140 1 The the DT 12235 140 2 study study NN 12235 140 3 was be VBD 12235 140 4 conclusive conclusive JJ 12235 140 5 in in IN 12235 140 6 confirming confirm VBG 12235 140 7 that that IN 12235 140 8 the the DT 12235 140 9 correction correction NN 12235 140 10 of of IN 12235 140 11 the the DT 12235 140 12 Long Long NNP 12235 140 13 - - HYPH 12235 140 14 S S NNP 12235 140 15 is be VBZ 12235 140 16 a a DT 12235 140 17 critical critical JJ 12235 140 18 part part NN 12235 140 19 of of IN 12235 140 20 our -PRON- PRP$ 12235 140 21 workflow workflow NN 12235 140 22 . . . 
12235 141 1 The the DT 12235 141 2 significance significance NN 12235 141 3 of of IN 12235 141 4 this this DT 12235 141 5 research research NN 12235 141 6 is be VBZ 12235 141 7 that that IN 12235 141 8 it -PRON- PRP 12235 141 9 demonstrates demonstrate VBZ 12235 141 10 the the DT 12235 141 11 necessity necessity NN 12235 141 12 of of IN 12235 141 13 correcting correct VBG 12235 141 14 the the DT 12235 141 15 Long Long NNP 12235 141 16 - - HYPH 12235 141 17 S S NNP 12235 141 18 prior prior RB 12235 141 19 to to IN 12235 141 20 performing perform VBG 12235 141 21 an an DT 12235 141 22 automatic automatic JJ 12235 141 23 subject subject JJ 12235 141 24 indexing indexing NN 12235 141 25 on on IN 12235 141 26 historical historical JJ 12235 141 27 documents document NNS 12235 141 28 . . . 12235 142 1 Beyond beyond IN 12235 142 2 the the DT 12235 142 3 correction correction NN 12235 142 4 of of IN 12235 142 5 the the DT 12235 142 6 Long Long NNP 12235 142 7 - - HYPH 12235 142 8 S S NNP 12235 142 9 , , , 12235 142 10 the the DT 12235 142 11 larger large JJR 12235 142 12 next next JJ 12235 142 13 steps step NNS 12235 142 14 for for IN 12235 142 15 this this DT 12235 142 16 project project NN 12235 142 17 are be VBP 12235 142 18 to to TO 12235 142 19 continue continue VB 12235 142 20 to to TO 12235 142 21 explore explore VB 12235 142 22 automatic automatic JJ 12235 142 23 metadata metadata NN 12235 142 24 generation generation NN 12235 142 25 for for IN 12235 142 26 this this DT 12235 142 27 corpus corpus NN 12235 142 28 . . . 12235 143 1 These these DT 12235 143 2 next next JJ 12235 143 3 steps step NNS 12235 143 4 include include VBP 12235 143 5 the the DT 12235 143 6 comparison comparison NN 12235 143 7 of of IN 12235 143 8 results result NNS 12235 143 9 using use VBG 12235 143 10 contemporary contemporary JJ 12235 143 11 vs. vs. 
IN 12235 143 12 historical historical JJ 12235 143 13 vocabularies vocabulary NNS 12235 143 14 and and CC 12235 143 15 streamlining streamline VBG 12235 143 16 a a DT 12235 143 17 protocol protocol NN 12235 143 18 for for IN 12235 143 19 bulk bulk JJ 12235 143 20 classification classification NN 12235 143 21 procedures procedure NNS 12235 143 22 and and CC 12235 143 23 integration integration NN 12235 143 24 of of IN 12235 143 25 terms term NNS 12235 143 26 into into IN 12235 143 27 the the DT 12235 143 28 TEI tei NN 12235 143 29 - - HYPH 12235 143 30 XML xml NN 12235 143 31 headers header NNS 12235 143 32 . . . 12235 144 1 The the DT 12235 144 2 research research NN 12235 144 3 presented present VBN 12235 144 4 here here RB 12235 144 5 can can MD 12235 144 6 inform inform VB 12235 144 7 other other JJ 12235 144 8 digital digital JJ 12235 144 9 humanities humanity NNS 12235 144 10 and and CC 12235 144 11 even even RB 12235 144 12 science science NN 12235 144 13 - - HYPH 12235 144 14 oriented orient VBN 12235 144 15 projects project NNS 12235 144 16 , , , 12235 144 17 where where WRB 12235 144 18 researchers researcher NNS 12235 144 19 may may MD 12235 144 20 not not RB 12235 144 21 be be VB 12235 144 22 aware aware JJ 12235 144 23 of of IN 12235 144 24 the the DT 12235 144 25 impact impact NN 12235 144 26 of of IN 12235 144 27 the the DT 12235 144 28 Long Long NNP 12235 144 29 - - HYPH 12235 144 30 S S NNP 12235 144 31 on on IN 12235 144 32 automatic automatic JJ 12235 144 33 metadata metadata NN 12235 144 34 generation generation NN 12235 144 35 not not RB 12235 144 36 only only RB 12235 144 37 for for IN 12235 144 38 subjects subject NNS 12235 144 39 , , , 12235 144 40 but but CC 12235 144 41 also also RB 12235 144 42 named name VBN 12235 144 43 entities entity NNS 12235 144 44 , , , 12235 144 45 particularly particularly RB 12235 144 46 when when WRB 12235 144 47 automatic automatic JJ 12235 144 48 approaches approach NNS 12235 144 49 with with IN 12235 144 50 
controlled controlled JJ 12235 144 51 vocabularies vocabulary NNS 12235 144 52 are be VBP 12235 144 53 desired desire VBN 12235 144 54 . . . 12235 145 1 ACKNOWLEDGEMENTS ACKNOWLEDGEMENTS NNP 12235 145 2 The the DT 12235 145 3 author author NN 12235 145 4 thanks thank NNS 12235 145 5 Dr. Dr. NNP 12235 145 6 Jane Jane NNP 12235 145 7 Greenberg Greenberg NNP 12235 145 8 and and CC 12235 145 9 Dr. Dr. NNP 12235 145 10 Peter Peter NNP 12235 145 11 Logan Logan NNP 12235 145 12 for for IN 12235 145 13 their -PRON- PRP$ 12235 145 14 guidance guidance NN 12235 145 15 . . . 12235 146 1 The the DT 12235 146 2 author author NN 12235 146 3 acknowledges acknowledge VBZ 12235 146 4 the the DT 12235 146 5 support support NN 12235 146 6 of of IN 12235 146 7 the the DT 12235 146 8 NEH NEH NNP 12235 146 9 grant grant NN 12235 146 10 # # NNP 12235 146 11 HAA-261228 HAA-261228 NNP 12235 146 12 - - HYPH 12235 146 13 18 18 CD 12235 146 14 . . . 12235 147 1 INFORMATION INFORMATION NNP 12235 147 2 TECHNOLOGY technology NN 12235 147 3 AND and CC 12235 147 4 LIBRARIES library NNS 12235 147 5 SEPTEMBER SEPTEMBER NNP 12235 147 6 2020 2020 CD 12235 147 7 EVALUATING evaluate VBG 12235 147 8 THE the DT 12235 147 9 IMPACT impact NN 12235 147 10 OF of IN 12235 147 11 THE the DT 12235 147 12 LONG long JJ 12235 147 13 - - HYPH 12235 147 14 S S NNP 12235 147 15 | | NNP 12235 147 16 GRABUS grabus NN 12235 147 17 11 11 CD 12235 147 18 APPENDIX APPENDIX NNP 12235 147 19 A a DT 12235 147 20 Entry Entry NNP 12235 147 21 Term Term NNP 12235 147 22 Total Total NNP 12235 147 23 Words word NNS 12235 147 24 Omitted omit VBN 12235 147 25 2018 2018 CD 12235 147 26 LCSH LCSH NNP 12235 147 27 Terms term NNS 12235 147 28 Omitted omit VBD 12235 147 29 1910 1910 CD 12235 147 30 LCSH LCSH NNP 12235 147 31 Terms term NNS 12235 147 32 Omitted omit VBD 12235 147 33 Encyclopedia Encyclopedia NNS 12235 147 34 Entry entry NN 12235 147 35 Word word NN 12235 147 36 Count Count NNP 12235 147 37 SARDIS SARDIS NNP 12235 147 38 24 
24 CD 12235 147 39 13 13 CD 12235 147 40 11 11 CD 12235 147 41 381 381 CD 12235 147 42 SUCTION suction NN 12235 147 43 24 24 CD 12235 147 44 13 13 CD 12235 147 45 11 11 CD 12235 147 46 38 38 CD 12235 147 47 STYLITES stylite NNS 12235 147 48 , , , 12235 147 49 PILLAR pillar NN 12235 147 50 SAINTS SAINTS NNP 12235 147 51 19 19 CD 12235 147 52 13 13 CD 12235 147 53 6 6 CD 12235 147 54 199 199 CD 12235 147 55 SHADWELL SHADWELL NNP 12235 147 56 14 14 CD 12235 147 57 10 10 CD 12235 147 58 4 4 CD 12235 147 59 211 211 CD 12235 147 60 SALICORNIA SALICORNIA NNP 12235 147 61 13 13 CD 12235 147 62 6 6 CD 12235 147 63 7 7 CD 12235 147 64 254 254 CD 12235 147 65 SEPULCHRE sepulchre NN 12235 147 66 11 11 CD 12235 147 67 3 3 CD 12235 147 68 8 8 CD 12235 147 69 348 348 CD 12235 147 70 SITTA sitta NN 12235 147 71 NUTHATCH nuthatch NN 12235 147 72 9 9 CD 12235 147 73 5 5 CD 12235 147 74 4 4 CD 12235 147 75 620 620 CD 12235 147 76 SPRAT SPRAT NNS 12235 147 77 9 9 CD 12235 147 78 3 3 CD 12235 147 79 6 6 CD 12235 147 80 475 475 CD 12235 147 81 SERAPIS SERAPIS NNP 12235 147 82 8 8 CD 12235 147 83 5 5 CD 12235 147 84 3 3 CD 12235 147 85 587 587 CD 12235 147 86 STRADA STRADA NNP 12235 147 87 8 8 CD 12235 147 88 1 1 CD 12235 147 89 7 7 CD 12235 147 90 189 189 CD 12235 147 91 SHOAD shoad NN 12235 147 92 7 7 CD 12235 147 93 4 4 CD 12235 147 94 3 3 CD 12235 147 95 463 463 CD 12235 147 96 SIGN sign NN 12235 147 97 7 7 CD 12235 147 98 5 5 CD 12235 147 99 2 2 CD 12235 147 100 68 68 CD 12235 147 101 SHOOTING shooting SYM 12235 147 102 6 6 CD 12235 147 103 3 3 CD 12235 147 104 3 3 CD 12235 147 105 6114 6114 CD 12235 147 106 STRATA STRATA VBD 12235 147 107 6 6 CD 12235 147 108 3 3 CD 12235 147 109 3 3 CD 12235 147 110 2920 2920 CD 12235 147 111 STEWARTIA STEWARTIA VBN 12235 147 112 5 5 CD 12235 147 113 4 4 CD 12235 147 114 1 1 CD 12235 147 115 72 72 CD 12235 147 116 SUBCLAVIAN SUBCLAVIAN NNP 12235 147 117 5 5 CD 12235 147 118 3 3 CD 12235 147 119 2 2 CD 12235 147 120 20 20 CD 12235 147 121 
SCHWEINFURT schweinfurt NN 12235 147 122 4 4 CD 12235 147 123 2 2 CD 12235 147 124 2 2 CD 12235 147 125 84 84 CD 12235 147 126 SCROLL scroll NN 12235 147 127 4 4 CD 12235 147 128 2 2 CD 12235 147 129 2 2 CD 12235 147 130 45 45 CD 12235 147 131 SPALATRO SPALATRO NNP 12235 147 132 4 4 CD 12235 147 133 3 3 CD 12235 147 134 1 1 CD 12235 147 135 99 99 CD 12235 147 136 SPECIAL special NN 12235 147 137 4 4 CD 12235 147 138 3 3 CD 12235 147 139 1 1 CD 12235 147 140 24 24 CD 12235 147 141 SAMOGITIA SAMOGITIA NNP 12235 147 142 3 3 CD 12235 147 143 2 2 CD 12235 147 144 1 1 CD 12235 147 145 112 112 CD 12235 147 146 SHAKESPEARE shakespeare NN 12235 147 147 3 3 CD 12235 147 148 0 0 CD 12235 147 149 3 3 CD 12235 147 150 3855 3855 CD 12235 147 151 SINAPISM sinapism NN 12235 147 152 2 2 CD 12235 147 153 1 1 CD 12235 147 154 1 1 CD 12235 147 155 25 25 CD 12235 147 156 SECT SECT NNS 12235 147 157 1 1 CD 12235 147 158 1 1 CD 12235 147 159 0 0 CD 12235 147 160 20 20 CD 12235 147 161 SEVERINO severino NN 12235 147 162 1 1 CD 12235 147 163 1 1 CD 12235 147 164 0 0 CD 12235 147 165 38 38 CD 12235 147 166 SHADDOCK SHADDOCK NNP 12235 147 167 1 1 CD 12235 147 168 1 1 CD 12235 147 169 0 0 CD 12235 147 170 6 6 CD 12235 147 171 SCARLET SCARLET NNP 12235 147 172 0 0 CD 12235 147 173 0 0 CD 12235 147 174 0 0 CD 12235 147 175 65 65 CD 12235 147 176 SHALLOP shallop NN 12235 147 177 , , , 12235 147 178 SHALLOOP shalloop NN 12235 147 179 0 0 NFP 12235 147 180 0 0 CD 12235 147 181 0 0 CD 12235 147 182 42 42 CD 12235 147 183 SOLDANELLA soldanella NN 12235 147 184 0 0 NFP 12235 147 185 0 0 CD 12235 147 186 0 0 CD 12235 147 187 56 56 CD 12235 147 188 SPOLETTO spoletto NN 12235 147 189 0 0 CD 12235 147 190 0 0 CD 12235 147 191 0 0 CD 12235 147 192 99 99 CD 12235 147 193 INFORMATION information NN 12235 147 194 TECHNOLOGY technology NN 12235 147 195 AND and CC 12235 147 196 LIBRARIES library NNS 12235 147 197 SEPTEMBER SEPTEMBER NNP 12235 147 198 2020 2020 CD 12235 147 199 EVALUATING evaluate VBG 12235 147 
200 THE the DT 12235 147 201 IMPACT impact NN 12235 147 202 OF of IN 12235 147 203 THE the DT 12235 147 204 LONG long JJ 12235 147 205 - - HYPH 12235 147 206 S S NNP 12235 147 207 | | NNP 12235 147 208 GRABUS grabus NN 12235 147 209 12 12 CD 12235 147 210 APPENDIX APPENDIX NNP 12235 147 211 B B NNP 12235 147 212 * * NNP 12235 147 213 N n NN 12235 147 214 = = SYM 12235 147 215 30 30 CD 12235 147 216 entries entry NNS 12235 147 217 Average average JJ 12235 147 218 Terms term NNS 12235 147 219 Returned return VBD 12235 147 220 Median median JJ 12235 147 221 Terms term NNS 12235 147 222 Returned return VBD 12235 147 223 Corrected Corrected NNP 12235 147 224 24.77 24.77 CD 12235 147 225 / / SYM 12235 147 226 40 40 CD 12235 147 227 possible possible JJ 12235 147 228 28 28 CD 12235 147 229 / / SYM 12235 147 230 40 40 CD 12235 147 231 possible possible JJ 12235 147 232 Uncorrected uncorrected JJ 12235 147 233 26.47 26.47 CD 12235 147 234 / / SYM 12235 147 235 40 40 CD 12235 147 236 possible possible JJ 12235 147 237 29 29 CD 12235 147 238 / / SYM 12235 147 239 40 40 CD 12235 147 240 possible possible JJ 12235 147 241 2018 2018 CD 12235 147 242 LCSH LCSH NNP 12235 147 243 Corrected correct VBD 12235 147 244 14.10 14.10 CD 12235 147 245 / / SYM 12235 147 246 20 20 CD 12235 147 247 possible possible JJ 12235 147 248 19 19 CD 12235 147 249 / / SYM 12235 147 250 20 20 CD 12235 147 251 possible possible JJ 12235 147 252 2018 2018 CD 12235 147 253 LCSH LCSH NNP 12235 147 254 Uncorrected Uncorrected NNP 12235 147 255 13.47 13.47 CD 12235 147 256 / / SYM 12235 147 257 20 20 CD 12235 147 258 possible possible JJ 12235 147 259 18.5 18.5 CD 12235 147 260 / / SYM 12235 147 261 20 20 CD 12235 147 262 possible possible JJ 12235 147 263 1910 1910 CD 12235 147 264 LCSH LCSH NNP 12235 147 265 Corrected correct VBD 12235 147 266 11.27 11.27 CD 12235 147 267 / / SYM 12235 147 268 20 20 CD 12235 147 269 possible possible JJ 12235 147 270 11 11 CD 12235 147 271 / / SYM 12235 147 272 20 20 CD 
12235 147 273 possible possible JJ 12235 147 274 1910 1910 CD 12235 147 275 LCSH LCSH NNP 12235 147 276 Uncorrected uncorrected JJ 12235 147 277 10.13 10.13 CD 12235 147 278 / / SYM 12235 147 279 20 20 CD 12235 147 280 possible possible JJ 12235 147 281 9 9 CD 12235 147 282 / / SYM 12235 147 283 20 20 CD 12235 147 284 possible possible JJ 12235 147 285 INFORMATION information JJ 12235 147 286 TECHNOLOGY technology NN 12235 147 287 AND and CC 12235 147 288 LIBRARIES library NNS 12235 147 289 SEPTEMBER SEPTEMBER NNP 12235 147 290 2020 2020 CD 12235 147 291 EVALUATING evaluate VBG 12235 147 292 THE the DT 12235 147 293 IMPACT impact NN 12235 147 294 OF of IN 12235 147 295 THE the DT 12235 147 296 LONG long JJ 12235 147 297 - - HYPH 12235 147 298 S S NNP 12235 147 299 | | NNP 12235 147 300 GRABUS grabus NN 12235 147 301 13 13 CD 12235 147 302 ENDNOTES endnotes NN 12235 147 303 1 1 CD 12235 147 304 Liz Liz NNP 12235 147 305 Woolcott Woolcott NNP 12235 147 306 , , , 12235 147 307 “ " `` 12235 147 308 Understanding Understanding NNP 12235 147 309 Metadata Metadata NNP 12235 147 310 : : : 12235 147 311 What what WP 12235 147 312 is be VBZ 12235 147 313 Metadata Metadata NNP 12235 147 314 , , , 12235 147 315 and and CC 12235 147 316 What what WP 12235 147 317 is be VBZ 12235 147 318 it -PRON- PRP 12235 147 319 For for IN 12235 147 320 ? ? . 12235 147 321 , , , 12235 147 322 ” " '' 12235 147 323 Routledge Routledge NNP 12235 147 324 ( ( -LRB- 12235 147 325 November November NNP 12235 147 326 17 17 CD 12235 147 327 , , , 12235 147 328 2017 2017 CD 12235 147 329 ) ) -RRB- 12235 147 330 , , , 12235 147 331 https://doi.org/10.1080/01639374.2017.1358232 https://doi.org/10.1080/01639374.2017.1358232 NNP 12235 147 332 ; ; : 12235 147 333 Koraljka Koraljka NNP 12235 147 334 Golub Golub NNP 12235 147 335 et et NNP 12235 147 336 al al NNP 12235 147 337 . . 
NNP 12235 147 338 , , , 12235 147 339 “ " `` 12235 147 340 A a DT 12235 147 341 framework framework NN 12235 147 342 for for IN 12235 147 343 evaluating evaluate VBG 12235 147 344 automatic automatic JJ 12235 147 345 indexing indexing NN 12235 147 346 or or CC 12235 147 347 classification classification NN 12235 147 348 in in IN 12235 147 349 the the DT 12235 147 350 context context NN 12235 147 351 of of IN 12235 147 352 retrieval retrieval NN 12235 147 353 , , , 12235 147 354 “ " `` 12235 147 355 Journal Journal NNP 12235 147 356 of of IN 12235 147 357 the the DT 12235 147 358 Association Association NNP 12235 147 359 for for IN 12235 147 360 Information Information NNP 12235 147 361 Science Science NNP 12235 147 362 and and CC 12235 147 363 Technology Technology NNP 12235 147 364 67 67 CD 12235 147 365 , , , 12235 147 366 no no UH 12235 147 367 . . . 12235 148 1 1 1 CD 12235 148 2 ( ( -LRB- 12235 148 3 2016 2016 CD 12235 148 4 ) ) -RRB- 12235 148 5 , , , 12235 148 6 https://doi.org/10.1002/asi.23600 https://doi.org/10.1002/asi.23600 NN 12235 148 7 ; ; : 12235 148 8 Lynne Lynne NNP 12235 148 9 C. C. NNP 12235 148 10 Howarth Howarth NNP 12235 148 11 , , , 12235 148 12 “ " `` 12235 148 13 Metadata Metadata NNP 12235 148 14 and and CC 12235 148 15 Bibliographic Bibliographic NNP 12235 148 16 Control Control NNP 12235 148 17 : : : 12235 148 18 Soul Soul NNP 12235 148 19 - - HYPH 12235 148 20 Mates Mates NNPS 12235 148 21 or or CC 12235 148 22 Two Two NNP 12235 148 23 Solitudes Solitudes NNPS 12235 148 24 ? ? . 12235 148 25 , , , 12235 148 26 “ " `` 12235 148 27 Cataloging Cataloging NNP 12235 148 28 & & CC 12235 148 29 Classification Classification NNP 12235 148 30 Quarterly Quarterly NNP 12235 148 31 40 40 CD 12235 148 32 , , , 12235 148 33 no no UH 12235 148 34 . . . 
12235 149 1 3 3 CD 12235 149 2 - - SYM 12235 149 3 4 4 CD 12235 149 4 ( ( -LRB- 12235 149 5 2005 2005 CD 12235 149 6 ) ) -RRB- 12235 149 7 , , , 12235 149 8 https://doi.org/10.1300/J104v40n03_03 https://doi.org/10.1300/J104v40n03_03 NNP 12235 149 9 . . . 12235 150 1 2 2 CD 12235 150 2 A. A. NNP 12235 150 3 Belaid Belaid NNP 12235 150 4 et et NNP 12235 150 5 al al NNP 12235 150 6 . . NNP 12235 150 7 , , , 12235 150 8 “ " `` 12235 150 9 Automatic automatic JJ 12235 150 10 indexing indexing NN 12235 150 11 and and CC 12235 150 12 reformulation reformulation NN 12235 150 13 of of IN 12235 150 14 ancient ancient JJ 12235 150 15 dictionaries dictionary NNS 12235 150 16 “ " '' 12235 150 17 ( ( -LRB- 12235 150 18 paper paper NN 12235 150 19 presented present VBN 12235 150 20 at at IN 12235 150 21 the the DT 12235 150 22 First First NNP 12235 150 23 International International NNP 12235 150 24 Workshop Workshop NNP 12235 150 25 on on IN 12235 150 26 Document Document NNP 12235 150 27 Image Image NNP 12235 150 28 Analysis Analysis NNP 12235 150 29 for for IN 12235 150 30 Libraries Libraries NNPS 12235 150 31 , , , 12235 150 32 Palo Palo NNP 12235 150 33 Alto Alto NNP 12235 150 34 , , , 12235 150 35 CA CA NNP 12235 150 36 , , , 12235 150 37 2004 2004 CD 12235 150 38 ) ) -RRB- 12235 150 39 , , , 12235 150 40 https://doi.org/10.1109/DIAL.2004.1263264 https://doi.org/10.1109/DIAL.2004.1263264 NNP 12235 150 41 . . . 12235 151 1 3 3 CD 12235 151 2 Beatrice Beatrice NNP 12235 151 3 Alex Alex NNP 12235 151 4 et et FW 12235 151 5 al al NNP 12235 151 6 . . 
NNP 12235 151 7 , , , 12235 151 8 “ " `` 12235 151 9 Digitised Digitised NNP 12235 151 10 Historical Historical NNP 12235 151 11 Text Text NNP 12235 151 12 : : : 12235 151 13 Does do VBZ 12235 151 14 it -PRON- PRP 12235 151 15 have have VB 12235 151 16 to to TO 12235 151 17 be be VB 12235 151 18 mediOCRe mediOCRe NNP 12235 151 19 " " '' 12235 151 20 ( ( -LRB- 12235 151 21 paper paper NN 12235 151 22 presented present VBN 12235 151 23 at at IN 12235 151 24 the the DT 12235 151 25 KONVENS KONVENS NNP 12235 151 26 2012 2012 CD 12235 151 27 ( ( -LRB- 12235 151 28 LThist LThist NNP 12235 151 29 2012 2012 CD 12235 151 30 workshop workshop NN 12235 151 31 ) ) -RRB- 12235 151 32 , , , 12235 151 33 Vienna Vienna NNP 12235 151 34 , , , 12235 151 35 September September NNP 12235 151 36 21 21 CD 12235 151 37 , , , 12235 151 38 2012 2012 CD 12235 151 39 ) ) -RRB- 12235 151 40 ; ; : 12235 151 41 Ted Ted NNP 12235 151 42 Underwood Underwood NNP 12235 151 43 , , , 12235 151 44 “ " `` 12235 151 45 A a DT 12235 151 46 half half JJ 12235 151 47 - - HYPH 12235 151 48 decent decent JJ 12235 151 49 OCR ocr NN 12235 151 50 normalizer normalizer NN 12235 151 51 for for IN 12235 151 52 English English NNP 12235 151 53 texts text NNS 12235 151 54 after after IN 12235 151 55 1700 1700 CD 12235 151 56 , , , 12235 151 57 " " `` 12235 151 58 The the DT 12235 151 59 Stone Stone NNP 12235 151 60 and and CC 12235 151 61 the the DT 12235 151 62 Shell Shell NNP 12235 151 63 , , , 12235 151 64 December December NNP 12235 151 65 10 10 CD 12235 151 66 , , , 12235 151 67 2013 2013 CD 12235 151 68 , , , 12235 151 69 https://tedunderwood.com/2013/12/10/a-half-decent-ocr-normalizer-for-english- https://tedunderwood.com/2013/12/10/a-half-decent-ocr-normalizer-for-english- NNP 12235 151 70 texts text NNS 12235 151 71 - - HYPH 12235 151 72 after-1700/. after-1700/. 
NNP 12235 152 1 4 4 LS 12235 152 2 “ " `` 12235 152 3 Nineteenth nineteenth JJ 12235 152 4 - - HYPH 12235 152 5 century century NN 12235 152 6 knowledge knowledge NN 12235 152 7 project project NN 12235 152 8 , , , 12235 152 9 " " '' 12235 152 10 ( ( -LRB- 12235 152 11 GitHub GitHub NNP 12235 152 12 Repository Repository NNP 12235 152 13 ) ) -RRB- 12235 152 14 , , , 12235 152 15 2020 2020 CD 12235 152 16 , , , 12235 152 17 https://tu- https://tu- NN 12235 152 18 plogan.github.io/. plogan.github.io/. NNP 12235 153 1 5 5 LS 12235 153 2 “ " `` 12235 153 3 Nineteenth nineteenth JJ 12235 153 4 - - HYPH 12235 153 5 century century NN 12235 153 6 Knowledge Knowledge NNP 12235 153 7 Project Project NNP 12235 153 8 . . . 12235 153 9 ” " '' 12235 153 10 6 6 CD 12235 153 11 Marcia Marcia NNP 12235 153 12 Lei Lei NNP 12235 153 13 Zeng Zeng NNP 12235 153 14 and and CC 12235 153 15 Lois Lois NNP 12235 153 16 Mai Mai NNP 12235 153 17 Chan Chan NNP 12235 153 18 , , , 12235 153 19 “ " `` 12235 153 20 Metadata Metadata NNP 12235 153 21 Interoperability Interoperability NNP 12235 153 22 and and CC 12235 153 23 Standardization Standardization NNP 12235 153 24 - - HYPH 12235 153 25 A A NNP 12235 153 26 Study Study NNP 12235 153 27 of of IN 12235 153 28 Methodology Methodology NNP 12235 153 29 , , , 12235 153 30 Part Part NNP 12235 153 31 II II NNP 12235 153 32 , , , 12235 153 33 " " `` 12235 153 34 D D NNP 12235 153 35 - - HYPH 12235 153 36 Lib Lib NNP 12235 153 37 Magazine Magazine NNP 12235 153 38 12 12 CD 12235 153 39 , , , 12235 153 40 no no UH 12235 153 41 . . . 12235 154 1 6 6 CD 12235 154 2 ( ( -LRB- 12235 154 3 2006 2006 CD 12235 154 4 ) ) -RRB- 12235 154 5 ; ; : 12235 154 6 G. G. NNP 12235 154 7 Bueno Bueno NNP 12235 154 8 - - HYPH 12235 154 9 de de NNP 12235 154 10 - - JJ 12235 154 11 la la JJ 12235 154 12 - - HYPH 12235 154 13 Fuente Fuente NNP 12235 154 14 , , , 12235 154 15 D. D. 
NNP 12235 154 16 Rodríguez Rodríguez NNP 12235 154 17 Mateos Mateos NNP 12235 154 18 , , , 12235 154 19 and and CC 12235 154 20 J. J. NNP 12235 154 21 Greenberg Greenberg NNP 12235 154 22 , , , 12235 154 23 “ " `` 12235 154 24 Chapter chapter NN 12235 154 25 10 10 CD 12235 154 26 - - HYPH 12235 154 27 Automatic Automatic NNP 12235 154 28 Text Text NNP 12235 154 29 Indexing Indexing NNP 12235 154 30 with with IN 12235 154 31 SKOS SKOS NNP 12235 154 32 Vocabularies Vocabularies NNPS 12235 154 33 in in IN 12235 154 34 HIVE HIVE NNP 12235 154 35 " " '' 12235 154 36 ( ( -LRB- 12235 154 37 Elsevier Elsevier NNP 12235 154 38 Ltd Ltd NNP 12235 154 39 , , , 12235 154 40 2016 2016 CD 12235 154 41 ) ) -RRB- 12235 154 42 ; ; : 12235 154 43 Sheila Sheila NNP 12235 154 44 Bair Bair NNP 12235 154 45 and and CC 12235 154 46 Sharon Sharon NNP 12235 154 47 Carlson Carlson NNP 12235 154 48 , , , 12235 154 49 “ " `` 12235 154 50 Where where WRB 12235 154 51 Keywords Keywords NNPS 12235 154 52 Fail Fail NNP 12235 154 53 : : : 12235 154 54 Using use VBG 12235 154 55 Metadata Metadata NNP 12235 154 56 to to TO 12235 154 57 Facilitate facilitate VB 12235 154 58 Digital Digital NNP 12235 154 59 Humanities Humanities NNP 12235 154 60 Scholarship Scholarship NNP 12235 154 61 , , , 12235 154 62 " " '' 12235 154 63 Journal Journal NNP 12235 154 64 of of IN 12235 154 65 Library Library NNP 12235 154 66 Metadata Metadata NNP 12235 154 67 8 8 CD 12235 154 68 , , , 12235 154 69 no no UH 12235 154 70 . . . 12235 155 1 3 3 CD 12235 155 2 ( ( -LRB- 12235 155 3 2008 2008 CD 12235 155 4 ) ) -RRB- 12235 155 5 , , , 12235 155 6 https://doi.org/10.1080/19386380802398503 https://doi.org/10.1080/19386380802398503 NNP 12235 155 7 . . . 
12235 156 1 7 7 CD 12235 156 2 John John NNP 12235 156 3 Walsh Walsh NNP 12235 156 4 , , , 12235 156 5 “ " `` 12235 156 6 The the DT 12235 156 7 use use NN 12235 156 8 of of IN 12235 156 9 Library Library NNP 12235 156 10 of of IN 12235 156 11 Congress Congress NNP 12235 156 12 Subject Subject NNP 12235 156 13 Headings Headings NNPS 12235 156 14 in in IN 12235 156 15 digital digital JJ 12235 156 16 collections collection NNS 12235 156 17 , , , 12235 156 18 " " `` 12235 156 19 Library Library NNP 12235 156 20 Review Review NNP 12235 156 21 60 60 CD 12235 156 22 , , , 12235 156 23 no no UH 12235 156 24 . . . 12235 157 1 4 4 CD 12235 157 2 ( ( -LRB- 12235 157 3 2011 2011 CD 12235 157 4 ) ) -RRB- 12235 157 5 , , , 12235 157 6 https://doi.org/10.1108/00242531111127875 https://doi.org/10.1108/00242531111127875 NNP 12235 157 7 . . . 12235 158 1 8 8 CD 12235 158 2 Jane Jane NNP 12235 158 3 Greenberg Greenberg NNP 12235 158 4 et et NNP 12235 158 5 al al NNP 12235 158 6 . . NNP 12235 158 7 , , , 12235 158 8 “ " `` 12235 158 9 HIVE hive RB 12235 158 10 : : : 12235 158 11 Helping help VBG 12235 158 12 interdisciplinary interdisciplinary JJ 12235 158 13 vocabulary vocabulary NN 12235 158 14 engineering engineering NN 12235 158 15 , , , 12235 158 16 “ " `` 12235 158 17 Bulletin Bulletin NNP 12235 158 18 of of IN 12235 158 19 the the DT 12235 158 20 American American NNP 12235 158 21 Society Society NNP 12235 158 22 for for IN 12235 158 23 Information Information NNP 12235 158 24 Science Science NNP 12235 158 25 and and CC 12235 158 26 Technology Technology NNP 12235 158 27 37 37 CD 12235 158 28 , , , 12235 158 29 no no UH 12235 158 30 . . . 12235 159 1 4 4 CD 12235 159 2 ( ( -LRB- 12235 159 3 2011 2011 CD 12235 159 4 ) ) -RRB- 12235 159 5 , , , 12235 159 6 https://doi.org/10.1002/bult.2011.1720370407 https://doi.org/10.1002/bult.2011.1720370407 NNP 12235 159 7 . . . 
12235 160 1 9 9 CD 12235 160 2 Sam Sam NNP 12235 160 3 Grabus Grabus NNP 12235 160 4 et et FW 12235 160 5 al al NNP 12235 160 6 . . NNP 12235 160 7 , , , 12235 160 8 “ " `` 12235 160 9 Representing represent VBG 12235 160 10 Aboutness Aboutness NNP 12235 160 11 : : : 12235 160 12 Automatically automatically RB 12235 160 13 Indexing indexing NN 12235 160 14 19th- 19th- NN 12235 160 15 Century century NN 12235 160 16 Encyclopedia Encyclopedia NNP 12235 160 17 Britannica Britannica NNP 12235 160 18 Entries Entries NNP 12235 160 19 , , , 12235 160 20 ” " '' 12235 160 21 NASKO NASKO NNP 12235 160 22 7 7 CD 12235 160 23 ( ( -LRB- 12235 160 24 2019 2019 CD 12235 160 25 ) ) -RRB- 12235 160 26 , , , 12235 160 27 pp pp NNP 12235 160 28 . . . 12235 161 1 138 138 CD 12235 161 2 - - SYM 12235 161 3 48 48 CD 12235 161 4 , , , 12235 161 5 https://doi.org/10.7152/nasko.v7i1.15635 https://doi.org/10.7152/nasko.v7i1.15635 NNP 12235 161 6 . . . 12235 162 1 10 10 CD 12235 162 2 Karen Karen NNP 12235 162 3 Attar Attar NNP 12235 162 4 , , , 12235 162 5 “ " `` 12235 162 6 S S NNP 12235 162 7 and and CC 12235 162 8 Long Long NNP 12235 162 9 S S NNP 12235 162 10 , , , 12235 162 11 " " '' 12235 162 12 in in IN 12235 162 13 Oxford Oxford NNP 12235 162 14 Companion Companion NNP 12235 162 15 to to IN 12235 162 16 the the DT 12235 162 17 Book Book NNP 12235 162 18 , , , 12235 162 19 eds eds XX 12235 162 20 . . . 12235 163 1 Michael Michael NNP 12235 163 2 Felix Felix NNP 12235 163 3 Suarez Suarez NNP 12235 163 4 and and CC 12235 163 5 H. H. NNP 12235 163 6 R. R. 
NNP 12235 163 7 II II NNP 12235 163 8 Woudhuysen Woudhuysen NNP 12235 163 9 ( ( -LRB- 12235 163 10 Oxford Oxford NNP 12235 163 11 : : : 12235 163 12 Oxford Oxford NNP 12235 163 13 University University NNP 12235 163 14 Press Press NNP 12235 163 15 , , , 12235 163 16 2010 2010 CD 12235 163 17 ) ) -RRB- 12235 163 18 ; ; : 12235 163 19 Ingrid Ingrid NNP 12235 163 20 Tieken Tieken NNP 12235 163 21 - - HYPH 12235 163 22 Boon Boon NNP 12235 163 23 van van NNP 12235 163 24 Ostade Ostade NNP 12235 163 25 , , , 12235 163 26 “ " `` 12235 163 27 Spelling spelling NN 12235 163 28 systems system NNS 12235 163 29 , , , 12235 163 30 “ " `` 12235 163 31 in in IN 12235 163 32 An an DT 12235 163 33 Introduction introduction NN 12235 163 34 to to IN 12235 163 35 Late late JJ 12235 163 36 Modern modern JJ 12235 163 37 English English NNP 12235 163 38 ( ( -LRB- 12235 163 39 Edinburgh Edinburgh NNP 12235 163 40 University University NNP 12235 163 41 Press Press NNP 12235 163 42 , , , 12235 163 43 2009 2009 CD 12235 163 44 ) ) -RRB- 12235 163 45 . . . 12235 164 1 11 11 CD 12235 164 2 Andrew Andrew NNP 12235 164 3 West West NNP 12235 164 4 , , , 12235 164 5 “ " `` 12235 164 6 The the DT 12235 164 7 Rules Rules NNPS 12235 164 8 for for IN 12235 164 9 Long Long NNP 12235 164 10 - - HYPH 12235 164 11 S S NNP 12235 164 12 , , , 12235 164 13 " " '' 12235 164 14 TUGboat TUGboat NNP 12235 164 15 32 32 CD 12235 164 16 , , , 12235 164 17 no no UH 12235 164 18 . . . 12235 165 1 1 1 CD 12235 165 2 ( ( -LRB- 12235 165 3 2011 2011 CD 12235 165 4 ) ) -RRB- 12235 165 5 . . . 12235 166 1 12 12 CD 12235 166 2 Attar Attar NNP 12235 166 3 , , , 12235 166 4 “ " `` 12235 166 5 S S NNP 12235 166 6 and and CC 12235 166 7 Long Long NNP 12235 166 8 S. S. 
NNP 12235 166 9 ” " ''