id sid tid token lemma pos
9878 1 1 Digitization Digitization NNP 9878 1 2 of of IN 9878 1 3 Text Text NNP 9878 1 4 documents document NNS 9878 1 5 Using use VBG 9878 1 6 PDF PDF NNP 9878 1 7 / / , 9878 1 8 A a DT 9878 1 9 Yan Yan NNP 9878 1 10 Han Han NNP 9878 1 11 and and CC 9878 1 12 Xueheng Xueheng NNP 9878 1 13 Wan Wan NNP 9878 1 14 INFORMATION INFORMATION NNP 9878 1 15 TECHNOLOGY TECHNOLOGY NNP 9878 1 16 AND and CC 9878 1 17 LIBRARIES library NNS 9878 1 18 | | NNP 9878 1 19 MARCH MARCH NNP 9878 1 20 2018 2018 CD 9878 1 21 52 52 CD 9878 1 22 Yan Yan NNP 9878 1 23 Han Han NNP 9878 1 24 ( ( -LRB- 9878 1 25 yhan@email.arizona.edu yhan@email.arizona.edu NNP 9878 1 26 ) ) -RRB- 9878 1 27 is be VBZ 9878 1 28 Full full JJ 9878 1 29 Librarian Librarian NNP 9878 1 30 , , , 9878 1 31 the the DT 9878 1 32 University University NNP 9878 1 33 of of IN 9878 1 34 Arizona Arizona NNP 9878 1 35 Libraries Libraries NNPS 9878 1 36 , , , 9878 1 37 and and CC 9878 1 38 Xueheng Xueheng NNP 9878 1 39 Wan Wan NNP 9878 1 40 ( ( -LRB- 9878 1 41 wanxueheng@email.arizona.edu wanxueheng@email.arizona.edu NNP 9878 1 42 ) ) -RRB- 9878 1 43 is be VBZ 9878 1 44 a a DT 9878 1 45 student student NN 9878 1 46 , , , 9878 1 47 Department Department NNP 9878 1 48 of of IN 9878 1 49 Computer Computer NNP 9878 1 50 Science Science NNP 9878 1 51 , , , 9878 1 52 University University NNP 9878 1 53 of of IN 9878 1 54 Arizona Arizona NNP 9878 1 55 . . .
9878 2 1 ABSTRACT ABSTRACT NNP 9878 2 2 The the DT 9878 2 3 purpose purpose NN 9878 2 4 of of IN 9878 2 5 this this DT 9878 2 6 article article NN 9878 2 7 is be VBZ 9878 2 8 to to TO 9878 2 9 demonstrate demonstrate VB 9878 2 10 a a DT 9878 2 11 practical practical JJ 9878 2 12 use use NN 9878 2 13 case case NN 9878 2 14 of of IN 9878 2 15 PDF PDF NNP 9878 2 16 / / SYM 9878 2 17 A A NNP 9878 2 18 for for IN 9878 2 19 digitization digitization NN 9878 2 20 of of IN 9878 2 21 text text NN 9878 2 22 documents document NNS 9878 2 23 following follow VBG 9878 2 24 FADGI FADGI NNP 9878 2 25 ’s ’s POS 9878 2 26 recommendation recommendation NN 9878 2 27 of of IN 9878 2 28 using use VBG 9878 2 29 PDF PDF NNP 9878 2 30 / / , 9878 2 31 A A NNP 9878 2 32 as as IN 9878 2 33 a a DT 9878 2 34 preferred preferred JJ 9878 2 35 digitization digitization NN 9878 2 36 file file NN 9878 2 37 format format NN 9878 2 38 . . . 9878 3 1 The the DT 9878 3 2 authors author NNS 9878 3 3 demonstrate demonstrate VBP 9878 3 4 how how WRB 9878 3 5 to to TO 9878 3 6 convert convert VB 9878 3 7 and and CC 9878 3 8 combine combine VB 9878 3 9 TIFFs TIFFs NNPS 9878 3 10 with with IN 9878 3 11 associated associated JJ 9878 3 12 metadata metadata NN 9878 3 13 into into IN 9878 3 14 a a DT 9878 3 15 single single JJ 9878 3 16 PDF PDF NNP 9878 3 17 / / SYM 9878 3 18 A-2b A-2b NNP 9878 3 19 file file NN 9878 3 20 for for IN 9878 3 21 a a DT 9878 3 22 document document NN 9878 3 23 . . . 
9878 4 1 Using use VBG 9878 4 2 real real JJ 9878 4 3 - - HYPH 9878 4 4 life life NN 9878 4 5 examples example NNS 9878 4 6 and and CC 9878 4 7 open open JJ 9878 4 8 source source NN 9878 4 9 software software NN 9878 4 10 , , , 9878 4 11 the the DT 9878 4 12 authors author NNS 9878 4 13 show show VBP 9878 4 14 readers reader NNS 9878 4 15 how how WRB 9878 4 16 to to TO 9878 4 17 convert convert VB 9878 4 18 TIFF TIFF NNP 9878 4 19 images image NNS 9878 4 20 , , , 9878 4 21 extract extract VB 9878 4 22 associated associate VBN 9878 4 23 metadata metadata NN 9878 4 24 and and CC 9878 4 25 International International NNP 9878 4 26 Color Color NNP 9878 4 27 Consortium Consortium NNP 9878 4 28 ( ( -LRB- 9878 4 29 ICC ICC NNP 9878 4 30 ) ) -RRB- 9878 4 31 profiles profile NNS 9878 4 32 , , , 9878 4 33 and and CC 9878 4 34 validate validate VB 9878 4 35 against against IN 9878 4 36 the the DT 9878 4 37 newly newly RB 9878 4 38 released release VBN 9878 4 39 PDF PDF NNP 9878 4 40 / / , 9878 4 41 A a DT 9878 4 42 validator validator NN 9878 4 43 . . . 9878 5 1 The the DT 9878 5 2 generated generated JJ 9878 5 3 PDF PDF NNP 9878 5 4 / / , 9878 5 5 A a DT 9878 5 6 file file NN 9878 5 7 is be VBZ 9878 5 8 a a DT 9878 5 9 self self NN 9878 5 10 - - HYPH 9878 5 11 contained contain VBN 9878 5 12 and and CC 9878 5 13 self self NN 9878 5 14 - - HYPH 9878 5 15 described describe VBN 9878 5 16 container container NN 9878 5 17 that that WDT 9878 5 18 accommodates accommodate VBZ 9878 5 19 all all PDT 9878 5 20 the the DT 9878 5 21 data datum NNS 9878 5 22 from from IN 9878 5 23 digitization digitization NN 9878 5 24 of of IN 9878 5 25 textual textual JJ 9878 5 26 materials material NNS 9878 5 27 , , , 9878 5 28 including include VBG 9878 5 29 page page NN 9878 5 30 - - HYPH 9878 5 31 level level NN 9878 5 32 metadata metadata NN 9878 5 33 and and CC 9878 5 34 ICC icc NN 9878 5 35 profiles profile NNS 9878 5 36 . . . 
9878 6 1 Providing provide VBG 9878 6 2 theoretical theoretical JJ 9878 6 3 analysis analysis NN 9878 6 4 and and CC 9878 6 5 empirical empirical JJ 9878 6 6 examples example NNS 9878 6 7 , , , 9878 6 8 the the DT 9878 6 9 authors author NNS 9878 6 10 show show VBP 9878 6 11 that that IN 9878 6 12 PDF PDF NNP 9878 6 13 / / SYM 9878 6 14 A A NNP 9878 6 15 has have VBZ 9878 6 16 many many JJ 9878 6 17 advantages advantage NNS 9878 6 18 over over IN 9878 6 19 the the DT 9878 6 20 traditionally traditionally RB 9878 6 21 preferred prefer VBN 9878 6 22 file file NN 9878 6 23 format format NN 9878 6 24 , , , 9878 6 25 TIFF TIFF NNP 9878 6 26 / / SYM 9878 6 27 JPEG2000 JPEG2000 NNP 9878 6 28 , , , 9878 6 29 for for IN 9878 6 30 digitization digitization NN 9878 6 31 of of IN 9878 6 32 text text NN 9878 6 33 documents document NNS 9878 6 34 . . . 9878 7 1 BACKGROUND BACKGROUND NNP 9878 7 2 PDF PDF NNP 9878 7 3 has have VBZ 9878 7 4 been be VBN 9878 7 5 primarily primarily RB 9878 7 6 used use VBN 9878 7 7 as as IN 9878 7 8 a a DT 9878 7 9 file file NN 9878 7 10 delivery delivery NN 9878 7 11 format format NN 9878 7 12 across across IN 9878 7 13 many many JJ 9878 7 14 platforms platform NNS 9878 7 15 in in IN 9878 7 16 almost almost RB 9878 7 17 every every DT 9878 7 18 device device NN 9878 7 19 since since IN 9878 7 20 its -PRON- PRP$ 9878 7 21 initial initial JJ 9878 7 22 release release NN 9878 7 23 in in IN 9878 7 24 1993 1993 CD 9878 7 25 . . . 
9878 8 1 PDF PDF NNP 9878 8 2 / / SYM 9878 8 3 A A NNP 9878 8 4 was be VBD 9878 8 5 designed design VBN 9878 8 6 to to TO 9878 8 7 address address VB 9878 8 8 concerns concern NNS 9878 8 9 about about IN 9878 8 10 long long JJ 9878 8 11 - - HYPH 9878 8 12 term term NN 9878 8 13 preservation preservation NN 9878 8 14 of of IN 9878 8 15 PDF PDF NNP 9878 8 16 files file NNS 9878 8 17 , , , 9878 8 18 but but CC 9878 8 19 there there EX 9878 8 20 has have VBZ 9878 8 21 been be VBN 9878 8 22 little little JJ 9878 8 23 research research NN 9878 8 24 and and CC 9878 8 25 few few JJ 9878 8 26 implementations implementation NNS 9878 8 27 of of IN 9878 8 28 this this DT 9878 8 29 file file NN 9878 8 30 format format NN 9878 8 31 . . . 9878 9 1 Since since IN 9878 9 2 the the DT 9878 9 3 first first JJ 9878 9 4 standard standard NN 9878 9 5 ( ( -LRB- 9878 9 6 ISO ISO NNP 9878 9 7 19005 19005 CD 9878 9 8 PDF PDF NNP 9878 9 9 / / SYM 9878 9 10 A-1 A-1 NNP 9878 9 11 ) ) -RRB- 9878 9 12 , , , 9878 9 13 published publish VBN 9878 9 14 in in IN 9878 9 15 2005 2005 CD 9878 9 16 , , , 9878 9 17 some some DT 9878 9 18 articles article NNS 9878 9 19 discuss discuss VBP 9878 9 20 the the DT 9878 9 21 PDF PDF NNP 9878 9 22 / / , 9878 9 23 A a DT 9878 9 24 family family NN 9878 9 25 of of IN 9878 9 26 standards standard NNS 9878 9 27 , , , 9878 9 28 relevant relevant JJ 9878 9 29 information information NN 9878 9 30 , , , 9878 9 31 and and CC 9878 9 32 how how WRB 9878 9 33 to to TO 9878 9 34 implement implement VB 9878 9 35 PDF PDF NNP 9878 9 36 / / , 9878 9 37 A A NNP 9878 9 38 for for IN 9878 9 39 born bear VBN 9878 9 40 - - HYPH 9878 9 41 digital digital NNP 9878 9 42 documents.1 documents.1 CD 9878 9 43 There there EX 9878 9 44 is be VBZ 9878 9 45 growing grow VBG 9878 9 46 interest interest NN 9878 9 47 in in IN 9878 9 48 the the DT 9878 9 49 PDF PDF NNP 9878 9 50 and and CC 9878 9 51 PDF PDF NNP 9878 9 52 / / SYM 9878 9 53 A a DT 9878 9 54 standards standard NNS 9878 9 55 after 
after IN 9878 9 56 both both CC 9878 9 57 the the DT 9878 9 58 US US NNP 9878 9 59 Library Library NNP 9878 9 60 of of IN 9878 9 61 Congress Congress NNP 9878 9 62 and and CC 9878 9 63 the the DT 9878 9 64 National National NNP 9878 9 65 Archives Archives NNPS 9878 9 66 and and CC 9878 9 67 Records Records NNPS 9878 9 68 Administration Administration NNP 9878 9 69 ( ( -LRB- 9878 9 70 NARA NARA NNP 9878 9 71 ) ) -RRB- 9878 9 72 joined join VBD 9878 9 73 the the DT 9878 9 74 PDF PDF NNP 9878 9 75 Association Association NNP 9878 9 76 in in IN 9878 9 77 2017 2017 CD 9878 9 78 . . . 9878 10 1 NARA NARA NNP 9878 10 2 joined join VBD 9878 10 3 the the DT 9878 10 4 PDF PDF NNP 9878 10 5 Association Association NNP 9878 10 6 because because IN 9878 10 7 PDF PDF NNP 9878 10 8 files file NNS 9878 10 9 are be VBP 9878 10 10 used use VBN 9878 10 11 as as IN 9878 10 12 electronic electronic JJ 9878 10 13 documents document NNS 9878 10 14 in in IN 9878 10 15 every every DT 9878 10 16 government government NN 9878 10 17 and and CC 9878 10 18 business business NN 9878 10 19 agency agency NN 9878 10 20 . . . 
9878 11 1 As as IN 9878 11 2 explained explain VBN 9878 11 3 in in IN 9878 11 4 a a DT 9878 11 5 blog blog NN 9878 11 6 post post NN 9878 11 7 , , , 9878 11 8 the the DT 9878 11 9 Library Library NNP 9878 11 10 of of IN 9878 11 11 Congress Congress NNP 9878 11 12 joined join VBD 9878 11 13 the the DT 9878 11 14 PDF PDF NNP 9878 11 15 Association Association NNP 9878 11 16 because because IN 9878 11 17 of of IN 9878 11 18 the the DT 9878 11 19 benefits benefit NNS 9878 11 20 to to IN 9878 11 21 libraries library NNS 9878 11 22 , , , 9878 11 23 including include VBG 9878 11 24 participating participate VBG 9878 11 25 in in IN 9878 11 26 developing develop VBG 9878 11 27 PDF PDF NNP 9878 11 28 standards standard NNS 9878 11 29 , , , 9878 11 30 promoting promote VBG 9878 11 31 best good JJS 9878 11 32 - - HYPH 9878 11 33 practice practice NN 9878 11 34 use use NN 9878 11 35 of of IN 9878 11 36 PDF PDF NNP 9878 11 37 , , , 9878 11 38 and and CC 9878 11 39 access access NN 9878 11 40 to to IN 9878 11 41 the the DT 9878 11 42 global global JJ 9878 11 43 expertise expertise NN 9878 11 44 in in IN 9878 11 45 PDF PDF NNP 9878 11 46 technology.2 technology.2 NNP 9878 11 47 Few few JJ 9878 11 48 articles article NNS 9878 11 49 , , , 9878 11 50 if if IN 9878 11 51 any any DT 9878 11 52 , , , 9878 11 53 have have VBP 9878 11 54 been be VBN 9878 11 55 published publish VBN 9878 11 56 about about IN 9878 11 57 using use VBG 9878 11 58 this this DT 9878 11 59 file file NN 9878 11 60 format format NN 9878 11 61 for for IN 9878 11 62 preservation preservation NN 9878 11 63 of of IN 9878 11 64 digitized digitize VBN 9878 11 65 content content NN 9878 11 66 . . . 
9878 12 1 Yan Yan NNP 9878 12 2 Han Han NNP 9878 12 3 published publish VBD 9878 12 4 a a DT 9878 12 5 related related JJ 9878 12 6 article article NN 9878 12 7 in in IN 9878 12 8 2015 2015 CD 9878 12 9 about about IN 9878 12 10 theoretical theoretical JJ 9878 12 11 research research NN 9878 12 12 on on IN 9878 12 13 using use VBG 9878 12 14 PDF PDF NNP 9878 12 15 / / , 9878 12 16 A a NN 9878 12 17 for for IN 9878 12 18 text text NN 9878 12 19 documents.3 documents.3 NN 9878 12 20 In in IN 9878 12 21 this this DT 9878 12 22 article article NN 9878 12 23 , , , 9878 12 24 Han Han NNP 9878 12 25 discussed discuss VBD 9878 12 26 the the DT 9878 12 27 shortcomings shortcoming NNS 9878 12 28 of of IN 9878 12 29 the the DT 9878 12 30 widely widely RB 9878 12 31 used use VBN 9878 12 32 TIFF TIFF NNP 9878 12 33 and and CC 9878 12 34 JPEG2000 JPEG2000 NNP 9878 12 35 as as IN 9878 12 36 master master NN 9878 12 37 preservation preservation NN 9878 12 38 file file NN 9878 12 39 formats format NNS 9878 12 40 and and CC 9878 12 41 proposed propose VBD 9878 12 42 using use VBG 9878 12 43 the the DT 9878 12 44 then then RB 9878 12 45 - - HYPH 9878 12 46 emerging emerge VBG 9878 12 47 PDF PDF NNP 9878 12 48 / / , 9878 12 49 A A NNP 9878 12 50 as as IN 9878 12 51 the the DT 9878 12 52 preferred preferred JJ 9878 12 53 file file NN 9878 12 54 format format NN 9878 12 55 for for IN 9878 12 56 digitization digitization NN 9878 12 57 of of IN 9878 12 58 text text NN 9878 12 59 documents document NNS 9878 12 60 . . . 
9878 13 1 Han Han NNP 9878 13 2 further further RB 9878 13 3 analyzed analyze VBD 9878 13 4 the the DT 9878 13 5 requirements requirement NNS 9878 13 6 mailto:yhan@email.arizona.edu mailto:yhan@email.arizona.edu NNP 9878 13 7 mailto:wanxueheng@email.arizona.edu mailto:wanxueheng@email.arizona.edu NNP 9878 13 8 DIGITIZATION DIGITIZATION NNS 9878 13 9 OF of IN 9878 13 10 TEXT text NN 9878 13 11 DOCUMENTS documents NN 9878 13 12 USING USING NNP 9878 13 13 PDF PDF NNP 9878 13 14 / / SYM 9878 13 15 A A NNP 9878 13 16 | | NNP 9878 13 17 HAN HAN NNP 9878 13 18 AND and CC 9878 13 19 WAN WAN NNP 9878 13 20 53 53 CD 9878 13 21 HTTPS://DOI.ORG/10.6017 HTTPS://DOI.ORG/10.6017 NNS 9878 13 22 / / SYM 9878 13 23 ITAL.V37I1.9878 ital.v37i1.9878 NN 9878 13 24 of of IN 9878 13 25 digitization digitization NN 9878 13 26 of of IN 9878 13 27 text text NN 9878 13 28 documents document NNS 9878 13 29 and and CC 9878 13 30 discussed discuss VBD 9878 13 31 the the DT 9878 13 32 advantages advantage NNS 9878 13 33 of of IN 9878 13 34 PDF PDF NNP 9878 13 35 / / SYM 9878 13 36 A A NNP 9878 13 37 over over IN 9878 13 38 TIFF TIFF NNP 9878 13 39 and and CC 9878 13 40 JPEG2000 JPEG2000 NNP 9878 13 41 . . . 9878 14 1 These these DT 9878 14 2 benefits benefit NNS 9878 14 3 include include VBP 9878 14 4 platform platform NN 9878 14 5 independence independence NN 9878 14 6 , , , 9878 14 7 smaller small JJR 9878 14 8 file file NN 9878 14 9 size size NN 9878 14 10 , , , 9878 14 11 better well JJR 9878 14 12 compression compression NN 9878 14 13 algorithms algorithm NNS 9878 14 14 , , , 9878 14 15 and and CC 9878 14 16 metadata metadata NN 9878 14 17 encoding encode VBG 9878 14 18 . . . 
9878 15 1 In in IN 9878 15 2 addition addition NN 9878 15 3 , , , 9878 15 4 the the DT 9878 15 5 file file NN 9878 15 6 format format NN 9878 15 7 reduces reduce VBZ 9878 15 8 workload workload JJ 9878 15 9 and and CC 9878 15 10 simplifies simplifie NNS 9878 15 11 post- post- NNP 9878 15 12 digitization digitization NN 9878 15 13 processing processing NN 9878 15 14 such such JJ 9878 15 15 as as IN 9878 15 16 quality quality NN 9878 15 17 control control NN 9878 15 18 , , , 9878 15 19 adding add VBG 9878 15 20 and and CC 9878 15 21 updating update VBG 9878 15 22 missing missing JJ 9878 15 23 pages page NNS 9878 15 24 , , , 9878 15 25 and and CC 9878 15 26 creating create VBG 9878 15 27 new new JJ 9878 15 28 metadata metadata NN 9878 15 29 and and CC 9878 15 30 OCR ocr NN 9878 15 31 data datum NNS 9878 15 32 for for IN 9878 15 33 discovery discovery NN 9878 15 34 and and CC 9878 15 35 digital digital JJ 9878 15 36 preservation preservation NN 9878 15 37 . . . 9878 16 1 As as IN 9878 16 2 a a DT 9878 16 3 result result NN 9878 16 4 , , , 9878 16 5 PDF PDF NNP 9878 16 6 / / SYM 9878 16 7 A a DT 9878 16 8 can can MD 9878 16 9 be be VB 9878 16 10 used use VBN 9878 16 11 in in IN 9878 16 12 every every DT 9878 16 13 phase phase NN 9878 16 14 of of IN 9878 16 15 a a DT 9878 16 16 digital digital JJ 9878 16 17 object object NN 9878 16 18 in in IN 9878 16 19 an an DT 9878 16 20 Open open JJ 9878 16 21 Archival Archival NNP 9878 16 22 Information Information NNP 9878 16 23 System System NNP 9878 16 24 ( ( -LRB- 9878 16 25 OAIS)—for OAIS)—for NNP 9878 16 26 example example NN 9878 16 27 , , , 9878 16 28 a a DT 9878 16 29 Submission Submission NNP 9878 16 30 Information Information NNP 9878 16 31 Package Package NNP 9878 16 32 ( ( -LRB- 9878 16 33 SIP SIP NNP 9878 16 34 ) ) -RRB- 9878 16 35 , , , 9878 16 36 Archive Archive NNP 9878 16 37 Information Information NNP 9878 16 38 Package Package NNP 9878 16 39 ( ( -LRB- 9878 16 40 AIP AIP NNP 9878 16 41 ) ) -RRB- 9878 16 42 , , , 
9878 16 43 and and CC 9878 16 44 Dissemination Dissemination NNP 9878 16 45 Information Information NNP 9878 16 46 Package Package NNP 9878 16 47 ( ( -LRB- 9878 16 48 DIP DIP NNP 9878 16 49 ) ) -RRB- 9878 16 50 . . . 9878 17 1 In in IN 9878 17 2 summary summary NN 9878 17 3 , , , 9878 17 4 a a DT 9878 17 5 PDF PDF NNP 9878 17 6 / / SYM 9878 17 7 A a DT 9878 17 8 file file NN 9878 17 9 can can MD 9878 17 10 be be VB 9878 17 11 a a DT 9878 17 12 structured structured JJ 9878 17 13 , , , 9878 17 14 self self NN 9878 17 15 - - HYPH 9878 17 16 contained contain VBN 9878 17 17 , , , 9878 17 18 and and CC 9878 17 19 self- self- NNP 9878 17 20 described describe VBN 9878 17 21 container container NNP 9878 17 22 allowing allow VBG 9878 17 23 a a DT 9878 17 24 simpler simple JJR 9878 17 25 one one CD 9878 17 26 - - HYPH 9878 17 27 to to IN 9878 17 28 - - HYPH 9878 17 29 one one CD 9878 17 30 relationship relationship NN 9878 17 31 between between IN 9878 17 32 an an DT 9878 17 33 original original JJ 9878 17 34 physical physical JJ 9878 17 35 document document NN 9878 17 36 and and CC 9878 17 37 its -PRON- PRP$ 9878 17 38 digital digital JJ 9878 17 39 surrogate surrogate NN 9878 17 40 . . . 
9878 18 1 In in IN 9878 18 2 September September NNP 9878 18 3 2016 2016 CD 9878 18 4 , , , 9878 18 5 the the DT 9878 18 6 Federal Federal NNP 9878 18 7 Agencies Agencies NNPS 9878 18 8 Digital Digital NNP 9878 18 9 Guidelines Guidelines NNP 9878 18 10 Initiative Initiative NNP 9878 18 11 ( ( -LRB- 9878 18 12 FADGI FADGI NNP 9878 18 13 ) ) -RRB- 9878 18 14 released release VBD 9878 18 15 its -PRON- PRP$ 9878 18 16 latest late JJS 9878 18 17 guidelines guideline NNS 9878 18 18 for for IN 9878 18 19 digitization digitization NN 9878 18 20 related relate VBN 9878 18 21 to to IN 9878 18 22 raster raster NN 9878 18 23 images image NNS 9878 18 24 : : : 9878 18 25 Technical Technical NNP 9878 18 26 Guidelines Guidelines NNPS 9878 18 27 for for IN 9878 18 28 Digitizing Digitizing NNP 9878 18 29 Heritage Heritage NNP 9878 18 30 Materials.4 Materials.4 NNP 9878 18 31 The the DT 9878 18 32 de de JJ 9878 18 33 - - JJ 9878 18 34 facto facto VB 9878 18 35 best good JJS 9878 18 36 practices practice NNS 9878 18 37 for for IN 9878 18 38 digitization digitization NN 9878 18 39 , , , 9878 18 40 these these DT 9878 18 41 guidelines guideline NNS 9878 18 42 provide provide VBP 9878 18 43 federal federal JJ 9878 18 44 agencies agency NNS 9878 18 45 guidance guidance NN 9878 18 46 and and CC 9878 18 47 have have VBP 9878 18 48 been be VBN 9878 18 49 used use VBN 9878 18 50 in in IN 9878 18 51 many many JJ 9878 18 52 cultural cultural JJ 9878 18 53 heritage heritage NN 9878 18 54 institutions institution NNS 9878 18 55 . . . 
9878 19 1 Both both DT 9878 19 2 the the DT 9878 19 3 PDF PDF NNP 9878 19 4 Association Association NNP 9878 19 5 and and CC 9878 19 6 the the DT 9878 19 7 authors author NNS 9878 19 8 welcomed welcome VBD 9878 19 9 the the DT 9878 19 10 recognition recognition NN 9878 19 11 of of IN 9878 19 12 PDF PDF NNP 9878 19 13 / / , 9878 19 14 A A NNP 9878 19 15 as as IN 9878 19 16 the the DT 9878 19 17 preferred preferred JJ 9878 19 18 master master NN 9878 19 19 file file NN 9878 19 20 format format NN 9878 19 21 for for IN 9878 19 22 digitization digitization NN 9878 19 23 of of IN 9878 19 24 text text NN 9878 19 25 documents document NNS 9878 19 26 such such JJ 9878 19 27 as as IN 9878 19 28 unbound unbound JJ 9878 19 29 documents document NNS 9878 19 30 , , , 9878 19 31 bound bind VBN 9878 19 32 volumes volume NNS 9878 19 33 , , , 9878 19 34 and and CC 9878 19 35 newspapers.5 newspapers.5 CD 9878 19 36 GOALS goal NNS 9878 19 37 AND and CC 9878 19 38 TASKS task NNS 9878 19 39 Since since IN 9878 19 40 Han Han NNP 9878 19 41 has have VBZ 9878 19 42 previously previously RB 9878 19 43 provided provide VBN 9878 19 44 theoretical theoretical JJ 9878 19 45 methods method NNS 9878 19 46 of of IN 9878 19 47 coding code VBG 9878 19 48 raster raster NN 9878 19 49 images image NNS 9878 19 50 , , , 9878 19 51 metadata metadata NN 9878 19 52 , , , 9878 19 53 and and CC 9878 19 54 related related JJ 9878 19 55 information information NN 9878 19 56 in in IN 9878 19 57 PDF PDF NNP 9878 19 58 / / SYM 9878 19 59 A A NNP 9878 19 60 , , , 9878 19 61 the the DT 9878 19 62 goals goal NNS 9878 19 63 of of IN 9878 19 64 this this DT 9878 19 65 article article NN 9878 19 66 are be VBP 9878 19 67 threefold threefold JJ 9878 19 68 : : : 9878 19 69 1 1 LS 9878 19 70 . . . 
9878 19 71 present present JJ 9878 19 72 real real JJ 9878 19 73 - - HYPH 9878 19 74 life life NN 9878 19 75 experience experience NN 9878 19 76 of of IN 9878 19 77 converting convert VBG 9878 19 78 TIFFs TIFFs NNPS 9878 19 79 / / SYM 9878 19 80 JPEG2000s JPEG2000s NNP 9878 19 81 to to IN 9878 19 82 PDF PDF NNP 9878 19 83 / / SYM 9878 19 84 A a NN 9878 19 85 and and CC 9878 19 86 back back NN 9878 19 87 , , , 9878 19 88 along along IN 9878 19 89 with with IN 9878 19 90 image image NN 9878 19 91 metadata metadata NN 9878 19 92 2 2 CD 9878 19 93 . . . 9878 19 94 test test VB 9878 19 95 open open JJ 9878 19 96 source source NN 9878 19 97 libraries library NNS 9878 19 98 to to TO 9878 19 99 create create VB 9878 19 100 and and CC 9878 19 101 manipulate manipulate VB 9878 19 102 images image NNS 9878 19 103 , , , 9878 19 104 image image NN 9878 19 105 metadata metadata NN 9878 19 106 , , , 9878 19 107 and and CC 9878 19 108 PDF PDF NNP 9878 19 109 / / SYM 9878 19 110 A A NNP 9878 19 111 3 3 CD 9878 19 112 . . . 
9878 19 113 validate validate NN 9878 19 114 generated generate VBD 9878 19 115 PDF PDF NNP 9878 19 116 / / , 9878 19 117 As as IN 9878 19 118 with with IN 9878 19 119 the the DT 9878 19 120 first first JJ 9878 19 121 legitimate legitimate JJ 9878 19 122 validator validator NN 9878 19 123 for for IN 9878 19 124 PDF PDF NNP 9878 19 125 / / SYM 9878 19 126 A a DT 9878 19 127 validation validation NN 9878 19 128 The the DT 9878 19 129 tasks task NNS 9878 19 130 included include VBD 9878 19 131 the the DT 9878 19 132 following follow VBG 9878 19 133 : : : 9878 19 134 ● ● NFP 9878 19 135 Convert convert VB 9878 19 136 all all PDT 9878 19 137 the the DT 9878 19 138 master master NN 9878 19 139 files file NNS 9878 19 140 in in IN 9878 19 141 TIFFs TIFFs NNP 9878 19 142 / / SYM 9878 19 143 JPEG2000 JPEG2000 NNP 9878 19 144 from from IN 9878 19 145 digitization digitization NN 9878 19 146 of of IN 9878 19 147 text text NN 9878 19 148 documents document NNS 9878 19 149 into into IN 9878 19 150 single single JJ 9878 19 151 PDF pdf NN 9878 19 152 / / , 9878 19 153 A a DT 9878 19 154 files file NNS 9878 19 155 losslessly losslessly RB 9878 19 156 . . . 9878 20 1 One one CD 9878 20 2 document document NN 9878 20 3 , , , 9878 20 4 one one CD 9878 20 5 PDF PDF NNP 9878 20 6 / / , 9878 20 7 A a DT 9878 20 8 file file NN 9878 20 9 . . . 
9878 21 1 ● ● NFP 9878 21 2 Evaluate evaluate VB 9878 21 3 and and CC 9878 21 4 extract extract VB 9878 21 5 metadata metadata NN 9878 21 6 from from IN 9878 21 7 each each DT 9878 21 8 TIFF TIFF NNP 9878 21 9 / / SYM 9878 21 10 JPEG2000 JPEG2000 NNP 9878 21 11 image image NN 9878 21 12 and and CC 9878 21 13 encode encode VB 9878 21 14 it -PRON- PRP 9878 21 15 along along RP 9878 21 16 with with IN 9878 21 17 its -PRON- PRP$ 9878 21 18 image image NN 9878 21 19 when when WRB 9878 21 20 creating create VBG 9878 21 21 the the DT 9878 21 22 corresponding correspond VBG 9878 21 23 PDF PDF NNP 9878 21 24 / / , 9878 21 25 A a DT 9878 21 26 file file NN 9878 21 27 . . . 9878 22 1 ● ● NFP 9878 22 2 Demonstrate demonstrate VB 9878 22 3 the the DT 9878 22 4 runtimes runtime NNS 9878 22 5 of of IN 9878 22 6 the the DT 9878 22 7 above above JJ 9878 22 8 tasks task NNS 9878 22 9 for for IN 9878 22 10 feasibility feasibility NN 9878 22 11 evaluation evaluation NN 9878 22 12 . . . 9878 23 1 ● ● NFP 9878 23 2 Validate validate VB 9878 23 3 the the DT 9878 23 4 PDF pdf NN 9878 23 5 / / , 9878 23 6 A a DT 9878 23 7 files file NNS 9878 23 8 against against IN 9878 23 9 the the DT 9878 23 10 newly newly RB 9878 23 11 released release VBN 9878 23 12 open open JJ 9878 23 13 source source NN 9878 23 14 PDF PDF NNP 9878 23 15 / / , 9878 23 16 A a DT 9878 23 17 validator validator NN 9878 23 18 veraPDF verapdf NN 9878 23 19 . . . 9878 24 1 ● ● NFP 9878 24 2 Extract extract VB 9878 24 3 each each DT 9878 24 4 digital digital JJ 9878 24 5 image image NN 9878 24 6 from from IN 9878 24 7 the the DT 9878 24 8 PDF PDF NNP 9878 24 9 / / SYM 9878 24 10 A a DT 9878 24 11 file file NN 9878 24 12 back back RB 9878 24 13 to to IN 9878 24 14 its -PRON- PRP$ 9878 24 15 original original JJ 9878 24 16 master master NN 9878 24 17 image image NN 9878 24 18 files file NNS 9878 24 19 along along IN 9878 24 20 with with IN 9878 24 21 associated associated JJ 9878 24 22 metadata metadata NN 9878 24 23 . . . 
9878 25 1 ● ● NFP 9878 25 2 Verify verify VB 9878 25 3 the the DT 9878 25 4 extracted extract VBN 9878 25 5 image image NN 9878 25 6 files file NNS 9878 25 7 in in IN 9878 25 8 the the DT 9878 25 9 back back JJ 9878 25 10 - - HYPH 9878 25 11 and and CC 9878 25 12 - - HYPH 9878 25 13 forth forth NN 9878 25 14 conversion conversion NN 9878 25 15 process process NN 9878 25 16 against against IN 9878 25 17 the the DT 9878 25 18 original original JJ 9878 25 19 master master NN 9878 25 20 image image NN 9878 25 21 files file VBZ 9878 25 22 Choices Choices NNPS 9878 25 23 of of IN 9878 25 24 PDF PDF NNP 9878 25 25 / / SYM 9878 25 26 A A NNP 9878 25 27 Standards Standards NNPS 9878 25 28 and and CC 9878 25 29 Conformance Conformance NNP 9878 25 30 Level Level NNP 9878 25 31 This this DT 9878 25 32 article article NN 9878 25 33 demonstrates demonstrate VBZ 9878 25 34 using use VBG 9878 25 35 PDF PDF NNP 9878 25 36 / / SYM 9878 25 37 A-2b A-2b NNP 9878 25 38 as as IN 9878 25 39 a a DT 9878 25 40 self self NN 9878 25 41 - - HYPH 9878 25 42 contained contain VBN 9878 25 43 self self NN 9878 25 44 - - HYPH 9878 25 45 describing describe VBG 9878 25 46 file file NN 9878 25 47 format format NN 9878 25 48 . . . 
9878 26 1 Currently currently RB 9878 26 2 , , , 9878 26 3 there there EX 9878 26 4 are be VBP 9878 26 5 three three CD 9878 26 6 related related JJ 9878 26 7 PDF PDF NNP 9878 26 8 / / , 9878 26 9 A a DT 9878 26 10 standards standard NNS 9878 26 11 ( ( -LRB- 9878 26 12 PDF PDF NNP 9878 26 13 / / SYM 9878 26 14 A-1 A-1 NNP 9878 26 15 , , , 9878 26 16 PDF PDF NNP 9878 26 17 / / SYM 9878 26 18 A-2 A-2 NNP 9878 26 19 , , , 9878 26 20 and and CC 9878 26 21 PDF PDF NNP 9878 26 22 / / SYM 9878 26 23 A-3 A-3 NNP 9878 26 24 ) ) -RRB- 9878 26 25 , , , 9878 26 26 each each DT 9878 26 27 with with IN 9878 26 28 INFORMATION INFORMATION NNP 9878 26 29 TECHNOLOGY TECHNOLOGY NNP 9878 26 30 AND and CC 9878 26 31 LIBRARIES library NNS 9878 26 32 | | NNP 9878 26 33 MARCH MARCH NNP 9878 26 34 2018 2018 CD 9878 26 35 54 54 CD 9878 26 36 three three CD 9878 26 37 conformance conformance NN 9878 26 38 levels level NNS 9878 26 39 ( ( -LRB- 9878 26 40 a a DT 9878 26 41 , , , 9878 26 42 b b NN 9878 26 43 , , , 9878 26 44 and and CC 9878 26 45 u u NNP 9878 26 46 ) ) -RRB- 9878 26 47 . . . 9878 27 1 The the DT 9878 27 2 reasons reason NNS 9878 27 3 for for IN 9878 27 4 choosing choose VBG 9878 27 5 PDF PDF NNP 9878 27 6 / / SYM 9878 27 7 A-2 A-2 NNP 9878 27 8 ( ( -LRB- 9878 27 9 instead instead RB 9878 27 10 of of IN 9878 27 11 PDF PDF NNP 9878 27 12 / / SYM 9878 27 13 A-1 A-1 NNP 9878 27 14 or or CC 9878 27 15 PDF PDF NNP 9878 27 16 / / SYM 9878 27 17 A-3 A-3 NNP 9878 27 18 ) ) -RRB- 9878 27 19 are be VBP 9878 27 20 the the DT 9878 27 21 following follow VBG 9878 27 22 : : : 9878 27 23 ● ● NFP 9878 27 24 PDF PDF NNP 9878 27 25 / / SYM 9878 27 26 A-1 A-1 NNP 9878 27 27 is be VBZ 9878 27 28 based base VBN 9878 27 29 on on IN 9878 27 30 PDF PDF NNP 9878 27 31 1.4 1.4 CD 9878 27 32 . . . 
9878 28 1 In in IN 9878 28 2 this this DT 9878 28 3 standard standard NN 9878 28 4 , , , 9878 28 5 images image NNS 9878 28 6 coded code VBN 9878 28 7 in in IN 9878 28 8 PDF PDF NNP 9878 28 9 / / SYM 9878 28 10 A-1 A-1 NNP 9878 28 11 can can MD 9878 28 12 not not RB 9878 28 13 use use VB 9878 28 14 JPEG2000 JPEG2000 NNP 9878 28 15 compression compression NN 9878 28 16 ( ( -LRB- 9878 28 17 named name VBN 9878 28 18 in in IN 9878 28 19 PDF PDF NNP 9878 28 20 / / , 9878 28 21 A A NNP 9878 28 22 as as IN 9878 28 23 JPXDecode jpxdecode NN 9878 28 24 ) ) -RRB- 9878 28 25 . . . 9878 29 1 One one PRP 9878 29 2 can can MD 9878 29 3 still still RB 9878 29 4 convert convert VB 9878 29 5 TIFFs TIFFs NNP 9878 29 6 to to IN 9878 29 7 PDF PDF NNP 9878 29 8 / / SYM 9878 29 9 A-1 A-1 NNP 9878 29 10 using use VBG 9878 29 11 other other JJ 9878 29 12 lossless lossless JJ 9878 29 13 compression compression NN 9878 29 14 methods method NNS 9878 29 15 such such JJ 9878 29 16 as as IN 9878 29 17 LZW LZW NNP 9878 29 18 . . . 9878 30 1 However however RB 9878 30 2 , , , 9878 30 3 the the DT 9878 30 4 space- space- JJ 9878 30 5 saving save VBG 9878 30 6 benefits benefit NNS 9878 30 7 of of IN 9878 30 8 JPEG2000 JPEG2000 NNP 9878 30 9 compression compression NN 9878 30 10 over over IN 9878 30 11 other other JJ 9878 30 12 methods method NNS 9878 30 13 would would MD 9878 30 14 not not RB 9878 30 15 be be VB 9878 30 16 utilized utilize VBN 9878 30 17 . . . 9878 31 1 ● ● NFP 9878 31 2 PDF PDF NNP 9878 31 3 / / SYM 9878 31 4 A-2 A-2 NNP 9878 31 5 and and CC 9878 31 6 PDF PDF NNP 9878 31 7 / / SYM 9878 31 8 A-3 A-3 NNP 9878 31 9 are be VBP 9878 31 10 based base VBN 9878 31 11 on on IN 9878 31 12 PDF PDF NNP 9878 31 13 1.7 1.7 CD 9878 31 14 . . . 
9878 32 1 One one CD 9878 32 2 significant significant JJ 9878 32 3 feature feature NN 9878 32 4 of of IN 9878 32 5 PDF PDF NNP 9878 32 6 1.7 1.7 CD 9878 32 7 is be VBZ 9878 32 8 that that IN 9878 32 9 it -PRON- PRP 9878 32 10 supports support VBZ 9878 32 11 JPEG2000 JPEG2000 NNP 9878 32 12 compression compression NN 9878 32 13 , , , 9878 32 14 which which WDT 9878 32 15 saves save VBZ 9878 32 16 40–60 40–60 CD 9878 32 17 percent percent NN 9878 32 18 of of IN 9878 32 19 space space NN 9878 32 20 for for IN 9878 32 21 raster raster NN 9878 32 22 images image NNS 9878 32 23 compared compare VBN 9878 32 24 to to IN 9878 32 25 uncompressed uncompress VBN 9878 32 26 TIFFs TIFFs NNPS 9878 32 27 . . . 9878 33 1 ● ● NFP 9878 33 2 PDF PDF NNP 9878 33 3 / / SYM 9878 33 4 A-3 A-3 NNP 9878 33 5 has have VBZ 9878 33 6 one one CD 9878 33 7 major major JJ 9878 33 8 feature feature NN 9878 33 9 that that WDT 9878 33 10 PDF PDF NNP 9878 33 11 / / SYM 9878 33 12 A-2 A-2 NNP 9878 33 13 does do VBZ 9878 33 14 not not RB 9878 33 15 have have VB 9878 33 16 , , , 9878 33 17 which which WDT 9878 33 18 is be VBZ 9878 33 19 to to TO 9878 33 20 allow allow VB 9878 33 21 arbitrary arbitrary JJ 9878 33 22 files file NNS 9878 33 23 to to TO 9878 33 24 be be VB 9878 33 25 embedded embed VBN 9878 33 26 within within IN 9878 33 27 the the DT 9878 33 28 PDF PDF NNP 9878 33 29 file file NN 9878 33 30 . . . 9878 34 1 In in IN 9878 34 2 this this DT 9878 34 3 case case NN 9878 34 4 , , , 9878 34 5 there there EX 9878 34 6 is be VBZ 9878 34 7 no no DT 9878 34 8 file file NN 9878 34 9 to to TO 9878 34 10 be be VB 9878 34 11 embedded embed VBN 9878 34 12 . . . 9878 35 1 The the DT 9878 35 2 authors author NNS 9878 35 3 chose choose VBD 9878 35 4 conformance conformance NN 9878 35 5 level level NN 9878 35 6 b b NN 9878 35 7 for for IN 9878 35 8 simplicity simplicity NN 9878 35 9 . . . 
9878 36 1 ● ● NFP 9878 36 2 b b NN 9878 36 3 is be VBZ 9878 36 4 basic basic JJ 9878 36 5 conformance conformance NN 9878 36 6 , , , 9878 36 7 which which WDT 9878 36 8 requires require VBZ 9878 36 9 only only RB 9878 36 10 necessary necessary JJ 9878 36 11 components component NNS 9878 36 12 ( ( -LRB- 9878 36 13 e.g. e.g. RB 9878 36 14 , , , 9878 36 15 all all DT 9878 36 16 fonts font NNS 9878 36 17 embedded embed VBN 9878 36 18 in in IN 9878 36 19 the the DT 9878 36 20 PDF pdf NN 9878 36 21 ) ) -RRB- 9878 36 22 for for IN 9878 36 23 reproduction reproduction NN 9878 36 24 of of IN 9878 36 25 a a DT 9878 36 26 document document NN 9878 36 27 ’s ’s POS 9878 36 28 visual visual JJ 9878 36 29 appearance appearance NN 9878 36 30 . . . 9878 37 1 ● ● NFP 9878 37 2 a a DT 9878 37 3 is be VBZ 9878 37 4 accessible accessible JJ 9878 37 5 conformance conformance NN 9878 37 6 , , , 9878 37 7 which which WDT 9878 37 8 means mean VBZ 9878 37 9 b b NNP 9878 37 10 conformance conformance NN 9878 37 11 level level NN 9878 37 12 plus plus CC 9878 37 13 additional additional JJ 9878 37 14 accessibility accessibility NN 9878 37 15 ( ( -LRB- 9878 37 16 structural structural JJ 9878 37 17 and and CC 9878 37 18 semantic semantic JJ 9878 37 19 features feature NNS 9878 37 20 such such JJ 9878 37 21 as as IN 9878 37 22 document document NN 9878 37 23 structure structure NN 9878 37 24 ) ) -RRB- 9878 37 25 . . . 9878 38 1 One one PRP 9878 38 2 can can MD 9878 38 3 add add VB 9878 38 4 tags tag NNS 9878 38 5 to to TO 9878 38 6 convert convert VB 9878 38 7 PDF/2b pdf/2b NN 9878 38 8 to to IN 9878 38 9 PDF/2a pdf/2a XX 9878 38 10 . . . 
9878 39 1 ● ● NFP 9878 39 2 u u NNP 9878 39 3 represents represent VBZ 9878 39 4 a a DT 9878 39 5 conformance conformance NN 9878 39 6 level level NN 9878 39 7 with with IN 9878 39 8 the the DT 9878 39 9 additional additional JJ 9878 39 10 requirement requirement NN 9878 39 11 that that IN 9878 39 12 all all DT 9878 39 13 text text NN 9878 39 14 in in IN 9878 39 15 the the DT 9878 39 16 document document NN 9878 39 17 have have VBP 9878 39 18 Unicode Unicode NNP 9878 39 19 equivalents equivalent NNS 9878 39 20 . . . 9878 40 1 This this DT 9878 40 2 article article NN 9878 40 3 does do VBZ 9878 40 4 not not RB 9878 40 5 cover cover VB 9878 40 6 any any DT 9878 40 7 post post NN 9878 40 8 - - JJ 9878 40 9 processing processing NN 9878 40 10 of of IN 9878 40 11 additional additional JJ 9878 40 12 manual manual JJ 9878 40 13 or or CC 9878 40 14 computational computational JJ 9878 40 15 features feature NNS 9878 40 16 such such JJ 9878 40 17 as as IN 9878 40 18 adding add VBG 9878 40 19 OCR ocr NN 9878 40 20 text text NN 9878 40 21 to to IN 9878 40 22 the the DT 9878 40 23 generated generate VBN 9878 40 24 PDF PDF NNP 9878 40 25 / / , 9878 40 26 A a DT 9878 40 27 files file NNS 9878 40 28 . . . 9878 41 1 These these DT 9878 41 2 features feature NNS 9878 41 3 do do VBP 9878 41 4 not not RB 9878 41 5 help help VB 9878 41 6 faithfully faithfully RB 9878 41 7 capture capture VB 9878 41 8 the the DT 9878 41 9 look look NN 9878 41 10 and and CC 9878 41 11 feel feel NN 9878 41 12 of of IN 9878 41 13 original original JJ 9878 41 14 pages page NNS 9878 41 15 in in IN 9878 41 16 digitization digitization NN 9878 41 17 , , , 9878 41 18 and and CC 9878 41 19 they -PRON- PRP 9878 41 20 can can MD 9878 41 21 be be VB 9878 41 22 added add VBN 9878 41 23 or or CC 9878 41 24 updated update VBN 9878 41 25 later later RB 9878 41 26 without without IN 9878 41 27 any any DT 9878 41 28 loss loss NN 9878 41 29 of of IN 9878 41 30 information information NN 9878 41 31 . . . 
9878 42 1 In in IN 9878 42 2 addition addition NN 9878 42 3 , , , 9878 42 4 OCR ocr NN 9878 42 5 results result NNS 9878 42 6 rely rely VBP 9878 42 7 on on IN 9878 42 8 the the DT 9878 42 9 availability availability NN 9878 42 10 of of IN 9878 42 11 OCR ocr NN 9878 42 12 engines engine NNS 9878 42 13 for for IN 9878 42 14 the the DT 9878 42 15 document document NN 9878 42 16 ’s ’s POS 9878 42 17 language language NN 9878 42 18 , , , 9878 42 19 and and CC 9878 42 20 results result NNS 9878 42 21 can can MD 9878 42 22 vary vary VB 9878 42 23 between between IN 9878 42 24 different different JJ 9878 42 25 OCR ocr NN 9878 42 26 engines engine NNS 9878 42 27 over over IN 9878 42 28 time time NN 9878 42 29 . . . 9878 43 1 OCR ocr NN 9878 43 2 technology technology NN 9878 43 3 is be VBZ 9878 43 4 getting get VBG 9878 43 5 better well JJR 9878 43 6 and and CC 9878 43 7 will will MD 9878 43 8 produce produce VB 9878 43 9 better well JJR 9878 43 10 results result NNS 9878 43 11 in in IN 9878 43 12 the the DT 9878 43 13 future future NN 9878 43 14 . . . 9878 44 1 For for IN 9878 44 2 example example NN 9878 44 3 , , , 9878 44 4 current current JJ 9878 44 5 OCR ocr NN 9878 44 6 technology technology NN 9878 44 7 for for IN 9878 44 8 English English NNP 9878 44 9 gives give VBZ 9878 44 10 very very RB 9878 44 11 reliable reliable JJ 9878 44 12 ( ( -LRB- 9878 44 13 more more JJR 9878 44 14 than than IN 9878 44 15 90 90 CD 9878 44 16 percent percent NN 9878 44 17 ) ) -RRB- 9878 44 18 accuracy accuracy NN 9878 44 19 . . . 
9878 45 1 In in IN 9878 45 2 comparison comparison NN 9878 45 3 , , , 9878 45 4 traditional traditional JJ 9878 45 5 Chinese chinese JJ 9878 45 6 manuscripts manuscript NNS 9878 45 7 and and CC 9878 45 8 Pashto Pashto NNP 9878 45 9 / / SYM 9878 45 10 Persian Persian NNP 9878 45 11 give give VBP 9878 45 12 unacceptably unacceptably RB 9878 45 13 low low JJ 9878 45 14 accuracy accuracy NN 9878 45 15 ( ( -LRB- 9878 45 16 less less JJR 9878 45 17 than than IN 9878 45 18 60 60 CD 9878 45 19 percent percent NN 9878 45 20 ) ) -RRB- 9878 45 21 . . . 9878 46 1 The the DT 9878 46 2 cutting cutting JJ 9878 46 3 edge edge NN 9878 46 4 on on IN 9878 46 5 OCR ocr NN 9878 46 6 engines engine NNS 9878 46 7 has have VBZ 9878 46 8 started start VBN 9878 46 9 to to TO 9878 46 10 utilize utilize VB 9878 46 11 artificial artificial JJ 9878 46 12 intelligence intelligence NN 9878 46 13 networks network NNS 9878 46 14 , , , 9878 46 15 and and CC 9878 46 16 the the DT 9878 46 17 authors author NNS 9878 46 18 believe believe VBP 9878 46 19 that that IN 9878 46 20 a a DT 9878 46 21 breakthrough breakthrough NN 9878 46 22 will will MD 9878 46 23 happen happen VB 9878 46 24 soon soon RB 9878 46 25 . . . 
9878 47 1 Data Data NNP 9878 47 2 Source Source NNP 9878 47 3 The the DT 9878 47 4 University University NNP 9878 47 5 of of IN 9878 47 6 Arizona Arizona NNP 9878 47 7 Libraries Libraries NNPS 9878 47 8 ( ( -LRB- 9878 47 9 UAL UAL NNP 9878 47 10 ) ) -RRB- 9878 47 11 and and CC 9878 47 12 Afghanistan Afghanistan NNP 9878 47 13 Center Center NNP 9878 47 14 at at IN 9878 47 15 Kabul Kabul NNP 9878 47 16 University University NNP 9878 47 17 ( ( -LRB- 9878 47 18 ACKU ACKU NNP 9878 47 19 ) ) -RRB- 9878 47 20 have have VBP 9878 47 21 been be VBN 9878 47 22 partnering partner VBG 9878 47 23 to to TO 9878 47 24 digitize digitize VB 9878 47 25 and and CC 9878 47 26 preserve preserve VB 9878 47 27 ACKU ACKU NNP 9878 47 28 ’s ’s POS 9878 47 29 permanent permanent JJ 9878 47 30 collection collection NN 9878 47 31 held hold VBN 9878 47 32 in in IN 9878 47 33 Kabul Kabul NNP 9878 47 34 . . . 9878 48 1 This this DT 9878 48 2 collaborative collaborative JJ 9878 48 3 project project NN 9878 48 4 created create VBD 9878 48 5 the the DT 9878 48 6 largest large JJS 9878 48 7 Afghan afghan JJ 9878 48 8 digital digital JJ 9878 48 9 repository repository NN 9878 48 10 in in IN 9878 48 11 the the DT 9878 48 12 world world NN 9878 48 13 . . . 9878 49 1 Currently currently RB 9878 49 2 the the DT 9878 49 3 Afghan afghan JJ 9878 49 4 digital digital JJ 9878 49 5 repository repository NN 9878 49 6 ( ( -LRB- 9878 49 7 http://www.afghandata.org http://www.afghandata.org NNP 9878 49 8 ) ) -RRB- 9878 49 9 contains contain VBZ 9878 49 10 more more JJR 9878 49 11 than than IN 9878 49 12 fifteen fifteen CD 9878 49 13 thousand thousand CD 9878 49 14 titles title NNS 9878 49 15 and and CC 9878 49 16 1.6 1.6 CD 9878 49 17 million million CD 9878 49 18 pages page NNS 9878 49 19 of of IN 9878 49 20 documents document NNS 9878 49 21 . . . 
9878 50 1 Digitization digitization NN 9878 50 2 of of IN 9878 50 3 these these DT 9878 50 4 text text NN 9878 50 5 documents document NNS 9878 50 6 follows follow VBZ 9878 50 7 the the DT 9878 50 8 previous previous JJ 9878 50 9 version version NN 9878 50 10 of of IN 9878 50 11 the the DT 9878 50 12 FADGI FADGI NNP 9878 50 13 guideline guideline NN 9878 50 14 , , , 9878 50 15 which which WDT 9878 50 16 recommended recommend VBD 9878 50 17 scanning scanning NN 9878 50 18 each each DT 9878 50 19 page page NN 9878 50 20 of of IN 9878 50 21 a a DT 9878 50 22 text text NN 9878 50 23 document document NN 9878 50 24 into into IN 9878 50 25 a a DT 9878 50 26 separate separate JJ 9878 50 27 TIFF TIFF NNP 9878 50 28 file file NN 9878 50 29 as as IN 9878 50 30 the the DT 9878 50 31 master master NN 9878 50 32 file file NN 9878 50 33 . . . 9878 51 1 These these DT 9878 51 2 TIFFs TIFFs NNPS 9878 51 3 were be VBD 9878 51 4 organized organize VBN 9878 51 5 by by IN 9878 51 6 directories directory NNS 9878 51 7 in in IN 9878 51 8 a a DT 9878 51 9 file file NN 9878 51 10 system system NN 9878 51 11 , , , 9878 51 12 where where WRB 9878 51 13 each each DT 9878 51 14 directory directory NN 9878 51 15 represents represent VBZ 9878 51 16 a a DT 9878 51 17 corresponding corresponding JJ 9878 51 18 document document NN 9878 51 19 containing contain VBG 9878 51 20 all all PDT 9878 51 21 the the DT 9878 51 22 scanned scan VBN 9878 51 23 pages page NNS 9878 51 24 of of IN 9878 51 25 this this DT 9878 51 26 title title NN 9878 51 27 . . . 9878 52 1 An an DT 9878 52 2 example example NN 9878 52 3 of of IN 9878 52 4 the the DT 9878 52 5 directory directory NN 9878 52 6 structure structure NN 9878 52 7 can can MD 9878 52 8 be be VB 9878 52 9 found find VBN 9878 52 10 in in IN 9878 52 11 Han Han NNP 9878 52 12 ’s ’s POS 9878 52 13 article article NN 9878 52 14 . . . 
9878 53 1 http://www.afghandata.org/ http://www.afghandata.org/ NFP 9878 53 2 DIGITIZATION DIGITIZATION NNS 9878 53 3 OF of IN 9878 53 4 TEXT text NN 9878 53 5 DOCUMENTS documents NN 9878 53 6 USING USING NNP 9878 53 7 PDF PDF NNP 9878 53 8 / / SYM 9878 53 9 A A NNP 9878 53 10 | | NNP 9878 53 11 HAN HAN NNP 9878 53 12 AND and CC 9878 53 13 WAN WAN NNP 9878 53 14 55 55 CD 9878 53 15 HTTPS://DOI.ORG/10.6017 HTTPS://DOI.ORG/10.6017 NNS 9878 53 16 / / SYM 9878 53 17 ITAL.V37I1.9878 ITAL.V37I1.9878 NNP 9878 53 18 PDF PDF NNP 9878 53 19 / / SYM 9878 53 20 A A NNP 9878 53 21 and and CC 9878 53 22 Image Image NNP 9878 53 23 Manipulation Manipulation NNP 9878 53 24 Tools Tools NNP 9878 53 25 There there EX 9878 53 26 are be VBP 9878 53 27 a a DT 9878 53 28 few few JJ 9878 53 29 open open JJ 9878 53 30 source source NN 9878 53 31 and and CC 9878 53 32 proprietary proprietary JJ 9878 53 33 PDF PDF NNP 9878 53 34 software software NN 9878 53 35 development development NN 9878 53 36 kits kit NNS 9878 53 37 ( ( -LRB- 9878 53 38 SDK SDK NNP 9878 53 39 ) ) -RRB- 9878 53 40 . . . 9878 54 1 Adobe Adobe NNP 9878 54 2 PDF PDF NNP 9878 54 3 Library Library NNP 9878 54 4 and and CC 9878 54 5 Foxit Foxit NNP 9878 54 6 SDK SDK NNP 9878 54 7 are be VBP 9878 54 8 the the DT 9878 54 9 most most RBS 9878 54 10 well well RB 9878 54 11 - - HYPH 9878 54 12 known know VBN 9878 54 13 commercial commercial JJ 9878 54 14 tools tool NNS 9878 54 15 to to TO 9878 54 16 manipulate manipulate VB 9878 54 17 PDFs pdf NNS 9878 54 18 . . . 
9878 55 1 To to TO 9878 55 2 show show VB 9878 55 3 readers reader NNS 9878 55 4 that that IN 9878 55 5 they -PRON- PRP 9878 55 6 can can MD 9878 55 7 manipulate manipulate VB 9878 55 8 and and CC 9878 55 9 generate generate VB 9878 55 10 PDF PDF NNP 9878 55 11 / / , 9878 55 12 A a DT 9878 55 13 documents document NNS 9878 55 14 themselves -PRON- PRP 9878 55 15 , , , 9878 55 16 open open JJ 9878 55 17 source source NN 9878 55 18 software software NN 9878 55 19 , , , 9878 55 20 rather rather RB 9878 55 21 than than IN 9878 55 22 commercial commercial JJ 9878 55 23 tools tool NNS 9878 55 24 , , , 9878 55 25 was be VBD 9878 55 26 used use VBN 9878 55 27 . . . 9878 56 1 Currently currently RB 9878 56 2 , , , 9878 56 3 only only RB 9878 56 4 a a DT 9878 56 5 very very RB 9878 56 6 limited limited JJ 9878 56 7 number number NN 9878 56 8 of of IN 9878 56 9 open open JJ 9878 56 10 source source NN 9878 56 11 PDF PDF NNP 9878 56 12 SDKs sdk NNS 9878 56 13 are be VBP 9878 56 14 available available JJ 9878 56 15 , , , 9878 56 16 including include VBG 9878 56 17 iText iText NNP 9878 56 18 and and CC 9878 56 19 PDFBox PDFBox NNP 9878 56 20 . . . 9878 57 1 iText iText NNP 9878 57 2 was be VBD 9878 57 3 chosen choose VBN 9878 57 4 because because IN 9878 57 5 it -PRON- PRP 9878 57 6 has have VBZ 9878 57 7 good good JJ 9878 57 8 documentation documentation NN 9878 57 9 and and CC 9878 57 10 provides provide VBZ 9878 57 11 a a DT 9878 57 12 well well RB 9878 57 13 - - HYPH 9878 57 14 built build VBN 9878 57 15 set set NN 9878 57 16 of of IN 9878 57 17 APIs api NNS 9878 57 18 to to TO 9878 57 19 support support VB 9878 57 20 almost almost RB 9878 57 21 all all PDT 9878 57 22 the the DT 9878 57 23 PDF PDF NNP 9878 57 24 and and CC 9878 57 25 PDF PDF NNP 9878 57 26 / / SYM 9878 57 27 A A NNP 9878 57 28 features feature NNS 9878 57 29 . . .
9878 58 1 Initially initially RB 9878 58 2 written write VBN 9878 58 3 by by IN 9878 58 4 Bruno Bruno NNP 9878 58 5 Lowagie Lowagie NNP 9878 58 6 ( ( -LRB- 9878 58 7 who who WP 9878 58 8 was be VBD 9878 58 9 in in IN 9878 58 10 the the DT 9878 58 11 ISO iso NN 9878 58 12 PDF PDF NNP 9878 58 13 standard standard NN 9878 58 14 working work VBG 9878 58 15 group group NN 9878 58 16 ) ) -RRB- 9878 58 17 in in IN 9878 58 18 1998 1998 CD 9878 58 19 as as IN 9878 58 20 an an DT 9878 58 21 in in IN 9878 58 22 - - HYPH 9878 58 23 house house NN 9878 58 24 project project NN 9878 58 25 , , , 9878 58 26 Lowagie Lowagie NNP 9878 58 27 later later RB 9878 58 28 started start VBD 9878 58 29 up up RP 9878 58 30 his -PRON- PRP$ 9878 58 31 own own JJ 9878 58 32 company company NN 9878 58 33 , , , 9878 58 34 iText iText NNP 9878 58 35 , , , 9878 58 36 and and CC 9878 58 37 published publish VBD 9878 58 38 iText iText NNP 9878 58 39 in in IN 9878 58 40 Action Action NNP 9878 58 41 with with IN 9878 58 42 many many JJ 9878 58 43 code code NN 9878 58 44 examples.6 examples.6 NN 9878 58 45 Moreover moreover RB 9878 58 46 , , , 9878 58 47 iText iText NNP 9878 58 48 has have VBZ 9878 58 49 Java Java NNP 9878 58 50 and and CC 9878 58 51 C C NNP 9878 58 52 # # $ 9878 58 53 coding code VBG 9878 58 54 options option NNS 9878 58 55 with with IN 9878 58 56 good good JJ 9878 58 57 code code NN 9878 58 58 documentation documentation NN 9878 58 59 . . . 9878 59 1 It -PRON- PRP 9878 59 2 is be VBZ 9878 59 3 worth worth JJ 9878 59 4 mentioning mention VBG 9878 59 5 that that IN 9878 59 6 iText iText NNP 9878 59 7 has have VBZ 9878 59 8 different different JJ 9878 59 9 versions version NNS 9878 59 10 . . . 9878 60 1 The the DT 9878 60 2 author author NN 9878 60 3 used use VBD 9878 60 4 iText iText NNP 9878 60 5 5.5.10 5.5.10 CD 9878 60 6 and and CC 9878 60 7 5.4.4 5.4.4 CD 9878 60 8 . . . 
9878 61 1 Using use VBG 9878 61 2 an an DT 9878 61 3 older old JJR 9878 61 4 version version NN 9878 61 5 in in IN 9878 61 6 our -PRON- PRP$ 9878 61 7 implementation implementation NN 9878 61 8 generated generate VBD 9878 61 9 a a DT 9878 61 10 non non JJ 9878 61 11 - - JJ 9878 61 12 compatible compatible JJ 9878 61 13 PDF PDF NNP 9878 61 14 / / , 9878 61 15 A a DT 9878 61 16 file file NN 9878 61 17 because because IN 9878 61 18 the the DT 9878 61 19 it -PRON- PRP 9878 61 20 was be VBD 9878 61 21 not not RB 9878 61 22 aligned align VBN 9878 61 23 with with IN 9878 61 24 the the DT 9878 61 25 PDF PDF NNP 9878 61 26 / / SYM 9878 61 27 A a NN 9878 61 28 standard.7 standard.7 CD 9878 61 29 For for IN 9878 61 30 image image NN 9878 61 31 processing processing NN 9878 61 32 , , , 9878 61 33 there there EX 9878 61 34 were be VBD 9878 61 35 a a DT 9878 61 36 few few JJ 9878 61 37 popular popular JJ 9878 61 38 open open JJ 9878 61 39 source source NN 9878 61 40 options option NNS 9878 61 41 , , , 9878 61 42 including include VBG 9878 61 43 ImageMagick ImageMagick NNP 9878 61 44 and and CC 9878 61 45 GIMP GIMP NNP 9878 61 46 . . . 9878 62 1 ImageMagick ImageMagick NNP 9878 62 2 was be VBD 9878 62 3 chosen choose VBN 9878 62 4 because because IN 9878 62 5 of of IN 9878 62 6 its -PRON- PRP$ 9878 62 7 popularity popularity NN 9878 62 8 , , , 9878 62 9 stability stability NN 9878 62 10 , , , 9878 62 11 and and CC 9878 62 12 cross cross JJ 9878 62 13 - - JJ 9878 62 14 platform platform JJ 9878 62 15 implementation implementation NN 9878 62 16 . . . 
9878 63 1 Our -PRON- PRP$ 9878 63 2 implementation implementation NN 9878 63 3 identified identify VBD 9878 63 4 one one CD 9878 63 5 issue issue NN 9878 63 6 with with IN 9878 63 7 ImageMagick ImageMagick NNP 9878 63 8 : : : 9878 63 9 the the DT 9878 63 10 current current JJ 9878 63 11 version version NN 9878 63 12 ( ( -LRB- 9878 63 13 7.0.4 7.0.4 NNP 9878 63 14 ) ) -RRB- 9878 63 15 could could MD 9878 63 16 not not RB 9878 63 17 retrieve retrieve VB 9878 63 18 all all PDT 9878 63 19 the the DT 9878 63 20 metadata metadata NN 9878 63 21 from from IN 9878 63 22 TIFF TIFF NNP 9878 63 23 files file NNS 9878 63 24 as as IN 9878 63 25 it -PRON- PRP 9878 63 26 did do VBD 9878 63 27 not not RB 9878 63 28 extract extract VB 9878 63 29 certain certain JJ 9878 63 30 information information NN 9878 63 31 such such JJ 9878 63 32 as as IN 9878 63 33 the the DT 9878 63 34 Image Image NNP 9878 63 35 File File NNP 9878 63 36 Directory directory NN 9878 63 37 and and CC 9878 63 38 color color NN 9878 63 39 profile profile NN 9878 63 40 . . . 9878 64 1 These these DT 9878 64 2 metadata metadata NN 9878 64 3 are be VBP 9878 64 4 critical critical JJ 9878 64 5 because because IN 9878 64 6 they -PRON- PRP 9878 64 7 are be VBP 9878 64 8 part part NN 9878 64 9 of of IN 9878 64 10 the the DT 9878 64 11 original original JJ 9878 64 12 data datum NNS 9878 64 13 from from IN 9878 64 14 digitization digitization NN 9878 64 15 . . . 
9878 65 1 Unfortunately unfortunately RB 9878 65 2 , , , 9878 65 3 the the DT 9878 65 4 author author NN 9878 65 5 observed observe VBD 9878 65 6 that that IN 9878 65 7 some some DT 9878 65 8 image image NN 9878 65 9 editors editor NNS 9878 65 10 were be VBD 9878 65 11 unable unable JJ 9878 65 12 to to TO 9878 65 13 preserve preserve VB 9878 65 14 all all PDT 9878 65 15 the the DT 9878 65 16 metadata metadata NN 9878 65 17 from from IN 9878 65 18 the the DT 9878 65 19 image image NN 9878 65 20 files file NNS 9878 65 21 during during IN 9878 65 22 the the DT 9878 65 23 conversion conversion NN 9878 65 24 process process NN 9878 65 25 . . . 9878 66 1 Hart Hart NNP 9878 66 2 and and CC 9878 66 3 De De NNP 9878 66 4 Vries Vries NNP 9878 66 5 used use VBD 9878 66 6 case case NN 9878 66 7 studies study NNS 9878 66 8 to to TO 9878 66 9 show show VB 9878 66 10 the the DT 9878 66 11 vulnerability vulnerability NN 9878 66 12 of of IN 9878 66 13 metadata metadata NN 9878 66 14 , , , 9878 66 15 demonstrating demonstrate VBG 9878 66 16 metadata metadata NN 9878 66 17 elements element NNS 9878 66 18 in in IN 9878 66 19 a a DT 9878 66 20 digital digital JJ 9878 66 21 object object NN 9878 66 22 can can MD 9878 66 23 be be VB 9878 66 24 lost lose VBN 9878 66 25 and and CC 9878 66 26 corrupted corrupt VBN 9878 66 27 by by IN 9878 66 28 use use NN 9878 66 29 or or CC 9878 66 30 conversion conversion NN 9878 66 31 of of IN 9878 66 32 a a DT 9878 66 33 file file NN 9878 66 34 to to IN 9878 66 35 another another DT 9878 66 36 format format NN 9878 66 37 . . .
9878 67 1 They -PRON- PRP 9878 67 2 suggested suggest VBD 9878 67 3 that that IN 9878 67 4 action action NN 9878 67 5 is be VBZ 9878 67 6 needed need VBN 9878 67 7 to to TO 9878 67 8 ensure ensure VB 9878 67 9 proper proper JJ 9878 67 10 metadata metadata NN 9878 67 11 creation creation NN 9878 67 12 and and CC 9878 67 13 preservation preservation NN 9878 67 14 so so IN 9878 67 15 that that IN 9878 67 16 all all DT 9878 67 17 types type NNS 9878 67 18 of of IN 9878 67 19 metadata metadata NN 9878 67 20 must must MD 9878 67 21 be be VB 9878 67 22 captured capture VBN 9878 67 23 and and CC 9878 67 24 preserved preserve VBN 9878 67 25 to to TO 9878 67 26 achieve achieve VB 9878 67 27 the the DT 9878 67 28 most most RBS 9878 67 29 authentic authentic JJ 9878 67 30 , , , 9878 67 31 consistent consistent JJ 9878 67 32 , , , 9878 67 33 and and CC 9878 67 34 complete complete JJ 9878 67 35 digital digital JJ 9878 67 36 preservation preservation NN 9878 67 37 for for IN 9878 67 38 future future JJ 9878 67 39 use.8 use.8 CD 9878 67 40 Metadata Metadata NNP 9878 67 41 Extraction Extraction NNP 9878 67 42 Tools Tools NNPS 9878 67 43 and and CC 9878 67 44 Color Color NNP 9878 67 45 Profiles Profiles NNPS 9878 67 46 As as IN 9878 67 47 we -PRON- PRP 9878 67 48 digitize digitize VBP 9878 67 49 physical physical JJ 9878 67 50 documents document NNS 9878 67 51 and and CC 9878 67 52 manipulate manipulate NN 9878 67 53 images image NNS 9878 67 54 , , , 9878 67 55 color color NN 9878 67 56 management management NN 9878 67 57 is be VBZ 9878 67 58 important important JJ 9878 67 59 . . . 
9878 68 1 The the DT 9878 68 2 goal goal NN 9878 68 3 of of IN 9878 68 4 color color NN 9878 68 5 management management NN 9878 68 6 is be VBZ 9878 68 7 to to TO 9878 68 8 obtain obtain VB 9878 68 9 a a DT 9878 68 10 controlled control VBN 9878 68 11 conversion conversion NN 9878 68 12 between between IN 9878 68 13 the the DT 9878 68 14 color color NN 9878 68 15 representations representation NNS 9878 68 16 of of IN 9878 68 17 various various JJ 9878 68 18 devices device NNS 9878 68 19 such such JJ 9878 68 20 as as IN 9878 68 21 image image NN 9878 68 22 scanners scanner NNS 9878 68 23 , , , 9878 68 24 digital digital JJ 9878 68 25 cameras camera NNS 9878 68 26 , , , 9878 68 27 and and CC 9878 68 28 monitors monitor NNS 9878 68 29 . . . 9878 69 1 A a DT 9878 69 2 color color NN 9878 69 3 profile profile NN 9878 69 4 is be VBZ 9878 69 5 a a DT 9878 69 6 set set NN 9878 69 7 of of IN 9878 69 8 data datum NNS 9878 69 9 that that WDT 9878 69 10 control control VBP 9878 69 11 input input NN 9878 69 12 and and CC 9878 69 13 output output NN 9878 69 14 of of IN 9878 69 15 a a DT 9878 69 16 color color NN 9878 69 17 space space NN 9878 69 18 . . . 
9878 70 1 The the DT 9878 70 2 International International NNP 9878 70 3 Color Color NNP 9878 70 4 Consortium Consortium NNP 9878 70 5 ( ( -LRB- 9878 70 6 ICC ICC NNP 9878 70 7 ) ) -RRB- 9878 70 8 standards standard NNS 9878 70 9 and and CC 9878 70 10 profiles profile NNS 9878 70 11 were be VBD 9878 70 12 created create VBN 9878 70 13 to to TO 9878 70 14 bring bring VB 9878 70 15 various various JJ 9878 70 16 manufacturers manufacturer NNS 9878 70 17 together together RB 9878 70 18 because because IN 9878 70 19 embedding embed VBG 9878 70 20 color color NN 9878 70 21 profiles profile NNS 9878 70 22 into into IN 9878 70 23 images image NNS 9878 70 24 is be VBZ 9878 70 25 one one CD 9878 70 26 of of IN 9878 70 27 the the DT 9878 70 28 most most RBS 9878 70 29 important important JJ 9878 70 30 color color NN 9878 70 31 management management NN 9878 70 32 solutions solution NNS 9878 70 33 . . . 9878 71 1 Image image NN 9878 71 2 formats format NNS 9878 71 3 such such JJ 9878 71 4 as as IN 9878 71 5 TIFF TIFF NNP 9878 71 6 and and CC 9878 71 7 JPEG2000 JPEG2000 NNP 9878 71 8 and and CC 9878 71 9 document document NN 9878 71 10 formats format NNS 9878 71 11 such such JJ 9878 71 12 as as IN 9878 71 13 PDF PDF NNP 9878 71 14 may may MD 9878 71 15 contain contain VB 9878 71 16 embedded embed VBN 9878 71 17 color color NN 9878 71 18 profiles profile NNS 9878 71 19 . . . 9878 72 1 The the DT 9878 72 2 authors author NNS 9878 72 3 identified identify VBD 9878 72 4 a a DT 9878 72 5 few few JJ 9878 72 6 open open JJ 9878 72 7 source source NN 9878 72 8 tools tool NNS 9878 72 9 to to TO 9878 72 10 extract extract VB 9878 72 11 TIFF TIFF NNP 9878 72 12 metadata metadata NN 9878 72 13 , , , 9878 72 14 including include VBG 9878 72 15 ExifTool ExifTool NNP 9878 72 16 , , , 9878 72 17 Exiv2 exiv2 NN 9878 72 18 , , , 9878 72 19 and and CC 9878 72 20 tiffInfo tiffinfo NN 9878 72 21 . . .
9878 73 1 ExifTool ExifTool NNP 9878 73 2 is be VBZ 9878 73 3 an an DT 9878 73 4 open open JJ 9878 73 5 source source NN 9878 73 6 tool tool NN 9878 73 7 for for IN 9878 73 8 reading reading NN 9878 73 9 , , , 9878 73 10 writing writing NN 9878 73 11 , , , 9878 73 12 and and CC 9878 73 13 manipulating manipulate VBG 9878 73 14 metadata metadata NN 9878 73 15 of of IN 9878 73 16 media medium NNS 9878 73 17 files file NNS 9878 73 18 . . . 9878 74 1 Exiv2 Exiv2 NNP 9878 74 2 is be VBZ 9878 74 3 another another DT 9878 74 4 free free JJ 9878 74 5 metadata metadata NN 9878 74 6 tool tool NN 9878 74 7 supporting support VBG 9878 74 8 different different JJ 9878 74 9 image image NN 9878 74 10 formats format NNS 9878 74 11 . . . 9878 75 1 The the DT 9878 75 2 tiffInfo tiffinfo NN 9878 75 3 program program NN 9878 75 4 is be VBZ 9878 75 5 widely widely RB 9878 75 6 used use VBN 9878 75 7 in in IN 9878 75 8 the the DT 9878 75 9 Linux Linux NNP 9878 75 10 platform platform NN 9878 75 11 , , , 9878 75 12 but but CC 9878 75 13 it -PRON- PRP 9878 75 14 has have VBZ 9878 75 15 not not RB 9878 75 16 been be VBN 9878 75 17 updated update VBN 9878 75 18 for for IN 9878 75 19 at at RB 9878 75 20 least least JJS 9878 75 21 ten ten CD 9878 75 22 years year NNS 9878 75 23 . . . 9878 76 1 Our -PRON- PRP$ 9878 76 2 implementations implementation NNS 9878 76 3 showed show VBD 9878 76 4 that that IN 9878 76 5 ExifTool ExifTool NNP 9878 76 6 was be VBD 9878 76 7 the the DT 9878 76 8 one one NN 9878 76 9 that that WDT 9878 76 10 most most RBS 9878 76 11 easily easily RB 9878 76 12 extracted extract VBD 9878 76 13 the the DT 9878 76 14 full full JJ 9878 76 15 ICC ICC NNP 9878 76 16 profiles profile NNS 9878 76 17 and and CC 9878 76 18 other other JJ 9878 76 19 metadata metadata NN 9878 76 20 from from IN 9878 76 21 TIFF TIFF NNP 9878 76 22 and and CC 9878 76 23 JPEG2000 JPEG2000 NNP 9878 76 24 files file NNS 9878 76 25 . . . 
9878 77 1 ImageMagick ImageMagick NNP 9878 77 2 and and CC 9878 77 3 other other JJ 9878 77 4 image image NN 9878 77 5 processing processing NN 9878 77 6 software software NN 9878 77 7 were be VBD 9878 77 8 examined examine VBN 9878 77 9 in in IN 9878 77 10 Van Van NNP 9878 77 11 der der NN 9878 77 12 Knijff Knijff NNP 9878 77 13 ’s ’s POS 9878 77 14 article article NN 9878 77 15 discussing discuss VBG 9878 77 16 JPEG2000 JPEG2000 NNP 9878 77 17 for for IN 9878 77 18 long long JJ 9878 77 19 - - HYPH 9878 77 20 term term NN 9878 77 21 preservation.9 preservation.9 NN 9878 77 22 He -PRON- PRP 9878 77 23 found find VBD 9878 77 24 that that IN 9878 77 25 ICC ICC NNP 9878 77 26 profiles profile NNS 9878 77 27 were be VBD 9878 77 28 lost lose VBN 9878 77 29 in in IN 9878 77 30 ImageMagick ImageMagick NNP 9878 77 31 . . . 9878 78 1 Our -PRON- PRP$ 9878 78 2 implementation implementation NN 9878 78 3 has have VBZ 9878 78 4 showed show VBD 9878 78 5 the the DT 9878 78 6 current current JJ 9878 78 7 version version NN 9878 78 8 of of IN 9878 78 9 ImageMagick ImageMagick NNP 9878 78 10 has have VBZ 9878 78 11 fixed fix VBN 9878 78 12 this this DT 9878 78 13 issue issue NN 9878 78 14 . . . 9878 79 1 A a DT 9878 79 2 metadata metadata NN 9878 79 3 sample sample NN 9878 79 4 can can MD 9878 79 5 be be VB 9878 79 6 found find VBN 9878 79 7 in in IN 9878 79 8 appendix appendix NNP 9878 79 9 A. A.
NNP 9878 80 1 IMPLEMENTATION implementation NN 9878 80 2 Converting convert VBG 9878 80 3 and and CC 9878 80 4 Ordering ordering NN 9878 80 5 TIFFs TIFFs NNPS 9878 80 6 into into IN 9878 80 7 a a DT 9878 80 8 Single single JJ 9878 80 9 PDF PDF NNP 9878 80 10 / / SYM 9878 80 11 A-2 A-2 NNP 9878 80 12 File File NNP 9878 80 13 When when WRB 9878 80 14 ordering order VBG 9878 80 15 and and CC 9878 80 16 combining combine VBG 9878 80 17 all all DT 9878 80 18 individual individual JJ 9878 80 19 TIFFs TIFFs NNPS 9878 80 20 of of IN 9878 80 21 a a DT 9878 80 22 document document NN 9878 80 23 into into IN 9878 80 24 a a DT 9878 80 25 single single JJ 9878 80 26 PDF PDF NNP 9878 80 27 / / SYM 9878 80 28 A-2b A-2b NNP 9878 80 29 file file NN 9878 80 30 , , , 9878 80 31 the the DT 9878 80 32 authors author NNS 9878 80 33 intended intend VBN 9878 80 34 to to TO 9878 80 35 preserve preserve VB 9878 80 36 all all DT 9878 80 37 information information NN 9878 80 38 from from IN 9878 80 39 the the DT 9878 80 40 TIFFs TIFFs NNPS 9878 80 41 , , , 9878 80 42 including include VBG 9878 80 43 raster raster NN 9878 80 44 image image NN 9878 80 45 data datum NNS 9878 80 46 streams stream NNS 9878 80 47 and and CC 9878 80 48 metadata metadata NN 9878 80 49 stored store VBN 9878 80 50 in in IN 9878 80 51 each each DT 9878 80 52 TIFF TIFF NNP 9878 80 53 ’s ’s POS 9878 80 54 header header NN 9878 80 55 . . . 
9878 81 1 The the DT 9878 81 2 raster raster NN 9878 81 3 image image NN 9878 81 4 data datum NNS 9878 81 5 streams stream NNS 9878 81 6 are be VBP 9878 81 7 the the DT 9878 81 8 main main JJ 9878 81 9 images image NNS 9878 81 10 reflecting reflect VBG 9878 81 11 the the DT 9878 81 12 original original JJ 9878 81 13 look look NN 9878 81 14 and and CC 9878 81 15 feel feel VB 9878 81 16 of of IN 9878 81 17 these these DT 9878 81 18 pages page NNS 9878 81 19 , , , 9878 81 20 while while IN 9878 81 21 the the DT 9878 81 22 metadata metadata NN 9878 81 23 ( ( -LRB- 9878 81 24 including include VBG 9878 81 25 technical technical JJ 9878 81 26 and and CC 9878 81 27 administrative administrative JJ 9878 81 28 metadata metadata NN 9878 81 29 such such JJ 9878 81 30 as as IN 9878 81 31 BitsPerSample bitspersample RB 9878 81 32 , , , 9878 81 33 DateTime DateTime NNP 9878 81 34 , , , 9878 81 35 and and CC 9878 81 36 Make make VB 9878 81 37 / / SYM 9878 81 38 Model Model NNP 9878 81 39 / / SYM 9878 81 40 Software Software NNP 9878 81 41 ) ) -RRB- 9878 81 42 tells tell VBZ 9878 81 43 us -PRON- PRP 9878 81 44 important important JJ 9878 81 45 digitization digitization NN 9878 81 46 and and CC 9878 81 47 provenance provenance NN 9878 81 48 information information NN 9878 81 49 . . . 9878 82 1 Both both DT 9878 82 2 are be VBP 9878 82 3 critical critical JJ 9878 82 4 for for IN 9878 82 5 delivery delivery NN 9878 82 6 and and CC 9878 82 7 digital digital JJ 9878 82 8 preservation preservation NN 9878 82 9 . . . 9878 83 1 The the DT 9878 83 2 TIFF TIFF NNP 9878 83 3 images image NNS 9878 83 4 were be VBD 9878 83 5 first first RB 9878 83 6 converted convert VBN 9878 83 7 to to IN 9878 83 8 JPEG2000 JPEG2000 NNP 9878 83 9 with with IN 9878 83 10 lossless lossless JJ 9878 83 11 compression compression NN 9878 83 12 using use VBG 9878 83 13 the the DT 9878 83 14 open open JJ 9878 83 15 source source NN 9878 83 16 ImageMagick ImageMagick NNP 9878 83 17 software software NN 9878 83 18 . . 
. 9878 84 1 Our -PRON- PRP$ 9878 84 2 tests test NNS 9878 84 3 of of IN 9878 84 4 ImageMagick ImageMagick NNP 9878 84 5 demonstrated demonstrate VBD 9878 84 6 that that IN 9878 84 7 it -PRON- PRP 9878 84 8 can can MD 9878 84 9 handle handle VB 9878 84 10 different different JJ 9878 84 11 color color NN 9878 84 12 profiles profile NNS 9878 84 13 and and CC 9878 84 14 will will MD 9878 84 15 convert convert VB 9878 84 16 images image NNS 9878 84 17 correctly correctly RB 9878 84 18 if if IN 9878 84 19 the the DT 9878 84 20 original original JJ 9878 84 21 TIFF TIFF NNP 9878 84 22 comes come VBZ 9878 84 23 with with IN 9878 84 24 a a DT 9878 84 25 color color NN 9878 84 26 profile profile NN 9878 84 27 . . . 9878 85 1 This this DT 9878 85 2 gave give VBD 9878 85 3 us -PRON- PRP 9878 85 4 confidence confidence NN 9878 85 5 that that IN 9878 85 6 past past JJ 9878 85 7 concerns concern NNS 9878 85 8 about about IN 9878 85 9 JPEG2000 JPEG2000 NNP 9878 85 10 and and CC 9878 85 11 ImageMagick ImageMagick NNP 9878 85 12 had have VBD 9878 85 13 been be VBN 9878 85 14 resolved resolve VBN 9878 85 15 . . . 9878 86 1 These these DT 9878 86 2 images image NNS 9878 86 3 were be VBD 9878 86 4 then then RB 9878 86 5 properly properly RB 9878 86 6 sorted sort VBN 9878 86 7 into into IN 9878 86 8 their -PRON- PRP$ 9878 86 9 original original JJ 9878 86 10 order order NN 9878 86 11 and and CC 9878 86 12 combined combine VBN 9878 86 13 into into IN 9878 86 14 a a DT 9878 86 15 single single JJ 9878 86 16 PDF PDF NNP 9878 86 17 / / SYM 9878 86 18 A-2 A-2 NNP 9878 86 19 file file NN 9878 86 20 . . . 
9878 87 1 An an DT 9878 87 2 alternative alternative NN 9878 87 3 is be VBZ 9878 87 4 to to TO 9878 87 5 directly directly RB 9878 87 6 code code VB 9878 87 7 TIFF TIFF NNP 9878 87 8 ’s ’s POS 9878 87 9 image image NN 9878 87 10 data datum NNS 9878 87 11 stream stream NN 9878 87 12 into into IN 9878 87 13 a a DT 9878 87 14 PDF pdf NN 9878 87 15 / / , 9878 87 16 A a DT 9878 87 17 file file NN 9878 87 18 , , , 9878 87 19 but but CC 9878 87 20 this this DT 9878 87 21 approach approach NN 9878 87 22 would would MD 9878 87 23 miss miss VB 9878 87 24 one one CD 9878 87 25 benefit benefit NN 9878 87 26 of of IN 9878 87 27 PDF PDF NNP 9878 87 28 / / SYM 9878 87 29 A-2 A-2 NNP 9878 87 30 : : : 9878 87 31 tremendous tremendous JJ 9878 87 32 file file NN 9878 87 33 size size NN 9878 87 34 reduction reduction NN 9878 87 35 with with IN 9878 87 36 JPEG2000 JPEG2000 NNP 9878 87 37 . . . 9878 88 1 The the DT 9878 88 2 following follow VBG 9878 88 3 is be VBZ 9878 88 4 the the DT 9878 88 5 pseudocode pseudocode NN 9878 88 6 of of IN 9878 88 7 ordering ordering NN 9878 88 8 and and CC 9878 88 9 combining combine VBG 9878 88 10 all all PDT 9878 88 11 the the DT 9878 88 12 TIFFs TIFFs NNPS 9878 88 13 in in IN 9878 88 14 a a DT 9878 88 15 text text NN 9878 88 16 document document NN 9878 88 17 into into IN 9878 88 18 a a DT 9878 88 19 single single JJ 9878 88 20 PDF pdf NN 9878 88 21 / / SYM 9878 88 22 A- a- NN 9878 88 23 2 2 CD 9878 88 24 file file NN 9878 88 25 . . . 
9878 89 1 CreatePDFA2(queue CreatePDFA2(queue NNP 9878 89 2 TiffList TiffList NNP 9878 89 3 ) ) -RRB- 9878 89 4 { { -LRB- 9878 89 5 Create create VB 9878 89 6 an an DT 9878 89 7 empty empty JJ 9878 89 8 queue queue NN 9878 89 9 XMLQ XMLQ NNP 9878 89 10 ; ; : 9878 89 11 Create create VB 9878 89 12 an an DT 9878 89 13 empty empty JJ 9878 89 14 queue queue NN 9878 89 15 JP2Q jp2q NN 9878 89 16 ; ; : 9878 89 17 / / NFP 9878 89 18 * * NFP 9878 89 19 TiffFileList tifffilelist NN 9878 89 20 is be VBZ 9878 89 21 pre pre JJ 9878 89 22 - - JJ 9878 89 23 sorted sorted JJ 9878 89 24 queue queue NN 9878 89 25 based base VBN 9878 89 26 on on IN 9878 89 27 the the DT 9878 89 28 original original JJ 9878 89 29 order order NN 9878 89 30 * * NFP 9878 89 31 / / SYM 9878 89 32 / / SYM 9878 89 33 * * NFP 9878 89 34 Convert convert VB 9878 89 35 each each DT 9878 89 36 TIFF TIFF NNP 9878 89 37 to to IN 9878 89 38 JPEG2000 JPEG2000 NNP 9878 89 39 losslessly losslessly RB 9878 89 40 , , , 9878 89 41 then then RB 9878 89 42 add add VB 9878 89 43 each each DT 9878 89 44 JPEG2000 JPEG2000 NNP 9878 89 45 and and CC 9878 89 46 its -PRON- PRP$ 9878 89 47 metadata metadata NN 9878 89 48 into into IN 9878 89 49 a a DT 9878 89 50 queue queue NN 9878 89 51 * * NFP 9878 89 52 / / SYM 9878 89 53 while while IN 9878 89 54 ( ( -LRB- 9878 89 55 TiffList TiffList NNP 9878 89 56 is be VBZ 9878 89 57 NOT not RB 9878 89 58 empty empty JJ 9878 89 59 ) ) -RRB- 9878 89 60 { { -LRB- 9878 89 61 String String NNP 9878 89 62 TiffFilePath TiffFilePath NNP 9878 89 63 = = NFP 9878 89 64 TiffList.dequeue TiffList.dequeue NNP 9878 89 65 ( ( -LRB- 9878 89 66 ) ) -RRB- 9878 89 67 ; ; : 9878 89 68 string stre VBG 9878 89 69 xmlFilePath xmlfilepath NN 9878 89 70 = = NFP 9878 89 71 Tiff Tiff NNP 9878 89 72 metadata metadata NN 9878 89 73 extracted extract VBD 9878 89 74 using use VBG 9878 89 75 exiftool exiftool NN 9878 89 76 ; ; : 9878 89 77 XMLQ.enqueue(xmlFilePath XMLQ.enqueue(xmlFilePath NNP 9878 89 78 ) ) -RRB- 9878 89 
79 ; ; : 9878 89 80 String string NN 9878 89 81 jp2FilePath jp2filepath NN 9878 89 82 = = SYM 9878 89 83 JPEG2000 JPEG2000 NNP 9878 89 84 file file NN 9878 89 85 location location NN 9878 89 86 from from IN 9878 89 87 Tiff Tiff NNP 9878 89 88 converted convert VBN 9878 89 89 by by IN 9878 89 90 ImageMagick ImageMagick NNP 9878 89 91 ; ; : 9878 89 92 JP2Q.enqueue(jp2FilePath JP2Q.enqueue(jp2FilePath NNP 9878 89 93 ) ) -RRB- 9878 89 94 ; ; : 9878 89 95 } } -RRB- 9878 89 96 / / NFP 9878 89 97 * * NFP 9878 89 98 Convert convert VB 9878 89 99 each each DT 9878 89 100 image image NN 9878 89 101 ’s ’s POS 9878 89 102 metadata metadata NN 9878 89 103 to to IN 9878 89 104 XMP XMP NNP 9878 89 105 , , , 9878 89 106 add add VB 9878 89 107 each each DT 9878 89 108 JPEG2000 JPEG2000 NNP 9878 89 109 and and CC 9878 89 110 its -PRON- PRP$ 9878 89 111 metadata metadata NN 9878 89 112 into into IN 9878 89 113 the the DT 9878 89 114 PDF PDF NNP 9878 89 115 / / SYM 9878 89 116 A-2 A-2 NNP 9878 89 117 file file NN 9878 89 118 based base VBN 9878 89 119 on on IN 9878 89 120 its -PRON- PRP$ 9878 89 121 original original JJ 9878 89 122 order order NN 9878 89 123 * * NFP 9878 89 124 / / , 9878 89 125 Document document NN 9878 89 126 pdf2b pdf2b '' 9878 89 127 = = : 9878 89 128 new new JJ 9878 89 129 Document document NN 9878 89 130 ( ( -LRB- 9878 89 131 ) ) -RRB- 9878 89 132 ; ; : 9878 89 133 / / NFP 9878 89 134 * * NFP 9878 89 135 create create VBP 9878 89 136 PDF PDF NNP 9878 89 137 / / SYM 9878 89 138 A-2b A-2b NNP 9878 89 139 conformance conformance NN 9878 89 140 level level NN 9878 89 141 * * NFP 9878 89 142 / / SYM 9878 89 143 PdfAWriter pdfawriter NN 9878 89 144 writer writer NN 9878 89 145 = = SYM 9878 89 146 PdfAWriter.getInstance(doc PdfAWriter.getInstance(doc NNP 9878 89 147 , , , 9878 89 148 new new NNP 9878 89 149 FileOutputStream(PdfAFilePath),PdfAConformaceLevel FileOutputStream(PdfAFilePath),PdfAConformaceLevel . 9878 89 150 . . . 
9878 89 151 PDF_A_2B PDF_A_2B NNP 9878 89 152 ) ) -RRB- 9878 89 153 ; ; : 9878 89 154 writer.createXmpMetadata writer.createXmpMetadata NNP 9878 89 155 ( ( -LRB- 9878 89 156 ) ) -RRB- 9878 89 157 ; ; : 9878 89 158 //Create //Create NFP 9878 89 159 Root Root NNP 9878 89 160 XMP XMP NNP 9878 89 161 DIGITIZATION DIGITIZATION NNS 9878 89 162 OF of IN 9878 89 163 TEXT text NN 9878 89 164 DOCUMENTS documents NN 9878 89 165 USING USING NNP 9878 89 166 PDF PDF NNP 9878 89 167 / / SYM 9878 89 168 A A NNP 9878 89 169 | | NNP 9878 89 170 HAN HAN NNP 9878 89 171 AND and CC 9878 89 172 WAN WAN NNP 9878 89 173 57 57 CD 9878 89 174 HTTPS://DOI.ORG/10.6017 HTTPS://DOI.ORG/10.6017 NNS 9878 89 175 / / SYM 9878 89 176 ITAL.V37I1.9878 ITAL.V37I1.9878 NNP 9878 89 177 pdf2b.open pdf2b.open NN 9878 89 178 ( ( -LRB- 9878 89 179 ) ) -RRB- 9878 89 180 ; ; : 9878 89 181 while(JP2Q while(JP2Q NNP 9878 89 182 is be VBZ 9878 89 183 NOT not RB 9878 89 184 empty empty JJ 9878 89 185 ) ) -RRB- 9878 89 186 { { -LRB- 9878 89 187 Image image NN 9878 89 188 jp2 jp2 NN 9878 89 189 = = -RRB- 9878 89 190 Image.getInstance(JP2Q.dequeue Image.getInstance(JP2Q.dequeue NNP 9878 89 191 ( ( -LRB- 9878 89 192 ) ) -RRB- 9878 89 193 ) ) -RRB- 9878 89 194 ; ; : 9878 89 195 Rectangle Rectangle NNP 9878 89 196 size size NN 9878 89 197 = = SYM 9878 89 198 new new JJ 9878 89 199 Rectangle(jp2.getWidth Rectangle(jp2.getWidth NNP 9878 89 200 ( ( -LRB- 9878 89 201 ) ) -RRB- 9878 89 202 , , , 9878 89 203 jp2.getHeight jp2.getHeight NNP 9878 89 204 ( ( -LRB- 9878 89 205 ) ) -RRB- 9878 89 206 ) ) -RRB- 9878 89 207 ; ; : 9878 89 208 //PDF //PDF . 
9878 89 209 page page NN 9878 89 210 size size NN 9878 89 211 setting set VBG 9878 89 212 pdf2b.setPageSize(size pdf2b.setpagesize(size NN 9878 89 213 ) ) -RRB- 9878 89 214 ; ; : 9878 89 215 pdf2b.newPage pdf2b.newpage NN 9878 89 216 ( ( -LRB- 9878 89 217 ) ) -RRB- 9878 89 218 ; ; : 9878 89 219 // // SYM 9878 89 220 create create VBP 9878 89 221 a a DT 9878 89 222 new new JJ 9878 89 223 page page NN 9878 89 224 for for IN 9878 89 225 a a DT 9878 89 226 new new JJ 9878 89 227 image image NN 9878 89 228 byte byte NN 9878 89 229 [ [ -LRB- 9878 89 230 ] ] -RRB- 9878 89 231 bytearr bytearr NNP 9878 89 232 = = NFP 9878 89 233 XmpManipulation(XMLQ.dequeue XmpManipulation(XMLQ.dequeue NNP 9878 89 234 ( ( -LRB- 9878 89 235 ) ) -RRB- 9878 89 236 ) ) -RRB- 9878 89 237 ; ; : 9878 89 238 // // SYM 9878 89 239 convert convert VB 9878 89 240 original original JJ 9878 89 241 metadata metadata NN 9878 89 242 based base VBN 9878 89 243 on on IN 9878 89 244 the the DT 9878 89 245 XMP XMP NNP 9878 89 246 standard standard NN 9878 89 247 writer writer NN 9878 89 248 .setPageXmpMetadata(bytearr .setPageXmpMetadata(bytearr . 
9878 89 249 ) ) -RRB- 9878 89 250 ; ; : 9878 89 251 pdf2b.add(jp2 pdf2b.add(jp2 NNP 9878 89 252 ) ) -RRB- 9878 89 253 ; ; : 9878 89 254 } } -RRB- 9878 89 255 pdf2b.close pdf2b.close NN 9878 89 256 ( ( -LRB- 9878 89 257 ) ) -RRB- 9878 89 258 ; ; : 9878 89 259 } } -RRB- 9878 89 260 Converting convert VBG 9878 89 261 PDF PDF NNP 9878 89 262 / / SYM 9878 89 263 A-2 A-2 NNP 9878 89 264 Files file VBZ 9878 89 265 back back RB 9878 89 266 to to IN 9878 89 267 TIFFs TIFFs NNPS 9878 89 268 and and CC 9878 89 269 JPEG2000s jpeg2000s $ 9878 89 270 To to TO 9878 89 271 ensure ensure VB 9878 89 272 that that IN 9878 89 273 we -PRON- PRP 9878 89 274 can can MD 9878 89 275 extract extract VB 9878 89 276 raster raster NN 9878 89 277 images image NNS 9878 89 278 from from IN 9878 89 279 the the DT 9878 89 280 newly newly RB 9878 89 281 created create VBN 9878 89 282 PDF PDF NNP 9878 89 283 / / SYM 9878 89 284 A-2 A-2 NNP 9878 89 285 file file NN 9878 89 286 , , , 9878 89 287 the the DT 9878 89 288 authors author NNS 9878 89 289 also also RB 9878 89 290 wrote write VBD 9878 89 291 code code NN 9878 89 292 to to TO 9878 89 293 convert convert VB 9878 89 294 a a DT 9878 89 295 PDF PDF NNP 9878 89 296 / / SYM 9878 89 297 A-2 A-2 NNP 9878 89 298 file file NN 9878 89 299 back back RB 9878 89 300 to to IN 9878 89 301 the the DT 9878 89 302 original original JJ 9878 89 303 TIFF TIFF NNP 9878 89 304 or or CC 9878 89 305 JPEG2000 JPEG2000 NNP 9878 89 306 format format NN 9878 89 307 . . . 9878 90 1 This this DT 9878 90 2 implementation implementation NN 9878 90 3 was be VBD 9878 90 4 a a DT 9878 90 5 reverse reverse JJ 9878 90 6 process process NN 9878 90 7 of of IN 9878 90 8 the the DT 9878 90 9 above above JJ 9878 90 10 operation operation NN 9878 90 11 . . . 
9878 91 1 Once once IN 9878 91 2 the the DT 9878 91 3 reverse reverse JJ 9878 91 4 conversion conversion NN 9878 91 5 process process NN 9878 91 6 was be VBD 9878 91 7 completed complete VBN 9878 91 8 , , , 9878 91 9 the the DT 9878 91 10 authors author NNS 9878 91 11 verified verify VBD 9878 91 12 that that IN 9878 91 13 the the DT 9878 91 14 image image NN 9878 91 15 files file NNS 9878 91 16 created create VBN 9878 91 17 from from IN 9878 91 18 the the DT 9878 91 19 PDF PDF NNP 9878 91 20 / / SYM 9878 91 21 A-2 A-2 NNP 9878 91 22 file file NN 9878 91 23 were be VBD 9878 91 24 the the DT 9878 91 25 same same JJ 9878 91 26 as as IN 9878 91 27 before before IN 9878 91 28 the the DT 9878 91 29 conversion conversion NN 9878 91 30 to to IN 9878 91 31 PDF PDF NNP 9878 91 32 / / SYM 9878 91 33 A-2 A-2 NNP 9878 91 34 . . . 9878 92 1 Note note VB 9878 92 2 that that IN 9878 92 3 we -PRON- PRP 9878 92 4 generated generate VBD 9878 92 5 MD5 MD5 NNP 9878 92 6 checksums checksum NNS 9878 92 7 to to TO 9878 92 8 verify verify VB 9878 92 9 image image NN 9878 92 10 data datum NNS 9878 92 11 streams stream NNS 9878 92 12 . . . 9878 93 1 Images image NNS 9878 93 2 data datum NNS 9878 93 3 streams stream NNS 9878 93 4 are be VBP 9878 93 5 the the DT 9878 93 6 same same JJ 9878 93 7 , , , 9878 93 8 but but CC 9878 93 9 metadata metadata NN 9878 93 10 location location NN 9878 93 11 can can MD 9878 93 12 be be VB 9878 93 13 varied varied JJ 9878 93 14 because because IN 9878 93 15 of of IN 9878 93 16 inconsistent inconsistent JJ 9878 93 17 TIFF TIFF NNP 9878 93 18 tags tag NNS 9878 93 19 used use VBN 9878 93 20 over over IN 9878 93 21 the the DT 9878 93 22 years year NNS 9878 93 23 . . . 
9878 94 1 When when WRB 9878 94 2 converting convert VBG 9878 94 3 one one CD 9878 94 4 TIFF TIFF NNP 9878 94 5 to to IN 9878 94 6 another another DT 9878 94 7 TIFF TIFF NNP 9878 94 8 , , , 9878 94 9 ImageMagick ImageMagick NNP 9878 94 10 has have VBZ 9878 94 11 its -PRON- PRP$ 9878 94 12 implementation implementation NN 9878 94 13 of of IN 9878 94 14 metadata metadata NN 9878 94 15 tags tag NNS 9878 94 16 . . . 9878 95 1 The the DT 9878 95 2 code code NN 9878 95 3 can can MD 9878 95 4 be be VB 9878 95 5 found find VBN 9878 95 6 in in IN 9878 95 7 appendix appendix NNP 9878 95 8 B. B. NNP 9878 95 9 PDF PDF NNP 9878 95 10 / / , 9878 95 11 A A NNP 9878 95 12 Validation validation NN 9878 95 13 PDF pdf NN 9878 95 14 / / SYM 9878 95 15 A A NNP 9878 95 16 is be VBZ 9878 95 17 one one CD 9878 95 18 of of IN 9878 95 19 the the DT 9878 95 20 most most RBS 9878 95 21 recognized recognize VBN 9878 95 22 digital digital JJ 9878 95 23 preservation preservation NN 9878 95 24 formats format NNS 9878 95 25 , , , 9878 95 26 specially specially RB 9878 95 27 designed design VBN 9878 95 28 for for IN 9878 95 29 long long JJ 9878 95 30 -term -term HYPH 9878 95 31 preservation preservation NN 9878 95 32 and and CC 9878 95 33 access access NN 9878 95 34 . . . 9878 96 1 However however RB 9878 96 2 , , , 9878 96 3 no no DT 9878 96 4 commonly commonly RB 9878 96 5 accepted accept VBN 9878 96 6 PDF PDF NNP 9878 96 7 / / , 9878 96 8 A a DT 9878 96 9 validator validator NN 9878 96 10 was be VBD 9878 96 11 available available JJ 9878 96 12 in in IN 9878 96 13 the the DT 9878 96 14 past past NN 9878 96 15 , , , 9878 96 16 although although IN 9878 96 17 several several JJ 9878 96 18 commercial commercial JJ 9878 96 19 and and CC 9878 96 20 open open JJ 9878 96 21 source source NN 9878 96 22 PDF PDF NNP 9878 96 23 preflight preflight NN 9878 96 24 and and CC 9878 96 25 validation validation NN 9878 96 26 engines engine NNS 9878 96 27 ( ( -LRB- 9878 96 28 e.g. e.g. 
RB 9878 96 29 , , , 9878 96 30 Acrobat Acrobat NNP 9878 96 31 ) ) -RRB- 9878 96 32 were be VBD 9878 96 33 available available JJ 9878 96 34 . . . 9878 97 1 Validating validate VBG 9878 97 2 a a DT 9878 97 3 PDF pdf NN 9878 97 4 / / SYM 9878 97 5 A a NN 9878 97 6 against against IN 9878 97 7 the the DT 9878 97 8 PDF PDF NNP 9878 97 9 / / SYM 9878 97 10 A a DT 9878 97 11 standards standard NNS 9878 97 12 is be VBZ 9878 97 13 a a DT 9878 97 14 challenging challenging JJ 9878 97 15 task task NN 9878 97 16 for for IN 9878 97 17 a a DT 9878 97 18 few few JJ 9878 97 19 reasons reason NNS 9878 97 20 , , , 9878 97 21 including include VBG 9878 97 22 the the DT 9878 97 23 complexity complexity NN 9878 97 24 of of IN 9878 97 25 the the DT 9878 97 26 PDF PDF NNP 9878 97 27 and and CC 9878 97 28 PDF PDF NNP 9878 97 29 / / SYM 9878 97 30 A A NNP 9878 97 31 formats format NNS 9878 97 32 . . . 9878 98 1 The the DT 9878 98 2 PDF PDF NNP 9878 98 3 Association Association NNP 9878 98 4 and and CC 9878 98 5 the the DT 9878 98 6 Open Open NNP 9878 98 7 Preservation Preservation NNP 9878 98 8 Foundation Foundation NNP 9878 98 9 recognized recognize VBD 9878 98 10 the the DT 9878 98 11 need need NN 9878 98 12 and and CC 9878 98 13 started start VBD 9878 98 14 a a DT 9878 98 15 project project NN 9878 98 16 to to TO 9878 98 17 develop develop VB 9878 98 18 an an DT 9878 98 19 open open JJ 9878 98 20 source source NN 9878 98 21 PDF PDF NNP 9878 98 22 / / , 9878 98 23 A a DT 9878 98 24 validator validator NN 9878 98 25 and and CC 9878 98 26 build build VB 9878 98 27 a a DT 9878 98 28 maintenance maintenance NN 9878 98 29 community community NN 9878 98 30 . . . 
9878 99 1 Their -PRON- PRP$ 9878 99 2 result result NN 9878 99 3 , , , 9878 99 4 VeraPDF verapdf FW 9878 99 5 , , , 9878 99 6 is be VBZ 9878 99 7 an an DT 9878 99 8 open open JJ 9878 99 9 source source NN 9878 99 10 validator validator NN 9878 99 11 designed design VBN 9878 99 12 for for IN 9878 99 13 all all DT 9878 99 14 PDF PDF NNP 9878 99 15 / / SYM 9878 99 16 A a DT 9878 99 17 parts part NNS 9878 99 18 and and CC 9878 99 19 conformance conformance NN 9878 99 20 levels level NNS 9878 99 21 . . . 9878 100 1 Released release VBN 9878 100 2 in in IN 9878 100 3 January January NNP 9878 100 4 2017 2017 CD 9878 100 5 , , , 9878 100 6 the the DT 9878 100 7 goal goal NN 9878 100 8 of of IN 9878 100 9 veraPDF veraPDF NNP 9878 100 10 is be VBZ 9878 100 11 to to TO 9878 100 12 become become VB 9878 100 13 the the DT 9878 100 14 commonly commonly RB 9878 100 15 accepted accept VBN 9878 100 16 PDF PDF NNP 9878 100 17 / / , 9878 100 18 A A NNP 9878 100 19 validator validator NN 9878 100 20 . . . 9878 101 1 10 10 CD 9878 101 2 Our -PRON- PRP$ 9878 101 3 generated generate VBN 9878 101 4 PDF PDF NNP 9878 101 5 / / SYM 9878 101 6 As as IN 9878 101 7 have have VBP 9878 101 8 been be VBN 9878 101 9 validated validate VBN 9878 101 10 with with IN 9878 101 11 veraPDF veraPDF NNP 9878 101 12 1.4 1.4 CD 9878 101 13 and and CC 9878 101 14 Adobe Adobe NNP 9878 101 15 Acrobat Acrobat NNP 9878 101 16 Pro Pro NNP 9878 101 17 DC DC NNP 9878 101 18 Preflight Preflight NNP 9878 101 19 . . . 9878 102 1 Both both DT 9878 102 2 products product NNS 9878 102 3 validated validate VBD 9878 102 4 the the DT 9878 102 5 PDF PDF NNP 9878 102 6 / / SYM 9878 102 7 A-2b A-2b NNP 9878 102 8 files file NNS 9878 102 9 as as IN 9878 102 10 fully fully RB 9878 102 11 compatible compatible JJ 9878 102 12 . . . 
9878 103 1 Our -PRON- PRP$ 9878 103 2 implementations implementation NNS 9878 103 3 showed show VBD 9878 103 4 that that IN 9878 103 5 veraPDF veraPDF NNP 9878 103 6 1.4 1.4 CD 9878 103 7 verified verify VBD 9878 103 8 more more JJR 9878 103 9 cases case NNS 9878 103 10 than than IN 9878 103 11 Acrobat Acrobat NNP 9878 103 12 DC DC NNP 9878 103 13 Preflight Preflight NNP 9878 103 14 . . . 9878 104 1 Figure figure NN 9878 104 2 1 1 CD 9878 104 3 shows show VBZ 9878 104 4 a a DT 9878 104 5 PDF PDF NNP 9878 104 6 file file NN 9878 104 7 structure structure NN 9878 104 8 and and CC 9878 104 9 its -PRON- PRP$ 9878 104 10 metadata metadata NN 9878 104 11 . . . 9878 105 1 INFORMATION INFORMATION NNP 9878 105 2 TECHNOLOGY technology NN 9878 105 3 AND and CC 9878 105 4 LIBRARIES library NNS 9878 105 5 | | NNP 9878 105 6 MARCH MARCH NNP 9878 105 7 2018 2018 CD 9878 105 8 58 58 CD 9878 105 9 Figure figure NN 9878 105 10 1 1 CD 9878 105 11 . . . 9878 106 1 A a DT 9878 106 2 PDF PDF NNP 9878 106 3 object object NN 9878 106 4 tree tree NN 9878 106 5 with with IN 9878 106 6 root root NN 9878 106 7 - - HYPH 9878 106 8 level level NN 9878 106 9 metadata metadata NN 9878 106 10 . . . 9878 107 1 RUNTIME RUNTIME NNP 9878 107 2 AND and CC 9878 107 3 CONCLUSION conclusion VB 9878 107 4 The the DT 9878 107 5 time time NN 9878 107 6 complexity complexity NN 9878 107 7 of of IN 9878 107 8 our -PRON- PRP$ 9878 107 9 code code NN 9878 107 10 is be VBZ 9878 107 11 O(log O(log NNP 9878 107 12 n n NNP 9878 107 13 ) ) -RRB- 9878 107 14 because because IN 9878 107 15 of of IN 9878 107 16 the the DT 9878 107 17 sorting sort VBG 9878 107 18 algorithms algorithm NNS 9878 107 19 used use VBD 9878 107 20 . . . 9878 108 1 TIFFs tiff NNS 9878 108 2 were be VBD 9878 108 3 first first RB 9878 108 4 converted convert VBN 9878 108 5 to to IN 9878 108 6 JPEG2000 JPEG2000 NNP 9878 108 7 . . . 
9878 109 1 When when WRB 9878 109 2 JPEG2000 JPEG2000 NNP 9878 109 3 images image NNS 9878 109 4 are be VBP 9878 109 5 added add VBN 9878 109 6 to to IN 9878 109 7 a a DT 9878 109 8 PDF PDF NNP 9878 109 9 / / SYM 9878 109 10 A-2 A-2 NNP 9878 109 11 file file NN 9878 109 12 , , , 9878 109 13 no no DT 9878 109 14 further further JJ 9878 109 15 image image NN 9878 109 16 manipulation manipulation NN 9878 109 17 is be VBZ 9878 109 18 required require VBN 9878 109 19 because because IN 9878 109 20 the the DT 9878 109 21 generated generate VBN 9878 109 22 PDF PDF NNP 9878 109 23 / / SYM 9878 109 24 A-2 A-2 NNP 9878 109 25 uses use VBZ 9878 109 26 JPEG2000 JPEG2000 NNP 9878 109 27 directly directly RB 9878 109 28 ( ( -LRB- 9878 109 29 in in IN 9878 109 30 other other JJ 9878 109 31 words word NNS 9878 109 32 , , , 9878 109 33 it -PRON- PRP 9878 109 34 uses use VBZ 9878 109 35 the the DT 9878 109 36 JPXDecode JPXDecode NNP 9878 109 37 filter filter NN 9878 109 38 ) ) -RRB- 9878 109 39 . . . 9878 110 1 Tables table NNS 9878 110 2 1 1 CD 9878 110 3 and and CC 9878 110 4 2 2 CD 9878 110 5 show show VB 9878 110 6 the the DT 9878 110 7 performance performance NN 9878 110 8 comparison comparison NN 9878 110 9 running run VBG 9878 110 10 in in IN 9878 110 11 our -PRON- PRP$ 9878 110 12 computer computer NN 9878 110 13 hardware hardware NN 9878 110 14 and and CC 9878 110 15 software software NN 9878 110 16 environment environment NN 9878 110 17 ( ( -LRB- 9878 110 18 Intel Intel NNP 9878 110 19 Core Core NNP 9878 110 20 i7 i7 NN 9878 110 21 - - HYPH 9878 110 22 2600 2600 CD 9878 110 23 CPU@3.4GHz CPU@3.4GHz NNP 9878 110 24 , , , 9878 110 25 8 8 CD 9878 110 26 GB GB NNP 9878 110 27 DDR3 DDR3 NNP 9878 110 28 RAM RAM NNP 9878 110 29 , , , 9878 110 30 3 3 CD 9878 110 31 TB TB NNP 9878 110 32 7200-RPM 7200-RPM VBG 9878 110 33 64MB 64mb CD 9878 110 34 - - HYPH 9878 110 35 cache cache JJ 9878 110 36 hard hard JJ 9878 110 37 disk disk NN 9878 110 38 running run VBG 9878 110 39 Ubuntu 
Ubuntu NNP 9878 110 40 16.10 16.10 CD 9878 110 41 ) ) -RRB- 9878 110 42 . . . 9878 111 1 DIGITIZATION DIGITIZATION NNS 9878 111 2 OF of IN 9878 111 3 TEXT text NN 9878 111 4 DOCUMENTS documents NN 9878 111 5 USING USING NNP 9878 111 6 PDF PDF NNP 9878 111 7 / / SYM 9878 111 8 A A NNP 9878 111 9 | | NNP 9878 111 10 HAN HAN NNP 9878 111 11 AND and CC 9878 111 12 WAN WAN NNP 9878 111 13 59 59 CD 9878 111 14 HTTPS://DOI.ORG/10.6017 HTTPS://DOI.ORG/10.6017 NNS 9878 111 15 / / SYM 9878 111 16 ITAL.V37I1.9878 ITAL.V37I1.9878 NNP 9878 111 17 Table table NN 9878 111 18 1 1 CD 9878 111 19 . . . 9878 112 1 Runtimes runtime NNS 9878 112 2 of of IN 9878 112 3 converting convert VBG 9878 112 4 grayscale grayscale JJ 9878 112 5 TIFFs TIFFs NNPS 9878 112 6 to to IN 9878 112 7 JPEG2000s JPEG2000s NNP 9878 112 8 and and CC 9878 112 9 to to IN 9878 112 10 PDF PDF NNP 9878 112 11 / / SYM 9878 112 12 A-2b A-2b NNP 9878 112 13 No no NN 9878 112 14 . . . 9878 113 1 of of IN 9878 113 2 Files Files NNP 9878 113 3 Total Total NNP 9878 113 4 File file NN 9878 113 5 Size size NN 9878 113 6 ( ( -LRB- 9878 113 7 MB MB NNP 9878 113 8 ) ) -RRB- 9878 113 9 Image Image NNP 9878 113 10 Conversion Conversion NNP 9878 113 11 Runtime Runtime NNP 9878 113 12 ( ( -LRB- 9878 113 13 TIFFs TIFFs NNP 9878 113 14 to to IN 9878 113 15 JP2s JP2s NNP 9878 113 16 in in IN 9878 113 17 seconds second NNS 9878 113 18 ) ) -RRB- 9878 113 19 Total Total NNP 9878 113 20 Runtime Runtime NNP 9878 113 21 ( ( -LRB- 9878 113 22 TIFFs tiff VBN 9878 113 23 to to IN 9878 113 24 JP2s JP2s NNP 9878 113 25 to to IN 9878 113 26 a a DT 9878 113 27 single single JJ 9878 113 28 PDF PDF NNP 9878 113 29 / / SYM 9878 113 30 A-2b A-2b NNP 9878 113 31 in in IN 9878 113 32 seconds second NNS 9878 113 33 ) ) -RRB- 9878 113 34 1 1 CD 9878 113 35 9.1 9.1 CD 9878 113 36 3.61 3.61 CD 9878 113 37 3.98 3.98 CD 9878 113 38 10 10 CD 9878 113 39 91.1 91.1 CD 9878 113 40 35.63 35.63 CD 9878 113 41 36.71 36.71 CD 9878 113 42 20 20 CD 9878 113 43 182.2 
182.2 CD 9878 113 44 71.83 71.83 CD 9878 113 45 73.98 73.98 CD 9878 113 46 50 50 CD 9878 113 47 455.5 455.5 CD 9878 113 48 179.06 179.06 CD 9878 113 49 184.63 184.63 CD 9878 113 50 100 100 CD 9878 113 51 910.9 910.9 CD 9878 113 52 358.3 358.3 CD 9878 113 53 370.91 370.91 CD 9878 113 54 Table Table NNP 9878 113 55 2 2 CD 9878 113 56 . . . 9878 114 1 Runtimes runtime NNS 9878 114 2 of of IN 9878 114 3 converting convert VBG 9878 114 4 color color NN 9878 114 5 TIFFs TIFFs NNPS 9878 114 6 to to IN 9878 114 7 JPEG2000s JPEG2000s NNP 9878 114 8 and and CC 9878 114 9 to to IN 9878 114 10 PDF PDF NNP 9878 114 11 / / SYM 9878 114 12 A-2b A-2b NNP 9878 114 13 No no NN 9878 114 14 . . . 9878 115 1 of of IN 9878 115 2 Files Files NNP 9878 115 3 Total Total NNP 9878 115 4 File file NN 9878 115 5 Size size NN 9878 115 6 ( ( -LRB- 9878 115 7 MB MB NNP 9878 115 8 ) ) -RRB- 9878 115 9 Image Image NNP 9878 115 10 Conversion Conversion NNP 9878 115 11 Runtime Runtime NNP 9878 115 12 ( ( -LRB- 9878 115 13 TIFFs TIFFs NNP 9878 115 14 to to IN 9878 115 15 JP2s JP2s NNP 9878 115 16 in in IN 9878 115 17 seconds second NNS 9878 115 18 ) ) -RRB- 9878 115 19 Total Total NNP 9878 115 20 Runtime Runtime NNP 9878 115 21 ( ( -LRB- 9878 115 22 TIFFs tiff VBN 9878 115 23 to to IN 9878 115 24 JP2s JP2s NNP 9878 115 25 to to IN 9878 115 26 a a DT 9878 115 27 single single JJ 9878 115 28 PDF PDF NNP 9878 115 29 / / SYM 9878 115 30 A-2b A-2b NNP 9878 115 31 in in IN 9878 115 32 seconds second NNS 9878 115 33 ) ) -RRB- 9878 115 34 1 1 CD 9878 115 35 27.3 27.3 CD 9878 115 36 14.80 14.80 CD 9878 115 37 14.94 14.94 CD 9878 115 38 10 10 CD 9878 115 39 273 273 CD 9878 115 40 150.51 150.51 CD 9878 115 41 151.55 151.55 CD 9878 115 42 20 20 CD 9878 115 43 546 546 CD 9878 115 44 289.95 289.95 CD 9878 115 45 293.21 293.21 CD 9878 115 46 50 50 CD 9878 115 47 1,415 1,415 CD 9878 115 48 741.89 741.89 CD 9878 115 49 749.75 749.75 CD 9878 115 50 100 100 CD 9878 115 51 2,730 2,730 CD 9878 115 52 1490.49 1490.49 CD 
9878 115 53 1509.23 1509.23 CD 9878 115 54 The the DT 9878 115 55 results result NNS 9878 115 56 show show VBP 9878 115 57 that that WDT 9878 115 58 ( ( -LRB- 9878 115 59 a a LS 9878 115 60 ) ) -RRB- 9878 115 61 the the DT 9878 115 62 majority majority NN 9878 115 63 of of IN 9878 115 64 the the DT 9878 115 65 runtime runtime NN 9878 115 66 ( ( -LRB- 9878 115 67 more more JJR 9878 115 68 than than IN 9878 115 69 95 95 CD 9878 115 70 percent percent NN 9878 115 71 ) ) -RRB- 9878 115 72 is be VBZ 9878 115 73 spent spend VBN 9878 115 74 in in IN 9878 115 75 converting convert VBG 9878 115 76 a a DT 9878 115 77 TIFF TIFF NNP 9878 115 78 to to IN 9878 115 79 a a DT 9878 115 80 JPEG2000 JPEG2000 NNP 9878 115 81 using use VBG 9878 115 82 ImageMagick ImageMagick NNP 9878 115 83 ( ( -LRB- 9878 115 84 see see VB 9878 115 85 figure figure NN 9878 115 86 2 2 CD 9878 115 87 ) ) -RRB- 9878 115 88 ; ; : 9878 115 89 ( ( -LRB- 9878 115 90 b b LS 9878 115 91 ) ) -RRB- 9878 115 92 the the DT 9878 115 93 average average JJ 9878 115 94 runtime runtime NN 9878 115 95 of of IN 9878 115 96 converting convert VBG 9878 115 97 a a DT 9878 115 98 TIFF TIFF NNP 9878 115 99 has have VBZ 9878 115 100 a a DT 9878 115 101 constant constant JJ 9878 115 102 positive positive JJ 9878 115 103 relationship relationship NN 9878 115 104 with with IN 9878 115 105 the the DT 9878 115 106 file file NN 9878 115 107 ’s ’s POS 9878 115 108 size size NN 9878 115 109 ( ( -LRB- 9878 115 110 see see VB 9878 115 111 figure figure NN 9878 115 112 2 2 CD 9878 115 113 ) ) -RRB- 9878 115 114 ; ; : 9878 115 115 ( ( -LRB- 9878 115 116 c c NN 9878 115 117 ) ) -RRB- 9878 115 118 in in IN 9878 115 119 INFORMATION INFORMATION NNP 9878 115 120 TECHNOLOGY TECHNOLOGY NNP 9878 115 121 AND and CC 9878 115 122 LIBRARIES library NNS 9878 115 123 | | NNP 9878 115 124 MARCH MARCH NNP 9878 115 125 2018 2018 CD 9878 115 126 60 60 CD 9878 115 127 comparison comparison NN 9878 115 128 , , , 9878 115 129 the the DT 9878 115 130 runtime 
runtime NN 9878 115 131 of of IN 9878 115 132 converting convert VBG 9878 115 133 a a DT 9878 115 134 color color NN 9878 115 135 TIFF TIFF NNP 9878 115 136 is be VBZ 9878 115 137 significantly significantly RB 9878 115 138 higher high JJR 9878 115 139 than than IN 9878 115 140 that that DT 9878 115 141 of of IN 9878 115 142 converting convert VBG 9878 115 143 a a DT 9878 115 144 greyscale greyscale JJ 9878 115 145 TIFF TIFF NNP 9878 115 146 ( ( -LRB- 9878 115 147 see see VB 9878 115 148 figure figure NN 9878 115 149 2 2 CD 9878 115 150 ) ) -RRB- 9878 115 151 ; ; : 9878 115 152 and and CC 9878 115 153 ( ( -LRB- 9878 115 154 d d NN 9878 115 155 ) ) -RRB- 9878 115 156 it -PRON- PRP 9878 115 157 is be VBZ 9878 115 158 feasible feasible JJ 9878 115 159 in in IN 9878 115 160 terms term NNS 9878 115 161 of of IN 9878 115 162 time time NN 9878 115 163 and and CC 9878 115 164 resources resource NNS 9878 115 165 to to TO 9878 115 166 convert convert VB 9878 115 167 existing exist VBG 9878 115 168 master master NN 9878 115 169 images image NNS 9878 115 170 of of IN 9878 115 171 digital digital JJ 9878 115 172 document document NN 9878 115 173 collections collection NNS 9878 115 174 to to IN 9878 115 175 PDF PDF NNP 9878 115 176 / / SYM 9878 115 177 A-2b A-2b NNP 9878 115 178 . . . 
9878 116 1 For for IN 9878 116 2 example example NN 9878 116 3 , , , 9878 116 4 the the DT 9878 116 5 runtime runtime NN 9878 116 6 of of IN 9878 116 7 1 1 CD 9878 116 8 TB tb NN 9878 116 9 of of IN 9878 116 10 conversion conversion NN 9878 116 11 of of IN 9878 116 12 color color NN 9878 116 13 TIFFs TIFFs NNPS 9878 116 14 will will MD 9878 116 15 be be VB 9878 116 16 552,831 552,831 CD 9878 116 17 seconds second NNS 9878 116 18 ( ( -LRB- 9878 116 19 153.5 153.5 CD 9878 116 20 hours hour NNS 9878 116 21 ; ; : 9878 116 22 6.398 6.398 CD 9878 116 23 days day NNS 9878 116 24 ) ) -RRB- 9878 116 25 using use VBG 9878 116 26 the the DT 9878 116 27 above above JJ 9878 116 28 hardware hardware NN 9878 116 29 . . . 9878 117 1 The the DT 9878 117 2 authors author NNS 9878 117 3 have have VBP 9878 117 4 already already RB 9878 117 5 processed process VBN 9878 117 6 more more JJR 9878 117 7 than than IN 9878 117 8 600,000 600,000 CD 9878 117 9 TIFFs tiff NNS 9878 117 10 using use VBG 9878 117 11 this this DT 9878 117 12 method method NN 9878 117 13 . . . 9878 118 1 The the DT 9878 118 2 authors author NNS 9878 118 3 conclude conclude VBP 9878 118 4 that that IN 9878 118 5 using use VBG 9878 118 6 PDF PDF NNP 9878 118 7 / / , 9878 118 8 A A NNP 9878 118 9 gives give VBZ 9878 118 10 institutions institution NNS 9878 118 11 advantages advantage NNS 9878 118 12 of of IN 9878 118 13 the the DT 9878 118 14 newly newly RB 9878 118 15 preferred prefer VBN 9878 118 16 master master NN 9878 118 17 file file NN 9878 118 18 format format NN 9878 118 19 for for IN 9878 118 20 digitization digitization NN 9878 118 21 of of IN 9878 118 22 text text NN 9878 118 23 documents document NNS 9878 118 24 over over IN 9878 118 25 TIFF TIFF NNP 9878 118 26 / / SYM 9878 118 27 JPEG2000 JPEG2000 NNP 9878 118 28 . . . 
9878 119 1 The the DT 9878 119 2 above above JJ 9878 119 3 implementation implementation NN 9878 119 4 demonstrates demonstrate VBZ 9878 119 5 the the DT 9878 119 6 ease ease NN 9878 119 7 , , , 9878 119 8 the the DT 9878 119 9 reasonable reasonable JJ 9878 119 10 runtime runtime NN 9878 119 11 , , , 9878 119 12 and and CC 9878 119 13 the the DT 9878 119 14 availability availability NN 9878 119 15 of of IN 9878 119 16 open open JJ 9878 119 17 source source NN 9878 119 18 software software NN 9878 119 19 to to TO 9878 119 20 perform perform VB 9878 119 21 such such JJ 9878 119 22 conversions conversion NNS 9878 119 23 . . . 9878 120 1 From from IN 9878 120 2 both both CC 9878 120 3 the the DT 9878 120 4 theoretical theoretical JJ 9878 120 5 analysis analysis NN 9878 120 6 and and CC 9878 120 7 empirical empirical JJ 9878 120 8 evidences evidence NNS 9878 120 9 , , , 9878 120 10 the the DT 9878 120 11 authors author NNS 9878 120 12 show show VBP 9878 120 13 that that IN 9878 120 14 PDF PDF NNP 9878 120 15 / / SYM 9878 120 16 A A NNP 9878 120 17 has have VBZ 9878 120 18 advantages advantage NNS 9878 120 19 over over IN 9878 120 20 the the DT 9878 120 21 traditional traditional JJ 9878 120 22 preferred preferred JJ 9878 120 23 file file NN 9878 120 24 format format NN 9878 120 25 TIFF TIFF NNP 9878 120 26 for for IN 9878 120 27 digitization digitization NN 9878 120 28 of of IN 9878 120 29 text text NN 9878 120 30 documents document NNS 9878 120 31 . . . 
9878 121 1 Following follow VBG 9878 121 2 best good JJS 9878 121 3 practice practice NN 9878 121 4 , , , 9878 121 5 a a DT 9878 121 6 PDF PDF NNP 9878 121 7 / / SYM 9878 121 8 A a DT 9878 121 9 file file NN 9878 121 10 can can MD 9878 121 11 be be VB 9878 121 12 a a DT 9878 121 13 self- self- NN 9878 121 14 contained contain VBN 9878 121 15 and and CC 9878 121 16 self self NN 9878 121 17 - - HYPH 9878 121 18 described describe VBN 9878 121 19 container container NN 9878 121 20 that that WDT 9878 121 21 accommodates accommodate VBZ 9878 121 22 all all PDT 9878 121 23 the the DT 9878 121 24 data datum NNS 9878 121 25 from from IN 9878 121 26 digitization digitization NN 9878 121 27 of of IN 9878 121 28 textual textual JJ 9878 121 29 materials material NNS 9878 121 30 , , , 9878 121 31 including include VBG 9878 121 32 page page NN 9878 121 33 - - HYPH 9878 121 34 level level NN 9878 121 35 metadata metadata NN 9878 121 36 and and CC 9878 121 37 ICC icc NN 9878 121 38 profiles profile NNS 9878 121 39 . . . 9878 122 1 SUMMARY summary VB 9878 122 2 The the DT 9878 122 3 goal goal NN 9878 122 4 of of IN 9878 122 5 this this DT 9878 122 6 article article NN 9878 122 7 is be VBZ 9878 122 8 to to TO 9878 122 9 demonstrate demonstrate VB 9878 122 10 empirical empirical JJ 9878 122 11 evidences evidence NNS 9878 122 12 of of IN 9878 122 13 using use VBG 9878 122 14 PDF PDF NNP 9878 122 15 / / , 9878 122 16 A A NNP 9878 122 17 for for IN 9878 122 18 digitization digitization NN 9878 122 19 of of IN 9878 122 20 text text NN 9878 122 21 document document NN 9878 122 22 . . . 
9878 123 1 The the DT 9878 123 2 authors author NNS 9878 123 3 evaluated evaluate VBD 9878 123 4 and and CC 9878 123 5 used use VBN 9878 123 6 multiple multiple JJ 9878 123 7 open open JJ 9878 123 8 source source NN 9878 123 9 software software NN 9878 123 10 programs program NNS 9878 123 11 for for IN 9878 123 12 processing process VBG 9878 123 13 raster raster NN 9878 123 14 images image NNS 9878 123 15 , , , 9878 123 16 extracting extract VBG 9878 123 17 image image NN 9878 123 18 metadata metadata NN 9878 123 19 , , , 9878 123 20 and and CC 9878 123 21 generating generate VBG 9878 123 22 PDF PDF NNP 9878 123 23 / / , 9878 123 24 A a DT 9878 123 25 files file NNS 9878 123 26 . . . 9878 124 1 These these DT 9878 124 2 PDF PDF NNP 9878 124 3 / / , 9878 124 4 A a DT 9878 124 5 files file NNS 9878 124 6 were be VBD 9878 124 7 validated validate VBN 9878 124 8 using use VBG 9878 124 9 the the DT 9878 124 10 up up RB 9878 124 11 - - HYPH 9878 124 12 to to IN 9878 124 13 - - HYPH 9878 124 14 date date NN 9878 124 15 PDF PDF NNP 9878 124 16 / / SYM 9878 124 17 A A NNP 9878 124 18 validators validator NNS 9878 124 19 veraPDF veraPDF NNP 9878 124 20 and and CC 9878 124 21 Acrobat Acrobat NNP 9878 124 22 Preflight Preflight NNP 9878 124 23 . . . 9878 125 1 The the DT 9878 125 2 authors author NNS 9878 125 3 also also RB 9878 125 4 calculated calculate VBD 9878 125 5 the the DT 9878 125 6 time time NN 9878 125 7 complexity complexity NN 9878 125 8 of of IN 9878 125 9 the the DT 9878 125 10 program program NN 9878 125 11 and and CC 9878 125 12 measured measure VBD 9878 125 13 the the DT 9878 125 14 total total JJ 9878 125 15 runtime runtime NN 9878 125 16 in in IN 9878 125 17 multiple multiple JJ 9878 125 18 testing testing NN 9878 125 19 cases case NNS 9878 125 20 . . . 
9878 126 1 Most Most JJS 9878 126 2 of of IN 9878 126 3 the the DT 9878 126 4 runtime runtime NN 9878 126 5 was be VBD 9878 126 6 spent spend VBN 9878 126 7 on on IN 9878 126 8 image image NN 9878 126 9 conversions conversion NNS 9878 126 10 from from IN 9878 126 11 TIFF TIFF NNP 9878 126 12 to to IN 9878 126 13 JPEG2000 JPEG2000 NNP 9878 126 14 . . . 9878 127 1 The the DT 9878 127 2 creation creation NN 9878 127 3 of of IN 9878 127 4 the the DT 9878 127 5 PDF PDF NNP 9878 127 6 / / SYM 9878 127 7 A-2b A-2b NNP 9878 127 8 file file NN 9878 127 9 with with IN 9878 127 10 associated associate VBN 9878 127 11 page page NN 9878 127 12 - - HYPH 9878 127 13 level level NN 9878 127 14 metadata metadata NN 9878 127 15 accounted account VBD 9878 127 16 for for IN 9878 127 17 less less JJR 9878 127 18 than than IN 9878 127 19 5 5 CD 9878 127 20 percent percent NN 9878 127 21 of of IN 9878 127 22 the the DT 9878 127 23 total total JJ 9878 127 24 runtime runtime NN 9878 127 25 . . . 9878 128 1 Runtime runtime NN 9878 128 2 of of IN 9878 128 3 conversion conversion NN 9878 128 4 of of IN 9878 128 5 a a DT 9878 128 6 color color NN 9878 128 7 TIFF TIFF NNP 9878 128 8 was be VBD 9878 128 9 much much RB 9878 128 10 higher high JJR 9878 128 11 than than IN 9878 128 12 that that DT 9878 128 13 of of IN 9878 128 14 a a DT 9878 128 15 greyscale greyscale JJ 9878 128 16 one one NN 9878 128 17 . . . 
9878 129 1 Our -PRON- PRP$ 9878 129 2 theoretical theoretical JJ 9878 129 3 analysis analysis NN 9878 129 4 and and CC 9878 129 5 empirical empirical JJ 9878 129 6 examples example NNS 9878 129 7 show show VBP 9878 129 8 that that IN 9878 129 9 using use VBG 9878 129 10 PDF PDF NNP 9878 129 11 / / SYM 9878 129 12 A-2 A-2 NNP 9878 129 13 presents present VBZ 9878 129 14 many many JJ 9878 129 15 advantages advantage NNS 9878 129 16 over over IN 9878 129 17 the the DT 9878 129 18 traditional traditional JJ 9878 129 19 preferred preferred JJ 9878 129 20 file file NN 9878 129 21 format format NN 9878 129 22 ( ( -LRB- 9878 129 23 TIFF TIFF NNP 9878 129 24 / / SYM 9878 129 25 JPEG2000 JPEG2000 NNP 9878 129 26 ) ) -RRB- 9878 129 27 for for IN 9878 129 28 digitization digitization NN 9878 129 29 of of IN 9878 129 30 text text NN 9878 129 31 documents document NNS 9878 129 32 . . . 9878 130 1 DIGITIZATION DIGITIZATION NNS 9878 130 2 OF of IN 9878 130 3 TEXT text NN 9878 130 4 DOCUMENTS documents NN 9878 130 5 USING USING NNP 9878 130 6 PDF PDF NNP 9878 130 7 / / SYM 9878 130 8 A A NNP 9878 130 9 | | NNP 9878 130 10 HAN HAN NNP 9878 130 11 AND and CC 9878 130 12 WAN WAN NNP 9878 130 13 61 61 CD 9878 130 14 HTTPS://DOI.ORG/10.6017 HTTPS://DOI.ORG/10.6017 NNS 9878 130 15 / / SYM 9878 130 16 ITAL.V37I1.9878 ital.v37i1.9878 JJ 9878 130 17 Figure figure NN 9878 130 18 2 2 CD 9878 130 19 . . . 9878 131 1 File file NN 9878 131 2 size size NN 9878 131 3 , , , 9878 131 4 greyscale greyscale NN 9878 131 5 and and CC 9878 131 6 color color NN 9878 131 7 TIFFs tiff NNS 9878 131 8 and and CC 9878 131 9 runtime runtime NN 9878 131 10 ratio ratio NN 9878 131 11 . . . 
9878 132 1 INFORMATION INFORMATION NNP 9878 132 2 TECHNOLOGY technology NN 9878 132 3 AND and CC 9878 132 4 LIBRARIES library NNS 9878 132 5 | | NNP 9878 132 6 MARCH MARCH NNP 9878 132 7 2018 2018 CD 9878 132 8 62 62 CD 9878 132 9 APPENDIX APPENDIX NNP 9878 132 10 A a DT 9878 132 11 : : : 9878 132 12 SAMPLE SAMPLE NNP 9878 132 13 TIFF TIFF NNP 9878 132 14 METADATA metadata VBP 9878 132 15 WITH with IN 9878 132 16 ICC icc NN 9878 132 17 HEADER HEADER NNS 9878 132 18 < < XX 9878 132 19 tiff tiff NN 9878 132 20 : : : 9878 132 21 BitsPerSample>88 > XX 9878 132 25 < < XX 9878 132 26 IFD0 ifd0 NN 9878 132 27 : : : 9878 132 28 ImageWidth>34003400 > XX 9878 132 32 < < XX 9878 132 33 IFD0 IFD0 NNP 9878 132 34 : : : 9878 132 35 ImageHeight>46804680 > XX 9878 132 39 < < XX 9878 132 40 IFD0 IFD0 NNP 9878 132 41 : : : 9878 132 42 BitsPerSample>8 BitsPerSample>8 NNP 9878 132 43 8 8 CD 9878 132 44 8 > XX 9878 132 48 < < XX 9878 132 49 IFD0 IFD0 NNP 9878 132 50 : : : 9878 132 51 Compression compression NN 9878 132 52 > > XX 9878 132 53 Uncompressed > XX 9878 132 57 < < XX 9878 132 58 IFD0 IFD0 NNP 9878 132 59 : : : 9878 132 60 PhotometricInterpretation PhotometricInterpretation NNP 9878 132 61 > > XX 9878 132 62 RGB > XX 9878 132 66 < < XX 9878 132 67 IFD0 ifd0 XX 9878 132 68 : : : 9878 132 69 StripOffsets>(Binary stripoffsets>(binary JJ 9878 132 70 data datum NNS 9878 132 71 41025 41025 CD 9878 132 72 bytes byte NNS 9878 132 73 , , , 9878 132 74 use use NN 9878 132 75 -b -b HYPH 9878 132 76 option option NN 9878 132 77 to to IN 9878 132 78 extract) > XX 9878 132 82 < < XX 9878 132 83 IFD0 IFD0 NNP 9878 132 84 : : : 9878 132 85 SamplesPerPixel>33 > XX 9878 132 89 < < XX 9878 132 90 IFD0 IFD0 NNP 9878 132 91 : : : 9878 132 92 RowsPerStrip>11 > XX 9878 132 96 < < XX 9878 132 97 IFD0 IFD0 NNP 9878 132 98 : : : 9878 132 99 StripByteCounts>(Binary StripByteCounts>(Binary NNP 9878 132 100 data datum NNS 9878 132 101 28079 28079 CD 9878 132 102 bytes byte NNS 9878 132 103 , , , 9878 132 
104 use use NN 9878 132 105 -b -b HYPH 9878 132 106 option option NN 9878 132 107 to to IN 9878 132 108 extract) > XX 9878 132 112 < < XX 9878 132 113 IFD0 IFD0 NNP 9878 132 114 : : : 9878 132 115 XResolution>400400 > XX 9878 132 119 < < XX 9878 132 120 IFD0 ifd0 NN 9878 132 121 : : : 9878 132 122 YResolution>400400 > XX 9878 132 126 < < XX 9878 132 127 IFD0 IFD0 NNP 9878 132 128 : : : 9878 132 129 PlanarConfiguration PlanarConfiguration NNP 9878 132 130 > > XX 9878 132 131 Chunky > XX 9878 132 135 < < XX 9878 132 136 ICC ICC NNP 9878 132 137 - - HYPH 9878 132 138 header header NN 9878 132 139 : : : 9878 132 140 ProfileCMMType profilecmmtype NN 9878 132 141 > > FW 9878 132 142 APPL > XX 9878 132 148 < < XX 9878 132 149 ICC ICC NNP 9878 132 150 - - HYPH 9878 132 151 header header NNP 9878 132 152 : : : 9878 132 153 ProfileVersion>2.2.02.2.0 > XX 9878 132 159 < < XX 9878 132 160 ICC ICC NNP 9878 132 161 - - HYPH 9878 132 162 header header NN 9878 132 163 : : : 9878 132 164 ProfileClass ProfileClass NNP 9878 132 165 > > NN 9878 132 166 Display Display NNP 9878 132 167 Device Device NNP 9878 132 168 Profile > XX 9878 132 174 < < XX 9878 132 175 ICC ICC NNP 9878 132 176 - - HYPH 9878 132 177 header header NNP 9878 132 178 : : : 9878 132 179 ColorSpaceData ColorSpaceData NNP 9878 132 180 > > XX 9878 132 181 RGB RGB NNP 9878 132 182 < < XX 9878 132 183 /ICC /ICC NNP 9878 132 184 - - HYPH 9878 132 185 header header NN 9878 132 186 : : : 9878 132 187 ColorSpaceData ColorSpaceData NNP 9878 132 188 > > XX 9878 132 189 < < XX 9878 132 190 ICC ICC NNP 9878 132 191 - - HYPH 9878 132 192 header header NNP 9878 132 193 : : : 9878 132 194 ProfileConnectionSpace profileconnectionspace XX 9878 132 195 > > XX 9878 132 196 XYZ XYZ NNP 9878 132 197 < < XX 9878 132 198 /ICC /ICC NNP 9878 132 199 - - HYPH 9878 132 200 header header NN 9878 132 201 : : : 9878 132 202 ProfileConnectionSpace profileconnectionspace XX 9878 132 203 > > XX 9878 132 204 < < XX 9878 132 205 ICC ICC NNP 9878 132 
206 - - HYPH 9878 132 207 header header NN 9878 132 208 : : : 9878 132 209 ProfileDateTime>2006:02:02 ProfileDateTime>2006:02:02 NNP 9878 132 210 02:20:00 > XX 9878 132 216 < < XX 9878 132 217 ICC ICC NNP 9878 132 218 - - HYPH 9878 132 219 header header NNP 9878 132 220 : : : 9878 132 221 ProfileFileSignature profilefilesignature NN 9878 132 222 > > FW 9878 132 223 acsp > XX 9878 132 229 < < XX 9878 132 230 ICC ICC NNP 9878 132 231 - - HYPH 9878 132 232 header header NN 9878 132 233 : : : 9878 132 234 PrimaryPlatform PrimaryPlatform NNP 9878 132 235 > > XX 9878 132 236 Apple Apple NNP 9878 132 237 Computer Computer NNP 9878 132 238 Inc. > XX 9878 132 244 < < XX 9878 132 245 ICC ICC NNP 9878 132 246 - - HYPH 9878 132 247 header header NN 9878 132 248 : : : 9878 132 249 CMMFlags cmmflag NNS 9878 132 250 > > XX 9878 132 251 Not not RB 9878 132 252 Embedded embed VBN 9878 132 253 , , , 9878 132 254 Independent > XX 9878 132 260 < < XX 9878 132 261 ICC ICC NNP 9878 132 262 - - HYPH 9878 132 263 header header NN 9878 132 264 : : : 9878 132 265 DeviceManufacturer DeviceManufacturer NNP 9878 132 266 > > XX 9878 132 267 none > XX 9878 132 273 < < XX 9878 132 274 ICC ICC NNP 9878 132 275 - - HYPH 9878 132 276 header header NNP 9878 132 277 : : : 9878 132 278 DeviceModel> > XX 9878 132 284 < < XX 9878 132 285 ICC ICC NNP 9878 132 286 - - HYPH 9878 132 287 header header NN 9878 132 288 : : : 9878 132 289 DeviceAttributes DeviceAttributes NNP 9878 132 290 > > XX 9878 132 291 Reflective reflective JJ 9878 132 292 , , , 9878 132 293 Glossy Glossy NNP 9878 132 294 , , , 9878 132 295 Positive positive JJ 9878 132 296 , , , 9878 132 297 Color > XX 9878 132 302 < < XX 9878 132 303 ICC ICC NNP 9878 132 304 - - HYPH 9878 132 305 header header NN 9878 132 306 : : : 9878 132 307 RenderingIntent RenderingIntent NNP 9878 132 308 > > NN 9878 132 309 Perceptual > XX 9878 132 315 < < XX 9878 132 316 ICC ICC NNP 9878 132 317 - - HYPH 9878 132 318 header header NNP 9878 132 319 : : : 9878 132 
320 ConnectionSpaceIlluminant>0.9642 ConnectionSpaceIlluminant>0.9642 NNP 9878 132 321 1 1 CD 9878 132 322 0.82491 > XX 9878 132 327 < < XX 9878 132 328 ICC ICC NNP 9878 132 329 - - HYPH 9878 132 330 header header NN 9878 132 331 : : : 9878 132 332 ProfileCreator ProfileCreator NNP 9878 132 333 > > XX 9878 132 334 EPSO > XX 9878 132 340 < < XX 9878 132 341 ICC ICC NNP 9878 132 342 - - HYPH 9878 132 343 header header NNP 9878 132 344 : : : 9878 132 345 ProfileID>00 > XX 9878 132 351 < < XX 9878 132 352 ICC_Profile ICC_Profile NNS 9878 132 353 : : : 9878 132 354 ProfileDescription ProfileDescription NNP 9878 132 355 > > XX 9878 132 356 EPSON epson RB 9878 132 357 sRGB > XX 9878 132 361 < < XX 9878 132 362 ICC_Profile ICC_Profile NNS 9878 132 363 : : : 9878 132 364 RedMatrixColumn>0.43607 redmatrixcolumn>0.43607 CD 9878 132 365 0.22249 0.22249 CD 9878 132 366 0.01392 > XX 9878 132 370 < < XX 9878 132 371 ICC_Profile ICC_Profile NNS 9878 132 372 : : : 9878 132 373 GreenMatrixColumn>0.38515 greenmatrixcolumn>0.38515 CD 9878 132 374 0.71687 0.71687 CD 9878 132 375 0.09708 > XX 9878 132 379 < < XX 9878 132 380 ICC_Profile ICC_Profile NNS 9878 132 381 : : : 9878 132 382 BlueMatrixColumn>0.14307 bluematrixcolumn>0.14307 CD 9878 132 383 0.06061 0.06061 CD 9878 132 384 0.7141 > XX 9878 132 388 < < XX 9878 132 389 ICC_Profile ICC_Profile NNS 9878 132 390 : : : 9878 132 391 MediaWhitePoint>0.95045 mediawhitepoint>0.95045 CD 9878 132 392 1 1 CD 9878 132 393 1.08905 > XX 9878 132 397 < < XX 9878 132 398 ICC_Profile ICC_Profile NNS 9878 132 399 : : : 9878 132 400 ProfileCopyright ProfileCopyright NNP 9878 132 401 > > XX 9878 132 402 Copyright copyright NN 9878 132 403 ( ( -LRB- 9878 132 404 c c NN 9878 132 405 ) ) -RRB- 9878 132 406 SEIKO seiko NN 9878 132 407 EPSON EPSON NNP 9878 132 408 CORPORATION corporation NN 9878 132 409 2000 2000 CD 9878 132 410 - - SYM 9878 132 411 2006 2006 CD 9878 132 412 . . . 9878 133 1 All all DT 9878 133 2 rights right NNS 9878 133 3 reserved. 
> XX 9878 133 7 < < XX 9878 133 8 ICC_Profile ICC_Profile NNS 9878 133 9 : : : 9878 133 10 RedTRC>(Binary redtrc>(binary JJ 9878 133 11 data datum NNS 9878 133 12 8204 8204 CD 9878 133 13 bytes byte NNS 9878 133 14 , , , 9878 133 15 use use NN 9878 133 16 -b -b HYPH 9878 133 17 option option NN 9878 133 18 to to TO 9878 133 19 extract) > XX 9878 133 23 < < XX 9878 133 24 ICC_Profile ICC_Profile NNS 9878 133 25 : : : 9878 133 26 GreenTRC>(Binary GreenTRC>(Binary NNP 9878 133 27 data data NN 9878 133 28 8204 8204 CD 9878 133 29 bytes byte NNS 9878 133 30 , , , 9878 133 31 use use NN 9878 133 32 -b -b HYPH 9878 133 33 option option NN 9878 133 34 to to TO 9878 133 35 extract) > XX 9878 133 39 < < XX 9878 133 40 ICC_Profile ICC_Profile NNS 9878 133 41 : : : 9878 133 42 BlueTRC>(Binary BlueTRC>(Binary NNP 9878 133 43 data datum NNS 9878 133 44 8204 8204 CD 9878 133 45 bytes byte NNS 9878 133 46 , , , 9878 133 47 use use NN 9878 133 48 -b -b HYPH 9878 133 49 option option NN 9878 133 50 to to TO 9878 133 51 extract) > XX 9878 133 55 < < XX 9878 133 56 ICC_Profile ICC_Profile NNS 9878 133 57 : : : 9878 133 58 MediaBlackPoint>0 MediaBlackPoint>0 NNP 9878 133 59 0 0 CD 9878 133 60 0 > XX 9878 133 64 DIGITIZATION digitization NN 9878 133 65 OF of IN 9878 133 66 TEXT text NN 9878 133 67 DOCUMENTS documents NN 9878 133 68 USING USING NNP 9878 133 69 PDF PDF NNP 9878 133 70 / / SYM 9878 133 71 A A NNP 9878 133 72 | | NNP 9878 133 73 HAN HAN NNP 9878 133 74 AND and CC 9878 133 75 WAN WAN NNP 9878 133 76 63 63 CD 9878 133 77 HTTPS://DOI.ORG/10.6017 HTTPS://DOI.ORG/10.6017 NNS 9878 133 78 / / SYM 9878 133 79 ITAL.V37I1.9878 ital.v37i1.9878 JJ 9878 133 80 APPENDIX APPENDIX NNP 9878 133 81 B b NN 9878 133 82 : : : 9878 133 83 SAMPLE SAMPLE NNP 9878 133 84 CODE code NN 9878 133 85 TO to IN 9878 133 86 CONVERT CONVERT NNP 9878 133 87 PDF PDF NNP 9878 133 88 / / SYM 9878 133 89 A-2 A-2 NNP 9878 133 90 BACK BACK VBZ 9878 133 91 TO to IN 9878 133 92 JPEG2000S jpeg2000s NN 9878 133 93 / / 
SYM 9878 133 94 * * NFP 9878 133 95 Assumption assumption NN 9878 133 96 : : : 9878 133 97 The the DT 9878 133 98 PDF PDF NNP 9878 133 99 / / SYM 9878 133 100 A-2b A-2b NNP 9878 133 101 file file NN 9878 133 102 was be VBD 9878 133 103 specifically specifically RB 9878 133 104 generated generate VBN 9878 133 105 from from IN 9878 133 106 image image NN 9878 133 107 objects object NNS 9878 133 108 converted convert VBN 9878 133 109 from from IN 9878 133 110 TIFF TIFF NNP 9878 133 111 images image NNS 9878 133 112 with with IN 9878 133 113 JPXDecode JPXDecode NNP 9878 133 114 along along IN 9878 133 115 with with IN 9878 133 116 page page NN 9878 133 117 - - HYPH 9878 133 118 level level NN 9878 133 119 metadata metadata NN 9878 133 120 * * NFP 9878 133 121 / / SYM 9878 133 122 public public JJ 9878 133 123 static static JJ 9878 133 124 void void NN 9878 133 125 parse(String parse(stre VBG 9878 133 126 src src NNP 9878 133 127 , , , 9878 133 128 String string NN 9878 133 129 dest d JJS 9878 133 130 ) ) -RRB- 9878 133 131 throws throw VBZ 9878 133 132 IOException ioexception NN 9878 133 133 { { -LRB- 9878 133 134 PdfReader PdfReader NNP 9878 133 135 reader reader NN 9878 133 136 = = SYM 9878 133 137 new new NNP 9878 133 138 PdfReader(src PdfReader(src NNP 9878 133 139 ) ) -RRB- 9878 133 140 ; ; : 9878 133 141 PdfObject pdfobject VB 9878 133 142 obj obj NN 9878 133 143 ; ; : 9878 133 144 int int VB 9878 133 145 counter counter NN 9878 133 146 = = NFP 9878 133 147 0 0 CD 9878 133 148 ; ; : 9878 133 149 for(int for(int NN 9878 133 150 i i PRP 9878 133 151 = = CC 9878 133 152 1 1 CD 9878 133 153 ; ; : 9878 133 154 i i PRP 9878 133 155 < < XX 9878 133 156 = = NFP 9878 133 157 reader.getXrefSize reader.getxrefsize XX 9878 133 158 ( ( -LRB- 9878 133 159 ) ) -RRB- 9878 133 160 ; ; : 9878 133 161 i i PRP 9878 133 162 + + SYM 9878 133 163 + + SYM 9878 133 164 ) ) -RRB- 9878 133 165 { { -LRB- 9878 133 166 obj obj UH 9878 133 167 = = NFP 9878 133 168 reader.getPdfObject(i 
reader.getPdfObject(i NNP 9878 133 169 ) ) -RRB- 9878 133 170 ; ; : 9878 133 171 if(obj if(obj NN 9878 133 172 ! ! . 9878 133 173 = = NFP 9878 133 174 null null NNP 9878 133 175 & & CC 9878 133 176 & & CC 9878 133 177 obj.isStream obj.isstream ADD 9878 133 178 ( ( -LRB- 9878 133 179 ) ) -RRB- 9878 133 180 ) ) -RRB- 9878 133 181 { { -LRB- 9878 133 182 PRStream prstream NN 9878 133 183 stream stream NN 9878 133 184 = = NFP 9878 133 185 ( ( -LRB- 9878 133 186 PRStream prstream NN 9878 133 187 ) ) -RRB- 9878 133 188 obj obj UH 9878 133 189 ; ; : 9878 133 190 byte byte NN 9878 133 191 [ [ -LRB- 9878 133 192 ] ] -RRB- 9878 133 193 b b NN 9878 133 194 ; ; : 9878 133 195 try try VB 9878 133 196 { { -LRB- 9878 133 197 b b NN 9878 133 198 = = SYM 9878 133 199 PdfReader.getStreamBytes(stream PdfReader.getStreamBytes(stream NNP 9878 133 200 ) ) -RRB- 9878 133 201 ; ; : 9878 133 202 } } -RRB- 9878 133 203 catch(UnsupportedPdfException catch(UnsupportedPdfException NNP 9878 133 204 e e NNP 9878 133 205 ) ) -RRB- 9878 133 206 { { -LRB- 9878 133 207 b b NN 9878 133 208 = = SYM 9878 133 209 PdfReader.getStreamBytesRaw(stream PdfReader.getStreamBytesRaw(stream NNP 9878 133 210 ) ) -RRB- 9878 133 211 ; ; : 9878 133 212 } } -RRB- 9878 133 213 PdfObject PdfObject NNP 9878 133 214 pdfsubtype pdfsubtype NN 9878 133 215 = = SYM 9878 133 216 stream.get(PdfName stream.get(pdfname NN 9878 133 217 . . . 9878 133 218 SUBTYPE SUBTYPE NNP 9878 133 219 ) ) -RRB- 9878 133 220 ; ; : 9878 133 221 FileOutputStream fileoutputstream ADD 9878 133 222 fos fos NN 9878 133 223 = = SYM 9878 133 224 null null NN 9878 133 225 ; ; : 9878 133 226 if if IN 9878 133 227 ( ( -LRB- 9878 133 228 pdfsubtype pdfsubtype NNP 9878 133 229 ! ! . 9878 133 230 = = NFP 9878 133 231 null null NNP 9878 133 232 & & CC 9878 133 233 & & CC 9878 133 234 pdfsubtype.toString().equals(PdfName pdfsubtype.toString().equals(PdfName NNP 9878 133 235 . . . 
9878 133 236 XML.toString XML.toString NNP 9878 133 237 ( ( -LRB- 9878 133 238 ) ) -RRB- 9878 133 239 ) ) -RRB- 9878 133 240 ) ) -RRB- 9878 133 241 { { -LRB- 9878 133 242 fos fos NN 9878 133 243 = = SYM 9878 133 244 new new JJ 9878 133 245 FileOutputStream(String.format(dest FileOutputStream(String.format(dest NNP 9878 133 246 + + CC 9878 133 247 " " `` 9878 133 248 _ _ NNP 9878 133 249 xml/ xml/ NNP 9878 133 250 " " '' 9878 133 251 + + CC 9878 133 252 counter+".xml counter+".xml NN 9878 133 253 " " '' 9878 133 254 , , , 9878 133 255 i i PRP 9878 133 256 ) ) -RRB- 9878 133 257 ) ) -RRB- 9878 133 258 ; ; : 9878 133 259 System.out.println("Page system.out.println("page NN 9878 133 260 Metadata Metadata NNP 9878 133 261 Extracted extract VBN 9878 133 262 ! ! . 9878 134 1 ") ") `` 9878 134 2 ; ; : 9878 134 3 } } -RRB- 9878 134 4 if if IN 9878 134 5 ( ( -LRB- 9878 134 6 pdfsubtype pdfsubtype NNP 9878 134 7 ! ! . 9878 134 8 = = NFP 9878 134 9 null null NNP 9878 134 10 & & CC 9878 134 11 & & CC 9878 134 12 pdfsubtype.toString().equals(PdfName pdfsubtype.toString().equals(PdfName NNP 9878 134 13 . . . 9878 134 14 IMAGE.toString IMAGE.toString '' 9878 134 15 ( ( -LRB- 9878 134 16 ) ) -RRB- 9878 134 17 ) ) -RRB- 9878 134 18 ) ) -RRB- 9878 134 19 { { -LRB- 9878 134 20 counter counter NN 9878 134 21 + + NNP 9878 134 22 + + SYM 9878 134 23 ; ; : 9878 134 24 fos fos NNP 9878 134 25 = = SYM 9878 134 26 new new JJ 9878 134 27 FileOutputStream(String.format(dest FileOutputStream(String.format(dest NNP 9878 134 28 + + CC 9878 134 29 " " `` 9878 134 30 _ _ NNP 9878 134 31 jp2/ jp2/ NNP 9878 134 32 " " '' 9878 134 33 + + NNP 9878 134 34 counter+".jp2 counter+".jp2 NNP 9878 134 35 " " '' 9878 134 36 , , , 9878 134 37 i i PRP 9878 134 38 ) ) -RRB- 9878 134 39 ) ) -RRB- 9878 134 40 ; ; : 9878 134 41 } } -RRB- 9878 134 42 if if IN 9878 134 43 ( ( -LRB- 9878 134 44 fos fos JJ 9878 134 45 ! ! . 
9878 134 46 = = SYM 9878 134 47 null null NN 9878 134 48 ) ) -RRB- 9878 134 49 { { -LRB- 9878 134 50 fos.write(b fos.write(b NNP 9878 134 51 ) ) -RRB- 9878 134 52 ; ; : 9878 134 53 fos.flush fos.flush NNP 9878 134 54 ( ( -LRB- 9878 134 55 ) ) -RRB- 9878 134 56 ; ; . 9878 134 57 fos.close fos.close NNP 9878 134 58 ( ( -LRB- 9878 134 59 ) ) -RRB- 9878 134 60 ; ; : 9878 134 61 System.out.println("JPEG2000s System.out.println("JPEG2000s NNP 9878 134 62 Conversion Conversion NNP 9878 134 63 from from IN 9878 134 64 PDF PDF NNP 9878 134 65 completed complete VBN 9878 134 66 ! ! . 9878 135 1 ") ") `` 9878 135 2 ; ; : 9878 135 3 } } -RRB- 9878 135 4 } } -RRB- 9878 135 5 } } -RRB- 9878 135 6 / / NFP 9878 135 7 * * NFP 9878 135 8 Then then RB 9878 135 9 Use Use NNP 9878 135 10 ImageMagick ImageMagick NNP 9878 135 11 library library NN 9878 135 12 to to TO 9878 135 13 convert convert VB 9878 135 14 JPEG2000s JPEG2000s NNP 9878 135 15 to to IN 9878 135 16 TIFFs TIFFs NNP 9878 135 17 * * NFP 9878 135 18 / / SYM 9878 135 19 INFORMATION INFORMATION NNP 9878 135 20 TECHNOLOGY TECHNOLOGY NNP 9878 135 21 AND and CC 9878 135 22 LIBRARIES library NNS 9878 135 23 | | NNP 9878 135 24 MARCH MARCH NNS 9878 135 25 2018 2018 CD 9878 135 26 64 64 CD 9878 135 27 REFERENCES reference NNS 9878 135 28 1 1 CD 9878 135 29 PDF-Tools.com PDF-Tools.com NNP 9878 135 30 and and CC 9878 135 31 PDF PDF NNP 9878 135 32 Association Association NNP 9878 135 33 , , , 9878 135 34 “ " `` 9878 135 35 PDF PDF NNP 9878 135 36 / / SYM 9878 135 37 A a NN 9878 135 38 — — : 9878 135 39 The the DT 9878 135 40 Standard Standard NNP 9878 135 41 for for IN 9878 135 42 Long Long NNP 9878 135 43 - - HYPH 9878 135 44 Term term NN 9878 135 45 Archiving archiving NN 9878 135 46 , , , 9878 135 47 ” " '' 9878 135 48 version version NN 9878 135 49 2.4 2.4 CD 9878 135 50 , , , 9878 135 51 white white JJ 9878 135 52 paper paper NN 9878 135 53 , , , 9878 135 54 May May NNP 9878 135 55 20 20 CD 9878 135 56 , , , 9878 135 57 2009 
2009 CD 9878 135 58 , , , 9878 135 59 http://www.pdf- http://www.pdf- NNP 9878 135 60 tools.com/public/downloads/whitepapers/whitepaper-pdfa.pdf tools.com/public/downloads/whitepapers/whitepaper-pdfa.pdf ADD 9878 135 61 ; ; : 9878 135 62 Duff Duff NNP 9878 135 63 Johnson Johnson NNP 9878 135 64 , , , 9878 135 65 “ " `` 9878 135 66 White White NNP 9878 135 67 Paper Paper NNP 9878 135 68 : : : 9878 135 69 How how WRB 9878 135 70 to to TO 9878 135 71 Implement implement VB 9878 135 72 PDF PDF NNP 9878 135 73 / / , 9878 135 74 A a NN 9878 135 75 , , , 9878 135 76 ” " '' 9878 135 77 Talking talk VBG 9878 135 78 PDF PDF NNP 9878 135 79 , , , 9878 135 80 August August NNP 9878 135 81 24 24 CD 9878 135 82 , , , 9878 135 83 2010 2010 CD 9878 135 84 , , , 9878 135 85 https://talkingpdf.org/white-paper- https://talkingpdf.org/white-paper- VBZ 9878 135 86 how how RB 9878 135 87 - - HYPH 9878 135 88 to to IN 9878 135 89 - - HYPH 9878 135 90 implement implement NN 9878 135 91 - - HYPH 9878 135 92 pdfa/ pdfa/ NNS 9878 135 93 ; ; : 9878 135 94 Alexandra Alexandra NNP 9878 135 95 Oettler Oettler NNP 9878 135 96 , , , 9878 135 97 “ " `` 9878 135 98 PDF PDF NNP 9878 135 99 / / SYM 9878 135 100 A a NN 9878 135 101 in in IN 9878 135 102 a a DT 9878 135 103 Nutshell Nutshell NNP 9878 135 104 2.0 2.0 CD 9878 135 105 : : : 9878 135 106 PDF pdf NN 9878 135 107 for for IN 9878 135 108 Long Long NNP 9878 135 109 - - HYPH 9878 135 110 Term Term NNP 9878 135 111 Archiving Archiving NNP 9878 135 112 , , , 9878 135 113 ” " '' 9878 135 114 Association Association NNP 9878 135 115 for for IN 9878 135 116 Digital Digital NNP 9878 135 117 Standards Standards NNP 9878 135 118 , , , 9878 135 119 2013 2013 CD 9878 135 120 , , , 9878 135 121 https://www.pdfa.org/wp- https://www.pdfa.org/wp- NNP 9878 135 122 content content NN 9878 135 123 / / SYM 9878 135 124 until2016_uploads/2013/05 until2016_uploads/2013/05 NNP 9878 135 125 / / SYM 9878 135 126 PDFA_in_a_Nutshell_211.pdf pdfa_in_a_nutshell_211.pdf NN 
9878 135 127 ; ; : 9878 135 128 Library Library NNP 9878 135 129 of of IN 9878 135 130 Congress Congress NNP 9878 135 131 , , , 9878 135 132 “ " `` 9878 135 133 PDF PDF NNP 9878 135 134 / / , 9878 135 135 A A NNP 9878 135 136 , , , 9878 135 137 PDF PDF NNP 9878 135 138 for for IN 9878 135 139 Long Long NNP 9878 135 140 - - HYPH 9878 135 141 Term Term NNP 9878 135 142 Preservation Preservation NNP 9878 135 143 , , , 9878 135 144 ” " '' 9878 135 145 last last JJ 9878 135 146 modified modify VBN 9878 135 147 July July NNP 9878 135 148 27 27 CD 9878 135 149 , , , 9878 135 150 2017 2017 CD 9878 135 151 , , , 9878 135 152 https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml NN 9878 135 153 . . . 9878 136 1 2 2 LS 9878 136 2 Library Library NNP 9878 136 3 of of IN 9878 136 4 Congress Congress NNP 9878 136 5 , , , 9878 136 6 “ " `` 9878 136 7 The the DT 9878 136 8 Time Time NNP 9878 136 9 and and CC 9878 136 10 Place Place NNP 9878 136 11 for for IN 9878 136 12 PDF PDF NNP 9878 136 13 : : : 9878 136 14 An an DT 9878 136 15 Interview interview NN 9878 136 16 with with IN 9878 136 17 Duff Duff NNP 9878 136 18 Johnson Johnson NNP 9878 136 19 of of IN 9878 136 20 the the DT 9878 136 21 PDF PDF NNP 9878 136 22 Association Association NNP 9878 136 23 , , , 9878 136 24 ” " '' 9878 136 25 The the DT 9878 136 26 Signal Signal NNP 9878 136 27 ( ( -LRB- 9878 136 28 blog blog NN 9878 136 29 ) ) -RRB- 9878 136 30 , , , 9878 136 31 December December NNP 9878 136 32 12 12 CD 9878 136 33 , , , 9878 136 34 2017 2017 CD 9878 136 35 , , , 9878 136 36 https://blogs.loc.gov/thesignal/2017/12/the-time-and-place-for-pdf-an-interview-with-duff- https://blogs.loc.gov/thesignal/2017/12/the-time-and-place-for-pdf-an-interview-with-duff- NNP 9878 136 37 johnson johnson NNP 9878 136 38 - - HYPH 9878 136 39 of of IN 9878 136 40 - - HYPH 9878 136 41 the the DT 9878 136 42 - - HYPH 9878 136 43 pdf pdf NN 9878 136 44 - - 
NN 9878 136 45 association/. association/. JJ 9878 137 1 3 3 CD 9878 137 2 Yan Yan NNP 9878 137 3 Han Han NNP 9878 137 4 , , , 9878 137 5 “ " `` 9878 137 6 Beyond beyond IN 9878 137 7 TIFF TIFF NNP 9878 137 8 and and CC 9878 137 9 JPEG2000 JPEG2000 NNP 9878 137 10 : : : 9878 137 11 PDF PDF NNP 9878 137 12 / / SYM 9878 137 13 A A NNP 9878 137 14 as as IN 9878 137 15 an an DT 9878 137 16 OAIS OAIS NNP 9878 137 17 Submission Submission NNP 9878 137 18 Information Information NNP 9878 137 19 Package Package NNP 9878 137 20 Container Container NNP 9878 137 21 , , , 9878 137 22 ” " '' 9878 137 23 Library library JJ 9878 137 24 Hi Hi NNP 9878 137 25 Tech Tech NNP 9878 137 26 33 33 CD 9878 137 27 , , , 9878 137 28 no no UH 9878 137 29 . . . 9878 138 1 3 3 CD 9878 138 2 ( ( -LRB- 9878 138 3 2015 2015 CD 9878 138 4 ) ) -RRB- 9878 138 5 : : : 9878 138 6 409–23 409–23 CD 9878 138 7 , , , 9878 138 8 https://doi.org/10.1108/LHT-06-2015- https://doi.org/10.1108/LHT-06-2015- NNP 9878 138 9 0068 0068 CD 9878 138 10 . . . 9878 139 1 4 4 CD 9878 139 2 Federal Federal NNP 9878 139 3 Agencies Agencies NNPS 9878 139 4 Digital Digital NNP 9878 139 5 Guidelines Guidelines NNP 9878 139 6 Initiative Initiative NNP 9878 139 7 , , , 9878 139 8 Technical Technical NNP 9878 139 9 Guidelines Guidelines NNP 9878 139 10 for for IN 9878 139 11 Digitizing Digitizing NNP 9878 139 12 Cultural Cultural NNP 9878 139 13 Heritage Heritage NNP 9878 139 14 Materials Materials NNPS 9878 139 15 . . . 
9878 140 1 ( ( -LRB- 9878 140 2 Washington Washington NNP 9878 140 3 , , , 9878 140 4 DC DC NNP 9878 140 5 : : : 9878 140 6 Federal Federal NNP 9878 140 7 Agencies Agencies NNPS 9878 140 8 Digital Digital NNP 9878 140 9 Guidelines Guidelines NNP 9878 140 10 Initiative Initiative NNP 9878 140 11 , , , 9878 140 12 2016 2016 CD 9878 140 13 ) ) -RRB- 9878 140 14 , , , 9878 140 15 http://www.digitizationguidelines.gov/guidelines/FADGI%20Federal%20%20Agencies%20D http://www.digitizationguidelines.gov/guidelines/fadgi%20federal%20%20agencies%20d ADD 9878 140 16 igital%20Guidelines%20Initiative-2016%20Final_rev1.pdf igital%20Guidelines%20Initiative-2016%20Final_rev1.pdf NNS 9878 140 17 . . . 9878 141 1 5 5 CD 9878 141 2 Duff Duff NNP 9878 141 3 Johnson Johnson NNP 9878 141 4 , , , 9878 141 5 “ " `` 9878 141 6 US US NNP 9878 141 7 Federal Federal NNP 9878 141 8 Agencies Agencies NNPS 9878 141 9 Approve Approve NNP 9878 141 10 PDF PDF NNP 9878 141 11 / / , 9878 141 12 A a NN 9878 141 13 , , , 9878 141 14 ” " '' 9878 141 15 PDF PDF NNP 9878 141 16 Association Association NNP 9878 141 17 , , , 9878 141 18 September September NNP 9878 141 19 2 2 CD 9878 141 20 , , , 9878 141 21 2016 2016 CD 9878 141 22 , , , 9878 141 23 http://www.pdfa.org/new/us-federal-agencies-approve-pdfa/. http://www.pdfa.org/new/us-federal-agencies-approve-pdfa/. CD 9878 142 1 6 6 CD 9878 142 2 Bruno Bruno NNP 9878 142 3 Lowagie Lowagie NNP 9878 142 4 , , , 9878 142 5 iText iText NNP 9878 142 6 in in IN 9878 142 7 Action Action NNP 9878 142 8 , , , 9878 142 9 2nd 2nd JJ 9878 142 10 ed ed NN 9878 142 11 . . . 9878 143 1 ( ( -LRB- 9878 143 2 Stamford Stamford NNP 9878 143 3 , , , 9878 143 4 CT CT NNP 9878 143 5 : : : 9878 143 6 Manning manning NN 9878 143 7 , , , 9878 143 8 2010 2010 CD 9878 143 9 ) ) -RRB- 9878 143 10 . . . 
9878 144 1 7 7 LS 9878 144 2 “ " `` 9878 144 3 iText iText NNP 9878 144 4 5.4.4 5.4.4 CD 9878 144 5 , , , 9878 144 6 ” " '' 9878 144 7 iText itext RB 9878 144 8 , , , 9878 144 9 last last JJ 9878 144 10 modified modify VBN 9878 144 11 September September NNP 9878 144 12 16 16 CD 9878 144 13 , , , 9878 144 14 2013 2013 CD 9878 144 15 , , , 9878 144 16 http://itextpdf.com/changelog/544 http://itextpdf.com/changelog/544 NNP 9878 144 17 . . . 9878 145 1 8 8 CD 9878 145 2 Timothy Timothy NNP 9878 145 3 Robert Robert NNP 9878 145 4 Hart Hart NNP 9878 145 5 and and CC 9878 145 6 Denise Denise NNP 9878 145 7 de de NNP 9878 145 8 Vries Vries NNP 9878 145 9 , , , 9878 145 10 “ " `` 9878 145 11 Metadata Metadata NNP 9878 145 12 Provenance Provenance NNP 9878 145 13 and and CC 9878 145 14 Vulnerability vulnerability NN 9878 145 15 , , , 9878 145 16 ” " '' 9878 145 17 Information Information NNP 9878 145 18 Technology Technology NNP 9878 145 19 and and CC 9878 145 20 Libraries Libraries NNP 9878 145 21 36 36 CD 9878 145 22 , , , 9878 145 23 no no UH 9878 145 24 . . . 9878 146 1 4 4 CD 9878 146 2 ( ( -LRB- 9878 146 3 2017 2017 CD 9878 146 4 ) ) -RRB- 9878 146 5 , , , 9878 146 6 https://doi.org/10.6017/ital.v36i4.10146 https://doi.org/10.6017/ital.v36i4.10146 NNP 9878 146 7 . . . 9878 147 1 9 9 CD 9878 147 2 Johan Johan NNP 9878 147 3 Van Van NNP 9878 147 4 der der IN 9878 147 5 Knijff Knijff NNP 9878 147 6 , , , 9878 147 7 “ " `` 9878 147 8 JPEG JPEG NNP 9878 147 9 2000 2000 CD 9878 147 10 for for IN 9878 147 11 Long Long NNP 9878 147 12 - - HYPH 9878 147 13 Term Term NNP 9878 147 14 Preservation preservation NN 9878 147 15 : : : 9878 147 16 JP2 JP2 NNP 9878 147 17 as as IN 9878 147 18 a a DT 9878 147 19 Preservation Preservation NNP 9878 147 20 Format Format NNP 9878 147 21 , , , 9878 147 22 ” " '' 9878 147 23 D- D- NNP 9878 147 24 Lib Lib NNP 9878 147 25 17 17 CD 9878 147 26 , , , 9878 147 27 no no UH 9878 147 28 . . . 
9878 148 1 5/6 5/6 CD 9878 148 2 ( ( -LRB- 9878 148 3 2011 2011 CD 9878 148 4 ) ) -RRB- 9878 148 5 , , , 9878 148 6 https://doi.org/10.1045/may2011-vanderknijff https://doi.org/10.1045/may2011-vanderknijff ADD 9878 148 7 . . . 9878 149 1 10 10 CD 9878 149 2 PDF PDF NNP 9878 149 3 Association Association NNP 9878 149 4 , , , 9878 149 5 “ " `` 9878 149 6 How how WRB 9878 149 7 veraPDF veraPDF NNP 9878 149 8 does do VBZ 9878 149 9 PDF PDF NNP 9878 149 10 / / , 9878 149 11 A A NNP 9878 149 12 Validation validation NN 9878 149 13 , , , 9878 149 14 ” " '' 9878 149 15 2016 2016 CD 9878 149 16 , , , 9878 149 17 http://www.pdfa.org/how- http://www.pdfa.org/how- NNP 9878 149 18 verapdf verapdf NN 9878 149 19 - - HYPH 9878 149 20 does do VBZ 9878 149 21 - - HYPH 9878 149 22 pdfa pdfa NN 9878 149 23 - - HYPH 9878 149 24 validation/. validation/. NN
and and CC 9878 150 41 Conformance Conformance NNP 9878 150 42 Level Level NNP 9878 150 43 Data Data NNP 9878 150 44 Source Source NNP 9878 150 45 PDF PDF NNP 9878 150 46 / / SYM 9878 150 47 A a NN 9878 150 48 and and CC 9878 150 49 Image Image NNP 9878 150 50 Manipulation Manipulation NNP 9878 150 51 Tools Tools NNPS 9878 150 52 Metadata Metadata NNP 9878 150 53 Extraction extraction NN 9878 150 54 Tools tool NNS 9878 150 55 and and CC 9878 150 56 Color Color NNP 9878 150 57 Profiles Profiles NNPS 9878 150 58 Implementation Implementation NNP 9878 150 59 Converting convert VBG 9878 150 60 and and CC 9878 150 61 Ordering ordering NN 9878 150 62 TIFFs TIFFs NNPS 9878 150 63 into into IN 9878 150 64 a a DT 9878 150 65 Single single JJ 9878 150 66 PDF PDF NNP 9878 150 67 / / SYM 9878 150 68 A-2 A-2 NNP 9878 150 69 File File NNP 9878 150 70 Converting Converting NNP 9878 150 71 PDF PDF NNP 9878 150 72 / / SYM 9878 150 73 A-2 A-2 NNP 9878 150 74 Files file VBZ 9878 150 75 back back RB 9878 150 76 to to IN 9878 150 77 TIFFs TIFFs NNPS 9878 150 78 and and CC 9878 150 79 JPEG2000s JPEG2000s NNP 9878 150 80 PDF PDF NNP 9878 150 81 / / SYM 9878 150 82 A A NNP 9878 150 83 Validation validation NN 9878 150 84 Runtime Runtime NNP 9878 150 85 and and CC 9878 150 86 Conclusion Conclusion NNP 9878 150 87 Summary Summary NNP 9878 150 88 Appendix Appendix NNP 9878 150 89 A a NN 9878 150 90 : : : 9878 150 91 Sample Sample NNP 9878 150 92 TIFF TIFF NNP 9878 150 93 Metadata Metadata NNP 9878 150 94 with with IN 9878 150 95 ICC ICC NNP 9878 150 96 header header NN 9878 150 97 Appendix Appendix NNP 9878 150 98 B B NNP 9878 150 99 : : : 9878 150 100 Sample Sample NNP 9878 150 101 Code Code NNP 9878 150 102 to to TO 9878 150 103 convert convert VB 9878 150 104 PDF PDF NNP 9878 150 105 / / SYM 9878 150 106 A-2 A-2 NNP 9878 150 107 back back RB 9878 150 108 to to IN 9878 150 109 JPEG2000s JPEG2000s NNP 9878 150 110 References reference NNS