mv: ‘./input-file.zip’ and ‘./input-file.zip’ are the same file
Creating study carrel named machine-learning
Initializing database
Unzipping
Archive: input-file.zip
   creating: ./tmp/input/machine-learning/
  inflating: ./tmp/input/machine-learning/altman.docx
  inflating: ./tmp/input/machine-learning/prudhomme.docx
  inflating: ./tmp/input/machine-learning/cohen-nakazawa.docx
  inflating: ./tmp/input/machine-learning/harper.docx
  inflating: ./tmp/input/machine-learning/hansen.docx
  inflating: ./tmp/input/machine-learning/morgan.docx
  inflating: ./tmp/input/machine-learning/hintze-schossau.docx
  inflating: ./tmp/input/machine-learning/wiegand.docx
  inflating: ./tmp/input/machine-learning/lesk.docx
  inflating: ./tmp/input/machine-learning/kim.docx
  inflating: ./tmp/input/machine-learning/lucic-shanahan.docx
  inflating: ./tmp/input/machine-learning/jiang.docx
=== updating bibliographic database
Building study carrel named machine-learning
FILE: cache/altman.docx OUTPUT: txt/altman.txt
FILE: cache/hansen.docx OUTPUT: txt/hansen.txt
FILE: cache/lucic-shanahan.docx OUTPUT: txt/lucic-shanahan.txt
FILE: cache/cohen-nakazawa.docx OUTPUT: txt/cohen-nakazawa.txt
FILE: cache/jiang.docx OUTPUT: txt/jiang.txt
FILE: cache/hintze-schossau.docx OUTPUT: txt/hintze-schossau.txt
FILE: cache/lesk.docx OUTPUT: txt/lesk.txt
FILE: cache/morgan.docx OUTPUT: txt/morgan.txt
FILE: cache/prudhomme.docx OUTPUT: txt/prudhomme.txt
FILE: cache/kim.docx OUTPUT: txt/kim.txt
FILE: cache/wiegand.docx OUTPUT: txt/wiegand.txt
FILE: cache/harper.docx OUTPUT: txt/harper.txt
prudhomme txt/../wrd/prudhomme.wrd
lucic-shanahan txt/../pos/lucic-shanahan.pos
lucic-shanahan txt/../wrd/lucic-shanahan.wrd
hansen txt/../wrd/hansen.wrd
hansen txt/../pos/hansen.pos
jiang txt/../pos/jiang.pos
lucic-shanahan txt/../ent/lucic-shanahan.ent
prudhomme txt/../ent/prudhomme.ent
jiang txt/../ent/jiang.ent
jiang txt/../wrd/jiang.wrd
altman txt/../wrd/altman.wrd
morgan txt/../pos/morgan.pos
lesk txt/../wrd/lesk.wrd
harper txt/../wrd/harper.wrd
lesk txt/../pos/lesk.pos
prudhomme txt/../pos/prudhomme.pos
hansen txt/../ent/hansen.ent
morgan txt/../wrd/morgan.wrd
hintze-schossau txt/../wrd/hintze-schossau.wrd
wiegand txt/../wrd/wiegand.wrd
hintze-schossau txt/../pos/hintze-schossau.pos
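Each FILE/OUTPUT pair above records the conversion of a cached .docx file into plain text, and the file2bib.sh records that follow show that Apache Tika does the parsing (see their X-Parsed-By fields). The log does not show the invocation itself; below is a minimal sketch of the same step, assuming the tika-python bindings and a Java runtime on the path, not file2bib.sh's actual code:

    # Sketch only: approximate the cache/*.docx -> txt/*.txt conversions logged above.
    # Assumes the tika-python package (pip install tika); file2bib.sh's real
    # invocation of Tika may differ.
    from pathlib import Path
    from tika import parser

    for docx in sorted(Path('./cache').glob('*.docx')):
        parsed = parser.from_file(str(docx))      # dict with 'metadata' and 'content'
        text = (parsed.get('content') or '').strip()
        out = Path('./txt') / (docx.stem + '.txt')
        out.write_text(text, encoding='utf-8')
        print(f'FILE: {docx} OUTPUT: {out}')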
=== file2bib.sh ===
id: prudhomme
author:
title: prudhomme
date:
pages:
extension: .docx
txt: ./txt/prudhomme.txt
cache: ./cache/prudhomme.docx
Component 1 Y component: Quantization table 0, Sampling factors 2 horiz/2 vert
Component 2 Cb component: Quantization table 1, Sampling factors 1 horiz/1 vert
Component 3 Cr component: Quantization table 1, Sampling factors 1 horiz/1 vert
Compression Type Baseline
Content-Type ['application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'image/jpeg']
Creation-Date 2020-04-14T20:53:08
Data Precision 8 bits
Exif IFD0:Artist pprudho
Exif IFD0:Padding [2060 values]
Exif IFD0:Windows XP Author pprudho
Exif SubIFD:Date/Time Digitized 2020:04:14 20:53:08
Exif SubIFD:Date/Time Original 2020:04:14 20:53:08
Exif SubIFD:Padding [2060 values]
Exif SubIFD:Sub-Sec Time Digitized 48
Exif SubIFD:Sub-Sec Time Original 48
File Modified Date Thu Dec 10 14:22:05 +00:00 2020
File Name apache-tika-4059332103772536558.tmp
File Size 111014 bytes
Image Height 357 pixels
Image Width 1263 pixels
Number of Components 3
Number of Tables 4 Huffman tables
Resolution Units inch
Thumbnail Height Pixels 0
Thumbnail Width Pixels 0
X Resolution 168 dots
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser', ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.jpeg.JpegParser']]
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth ['0', '1']
X-TIKA:embedded_resource_path /image1.jpg
X-TIKA:parse_time_millis ['41', '4']
XMP Value Count 3
Y Resolution 168 dots
dcterms:created 2020-04-14T20:53:08
embeddedRelationshipId rId8
exif:DateTimeOriginal 2020-04-14T20:53:08
meta:creation-date 2020-04-14T20:53:08
resourceName ["b'prudhomme.docx'", 'image1.jpg']
tiff:BitsPerSample 8
tiff:ImageLength 357
tiff:ImageWidth 1263
=== file2bib.sh ===
id: lucic-shanahan
author: Microsoft Office User
title: lucic-shanahan
date:
pages:
extension: .docx
txt: ./txt/lucic-shanahan.txt
cache: ./cache/lucic-shanahan.docx
Author Microsoft Office User
Chroma BlackIsZero ['true', 'true']
Chroma ColorSpaceType ['RGB', 'RGB']
Chroma NumChannels ['3', '3']
Compression CompressionTypeName ['deflate', 'deflate']
Compression Lossless ['true', 'true']
Compression NumProgressiveScans ['1', '1']
Content-Type ['application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'image/png', 'image/png']
Creation-Date 2020-06-24T14:38:00Z
Data BitsPerSample ['8 8 8', '8 8 8']
Data PlanarConfiguration ['PixelInterleaved', 'PixelInterleaved']
Data SampleFormat ['UnsignedIntegral', 'UnsignedIntegral']
Dimension ImageOrientation ['Normal', 'Normal']
Dimension PixelAspectRatio ['1.0', '1.0']
IHDR ['width=1432, height=1073, bitDepth=8, colorType=RGB, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1429, height=1172, bitDepth=8, colorType=RGB, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none']
Transparency Alpha ['none', 'none']
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser', ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser']]
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth ['0', '1', '1']
X-TIKA:embedded_resource_path ['/image2.png', '/image1.png']
X-TIKA:parse_time_millis ['103', '2', '1']
creator Microsoft Office User
dc:creator Microsoft Office User
dcterms:created 2020-06-24T14:38:00Z
embeddedRelationshipId ['rId10', 'rId11']
height ['1073', '1172']
meta:author Microsoft Office User
meta:creation-date 2020-06-24T14:38:00Z
resourceName ["b'lucic-shanahan.docx'", 'image2.png', 'image1.png']
tiff:BitsPerSample ['8 8 8', '8 8 8']
tiff:ImageLength ['1073', '1172']
tiff:ImageWidth ['1432', '1429']
width ['1432', '1429']
altman txt/../pos/altman.pos
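The prudhomme and lucic-shanahan records above carry JPEG and PNG fields because Tika also descends into the images embedded in each .docx (the X-TIKA:embedded_resource_path values). A .docx file is an ordinary ZIP container, so the embedded media can be inspected with the standard library alone; a small illustrative sketch:

    # Sketch: list the embedded images behind the image/jpeg and image/png
    # metadata above. DOCX files keep their media under word/media/.
    import zipfile

    with zipfile.ZipFile('./cache/prudhomme.docx') as z:
        for name in z.namelist():
            if name.startswith('word/media/'):
                print(name, z.getinfo(name).file_size, 'bytes')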
=== file2bib.sh ===
id: jiang
author:
title: jiang
date:
pages:
extension: .docx
txt: ./txt/jiang.txt
cache: ./cache/jiang.docx
Chroma BlackIsZero ['true', 'true']
Chroma ColorSpaceType ['RGB', 'RGB']
Chroma NumChannels ['4', '4']
Compression CompressionTypeName ['deflate', 'deflate']
Compression Lossless ['true', 'true']
Compression NumProgressiveScans ['1', '1']
Content-Type ['application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'image/png', 'image/png']
Creation-Date 2020-01-04T03:04:00Z
Data BitsPerSample ['8 8 8 8', '8 8 8 8']
Data PlanarConfiguration ['PixelInterleaved', 'PixelInterleaved']
Data SampleFormat ['UnsignedIntegral', 'UnsignedIntegral']
Dimension ImageOrientation ['Normal', 'Normal']
Dimension PixelAspectRatio ['1.0', '1.0']
IHDR ['width=1410, height=1208, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1970, height=1358, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none']
Transparency Alpha ['nonpremultipled', 'nonpremultipled']
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser', ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser']]
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth ['0', '1', '1']
X-TIKA:embedded_resource_path ['/image1.png', '/image2.png']
X-TIKA:parse_time_millis ['99', '2', '1']
dcterms:created 2020-01-04T03:04:00Z
embeddedRelationshipId ['rId10', 'rId11']
height ['1208', '1358']
meta:creation-date 2020-01-04T03:04:00Z
resourceName ["b'jiang.docx'", 'image1.png', 'image2.png']
tiff:BitsPerSample ['8 8 8 8', '8 8 8 8']
tiff:ImageLength ['1208', '1358']
tiff:ImageWidth ['1410', '1970']
width ['1410', '1970']
hintze-schossau txt/../ent/hintze-schossau.ent
harper txt/../pos/harper.pos
wiegand txt/../pos/wiegand.pos
=== file2bib.sh ===
id: hansen
author:
title: hansen
date:
pages:
extension: .docx
txt: ./txt/hansen.txt
cache: ./cache/hansen.docx
Content-Type application/vnd.openxmlformats-officedocument.wordprocessingml.document
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 51
resourceName b'hansen.docx'
kim txt/../wrd/kim.wrd
lesk txt/../ent/lesk.ent
kim txt/../pos/kim.pos
cohen-nakazawa txt/../wrd/cohen-nakazawa.wrd
altman txt/../ent/altman.ent
=== file2bib.sh ===
id: hintze-schossau
author:
title: hintze-schossau
date:
pages:
extension: .docx
txt: ./txt/hintze-schossau.txt
cache: ./cache/hintze-schossau.docx
Content-Type application/vnd.openxmlformats-officedocument.wordprocessingml.document
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 37
resourceName b'hintze-schossau.docx'
=== file2bib.sh ===
id: morgan
author:
title: morgan
date:
pages:
extension: .docx
txt: ./txt/morgan.txt
cache: ./cache/morgan.docx
Content-Type application/vnd.openxmlformats-officedocument.wordprocessingml.document
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 57
resourceName b'morgan.docx'
cohen-nakazawa txt/../pos/cohen-nakazawa.pos
morgan txt/../ent/morgan.ent
wiegand txt/../ent/wiegand.ent
=== file2bib.sh ===
id: lesk
author:
title: lesk
date:
pages:
extension: .docx
txt: ./txt/lesk.txt
cache: ./cache/lesk.docx
Chroma BlackIsZero ['true', 'true', 'true']
Chroma ColorSpaceType ['RGB', 'RGB', 'RGB']
Chroma NumChannels ['4', '4', '4']
Compression CompressionTypeName ['deflate', 'deflate', 'deflate']
Compression Lossless ['true', 'true', 'true']
Compression NumProgressiveScans ['1', '1', '1']
Content-Type ['application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'image/png', 'image/png', 'image/png']
Data BitsPerSample ['8 8 8 8', '8 8 8 8', '8 8 8 8']
Data PlanarConfiguration ['PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved']
Data SampleFormat ['UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral']
Dimension ImageOrientation ['Normal', 'Normal', 'Normal']
Dimension PixelAspectRatio ['1.0', '1.0', '1.0']
IHDR ['width=950, height=784, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=733, height=352, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=694, height=250, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none']
Transparency Alpha ['nonpremultipled', 'nonpremultipled', 'nonpremultipled']
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser', ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser']]
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth ['0', '1', '1', '1']
X-TIKA:embedded_resource_path ['/image3.png', '/image2.png', '/image1.png']
X-TIKA:parse_time_millis ['63', '2', '0', '0']
embeddedRelationshipId ['rId10', 'rId11', 'rId9']
height ['784', '352', '250']
resourceName ["b'lesk.docx'", 'image3.png', 'image2.png', 'image1.png']
tiff:BitsPerSample ['8 8 8 8', '8 8 8 8', '8 8 8 8']
tiff:ImageLength ['784', '352', '250']
tiff:ImageWidth ['950', '733', '694']
width ['950', '733', '694']
cohen-nakazawa txt/../ent/cohen-nakazawa.ent
=== file2bib.sh ===
id: altman
author:
title: altman
date:
pages:
extension: .docx
txt: ./txt/altman.txt
cache: ./cache/altman.docx
Content-Type application/vnd.openxmlformats-officedocument.wordprocessingml.document
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 57
resourceName b'altman.docx'
=== file2bib.sh ===
id: kim
author: Bohyun Kim
title: kim
date:
pages:
extension: .docx
txt: ./txt/kim.txt
cache: ./cache/kim.docx
Author Bohyun Kim
Content-Type application/vnd.openxmlformats-officedocument.wordprocessingml.document
Creation-Date 2020-06-02T05:47:00Z
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 72
creator Bohyun Kim
dc:creator Bohyun Kim
dcterms:created 2020-06-02T05:47:00Z
meta:author Bohyun Kim
meta:creation-date 2020-06-02T05:47:00Z
resourceName b'kim.docx'
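Interleaved lines such as "kim txt/../pos/kim.pos" record the feature files written next to each plain-text file: .wrd (keywords), .pos (parts of speech), and .ent (named entities). The log does not show how these are computed; here is a minimal sketch of the .pos and .ent steps, assuming spaCy and its small English model rather than the toolbox's actual code:

    # Sketch: derive .pos and .ent files like the ones logged above.
    # Assumes spaCy with en_core_web_sm installed; the toolbox's tokenizer
    # and column layout may differ.
    import spacy

    nlp = spacy.load('en_core_web_sm')
    doc = nlp(open('./txt/kim.txt', encoding='utf-8').read())

    with open('./pos/kim.pos', 'w', encoding='utf-8') as pos_file:
        for token in doc:
            if not token.is_space:
                pos_file.write(f'{token.text}\t{token.lemma_}\t{token.pos_}\n')

    with open('./ent/kim.ent', 'w', encoding='utf-8') as ent_file:
        for entity in doc.ents:
            ent_file.write(f'{entity.text}\t{entity.label_}\n')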
=== file2bib.sh ===
id: harper
author:
title: harper
date:
pages:
extension: .docx
txt: ./txt/harper.txt
cache: ./cache/harper.docx
Chroma BlackIsZero ['true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true']
Chroma ColorSpaceType ['RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB', 'RGB']
Chroma NumChannels ['4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '4', '3', '3', '4', '4', '4', '4', '4', '4', '4', '4', '4', '3', '4']
Compression CompressionTypeName ['deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate', 'deflate']
Compression Lossless ['true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true', 'true']
Compression NumProgressiveScans ['1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1']
Content-Type ['application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png', 'image/png']
Data BitsPerSample ['8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8', '8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8', '8 8 8 8']
Data PlanarConfiguration ['PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved', 'PixelInterleaved']
Data SampleFormat ['UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral', 'UnsignedIntegral']
Data SignificantBitsPerSample ['8 8 8 8', '8 8 8 8', '8 8 8 8']
Dimension ImageOrientation ['Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal', 'Normal']
Dimension PixelAspectRatio ['1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0', '1.0']
IHDR ['width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=142, height=142, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=142, height=142, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=142, height=142, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=142, height=142, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=142, height=142, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=142, height=142, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=142, height=142, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=142, height=142, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=588, height=576, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=588, height=576, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=588, height=576, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=588, height=576, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=588, height=576, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=926, height=700, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=592, height=451, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=199, height=203, bitDepth=8, colorType=RGB, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=503, height=501, bitDepth=8, colorType=RGB, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=423, height=420, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1480, height=533, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1295, height=257, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1150, height=128, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1150, height=128, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=539, height=253, bitDepth=8, colorType=RGB, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none', 'width=1, height=1, bitDepth=8, colorType=RGBAlpha, compressionMethod=deflate, filterMethod=adaptive, interlaceMethod=none']
Transparency Alpha ['nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'none', 'none', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'nonpremultipled', 'none', 'nonpremultipled']
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser', ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser'], ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.image.ImageParser']]
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth ['0', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1', '1']
X-TIKA:embedded_resource_path ['/image6.png', '/image25.png', '/image23.png', '/image26.png', '/image27.png', '/image30.png', '/image20.png', '/image21.png', '/image19.png', '/image8.png', '/image3.png', '/image15.png', '/image13.png', '/image12.png', '/image16.png', '/image14.png', '/image2.png', '/image29.png', '/image32.png', '/image9.png', '/image24.png', '/image31.png', '/image10.png', '/image1.png', '/image28.png', '/image7.png', '/image17.png', '/image5.png', '/image22.png', '/image18.png', '/image4.png', '/image33.png', '/image11.png']
X-TIKA:parse_time_millis ['203', '2', '1', '3', '1', '1', '1', '0', '1', '1', '0', '1', '1', '0', '0', '1', '1', '0', '1', '1', '0', '1', '0', '1', '0', '1', '0', '0', '1', '0', '1', '1', '0', '1']
embeddedRelationshipId ['rId14', 'rId15', 'rId16', 'rId17', 'rId18', 'rId19', 'rId20', 'rId21', 'rId22', 'rId23', 'rId26', 'rId28', 'rId29', 'rId30', 'rId31', 'rId32', 'rId33', 'rId34', 'rId35', 'rId36', 'rId39', 'rId40', 'rId41', 'rId47', 'rId59', 'rId60', 'rId66', 'rId67', 'rId70', 'rId71', 'rId72', 'rId76', 'rId77']
height ['1', '142', '142', '142', '142', '142', '142', '142', '142', '1', '1', '576', '576', '576', '576', '576', '1', '700', '451', '1', '203', '501', '1', '420', '533', '1', '257', '1', '128', '128', '1', '253', '1']
resourceName ["b'harper.docx'", 'image6.png', 'image25.png', 'image23.png', 'image26.png', 'image27.png', 'image30.png', 'image20.png', 'image21.png', 'image19.png', 'image8.png', 'image3.png', 'image15.png', 'image13.png', 'image12.png', 'image16.png', 'image14.png', 'image2.png', 'image29.png', 'image32.png', 'image9.png', 'image24.png', 'image31.png', 'image10.png', 'image1.png', 'image28.png', 'image7.png', 'image17.png', 'image5.png', 'image22.png', 'image18.png', 'image4.png', 'image33.png', 'image11.png']
sBIT sBIT_RGBAlpha ['red=8, green=8, blue=8, alpha=8', 'red=8, green=8, blue=8, alpha=8', 'red=8, green=8, blue=8, alpha=8']
tiff:BitsPerSample ['8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8', '8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8 8', '8 8 8', '8 8 8 8']
tiff:ImageLength ['1', '142', '142', '142', '142', '142', '142', '142', '142', '1', '1', '576', '576', '576', '576', '576', '1', '700', '451', '1', '203', '501', '1', '420', '533', '1', '257', '1', '128', '128', '1', '253', '1']
tiff:ImageWidth ['1', '142', '142', '142', '142', '142', '142', '142', '142', '1', '1', '588', '588', '588', '588', '588', '1', '926', '592', '1', '199', '503', '1', '423', '1480', '1', '1295', '1', '1150', '1150', '1', '539', '1']
width ['1', '142', '142', '142', '142', '142', '142', '142', '142', '1', '1', '588', '588', '588', '588', '588', '1', '926', '592', '1', '199', '503', '1', '423', '1480', '1', '1295', '1', '1150', '1150', '1', '539', '1']
=== file2bib.sh ===
id: wiegand
author: Sue Wiegand
title: wiegand
date:
pages:
extension: .docx
txt: ./txt/wiegand.txt
cache: ./cache/wiegand.docx
Author Sue Wiegand
Content-Type application/vnd.openxmlformats-officedocument.wordprocessingml.document
Creation-Date 2020-01-14T23:22:00Z
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 73
creator Sue Wiegand
dc:creator Sue Wiegand
dcterms:created 2020-01-14T23:22:00Z
meta:author Sue Wiegand
meta:creation-date 2020-01-14T23:22:00Z
resourceName b'wiegand.docx'
harper txt/../ent/harper.ent
=== file2bib.sh ===
id: cohen-nakazawa
author: Jason E. Cohen
title: cohen-nakazawa
date:
pages:
extension: .docx
txt: ./txt/cohen-nakazawa.txt
cache: ./cache/cohen-nakazawa.docx
Author Jason E. Cohen
Content-Type application/vnd.openxmlformats-officedocument.wordprocessingml.document
Creation-Date 2020-02-18T19:17:00Z
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.microsoft.ooxml.OOXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 70
creator Jason E. Cohen
dc:creator Jason E. Cohen
dcterms:created 2020-02-18T19:17:00Z
meta:author Jason E. Cohen
meta:creation-date 2020-02-18T19:17:00Z
resourceName b'cohen-nakazawa.docx'
kim txt/../ent/kim.ent
Done mapping.
Reducing machine-learning
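Each reduce.pl record below reports a words count, a sentences count, and a flesch readability score. Flesch reading ease is a standard formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), where higher scores indicate easier prose. The following sketch uses a crude vowel-group syllable counter; the toolbox's own counter is not shown in the log and likely differs:

    # Sketch: Flesch reading ease, as reported in the reduce.pl records below.
    import re

    def count_syllables(word):
        # Rough stand-in: count runs of vowels, minimum one per word.
        return max(1, len(re.findall(r'[aeiouy]+', word.lower())))

    def flesch(text):
        sentences = max(1, len(re.findall(r'[.!?]+', text)))
        words = re.findall(r"[A-Za-z']+", text)
        syllables = sum(count_syllables(w) for w in words)
        n = max(1, len(words))
        return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

    text = open('./txt/prudhomme.txt', encoding='utf-8').read()
    print(round(flesch(text)))   # the log reports flesch = 49 for prudhomme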
=== reduce.pl bib ===
id = prudhomme
author =
title = prudhomme
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 3690
sentences = 245
flesch = 49
summary = However, "the viability of machine learning and artificial intelligence is predicated on the representativeness and quality of the data that they are trained on," as Thomas Padilla, Interim Head, Knowledge Production at the University of Nevada Las Vegas, asserts (2019, 14). In this essay, I begin by placing artificial intelligence and machine learning in context, then proceed by discussing why AI matters for archives and libraries, and describing the techniques used in a pilot automation project from the perspective of digital curation at Oklahoma State University Archives. Artificial intelligence, and specifically machine learning as a subfield of AI, has direct applications through pattern recognition techniques that predict the labeling values for unlabeled data. Along with greater computing capabilities, artificial intelligence could be an opportunity for libraries and archives to boost the discovery of their digital collections by pushing text and image recognition machine learning techniques to new limits.
cache = ./cache/prudhomme.docx
txt = ./txt/prudhomme.txt
=== reduce.pl bib ===
id = harper
author =
title = harper
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 5838
sentences = 489
flesch = 59
summary = Figure 2 Images generated with a simple statistical model appear as noise as the model is insufficient to capture the structure of the real data (Markov chains trained using wine bottles and circles from Google's QuickDraw dataset). Other types of generative statistical models, like Naive Bayes or a higher-order Markov chain,[footnoteRef:1] could perhaps capture a bit more information about the training data, but they would still be insufficient for real-world applications like this.[footnoteRef:2] Image, video, and audio are complicated; it is hard to reduce them to their essence with basic statistical rules in the way we were able to with the ordering of letters in English and Italian. Figure 4 A GAN being trained on wine bottle sketches from Google's quickdraw dataset (https://github.com/googlecreativelab/quickdraw-dataset) shows the generator learning how to produce better sketches over time. GANs in Action: Deep Learning with Generative Adversarial Networks.
cache = ./cache/harper.docx
txt = ./txt/harper.txt
=== reduce.pl bib ===
id = cohen-nakazawa
author = Jason E. Cohen
title = cohen-nakazawa
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 7632
sentences = 334
flesch = 48
summary = Consequently, our chapter describes the process we used to (1) generate technical and descriptive metadata for historical photographs as we pulled material from an extant blog website into a digital archives platform; (2) identify recurring faces in individual pictures as well as in photographs of groups of sometimes unidentified people in order to generate social networks as metadata; and (3) to help develop a controlled vocabulary for the institution's future needs for object management and description. Similarly, as the ownership of historical images suddenly extended to include present-day community members, and as these questions of access and serving a local public were inextricably bound up with interactions with members of that shared public whose family names and faces appear in the images we were making available, we began to consider the ways in which our archival work was tied to what Ryan Calo calls the "historical validation" of primary source materials (2017, 424-5).
cache = ./cache/cohen-nakazawa.docx
txt = ./txt/cohen-nakazawa.txt
=== reduce.pl bib ===
id = hansen
author =
title = hansen
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 4321
sentences = 235
flesch = 59
summary = [5: https://dml.cz/ ] [6: http://www.numdam.org/ ] [7: https://zbmath.org/ ] [8: Mathematical Subject Classification (MSC) values in MathSciNet and zbMath are a particularly interesting categorization set to work with as they are assigned and reviewed by a subject area expert editor and an active researcher in the same, or closely related, subfield as the article's content before they are published. Now let us shift from mathematics-specific categorization to subject categorization in general and look at the work Microsoft has done assigning Fields of Study (FoS) in the Microsoft Academic Graph (MAG) which is used to create their Microsoft Academic article search product.[footnoteRef:15] While the MAG FoS project is also attempting to categorize articles for proper indexing and search, it represents the second path which is taken by automated categorization projects: using machine learning techniques to both create the taxonomy and to classify.
cache = ./cache/hansen.docx
txt = ./txt/hansen.txt
=== reduce.pl bib ===
id = altman
author =
title = altman
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 6071
sentences = 311
flesch = 60
summary = I did most of my data cleanup by hand using spreadsheet software, and was not careful about preserving the formulas for each step of the process; instead, I deleted and wrote over many important intermediate computations, saving only the final results. The pipeline for a machine learning project generally comprises five stages: data acquisition, data preparation, model training and testing, evaluation and analysis, and application of results. However you get your initial data, it is generally a good idea to save a copy in the rawest possible form and treat that copy as immutable, at least during the initial phase of testing different algorithms or configurations. This is often the part of the process that requires the most work, and you should expect to iterate over your data preparations many times, even after you've started training and testing models. As you begin ingesting and preparing data, you'll want to explore possible machine learning algorithms to perform on your dataset.
cache = ./cache/altman.docx
txt = ./txt/altman.txt
=== reduce.pl bib ===
id = hintze-schossau
author =
title = hintze-schossau
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 5083
sentences = 336
flesch = 56
summary = Artificial Intelligence, with its ability to machine learn coupled to an almost humanlike understanding, sounds like the ideal tool to the humanities. Machine learning allows us to learn from these data sets in ways that exceed human capabilities, while an artificial brain will eventually allow us to objectively describe a subjective experience (through quantifying neural activations or positively and negatively associated memories). The following paragraphs will explore current Machine Learning and Artificial Intelligence technologies, explain how quantitative or qualitative they really are, and explore what the possible implications for future Digital Humanities could be. Currently, machines do not learn but must be trained, typically with human-labeled data. At the same time, memory formation (Marstaller, Hintze, and Adami 2013), information integration in the brain (Tononi 2004), and how systems evolve the ability to learn (Sheneman, Schossau, and Hintze 2019) are also being researched, as they are building blocks of general purpose intelligence.
cache = ./cache/hintze-schossau.docx
txt = ./txt/hintze-schossau.txt
=== reduce.pl bib ===
id = kim
author = Bohyun Kim
title = kim
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 6982
sentences = 516
flesch = 55
summary = With their limited intelligence and fully deterministic nature, early rule-based symbolic AI systems raised few ethical concerns.[footnoteRef:4] AI systems that near or surpass human capability, on the other hand, are likely to be given the autonomy to make their own decisions without humans, even when their workings are not entirely transparent, and some of those decisions are distinctively moral in character. The Library of Congress has worked on detecting features, such as railroads in maps, using the convolutional neural network model, and issued a solicitation for a machine learning and deep learning pilot program that will maximize the use of its digital collections in 2019.[footnoteRef:18] Indiana University Libraries, AVP, University of Texas Austin School of Information, and the New York Public Library are jointly developing the Audiovisual Metadata Platform (AMP), using many AI tools in order to automatically generate metadata for audiovisual materials, which collection managers can use to supplement their archival description and processing workflows.[footnoteRef:19] [18: See Blewer, Kim, and Phetteplace 2018 and Price 2019.
cache = ./cache/kim.docx
txt = ./txt/kim.txt
=== reduce.pl bib ===
id = morgan
author =
title = morgan
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 5269
sentences = 375
flesch = 59
summary = Now, in a time of "big data," it is possible to go beyond mere automation and towards the more intelligent use of computers; the use of algorithms and machine learning is an integral part of future library collection building and service provision. Finally, this chapter outlines both a number of possible machine learning applications for libraries as well as a few real world use cases. Like the scale of computer input, the library profession has not really exploited computers' ability to save, organize, and retrieve data; on the whole, the library profession does not understand the concept of a "data structure." For example, tab-delimited files, CSV (comma-separated value) files, relational database schema, XML files, JSON files, and the content of email messages or HTTP server responses are all examples of different types of data structures.
cache = ./cache/morgan.docx
txt = ./txt/morgan.txt
=== reduce.pl bib ===
id = wiegand
author = Sue Wiegand
title = wiegand
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 6152
sentences = 426
flesch = 44
summary = JSTOR, for example, will provide up to 25,000 documents (or more at special request) in a dataset for analysis.[footnoteRef:2] Clarivate's Content as a Service provides Web of Science data to accommodate multiple purposes.[footnoteRef:3] Besides the many freely available bibliodata sources, researchers can sign up for developer accounts in databases such as Scopus to work with datasets for text mining and computational analysis.[footnoteRef:4] Using library-licensed collections as data could allow researchers to save time in reading a large corpus, stay updated on a topic of interest, analyze the most important topics at a given time period, confirm gaps in the research literature for investigation, and increase the efficiency of sifting through massive amounts of research in, for instance, the race to develop a vaccine (Ong 2020; Vamathevan 2019). By building out new services and tools, and instructing at all levels, libraries can reinvent themselves continuously by investing in creative and sustainable innovation, from digital and data literacy to assembling modules for a library-based Researchers' Workstation that uses Machine Learning to enhance the efficiency of the scholars' research cycle.
cache = ./cache/wiegand.docx
txt = ./txt/wiegand.txt
=== reduce.pl bib ===
id = jiang
author =
title = jiang
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 3583
sentences = 323
flesch = 55
summary = Among the top strengths of happy marriages, at least five can be reflected in cross-disciplinary ML research, including "discuss problems well," "handle differences creatively," and "maintain a good balance of time alone and together." I use two examples of my personal experiences (as a computer scientist) of collaborating with researchers from multiple disciplines (e.g., historians, psychologists, IT technicians) to illustrate. Cross-disciplinary research matters, because (1) it provides an understanding of complex problems that require a multifaceted approach to solve; (2) it combines disciplinary breadth with the ability to collaborate and synthesize varying expertise; (3) it enables researchers to reach a wider audience and communicate diverse viewpoints; (4) it encourages researchers to confront questions that traditional disciplines do not ask while opening up new areas of research; and (5) it promotes disciplinary self-awareness about methods and creative practices (Urquhart et al.
cache = ./cache/jiang.docx
txt = ./txt/jiang.txt
=== reduce.pl bib ===
id = lucic-shanahan
author = Microsoft Office User
title = lucic-shanahan
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 2981
sentences = 180
flesch = 58
summary = On its "Big Read" website, the Library of Congress includes information about One Book programs around the United States,[footnoteRef:2] and the American Library Association (ALA) also provides materials with which a library can build its own One Book program and, in this way, bring members of their communities together in a conversation.[footnoteRef:3] While community reading programs are not a new phenomenon and exist in various formats and sizes, the One Book One Chicago program is notable because of its size (the Chicago Public Library has 81 local branches) as well as its history (the program has been in existence for nearly 20 years). As part of ongoing work of the "Reading Chicago Reading" project, we used the secure data portal of the HathiTrust Research Consortium to access and pre-process the in-copyright novels in our set. The place names extracted from our three Chicago-setting OBOC books allowed us to focus on particular areas of the city such as Hyde Park, which is mentioned in each of them.
cache = ./cache/lucic-shanahan.docx
txt = ./txt/lucic-shanahan.txt
=== reduce.pl bib ===
id = lesk
author =
title = lesk
date =
pages =
extension = .docx
mime = application/vnd.openxmlformats-officedocument.wordprocessingml.document
words = 4868
sentences = 364
flesch = 64
summary = Fragility errors here can arise from many sources for example, the training data may not be representative of the real problem (if you train a machine translation program solely on engineering documents, do not expect it to do well on theater reviews). Similarly, the New York Times discussed the way groups of primarily young white men will build systems that focus on their data, and give wrong or discriminatory answers in more general situations (Tugend 2019). Instead of trying to learn more about the characteristics of a system that is being modeled, the effort is driven by the dictum, "more data beats better algorithms." In a review of the history of speech recognition, Xuedong Huang, James Baker, and Raj Reddy write, "The power of these systems arises mainly from their ability to collect, process, and learn from very large datasets.
cache = ./cache/lesk.docx
txt = ./txt/lesk.txt
Building ./etc/reader.txt
altman wiegand morgan harper wiegand prudhomme
number of items: 12
sum of words: 62,470
average size in words: 5,205
average readability score: 55
nouns: data; learning; machine; research; libraries; library; information; process; model; example; time; images; use; systems; project; people; system; results; training; text; place; tools; work; way; algorithms; collections; researchers; problem; set; algorithm; dataset; materials; image; knowledge; problems; examples; number; applications; level; services; articles; recognition; network; gans; decision; classification; archives; techniques; input; file
verbs: is; be; are; have; was; were; do; has; using; see; learning; used; make; use; given; based; been; help; create; had; does; find; learn; generated; need; did; work; trained; provide; build; generate; being; making; working; know; including; identify; get; called; become; known; include; ’s; produce; found; add; want; understand; think; made
adjectives: such; new; other; many; different; more; digital; moral; human; deep; possible; large; good; local; important; -; ethical; able; social; specific; same; available; historical; real; own; neural; intelligent; high; full; better; common; library; final; traditional; public; first; computational; multiple; likely; cultural; artificial; unique; similar; particular; technical; simple; open; generative; disciplinary; second
adverbs: not; also; more; then; well; only; as; even; very; out; so; together; just; now; however; most; instead; here; up; often; still; n’t; already; first; rather; especially; perhaps; much; highly; really; far; back; always; too; morally; previously; sometimes; on; increasingly; down; fully; finally; automatically; yet; similarly; never; generally; enough; easily; better
pronouns: we; it; you; their; our; they; your; i; its; them; us; my; one; itself; themselves; her; me; his; he; yourself; she; ourselves; ours; ’s; ml+history; https://www.kaggle.com/c/deepfake-detection-challenge; https://devblogs.nvidia.com/explaining-deep-learning-self-driving-car/.; him; alphago
proper nouns: ai; learning; machine; al; ml; chicago; library; intelligence; artificial; et; university; new; digital; google; data; daniel; johnson; information; ieee; research; york; marc; gan; adversarial; science; n.d; microsoft; generative; review; networks; press; journal; technology; may; markov; international; conference; reading; kentucky; .; march; january; computer; december; congress; Řehůřek; proceedings; msc; mark; libraries
keywords: machine; learning; datum; system; library; image; university; tönnies; research; process; problem; pmss; place; new; networks; moral; microsoft; material; markov; marc; kentucky; information; human; generative; gan; example; eastern; disciplinary; chinese; chicago; balke; archive; algorithm; adversarial
one topic; one dimension: learning
file(s): ./cache/altman.docx
titles(s): altman
three topics; one dimension: learning; learning; data
file(s): ./cache/kim.docx, ./cache/harper.docx, ./cache/altman.docx
titles(s): kim | harper | altman
five topics; three dimensions: learning machine data; data learning machine; ai machine learning; library learning machine; chicago data place
file(s): ./cache/cohen-nakazawa.docx, ./cache/altman.docx, ./cache/kim.docx, ./cache/wiegand.docx, ./cache/lucic-shanahan.docx
titles(s): cohen-nakazawa | altman | kim | wiegand | lucic-shanahan
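The one-, three-, and five-topic summaries above come from topic modeling over the carrel's texts. The log does not name the modeling library; below is a minimal sketch of the same idea using scikit-learn's latent Dirichlet allocation, which may differ from the Reader's own implementation:

    # Sketch: topic modeling over txt/*.txt, analogous to the five-topic
    # summary above. Assumes scikit-learn is installed.
    from pathlib import Path
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    paths = sorted(Path('./txt').glob('*.txt'))
    texts = [p.read_text(encoding='utf-8') for p in paths]

    vectorizer = CountVectorizer(stop_words='english', max_df=0.9)
    matrix = vectorizer.fit_transform(texts)

    lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(matrix)
    terms = vectorizer.get_feature_names_out()
    for i, weights in enumerate(lda.components_):
        top = [terms[j] for j in weights.argsort()[-3:][::-1]]
        print(f'topic {i}:', ' '.join(top))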
==== make-pages.sh htm files
==== make-pages.sh complex files
==== make-pages.sh named entities
==== making bibliographics

id: altman
author:
title: altman
date:
words: 6071
sentences: 311
pages:
flesch: 60
cache: ./cache/altman.docx
txt: ./txt/altman.txt
summary: I did most of my data cleanup by hand using spreadsheet software, and was not careful about preserving the formulas for each step of the process; instead, I deleted and wrote over many important intermediate computations, saving only the final results. The pipeline for a machine learning project generally comprises five stages: data acquisition, data preparation, model training and testing, evaluation and analysis, and application of results. However you get your initial data, it is generally a good idea to save a copy in the rawest possible form and treat that copy as immutable, at least during the initial phase of testing different algorithms or configurations. This is often the part of the process that requires the most work, and you should expect to iterate over your data preparations many times, even after you've started training and testing models. As you begin ingesting and preparing data, you'll want to explore possible machine learning algorithms to perform on your dataset.

id: hansen
author:
title: hansen
date:
words: 4321
sentences: 235
pages:
flesch: 59
cache: ./cache/hansen.docx
txt: ./txt/hansen.txt
summary: [5: https://dml.cz/ ] [6: http://www.numdam.org/ ] [7: https://zbmath.org/ ] [8: Mathematical Subject Classification (MSC) values in MathSciNet and zbMath are a particularly interesting categorization set to work with as they are assigned and reviewed by a subject area expert editor and an active researcher in the same, or closely related, subfield as the article's content before they are published. Now let us shift from mathematics-specific categorization to subject categorization in general and look at the work Microsoft has done assigning Fields of Study (FoS) in the Microsoft Academic Graph (MAG) which is used to create their Microsoft Academic article search product.[footnoteRef:15] While the MAG FoS project is also attempting to categorize articles for proper indexing and search, it represents the second path which is taken by automated categorization projects: using machine learning techniques to both create the taxonomy and to classify.

id: harper
author:
title: harper
date:
words: 5838
sentences: 489
pages:
flesch: 59
cache: ./cache/harper.docx
txt: ./txt/harper.txt
summary: Figure 2 Images generated with a simple statistical model appear as noise as the model is insufficient to capture the structure of the real data (Markov chains trained using wine bottles and circles from Google's QuickDraw dataset). Other types of generative statistical models, like Naive Bayes or a higher-order Markov chain,[footnoteRef:1] could perhaps capture a bit more information about the training data, but they would still be insufficient for real-world applications like this.[footnoteRef:2] Image, video, and audio are complicated; it is hard to reduce them to their essence with basic statistical rules in the way we were able to with the ordering of letters in English and Italian. Figure 4 A GAN being trained on wine bottle sketches from Google's quickdraw dataset (https://github.com/googlecreativelab/quickdraw-dataset) shows the generator learning how to produce better sketches over time. GANs in Action: Deep Learning with Generative Adversarial Networks.

id: hintze-schossau
author:
title: hintze-schossau
date:
words: 5083
sentences: 336
pages:
flesch: 56
cache: ./cache/hintze-schossau.docx
txt: ./txt/hintze-schossau.txt
summary: Artificial Intelligence, with its ability to machine learn coupled to an almost humanlike understanding, sounds like the ideal tool to the humanities. Machine learning allows us to learn from these data sets in ways that exceed human capabilities, while an artificial brain will eventually allow us to objectively describe a subjective experience (through quantifying neural activations or positively and negatively associated memories). The following paragraphs will explore current Machine Learning and Artificial Intelligence technologies, explain how quantitative or qualitative they really are, and explore what the possible implications for future Digital Humanities could be. Currently, machines do not learn but must be trained, typically with human-labeled data. At the same time, memory formation (Marstaller, Hintze, and Adami 2013), information integration in the brain (Tononi 2004), and how systems evolve the ability to learn (Sheneman, Schossau, and Hintze 2019) are also being researched, as they are building blocks of general purpose intelligence.

id: jiang
author:
title: jiang
date:
words: 3583
sentences: 323
pages:
flesch: 55
cache: ./cache/jiang.docx
txt: ./txt/jiang.txt
summary: Among the top strengths of happy marriages, at least five can be reflected in cross-disciplinary ML research, including "discuss problems well," "handle differences creatively," and "maintain a good balance of time alone and together." I use two examples of my personal experiences (as a computer scientist) of collaborating with researchers from multiple disciplines (e.g., historians, psychologists, IT technicians) to illustrate. Cross-disciplinary research matters, because (1) it provides an understanding of complex problems that require a multifaceted approach to solve; (2) it combines disciplinary breadth with the ability to collaborate and synthesize varying expertise; (3) it enables researchers to reach a wider audience and communicate diverse viewpoints; (4) it encourages researchers to confront questions that traditional disciplines do not ask while opening up new areas of research; and (5) it promotes disciplinary self-awareness about methods and creative practices (Urquhart et al.

id: lesk
author:
title: lesk
date:
words: 4868
sentences: 364
pages:
flesch: 64
cache: ./cache/lesk.docx
txt: ./txt/lesk.txt
summary: Fragility errors here can arise from many sources for example, the training data may not be representative of the real problem (if you train a machine translation program solely on engineering documents, do not expect it to do well on theater reviews). Similarly, the New York Times discussed the way groups of primarily young white men will build systems that focus on their data, and give wrong or discriminatory answers in more general situations (Tugend 2019). Instead of trying to learn more about the characteristics of a system that is being modeled, the effort is driven by the dictum, "more data beats better algorithms." In a review of the history of speech recognition, Xuedong Huang, James Baker, and Raj Reddy write, "The power of these systems arises mainly from their ability to collect, process, and learn from very large datasets.

id: morgan
author:
title: morgan
date:
words: 5269
sentences: 375
pages:
flesch: 59
cache: ./cache/morgan.docx
txt: ./txt/morgan.txt
summary: Now, in a time of "big data," it is possible to go beyond mere automation and towards the more intelligent use of computers; the use of algorithms and machine learning is an integral part of future library collection building and service provision. Finally, this chapter outlines both a number of possible machine learning applications for libraries as well as a few real world use cases. Like the scale of computer input, the library profession has not really exploited computers' ability to save, organize, and retrieve data; on the whole, the library profession does not understand the concept of a "data structure." For example, tab-delimited files, CSV (comma-separated value) files, relational database schema, XML files, JSON files, and the content of email messages or HTTP server responses are all examples of different types of data structures.

id: prudhomme
author:
title: prudhomme
date:
words: 3690
sentences: 245
pages:
flesch: 49
cache: ./cache/prudhomme.docx
txt: ./txt/prudhomme.txt
summary: However, "the viability of machine learning and artificial intelligence is predicated on the representativeness and quality of the data that they are trained on," as Thomas Padilla, Interim Head, Knowledge Production at the University of Nevada Las Vegas, asserts (2019, 14). In this essay, I begin by placing artificial intelligence and machine learning in context, then proceed by discussing why AI matters for archives and libraries, and describing the techniques used in a pilot automation project from the perspective of digital curation at Oklahoma State University Archives. Artificial intelligence, and specifically machine learning as a subfield of AI, has direct applications through pattern recognition techniques that predict the labeling values for unlabeled data. Along with greater computing capabilities, artificial intelligence could be an opportunity for libraries and archives to boost the discovery of their digital collections by pushing text and image recognition machine learning techniques to new limits.

id: kim
author: Bohyun Kim
title: kim
date:
words: 6982
sentences: 516
pages:
flesch: 55
cache: ./cache/kim.docx
txt: ./txt/kim.txt
summary: With their limited intelligence and fully deterministic nature, early rule-based symbolic AI systems raised few ethical concerns.[footnoteRef:4] AI systems that near or surpass human capability, on the other hand, are likely to be given the autonomy to make their own decisions without humans, even when their workings are not entirely transparent, and some of those decisions are distinctively moral in character. The Library of Congress has worked on detecting features, such as railroads in maps, using the convolutional neural network model, and issued a solicitation for a machine learning and deep learning pilot program that will maximize the use of its digital collections in 2019.[footnoteRef:18] Indiana University Libraries, AVP, University of Texas Austin School of Information, and the New York Public Library are jointly developing the Audiovisual Metadata Platform (AMP), using many AI tools in order to automatically generate metadata for audiovisual materials, which collection managers can use to supplement their archival description and processing workflows.[footnoteRef:19] [18: See Blewer, Kim, and Phetteplace 2018 and Price 2019.

id: cohen-nakazawa
author: Jason E. Cohen
title: cohen-nakazawa
date:
words: 7632
sentences: 334
pages:
flesch: 48
cache: ./cache/cohen-nakazawa.docx
txt: ./txt/cohen-nakazawa.txt
summary: Consequently, our chapter describes the process we used to (1) generate technical and descriptive metadata for historical photographs as we pulled material from an extant blog website into a digital archives platform; (2) identify recurring faces in individual pictures as well as in photographs of groups of sometimes unidentified people in order to generate social networks as metadata; and (3) to help develop a controlled vocabulary for the institution's future needs for object management and description. Similarly, as the ownership of historical images suddenly extended to include present-day community members, and as these questions of access and serving a local public were inextricably bound up with interactions with members of that shared public whose family names and faces appear in the images we were making available, we began to consider the ways in which our archival work was tied to what Ryan Calo calls the "historical validation" of primary source materials (2017, 424-5).

id: lucic-shanahan
author: Microsoft Office User
title: lucic-shanahan
date:
words: 2981
sentences: 180
pages:
flesch: 58
cache: ./cache/lucic-shanahan.docx
txt: ./txt/lucic-shanahan.txt
summary: On its "Big Read" website, the Library of Congress includes information about One Book programs around the United States,[footnoteRef:2] and the American Library Association (ALA) also provides materials with which a library can build its own One Book program and, in this way, bring members of their communities together in a conversation.[footnoteRef:3] While community reading programs are not a new phenomenon and exist in various formats and sizes, the One Book One Chicago program is notable because of its size (the Chicago Public Library has 81 local branches) as well as its history (the program has been in existence for nearly 20 years). As part of ongoing work of the "Reading Chicago Reading" project, we used the secure data portal of the HathiTrust Research Consortium to access and pre-process the in-copyright novels in our set. The place names extracted from our three Chicago-setting OBOC books allowed us to focus on particular areas of the city such as Hyde Park, which is mentioned in each of them.

id: wiegand
author: Sue Wiegand
title: wiegand
date:
words: 6152
sentences: 426
pages:
flesch: 44
cache: ./cache/wiegand.docx
txt: ./txt/wiegand.txt
summary: JSTOR, for example, will provide up to 25,000 documents (or more at special request) in a dataset for analysis.[footnoteRef:2] Clarivate's Content as a Service provides Web of Science data to accommodate multiple purposes.[footnoteRef:3] Besides the many freely available bibliodata sources, researchers can sign up for developer accounts in databases such as Scopus to work with datasets for text mining and computational analysis.[footnoteRef:4] Using library-licensed collections as data could allow researchers to save time in reading a large corpus, stay updated on a topic of interest, analyze the most important topics at a given time period, confirm gaps in the research literature for investigation, and increase the efficiency of sifting through massive amounts of research in, for instance, the race to develop a vaccine (Ong 2020; Vamathevan 2019). By building out new services and tools, and instructing at all levels, libraries can reinvent themselves continuously by investing in creative and sustainable innovation, from digital and data literacy to assembling modules for a library-based Researchers' Workstation that uses Machine Learning to enhance the efficiency of the scholars' research cycle.

==== make-pages.sh questions
==== make-pages.sh search
==== make-pages.sh topic modeling corpus
Zipping study carrel
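The final step packages the whole carrel directory into a single downloadable archive. A one-call equivalent with the standard library (the toolbox's actual packaging code is not shown):

    # Sketch: zip the finished carrel, like the final "Zipping study carrel" step.
    import shutil

    shutil.make_archive('machine-learning', 'zip', root_dir='./machine-learning')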