mv: ‘./input-file.zip’ and ‘./input-file.zip’ are the same file
Creating study carrel named johnson-machine-2021

Initializing database
Unzipping 
Archive:  input-file.zip
   creating: ./tmp/input/johnson-machine-2021/
  inflating: ./tmp/input/johnson-machine-2021/11-prudhomme-taking.pdf  
  inflating: ./tmp/input/johnson-machine-2021/07-kim-ai.pdf  
  inflating: ./tmp/input/johnson-machine-2021/03-plumb-humanities.pdf  
  inflating: ./tmp/input/johnson-machine-2021/.DS_Store  
  inflating: ./tmp/input/johnson-machine-2021/12-cohen-machine.pdf  
  inflating: ./tmp/input/johnson-machine-2021/08-altman-building.pdf  
  inflating: ./tmp/input/johnson-machine-2021/05-wiegand-cultures.pdf  
  inflating: ./tmp/input/johnson-machine-2021/09-lesk-fragility.pdf  
  inflating: ./tmp/input/johnson-machine-2021/00-johnson-preface.pdf  
  inflating: ./tmp/input/johnson-machine-2021/04-janco-machine.pdf  
  inflating: ./tmp/input/johnson-machine-2021/13-lucic-towards.pdf  
  inflating: ./tmp/input/johnson-machine-2021/14-hansen-can.pdf  
  inflating: ./tmp/input/johnson-machine-2021/metadata.csv  
  inflating: ./tmp/input/johnson-machine-2021/01-hintze-artificial.pdf  
  inflating: ./tmp/input/johnson-machine-2021/06-jiang-cross.pdf  
  inflating: ./tmp/input/johnson-machine-2021/10-morgan-bringing.pdf  
  inflating: ./tmp/input/johnson-machine-2021/02-harper-generative.pdf  
=== updating bibliographic database
Building study carrel named johnson-machine-2021
  FILE: cache/00-johnson-preface.pdf
OUTPUT: txt/00-johnson-preface.txt
  FILE: cache/11-prudhomme-taking.pdf
OUTPUT: txt/11-prudhomme-taking.txt
  FILE: cache/12-cohen-machine.pdf
OUTPUT: txt/12-cohen-machine.txt
  FILE: cache/08-altman-building.pdf
OUTPUT: txt/08-altman-building.txt
  FILE: cache/14-hansen-can.pdf
OUTPUT: txt/14-hansen-can.txt
  FILE: cache/09-lesk-fragility.pdf
OUTPUT: txt/09-lesk-fragility.txt
  FILE: cache/13-lucic-towards.pdf
OUTPUT: txt/13-lucic-towards.txt
  FILE: cache/07-kim-ai.pdf
OUTPUT: txt/07-kim-ai.txt
  FILE: cache/03-plumb-humanities.pdf
OUTPUT: txt/03-plumb-humanities.txt
  FILE: cache/05-wiegand-cultures.pdf
OUTPUT: txt/05-wiegand-cultures.txt
  FILE: cache/10-morgan-bringing.pdf
OUTPUT: txt/10-morgan-bringing.txt
  FILE: cache/01-hintze-artificial.pdf
OUTPUT: txt/01-hintze-artificial.txt
  FILE: cache/02-harper-generative.pdf
OUTPUT: txt/02-harper-generative.txt
  FILE: cache/06-jiang-cross.pdf
OUTPUT: txt/06-jiang-cross.txt
  FILE: cache/04-janco-machine.pdf
OUTPUT: txt/04-janco-machine.txt
=== file2bib.sh ===
         id: 00-johnson-preface
     author: Johnson
      title: Preface
       date: 2021
      pages: 3
  extension: .pdf
        txt: ./txt/00-johnson-preface.txt
      cache: ./cache/00-johnson-preface.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:38:26Z
Last-Modified	2021-02-23T19:38:38Z
Last-Save-Date	2021-02-23T19:38:38Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	94
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:38:26Z
date	2021-02-23T19:38:38Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:38:26Z
dcterms:modified	2021-02-23T19:38:38Z
meta:creation-date	2021-02-23T19:38:26Z
meta:save-date	2021-02-23T19:38:38Z
modified	2021-02-23T19:38:38Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['2727', '3115', '239']
pdf:docinfo:created	2021-02-23T19:38:26Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:38:38Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'00-johnson-preface.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	3
=== file2bib.sh ===
         id: 04-janco-machine
     author: Janco
      title: Machine Learning in Digital Scholarship
       date: 2021
      pages: 6
  extension: .pdf
        txt: ./txt/04-janco-machine.txt
      cache: ./cache/04-janco-machine.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:42:59Z
Last-Modified	2021-02-23T19:43:05Z
Last-Save-Date	2021-02-23T19:43:05Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	289
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:42:59Z
date	2021-02-23T19:43:05Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:42:59Z
dcterms:modified	2021-02-23T19:43:05Z
meta:creation-date	2021-02-23T19:42:59Z
meta:save-date	2021-02-23T19:43:05Z
modified	2021-02-23T19:43:05Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1761', '3241', '3580', '3731', '3266', '1253']
pdf:docinfo:created	2021-02-23T19:42:59Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:43:05Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'04-janco-machine.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	6
=== file2bib.sh ===
         id: 11-prudhomme-taking
     author: Prudhomme
      title: Taking a Leap Forward: Machine Learning for New Limits
       date: 2021
      pages: 9
  extension: .pdf
        txt: ./txt/11-prudhomme-taking.txt
      cache: ./cache/11-prudhomme-taking.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:55:02Z
Last-Modified	2021-02-23T19:55:06Z
Last-Save-Date	2021-02-23T19:55:06Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	178
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:55:02Z
date	2021-02-23T19:55:06Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:55:02Z
dcterms:modified	2021-02-23T19:55:06Z
meta:creation-date	2021-02-23T19:55:02Z
meta:save-date	2021-02-23T19:55:06Z
modified	2021-02-23T19:55:06Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1663', '3168', '2905', '3394', '1436', '2975', '3027', '2659', '547']
pdf:docinfo:created	2021-02-23T19:55:02Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:55:06Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'11-prudhomme-taking.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	9
=== file2bib.sh ===
         id: 13-lucic-towards
     author: Lucic
      title: Towards a Chicago place name dataset: From back-of-the-book index to a labeled dataset
       date: 2021
      pages: 7
  extension: .pdf
        txt: ./txt/13-lucic-towards.txt
      cache: ./cache/13-lucic-towards.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:56:30Z
Last-Modified	2021-02-23T19:56:35Z
Last-Save-Date	2021-02-23T19:56:35Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	273
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:56:30Z
date	2021-02-23T19:56:35Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:56:30Z
dcterms:modified	2021-02-23T19:56:35Z
meta:creation-date	2021-02-23T19:56:30Z
meta:save-date	2021-02-23T19:56:35Z
modified	2021-02-23T19:56:35Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1012', '2777', '3424', '1458', '1325', '3261', '2950']
pdf:docinfo:created	2021-02-23T19:56:30Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:56:35Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'13-lucic-towards.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	7
=== file2bib.sh ===
         id: 14-hansen-can
     author: Hansen
      title: Can a Hammer Categorize Highly Technical Articles?
       date: 2021
      pages: 8
  extension: .pdf
        txt: ./txt/14-hansen-can.txt
      cache: ./cache/14-hansen-can.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:57:06Z
Last-Modified	2021-02-23T19:57:11Z
Last-Save-Date	2021-02-23T19:57:11Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	222
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:57:06Z
date	2021-02-23T19:57:11Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:57:06Z
dcterms:modified	2021-02-23T19:57:11Z
meta:creation-date	2021-02-23T19:57:06Z
meta:save-date	2021-02-23T19:57:11Z
modified	2021-02-23T19:57:11Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1679', '3465', '3316', '3818', '3393', '3462', '2610', '427']
pdf:docinfo:created	2021-02-23T19:57:06Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:57:11Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'14-hansen-can.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	8
=== file2bib.sh ===
         id: 06-jiang-cross
     author: Jiang
      title: Cross-Disciplinary ML Research is like Happy Marriages: Five Strengths and Two Examples
       date: 2021
      pages: 10
  extension: .pdf
        txt: ./txt/06-jiang-cross.txt
      cache: ./cache/06-jiang-cross.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:51:37Z
Last-Modified	2021-02-23T19:51:44Z
Last-Save-Date	2021-02-23T19:51:44Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	353
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:51:37Z
date	2021-02-23T19:51:44Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:51:37Z
dcterms:modified	2021-02-23T19:51:44Z
meta:creation-date	2021-02-23T19:51:37Z
meta:save-date	2021-02-23T19:51:44Z
modified	2021-02-23T19:51:44Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1568', '2994', '3143', '236', '1927', '2986', '1941', '2732', '2893', '373']
pdf:docinfo:created	2021-02-23T19:51:37Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:51:44Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'06-jiang-cross.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	10
=== file2bib.sh ===
         id: 01-hintze-artificial
     author: Hintze
      title: Artificial Intelligence in the Humanities: Wolf in Disguise, or Digital Revolution?
       date: 2021
      pages: 10
  extension: .pdf
        txt: ./txt/01-hintze-artificial.txt
      cache: ./cache/01-hintze-artificial.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:39:22Z
Last-Modified	2021-02-23T19:39:48Z
Last-Save-Date	2021-02-23T19:39:48Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	287
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:39:22Z
date	2021-02-23T19:39:48Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:39:22Z
dcterms:modified	2021-02-23T19:39:48Z
meta:creation-date	2021-02-23T19:39:22Z
meta:save-date	2021-02-23T19:39:48Z
modified	2021-02-23T19:39:48Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1331', '3418', '3364', '3591', '3404', '3645', '3332', '2918', '2663', '309']
pdf:docinfo:created	2021-02-23T19:39:22Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:39:48Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'01-hintze-artificial.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	10
=== file2bib.sh ===
         id: 09-lesk-fragility
     author: Lesk
      title: Fragility and Intelligibility of Deep Learning for Libraries
       date: 2021
      pages: 11
  extension: .pdf
        txt: ./txt/09-lesk-fragility.txt
      cache: ./cache/09-lesk-fragility.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:53:44Z
Last-Modified	2021-02-23T19:53:51Z
Last-Save-Date	2021-02-23T19:53:51Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	521
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:53:44Z
date	2021-02-23T19:53:51Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:53:44Z
dcterms:modified	2021-02-23T19:53:51Z
meta:creation-date	2021-02-23T19:53:44Z
meta:save-date	2021-02-23T19:53:51Z
modified	2021-02-23T19:53:51Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1254', '3210', '2506', '1393', '3171', '1852', '2607', '3047', '2661', '2974', '362']
pdf:docinfo:created	2021-02-23T19:53:44Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:53:51Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'09-lesk-fragility.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	11
=== file2bib.sh ===
         id: 10-morgan-bringing
     author: Morgan
      title: Bringing Algorithms and Machine Learning Into Library Collections and Services
       date: 2021
      pages: 13
  extension: .pdf
        txt: ./txt/10-morgan-bringing.txt
      cache: ./cache/10-morgan-bringing.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:54:29Z
Last-Modified	2021-02-23T19:54:36Z
Last-Save-Date	2021-02-23T19:54:36Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	340
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:54:29Z
date	2021-02-23T19:54:36Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:54:29Z
dcterms:modified	2021-02-23T19:54:36Z
meta:creation-date	2021-02-23T19:54:29Z
meta:save-date	2021-02-23T19:54:36Z
modified	2021-02-23T19:54:36Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1605', '1816', '3480', '3516', '3055', '2195', '1143', '3370', '2245', '2306', '1148', '1360', '572']
pdf:docinfo:created	2021-02-23T19:54:29Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:54:36Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'10-morgan-bringing.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	13
=== file2bib.sh ===
         id: 08-altman-building
     author: Altman
      title: Building a Machine Learning Pipeline
       date: 2021
      pages: 11
  extension: .pdf
        txt: ./txt/08-altman-building.txt
      cache: ./cache/08-altman-building.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:53:02Z
Last-Modified	2021-02-23T19:53:09Z
Last-Save-Date	2021-02-23T19:53:09Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	354
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:53:02Z
date	2021-02-23T19:53:09Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:53:02Z
dcterms:modified	2021-02-23T19:53:09Z
meta:creation-date	2021-02-23T19:53:02Z
meta:save-date	2021-02-23T19:53:09Z
modified	2021-02-23T19:53:09Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1544', '3239', '3273', '3080', '3409', '3275', '3283', '2599', '2784', '3038', '2029']
pdf:docinfo:created	2021-02-23T19:53:02Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:53:09Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'08-altman-building.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	11
=== file2bib.sh ===
         id: 03-plumb-humanities
     author: Plumb
      title: Humanities and Social Science Reading through Machine Learning
       date: 2021
      pages: 14
  extension: .pdf
        txt: ./txt/03-plumb-humanities.txt
      cache: ./cache/03-plumb-humanities.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:42:10Z
Last-Modified	2021-02-23T19:42:20Z
Last-Save-Date	2021-02-23T19:42:20Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	416
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:42:10Z
date	2021-02-23T19:42:20Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:42:10Z
dcterms:modified	2021-02-23T19:42:20Z
meta:creation-date	2021-02-23T19:42:10Z
meta:save-date	2021-02-23T19:42:20Z
modified	2021-02-23T19:42:20Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1499', '3488', '3198', '3477', '3525', '3680', '1858', '3594', '3269', '3806', '3212', '3053', '2828', '701']
pdf:docinfo:created	2021-02-23T19:42:10Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:42:20Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'03-plumb-humanities.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	14
=== file2bib.sh ===
         id: 12-cohen-machine
     author: Cohen
      title: Machine Learning + Data Creation in a Community Partnership for Archival Research
       date: 2021
      pages: 13
  extension: .pdf
        txt: ./txt/12-cohen-machine.txt
      cache: ./cache/12-cohen-machine.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:55:45Z
Last-Modified	2021-02-23T19:55:52Z
Last-Save-Date	2021-02-23T19:55:52Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	335
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:55:45Z
date	2021-02-23T19:55:52Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:55:45Z
dcterms:modified	2021-02-23T19:55:52Z
meta:creation-date	2021-02-23T19:55:45Z
meta:save-date	2021-02-23T19:55:52Z
modified	2021-02-23T19:55:52Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1055', '3435', '3682', '3404', '3580', '3777', '3423', '3672', '3146', '3653', '3411', '2761', '1834']
pdf:docinfo:created	2021-02-23T19:55:45Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:55:52Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'12-cohen-machine.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	13
=== file2bib.sh ===
         id: 07-kim-ai
     author: Kim
      title: AI and Its Moral Concerns
       date: 2021
      pages: 13
  extension: .pdf
        txt: ./txt/07-kim-ai.txt
      cache: ./cache/07-kim-ai.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:52:26Z
Last-Modified	2021-02-23T19:52:32Z
Last-Save-Date	2021-02-23T19:52:32Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	338
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:52:26Z
date	2021-02-23T19:52:32Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:52:26Z
dcterms:modified	2021-02-23T19:52:32Z
meta:creation-date	2021-02-23T19:52:26Z
meta:save-date	2021-02-23T19:52:32Z
modified	2021-02-23T19:52:32Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['2160', '3381', '3274', '3376', '3073', '3523', '2955', '3262', '3274', '3739', '2870', '2729', '1973']
pdf:docinfo:created	2021-02-23T19:52:26Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:52:32Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'07-kim-ai.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	13
=== file2bib.sh ===
         id: 02-harper-generative
     author: Harper
      title: Generative Machine Learning
       date: 2021
      pages: 15
  extension: .pdf
        txt: ./txt/02-harper-generative.txt
      cache: ./cache/02-harper-generative.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:40:58Z
Last-Modified	2021-02-23T19:41:05Z
Last-Save-Date	2021-02-23T19:41:05Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	711
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:40:58Z
date	2021-02-23T19:41:05Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:40:58Z
dcterms:modified	2021-02-23T19:41:05Z
meta:creation-date	2021-02-23T19:40:58Z
meta:save-date	2021-02-23T19:41:05Z
modified	2021-02-23T19:41:05Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1624', '2349', '2291', '3443', '1761', '2108', '1965', '262', '2399', '1679', '2166', '2770', '2915', '2848', '2065']
pdf:docinfo:created	2021-02-23T19:40:58Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:41:05Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'02-harper-generative.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	15
=== file2bib.sh ===
         id: 05-wiegand-cultures
     author: Wiegand
      title: Cultures of Innovation: Machine Learning as a Library Service
       date: 2021
      pages: 14
  extension: .pdf
        txt: ./txt/05-wiegand-cultures.txt
      cache: ./cache/05-wiegand-cultures.pdf

Content-Type	application/pdf
Creation-Date	2021-02-23T19:43:35Z
Last-Modified	2021-02-23T19:43:41Z
Last-Save-Date	2021-02-23T19:43:41Z
X-Parsed-By	['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.pdf.PDFParser']
X-TIKA:content_handler	ToTextContentHandler
X-TIKA:embedded_depth	0
X-TIKA:parse_time_millis	392
access_permission:assemble_document	true
access_permission:can_modify	true
access_permission:can_print	true
access_permission:can_print_degraded	true
access_permission:extract_content	true
access_permission:extract_for_accessibility	true
access_permission:fill_in_form	true
access_permission:modify_annotations	true
created	2021-02-23T19:43:35Z
date	2021-02-23T19:43:41Z
dc:format	application/pdf; version=1.3
dcterms:created	2021-02-23T19:43:35Z
dcterms:modified	2021-02-23T19:43:41Z
meta:creation-date	2021-02-23T19:43:35Z
meta:save-date	2021-02-23T19:43:41Z
modified	2021-02-23T19:43:41Z
pdf:PDFVersion	1.3
pdf:charsPerPage	['1556', '3332', '3258', '3344', '3197', '3457', '3639', '3491', '2966', '3359', '2681', '2571', '2776', '1122']
pdf:docinfo:created	2021-02-23T19:43:35Z
pdf:docinfo:creator_tool	LaTeX with hyperref
pdf:docinfo:modified	2021-02-23T19:43:41Z
pdf:docinfo:producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
pdf:encrypted	false
pdf:hasMarkedContent	false
pdf:hasXFA	false
pdf:hasXMP	false
pdf:unmappedUnicodeCharsPerPage	['0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0']
producer	macOS Version 11.2.1 (Build 20D74) Quartz PDFContext, AppendMode 1.1
resourceName	b'05-wiegand-cultures.pdf'
xmp:CreatorTool	LaTeX with hyperref
xmpTPg:NPages	14
00-johnson-preface  txt/../ent/00-johnson-preface.ent
04-janco-machine  txt/../ent/04-janco-machine.ent
13-lucic-towards  txt/../ent/13-lucic-towards.ent
14-hansen-can  txt/../ent/14-hansen-can.ent
11-prudhomme-taking  txt/../ent/11-prudhomme-taking.ent
06-jiang-cross  txt/../ent/06-jiang-cross.ent
01-hintze-artificial  txt/../ent/01-hintze-artificial.ent
09-lesk-fragility  txt/../ent/09-lesk-fragility.ent
10-morgan-bringing  txt/../ent/10-morgan-bringing.ent
08-altman-building  txt/../ent/08-altman-building.ent
12-cohen-machine  txt/../ent/12-cohen-machine.ent
03-plumb-humanities  txt/../ent/03-plumb-humanities.ent
05-wiegand-cultures  txt/../ent/05-wiegand-cultures.ent
02-harper-generative  txt/../ent/02-harper-generative.ent
07-kim-ai  txt/../ent/07-kim-ai.ent
00-johnson-preface  txt/../pos/00-johnson-preface.pos
13-lucic-towards  txt/../pos/13-lucic-towards.pos
14-hansen-can  txt/../pos/14-hansen-can.pos
04-janco-machine  txt/../pos/04-janco-machine.pos
06-jiang-cross  txt/../pos/06-jiang-cross.pos
11-prudhomme-taking  txt/../pos/11-prudhomme-taking.pos
09-lesk-fragility  txt/../pos/09-lesk-fragility.pos
10-morgan-bringing  txt/../pos/10-morgan-bringing.pos
02-harper-generative  txt/../pos/02-harper-generative.pos
08-altman-building  txt/../pos/08-altman-building.pos
01-hintze-artificial  txt/../pos/01-hintze-artificial.pos
03-plumb-humanities  txt/../pos/03-plumb-humanities.pos
07-kim-ai  txt/../pos/07-kim-ai.pos
05-wiegand-cultures  txt/../pos/05-wiegand-cultures.pos
12-cohen-machine  txt/../pos/12-cohen-machine.pos
00-johnson-preface  txt/../wrd/00-johnson-preface.wrd
13-lucic-towards  txt/../wrd/13-lucic-towards.wrd
04-janco-machine  txt/../wrd/04-janco-machine.wrd
06-jiang-cross  txt/../wrd/06-jiang-cross.wrd
14-hansen-can  txt/../wrd/14-hansen-can.wrd
11-prudhomme-taking  txt/../wrd/11-prudhomme-taking.wrd
09-lesk-fragility  txt/../wrd/09-lesk-fragility.wrd
01-hintze-artificial  txt/../wrd/01-hintze-artificial.wrd
08-altman-building  txt/../wrd/08-altman-building.wrd
02-harper-generative  txt/../wrd/02-harper-generative.wrd
10-morgan-bringing  txt/../wrd/10-morgan-bringing.wrd
03-plumb-humanities  txt/../wrd/03-plumb-humanities.wrd
07-kim-ai  txt/../wrd/07-kim-ai.wrd
05-wiegand-cultures  txt/../wrd/05-wiegand-cultures.wrd
12-cohen-machine  txt/../wrd/12-cohen-machine.wrd
Done mapping.
Reducing johnson-machine-2021
=== reduce.pl bib ===
         id = 07-kim-ai
     author = Kim
      title = AI and Its Moral Concerns
       date = 2021
      pages = 13
  extension = .pdf
       mime = application/pdf
      words = 7293
  sentences = 784
     flesch = 55
    summary = does not provide an easy answer to the question of how one should program moral decision-making into intelligent machines. Described below are some of the significant ethical challenges that autonomous AI systems such as military robots present. 11Note that this moral decision-making process can be modeled with a rule-based symbolic AI approach, a machine 13(Kahn 2012) also argues that the resulting increase in the number of wars by the use of military robots will be morally 15This black-box nature of AI systems powered by machine learning has raised great concern among many AI researchers in recent years. agency in the AI-powered automated information environment presents an ethical challenge In this chapter, I discussed four significant ethical challenges that automating decisions and actions with AI presents: (a) moral desensitization; (b) unintended outcomes; (c) surrender of are at an early stage in developing AI applications and applying machine learning and deep learning techniques to improve library services, systems, and operations.
      cache = ./cache/07-kim-ai.pdf
       txt  = ./txt/07-kim-ai.txt
=== reduce.pl bib ===
         id = 11-prudhomme-taking
     author = Prudhomme
      title = Taking a Leap Forward: Machine Learning for New Limits
       date = 2021
      pages = 9
  extension = .pdf
       mime = application/pdf
      words = 3910
  sentences = 387
     flesch = 51
    summary = Combining automatic processes to assist in supporting inventory management with a focus on descriptive metadata, a machine learning solution could help alleviate time-consuming and relatively expensive metadata tagging tasks, Deep learning neural networks are more effective in feature detection as they are able to solve complex problems such as image classification with greater accuracy when trained with large datasets. For images, how can archives build a data-labeling pipeline into their digital curation workflow that enables machine learning of collections? machine learning is only good so long as value is added, archives and libraries will need to think As deep learning applications will only be as effective as the data, archives and libraries should expand their Along with greater computing capabilities, artificial intelligence could be an opportunity for libraries and archives to boost the discovery of their digital collections by pushing text and image
      cache = ./cache/11-prudhomme-taking.pdf
       txt  = ./txt/11-prudhomme-taking.txt
=== reduce.pl bib ===
         id = 03-plumb-humanities
     author = Plumb
      title = Humanities and Social Science Reading through Machine Learning
       date = 2021
      pages = 14
  extension = .pdf
       mime = application/pdf
      words = 7195
  sentences = 599
     flesch = 45
    summary = Respondents such as Mark Algee-Hewitt pointed out that literary scholars employ computational statistical models in order to reveal something about texts that human readers Machine learning, and word embedding algorithms in particular, may have a unique ability to shift this conversation into new territory, where scholars Acknowledging this helps contextualize machine learning algorithms for text analysis tasks in the humanities, but also highlights data curation challenges This naturally raises questions about how machine learning algorithms like word embeddings are implemented for text analysis, and how they Based on the potential for word embeddings to model semantic spaces for different corpora and compare the distribution of terms, the next step was to build a corpus of non-canonical Designing humanities research with novel word embedding models stands to widen the territory where machine learning engineers look for conceptual concepts Systematic data curation, combined with word embedding algorithms, represent a new interpretive system for literary scholars.
      cache = ./cache/03-plumb-humanities.pdf
       txt  = ./txt/03-plumb-humanities.txt
=== reduce.pl bib ===
         id = 12-cohen-machine
     author = Cohen
      title = Machine Learning + Data Creation in a Community Partnership for Archival Research
       date = 2021
      pages = 13
  extension = .pdf
       mime = application/pdf
      words = 7542
  sentences = 474
     flesch = 54
    summary = archivally focused project that emerged from a partnership between the Pine Mountain Settlement School (PMSS)1 in Harlan County, Kentucky, and scholars and students at Berea College. a latent social network of historical families represented by the images held in one local archive, curricula for use in Kentucky public schools with PMSS archival materials. That decision led a team of Berea College undergraduate and faculty researchers to scrape the data from the PMSS archive site and supplement the images and transcriptions it contains with available textual metadata drawn from the site.9 Alongside the WordPress facial recognition software to identify the persons in historic photographs in the PMSS archives. We demonstrated to the local members at Pine Mountain how our use case and its constraints for digital archives fit with the current standards for the fair use of copyrighted materials
      cache = ./cache/12-cohen-machine.pdf
       txt  = ./txt/12-cohen-machine.txt
=== reduce.pl bib ===
         id = 05-wiegand-cultures
     author = Wiegand
      title = Cultures of Innovation: Machine Learning as a Library Service
       date = 2021
      pages = 14
  extension = .pdf
       mime = application/pdf
      words = 7014
  sentences = 761
     flesch = 49
    summary = traditional role, librarians in the 20th century added a new function—discovery—teaching people to find and use the library's collected scholarship. learning in the library as the next step beyond collecting, with librarians instructing on information infrastructure with the goal of empowering library users to find, evaluate, and use scholarly go far beyond local library collections to a global perspective and normative practice of participation at scale in innovative emerging technologies such as Machine Learning. start by using Machine Learning tools to automate alerts of new content in a narrow area of interest and help researchers at all levels find and focus on problem-solving. A library that adapted Machine Learning as an innovation technology would improve its practices; add new services; choose, use, and license collections differently; utilize all spaces for learning; and role model innovative leadership. opening local collections to discovery and use in order to create new knowledge through digitization and semantic linking, with cross-disciplinary technologies to augment traditional research
      cache = ./cache/05-wiegand-cultures.pdf
       txt  = ./txt/05-wiegand-cultures.txt
=== reduce.pl bib ===
         id = 08-altman-building
     author = Altman
      title = Building a Machine Learning Pipeline
       date = 2021
      pages = 11
  extension = .pdf
       mime = application/pdf
      words = 6148
  sentences = 361
     flesch = 63
    summary = As you begin ingesting and preparing data, you'll want to explore possible machine learning algorithms to perform on your dataset. Start by determining what general type of learning algorithm you need, and proceed from there to research and select one that While the final output of a machine learning workflow is some sort of intelligent model, The pipeline for a machine learning project generally comprises five stages: data acquisition, data preparation, model training and testing, evaluation and analysis, and application of results. good idea to save a copy in the rawest possible form and treat that copy as immutable, at least during the initial phase of testing different algorithms or configurations. algorithm uses the training data to "learn" a set of rules that it can subsequently apply to new, Immutable data storage can benefit the batch-processing ML pipeline, especially during the initial research and development phase.
      cache = ./cache/08-altman-building.pdf
       txt  = ./txt/08-altman-building.txt
=== reduce.pl bib ===
         id = 09-lesk-fragility
     author = Lesk
      title = Fragility and Intelligibility of Deep Learning for Libraries
       date = 2021
      pages = 11
  extension = .pdf
       mime = application/pdf
      words = 4796
  sentences = 474
     flesch = 63
    summary = Machine learning systems have a set of data for training. of the real problem (if you train a machine translation program solely on engineering documents, there may be a lot of training data, including many noisy points, and the program may decide on Many popular magazines have discussed this problem; Forbes, for example, had an explanation of how the choice of datasets can produce a biased result without any deliberate attempt to used to suggest malicious creation of training data or examples of data designed to deceive machine learning systems. blood pressure, and lower blood pressure decreases the risk of heart attacks." Then I have to explain that the paper evaluates 32 possibilities (prior/current ownership × cats/dogs × 4 medical compare the performance of machine learning systems for medical diagnosis with actual doctors If a program is constantly learning from new data, there is no list of previously fixed failures to
      cache = ./cache/09-lesk-fragility.pdf
       txt  = ./txt/09-lesk-fragility.txt
=== reduce.pl bib ===
         id = 13-lucic-towards
     author = Lucic
      title = Towards a Chicago place name dataset: From back-of-the-book index to a labeled dataset
       date = 2021
      pages = 7
  extension = .pdf
       mime = application/pdf
      words = 3073
  sentences = 285
     flesch = 63
    summary = Reading Chicago Reading1 is a grant-supported digital humanities project that takes as its object the "One Book One Chicago" (OBOC) program2 of the Chicago Public Library. A related question is the focus of this paper: by associating place names with sentiment scores in Chicago-themed OBOC The HathiTrust research portal permits the extraction of non-consumptive features of the works included in the digital library, even those that are still under copyright. The place names extracted from our three Chicago-setting OBOC books allowed us to focus Our interest in creating a dataset of Chicago place names extracted from literature led us to Kaser's book contains several indexes that can serve as sources of labeled data or instances in which Chicago locations are mentioned. the index as a source of already-labeled data for Chicago place names. associated sentiment scores for Chicago place names in the three OBOC selections centered on
      cache = ./cache/13-lucic-towards.pdf
       txt  = ./txt/13-lucic-towards.txt
=== reduce.pl bib ===
         id = 00-johnson-preface
     author = Johnson
      title = Preface
       date = 2021
      pages = 3
  extension = .pdf
       mime = application/pdf
      words = 1090
  sentences = 72
     flesch = 47
    summary = The plan called for a survey and a series of workshops hosted across the country to explore, originally, "the national need for library based topic modeling tools in support of cross-disciplinary libraries ran concurrently with our grant — Cordell 2020 and Padilla 2019, which were commissioned by major players in the field, the Library of Congress and OCLC, respectively — and vi Machine Learning, Libraries, and Cross-Disciplinary Research We would like to thank the IMLS for providing essential funding support for the grant and the Thank you to the members of the Notre Dame IMLS grant team who, at of course, thanks to the 95 participants in our 2019 IMLS Grant Workshops (too many to enumerate here) and to the essay authors for sharing their expertise and perspectives in growing our collective knowledge of machine learning and its use in research, scholarship, and cultural heritage organizations. https://www.oclc.org/research/publications/2019/oclcresearch-responsible-operations-data-science-machine-learning-ai.html
      cache = ./cache/00-johnson-preface.pdf
       txt  = ./txt/00-johnson-preface.txt
=== reduce.pl bib ===
         id = 04-janco-machine
     author = Janco
      title = Machine Learning in Digital Scholarship
       date = 2021
      pages = 6
  extension = .pdf
       mime = application/pdf
      words = 3101
  sentences = 245
     flesch = 54
    summary = Tools like RunwayML, the Teachable Machine, and Google AutoML allow researchers to train project-specific Since 2014, dramatic innovations in machine learning have occurred, providing new capabilities in computer vision, natural language processing, and other areas of applied artificial intelligence. deliberately and identify how machine learning methods can benefit a scholar's research? for identifying basic tasks that can be completed by computers in ways that advance humanities research (2000). When working with texts or images, machine learning models are presently capable of making simple annotations and associations. Google's Teachable Machine offers an intuitive web application that humanities faculty and students can use to train classification models for images, sounds, and poses. Machine learning models offer a variety of ways to identify similarity and difference with research materials. goals of academic researchers in the humanities with the technical possibilities of machine learning. "Scholarly Primitives: What Methods Do Humanities Researchers Have
      cache = ./cache/04-janco-machine.pdf
       txt  = ./txt/04-janco-machine.txt
=== reduce.pl bib ===
         id = 14-hansen-can
     author = Hansen
      title = Can a Hammer Categorize Highly Technical Articles?
       date = 2021
      pages = 8
  extension = .pdf
       mime = application/pdf
      words = 4339
  sentences = 394
     flesch = 63
    summary = I would use the Mathematical Subject Classification (MSC) values assigned to the publications in MathSciNet1 to create a temporal citation network which would allow me to visualize Machine-learning-based categorization needs data to classify, which in our case automated categorization of mathematics, we were dilettantes in the world of machine learning. what happens when smarter and more capable minds tackle the problem of classifying mathematics and other highly technical subjects using advanced machine learning techniques. 9Mathematical Subject Classification (MSC) values in MathSciNet and zbMath are a particularly interesting categorization set to work with as they are assigned and reviewed by a subject area expert editor and an active researcher in the 16See https://academic.microsoft.com/. https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47 One really interesting part of the machine learning method used by Microsoft was that it did not rely only on information from the article being replace the work of humans categorizing mathematics articles indexed in a database, which for
      cache = ./cache/14-hansen-can.pdf
       txt  = ./txt/14-hansen-can.txt
=== reduce.pl bib ===
         id = 01-hintze-artificial
     author = Hintze
      title = Artificial Intelligence in the Humanities: Wolf in Disguise, or Digital Revolution?
       date = 2021
      pages = 10
  extension = .pdf
       mime = application/pdf
      words = 5069
  sentences = 357
     flesch = 56
    summary = Artificial Intelligence, with its ability to machine learn coupled to an almost human-like understanding, sounds like the ideal tool to the humanities. But are these technologies imbued with intuition or understanding, and do they learn like humans? In the 80s and 90s, as home computers were becoming more common, Hollywood was sensationalizing the idea of smart or human-like Artificial Intelligent machines (AI) through movies Machine learning allows us to learn from these data sets in ways that exceed human capabilities, while an artificial brain will eventually allow us to objectively describe a subjective experience (through quantifying neural activations or positively and negatively associated memories). The following paragraphs will explore current Machine Learning and Artificial Intelligence learning, to the point where our whole identity as human could be generously defined as the Just because humans and machine learning are both black Currently, machines do not learn but must be trained, typically with human-labeled data.
      cache = ./cache/01-hintze-artificial.pdf
       txt  = ./txt/01-hintze-artificial.txt
=== reduce.pl bib ===
         id = 06-jiang-cross
     author = Jiang
      title = Cross-Disciplinary ML Research is like Happy Marriages: Five Strengths and Two Examples
       date = 2021
      pages = 10
  extension = .pdf
       mime = application/pdf
      words = 3623
  sentences = 425
     flesch = 55
    summary = Cross-disciplinary research matters, because (1) it provides an understanding of complex problems that require a multifaceted approach to solve; (2) it combines disciplinary breadth with the ability to collaborate One of the most popular cross-disciplinary research topics/programs is Machine Learning + top strengths of conducting cross-disciplinary ML research and give two examples based on my marriages, just like collaborators expect to have successful project outcomes (Robinson and Blanton 1993; Pettigrew 2000; Xu et al. The history professor Liang Cai and I have collaborated on an international research project titled "Digital Empires: Structured Biographical and Social Network Analysis of Early Chinese We have enjoyed our collaboration and the power of cross-disciplinary research. Specifically, I presented the top strengths of producing successful cross-disciplinary ML research: (1) Partners are satisfied with communication. "The Challenges of Cross-Disciplinary Research." Social Research Collaboration." Social Studies of Science 33, no. "Building Cross-Disciplinary Research Collaborations."
      cache = ./cache/06-jiang-cross.pdf
       txt  = ./txt/06-jiang-cross.txt
=== reduce.pl bib ===
         id = 10-morgan-bringing
     author = Morgan
      title = Bringing Algorithms and Machine Learning Into Library Collections and Services
       date = 2021
      pages = 13
  extension = .pdf
       mime = application/pdf
      words = 5793
  sentences = 739
     flesch = 74
    summary = advent of computers, the idea of sharing cataloging data as MARC (machine readable cataloging) the full text of its collections to enhance bibliographic description and resulting public service. ability to save, organize, and retrieve data; on the whole, the library profession does not understand the concept of a "data structure." For example, tab-delimited files, CSV (comma-separated the use of data structures, computers store and retrieve information. Libraries use computers to store, organize, preserve, and disseminate the gray literature of our time, and we call these systems "institutional repositories." In all Using such a process, there are really only four different types of machine learning: classification, clustering, regression, and dimension reduction. Given a set of previously classified menus, one could create a model There are many possible ways to enhance library collections and services through the use of machine learning. of plain text files and an integer, Topic Modeling Tool will create a weighted list of latent themes
      cache = ./cache/10-morgan-bringing.pdf
       txt  = ./txt/10-morgan-bringing.txt
=== reduce.pl bib ===
         id = 02-harper-generative
     author = Harper
      title = Generative Machine Learning
       date = 2021
      pages = 15
  extension = .pdf
       mime = application/pdf
      words = 5935
  sentences = 662
     flesch = 59
    summary = Reddit have each issued their own bans on the category of machine-generated or -altered content that is commonly termed "deep fakes" (Cohen 2020; Romm, Harwell, and Stanley-Becker TV because of their dystopian implications, deep fakes are just one application of generative machine learning. Figure 2.2: Images generated with a simple statistical model appear as noise as the model is insufficient to capture the structure of the real data (Markov chains trained using wine bottles and 1In many examples, I have used the Google QuickDraw Dataset to highlight features of generative machine learning. (https://github.com/googlecreativelab/quickdraw-dataset) shows the generator learning how to produce better sketches over time. built a GAN that generates high-quality photo-realistic images of people (Karras, Laine, and Aila Beyond medicine and autonomous vehicles, generative data augmentation will progressively impact other imaging-heavy fields (Shorten and Khoshgoftaar 2019) like GANs in Action: Deep Learning with Generative Adversarial Networks.
      cache = ./cache/02-harper-generative.pdf
       txt  = ./txt/02-harper-generative.txt
Building ./etc/reader.txt
05-wiegand-cultures
01-hintze-artificial
07-kim-ai
05-wiegand-cultures
11-prudhomme-taking
07-kim-ai
                number of items: 15
                   sum of words: 75,921
          average size in words: 5,061
      average readability score: 56

                          nouns: data; machine; learning; research; library; information; process; libraries; model; example; text; time; images; work; word; systems; results; project; tools; use; knowledge; training; way; place; people; algorithms; system; researchers; set; models; words; collections; materials; methods; problem; algorithm; language; dataset; scholars; image; applications; ways; problems; humanities; analysis; questions; examples; network; history
                          verbs: is; are; be; have; was; were; do; using; has; used; learning; make; based; use; given; see; been; help; learn; generated; find; does; create; trained; had; need; build; provide; identify; work; generate; did; called; including; being; become; produce; known; include; ’s; working; making; found; know; edited; take; existing; understand; get
                     adjectives: new; such; other; many; different; digital; more; computational; large; moral; human; deep; important; possible; -; historical; literary; able; social; local; good; specific; ethical; same; similar; cultural; available; own; neural; high; real; first; traditional; final; common; better; library; disciplinary; technical; artificial; multiple; full; simple; particular; intelligent; unique; original; likely; recent
                        adverbs: not; also; more; only; then; well; as; even; very; out; now; so; however; most; often; n’t; up; just; together; still; already; instead; here; rather; first; highly; always; especially; perhaps; much; far; too; really; morally; fully; better; back; similarly; increasingly; down; yet; previously; on; generally; easily; thus; sometimes; simply; long; likely
                       pronouns: we; it; you; their; they; our; i; its; your; them; us; one; my; itself; her; themselves; he; his; me; she; ourselves; yourself; ours; ’s; zbmath,19; https://www.aclweb.org/anthology/p14-5010/; https://radimrehurek.com/gensim/; http://www.minedminds.org/; http://read.gov/resources/; him; de-
                   proper nouns: learning; machine; ai; libraries; disciplinary; cross; -; al; ml; library; chicago; university; digital; et; intelligence; artificial; new; data; google; research; m; science; information; ieee; york; press; n.d; review; gan; journal; marc; microsoft; ing; generative; conference; adversarial; technology; may; international; figure; humanities; computer; march; .; markov; january; congress
                       keywords: learning; machine; research; datum; system; libraries; word; university; scholar; reading; process; pmss; place; notre; nakazawa; msc; moral; model; microsoft; material; markov; literary; library; kentucky; information; image; ieee; human; gan; disciplinary; computational; cohen; chinese; chicago; balke; algorithm; adversarial

       one topic; one dimension: learning
                        file(s): ./cache/11-prudhomme-taking.pdf
                      titles(s): Taking a Leap Forward: Machine Learning for New Limits

    three topics; one dimension: learning; library; word
                        file(s): ./cache/07-kim-ai.pdf, ./cache/05-wiegand-cultures.pdf, ./cache/03-plumb-humanities.pdf
                      titles(s): AI and Its Moral Concerns | Cultures of Innovation: Machine Learning as a Library Service | Humanities and Social Science Reading through Machine Learning

  five topics; three dimensions: learning machine data; 2019 https learning; learning machine data; research ml disciplinary; chicago place book
                        file(s): ./cache/07-kim-ai.pdf, ./cache/02-harper-generative.pdf, ./cache/12-cohen-machine.pdf, ./cache/06-jiang-cross.pdf, ./cache/13-lucic-towards.pdf
                      titles(s): AI and Its Moral Concerns | Generative Machine Learning | Machine Learning + Data Creation in a Community Partnership for Archival Research | Cross-Disciplinary ML Research is like Happy Marriages: Five Strengths and Two Examples | Towards a Chicago place name dataset: From back-of-the-book index to a labeled dataset

      Type: zip2carrel
     title: johnson-machine-2021
      date: 2021-02-23
      time: 21:20
  username: emorgan
    patron: Eric Morgan
     email: emorgan@nd.edu
     input: Y7z2ihXDL5.zip
==== make-pages.sh htm files
==== make-pages.sh complex files
==== make-pages.sh named entities
==== making bibliographics
         id: 08-altman-building
     author: Altman
      title: Building a Machine Learning Pipeline
       date: 2021
      words: 6148
  sentences: 361
      pages: 11
     flesch: 63
      cache: ./cache/08-altman-building.pdf
        txt: ./txt/08-altman-building.txt
    summary: As you begin ingesting and preparing data, you'll want to explore possible machine learning algorithms to perform on your dataset. Start by determining what general type of learning algorithm you need, and proceed from there to research and select one that While the final output of a machine learning workflow is some sort of intelligent model, The pipeline for a machine learning project generally comprises five stages: data acquisition, data preparation, model training and testing, evaluation and analysis, and application of results. good idea to save a copy in the rawest possible form and treat that copy as immutable, at least during the initial phase of testing different algorithms or configurations. algorithm uses the training data to "learn" a set of rules that it can subsequently apply to new, Immutable data storage can benefit the batch-processing ML pipeline, especially during the initial research and development phase.

         id: 12-cohen-machine
     author: Cohen
      title: Machine Learning + Data Creation in a Community Partnership for Archival Research
       date: 2021
      words: 7542
  sentences: 474
      pages: 13
     flesch: 54
      cache: ./cache/12-cohen-machine.pdf
        txt: ./txt/12-cohen-machine.txt
    summary: archivally focused project that emerged from a partnership between the Pine Mountain Settlement School (PMSS)1 in Harlan County, Kentucky, and scholars and students at Berea College. a latent social network of historical families represented by the images held in one local archive, curricula for use in Kentucky public schools with PMSS archival materials. That decision led a team of Berea College undergraduate and faculty researchers to scrape the data from the PMSS archive site and supplement the images and transcriptions it contains with available textual metadata drawn from the site.9 Alongside the WordPress facial recognition software to identify the persons in historic photographs in the PMSS archives. We demonstrated to the local members at Pine Mountain how our use case and its constraints for digital archives fit with the current standards for the fair use of copyrighted materials

         id: 14-hansen-can
     author: Hansen
      title: Can a Hammer Categorize Highly Technical Articles?
       date: 2021
      words: 4339
  sentences: 394
      pages: 8
     flesch: 63
      cache: ./cache/14-hansen-can.pdf
        txt: ./txt/14-hansen-can.txt
    summary: I would use the Mathematical Subject Classification (MSC) values assigned to the publications in MathSciNet1 to create a temporal citation network which would allow me to visualize Machine-learning-based categorization needs data to classify, which in our case automated categorization of mathematics, we were dilettantes in the world of machine learning. what happens when smarter and more capable minds tackle the problem of classifying mathematics and other highly technical subjects using advanced machine learning techniques. 9Mathematical Subject Classification (MSC) values in MathSciNet and zbMath are a particularly interesting categorization set to work with as they are assigned and reviewed by a subject area expert editor and an active researcher in the 16See https://academic.microsoft.com/. https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47 One really interesting part of the machine learning method used by Microsoft was that it did not rely only on information from the article being replace the work of humans categorizing mathematics articles indexed in a database, which for

         id: 02-harper-generative
     author: Harper
      title: Generative Machine Learning
       date: 2021
      words: 5935
  sentences: 662
      pages: 15
     flesch: 59
      cache: ./cache/02-harper-generative.pdf
        txt: ./txt/02-harper-generative.txt
    summary: Reddit have each issued their own bans on the category of machine-generated or -altered content that is commonly termed "deep fakes" (Cohen 2020; Romm, Harwell, and Stanley-Becker TV because of their dystopian implications, deep fakes are just one application of generative machine learning. Figure 2.2: Images generated with a simple statistical model appear as noise as the model is insufficient to capture the structure of the real data (Markov chains trained using wine bottles and 1In many examples, I have used the Google QuickDraw Dataset to highlight features of generative machine learning. (https://github.com/googlecreativelab/quickdraw-dataset) shows the generator learning how to produce better sketches over time. built a GAN that generates high-quality photo-realistic images of people (Karras, Laine, and Aila Beyond medicine and autonomous vehicles, generative data augmentation will progressively impact other imaging-heavy fields (Shorten and Khoshgoftaar 2019) like GANs in Action: Deep Learning with Generative Adversarial Networks.

         id: 01-hintze-artificial
     author: Hintze
      title: Artificial Intelligence in the Humanities: Wolf in Disguise, or Digital Revolution?
       date: 2021
      words: 5069
  sentences: 357
      pages: 10
     flesch: 56
      cache: ./cache/01-hintze-artificial.pdf
        txt: ./txt/01-hintze-artificial.txt
    summary: Artificial Intelligence, with its ability to machine learn coupled to an almost human-like understanding, sounds like the ideal tool to the humanities. But are these technologies imbued with intuition or understanding, and do they learn like humans? In the 80s and 90s, as home computers were becoming more common, Hollywood was sensationalizing the idea of smart or human-like Artificial Intelligent machines (AI) through movies Machine learning allows us to learn from these data sets in ways that exceed human capabilities, while an artificial brain will eventually allow us to objectively describe a subjective experience (through quantifying neural activations or positively and negatively associated memories). The following paragraphs will explore current Machine Learning and Artificial Intelligence learning, to the point where our whole identity as human could be generously defined as the Just because humans and machine learning are both black Currently, machines do not learn but must be trained, typically with human-labeled data.

         id: 04-janco-machine
     author: Janco
      title: Machine Learning in Digital Scholarship
       date: 2021
      words: 3101
  sentences: 245
      pages: 6
     flesch: 54
      cache: ./cache/04-janco-machine.pdf
        txt: ./txt/04-janco-machine.txt
    summary: Tools like RunwayML, the Teachable Machine, and Google AutoML allow researchers to train project-specific Since 2014, dramatic innovations in machine learning have occurred, providing new capabilities in computer vision, natural language processing, and other areas of applied artificial intelligence. deliberately and identify how machine learning methods can benefit a scholar's research? for identifying basic tasks that can be completed by computers in ways that advance humanities research (2000). When working with texts or images, machine learning models are presently capable of making simple annotations and associations. Google's Teachable Machine offers an intuitive web application that humanities faculty and students can use to train classification models for images, sounds, and poses. Machine learning models offer a variety of ways to identify similarity and difference with research materials. goals of academic researchers in the humanities with the technical possibilities of machine learning. "Scholarly Primitives: What Methods Do Humanities Researchers Have

         id: 06-jiang-cross
     author: Jiang
      title: Cross-Disciplinary ML Research is like Happy Marriages: Five Strengths and Two Examples
       date: 2021
      words: 3623
  sentences: 425
      pages: 10
     flesch: 55
      cache: ./cache/06-jiang-cross.pdf
        txt: ./txt/06-jiang-cross.txt
    summary: Cross-disciplinary research matters, because (1) it provides an understanding of complex problems that require a multifaceted approach to solve; (2) it combines disciplinary breadth with the ability to collaborate One of the most popular cross-disciplinary research topics/programs is Machine Learning + top strengths of conducting cross-disciplinary ML research and give two examples based on my marriages, just like collaborators expect to have successful project outcomes (Robinson and Blanton 1993; Pettigrew 2000; Xu et al. The history professor Liang Cai and I have collaborated on an international research project titled "Digital Empires: Structured Biographical and Social Network Analysis of Early Chinese We have enjoyed our collaboration and the power of cross-disciplinary research. Specifically, I presented the top strengths of producing successful cross-disciplinary ML research: (1) Partners are satisfied with communication. "The Challenges of Cross-Disciplinary Research." Social Research Collaboration." Social Studies of Science 33, no. "Building Cross-Disciplinary Research Collaborations."

         id: 00-johnson-preface
     author: Johnson
      title: Preface
       date: 2021
      words: 1090
  sentences: 72
      pages: 3
     flesch: 47
      cache: ./cache/00-johnson-preface.pdf
        txt: ./txt/00-johnson-preface.txt
    summary: The plan called for a survey and a series of workshops hosted across the country to explore, originally, "the national need for library based topic modeling tools in support of cross-disciplinary libraries ran concurrently with our grant — Cordell 2020 and Padilla 2019, which were commissioned by major players in the field, the Library of Congress and OCLC, respectively — and vi Machine Learning, Libraries, and Cross-Disciplinary Research We would like to thank the IMLS for providing essential funding support for the grant and the Thank you to the members of the Notre Dame IMLS grant team who, at of course, thanks to the 95 participants in our 2019 IMLS Grant Workshops (too many to enumerate here) and to the essay authors for sharing their expertise and perspectives in growing our collective knowledge of machine learning and its use in research, scholarship, and cultural heritage organizations. https://www.oclc.org/research/publications/2019/oclcresearch-responsible-operations-data-science-machine-learning-ai.html

         id: 07-kim-ai
     author: Kim
      title: AI and Its Moral Concerns
       date: 2021
      words: 7293
  sentences: 784
      pages: 13
     flesch: 55
      cache: ./cache/07-kim-ai.pdf
        txt: ./txt/07-kim-ai.txt
    summary: does not provide an easy answer to the question of how one should program moral decision-making into intelligent machines. Described below are some of the significant ethical challenges that autonomous AI systems such as military robots present. Note that this moral decision-making process can be modeled with a rule-based symbolic AI approach, a machine (Kahn 2012) also argues that the resulting increase in the number of wars by the use of military robots will be morally This black-box nature of AI systems powered by machine learning has raised great concern among many AI researchers in recent years. agency in the AI-powered automated information environment presents an ethical challenge In this chapter, I discussed four significant ethical challenges that automating decisions and actions with AI presents: (a) moral desensitization; (b) unintended outcomes; (c) surrender of are at an early stage in developing AI applications and applying machine learning and deep learning techniques to improve library services, systems, and operations.

         id: 09-lesk-fragility
     author: Lesk
      title: Fragility and Intelligibility of Deep Learning for Libraries
       date: 2021
      words: 4796
  sentences: 474
      pages: 11
     flesch: 63
      cache: ./cache/09-lesk-fragility.pdf
        txt: ./txt/09-lesk-fragility.txt
    summary: Machine learning systems have a set of data for training. of the real problem (if you train a machine translation program solely on engineering documents, there may be a lot of training data, including many noisy points, and the program may decide on Many popular magazines have discussed this problem; Forbes, for example, had an explanation of how the choice of datasets can produce a biased result without any deliberate attempt to used to suggest malicious creation of training data or examples of data designed to deceive machine learning systems. blood pressure, and lower blood pressure decreases the risk of heart attacks." Then I have to explain that the paper evaluates 32 possibilities (prior/current ownership × cats/dogs × 4 medical compare the performance of machine learning systems for medical diagnosis with actual doctors If a program is constantly learning from new data, there is no list of previously fixed failures to

         id: 13-lucic-towards
     author: Lucic
      title: Towards a Chicago place name dataset: From back-of-the-book index to a labeled dataset
       date: 2021
      words: 3073
  sentences: 285
      pages: 7
     flesch: 63
      cache: ./cache/13-lucic-towards.pdf
        txt: ./txt/13-lucic-towards.txt
    summary: Reading Chicago Reading is a grant-supported digital humanities project that takes as its object the "One Book One Chicago" (OBOC) program of the Chicago Public Library. A related question is the focus of this paper: by associating place names with sentiment scores in Chicago-themed OBOC The HathiTrust research portal permits the extraction of non-consumptive features of the works included in the digital library, even those that are still under copyright. The place names extracted from our three Chicago-setting OBOC books allowed us to focus Our interest in creating a dataset of Chicago place names extracted from literature led us to Kaser's book contains several indexes that can serve as sources of labeled data or instances in which Chicago locations are mentioned. the index as a source of already-labeled data for Chicago place names. associated sentiment scores for Chicago place names in the three OBOC selections centered on

         id: 10-morgan-bringing
     author: Morgan
      title: Bringing Algorithms and Machine Learning Into Library Collections and Services
       date: 2021
      words: 5793
  sentences: 739
      pages: 13
     flesch: 74
      cache: ./cache/10-morgan-bringing.pdf
        txt: ./txt/10-morgan-bringing.txt
    summary: advent of computers, the idea of sharing cataloging data as MARC (machine readable cataloging) the full text of its collections to enhance bibliographic description and resulting public service. ability to save, organize, and retrieve data; on the whole, the library profession does not understand the concept of a "data structure." For example, tab-delimited files, CSV (comma-separated the use of data structures, computers store and retrieve information. Libraries use computers to store, organize, preserve, and disseminate the gray literature of our time, and we call these systems "institutional repositories." In all Using such a process, there are really only four different types of machine learning: classification, clustering, regression, and dimension reduction. Given a set of previously classified menus, one could create a model There are many possible ways to enhance library collections and services through the use of machine learning. of plain text files and an integer, Topic Modeling Tool will create a weighted list of latent themes

         id: 03-plumb-humanities
     author: Plumb
      title: Humanities and Social Science Reading through Machine Learning
       date: 2021
      words: 7195
  sentences: 599
      pages: 14
     flesch: 45
      cache: ./cache/03-plumb-humanities.pdf
        txt: ./txt/03-plumb-humanities.txt
    summary: Respondents such as Mark Algee-Hewitt pointed out that literary scholars employ computational statistical models in order to reveal something about texts that human readers Machine learning, and word embedding algorithms in particular, may have a unique ability to shift this conversation into new territory, where scholars Acknowledging this helps contextualize machine learning algorithms for text analysis tasks in the humanities, but also highlights data curation challenges This naturally raises questions about how machine learning algorithms like word embeddings are implemented for text analysis, and how they Based on the potential for word embeddings to model semantic spaces for different corpora and compare the distribution of terms, the next step was to build a corpus of non-canonical Designing humanities research with novel word embedding models stands to widen the territory where machine learning engineers look for conceptual concepts Systematic data curation, combined with word embedding algorithms, represent a new interpretive system for literary scholars.

         id: 11-prudhomme-taking
     author: Prudhomme
      title: Taking a Leap Forward: Machine Learning for New Limits
       date: 2021
      words: 3910
  sentences: 387
      pages: 9
     flesch: 51
      cache: ./cache/11-prudhomme-taking.pdf
        txt: ./txt/11-prudhomme-taking.txt
    summary: Combining automatic processes to assist in supporting inventory management with a focus on descriptive metadata, a machine learning solution could help alleviate time-consuming and relatively expensive metadata tagging tasks, Deep learning neural networks are more effective in feature detection as they are able to solve complex problems such as image classification with greater accuracy when trained with large datasets. For images, how can archives build a data-labeling pipeline into their digital curation workflow that enables machine learning of collections? machine learning is only good so long as value is added, archives and libraries will need to think As deep learning applications will only be as effective as the data, archives and libraries should expand their Along with greater computing capabilities, artificial intelligence could be an opportunity for libraries and archives to boost the discovery of their digital collections by pushing text and image

         id: 05-wiegand-cultures
     author: Wiegand
      title: Cultures of Innovation: Machine Learning as a Library Service
       date: 2021
      words: 7014
  sentences: 761
      pages: 14
     flesch: 49
      cache: ./cache/05-wiegand-cultures.pdf
        txt: ./txt/05-wiegand-cultures.txt
    summary: traditional role, librarians in the 20th century added a new function—discovery—teaching people to find and use the library's collected scholarship. learning in the library as the next step beyond collecting, with librarians instructing on information infrastructure with the goal of empowering library users to find, evaluate, and use scholarly go far beyond local library collections to a global perspective and normative practice of participation at scale in innovative emerging technologies such as Machine Learning. start by using Machine Learning tools to automate alerts of new content in a narrow area of interest and help researchers at all levels find and focus on problem-solving. A library that adapted Machine Learning as an innovation technology would improve its practices; add new services; choose, use, and license collections differently; utilize all spaces for learning; and role model innovative leadership. opening local collections to discovery and use in order to create new knowledge through digitization and semantic linking, with cross-disciplinary technologies to augment traditional research

==== make-pages.sh questions
==== make-pages.sh search
==== make-pages.sh topic modeling corpus
Zipping study carrel