mv: 'input-file.zip' and './input-file.zip' are the same file
Creating study carrel named subject-whigParty-freebo
Initializing database
Unzipping
Archive: input-file.zip
inflating: ./tmp/input/A52425.xml
inflating: ./tmp/input/xml2htm.xsl
inflating: ./tmp/input/metadata.csv
caution: excluded filename not matched: *MACOSX*
=== DIRECTORIES: ./tmp/input
=== DIRECTORY:
=== metadata file: ./tmp/input/metadata.csv
=== found metadata file
=== updating bibliographic database
Building study carrel named subject-whigParty-freebo
May 25, 2021 1:00:39 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed. See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
May 25, 2021 1:00:39 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files unless you've excluded the TesseractOCRParser from the default parser. Tesseract may dramatically slow down content extraction (TIKA-2359). As of Tika 1.15 (and prior versions), Tesseract is automatically called. In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
May 25, 2021 1:00:39 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded. Please provide the jar on your classpath to parse sqlite files. See tika-parsers/pom.xml for the correct version.
INFO Starting Apache Tika 1.24.1 server
INFO Setting the server's publish address to be http://localhost:9998/
INFO Logging initialized @4141ms to org.eclipse.jetty.util.log.Slf4jLog
INFO jetty-9.4.27.v20200227; built: 2020-02-27T18:37:21.340Z; git: a304fd9f351f337e7c0e2a7c28878dd536149c6c; jvm 1.8.0_281-b09
INFO Started ServerConnector@3e74829{HTTP/1.1, (http/1.1)}{localhost:9998}
INFO Started @4266ms
WARN Empty contextPath
INFO Started o.e.j.s.h.ContextHandler@51fadaff{/,null,AVAILABLE}
INFO Started Apache Tika server at http://localhost:9998/
INFO rmeta/text (autodetecting type)
FILE: cache/A52425.xml
OUTPUT: txt/A52425.txt
=== file2bib.sh ===
INFO Detecting media type for Filename: b'A52425.xml'
INFO rmeta/text (autodetecting type)
A52425 txt/../pos/A52425.pos
A52425 txt/../ent/A52425.ent
A52425 txt/../wrd/A52425.wrd
=== file2bib.sh ===
id: A52425
author: Norris, John, 1657-1711.
title: A murnival of knaves, or, Whiggism plainly display'd, and (if not grown shameless) burlesqu't out of countenance
date: 1683
pages:
extension: .xml
txt: ./txt/A52425.txt
cache: ./cache/A52425.xml
Content-Type application/xml
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.xml.DcXMLParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 24
resourceName b'A52425.xml'
Done mapping.
Reducing subject-whigParty-freebo
=== reduce.pl bib ===
id = A52425
author = Norris, John, 1657-1711.
title = A murnival of knaves, or, Whiggism plainly display'd, and (if not grown shameless) burlesqu't out of countenance
date = 1683
pages =
extension = .xml
mime = application/xml
words = 7539
sentences = 2432
flesch = 99
summary = This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership.
A murnival of knaves, or, Whiggism plainly display'd, and (if not grown shameless) burlesqu't out of countenance A murnival of knaves, or, Whiggism plainly display'd, and (if not grown shameless) burlesqu't out of countenance Satire in verse on four prominent Whigs: Lord Shaftesbury, Titus Oates, Slingsby Bethel, & Sir Thomas Player. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com). EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
cache = ./cache/A52425.xml
txt = ./txt/A52425.txt
Building ./etc/reader.txt
A52425 A52425
number of items: 1
sum of words: 7,539
average size in words: 7,539
average readability score: 99
nouns: t; man; thing; text; o; wou''d; shame; day; work; texts; place; way; time; sense; one; none; l; characters; xml; works; nought; images; image; ears; books; ado; worship; word; truth; tone; tho; story; sight; shameless; rebellion; project; people; page; others; o.; nothing; noise; night; mouth''d; knaves; keying; heart; ground; friend; faith
verbs: is; be; ''s; was; has; do; have; say; made; are; did; know; had; were; make; hear; tell; let; done; think; been; leave; grown; take; see; said; does; being; swear; set; please; give; encoded; die; call; appears; teach; read; makes; known; hath; go; break; walking; turn; try; prove; pray; play; nay
adjectives: more; good; great; old; true; small; poor; little; such; much; large; early; dead; worst; worse; wide; same; other; full; first; english; worth; wise; very; pretty; plain; new; most; less; least; late; german; general; due; deep; black; available; wretched; welcome; strong; strange; second; rare; own; original; light; last; illegible; gross; grand
adverbs: not; so; now; then; too; as; up; out; well; very; yet; no; more; thus; therefore; long; all; n''t; only; once; much; first; rather; plainly; never; most; just; away; still; soon; quite; over; onely; on; off; indeed; in; home; here; far; else; certainly; at; again; truly; there; sometimes; pretty; particularly; online
pronouns: he; i; his; you; it; him; they; their; your; me; we; them; her; our; my; thee; thy; himself; us; twou''d; she; one
proper nouns: tcp; th; rome; man; english; whig; lord; jack; holy; hell; cloak; whigs; tom; thou; text; state; grave; sir; conscience; city; turk; tei; t; son; pole; norris; london; god; eebo; tory; sire; rogue; oxford; john; jew; heaven; hath; great; grace; fop; fool; enuf; don; crew; age; wit; whiggism; trick; treason; tho
keywords: text; tcp; state; man; jack; hell; english; early
one topic; one dimension: th
file(s): ./cache/A52425.xml
titles(s): A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance
three topics; one dimension: th; zone; zone
file(s): ./cache/A52425.xml, ./cache/A52425.xml, ./cache/A52425.xml
titles(s): A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance | A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance | A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance
five topics; three dimensions: th like man; zone grope group; zone grope group; zone grope group; zone grope group
file(s): ./cache/A52425.xml, ./cache/A52425.xml, ./cache/A52425.xml, ./cache/A52425.xml, ./cache/A52425.xml
titles(s): A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance | A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance | A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance | A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance | A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance
Type: zip2carrel
title: subject-whigParty-freebo
date: 2021-05-25
time: 12:42
username: emorgan
patron: Eric Morgan
email: emorgan@nd.edu
input: input-file.zip
==== make-pages.sh htm files
==== make-pages.sh complex files
==== make-pages.sh named enities
==== making bibliographics
id: A52425
author: Norris, John, 1657-1711.
title: A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance
date: 1683
words: 7539
sentences: 2432
pages:
flesch: 99
cache: ./cache/A52425.xml
txt: ./txt/A52425.txt
summary: This keyboarded and encoded edition of the work described above is co-owned by the institutions providing financial support to the Early English Books Online Text Creation Partnership. A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance A murnival of knaves, or, Whiggism plainly display''d, and (if not grown shameless) burlesqu''t out of countenance Satire in verse on four prominent Whigs: Lord Shaftesbury, Titus Oates, Slingsby Bethel, & Sir Thomas Player. EEBO-TCP is a partnership between the Universities of Michigan and Oxford and the publisher ProQuest to create accurately transcribed and encoded texts based on the image sets published by ProQuest via their Early English Books Online (EEBO) database (http://eebo.chadwyck.com).
EEBO-TCP aimed to produce large quantities of textual data within the usual project restraints of time and funding, and therefore chose to create diplomatic transcriptions (as opposed to critical editions) with light-touch, mainly structural encoding based on the Text Encoding Initiative (http://www.tei-c.org).
==== make-pages.sh questions
==== make-pages.sh search
==== make-pages.sh topic modeling corpus
Zipping study carrel