mv: ‘./input-file.zip’ and ‘./input-file.zip’ are the same file
Creating study carrel named subject-electronicBooks-gutenberg
Initializing database
Unzipping
Archive: input-file.zip
   creating: ./tmp/input/input-file/
  inflating: ./tmp/input/input-file/181.txt
  inflating: ./tmp/input/input-file/4742.txt
  inflating: ./tmp/input/input-file/53.txt
  inflating: ./tmp/input/input-file/11077.txt
  inflating: ./tmp/input/input-file/33460.txt
  inflating: ./tmp/input/input-file/metadata.csv
caution: excluded filename not matched: *MACOSX*
=== DIRECTORIES: ./tmp/input
=== DIRECTORY: ./tmp/input/input-file
=== metadata file: ./tmp/input/input-file/metadata.csv
=== found metadata file
=== updating bibliographic database
Building study carrel named subject-electronicBooks-gutenberg
FILE: cache/181.txt OUTPUT: txt/181.txt
FILE: cache/4742.txt OUTPUT: txt/4742.txt
FILE: cache/53.txt OUTPUT: txt/53.txt
FILE: cache/11077.txt OUTPUT: txt/11077.txt
FILE: cache/33460.txt OUTPUT: txt/33460.txt
4742 txt/../wrd/4742.wrd
4742 txt/../ent/4742.ent
4742 txt/../pos/4742.pos
=== file2bib.sh ===
id: 4742
author: Vaknin, Samuel
title: TrendSiters Digital Content and Web Technologies
date:
pages:
extension: .txt
txt: ./txt/4742.txt
cache: ./cache/4742.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 1
resourceName b'4742.txt'
181 txt/../pos/181.pos
181 txt/../wrd/181.wrd
181 txt/../ent/181.ent
11077 txt/../pos/11077.pos
=== file2bib.sh ===
id: 181
author: Perathoner, Marcello
title: The Project Gutenberg RST Manual
date:
pages:
extension: .txt
txt: ./txt/181.txt
cache: ./cache/181.txt
Content-Encoding UTF-8
Content-Type text/plain; charset=UTF-8
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 4
resourceName b'181.txt'
11077 txt/../wrd/11077.wrd
11077 txt/../ent/11077.ent
=== file2bib.sh ===
id: 11077
author: Doctorow, Cory
title: Ebooks: Neither E, Nor Books Paper for the O'Reilly Emerging Technologies Conference, 2004
date:
pages:
extension: .txt
txt: ./txt/11077.txt
cache: ./cache/11077.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 1
resourceName b'11077.txt'
33460 txt/../pos/33460.pos
33460 txt/../wrd/33460.wrd
33460 txt/../ent/33460.ent
=== file2bib.sh ===
id: 33460
author: Lebert, Marie
title: Booknology: The eBook (1971-2010)
date:
pages:
extension: .txt
txt: ./txt/33460.txt
cache: ./cache/33460.txt
Content-Encoding UTF-8
Content-Type text/plain; charset=UTF-8
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 3
resourceName b'33460.txt'
53 txt/../pos/53.pos
53 txt/../wrd/53.wrd
53 txt/../ent/53.ent
=== file2bib.sh ===
id: 53
author: Library of Congress
title: Workshop on Electronic Texts: Proceedings, 9-10 June 1992
date:
pages:
extension: .txt
txt: ./txt/53.txt
cache: ./cache/53.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 5
resourceName b'53.txt'
Done mapping.
Reducing subject-electronicBooks-gutenberg
=== reduce.pl bib ===
id = 181
author = Perathoner, Marcello
title = The Project Gutenberg RST Manual
date =
pages =
extension = .txt
mime = text/plain
words = 3034
sentences = 385
flesch = 97
summary = (4) This directive inserts the PG header as generated from the The metadata directive contains all data that is used to generate the PG Fields In the MARCREL Scheme Fields Without Scheme Fields Without Scheme Fields Without Scheme Specify a width option in images and a figwidth option in figures and Expressing the image size relative to the screen width is the best way A generated local table of contents: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam ipsum dolor sit amet, consetetur sadipscing elitr, sed diam ipsum dolor sit amet, consetetur sadipscing elitr, sed diam Definition List Option List (not used by PG) ⁷ This title contains a footnote reference.
cache = ./cache/181.txt
txt = ./txt/181.txt
=== reduce.pl bib ===
id = 4742
author = Vaknin, Samuel
title = TrendSiters Digital Content and Web Technologies
date =
pages =
extension = .txt
mime = text/plain
words = 30
sentences = 3
flesch = 86
summary = Copyright (C) 2007 by Lidija Rangelovska. Please see the corresponding RTF file for this eBook. RTF is Rich Text Format, and is readable in nearly any modern word processing program.
cache = ./cache/4742.txt
txt = ./txt/4742.txt
=== reduce.pl bib ===
id = 53
author = Library of Congress
title = Workshop on Electronic Texts: Proceedings, 9-10 June 1992
date =
pages =
extension = .txt
mime = text/plain
words = 70864
sentences = 10819
flesch = 72
summary = Cornell project are creating digital image sets of older books in the books on microfilm to digital image sets, Project Open Book (POB). that electronic images constitute a serious attempt to represent text in to bring together people who are working on texts and images. imaging, text-coding, and networked distribution that suit their Evaluation of the prospects for the use of electronic texts includes two projects that involve electronic texts were being done by people with a numerous network experiments in accessing full-text materials, including The students with either electronic format, text or image, electronic text, which was developed through the use of computers in the coming together of people working on texts and not images. electronic texts, and the implications of that use for information databases, image (and text) document collections stored on network "file In the use of electronic imaging for document preservation, there are
cache = ./cache/53.txt
txt = ./txt/53.txt
=== reduce.pl bib ===
id = 11077
author = Doctorow, Cory
title = Ebooks: Neither E, Nor Books Paper for the O'Reilly Emerging Technologies Conference, 2004
date =
pages =
extension = .txt
mime = text/plain
words = 7660
sentences = 499
flesch = 78
summary = about how much they dug the ebook and so bought the paper-book 2. Ebooks complement paper books. Having a paper book is good. he read half my first novel from the bound book, and printed the perfect-bound, laminated-cover, printed-spine paper book in ten settles on this ebook thing, owning a paper book is going to feel the things that people are saying about your book is that it can [Ebooks are like paper books]. [Ebooks are like paper books]. hear the title, see the cover, pick up the book, read a review, copies of their books that sell, so having a good count makes a that printed books are different from monastic Bibles: they are new and scary practice of ebook "piracy." [alt.binaries.e-books seemed to me that electronic books are *different* from paper In other words, most people who download the book do so for the
cache = ./cache/11077.txt
txt = ./txt/11077.txt
=== reduce.pl bib ===
id = 33460
author = Lebert, Marie
title = Booknology: The eBook (1971-2010)
date =
pages =
extension = .txt
mime = text/plain
words = 16493
sentences = 1070
flesch = 65
summary = January 1993 > The Online Books Page, a catalog of free ebooks States), the Online Books Page is "a website that facilitates access to was no drop in sales for books also available for free on the web. Sales of print books with a free online version increased. became an online digital library of text, audio, software, image and launched the website "Merriam-Webster Online: The Language Center" in online publishing, to sell digital books through the internet. main French-language online encyclopedia available for free. digitize one million books in a number of languages, including in India (OCA), a universal public digital library launched by the Internet website, A Web of Online Dictionaries (included in the new one), 2000, Numilog launched an online bookstore that became the main Frenchlanguage aggregator of digital books.
In January 2001, Adobe launched the Acrobat eBook Reader (for free) and million books by digitizing the collections of main partner libraries,
cache = ./cache/33460.txt
txt = ./txt/33460.txt
Building ./etc/reader.txt
53 33460 11077 53 33460 11077
number of items: 5
sum of words: 98,081
average size in words: 19,616
average readability score: 79
nouns: text; image; books; information; images; project; people; library; access; material; network; software; use; copyright; texts; book; technology; example; work; time; materials; system; data; quality; users; standards; paper; internet; process; way; cd; documents; form; research; database; world; page; computer; libraries; document; rom; format; scanning; preservation; scholars; words; number; line; years; students
verbs: is; be; are; was; has; were; have; do; been; used; had; use; launched; using; made; being; make; does; concerning; including; published; printed; read; said; created; provide; ''s; put; perform; done; based; illustrated; described; included; distributed; did; noted; need; take; making; developed; become; include; called; working; create; see; given; get; demonstrated
adjectives: electronic; digital; other; available; new; more; several; first; public; many; different; free; same; full; such; high; scholarly; primary; much; possible; most; large; online; good; important; various; particular; readable; major; -; long; general; able; own; numerous; original; little; few; standard; common; small; technical; second; personal; single; networked; real; necessary; local; key
adverbs: not; also; more; as; well; up; then; very; only; out; so; thus; just; most; now; even; much; n''t; however; that; extremely; is; still; together; off; online; perhaps; often; in; approximately; later; highly; currently; worldwide; simply; rather; less; far; first; about; next; ever; back; ago; once; especially; down; already; too; on
pronouns: it; they; its; their; one; them; he; i; we; you; his; she; her; my; our; us; itself; your; me; him; themselves; ourselves; mine; himself; herself; ''s; oneself; em; eds; bookshelf; >
proper nouns: ─; library; am; university; project; lc; perseus; national; sgml; congress; lynch; fleischhauer; reader; cornell; american; et; workshop; united; tei; pob; text; nal; besser; states; memory; january; english; aiim; washington; october; march; research; michelson; lesk; information; april; zidar; september; freeman; december; xerox; yale; online; natdp; french; palm; november; battin; sperberg; packard
keywords: university; text; library; workshop; work; tei; sgml; september; rtf; rom; reader; project; pob; perseus; paper; palm; national; march; lynch; january; french; fleischhauer; english; december; cornell; copyright; congress; book; bibles; american
one topic; one dimension: text
file(s): ./cache/181.txt
titles(s): The Project Gutenberg RST Manual
three topics; one dimension: text; et; modern
file(s): ./cache/53.txt, ./cache/181.txt, ./cache/4742.txt
titles(s): Workshop on Electronic Texts: Proceedings, 9-10 June 1992 | The Project Gutenberg RST Manual | TrendSiters Digital Content and Web Technologies
five topics; three dimensions: text electronic library; books launched online; et pg sed; rangelovska lidija rtf; rangelovska lidija rtf
file(s): ./cache/53.txt, ./cache/33460.txt, ./cache/181.txt, ./cache/4742.txt, ./cache/4742.txt
titles(s): Workshop on Electronic Texts: Proceedings, 9-10 June 1992 | Booknology: The eBook (1971-2010) | The Project Gutenberg RST Manual | TrendSiters Digital Content and Web Technologies | TrendSiters Digital Content and Web Technologies
Type: gutenberg
title: subject-electronicBooks-gutenberg
date: 2021-06-06
time: 14:06
username: emorgan
patron: Eric Morgan
email: emorgan@nd.edu
input: facet_subject:"Electronic books"
==== make-pages.sh htm files
==== make-pages.sh complex files
==== make-pages.sh named enities
==== making bibliographics
id: 11077
author: Doctorow, Cory
title: Ebooks: Neither E, Nor Books Paper for the
O''Reilly Emerging Technologies Conference, 2004
date:
words: 7660
sentences: 499
pages:
flesch: 78
cache: ./cache/11077.txt
txt: ./txt/11077.txt
summary: about how much they dug the ebook and so bought the paper-book 2. Ebooks complement paper books. Having a paper book is good. he read half my first novel from the bound book, and printed the perfect-bound, laminated-cover, printed-spine paper book in ten settles on this ebook thing, owning a paper book is going to feel the things that people are saying about your book is that it can [Ebooks are like paper books]. [Ebooks are like paper books]. hear the title, see the cover, pick up the book, read a review, copies of their books that sell, so having a good count makes a that printed books are different from monastic Bibles: they are new and scary practice of ebook "piracy." [alt.binaries.e-books seemed to me that electronic books are *different* from paper In other words, most people who download the book do so for the
id: 33460
author: Lebert, Marie
title: Booknology: The eBook (1971-2010)
date:
words: 16493
sentences: 1070
pages:
flesch: 65
cache: ./cache/33460.txt
txt: ./txt/33460.txt
summary: January 1993 > The Online Books Page, a catalog of free ebooks States), the Online Books Page is "a website that facilitates access to was no drop in sales for books also available for free on the web. Sales of print books with a free online version increased. became an online digital library of text, audio, software, image and launched the website "Merriam-Webster Online: The Language Center" in online publishing, to sell digital books through the internet. main French-language online encyclopedia available for free. digitize one million books in a number of languages, including in India (OCA), a universal public digital library launched by the Internet website, A Web of Online Dictionaries (included in the new one), 2000, Numilog launched an online bookstore that became the main Frenchlanguage aggregator of digital books. In January 2001, Adobe launched the Acrobat eBook Reader (for free) and million books by digitizing the collections of main partner libraries,
id: 53
author: Library of Congress
title: Workshop on Electronic Texts: Proceedings, 9-10 June 1992
date:
words: 70864
sentences: 10819
pages:
flesch: 72
cache: ./cache/53.txt
txt: ./txt/53.txt
summary: Cornell project are creating digital image sets of older books in the books on microfilm to digital image sets, Project Open Book (POB). that electronic images constitute a serious attempt to represent text in to bring together people who are working on texts and images. imaging, text-coding, and networked distribution that suit their Evaluation of the prospects for the use of electronic texts includes two projects that involve electronic texts were being done by people with a numerous network experiments in accessing full-text materials, including The students with either electronic format, text or image, electronic text, which was developed through the use of computers in the coming together of people working on texts and not images. electronic texts, and the implications of that use for information databases, image (and text) document collections stored on network "file In the use of electronic imaging for document preservation, there are
id: 181
author: Perathoner, Marcello
title: The Project Gutenberg RST Manual
date:
words: 3034
sentences: 385
pages:
flesch: 97
cache: ./cache/181.txt
txt: ./txt/181.txt
summary: (4) This directive inserts the PG header as generated from the The metadata directive contains all data that is used to generate the PG Fields In the MARCREL Scheme Fields Without Scheme Fields Without Scheme Fields Without Scheme Specify a width option in images and a figwidth option in figures and Expressing the image size relative to the screen width is the best way A generated local table of contents: Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam ipsum dolor sit amet, consetetur sadipscing elitr, sed diam ipsum dolor sit amet, consetetur sadipscing elitr, sed diam Definition List Option List (not used by PG) ⁷ This title contains a footnote reference.
id: 4742
author: Vaknin, Samuel
title: TrendSiters Digital Content and Web Technologies
date:
words: 30
sentences: 3
pages:
flesch: 86
cache: ./cache/4742.txt
txt: ./txt/4742.txt
summary: Copyright (C) 2007 by Lidija Rangelovska. Please see the corresponding RTF file for this eBook. RTF is Rich Text Format, and is readable in nearly any modern word processing program.
==== make-pages.sh questions
==== make-pages.sh search
==== make-pages.sh topic modeling corpus
Zipping study carrel
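The corpus totals reported in this log (number of items, sum of words, average size in words, average readability score) can be re-derived from the per-item "words" and "flesch" values in the bibliographic records. The following is a minimal sketch of that arithmetic, not the carrel builder's own code: the `items` table and the `corpus_totals` function are hypothetical names introduced here, and the use of integer truncation for the readability average is an assumption chosen because it reproduces the logged value of 79 (the exact mean is 79.6).

```python
# Per-item figures copied from the bibliographic records in the log.
# The dictionary layout is an illustrative assumption, not the tool's format.
items = {
    "181": {"words": 3034, "flesch": 97},
    "4742": {"words": 30, "flesch": 86},
    "53": {"words": 70864, "flesch": 72},
    "11077": {"words": 7660, "flesch": 78},
    "33460": {"words": 16493, "flesch": 65},
}


def corpus_totals(records):
    """Recompute the summary statistics the log reports for the whole carrel."""
    count = len(records)
    sum_words = sum(rec["words"] for rec in records.values())
    sum_flesch = sum(rec["flesch"] for rec in records.values())
    return {
        "items": count,
        "sum_words": sum_words,
        # Rounded to the nearest whole word.
        "avg_words": round(sum_words / count),
        # Assumption: truncated rather than rounded, to match the logged 79.
        "avg_flesch": sum_flesch // count,
    }


totals = corpus_totals(items)
print(f"number of items: {totals['items']}")
print(f"sum of words: {totals['sum_words']:,}")
print(f"average size in words: {totals['avg_words']:,}")
print(f"average readability score: {totals['avg_flesch']}")
```

With the five records above, the recomputed figures agree with the values logged under "Building ./etc/reader.txt". Item 4742 contributes only 30 words, so the average size in words says little about the typical document in this carrel.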