mv: ‘./input-file.zip’ and ‘./input-file.zip’ are the same file
Creating study carrel named subject-projectGutenberg-gutenberg
Initializing database
Unzipping
Archive: input-file.zip
creating: ./tmp/input/input-file/
inflating: ./tmp/input/input-file/14585.txt
inflating: ./tmp/input/input-file/9109.txt
inflating: ./tmp/input/input-file/36616.txt
inflating: ./tmp/input/input-file/48791.txt
inflating: ./tmp/input/input-file/metadata.csv
caution: excluded filename not matched: *MACOSX*
=== DIRECTORIES: ./tmp/input
=== DIRECTORY: ./tmp/input/input-file
=== metadata file: ./tmp/input/input-file/metadata.csv
=== found metadata file
=== updating bibliographic database
Building study carrel named subject-projectGutenberg-gutenberg
FILE: cache/9109.txt OUTPUT: txt/9109.txt
FILE: cache/14585.txt OUTPUT: txt/14585.txt
FILE: cache/48791.txt OUTPUT: txt/48791.txt
FILE: cache/36616.txt OUTPUT: txt/36616.txt
14585 txt/../pos/14585.pos
36616 txt/../pos/36616.pos
9109 txt/../wrd/9109.wrd
9109 txt/../pos/9109.pos
9109 txt/../ent/9109.ent
36616 txt/../wrd/36616.wrd
36616 txt/../ent/36616.ent
14585 txt/../wrd/14585.wrd
14585 txt/../ent/14585.ent
=== file2bib.sh ===
id: 36616
author: Lebert, Marie
title: Project Gutenberg 4 July 1971 - 4 July 2011: Album
date:
pages:
extension: .txt
txt: ./txt/36616.txt
cache: ./cache/36616.txt
Content-Encoding UTF-8
Content-Type text/plain; charset=UTF-8
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 2
resourceName b'36616.txt'
=== file2bib.sh ===
id: 9109
author: Tinsley, Jim
title: The Project Gutenberg FAQ 2002
date:
pages:
extension: .txt
txt: ./txt/9109.txt
cache: ./cache/9109.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 2
resourceName b'9109.txt'
=== file2bib.sh ===
id: 14585
author: ERPANET
title: ERPANET Case Study: Project Gutenberg
date:
pages:
extension: .txt
txt: ./txt/14585.txt
cache: ./cache/14585.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 2
resourceName b'14585.txt'
48791 txt/../pos/48791.pos
48791 txt/../wrd/48791.wrd
48791 txt/../ent/48791.ent
=== file2bib.sh ===
id: 48791
author: Hart, Michael
title: Project Gutenberg Newsletters 1999 Thirteen Letters: December 1998 to December 1999
date:
pages:
extension: .txt
txt: ./txt/48791.txt
cache: ./cache/48791.txt
Content-Encoding ISO-8859-1
Content-Type text/plain; charset=ISO-8859-1
X-Parsed-By ['org.apache.tika.parser.DefaultParser', 'org.apache.tika.parser.csv.TextAndCSVParser']
X-TIKA:content_handler ToTextContentHandler
X-TIKA:embedded_depth 0
X-TIKA:parse_time_millis 3
resourceName b'48791.txt'
Done mapping.
Reducing subject-projectGutenberg-gutenberg
=== reduce.pl bib ===
id = 48791
author = Hart, Michael
title = Project Gutenberg Newsletters 1999 Thirteen Letters: December 1998 to December 1999
date =
pages =
extension = .txt
mime = text/plain
words = 38429
sentences = 4839
flesch = 81
summary = of our new, and more complete, AND PUBLIC DOMAIN, Shakespeare edition! If the New York Times' estimates of 7 years for information doubling 5. The 49 NEW Project Gutenberg Complete Works of Shakespeare, 5. The 49 NEW Project Gutenberg Complete Works of Shakespeare, April Etexts are all done, and we may have already started May. Nov 1998 Locrine/Mucedorus, Shakespeare Apocrypha [1ws48xxx.xxx]1548 Nov 1998 As You Like It, by William Shakespeare [2ws25xxx.xxx]1523 Nov 1998 As You Like It, by William Shakespeare [2ws25xxx.xxx]1523 Oct 1998 Love's Labour's Lost, by William Shakespeare [2ws12xxx.xxx]1510 Oct 1998 Love's Labour's Lost, by William Shakespeare [2ws12xxx.xxx]1510 Several new Project Gutenberg sites listed below and more than a whole New index of Project Gutenberg Etexts in Australia **And Now Our List of Current Postings of More Project Gutenberg Etexts** 3,333 Project Gutenberg Etexts online at that time. original Complete works of Shakespeare [Etext #100] but some are new.
cache = ./cache/48791.txt
txt = ./txt/48791.txt
=== reduce.pl bib ===
id = 9109
author = Tinsley, Jim
title = The Project Gutenberg FAQ 2002
date =
pages =
extension = .txt
mime = text/plain
words = 1278
sentences = 95
flesch = 82
summary = If you want to let readers know that your site has other related The book that you translated needs to be in the public domain, and we which you worked--it needs to be a pre-1923 or otherwise public domain work to the public domain, or do you want to retain copyright? just need to write the appropriate letter and send the text to us. If you want to release it into the public domain and distribute it gives me pleasure to release this work into the public domain, and I invite Project Gutenberg to publish this public domain edition. that it _is_ a public domain version, we do need a signed letter. into the public domain for Project Gutenberg to publish it? non-exclusive rights to distribute this book in electronic form I hold the copyright on a book, and would like Project Gutenberg Why does PG format texts the way it does?
cache = ./cache/9109.txt
txt = ./txt/9109.txt
=== reduce.pl bib ===
id = 36616
author = Lebert, Marie
title = Project Gutenberg 4 July 1971 - 4 July 2011: Album
date =
pages =
extension = .txt
mime = text/plain
words = 1053
sentences = 115
flesch = 80
summary = In January 2005, Project Gutenberg reached 15,000 ebooks. In December 2006, Project Gutenberg reached 20,000 ebooks. There were ebooks in 50 languages in December 2006. In December 2006, Mike Cook launched the blog Project Gutenberg News as July 2007 > Project Gutenberg Canada Project Gutenberg sent out 15 million ebooks via CDs and DVDs by snail April 2008 > eBook #25000 > English Book Collectors, by William Younger Project Gutenberg reached 25,000 books in April 2008. Project Gutenberg reached 30,000 books in October 2009. In December 2010, Project Gutenberg offered 33,000 high-quality ebooks, April 2011 > 20,000 ebooks processed by Distributed Proofreaders April 2011 > 30,000 English ebooks in Project Gutenberg The 30,000th English-language ebook was posted on 12 April 2011. In June 2011, the 14 main languages were English (with 30,569 ebooks on 4 July 2011 > 40th anniversary of Project Gutenberg Copyrighted images: @folio Project, Distributed Proofreaders (all
cache = ./cache/36616.txt
txt = ./txt/36616.txt
=== reduce.pl bib ===
id = 14585
author = ERPANET
title = ERPANET Case Study: Project Gutenberg
date =
pages =
extension = .txt
mime = text/plain
words = 1226
sentences = 84
flesch = 62
summary = Project Gutenberg is one of the earliest web sites on the internet and Project Gutenberg ensures that all eBooks are available freely accessible to the general public in a digitised format. Project Gutenberg must be exceedingly careful to respect U.S. copyright laws regarding the works that they digitise and make when scanning and editing texts for Project Gutenberg to ensure that Project Gutenberg aims to make digitised versions of popular literature digitising the out of print works, Project Gutenberg feels that they Project Gutenberg already has numerous plain text files that are 20-30 Project Gutenberg eBooks are created as plain ASCII text files. Project Gutenberg team will also generate plain ASCII (15) text files. The first is the Project Gutenberg site itself and the other Project Gutenberg uses the unique eBook number as the file name. Therefore, if the eBook is the 10001 plain text file created it will be
cache = ./cache/14585.txt
txt = ./txt/14585.txt
Building ./etc/reader.txt
48791 14585 9109 48791 14585 36616
number of items: 4
sum of words: 41,986
average size in words: 10,496
average readability score: 76
nouns: files; month; message; internet; ebooks; information; copyright; years; version; #; number; %; books; time; book; email; -; volunteers; mail; users; edupage; e; web; people; service; year; computer; work; access; text; works; site; name; list; line; domain; system; languages; sites; today; months; company; week; software; help; etexts; computers; edition; companies; way
verbs: is; have; are; be; do; send; has; get; was; subscribe; need; 's; want; says; were; use; been; make; find; download; like; know; had; help; say; take; see; am; posted; sent; let; unsubscribe; including; done; did; going; getting; reserved; reading; mh; read; include; go; ends; working; made; found; being; put; lost
adjectives: new; more; other; first; many; different; public; free; last; available; few; own; next; main; electronic; several; major; most; such; possible; german; good; double; same; original; much; hard; great; extra; current; willing; binary; unzipped; true; ready; plain; entire; able; online; least; high; complete; old; large; big; average; subject; recent; past; long
adverbs: not; just; also; out; now; n't; so; only; up; as; directly; still; even; very; here; more; nearly; about; usually; well; yet; back; then; ago; approximately; already; online; on; properly; probably; never; much; in; rather; otherwise; however; actually; really; currently; below; most; away; all; ahead; again; there; quite; ever; at; unsubscribe
pronouns: we; you; it; i; our; your; they; me; its; their; them; my; us; his; yourself; he; him; her; myself; itself; one; she; themselves; ourselves; ours; i'm; himself; ff][0ws37xxx.xxx]2262; 's
proper nouns: project; shakespeare; gutenberg; william; jun; nov; may; apr; aug; sep; oct; human; new; genome; etexts; henry; king; jan; microsoft; feb; dec; jul; wednesday; etext; de; chromosome; public; newsletter; edupage; mar; john; domain; #; balzac; michael; aol; times; news; b.; honore; part; richard; internet; george; december; harte; april; a; london; ii
keywords: gutenberg; william; wednesday; web; shakespeare; sep; public; project; number; newsletter; microsoft; king; jun; john; internet; human; henry; genome; etexts; edupage; domain; chromosome; balzac; aug
one topic; one dimension: xxx
file(s): ./cache/9109.txt
titles(s): The Project Gutenberg FAQ 2002
three topics; one dimension: xxx; ebooks; book
file(s): ./cache/48791.txt, ./cache/36616.txt, ./cache/9109.txt
titles(s): Project Gutenberg Newsletters 1999 Thirteen Letters: December 1998 to December 1999 | Project Gutenberg 4 July 1971 - 4 July 2011: Album | The Project Gutenberg FAQ 2002
five topics; three dimensions: xxx 2000 1999; ebooks gutenberg project; book want copyright; typed submit respect; typed submit respect
file(s): ./cache/48791.txt, ./cache/36616.txt, ./cache/9109.txt, ./cache/9109.txt, ./cache/9109.txt
titles(s): Project Gutenberg Newsletters 1999 Thirteen Letters: December 1998 to December 1999 | Project Gutenberg 4 July 1971 - 4 July 2011: Album | The Project Gutenberg FAQ 2002 | The Project Gutenberg FAQ 2002 | The Project Gutenberg FAQ 2002
Type: gutenberg
title: subject-projectGutenberg-gutenberg
date: 2021-06-09
time: 17:06
username: emorgan
patron: Eric Morgan
email: emorgan@nd.edu
input: facet_subject:"Project Gutenberg"
==== make-pages.sh htm files
==== make-pages.sh complex files
==== make-pages.sh named enities
==== making bibliographics
id: 14585
author: ERPANET
title: ERPANET Case Study: Project Gutenberg
date:
words: 1226
sentences: 84
pages:
flesch: 62
cache: ./cache/14585.txt
txt: ./txt/14585.txt
summary: Project Gutenberg is one of the earliest web sites on the internet and Project Gutenberg ensures that all eBooks are available freely accessible to the general public in a digitised format. Project Gutenberg must be exceedingly careful to respect U.S. copyright laws regarding the works that they digitise and make when scanning and editing texts for Project Gutenberg to ensure that Project Gutenberg aims to make digitised versions of popular literature digitising the out of print works, Project Gutenberg feels that they Project Gutenberg already has numerous plain text files that are 20-30 Project Gutenberg eBooks are created as plain ASCII text files. Project Gutenberg team will also generate plain ASCII (15) text files. The first is the Project Gutenberg site itself and the other Project Gutenberg uses the unique eBook number as the file name. Therefore, if the eBook is the 10001 plain text file created it will be
id: 48791
author: Hart, Michael
title: Project Gutenberg Newsletters 1999 Thirteen Letters: December 1998 to December 1999
date:
words: 38429
sentences: 4839
pages:
flesch: 81
cache: ./cache/48791.txt
txt: ./txt/48791.txt
summary: of our new, and more complete, AND PUBLIC DOMAIN, Shakespeare edition! If the New York Times' estimates of 7 years for information doubling 5. The 49 NEW Project Gutenberg Complete Works of Shakespeare, 5. The 49 NEW Project Gutenberg Complete Works of Shakespeare, April Etexts are all done, and we may have already started May. Nov 1998 Locrine/Mucedorus, Shakespeare Apocrypha [1ws48xxx.xxx]1548 Nov 1998 As You Like It, by William Shakespeare [2ws25xxx.xxx]1523 Nov 1998 As You Like It, by William Shakespeare [2ws25xxx.xxx]1523 Oct 1998 Love's Labour's Lost, by William Shakespeare [2ws12xxx.xxx]1510 Oct 1998 Love's Labour's Lost, by William Shakespeare [2ws12xxx.xxx]1510 Several new Project Gutenberg sites listed below and more than a whole New index of Project Gutenberg Etexts in Australia **And Now Our List of Current Postings of More Project Gutenberg Etexts** 3,333 Project Gutenberg Etexts online at that time. original Complete works of Shakespeare [Etext #100] but some are new.
id: 36616
author: Lebert, Marie
title: Project Gutenberg 4 July 1971 - 4 July 2011: Album
date:
words: 1053
sentences: 115
pages:
flesch: 80
cache: ./cache/36616.txt
txt: ./txt/36616.txt
summary: In January 2005, Project Gutenberg reached 15,000 ebooks. In December 2006, Project Gutenberg reached 20,000 ebooks. There were ebooks in 50 languages in December 2006. In December 2006, Mike Cook launched the blog Project Gutenberg News as July 2007 > Project Gutenberg Canada Project Gutenberg sent out 15 million ebooks via CDs and DVDs by snail April 2008 > eBook #25000 > English Book Collectors, by William Younger Project Gutenberg reached 25,000 books in April 2008. Project Gutenberg reached 30,000 books in October 2009. In December 2010, Project Gutenberg offered 33,000 high-quality ebooks, April 2011 > 20,000 ebooks processed by Distributed Proofreaders April 2011 > 30,000 English ebooks in Project Gutenberg The 30,000th English-language ebook was posted on 12 April 2011. In June 2011, the 14 main languages were English (with 30,569 ebooks on 4 July 2011 > 40th anniversary of Project Gutenberg Copyrighted images: @folio Project, Distributed Proofreaders (all
id: 9109
author: Tinsley, Jim
title: The Project Gutenberg FAQ 2002
date:
words: 1278
sentences: 95
pages:
flesch: 82
cache: ./cache/9109.txt
txt: ./txt/9109.txt
summary: If you want to let readers know that your site has other related The book that you translated needs to be in the public domain, and we which you worked--it needs to be a pre-1923 or otherwise public domain work to the public domain, or do you want to retain copyright? just need to write the appropriate letter and send the text to us. If you want to release it into the public domain and distribute it gives me pleasure to release this work into the public domain, and I invite Project Gutenberg to publish this public domain edition. that it _is_ a public domain version, we do need a signed letter. into the public domain for Project Gutenberg to publish it? non-exclusive rights to distribute this book in electronic form I hold the copyright on a book, and would like Project Gutenberg Why does PG format texts the way it does?
==== make-pages.sh questions
==== make-pages.sh search
==== make-pages.sh topic modeling corpus
Zipping study carrel
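The corpus-level figures printed under "Building ./etc/reader.txt" follow from the per-item words and flesch values in the reduce.pl output: 38429 + 1278 + 1053 + 1226 = 41,986 words over 4 items, and (81 + 82 + 80 + 62) / 4 = 76.25. The Python sketch below recomputes them as a sanity check. It is not the Distant Reader's own code: the function name summarize is hypothetical, and truncating the averages with int() is an assumption, chosen because it reproduces the logged 10,496 (from 10,496.5) and 76.

```python
# Per-item values copied from the reduce.pl bib output in the log:
# id -> (words, flesch)
items = {
    "48791": (38429, 81),
    "9109": (1278, 82),
    "36616": (1053, 80),
    "14585": (1226, 62),
}


def summarize(items):
    """Recompute the reader.txt summary figures (hypothetical helper).

    Assumption: averages are truncated toward zero with int(); the log
    does not state how the original script rounds.
    """
    n = len(items)
    total_words = sum(words for words, _ in items.values())
    avg_words = int(total_words / n)
    avg_flesch = int(sum(flesch for _, flesch in items.values()) / n)
    return n, total_words, avg_words, avg_flesch


n, total, avg_words, avg_flesch = summarize(items)
print(f"number of items: {n}")
print(f"sum of words: {total:,}")
print(f"average size in words: {avg_words:,}")
print(f"average readability score: {avg_flesch}")
```

Run as-is, it prints the same four summary lines that appear in the log. Python's round() would also give 10,496 here, because it rounds half to even, so the log alone does not show which rule the original script uses.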