Summary of your 'study carrel'
==============================

This is a summary of your Distant Reader 'study carrel'.

The Distant Reader harvested & cached your content into a collection/corpus. It then applied sets of natural language processing and text mining against the collection. The results of this process were reduced to a database file -- a 'study carrel'. The study carrel can then be queried, thus bringing to light specific characteristics of your collection. These characteristics can help you summarize the collection as well as enumerate things you might want to investigate more closely.

This is a terse narrative report; when processing is complete, you will be linked to a more complete narrative report.

Eric Lease Morgan


Number of items in the collection; 'How big is my corpus?'
----------------------------------------------------------
4


Average length of all items measured in words; "More or less, how big is each item?"
------------------------------------------------------------------------------------
10497


Average readability score of all items (0 = difficult; 100 = easy)
------------------------------------------------------------------
76


Top 50 statistically significant keywords; "What is my collection about?"
-------------------------------------------------------------------------
4 Gutenberg
1 web
1 number
1 internet
1 William
1 Wednesday
1 Shakespeare
1 Sep
1 Public
1 Project
1 Newsletter
1 Microsoft
1 King
1 Jun
1 John
1 Human
1 Henry
1 Genome
1 Etexts
1 Edupage
1 Domain
1 Chromosome
1 Balzac
1 Aug


Top 50 lemmatized nouns; "What is discussed?"
---------------------------------------------
116 month
108 file
93 year
93 book
89 message
80 ebook
74 version
74 internet
65 work
65 information
64 time
62 copyright
60 site
58 computer
55 number
52 volunteer
51 #
50 text
50 company
49 user
49 %
46 service
44 email
44 -
42 mail
42 language
39 edupage
39 e
37 web
36 line
35 system
35 name
35 edition
34 access
33 list
32 domain
31 week
29 people
29 law
29 day
28 today
27 relay
26 thing
26 request
26 newsletter
25 software
25 problem
25 help
23 way
23 story


Top 50 proper nouns; "What are the names of persons or places?"
--------------------------------------------------------------
210 Project
203 Shakespeare
172 Gutenberg
144 William
125 Jun
114 Nov
109 May
109 Apr
92 Aug
82 Sep
73 Oct
64 Human
59 Genome
59 Etexts
55 Henry
53 New
51 King
49 Jan
47 Feb
46 Dec
43 Jul
42 Wednesday
42 Microsoft
42 Etext
40 Chromosome
38 Public
38 Newsletter
36 de
36 Edupage
35 Mar
33 John
33 Domain
32 #
29 Balzac
28 Michael
28 AOL
27 Times
27 B.
26 News
26 Honore
24 Richard
24 Part
24 George
23 Harte
22 II
22 H.
22 Anthony
21 York
21 Susan
21 London


Top 50 personal pronouns; "To whom are things referred?"
--------------------------------------------------------
314 we
299 you
226 it
206 i
80 they
67 me
51 them
34 us
17 yourself
12 he
6 him
5 myself
5 itself
3 one
2 she
1 themselves
1 ourselves
1 ours
1 himself
1 her
1 ff][0ws37xxx.xxx]2262
1 's


Top 50 lemmatized verbs; "What do things do?"
---------------------------------------------
1071 be
361 have
180 do
108 send
106 get
97 say
73 need
69 use
66 subscribe
65 want
63 make
60 go
54 include
51 find
45 post
41 take
41 download
37 know
35 read
35 create
34 work
32 help
31 like
28 see
27 give
24 let
23 start
23 put
22 try
22 end
21 unsubscribe
21 offer
21 call
20 require
20 release
20 lose
20 finish
20 contain
19 write
19 think
19 reserve
19 publish
19 hope
18 mh
18 allow
17 come
17 ask
16 receive
16 delay
16 complete


Top 50 lemmatized adjectives and adverbs; "How are things described?"
---------------------------------------------------------------------
196 not
94 new
83 more
81 just
72 first
71 also
68 other
66 out
61 now
51 many
46 so
45 different
41 only
38 up
36 public
36 free
36 as
35 last
33 directly
31 few
31 available
30 well
28 still
28 own
28 much
28 even
27 next
26 very
26 online
25 most
25 here
24 main
23 good
23 electronic
23 about
22 several
22 nearly
22 long
21 major
20 usually
19 least
19 high
19 great
18 yet
18 back
17 then
17 such
17 possible
17 old
17 hard


Top 50 lemmatized superlative adjectives; "How are things described to the extreme?"
-------------------------------------------------------------------------
15 most
11 least
7 good
4 great
4 Most
2 small
2 simple
2 new
2 late
2 large
2 high
2 early
2 big
1 wide
1 short
1 rich
1 old
1 nice
1 low
1 long
1 few
1 close
1 bad


Top 50 lemmatized superlative adverbs; "How do things do to the extreme?"
------------------------------------------------------------------------
10 most
8 least
2 well
1 fast


Top 50 Internet domains; "What Webbed places are alluded to in this corpus?"
----------------------------------------------------------------------------
7 www.sjmercury.com
6 www.nytimes.com
4 www.latimes.com
4 www.educom.edu
3 www.usatoday.com
3 www.ft.com
2 www.washingtonpost.com
2 www.techweb.com
2 www.msnbc.com
2 promo.net
1 www.waldenfont.com
1 www.pobox.com
1 www.pandemonium.de
1 www.newpaltz.edu
1 www.library.adelaide.edu.au
1 www.investors.com
1 www.informationweek.com
1 www.hatzinikolaou.org
1 www.gutenbergnews.org
1 www.gutenberg.net
1 www.ebay.com
1 www.datacanyon.com
1 www.creativecommons.org
1 www.computer.org
1 www.cddc.vt.edu
1 www.abebooks.com
1 www.1source.com
1 wsj.com
1 digital.library.upenn.edu


Top 50 URLs; "What is hyperlinked from this corpus?"
----------------------------------------------------
4 http://www.educom.edu/web/pubs/pubHomeFrame.html
1 http://www.washingtonpost.com/wp-srv/business/daily/oct99/mci6.htm
1 http://www.washingtonpost.com/wp-srv/business/daily/may99/privacy4.htm
1 http://www.waldenfont.com/gutenberg2
1 http://www.usatoday.com/life/cyber/tech/ctf950.htm
1 http://www.usatoday.com/life/cyber/tech/ctf470.htm
1 http://www.usatoday.com/life/cyber/nb/nb5.htm
1 http://www.techweb.com/wire/story/TWB19990513S0009
1 http://www.techweb.com/news/story/TWB19990726S0003
1 http://www.sjmercury.com/svtech/news/indepth/docs/instan072799.htm
1 http://www.sjmercury.com/svtech/news/breaking/merc/docs/086415.htm
1 http://www.sjmercury.com/svtech/news/breaking/merc/docs/065761.htm
1 http://www.sjmercury.com/svtech/news/breaking/merc/docs/011223.htm
1 http://www.sjmercury.com/svtech/news/breaking/ap/docs/764802l.htm
1 http://www.sjmercury.com/svtech/news/breaking/ap/docs/722931l.htm
1 http://www.sjmercury.com/
1 http://www.pobox.com/~jimhenry/etext/
1 http://www.pandemonium.de/gutenberg/
1 http://www.nytimes.com/library/tech/99/09/biztech/articles/10handspring.html
1 http://www.nytimes.com/library/tech/99/08/cyber/articles/26domain.html
1 http://www.nytimes.com/library/tech/99/08/biztech/articles/16data.html
1 http://www.nytimes.com/library/tech/99/07/biztech/articles/27net.html
1 http://www.nytimes.com/library/tech/99/06/circuits/articles/24spam.html
1 http://www.nytimes.com/library/tech/99/05/biztech/articles/14soft.html
1 http://www.newpaltz.edu/~hathaway/goldenbowl1.html
1 http://www.msnbc.com/news/304583.asp
1 http://www.msnbc.com/news/282421.asp
1 http://www.library.adelaide.edu.au/catalogs/adelaide.html
1 http://www.latimes.com/home/business/t000045210.html
1 http://www.latimes.com/HOME/BUSINESS/t000076061.html
1 http://www.latimes.com/HOME/BUSINESS/t000073861.html
1 http://www.latimes.com/HOME/BUSINESS/t000073171.html
1 http://www.investors.com/
1 http://www.informationweek.com/716/16olocr.htm
1 http://www.hatzinikolaou.org
1 http://www.gutenbergnews.org/documents/pg-40th-anniversary.pdf
1 http://www.gutenberg.net
1 http://www.ft.com/hippocampus/q181aba.htm
1 http://www.ft.com/hippocampus/q15aeee6.htm
1 http://www.ft.com/hippocampus/q14310a.htm
1 http://www.ebay.com
1 http://www.datacanyon.com/mirrors/gutenberg/
1 http://www.creativecommons.org
1 http://www.computer.org/computer/bcsummary.htm
1 http://www.cddc.vt.edu/gutenberg
1 http://www.abebooks.com/
1 http://www.1source.com/~pollarda/textview/
1 http://wsj.com/
1 http://promo.net/pg/volunteer.html
1 http://promo.net/pg/vol/wwwboard/


Top 50 email addresses; "Who are you gonna call?"
-------------------------------------------------
16 listproc@educom.unc.edu
9 listproc@listserv.oit.unc.edu
8 manager@educom.unc.edu
6 hart@pobox.com
5 globaltraveler5565@yahoo.com
5 gehl@educom.edu
5 douglas@educom.edu
4 gutenberg@fireantproductions.com
4 newsscan@newsscan.com
4 editors@newsscan.com
3 owner-gutnberg@listserv.oit.unc.edu
3 hscrr@vgernet.net
3 edupage@educause.edu
3 listserv@listserv.educause.edu
2 newsscan@newsscan.com
2 gehl@newsscan.com
2 douglas@newsscan.com
2 dagnyj@hotmail.com
2 cannona@fireantproductions.com
1 you@yoursite.com
1 webmaster@promo.net
1 ssalade@jump.net
1 rogajin@hotmail.com
1 rdavis@yin.or.jp
1 rburkey@heads-up.com
1 rbenning@tampabay.rr.com
1 razucena@netway.com
1 razucena@gis.net
1 marystarr@earthlink.net
1 leonidas@hatzinikolaou.org
1 latin@lists.colorado.edu
1 joglar@iiqab.csic.es
1 jmendham@interlog.com
1 jimhenry@avana.net
1 jenifer.whiting@ptsem.edu
1 jbickers@ihug.co.nz
1 gutenberg@monogames.com
1 gill@geography.nottingham.ac.uk
1 gbnewby@ils.unc.edu
1 ftp-admin@is.co.za
1 fiji@ayup.limey.net
1 elianab68@yahoo.com
1 dlinsta@robles.callutheran.edu
1 dixonm@access.mountain.net
1 davidr@inconnect.com
1 cyri@juno.com
1 ccx074@coventry.ac.uk
1 ccx074@ccj.coventry.ac.uk
1 brown12@students.uiuc.edu
1 beandp@primenet.com


Top 50 positive assertions; "What sentences are in the shape of noun-verb-noun?"
-------------------------------------------------------------------------------
2 ebooks are available
2 ebooks were available
2 files are large
2 gutenberg is proud
1 * is anyone
1 * like *
1 books were n't
1 copyright were still
1 copyrights were not
1 ebook download sites
1 ebooks are then
1 ebooks are uploaded
1 ebooks were also
1 etexts are all
1 file was never
1 files are not
1 files are vastly
1 files letting users
1 files were also
1 gutenberg has not
1 gutenberg has now
1 gutenberg have volunteers
1 gutenberg sent out
1 information does not
1 information is already
1 internet was unconstitutional
1 messages sent out
1 month done ahead
1 project going in
1 site has other
1 site is ftp
1 version is not
1 version is rlglh10a.txt
1 version was worth
1 work getting volume
1 work is rather
1 works are also
1 works is vital
1 works made accessible
1 year is just


Top 50 negative assertions; "What sentences are in the shape of noun-verb-no|not-noun?"
---------------------------------------------------------------------------------------
1 files are not complete
1 gutenberg has not only
1 version is not ready


A rudimentary bibliography
--------------------------

id = 14585
author = ERPANET
title = ERPANET Case Study: Project Gutenberg
date =
keywords = Gutenberg
summary = Project Gutenberg is one of the earliest web sites on the internet and Project Gutenberg ensures that all eBooks are available freely accessible to the general public in a digitised format. Project Gutenberg must be exceedingly careful to respect U.S. copyright laws regarding the works that they digitise and make when scanning and editing texts for Project Gutenberg to ensure that Project Gutenberg aims to make digitised versions of popular literature digitising the out of print works, Project Gutenberg feels that they Project Gutenberg already has numerous plain text files that are 20-30 Project Gutenberg eBooks are created as plain ASCII text files. Project Gutenberg team will also generate plain ASCII (15) text files. The first is the Project Gutenberg site itself and the other Project Gutenberg uses the unique eBook number as the file name. Therefore, if the eBook is the 10001 plain text file created it will be

id = 48791
author = Hart, Michael
title = Project Gutenberg Newsletters 1999 Thirteen Letters: December 1998 to December 1999
date =
keywords = Aug; Balzac; Chromosome; Domain; Edupage; Etexts; Genome; Gutenberg; Henry; Human; John; Jun; King; Microsoft; Newsletter; Project; Public; Sep; Shakespeare; Wednesday; William; internet; number; web
summary = of our new, and more complete, AND PUBLIC DOMAIN, Shakespeare edition! If the New York Times' estimates of 7 years for information doubling 5. The 49 NEW Project Gutenberg Complete Works of Shakespeare, April Etexts are all done, and we may have already started May. Nov 1998 Locrine/Mucedorus, Shakespeare Apocrypha [1ws48xxx.xxx]1548 Nov 1998 As You Like It, by William Shakespeare [2ws25xxx.xxx]1523 Oct 1998 Love's Labour's Lost, by William Shakespeare [2ws12xxx.xxx]1510 Several new Project Gutenberg sites listed below and more than a whole New index of Project Gutenberg Etexts in Australia **And Now Our List of Current Postings of More Project Gutenberg Etexts** 3,333 Project Gutenberg Etexts online at that time. original Complete works of Shakespeare [Etext #100] but some are new.

id = 36616
author = Lebert, Marie
title = Project Gutenberg 4 July 1971 - 4 July 2011: Album
date =
keywords = Gutenberg
summary = In January 2005, Project Gutenberg reached 15,000 ebooks. In December 2006, Project Gutenberg reached 20,000 ebooks. There were ebooks in 50 languages in December 2006. In December 2006, Mike Cook launched the blog Project Gutenberg News as July 2007 > Project Gutenberg Canada Project Gutenberg sent out 15 million ebooks via CDs and DVDs by snail April 2008 > eBook #25000 > English Book Collectors, by William Younger Project Gutenberg reached 25,000 books in April 2008. Project Gutenberg reached 30,000 books in October 2009. In December 2010, Project Gutenberg offered 33,000 high-quality ebooks, April 2011 > 20,000 ebooks processed by Distributed Proofreaders April 2011 > 30,000 English ebooks in Project Gutenberg The 30,000th English-language ebook was posted on 12 April 2011. In June 2011, the 14 main languages were English (with 30,569 ebooks on 4 July 2011 > 40th anniversary of Project Gutenberg Copyrighted images: @folio Project, Distributed Proofreaders (all

id = 9109
author = Tinsley, Jim
title = The Project Gutenberg FAQ 2002
date =
keywords = Gutenberg
summary = If you want to let readers know that your site has other related The book that you translated needs to be in the public domain, and we which you worked--it needs to be a pre-1923 or otherwise public domain work to the public domain, or do you want to retain copyright? just need to write the appropriate letter and send the text to us. If you want to release it into the public domain and distribute it gives me pleasure to release this work into the public domain, and I invite Project Gutenberg to publish this public domain edition. that it _is_ a public domain version, we do need a signed letter. into the public domain for Project Gutenberg to publish it? non-exclusive rights to distribute this book in electronic form I hold the copyright on a book, and would like Project Gutenberg Why does PG format texts the way it does?