Summary of your 'study carrel'
==============================

This is a summary of your Distant Reader 'study carrel'. The Distant Reader harvested & cached your content into a collection/corpus. It then applied sets of natural language processing and text mining against the collection. The results of this process were reduced to a database file -- a 'study carrel'. The study carrel can then be queried, thus bringing to light specific characteristics of your collection. These characteristics can help you summarize the collection as well as enumerate things you might want to investigate more closely.

Eric Lease Morgan
May 27, 2019

Number of items in the collection; 'How big is my corpus?'
----------------------------------------------------------
99

Average length of all items measured in words; "More or less, how big is each item?"
-------------------------------------------------------------------------------------
20597

Average readability score of all items (0 = difficult; 100 = easy)
------------------------------------------------------------------
69

Top 50 statistically significant keywords; "What is my collection about?"
--------------------------------------------------------------------------
6 web 6 Library 5 Perl 5 Great 5 Distant 4 xml 4 word 4 text 4 step 4 html 4 VIAF 4 Research 4 RDF 4 MARC 4 Google 4 GitHub 4 Catalogue 3 reader 3 University 3 PDF 3 Libraries 3 Digital 3 Data 3 Dame 3 DPLA 3 Conference 3 Code4Lib 3 Center 3 Alex 2 url 2 uri 2 uncategorized 2 sparql 2 source 2 reading 2 list 2 link 2 internet 2 file 2 etc 2 datum 2 create 2 book 2 Water 2 TFIDF 2 TEI 2 State 2 Solr 2 Services 2 Semantic

Top 50 lemmatized nouns; "What is discussed?"
---------------------------------------------
3319 text 2883 word 2791 datum 2786 library 2598 file 2240 p 2126 content 1992 thing 1789 book 1724 number 1626 collection 1520 time 1509 list 1490 process 1439 set 1382 document 1379 information 1368 result 1288 way 1217 service 1204 sandbox 1168 use 1159 people 1141 computer 1117 source 1046 web 1039 work 1037 system 1025 record 997 application 993 reader 987 search 980 part 972 database 948 example 947 item 934 href="http://dh.crc.nd.edu 926 xml 880 tool 832 software 822 idea 788 water 787 data 781 presentation 771 blog 758 index 751 musing 744 catalog 737 article 703 name

Top 50 proper nouns; "What are the names of persons or places?"
----------------------------------------------------------------
2344 /p 1500 li 1243 # 990 RDF 887 University 809 Library 803 MARC 773 href="http://infomotions.com 720 td 559 p 450 Data 409 Notre 407 Perl 404 Digital 394 Dame 388 Libraries 376 Great 367 src="http://infomotions.com 354 amp 348 Google 340 search.cgi?q 331 Project 315 tcp 310 alex 300 h2 298 zzzz&word 298 concordance/?cmd 297 > 294 Eric 292 nbsp;

Top 50 pronouns; "To whom are things referred?"
-----------------------------------------------
16 # 9 ’s 9 let’s 8 year’s 8 http://example.org/city 6 safe.” 5 em 4 you.” 4 years’ 4 we’ll 4 us” 4 there.” 4 ii 4 herself 4 apache2::const::ok 3 wordclouds 3 useful” 3 solve?” 3 mine 3 http://example.org/europe 3 http://dbpedia.org/resource/walt_disney 2 y’ 2 yours 2 you’s 2 you’re 2 using?” 2 used.” 2 url” 2 unnecessary?” 2 types…
Top 50 lemmatized verbs; "What do things do?"
---------------------------------------------
28301 be 4107 have 3640 use 3482 do 1978 create 1879 make 1537 describe 1466 give 1256 get 1187 link 1129 call 1119 include 1104 see 1084 find 1066 write 1058 read 1049 go 961 provide 879 learn 862 need 822 take 785 come 760 think 646 base 618 work 612 know 611 implement 594 say 544 want 543 publish 522 compare 513 outline 466 denote 457 require 440 understand 439 save 439 contain 428 support 413 enable 397 put 394 intend 389 return 387 search 383 share 379 apply 376 allow 373 begin 371 seem 370 follow 366 ask

Top 50 lemmatized adjectives and adverbs; "How are things described?"
---------------------------------------------------------------------
3748 not 2581 more 1676 then 1610 well 1382 other 1310 as 1093 many 1036 first 971 open 956 most 889 very 883 such 824 digital 806 also 797 good 772 much 731 new 706 great 705 only 698 just 666 up 666 few 614 available 604 full 592 same 559 really 555 different 550 long 545 strong 540 easy 533 here 524 so 521 second 507 next 506 bibliographic 504 possible 425 simple 419 now 405 able 387 out 375 plain 369 - 367 too 366 back 363 specifically 358 traditional 358 instead 344 even 341 about 339 large

Top 50 lemmatized superlative adjectives; "How are things described to the extreme?"
-------------------------------------------------------------------------------------
295 most 181 good 147 least 82 great 74 Most 64 late 58 old 34 big 33 simple 31 tiny 28 brief 27 long 25 large 18 high 14 short 13 manif 10 easy 10 Least 8 wide 8 few 7 small 6 subroutines.pl 6 new 6 full 4 wise 4 strong 4 shabby 4 nice 4 low 4 furth 4 frequent.png">most 3 wherefore’s 3 near 3 common 3 codef 2 ugly 2 tricky 2 tough 2 sru::requ 2 rich 2 reat 2 li 2 hot 2 hard 2 fuzzy 2 fresh 2 fast 2 early 2 close 2 clean

Top 50 lemmatized superlative adverbs; "How do things do to the extreme?"
-------------------------------------------------------------------------
661 most 39 least 24 well 2 hard 2 hadn't 2 fewest 1 phrases.txt">most

Top 50 Internet domains; "What Webbed places are alluded to in this corpus?"
-----------------------------------------------------------------------------
1884 infomotions.com 1074 dh.crc.nd.edu 228 example.org 220 sites.nd.edu 199 blogs.nd.edu 179 www.w3.org 115 sites.tufts.edu 105 github.com 98 bit.ly 94 en.wikipedia.org 75 carrels.distantreader.org 71 maps.google.com 70 purl.org 56 id.loc.gov 55 bibframe.org 51 xmlns.com 48 search.cpan.org 45 simile.mit.edu 35 data.archiveshub.ac.uk 31 cds.crc.nd.edu 29 litablog.org 28 www.loc.gov 28 dbpedia.org 27 journal.code4lib.org 26 www.worldcat.org 25 worldcat.org 23 hdl.handle.net 21 www.flickr.com 21 viaf.org 19 dewey.library.nd.edu 18 www.amazon.com 18 example.com 17 archiveshub.ac.uk 16 www.oclc.org 16 orcid.org 15 techessence.info 15 code.google.com 14 www.library.nd.edu 14 www.catholicresearch.net 14 wifo5-03.informatik.uni-mannheim.de 12 www.assoc-amazon.com 12 labs.regesta.com 12 id.crossref.org 11 voyant-tools.org 11 nlp.stanford.edu 11 farm5.static.flickr.com 10 www.youtube.com 10 www.springerlink.com 10 www.hathitrust.org 10 serials.infomotions.com
Top 50 URLs; "What is hyperlinked from this corpus?"
----------------------------------------------------
28 http://example.org/rome 28 http://example.org/italy 28 http://dh.crc.nd.edu/sandbox/readings/?cmd=term&id=6">Formats/Web 22 http://dh.crc.nd.edu/sandbox/readings/?cmd=term&id=22">Formats/Journal 21 http://dh.crc.nd.edu/sandbox/readings/?cmd=term&id=11">Themes/Data 20 http://example.org/unitedstates 20 http://example.org/paris 20 http://example.org/newyork 20 http://example.org/london 20 http://example.org/france 20 http://example.org/england 20 http://example.org/city 17 http://infomotions.com/alex/">Alex 16 http://example.org/country 14 http://dh.crc.nd.edu/sandbox/readings/?cmd=term&id=18">Formats/Magazine 12 http://example.org/europe 11 http://dh.crc.nd.edu/sandbox/readings/?cmd=term&id=39">Themes/Digital 11 http://dh.crc.nd.edu/sandbox/readings/?cmd=term&id=29">Themes/Libraries 10 http://www.w3.org/1999/02/22-rdf-syntax-ns#" 10 http://simile.mit.edu/2006/01/ontologies/mods3#> 10 http://id.loc.gov/authorities/names/n79089957" 9 http://dh.crc.nd.edu/sandbox/readings/?cmd=term&id=172">Formats/Technical 8 http://xmlns.com/foaf/0.1/gender": 8 http://xmlns.com/foaf/0.1/Person" 8 http://simile.mit.edu/2006/01/ontologies/mods3#> 8 http://purl.org/dc/terms/creator": 8 http://localhost:210/solr'' 8 http://id.loc.gov/authorities/names/n79089957"> 8 http://example.com/index.php/foo 8 http://en.wikipedia.org/wiki/Declaration_of_Independence"> 7 http://sites.tufts.edu/liam/" 6 http://xmlns.com/foaf/0.1/"> 6 http://xmlns.com/foaf/0.1/ 6 http://www.w3.org/2000/01/rdf-schema#" 6 http://www.w3.org/1999/xhtml" 6 http://www.w3.org/1999/02/22-rdf-syntax-ns#type": 6 http://www.w3.org/1999/02/22-rdf-syntax-ns# 6 http://purl.org/dc/terms/" 6 http://purl.org/dc/terms/ 6 http://id.loc.gov/authorities/names/n79089957": 6 http://github.com/codeforkjeff/refine_viaf 6 http://en.wikipedia.org/wiki/Declaration_of_Independence": 6 http://distantreader.org 6 http://dh.crc.nd.edu/sandbox/readings/?cmd=term&id=56">Themes/Information 6 http://dh.crc.nd.edu/sandbox/readings/?cmd=term&id=118">Themes/Text 6 http://dbpedia.org/resource/Walt_Disney 6 http://data.archiveshub.ac.uk/def/> 6 http://bibframe.org/vocab/label> 6 http://bibframe.org/vocab/label> 5 http://sites.tufts.edu/liam/">LiAM

Top 50 email addresses; "Who are you gonna call?"
-------------------------------------------------
48 eric_morgan@infomotions.com 40 emorgan@nd.edu 15 listserv@listserv.nd.edu 5 mwilkens@nd.edu 3 nd-dh@listserv.nd.edu 3 angelfund4code4lib@infomotions.com 2 eric_morgan@ncsu.edu 2 bug-parallel@gnu.org 1 tika-user@lucene.apache.org 1 tika-user-subscribe@lucene.apache.org 1 parallel@gnu.org 1 info-gnu@gnu.org 1 gnu@gnu.org 1 code4lib@listserv.nd.edu
Top 50 positive assertions; "What sentences are in the shape of noun-verb-noun?"
--------------------------------------------------------------------------------
108 text was never 44 # done exit 22 libraries are not 21 document was never 18 > linked data 17 process is not 15 text was originally 14 content is not 14 documents using mallet 14 things are not 12 libraries are uniquely 12 using linked data 11 libraries provide services 8 information have fundamentally 8 libraries are almost 8 libraries do not 8 things are more 8 use is similar 7 books are not 7 collections are empty 7 process is akin 7 services are useless 6 # get input 6 file is not 6 file was never 6 files are not 6 files are tab 6 lists are similar 6 people do not 6 process was not 6 results are not 6 results are then 6 thing is not 5 book is about 5 book is not 5 books are athenian 5 data using elasticsearch 5 data using new 5 document is available 5 process is more 5 results were not 4 # create saxon 4 # do keyword 4 # done exit; find uris 4 book called letters 4 book called walden

Top 50 negative assertions; "What sentences are in the shape of noun-verb-no|not-noun?"
---------------------------------------------------------------------------------------
6 libraries are not really 5 content is not necessarily 4 collections are not really 4 content is not as 4 files are not text 4 files include no graphical 4 process is not perfect 4 process was not too 4 results are not surprising 4 thing do not necessarily 3 content is not really 3 document has no source 3 libraries are no longer 3 process is not difficult 3 results do not really 3 results were not conclusive 3 service was not able 2 book is not very 2 books are not central 2 books are not source 2 collection are not necessarily 2 content is not bibliographic 2 file is not really 2 files are not really 2 files do not always 2 information is not good 2 libraries are not able 2 libraries are not necessarily 2 library is not sure 2 list is not perfect 2 people have no metadata 2 process is not as 2 process is not only 2 process is not scalable 2 result did not necessarily 2 results are not very 2 results were not meaningful 2 thing is not as 2 thing is not obvious 2 thing was not worth 2 things are not just 2 things are not logical 2 things are not quantitive 2 things are not understood.” 1 > provide not only 1 > was not yet
Sizes of items; "Measured in words, how big is each item?"
-----------------------------------------------------------
389347 planet-infomotions-com-3359 300279 planet-infomotions-com-8900 59918 infomotions-com-2987 59918 infomotions-com-9504 13253 infomotions-com-9318 11414 infomotions-com-6757 4666 dh-crc-nd-edu-9558 3844 planet-infomotions-com-9545 3460 tika-apache-org-2948 2496 infomotions-com-9966 2298 github-com-8326 1474 infomotions-com-3769 1460 github-com-8202 1348 github-com-9780 1219 github-com-8025 1214 www-gnu-org-8892 797 www-laurenceanthony-net-8779 739 github-com-2983 625 github-com-7801 582 github-com-379 512 dh-crc-nd-edu-1806 492 pkp-sfu-ca-4628 393 curl-haxx-se-8721 355 infomotions-com-953 354 mallet-cs-umass-edu-3654 343 serials-infomotions-com-5908 325 infomotions-com-3852 313 distantreader-org-6471 313 distantreader-org-7009 301 stedolan-github-io-4569 289 2020-code4lib-org-5785 183 infomotions-com-172 131 bit-ly-5230 105 infomotions-com-3637 88 planet-infomotions-com-4104 58 planet-infomotions-com-7919 37 infomotions-com-555 32 twitter-com-9838 29 dh-crc-nd-edu-7757 28 docs-pkp-sfu-ca-7101 22 youtu-be-1944 10 sites-tufts-edu-6731
bit-ly-8913 infomotions-com-7836 planet-infomotions-com-8963 sites-nd-edu-1179 sites-nd-edu-1522 sites-nd-edu-1664 sites-nd-edu-1720 sites-nd-edu-1886 sites-nd-edu-1918 sites-nd-edu-2178 sites-nd-edu-2246 sites-nd-edu-2497 sites-nd-edu-2573 sites-nd-edu-2650 sites-nd-edu-2818 sites-nd-edu-2908 sites-nd-edu-2910 sites-nd-edu-3073 sites-nd-edu-3118 sites-nd-edu-3187 sites-nd-edu-3469 sites-nd-edu-3471 sites-nd-edu-3574 sites-nd-edu-3585 sites-nd-edu-3678 sites-nd-edu-3721 sites-nd-edu-393 sites-nd-edu-3940 sites-nd-edu-460 sites-nd-edu-5154 sites-nd-edu-5464 sites-nd-edu-6066 sites-nd-edu-6181 sites-nd-edu-6245 sites-nd-edu-6302 sites-nd-edu-6366 sites-nd-edu-6432 sites-nd-edu-6582 sites-nd-edu-6875 sites-nd-edu-755 sites-nd-edu-7631 sites-nd-edu-7840 sites-nd-edu-7928 sites-nd-edu-8089 sites-nd-edu-8419 sites-nd-edu-8448 sites-nd-edu-8489 sites-nd-edu-8691 sites-nd-edu-8707 sites-nd-edu-8762 sites-nd-edu-9146 sites-nd-edu-9191 sites-nd-edu-9996 ucla-zoom-us-1408 www-gutenberg-org-6207 www-gutenberg-org-941 www-xsede-org-5929
Readability of items; "How difficult is each item to read?"
------------------------------------------------------------
91.0 sites-tufts-edu-6731 90.0 twitter-com-9838 87.0 infomotions-com-172 85.0 stedolan-github-io-4569 83.0 dh-crc-nd-edu-7757 81.0 youtu-be-1944 80.0 infomotions-com-555 80.0 tika-apache-org-2948 78.0 serials-infomotions-com-5908 77.0 infomotions-com-3769 77.0 infomotions-com-9966 77.0 planet-infomotions-com-4104 74.0 github-com-2983 74.0 github-com-7801 74.0 planet-infomotions-com-3359 73.0 curl-haxx-se-8721 73.0 github-com-379 73.0 www-gnu-org-8892 72.0 github-com-8025 72.0 github-com-8202 71.0 github-com-8326 69.0 dh-crc-nd-edu-9558 69.0 github-com-9780 69.0 infomotions-com-2987 69.0 infomotions-com-9504 68.0 bit-ly-5230 68.0 distantreader-org-6471 68.0 distantreader-org-7009 67.0 planet-infomotions-com-8900 66.0 2020-code4lib-org-5785 66.0 infomotions-com-6757 65.0 infomotions-com-3637 65.0 www-laurenceanthony-net-8779 63.0 infomotions-com-953 60.0 dh-crc-nd-edu-1806 57.0 infomotions-com-9318 57.0 planet-infomotions-com-9545 52.0 planet-infomotions-com-7919 51.0 infomotions-com-3852 51.0 pkp-sfu-ca-4628 44.0 mallet-cs-umass-edu-3654 30.0 docs-pkp-sfu-ca-7101
bit-ly-8913 infomotions-com-7836 planet-infomotions-com-8963 sites-nd-edu-1179 sites-nd-edu-1522 sites-nd-edu-1664 sites-nd-edu-1720 sites-nd-edu-1886 sites-nd-edu-1918 sites-nd-edu-2178 sites-nd-edu-2246 sites-nd-edu-2497 sites-nd-edu-2573 sites-nd-edu-2650 sites-nd-edu-2818 sites-nd-edu-2908 sites-nd-edu-2910 sites-nd-edu-3073 sites-nd-edu-3118 sites-nd-edu-3187 sites-nd-edu-3469 sites-nd-edu-3471 sites-nd-edu-3574 sites-nd-edu-3585 sites-nd-edu-3678 sites-nd-edu-3721 sites-nd-edu-393 sites-nd-edu-3940 sites-nd-edu-460 sites-nd-edu-5154 sites-nd-edu-5464 sites-nd-edu-6066 sites-nd-edu-6181 sites-nd-edu-6245 sites-nd-edu-6302 sites-nd-edu-6366 sites-nd-edu-6432 sites-nd-edu-6582 sites-nd-edu-6875 sites-nd-edu-755 sites-nd-edu-7631 sites-nd-edu-7840 sites-nd-edu-7928 sites-nd-edu-8089 sites-nd-edu-8419 sites-nd-edu-8448 sites-nd-edu-8489 sites-nd-edu-8691 sites-nd-edu-8707 sites-nd-edu-8762 sites-nd-edu-9146 sites-nd-edu-9191 sites-nd-edu-9996 ucla-zoom-us-1408 www-gutenberg-org-6207 www-gutenberg-org-941 www-xsede-org-5929

Item summaries; "In a narrative form, how can each item be abstracted?"
------------------------------------------------------------------------

2020-code4lib-org-5785
Put another way, the Reader consumes just about any number of files in just about any format, and it outputs plain text files, delimited files, a relational database, and a set of HTML reports, all for the purposes of systematic reading. The first half of this workshop will be on the use of the Distant Reader. Attendees will learn how to submit content to the Reader, and then how to interact with the HTML reports. The second half of the workshop will be on hacking the Reader's structured data. Given the plain text files, tab-delimited files, and relational database the Reader also outputs, attendees will learn how to do various visualizations against the data, subset the data with SQL, index the data with Solr, normalize the data with OpenRefine, use machine learning against the data, etc., all for the purposes of more in-depth analysis.
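Where the workshop above mentions subsetting the Reader's structured data with SQL, a minimal sketch of such a query follows. It assumes the carrel's "cross-platform database file" is SQLite and uses made-up file, table, and column names (etc/reader.db, wrd, lemma); consult an actual carrel for its real schema.

  # A minimal sketch, not the Reader's documented schema: assume the carrel's
  # database is a SQLite file and that word-level data lives in a hypothetical
  # table named wrd with a column named lemma.
  sqlite3 etc/reader.db <<'SQL'
  .mode tabs
  .headers on
  -- tabulate the most frequent lemmas, much like the "Top 50" lists above
  SELECT lemma, COUNT(*) AS frequency
    FROM wrd
   GROUP BY lemma
   ORDER BY frequency DESC
   LIMIT 10;
  SQL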
bit-ly-5230
Link Grabber (Chrome Web Store), offered by Don, 70,000+ users. Overview: an easy-to-use extractor or grabber for hyperlinks on an HTML page. Extract links from an HTML page and display them in another tab. Features: * Requires no special permissions * No usage information / analytics are collected from you * Use either the browser action button or a context menu item to activate * Configurable list of blocked domains * Filter links by substring match * Copy links to clipboard * Show/hide links that appear more than once on the page * Show/hide links to the same hostname as the source page * Group links by domain. Thanks to: Icons by FatCow http://www.fatcow.com/free-icons, React http://facebook.github.io/react/, Twitter Bootstrap http://twitter.github.com/bootstrap. Additional information: Version: 0.5.2; Updated: November 10, 2017; Size: 199KiB; Language: English (United States).

bit-ly-8913

curl-haxx-se-8721
curl supports SSL certificates, HTTP POST, HTTP PUT, and FTP uploading; curl is used in command lines or scripts to transfer data. Who makes curl? curl is free and open source software. What's the latest curl? The most recent stable version is 7.73.0, released on 14th of October 2020. Currently, 90 of the listed downloads are of the latest version. Time to donate to the curl project? Check out the latest source code. Everything curl is a detailed guide to everything there is to know about curl, libcurl, and the associated project. Learn how to use curl, and perhaps how the curl project accepts contributions. Everything curl is itself an open project that accepts your contributions and help.
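The curl summary above lists the protocols and verbs the tool supports; the lines below are a small, hedged illustration of that kind of use. The URLs are placeholders (example.org), not addresses drawn from this corpus.

  # Fetch a page, POST a form, and upload a file over FTP -- the sorts of
  # transfers described above. All addresses are placeholders.
  curl -s -o page.html https://example.org/                          # plain HTTP GET
  curl -s -X POST -d 'q=distant+reading' https://example.org/search  # HTTP POST
  curl -s -T carrel.zip ftp://example.org/uploads/                   # FTP upload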
dh-crc-nd-edu-1806
Once parts-of-speech are denoted, a reader can begin to analyze a text on a … A student here at Notre Dame wants to do computer and text mining analysis against a set of websites. Beth Plale and Yiming Sun, both from the HathiTrust Research Center, came to Notre Dame on Tuesday (May 7) to give the digital humanities group an update on some of the things happening at the Center. This posting documents some … In his words, "I will explain how practices such as text mining present a fundamental challenge …" This Friday (April 12) the Notre Dame Digital Humanities group will be sponsoring a lunchtime presentation by Matthew Sag called Copyright And The Digital Humanities: "I will explain how practices such as text mining present a fundamental challenge to our …"

dh-crc-nd-edu-7757
Project Gutenberg Home. Get URLs. This is a selected full-text index to the content of Project Gutenberg. Enter a query. Query: Eric Lease Morgan, April 30, 2019

dh-crc-nd-edu-9558
Once parts-of-speech are denoted, a reader can begin to analyze a text on a dimension beyond the simple tabulating of words. Beth Plale and Yiming Sun, both from the HathiTrust Research Center, came to Notre Dame on Tuesday (May 7) to give the digital humanities group an update on some of the things happening at the Center. In his words, "I will explain how practices such as text mining present a fundamental challenge to our understanding of copyright law and what this means for scholars in the digital humanities." To answer his own question, Sag does not believe processes like text mining violate copyright because the results are generated automatically — created by machines.

distantreader-org-6471
Distant Reader Gateway. The Distant Reader is a tool for reading. The Distant Reader empowers you to use & understand large amounts of textual information both quickly & easily. Technically speaking, the Distant Reader is a system which locally harvests/caches content you specify. It then transforms the content into plain text, performs sets of natural language processing & text mining against the text, saves the results in a number of formats, reduces the whole to a cross-platform database file, queries the database thus summarizing the collection, zips the results of the entire process into a single file, and makes the file available to you for further investigation -- "reading". Sample output of the Reader ("study carrels"). I don't know about you, but now-a-days I can find plenty of scholarly & authoritative content. The Distant Reader is intended to address this question by making observations against a corpus and providing tools for interpreting the results.

distantreader-org-7009
Distant Reader Gateway. The Distant Reader is a tool for reading. The Distant Reader empowers you to use & understand large amounts of textual information both quickly & easily. Technically speaking, the Distant Reader is a system which locally harvests/caches content you specify. It then transforms the content into plain text, performs sets of natural language processing & text mining against the text, saves the results in a number of formats, reduces the whole to a cross-platform database file, queries the database thus summarizing the collection, zips the results of the entire process into a single file, and makes the file available to you for further investigation -- "reading". Sample output of the Reader ("study carrels"). I don't know about you, but now-a-days I can find plenty of scholarly & authoritative content. The Distant Reader is intended to address this question by making observations against a corpus and providing tools for interpreting the results.

docs-pkp-sfu-ca-7101
REST API Reference, 3.1.x. Open Journal Systems Community Documentation, Interest Group, Contributing Documentation, Translating Guide, Community Forum, Public Knowledge Project, PKP|Publishing Services, Contact Us.

github-com-2983
GitHub ericleasemorgan/reader-toolbox: A suite of scripts used to report on and analyze the content of Distant Reader "study carrels". GitHub is home to over 50 million developers working together to host and review code, manage projects, and build software together.
github-com-379
GitHub ericleasemorgan/reader-gutenberg: A system for implementing an index to Project Gutenberg.

github-com-7801
GitHub ericleasemorgan/reader-lite: Given a file and a directory, output analysis of the file to the directory.

github-com-8025
GitHub ericleasemorgan/ojs-toolbox: Given an Open Journal Systems (OJS) root URL and an authorization token, cache all JSON files associated with the given OJS title, and optionally output rudimentary bibliographics in the form of a tab-separated value (TSV) stream.
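The ojs-toolbox summary above describes caching JSON from an Open Journal Systems installation given a root URL and an authorization token. The request below sketches that idea; the journal URL and token are invented, and the exact endpoint and token mechanics vary by OJS version, so treat this as an illustration rather than the toolbox's own code.

  # Hypothetical OJS root URL and API token; the endpoint shown here
  # (api/v1/submissions) follows the general shape of the OJS REST API but
  # may differ across versions.
  OJS_ROOT='https://journal.example.org/index.php/myjournal'
  TOKEN='0123456789abcdef'
  curl -s "$OJS_ROOT/api/v1/submissions?apiToken=$TOKEN" > submissions.json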
github-com-8202
GitHub senderle/topic-modeling-tool: A point-and-click tool for creating and analyzing topic models produced by MALLET. The Topic Modeling Tool now has native Windows and Mac apps. $ cd topic-modeling-tool/TopicModelingTool. senderle.github.io/topic-modeling-tool/documentation/2017/01/06/quickstart.html

github-com-8326
GitHub ericleasemorgan/htid2books: Given an access key, secret token, and a HathiTrust identifier, output plain text as well as PDF versions of a book. For example, ./bin/htid2txt.sh 194dfe2bg3 xa5350f0c44548487778e942518a nyp.33433082524681. In this case, the script will do the tiniest bit of validation, repeatedly run a Perl script (htid2txt.pl) to get the OCR of an individual page, cache the result, and when there are no more pages in the given book, concatenate the cache into a text file saved in the directory named ./books.

github-com-9780
GitHub ericleasemorgan/reader: Distant Reader, a tool for using & understanding a corpus. The Distant Reader CORD is a high performance computing (HPC) system which: 1) takes an almost arbitrary amount of unstructured data (text) as input and outputs a set of structured data for analysis, and 2) does this work against a specific data set called CORD-19. As an HPC, the Distant Reader CORD is not a single computer program but instead a suite of software comprised of many individual scripts and applications. This suite of software will prepare a data set called "CORD-19" for processing with the Distant Reader. As a pre-processing step for the Distant Reader, the suite processes the CORD-19 metadata and its associated JSON files.

infomotions-com-172
Index of /sandbox. Home Alex Catalogue Serials Blog Musings Sandbox. This is the Infomotions Sandbox, a place where we do applied research & development. The things found in here are experimental and may not work correctly. Your mileage may vary, but they ought to be fun anyway.
Name Last modified Size Description Parent Directory C4LJ/ 2008-05-24 13:17 alex-lite/ 2011-04-11 20:15 alex/ 2009-09-24 20:12 bibframe/ 2016-03-06 15:40 blues/ 2020-10-12 18:16 books/ 2014-12-13 07:11 concordance/ 2009-06-13 07:46 great-books-redux/ 2013-04-17 02:12 great-books/ 2010-12-31 10:39 gutenberg-index/ 2019-05-04 17:18 gutenberg/ 2009-01-19 08:57 liam/ 2018-01-04 10:47 mbooks/ 2014-11-08 20:20 mine-alamw11/ 2014-11-03 21:23 mine-mail/ 2014-11-03 21:33 mylibrary/ 2007-10-24 20:15 solr-sru/ 2014-09-22 21:59 timeline/ 2014-03-19 11:03. Author: Eric Lease Morgan. Date created: 2009-01-19 (Martin Luther King Day). Date updated: 2009-01-19. URL: http://infomotions.com/sandbox/

infomotions-com-2987
To all these ends, Voyant Tools counts & tabulates the frequencies of words, plots the results in a number of useful ways, supports topic modeling, and supports the comparison of documents across a corpus. This essay describes, illustrates, and demonstrates how the Digital Public Library of America (DPLA) can build on the good work of others who support the creation and maintenance of collections and provide value-added services against texts — a concept we call "use & understand". More specifically, this proposal assumes the collections of the DPLA include things like but not necessarily limited to: digitized versions of public domain works, the full text of open access scholarly journals and/or trade magazines, scholarly and governmental data sets, theses & dissertations, a substantial portion of the existing United States government documents, the archives of selected mailing lists, and maybe even the archives of blog postings and Twitter feeds.

infomotions-com-3637
Home Serials Blog Musings Sandbox Alex Catalogue. Browse by author, browse by title, browse by tag, about the Catalogue. Alex Catalogue of Electronic Texts: This is a collection of public domain and open access documents with a focus on American and English literature as well as Western philosophy. Its purpose is to help facilitate a person's liberal arts education. "Big ideas don't fit on a mobile." Discover what books you consider "great". Take the Great Books Survey. Creator: Eric Lease Morgan. Date created: 1994-07-23. Date updated: 2014-12-12. URL: http://infomotions.com/alex

infomotions-com-3769
Fun with RSS and the RSS aggregator called Planet – Infomotions Mini-Musings. This posting outlines how I refined a number of my RSS feeds and then aggregated them into a coherent whole using Planet. The result is a fledgling system I call "What's Eric Reading?" Since I wanted to share my wealth (after all, I am a librarian), I created an RSS feed against this system too. I went back to my water collection and created a full-fledged RSS feed against it as well. A couple of years ago the Code4Lib community created an RSS "planet" called Planet Code4Lib — "Blogs and feeds of interest to the Code4Lib community, aggregated." I think it is maintained by Jonathan Rochkind, but I'm not sure. Use the Planet software to aggregate RSS fitting your library's collection development policy.

infomotions-com-3852
With more than twenty years of experience, Infomotions can help you, your staff, and your fellow employees learn about, create, and maintain digital library collections and services that are usable, scalable, sustainable, and relevant to your patrons. For example, Infomotions has been practicing open access publishing and open source software distribution for more than fifteen years.
All of our articles, presentations, workshops, handouts, travel logs, and software are freely available through our Musings on Information and Librarianship. For example, try searching the Musings for articles, librarians, libraries, and librarianship, presentations, or travel logs. Alex Catalogue of Electronic Texts: a collection of "great" American and English literature as well as Western philosophy. Mr. Serials Collection: a set of library-related electronic serials. If you think Infomotions can assist you and your organization with your digital library collections and services, then don't hesitate to drop us a line. eric_morgan@infomotions.com

infomotions-com-555
Home Alex Catalogue Serials Water Blog Musings Planet Sandbox. Water Collection: Alas, as of December 13, 2014, my water collection has gone off-line, but you can read about it in a series of blog postings.

infomotions-com-6757
It means PDF files need to have been "born digitally" or they need to have been processed with optical character recognition (OCR), and then … Creating a plain text version of a corpus with Tika. This essay describes, illustrates, and demonstrates how the Digital Public Library of America (DPLA) can build on the good work of others who support the creation and maintenance of collections and provide value-added services against texts — a concept we call "use & understand". I decided to give it a whirl and participate in the DPLA Beta Sprint, and below is my submission: DPLA Beta Sprint Submission. My DPLA Beta Sprint submission will describe and demonstrate how the digitized versions of library collections can be made more useful through the application of text mining and various other digital humanities … This posting describes the initial process I am using to do such a thing, but the important thing to note is that this process is more about librarianship than it is … Collecting the Great Books.

infomotions-com-7836

infomotions-com-9318
Keywords: user-centered design; SOCHE; presentations; librarianship; Source: This essay was never formally published, but it was created for the Southwestern Ohio Council for Higher Education (SOCHE) and a conference called 'The Human Face of Information (technology)', Wednesday, May 6, 2009 at Wright State University. In a sentence, I learned two things: 1) institutional repository software such as Fedora, DSpace, and EPrints is increasingly being used for more than open access publishing efforts, and 2) the Web Services API of Fedora makes it relatively easy for developers using any programming language to interface with the underlying core. Keywords: Gruene, Texas; institutional repositories; digital libraries; travel log; Source: This file was never formally published. The purpose of OCKHAM is to articulate and design a set of "light-weight reference models" for creating and maintaining digital library services and collections. Keywords: OCKHAM (Open Community Knowledge Hypermedia Administration and Metadata); Atlanta, GA; travel log; Source: This text was never published.

infomotions-com-9504
To all these ends, Voyant Tools counts & tabulates the frequencies of words, plots the results in a number of useful ways, supports topic modeling, and supports the comparison of documents across a corpus.
This essay describes, illustrates, and demonstrates how the Digital Public Library of America (DPLA) can build on the good work of others who support the creation and maintenance of collections and provide value-added services against texts — a concept we call "use & understand". More specifically, this proposal assumes the collections of the DPLA include things like but not necessarily limited to: digitized versions of public domain works, the full text of open access scholarly journals and/or trade magazines, scholarly and governmental data sets, theses & dissertations, a substantial portion of the existing United States government documents, the archives of selected mailing lists, and maybe even the archives of blog postings and Twitter feeds.

infomotions-com-953
Browse by date, browse by subject. Infomotions' Musings on Information and Librarianship: This is a collection of the things I've written — my musings. It includes pre-edited as well as formally published articles, travel logs, descriptions of software applications, and the hand-outs of workshops and presentations. Adding Internet resources to our OPACs. Description: This essay advocates the addition of bibliographic records describing Internet-based electronic serials and Internet resources in general to library online public access catalogs (OPAC), addresses a few implications of this proposition, and finally, suggests a few solutions to accomplish this goal. Subject(s): cataloging; articles; URL: http://infomotions.com/musings/adding-internet-resources/ A lot of the time, this means thinking, studying, writing, sharing, and repeating the process. I believe it is important to share one's ideas freely. This collection is a manifestation of that idea. To these ends I am sharing the texts in this collection with you. URL: http://infomotions.com/musings/

infomotions-com-9966
Michael Hart in Roanoke (Indiana) – Infomotions Mini-Musings. On Saturday, February 27, Paul Turner and I made our way to Roanoke (Indiana) to listen to Michael Hart tell stories about electronic texts and Project Gutenberg. To celebrate its 100th birthday, the Roanoke Public Library invited Michael Hart of Project Gutenberg fame to share his experience regarding electronic texts in a presentation called "Books & eBooks: Past, Present & Future Libraries". "The things Project Gutenberg creates are electronic texts, not ebooks." Maybe I should have phrased it differently and asked him, the way Paul did, to compare the experience of reading physical books and electronic texts. Posted on March 7, 2010; updated March 11, 2010. Author: Eric Lease Morgan. Categories: Alex Catalogue, Travelogues. Tags: Michael Hart, Project Gutenberg, Roanoke (Indiana). 5 thoughts on "Michael Hart in Roanoke (Indiana)": As president of the Roanoke Public Library Board, I appreciate what you have written and how the Michael Hart presentation impacted you.
mallet-cs-umass-edu-3654
MAchine Learning for LanguagE Toolkit. MALLET is open source software; for research use, please remember to cite MALLET. MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text. MALLET includes sophisticated tools for document classification: efficient routines for converting text to "features". In addition to classification, MALLET includes tools for sequence tagging. Topic models are useful for analyzing large collections of text, and the MALLET topic modeling toolkit contains efficient implementations. Many of the algorithms in MALLET depend on numerical optimization; MALLET includes an efficient implementation of Limited Memory BFGS. In addition to sophisticated machine learning applications, MALLET includes routines for transforming text documents into numerical representations. [Quick Start] [Developer's Guide]. An add-on package to MALLET, called GRMM, contains support for inference in general graphical models. "MALLET: A Machine Learning for Language Toolkit."
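The MALLET summary above mentions its topic modeling toolkit; a typical command-line session looks roughly like the sketch below. The directory and file names (./txt, corpus.mallet, keys.txt, topics.txt) are placeholders.

  # Import a directory of plain text files, then train a ten-topic model.
  # The paths are placeholders; the options shown are MALLET's standard
  # import-dir and train-topics switches.
  bin/mallet import-dir --input ./txt --output corpus.mallet \
    --keep-sequence --remove-stopwords
  bin/mallet train-topics --input corpus.mallet --num-topics 10 \
    --output-topic-keys keys.txt --output-doc-topics topics.txt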
pkp-sfu-ca-4628
PKP is a multi-university initiative developing (free) open source software and conducting research to improve the quality and reach of scholarly publishing. Public Knowledge Project > Open Journal Systems. Open Journal Systems (OJS) is an open source software application for managing and publishing scholarly journals. Originally developed and released by PKP in 2001 to improve access to research, it is the most widely used open source journal publishing platform in existence, with over 10,000 journals using it worldwide. PKP Publishing Services also offers a fee-based service which provides the installation and hosting of OJS, as well as performing daily backups of your data, applying security patches and upgrades, and priority answering of your support questions. All revenue generated by the hosting service goes into developing PKP software and supporting the Public Knowledge Project. For support with PKP software we encourage users to consult our documentation and search our support forums.

planet-infomotions-com-3359
Next steps include: calculating an integer denoting the number of pages in an item, implementing a Web-based search interface to a subset's full text as well as metadata, and putting the source code (written in Python and Bash) on GitHub. After that I need to: identify more robust ways to create subsets from the whole of EEBO, provide links to the raw TEI/XML as well as HTML versions of items, implement quite a number of cosmetic enhancements, and most importantly, support the means to compare & contrast items of interest in each subset. The next steps are numerous and listed in no priority order: putting the whole thing on GitHub, outputting the reports in generic formats so other things can easily read them, improving the terminal-based search interface, implementing a Web-based search interface, writing advanced programs in R that chart and graph analysis, providing a means for comparing & contrasting two or more items from a corpus, indexing the corpus with a (real) indexer such as Solr, writing a "cookbook" describing how to use the browser to do "kewl" things, making the metadata of corpora available as Linked Data, etc.

planet-infomotions-com-4104
Eric Lease Morgan's Writings Timeline. This is a timeline of my writings to date. (Well, the vast majority of 'em.) Click & drag or use your mouse wheel to navigate backwards and forwards through time. Click on an item to read a synopsis or link to the full text. See also the "planet" for a textual view. For more information see the blog posting. Author: Eric Lease Morgan. Date created: December 20, 2010. Date updated: June 4, 2011. URL: http://planet.infomotions.com/timeline/

planet-infomotions-com-7919
Planet Eric Lease Morgan, http://planet.infomotions.com/. Feeds: Catholic Portal; DH @ Notre Dame (DH Blog @ Notre Dame); LiAM: Linked Archival Metadata; Life of a Librarian (Days in the Life of a Librarian); Mini-musings (Infomotions Mini-Musings); Musings (Infomotions' Musings on Information and Librarianship); Readings (What's Eric Reading?); Water collection (Water de Jour).

planet-infomotions-com-8900
Next steps include: calculating an integer denoting the number of pages in an item, implementing a Web-based search interface to a subset's full text as well as metadata, and putting the source code (written in Python and Bash) on GitHub. After that I need to: identify more robust ways to create subsets from the whole of EEBO, provide links to the raw TEI/XML as well as HTML versions of items, implement quite a number of cosmetic enhancements, and most importantly, support the means to compare & contrast items of interest in each subset. The next steps are numerous and listed in no priority order: putting the whole thing on GitHub, outputting the reports in generic formats so other things can easily read them, improving the terminal-based search interface, implementing a Web-based search interface, writing advanced programs in R that chart and graph analysis, providing a means for comparing & contrasting two or more items from a corpus, indexing the corpus with a (real) indexer such as Solr, writing a "cookbook" describing how to use the browser to do "kewl" things, making the metadata of corpora available as Linked Data, etc.

planet-infomotions-com-8963

planet-infomotions-com-9545
Rome in three days, an archivist's introduction to linked data publishing; Questions from a library science student about RDF and linked data; Publishing archival descriptions as linked data via databases; Simple linked data recipe for libraries, museums, and archives; Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment; Selected Internet Resources on Digital Research Data Curation; Open source software and libraries: A current SWOT analysis; Web-scale discovery indexes and "next generation" library catalogs; MyLibrary: A digital library framework & toolbox; Open Library Developer's Meeting: One Web Page for Every Book Ever Published; Open source software at the Montana State University Libraries Symposium; Open source software for libraries in 30 minutes; Exploiting "Light-weight" Protocols and Open Source Tools to Implement Digital Library Collections and Services; Open source software in libraries: A workshop; Open source software in libraries; Open source software in libraries; Open source software in libraries.
serials-infomotions-com-5908
Index of / Serials. Electronic serials: This is a loose collection of electronic journals (serials), mostly from the area of library science. As a librarian this sort of information interests me, and that is why it has been collected. The process to create this collection has been coined the Mr. Serials Process. Since fewer and fewer electronic serials are distributed via electronic mail, the Mr. Serials Process is slowly becoming obsolete, but for some things it still works just fine. Read more about the Mr. Serials Process in Eric Lease Morgan, "Description and Evaluation of the 'Mr. Serials' Process: Automatically Collecting, Organizing, Archiving, Indexing, and Disseminating Electronic Serials", Serials Review 21 no. For the latest information regarding Mr. Serials see "Mr. Serials is Dead. Long live Mr. Serials," dated January 11, 2009. Name Last modified Size Description. Author: Eric Lease Morgan. Date created: 1992-06-21. Date updated: 2009-01-12. URL: http://serials.infomotions.com

sites-nd-edu-1179 sites-nd-edu-1522 sites-nd-edu-1664 sites-nd-edu-1720 sites-nd-edu-1886 sites-nd-edu-1918 sites-nd-edu-2178 sites-nd-edu-2246 sites-nd-edu-2497 sites-nd-edu-2573 sites-nd-edu-2650 sites-nd-edu-2818 sites-nd-edu-2908 sites-nd-edu-2910 sites-nd-edu-3073 sites-nd-edu-3118 sites-nd-edu-3187 sites-nd-edu-3469 sites-nd-edu-3471 sites-nd-edu-3574 sites-nd-edu-3585 sites-nd-edu-3678 sites-nd-edu-3721 sites-nd-edu-393 sites-nd-edu-3940 sites-nd-edu-460 sites-nd-edu-5154 sites-nd-edu-5464 sites-nd-edu-6066 sites-nd-edu-6181 sites-nd-edu-6245 sites-nd-edu-6302 sites-nd-edu-6366 sites-nd-edu-6432 sites-nd-edu-6582 sites-nd-edu-6875 sites-nd-edu-755 sites-nd-edu-7631 sites-nd-edu-7840 sites-nd-edu-7928 sites-nd-edu-8089 sites-nd-edu-8419 sites-nd-edu-8448 sites-nd-edu-8489 sites-nd-edu-8691 sites-nd-edu-8707 sites-nd-edu-8762 sites-nd-edu-9146 sites-nd-edu-9191 sites-nd-edu-9996

sites-tufts-edu-6731

stedolan-github-io-4569
Tutorial, Manual, Source. Try online! Linux (64-bit), OS X (64-bit), Windows (64-bit), other platforms, older versions, and source. Try online at jqplay.org! jq is like sed for JSON data: you can use it to slice and filter and map and transform structured data with the same ease that sed does for text. You can download a single binary, scp it to a far away machine of the same type, and expect it to work. jq can mangle the data format that you have into the one that you want, and the program to do so is often shorter and simpler than you'd expect. Go read the tutorial for more, or the manual. See installation options on the download page, and the release notes. jq 1.5 released, including new datetime, math, and regexp functions; see installation options on the releases page. jq 1.4 (finally) released! Get it on the download page. jq 1.3 released.
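As a small illustration of the "sed for JSON" idea in the jq summary above, the lines below filter and reshape a made-up JSON file; the file name, field names, and data are inventions for the example.

  # Create a tiny JSON file, then slice and reshape it with jq.
  echo '[{"title":"Walden","author":"Thoreau"},{"title":"Iliad","author":"Homer"}]' > books.json
  jq -r '.[] | "\(.author)\t\(.title)"' books.json      # tab-delimited author/title list
  jq '[ .[] | select(.author == "Homer") ]' books.json  # keep only one author's entries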
tika-apache-org-2948
This release includes a new artifact to enable starting tika-server as a service (via Eric Pugh), improved detection of zip-based formats, more complex PDF processing options, security fixes, and numerous bug fixes and dependency upgrades. Please see the CHANGES.txt file for a full list of changes in this release, and have a look at the download page for more information on how to obtain Apache Tika 1.2.

twitter-com-9838
We've detected that JavaScript is disabled in your browser. Would you like to proceed to legacy Twitter? Yes. Something went wrong, but don't fret — let's give it another shot.

ucla-zoom-us-1408

www-gnu-org-8892
GNU Project, Free Software Foundation. GNU parallel can then split the input and pipe it into commands in parallel. If you use xargs and tee today you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially, which makes it possible to use output from GNU parallel as input for other programs. For each line of input GNU parallel will execute the command with the line as arguments. If you prefer reading a book, buy GNU Parallel 2018 at https://www.lulu.com/shop/ole-tange/gnu-parallel-2018/paperback/product-23558902.html. See the EXAMPLE sections in man parallel (use LESS=+/EXAMPLE:) and the cheat sheet at https://www.gnu.org/software/parallel/parallel_cheat.pdf. For alternatives to GNU parallel and for the design of GNU parallel, see man parallel_design. The GNU Parallel Citation FAQ. GNU parallel has two mailing lists, one of which is for discussing uses of GNU parallel. You can show your support for GNU parallel using our merchandise. O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.

www-gutenberg-org-6207

www-gutenberg-org-941

www-laurenceanthony-net-8779
Laurence Anthony's AntConc. All previous releases of AntConc can be found at the following link. AntConc 3.2.1 Tutorial (in English). Latest version available here. For example, if you download AntConc 3.5.8, which was released in 2019, you would cite/reference it as follows: AntConc (Version 3.5.8) [Computer Software]. These lists can be imported into AntConc and used as reference corpora word lists to create keyword lists. Brown Corpus word frequency list (lowercase). These can be imported into AntConc to create lemma word lists. An English lemma list based on all words in the BNC corpus with a frequency greater than 2 (created by Laurence Anthony). To use this list, *append* a hyphen (-) and apostrophe (') character to the AntConc token definition to ensure these characters are processed correctly (see global settings).

www-xsede-org-5929

youtu-be-1944
Press Copyright Contact us Creators Advertise Developers Terms Privacy Policy & Safety How YouTube works Test new features © 2020 Google LLC
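Two of the items above, GNU Parallel and Apache Tika, combine naturally when turning a pile of PDF files into the plain text the Distant Reader expects. The one-liner below is a sketch under stated assumptions: the directory name is a placeholder, and the Tika jar's file name depends on the release you have downloaded.

  # Convert every PDF under ./documents to plain text, four jobs at a time.
  # {} is the input file and {.} is the same name minus its extension --
  # standard GNU Parallel replacement strings. Adjust the jar name to match
  # your local copy of the Tika command-line application.
  find ./documents -name '*.pdf' | \
    parallel --jobs 4 'java -jar tika-app-1.24.jar --text {} > {.}.txt'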