id: github-com-8326
author:
title: GitHub - ericleasemorgan/htid2books: Given an access key, secret token, and a HathiTrust identifier, output plain text as well as PDF versions of a book.
date:
pages:
extension: .html
mime: text/html
words: 2298
sentences: 278
flesch: 71
summary: GitHub ericleasemorgan/htid2books: Given an access key, secret token, and a HathiTrust identifier, output plain text as well as PDF versions of a book. For example: ./bin/htid2txt.sh 194dfe2bg3 xa5350f0c44548487778e942518a nyp.33433082524681 In this case, the script does the tiniest bit of validation, repeatedly runs a Perl script (htid2txt.pl) to get the OCR of an individual page, caches each result, and, when there are no more pages in the given book, concatenates the cache into a text file saved in the directory named ./books.
cache: ./cache/github-com-8326.html
txt: ./txt/github-com-8326.txt
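
The workflow the summary describes (validate, fetch one page at a time, cache each page, then concatenate) can be sketched roughly as follows. This is an illustrative Python sketch, not the repository's shell and Perl code. The function name harvest_book, the fetch_page callback, the zero-padded cache file names, and the "namespace.id" validation rule are all assumptions made for the example. The sketch also assumes the identifier is safe to use as a file name, and it leaves out the authenticated request that would use the access key and secret token, representing it only by the callback.

```python
from pathlib import Path
from typing import Callable, Optional

# Hypothetical callback type: given an identifier and a 1-based page
# sequence number, return that page's OCR text, or None when the book
# has no such page. A real fetcher would make the authenticated request.
PageFetcher = Callable[[str, int], Optional[str]]


def harvest_book(htid: str, fetch_page: PageFetcher,
                 cache_dir: Path, books_dir: Path) -> Path:
    """Fetch pages one by one, cache each, then concatenate them."""
    # Minimal validation (an assumption): expect a "namespace.id" shape,
    # as in nyp.33433082524681.
    if "." not in htid or htid.startswith(".") or htid.endswith("."):
        raise ValueError(f"not a plausible identifier: {htid!r}")

    page_cache = Path(cache_dir) / htid
    page_cache.mkdir(parents=True, exist_ok=True)

    sequence = 1
    while True:
        page_file = page_cache / f"{sequence:08d}.txt"
        if not page_file.exists():
            # Only fetch pages that are not already cached.
            text = fetch_page(htid, sequence)
            if text is None:
                break  # no more pages in this book
            page_file.write_text(text, encoding="utf-8")
        sequence += 1

    # Zero-padded names sort in page order, so a plain sort is enough.
    books = Path(books_dir)
    books.mkdir(parents=True, exist_ok=True)
    book_file = books / f"{htid}.txt"
    with book_file.open("w", encoding="utf-8") as out:
        for page_file in sorted(page_cache.glob("*.txt")):
            out.write(page_file.read_text(encoding="utf-8"))
    return book_file
```

Because cached pages are skipped on later runs, an interrupted harvest can resume where it stopped. The trade-off in this sketch is that each run still makes one extra request to confirm that no further pages exist.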