id: github-com-9780
author:
title: GitHub - ericleasemorgan/reader: Distant Reader, a tool for using & understanding a corpus
date:
pages:
extension: .html
mime: text/html
words: 1348
sentences: 190
flesch: 69
summary: The Distant Reader CORD is a high-performance computing (HPC) system which: 1) takes an almost arbitrary amount of unstructured data (text) as input and outputs a set of structured data for analysis, and 2) does this work against a specific data set called CORD-19. As an HPC, the Distant Reader CORD is not a single computer program but instead a suite of software composed of many individual scripts and applications. This suite of software will prepare a data set called "CORD-19" for processing with the Distant Reader. As a pre-processing step for the Distant Reader, the suite processes the CORD-19 metadata and its associated JSON files.
cache: ./cache/github-com-9780.html
txt: ./txt/github-com-9780.txt
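
The record above has thirteen named fields, three of which (author, date, pages) are empty. As a minimal sketch of how such a record might be read programmatically, the Python below assumes the record was originally exported as tab-delimited text with a header row; that delimiter is an assumption, not something the record states. The names `parse_records`, `FIELDS`, `NUMERIC`, and `SAMPLE` are hypothetical and chosen only for illustration, and the summary value in the sample is shortened for brevity.

```python
import csv
import io

# Column names taken from the record's header row.
FIELDS = ["id", "author", "title", "date", "pages", "extension", "mime",
          "words", "sentences", "flesch", "summary", "cache", "txt"]

# Columns that hold whole numbers in the record above.
NUMERIC = ("words", "sentences", "flesch")


def parse_records(text):
    """Parse tab-delimited text with a header row into a list of dicts.

    Assumption: fields are separated by tabs and are not quoted.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        for name in NUMERIC:
            # Leave empty cells as None rather than guessing a value.
            row[name] = int(row[name]) if row[name] else None
        records.append(row)
    return records


# Sample built from the record above; the summary is shortened for brevity.
SAMPLE = "\t".join(FIELDS) + "\n" + "\t".join([
    "github-com-9780",
    "",  # author is empty in the record
    "GitHub - ericleasemorgan/reader: Distant Reader, "
    "a tool for using & understanding a corpus",
    "",  # date is empty in the record
    "",  # pages is empty in the record
    ".html",
    "text/html",
    "1348",
    "190",
    "69",
    "The Distant Reader CORD is a high performance computing (HPC) system",
    "./cache/github-com-9780.html",
    "./txt/github-com-9780.txt",
]) + "\n"

record = parse_records(SAMPLE)[0]
print(record["id"], record["words"], record["sentences"], record["flesch"])
```

Empty cells are kept as empty strings (or `None` for the numeric columns) so that missing values such as the author are not silently filled in.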