ericleasemorgan/ojs-toolbox (GitHub repository, GPL-2.0 License)

OJS Toolbox

Given an Open Journal Systems (OJS) root URL and an authorization token, cache all JSON files associated with the given OJS title, and optionally output rudimentary bibliographics in the form of a tab-separated value (TSV) stream.

OJS is a journal publishing system. [1] It supports a RESTful API allowing the developer to read & write to the system's underlying database. [2] This hack -- the OJS Toolbox -- merely caches & reads the metadata associated with the published issues of a given journal title.

The Toolbox is written in Bash. To cache the metadata, you will need two additional programs installed on your computer: curl and jq. [3, 4] Curl is used to interact with the API. Jq is used to read & parse the resulting JSON streams. If & when you want to transform the cached JSON files into rudimentary bibliographics, then you will also need to install GNU Parallel, a tool which makes parallel processing trivial. [5]

Besides the software, you will need three pieces of information. The first is the root URL of the OJS system/title you wish to use. This value will probably look something like this:

    https://example.com/index.php/foo

Ask the OJS systems administrator regarding the details. The second piece of information is an authorization token.
If an "api secret" has been created by the local OJS systems administrator, then each person with an OJS account ought to have been granted a token. Again, ask the OJS systems administrator for details. The third piece of information is the name of a directory where your metadata will be cached. For the sake of an example, assume the necessary values are:

  * root URL - https://example.com/index.php/foo
  * token - xyzzy
  * directory - bar

Once you have gotten this far, you can cache the totality of the issue metadata:

    $ ./bin/harvest.sh https://example.com/index.php/foo xyzzy bar

More specifically, harvest.sh will create a directory called bar. It will then determine how many issues exist in the title foo. It will then harvest sets of issue data, parse each set into individual issue files, and save the result as JSON files in the bar directory. You now have a "database" containing all the bibliographic information of a given title.

For my purposes, I need a TSV file with four columns: 1) author, 2) title, 3) date, and 4) url. Such is the purpose of issues2tsv.sh and issue2tsv.sh. The first script, issues2tsv.sh, takes a directory as input. It then outputs a simple header, finds all the JSON files in the given directory, and passes them along (in parallel) to issue2tsv.sh which does the actual work.
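
The header-then-fan-out pattern just described can be sketched roughly as follows. This is not the code in bin/issues2tsv.sh; it is a minimal illustration under stated assumptions. The function name issues_to_tsv, its WORKER argument, and the default of four jobs are all hypothetical, and xargs -P is swapped in for GNU Parallel so the sketch runs without extra software.

```shell
#!/usr/bin/env bash
# Sketch of the driver pattern only: print a header, then hand each
# cached JSON file to a worker, several at a time.
# NOT the repository's issues2tsv.sh; all names here are hypothetical.

# issues_to_tsv DIRECTORY WORKER [JOBS]
#   DIRECTORY  where the cached JSON files live (e.g. bar)
#   WORKER     command that takes one JSON file name and prints TSV rows
#              (a stand-in for issue2tsv.sh)
#   JOBS       how many workers to run at once (assumed default: 4)
issues_to_tsv() {
    local directory=$1 worker=$2 jobs=${3:-4}

    # Emit the four-column header described in the text.
    printf 'author\ttitle\tdate\turl\n'

    # Find every JSON file and pass each one to the worker.
    # NUL separators keep file names containing spaces intact.
    # xargs -P stands in for GNU Parallel in this sketch.
    find "$directory" -type f -name '*.json' -print0 |
        xargs -0 -n 1 -P "$jobs" "$worker"
}
```

Assuming issue2tsv.sh accepts a single file name as its argument, a call such as `issues_to_tsv bar ./bin/issue2tsv.sh > ./bar.tsv` would exercise the sketch. Because the workers run concurrently, the rows after the header arrive in no particular order; sort the output afterwards if order matters.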

To create my TSV file, I submit a command like this:

    $ ./bin/issues2tsv.sh bar > ./bar.tsv

The resulting file (bar.tsv) looks something like this:

    author     title         date        url
    Kilgour    The Catalog   1972-09-01  https://example.com/index.php/foo/article/download/5738/5119
    McGee      Two Designs   1972-09-01  https://example.com/index.php/foo/article/download/5739/5120
    Saracevic  Book Reviews  1972-09-01  https://example.com/index.php/foo/article/download/5740/5121

Given such a file, I can easily download the content of a given article, extract any of its plain text, perform various natural language processing tasks against it, text mine the whole, full text index the whole, apply various bits of machine learning against the whole, and in general, "read" the totality of the journal. See The Distant Reader for details. [6]

Links

  [1] OJS - https://pkp.sfu.ca/ojs/
  [2] OJS API - https://docs.pkp.sfu.ca/dev/api/ojs/3.1
  [3] curl - https://curl.haxx.se
  [4] jq - https://stedolan.github.io/jq/
  [5] GNU Parallel - https://www.gnu.org/software/parallel/
  [6] Distant Reader - https://distantreader.org

Eric Lease Morgan
October 26, 2019