GitHub - ericleasemorgan/ojs-toolbox: Given an Open Journal Systems (OJS) root URL and an authorization token, cache all JSON files associated with the given OJS title, and optionally output rudimentary bibliographics in the form of a tab-separated value (TSV) stream.


        OJS Toolbox

Given an Open Journal Systems (OJS) root URL and an authorization token, cache all JSON files associated with the given OJS title, and optionally output rudimentary bibliographics in the form of a tab-separated value (TSV) stream.

OJS is a journal publishing system. [1] It supports a RESTful API allowing the developer to read & write to the system's underlying database. [2] This hack -- the OJS Toolbox -- merely caches & reads the metadata associated with the published issues of a given journal title.

The Toolbox is written in Bash. To cache the metadata, you will need two additional pieces of software installed on your computer: curl and jq. [3, 4] Curl is used to interact with the API. Jq is used to read & parse the resulting JSON streams. If & when you want to transform the cached JSON files into rudimentary bibliographics, then you will also need to install GNU Parallel, a tool which makes parallel processing trivial. [5]
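Before running anything, it can be handy to confirm the three tools are on your PATH. The following is a minimal sketch, not part of the Toolbox; the function name missing_tools is made up for illustration:

```shell
#!/bin/sh
# Print the name of every given tool that cannot be found on the PATH.
# (missing_tools is a hypothetical helper, not something the Toolbox ships.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the tools named above; complain on standard error if any are absent.
missing=$(missing_tools curl jq parallel)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing >&2
fi
```

If nothing is reported, all three tools were found. GNU Parallel is only needed for the TSV step, so its absence does not prevent harvesting.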

Besides the software, you will need three pieces of information. The first is the root URL of the OJS system/title you wish to use. This value will probably look something like this: https://example.com/index.php/foo. Ask the OJS systems administrator regarding the details. The second piece of information is an authorization token. If an "api secret" has been created by the local OJS systems administrator, then each person with an OJS account ought to have been granted a token. Again, ask the OJS systems administrator for details. The third piece of information is the name of a directory where your metadata will be cached. For the sake of an example, assume the necessary values are:

	root URL - https://example.com/index.php/foo
	token - xyzzy
	directory - bar


Once you have gotten this far, you can cache the totality of the issue metadata:

$ ./bin/harvest.sh https://example.com/index.php/foo xyzzy bar


More specifically, harvest.sh will create a directory called bar. It will then determine how many issues exist in the title foo. It will then harvest sets of issue data, parse each set into individual issue files, and save the results as JSON files in the bar directory. You now have a "database" containing all the bibliographic information of a given title.
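To make the "harvest sets of issue data" step more concrete, here is a rough sketch of how paging through the API might work. It is not the code in harvest.sh. The endpoint path (/api/v1/issues), the query parameters (apiToken, count, offset), the itemsMax field, and the page size of 20 are all assumptions based on my reading of the OJS 3.1 API documentation [2], and the function names are invented for illustration:

```shell
#!/bin/sh
# Example values from above.
ROOT='https://example.com/index.php/foo'
TOKEN='xyzzy'

# Print the offset of each page needed to cover TOTAL items, COUNT at a time.
page_offsets() {
    total=$1; count=$2; offset=0
    while [ "$offset" -lt "$total" ]; do
        printf '%s\n' "$offset"
        offset=$((offset + count))
    done
}

# Build the URL for one page of issues. The endpoint and parameter names
# are assumptions, not taken from harvest.sh.
issues_url() {
    printf '%s/api/v1/issues?apiToken=%s&count=%s&offset=%s\n' "$1" "$2" "$3" "$4"
}

# The harvesting loop itself, left commented out because it needs a live
# OJS site; .itemsMax is assumed to hold the total number of issues.
# total=$(curl -s "$(issues_url "$ROOT" "$TOKEN" 1 0)" | jq '.itemsMax')
# for offset in $(page_offsets "$total" 20); do
#     curl -s "$(issues_url "$ROOT" "$TOKEN" 20 "$offset")" > "bar/page-$offset.json"
# done
```

For a title with 45 issues and a page size of 20, page_offsets yields 0, 20, and 40, so three requests would be enough to cover everything.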

For my purposes, I need a TSV file with four columns: 1) author, 2) title, 3) date, and 4) url. Such is the purpose of issues2tsv.sh and issue2tsv.sh. The first script, issues2tsv.sh, takes a directory as input. It then outputs a simple header, finds all the JSON files in the given directory, and passes them along (in parallel) to issue2tsv.sh which does the actual work. Thus, to create my TSV file, I submit a command like this:

$ ./bin/issues2tsv.sh bar > ./bar.tsv
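The header-then-fan-out structure described above can be sketched as follows. This is an illustration, not the contents of issues2tsv.sh; the function names are made up, and I am assuming issue2tsv.sh takes the name of a single JSON file as its only argument:

```shell
#!/bin/sh
# Print the four-column header.
print_header() {
    printf 'author\ttitle\tdate\turl\n'
}

# List the cached JSON files in the given directory, one per line.
list_issue_files() {
    find "$1" -type f -name '*.json' | sort
}

# Hypothetical driver: output the header, then hand each file to
# issue2tsv.sh by way of GNU Parallel, which runs several jobs at once.
issues_to_tsv() {
    print_header
    list_issue_files "$1" | parallel ./bin/issue2tsv.sh {}
}

# Usage, assuming the cache lives in the bar directory:
# issues_to_tsv bar > ./bar.tsv
```

Because the jobs run in parallel, the rows are not guaranteed to come out in file order; sort the result afterwards if order matters.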


The resulting file (bar.tsv) looks something like this:

	author	title	date	url
	Kilgour	The Catalog	1972-09-01	https://example.com/index.php/foo/article/download/5738/5119
	McGee	Two Designs	1972-09-01	https://example.com/index.php/foo/article/download/5739/5120
	Saracevic	Book Reviews	1972-09-01	https://example.com/index.php/foo/article/download/5740/5121


Given such a file, I can easily download the content of a given article, extract any of its plain text, perform various natural language processing tasks against it, text mine the whole, full text index the whole, apply various bits of machine learning against the whole, and in general, "read" the totality of the journal. See The Distant Reader for details. [6]
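As a sketch of the first of those steps, downloading the articles, something like the following ought to work. It is not part of the Toolbox. The function names and the article-N naming scheme are invented for illustration, and I am assuming the TSV has a one-line header with the URL in the fourth column:

```shell
#!/bin/sh
# Print the URL column of the TSV file, skipping the header line.
article_urls() {
    tail -n +2 "$1" | cut -f 4
}

# Download every article listed in the TSV ($1) into a directory ($2).
# Files are simply numbered, since the URLs carry no file names.
download_articles() {
    mkdir -p "$2"
    n=0
    article_urls "$1" | while IFS= read -r url; do
        n=$((n + 1))
        curl --silent --fail --location --output "$2/article-$n" "$url"
    done
}

# Usage:
# download_articles ./bar.tsv ./articles
```

Be kind to the server; for a large title, consider adding a sleep between requests.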

Links

[1] OJS - https://pkp.sfu.ca/ojs/

[2] OJS API - https://docs.pkp.sfu.ca/dev/api/ojs/3.1

[3] curl - https://curl.haxx.se

[4] jq - https://stedolan.github.io/jq/

[5] GNU Parallel - https://www.gnu.org/software/parallel/

[6] Distant Reader - https://distantreader.org


Eric Lease Morgan <emorgan@nd.edu>

October 26, 2019


License: GPL-2.0
    
  