Open Library provides monthly dumps of all its data. Each dump is a tab-separated file with the following columns:
- type: type of record (/type/edition, /type/work, etc.)
- key: unique key of the record (/books/OL1M, etc.)
- revision: revision number of the record
- last_modified: last-modified timestamp
- JSON: the complete record in JSON format
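The following is a minimal Python sketch for reading one of these files. It assumes the file is gzip-compressed (the downloads are usually distributed as `.txt.gz`) and that every line holds exactly the five tab-separated columns listed above. The `DumpRow` type and the `parse_line` and `iter_dump` functions are hypothetical helpers for illustration, not part of any Open Library tooling.

```python
import gzip
import json
from typing import Iterator, NamedTuple


class DumpRow(NamedTuple):
    """One line of an Open Library dump (hypothetical helper type)."""
    type: str
    key: str
    revision: int
    last_modified: str
    record: dict


def parse_line(line: str) -> DumpRow:
    # Split at most four times so the last (JSON) column stays in one piece.
    type_, key, revision, last_modified, raw_json = line.rstrip("\n").split("\t", 4)
    return DumpRow(type_, key, int(revision), last_modified, json.loads(raw_json))


def iter_dump(path: str) -> Iterator[DumpRow]:
    # Assumption: the dump is gzip-compressed. Stream it line by line,
    # because the files are several gigabytes and should not be loaded whole.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield parse_line(line)
```

Typical use would be `for row in iter_dump("path/to/editions_dump.txt.gz"): ...`, where the path is a placeholder. Streaming keeps memory flat regardless of dump size.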
Dumps:
- editions dump (~5.7 GB)
- works dump (~1.6 GB)
- authors dump (~0.3 GB)
- all types dump (~7.8 GB): includes editions, works, authors, redirects, etc.
- complete dump (~18.3 GB): also includes past revisions of all the records in Open Library
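Because the complete dump also contains past revisions, the same key can appear on several lines. The sketch below keeps only the highest revision for each key and can optionally filter on the type column. It assumes rows arrive as five-element tuples in column order from some line parser. `latest_revisions` is a hypothetical helper, and because it holds one entry per key in memory, an external sort or a disk-backed store would be more realistic for the full complete dump.

```python
from typing import Dict, Iterable, Optional, Tuple

# (type, key, revision, last_modified, json_text): the five dump columns
Row = Tuple[str, str, int, str, str]


def latest_revisions(rows: Iterable[Row], only_type: Optional[str] = None) -> Dict[str, Row]:
    """Keep the highest-revision row for each key (hypothetical helper)."""
    latest: Dict[str, Row] = {}
    for row in rows:
        key, revision = row[1], row[2]
        current = latest.get(key)
        if current is None or revision > current[2]:
            latest[key] = row
    if only_type is None:
        return latest
    # Filter only after picking the newest revision, so a record whose latest
    # revision changed type (for example, one later turned into a redirect)
    # is not reported under its old type.
    return {key: row for key, row in latest.items() if row[0] == only_type}
```

Filtering after deduplication, rather than before, is the deliberate design choice here: it reflects each record's current state.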
For past dumps, see: https://archive.org/details/ol_exports?sort=-publicdate
Format of JSON records
JSON schemas for the various record types are available at https://github.com/internetarchive/openlibrary-client/tree/master/olclient/schemata
- Author records: JSON serialization of a /type/author
- Edition records: JSON serialization of a /type/edition
- Work records: JSON serialization of a /type/work
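Edition, work and author records refer to each other by key. The sketch below groups edition keys under the works they belong to. It assumes each edition record has a `key` and an optional `works` list of objects of the form `{"key": "/works/OL1W"}`, so check these field names against the schemata linked above before relying on them. `editions_by_work` is a hypothetical helper that takes the JSON column of each line of the editions dump.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def editions_by_work(edition_json_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each work key to the keys of its editions (hypothetical helper).

    Assumes each edition record has a "key" and an optional "works" list
    shaped like [{"key": "/works/OL1W"}].
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for text in edition_json_lines:
        edition = json.loads(text)
        # Editions without a "works" field are simply skipped.
        for work_ref in edition.get("works", []):
            index[work_ref["key"]].append(edition["key"])
    return dict(index)
```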
OL Covers Dump
:TODO:
History
- Created December 14, 2011
- 16 revisions
February 13, 2020 | Edited by Drini | wording |
February 13, 2020 | Edited by Drini | fix headings |
February 13, 2020 | Edited by Drini | make more concise + add link to past dumps on archive.org |
January 22, 2020 | Edited by Tom Morris | Update dump sizes |
December 14, 2011 | Created by Anand Chitipothu | Documented Open Library Data Dumps |