id: github-com-275
author:
title: CDX Server API · webrecorder/pywb Wiki · GitHub
date:
pages:
extension: .html
mime: text/html
words: 1709
sentences: 217
flesch: 71
summary:
In addition to replay capabilities, pywb provides an extensive API for querying the capture index (CDX). The API returns information about a range of archive captures (mementos) and supports filtering, sorting, and pagination for bulk queries. pywb itself uses the same API internally, so all index lookups are performed in a consistent way.

For example, the following query might return the first 10 results from host http://example.com/* where the MIME type is text/html:

http://localhost:8080/coll-cdx?url=http://example.com/*&page=1&filter=mime:text/html&limit=10

The cdx-server command-line application starts pywb in CDX-server-only mode: web archive replay functionality is not loaded, only the index.

matchType=exact is the default setting and returns captures that match the URL exactly.

...coll-cdx?url=example.com/*&filter==mime:text/html&filter=!=status:200

This query returns captures from example.com/* where the MIME type is text/html and the HTTP status is not 200.

The CDX server supports an optional pagination API, but it is currently available only when using a ZipNum compressed index instead of plain-text CDX files.

cache: ./cache/github-com-275.html
txt: ./txt/github-com-275.txt
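The query parameters in the summary (url, repeated filter, limit, page) can be assembled programmatically. The Python sketch below is illustrative only: build_cdx_query, matches and apply_filters are hypothetical helper names, the endpoint http://localhost:8080/coll-cdx is taken from the example URL, and the filter semantics (a leading ! inverts, a following = requests an exact match, anything else is treated as a regular expression matched from the start of the field) are an assumed, simplified model of the filter strings shown in the examples, not pywb's own implementation. The sample records are made up.

```python
import re
from urllib.parse import urlencode

# Endpoint from the example URL; "coll" stands for the collection name.
CDX_ENDPOINT = "http://localhost:8080/coll-cdx"


def build_cdx_query(url, filters=(), limit=None, page=None):
    """Build a CDX query URL; each filter string becomes its own filter= param."""
    params = [("url", url)]
    params += [("filter", f) for f in filters]
    if limit is not None:
        params.append(("limit", str(limit)))
    if page is not None:
        # Pagination is only available with a ZipNum compressed index.
        params.append(("page", str(page)))
    # Leave ':', '/', '*', '!' and '=' unescaped so the URL reads like the examples.
    return CDX_ENDPOINT + "?" + urlencode(params, safe=":/*!=")


def matches(record, spec):
    """Assumed, simplified model of one filter string (not pywb's code)."""
    invert = spec.startswith("!")
    if invert:
        spec = spec[1:]
    exact = spec.startswith("=")
    if exact:
        spec = spec[1:]
    # Split "field:pattern" on the first colon only.
    field, _, pattern = spec.partition(":")
    value = record.get(field, "")
    if exact:
        hit = value == pattern
    else:
        hit = re.match(pattern, value) is not None
    # XOR with the invert flag implements the leading '!'.
    return hit != invert


def apply_filters(records, filters):
    """Keep only records that pass every filter (filters combine with AND)."""
    return [r for r in records if all(matches(r, f) for f in filters)]


if __name__ == "__main__":
    print(build_cdx_query("example.com/*", ["=mime:text/html", "!=status:200"]))
    # Hypothetical sample records, not real index data.
    records = [
        {"url": "http://example.com/", "mime": "text/html", "status": "200"},
        {"url": "http://example.com/missing", "mime": "text/html", "status": "404"},
        {"url": "http://example.com/logo.png", "mime": "image/png", "status": "200"},
    ]
    print(apply_filters(records, ["=mime:text/html", "!=status:200"]))
```

Repeated filter= parameters are combined with AND, which is how the second example narrows the results to text/html captures whose status is not 200; with the sample records above, only the .../missing entry survives both filters.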