id: lawlesst-github-io-2692
author: Ted Lawless
title:
date:
pages:
extension: .xml
mime: application/rss+xml
words: 1684
sentences: 82
flesch: 63
summary: We are defining keyphrases as phrases of up to three words that are key, or important, to the overall subject matter of the document. The term keyphrase is often used interchangeably with keyword, but we are opting to use the former since it's more descriptive. We did a fair amount of reading to grasp prior art in this area (extracting keyphrases is a long-standing research topic in information retrieval and natural language processing) and ended up developing a custom solution based on term frequency in the Constellate corpus. If you are interested in this work generally, and not just the Constellate implementation, Burton DeWilde has published an excellent primer on automated keyphrase extraction. Summary: The Datasette API available at https://baseballdb.lawlesst.net now contains the full Lahman Baseball Database. Summary: publishing the Lahman Baseball Database with Datasette API available at https://baseballdb.lawlesst.net. The code below is an example of issuing a query to the Wikidata SPARQL endpoint, loading the data into a pandas DataFrame, and running basic operations on the returned data.
cache: ./cache/lawlesst-github-io-2692.xml
txt: ./txt/lawlesst-github-io-2692.txt