id: andromedayelton-com-301
author:
title: archival face recognition for fun and nonprofit – andromeda yelton
date:
pages:
extension: .html
mime: text/html
words: 987
sentences: 93
flesch: 76
summary: archival face recognition for fun and nonprofit – andromeda yelton Sadly, because we cannot have nice things, the data sets used for pretrained face recognition embeddings are things like lots of modern photos of celebrities, a corpus which wildly underrepresents 1) archival photos and 2) Black people. For step 1, I'm using DPLA, which has a super straightforward and well-documented API and an easy-to-use Python wrapper (which, despite not having been updated in a while, works just fine with Python 3.6, the latest version compatible with some of my dependencies). For step 3, face recognition, I'm using the steps in the same tutorial, but purely for proof-of-concept — the results are garbage because archival photos from mid-century don't actually look anything like modern-day celebrities. Gotcha 1: If you fetch a page from the API and assume you can treat its contents as an image, you will be sad.
cache: ./cache/andromedayelton-com-301.html
txt: ./txt/andromedayelton-com-301.txt