id: github-com-8714
author:
title: GitHub - DocNow/twarc: A command line tool (and Python library) for archiving Twitter JSON
date:
pages:
extension: .html
mime: text/html
words: 3226
sentences: 451
flesch: 80
summary:
  twarc is a command line tool and Python library for archiving Twitter JSON data.

    twarc search blacklivesmatter > tweets.jsonl
    twarc search '#blacklivesmatter OR #blm to:deray' > tweets.jsonl
    twarc search '#blacklivesmatter' --lang fr > tweets.jsonl
    twarc search blacklivesmatter --geocode 38.7442,-90.3054,1mi > tweets.jsonl

  The filter command will use Twitter's statuses/filter API to collect tweets as they happen.

    twarc filter blacklivesmatter,blm > tweets.jsonl

  Use the follow command line argument if you would like to collect tweets from a given user ID.

    twarc filter --follow 759251 > tweets.jsonl
    twarc filter --locations "\-74,40,-73,41" > tweets.jsonl
    twarc filter blacklivesmatter,blm --follow 759251 > tweets.jsonl

  twarc's hydrate command will read a file of tweet identifiers and write out the tweet JSON for them using Twitter's statuses/lookup API.

  The timeline command will use Twitter's user timeline API to collect the most recent tweets posted by the user indicated by screen_name.

    twarc --app_auth search ferguson > tweets.jsonl
cache: ./cache/github-com-8714.html
txt: ./txt/github-com-8714.txt
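
The commands in the summary redirect their output to tweets.jsonl, and the hydrate command reads a file of tweet identifiers. As a rough illustration of how the two connect, the sketch below pulls the identifier out of each line of such a file. It assumes one JSON object per line and that each object carries Twitter's `id_str` field; the function name `iter_tweet_ids` is hypothetical and not part of twarc.

```python
import json
from typing import Iterable, Iterator


def iter_tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield the id_str of each tweet found in line-oriented JSON.

    Assumption: every non-blank line is one JSON object with an
    "id_str" key, as in Twitter's v1.1 tweet JSON.
    """
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines rather than failing on them.
            continue
        tweet = json.loads(line)
        yield tweet["id_str"]


def write_tweet_ids(source_path: str, target_path: str) -> int:
    """Write one tweet id per line to target_path; return how many were written."""
    count = 0
    with open(source_path, encoding="utf-8") as source, \
            open(target_path, "w", encoding="utf-8") as target:
        for tweet_id in iter_tweet_ids(source):
            target.write(tweet_id + "\n")
            count += 1
    return count
```

Calling `write_tweet_ids("tweets.jsonl", "ids.txt")` would leave a plain list of identifiers in ids.txt, which is the kind of file the hydrate command is described as reading.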