id: github-com-6149
title: twut/usage.md at main · archivesunleashed/twut · GitHub
extension: .html
mime: text/html
words: 2363
sentences: 615
flesch: 67
summary:

Single-column DataFrame containing Tweet IDs.
  Scala DF: ids(tweetsDF).show(2, false)

Single-column DataFrame containing the tweet time.

Single-column DataFrame containing the source of the tweet.

Single-column DataFrame containing urls.
  urls(tweetsDF).show(10, false)
  SelectTweet.urls(df).show(10, False)

Single-column DataFrame containing animated gif urls.
  |https://pbs.twimg.com/tweet_video_thumb/EKyat33U4AEpVFf.jpg|
  |https://pbs.twimg.com/tweet_video_thumb/EKyat33U4AEpVFf.jpg|
  |https://pbs.twimg.com/tweet_video_thumb/EKyQ1fAU8AM7r1I.jpg|
  |https://pbs.twimg.com/tweet_video_thumb/EKyQ1fAU8AM7r1I.jpg|
  |https://pbs.twimg.com/tweet_video_thumb/EKyau1OU8AAD_OZ.jpg|
  |https://pbs.twimg.com/tweet_video_thumb/EKyau1OU8AAD_OZ.jpg|

Single-column DataFrame containing image urls.

Single-column DataFrame containing video urls.
val tweetsDF = spark.read.json(tweets)

In the Scala console, the results are automatically assigned to a variable, like the following:

  val results = spark.read.parquet("/path/to/export/directory/")
  results.show(20, false)

  tweet_ids.write.csv("/path/to/export/directory/")
  tweet_ids.write.csv("/path/to/export/directory/", header='true')
  tweet_ids.write.parquet("/path/to/export/directory/")

  tweet_ids = spark.read.parquet("/path/to/export/directory/")
  tweet_ids.show(20, False)

cache: ./cache/github-com-6149.html
txt: ./txt/github-com-6149.txt
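
As a rough, Spark-free illustration of the Tweet ID extraction and CSV export mentioned above, the sketch below pulls IDs out of line-delimited tweet JSON and serializes them as a one-column CSV, using only the Python standard library. The helper names `tweet_ids_from_lines` and `ids_to_csv` and the sample records are hypothetical, and the `id_str` field and column name are assumptions based on the Twitter v1.1 tweet layout. This is not how twut itself is implemented.

```python
import csv
import io
import json


def tweet_ids_from_lines(lines):
    """Return the id_str of every tweet in an iterable of JSON lines."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        # Assumption: tweets follow the v1.1 layout and expose "id_str".
        if "id_str" in tweet:
            ids.append(tweet["id_str"])
    return ids


def ids_to_csv(ids, header=True):
    """Serialize the IDs as a one-column CSV, optionally with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if header:
        writer.writerow(["id_str"])  # assumed column name
    for tweet_id in ids:
        writer.writerow([tweet_id])
    return buffer.getvalue()


# Hypothetical sample records, not taken from any real dataset.
sample = [
    '{"id_str": "1001", "text": "first"}',
    "",
    '{"id_str": "1002", "text": "second"}',
]
ids = tweet_ids_from_lines(sample)
print(ids_to_csv(ids))
```

Passing `header=False` drops the header row, loosely mirroring the difference between the two `write.csv` calls shown earlier.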