Four curated study carrels: Emma, the Iliad and Odyssey, Moby Dick, and Walden

While here, basking inside the grandeur of the Sainte-Geneviève Library (Paris), I have finished curating four Distant Reader study carrels:

  1. Emma by Jane Austen
  2. The Iliad and the Odyssey by Homer
  3. Moby Dick by Herman Melville
  4. Walden by Henry David Thoreau

Introduction

Distant Reader study carrels are data sets intended to be read by people as well as computers. They are created through the use of a tool of my own design -- the Distant Reader Toolbox. Given an arbitrary number of files in a myriad of formats, the Toolbox caches the files, transforms them into plain text files, performs feature extractions against the plain text, and finally saves the results as sets of tab-delimited files as well as an SQLite database. The files and the database can then be computed against -- modeled -- in a myriad of ways: extents (sizes in words and readability scores), frequencies (unigrams, bigrams, keywords, parts-of-speech, named entities), topic modeling, network analysis, and a growing number of indexes (concordances, full-text searching, semantic indexing, and more recently, large language model embeddings).
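To make the "extents" and "frequencies" ideas concrete, here is a minimal sketch in Python. It is not the Toolbox's own code; the sample sentence and the function names (tokenize, ngrams) are made up for illustration, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))

# A made-up sentence standing in for a plain text file.
text = "The whale and the ship and the sea"

tokens = tokenize(text)
size_in_words = len(tokens)           # an "extent"
unigrams = Counter(tokens)            # single-word frequencies
bigrams = Counter(ngrams(tokens, 2))  # two-word phrase frequencies

print(size_in_words)
print(unigrams.most_common(2))
print(bigrams.most_common(1))
```

Keywords, parts-of-speech, and named entities require more than counting, so they are left out of this sketch.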

I call these data sets "study carrels", and they are designed to be platform- and network-independent. Study carrel functionality requires zero network connectivity, and study carrel files can be read by any spreadsheet, database, analysis program (like OpenRefine), or programming language. Heck, I could even compute against study carrels on my old Macintosh SE 30 (circa 1990) if I really desired. For more detail regarding study carrels, see the readme file included with each carrel. All that said, once a carrel is created it lends itself to all sorts of analysis, automated or not. The "not automated" analysis I call "curation", which is akin to a librarian curating any of their print collections.
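As a hedged illustration of that independence, the following Python sketch reads tab-delimited data and queries an SQLite database using only the standard library. The data, the table name (pos), and the column names (id, token, pos) are hypothetical stand-ins; consult a carrel's readme file for its actual layout.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for one of a carrel's tab-delimited files; the
# column names and values are assumptions made for illustration only.
tsv = (
    "id\ttoken\tpos\n"
    "chapter-01\temma\tPROPN\n"
    "chapter-01\thandsome\tADJ\n"
    "chapter-02\temma\tPROPN\n"
)

# Read the tab-delimited data; each row becomes a dictionary.
rows = list(csv.DictReader(io.StringIO(tsv), delimiter="\t"))

# Load the same rows into an in-memory SQLite database and count tokens.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE pos (id TEXT, token TEXT, pos TEXT)")
connection.executemany("INSERT INTO pos VALUES (:id, :token, :pos)", rows)
counts = connection.execute(
    "SELECT token, COUNT(*) AS n FROM pos GROUP BY token ORDER BY n DESC, token"
).fetchall()

print(counts)
```

With a real carrel, the in-memory string and database would be replaced by paths to the carrel's own files.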

With this in mind, I have curated four study carrels. I divided each of the four books (above) into their individual chapters and created a study carrel from each set of chapters. Against each carrel I applied distant reading, observed the results, summarized my observations, and documented what I learned. Since each curation details what I learned, I won't go into all of it here, but I will highlight some of the results of my topic modeling.

Topic modeling

In a sentence, topic modeling is an unsupervised machine learning process used to enumerate the latent themes in any corpus. Given an integer (T), topic modeling divides a corpus into T topics and outputs the words associated with each topic. Like most machine learning techniques, the topic modeling process is nuanced, and the results are not deterministic; modeling the same corpus twice can yield somewhat different topics. Still, topic modeling can be quite informative. For example, once a model has been created, the underlying documents can be associated with ordinal values (such as dates or sequences of chapters). The model can then be pivoted so the ordinal values and the topic model weights are compared. Finally, the pivoted table can be visualized in the form of a line chart. Thus, a person can address the age-old question, "How did such-and-such a topic ebb and flow over time?" This is exactly what I did with Emma, the Iliad and the Odyssey, Moby Dick, and Walden. In each case, the topic modeling described the ebb and flow of the given book, which helped me characterize each.
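The pivoting step can be sketched in a few lines of Python. This is an illustration only, not the Toolbox's implementation: the per-document weights below are invented, the record layout is hypothetical, and averaging the weights within each chapter is an assumption about how the pivot is aggregated.

```python
from collections import defaultdict

# Invented per-document output of a topic model: each record holds a
# document's ordinal value (a chapter number) and its topic weights.
documents = [
    {"chapter": 1, "weights": {"ahab": 0.2, "whales": 0.8}},
    {"chapter": 1, "weights": {"ahab": 0.4, "whales": 0.6}},
    {"chapter": 2, "weights": {"ahab": 0.9, "whales": 0.1}},
]

def pivot(documents):
    """Return {chapter: {topic: mean weight}}, ordered by chapter."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for document in documents:
        chapter = document["chapter"]
        counts[chapter] += 1
        for topic, weight in document["weights"].items():
            sums[chapter][topic] += weight
    # Assumption: aggregate by taking the mean weight within each chapter.
    return {
        chapter: {topic: total / counts[chapter] for topic, total in sums[chapter].items()}
        for chapter in sorted(sums)
    }

table = pivot(documents)
for chapter, weights in table.items():
    print(chapter, weights)
```

Each row of the resulting table supplies one point per topic; plotting the rows in chapter order gives a line chart of topics over time.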

Emma

I topic modeled Emma with only four topics, and I assert the novel is about "emma", "engagement", "charade", and "jane". Moreover, these topics can be visualized as a pie chart as well as a line chart. Notice how "emma" dominates. From my point of view, it is all about Emma and her interactions/relationships with the people around her. For more elaboration, see the curated carrel.

labels      weights  features
emma        3.0353   emma harriet weston knightley elton time great woodhouse quite nothing dear always
engagement  0.2443   engagement affection attachment snow circumstances happiness behaviour letter feeling heart resolution reflection
charade     0.18906  charade likeness sit eye sea lines alone wingfield maid picture smith's south
jane        0.16189  jane fairfax bates campbell dixon colonel cole dancing campbells instrument dance crown
Figure: topics (./figures/emma_topics.png)
Figure: topics over time (./figures/emma_topics-over-time.png)
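To put a number on how much "emma" dominates, the weights in the table above can be converted into proportions. The sketch below assumes each slice of a pie chart is a topic's weight divided by the sum of all the weights; that is a common convention, not necessarily how the figure was drawn.

```python
# Topic weights copied from the Emma table above.
weights = {
    "emma": 3.0353,
    "engagement": 0.2443,
    "charade": 0.18906,
    "jane": 0.16189,
}

# Assumption: a topic's share is its weight divided by the sum of all weights.
total = sum(weights.values())
shares = {label: weight / total for label, weight in weights.items()}

for label, share in shares.items():
    print(f"{label:<12}{share:6.1%}")
```

Under that assumption, "emma" accounts for more than 80 percent of the total weight.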

Iliad and Odyssey

I did the same thing with the Iliad and the Odyssey, but this time I modeled with a value of eight. From this process, I assert the epic poems are about "man", "trojans", "achaeans", "achilles", "sea", "ulysses", "horses", and "alcinous". This time "man" dominates, and "trojans" and "achaeans" come next. More importantly, plotting the topics over the sequence of the books (time), I can literally see how the two poems are distinct stories; notice how the first part of the line chart is all about "trojans", and the second is all about "man". See the curated carrel for an elaboration.

labels    weights  features
man       1.08328  man house men father ulysses home people gods
trojans   0.38826  trojans spear hector achaeans fight jove ships battle
achaeans  0.16146  achaeans ships agamemnon atreus jove king held host
achilles  0.14746  achilles peleus hector city priam body river women
sea       0.11063  sea ship men circe island cave wind sun
ulysses   0.10853  ulysses telemachus suitors penelope eumaeus stranger house bow
horses    0.10553  horses diomed tydeus nestor agamemnon chariot menelaus ulysses
alcinous  0.0444   alcinous phaeacians clothes stranger demodocus vulcan girl nausicaa
Figure: topics (./figures/homer_topics.png)
Figure: topics over time (./figures/homer_topics-over-time.png)

Moby Dick

I topic modeled Moby Dick with a value of ten, and the resulting topics included: "ahab", "whales", "soul", "pip", "boats", "queequeg", "cook", "whaling", "jonah", and "bildad". The topics of "ahab" and "whales" dominate, and if you know the story, then this makes perfect sense. Topic modeling over time illustrates how the book's themes alternate, and thus I assert the book is not only about Ahab's obsession with the white whale, but it is also about the process of whaling, kinda like an instruction manual. Again, see the curated carrel for an elaboration.

labels    weights  features
ahab      0.99807  ahab man ship sea time stubb head men
whales    0.2336   whales sperm leviathan time might fish world many
soul      0.08189  soul whiteness dick moby brow mild times wild
pip       0.08068  pip carpenter coffin sun fire blacksmith doubloon try-works
boats     0.07229  boats line air spout water oars tashtego leeward
queequeg  0.06133  queequeg bed room landlord harpooneer door tomahawk bedford
cook      0.0569   cook sharks dat blubber mass tun bucket bunger
whaling   0.05234  whaling ships gabriel voyage whale-ship whalers fishery english
jonah     0.04218  jonah god loose-fish fast-fish law shipmates guernsey-man woe
bildad    0.02847  bildad peleg steelkilt sailor gentlemen lakeman radney don
Figure: topics (./figures/moby_topics.png)
Figure: topics over time (./figures/moby_topics-over-time.png)

Walden

Unlike the other books, Walden is not a novel but instead a set of essays. Set against the backdrop of a pond (which we would call a lake), Thoreau elaborates on his observations of nature and what it means to be human. In this case I modeled with seven topics, and the results included: "man", "water", "woods", "beans", "books", "purity", and "shelter". Yet again, the topic of "man" dominates, but notice how closely the chapter titles correspond to the computed topics. As I alluded to previously, pivoting a topic model on some other categorical value often brings very interesting details to light. See the curated carrel for more detail.

labels   weights  features
man      1.64     man life men house time day part world get morning work thought
water    0.56165  water pond ice shore surface walden spring deep bottom snow winter summer
woods    0.2046   woods round fox pine door bird snow evening winter night suddenly near
beans    0.19471  beans hoe fields seed cultivated soil john corn field planted labor dwelt
books    0.19032  books forever words language really things men learned concord intellectual news wit
purity   0.16002  purity evening body warm laws gun humanity streams sensuality hunters vegetation animal
shelter  0.10274  shelter clothes furniture cost labor fuel clothing free houses people works boards
Figure: topics (./figures/walden_topics.png)
Figure: topics over time (./figures/walden_topics-over-time.png)

Summary

I have used both traditional and distant reading against four well-known books. I have documented what I learned, and this documentation takes the form of four curated Distant Reader study carrels. I assert traditional reading's value will never go away. After all, novels and sets of essays are purposely designed to be consumed through traditional ("close") reading. On the other hand, the application of distant reading can quickly and easily highlight all sorts of characteristics which are not, at first glance, very evident. The traditional and distant reading processes complement each other.


Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: This is the original publication.
Date created: 2025-10-13
Date updated: 2025-10-13
Subject(s): Distant Reader; readings; Emma; Walden; Moby Dick; Iliad; Odyssey;
URL: https://distantreader.org/blog/four-curated-carrels/