Four curated study carrels: Emma, the Iliad and Odyssey, Moby Dick, and Walden
While here, basking in the grandeur of the Sainte-Geneviève Library (Paris), I have finished curating four Distant Reader study carrels:
- Emma by Jane Austen
- The Iliad and the Odyssey by Homer
- Moby Dick by Herman Melville
- Walden by Henry David Thoreau
Introduction
Distant Reader study carrels are data sets intended to be read by people as well as computers. They are created through the use of a tool of my own design -- the Distant Reader Toolbox. Given an arbitrary number of files in a myriad of formats, the Toolbox caches the files, transforms them into plain text files, performs feature extractions against the plain text, and finally saves the results as sets of tab-delimited files as well as an SQLite database. The files and the database can then be computed against -- modeled -- in many ways: extents (sizes in words and readability scores), frequencies (unigrams, bigrams, keywords, parts-of-speech, named entities), topic modeling, network analysis, and a growing number of indexes (concordances, full-text searching, semantic indexing, and more recently, large language model embeddings).
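To make the idea of feature extraction a bit more concrete, here is a minimal sketch, using only the Python standard library, of counting unigrams and bigrams in a plain text and saving the results as both tab-delimited text and an SQLite table. It is not the Toolbox's actual code; the function names, the `ngrams` table, its columns, and the sample sentence are all assumptions made for illustration.

```python
import csv
import io
import re
import sqlite3
from collections import Counter


def tokenize(text):
    """Lowercase the text and return a list of alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(tokens, n):
    """Count n-grams, that is, runs of n adjacent tokens."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def to_tsv(counts):
    """Serialize the counts as tab-delimited text, most frequent first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["ngram", "frequency"])
    writer.writerows(counts.most_common())
    return buffer.getvalue()


def to_sqlite(counts, connection):
    """Store the counts in a hypothetical table named 'ngrams'."""
    connection.execute("CREATE TABLE ngrams (ngram TEXT, frequency INTEGER)")
    connection.executemany("INSERT INTO ngrams VALUES (?, ?)", counts.items())
    connection.commit()


# An invented sample; a real carrel would read from its plain text files.
sample = "the whale and the ship and the sea"
tokens = tokenize(sample)
unigrams = count_ngrams(tokens, 1)
bigrams = count_ngrams(tokens, 2)

# An in-memory database keeps the sketch free of side effects.
connection = sqlite3.connect(":memory:")
to_sqlite(bigrams, connection)

print(to_tsv(unigrams))
```

Because the output is plain tab-delimited text plus a standard SQLite file, the same results could be opened in a spreadsheet or queried with SQL without any of the code above.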
I call these data sets "study carrels", and they are designed to be platform- and network-independent. Study carrel functionality requires zero network connectivity, and study carrel files can be read by any spreadsheet, database, analysis program (like OpenRefine), or programming language. Heck, I could even compute against study carrels on my old Macintosh SE/30 (circa 1990) if I really desired. For more detail regarding study carrels, see the readme file included with each carrel. All that said, once a carrel is created, it lends itself to all sorts of analysis, automated or not. The "not automated" analysis I call "curation", which is akin to a librarian curating any of their print collections.
With this in mind, I have curated four study carrels. I divided each of the four books (above) into its individual chapters and created a study carrel from each set of chapters. I then applied distant reading to each carrel, observed the results, summarized my observations, and documented what I learned. Since each curation details what I learned, I won't go into all of it here, but I will highlight some of the results of my topic modeling.
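Dividing a book into chapters can be sketched in a few lines of Python. The heading pattern below (a line reading "CHAPTER" followed by an Arabic or Roman numeral) and the output file names are assumptions for illustration only; each real edition formats its headings differently, and the pattern would need to be adjusted accordingly.

```python
import re

# Hypothetical pattern: a whole line made of "CHAPTER" plus a number or
# Roman numeral, optionally followed by a period.
HEADING = re.compile(r"^CHAPTER[ \t]+[0-9IVXLC]+\.?[ \t]*$", re.MULTILINE)


def split_chapters(text):
    """Return a list of (heading, body) tuples, one per chapter."""
    matches = list(HEADING.finditer(text))
    chapters = []
    for index, match in enumerate(matches):
        start = match.end()
        # A chapter runs until the next heading or the end of the text.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(text)
        chapters.append((match.group().strip(), text[start:end].strip()))
    return chapters


# An invented miniature "book"; anything before the first heading is ignored.
book = """Front matter that is ignored.

CHAPTER I

First chapter text.

CHAPTER II

Second chapter text.
"""

for number, (heading, body) in enumerate(split_chapters(book), start=1):
    # Zero-padded names keep the chapters in reading order when sorted.
    print(f"chapter-{number:03d}.txt", heading, len(body.split()), "words")
```

Keeping the chapters in reading order matters later, because the chapter sequence is what serves as the "time" axis when topics are plotted.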
Topic modeling
In a sentence, topic modeling is an unsupervised machine learning process used to enumerate the latent themes in any corpus. Given an integer (T), topic modeling divides a corpus into T topics and outputs the words associated with each topic. Like most machine learning techniques, the topic modeling process is nuanced, and the results are not deterministic; running the same process twice may produce somewhat different topics. Still, topic modeling can be quite informative. For example, once a model has been created, the underlying documents can be associated with ordinal values (such as dates or sequences of chapters). The model can then be pivoted so the ordinal values and the topic model weights are compared. Finally, the pivoted table can be visualized in the form of a line chart. Thus, a person can address the age-old question, "How did such-and-such a topic ebb and flow over time?" This is exactly what I did with Emma, the Iliad and the Odyssey, Moby Dick, and Walden. In each and every case, my topic modeling described the ebb and flow of the given book, which, in the end, helped me characterize each.
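The pivoting step can be sketched as follows. This is only an illustration of the idea, not the code used for the carrels: the input rows, their layout as (ordinal value, topic label, weight), the choice of averaging, and all the numbers are invented assumptions. The topic model itself is assumed to have been computed already.

```python
from collections import defaultdict

# Hypothetical per-document output of a topic model. Each row is
# (ordinal value, topic label, weight); the numbers are invented.
rows = [
    (1, "trojans", 0.7), (1, "man", 0.3),
    (2, "trojans", 0.6), (2, "man", 0.4),
    (3, "trojans", 0.1), (3, "man", 0.9),
    (3, "trojans", 0.3), (3, "man", 0.7),  # a second document sharing ordinal 3
]


def pivot(rows):
    """Average the topic weights per ordinal value.

    Returns {ordinal: {topic: mean weight}}, with ordinals in sorted order.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for ordinal, topic, weight in rows:
        sums[ordinal][topic] += weight
        counts[ordinal][topic] += 1
    table = {}
    for ordinal in sorted(sums):
        table[ordinal] = {
            topic: sums[ordinal][topic] / counts[ordinal][topic]
            for topic in sums[ordinal]
        }
    return table


table = pivot(rows)
topics = sorted({topic for _, topic, _ in rows})

# Print the pivoted table; each topic column is one line in a line chart.
print("ordinal", *topics, sep="\t")
for ordinal, weights in table.items():
    print(ordinal, *(f"{weights.get(topic, 0.0):.2f}" for topic in topics), sep="\t")
```

Reading down a single column shows how one topic rises or falls across the sequence, which is exactly what a line chart of the pivoted table makes visible.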
Emma
I topic modeled Emma with only four topics, and I assert the novel is about "emma", "engagement", "charade", and "jane". Moreover, these topics can be visualized as a pie chart as well as a line chart. Notice how "emma" dominates. From my point of view, it is all about Emma and her interactions/relationships with the people around her. For more elaboration, see the curated carrel.
labels | weights | features |
emma | 3.0353 | emma harriet weston knightley elton time great woodhouse quite nothing dear always |
engagement | 0.2443 | engagement affection attachment snow circumstances happiness behaviour letter feeling heart resolution reflection |
charade | 0.18906 | charade likeness sit eye sea lines alone wingfield maid picture smith's south |
jane | 0.16189 | jane fairfax bates campbell dixon colonel cole dancing campbells instrument dance crown |
![]() topics | ![]() topics over time |
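To put a number on how much "emma" dominates, the weights in the table above can be converted into shares of the total. This is a small worked sketch; it assumes, without confirmation from the carrel, that the pie chart slices are simply proportional to these weights.

```python
# Topic weights copied from the table above.
weights = {
    "emma": 3.0353,
    "engagement": 0.2443,
    "charade": 0.18906,
    "jane": 0.16189,
}

# Assumption: each slice of the pie is the topic's weight over the total.
total = sum(weights.values())
shares = {label: weight / total for label, weight in weights.items()}

# List the topics from largest share to smallest.
for label, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{label}\t{share:.1%}")
```

Under that assumption, "emma" accounts for roughly 84 percent of the total weight, and the three remaining topics share the rest.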
Iliad and Odyssey
I did the same thing with the Iliad and the Odyssey, but this time I modeled with a value of eight. From this process, I assert the epic poems are about "man", "trojans", "achaeans", "achilles", "sea", "ulysses", "horses", and "alcinous". This time "man" dominates, and "trojans" and "achaeans" follow. More importantly, plotting the topics over the sequence of the books (time), I can literally see how the two poems are distinct stories; notice how the first part of the line chart is all about "trojans", and the second is all about "man". See the curated carrel for an elaboration.
labels | weights | features |
man | 1.08328 | man house men father ulysses home people gods |
trojans | 0.38826 | trojans spear hector achaeans fight jove ships battle |
achaeans | 0.16146 | achaeans ships agamemnon atreus jove king held host |
achilles | 0.14746 | achilles peleus hector city priam body river women |
sea | 0.11063 | sea ship men circe island cave wind sun |
ulysses | 0.10853 | ulysses telemachus suitors penelope eumaeus stranger house bow |
horses | 0.10553 | horses diomed tydeus nestor agamemnon chariot menelaus ulysses |
alcinous | 0.0444 | alcinous phaeacians clothes stranger demodocus vulcan girl nausicaa |
![]() topics | ![]() topics over time |
Moby Dick
I topic modeled Moby Dick with a value of ten, and the resulting topics included: "ahab", "whales", "soul", "pip", "boats", "queequeg", "cook", "whaling", "jonah", and "bildad". The topics of "ahab" and "whales" dominate, and if you know the story, then this makes perfect sense. Topic modeling over time illustrates how the book's themes alternate, and thus I assert the book is not only about Ahab's obsession with the white whale, but it is also about the process of whaling, kinda like an instruction manual. Again, see the curated carrel for an elaboration.
labels | weights | features |
ahab | 0.99807 | ahab man ship sea time stubb head men |
whales | 0.2336 | whales sperm leviathan time might fish world many |
soul | 0.08189 | soul whiteness dick moby brow mild times wild |
pip | 0.08068 | pip carpenter coffin sun fire blacksmith doubloon try-works |
boats | 0.07229 | boats line air spout water oars tashtego leeward |
queequeg | 0.06133 | queequeg bed room landlord harpooneer door tomahawk bedford |
cook | 0.0569 | cook sharks dat blubber mass tun bucket bunger |
whaling | 0.05234 | whaling ships gabriel voyage whale-ship whalers fishery english |
jonah | 0.04218 | jonah god loose-fish fast-fish law shipmates guernsey-man woe |
bildad | 0.02847 | bildad peleg steelkilt sailor gentlemen lakeman radney don |
![]() topics | ![]() topics over time |
Walden
Unlike the other books, Walden is not a novel but instead a set of essays. Set against the backdrop of a pond (but we would call it a lake), Thoreau elaborates on his observations of nature and what it means to be human. In this case I modeled with seven topics, and the results included: "man", "water", "woods", "beans", "books", "purity", and "shelter". Yet again, the topic of "man" dominates, but notice how the chapter titles correspond very closely to the computed topics. As I alluded to previously, pivoting a topic model on some other categorical value often brings very interesting details to light. See the curated carrel for more detail.
labels | weights | features |
man | 1.64 | man life men house time day part world get morning work thought |
water | 0.56165 | water pond ice shore surface walden spring deep bottom snow winter summer |
woods | 0.2046 | woods round fox pine door bird snow evening winter night suddenly near |
beans | 0.19471 | beans hoe fields seed cultivated soil john corn field planted labor dwelt |
books | 0.19032 | books forever words language really things men learned concord intellectual news wit |
purity | 0.16002 | purity evening body warm laws gun humanity streams sensuality hunters vegetation animal |
shelter | 0.10274 | shelter clothes furniture cost labor fuel clothing free houses people works boards |
![]() topics | ![]() topics over time |
Summary
I have used both traditional and distant reading against four well-known books. I have documented what I learned, and this documentation has been manifested as a set of four curated Distant Reader study carrels. I assert traditional reading's value will never go away. After all, novels and sets of essays are purposely designed to be consumed through traditional ("close") reading. On the other hand, the application of distant reading can quickly and easily highlight all sorts of characteristics which are not, at first glance, very evident. The traditional and distant reading processes complement each other.
Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: This is the original publication.
Date created: 2025-10-13
Date updated: 2025-10-13
Subject(s): Distant Reader; readings; Emma; Walden; Moby Dick; Iliad; Odyssey;
URL: https://distantreader.org/blog/four-curated-carrels/