What is war, and how can it be justified?

I set out to address the questions, "What is war, and how can it be justified?" I propose to accomplish my goal by analyzing the suggested readings of the University's 2022 Forum on War and Peace. More specifically, I digitized the suggested readings, amalgamated them into a corpus, applied a number of different data science computing techniques against the corpus, and tried to answer my questions. Below is what I learned.

Aboutness

The corpus includes fifteen items for a total of 1.5 million words. (By comparison, the Bible is about 0.8 million words long.) The items date from 1500 (Gascoigne's The Fruites of War) to 2022 (Pope Francis's Against War). See the rudimentary bibliography, complete with computed summaries and statistically significant keywords, for more detail.

Unigram, bigram, and keyword visualizations begin to tell what the corpus is about:


unigrams

bigrams


keywords

Hidden in the fine print, I see the phrases "just war" and "good war", which give me hope I might be able to find some sort of answers to my questions. I also see the names of many people, and I might begin to ask new questions like "Who are all of these people, and what do they do?" See the automatically generated summary page for more descriptive statistic-like details regarding the corpus. From there you might discern the corpus is a set of narratives as opposed to a set of academic writings.
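
The counts behind unigram and bigram visualizations like those above can be sketched in a few lines of Python. This is only a minimal illustration, not the process actually used against the corpus: the toy text, the tiny stop-word list, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "is", "it", "of", "the", "to", "was"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens, minus stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def ngrams(tokens, n):
    """Return the n-grams (as tuples) found in a list of tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))

# Toy text standing in for the corpus (an assumption for illustration).
text = "A just war is rare. The good war was long, and the just war was short."

tokens = tokenize(text)
unigrams = Counter(tokens)
# Bigrams are built after stop-word removal, so they may span removed
# words and sentence boundaries.
bigrams = Counter(ngrams(tokens, 2))

print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Feeding frequencies like these to a word cloud tool is what makes phrases such as "just war" and "good war" visible, even if only in the fine print.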

To further understand the "aboutness" of the corpus, I applied topic modeling to the whole. (Topic modeling is an unsupervised machine learning process used to enumerate latent themes in a corpus.) Thus, if I model the text with a single topic, then the result is "war". I can assert, "The corpus is about war." Modeling with four topics returns few surprises, especially with a knowledge of the corpus's titles: 1) war, 2) time, 3) mining, and 4) uncle. Modeling with fifteen topics results in the following, more nuanced, themes and proportions:

       labels  weights                                           features
         time  1.50303  time just man day another old last still take ...
          war  0.53878  war men death life soldier love soldiers enemy...
        peace  0.42298  peace war world god people weapons power human...
          men  0.36095  men trench company fire enemy front british li...
        world  0.30624  war world american civil looking film new amer...
         know  0.25768  know around just black ing really hair inside ...
       german  0.21518  war german people germans men plane american w...
       mother  0.19793  mother courage cart kattrin war chaplain cook ...
       mining  0.08940  mining human catholic development rights socia...
        qasim  0.05114  war qasim porn roy scranton went man told fuck...
          may  0.02922  may yet might english spanish gascoigne full s...
       naples  0.02907  naples giulia captain american italian remembe...
        uncle  0.02225  uncle toby father trim corporal quoth shall wo...
     slothrop  0.00822  slothrop among light away white rocket comes r...
    someplace  0.00805  someplace herero fires interface frame hollers...
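
To make the idea of topic modeling a bit more concrete, below is a minimal sketch of one common technique, latent Dirichlet allocation fitted with collapsed Gibbs sampling, written in plain Python. It is not the software used to produce the table above. The toy documents, the parameter values, and the function name lda_gibbs are assumptions, and the "weight" printed here is simply each topic's share of all tokens, which is not necessarily how the weights above were computed.

```python
import random
from collections import Counter

def lda_gibbs(docs, k, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit a tiny LDA model with collapsed Gibbs sampling.

    docs is a list of token lists and k is the number of topics. Returns
    (doc_topic, topic_word): per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * k for _ in docs]
    topic_word = [Counter() for _ in range(k)]
    topic_total = [0] * k
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(k)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic in proportion to the conditional probability.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(k)
                ]
                z = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Toy documents standing in for the corpus (an assumption for illustration).
docs = [
    "war soldier enemy death war soldier".split(),
    "peace world god people peace world".split(),
    "war enemy soldier death enemy war".split(),
    "peace god people world god peace".split(),
]
doc_topic, topic_word = lda_gibbs(docs, k=2)

total = sum(map(sum, doc_topic))
for t in range(2):
    weight = sum(row[t] for row in doc_topic) / total
    features = [w for w, c in topic_word[t].most_common() if c > 0][:4]
    print(f"{weight:.3f}  {' '.join(features)}")
```

The doc_topic result is the matrix of rows (documents) and columns (topics) that makes further analysis possible, and the most frequent words in each topic_word entry play the role of the "features" column.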

These themes can be seen as a whole, and thus we can visualize their proportions:

The underlying model is manifested as a matrix of rows and columns. If we add author values to the matrix and pivot the matrix accordingly, we can literally see to what degree each author wrote about the predicted themes. Thus, compared to the others, the Pope wrote the most about peace. Hmmm:
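
The pivot described above can be sketched as follows. The author names, the topic proportions, and the function name pivot_by_author are hypothetical values invented for the example; they are not taken from the model. Each row pairs an author with one document's topic proportions, and the pivot averages those proportions per author.

```python
from collections import defaultdict

def pivot_by_author(rows, topics):
    """Average each topic's proportion over all documents by the same author."""
    sums = defaultdict(lambda: [0.0] * len(topics))
    counts = defaultdict(int)
    for author, proportions in rows:
        counts[author] += 1
        for i, p in enumerate(proportions):
            sums[author][i] += p
    return {
        author: {topic: sums[author][i] / counts[author] for i, topic in enumerate(topics)}
        for author in sums
    }

# Hypothetical document-topic rows: (author, proportion for each topic).
topics = ["war", "peace"]
rows = [
    ("Author A", [0.75, 0.25]),
    ("Author A", [0.25, 0.75]),
    ("Author B", [0.25, 0.75]),
]

table = pivot_by_author(rows, topics)
# Which author, on average, wrote the most about "peace"?
most_peace = max(table, key=lambda author: table[author]["peace"])
print(table)
print(most_peace)
```

Averaging is only one possible choice of aggregation; summing or taking the maximum would answer slightly different questions.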

What is war?

Back to one of my original questions, "What is war?" One way to address that question is simply to query our corpus for "war is" or "war was", as in the following form. Unfortunately, the results link to entire books and not chapters, let alone paragraphs or sentences:

query:
output format: HTML CSV JSON

(Use the form to search for additional things of interest.)

Another solution is to apply concordancing, and to suppose the text following each of the phrases is a definition. Some of the more telling results are below, and a complete listing has been saved locally. War is:

To paraphrase John Firth, "You shall know words by the company they keep", and consequently counting, tabulating, and visualizing the bigrams containing the word "war" is a bit illustrative, with both a word cloud and a network diagram. The result begins to tell me how the word "war" is used in the corpus:


bigram cloud

bigram network
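
Counting the bigrams that contain a given word, and turning the counts into edges for a network diagram, might look something like the sketch below. The toy text and the function name bigrams_with are assumptions, and no stop words are removed here, so function words such as "is" and "was" show up in the results.

```python
import re
from collections import Counter

def bigrams_with(text, target):
    """Count adjacent word pairs in which at least one word is the target."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = zip(tokens, tokens[1:])
    return Counter(pair for pair in pairs if target in pair)

# Toy text standing in for the corpus (an assumption for illustration).
text = "The civil war ended. A just war is rare, but the civil war was long."

counts = bigrams_with(text, "war")
# Each counted bigram becomes a weighted edge: (first word, second word, weight).
edges = [(a, b, n) for (a, b), n in counts.most_common()]

for edge in edges:
    print(edge)
```

The same counts can feed a word cloud, where the weight sets the size of each bigram, or a network diagram, where the weight sets the thickness of each edge.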

Oftentimes, bigrams seen in the same window of text, as in a concordance, are more informative than strictly adjacent bigrams. These are called collocations. Collocating bigrams containing the word "war" results in these visualizations:


collocated bigrams cloud

collocated bigrams network
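
A window-based collocation count of this kind can be sketched as below. The window size, the toy text, the ignore list, and the function name collocates are all assumptions made for the example. Every word found within a few tokens of the target word is paired with it and counted.

```python
import re
from collections import Counter

def collocates(text, target, window=3, ignore=frozenset()):
    """Count words appearing within `window` tokens of each occurrence of target."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Skip the target occurrence itself and any ignored words.
            if j != i and tokens[j] not in ignore:
                counts[(target, tokens[j])] += 1
    return counts

# Toy text standing in for the corpus (an assumption for illustration).
text = "War is hell. The enemy said war is peace, and peace ends war."

everything = collocates(text, "war", window=2)
# The optional ignore set drops words that would otherwise dominate the counts.
filtered = collocates(text, "war", window=2, ignore={"is", "the", "and", "said"})

print(everything.most_common())
print(filtered.most_common())
```

Unlike adjacent bigrams, this approach pairs "war" with words that merely appear nearby, which is why the results can be both richer and noisier.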

After removing the bigrams that overwhelm all the others, the results are more meaningful:


collocated bigrams cloud

collocated bigrams network

Eric Lease Morgan <emorgan@nd.edu>
Navari Family Center for Digital Scholarship
University of Notre Dame

April 5, 2023