Generative-AI Summarization

Ann Blair's book Too Much to Know overflows with techniques that pre-modern and early modern scholars used to deal with information overload. [1] One of the most frequently used techniques is summarization. With the advent of generative-AI, it is almost trivial to create more-than-plausible summaries of documents.

The linked Python script is an example. [2] Given the path to a plain text file, the script will load a configured large language model, vectorize the given plain text file, compare the two, and output a three-sentence summary. I enhanced the script to work in batch mode, and I have used it to summarize several collections of items:

  * each chapter in each book written by Jane Austen [3]
  * 250 journal articles on the topic of rheumatoid arthritis [4]
  * another 250 journal articles on the topic of climate change [5]
  * 130 articles on the topic of cataloging [6]
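
The batch workflow described above can be sketched in a few lines of Python. This is only an illustration, not the linked script: the model call is passed in as a function named generate, and that name, the prompt wording, build_prompt, summarize_directory, and the max_chars cutoff are all assumptions made for the example. The sketch also skips the vectorization step and simply hands the (possibly truncated) text to whatever model sits behind generate.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical prompt wording; the linked script may phrase its request differently.
PROMPT = "Summarize the following text in three sentences:\n\n{text}"


def build_prompt(text: str, max_chars: int = 8000) -> str:
    """Fill the prompt template, truncating long texts to max_chars characters."""
    return PROMPT.format(text=text[:max_chars].strip())


def summarize_directory(
    directory: str, generate: Callable[[str], str]
) -> Dict[str, str]:
    """Summarize every .txt file in a directory.

    `generate` is an assumed stand-in for a call to a large language model:
    it takes a prompt string and returns the model's reply as a string.
    The result maps each file name to its summary, in file-name order.
    """
    summaries: Dict[str, str] = {}
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        summaries[path.name] = generate(build_prompt(text)).strip()
    return summaries
```

To use the sketch, wrap a model of one's choosing in a function that accepts a prompt and returns text, and pass that function as generate along with the path to a directory of plain text files.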

For any given document there is no single 100% correct summary; everybody will summarize a document differently. That said, the results of this automated process look pretty good to me. Moreover, each list of summaries addresses difficult-to-answer questions such as:

  * how can Jane Austen's works be characterized?
  * what is rheumatoid arthritis and what are some of its treatments?
  * how is climate change being manifested across the globe?
  * how has the practice of cataloging changed over time?

The lists of summaries may themselves be deemed information overload, and one might consider summarizing the summaries. Such is an exercise left up to the reader.
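
For readers who want to try that exercise, one possible approach is to summarize the summaries in groups and repeat until a single summary remains. The sketch below assumes a function named generate that sends a prompt to a large language model and returns its reply; that function, the prompt wording, the name summarize_summaries, and the default batch_size are hypothetical choices made for illustration.

```python
from typing import Callable, List


def summarize_summaries(
    summaries: List[str],
    generate: Callable[[str], str],
    batch_size: int = 20,
) -> str:
    """Reduce many summaries to one by repeatedly summarizing groups of them.

    `generate` is an assumed stand-in for a call to a large language model.
    """
    if batch_size < 2:
        # A group size below 2 would never shrink the list.
        raise ValueError("batch_size must be at least 2")

    # Drop empty entries and surrounding whitespace.
    current = [s.strip() for s in summaries if s.strip()]
    if not current:
        return ""

    # Each pass replaces every group of summaries with one new summary,
    # so the list gets shorter until only one item is left.
    while len(current) > 1:
        reduced = []
        for i in range(0, len(current), batch_size):
            group = current[i:i + batch_size]
            prompt = "Summarize these summaries in three sentences:\n\n" + "\n".join(
                f"- {s}" for s in group
            )
            reduced.append(generate(prompt).strip())
        current = reduced
    return current[0]
```

Grouping keeps each prompt small enough for a model with a limited context window, at the cost of losing some detail with every pass. A single input summary is returned unchanged, without calling the model.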

I believe libraries and librarians ought to learn how to exploit generative-AI for summarization purposes. Just as the migration from printed cards to MARC transformed how libraries hosted catalogs, migrating from hand-crafted summaries to computed summaries will transform how information overload is managed.

[1] Blair, Ann. 2010. Too Much to Know: Managing Scholarly Information before the Modern Age. New Haven, Conn.: Yale University Press.

--
Eric Lease Morgan <emorgan@nd.edu>
June 27, 2024