Aversions to Artificial Intelligence (AI)

If LLMs were not created with copyrighted materials, or if they were created with copyrighted materials but access to them was limited to authorized persons, do you think the library community would be more amenable to their use?

Generative-AI kinda sorta works like this:

1. amass a set of documents
2. parse the documents into sentence-like chunks
3. create a document/term matrix ("vectorize") of all the sentences
4. save the sentence/vector combinations in a database
5. garner a query
6. vectorize the query
7. compute the similarities ("distances") between the vectorized query and each of the stored vectors
8. return the N most similar sentences
9. done
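
The steps above can be sketched in a few lines of Python. This is only an illustration using the standard library, not how any particular product does it: the function names (split_sentences, vectorize, cosine, build_index, search), the two sample documents, and the use of plain word counts with cosine similarity are all my own assumptions, and an in-memory list stands in for the database.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naively split text into sentence-like chunks."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorize(text):
    """Turn text into a bag-of-words vector (term -> count)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Parse documents into sentences and keep (sentence, vector) pairs."""
    return [(s, vectorize(s)) for d in documents for s in split_sentences(d)]


def search(index, query, n=2):
    """Vectorize the query, score every stored vector, return the top N sentences."""
    q = vectorize(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:n]]


# Hypothetical sample documents, made up for this sketch
documents = [
    "Libraries collect books. Librarians organize knowledge for readers.",
    "Whales live in the ocean. Ships sail across the ocean to hunt whales.",
]

index = build_index(documents)
results = search(index, "Where do whales live?", n=2)
print(results)
```

Real systems swap the word counts for far richer vectors, but the shape of the process is the same: vectorize, store, vectorize the query, compare, and return the closest matches.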

The problem the library community seems to have is with Step #3, which is often called "embedding" and is done by things called "embedders". These embedders are representations of their original documents. They, in and of themselves, are sets of vectors describing the distances between documents. (Think "Markov models", only on steroids.) In order to create these embedders, both a whole lot of content and computing horsepower are necessary; a huge amount of linear algebra needs to be computed against a huge number of sentences in order to identify useful distance measures. Many, if not most, of these embedders were created from content harvested from the Web. Alas?

Now suppose the library community were to create one or more of their own embedders? Suppose the content came from things like Project Gutenberg, the HathiTrust, open access journal articles, public domain government documents, etc.? Suppose the process of vectorizing -- the creation of document/term matrices -- was done with something as simple as scikit-learn's CountVectorizer. The generative-AI results may not be as good as the results generated by Big Tech, but at least the facet of copyright would have been eliminated.
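
To give a sense of how little code the vectorizing step requires, here is a rough sketch using scikit-learn's CountVectorizer and cosine_similarity. The three sentences are stand-ins I chose from well-known public domain novels, and the variable names are mine; a real embedder would be built from an entire corpus, not three lines.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in sentences; a real corpus would be harvested from
# public domain or open access collections
sentences = [
    "Call me Ishmael.",
    "It was the best of times, it was the worst of times.",
    "It is a truth universally acknowledged.",
]

# Learn the vocabulary and build the document/term matrix
vectorizer = CountVectorizer()
matrix = vectorizer.fit_transform(sentences)

# Vectorize a query with the same vocabulary and score it against every sentence
query_vector = vectorizer.transform(["the worst of times"])
scores = cosine_similarity(query_vector, matrix)[0]

# The highest-scoring sentence is the closest match
best = sentences[scores.argmax()]
print(best)
```

Each row of the matrix is one sentence and each column is one term from the learned vocabulary, which is all "document/term matrix" means here.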

Suppose the copyright facet were removed from the LLM equation. Do you think the library community would be as averse to the use of generative-AI? If so, then I assert the aversion is not necessarily about copyright but about something else, and in that case, what? Put another way, besides copyright, what aversions to the use of generative-AI does the library community seem to have? Environmentally unfriendly? Too easy? Requires additional information literacy skills? Professionally threatening? Fear of change? Minimal understanding of how generative-AI works?

Don't get me wrong. I have not been drinking any Kool-Aid. That said, I don't understand what the problem is. To me, the use of LLMs is merely another tool in my library toolbox.

Creator: Eric Lease Morgan <eric_morgan@infomotions.com>
Source: This posting was originally shared in the Code4Lib Slack channel, May 22, 2025.
Date created: 2025-06-17
Date updated: 2025-06-17
Subject(s): generative-AI;
URL: https://distantreader.org/blog/aversions-to-ai/