Summary of your 'study carrel'
==============================

This is a summary of your Distant Reader 'study carrel'.

The Distant Reader harvested & cached your content into a
collection/corpus. It then applied a set of natural language
processing and text mining routines to the collection. The results
of this process were reduced to a database file -- a 'study carrel'.
The study carrel can then be queried, thus bringing to light
specific characteristics of your collection. These
characteristics can help you summarize the collection as well as
enumerate things you might want to investigate more closely.
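
As a rough illustration of what "querying the study carrel" can look
like, here is a minimal Python sketch. It assumes the database file is
SQLite and invents a table named `tokens` with columns `item`, `lemma`,
and `pos`; those names are hypothetical, so check the actual schema of
your carrel before adapting the query. The sample rows are made up and
live in an in-memory database.

```python
import sqlite3

# Hypothetical schema (an assumption, not the carrel's documented layout):
# one row per token, with its item id, lemma, and part-of-speech tag.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE tokens (item TEXT, lemma TEXT, pos TEXT)")
connection.executemany(
    "INSERT INTO tokens VALUES (?, ?, ?)",
    [
        ("item-a", "library", "NOUN"),
        ("item-a", "collection", "NOUN"),
        ("item-a", "use", "VERB"),
        ("item-b", "library", "NOUN"),
        ("item-b", "datum", "NOUN"),
        ("item-b", "library", "NOUN"),
    ],
)


def top_lemmas(db, tag, limit=50):
    """Return (count, lemma) pairs for one part-of-speech tag, most frequent first."""
    query = (
        "SELECT COUNT(*) AS total, lemma FROM tokens "
        "WHERE pos = ? GROUP BY lemma "
        "ORDER BY total DESC, lemma ASC LIMIT ?"
    )
    return db.execute(query, (tag, limit)).fetchall()


top_nouns = top_lemmas(connection, "NOUN")
for total, lemma in top_nouns:
    # Same "count<TAB>term" layout used by the tallies in this report.
    print(f"{total}\t{lemma}")
```

The same function with a different tag (for example "VERB") would
produce a tally shaped like the verb list later in this report.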

This is a terse narrative report; when processing is complete,
you will be linked to a more complete one.

                               Eric Lease Morgan <emorgan@nd.edu>


Number of items in the collection; 'How big is my corpus?'
----------------------------------------------------------
32


Average length of all items measured in words; "More or less, how big is each item?"
------------------------------------------------------------------------------------
13603


Average readability score of all items (0 = difficult; 100 = easy)
------------------------------------------------------------------
56


Top 50 statistically significant keywords; "What is my collection about?"
-------------------------------------------------------------------------
5	University
5	Learning
4	machine
3	research
3	library
3	datum
2	system
2	entity
2	edge
2	September
2	Research
2	OCLC
2	Metadata
2	Library
2	Libraries
2	Data
1	word
1	visualization
1	type
1	time
1	table
1	startup
1	scholar
1	process
1	preservation
1	pmss
1	plague
1	place
1	people
1	oclc
1	node
1	new
1	network
1	need
1	moral
1	model
1	mention
1	material
1	literary
1	link
1	line
1	learning
1	issue
1	image
1	idea
1	human
1	house
1	great
1	graph
1	field


Top 50 lemmatized nouns; "What is discussed?"
---------------------------------------------
2393	e
1960	%
1344	o
1336	n
1315	t
1256	i
1252	datum
1176	r
1070	p
1030	a
995	c
859	collection
836	people
759	library
754	time
753	research
718	entity
683	h
652	project
649	s
638	system
604	l
598	model
523	work
479	number
475	service
450	machine
449	learning
447	information
429	d
424	user
421	participant
419	metadata
417	example
402	v
395	image
390	source
378	way
377	f
370	type
367	figure
366	data
351	house
347	part
345	year
332	edge
325	activity
324	network
323	use
311	value


Top 50 proper nouns; "What are the names of persons or places?"
--------------------------------------------------------------
5337	_
1565	AI
644	�
564	Research
461	Index
439	University
433	Report
386	Intelligence
383	Artificial
331	Data
324	•
300	Learning
279	Library
251	Forum
234	Machine
217	Metadata
210	Bootleg
207	Digital
204	al
200	Figure
199	Libraries
198	OCLC
191	Wikibase
188	United
180	B
175	TA
174	W
170	Chart
168	CONTENTdm
160	National
152	States
150	Group
148	-
144	ML
135	Cross
133	u
130	M
126	New
125	Collection
120	A
118	et
118	Social
117	St.
114	London
114	Academic
111	International
110	Conference
109	John
108	Focus
108	.


Top 50 personal pronouns; "To whom are things referred?"
--------------------------------------------------------
2557	it
1955	they
1941	we
1555	i
1042	them
943	you
502	he
299	us
209	me
201	themselves
174	him
95	itself
84	she
52	one
42	her
40	’s
40	himself
36	myself
23	ourselves
18	yourself
11	herself
3	λ
3	thee
3	mine
2	α
2	thyself
2	theirs
2	t
2	ours
2	`ikr?qh2f
2	#f[mb+f/`
1	​[ensure
1	zbmath,19
1	y’
1	yourselves
1	yours
1	yolov5
1	yolov2
1	x
1	wikicite,48
1	u
1	thus,--
1	themself
1	tart
1	r
1	oneself
1	implementers.11
1	hxpj3brxrynd9
1	hvib+bfk
1	https://www.loc.gov/standards/premis/.


Top 50 lemmatized verbs; "What do things do?"
---------------------------------------------
11509	be
2864	have
1294	do
1123	use
736	make
617	see
555	say
551	go
539	provide
535	include
518	take
504	come
458	give
433	find
417	learn
367	know
365	base
348	link
346	work
342	create
319	develop
318	need
296	follow
277	show
270	generate
268	•
268	identify
264	get
261	build
253	describe
252	help
247	improve
241	think
238	call
233	share
214	support
208	increase
206	relate
203	add
202	die
200	look
199	focus
188	require
188	begin
186	allow
179	represent
174	note
172	consider
164	bring
164	become


Top 50 lemmatized adjectives and adverbs; "How are things described?"
---------------------------------------------------------------------
2049	not
1078	more
910	other
671	such
642	also
641	well
595	so
532	new
510	many
498	only
456	as
429	up
421	very
400	most
394	then
384	first
339	out
326	large
324	good
323	different
313	digital
311	great
296	same
293	high
278	even
278	-
271	now
258	much
216	here
214	several
201	public
201	institutional
194	specific
192	important
188	together
188	just
182	own
181	next
179	often
176	long
173	few
172	poor
171	satisfied
170	available
159	indeed
157	e.g.
156	possible
155	again
154	particular
154	however


Top 50 lemmatized superlative adjectives; "How are things described to the extreme?"
-------------------------------------------------------------------------
129	good
115	most
79	least
51	high
25	large
25	great
22	Most
18	bad
11	late
10	near
8	big
5	hard
5	fast
4	rich
4	low
3	stout
3	early
2	wide
2	ter
2	strong
2	simple
2	short
2	sharp
2	safe
2	long
2	hot
2	farth
2	deep
2	broad
1	wealthy
1	true
1	timely
1	sure
1	sparse
1	small
1	silly
1	remote
1	raw
1	quick
1	ordinari
1	old
1	nice
1	new
1	manif
1	l
1	j.ins.2019.12.082
1	https://www.globenewswire.com/news-release/2018/03/27/1453732/0/en/Global-Polymerase-Chain-Reaction-Market-Will-Reach
1	gentle
1	full
1	fresh


Top 50 lemmatized superlative adverbs; "How do things do to the extreme?"
------------------------------------------------------------------------
285	most
41	least
25	well
2	long
1	fast
1	ai4lam


Top 50 Internet domains; "What Webbed places are alluded to in this corpus?"
----------------------------------------------------------------------------
331	doi.org
132	researchworks.oclc.org
111	arxiv.org
110	hangingtogether.org
86	www.oclc.org
51	paperswithcode.com
45	registry.gbif.org
35	github.com
32	www.loc.gov
32	orcid.org
30	dx.doi.org
24	refhub.elsevier.com
21	www.liberatingstructures.com
21	gamestorming.com
20	www.researchgate.net
18	www.w3.org
17	www.aclweb.org
16	iwgsc.nal.usda.gov
15	www.mediawiki.org
14	towardsdatascience.com
14	drive.google.com
13	www.microsoft.com
12	visualqa.org
12	help.oclc.org
12	hdl.huntington.org
11	www.wired.com
11	reflections.mndigital.org
11	ec.europa.eu
11	cdm16002.contentdm.oclc.org
10	www.nytimes.com
10	library.stanford.edu
10	en.wikipedia.org
10	docs.google.com
10	dl.acm.org
10	archive-it.org
9	www.usgs.gov
9	www.tandfonline.com
9	www.congress.gov
9	web.archive.org
9	creativecommons.org
8	phil.cdc.gov
7	www.wwp.northeastern.edu
7	www.timeshighereducation.com
7	www.ars.usda.gov
7	sites.haa.pitt.edu
7	critinq.wordpress.com
7	cdm16014.contentdm.oclc.org
6	www.nngroup.com
6	www.lyrasis.org
6	www.ebgconsulting.com


Top 50 URLs; "What is hyperlinked from this corpus?"
----------------------------------------------------
7	http://paperswithcode.com/paper/efficientnet-rethinking-model-scaling-for
7	http://creativecommons.org/licenses/by/4.0/
7	http://arxiv.org/abs/1412.3555
6	http://www.oclc.org/research/areas/data-science/linkeddata/linked-data-prototype.html
6	http://www.ebgconsulting.com/blog/the-4ls-a-retrospective-technique/
6	http://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
6	http://hangingtogether.org/?p=7845
6	http://doi.org/10.25333/C3PG8J
6	http://doi.org/10.25333/BGFG-D241
5	http://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html
5	http://www.oclc.org/research
5	http://www.oclc.org
5	http://inception-project.github.io
5	http://doi.org/10.5479/si.13241612
5	http://doi.org/10.25333/faq3-ax08
5	http://arxiv.org/abs/2005.00687
5	http://arxiv.org/abs/2004.01375
4	http://www.timeshighereducation.com/world-university-rankings/2021/world-ranking#!/page/0/length/25/sort_by/rank/sort_order/asc/cols/stats
4	http://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html
4	http://www.oclc.org/en/fast.html
4	http://www.nngroup.com/articles/dot-voting/
4	http://www.mediawiki.org/wiki/Manual:Pywikibot/Overview
4	http://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf
4	http://www.liberatingstructures.com/mad-tea/
4	http://www.infodocket.com/2019/06/13/library-of-congress-posts-solicitation-for-a-machine-learning-deep-learning-pilot-program-to-maximize-the-use-of-its-digital-collection-library-is-looking-for-r/
4	http://www.geonames.org/
4	http://www.blog.google/products/assistant/interpreter-mode-brings-real-time-translation-your-phone/
4	http://www.ars-grin.gov/
4	http://researchworks.oclc.org/cdmld/screenshots/entity-Q144548.png
4	http://researchworks.oclc.org/cdmld/screenshots/cdm-property-proposal.png
4	http://refhub.elsevier.com/S0099-1333(21)00025-2/rf0030
4	http://publications.jrc.ec.europa.eu/repository/bitstream/JRC121680/jrc121680_jrc121680_academic_offer_of_advanced_digital_skills.pdf
4	http://passamaquoddypeople.com/passamaquoddy-traditional-knowledge-labels
4	http://paperswithcode.com/paper/self-training-with-noisy-student-improves
4	http://ojs.aaai.org/index.php/aimagazine/article/view/2157
4	http://oecd.ai/
4	http://journal.code4lib.org/articles/13671
4	http://iwgsc.nal.usda.gov
4	http://iiif.io/
4	http://icecores.org/
4	http://hangingtogether.org/?p=7854
4	http://hangingtogether.org/?p=7591
4	http://hangingtogether.org/?p=7135
4	http://hangingtogether.org/?p=7122
4	http://hangingtogether.org/?p=6997
4	http://hangingtogether.org/?p=5929
4	http://hangingtogether.org/?p=5710
4	http://hangingtogether.org/?p=5195
4	http://hangingtogether.org/?p=5091
4	http://gamestorming.com/trading-cards/
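
A tally of Internet domains like the one earlier in this report can be
derived from a list of URLs like the one above. The sketch below shows
one way to do that with Python's standard library; the six URLs are
copied from the list above, and the function name `tally_domains` is
made up for illustration. It is not necessarily how the Distant Reader
computes its own counts.

```python
from collections import Counter
from urllib.parse import urlparse

# A handful of URLs copied from the "Top 50 URLs" list.
urls = [
    "http://arxiv.org/abs/1412.3555",
    "http://arxiv.org/abs/2005.00687",
    "http://hangingtogether.org/?p=7845",
    "http://doi.org/10.25333/C3PG8J",
    "http://doi.org/10.25333/BGFG-D241",
    "http://doi.org/10.5479/si.13241612",
]


def tally_domains(links):
    """Count how often each host name appears in a list of URLs."""
    return Counter(urlparse(link).netloc.lower() for link in links)


domain_counts = tally_domains(urls)
for domain, count in domain_counts.most_common():
    print(f"{count}\t{domain}")
```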


Top 50 email addresses; "Who are you gonna call?"
-------------------------------------------------
6	oclcresearch@oclc.org
2	mnarlock@nd.edu
2	jvecchio@nd.edu
2	djohns27@nd.edu
1	xiaoling@apple.com
1	sambband@in.ibm.com
1	ramasurn@in.ibm.com
1	mnm@iisc.ac.in
1	mmaceli@pratt.edu
1	matienzo@stanford.edu
1	anirbanb@iisc.ac.in
1	ai-index-report@stanford.edu


Top 50 positive assertions; "What sentences are in the shape of noun-verb-noun?"
-------------------------------------------------------------------------------
19	a linked data
12	a given year
8	a • recent
6	a given country
5	o do n''t
4	% were african
4	% were hispanic
4	people did not
2	% have non
2	% were asian
2	% were men
2	% were white
2	% were women
2	a given ai
2	a given corpus
2	a given entity
2	a given image
2	a given set
2	a needs assessment
2	a • responsible
2	ai • recent
2	collections are scientific
2	collections are sometimes
2	collections do not
2	data are available
2	data are good
2	libraries are also
2	libraries are not
2	libraries do not
2	libraries have not
2	libraries using newer
2	library is not
2	library was sometimes
2	models using only
2	numbers are difficult
2	people are not
2	people had not
2	people were in
2	people were so
2	people were very
2	services is possible
2	system is correct
2	work is …
1	% is no
1	% is optimal
1	a base model
1	a following discussion
1	a following state
1	a generated face
1	a given activity


Top 50 negative assertions; "What sentences are in the shape of noun-verb-no|not-noun?"
---------------------------------------------------------------------------------------
1	% is no better
1	ai is not yet
1	ai was not only
1	collection are no longer
1	collections have no funds
1	data is not ideal
1	entities are not common
1	entity has no relation
1	libraries are not as
1	libraries have no such
1	model is not as
1	people had no curiosity
1	people made no scruple
1	people were no more
1	people were not very
1	system is not yet
1	systems are not representative
1	systems include not just
1	time was not fully


Sizes of items; "Measured in words, how big is each item?"
----------------------------------------------------------
90540	defoe-plague-1722
80878	stanford-ai-2021
41159	matienzo-lighting-2020
21338	bahnemann-transforming-2021
21124	oclc-transitioning-2020
20691	oclc-social-2020
17394	schindel-economic-2020
14458	orr-bootleg-2020
8771	bielak-attre2vec-2021
8355	white-entrepreneurship-1987
8115	bandyopadhyay-beyond-2021
7542	cohen-machine-2021
7446	zkie-zero-2020
7325	mathews-think-2012
7293	kim-ai-2021
7195	plumb-humanities-2021
7014	wiegand-cultures-2021
6148	altman-building-2021
5935	harper-generative-2021
5793	morgan-bringing-2021
5069	hintze-artificial-2021
4796	lesk-fragility-2021
4339	hansen-can-2021
3987	narlock-digital-2021
3910	prudhomme-taking-2021
3728	aghassibake-supporting-2020
3623	jiang-cross-2021
3101	janco-machine-2021
3073	lucic-towards-2021
2793	maceli-what-2015
1269	yoon-what-2020
1090	johnson-preface-2021
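
The sizes above can be used to check the summary figures reported near
the top of this report (32 items, 13603 words on average). The sketch
below simply copies the word counts from the list; the variable names
are chosen for illustration only.

```python
from statistics import mean, median

# Word counts copied from the "Sizes of items" list above, largest first.
sizes = [
    90540, 80878, 41159, 21338, 21124, 20691, 17394, 14458,
    8771, 8355, 8115, 7542, 7446, 7325, 7293, 7195,
    7014, 6148, 5935, 5793, 5069, 4796, 4339, 3987,
    3910, 3728, 3623, 3101, 3073, 2793, 1269, 1090,
]

item_count = len(sizes)
average_size = round(mean(sizes))
median_size = median(sizes)
# Share of all words contributed by the two largest items.
top_two_share = sum(sizes[:2]) / sum(sizes)

print(f"items: {item_count}")
print(f"average size: {average_size}")
print(f"median size: {median_size}")
print(f"share of words in the two largest items: {top_two_share:.1%}")
```

The median is roughly half the average because the two longest items
(defoe-plague-1722 and stanford-ai-2021) pull the average upward, so
the median may be the better answer to "how big is each item?".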


Readability of items; "How difficult is each item to read?"
-----------------------------------------------------------
75.0	stanford-ai-2021
74.0	morgan-bringing-2021
69.0	defoe-plague-1722
68.0	bandyopadhyay-beyond-2021
65.0	mathews-think-2012
63.0	altman-building-2021
63.0	hansen-can-2021
63.0	lesk-fragility-2021
63.0	lucic-towards-2021
63.0	orr-bootleg-2020
63.0	zkie-zero-2020
62.0	bielak-attre2vec-2021
62.0	yoon-what-2020
59.0	harper-generative-2021
58.0	matienzo-lighting-2020
58.0	white-entrepreneurship-1987
56.0	hintze-artificial-2021
55.0	jiang-cross-2021
55.0	kim-ai-2021
55.0	maceli-what-2015
54.0	cohen-machine-2021
54.0	janco-machine-2021
51.0	prudhomme-taking-2021
49.0	wiegand-cultures-2021
48.0	oclc-transitioning-2020
47.0	johnson-preface-2021
47.0	oclc-social-2020
45.0	plumb-humanities-2021
44.0	aghassibake-supporting-2020
44.0	bahnemann-transforming-2021
39.0	schindel-economic-2020
28.0	narlock-digital-2021
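
The per-item scores above can be parsed and averaged to reproduce the
collection-wide readability score of 56 reported earlier. This is a
minimal sketch: the rows are copied from the list above (with the tab
replaced by a space), and `parse_rows` is a hypothetical helper, not
part of the Distant Reader.

```python
from statistics import mean

# Rows copied from the "Readability of items" list: score, then item id.
REPORT_ROWS = """\
75.0 stanford-ai-2021
74.0 morgan-bringing-2021
69.0 defoe-plague-1722
68.0 bandyopadhyay-beyond-2021
65.0 mathews-think-2012
63.0 altman-building-2021
63.0 hansen-can-2021
63.0 lesk-fragility-2021
63.0 lucic-towards-2021
63.0 orr-bootleg-2020
63.0 zkie-zero-2020
62.0 bielak-attre2vec-2021
62.0 yoon-what-2020
59.0 harper-generative-2021
58.0 matienzo-lighting-2020
58.0 white-entrepreneurship-1987
56.0 hintze-artificial-2021
55.0 jiang-cross-2021
55.0 kim-ai-2021
55.0 maceli-what-2015
54.0 cohen-machine-2021
54.0 janco-machine-2021
51.0 prudhomme-taking-2021
49.0 wiegand-cultures-2021
48.0 oclc-transitioning-2020
47.0 johnson-preface-2021
47.0 oclc-social-2020
45.0 plumb-humanities-2021
44.0 aghassibake-supporting-2020
44.0 bahnemann-transforming-2021
39.0 schindel-economic-2020
28.0 narlock-digital-2021
"""


def parse_rows(text):
    """Turn 'score item-id' lines into (item-id, score) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        score, item = line.split(None, 1)
        rows.append((item.strip(), float(score)))
    return rows


scores = parse_rows(REPORT_ROWS)
average_score = round(mean(score for _, score in scores))
easiest = max(scores, key=lambda row: row[1])
hardest = min(scores, key=lambda row: row[1])

print(f"average readability: {average_score}")
print(f"easiest item: {easiest[0]} ({easiest[1]})")
print(f"hardest item: {hardest[0]} ({hardest[1]})")
```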


Item summaries; "In a narrative form, how can each item be abstracted?"
-----------------------------------------------------------------------
aghassibake-supporting-2020	This article summarizes the findings of Visualizing the Future, which is an IMLS National Forum Grant (RE-73-18-0059-18) to develop a literacy-based instructional and research agenda for library and information professionals with the aim to create a community of praxis focused on data visualization. However, there are many commonalities in how we visualize information and data, and the academic library, at the heart of the university, can play a significant role in teaching these skills. Even visualization services focused on advanced instructional spaces or immersive and large scale displays, require expertise to help patrons learn how to use the space, maintain and manage technology, schedule events to create interest, and, especially in the case of advanced spaces, create and manage content to suggest the possibilities. A needs assessment can help identify user-centered services, outreach, and support that could help create a community around data visualization for students, faculty, researchers, non-library staff, and members of the public.

altman-building-2021	As you begin ingesting and preparing data, you'll want to explore possible machine learning algorithms to perform on your dataset. Start by determining what general type of learning algorithm you need, and proceed from there to research and select one that While the final output of a machine learning workflow is some sort of intelligent model, The pipeline for a machine learning project generally comprises five stages: data acquisition, data preparation, model training and testing, evaluation and analysis, and application of results. good idea to save a copy in the rawest possible form and treat that copy as immutable, at least during the initial phase of testing different algorithms or configurations. algorithm uses the training data to "learn" a set of rules that it can subsequently apply to new, Immutable data storage can benefit the batch-processing ML pipeline, especially during the initial research and development phase.

bahnemann-transforming-2021	Transforming Metadata into Linked Data to Improve Digital Collection Discoverability: A CONTENTdm Pilot Project The OCLC CONTENTdm Linked Data Pilot project team consisted of the following OCLC staff: In the CONTENTdm Linked Data Pilot project, OCLC partnered testing new applications built in the Wikibase environment for data retrieval, image annotation, This report describes the course of the CONTENTdm Linked Data Pilot project and its primary CONTENTdm Linked Data Pilot project used the Wikibase environment, which includes several OCLC staff exported CONTENTdm metadata for each suggested collection and created an entity a project for each collection in the program OpenRefine (figure 11), which provides tools for data CONTENTdm collection metadata in an OpenRefine project. the Wikibase, OCLC developed a CONTENTdm customization that embeds the Schema.org data https://www.oclc.org/en/events/2020/devconnect-online-2020/devconnect-2020-creating-linked-descriptive-data-for-contentdm.html https://researchworks.oclc.org/cdmld/screenshots/google-structured-data-testing-tool.png

bandyopadhyay-beyond-2021	Towards this end, we propose a novel concept of converting a network to its weighted line graph which is ideally suited to find the embedding of edges of the original network. unsupervised approach for edge embedding in homogeneous information networks, without relying on the node embeddings. Our proposed optimization framework for edge embedding also generates a set of node embeddings, which are not just the aggregation • We propose a novel edge embedding framework line2vec, for homogeneous social and information networks. different types of edges in a heterogeneous network, but their proposed method essentially uses an aggregation function inside the optimization framework to generate edge embeddings from the node nodes of the line graph, which essentially provides the edge embeddings of the original network. embedding of the node vuv in line graph (or the edge (u,v) ∈ E). Also two edges having similar neighborhood in the original network lead to two nodes having similar neighborhood in the transformed line graph.

bielak-attre2vec-2021	https://www.researchgate.net/publication/348079131_AttrE2vec_Unsupervised_Attributed_Edge_Representation_Learning?enrichId=rgreq-ee0c9a6154948c3f0080a33b782b9118-XXX&enrichSource=Y292ZXJQYWdlOzM0ODA3OTEzMTtBUzo5NzYzNjM1MzYyNzM0MTFAMTYwOTc5NDYxNTk2NA%3D%3D&el=1_x_3&_esc=publicationCoverPdf While there have been approaches that work on homogeneous and heterogeneous networks with multi-typed nodes and edges, there is a gap in learning edge representations. learns a low-dimensional vector representation for edges in attributed networks. Keywords: representation learning, graphs, edge embedding, random walk, neural Figure 1: Our proposed AttrE2vec model compared to other methods in the task of an attributed graph This work is motivated by the idea of unsupervised learning on networks with attributed edges such that the embeddings are method is DeepWalk [8], which in two-phases constructs node neighborhoods by performing fixed-length random walks and employs the skip-gram [7] model to preserve the network structure, attributes of nodes and edges (if method is capable of using) and the topological structure of the graph and node and edge attributes: f : (E,F,M) → Rd. Figure 2: Overview of the AttrE2vec model.

cohen-machine-2021	archivally focused project that emerged from a partnership between the Pine Mountain Settlement School (PMSS)1 in Harlan County, Kentucky, and scholars and students at Berea College. a latent social network of historical families represented by the images held in one local archive, curricula for use in Kentucky public schools with PMSS archival materials. That decision led a team of Berea College undergraduate and faculty researchers to scrape the data from the PMSS archive site and supplement the images and transcriptions it contains with available textual metadata drawn from the site.9 Alongside the WordPress facial recognition software to identify the persons in historic photographs in the PMSS archives. We demonstrated to the local members at Pine Mountain how our use case and its constraints for digital archives fit with the current standards for the fair use of copyrighted materials

defoe-plague-1722	My brother, though a very religious man himself, laughed at all I had suggested about its being an intimation from Heaven, and told me several stories of such foolhardy people, as he called them, as I was; that I ought indeed to submit to it as a work of Heaven if I had been any way disabled by distempers or diseases, and that then, not being able to go, I ought to acquiesce in the direction of Him, who, having been my Maker, had an undisputed right of sovereignty in disposing of me; and that then there had been no difficulty to determine which was the call of his providence, and which was not; but that I should take it as an intimation from Heaven that I should not go out of town, only because I could not hire a horse to go, or my fellow was run away that was to attend me, was ridiculous, since at the same time I had my health and limbs, and other servants, and might with ease travel a day or two on foot, and, having a good certificate of being in perfect health, might either hire a horse, or take post on the road, as I thought fit.

hansen-can-2021	I would use the Mathematical Subject Classification (MSC) values assigned to the publications in MathSciNet1 to create a temporal citation network which would allow me to visualize Machine-learning-based categorization needs data to classify, which in our case automated categorization of mathematics, we were dilettantes in the world of machine learning. what happens when smarter and more capable minds tackle the problem of classifying mathematics and other highly technical subjects using advanced machine learning techniques. 9Mathematical Subject Classification (MSC) values in MathSciNet and zbMath are a particularly interesting categorization set to work with as they are assigned and reviewed by a subject area expert editor and an active researcher in the 16See https://academic.microsoft.com/. https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47 One really interesting part of the machine learning method used by Microsoft was that it did not rely only on information from the article being replace the work of humans categorizing mathematics articles indexed in a database, which for

harper-generative-2021	Reddit have each issued their own bans on the category of machine-generated or -altered content that is commonly termed "deep fakes" (Cohen 2020; Romm, Harwell, and Stanley-Becker TV because of their dystopian implications, deep fakes are just one application of generative machine learning. Figure 2.2: Images generated with a simple statistical model appear as noise as the model is insufficient to capture the structure of the real data (Markov chains trained using wine bottles and 1In many examples, I have used the Google QuickDraw Dataset to highlight features of generative machine learning. (https://github.com/googlecreativelab/quickdraw-dataset) shows the generator learning how to produce better sketches over time. built a GAN that generates high-quality photo-realistic images of people (Karras, Laine, and Aila Beyond medicine and autonomous vehicles, generative data augmentation will progressively impact other imaging-heavy fields (Shorten and Khoshgoftaar 2019) like GANs in Action: Deep Learning with Generative Adversarial Networks.

hintze-artificial-2021	Artificial Intelligence, with its ability to machine learn coupled to an almost human-like understanding, sounds like the ideal tool to the humanities. But are these technologies imbued with intuition or understanding, and do they learn like humans? In the 80s and 90s, as home computers were becoming more common, Hollywood was sensationalizing the idea of smart or human-like Artificial Intelligent machines (AI) through movies Machine learning allows us to learn from these data sets in ways that exceed human capabilities, while an artificial brain will eventually allow us to objectively describe a subjective experience (through quantifying neural activations or positively and negatively associated memories). The following paragraphs will explore current Machine Learning and Artificial Intelligence learning, to the point where our whole identity as human could be generously defined as the Just because humans and machine learning are both black Currently, machines do not learn but must be trained, typically with human-labeled data.

janco-machine-2021	Tools like RunwayML, the Teachable Machine, and Google AutoML allow researchers to train project-specific Since 2014, dramatic innovations in machine learning have occurred, providing new capabilities in computer vision, natural language processing, and other areas of applied artificial intelligence. deliberately and identify how machine learning methods can benefit a scholar's research? for identifying basic tasks that can be completed by computers in ways that advance humanities research (2000). When working with texts or images, machine learning models are presently capable of making simple annotations and associations. Google's Teachable Machine offers an intuitive web application that humanities faculty and students can use to train classification models for images, sounds, and poses. Machine learning models offer a variety of ways to identify similarity and difference with research materials. goals of academic researchers in the humanities with the technical possibilities of machine learning. "Scholarly Primitives: What Methods Do Humanities Researchers Have

jiang-cross-2021	Cross-disciplinary research matters, because (1) it provides an understanding of complex problems that require a multifaceted approach to solve; (2) it combines disciplinary breadth with the ability to collaborate One of the most popular cross-disciplinary research topics/programs is Machine Learning + top strengths of conducting cross-disciplinary ML research and give two examples based on my marriages, just like collaborators expect to have successful project outcomes (Robinson and Blanton 1993; Pettigrew 2000; Xu et al. The history professor Liang Cai and I have collaborated on an international research project titled "Digital Empires: Structured Biographical and Social Network Analysis of Early Chinese We have enjoyed our collaboration and the power of cross-disciplinary research. Specifically, I presented the top strengths of producing successful cross-disciplinary ML research: (1) Partners are satisfied with communication. "The Challenges of Cross-Disciplinary Research." Social Research Collaboration." Social Studies of Science 33, no. "Building Cross-Disciplinary Research Collaborations."

johnson-preface-2021	The plan called for a survey and a series of workshops hosted across the country to explore, originally, "the national need for library based topic modeling tools in support of cross-disciplinary libraries ran concurrently with our grant — Cordell 2020 and Padilla 2019, which were commissioned by major players in the field, the Library of Congress and OCLC, respectively — and vi Machine Learning, Libraries, and Cross-Disciplinary Research We would like to thank the IMLS for providing essential funding support for the grant and the Thank you to the members of the Notre Dame IMLS grant team who, at of course, thanks to the 95 participants in our 2019 IMLS Grant Workshops (too many to enumerate here) and to the essay authors for sharing their expertise and perspectives in growing our collective knowledge of machine learning and its use in research, scholarship, and cultural heritage organizations. https://www.oclc.org/research/publications/2019/oclcresearch-responsible-operations-data-science-machine-learning-ai.html

kim-ai-2021	does not provide an easy answer to the question of how one should program moral decision-making into intelligent machines. Described below are some of the significant ethical challenges that autonomous AI systems such as military robots present. 11Note that this moral decision-making process can be modeled with a rule-based symbolic AI approach, a machine 13(Kahn 2012) also argues that the resulting increase in the number of wars by the use of military robots will be morally 15This black-box nature of AI systems powered by machine learning has raised great concern among many AI researchers in recent years. agency in the AI-powered automated information environment presents an ethical challenge In this chapter, I discussed four significant ethical challenges that automating decisions and actions with AI presents: (a) moral desensitization; (b) unintended outcomes; (c) surrender of are at an early stage in developing AI applications and applying machine learning and deep learning techniques to improve library services, systems, and operations.

lesk-fragility-2021	Machine learning systems have a set of data for training. of the real problem (if you train a machine translation program solely on engineering documents, there may be a lot of training data, including many noisy points, and the program may decide on Many popular magazines have discussed this problem; Forbes, for example, had an explanation of how the choice of datasets can produce a biased result without any deliberate attempt to used to suggest malicious creation of training data or examples of data designed to deceive machine learning systems. blood pressure, and lower blood pressure decreases the risk of heart attacks." Then I have to explain that the paper evaluates 32 possibilities (prior/current ownership × cats/dogs × 4 medical compare the performance of machine learning systems for medical diagnosis with actual doctors If a program is constantly learning from new data, there is no list of previously fixed failures to

lucic-towards-2021	Reading Chicago Reading1 is a grant-supported digital humanities project that takes as its object the "One Book One Chicago" (OBOC) program2 of the Chicago Public Library. A related question is the focus of this paper: by associating place names with sentiment scores in Chicago-themed OBOC The HathiTrust research portal permits the extraction of non-consumptive features of the works included in the digital library, even those that are still under copyright. The place names extracted from our three Chicago-setting OBOC books allowed us to focus Our interest in creating a dataset of Chicago place names extracted from literature led us to Kaser's book contains several indexes that can serve as sources of labeled data or instances in which Chicago locations are mentioned. the index as a source of already-labeled data for Chicago place names. associated sentiment scores for Chicago place names in the three OBOC selections centered on

maceli-what-2015	in developer-oriented positions within LIS, this paper reports the results of a text analysis of a large a popular mailing list covering the intersection of technology and library work, the Code4lib The results of the text analysis of this dataset suggest the currently vital technology skills For those seeking employment in a technology-intensive position within library and information understand common technology job requirements is relevant to current students positioning • What are the most common job titles and skills sought in technology-focused LIS positions? relevant to technology in the library domain and to validate the job listing information and Figures 3 and 4 detail the most common terms used in position titles across librarian and Job Listing Terms Correlated with "XML" (most popular tag). Job Listing Terms Correlated with "Javascript" (Second Most Popular Tag), including Job Listing Terms Correlated with "Metadata" (fourth most popular tag). Job Listing Terms Correlated with "Developer."

mathews-think-2012	we need to implement big new ideas, otherwise emphasis on community building, connecting people, engaging students, assisting researchers, and advancing Changes to the idea, product, or service are expected and required. Thinking like a startup means getting your idea out quickly. A variation of this model comes from the user experience domain and argues to shift the order of steps to Learn, Build, After learning about any potential problems, address those needs by either tweaking the idea or pivoting the concept. a new service, developing a new space, or reviewing current workflows, build this continuous feedback loop into your The NCSU Libraries have long practiced this good entrepreneurial development. Let's look at two examples: A library I worked in wanted to offer a flexible, needs and further associates the library with user By focusing on relationship building instead of service excellence, organizations can uncover new Libraries need less assessment and more R&D.

matienzo-lighting-2020	Stanford Libraries hosted Lighting the Way: A National Forum on Archival Discovery and Delivery, which activities also were designed to serve the goals of the Forum, which included 1) allowing participants participants during group activities; and 3) providing a platform for engagement with the project. participants, their responsibilities, their work related to archival discovery and delivery, and successes Web Privacy Forum,8 which used collaborative design exercises to allow participants to help actively project team and facilitators to structure activities around groups of varying sizes, allowing for time for Map, Forum Schedule, Community Agreements and Code of Conduct, Project Overview, Participant to think ten times bolder to generate and identify "big ideas" to improve archival discovery and delivery, The project team asked participants, livestream viewers, and facilitators to provide additional feedback Description projects: Forum participants also identified ways in which they could change, enhance, or 

morgan-bringing-2021	advent of computers, the idea of sharing cataloging data as MARC (machine readable cataloging) the full text of its collections to enhance bibliographic description and resulting public service. ability to save, organize, and retrieve data; on the whole, the library profession does not understand the concept of a "data structure." For example, tab-delimited files, CSV (comma-separated the use of data structures, computers store and retrieve information. Libraries use computers to store, organize, preserve, and disseminate the gray literature of our time, and we call these systems "institutional repositories." In all Using such a process, there are really only four different types of machine learning: classification, clustering, regression, and dimension reduction. Given a set of previously classified menus, one could create a model There are many possible ways to enhance library collections and services through the use of machine learning. of plain text files and an integer, Topic Modeling Tool will create a weighted list of latent themes

narlock-digital-2021	As academic library support services for digital scholarship activities continue to expand and evolve, large for novel collaborations with academic library specialty research support services, such as digital scholarship centers (DSCs) (e.g., Bryson develop digital projects supporting scholarship and research" (Tzoc, academic libraries, DSCs, and digital preservation activities? preservation have been identified as ideal opportunities for collaboration between scholars, librarians, and information professionals, as library organizations tend to focus on lifecycle management with an increased development of tools to support data and digital project include "The Endings Project," funded by the Social Sciences and Humanities Research Council of Canada (https://projectendings.github. managing research projects, providing data curation and sharing support," leading them to suggest that libraries should "promote and content requires intense curatorial support, librarians, specifically subject selectors and disciplinary curators, are in the best position to provide feedback on digital scholarship projects (Tallman & Work, 2018).

oclc-social-2020	of campus units involved in both the provision and consumption of research support services and others on campus with the skills and expertise that the library brings to research support activities. This position will work with the Office of Institutional Research and DataSpark (Library-based data analytics unit) to identify avenues to increase faculty and researcher The provision of research support services is seldom the responsibility of a single campus unit; campus stakeholders in research support, either as a provider or user, with the goal of making array of professionals working in the library, research development, faculty affairs, communications, potential role as campus stakeholders in research support services. units associated with research administration provide services that help advance the university's in campus partnerships aimed at providing research support services across a diverse university Social Interoperability in Research Support: Cross-Campus Partnerships and the University Research Enterprise

oclc-transitioning-2020	The OCLC Research Library Partners Metadata Managers Focus OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may The OCLC Research Library Partners Metadata Managers Focus Group The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library CONTENTdm Linked Data pilot, OCLC's Shared Entity Management Infrastructure, Library of Metadata describing universities' research data and materials in Institutional Repositories OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78 OCLC Research Library Partnership Web Archiving Metadata Working Group."84 The challenges that OCLC Research Library Partners Metadata Managers Planning Group Planning Group members selected the topics for the OCLC Research Library Partners Metadata 1. OCLC Research Library Partnership Metadata Managers Focus Group. "CONTENTdm Linked Data pilot." https://www.oclc.org/research https://www.oclc.org/research/areas/data-science/metadata-managers.html https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html The OCLC Research Library Partnership Archives and Special Collections Linked Data Review https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

orr-bootleg-2020	in a knowledge base, is how to disambiguate entities that appear rarely in the training data, termed tail Humans use subtle reasoning patterns based on knowledge of entity facts, relations, and types to transfer to other non-disambiguation tasks that require entity-based knowledge: we set a new state-of-the-art in the popular TACRED relation extraction task by 1.0 F1 points and demonstrate up to 8% shows F1 versus number of times an entity was seen in training data for a baseline NED model compared to 3 Bootleg encodes the entity, relation, and type signals as embedding inputs to a unseen entities compared to the two models which respectively use only type and only relation the discriminative entity and more general relation and type signals that are useful for disambiguation. Given an example, we run inference with the Bootleg model to disambiguate named entities and generate

plumb-humanities-2021	Respondents such as Mark Algee-Hewitt pointed out that literary scholars employ computational statistical models in order to reveal something about texts that human readers Machine learning, and word embedding algorithms in particular, may have a unique ability to shift this conversation into new territory, where scholars Acknowledging this helps contextualize machine learning algorithms for text analysis tasks in the humanities, but also highlights data curation challenges This naturally raises questions about how machine learning algorithms like word embeddings are implemented for text analysis, and how they Based on the potential for word embeddings to model semantic spaces for different corpora and compare the distribution of terms, the next step was to build a corpus of non-canonical Designing humanities research with novel word embedding models stands to widen the territory where machine learning engineers look for conceptual concepts Systematic data curation, combined with word embedding algorithms, represents a new interpretive system for literary scholars.

prudhomme-taking-2021	Combining automatic processes to assist in supporting inventory management with a focus on descriptive metadata, a machine learning solution could help alleviate time-consuming and relatively expensive metadata tagging tasks, Deep learning neural networks are more effective in feature detection as they are able to solve complex problems such as image classification with greater accuracy when trained with large datasets. For images, how can archives build a data-labeling pipeline into their digital curation workflow that enables machine learning of collections? machine learning is only good so long as value is added, archives and libraries will need to think As deep learning applications will only be as effective as the data, archives and libraries should expand their Along with greater computing capabilities, artificial intelligence could be an opportunity for libraries and archives to boost the discovery of their digital collections by pushing text and image

schindel-economic-2020	"Project collections" (those managed by the researchers who obtained them for restricted use) and their costs and benefits were considered too varied for standard methodologies that assess costs and (through proper maintenance and preservation), intramural research and by extramural users (through online documentation and user access programs), users in other disciplines (through data curation), and the general public (through education and outreach). Departments and agencies can use the methods described here for evidence-based decisions concerning policies and management practices for their institutional collections. Some renewable collections provide users with living organisms for research and development of agricultural products and industrial processes (see Boxes 2 and 3). Examples include the USDA's National Plant Germplasm System (NPGS; see Box 2) and Agricultural Research Service's (ARS) Culture Collection (Box 3), and the CDC's NHANES Biospecimen Program (see Box 4). Evenson and Gollin, 1997; Güereña, Lehmann, Thies, Enders, Karanja and Neufeldt, 2015).

stanford-ai-2021	But the second most important originators are different: In the United States, corporate-affiliated research represents 19.2% of the total publications, whereas government is the second most • After a two-year increase, the number of AI faculty departures from universities to industry jobs in North America • The number of papers with ethics-related keywords in titles submitted to AI conferences has grown since 2015, though the average number of paper titles matching ethics-related keywords at major AI conferences remains https://www.microsoft.com/en-us/research/publication/an-overview-of-microsoft-academic-service-mas-and-applications-2/ 6 For more insights on the adoption of AI and robots by the industry, read the National Bureau of Economic Research working paper based on the 2018 Annual Business Survey by the U.S. Census Bureau, Number of Artificial Intelligence/Machine Learning New PhDs 6 New AI PhDs in this section include PhD graduates who specialize in artificial intelligence from academic units (departments, colleges, or schools within universities) of computer science, computer

white-entrepreneurship-1987	thought to be too often wrong to be tolerated by the organizational structure, but perhaps more importantly we believe that organizational decision making now tends toward committee approaches, consensus, participation, and consultation. as Thomas Watson, Jr., though, sought ways to balance individual initiative and innovation with the characteristics of a large slow-moving bureaucratic structure. not have to leave and that they can work within the organization.' Pinchot's ideas are worth examining, because, as will be argued later, probably no profession has a greater need for this newly termed intrapreneur It was he who noted that managers needed to be innovative to avoid the risk of being boring, because concepts of greater participation, consensus seeking, and committee decision structures, when Drucker now suggests that such an approach has For libraries, the developments in management practice, and our own approaches to seeking Drucker and others dealing with nurturing innovation and entrepreneurship within the organization structure, there are a number of concerns that

wiegand-cultures-2021	traditional role, librarians in the 20th century added a new function—discovery—teaching people to find and use the library's collected scholarship. learning in the library as the next step beyond collecting, with librarians instructing on information infrastructure with the goal of empowering library users to find, evaluate, and use scholarly go far beyond local library collections to a global perspective and normative practice of participation at scale in innovative emerging technologies such as Machine Learning. start by using Machine Learning tools to automate alerts of new content in a narrow area of interest and help researchers at all levels find and focus on problem-solving. A library that adapted Machine Learning as an innovation technology would improve its practices; add new services; choose, use, and license collections differently; utilize all spaces for learning; and role model innovative leadership. opening local collections to discovery and use in order to create new knowledge through digitization and semantic linking, with cross-disciplinary technologies to augment traditional research

yoon-what-2020	Many Cloud Architects and Cloud Engineers are somewhat confused to grasp the difference between Azure Security Center (ASC) and Azure Sentinel. Azure Security Center is a unified infrastructure security management system that strengthens the security posture of your data centers, and provides advanced threat protection across your hybrid workloads in the cloud — whether they're in Azure or not — as well as on-premises. Microsoft Azure Sentinel is a scalable, cloud-native, security information event management (SIEM) and security orchestration automated response (SOAR) solution. Security Center is one of the many sources of threat protection information that Azure Sentinel collects data from, to create a view for the entire organization. Once the Security Center data is in Azure Sentinel, customers can combine that data with other sources like firewalls, users, and devices, for proactive hunting and threat mitigation with advanced querying and the power of artificial intelligence. Microsoft will continue to invest in both Azure Security Center and Azure Sentinel.

zkie-zero-2020	From Zero to Hero: Human-In-The-Loop Entity Linking in Low Resource Domains requires handling noisy texts, low resource settings and domain-specific KBs. Existing approaches are mostly inappropriate for this, as low-resource settings: often, no data annotated exists; coverage of open-domain knowledge bases Therefore, entity linking is frequently performed against domain-specific knowledge bases (Munnelly and Lawless, task or to train machine learning models for automatic annotation. Entity Linking describes the task of disambiguating mentions in a text against a knowledge base. been performed on entity linking against domain-specific knowledge bases. base and use full text search to retrieve candidates of the entities are in DBPedia), we manually created a domain specific knowledge base for this data We evaluate the performance of our Levenshtein-based recommender that suggests potential annotations to users (Table 3). In the future, we want to investigate more powerful recommenders, combine interactive entity linking with knowledge base completion and use online