Towards connecting scholarly editions to corpora in the
LiLa (Linking Latin) Knowledge Base of linguistic resources

Greta Franzini
greta.franzini@unicatt.it

Conference | Wuppertal, Germany | 17 December 2019

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme - Grant Agreement No. 769994.



Table of Contents

Introduction
  Computational Linguistics
  Linked Data and Linguistic Linked Open Data

LiLa: Linking Latin

Scholarly Editions
  Linked Data
  Connection to LiLa

Conclusion

Greta Franzini | CIRCSE, Università Cattolica del Sacro Cuore



Computational Linguistics
Definition

Computational Linguistics is an interdisciplinary field concerned with the processing of
language by computers. (Mitkov, 2004)

Computational Linguistics: develops computational methods and formalisms to answer
linguistic questions.

Natural Language Processing: solves engineering problems arising from the analysis of
natural language text.

(adapted from Eisner, 2016)


Computational Linguistics
Linguistic Resources and NLP Tools

Automatic language processing requires linguistic resources and NLP tools


Computational Linguistics
Linguistic Resources

Dictionary: collection of words and phrases with information about them

Lexicon: dictionary/list of words, typically for computational purposes

Thesaurus: words grouped together according to similarity of meaning

Ontology: inventory of objects or processes in a domain, together with a specification of some
or all of the relations that hold among them, generally arranged as a hierarchy

Corpus: a body of linguistic data in machine-readable form, gathered according to some
principled sampling method and criterion. A syntactically/semantically-annotated
corpus is known as a treebank

Grammar: systematic analysis of the structure of a language


Computational Linguistics
NLP Tools

Tokeniser: performs tokenisation, i.e. determines the boundaries of individual tokens in a text
(words, numbers, punctuation)

Tagger: assigns tags to words or expressions in a text (e.g. part of speech, named entity)

Parser: analyses a sentence or other string of words into its constituents, producing a
parse tree of syntactic relations between them

Lemmatiser: groups the inflected forms of a word together under a base form and recovers the base
form from an inflected form. Can be morphological (no context, hence ambiguity) or
morpho-syntactic (context, hence no ambiguity).

... and more.
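
To make the difference between tokenisation and context-free (morphological) lemmatisation concrete, here is a minimal Python sketch. The lexicon `TOY_LEXICON` and the functions `tokenise` and `lemmatise` are invented for illustration and are not part of LEMLAT or any other tool named in this talk.

```python
import re

# Hypothetical toy lexicon: inflected form -> candidate lemmas.
# "amor" can be the noun amor or a passive form of the verb amo,
# so a lookup without context returns both candidates.
TOY_LEXICON = {
    "amor": {"amor", "amo"},
    "vincit": {"vinco"},
    "omnia": {"omnis"},
}

def tokenise(text):
    """Split a text into word/number tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def lemmatise(token):
    """Context-free lookup: return all candidate lemmas, sorted."""
    return sorted(TOY_LEXICON.get(token.lower(), set()))

tokens = tokenise("Amor vincit omnia.")
analyses = {token: lemmatise(token) for token in tokens}
```

A morpho-syntactic lemmatiser would additionally use the sentence context to pick one of the two candidates for "Amor".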


Computational Linguistics
Linguistic Resources and NLP Tools for Latin

Corpora: Perseus Digital Library, Eurasian Latin Archive, Corpus Grammaticorum Latinorum,
Croatiae auctores Latini, Archivio della Latinità Italiana del Medioevo, Musisque
Deoque, Patrologia Latina, PHI Classical Latin Texts, Index Thomisticus Treebank,
PROIEL Latin Treebank, etc.

Lexica: Vallex, IT-VaLex, Latin WordNet, Oxford Latin Dictionary, Du Cange Glossarium
Mediae et Infimae Latinitatis, Thesaurus Linguae Latinae, Thesaurus Formarum
Totius Latinitatis, Lexicon musicum Latinum medii aevi, etc.

NLP Tools: LEMLAT, Whitaker’s Words, LatMor, TreeTagger, Collatinus, UDPipe, Chiron, etc.

Latin is the most resourced historical language


Computational Linguistics
Problems with Linguistic Resources and NLP Tools

These resources and tools, however, are:

- Scattered and isolated
- Developed for specific tasks
- Built on different annotation schemas and conceptual models

No interoperability!

Interoperability:
- Increases productivity
- Improves efficiency
- Enables more effective knowledge organisation


Linked Data
A solution

Linked Data: Semantic Web technology

Semantic Web: set of standards and best practices to share data across the web, and to help
machines make inferences and understand the meaning of this data.

Advantages:
- Connects and defines relationships between heterogeneous datasets
- Aggregates distributed datasets to reduce dispersion and increase (serendipitous) knowledge
  discovery (i.e. discoverability of the resource)
- Allows us to build systems that can reason across the web and answer complex questions


Linked Data
How does it work?

Linked Data technology describes data as triples (statements) of the form SUBJECT, PREDICATE, OBJECT:

- The OBJECT of one triple can be the SUBJECT of another triple
- Nodes and edges are assigned persistent Uniform Resource Identifiers (URIs) for
  unambiguous identification across the web
- Relationships are described by ontologies or vocabularies of knowledge representation
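
The chaining of triples can be sketched in a few lines of Python. This is only an illustration: the `http://example.org/` URIs, the predicate names and the helper `follow` are assumptions made up for this example, not LiLa identifiers.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Hypothetical namespace; real Linked Data would use persistent URIs.
EX = "http://example.org/"

triples = [
    # The object of the first triple ...
    Triple(EX + "token/1", EX + "hasLemma", EX + "lemma/amo"),
    # ... is the subject of the second one.
    Triple(EX + "lemma/amo", EX + "hasPartOfSpeech", EX + "pos/verb"),
]

def follow(start, graph):
    """Walk from `start`, taking the first outgoing edge at each node."""
    path, seen, node = [start], {start}, start
    while True:
        nxt = next((t.obj for t in graph if t.subject == node), None)
        if nxt is None or nxt in seen:
            return path
        path.append(nxt)
        seen.add(nxt)
        node = nxt

path = follow(EX + "token/1", triples)
```

Because every node is identified by a URI, the second triple could just as well live in a different dataset from the first.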


Linked Data
Many domains


https://lod-cloud.net/



Linguistic Linked Data
Linguistics


https://linguistic-lod.org/llod-cloud



LiLa: Linking Latin
At a glance

- Funding: ERC Consolidator Grant, 2M EUR
- Duration: 2018-2023
- Team: 9 staff + student assistants
- Website: https://lila-erc.eu

- Objective: Knowledge Base of Linguistic Resources & Natural Language Processing Tools
- Method: Linked Data paradigm (FAIR principles)
- Purpose: Foster resource/data interoperability


LiLa: Structure
Lemmas as connectors


LiLa: Structure
Lemma bank

Lemma bank of LEMLAT, our morphological analyser. Over 150,000 lemmas, including:

- Classical: 43,432 lemmas from Georges & Georges (1913-1918), Glare (1982), Gradenwitz (1904)
- Medieval and Late: 82,556 lemmas from Du Cange (1883-1887)
- Onomasticon: 26,250 lemmas from Forcellini (1940)

http://www.lemlat3.eu/


LiLa: Structure
Lemma bank

[Diagram: Corpus X and LiLa]

Corpus lemmas (LC) can’t connect to LiLa lemmas (LL) when:
- LC doesn’t exist in LL
- LC is a different written representation of an LL, e.g. annuncio vs. adnuntio
- LC is a lemma variant of an LL, e.g. anthropomorphita vs. anthropomorphitae (pluralia tantum)
- LC is a pseudo-lemma, i.e. a non-Latin word
- LC results from a lemmatisation error, e.g. pbiectum instead of obiectum

Manual fix
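
The cases above can be illustrated with a small Python sketch that links corpus lemmas to a lemma bank and sets aside whatever fails to match for manual review. The dictionary `LEMMA_BANK`, its `lemma:` identifiers and the function `link` are hypothetical. The sketch also assumes that one lemma may carry several written representations, and it does not reflect how LiLa actually stores or matches lemmas.

```python
# Hypothetical slice of a lemma bank: written representation -> lemma identifier.
# Two written representations may point to the same lemma.
LEMMA_BANK = {
    "adnuntio": "lemma:1",
    "annuntio": "lemma:1",
    "obiectum": "lemma:2",
}

def link(corpus_lemmas, lemma_bank):
    """Return (linked, unmatched): matches by written form, and leftovers."""
    linked, unmatched = {}, []
    for lemma in corpus_lemmas:
        key = lemma.strip().lower()  # light normalisation only
        if key in lemma_bank:
            linked[lemma] = lemma_bank[key]
        else:
            unmatched.append(lemma)  # candidates for a manual fix
    return linked, unmatched

linked, unmatched = link(
    ["adnuntio", "annuncio", "pbiectum", "Obiectum"], LEMMA_BANK
)
```

Here "annuncio" (a written representation missing from the toy bank) and "pbiectum" (a lemmatisation error) both end up in `unmatched`, which is where manual correction would start.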


LiLa: Structure
Conceptual and structural interoperability

To build and define relationships between datasets (triples), LiLa reuses the following ontologies:

- OntoLex (Lemon): for lexical information
- OLiA (Ontologies of Linguistic Annotation) bundle: for part-of-speech tagging
- NIF (NLP Interchange Format) and POWLA (OWL + PAULA, Potsdamer Austauschformat
  Linguistischer Annotationen): for corpus annotation
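
As a rough illustration of what reusing OntoLex looks like, the following Python sketch writes four statements about a lexical entry and its canonical form in N-Triples syntax. The OntoLex terms (`LexicalEntry`, `Form`, `canonicalForm`, `writtenRep`) come from the published OntoLex-Lemon vocabulary. The `http://example.org/` namespace and the function `describe_entry` are assumptions for this example and do not reproduce LiLa's actual URIs or data model.

```python
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, not LiLa's

def describe_entry(entry_id, lemma_id, written_rep, lang="la"):
    """Return N-Triples lines linking a lexical entry to its canonical form."""
    entry = EX + "entry/" + entry_id
    form = EX + "lemma/" + lemma_id
    return [
        f"<{entry}> <{RDF_TYPE}> <{ONTOLEX}LexicalEntry> .",
        f"<{entry}> <{ONTOLEX}canonicalForm> <{form}> .",
        f"<{form}> <{RDF_TYPE}> <{ONTOLEX}Form> .",
        f'<{form}> <{ONTOLEX}writtenRep> "{written_rep}"@{lang} .',
    ]

lines = describe_entry("adnuntio", "adnuntio", "adnuntio")
```

Entries from different lexica that point to the same form URI become connected through that shared node.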


LiLa: Structure
Triplestore

LiLa = database of triples = triplestore
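
A triplestore can be pictured as a collection of triples that answers pattern queries. The Python sketch below is a toy in-memory version. The prefixed names (`corpus:`, `lexicon:`, `lemma:`, `ex:`) and the function `match` are invented for illustration, and a real triplestore would typically be queried with SPARQL rather than with a Python function.

```python
def match(store, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Hypothetical triples from two different resources sharing one lemma node.
store = [
    ("corpus:token1", "ex:hasLemma", "lemma:amo"),
    ("corpus:token2", "ex:hasLemma", "lemma:vinco"),
    ("lexicon:entry7", "ex:canonicalForm", "lemma:amo"),
]

# Everything, in any resource, that points to the lemma "amo".
hits = match(store, o="lemma:amo")
```

The query returns one corpus token and one lexicon entry, showing how a shared lemma node lets a single question span several resources.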


LiLa: Overview
Resources connected and upcoming connections

- Corpora
  - Index Thomisticus Treebank (Summa contra Gentiles)
  - Dante (700th death anniversary coming up!)

- Lexica
  - Word Formation Latin (Classical Latin)
  - Brill Etymological Dictionary of Latin and the Other Italic Languages
  - Latin WordNet

- NLP tools
  - LEMLAT (lemma bank)

LiLa: Structure
Querying the lemma bank

LiLa: Structure
An example: LOD view of ITTB token lemma prosequor

LiLa: Structure
An example: LOD view of LEMLAT lemma prosequor

LiLa: Structure
Querying corpora

LiLa: Structure
An example: LOD view of ITTB token prosequi

eiusdem autem est unum contrariorum prosequi et aliud refutare sicut
medicina, quae sanitatem operatur, aegritudinem excludit. (ITTB, 1.1.6)

Now it belongs to the same thing to pursue one contrary and to remove
the other: thus medicine, which effects health, removes sickness.
(Trans. Laurence Shapcote)

https://aquinas.cc/la/en/~SCG1
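The link between a corpus token such as prosequi and its lemma prosequor can be pictured with a small Python sketch. The (form, lemma) pairs below were written by hand for part of the sentence above and are not copied from the ITTB annotation; the lemma bank contents, the identifiers and the `hasLemma` property name are all assumptions made for illustration.

```python
# Hand-written (form, lemma) pairs; illustrative, not the ITTB annotation.
tokens = [
    ("unum", "unus"),
    ("contrariorum", "contrarius"),
    ("prosequi", "prosequor"),
    ("refutare", "refuto"),
]

# Hypothetical lemma bank: lemma string -> invented identifier.
# "refuto" is left out on purpose to show an unlinked token.
lemma_bank = {
    "unus": "ex:lemma/1",
    "contrarius": "ex:lemma/2",
    "prosequor": "ex:lemma/3",
}


def link_tokens(tokens, lemma_bank):
    """Return (token-to-lemma triples, forms whose lemma is not in the bank)."""
    links, unlinked = [], []
    for position, (form, lemma) in enumerate(tokens, start=1):
        uri = lemma_bank.get(lemma)
        if uri is None:
            unlinked.append(form)
        else:
            links.append((f"ex:token/{position}", "hasLemma", uri))
    return links, unlinked


links, unlinked = link_tokens(tokens, lemma_bank)
print(links)
print(unlinked)
```

Because every resource points at the same lemma identifiers, the lemma bank acts as the hub through which tokens from different corpora become comparable.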


LiLa: Structure
LodLive interface

https://lila-erc.eu/lodlive/

LiLa: Structure
LiLa as mere reflection

LiLa reflects the annotation granularity of the resources it connects

No data enrichment or further analysis is performed

LiLa: Requirements
Connecting resources in the Knowledge Base

To enter the LiLa Knowledge Base, a textual resource must be:

- Lemmatised
- Part-of-Speech tagged (ideally, using the Universal Dependencies tagset)
- Online!
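The first two requirements can be checked mechanically. The sketch below is one possible pre-flight check in Python, assuming tokens are available as dictionaries with `form`, `lemma` and `upos` keys (a layout chosen for this example, not a LiLa input format). It is stricter than the requirement above, since it flags any tag outside the Universal Dependencies tagset even though that tagset is only the ideal. Whether the resource is online is not checked.

```python
# The 17 universal part-of-speech tags defined by Universal Dependencies.
UPOS_TAGS = {
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
}


def check_token(token):
    """Return a list of problems for one token dict (empty list = no problem)."""
    problems = []
    if not token.get("lemma"):
        problems.append("missing lemma")
    if token.get("upos") not in UPOS_TAGS:
        problems.append("missing or non-UD part-of-speech tag")
    return problems


def check_resource(tokens):
    """Map each problematic form to its list of problems."""
    report = {}
    for token in tokens:
        problems = check_token(token)
        if problems:
            report[token["form"]] = problems
    return report


sample = [
    {"form": "prosequi", "lemma": "prosequor", "upos": "VERB"},
    {"form": "et", "lemma": "et", "upos": "CONJ"},      # not a UD tag
    {"form": "refutare", "lemma": "", "upos": "VERB"},  # lemma missing
]
print(check_resource(sample))
```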

Scholarly Editions
Linked Data

“[...] computational philology seems to be somewhat decoupled from the recent progress
in [Linguistic Linked Open Data]: even though LOD as a concept is gaining significant
popularity in Digital Humanities, existing LLOD standards and vocabularies are not widely
used in this community, and philological resources are underrepresented in the LLOD cloud
diagram [...].” (Chiarcos et al., 2018)

“[...] As of yet only a relatively small number of born-digital editions of [...] Latin texts
exists [...].” (Fischer, 2017)

Of these, only a handful provide (some) data in Linked Data format.

Scholarly Editions
Information layers

Why so few Linked Data-compatible editions of Latin texts? Possible reasons:

- Projects lack the know-how and/or have other priorities
- Scholarly editions are complex objects with many layers of information, including:

  1. Textual, i.e. the transcription (<body>, <ab>, <div>, etc.)
  2. Bibliographic, e.g. properties of the edition (<teiHeader>)
  3. Source, e.g. date, material, scribe, binding, folio count, size, etc. (<teiHeader>)
  4. Linguistic, e.g. lemma, etc. (<app>)
  5. Palaeographic, e.g. abbreviations, ligatures, glyphs, allographs, etc. (<app>)
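To show how these layers sit in different parts of one file, the Python sketch below reads a bibliographic value from the header and token-level linguistic values from the body of a miniature TEI document. The document is invented and is not a complete, valid TEI file. It encodes lemmas on `<w>` elements via the `lemma` and `pos` attributes, which is one common TEI option; a given edition may well encode this layer differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature document; not taken from any real edition.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample edition</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <ab><w lemma="prosequor" pos="VERB">prosequi</w> <w lemma="et" pos="CCONJ">et</w></ab>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Bibliographic layer: read from the header.
title = root.findtext(
    "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", namespaces=TEI_NS
)

# Textual and linguistic layers: read from the word elements in the body.
words = [
    (w.text, w.get("lemma"), w.get("pos"))
    for w in root.iterfind(".//tei:w", TEI_NS)
]

print(title)
print(words)
```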

Scholarly Editions
Information layers

Linked Data support:

- Bibliographic + Textual
  - FABiO (FRBR-aligned Bibliographic Ontology)
  - CiTO (Citation Typing Ontology)
  - DC (Dublin Core)

- Source
  - DM2E (Digitised Manuscripts to Europeana)
  - FRBRoo (FRBR-object oriented), SAWS (Sharing Ancient Wisdoms):
    http://www.ancientwisdoms.ac.uk/media/ontology/sawsOntology.owl

- Linguistic
  - OntoLex, NIF, POWLA, OLiA

- Palaeographic
  - Peter Stokes: DigiPal project (http://www.digipal.eu/about/the-project/)
  - Paolo Monella: VeDPH seminar, 4th December 2019
    (http://www1.unipa.it/paolo.monella/babel2019/)


Scholarly Editions
Linked Data

Example:
- Vespasiano da Bisticci, Letters (not lemmatised/PoS-tagged!):
  http://vespasianodabisticciletters.unibo.it/

Tools:
- TEI-to-RDF converters (e.g. RDF Textual Encoding Framework: http://rdftef.sourceforge.net/)
- Linked Data support for the Edition Visualisation Technology (upcoming talk by Roberto
  Rosselli del Turco and Paolo Monella at AIUCD 2020:
  https://aiucd2020.unicatt.it/aiucd-programma)

Initiatives:
- Workshop Scholarly Digital Editions, Graph Data-Models and Semantic Web Technologies
  (GraphSDE, 3-4.06.2019): http://wp.unil.ch/graphsde/
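What a TEI-to-RDF conversion does in principle can be sketched in a few lines of Python. This is not how the RDF Textual Encoding Framework or any other named tool works; it is a deliberately small illustration that turns lemmatised `<w>` elements into N-Triples lines. The base URI and the `hasLemma` property are placeholders, and the lemma is emitted as a plain string without escaping, which a real converter would have to handle.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
BASE = "http://example.org/edition/"              # placeholder base URI
HAS_LEMMA = "http://example.org/vocab/hasLemma"   # placeholder property


def tei_words_to_ntriples(xml_text):
    """Emit one N-Triples line per <w> element carrying both xml:id and lemma."""
    root = ET.fromstring(xml_text)
    lines = []
    for w in root.iter(TEI + "w"):
        word_id = w.get(XML_ID)
        lemma = w.get("lemma")
        if word_id and lemma:
            lines.append(f'<{BASE}{word_id}> <{HAS_LEMMA}> "{lemma}" .')
    return lines


# Invented fragment: the second word has no lemma and is therefore skipped.
sample = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    '<w xml:id="w1" lemma="prosequor">prosequi</w>'
    '<w xml:id="w2">et</w>'
    '</ab>'
)

for line in tei_words_to_ntriples(sample):
    print(line)
```

The skipped second word mirrors the situation of an edition that is not lemmatised: without that layer there is nothing to connect to a lemma bank.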


Scholarly Editions
Hypothetical (and brutally simplistic) Corpus + Edition Linked Data scenario
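In the same simplistic spirit, such a scenario can be approximated in Python: a corpus token and an edition word both point at one lemma, and the edition word additionally carries philological information. Every identifier and property name below is invented for the sketch and does not reproduce the diagram on the slide.

```python
from collections import defaultdict

# Invented identifiers and property names.
triples = [
    # A corpus token and an edition word pointing at the same lemma.
    ("corpus:token/42", "hasLemma", "lemma:prosequor"),
    ("edition:w/7", "hasLemma", "lemma:prosequor"),
    # Edition-only information: the manuscript in which the word is attested.
    ("edition:w/7", "attestedIn", "manuscript:A"),
    # A token with a different lemma, which must not be joined.
    ("corpus:token/43", "hasLemma", "lemma:refuto"),
]


def group_by_lemma(triples):
    """Group subjects by the lemma they are linked to."""
    groups = defaultdict(list)
    for s, p, o in triples:
        if p == "hasLemma":
            groups[o].append(s)
    return dict(groups)


def witnesses_for_lemma(triples, lemma):
    """Follow lemma -> linked words -> manuscripts attesting those words."""
    words = {s for s, p, o in triples if p == "hasLemma" and o == lemma}
    return sorted(o for s, p, o in triples if p == "attestedIn" and s in words)


print(group_by_lemma(triples))
print(witnesses_for_lemma(triples, "lemma:prosequor"))
```

Starting from the lemma, a user of the corpus reaches the manuscript evidence held in the edition, and a user of the edition reaches every corpus occurrence of the same word.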

Conclusion
Scholarly Editions and Corpora: Mutual benefits

Linguistic corpora:

- provide new forms of access to editions
- provide the bigger picture, i.e. large and diachronic linguistic context

Scholarly editions:
- provide new forms of access to corpora
- provide connections to cultural heritage objects
- provide a philological layer of annotation (textual criticism)

Thanks!
Get in touch

Greta Franzini
CIRCSE, Università Cattolica del Sacro Cuore

greta.franzini@unicatt.it

@ERC_LiLa

https://github.com/CIRCSE

https://lila-erc.eu

Largo Gemelli 1, 20123 Milan, Italy

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme - Grant Agreement No. 769994.

Works cited

- Chiarcos et al. (2018) ‘Towards a Linked Open Data Edition of Sumerian Corpora’, Proceedings
  of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018),
  May 7-12, Miyazaki, Japan. ISBN: 979-10-95546-00-9
- Eisner, J. (2016) How is computational linguistics different from natural language processing?
  https://www.quora.com/How-is-computational-linguistics-different-from-natural-language-processing
- Fischer, F. (2017) ‘Digital Corpora and Scholarly Editions of Latin Texts: Features and
  Requirements for Textual Criticism’, Speculum, 92/S1. DOI: 10.1086/693823
- Mitkov, R. (2004) The Oxford Handbook of Computational Linguistics. Oxford: Oxford
  University Press


LiLa: Structure
Participles vs. adjectives

[Figure: two panels labelled “Ambiguity” and “Solution”]

LiLa: Structure
An example: LOD view of LEMLAT lemma prosequor

SPARQL endpoint with graphical interface to query against the LiLa triplestore.
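For readers unfamiliar with SPARQL, the Python sketch below builds a small query and encodes it as a GET request URL without sending it. The endpoint address is a placeholder, not the address of the LiLa endpoint, and the query assumes that lemmas carry an `ontolex:writtenRep` value stored as a plain literal; if the data uses language-tagged literals, the pattern would need adjusting.

```python
from urllib.parse import urlencode

# Placeholder address; replace with the real endpoint before use.
ENDPOINT = "https://example.org/sparql"


def build_query(written_rep):
    """Return a SPARQL query selecting resources with the given written form."""
    return (
        "PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>\n"
        "SELECT ?lemma WHERE {\n"
        f'  ?lemma ontolex:writtenRep "{written_rep}" .\n'
        "} LIMIT 10"
    )


def build_url(query):
    """Encode the query as a GET request URL with a 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


query = build_query("prosequor")
print(query)
print(build_url(query))
```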

EvaLatin
Participate!

- Evaluation campaign designed following a long tradition in NLP (MUC, ACE, SemEval, CoNLL...)
- Shared tasks, shared training and test data, shared evaluation metrics
- 2 tasks:
  1. PoS tagging
  2. Lemmatisation

- 3 sub-tasks for each task:
  1. Basic
  2. Cross-Genre
  3. Cross-Time
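The slide does not name the shared evaluation metric. As one plausible example for tasks of this kind, the Python sketch below computes token-level accuracy, i.e. the share of tokens whose predicted label matches the gold label. The gold and system lemmas are made up for the illustration.

```python
def accuracy(gold, predicted):
    """Share of positions where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Made-up data: the system gets the third lemma wrong.
gold_lemmas = ["prosequor", "et", "alius", "refuto"]
system_lemmas = ["prosequor", "et", "aliud", "refuto"]

print(accuracy(gold_lemmas, system_lemmas))  # 3 of 4 correct -> 0.75
```

The same function applies unchanged to PoS tags, since it only compares two aligned label sequences.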
