key: cord-0057715-1gkl560g
authors: González, Lino; García-Barriocanal, Elena; Sicilia, Miguel-Angel
title: Entity Linking as a Population Mechanism for Skill Ontologies: Evaluating the Use of ESCO and Wikidata
date: 2021-02-22
journal: Metadata and Semantic Research
DOI: 10.1007/978-3-030-71903-6_12
sha: c22771287b3016531c3d74e0c139f08d1ce9bc9c
doc_id: 57715
cord_uid: 1gkl560g

Ontologies or databases describing occupations in terms of competences or skills are an important resource for a number of applications. Exploiting large knowledge graphs is thus a promising direction for updating those ontologies with entities from such graphs, which may themselves be updated faster, especially in the case of crowd-sourced resources. Here we report a first assessment of the potential of that strategy by matching knowledge elements in ESCO to Wikidata using the NER and document similarity models available in the spaCy NLP library. Results show that the approach may be effective, but the use of pre-trained language models and the short texts included with entities (labels and descriptions) does not result in sufficient quality for a fully automated process.

Competence/skill databases and knowledge bases are an important component of different applications. Notably, matching training and job offers to candidate profiles requires some expression of the available capacities and of the competence or skill gap, which can be used as the basis for building models, e.g. models that match needs in projects [1]. While there are some mature and curated occupational databases that connect job positions to competence components, the lexical resources they contain require, in some sectors, constant updating to adapt to the changing job market as expressed in job offerings, since the latter are nowadays mostly posted and disseminated as semi-structured text. A promising approach for updating those competences is to reuse other, non-occupational or general-purpose open knowledge bases that are curated as crowd-sourced resources, for example Wikipedia-related projects. This could reduce update time, enrich the databases, and also support other applications or related functionality that exploit the knowledge graphs provided by such general-purpose resources.

Entity linking techniques thus become a promising approach to complement expert curation in occupational databases with entities matched in open, general-purpose knowledge graphs. However, this requires an assessment of the effectiveness of the available tools with regard to the quality and usefulness of the links produced. In this paper, we present the results of an experiment in entity linking for occupational databases. Concretely, we report on the use of state-of-the-art entity linking algorithms between ESCO, the European Skills, Competences, Qualifications and Occupations ontology [3], and Wikidata. The rationale for using ESCO is that its structure of skills includes fine-grained knowledge items, which are more likely to produce matches useful for bringing more elements into the database. For example, a match of a programming language or some concrete industrial machine may be used to extract more potential knowledge items by traversing Wikidata relations, and hopefully some of them would reflect novel or recent skills that have been incorporated into Wikidata as part of the continuous process of crowd-sourcing by volunteer curators.
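To make this traversal idea concrete, the following is a minimal sketch, not part of the experiment reported in this paper, of how the class of a matched entity could be expanded in Wikidata to surface further candidate knowledge items. The choice of the class "programming language" (Q9143), the query shape and the user-agent string are illustrative assumptions.

import requests

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Starting from the class of a matched entity (here "programming language",
# Q9143, chosen only for illustration), list other instances of that class
# that could be reviewed as candidate knowledge items for the skill ontology.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q9143 .   # instance of: programming language
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 50
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "skill-ontology-linking-sketch/0.1 (example)"},
)
response.raise_for_status()

for binding in response.json()["results"]["bindings"]:
    print(binding["item"]["value"], binding["itemLabel"]["value"])

Each returned item would still need the manual review discussed later in the paper before being considered for inclusion in the skill database.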
The rest of this paper is structured as follows. Section 2 provides background information on occupational models and ESCO, and briefly surveys related research. Section 3 describes the materials and methods used and their rationale. Then, results are discussed in Sect. 4. Finally, conclusions and outlook are provided in Sect. 5.

Occupational databases containing competences and skills have been developed in recent years, principally as a way to support statistics and policy on the labour market. These databases are typically of national scope, and follow diverse schemas for the description of competences and skills. The European Commission is developing ESCO (European Skills, Competences, Qualifications and Occupations) together with stakeholders such as employment services, employer federations, trade unions, and professional associations. ESCO [3] is an attempt to provide vocabularies for the labour market, with concepts modelled as subclasses of SKOS concepts. It covers three different domains: occupations; knowledge, skills and competences; and qualifications. Here we are concerned with the second; concretely, we deal with the definition of ESCO skills.

The ESCO skills pillar distinguishes between (a) skill/competence concepts and (b) knowledge concepts by indicating the skill type. There is, however, no distinction between skills and competences. Since skill/competence concepts are usually short phrases describing some work-related ability or performance, we focus here on mapping only "knowledge"-type skills, as these in many cases contain proper nouns (e.g. names of computer languages, software tools or machines) that are better candidates for an unambiguous mapping to resources in publicly available knowledge graphs. A central use case for ESCO is matching job offers [2], and for that task, having a rich and updated list of concrete entities is critical. While ongoing editorial work of the ESCO Reference Groups was the primary method for initial content creation, mining external resources is considered a potential method for updates.

Previous work has already combined ESCO with other models or assets for particular purposes. For example, Sibarani et al. [7] combine ESCO and Schema.org for the task of job market analysis. Shakya and Paudel [6] use ESCO in candidate matching as a schema to integrate disparate data. However, to the best of our knowledge, enriching ESCO with open knowledge graphs has not been addressed in previous work.

The method for linking skills consisted of three steps: entity recognition, entity linking, and extraction of candidate entities. A pre-trained Named Entity Recognition (NER) model (described below) was used for the first step. First, the entities obtained were filtered based on manual inspection of each of the matchings and their type. Then, the filtered skills were matched against a Wikidata dump. Finally, the resulting Wikidata resources matched were examined manually, and a final extraction step involved a search for related instances that were candidates to be added to ESCO.

The NER model provided in spaCy is based on state-of-the-art neural models [8] that use convolutional networks built on GloVe vectors. We used the en_core_web_lg model, trained on the large OntoNotes 5 corpus, which comprises various genres of text (news, conversational telephone speech, weblogs, Usenet newsgroups, broadcast, talk shows). The NER F-score reported in the spaCy documentation is 85.36.
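As an illustration of the entity recognition step, the following minimal sketch runs the en_core_web_lg model over a single ESCO-style skill text. The example text and the set of entity types kept are illustrative assumptions, not the exact configuration used in the experiment.

import spacy

# Assumes the model has been installed with:
#   python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

# Hypothetical "knowledge"-type skill text (preferredLabel plus description),
# loosely modelled on ESCO entries; not copied from the actual release.
skill_text = (
    "ABAP. The techniques and principles of software development, such as "
    "analysis, algorithms, coding and testing in the ABAP programming language."
)

doc = nlp(skill_text)

# Entity types likely to denote concrete, linkable items; in the paper the
# filtering was instead based on manual inspection of the matchings and types.
KEEP_TYPES = {"ORG", "PRODUCT", "LANGUAGE", "WORK_OF_ART"}

candidates = [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in KEEP_TYPES]
print(candidates)

The entity strings surviving such a filter are the ones subsequently looked up in Wikidata, as described next.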
The documents for the matching were the result of concatenating the preferredLabel, description and altLabels fields found in the file containing the skills obtained from the ESCO website (version 1.0.3), in English. Only skills of ESCO type knowledge were used for the matching, since the starting assumption is that names of concrete entities appear in that kind of ESCO resource. A file with the matchings was produced, including information on the matched skill and the text and type of the entity identified.

Entity linking was carried out by disambiguating entities in the large Wikidata knowledge graph [4]. We used SPARQL queries that match the strings of the terms found, looking for entities whose descriptions are similar to the preferredLabel and description fields extracted from ESCO. A direct FILTER query on labels is not feasible, as the queries time out; however, Wikibase provides a way of using the MediaWiki API, with all labels indexed by words, as in the following query (where