Towards a Comprehensive Assessment of the Quality and Richness of the Europeana Metadata of Food-related Images
Yalemisew Abgaz¹, Amelie Dorn², José Luis Preza Díaz² & Gerda Koch³
¹Adapt Centre, DCU; ²ACDH-CH OeAW; ³Europeana Local - AT
AI4HI-2020 Virtual Workshop @ LREC2020, 26.05.2020
https://drive.google.com/open?id=1NenlAcA4q298-lzb4cgIZe3mjpTnxlaT
https://chia.acdh.oeaw.ac.at/ai4hi-2020-workshop/
https://lrec2020.lrec-conf.org/en/

Background: ChIA
• Interdisciplinary Digital Humanities project (2019-2021)
• Involved expertise:
  - Digital Humanities, AI & NLP (ACDH-CH OeAW, AT)
  - Semantic technologies (Adapt Centre, IE)
  - Cultural image aggregation (Europeana Local - Österreich, AT)
• Project aim & results: the ChIA system, enabling increased access to and analysis of cultural (food) images for content providers and for educational purposes

Background: Europeana data set
Total: 58.6 million digital objects
Includes: 34.2 million digital images from 3,500 institutions in 42 countries

The Problem: Dataset
Selection based on the food context of the images: 42,969 images (available under Free Access licenses) were selected as various sets ("baskets") for later download and analysis of their metadata and images.

The Problem: Metadata
Although descriptions are available, they seldom describe in detail what is depicted in the images. In most cases, content descriptions use iconographic phrases such as "fruits", "flowers" or "still life".

The Problem: Vocabularies
- Some institutions deliver metadata to Europeana that already includes vocabulary URIs.
- Europeana enriches semantic connections with vocabularies such as AAT, ULAN, Iconclass, VIAF and LCSH, e.g.
  http://data.europeana.eu/concept/base/222
  http://iconclass.org/41A671
- No specific food- or drink-related vocabularies are used.
- The use of semantics is irregular across the dataset.
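The irregular use of vocabularies can be made measurable by checking which vocabulary namespace each metadata value points to. The following is a minimal sketch, not the ChIA implementation: it assumes each record is reduced to a hypothetical mapping of record id to a list of subject values, and the prefix table is an assumption (only the Europeana and Iconclass prefixes follow the example URIs on the slide).

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical prefix table mapping URI namespaces to vocabulary names.
# The Europeana and Iconclass prefixes follow the slide's example URIs;
# the others are assumed Linked Data namespaces for AAT, ULAN, VIAF, LCSH.
VOCABULARY_PREFIXES = {
    "http://data.europeana.eu/concept/": "Europeana",
    "http://iconclass.org/": "Iconclass",
    "http://vocab.getty.edu/aat/": "AAT",
    "http://vocab.getty.edu/ulan/": "ULAN",
    "http://viaf.org/viaf/": "VIAF",
    "http://id.loc.gov/authorities/subjects/": "LCSH",
}


def vocabulary_of(value: str) -> Optional[str]:
    """Return the vocabulary a value links to, or None for plain literals."""
    if value.startswith("https://"):
        value = "http://" + value[len("https://"):]  # treat both schemes alike
    for prefix, name in VOCABULARY_PREFIXES.items():
        if value.startswith(prefix):
            return name
    return None


def vocabularies_per_record(records: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each record id to the set of vocabularies its values link to."""
    return {
        rid: {v for v in map(vocabulary_of, values) if v is not None}
        for rid, values in records.items()
    }


def usage_summary(records: Dict[str, List[str]]) -> Dict[str, object]:
    """Count records per vocabulary and the share of records with no URI at all."""
    per_record = vocabularies_per_record(records)
    counts = Counter(v for vocabs in per_record.values() for v in vocabs)
    unlinked = [rid for rid, vocabs in per_record.items() if not vocabs]
    share = len(unlinked) / len(per_record) if per_record else 0.0
    return {"records_per_vocabulary": dict(counts), "share_without_vocabulary": share}


# Hypothetical sample: subject values of three records.
sample = {
    "rec1": ["http://data.europeana.eu/concept/base/222", "still life"],
    "rec2": ["http://iconclass.org/41A671", "fruits"],
    "rec3": ["flowers"],
}
print(usage_summary(sample))
```

A high share of records without any vocabulary URI, or counts spread unevenly across vocabularies, is one simple signal of the irregularity described above.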
The Problem: The current image collection
- mostly has metadata focused on bibliographic and format-related information, but lacks domain-specific metadata
- most records are not interlinked on the basis of content (only when shared vocabularies are used)
- Thus, the current metadata needs to be analysed with respect to:
  - the quality of the current metadata
  - the use of multiple domain-specific vocabularies
  - the gap between what the image depicts and what the metadata expresses

Proposed Solution
Analyse the richness of the metadata using:
- a quantitative approach, using objective quality assessment metrics
- a qualitative approach, using expert judgement on the expressiveness of the metadata
Semantic enrichment to fill the gap:
- computer vision
- semantic annotation

Quality Analysis Metrics
Metrics in four categories are selected:
- Contextual: indication of used vocabularies
- Intrinsic: extensional conciseness
- Accessibility: links to external Linked Data (LD) providers
- Representational

Semantic Richness Analysis
Semantic richness: the availability of multiple descriptors of a resource, particularly descriptors of the main concepts the resource depicts.
More semantics for this image:
- Fruits: apple, grapes; rose flower?, etc.
- Objects: vase, bowl?
- Culture: what culture does it represent? Rich or poor?
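Semantic richness and the intrinsic metric can be illustrated with a small counting sketch. It is hypothetical throughout: records are assumed to be plain dicts of field name to values, the choice of dc:subject and dc:type as the "content" fields is our assumption, and extensional conciseness is computed in one common formulation (unique records divided by total records), which may differ from the metric definition used in the study.

```python
from typing import Dict, List, Set, Tuple

# Assumption: these fields carry descriptors of what the image depicts;
# bibliographic and format fields (creator, date, format, ...) are ignored.
CONTENT_FIELDS = ("dc:subject", "dc:type")

Record = Dict[str, List[str]]


def _normalise(value: str) -> str:
    """Lower-case literal labels but keep URIs unchanged (they are case-sensitive)."""
    v = value.strip()
    return v if v.startswith(("http://", "https://")) else v.lower()


def content_descriptors(record: Record) -> Set[str]:
    """Distinct, normalised, non-empty values of the content fields."""
    return {
        _normalise(value)
        for field in CONTENT_FIELDS
        for value in record.get(field, [])
        if value.strip()
    }


def semantic_richness(record: Record) -> Tuple[int, int]:
    """Return (number of distinct descriptors, how many of them are URIs)."""
    descriptors = content_descriptors(record)
    linked = sum(1 for d in descriptors if d.startswith(("http://", "https://")))
    return len(descriptors), linked


def extensional_conciseness(records: List[Record]) -> float:
    """Unique records / total records; identical field values count as duplicates."""
    if not records:
        return 1.0
    fingerprints = {
        tuple(sorted((k, tuple(sorted(v))) for k, v in r.items())) for r in records
    }
    return len(fingerprints) / len(records)


# Hypothetical still-life record with an iconographic subject and one Iconclass URI.
record = {
    "dc:subject": ["fruits", "Fruits", "http://iconclass.org/41A671"],
    "dc:type": ["painting"],
    "dc:format": ["jpeg"],
}
# The same record after adding descriptors of the kind listed on the slide.
enriched = {**record, "dc:subject": record["dc:subject"] + ["apple", "grapes", "vase"]}

print(semantic_richness(record), semantic_richness(enriched))
print(extensional_conciseness([record, record, enriched]))
```

Counting only content fields keeps bibliographic completeness from masking a lack of depiction-level descriptors; the number of linked descriptors doubles as a simple indicator for the contextual and accessibility metrics.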
Results So Far
Our initial analysis shows that the metadata:
- is rich in bibliographic information
- provides labels in multiple languages
- however, lacks semantic richness

Our Current Work
- Use of selected vocabularies to quantify semantic richness
- Analysis of the images with computer vision has the potential to address the richness problem
- Preparation of a training set is underway

Conclusion
- It is important to provide quality metadata to improve the search and retrieval of historical images
- Semantic richness is key to the search and exploration of historical images
- Understanding the gap is crucial for semantic annotation
- Computer vision combined with expert annotation and evaluation has the potential to improve both semantic richness and quality

Thank you for listening!
Any questions? Or ideas for collaboration?
Yalemisew.Abgaz@adaptcentre.ie
Amelie.Dorn@oeaw.ac.at
kochg@europeana-local.at
#chia4dh @yalemisew @adooorn @Europeanaeu @jlprezadiaz
Check out our website → https://chia.acdh.oeaw.ac.at