1 No text – no mining. And what about dirty OCR?
Amir Moghaddass Esfehani
Campus Library
ADHO DH2019 Workshop "Towards Multilingualism In Digital Humanities: Achievements, Failures And Good Practices In DH Projects With Non-latin Scripts"

2 No text – no mining. And what about dirty OCR?
• OCR
• Metadata
• Tools & APIs
No text – no mining. And what about dirty OCR? ADHO 08.07.2019 Amir Moghaddass Esfehani

3 Dirty OCR: Layout & Text
㊅ @ 問 靈你’身栎物肩 “ 1 , ? \ ^ , 41安5 ~ 10, 與I 神的 蓋自神你 也^重孤确触&何^)! 魂 何 耶 0 原 耶。 荷本狗' 0 耶。造等答白, 0我的'曰,小 愁的物眞子 答爲 承 是 魂 也。 地如 ~萬此 ^ # ^ |+之教 之神也。 人
Precision = 0.448
Recall = 0.371
F-measure = 0.272
Error rate = 58.3%

4 "Discovery" Of A Chinese Ice Age
夾 襖綿 冰

5 "Discovery" Of A Chinese Ice Age
夾 + 冰 + 綿 + 襖
1750
來 + 永 + 秀 + 澳
| | | |

6 Metadata
嚴如熤 Yan Ruyi: 苗防備覽: [22卷] Miao fang bei lan [22 juan]. 紹義堂, Daoguang 23 [China, 1843].
https://nbn-resolving.org/urn:nbn:de:bvb:12-bsb11123105-5
Descriptive, Structural, Rights, Technical, OCR
MDZ > OPAC > DEAC > DDB > ZVDD > Europeana
MARC RDF IIIF Retrieval MARC RDF METS/MODS IIIF cortex EDM EDM METS/MODS MARC
non-latin script

7 Chinese Text Project: OCR
Sturgeon, Donald (2018): Large-scale Optical Character Recognition of Pre-modern Chinese Texts. International Journal of Buddhist Thought and Culture (2), p. 11-44.
https://digitalsinology.org/zh/wiki/File:Ctext-ocr.png

8 Chinese Text Project / MARKUS: Tools & APIs
• Text reuse
• Regex
• NER (MARKUS)

9 Thank you!
AMIR MOGHADDASS ESFEHANI
Campus Library
Freie Universität Berlin
amir.moghaddass@fu-berlin.de
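
Slide 3 reports precision, recall, F-measure and an error rate for one dirty OCR sample, but the deck does not say how these figures were computed. The following is a minimal sketch of one common character-level approach, not the method used for the slide: matched characters are counted with a longest common subsequence, and the error rate is the edit distance divided by the length of the ground truth. The function names and the four-character example strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ocr_scores(ground_truth: str, ocr_output: str) -> dict:
    """Character-level precision, recall, F-measure and error rate (assumed definitions)."""
    matched = lcs_length(ground_truth, ocr_output)
    precision = matched / len(ocr_output) if ocr_output else 0.0
    recall = matched / len(ground_truth) if ground_truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    error_rate = (levenshtein(ground_truth, ocr_output) / len(ground_truth)
                  if ground_truth else 0.0)
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "error_rate": error_rate}


# Hypothetical toy example: two of four characters are misrecognised.
print(ocr_scores("夾冰綿襖", "來永綿襖"))
```

Other evaluation tools define these measures differently (for example on words or on bags of characters), so figures from different tools are not directly comparable.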
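
Slide 5 sets 夾, 冰, 綿, 襖 against 來, 永, 秀, 澳, and slide 8 lists regex among the available tools. Assuming each of these pairs stands for two characters that OCR confuses with each other (the slides do not state this explicitly), a search term can be expanded into a regular expression that accepts either member of each pair. This is only an illustrative sketch: the helper names and the sample strings are hypothetical, and it does not use the Chinese Text Project or MARKUS APIs.

```python
import re

# Character pairs taken from slide 5; reading them as OCR confusions is an assumption.
CONFUSION_PAIRS = [("夾", "來"), ("冰", "永"), ("綿", "秀"), ("襖", "澳")]


def build_confusion_map(pairs):
    """Map every character to the set of characters it may be confused with (itself included)."""
    table = {}
    for a, b in pairs:
        table.setdefault(a, {a}).add(b)
        table.setdefault(b, {b}).add(a)
    return table


def tolerant_pattern(term, table):
    """Compile a regex that matches the term or any variant produced by the listed confusions."""
    parts = []
    for ch in term:
        variants = sorted(table.get(ch, {ch}))
        if len(variants) == 1:
            parts.append(re.escape(ch))
        else:
            parts.append("[" + "".join(re.escape(v) for v in variants) + "]")
    return re.compile("".join(parts))


table = build_confusion_map(CONFUSION_PAIRS)
pattern = tolerant_pattern("綿襖", table)

# Hypothetical OCR strings: the second one contains both confusions.
for line in ["綿襖", "秀澳", "冰水"]:
    print(line, bool(pattern.search(line)))
```

A tolerant pattern raises recall on dirty OCR but also lets through more false hits, which is the effect behind the apparent "ice age" on slides 4 and 5.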