id: cord-025517-rb4sr8r4
author: Koutsomitropoulos, Dimitrios A.
title: Automated MeSH Indexing of Biomedical Literature Using Contextualized Word Representations
date: 2020-05-06
pages:
extension: .txt
mime: text/plain
words: 4474
sentences: 236
flesch: 52
summary: 4 presents our methodology and approach, by outlining the indexing procedure designed, describing the algorithms used and discussing optimizations regarding dataset balancing, distributed processing and training parallelization. There are two steps in this method: first, constructing MeSH term graph based on its RDF data and sampling the MeSH term sequences and, second, employing the FastText subword embedding model to learn the distributed word embeddings based on text sequences and MeSH term sequences. We then proceed by evaluating and reporting on two prominent embedding algorithms, namely Doc2Vec and ELMo. The models constructed with these algorithms, once trained, can be used to suggest thematic classification terms from the MeSH vocabulary. This body of text is next fed into the model and its vector similarity score is computed against the list of MeSH terms available in the vocabulary. Training datasets comprise biomedical literature from open access repositories including PubMed [19], EuropePMC [3] and ClinicalTrials [17] along with their handpicked MeSH terms.
cache: ./cache/cord-025517-rb4sr8r4.txt
txt: ./txt/cord-025517-rb4sr8r4.txt
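
The scoring step mentioned in the summary (a body of text is fed into the model and its vector similarity is computed against the MeSH vocabulary) can be sketched roughly as below. This is only an illustration under stated assumptions: the summary does not say which similarity measure is used, so cosine similarity is assumed here; the three-dimensional vectors are toy values standing in for Doc2Vec or ELMo output; and the names cosine_similarity, suggest_terms, term_vectors and doc_vector are hypothetical, not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest_terms(doc_vector, term_vectors, top_k=3):
    """Rank vocabulary terms by similarity to the document vector, highest first."""
    scored = [
        (term, cosine_similarity(doc_vector, vec))
        for term, vec in term_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumption): a trained model would supply these vectors.
term_vectors = {
    "Neoplasms": [0.9, 0.1, 0.0],
    "Coronavirus Infections": [0.1, 0.9, 0.2],
    "Machine Learning": [0.0, 0.2, 0.9],
}
doc_vector = [0.2, 0.8, 0.3]

suggestions = suggest_terms(doc_vector, term_vectors, top_k=2)
for term, score in suggestions:
    print(f"{term}: {score:.3f}")
```

With these toy values the term whose vector points in nearly the same direction as the document vector is ranked first; in practice a score threshold or a fixed top_k would decide how many terms are suggested.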