id author title date pages extension mime words sentences flesch summary cache txt work_55utqx7tjrft5ojtbr67ypjdye Alexandra Schofield Comparing Apples to Apple: The Effects of Stemmers on Topic Models 2016 14 .pdf application/pdf 7865 602 59 Comparing Apples to Apple: The Effects of Stemmers on Topic Models First, conflating semantically related words into one word type could improve model fit by intelligently reducing the space In this work we consider two categories of word normalization methods: rule-based stemmers, or stemmers primarily reliant on rules converting one affix to another, and context-based methods, or strategies that use dictionaries and other contextual, inflectional, and derivational information to infer the for our data, lemmatizing the corpus took more computational time than training the topic model. fewer possible words; at its extreme, the probability of any corpus under a zero-truncation stemmer no-stemmer treatment t0, we take the difference between topic probabilities, weighted by inverse document frequency (idf) to favor words that are specific While stemming constrains all conflated word types to share one probability in each topic, it does not ensure that those ./cache/work_55utqx7tjrft5ojtbr67ypjdye.pdf ./txt/work_55utqx7tjrft5ojtbr67ypjdye.txt