id: cord-020871-1v6dcmt3
author: Papariello, Luca
title: On the Replicability of Combining Word Embeddings and Retrieval Models
date: 2020-03-24
pages:
extension: .txt
mime: text/plain
words: 2135
sentences: 146
flesch: 55
summary: We replicate recent experiments attempting to demonstrate an attractive hypothesis about the use of the Fisher kernel framework and mixture models for aggregating word embeddings towards document representations and the use of these representations in document classification, clustering, and retrieval. The last 5 years have seen proof that neural network-based word embedding models provide term representations that are a useful information source for a variety of tasks in natural language processing. They are grouped in three sets: classification, clustering, and information retrieval, and compare "standard" embedding methods with the novel moVMF representation. First, text processing (e.g. tokenisation); second, creating a fixed-length vector representation for every document; finally, the third phase is determined by the goal to be achieved, i.e. classification, clustering, and retrieval. We replicated previously reported experiments that presented evidence that a new mixture model, based on von Mises-Fisher distributions, outperformed a series of other models in three tasks (classification, clustering, and retrieval when combined with standard retrieval models).
cache: ./cache/cord-020871-1v6dcmt3.txt
txt: ./txt/cord-020871-1v6dcmt3.txt