id: work_zhl5dw75nfhy7oputsllvjjcqq
author: Maria Antoniak
title: Evaluating the Stability of Embedding-based Word Similarities
date: 2018
pages: 14
extension: .pdf
mime: application/pdf
words: 7871
sentences: 979
flesch: 66
summary: Evaluating the Stability of Embedding-based Word Similarities. NLP research in word embeddings has so far focused on a downstream-centered use case rather than direct human analysis of nearest neighbors to embedding vectors. Other studies use cosine similarities between embeddings to measure variation. Table 3 lists the three settings that manipulate the document order and presence in each corpus. The order of documents could be an important factor for algorithms that use online training, such as SGNS. LSA is a document-based embedding method that uses matrix factorization (Deerwester et al., 1990; Landauer and Dumais, 1997). The study also trains PPMI and GloVe word embeddings. The results indicate that the presence of specific documents in the corpus can significantly affect the cosine similarities between embedding vectors, as observed across runs of the BOOTSTRAP setting for the full AskScience corpus.
cache: ./cache/work_zhl5dw75nfhy7oputsllvjjcqq.pdf
txt: ./txt/work_zhl5dw75nfhy7oputsllvjjcqq.txt
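The summary above describes measuring how stable cosine similarities between word embedding vectors are across bootstrap runs over a corpus. A minimal sketch of that measurement idea, using toy randomly perturbed vectors as stand-ins for embeddings trained on resampled corpora (the vectors, dimensions, and noise scales here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)

# Toy "true" vectors for a related word pair (50 dimensions, arbitrary choice).
base_u = rng.normal(size=50)
base_v = base_u + rng.normal(scale=0.5, size=50)

# Each simulated bootstrap run perturbs the vectors, mimicking the
# training variability that resampling documents would introduce.
sims = []
for _ in range(25):
    u = base_u + rng.normal(scale=0.2, size=50)
    v = base_v + rng.normal(scale=0.2, size=50)
    sims.append(cosine_similarity(u, v))

mean_sim = float(np.mean(sims))
std_sim = float(np.std(sims))
print(f"cosine similarity across runs: mean={mean_sim:.3f}, std={std_sim:.3f}")
```

The standard deviation across runs is the quantity of interest: a large spread means a single run's cosine similarity is an unreliable estimate of word relatedness.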