id: cord-001974-wjf3c7a7
author: Friis-Nielsen, Jens
title: Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers
date: 2016-02-19
pages:
extension: .txt
mime: text/plain
words: 5773
sentences: 348
flesch: 48
summary: Recurrent sequences were statistically associated with biological, methodological, or technical features, with the aim of identifying novel pathogens or plausible contaminants that may be associated with a particular kit or method. The datasets went through a sequential pipeline with the following modules, in order: preprocessing, computational subtraction of host sequences, low-complexity sequence removal, sequence assembly, clustering, association with metadata features, and taxonomic annotation. Associations from the shortest mode tended to show greater dispersion in the range of odds ratios (ORs). Furthermore, one block of clustering results, using global alignment mode, alignment length based on the shortest contig, and a minimum sequence identity of 90% (c09ˆaSyG1), had an overall high range of ORs as well as the highest minimum values. The clusters with the lowest p-values are significantly associated with biological features, and their species annotations are described by HMP.
cache: ./cache/cord-001974-wjf3c7a7.txt
txt: ./txt/cord-001974-wjf3c7a7.txt