id author title date pages extension mime words sentences flesch summary cache txt cord-340907-j9i1wlak Zarai, Yoram Evolutionary selection against short nucleotide sequences in viruses and their related hosts 2020-04-27 .txt text/plain 8162 415 45 Here, based on a novel statistical framework and a large-scale genomic analysis of 2,625 viruses from all classes infecting 439 host organisms from all kingdoms of life, we identify short nucleotide sequences that are under-represented in the coding regions of viruses and their hosts. Figures 3A and B depict the average number of under-represented sequences of size m = 3, 4, and 5 nucleotides, identified in a few subsets of viruses in both the original and random variants of the virus. A sampling analysis that we performed (see Supplementary document, Section 2.8) suggests that the number of under-represented sequences identified in dsDNA viruses matches their genomic size, when compared with RNA viruses. To show that the correspondence between selection against short palindromic sequences in viruses and restriction sites cannot be explained by basic coding region features such as amino-acid content and order, codon usage bias and dinucleotide distribution, we also evaluated the overlap between restriction sites and common under-represented sequences of random variants of viruses. ./cache/cord-340907-j9i1wlak.txt ./txt/cord-340907-j9i1wlak.txt