COVID-19 and the differential dilemma
Sharlee Climer
Patterns (N Y), 2021-04-15. DOI: 10.1016/j.patter.2021.100260

Ghandikota et al. (2021) tackle the conundrums of choosing candidate genes via differential expression between treated and mock specimens in their efforts to tease out genetic patterns that are characteristic of COVID-19 outcomes.

COVID-19, caused by SARS-CoV-2 infection, is a heterogeneous disease exhibiting a broad spectrum of symptoms, ranging from mild (e.g., olfactory dysfunction, dry cough, head or body aches, sore throat, COVID toes) to critical (e.g., cytokine storm, renal failure, cardiovascular damage, respiratory failure, lethal blood clotting, neurological disorders) [1]. Intensive care units dedicated to COVID-19 cases are being confounded by divergent emergencies that demand a breadth of specialists and specialized equipment [1]. While some COVID-19-positive individuals exhibit multiple symptoms, others show only one, and many are completely asymptomatic.

Analyses of transcriptomics data hold the potential to reveal patterns of gene expression associated with specific outcomes, thereby providing valuable foundational information for breakthrough advances: diagnostic tools to facilitate precision treatment, seeds for hypotheses that decipher underlying biological mechanisms, and potential drug targets, some of which may already have effective medications that can be repurposed. However, gene expression data are noisy and their analysis is formidable. Moreover, because the virus is so new, COVID-19 data are sparse. In this issue of Patterns, Ghandikota et al. (2021) [2] take on these challenges and present a multi-layered network modeling strategy that identifies several biological processes that may help shed light on this enigmatic disease.

Ghandikota et al.
skillfully handle the sparsity of COVID-19 data in two ways. First, they leverage pre-COVID-19 data in their network analyses. Others have used this approach, for example yielding the promising bradykinin storm hypothesis for COVID-19 [3], and the current work draws on rich data from three SARS-CoV-1 infection (SARS) models, the STRING protein-protein interaction database, the Molecular Signatures Database (MSigDB), NCBI's Phenotype-Genotype Integrator (PheGenI), and the NHGRI-EBI Genome-wide Association Studies (GWAS) Catalog. Second, they integrate three separate SARS-CoV-2 infection (COVID-19) datasets, drawn from a mouse model and from human and African green monkey cell lines. They overcome the diversity of these organisms by using 'consensus' genes, as described below.

Like many transcriptomic analyses, the study begins by determining differentially expressed genes (DEGs), that is, genes with significant deviations in expression levels between the treated and mock specimens. The use of DEGs raises both obvious and subtle dilemmas. Because of the large number of statistical tests, some genes are likely to appear significant by chance, so corrections are required. Balancing false positives and false negatives when choosing a multiple testing correction method is challenging: Bonferroni correction tends to wipe out many significant results, whereas false discovery rate (FDR) control tends to let through too many erroneous DEGs [4]. Because transcriptomic analyses are generally exploratory, FDR is commonly employed, as it is by Ghandikota et al. This approach yields 8286 DEGs, from which they select consensus genes that exhibit differential expression in at least two of the three datasets. This maneuver aims to balance the false positive/false negative quandary and produces a list of 1467 consensus DEGs.
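As a rough illustration of these selection steps, the Python sketch below contrasts a Bonferroni cutoff with the Benjamini-Hochberg step-up procedure, one common way to control the FDR (the commentary does not say which FDR procedure Ghandikota et al. used), and then applies the "at least two of three datasets" consensus rule. The p-values and gene names are made up, the sketch assumes genes from the mouse, human, and monkey datasets have already been mapped to shared identifiers, and `bonferroni`, `benjamini_hochberg`, and `consensus` are hypothetical helper names, not the authors' code.

```python
from collections import Counter


def bonferroni(pvals, alpha=0.05):
    """Indices of tests that survive Bonferroni correction (p <= alpha / m)."""
    m = len(pvals)
    return {i for i, p in enumerate(pvals) if p <= alpha / m}


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ranks 1..m by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return set(order[:k_max])


def consensus(deg_sets, min_datasets=2):
    """Genes called as DEGs in at least `min_datasets` of the given datasets."""
    counts = Counter(gene for degs in deg_sets for gene in set(degs))
    return {gene for gene, c in counts.items() if c >= min_datasets}


# Ten illustrative p-values from a single (hypothetical) dataset.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni:", sorted(bonferroni(pvals)))
print("Benjamini-Hochberg:", sorted(benjamini_hochberg(pvals)))

# Hypothetical DEG calls from three datasets, already mapped to shared gene symbols.
degs_per_dataset = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_B", "GENE_C", "GENE_D"},
    {"GENE_C", "GENE_E"},
]
print("Consensus:", sorted(consensus(degs_per_dataset)))
```

With these numbers, Bonferroni keeps only the smallest p-value while Benjamini-Hochberg also keeps the second, a small-scale view of the trade-off described above; the consensus rule then keeps GENE_B and GENE_C, the only genes called in at least two datasets.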
A more insidious issue with using DEGs is that some genes tend to be differentially expressed regardless of the phenotype being tested [5]. Using over 600 Affymetrix Human Genome U133 Plus 2.0 datasets covering a wide range of phenotypes, Crow et al. ranked ~19,000 genes in a DEG 'priors' list, ordered by their likelihood of appearing as DEGs in arbitrary transcriptional analyses. They observed 229 genes that appear in more than 10% of the DEG lists produced in those analyses, and one gene, CXCL8 (also known as IL8), that appears in nearly one-fifth of the studies. The data used by Ghandikota et al. were generated on the Illumina NextSeq 500, and the extent to which platform affects rankings in the DEG priors list is currently unclear.

To test whether their compendium of consensus genes is specific to COVID-19, Ghandikota et al. computed differential expression in 1000 permutation trials in which the phenotype labels were randomly reassigned. These trials produced orders of magnitude fewer DEGs and consensus DEGs than the unpermuted data, increasing confidence in the COVID-19 specificity of the results.

Looking beyond the work presented by Ghandikota et al., a pressing challenge for future analyses involving DEGs is to capture genes that do not signal differential expression when examined in isolation but exhibit significance when examined as part of a group containing additional genes (Figure 1). A single gene can yield multiple protein species through genetic polymorphisms and through regulatory mechanisms such as alternative splicing and post-translational modifications. Furthermore, more than 500 proteins are currently known to moonlight, performing diverse tasks with a single amino acid sequence [6]. Such multitasking further complicates differential assessments. The toy example in Figure 1 portrays an epistatic interaction in which all three genes are required for the disease pathway.
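The permutation check described above can be sketched in Python as follows. This is a simplified stand-in, not the authors' pipeline: a gene is called "differentially expressed" here when the absolute difference in mean log-scale expression between treated and mock samples exceeds a fixed threshold, whereas real analyses use proper statistical models with FDR control. The synthetic data, the threshold of 2.4, and the helper names `count_degs`, `permutation_trials`, and `empirical_p` are all hypothetical.

```python
import random
import statistics


def count_degs(expr, labels, threshold):
    """Count genes whose |mean(treated) - mean(mock)| exceeds `threshold`.

    expr: one list of log-scale expression values per gene (one value per sample).
    labels: one boolean per sample, True = treated, False = mock.
    """
    n_degs = 0
    for values in expr:
        treated = [v for v, is_treated in zip(values, labels) if is_treated]
        mock = [v for v, is_treated in zip(values, labels) if not is_treated]
        if abs(statistics.fmean(treated) - statistics.fmean(mock)) > threshold:
            n_degs += 1
    return n_degs


def permutation_trials(expr, labels, threshold, n_perm=1000, seed=0):
    """DEG counts after randomly reassigning the phenotype labels `n_perm` times."""
    rng = random.Random(seed)
    shuffled = list(labels)
    counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # same numbers of treated/mock labels, new assignment
        counts.append(count_degs(expr, shuffled, threshold))
    return counts


def empirical_p(observed, null_counts):
    """Add-one empirical p-value: share of trials with at least `observed` DEGs."""
    exceed = sum(1 for c in null_counts if c >= observed)
    return (exceed + 1) / (len(null_counts) + 1)


# Synthetic data: 6 treated and 6 mock samples, 200 genes, of which only the
# first 20 are truly shifted (by 3 log units) in treated samples.
rng = random.Random(42)
labels = [True] * 6 + [False] * 6
expr = [
    [rng.gauss(3.0 if (g < 20 and is_treated) else 0.0, 0.3) for is_treated in labels]
    for g in range(200)
]

observed = count_degs(expr, labels, threshold=2.4)
null_counts = permutation_trials(expr, labels, threshold=2.4, n_perm=1000)
print("DEGs with true labels:", observed)
print("mean DEGs under permuted labels:", statistics.fmean(null_counts))
print("empirical p-value:", empirical_p(observed, null_counts))
```

One design caveat worth keeping in mind: with only 12 samples there are just C(12, 6) = 924 distinct ways to split the labels, and the original and fully flipped assignments always reproduce the observed count, so the permutation p-value cannot meaningfully drop below about 2/924 no matter how many trials are run.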
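To make the group-versus-marginal point concrete, here is a deliberately contrived Python sketch in the spirit of the Figure 1 toy example (it does not reproduce the figure's data). Three hypothetical genes A, B, and C are scored as on (1) or off (0) in each sample, and the disease pathway is assumed to be active only when all three are on. The samples are arranged so that each gene is on in four of six treated and four of six mock samples; the helper names are illustrative only.

```python
import statistics

# Each sample records whether hypothetical genes (A, B, C) are on (1) or off (0).
treated = [(1, 1, 1)] * 3 + [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
mock = [(1, 1, 0)] * 2 + [(1, 0, 1)] * 2 + [(0, 1, 1)] * 2


def mean_difference(treated_values, mock_values):
    """Difference in mean value between treated and mock samples."""
    return statistics.fmean(treated_values) - statistics.fmean(mock_values)


def pathway_active(sample):
    """Assumed epistatic rule: the pathway runs only if every gene is on."""
    return int(all(sample))


# Marginal view: test each gene in isolation.
marginal = {
    gene: mean_difference([s[i] for s in treated], [s[i] for s in mock])
    for i, gene in enumerate("ABC")
}

# Group view: test the joint "all three on" indicator.
joint = mean_difference(
    [pathway_active(s) for s in treated], [pathway_active(s) for s in mock]
)

print("per-gene differences:", marginal)
print("all-three-on difference:", joint)
```

Every per-gene difference comes out as 0.0, while the all-three-on indicator differs by 0.5 (three of six treated samples versus none of the mock samples). In real data the relevant grouping is unknown, and exhaustive search grows quickly: there are C(n, 3) three-gene subsets of n genes, roughly 1.3 trillion for n = 20,000.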
A similar situation may arise for additive interactions, in which multiple contributing genes must be considered in unison to observe a signal. In general, a collection of genes interacting in a disease-associated process may exhibit strong differences between treated and mock specimens when tested as a whole, even though each involved gene shows only a low marginal effect.

Given the accumulation of mutations that SARS-CoV-2 has sustained to date, the arms race between virus and vaccines is likely to extend into the foreseeable future [7]. The prevalence of so-called long COVID [8] and emerging evidence of long-term neurological and psychiatric outcomes [9] further emphasize the importance of diagnosing and treating its heterogeneous sequelae. Continued generation of COVID-19 omics datasets and focused development of tactical strategies to extract knowledge from these data are invaluable for treating individuals afflicted by this baffling disease.

1. A rampage through the body. Science.
2. Secondary analysis of transcriptomes of SARS-CoV-2 infection models to characterize COVID-19.
3. A mechanistic model and therapeutic interventions for COVID-19 involving a RAS-mediated bradykinin storm.
4. False discovery rate control is a recommended alternative to Bonferroni-type adjustments in health studies.
5. Predictability of human differential gene expression.
6. MoonProt 3.0: an update of the moonlighting proteins database.
7. COVID-19 vaccines vs variants: determining how much immunity is enough.
8. Long COVID: understanding the neurological effects.
9. 6-month neurological and psychiatric outcomes in 236 379 survivors of COVID-19: a retrospective cohort study using electronic health records.

This research is supported in part by the National Institutes of Health grants 1RF1AG053303-01 and 3RF1AG053303-01S2.