key: cord-0720684-7w0zbvqz
authors: Marian, Ali J.
title: Clinical Interpretation and Management of Genetic Variants
date: 2020-10-26
journal: JACC Basic Transl Sci
DOI: 10.1016/j.jacbts.2020.05.013
sha: e9088c7278d7130c2a65c416d977c4e09512327c
doc_id: 720684
cord_uid: 7w0zbvqz
Genetic variants are major determinants of susceptibility to disease, response to therapy, and clinical outcomes. Advances in short-read sequencing technologies, despite some shortcomings, have enabled identification of the vast majority of the genetic variants in each genome. The major challenge is in identifying the pathogenic variants in cardiovascular diseases. The yield of genetic testing has been limited because of technological shortcomings and our incomplete understanding of the genetic basis of cardiovascular disorders. To advance the field, a shift to long-read sequencing platforms is necessary. In addition, to discern the pathogenic variants, genetic diseases should be considered as a continuum and the genetic variants as probabilistic factors with a gradient of effect sizes. Moreover, disease-specific physician-scientists with expertise in clinical medicine and molecular genetics are best equipped to discern the functional and clinical significance of the genetic variants. These changes would be expected to enhance the clinical utility of genetic discoveries.
The DNA replication machinery is exquisitely precise. The major unit of this complex, namely the DNA polymerase, which was discovered by Arthur Kornberg in 1955, is blazingly fast and amazingly accurate, albeit not perfect. The replication machinery incorporates approximately 1 wrong nucleotide per every 100 million nucleotides that it synthesizes (an error rate of approximately 1.3 × 10^-8 per nucleotide) (2-4).
Given that the human genome comprises approximately 3.2 × 10^9 base pairs, the error rate of the DNA replication machinery introduces approximately 50 de novo point mutations and a smaller number of larger mutations with each genome replication (2-5). Thus, each offspring differs from the parents by about 50 novel genetic variants. The replication error rate is not uniform across the human genome and varies according to the complexity of the genome, with some spots being more prone to mutations (5-7). It is this low error rate of the DNA replication machinery that is mainly responsible for human genetic diversity, and hence, is the basis for variation in susceptibility to disease, response to therapy, and clinical outcomes. The errors, however, are not restricted to those occurring during DNA replication but also encompass mutations that occur during recombination, DNA damage, and impaired repair mechanisms. For example, slippage strand mispairing, which typically occurs at tandem repeats, leads to expansion of di- and trinucleotide repeats in the genome, which are the causes of the so-called triplet repeat syndromes (8). It is not unreasonable to surmise that the low error rate of DNA replication is the essence of life, because in its absence, the eugenic human species would have been vulnerable to extinction by invading germs or diseases. The current pandemic of coronavirus disease-2019 (COVID-19), caused by the severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2) virus, which is itself a product of a rare error of the replication machinery, is a prime example of the interindividual variability in susceptibility to COVID-19 and its clinical outcomes.
The discovery of the first restriction enzyme by Table 1) (14). Nevertheless, only a fraction of the genetic variants in human genomes has been detected. The full extent of human genetic diversity is expected to be much greater than that observed so far.
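As a rough check on the figures quoted earlier, the expected number of new point mutations per genome copy is the per-nucleotide error rate multiplied by the genome size. The sketch below uses only the two numbers given in the text; the function name is illustrative, and applying the quoted rate once to a single haploid genome copy is a simplifying assumption. The product, about 42, is of the same order as the approximately 50 de novo point mutations cited.

```python
# Back-of-the-envelope estimate; both constants are the values quoted in the text.
ERROR_RATE_PER_NUCLEOTIDE = 1.3e-8  # replication errors per nucleotide
GENOME_SIZE_BP = 3.2e9              # approximate size of the human genome in base pairs


def expected_new_mutations(error_rate: float, genome_size_bp: float) -> float:
    """Expected number of new point mutations in one genome copy.

    Assumes (for illustration only) a uniform error rate across the genome,
    which the text notes is not strictly true.
    """
    return error_rate * genome_size_bp


expected = expected_new_mutations(ERROR_RATE_PER_NUCLEOTIDE, GENOME_SIZE_BP)
print(f"Expected new point mutations per genome copy: {expected:.1f}")
```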
Clinicians are increasingly challenged with interpretation of the results of genetic testing performed by third-party providers, as well as with the failure to identify the pathogenic variant(s) by genetic testing. The lack of a commonly practiced approach to genetic testing further compounds the difficulty. Therefore, a brief discussion of the various steps involved in identification of the genetic variants using the current sequencing techniques is expected to inform physicians and clinical investigators of the strengths and shortcomings of the current approaches.
WHOLE GENOME AND EXOME SEQUENCING TECHNIQUES. Once the short reads are aligned with the reference genome, nucleotides that differ from the reference sequences are identified as variants. As in the previous steps, confidence in calling a variant directly depends on the number of reads that cover that variant sequence. A low read number does not support high confidence in a variant call. In addition, the reference genome to which the reads are compared has a number of shortcomings that influence accurate and comprehensive detection of the variants. The reference genome, which is a composite of a small number of genomes, does not adequately represent the diversity of human genomes. Consequently, a rare variant that is also present in the reference genome will not be called, and conversely, variants that are not present in the reference genome but are relatively common in another population will be called in the individual genome. This is particularly relevant to populations with different ethnic backgrounds, because variants are typically population-specific and naturally not expected to be adequately represented in the reference genome (16-18). Broadening the composition of the reference genome, although valuable for representation, is unlikely to be sufficient for assessing the information content of the variants in the clinical setting.
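The dependence of a variant call on read depth, described above, can be illustrated with a toy pileup. This is a minimal sketch and not how production variant callers work: the reads are assumed to be already aligned without gaps, and the function name, the `min_depth` and `min_alt_fraction` thresholds, and the example sequences are all hypothetical.

```python
from collections import Counter


def call_variants(reference, reads, min_depth=3, min_alt_fraction=0.25):
    """Report positions where aligned reads disagree with the reference.

    reads: list of (start, sequence) tuples with 0-based start positions,
    assumed to be gap-free alignments (an illustrative simplification).
    Returns a list of (position, ref_base, alt_base, depth) tuples.
    """
    # Count the bases observed at each reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to support a confident call
        for base, n in sorted(counts.items()):
            if base != reference[pos] and n / depth >= min_alt_fraction:
                calls.append((pos, reference[pos], base, depth))
    return calls


REFERENCE = "ACGTACGT"
READS = [(0, "ACGTAC"), (1, "CGAACG"), (2, "GAACGT"), (2, "GAAC"), (3, "AACGT")]

calls = call_variants(REFERENCE, READS)
print(calls)
```

Raising `min_depth` above the coverage at a site suppresses the call there, which mirrors the point that a low read number lowers confidence in a variant.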
The reference genome also contains gaps, particularly in the complex genomic regions, which interfere with proper identification of the variants. Given that the reference genome is constructed on the basis of short-read sequencing, it does not adequately represent large indels or SVs, and hence, is not a robust reference point for their detection (19). To overcome some of the shortcomings related to the reference genome, population-specific reference genomes comprising contiguous haploid sequence data of each chromosome are being generated (20). In addition, long-read single-molecule sequencing technologies are available that are capable of sequencing several thousand bases of DNA and of de novo assembly of each individual's genome. The long-read sequencing approach is particularly relevant to sequencing of the repeat regions, where it increases mapping certainty, as well as to detection of variants located in the regulatory regions and of large SVs, which are increasingly being implicated in human diseases (reviewed in Eichler [21]). However, the base-calling error rate of current single-molecule sequencing is higher (3% to 15%) than that of short-read sequencing technologies, which compromises accurate detection of SNVs and indels and limits clinical applications of these technologies (22).
Table 1). Small indels are also common, but SVs are uncommon. However, SVs typically involve more nucleotides than SNVs because a fraction of SVs are large and encompass several million
Each exome contains about 85 and 25 coding indels that affect 3 and 6 nucleotides, respectively, and therefore maintain the coding frame (35). In-frame indels could also lead to a phenotype, as in the case of cystic fibrosis (36). About a third of the coding indels affect 1 or 2 nucleotides, corresponding to about 35 and 5 coding indels in each genome, respectively, and lead to a frame shift.
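The distinction drawn above between in-frame and frameshift coding indels comes down to whether the number of inserted or deleted nucleotides is a multiple of 3. A minimal sketch, assuming alleles are given as plain reference and alternate strings (the function name and the example alleles are hypothetical):

```python
def classify_coding_indel(ref_allele: str, alt_allele: str) -> str:
    """Classify a coding indel by its effect on the reading frame.

    The net length change decides the class: multiples of 3 keep the
    coding frame, anything else shifts it.
    """
    length_change = abs(len(alt_allele) - len(ref_allele))
    if length_change == 0:
        raise ValueError("alleles have equal length; not an indel")
    return "in-frame" if length_change % 3 == 0 else "frameshift"


# Hypothetical alleles: a 3-nucleotide deletion and a 1-nucleotide insertion.
print(classify_coding_indel("ACTT", "A"))  # 3 nucleotides deleted
print(classify_coding_indel("A", "AG"))    # 1 nucleotide inserted
```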
Such coding indels often abolish expression of the involved protein and have considerable biological effects, depending on the tolerance of the gene to mutation. As for the noncoding indels, those involving 1 nucleotide are the most common, and their numbers correlate inversely with the number of affected nucleotides, that is, the smaller indels are more common than the larger indels (34). Indels located in the Table 2).
As discussed earlier, population frequency of the variant is an important consideration, as rare variants are more likely to impart larger effect sizes than the common variants (17, 50). Overall, the population frequency of the genetic variants inversely correlates with their effect sizes, because a common variant seldom exerts a large effect size. However, it is important to note that rare variants are population specific, and therefore, population frequency of the variants has to be assessed in the context of the specific population in which the variant is identified (16, 18, 26). In accord with the preceding, de novo variants, defined as variants detected in the index case but absent in the parents, are more likely to be pathogenic. Each genome has about 50 to 60 de novo variants, the vast majority of which are not expected to exert an effect that is clinically relevant (Table 1). However, those located in genes pertinent to the pathogenesis of the disease of interest are strong candidates to be pathogenic, and hence, clinically significant. The TTN gene, encoding the giant sarcomere protein titin, illustrates the challenges one faces in determining pathogenicity of the genetic variants. TTN is a well-established causal gene for dilated cardiomyopathy (51). Mutations in the TTN gene that
The point was well illustrated more than 2 decades ago upon detection of myocardial tissue Doppler abnormalities before development of overt HCM in individuals who carried pathogenic variants in genes encoding sarcomere proteins (70, 71).
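The two filters discussed earlier in this section, namely the de novo definition (present in the index case, absent in both parents) and the population-frequency consideration, can be sketched with simple set operations. This is an illustration only: the variant tuples, the frequency table, the function names, and the 0.001 cutoff are hypothetical, and treating a variant that is missing from the frequency table as having a frequency of 0 is an assumption of the sketch.

```python
def find_de_novo(child, mother, father):
    """Variants present in the index case but absent in both parents."""
    return child - (mother | father)


def filter_rare(variants, population_af, max_af=0.001):
    """Keep variants at or below max_af in the matched population.

    Variants missing from population_af are treated as frequency 0.0
    (an assumption made for this illustration).
    """
    return {v for v in variants if population_af.get(v, 0.0) <= max_af}


# Hypothetical variants encoded as (chromosome, position, ref, alt).
child = {
    ("chr1", 500, "G", "A"),
    ("chr2", 1000, "C", "T"),
    ("chr3", 77, "T", "C"),
    ("chr7", 42, "A", "G"),
}
mother = {("chr1", 500, "G", "A")}
father = {("chr7", 42, "A", "G")}

# Hypothetical allele frequencies from the population matching the family.
population_af = {("chr1", 500, "G", "A"): 0.12, ("chr3", 77, "T", "C"): 0.05}

de_novo = find_de_novo(child, mother, father)
candidates = filter_rare(de_novo, population_af)
print(sorted(de_novo))
print(sorted(candidates))
```

Because rare variants are population specific, the frequency table in such a filter would need to come from the population in which the variant was identified.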
The genetic discovery laid the foundation for the subsequent development of PCSK9 inhibitors, which are highly effective in reducing plasma low-density lipoprotein cholesterol levels as well as reducing cardiovascular mortality (72, 73). To conclude, the key is to discover the fundamental secrets of nature and never to be concerned about the immediate clinical or translational impact of the discovery. The impact will become evident over time.
Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid
Rate of de novo mutations and the importance of father's age to disease risk
Leveraging distant relatedness to quantify human mutation and gene-conversion rates
Novel variation and de novo mutation rates in population-wide de novo assembled Danish trios
Estimating the human mutation rate using autozygosity in a founder population
Genome-wide patterns and properties of de novo mutations in humans
Characteristics of de novo structural changes in the human genome
The repeat expansion diseases: the dark side of DNA repair
Initial sequencing and analysis of the human genome
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome
The diploid genome sequence of an individual human
The complete genome of an individual by massively parallel DNA sequencing
A global reference for human genetic variation
A comparative analysis of exome capture
Genes mirror geography within Europe
A map of human genome variation from population-scale sequencing
Insights into human genetic variation and population history from 929 diverse genomes
Multi-platform discovery of haplotype-resolved structural variation in human genomes
Resolving the complexity of the human genome using single-molecule sequencing
Genetic variation, comparative genomics, and the diagnosis of disease
Single-molecule sequencing: towards clinical applications
Accurate classification of BRCA1 variants with saturation genome editing
Evidence from human myectomy samples that MYBPC3 mutations cause hypertrophic cardiomyopathy through haploinsufficiency
Tools for predicting the functional impact of nonsynonymous genetic variation
An integrated map of genetic variation from 1,092 human genomes
Lamin A truncation in Hutchinson-Gilford progeria
Recurrent de novo point mutations in lamin A cause Hutchinson-Gilford progeria syndrome
IGD: a resource for intronless genes in the human genome
Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing
Characterization of the effects of mutations in the putative branchpoint sequence of intron 4 on the splicing within the human lecithin:cholesterol acyltransferase gene
Extensive in silico analysis of NF1 splicing defects uncovers determinants for splicing outcome upon 5' splice-site disruption
The diploid genome sequence of an Asian individual
Effects of short indels on protein structure and function in human genomes
Small insertions and deletions (INDELs) in human genomes
Identification of the cystic fibrosis gene: genetic analysis
Fine-scale characterization of genomic structural variation in the human genome reveals adaptive and biomedically relevant hotspots
An integrated map of structural variation in 2,504 human genomes
Relative impact of nucleotide and copy number variation on gene expression phenotypes
Origin of the Swedish long QT syndrome Y111C/KCNQ1 founder mutation
Of founder populations, long QT syndrome, and destiny
From peas to disease: modifier genes, network resilience, and the genetics of health
Modifier genes for hypertrophic cardiomyopathy
Modifier genes for sudden cardiac death
Genetic determinants of cardiac hypertrophy
Phenotypic variability and unusual clinical severity of congenital long-QT syndrome in a founder population
Identifying modifier loci in existing genome scan data
Genome-wide mapping of modifier chromosomal loci for human hypertrophic cardiomyopathy
Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology
Truncations of titin causing dilated cardiomyopathy
Integrated allelic, transcriptional, and phenomic dissection of the cardiac effects of titin truncations in health and disease
Truncating titin mutations are associated with a mild and treatable form of dilated cardiomyopathy
Mavacamten treatment for obstructive hypertrophic cardiomyopathy: a clinical trial
Long-term safety and efficacy of enzyme replacement therapy for Fabry disease
Tafamidis treatment for patients with transthyretin amyloid cardiomyopathy
Inotersen treatment for patients with hereditary transthyretin amyloidosis
Strategic approaches to unraveling genetic causes of cardiovascular diseases
The common long-QT syndrome mutation KCNQ1/A341V causes unusually severe clinical manifestations in patients with different ethnic backgrounds: toward a mutation-specific risk stratification
Long-QT syndrome: from genetics to management
Defining the genetic architecture of hypertrophic cardiomyopathy: re-evaluating the role of non-sarcomeric genes
Reassessment of Mendelian gene pathogenicity using 7,855 cardiomyopathy cases and 60,706 reference samples
Analysis of 51 proposed hypertrophic cardiomyopathy genes from genome sequencing data in sarcomere negative cases has negligible diagnostic yield
An international, multicentered, evidence-based reappraisal of genes reported to cause congenital long QT syndrome
A potential oligogenic etiology of hypertrophic cardiomyopathy: a classic single-gene disorder
Oligogenic inheritance of a human heart disease involving a genetic modifier
Analysis of a mutant strain of human fibroblasts with a defect in the internalization of receptor-bound low density lipoprotein
Mutations in PCSK9 cause autosomal dominant hypercholesterolemia
Oligogenic familial hypercholesterolemia, LDL cholesterol, and coronary artery disease
Tissue Doppler imaging predicts the development of hypertrophic cardiomyopathy in subjects with subclinical disease
Tissue Doppler imaging consistently detects myocardial abnormalities in patients with hypertrophic cardiomyopathy and provides a novel means for an early diagnosis before and independently of hypertrophy
Inclisiran in patients at high cardiovascular risk with elevated LDL cholesterol
Alirocumab and cardiovascular outcomes after acute coronary syndrome