key: cord-010273-0c56x9f5 authors: Simmonds, Peter title: Virology of hepatitis C virus date: 2001-10-10 journal: Clin Ther DOI: 10.1016/s0149-2918(96)80193-7 sha: doc_id: 10273 cord_uid: 0c56x9f5 Hepatitis C virus (HCV) has been identified as the main causative agent of post-transfusion non-A, non-B hepatitis. Through recently developed diagnostic assays, routine serologic screening of blood donors has prevented most cases of post-transfusion hepatitis. The purpose of this paper is to comprehensively review current information regarding the virology of HCV. Recent findings on the genome organization, its relationship to other viruses, the replication of HCV ribonucleic acid, HCV translation, and HCV polyprotein expression and processing are discussed. Also reviewed are virus assembly and release, the variability of HCV and its classification into genotypes, the geographic distribution of HCV genotypes, and the biologic differences between HCV genotypes. The assays used in HCV genotyping are discussed in terms of reliability and consistency of results, and the molecular epidemiology of HCV infection is reviewed. These approaches to HCV epidemiology will prove valuable in documenting the spread of HCV in different risk groups, evaluating alternative (nonparenteral) routes of transmission, and in understanding more about the origins and evolution of HCV. Hepatitis C virus (HCV) has been identified as the main causative agent of posttransfusion non-A, non-B hepatitis. 1,2 The identification of HCV led to the development of diagnostic assays for infection, based either on detection of antibody to recombinant polypeptides expressed from cloned HCV sequences or direct detection of virus ribonucleic acid (RNA) sequences by polymerase chain reaction (PCR) using primers complementary to the HCV genome. Routine serologic screening of blood donors now prevents most or all cases of posttransfusion hepatitis. 
Assays for antibody also are important diagnostic tools and have been used to investigate the prevalence of HCV in different risk groups, such as intravenous drug users, patients with hemophilia, and other recipients of blood products, and to conduct epidemiologic studies of HCV transmission. The complete genomic sequence of HCV has been determined for several isolates, revealing both its overall genome organization and its relationship to other RNA viruses. Deducing possible methods of replication by analogy with related viruses is possible, although such studies currently are hampered by the absence of a satisfactory in vitro culture method for HCV. As a consequence, most conventional virologic studies are difficult and artificial. HCV contains a positive-sense RNA genome approximately 9400 bases in length. In overall genome organization and presumed method of replication, it is most similar to members of the family Flaviviridae, particularly in coding for a single polyprotein that is then cleaved into a series of presumed structural and nonstructural proteins (Figure 1). 3 The roles for these different proteins have been inferred by comparison with related viruses and by in vitro expression of cloned HCV sequences in prokaryotic and eukaryotic systems. These artificial systems allowed the investigation of protein expression, cleavage, and posttranslational modifications. There are numerous positive-stranded RNA virus families whose coding capacity is contained within a single open reading frame (ORF), as is found in HCV, and with which it may be usefully compared (Table I). Among human viruses, these include both Picornaviridae (eg, poliovirus, coxsackievirus A and B, and hepatitis A virus) and Flaviviridae (eg, dengue fever and yellow fever virus). The genomes of those viruses have a similar organization, with structural proteins at the 5' end and nonstructural proteins at the 3' end. 
However, virus families differ in genome size, the number of proteins produced, the mechanism by which the polyprotein is cleaved, and the detailed mechanism of genome replication. For example, the genome of the Picornaviridae is shorter than that of HCV (approximately 7200 to 8400 bases), contains four nucleocapsid proteins (compared with the single protein of HCV), is nonenveloped (and therefore contains no homologues of the two HCV-encoded glycoproteins E1 and E2), and uses exclusively virus-encoded proteases to cleave its polyprotein. This is different from both HCV and the Flaviviridae, in which cleavage of the structural proteins is thought to be carried out by the host cell-derived signalase. Members of the Flaviviridae have many features in common with HCV. They have a similar genome size (yellow fever virus has 10,862 bases 4 compared with 9379 for HCV 3) and package a viral-encoded glycoprotein into the virus envelope (E1). The homologue of E2 in flaviviruses (a membrane-bound glycoprotein called NS1; "NS" stands for nonstructural) is expressed only on the infected cell surface. Like HCV, the polyprotein is cleaved by a combination of viral and host cell proteases. Although there is no close sequence similarity between HCV and other known viruses, at least two regions with conserved amino acid residues provide [...] Another fundamental aspect of genome organization that differs between the flavivirus and picornavirus families is the structure of the 5' and 3' untranslated regions (UTRs). These parts of the genome are involved in HCV replication and initiation of translation by cellular ribosomes of the virus-encoded polyprotein. Pestiviruses and HCV show evidence for a highly structured 5'UTR and 3'UTR, in which internal base-pairing produces a complex set of stem-loop structures that are thought to interact with various host cell and virus proteins during replication. 
7,8 In particular, studies have shown that for the Picornaviridae and, more recently, for HCV and pestiviruses, 7,9,10 such structures are involved in internal initiation of translation, in which binding to the host cell ribosome directs translation to an internal methionine (AUG) codon. This contrasts strongly with translation of flavivirus genomes, which act much like cellular messenger RNA in which ribosomal binding initially occurs to the capped 5' end of the RNA, followed by scanning of the sequence in the 5' to 3' direction with translation commencing from the first AUG codon. Structurally, HCV is also more similar to the pestiviruses than the flaviviruses, with an exceptionally low buoyant density in sucrose (1.08 to 1.11 g/cm3), 11 similar to that reported for pestiviruses and attributable in both cases to heavily glycosylated external membrane glycoproteins in the virus envelope. By contrast, flavivirus envelope glycoproteins contain few sites for N-linked glycosylation, and the virion itself is relatively dense (1.2 g/cm3). The arrangement and number of cleavage sites of the HCV polyprotein are more similar to pestiviruses, particularly in the further cleavage of both NS4 and NS5 proteins into two subunits, in both cases with NS5b corresponding to the RNA polymerase. Recently, two distinct RNA viruses have been discovered in New World primate tamarins (Saguinus). This monkey species had previously been shown to harbor an infectious agent causing chronic hepatitis originally derived from inoculation with plasma from a surgeon (GB) in whom chronic hepatitis of unknown etiology had developed. 12 Parts of the genome of the two viruses (provisionally called GBV-A and GBV-B) show measurable sequence similarity to certain regions of HCV. 
For example, a 200-amino-acid sequence of part of NS3 of GBV-A and GBV-B shows 47% and 55% sequence similarity with the homologous region in HCV (positions 1298 to 1497 in the HCV polyprotein) 3 and 43.5% sequence similarity to each other. Similarly, in NS5, the region around the active site of the RNA polymerase (including the GDD motif and positions 2662 to 2761 in HCV) shows 36% and 41% sequence similarities and 43% between GBV-A and GBV-B. 12 In these nonstructural regions, these similarity values are greater than those between HCV and pestiviruses or flaviviruses, although little homology can be found on comparison of the regions of the genome encoding structural proteins (ie, the core and envelope), nor with the normally highly conserved 5'UTR. The degree of relatedness between HCV and other positive-stranded RNA viruses can be more formally analyzed by phylogenetic analysis of highly conserved parts of the genome, such as the NS5 region (and homologues in other viruses) encoding the RNA-dependent RNA polymerase, which invariably contains the canonical GDD motif necessary for the enzymatic activity of the protein. Comparisons of a 100-amino-acid sequence surrounding this motif indicate a close relationship between HCV and GBV-A and GBV-B, an intermediate degree of relatedness with the pestivirus bovine viral diarrhea virus, and a much more distant relatedness to flaviviruses (Figure 2). 6,13 Remarkably, a series of plant viruses that are structurally distinct from each of the mammalian virus groups, and with different genome organizations, have RNA-dependent RNA polymerase amino acid sequences that are perhaps more similar to those of HCV than are the flaviviruses. HCV replication has been studied using a variety of experimental techniques. However, little progress has been made toward the development of a practical HCV culture. HCV does not produce obvious cytopathology, and the amount of HCV released from cells infected in vitro often is low. 
14-18 This might be because the cells used for culture are not representative of those infected in vivo, or because productive infection requires a combination of cytokines and growth factors that might be present in the liver but which cannot be recreated in cell culture. The observation that low levels of HCV replication might be detected in lymphocyte 14,18 and hepatocyte cell lines 16,17 indicates that either the tropism of HCV for different cell types may be greater than first imagined or that the virus replication detected so far does not represent the full replicative cycle of HCV that occurs in vivo. Transfection of full-length DNA sequences of the HCV genome might be expected to initiate the full replicative cycle of HCV, as it does when similar experiments are done with picornavirus sequences. However, only a low level of expression of virus proteins was observed when a complete HCV sequence was transfected into a transformed hepatocyte (hepatoma) cell line (Huh7). 19 Despite this, there was evidence of replication of the HCV genome and the production of low concentrations of progeny virus particles. Such models provide an important experimental system for future investigations of HCV replication. In common with other positive-strand RNA viruses, HCV is presumed to replicate its RNA genome through the production of a replication intermediate (ie, a minus-strand RNA copy of the complete genome) that is synthesized by the activity of a virally encoded RNA-dependent RNA polymerase. The minus-strand copy would then be used to generate positive-stranded copies. Because templates can be reused, several minus-strand copies can be synthesized from the infecting positive strand, and each of these transcripts can be used several times to produce positive-strand progeny sequences. In this way, a single input sequence may be amplified several thousandfold. 
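The amplification argument above reduces to a product of two template-reuse counts. The following is a minimal sketch of that arithmetic, not a model of HCV replication kinetics; the function name and the counts of 50 are illustrative assumptions that do not come from the text.

```python
def progeny_plus_strands(minus_per_plus: int, plus_per_minus: int) -> int:
    """Plus-strand progeny obtained from ONE infecting plus strand.

    minus_per_plus: minus-strand copies made from the infecting plus strand
                    (hypothetical value, chosen for illustration).
    plus_per_minus: plus strands transcribed from each minus-strand copy
                    (hypothetical value, chosen for illustration).
    """
    if minus_per_plus < 0 or plus_per_minus < 0:
        raise ValueError("copy numbers must be non-negative")
    # Each minus strand is reused as a template, so the counts multiply.
    return minus_per_plus * plus_per_minus


# Assumed example: 50 minus strands, each transcribed 50 times.
print(progeny_plus_strands(50, 50))
```

With these assumed counts the single input strand yields 2500 progeny strands, which is the "several thousandfold" order of magnitude described in the text.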
Although initiation of transcription is well understood for some positive-strand RNA viruses (such as the Picornaviridae), no information currently is available on how RNA synthesis of HCV or other flaviviruses is primed. [Figure 2 legend, partially recovered: ... Koonin 6 for sources of non-HCV sequences. GBV-A, GBV-B, and HCV (genotypes 1a, 1b, 2a, 2b, and 3a shown) were aligned using the program CLUSTAL, and phylogenetic analysis was done using the programs PROTDIST (PAM matrix), NEIGHBOR, and DRAWTREE in the PHYLIP package. 13] HCV lacks homopolymeric tracts (such as poly[U] in the picornaviruses) at the 5' end of the genome, whereas the 3' end is variable, containing either poly(U) or poly(A) tracts, or possibly neither, as now appears to be the case with the related pestiviruses. Furthermore, there appears to be no homologue of the VPg protein of Picornaviridae. For these reasons, it is likely that the mechanism of transcription initiation for HCV is different. Antisense HCV RNA sequences have been detected in the liver of HCV-infected patients by a strand-specific PCR method, confirming the presumed method of replication of HCV via a replication intermediate. 20 Such assays provide a valuable technique for detecting HCV replication, as both a sensitive method of monitoring HCV replication in virus culture experiments and a way of investigating the range of cell types and distribution of HCV infection in HCV-infected patients. In particular, the possibility of replication at extrahepatic sites has been proposed on the basis of such assays; these studies have been reviewed by Lau et al. 21 The 5'UTR is thought to play a significant role in initiating and regulating translation of the large ORF of HCV. This region is approximately 341 to 344 bases long, and a combination of computer analysis, nuclease mapping experiments, and studies of covariance has led to a proposed secondary structure model for this part of the genome (Figure 3). 
22 Using the same methods, researchers have predicted a remarkably similar structure for pestiviruses, 8 despite the virtual absence of nucleotide sequence similarities with HCV, indicating the importance of the overall structure of this region in interactions with viral and cellular proteins or other RNA sequences. Direct evidence for internal initiation of translation has been obtained from in vitro translation of reporter genes downstream from the 5'UTR sequence placed in mono- or dicistronic vectors. 9,23,24 The nonpaired tip of the stem-loop structure 3 is partially complementary to the 18S subunit of ribosomal RNA and may, therefore, be the site of binding during internal initiation. 25 The internal ribosomal entry site activity of the 5'UTR is consistent with the hypothesis that translation is initiated from the AUG methionine codon at position 341. 22 There is no evidence for translation from any of the variable number of AUG triplets upstream from position 341, although production of the small proteins from these upstream potential ORFs may play some role in regulating expression of the large ORF. 26 In the absence of a cell culture system for HCV, most information available on the expression and processing of HCV proteins has been obtained from transfection experiments with cloned DNA sequences corresponding to the different proteins, and more recently by direct observations of the cellular distributions and properties of HCV proteins detected in liver or plasma in vivo. Transfection of prokaryotic or eukaryotic cells with DNA copies of different parts of the HCV genome under the control of artificial promoters allows expression of the encoded proteins, and provides a useful technique for studying their synthesis, biochemical properties, and [...] (Table II). Expression of this part of the genome in cells 27-29 or reticulocyte lysate containing microsomal membranes 30,31 leads to the synthesis of a polyprotein and its cleavage into a series of proteins. 
The protein identified as the capsid protein on the basis of comparisons with related viruses is expressed as a protein of approximately 21 to 22 kd. The assignment of this protein as the nucleocapsid protein is supported by the presence of regions within the protein containing numerous basic (positively charged) amino acids that may have RNA-binding properties associated with the encapsidation of HCV RNA during virus assembly. Binding of core protein to ribosomal RNA has recently been reported. 31 Using similar techniques, expression of the putative envelope proteins of HCV (E1 and E2) leads to the synthesis in mammalian cells of two heterogeneous proteins with sizes ranging from 31 to 35 kd and 68 to 72 kd, respectively. 27-31 Cleavage between the capsid protein and E1, E1 and E2, and E2 and NS2 depends on the addition of microsomal membranes, implying that the host cell signalase has a role in these processing steps. The sizes of E1 and E2 are greater than could be explained by their amino acid sequences alone and support biochemical evidence for extensive glycosylation of both proteins after translation. Both E1 and E2 have a large number of potential N-linked glycosylation sites, although the details of which sites are used, the extent to which the glycoprotein moieties are modified, and whether there is also O-linked glycosylation await further biochemical analysis. Two cleavage sites between E2 and NS2 (both microsome dependent) have recently been identified, 32 leading to the production of E2 proteins differing in size by 80 amino acid residues. 33 Evidence for intermolecular associations between E1 and E2 has been obtained through immunoprecipitation experiments, in which antibody to E1 or E2 could precipitate both proteins under nondenaturing conditions. 
28,29,34 The nature or significance of this association is unclear, although current evidence suggests that the association is predominantly noncovalent 34 and does not occur simply through hydrophobic interactions between the membrane anchors of the two proteins. 33,35 Recently, monoclonal antibodies to either E1 or E2 were shown to coprecipitate NS2 and NS3, 35 and there also is evidence for associations between E2, NS2, and NS4b. 33 In vitro translation of the rest of the genome leads to the production of proteins of sizes 23, 70 to 72, 4, 27, 56 to 58, and 66 kd, corresponding to NS2, NS3, NS4a, NS4b, NS5a, and NS5b, respectively (Figure 1; Table II). Proteolytic cleavage pathways that generate the nonstructural proteins are mediated by NS2 and NS3 36 and have been extensively studied by several groups, as they represent possible targets of antiviral treatment. NS3 is a serine protease that catalyzes cleavage reactions between NS3/NS4a, NS4a/NS4b, NS4b/NS5a, and NS5a/NS5b. 36-40 NS2 is a metalloproteinase that cleaves the NS2/NS3 junction. 36,41 The NS3/NS4a cleavage reaction mediated by NS3 and the NS2/NS3 cleavage mediated by NS2 occur in cis, whereas other reactions can occur through intermolecular associations between NS3 and the rest of the polyprotein. Accounts of the complex sequence of events and the interactions between nonstructural proteins involved in cleavage reactions differ in detail depending on the experimental methods used. However, cleavage may be a sequential process modulated by the activities of other proteins, such as NS4a. The NS2 protease is zinc dependent, and its active site depends on residues in NS3. Therefore, after the cis cleavage of the NS2/NS3 junction, the protease is inactivated and will not act in trans on other substrates. 
This cleavage reaction has been shown to be essential for activating the NS3 protease, and natural variation in the efficiency of the reaction may modulate the pathogenicity of HCV in vivo. 42 When released, NS3 cleaves other sites with varying efficiencies. The active site of NS3 has been mapped by deletion experiments to lie at the amino terminus of the protein (residues 1409 to 1215). 43 The substrate specificity of the serine protease activity has been defined by sequence comparisons and mutagenesis experiments 44-47 and generally conforms to the consensus sequence D/E-X-X-X-X-C/T↓S/A in the target protein (X, any residue; ↓, cleavage site). There is some evidence for a less stringent requirement for specific amino acids around the cis cleavage site (NS3/NS4a) than for those cleaved in trans. 46 Several investigators have described the requirement for other protein cofactors for the activity of NS3. In particular, it appears that binding of NS4a to NS3 48,49 is necessary at least for the cleavage of NS4b/NS5a and may modulate the activity of NS3 in other ways. Although there is now some information on the proteolytic cleavage steps used to process the HCV polyprotein, the difficulty associated with in vitro culture of HCV and production of infectious molecular clones of HCV so far has prevented a more detailed understanding of the sites of HCV replication in cells and the processes of virus assembly and release from the cell. Future research should reveal the nature of the interaction between the capsid protein and virus RNA and how this is packaged into the assembled provirion, the posttranslational modifications to the envelope proteins and where these occur in the cell, and the sites of budding of HCV through cellular membranes. To understand replication more fully, we must also identify the mechanism of priming of RNA synthesis from the ends of the genome, the nature of the primers, and whether circularization is necessary for transcription. 
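The consensus sequence quoted above for sites cut by the NS3 serine protease (an acidic residue, four arbitrary residues, cysteine or threonine, then serine or alanine after the scissile bond) can be expressed as a pattern search. The sketch below is only an illustration of that consensus: the function name and the toy peptides are hypothetical, they are not HCV sequences, and a match indicates a candidate site rather than a demonstrated cleavage.

```python
import re

# Consensus from the text: D/E, four arbitrary residues, C/T, cleavage, S/A.
# The lookahead lets overlapping candidate sites be reported as well.
NS3_CONSENSUS = re.compile(r"(?=[DE].{4}[CT][SA])")


def candidate_cleavage_sites(protein: str) -> list[int]:
    """Return 0-based indices of the residue just after each candidate
    scissile bond (the S/A position) in a one-letter amino acid string."""
    seq = protein.upper()
    # The match starts at the D/E residue; the S/A residue is 6 positions on.
    return [m.start() + 6 for m in NS3_CONSENSUS.finditer(seq)]


# Toy peptides (hypothetical, for illustration only).
print(candidate_cleavage_sites("MKDAAAACSGG"))  # D....C followed by S
print(candidate_cleavage_sites("MKDAAAACGGG"))  # C followed by G: no site
```

Because the consensus is loose, a scan of this kind would be expected to over-predict; the text notes that the cis site (NS3/NS4a) tolerates more variation than the sites cleaved in trans.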
Because a cell culture system to investigate differences in neutralization and cytopathic properties of HCV is not available, nucleotide sequence comparisons and typing assays developed from sequence data have become the principal techniques for characterizing different variants of HCV. This type of analysis is fairly easy to perform, especially since virus sequences can be amplified by PCR directly from clinical specimens. In common with other RNA viruses, variants of HCV show considerable sequence variability, many differing considerably from the prototype HCV (HCV-PT). 3 Differences of up to 29% have been found between the complete genomic sequences of the most extremely divergent variants analyzed to date, 50 comparable to those observed between serotypes of other human positive-strand RNA viruses such as poliovirus, coxsackievirus, and coronaviruses. Sequence variability is evenly distributed throughout all virus genes (Table III) 9-57 apart from the highly conserved nucleotide (and amino acid) sequence of the core (nucleocapsid) protein and 5'UTR and the greater variability of the envelope gene (Table III). Nucleotide sequence comparison of complete genomes or subgenomic fragments between variants has shown that variants of HCV obtained from Japan are substantially different from the HCV-PT variant obtained in the United States. 3 Comparison of the complete genome sequence of HCV-J 53 and HCV-BK 51 from Japan showed 92% sequence similarity to each other but only 79% with HCV-PT. At that time, the former variants were classified as the "Japanese" type (or type II), while those from the United States (HCV-PT and HCV-H) were classified as type I. Comparisons of subgenomic regions of HCV, such as E1, 58 core, 59,60 and NS5, 61 provide evidence of at least six major groupings of HCV sequences, each of which contains a series of more closely related clusters of sequences (Figure 4). 62 
[Table III footnote: Sources of sequences: 1a = HCV-H 52; 1b = HCV-J 53; 1c = HC-J9 54; 2a = HC-J6 55; 2b = HC-J8 50; 3a = NZL1 56; 3b = Tr 57. In all comparisons, the 5'NCR is the most conserved subgenomic region (maximum 9% nucleotide sequence divergence), whereas highly variable regions are found in parts of the genome encoding E1 and NS2 (35% to 44% nucleotide sequence and 34% to 45% amino acid sequence differences).] 
The current widely used nomenclature for HCV variants reflects this hierarchy of sequence relationships between different isolates. Based on previous suggestions, 63,64 the major branches in the phylogenetic tree are referred to as "types," while "subtypes" correspond to the more closely related sequences within most of the major groups (Figure 4). Although NS5 sequences are analyzed in Figure 4, equivalent sequence relationships exist in other parts of the genome. The types have been numbered 1 to 6 and the subtypes a, b, and c, in both cases in order of discovery. Therefore, the sequence cloned by Chiron 3 is assigned type 1a, HCV-J and HCV-BK are type 1b, HC-J6 is type 2a, and HC-J8 is type 2b. This nomenclature closely follows the schemes originally described by Enomoto (type 3a) on the basis of phylogenetic analysis of sequences in the NS5, NS3, core, and 5'UTR noncoding regions. This approach avoids the inconsistencies of earlier systems and should be easier to extend when new genotypes are discovered. Some genotypes of HCV (types 1a, 2a, and 2b) show a broad worldwide distribution, whereas others, such as types 5a and 6a, are found only in specific geographic regions. Blood donors and patients with chronic hepatitis from countries in western Europe and the United States frequently are infected with genotypes 1a, 1b, 2a, 2b, and 3a, although the relative frequencies of each may vary. 58,61,65-76 There is a trend for more frequent infection with type 1b in southern and eastern Europe. 
In many European countries, genotype distributions vary with the age of the patients, reflecting rapid changes in genotype distribution with time within a single geographic area. A striking geographic change in genotype distribution is apparent between southeast Europe and Turkey (both mainly type 1b) and several countries in the Middle East and parts of North and Central Africa where other genotypes predominate. For example, a high frequency of HCV infection is found in Egypt (20% to 30%), 77-79 of which almost all corresponds to type 4a. 80,81 HCV type 4 also is the principal genotype in countries such as Yemen, Kuwait, Iraq, and Saudi Arabia in the Middle East 60 and in Zaire, Burundi, and Gabon in Central Africa. 58,69,71 HCV genotype 5a is frequently found among patients with non-A, non-B hepatitis and blood donors in South Africa 58,61,70,82 but is found only rarely in Europe and elsewhere. 58,72 In Japan, Taiwan, and some parts of China, genotypes 1b, 2a, and 2b are the most frequently found. 63,83-90 Infection with type 1a in Japan appears to be confined to patients with hemophilia who received commercial (US-produced) blood products, such as factor VIII and XI clotting concentrates. 63,91 The geographic distribution of type 3 varies; it is only rarely found in Japan 92 and is also infrequent in Taiwan, Hong Kong, and Macau. 93 However, this genotype is found with increasing frequency in countries to the west, frequently occurring in Singapore and accounting for most hepatitis infections in Thailand. 93,94 In a small sample, it was the only genotype found in Bangladesh and eastern India. 60 As with type 4 in Africa, there is now evidence of considerable sequence diversity within the type 3 genotype, with at least 11 different subtypes of type 3 identified in Nepal, 95 India, and Bangladesh. 60 A genotype with a highly restricted geographic range is type 6a. 
This type was originally found in Hong Kong 69,80 and was shown to be a new major genotype by sequence comparisons in the NS5 and E1 regions. 58,61 Approximately one third of anti-HCV-positive blood donors in Hong Kong are infected with this genotype, as are an equivalent proportion in neighboring Macau 81 and Vietnam. 96 A series of novel genotypes has been found in Vietnam 96 and Thailand 60; these genotypes are distinct from types 1 to 6 classified to date but are more closely related to type 6 than to other genotypes, 96 consistent with their overlapping geographic range with type 6 in Southeast Asia. Numerous investigations are being conducted into possible differences in the course of disease associated with different HCV genotypes, such as the rate of development of cirrhosis and hepatocellular carcinoma, and whether certain genotypes are more or less likely to respond to interferon treatment. A large number of clinical investigations have documented severe and progressive liver disease in patients infected with each of the well-characterized genotypes (types 1a, 1b, 2a, 2b, 3a, and 4a), so there is little evidence thus far of variants of HCV that are completely nonpathogenic. However, possible variation in the rate of disease progression, differences between genotypes in routes and frequency of person-to-person transmission, or differences in the probability of achieving a sustained response to antiviral treatment would indicate the potential usefulness of identifying the infecting genotype in certain clinical situations. Several clinical studies have catalogued a variety of factors (including genotype) that correlate with the severity of liver disease and show predictive value for response to antiviral treatment. Factors that frequently have been shown to influence response to interferon treatment include age and duration of infection, presence of cirrhosis before treatment, genotype, and pretreatment level of circulating viral RNA in plasma. 
97 A consistent finding reported by several different groups that used a variety of typing assays has been the greatly increased rate of long-term response found when treating patients infected with genotypes 2a, 2b, and 3a compared with type 1b. 74,85,98-106 For example, Chemello et al 102 found that long-term (>12 months) normalization of alanine aminotransferase levels was achieved in only 29% of patients infected with type 1 variants, compared with 52% of those infected with type 2 and 74% of those infected with type 3. In a study by Tsubota et al, 106 infection with type 1b, the presence of cirrhosis, and a high pretreatment virus load were each independently associated with a reduced chance of response (relative risks of 16, 5, and 4, respectively). The mechanism by which different genotypes differ in response to treatment remains obscure. For treatments such as interferon, we do not know whether the effect of the drug is directly antiviral or whether the inhibition of virus replication is secondary to increased expression of major histocompatibility complex class I antigens on the surface of hepatocytes and greater cytotoxic T-cell activity against virus-infected cells. Elucidating the mechanism of action of interferon and whether there are virologic differences between genotypes in sensitivity to antiviral agents awaits a cell culture model for HCV infection. Although determination of the nucleotide sequences is the most reliable method of identifying different genotypes of HCV, this method is not practical for large clinical studies. Many of the published methods for "genotyping" are based on amplification of viral sequences in clinical specimens, either by using type-specific primers that selectively amplify different genotypes, by analyzing the PCR product by hybridization with genotype-specific probes, or by using restriction fragment length polymorphisms (RFLP). The assays have different strengths and weaknesses. 
For example, methods based on amplification and analysis of 5'NCR sequences have advantages of sensitivity, because this region is highly conserved and can be more frequently amplified from HCV-infected patients than other parts of the genome. However, few nucleotide differences are found between different genotypes. Although reliably differentiating six major genotypes by using RFLP or by type-specific probes is possible, it is not always possible to reliably identify virus subtypes. Types 2a and 2b consistently differ at position -124, allowing them to be differentiated by the restriction enzyme ScrFI 67 or by probes 10 to 13 in the INNO-LiPA (Innogenetics, Zwijnaarde, Belgium). 71 However, sequences of type 2c are indistinguishable from some of those of type 2a. Similarly, some of the novel subtypes of type 1 often show sequences identical to those of type 1a or 1b, 54,60 and a small proportion of type 1a variants are identical to type 1b and vice versa. 22 Typing methods based on coding regions, such as core and NS5, can reliably identify subtypes as well as major genotypes because the degree of sequence divergence is much greater (Table III). However, amplifying sequences in coding regions of the genome generally is difficult because sequence variability in the primer-binding sites may reduce the effectiveness of sequence amplification by PCR. Nevertheless, the variation is exploited in a genotyping assay that uses type-specific primers complementary to variable regions in the core gene. Currently, this assay can identify and differentiate types 1a, 1b, 2a, 2b, and 3a, 107,108 although the method is technically complicated to perform reliably 109 and may be difficult to extend to the great range of HCV genotypes now described. Serologic typing methods have advantages over PCR-based methods in terms of the speed and simplicity of sample preparation and the use of simple equipment found in any diagnostic virology laboratory. 
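The RFLP approach described above for the 5'NCR rests on a simple idea: a single nucleotide difference can create or abolish a restriction site, so two subtypes yield different fragment patterns from the same amplicon. The sketch below illustrates this under stated assumptions: the ScrFI recognition sequence is taken to be CCNGG with the cut after the second C, the function name is hypothetical, and the two short sequences are invented toy amplicons, not real 5'NCR sequences of types 2a and 2b.

```python
import re

# Assumed recognition site for ScrFI: CC^NGG (N = any base), cut after "CC".
SCRFI_SITE = re.compile(r"(?=CC[ACGT]GG)")
CUT_OFFSET = 2  # bases from the start of the site to the cut position


def fragment_lengths(amplicon: str) -> list[int]:
    """Lengths of the fragments expected after complete digestion of a
    linear amplicon, listed in 5' to 3' order."""
    seq = amplicon.upper()
    cuts = [m.start() + CUT_OFFSET for m in SCRFI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy amplicons differing at one base (hypothetical sequences).
with_site = "AAAACCTGGAAAA"     # contains CCTGG, so it is cut once
without_site = "AAAACTTGGAAAA"  # C-to-T change abolishes the site

print(fragment_lengths(with_site))
print(fragment_lengths(without_site))
```

In practice the fragment lengths would be read from a gel rather than computed, and, as the text notes, identical 5'NCR sequences in different subtypes (for example, some 2a and 2c variants) cannot be separated by any pattern of this kind.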
By careful optimization of reagents, such assays may show high sensitivity and reproducibility. For example, type-specific antibody to NS4 peptides can be detected in approximately 95% of patients with non-A, non-B hepatitis. 110 Furthermore, the assays can be readily extended to detect new genotypes. One NS4-based assay can reliably identify type-specific antibody to six major genotypes, 110 although the antigenic similarity between subtypes currently precludes the separate identification of types 1a and 1b, or of types 2a and 2b, using the NS4 peptides alone.

In contrast to the highly restricted sequence diversity of the 5'NCR and adjacent core region, the two putative envelope genes are highly divergent between different variants of HCV (Table III) 111-114 and show a three- to four-times higher rate of sequence change with time in persistently infected patients. 115 Because these proteins are likely to lie on the outside of the virus, they would be the principal targets of the humoral immune response to HCV elicited on infection. Changes in the E1 and E2 genes may alter the antigenicity of the virus to allow "immune escape" from neutralizing antibodies, 112 thereby accounting for both the high degree of envelope sequence variability and the observed persistent nature of HCV infection. Supporting this model is the observation that much of the variability in the E1 and E2 genes is concentrated in discrete "hypervariable" regions, 112-114 possibly reflecting pressure on the virus to evade immune recognition at specific sites where HCV may be neutralized. Experimental evidence supporting this theory includes the observations that variants of HCV with changes in the E1 and E2 genes are antigenically distinct and that, in many cases, the in vivo appearance of variants with different sequences in the hypervariable region is followed by development of antibodies that specifically recognize the new variants.
112,116-118 In one report, 119 persistent HCV infection developed in a patient with deficient antibody responses (agammaglobulinemia) but without the development of sequence variability in E2, which is consistent with a role for antibody in driving variation in immunocompetent persons. On the other hand, envelope sequences obtained sequentially from persistently infected patients sometimes show no significant change, 120-122 whereas in other patients, variants coexist with antibodies that recognize the corresponding hypervariable region peptides. 116 Cytotoxic T-cell responses also may play a protective role in HCV infection, 123,124 as they do in other virus infections in which they are more important for virus clearance than the antibody response. Although circumstantial evidence supports the theory of immune escape, additional studies are needed to confirm this as a plausible model of virus persistence. Many of the current uncertainties may be resolved when a satisfactory in vitro neutralization assay is developed for HCV that enables the effect of amino acid changes in the envelope gene to be directly investigated. Additional information also is needed on the relative importance of humoral and cell-mediated immunity to HCV, to determine which is more important in virus clearance and protection from reinfection.

Persistent infection with HCV entails continuous replication of the virus over years or decades in HCV carriers. The large number of replication cycles, combined with the relatively error-prone RNA-dependent RNA polymerase, leads to measurable sequence drift of HCV over time. For example, over an 8-year interval of persistent infection in a chimpanzee, the rate of sequence change for the genome as a whole was 0.144% per site per year, 115 similar to the rate calculated for sequence change in the 5' half of the genome over 3 years observed in a human carrier (0.192%) 125 and in a cross-sectional study.
126 Using this "molecular clock," it is possible in principle to calculate times of divergence between HCV variants and therefore to establish their degree of epidemiologic relatedness. For example, the finding of relatively few sequence differences between variants infecting two individuals would provide evidence of recent HCV transmission between them. Sequence comparisons in variable regions of the HCV genome, such as E2 and NS5, have been used to document transmission between persons, from mother to child, 127 within families, 128 by iatrogenic routes, 129-132 or by sexual contact. 133-135 In these studies, the possibility of transmission by different risk behaviors was assessed by measuring the degree of relatedness of HCV recovered from implicated persons.

Phylogenetic analysis of nucleotide sequences provides a more formal method of investigating relationships between sequences. Phylogenetic trees produced by such methods indicate the degree of relatedness between sequences, while the branching order of the different lineages shows the most likely evolutionary history of the sampled population. For example, clustering of HCV sequences into a single phylogenetic group among recipients of an HCV-contaminated blood product (anti-D immunoglobulin) was still apparent 17 years after infection (Figure 5). These approaches to HCV epidemiology will prove valuable in documenting the spread of HCV in different risk groups, evaluating alternative (nonparenteral) routes of transmission, and understanding more about the origins and evolution of HCV.

Figure 5. Phylogenetic relationships between sequences from the NS5 region of patients exposed to an implicated batch of anti-D immunoglobulin (Ig) in 1977 (o) and those of epidemiologically unrelated type 1b variants from Japan (J), the United States (U), and Europe (E). B250 = NS5 sequence of hepatitis C virus recovered from batch B250 of anti-D Ig; donor = sequence of variant infecting suspected donor to plasma pool used to manufacture batch B250. Phylogenetic analysis was done on a segment (222 base pairs; positions 7975 to 8196) of the NS5 gene that was amplified, sequenced, and analyzed as previously described. 61 Sequence distances were calculated using the program DNAML in a data set containing the prototype hepatitis C virus (type 1a) as an outgroup. Sequences were obtained from published sources. 61,76

This paper attempts to review a rapidly expanding area of research. It is hoped that a combination of basic science and clinical studies may eventually lead to a greater understanding of the ways in which HCV infection may be prevented or cured by the use of antiviral vaccines. The information provided here will clearly form the basis of many of these developments.

of the hepatitis C virus core protein. J Virol. 1994; 68:3631-3641. 32 Isolation of a cDNA derived from a bloodborne non-A, non-B hepatitis genome. Science An assay for circulating antibodies to a major etiologic virus of human non-A, non-B hepatitis Genetic organization and diversity of the hepatitis C virus Nucleotide sequence of yellow fever virus: Implications for flavivirus gene expression and evolution Hepatitis C virus shares amino acid sequence similarity with pestiviruses and flaviviruses as well as members of two plant virus supergroups The phylogeny of RNA-dependent RNA polymerases of positive-strand RNA viruses Internal ribosome entry site within hepatitis C virus RNA Secondary structure of the 5' nontranslated region of hepatitis C virus and pestivirus genomic RNAs A conserved helical element is essential for internal initiation of translation of hepatitis C virus RNA Pestivirus translation initiation occurs by internal ribosome entry Extraordinarily low density of hepatitis C virus estimated by sucrose density gradient centrifugation and the polymerase chain reaction Identification of two flavivirus-like genomes in the GB hepatitis agent PHYLIP Inference
Package Version 3.5. Seattle, Wash: Department of Genetics Evidence for in vitro replication of hepatitis C virus genome in a human T-cell line Correlation between the infectivity of hepatitis C virus in vivo and its infectivity in vitro Susceptibility of human liver cell cultures to hepatitis C virus infection Multicycle infection of hepatitis C virus in cell culture and inhibition by alpha and beta interferons Susceptibility of human t-lymphotropic virus type i infected cell line MT-2 to hepatitis C virus infection Transfection of a differentiated human hepatoma cell line (huh7) with in vitro-transcribed hepatitis C virus (HCV) RNA and establishment of a long-term culture persistently infected with HCV Demonstration of in vitro infection of chimpanzee hepatocytes with hepatitis C virus using strand-specific RT/PCR. Virology In situ detection of hepatitis C virus: A critical appraisal Variation of the hepatitis C virus 5'-non coding region: Implications for secondary structure, virus detection and typing Translation of human hepatitis C virus RNA in cultured cells is mediated by an internal ribosome-binding mechanism Complete 5' noncoding region is necessary for the efficient internal initiation of hepatitis C virus RNA Unusual folding regions and ribosome landing pad within hepatitis C virus and pestivirus RNAs End-dependent translation initiation of hepatitis C viral RNA and the presence of putative positive and negative translational control elements within the 5' untranslated region Expression, identification and subcellular localization of the proteins encoded by the hepatitis C viral genome Characterization of hepatitis C virus envelope glycoprotein complexes expressed by recombinant vaccinia viruses Expression and identification of hepatitis C virus polyprotein cleavage products Gene mapping of the putative structural region of the hepatitis C virus genome by in vitro processing analysis A second hepatitis C virus-encoded proteinase Hepatitis C virus NS3 
serine proteinase: Transcleavage requirements and processing kinetics Identification of the domain required for trans-cleavage activity of hepatitis c viral serine proteinase Substrate requirements of hepatitis C virus serine proteinase for intermolecular polypeptide cleavage in Escherichia coli Specificity of the hepatitis C virus NS3 serine protease: Effects of substitutions at the 3/4a, 4a/4b, 4b/5a, and 5a/5b cleavage sites on polyprotein processing Substrate determinants for cleavage in cis and in trans by the hepatitis C virus NS3 proteinase Nucleotide sequence of hepatitis C virus (type 3b) isolated from a Japanese patient with chronic hepatitis C At least 12 genotypes of hepatitis C virus predicted by sequence analysis of the putative E1 gene of isolates collected worldwide Sequence analysis of the core gene of 14 hepatitis c virus genotypes Investigation of the pattern of hepatitis C virus sequence diversity in different geographical regions: Implications for virus classification Classification of hepatitis C virus into six major genotypes and a series of subtypes by phylogenetic analysis of the NS-5 region A proposed system for the nomenclature of hepatitis C viral genotypes There are two major types of hepatitis C virus in Japan Analysis of a new hepatitis C virus type and its phylogenetic relationship to existing variants Serological responses to infection with three different types of hepatitis C virus Two French genotypes of hepatitis C virus: Homology of the predominant genotype with the prototype American strain Detection of three types of hepatitis C virus in blood donors: Investigation of type-specific differences in serological reactivity and rate of alanine aminotransferase abnormalities Identification of hepatitis C viruses with a nonconserved sequence of the 5' untranslated region Sequence analysis of the 5' noncoding region of hepatitis C virus At least five related, but distinct, hepatitis C viral genotypes exist Typing of hepatitis C virus 
isolates and new subtypes using a line probe assay Sequence analysis of the 5' untranslated region in isolates of at least four genotypes of hepatitis C virus in the Netherlands Use of the 5' non-coding region for genotyping hepatitis C virus Genotypes of hepatitis C virus in Italian patients with chronic hepatitis C Heterogeneity of hepatitis C virus genotypes in France Genotypic analysis of hepatitis C virus in American patients Hepatitis C virus infection in Egyptian volunteer blood donors in Riyadh Risk factors associated with a high seroprevalence of hepatitis C virus infection in Egyptian blood donors High HCV prevalence in Egyptian blood donors Sequence variability in the 5' non coding region of hepatitis C virus: Identification of a new virus type and restrictions on sequence diversity Geographical distribution of hepatitis C virus genotypes in blood donors: An international collaborative survey New genotype of hepatitis C virus in South-Africa Typing of hepatitis C virus (HCV) genomes by restriction fragment length polymorphisms Distribution of plural HCV types in Japan Clinical backgrounds of the patients having different types of hepatitis C virus genomes Genomic typing of hepatitis C viruses present in China HCV genotypes in China HCV genotypes in different countries Differences in the hepatitis C virus genotypes in different countries Prevalence, genotypes, and an isolate (HC-C2) of hepatitis C virus in Chinese patients with liver disease Imported hepatitis C virus genotypes in Japanese hemophiliacs Genotypic subtyping of hepatitis C virus Survey of major genotypes and subtypes of hepatitis C virus using restriction fragment length polymorphism of sequences amplified from the 5' non-coding region A new type of hepatitis C virus in patients in Thailand Hepatitis C virus variants from Nepal with novel genotypes and their classification into the third major group Hepatitis C virus variants from Vietnam are classifiable into the seventh, eighth, and ninth 
major genetic groups Prediction of response to interferon treatment of chronic hepatitis C HCV genotypes in chronic hepatitis C and response to interferon Detection of hepatitis C virus by polymerase chain reaction and response to interferon-alpha therapy: Relationship to genotypes of hepatitis C virus Factors useful in predicting the response to interferon therapy in chronic hepatitis C Hepatitis C virus genotypes--an investigation of type-specific differences in geographic origin and disease Simmonds P. Hepatitis C serotype and response to interferon therapy Prediction of interferon effect in chronic hepatitis C by both quantification and genotyping of HCV-RNA Genotypes and titers of hepatitis C virus for predicting response to interferon in patients with chronic hepatitis C Antiviral effect of lymphoblastoid interferon-alpha on hepatitis C virus in patients with chronic hepatitis type C Factors predictive of response to interferon-alpha therapy in hepatitis C virus infection Typing hepatitis C virus by polymerase chain reaction with type-specific primers: Application to clinical surveys and tracing infectious sources Characterization of the genomic sequence of type V (or 3a) hepatitis C virus isolates and PCR primers for specific detection Application of six hepatitis C virus genotyping systems to sera from chronic hepatitis C patients in the United States Use of NS-4 peptides to identify typespecific antibody to hepatitis C virus genotypes 1, 2, 3, 4, 5 and 6 Characterization of hypervariable regions in the putative envelope protein of hepatitis C virus Evidence for immune selection of hepatitis C virus (HCV) putative envelope glycoprotein variants: Potential role in chronic HCV infections Marked sequence diversity in the putative envelope proteins of hepatitis C viruses Hypervariable regions in the putative glycoprotein of hepatitis C virus Genetic drift of hepatitis-c virus during an 8.2-year infection in a chimpanzee--variability and stability Humoral immune 
response to hypervariable region-1 of the putative envelope glycoprotein (gp70) of hepatitis C virus Hypervariable 5'-terminus of hepatitis C virus E2/NS1 encodes antigenically distinct variants A structurally flexible and antigenically variable N-terminal domain of the hepatitis C virus E2/NS1 protein: implication for an escape from antibody Hypervariable region of hepatitis C virus envelope glycoprotein (E2/NS1) in an agammaglobulinemic patient The degree of variability in the amino terminal region of the E2/NS1 protein of hepatitis C virus correlates with responsiveness to interferon therapy in viraemic patients Sequence variation in the large envelope glycoprotein (E2/NS1) of hepatitis C virus during chronic infection Dynamics of genome change in the E2/NS1 region of hepatitis C virus in vivo Intrahepatic cytotoxic T lymphocytes specific for hepatitis C virus in persons with chronic hepatitis Hepatitis C virus (HCV)-specific cytotoxic T lymphocytes recognize epitopes in the core and envelope proteins of HCV Nucleotide sequence and mutation rate of the H strain of hepatitis C virus Analysis of genomic variability of hepatitis C virus A unique, predominant hepatitis C virus variant found in an infant born to a mother with multiple variants Risk of hepatitis C virus infections through household contact with chronic carriers: analysis of nucleotide sequences Comparison of hepatitis C virus strains obtained from hemodialysis patients Hepatitis C viral markers in patients who received blood that was positive for hepatitis C virus core antibody, with genetic evidence of hepatitis C virus transmission Hepatitis C transmission in a hemodialysis unit: Molecular evidence for spread of virus among patients not sharing equipment Confirmation of hepatitis C virus transmission through needlestick accidents by molecular evolutionary analysis Heterosexual transmission of hepatitis C virus Analysis of nucleotide sequences of hepatitis C virus isolates from husband-wife pairs
Acute hepatitis C infection after sexual exposure