key: cord-0005286-7hyr8x0l authors: Morales Coll, Julio title: Heptad-repeat sequences in the glycoprotein of rhabdoviruses date: 1995 journal: Virus Genes DOI: 10.1007/bf01702591 sha: 0089cde75e66707ff27c3d4716b2f17a9f2be02a doc_id: 5286 cord_uid: 7hyr8x0l

Two or three regions containing three or more successive newly defined heptads of a–d hydrophobic amino acid repeats have been located by computer search in the cDNA-derived amino acid sequences of glycoprotein G of all rhabdoviruses examined (rabies, vesicular stomatitis, fish, and plant rhabdoviruses). These new heptad-repeats differ from those previously reported in other viruses in two ways. First, any of the hydrophobic amino acids may occupy positions a or d. Second, current methods do not predict them to form coiled coils, which is why they had not been detected previously in any rhabdovirus. In all the rhabdoviral sequences studied, the two or three heptad-repeat regions were the only parts of the glycoprotein with at least three successive heptad-repeats. They had low sequence variability among the members of each rhabdoviral genus but showed no sequence similarity among the different genera. All these newly detected heptad-repeats were in the vicinity of some of the most hydrophobic regions in each of the rhabdovirus genera studied. They were found mostly, but not always, outside the extra amino acid sequences that occur in the longer insect or plant rhabdovirus glycoprotein G. The correspondence in position and structure of these heptad-repeats among all the rhabdoviruses suggests that they take part in common function(s), most probably related to viral fusion with cellular membranes.

Heptad-repeats with a high propensity to form coiled coils have been defined as sequences of seven amino acids (aa), labeled a b c d e f g, in which the aa at each position are specified by a probability matrix. The aa usually found in positions a and d are mostly F, Y, I, L, V, M, and A (1).
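As a minimal illustration of the a-g register, the Python sketch below pairs each residue with its heptad position and extracts the a and d (core) residues. The function names are hypothetical. The example assumes that the VSVGPN08 carboxy-terminal repeat (aa 332-356 in Table 2) begins at position d. That assumption follows the Table 2 convention of starting each sequence at the first hydrophobic aa in position a or d.

```python
REGISTER = "abcdefg"  # the seven heptad positions


def heptad_register(seq: str, first: str = "a") -> list[tuple[str, str]]:
    """Pair each residue with its heptad position, starting the cycle at `first`."""
    offset = REGISTER.index(first)
    return [(REGISTER[(offset + i) % 7], aa) for i, aa in enumerate(seq)]


def core_residues(seq: str, first: str = "a") -> list[tuple[str, str]]:
    """Return only the residues that fall at the a and d positions."""
    return [(pos, aa) for pos, aa in heptad_register(seq, first) if pos in ("a", "d")]


# VSVGPN08 repeat transcribed from Table 2, assumed to start at position d.
print(core_residues("VGPVFTIINGSLHYFTSKYLRVELE", first="d"))
# [('d', 'V'), ('a', 'F'), ('d', 'I'), ('a', 'L'), ('d', 'F'), ('a', 'Y'), ('d', 'V')]
```

Every core residue in this repeat is hydrophobic, which is the property the search in this work looks for.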
The presence of these repeats has been described in glycoprotein sequences of many enveloped viruses, such as paramyxoviruses, retroviruses, coronaviruses, and influenza virus (2). It is related to regions of high hydrophobicity or to viral fusion (3, 4). Heptad-repeats with a propensity to form coiled coils had not been found previously in any rhabdovirus (5). Even so, we searched for possible heptad-repeats in glycoprotein G sequences of rhabdoviruses after finding that a phospholipid-binding region of viral hemorrhagic septicemia virus (VHSV), a rhabdovirus affecting salmonid fish, had a heptad-repeat structure. That structure included more hydrophobic aa than those defined earlier by Lupas (6).

Rhabdoviruses possess a glycoprotein (G), a homotrimeric membrane protein that forms spikes protruding 83 Å from the viral membrane (7, 8). Glycoprotein G initiates virus attachment to cellular receptors (9) and reacts with neutralizing antibodies (10). It also mediates the fusion activity of rhabdoviruses, which is detectable only at low pH (7, 11). The Rhabdoviridae family contains fish rhabdoviruses, including VHSV and infectious hematopoietic necrosis virus (IHNV), as well as plant rhabdoviruses, which makes it an interesting area of study for comparative virology.

The glycoprotein G sequences currently available from rhabdoviruses were grouped as follows:
- viruses infecting fish (3 sequences);
- viruses infecting arthropods (2 sequences);
- viruses infecting plants (1 sequence);
- lyssaviruses (7 sequences);
- vesiculoviruses (3 sequences selected from 34 VSV-NJ sequences and 3 from 26 VSV-Ind sequences).

Sequences were chosen to represent as many members of the Rhabdoviridae family as possible. The selection was limited, first, by the number of published cDNA-derived aa sequences of rhabdoviral glycoprotein G. Second, the number had to stay low enough to make the study feasible.

The hydrophilicity profiles, signal peptides, and predicted transmembrane regions of VHSV glycoprotein G (Table 1) and of other rhabdoviruses were obtained from the cDNA-derived aa sequences. The SOAP, ANTIGEN, and PSIGNAL programs from the PCGene package (Intelligenetics, Geneva, Switzerland) were used, with an average group length of 9 aa. The newly defined hydrophobic heptad-repeat sequences were searched for with the program PSEARCH. Amino acids with −ΔG values > −0.4 kcal/mol for transfer of the side chain from water to ethanol were treated as hydrophobic (single-letter code: W, F, Y, I, L, V, M, A, H, T) (14). The search subsequence was (hydrophobic aa) XX (hydrophobic aa) XXX, repeated twice, where X indicates any aa. The positions at which more than two heptad-repeats occurred in succession in the glycoprotein G sequence were then selected (Table 2). Heptad-repeats with a high propensity to form coiled coils were searched for with the program COIL (1) from the same package.

The total length of glycoprotein G ranged from 507 to 526 aa for most of the rhabdoviruses. The exceptions were BEFV and SYNV, which had 623 and 628 aa, respectively (Table 1). In every rhabdoviral glycoprotein G sequence selected, the computer search located at least two regions in which hydrophobic aa occupied the a and d positions of three or more successive heptads (Fig. 1). The aa composition of the remaining positions (b, c, e, f, and g) was not considered. By contrast, the program COIL (1) assigns a probability to each aa at every heptad position. In COIL, for example, the aa at positions a and d are mainly F, Y, I, L, V, M, and A, because those aa are the most likely to occur in known coiled coils in protein structures.
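The search described above can be sketched in Python as follows. This is a hypothetical re-implementation, not the PSEARCH program, and it rests on three assumptions:
- The ten aa listed above (14) count as hydrophobic.
- A window is a heptad when its a and d positions (offsets 0 and 3) are both hydrophobic.
- Only maximal runs of at least three successive heptads are reported, stepping seven residues at a time.

Unlike the published procedure, the sketch does not exclude transmembrane or signal-peptide stretches.

```python
HYDROPHOBIC = frozenset("WFYILVMAHT")  # the ten hydrophobic aa used in the text (14)


def is_heptad(seq, start, hydro=HYDROPHOBIC):
    """True if positions a (start) and d (start + 3) both hold hydrophobic aa."""
    return start + 3 < len(seq) and seq[start] in hydro and seq[start + 3] in hydro


def heptad_runs(seq, min_heptads=3, hydro=HYDROPHOBIC):
    """Return (start, n_heptads) for maximal runs of >= min_heptads successive heptads.

    Starts are 0-based indices into seq; every reading frame is scanned.
    """
    runs, covered = [], set()
    for start in range(len(seq)):
        if start in covered:  # already inside a reported run in this frame
            continue
        n, pos = 0, start
        while is_heptad(seq, pos, hydro):
            n += 1
            pos += 7
        if n >= min_heptads:
            runs.append((start, n))
            covered.update(range(start, pos, 7))
    return runs


# VSVGPN08 repeat region (aa 332-356) as transcribed from Table 2.
seq = "VGPVFTIINGSLHYFTSKYLRVELE"
for start, n in heptad_runs(seq):
    print(f"run of {n} heptads starting at aa {332 + start}")
# run of 3 heptads starting at aa 336
```

Passing `hydro=frozenset("FYILVMA")` restricts the search to the COIL-style a/d residues. For example, the VHSDK repeat then shrinks from four to three qualifying heptads, because its last heptad has T at position a.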
Therefore, the occurrence of hydrophobic a-d heptad-repeats of the kind defined in this work is not by itself an indicator of a coiled-coil structure. In these repeats, the aa at positions a and d may be F, Y, I, L, V, M, or A, plus W, H, or T (single-letter code). We chose to include W, H, and T as new candidates for the a or d positions because these aa appear at the a or d positions of the heptads in a phospholipid-binding domain of VHSV (6).

A previously reported analysis of the VSV glycoprotein G sequence used the method of Lupas and looked for at least four successive heptad repeats, the shortest peptides that still exhibit a stable coiled-coil structure in solution. It failed to identify any predicted coiled-coil domain (1, 2, 5). Even when the window in the program COIL was shortened to three heptad-repeats, no coiled-coil region was predicted with sufficiently high probability in any of the rhabdoviral sequences listed in Table 1.

The two regions of at least three successive heptads found in most of the rhabdoviral sequences studied were located around aa 100-150 (amino terminal) and around aa 350 (carboxy terminal). In contrast, previously reported heptad-repeats in the proteins of 20 enveloped viruses occurred in only one region per protein. That region lay either around aa 150 (near the amino terminus) or around aa 400-500 (near the carboxy terminus), never in both positions in the same viral protein (1). VHSV and Sigma rhabdoviruses showed an additional carboxy-terminal heptad-repeat region, between aa 377-400 and aa 416-443, respectively (Fig. 1). Each of the following contained one of the putative glycosylation sites: the carboxy-terminal heptad-repeats of rabies virus and the vesiculoviruses, and the second carboxy-terminal heptad-repeat of VHSV and Sigma rhabdoviruses.
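The sketch below makes the difference between the two a/d criteria concrete. It checks the four complete heptads of the RABPV carboxy-terminal repeat (aa 330-360, transcribed from Table 2) against two sets: the COIL-style core set quoted above and the extended set used in this work. Only the a and d positions are tested. The COIL probability matrix is not modeled, so this is an illustrative simplification, and the names are hypothetical.

```python
STRICT_AD = frozenset("FYILVMA")            # a/d residues favored by COIL, as quoted in the text
EXTENDED_AD = STRICT_AD | frozenset("WHT")  # set used in this work (adds W, H, T)


def ad_hydrophobic(heptad: str, allowed: frozenset) -> bool:
    """True if the residues at a (index 0) and d (index 3) are both in `allowed`."""
    return heptad[0] in allowed and heptad[3] in allowed


rabpv_heptads = ["FGKAYTI", "FNKTLME", "ADAHYKS", "VRTWNEI"]
strict = [ad_hydrophobic(h, STRICT_AD) for h in rabpv_heptads]
extended = [ad_hydrophobic(h, EXTENDED_AD) for h in rabpv_heptads]
print(strict)    # [True, False, False, False]
print(extended)  # [True, True, True, True]
```

Three of the four heptads qualify only because T, H, or W is allowed at position d. Such repeats therefore need not resemble the aa composition of known coiled coils.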
All of the heptad-repeats lay outside and around an internal core of glycoprotein G (roughly aa 170-300). In this core, seven cysteines were highly conserved in all the rhabdoviruses, giving a maximal alignment with few gaps (Fig. 1). The heptad-repeat sequences showed no aa sequence homology among different rhabdovirus genera. Within each genus, however, they were highly conserved in both aa sequence and relative position in glycoprotein G, including between VSV-NJ and VSV-Ind (Table 2). For simplicity, each heptad-repeat sequence displayed in Table 2 starts at the first hydrophobic aa in position a or d and ends at the first nonhydrophobic aa in position a or d.

The aa variability of the heptad-repeats was studied further where many published aa sequences were available: rabies (7 sequences), VSV-NJ (34 sequences), and VSV-Ind (26 sequences). In VSV-NJ (13), VSV-Ind (15), and rabies (16), the positions with the most different aa were located in two places:
- the carboxy-terminal positions (aa 400 to 500), including the regions close to the predicted transmembrane hydrophobic stretch and the cytoplasmic tail;
- the amino-terminal signal peptide.

The heptad-repeats, by contrast, varied little:
- VHSV: between the two available sequences, there were no aa variations at the a and d positions in the aa 68-102 heptad-repeat, and only one (aa 288, T or A) in the aa 288-319 heptad-repeat.
- Rabies: no variations were found at the a and d positions of the aa 140-164 or aa 330-360 heptad-repeats, except in RABMOK (aa 158, 161, 345, 348, and 359). All the aa that changed were also hydrophobic (Table 2).
- VSV-NJ: few aa variations (aa 134, T or S; aa 141, H or Q; and aa 332, V or A) were found in the aa 134-161 or aa 332-356 heptad-repeats of the 34 isolates studied (13).
- VSV-Ind: only one aa variation was found in each of the aa 134-161 (aa 141, V or A) and aa 328-369 (aa 335, I or V) heptad-repeats of the 26 isolates studied (15).

We then examined whether the heptad-repeats relate to the positions of the extra aa in BEFV and SYNV. To do so, a multiple sequence alignment with RABMOK, RABPV, VHSV07.71, and VSVGPN08 was performed with the CLUSTAL program (PCGene package, Intelligenetics).

Table 2.
Virus      First  Heptad-repeat sequence                            Last
VHSV07.71  288    TDIQ MRGATDD FSYLNHL ITNMAQR TECLDAH              319
VHSDK      288    ADVQ MRGATDD FSYLNHL ITNMAQR TECLDAH              319
IHNGP      327    TPYL LSKFRSP HPGINDV YAMIIKGS IYH                 354
SIGMA      316    ISKM YSGLPTS VFDLSYL IQV                          336
BEFV       385    IGSYKRA WCEYRPF VDK                               401
SYNVG      320    IEGVNRA FEDLELT YCSATCD LFA                       343
RABMOK     345    TNVYYKR VDKWADI LPS                               361
RABPV      330    FGKAYTI FNKTLME ADAHYKS VRTWNEI IPS               360
RHRBGD     330    FGKAYTI FNKTLME ADAHYKS VRTWNEI IPS               360
RABSAD     330    FGKAYTI FNKTLME ADAHYKS VRTWNEI IPS               360
RABHEP     330    FGKAYTI FNKTLME ADAHYKS VQTWNEI IPS               360
RABLEP     330    FGKAYTI FNKTLME ADAHYKS VRTWNEI IPS               360
RABCVS     330    FGKAYTI FNKTLME ADAHYKS VRTWNEI IPS               360
VSVGPN08   332    VGPV FTIINGS LHYFTSK YLRVELE                      356
VSVGPNJA   332    VGPV FTIINGS LHYFTSK YLRVELE                      356
VSVGPN29   332    VGPV FTIINGS LHYFTSK YLRVELE                      356
RHGPORS    328    TGPV FTIINGT LKYFETR YIRVDIA APILSRM VGMISGT TTE  369
RHGM       328    TGPA FTIINGT LKYFETR YIRVDIA APILSRM VGMISGT TTE  369
RHVSVGR    328    TGPA FTIINGT LKYFETR YIRVDIA APILSRM VGMISGT TTE  369

Hydrophobic aa as defined in (14). The search subsequence was (hydrophobic aa) XX (hydrophobic aa) XXX, repeated twice (X, any aa). The positions where more than two heptad-repeats occurred in succession in the glycoprotein G sequence were then selected. The highly hydrophobic, continuous aa sequences of the predicted transmembrane and signal peptide regions were not considered. Symbols indicate the relative positions in rhabdoviral glycoprotein G of cysteines (0), putative carbohydrates (~), and predicted transmembrane and signal peptide regions (I). The names of the rhabdoviruses are explained in Table 1.
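The per-position variability analysis described above can be sketched as counting the distinct aa in each column of an alignment. This is a minimal sketch that assumes gap-free sequences of equal length. It uses six rabies carboxy-terminal repeats (aa 330-360) transcribed from Table 2. Each variable position is reported with its heptad position, assuming the first residue sits at position a. The function name is hypothetical.

```python
REGISTER = "abcdefg"

# Rabies carboxy-terminal repeats, aa 330-360, transcribed from Table 2.
RABIES = {
    "RABPV": "FGKAYTIFNKTLMEADAHYKSVRTWNEIIPS",
    "RHRBGD": "FGKAYTIFNKTLMEADAHYKSVRTWNEIIPS",
    "RABSAD": "FGKAYTIFNKTLMEADAHYKSVRTWNEIIPS",
    "RABHEP": "FGKAYTIFNKTLMEADAHYKSVQTWNEIIPS",
    "RABLEP": "FGKAYTIFNKTLMEADAHYKSVRTWNEIIPS",
    "RABCVS": "FGKAYTIFNKTLMEADAHYKSVRTWNEIIPS",
}


def variable_positions(aligned, first_aa):
    """Return (aa number, heptad position, sorted residues) for each column with >1 residue."""
    seqs = list(aligned)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for i, column in enumerate(zip(*seqs)):
        residues = sorted(set(column))
        if len(residues) > 1:
            result.append((first_aa + i, REGISTER[i % 7], residues))
    return result


print(variable_positions(RABIES.values(), first_aa=330))
# [(352, 'b', ['Q', 'R'])]
```

The single variable column falls at a b position. This agrees with the observation that no a or d position varied among these strains. RABMOK, the exception noted in the text, is not included here.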
As Figure 2 shows, the extra aa of BEFV and SYNV were distributed in short stretches of 10 to 20 aa. These stretches lay around positions in the carboxy-terminal portion of the molecule (around aa 400) and in the signal peptide. They generally did not coincide with the positions of the heptad-repeats (Fig. 1), except for the BEFV carboxy-terminal heptad-repeat. VHSV07.71 and VSVGPN08 were included as controls and did not show any large insertions (~10 aa) around aa 400. In most cases, the heptad-repeats in rhabdovirus glycoprotein G were either followed or preceded by short regions (10-15 aa) of high hydrophobicity (Fig. 3), as has been reported in other enveloped viruses (2).

The function(s) of the newly defined heptad-repeats in rhabdovirus glycoprotein G are not known at present. The search procedure allows any of the 10 hydrophobic aa (14) at each specified a or d position. It might therefore detect chance runs of aa unrelated to coiled-coil structures, particularly if these include a genuine alpha helix. However, numerous helix-breaking proline residues are scattered through the newly found heptad-repeats (Table 2), and this is a major argument against these structures being alpha helices.

Several lines of indirect evidence suggest that the new amino-terminal heptad-repeats of glycoprotein G, or the regions around them, could be related to three activities: rhabdoviral fusion with host membranes, neutralization, and phospholipid binding. For instance, a fusion-defective VSV mutant at aa 117 first indicated that the adjacent region (aa 118-136) could be involved in the membrane fusion activity of VSV glycoprotein G. Site-directed mutagenesis subsequently identified aa 123-137 of VSV glycoprotein G as a putative fusogenic peptide involved in low-pH-induced membrane fusion (5, 17).
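The proline argument can be checked directly against the sequences. The sketch below lists the absolute positions of prolines in three carboxy-terminal repeats transcribed from Table 2. It also checks that each transcribed length agrees with the first and last residue numbers given in the table. The names are hypothetical, and the sketch only locates prolines; it does not predict secondary structure.

```python
# (first aa, sequence, last aa) for three Table 2 rows, transcribed without spaces.
ROWS = {
    "VSVGPN08": (332, "VGPVFTIINGSLHYFTSKYLRVELE", 356),
    "RABPV": (330, "FGKAYTIFNKTLMEADAHYKSVRTWNEIIPS", 360),
    "RHGPORS": (328, "TGPVFTIINGTLKYFETRYIRVDIAAPILSRMVGMISGTTTE", 369),
}


def proline_positions(first_aa, seq):
    """Absolute aa numbers of proline (P) residues, a classic helix breaker."""
    return [first_aa + i for i, aa in enumerate(seq) if aa == "P"]


for name, (first, seq, last) in ROWS.items():
    # Sanity check: the transcription must span exactly first..last.
    assert first + len(seq) - 1 == last, f"{name}: length does not match numbering"
    print(name, proline_positions(first, seq))
# VSVGPN08 [334]
# RABPV [359]
# RHGPORS [330, 354]
```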
These VSV fusion-defective mutants map immediately before the amino-terminal new heptad-repeat of VSV (aa 134-161 for most VSV isolates studied) (Table 2). [...] and 409-419, only positions 300-360 also map near another of the new heptad-repeat-rich regions, aa 332-356 in VSV-NJ or aa 328-369 in VSV-Ind.

Influenza hemagglutinin offers an analogy: there, pH-dependent coiled-coil formation by heptad-repeats exposes the upstream fusion peptide (4). Rhabdovirus fusion is also low-pH dependent and causes exposure of as yet unidentified hydrophobic aa region(s) (7, 8, 12). The new rhabdovirus heptad-repeats identified in this work could therefore also be related to rhabdoviral fusion. In addition, some rabies monoclonal antibody-resistant (MAR) mutants have been mapped to aa 330-338 and aa 342-343, both inside the new rabies heptad-repeats at aa 330-360 (10, 18-20). Finally, a phospholipid-binding domain of VHSV glycoprotein G has recently been identified in two ways. The first used pepscan, synthetic peptides, and purified or recombinant glycoprotein G (21); the second used purified VHSV in solid-phase phospholipid-binding assays (6). That work extended to fish rhabdoviruses the well-known observation that mammalian rhabdoviruses (VSV and rabies) interact with phospholipids.

Fig. 3. Hydrophobicity profiles of glycoprotein G of the rhabdoviruses in Table 1 were obtained with the program SOAP from PCGene (Intelligenetics). The profiles from the different isolates or serotypes of VHSV, rabies, VSV-NJ, and VSV-Indiana were superimposed, and the resulting figures are shown. Vertical arrows indicate the positions of the highest hydrophobic peaks, and horizontal bars show the positions of the heptad-repeats. The y axis shows the hydrophobicity scale, and the x axis shows the aa position in rhabdovirus glycoprotein G, counted from the amino terminus.
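The hydrophobicity profiles of Fig. 3 were produced with SOAP, whose scale is not given here. As a stand-in, the sketch below computes a sliding-window mean over the Kyte-Doolittle hydropathy scale. It uses a 9-aa window, matching the group length stated in the methods, and reports the window center with the highest mean. The choice of scale is an assumption, so peak heights will not match the published profiles, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (a substitute for the SOAP scale, which is not given).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return (center index, mean hydropathy) for each full window along seq (0-based)."""
    if window < 1 or window > len(seq):
        return []
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    half = window // 2
    return [
        (i + half, sum(scores[i:i + window]) / window)
        for i in range(len(seq) - window + 1)
    ]


def highest_peak(profile):
    """Window center with the largest mean hydropathy."""
    return max(profile, key=lambda point: point[1])


# Toy sequence: a hydrophobic stretch of nine I flanked by glycines.
print(highest_peak(hydropathy_profile("GGGG" + "I" * 9 + "GGGG")))
# (8, 4.5)
```

Applied to a full-length glycoprotein G sequence, the profile can be compared with the heptad-repeat positions to look for flanking hydrophobic peaks.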
Pepscan identified a 15-mer phospholipid-binding peptide (p106, aa 99-113). Study of this peptide showed that p106 lay at the carboxy-terminal end of an a-d hydrophobic new heptad-repeat with a predicted α-helical structure (1, 2, 4). This finding led to the design of a synthetic peptide (p2) containing p106 and the heptad-repeat, which increased the specific phospholipid-binding activity about 10-fold (6). The sequence of p2 in VHSV 07.71 (22) was fully conserved in glycoprotein G of the other VHSV sequence reported to date (23) (Table 2). p2 (aa 82-109) lay within a region of five a-d hydrophobic new heptad-repeats (aa 68-102). Whether this highly conserved domain and/or nearby regions are related to membrane fusion (2) induced by VHSV glycoprotein G is not known at present. However, VHSV phospholipid binding (6) and membrane fusion (11) show a similar pH dependence.