key: cord-011182-nbtfb39r authors: Dehghani, Behzad; Hashempour, Tayebeh; Hasanshahi, Zahra; Moayedi, Javad title: Bioinformatics Analysis of Domain 1 of HCV-Core Protein: Iran date: 2019-04-20 journal: Int J Pept Res Ther DOI: 10.1007/s10989-019-09838-y sha: doc_id: 11182 cord_uid: nbtfb39r Hepatitis C virus (HCV) infection is a serious global health problem and a cause of chronic hepatitis, liver cirrhosis, and hepatocellular carcinoma (HCC). Bioinformatics software has been an effective tool to study the HCV genome as well as core domains. Our research was based on employing several bioinformatics software applications to find important mutations in domain 1 of core protein in Iranian HCV infected samples from 2006 to 2017, and an investigation of general properties, B-cell and T-cell epitopes, modification sites, and structure of domain 1. Domain 1 sequences of 188 HCV samples isolated from 2006 to 2017, Iran, were retrieved from NCBI gene bank. Using several tools, all sequences were analyzed for determination of mutations, physicochemical analysis, B-cell epitopes prediction, T-cell and CTL epitopes prediction, post modification, secondary and tertiary structure prediction. Our analysis determined several mutations in some special positions (70, 90, 91, and 110) that are associated with HCC and hepatocarcinogenesis, efficacy of triple therapy and sustained virological response, and interaction between core and CCR6. Several B-cell, T-cell, and CTL epitopes were recognized. Secondary and tertiary structures were mapped fordomain1 and core proteins. Our study, as a first report, offered inclusive data about frequent mutation in HCV-core gene domain 1 in Iranian sequences that can provide helpful analysis on structure and function of domain 1 of the core gene. HCV infection is a serious global health problem and causes chronic hepatitis, liver cirrhosis, and HCC (Akuta et al. 2007; Ajorloo et al. 2015; Alborzi et al. 2017 Alborzi et al. , 2015 Moayedi et al. 2018; Hashempoor et al. 2018) . It is estimated that 160 million people are infected worldwide (Lavanchy 2011) . The prevalence rate of HCV infection is from 0.2 up to 40% in different countries, and this prevalence in Iran is 0.16% (Sefidi et al. 2013) . Lack of an effective vaccine and therapeutic choices has leaded to the rapid growth of HCV infection (Lauer and Walker 2001) . HCV has six large different genotypes (1-6) and 70 distinct subtypes (a, b, c, etc.) globally (Martro et al. 2011) . Genotyping analysis showed 30-33% difference in each genotype, and in subtypes around 20-25% (Sefidi et al. 2013) . According to the current investigations in Iran, the predominant HCV subtype is 1a, followed by 3a and 1b (Sefidi et al. 2013) . HCV is a positive-strand RNA virus encoding three structural components, the core protein, and two E1 and E2 envelope glycoproteins (Ajorloo et al. 2015) . The core protein has many confirmed roles: core binds RNA and DNA and has an important function in RNA packing. It has been determined that HCV-core is a nucleic acid chaperone similar to retroviral nucleocapsid (NC) proteins in act acting to rearrange HCV 3′UTR, resulting in RNA dimerization in vitro (Caval et al. 2011; Cristofari Gl; Ivanyi-Nagy et al. 2004; Steinmann et al. 2008) . HCV-core is a highly basic protein that forms the viral NC and has interactions with cellular proteins and signal transduction pathways. As a result of HCV-core and host cell interactions, core may have a function in persistent infection and the pathogenesis of HCV liver disease (Polyak et al. 2006; Hashempour et al. 2015) . Core protein consisted of three predicted domains: aa1-aa117 domain 1 (domain D1), aa117-aa177 domain 2 (domain D2), and 177-191 domain 3 (domain D3) (Strosberg et al. 2010) . Domain 1 contains frequent positively charged amino acids, and is involved in RNA binding, promotes dimerization of the viral RNA, and has a significant role in NC formation and core envelopment by endosomal membranes (Ivanyi-Nagy et al. 2006) . Several identified mutations in domain 1 are involved in the development of HCC and hepatocarcinogenesis, the efficacy of triple therapy, and interaction between core and CXCL6 (Akuta et al. 2007 (Akuta et al. , 2010 (Akuta et al. , 2011 Fishman et al. 2009; Ogata et al. 2002; Takahashi et al. 2001; Idrees and Ashfaq 2013) . Humoral and cellular immune responses against HCV infections are inefficient and there is no convincing explanation to understand HCV immune pathogenesis (Gremion and Cerny 2005) . However, HCV-specific IgM and IgG together were detected in acute infection. There are some outstanding proofs supporting a role for Abs in control of HCV infection and especially in reinfection (Cashman et al. 2014 ). Some researchers have described a rise in anti-HCV humeral immune after immunization with core protein (Aghasadeghi et al. 2006) . Other researchers have claimed that in acute HCV infection the titer of antibodies is very low, and delay in neutralizing antibody production is responsible for ineffective ability to prevent HCV infection (Netski et al. 2005) . Other sources have introduced a large diversity of epitopes in HCV proteins as a possible way to escape the humoral response (Pavio and Lai 2003) . Cellular immune responses have an important role in the clearance of HCV infection. Patients with cellular immune dysfunction like human immunodeficiency virus (HIV) have rapid HCV progression. Some researchers have claimed that this system promotes liver injury by cytolysis activity of infected cells. In spite of cellular immune responses, HCV often evades recognition and has the ability to persist (Ward et al. 2002) . Several studies have described a number of pathways for HCV to escape from cell responses, including impaired oligo-/mono-specific or no virus-specific CD4 + and CD8 + , mutation of epitopes, weakness of proliferative capacity, and cytotoxicity and ability to secrete TNF-α and IFN-γ by CD8 + T cells (Neumann-Haefelin et al. 2005) . In addition, regulatory T cells (Tregs) induced by HCV infection have a significant role in the impaired activity of cellular immune response Hashempoor et al. 2010) . Bioinformatics tools are efficient means to study viruses and different parts of the HCV genome like core domains (Idrees and Ashfaq 2013; Moattari et al. 2015; Dehghani et al. 2017; Nezafat et al. 2018; Atapour et al. 2018; Sarvari et al. 2014; Behzad Dehghani and Zahra Hasanshahi 2019) . Bioinformatics tools are efficient means to study viruses and different parts of the HCV genome like core domains. Many programs have been developed to analyze function, structures, and modification of core protein, providing a large amount of information about core domains and important mutation sites (Akuta et al. 2007 (Akuta et al. , 2010 (Akuta et al. , 2011 . Current data are useful for prediction of HCV disease development and treatment response. In this study, we employed several bioinformatics tools to find important mutations in domain 1 of the core protein, general properties of B-cell and T-cell epitopes, modification sites, and structure of domain 1 in Iranian HCV infected samples from 2006 to 2017. Domain 1 sequences of 188 Iranian HCV samples and reference sequences of HCV genotypes that were registered in NCBI gene bank (http://www.ncbi.nlm.nih.gov/) from 2006 to 2017 were downloaded. Homology among sequences was determined using multiple sequence alignment available in CLC-sequence viewer software under the following parameters: gap open cost, 10; gap extension cost, 1.0; and very accurate progressive alignment algorithm. Also, phylogenetic trees were analyzed through CLUSTAL X software, version 1.81, by neighbor-joining times to confirm the reliability of phylogenetic trees. The accession numbers of all sequences are displayed in Table 1 . By considering previous studies, several significant mutations that are involved in: 1-hepatocellular carcinoma, 2-viral response to triple therapy, and 3-the interaction between core and CXCL6, were determined. All sequences were compared to find mentioned mutations (Akuta et al. 2007; Fishman et al. 2009; Ogata et al. 2002) . General properties of domain 1 (genotype 1a) were determined by employing "Expasy'sProtParam" (http://expas y.org/tools /protp aram.html) and ProtScale at (http://web. expas y.org/prots cale/) (Gasteiger et al. 2005 ). Chou and Fasman, Karplus and Schulz, Kolaskar & Tongaonkar, Emini, Parker, and BepiPred methods at http:// www.immun eepit ope.org (http://tools .immun eepit ope.org/ tools /bcell /iedb_input ) were run for prediction of B-Cell epitopes positions (Chou and Fasman 2009; Karplus and 1 3 Schulz 1985; Emini et al. 1985; Parker et al. 1986; Larsen et al. 2006 ). On hydrophilicity, flexibility/mobility, accessibility, polarity, exposed surface and turns features by BcePred (http://www.imtec h.res.in/ragha va/bcepr ed) B-cell epitopes prediction were performed (Saha and Raghava 2004) . ABCpred software (http://www.imtec h.res.in/ragha va/ abcpr ed/) predicted 16 meric B-cell epitopes (Saha and Raghava 2006a, b) . ProPred-I (http://www.imtec h.res.in/ragha va/propr ed1/) (Singh and Raghava 2003) was employed for MHC Class-I binding peptide prediction and proposed (http://www. imtec h.res.in/ragha va/propr ed/) was used for MHC Class-II binding peptide prediction. Programs were worked at a 4% default threshold by the proteasome and immunoproteasome filters on at 5% threshold (Singh and Raghava 2001) . MHC class I and II predictions were determined using the Immune Epitope Database (IEDB) (http://tools .immun eepit ope.org/main/). For prediction of CTL epitopes, "ctlpred" and ANN methods were used (48). Probability of antigenicity was expected by VaxiJen software at http://www.ddg-pharm fac.net (Doytchinova and Flower 2007) . IgE epitopes and allergic properties were estimated at http://www.imtec h.res.in/ragha va/algpr ed/index .html by using AlgPred (Saha and Raghava 2006) . Serine, threonine, and tyrosine phosphorylation sites prediction was done using DISPHOS (http://www.dabi.templ e.edu/disph os/pred.html) (Iakoucheva et al. 2004 ) and Net-Phos (http://www.cbs.dtu.dk/servi ces/NetPh os/) (Blom et al. 1999) . Kinase specific phosphorylation sites were determined by NetPhosK (http://www.cbs.dtu.dk/servi ces/ NetPh osK/) (Blom et al. 2004) . NetNGlyc (http://www.cbs. dtu.dk/servi ces/NetNG lyc/) (Gupta and Brunak 2002) and GlycoEP (http://www.imtec h.res.in/ragha va/glyco ep/submi t.html) were employed for N-glycosylation sites prediction (Chauhan et al. 2013 ). To predict secondary and tertiary structures of core and domain 1 of genotype 1a, SOPMA at (http://npsa-pbil. ibcp.fr/cgi-bin/npsa_autom at.pl?page=npsa_sopma .html) (Geourjon and Deleage 1995) , I-TASSER at (http://zhang lab.ccmb.med.umich .edu/I-TASSE R) (Roy et al. 2010) , PHYRE 2 server at (http://www.sbg.bio.ic.ac.uk/~phyre The Signal peptide was predicted by "Signal-BLAST", and "SignalP 4.1 Server". All data were collected anonymously in accordance with legal requirements regarding data protection and medical confidentiality. Approval from the Faculty Human Research Ethics Committee (Shiraz University of Medical Sciences) was obtained before the commencement of the study. By considering all submitted sequences in NCBI GenBank we could not find any sequences related to 2017. Phylogenetic tree for all sequences was shown in Fig. 1 . All 2006 sequences were placed in a cluster at the bottom of the tree, and a sequence of 2012 has a high similarity to KF218585.1 (2014) . The majority of sequences were closer to 1a and 3a than other reference sequences. All important mutation positions were listed in Table 2 ; the majority of mutations happened in 2013 and 2016 samples. No mutation was detected in12, 23, 25, 39, 45 positions. Protparam results for domain 1 are listed in Table 3 . Because of the high percentage of basic amino acids, domain 1 is a highly basic peptide (Theoretical pI: 12). The instability index, an estimate of the stability of a protein in a test tube, showed that domain 1 is an unstable peptide. Aliphatic index, a positive factor for the increase of thermostability of proteins, indicated that this peptide is a thermostable peptide. GRAVY is a hydropath city index and increasing Phylogenetic tree based on domain1 sequences and by using neighbor joining method. The phylogenetic tree was constructed by the NJ method. The numbers at the forks show the numbers of occurrences of the repetitive groups to the right out of 100 bootstrap samples. All used reference sequences were showed after accession numbers (1a, 1b, and etc.). Sequences were categorized in five major clusters positive score indicates a greater hydrophobicity, so this peptide is a hydrophilic peptide. Hydropathicity analysis by Kyte J. and Doolittle R.F. method showed that the major part of the peptide had a negative score; the maximum hydropath city score was on aa34 (valin) and the minimum hydropath city score was on aa 14 (asparagine). Amino acids flexibility predicted by Bhaskaran R Ponnuswamy P.K method indicated that the maximum flexibility was around amino acid 58 (proline) and the minimum was around aa 95 (glycine). Transmembrane (TM) tendency calculated by Zhao, G., London E. method, showed that the major part of peptide had a negative score, and the maximum transmembrane tendency was on aa34 (valine) and the minimum was on aa10 (lysine). Peptide polarity predicted by Grantham R method showed that the maximum polarity was on amino acid 12 and 13 (lysine and arginine) and the minimum polarity was on aa 34 (valine). http://www.immun eepit ope.org online software: Chou and Fasman Beta-Turn Prediction, which is based on the rationale for predicting turns to predict antibody epitopes, showed one high score region, 99-112. Emini Surface Accessibility Prediction, which is based on surface accessibility scale, showed two high score positions (4-20 and 49-61). Karplus and Schulz flexibility scale was used for B-cell prediction; this method is based on mobility of protein segments on the basis of the known temperature B factors of the a-carbons of 31 proteins of known structure. Results demonstrated two positions with the highest score (50-60 and 5-12). Kolaskar & Tongaonkar Antigenicity method is based on physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes to predict antigenic determinants on protein. Results showed five positions of 20-39, 43-49, 63-69, 78-85, and 93-101. Parker Hydrophilicity Prediction method is based on peptide retention times during high-performance liquid chromatography (HPLC) on a reversed-phase column. By this method, two regions were found: 9-16, 51-57. Linear B-cell epitopes were determined using BepiPred. This method is based on a combination of a hidden Markov model and a propensity scale method. Three regions (1-25, 51-84, and 102-116) were founded by BepiPred analysis. For a combination of all physicochemical properties (hydrophilicity, flexibility/mobility, accessibility, polarity, exposed surface, and turns) for linear B-cell epitope prediction based on physicochemical properties on a non-redundant dataset: Using bcepred online software, three regions (4-22, 47-59, and 109-115) with the highest combined score were found. Five 16 meric conserved regions (78, 49, 1, 64 and 102) were found by ABCpred prediction Server (Table 4 ). According to a predefined cutoff of VaxiJen program, domain 1 was confirmed as a probable antigen (model: virus and threshold: 0.4). The prediction of allergenic proteins by mapping of IgE epitope, SVM, and hybrid methods showed that domain 1 was not an allergen protein. Regarding T-cell responses against HCV, previous researches found some hosts' human leukocyte antigen (HLA) alleles associations with HCV infection in Iranian patients. We found several epitopes of HLA's shown in Table 5 . Some studies found both CD4 helper and CD8 CTL responses against HCV infection. ctlpred found several epitopes for CTL (Table 6 ). Prediction of serine, threonine, and tyrosine phosphorylation sites by DISPHOS showed one position (116) in domain1. By "NetPhos" software we found 10 phosphorylation sites (Fig. 2) , 6 sites for serine (53, 56, 99, 103, 106, and 116) 3 sites for threonine (15, 49, and 52) and one site for tyrosine (86). NetPhosK results determined four phosphorylation sites, three threonine amino acids (3, 15, and 49) for protein kinase C and one serine (116) for protein kinase A. No glycosylation site was found by NetNGlyc and GlycoEP. Secondary structure prediction for core and domain1 by using SOMPA software was summarized in Table 7 and Fig. 3 . SOMPA showed there was no alpha helix structure in domain1 and the major part of it was the random coil. But the combination of (PS) 2 -v2 and PHYRE 2 showed there was an alpha helix structure in 8-15 region (Figs. 4, 5) . All programs displayed extended strand in the 29-36 region. 3D structures were determined by all three online software but only structures predicted by I-TASSER were Fig. 2 Phosphorylation sites prediction for domain 1 using "NetPhos" online software. Green lines indicate 6 sites for serine, blues lines show 3 sites for threonine, and one purple line shows tyrosine. All sites with scores above the threshold of 0.5 were considered as phosphorylation sites reliable. Final structures (Figs. 6, 7) were validated by Qmean. QMEANscore and Z-score for calculated for core were 0.242 and − 5.43. The scores were not satisfactory but at least provided an overview of the core protein structure. QMEANscore and Z-score for domain 1 were 0.61 and − 1.24 confirming the quality and reliability of the predicted structure. Ramachandran plot was assessed by RAMPAGE, and percentages of the favoured region, allowed region, and outlier region for core were 63.0%, 27.5%, and 9.5% respectively . 3 Secondary structure prediction using SOMPA. Red region is extended strand, blue is the alpha helix, green is beta turn, and purple is the random coil. The majority of core structure belongs to random coil Fig. 4 Secondary structure prediction using PHYRE2. The result of this tool shows that the majority of the core structure (40%) is alpha helix which is indicated with green helix, also the confidence keys of the predicted structure for these regions are high (Fig. 8) . RAMPAGE results for domain1 showed 61.7% of residues in favored region and 24.3 in allowed region (Fig. 9 ). Figure 10 showed the showed T cell and B-cell epitopes on the surface of the core protein. Both online tools "Signal-BLAST" and "SignalP 4.1 Server" were not able to predict any signal peptide for core protein. The results of "PeptideCutter" prediction were summarized in Table 8 . The prediction was done for all the predicted epitopes. According to the results, the antigenic epitopes that had the lower number of cleavage positions for enzymes were more potential for B cell or T cell epitopes. Although emerging bacterial viral diseases have caused great catastrophes in human history which can affect from a small and localized group to millions of people across continents, several vaccine and therapies have been introduced to control them (Dehghani et al. 2014 (Dehghani et al. , 2013 . Fishman et al. (2009) using multivariable logistic regression models ,found 10 HCV-core gene polymorphisms extensively associated with increased HCC risk (36G/C, 209A, 271C/U, 309A/C, 384U, 408U, 435A/C, 465U, 481A, 546A/C) and one significantly linked with decreased HCC risk (78U). Mentioned mutations related to change in domain1 amino acid sequence: N11S/T, K12Silent/N, A25V, G69S, Q70R, M91L, and L102P. All current amino acid changes decreased HCC risk except A25V (78U) (Fishman et al. 2009 ). In selected sequences, we found that amino acid 11 was T in all sequences except one sequence in 2013 (P) and in 1a (N). Positions 12(K) did not show any change; amino acid 25 was P and amino acid 69 was R in all sequences. (L)). In amino acid 102 we did not find any change but in one of the reference sequences, we found one change (5a (G to S)). Akuta et al. (2007) employed PCR for detecting substitutions of aa 70 and aa 91 in HCV-core gene of genotype 1b by using the mutation-specific primer as an important predictor of hepatocarcinogenesis. For wild samples, aa 70 was arginine (R) and aa 91 was leucine (L) but for mutant aa 70 was glutamine(Q)/histidine (H) and aa 91 was methionine (M) (Akuta et al. 2007) . Also , Furui et al. (2011) identified aa 70 and aa 91 substitutions among Japanese volunteer blood donors (Furui et al. 2011) . In terms of aa 70 substitutions, we recognized 2 glutamine substitutions in 2006 sequences and 3 glutamine, 2 histidines, one proline in 2013 sequences; also 4 glutamine substitutions in 2014 sequences, and 25 glutamine, and 1 Fig. 8 Ramachandran plot was used to visualize energetically allowed regions for backbone dihedral angles ψ against ϕ of amino acid residues in modeled protein structure (LCC model) for tertiary structure of core protein by RAMPAGE; the majority of amino acids residues were in favored region (119 amino acids) and allowed region (52 amino acids) 1 3 histidine in 2016. We found several methionine residues in aa 91 in , and 2016 . Ogata et al. (2002 compared sequences of the core protein of Subtype 1b HCV strains obtained from patients with and without HCC and found some amino acid mutation sites (Ogata et al. 2002) . K23Q, Q70R, and T110M substitutions were found by Ogata et al. (2002) . In comparison with our results, in all sequences, aa 23 was K, in the majority of sequences aa 70 was R and in 34 sequences it was Q. In aa 110 we did not find any methionine (Ogata et al. 2002) . Akuta et al. (2010) confirmed the role of Gln70 (or His70) in the efficacy of triple therapy and sustained a virological response, the patient with both genotype non-TT and Gln70 (His70) had the worst sustained virological response. Also, Akuta et al. (2011) by following up twenty-six patients determined the role of Gln70 (His70) substitution in the development of HCC They suggested detection of aa substitutions in the core region before antiviral therapy (Akuta et al. 2010 (Akuta et al. , 2011 . Alestig et al. (2011) showed substitution in aa 70 of the core was related to treatment response, but that was less important than IL28B polymorphism. In our study, 37 sequences had Gln70 (or His70) substitution (Alestig et al. 2011) . Tokita et al. (2000) approved the role of HCV-core region (Thr49Pro) to reduce the fluorescence enzyme immunoassay Fig. 9 Ramachandran plot was used to visualize energetically allowed regions for backbone dihedral angles ψ against ϕ of amino acid residues in modeled protein structure (LCC model) for tertiary structure of domain1 by RAMPAGE; majority of amino acids residues were in favored region (71 amino acids) and allowed region (28 amino acids) (FEIA) sensitivity. We found 2 sequences in 2013 and one in 2016 with T to P substitutions (Tokita et al. 2000) . Findings of Horie et al. (1999) indicated that alteration from glycine to serine at core codon 45 was dominant in noncancerous liver portions rather than in cancerous liver portions and sera from HCC patients (Horie et al. 1999) . Our results did not show any glycine to serine mutation in aa 45 and in all sequences, it was glycine. Idrees and Ashfaq (2013) by using molecular docking software, reported interactions of amino acid residues arginine 149, arginine 39, arginine 74, and arginine 78 in HCVcore protein and Leu44, Ala71, Ser76 and Pro97 in CXCL6. This finding clues to understanding HCV pathogenesis. Any change in these positions can relate to HCV Infection and HCC. Our results showed no alteration in aa 39 and 74, and in aa 78 just one sequence was glutamine to arginine (Idrees and Ashfaq 2013) . Using a combination of predicted B-cell antibody epitopes by all methods on the immune epitope website and also considering bcepred and ABCpred prediction ,we could define three major epitopes (4-20, 50-60, and 100-112) for domain 1. Ferroni et al. (1993) by using the algorithm of Jameson and Wolf identified four epitopes in HCV-core protein (7-21, 31-45, 49-63, and 99-113) . Harase et al. (1995) analyzed the response to HCV-core protein in mice and found a major B cell epitope (21-40). Pirisi et al. (1995) analyzed Sera from 97 HCV infected patients and found three 15-mer peptides as antigens in an enzyme immunoassay. They concluded that anti-R15P (50-64, RKTSERSQPRGRRQP) as a potent antigen might help to identify a subgroup at higher risk to develop HCC (Pirisi et al. 1995) . Also, a study on HCV positive blood donor by Lechmann et al. (1996) determined a region (aa 1-24) of domain1 that bound the antibodies from the sera of all patients and showed a great potential for detection of HCV infection by using serological B-Cell responses tests. Comparison of our results with previous studies revealed that all predicted epitopes in our research have a good potential for future studies of the immune response against HCV infection, and are useful for recognition of all kinds of HCV infections. Gededzha et al. (2014) used several bioinformatics tools to predict HLA class I and HLA class II in HCV genotype 5a. They found three T-cell epitopes of NS3, NS4B, and NS5B. Some HLA class II alleles were found in Iranian patients by Samimi-Rad et al. (2015) ; DRB1*0301, DQA1*0501, DQB1*0201, DRB1*1101, and DQB1*0301 were demonstrated in patients with HCV clearance, and DRB1*0701, DQA1*0201, DQB1*0602, DRB1*0301, DRB1*11, and DQB1*0201 occurred more frequently in chronic patients (Samimi-Rad et al. 2015) . Khorrami et al. (2015) found a relationship between HLA-G, IL-10, and response to combined therapy in HCV positive patients. They concluded that HLA-G, and IL-10 have a significant role in response to therapy with IFN-α2α and ribavirin. Also, HLA-A01 and HLA-B38 were determined as important alleles associated with Peg-IFN plus ribavirin therapy in Egyptian patients by Farag et al. (2013) . Pourhassan et al. (2014) found several HLA alleles associated with HCV in Iranian patients (2011) (2012) (2013) . A2, A3, B35, B38, BW4, CW4and CW7 were the most frequent alleles found by this group. Fig. 10 A: the position of the B-Cell epitopes (yellow region) on core tertiary structure and B: the T-cell epitopes (yellow region) on core 3D structure Table 8 The results of predicted cleavage positions for 12 common proteases: B-cell, T-cell, and CTL predicted epitopes Positions Caspase Chymotrypsin Clostripain Elastase Pepsin Proteinase K Asp-N Staphylococcal peptidase I Thrombin Proline Trypsin B-cell epitopes 78-93 3(8,9,16) 5(4, 7, 9,15, 16) 6(4, 6, 8, 9, 12, 16) 1 (11) 1(12) 49-64 5 (2, 7, 11, 13, 14) 3(1, 4, 6) 1 (5) 1 (6) 6 (2, 3, 7, 11, 13, 14) 1_16 7 9 16) 12 (2, 3, 6, 7, 8, 9, 16, 18, 20, 21, 24, 26,) HLA-Cw*0401 78-93 3(8, 9 ,16) 5(4, 7, 9, 15, 16) 6(4, 6, 8, 9, 12, 16) 1(11) 1(12) HLA-Cw*0702 1(4) 4(6 7 9 10) In accordance with the above-mentioned studies, we collected HLA alleles associated with HCV infection, and by employing in-silico analysis we established numerous T-cell epitopes for domain1 that can be helpful for future studies to design effective vaccine against HCV genotype 1a, and can provide benefit data for better understanding the role of domain1 in immune response. Many researchers have proved broader CTL responses to HCV infection and the usefulness of CTL epitopes mapping to develop therapeutic interventions or vaccines (Sabet et al. 2014; Saeedi et al. 2014; Jazayeri and Carman 2005; Arashkia et al. 2011) . In our research, we utilized reliable software to predict CTL epitopes and extracted data that can be useful for vaccine development studies. By considering results of phosphorylation sites prediction by 3 programs, we concluded that 3 sites (15, 49, and 116) were the main phosphorylation sites in domain 1. Amino acid 116 is a serine that located in Arg-Arg-Arg-Ser-Arg region; this region was similar to the usual target sequence for protein kinase A [Arg-Arg-X-Ser/Thr-X]. Two threonine amino acid residues (15, 49) were calculated as protein kinase c target sites where this kinase acts through the phosphorylation of hydroxyl groups of amino acid residue. Previous studies indicated that core protein is phosphorylated by PKA and PKC. Core phosphorylation regulates the suppressive activity of HCV-core protein on HBV gene replication and expression. Also, it has been shown that phosphorylation in core relates to the nuclear localization of the core protein. They demonstrated three serine residues (Ser-53, Ser-116, and Ser-99) as the potential phosphorylated sites in core protein, that were similar to NetPhos software results in our research (Yassin 2001; Shih et al. 1995; Lu and Ou 2002) . Secondary structure prediction indicated that the majority of domain1 was the random coil, and all B-cell epitopes and important mutations placed on random coil structure. Tertiary structures were designed by three significant and reliable online programs, but just one of them provided a reliable and high-quality protein structure model for domain1. The quality and reliability of models were confirmed by QMEAN and RAMPAGE software. By examining all the predictions core epitopes with and without signal peptide, we found out that there is no difference between these two different strategies of analysis (Pene et al. 2009; Targett-Adams et al. 2008; Ma et al. 2007; Okamoto et al. 2008; Oehler et al. 2012) . Based on previous studies core protein has a C terminal signal peptide and because the focus of our study was on domain 1, it was expected that the deletion of this region could not affect the epitope perdition results. Digestion analysis to predict possible proteases was shown that each epitope can be digested by at least 5 The position of each epitope, the number of cleave sites in each epitope as well as the position of the cleave sites in each epitope are mentioned in the table selected proteases which can have a significant effect on the reduction of the half-life of epitopes. Finally our investigations in this research provided comprehensive data about frequent mutations in domain1, and as a first report can be useful for future study about significant mutations and their role in therapeutic pathway and response to antiviral therapy for Iranian patients. Also, identification of domain1 properties provides practical information for domain1 cloning and more researches. Evaluation of a native preparation of HCV core protein (2-122) for potential applications in immunization, diagnosis and mAb production Detection of specific antibodies to HCV-ARF/CORE + 1 protein in cirrhotic and non-cirrhotic patients with hepatitis C: a possible association with progressive fibrosis Amino acid substitutions in the hepatitis C virus core region are the important predictor of hepatocarcinogenesis Amino acid substitution in hepatitis C virus core region and genetic variation near the interleukin 28B gene predict viral response to telaprevir with peginterferon and ribavirin Amino acid substitutions in hepatitis C virus core region predict hepatocarcinogenesis following eradication of HCV RNA by antiviral therapy Insights into the role of HCV Plus-/Minus strand RNA, IFN-γ and IL-29 in relapse outcome in patients infected with HCV Role of serum level and genetic variation of IL-28B in interferon responsiveness and advanced liver disease in chronic hepatitis C patients Core mutations, IL28B polymorphisms and response to peginterferon/ribavirin treatment in Swedish patients with hepatitis C virus genotype 1 infection Immunoinformatics modeling, construction of DNA plasmids Carrying CTL epitopes of hepatitis C virus and their preliminary immunological analysis Designing a fusion protein vaccine against HCV: an in silico approach QMEAN: A comprehensive scoring function for model quality assessment Prediction of CTL epitopes using QM, SVM and ANN techniques Sequence and structurebased prediction of eukaryotic protein phosphorylation sites1 Prediction of post-translational glycosylation and phosphorylation of proteins from the amino acid sequence The humoral immune response to HCV: understanding is key to vaccine development Packaging of HCV-RNA into lentiviral vector In silico platform for prediction of N-, O-and C-glycosites in eukaryotic protein sequences Protein structure prediction server Amino acid sequence The hepatitis C virus Core protein is a potent nucleic acid chaperone that directs dimerization of the viral (+) strand RNA in vitro Immunogenicity of Salmonella enterica serovar Enteritidis virulence protein, InvH, and cross-reactivity of its antisera with Salmonella strains Immunoprotectivity of Salmonella enterica serovar Enteritidis virulence protein, InvH, against Salmonella typhi Functional and structural characterization of Ebola virus glycoprotein (1976-2015)-An in silico study VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide Human leukocyte antigen class I alleles can predict response to pegylated interferon/ribavirin therapy in chronic hepatitis C Egyptian patients Identification of four epitopes in hepatitis C virus core protein Mutations in the hepatitis C virus core gene are associated with advanced liver disease and hepatocellular carcinoma Prevalence of amino acid mutation in hepatitis C virus core region among Japanese volunteer blood donors Protein identification and analysis tools on the ExPASy server Prediction of T-cell epitopes of hepatitis C virus genotype 5a SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments Hepatitis C virus and the immune system: a concise review Prediction of glycosylation across the human proteome and the correlation to protein function Immune response to hepatitis C virus core protein in mice Expansion of CD4+ CD25+ FoxP3+ regulatory T cells in chronic hepatitis C virus infection A decline in anti-core + 1 antibody titer occurs in successful treatment of patients infected with hepatitis C virus F protein increases CD4+ CD25+ T cell population in patients with chronic hepatitis C Mutations of the core gene sequence of hepatitis C virus isolated from liver tissues with hepatocellular carcinoma The importance of intrinsic disorder for protein phosphorylation HCV infection and NS-3 serine protease inhibitors Analysis of hepatitis C virus RNA dimerization and core-RNA interactions Virus escape CTL or B cell epitopes? Prediction of chain flexibility in proteins Protein structure prediction on the Web: a case study using the Phyre server The relationship between HLA-G and viral loads in non-responder HCV-infected patients after combined therapy with IFN-α2α and ribavirin Improved method for predicting linear B-cell epitopes Hepatitis C virus infection Evolving epidemiology of hepatitis C virus T-and B-cell responses to different hepatitis C virus antigens in patients with chronic hepatitis C infection and in healthy anti-hepatitis C virus-positive blood donors without viremi Phosphorylation of hepatitis C virus core protein by protein kinase A and protein kinase C Characterization of the cleavage of signal peptide at the C-terminus of hepatitis C virus core protein by signal peptide peptidase Hepatitis C virus sequences from different patients confirm the existence and transmissibility of subtype 2q, a rare subtype circulating in the metropolitan area of silico functional and structural characterization of H1N1 influenza A viruses hemagglutinin Comparison of IL-28B favorable genotype frequency between healthy and patients infected with HCV Humoral immune response in acute hepatitis C virus infection T cell response in hepatitis C virus infection Exploring dengue proteome to design an effective epitope-based vaccine against dengue virus AU-Sabetian, Soudabeh Structural analysis of hepatitis C virus core-E1 signal peptide and requirements for cleavage of the genotype 3a signal sequence by signal peptide peptidase Comparative sequence analysis of the core protein and its frameshift product, the F protein, of hepatitis C virus subtype 1b strains obtained from patients with and without hepatocellular carcinoma Intramembrane processing by signal peptide peptidase regulates the membrane localization of hepatitis C virus core protein and viral propagation New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites The hepatitis C virus persistence: how to evade the immune system? Sequential processing of hepatitis C virus core protein by host cell signal peptidase and signal peptide peptidase: a reassessment Reactivity to B cell epitopes within hepatitis C virus core protein and hepatocellular carcinoma Assemble and interact: pleiotropic functions of the HCV core protein. Hepatitis C viruses: genomes and molecular biology The first report in Azeri patients I-TASSER: a unified platform for automated protein structure and function prediction Immunogenicity of multi-epitope DNA and peptide vaccine candidates based on core, E2, NS3 and NS5B HCV epitopes in BALB/c mice Enhanced immune responses of a hepatitis C virus core DNA vaccine by co-inoculating interleukin-12 expressing vector in mice BcePr(ed): prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties Prediction of continuous B-cell epitopes in an antigen using recurrent neural network AlgPred: prediction of allergenic proteins and mapping of IgE epitopes Association of HLA class II alleles with hepatitis C virus clearance and persistence in thalassemia patients from Iran Comparative proteomics of sera from HCC patients with different origins Distribution of hepatitis C virus genotypes in Iranian chronic infected patients Modulation of the trans-suppression activity of hepatitis C virus core protein by phosphorylation ProPred: prediction of HLA-DR binding sites ProPred1: prediction of promiscuous MHC Class-I binding sites Efficient trans-encapsidation of hepatitis C virus RNAs into infectious virus-like particles Core as a novel viral target for hepatitis C drugs Hepatitis C virus (HCV) genotype 1b sequences from fifteen patients with hepatocellular carcinoma: the 'progression score'revisited Maturation of hepatitis C virus core protein by signal peptide peptidase is required for virus production Hepatitis C virus core mutations reduce the sensitivity of a fluorescence enzyme immunoassay Cellular immune responses against hepatitis C virus: the evidence base Unraveling the mystery of liver diseases in Egypt The authors would like to acknowledge Shiraz Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.