key: cord-0037335-y45y5svy authors: Yi, Yue; Wang, Chengdong; Xu, Zhiwen; Li, Desheng; Zhu, Ling; Li, Fengqin; Qi, Xinqiao; Guo, Wanzhu title: Analysis of Codon Usage Bias in Interferon Alpha Gene of the Giant Panda (Ailuropoda Melanoleuca) date: 2012 journal: Information Technology and Agricultural Engineering DOI: 10.1007/978-3-642-27537-1_37 sha: 6538a7befb7665b542e6fd66363665912fad02f9 doc_id: 37335 cord_uid: y45y5svy

Analysis of codon usage bias in the IFN-α gene of the giant panda (Ailuropoda melanoleuca) may provide a basis for understanding the evolutionary relationships of the giant panda and for selecting appropriate host expression systems to improve the expression of target genes. In this paper, codon usage bias in the mature IFN-α sequence of the giant panda and of 15 reference species was analyzed. The results showed that synonymous codons with G or C at the third codon position were widely used, and the ENC-GC3S plot revealed that the genetic heterogeneity in the IFN-α gene was mainly constrained by mutational bias. Comparative analysis revealed 40 codons with distinct usage differences between GpIFN-α and Escherichia coli, 38 between GpIFN-α and yeast, and only 30 between GpIFN-α and Homo sapiens. Therefore, a Homo sapiens expression system may be more suitable for expressing the GpIFN-α gene.

The genetic code uses 61 nucleotide triplets (codons) to encode 20 amino acids and three to terminate translation [1]. The code is therefore necessarily redundant, because there are more sense codons than amino acids. Each amino acid is encoded by between one (Met and Trp) and six (Arg, Leu and Ser) codons [2]. Different codons that encode the same amino acid are known as synonymous codons. Changes in a protein-coding DNA sequence between two synonymous codons are often assumed to have no effect and are thus called synonymous, or even silent, changes [2]. In a wide variety of organisms, such as bacteria, fungi, and insects (Sharp et al. 1992), codons within synonymous groups are used with different frequencies [3] [4] [5] [6], a phenomenon called codon usage bias. The reasons for these differences in codon preference remain a matter of continuing speculation [7]. The most commonly accepted hypothesis holds that unequal usage of synonymous codons results from mutational bias and from natural selection acting at the level of translation. However, several other factors have also been reported, such as GC composition, gene length, gene expression level, translation initiation signals, amino acid composition, protein structure, mutation frequency and pattern, and tRNA abundance [8] [9] [10] [11] [12]. Other studies found that codon usage is related to gene function [13, 14] and to the cellular location of gene products [15] [16] [17]. Whatever the reasons for codon bias, it has become increasingly clear that it can have a profound impact on the expression of heterologous proteins.

In this study, we characterized codon usage bias in the interferon alpha gene of the giant panda, excluding the signal peptide. The giant panda is loved all over the world and is considered a symbol of China. However, it is also one of the world's most endangered species, and giant pandas are considered particularly vulnerable to infectious diseases and parasites [18]. Analysis of giant panda cytokine genes is therefore important to facilitate the use of these cytokines in the immunotherapy of the animal's infectious diseases [19]. The interferons are a group of natural antiviral substances first discovered in 1957 [20]. Many studies have been performed on the function of IFN-α, which influences many biological processes, including broad-spectrum antiviral effects, inhibition of tumor cell proliferation, and enhancement of immune functions.
IFN-α is also considered a potential agent for the therapy of neoplastic diseases and cancers [21]. Research on the giant panda genome is still limited, and only a few immunity-related genes have been cloned. Recently, the giant panda interferon alpha gene, called GpIFN-α, was cloned in our laboratory. Here, we analyze for the first time the codon usage of GpIFN-α and of IFN-α from 15 other species. We also compare codon usage in the GpIFN-α gene with that of Escherichia coli, yeast, and Homo sapiens. The codon usage data of GpIFN-α and these comparisons may give clues to the features of the GpIFN-α gene and to the evolution of IFN-α genes, and may also provide insights into the expression and possible function of GpIFN-α.

The mature sequence of GpIFN-α cloned in our laboratory has been submitted to GenBank under accession number HQ378189. The IFN-α reference sequences were downloaded from NCBI (http://www.ncbi.nlm.nih.gov/). The signal peptide was removed from every reference sequence to keep the codon usage statistics comparable with those of the mature GpIFN-α sequence. The GenBank accession numbers, natural hosts and IFN-α gene lengths of all 16 sequences are shown in Table 1. Codon bias was calculated according to methods described elsewhere. Relative synonymous codon usage (RSCU) is the observed frequency of a codon divided by the frequency expected if all synonyms for that amino acid were used equally. It is independent of amino acid composition and is thus useful for comparing different sets of genes [22]. The effective number of codons (ENC) is a measure analogous to the effective number of alleles; simulation studies suggest that it is relatively insensitive to gene length [23]. The ENC value of each IFN-α gene was computed with the EMBOSS CHIPS online service.
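The RSCU definition above can be written as RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij), where X_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for that amino acid. The Python sketch below is a minimal illustration under stated assumptions, not the EMBOSS implementation: it builds the synonymous families of the standard genetic code (NCBI translation table 1), reads the sequence in frame from its first base, and skips amino acids absent from the gene. The function names and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codon families for the 20 amino acids ('*' marks stop codons, which are excluded).
FAMILIES = defaultdict(list)
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def codon_counts(seq):
    """Count in-frame codons read from the first base; a trailing partial codon is ignored."""
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def rscu(counts):
    """RSCU_ij = X_ij / ((1 / n_i) * sum_j X_ij) for every codon of every amino acid present."""
    values = {}
    for aa, codons in FAMILIES.items():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the gene: RSCU is undefined
        expected = total / len(codons)  # count expected if all n_i synonyms were used equally
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


# Hypothetical toy sequence: CTG x3 and CTC (Leu), AGG (Arg), ATC (Ile).
r = rscu(codon_counts("CTGCTGCTGCTCAGGATC"))
print(round(r["CTG"], 2), round(r["AGG"], 2), round(r["ATC"], 2))  # 4.5 6.0 3.0
```

Within each family the RSCU values sum to n_i, so the six Leu values in the example sum to 6, and a single-codon amino acid such as Met always has RSCU = 1.0.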
The degree of synonymous codon usage bias was also measured by the codon adaptation index (CAI), which estimates the extent of bias toward codons known to be favored in highly expressed genes; it was calculated with the EMBOSS CAI program [24]. Codon usage frequencies, the G+C content of the gene sequences, and the G+C content at the synonymous third codon position (GC3S) were calculated with the EMBOSS CUSP program [25]. The mature nucleotide sequences of GpIFN-α and the 15 reference IFN-α genes were translated into amino acid sequences with DNASTAR software. Multiple sequence alignment and phylogenetic analysis with the bootstrap option were then performed with ClustalW and TREEVIEW. Finally, codon usage in GpIFN-α was compared with that of Escherichia coli, yeast, and Homo sapiens, using the codon usage database at http://www.kazusa.or.jp/codon/.

While RSCU and the related measures indicate the overall codon bias of GpIFN-α, it is also important to look more closely at the patterns of codon bias. Table 2 shows the codon usage patterns and data of the giant panda IFN-α gene (excluding Met, Trp, and the termination codons). Codon usage for Ala, Leu and Ser was highly diverse (underlined in Table 2). Table 2 also shows that synonymous codons for the same amino acid are used at different frequencies. Twenty-six codons, such as CTG (Leu), AGG (Arg), ATC (Ile), CCC (Pro), TGG (Trp) and TAC (Tyr), were used more frequently. Among the preferred codons, nineteen end in C and nine end in G. Codon usage in the mature polypeptide is thus strongly biased toward synonymous codons with G or C at the wobble position.
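The CAI and composition indices named in this paragraph can be sketched in Python as follows. CAI follows the definition of Sharp and Li (1987): each codon's relative adaptiveness w is its count divided by the count of the most frequent synonym in a reference set of highly expressed genes, and CAI is the geometric mean of w over the gene's codons (Met, Trp and stops excluded). GC3s is computed here over the third positions of sense codons other than ATG and TGG, a common convention; the EMBOSS CAI and CUSP programs may differ in such details. The 0.5 pseudo-count for codons missing from the reference, the function names, and the reference counts are assumptions for illustration.

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_of(seq):
    """Split a coding sequence into in-frame codons (trailing partial codon ignored)."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def gc_content(seq):
    """Fraction of G and C over the whole sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3s(seq):
    """G+C at synonymous third positions; Met, Trp and stop codons are excluded."""
    thirds = [c[2] for c in codons_of(seq) if CODE.get(c, "*") not in ("M", "W", "*")]
    return sum(base in "GC" for base in thirds) / len(thirds)


def relative_adaptiveness(reference_counts):
    """w = X_ij / max_j X_ij within each synonymous family of a highly expressed reference set."""
    families = {}
    for codon, aa in CODE.items():
        if aa not in ("M", "W", "*"):  # single-codon amino acids and stops are not scored
            families.setdefault(aa, []).append(codon)
    w = {}
    for codons in families.values():
        top = max(reference_counts.get(c, 0) for c in codons)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in codons:
            # Assumption: a codon missing from the reference gets 0.5 counts so log(w) stays finite.
            w[c] = max(reference_counts.get(c, 0), 0.5) / top
    return w


def cai(seq, w):
    """CAI = geometric mean of w over the gene's scored codons (Sharp and Li 1987)."""
    logs = [math.log(w[c]) for c in codons_of(seq) if c in w]
    if not logs:
        raise ValueError("no scorable codons in the sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference counts (illustration only, not a real highly expressed gene set).
reference = {"GCC": 40, "GCT": 20, "GCA": 10, "GCG": 5, "AAG": 30, "AAA": 10}
w = relative_adaptiveness(reference)
seq = "ATGGCCGCAAAGTGG"  # ATG and TGG are skipped by both CAI and GC3s
print(gc_content(seq), round(gc3s(seq), 3), round(cai(seq, w), 3))  # 0.6 0.667 0.63
```

A gene built only from the reference set's most frequent synonyms scores CAI = 1.0; every less favored codon pulls the geometric mean toward 0.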
The CAI, ENC, GC and GC3S values of the giant panda IFN-α gene and the other IFN-α genes, obtained with EMBOSS, are shown in Table 3. A plot of ENC against GC3S has been reported to be an effective way to explore the heterogeneity of codon usage among genes [23]. Figure 1 shows the distribution of ENC and GC3S values for the IFN-α genes of the reference species. The solid line represents the expected curve if codon usage were determined only by the GC content at the third codon position [26]. Only a few genes lie exactly on the expected curve, but most points lie close to it (Fig. 1).

A phylogenetic tree based on the IFN-α amino acid sequences of the 16 reference species is shown in Fig. 2. The IFN-α family splits into two main branches, and GpIFN-α clusters in the mammalian branch. The giant panda and Mustela putorius furo form a monophyletic clade that is closely related to Canis familiaris and Vulpes vulpes. Multiple alignment of the amino acid sequences supports the conclusion that the giant panda is evolutionarily closer to these animals. The tree also shows that giant panda IFN-α is distantly related to the IFN-α genes of avian hosts.

Fig. 2. Phylogenetic tree based on the IFN-α amino acid sequences of the reference species (Table 1), constructed with CLUSTAL-W and TREEVIEW software.

Codon usage bias is, to a certain extent, characteristic of each species. We compared the GpIFN-α gene with E. coli, yeast and H. sapiens to identify a suitable host for optimal expression of GpIFN-α. The comparison showed 40 codons with distinct usage differences between GpIFN-α and E. coli, 38 between GpIFN-α and yeast, but only 30 between GpIFN-α and H. sapiens, suggesting that the codon usage of GpIFN-α more closely resembles that of H. sapiens genes.
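The per-codon comparison described above can be sketched as a simple fold-difference filter. This is an assumption-laden illustration, not the authors' script: it takes codon usage frequencies (for example per thousand codons, as tabulated in the Kazusa codon usage database) for the target gene and for one host, and flags codons whose target/host ratio is above 2 or below 0.5. The function name and all frequency values are invented for illustration and are not those of Table 4.

```python
import math


def differing_codons(target, host, high=2.0, low=0.5):
    """Return {codon: ratio} for codons whose target/host usage ratio is above `high` or below `low`.

    Both arguments map codon -> usage frequency (e.g. per thousand codons). A codon used by the
    target but absent from the host is reported with ratio math.inf; a codon absent from both
    is skipped.
    """
    flagged = {}
    for codon in set(target) | set(host):
        t, h = target.get(codon, 0.0), host.get(codon, 0.0)
        if h == 0:
            if t > 0:
                flagged[codon] = math.inf
            continue
        ratio = t / h
        if ratio > high or ratio < low:
            flagged[codon] = ratio
    return flagged


# Invented frequencies per thousand codons, for illustration only.
target = {"CTG": 60.0, "AGG": 20.0, "ATA": 2.0, "CCC": 30.0}
host = {"CTG": 50.0, "AGG": 2.0, "ATA": 5.0, "CCC": 5.0}
flagged = differing_codons(target, host)
print(sorted(flagged))  # ['AGG', 'ATA', 'CCC']
```

Running the filter once per candidate host over its full codon table and counting the flagged codons gives per-host totals of the kind reported above; the host with the fewest flagged codons is the closest match in codon usage.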
Codon bias refers to the nonrandom usage of synonymous codons to encode amino acids. Because it links the carrier of genetic information (DNA) with the functional molecule of life (protein), this phenomenon has important biological significance [27]. Extensive studies have documented biased synonymous codon usage in various organisms. Although codon usage patterns among species are complex, elucidating their underlying mechanisms is important for understanding the evolution of species. Signal peptides are short stretches of amino acids that are frequently removed by specialized signal peptidases after the protein has been delivered to the correct subcellular compartment [28]; removal of the signal peptide yields the mature, functional protein [29]. We intend to express the GpIFN-α gene in a prokaryotic expression system and test its antiviral effects in future work; to retain higher biological activity, we cloned the gene fragment coding for the mature GpIFN-α protein. In this study, we analyzed the codon usage of GpIFN-α without its signal peptide and of IFN-α from 15 other species, and compared the codon usage of GpIFN-α with that of Escherichia coli, yeast, and Homo sapiens to find a suitable expression system. The RSCU, CAI, ENC, GC and GC3s values of the IFN-α genes were calculated with EMBOSS programs (CUSP, CHIPS and CAI). RSCU directly assesses the usage bias of each codon in a gene: a value greater than 1.0 indicates that the codon is used more frequently than expected under equal use of synonyms, and a value below 1.0 indicates that it is used less frequently.
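Selecting preferred codons by the RSCU > 1.0 rule, and tallying their wobble base as in the analysis of Table 2, can be sketched as follows. The RSCU values are hypothetical and the helper names are illustrative; in practice the values would be computed from the gene's codon counts.

```python
from collections import Counter


def preferred_codons(rscu_values, threshold=1.0):
    """Codons with RSCU above the threshold, i.e. used more often than under equal synonym usage.

    Single-codon amino acids (Met, Trp) always have RSCU = 1.0, so they never qualify.
    """
    return sorted(codon for codon, value in rscu_values.items() if value > threshold)


def third_base_tally(codons):
    """Count the wobble (third-position) base of each codon."""
    return Counter(codon[2] for codon in codons)


# Hypothetical RSCU values for illustration only; they are not the values of Table 2.
example = {"CTG": 3.2, "CTC": 1.4, "TTA": 0.1, "AGG": 2.1, "CGA": 0.3,
           "ATC": 2.0, "ATA": 0.2, "CCC": 1.9, "TAC": 1.5, "TAT": 0.5, "TGG": 1.0}
pref = preferred_codons(example)
tally = third_base_tally(pref)
print(pref)   # ['AGG', 'ATC', 'CCC', 'CTC', 'CTG', 'TAC']
print(tally)  # Counter({'C': 4, 'G': 2})
```

The share of preferred codons ending in G or C, (tally["G"] + tally["C"]) / len(pref), summarizes the wobble-position bias in a single number.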
Table 2 shows 26 preferred codons, most of which are strongly biased toward G or C at the wobble position, consistent with the high G+C content at the third codon position (mean 76.85%). CAI was used to predict the level of gene expression. Its value lies between 0 and 1 [30], and a higher value indicates stronger codon usage bias and a higher expected expression level. The CAI values of the IFN-α genes varied only slightly among species, with a mean of 0.808, suggesting that the IFN-α gene is likely to be moderately expressed in the giant panda. The ENC is a measure analogous to the effective number of alleles. It ranges from 20, when only one codon is used for each amino acid, to 61, when all synonymous codons are used equally [31]. In general, a gene with an ENC of 35 or less is considered to have strong codon bias [32]. The ENC values of mammalian IFN-α genes (mean 45.26) were higher than those of avian IFN-α genes (mean 30.20), indicating relatively weak codon usage bias in mammals and strong bias in birds. The ENC plot (ENC plotted against GC3s) is used as part of a general strategy to investigate patterns of synonymous codon usage. In Fig. 1, the solid line represents the expected curve if codon usage is determined only by GC content at the third codon position [33]. Although only a few genes lie exactly on this curve, a large proportion of the points lie close to it, suggesting that mutational bias is the main factor determining codon usage variation among these IFN-α genes. Comparative analysis of the IFN-α genes of the giant panda and the reference species indicated that synonymous codon usage in mammalian IFN-α genes is phylogenetically conserved.
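The ENC and the expected curve of the ENC plot can be sketched as follows, assuming the standard genetic code. ENC follows Wright's (1990) formula, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where F_k is the average homozygosity F = (n Σ p_i² - 1) / (n - 1) of the amino acids with k synonymous codons; the expected curve is Nc = 2 + s + 29 / (s² + (1 - s)²) for GC3s = s. This is a simplified sketch (for example, it fills in a missing class only for Ile), so its values need not match EMBOSS CHIPS exactly; the function names and test sequences are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
FAMILIES = defaultdict(list)
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)
# Number of amino acids per degeneracy class of the standard code: {2: 9, 3: 1, 4: 5, 6: 3}.
CLASS_SIZES = Counter(len(c) for c in FAMILIES.values() if len(c) > 1)


def codon_counts(seq):
    """Count in-frame codons read from the first base; a trailing partial codon is ignored."""
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def homozygosity(counts, codons):
    """Wright's F = (n * sum(p_i^2) - 1) / (n - 1) for one amino acid; None if n < 2."""
    n = sum(counts.get(c, 0) for c in codons)
    if n < 2:
        return None
    sum_p2 = sum((counts.get(c, 0) / n) ** 2 for c in codons)
    return (n * sum_p2 - 1) / (n - 1)


def enc(counts):
    """Effective number of codons: Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    per_class = defaultdict(list)
    for codons in FAMILIES.values():
        if len(codons) > 1:
            f = homozygosity(counts, codons)
            if f is not None:
                per_class[len(codons)].append(f)
    mean_f = {k: sum(v) / len(v) for k, v in per_class.items()}
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2  # Wright's fallback when Ile is missing
    missing = sorted(set(CLASS_SIZES) - set(mean_f))
    if missing:
        raise ValueError(f"no usable amino acids in degeneracy classes {missing}")
    # The constant 2 covers Met and Trp; a class mean F of 0 contributes infinity, then the cap applies.
    nc = 2 + sum(size / mean_f[k] if mean_f[k] > 0 else float("inf")
                 for k, size in CLASS_SIZES.items())
    return min(nc, 61.0)


def expected_enc(s):
    """ENC expected if codon usage were shaped only by the GC3s composition s."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def enc_deviation(enc_obs, s):
    """(expected - observed) / expected; values near 0 put the gene close to the curve."""
    exp = expected_enc(s)
    return (exp - enc_obs) / exp


# Hypothetical extreme case: one codon per amino acid, each used 3 times, so every F = 1.
biased = "".join(codons[0] * 3 for codons in FAMILIES.values())
# Hypothetical even case: every sense codon used 50 times, so Nc reaches the cap.
even = "".join(codon * 50 for codons in FAMILIES.values() for codon in codons)
print(enc(codon_counts(biased)), enc(codon_counts(even)))  # 20.0 61.0
print(expected_enc(0.5))  # 60.5
```

A gene whose (GC3s, ENC) point lies close to the expected curve, with a deviation near 0, has codon usage that composition alone can largely explain; points well below the curve indicate additional constraints such as selection.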
Data in Table 3 show that mammalian IFN-α genes have similar CAI, ENC and GC content values, which differ from those of birds. This suggests that evolutionarily closely related species share similar codon usage frequencies and bias indices, a conclusion supported by the phylogenetic tree of the IFN-α genes. Species therefore has a certain influence on codon usage preference. The expression of functional proteins in heterologous hosts is a cornerstone of modern biotechnology. To find a suitable host for optimal expression of the GpIFN-α gene, we compared its codon usage with that of E. coli, yeast, and H. sapiens. A ratio of codon usage frequencies higher than 2 or lower than 0.5 was taken to indicate a difference in codon usage preference (Table 4). The results revealed that a Homo sapiens expression system may express the GpIFN-α gene more efficiently. To express this gene in a prokaryotic system, the target gene may need to be modified so that its codon usage more closely reflects that of the host.

In summary, the codon usage patterns and phylogenetic results presented here are useful for understanding the characteristics of the GpIFN-α gene and the evolutionary history of IFN-α genes, and may also help increase the efficiency of gene delivery and expression systems.
Codon bias and heterologous protein expression
Selection on codon bias
Codon usage and tRNA content in unicellular and multicellular organisms
Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons
An evolutionary perspective on synonymous codon usage in unicellular organisms
"Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons
Synonymous codon bias is not caused by mutation bias in G+C-rich genes in humans
Codon distribution in vertebrate genes be used to predict gene length
The base composition of the genes is correlated with the secondary structures of the encoded proteins
Compositional constraints and genome evolution
Codon usage in bacteria: correlation with gene expressivity
The relationship between synonymous codon usage and protein structure
CodonO: codon usage bias analysis within and across genomes
A functional significance for codon third bases
Cluster analysis of the codon use frequency of MHC genes from different species
Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae
Synonymous codon usage, accuracy of translation, and gene length in Caenorhabditis elegans
Interferon-γ of the Giant Panda (Ailuropoda melanoleuca): Complementary DNA Cloning, Expression, and Phylogenetic Analysis
Molecular cloning and sequence analysis of the gene encoding interleukin-6 of the giant panda (Ailuropoda melanoleuca)
Virus interference: I. The interferon
CA: A
Synonymous codon usage bias and overexpression of a synthetic gene encoding Interferon α2b in yeast
Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis
The 'effective number of codons' used in a gene
Correlation between sequence conservation of the 5' untranslated region and codon usage bias in Mus musculus genes
Analysis of synonymous codon usage bias in Chlamydia
Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales
The analysis method and progress in the study of codon bias. Yi Chuan = Hereditas / Zhongguo Yi Chuan
Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome
The signal peptide
Characterization of Codon Usage Bias in the Newly Identified DEV UL53 Gene
Factors affecting synonymous codon usage bias in the gC gene of DPV CHv strain
Analysis of Synonymous Codon Usage in the Capsid Gene UL38 of Duck Enteritis Virus
Analysis of synonymous codon usage in the UL24 gene of duck enteritis virus