key: cord-0028401-a6emwk5p authors: Elshikh, Mohamed S.; Ajmal Ali, Mohammad; Al-Hemaid, Fahad; Yong Kim, Soo; Elangbam, Meena; Bahadur Gurung, Arun; Mukherjee, Prasanjit; El-Zaidy, Mohamed; Lee, Joongku title: Insights into plastome of Fagonia indica Burm.f. (Zygophyllaceae): organization, annotation and phylogeny date: 2021-11-12 journal: Saudi J Biol Sci DOI: 10.1016/j.sjbs.2021.11.011 sha: ca421bc1a128576969005fdb00ff60981720edcf doc_id: 28401 cord_uid: a6emwk5p

An improved understanding of chloroplast genomics would facilitate various biotechnological applications; however, the chloroplast (cp) genome, or plastome, of plants such as Fagonia indica Burm.f. (family Zygophyllaceae), which can grow in extremely hot sand deserts, remains poorly understood. De novo sequencing of F. indica with Illumina high-throughput technology yielded a 128,379 bp cp genome encoding 115 unique genes. The present study adds evidence for the loss of one copy of the inverted repeat (IR) in the cp genomes of taxa capable of growing in hot sand deserts. Maximum likelihood analysis revealed two distinct sub-clades, Krameriaceae and Zygophyllaceae, within the order Zygophyllales, nested within the fabids.

The cp genome (plastome) encodes several key proteins involved in photosynthesis and in other metabolic processes that are important for plant interactions with the environment and for defense against invading pathogens (Bobik and Burch-Smith, 2015).
The availability of organelle and whole-genome sequence data in database repositories has grown steadily over the last two decades, driven by massively parallel next-generation DNA sequencing platforms and the development of bioinformatics resources. As a result, over 5000 chloroplast (cp) genome (plastome) sequences were available in GenBank as of September 2021, and their characterization has revolutionized the application of plastome genomics: genetic engineering to enhance plant agronomic traits (Cosa et al., 2001; Ruf et al., 2001; Dufourmantel et al., 2004, 2005; Liu et al., 2007; Zhou et al., 2008; Singh et al., 2010; Lee et al., 2011; Jin et al., 2011, 2012, 2015; Zhang et al., 2015), synthesis of enzymes and biomaterials (Viitanen et al., 2004; Verma et al., 2010), enhancement of nutrition (Shintani et al., 1998; Schneider, 2005; Apel et al., 2009; Jin et al., 2014), biopharmaceuticals (Grabowski et al., 2006; El Kaoutari et al., 2013; Kwon et al., 2013; Shenoy et al., 2014; Kohli et al., 2014; Holtz et al., 2015; Kwon et al., 2015), biomedical products (Daniell et al., 2016), and the study of genetic diversity and phylogeny (Daniell et al., 2016; Brozynska et al., 2016; Jansen et al., 2007; Moore et al., 2010; Elshikh et al., 2020).

Fagonia indica Burm.f. (family Zygophyllaceae, order Zygophyllales, clade fabids) (APG IV, 2016) is a densely to sparsely branched thorny herb approximately 60 cm tall and 100 cm wide (Fig. 1). It possesses anticancer activity (Lam et al., 2014) and is widely distributed in Asian and African deserts (El Hadidi, 1985; Basto, 2002; Beier et al., 2004; Beier, 2005), extending into the inner zone of the Empty Quarter, the hottest sand desert (Mandaville, 1986). A thorough survey of published reports revealed that the cp genomes of plants such as F. indica, which can grow in extremely hot sand deserts, have rarely been characterized.
The present report describes the complete cp genome sequence of F. indica, discusses its genome organization (including gene content and repeat features) and phylogeny, and compares it with representative plants of major habitats to detect similarities and variations.

Fresh leaves of F. indica were collected from the desert of the Riyadh region, Saudi Arabia. Genomic DNA was extracted using the Qiagen DNeasy Kit (Qiagen, Hilden, Germany) and used to construct short-insert libraries according to the manufacturer's manual (Illumina, Inc., San Diego, USA); the libraries were sequenced as a single-end 51-bp run on the Illumina platform (Quail et al., 2012). Raw reads were quality-checked with FastQC and filtered to remove adaptor sequences, yielding high-quality clean reads, which were then assembled using SPAdes (Bankevich et al., 2012). The assembled cp genome was annotated with GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html) using default parameters (Tillich et al., 2017; Stothard, 2000). The NCBI GenBank sequence of Larrea tridentata (DC.) Coville (family Zygophyllaceae; GenBank accession number NC_028023.1) served as the reference for annotation and for further comparison with this close relative using the mVISTA program (http://genome.lbl.gov/vista/mvista/submit.shtml) in Shuffle-LAGAN mode (Brudno et al., 2003). Tandem repeats were analyzed using Tandem Repeats Finder (https://tandem.bu.edu/trf/trf.html) (Benson, 1999; Timme et al., 2007), and REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer/) (Castro et al., 2013) was used to identify and locate dispersed repeats, including direct (forward) and inverted (palindromic) repeats.
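To make the distinction between the two dispersed-repeat classes concrete, here is a minimal Python sketch. It is not REPuter's algorithm: REPuter reports maximal repeats (optionally with mismatches) using suffix-tree-based methods, whereas this sketch only indexes exact k-mers. The function names and the default seed length k = 30 are illustrative assumptions.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_repeat_pairs(seq: str, k: int = 30):
    """Naive sketch: report exact k-mer repeat pairs.

    Returns a sorted list of (kind, pos1, pos2), 0-based starts, where kind is
    'forward'     - the same k-mer occurs at pos1 and pos2, or
    'palindromic' - the k-mer at pos2 is the reverse complement of that at pos1.
    Unlike REPuter, overlapping seeds are not merged into maximal repeats.
    """
    seq = seq.upper()
    index = defaultdict(list)  # k-mer -> list of start positions
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    pairs = set()
    for kmer, starts in index.items():
        # Forward (direct) repeats: identical k-mer at two different positions.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                pairs.add(("forward", starts[a], starts[b]))
        # Palindromic (inverted) repeats: reverse complement found elsewhere.
        for j in index.get(reverse_complement(kmer), []):
            for i in starts:
                if i < j:  # record each pair once, in positional order
                    pairs.add(("palindromic", i, j))
    return sorted(pairs)
```

For example, `find_repeat_pairs("GATTCAAAAAGAATC", k=5)` pairs the k-mer `GATTC` with its reverse complement `GAATC` ten bases downstream. Because every overlapping seed is reported, one exact 35-bp repeat yields several adjacent pairs at k = 30; a fuller implementation would extend and merge them.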
Tandem repeats shorter than 15 bp and redundant REPuter results were removed manually. Candidate small inversions (SIs) were then identified where the distance between repeats was less than 50 bp (Yang et al., 2010), and the likely secondary structures of the SIs were evaluated using MFOLD (version 3.2) (http://unafold.rna.albany.edu/?q=mfold). Potential microsatellite regions were located by searching for five or more repeats of the nucleotides A and T using MISA (http://misaweb.ipk-gatersleben.de/) (Beier et al., 2017). For phylogenetic analysis, 48 chloroplast genes (Supplementary Table S1) from the cp genomes of 49 taxa belonging to 49 different orders (Supplementary Table S2), together with three gymnosperm outgroup sequences, were retrieved from GenBank and aligned using ClustalX (Thompson et al., 1997). The maximum likelihood (ML) analysis was performed using MEGA X (Kumar et al., 2018).

The assembled cp genome mapped as a circular molecule of 128,379 base pairs (bp) (Fig. 2) containing 115 unique genes: 80 CDS (80,200 bp, coding for 42,793 codons), 31 tRNA genes, and four rRNA genes. The assembled sequence was submitted to NCBI (GenBank accession number MN521457). At approximately 128 kb, the cp genome of F. indica is smaller than that of L. tridentata (Fig. 3, Table 1). The coding regions were less divergent than the non-coding regions (Fig. 4). A total of 13 genes, seven protein-coding and six tRNA genes, contained one or two introns (Supplementary Table S3). Among the intron-containing genes, trnK-UUU had the largest intron (2511 bp), which contains the matK gene, and trnL-UAA had the smallest (551 bp). The ycf3 gene had two introns, of 722 and 758 bp.
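Codon and amino acid tallies over the protein-coding genes, such as those summarized here and in Supplementary Table S4, can be illustrated with a small sketch. It assumes the CDS sequences have already been extracted in their 5' to 3' coding orientation. It counts protein-coding sequences only, and it translates with the standard genetic code, whose amino acid assignments are the same as those of NCBI translation table 11 (the bacterial, archaeal and plant plastid code). The names `CODON_TABLE` and `amino_acid_usage` are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
# Its amino acid assignments match NCBI translation table 11 (plastid code);
# alternative start codons such as GTG are translated by their usual meaning.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def amino_acid_usage(cds_list):
    """Return {amino acid: percentage} over all in-frame sense codons.

    Each CDS is read from its first base in steps of three; a trailing
    partial codon, stop codons ('*') and codons containing non-ACGT
    characters (absent from CODON_TABLE) are skipped.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
        for i in range(0, usable, 3):
            aa = CODON_TABLE.get(cds[i:i + 3])
            if aa is not None and aa != "*":
                counts[aa] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: 100.0 * n / total for aa, n in counts.items()}
```

For a toy CDS `ATGTTTTTCTAA`, the stop codon is ignored and the result is one Met and two Phe, i.e. about 33.3% M and 66.7% F.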
Sequence analysis indicates that 58.49%, 6.87%, and 3.53% of the genome sequence encode proteins, tRNAs, and rRNAs, respectively, whereas 41.50% is non-coding sequence comprising introns, intergenic spacers, and pseudogenes. Based on the sequences of protein-coding and tRNA genes within the cp genome, Phe (0.05%) and Arg (0.0038%) were the most and least frequently used amino acids, respectively (Supplementary Table S4).

Tandem and dispersed repeats were analyzed in the cp genome of F. indica. Forty-one tandem repeats were identified: 23 of 15-20 bp, 14 of 21-30 bp, two of 31-40 bp, one of 41-50 bp, and one of 81-90 bp. Similarly, 43 dispersed repeats were identified: one of 21-30 bp, 22 of 31-40 bp, 11 of 41-50 bp, four of 51-60 bp, one of 61-70 bp, one of 81-90 bp, and three of more than 91 bp. Of the 84 repeats in total, 87% were located in intergenic spacer regions, 6% in introns, and 7% in CDS regions (Fig. 5). The repeat structures of another member of Zygophyllaceae, L. tridentata, were also analyzed using REPuter (Fig. 6). Forward and inverted repeats occurred in both L. tridentata and F. indica; nevertheless, despite belonging to the same family, the two species showed different repeat structures. Of the two Zygophyllaceae cp genomes studied, F. indica contained the larger total number of repeats 75 bp or longer, as well as SIs ranging from 11 to 24 bp in size. The folded stem-loop structures of three SIs of F. indica are shown in Fig. 7. Within the cp genome of F. indica, 37 different SSR loci were repeated more than five times (Table 2); of these, 31 were homopolymers and six were di-polymers. All homopolymeric loci consisted of A or T runs, and all di-polymeric loci of AT or TA repeats.
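MISA itself is a configurable Perl tool with a web front end; the regular-expression sketch below only approximates the microsatellite search described in the Methods. It assumes a threshold of five tandem copies for both mononucleotide and dinucleotide motifs (the two SSR classes reported here) and also computes A+T content, the property that these A/T-rich loci contribute to. The function names are illustrative.

```python
import re


def find_ssrs(seq: str, min_copies: int = 5):
    """Find mono- and dinucleotide SSRs with at least min_copies tandem copies.

    Returns a sorted list of (start, kind, motif, copies); start is 0-based.
    """
    seq = seq.upper()
    extra = min_copies - 1  # copies required after the first motif occurrence
    patterns = {
        "mono": re.compile(r"([ACGT])\1{%d,}" % extra),
        # The lookahead (?!\2) forces two different bases, so a run such as
        # AAAAAAAAAA is reported once as (A)10, not again as (AA)5.
        "di": re.compile(r"(([ACGT])(?!\2)[ACGT])\1{%d,}" % extra),
    }
    hits = []
    for kind, pattern in patterns.items():
        for m in pattern.finditer(seq):
            motif = m.group(1)
            copies = len(m.group(0)) // len(motif)
            hits.append((m.start(), kind, motif, copies))
    return sorted(hits)


def at_content(seq: str) -> float:
    """Percentage of A and T among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    at = seq.count("A") + seq.count("T")
    return 100.0 * at / acgt if acgt else 0.0
```

On the toy sequence `GCAAAAAAGCATATATATATGC`, `find_ssrs` reports an (A)6 homopolymer at position 2 and an (AT)5 dinucleotide repeat at position 10.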
These SSR loci contribute to the A-T richness of the cp genome of F. indica. The present maximum likelihood (ML) bootstrap analysis revealed two major clades, monocots and eudicots. Within the eudicot clade, F. indica clustered with L. tridentata and T. mongolica (family Zygophyllaceae, order Zygophyllales), nested within the fabids. The ML tree also resolved Krameriaceae and Zygophyllaceae as two distinct clades (Fig. 8).

In the present study, the organization of the assembled cp genome was similar to that of other angiosperms, except for the loss of one copy of the IR, as reported for the majority of papilionoids (Doyle et al., 1996; Kato et al., 2000; Saski et al., 2005; Guo et al., 2007). The rps16 gene is present in the cp genomes of most angiosperms, including representatives of the early-branching lineages (Goremykin et al., 2003; Raubeson et al., 2007; Hansen et al., 2007), but it was not found in F. indica. Because F. indica has a single copy of the inverted repeat, its gene order is inverted relative to that of its close relative L. tridentata. Cp genome length varies among angiosperms primarily because of nucleotide substitutions, gene and intron losses, and expansion and contraction of the IR region. Although the coding regions were less divergent than the non-coding regions (Fig. 4), further analysis showed that clpP and accD were the most divergent coding regions (Supplementary Table S5).

Photosynthesis is the ultimate source of biomass production (Beadle and Long, 1985), and the intensity of photosynthetically active radiation (PAR) is an important factor determining the rate of photosynthesis (Wimalasekera, 2019). Light intensity varies among major habitats (Warrant and Johnsen, 2013). Comparing the cp genome of F. indica, as a representative of hot sand deserts, with those of flowering plants from other major habitats further supports the conserved pattern of the cp genome, and suggests that plastid-encoded genes may not by themselves account for an organism's yield, rarity or abundance, biomass, or response to stress (Elshikh et al., 2020).

Knowledge of phylogeny is used in almost every branch of biology (Yang and Rannala, 2012), including taxonomy (Philippe et al., 2005; APG IV, 2016), evolution (Edwards, 2009; Soltis et al., 2019), comparative biology (Eisen, 1998; Mäser et al., 2001; Kellis et al., 2003; Pedersen et al., 2006; Lindblad et al., 2011), medicine (Marra et al., 2003; Grenfell et al., 2004; Salipante and Horwitz, 2006), and genomics (Paten et al., 2008; Green et al., 2010; Gronau et al., 2011; Li and Durbin, 2011; Ma, 2011). The family Zygophyllaceae has previously been treated as related to Geraniaceae (Geraniales), to Sapindales/Rutales, or to Linales/Malpighiales (Sheahan and Chase, 1996). Moreover, the phylogenetic relationship between the two sister families of the order Zygophyllales, Zygophyllaceae and Krameriaceae (Soltis et al., 1998; Savolainen et al., 2000; Wang et al., 2009; Tao et al., 2018), has often been controversial (APG IV, 2016). Wood anatomy supports the separation of Krameriaceae from Zygophyllaceae (Carlquist, 2005), and Granot and Grafi (2014), based on epigenetic studies, argued that the placement of Krameriaceae and Zygophyllaceae in the order Zygophyllales should be re-examined. The present ML bootstrap analysis placed F. indica with L. tridentata and T. mongolica (Zygophyllaceae, Zygophyllales) within the fabids and resolved Krameriaceae and Zygophyllaceae as two distinct clades.
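The support values on the ML tree come from Felsenstein's nonparametric bootstrap, which MEGA X carries out internally. As a minimal sketch of the idea, the code below shows only two bookkeeping steps: resampling alignment columns with replacement to build a pseudo-replicate, and tallying how often a clade appears across replicate trees. The per-replicate ML tree search is omitted, and the function names, taxon labels and clade sets are toy assumptions.

```python
import random
from typing import Dict, FrozenSet, Sequence, Set


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-replicate by sampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_sites = lengths.pop()
    # The same column indices are used for every taxon, so sites stay aligned.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def clade_support(replicate_clades: Sequence[Set[FrozenSet[str]]],
                  clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (each given as its set of clades) containing `clade`."""
    if not replicate_clades:
        raise ValueError("need at least one replicate")
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)
```

For example, if a toy clade `frozenset({"Fagonia", "Larrea"})` appeared in three of four replicate trees, `clade_support` would return 75.0; real analyses use many more replicates.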
The analyses of the de novo genome sequence of F. indica (family Zygophyllaceae) add evidence for the loss of one copy of the IR in the cp genomes of taxa capable of growing in hot sand deserts. The maximum likelihood analysis revealed two distinct sub-clades, Krameriaceae and Zygophyllaceae, within the order Zygophyllales.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Complete chloroplast genome sequencing of Saraca asoca: cp genome characterization
Enhancement of carotenoid biosynthesis in transplastomic tomatoes by induced lycopene-to-provitamin A conversion
An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV
SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing
Flora de Cabo Verde 63. Centro de Botanica do Instituto de Investigação Cientifica Tropical
Photosynthesis-is it limiting to biomass production?
A revision of the desert shrub Fagonia (Zygophyllaceae)
Phylogenetic relationships and biogeography of the desert plant genus Fagonia (Zygophyllaceae), inferred by parsimony and Bayesian model averaging
MISA-web: a web server for microsatellite prediction
Tandem repeats finder: a program to analyze DNA sequences
Chloroplast signaling within, between and beyond cells
Genomics of crop wild relatives: expanding the gene pool for crop improvement
Global alignment: finding rearrangements during alignment
Wood anatomy of Krameriaceae with comparisons with Zygophyllaceae: phylesis, ecology and systematics
Chloroplast genome diversity in Portuguese grapevine
Chloroplast genomes: diversity, evolution, and applications in genetic engineering
Overexpression of the Bt cry2Aa2 operon in chloroplasts leads to formation of insecticidal crystals
The distribution and phylogenetic significance of a 50-kb chloroplast DNA inversion in the flowering plant family Leguminosae
Generation of fertile transplastomic soybean
Generation and analysis of soybean plastid transformants expressing Bacillus thuringiensis Cry1Ab protoxin
Is a new and general theory of molecular systematics emerging? Evol
Phylogenomics: improving functional predictions for uncharacterized genes by evolutionary analysis
Zygophyllaceae
The abundance and variety of carbohydrate-active enzymes in the human gut microbiota
Comparative analysis of cp genome of Fagonia indica growing in desert and its implications in pattern of similarity and variations
Analysis of the Amborella trichopoda chloroplast genome sequence suggests that Amborella is not a basal angiosperm
The market for follow-on biologics: how will it evolve?
Epigenetic information can reveal phylogenetic relationships within Zygophyllales
A draft sequence of the Neandertal genome
Unifying the epidemiological and evolutionary dynamics of pathogens
Bayesian inference of ancient human demography from individual genome sequences
Rapid evolutionary change of common bean (Phaseolus vulgaris L.) plastome, and the genomic diversification of legume chloroplasts
Phylogenetic and evolutionary implications of complete chloroplast genome sequences of four early-diverging angiosperms
Commercial-scale biotherapeutics manufacturing facility for plant-made pharmaceuticals
Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns
Expression of γ-tocopherol methyltransferase in chloroplasts results in massive proliferation of the inner envelope membrane and decreases susceptibility to salt and metal-induced oxidative stresses by reducing reactive oxygen species
Release of hormones from conjugates: chloroplast expression of β-glucosidase results in elevated phytohormone levels associated with significant increase in biomass and protection from aphids or whiteflies conferred by sucrose esters
Engineered chloroplast dsRNA silences cytochrome p450 monooxygenase, V-ATPase and chitin synthase genes in the insect gut and disrupts Helicoverpa armigera larval development and pupation
Pinellia ternata agglutinin expression in chloroplasts confers broad spectrum resistance against aphid, whitefly, lepidopteran insects, bacterial and viral pathogens
Complete structure of the chloroplast genome of a legume, Lotus japonicus
Sequencing and comparison of yeast species to identify genes and regulatory elements
Oral delivery of bioencapsulated proteins across blood-brain and blood-retinal barriers
MEGA X: molecular evolutionary genetics analysis across computing platforms
Low-cost oral delivery of protein drugs bioencapsulated in plant cells
Oral delivery of bioencapsulated exendin-4 expressed in chloroplasts lowers blood glucose level in mice and stimulates insulin secretion in beta-TC 6 cells
Correction: an aqueous extract of Fagonia cretica induces DNA damage, cell cycle arrest and apoptosis in breast cancer cells via FOXO3a and p53 expression
Expression and characterization of antimicrobial peptides Retrocyclin-101 and Protegrin-1 in chloroplasts to control viral and bacterial infections
Inference of human population history from individual whole-genome sequences
A high-resolution map of human evolutionary constraint using 29 mammals
Stable chloroplast transformation in cabbage (Brassica oleracea L. var. capitata L.) by particle bombardment
Reconstructing the history of large-scale genomic changes: biological questions and computational challenges
Plant life in the Rub' al-Khali (the Empty Quarter), south-central Arabia
The genome sequence of the SARS-associated coronavirus
Phylogenetic relationships within cation transporter families of Arabidopsis
Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots
Genome-wide nucleotide-level mammalian ancestor reconstruction
Identification and classification of conserved RNA secondary structures in the human genome
A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers
Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus
Stable genetic transformation of tomato plastids and expression of a foreign protein in fruit
Phylogenetic fate mapping
Complete chloroplast genome sequence of Glycine max and comparative analyses with other legume genomes
Phylogenetics of flowering plants based on combined analysis of plastid atpB and rbcL gene sequences
Chemistry and biology of vitamin E
A phylogenetic analysis of Zygophyllaceae R. Br. based on morphological, anatomical and rbcL DNA sequence data
Oral delivery of Angiotensin-converting enzyme 2 and Angiotensin-(1-7) bioencapsulated in plant cells attenuates pulmonary hypertension
Elevating the vitamin E content of plants through metabolic engineering
Plastid transformation in eggplant (Solanum melongena L.)
Inferring complex phylogenies using parsimony: an empirical approach using three large DNA data sets for angiosperms
Darwin review: angiosperm phylogeny and evolutionary radiations
The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences
Evolution of Angiosperm Pollen. 6. The Celastrales, Oxalidales, and Malpighiales (Com) Clade and Zygophyllales
The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools
A comparative analysis of the Lactuca and Helianthus (Asteraceae) plastid genomes: identification of divergent regions and categorization of shared repeats
Chloroplast-derived enzyme cocktails hydrolyse lignocellulosic biomass and release fermentable sugars
Metabolic engineering of the chloroplast genome using the Escherichia coli ubiC gene reveals that chorismate is a readily abundant plant precursor for p-hydroxybenzoic acid biosynthesis
Rosid radiation and the rapid rise of angiosperm-dominated forests
Vision and the light environment
Effect of light intensity on photosynthesis
The complete chloroplast genome sequence of date palm (Phoenix dactylifera L.)
Molecular phylogenetics: principles and practice
Full crop protection from an insect pest by expression of long double-stranded RNAs in plastids
High-level expression of human immunodeficiency virus antigens from the tobacco and tomato plastid genomes

The authors would like to extend their sincere appreciation to the Researchers Supporting Project number (RSP-2021/306), King Saud University, Riyadh, Saudi Arabia.
This research was supported by the grant from the KRIBB Initiative Program of the Republic of Korea. Supplementary data to this article can be found online at https://doi.org/10.1016/j.sjbs.2021.11.011.