key: cord-252910-7qvnj6c8 authors: Li, Xin; Jin, Xiufeng; Chen, Shunmei; Wang, Liangge; Yau, Tung On; Yang, Jianyi; Hong, Zhangyong; Ruan, Jishou; Duan, Guangyou; Gao, Shan title: The discovery of a recombinant SARS2-like CoV strain provides insights into SARS and COVID-19 pandemics date: 2020-09-21 journal: bioRxiv DOI: 10.1101/2020.07.22.213926 sha: doc_id: 252910 cord_uid: 7qvnj6c8 In December 2019, the world awoke to a new zoonotic strain of coronavirus named severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). In the present study, we identified key recombination regions and mutation sites cross the SARS-CoV-2, SARS-CoV and SARS-like CoV clusters of betacoronavirus subgroup B. Based on the analysis of these recombination events, we proposed that the Spike protein of SARS-CoV-2 may have more than one specific receptor for its function. In addition, we reported—for the first time—a recombination event of ORF8 at the whole-gene level in a bat and ultimately determined that ORF8 enhances the viral replication. In conjunction with our previous discoveries, we found that receptor binding abilities, junction furin cleavage sites (FCSs), strong first ribosome binding sites (RBSs) and enhanced ORF8s are main factors contributing to transmission, virulence and host adaptability of CoVs. Junction FCSs and enhanced ORF8s increase the efficiencies in viral entry into cells and replication, respectively while strong first RBSs enhance the translational initiation. The strong recombination ability of CoVs integrated these factors to generate multiple recombinant strains, two of which evolved into SARS-CoV and SARS-CoV-2 by nature selection, resulting in the SARS and COVID-19 pandemics. A new zoonotic strain of coronavirus named severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) emerged in December 2019. Since SARS-CoV-2 is high similar to SARS-CoV, many studies focused on the investigate of the receptor binding domain (RBD) of the Spike protein and its receptor ACE2 using the same strategies and methods as in SARS-CoV [1] . Different from these studies, we previously reported several other findings on SARS-CoV-2 for the first time, including the following in particular: (1) the alternative translation of Nankai coding sequence (CDS) that characterize the rapid mutation rate of betacoronavirus at the nucleotide level [2] ; (2) a furin cleavage site (FCS) "RRAR" in the junction region between S1 and S2 subunits (junction FCS) of SARS-CoV-2 that may increase the efficiency of viral entry into cells [3] ; and (3) the use of 5' untranslated-region (UTR) barcoding for the detection, identification, classification and phylogenetic analysis of-though not limited to-CoVs [4] . By data mining betacoronaviruses from public databases, we found that more than 50 nucleotides (nts) at the 3' ends of the 5' UTRs in betacoronavirus genomes are highly conserved with very few single nucleotide polymorphisms (SNPs) within each subgroup of betacoronaviruses. We defined 13~15-bp sequences of 5' UTRs including the start codons (ATGs) of the first open reading frames (ORFs) as barcodes to represent betacoronaviruses. Using 5' UTR barcodes, 1265 betacoronaviruses were clustered into four classes, matching the C, B, A and D subgroups of betacoronavirus [4] , respectively. The 3' end of each 5' UTR includes the first ribosome binding site (RBS) of betacoronavirus, which regulates the translational initiation of downstream genes ORF1a and 1b [5] . In particular, SARS-CoV-2 and SARS-CoV have the strong first RBSs enhancing the translational initiation [4] . These previous studies indicated that receptor binding abilities, junction FCSs and strong first RBSs are main factors contributing to transmission, virulence and host adaptability of CoV. The present study started with the identification of key recombination regions and mutation sites cross three clusters of betacoronavirus subgroup B. Using the insertions and deletions (InDels) at six sites, we identified two recently detected betacoronavirus strains RmYN01 and RmYN02 from a bat [6] and discovered that RmYN02 was a recombinant SARS2-like CoV strain. This led us to report-for the first time-a recombination event in open reading frame 8 (ORF8) at the whole-gene level in a bat, which had been co-infected by two betacoronavirus strains. ORF8 (Figure 2) , existing only in betacoronavirus subgroup B, was considered to have played a significant role in adaptation to human hosts following interspecies transmission [7] via the modification of viral replication [8] . Next, we ultimately determined that ORF8 enhances the viral replication, which is another main factor contributing to transmission, virulence and host adaptability of CoV. In the present study, we analyzed these main factors in the context of betacoronavirus evolution (conjoint analysis of phylogeny and molecular [9] ) to explain the SARS and COVID-19 pandemics. Based on analysis of betacoronavirus subgroup B (Materials and Methods), key insertions and deletions (InDels) were identified at six sites (named M1 to M6) in genes ORF3a, membrane (M), ORF7a, 7b, 8 and nucleocapsid (N), respectively (Table 1) . Using the InDels at six sites, betacoronavirus subgroup B was classified into two classes: (1) the first class includes SARS-CoV-2 (from humans) and SARS2-like CoV (from animals), and (2) the second class includes SARS-CoV (from humans) and SARS-like CoV (from animals). This classification result is reliable as all recombination and mutations between them are unlikely to undergo reversible changes together. As a mutation site, M1 has a length of 8 nt in the second class and 11 nt in the first class. M2, M3, M4 and M5 in the first class have 3-nt deletions that are complete codons, whereas M6 in the first class has 6-nt deletion that are not complete codons. Almost all the identified recombination events occurred in genes ORF1a, S and ORF8 ( Table 1) . The recombination regions RC1 to RC2 and RC3 to RC7 are located in ORF1a and the S1 region of the S gene, respectively, while the recombination events in ORF8 are complex (see below). To initiate the CoV infection, the S protein encoded by the S gene need to be cleaved into the S1 and S2 subunits for receptor binding and membrane fusion. By analysis of all recombination events in all betacoronaviruses, we obtained the following results: (1) there are a few genotypes of each recombination region (RC1 to RC7); (2) RC3 to RC7 have more diversity than RC1 to RC2 in the genotypes; (3) betacoronaviruses within the SARS-CoV-2 and SARS-CoV clusters (see below) had the same genotypes of each recombination region; and (4) there were a few non-synonymous substitutions between different sequences of each genotype. These results suggested that recombination, rather than accumulated mutation directly triggered cross-species transmission and outbreaks of SARS-CoV and SARS-CoV-2, while mutation could change potential recombination sites, affecting recombination. Further analysis showed that two recombination regions (RC6 and RC7) are localized in the receptor binding domain (RBD) of S1 (Figure 1) , while three other recombination regions (RC3, RC4 and RC5) are localized in the N-terminal domain (NTD) of S1. Almost all the secondary structures of five protein segments encoded by RC3 to RC7 are disordered, which are responsible for protein protein interaction (PPI). This suggested that the recombination of RC3 to RC7 improve the adaptability of betacoronaviruses in new hosts (host range expansion [1]) by enhancing interaction of RBD and NTD with their receptors, exhibiting that the positive selection of S was particularly strong [7] . Since both RBD and NTD had similar recombination events in their PPI regions, we proposed that NTD has a specific receptor as RBD has angiotensin-converting enzyme 2 (ACE2). Thus, the S1 subunit of SARS-CoV-2 may have more than one specific receptor (Figure 1 ) like gp120 of HIV has the cluster of differentiation 4 receptor (CD4) and the C-C chemokine receptor 5 (CCR5). Comprehensive analysis and reuse of data from different sources are necessary to determine the other receptor/s of SARS-CoV-2. A previous study identified two genetic susceptibility loci (rs11385942 at locus 3p21.31 and with rs657152 at locus 9q34.2) in COVID-19 patients with respiratory failure using genome-wide association analysis [10] . The locus 3p21.31 was associated to six genes SLC6A20, LZTFL1, CCR9, FYCO1, CXCR6 and XCR1. However, the previous study only focused on the further analysis of the locus 9q34.2 and confirmed a potential involvement of the ABO blood-group system. The researchers did not notice that three chemokine receptors CCR9, CXCR6 and XCR1 merit further investigation as candidates of SARS-CoV-2 receptors. The analysis of bulk RNA-seq data showed high expression of CCR9 and XCR1 in thymus and CXCR6 in T cells, compared to other tissues and cell types [10] . In particular, the thymic cells were consistently negative for ACE2 and many CoVs can infect thymus [11] . By investigating interaction of three protein segments encoded by RC3 to RC5 in NTD (Table 1) with CCR9, CXCR6 and XCR1, we found that CCR9 is the most possible candidate among three chemokine receptors. However, the final determination of the other receptor/s of SARS-CoV-2 need more calculation and experiments on candidates at the whole-genome level. Our study did not rule out the possibilities of non-receptor proteins binding to NTD. The S protein is cleaved into two subunit S1 and S2 (in red color) for receptor binding and membrane fusion. S1 has two domains, RBD (in green color) and NTD (in blue color). It is well accepted that S1 binds to its specific receptor angiotensin-converting enzyme 2 (ACE2) by the interaction between RBD and ACE2 (in purple color). In the present study, we propose that the Spike protein of SARS-CoV-2 may have more than one specific receptor for its function as gp120 of HIV has CD4 and CCR5. The structure of S was predicted using trRosetta [24] . Recently, two betacoronavirus strains RmYN01 and RmYN02 (GISAID: EPI_ISL_412976 and EPI_ISL_412977) were detected from a bat (Rhinolophus malayanus) [6] . Since betacoronaviruses from subgroup B share many highly similar regions in their genome sequences, it is very difficult to assemble them correctly using high-throughput sequencing (HTS) data from one sample. Therefore, EPI_ISL_412976 was only assembled into a partial sequence in a previous study [6] . However, the exact identification of viruses requires the complete genomes or even the full-length genomes. Using paired-end sequencing data, we reassembled these two virus genomes and obtained two full-length sequences to update EPI_ISL_412976 and EPI_ISL_412977 (Supplementary 1). Using 5' UTR barcodes (Introduction), the betacoronaviruses RmYN01 and RmYN02 were identified as belonging to subgroup B. Using the InDels at M1 to M6, RmYN01 was further identified as belonging to the second class, respectively. Using the InDels at M1 to M4 and M6, RmYN02 was further identified as belonging to the first class but a recombinant SARS2-like CoV strain. RmYN02 was supposed to have a 3-nt deletion at the M5 site; however, it did not ( Table 1) . This led us to report-for the first time-a recombination event in ORF8 at the whole-gene level in a bat, which had been co-infected by two betacoronavirus strains. ORF8 (Figure 2 ), existing only in betacoronavirus subgroup B, was considered to have played a significant role in adaptation to human hosts following interspecies transmission [7] via the modification of viral replication [8] . A 29-nt deletion in SARS-CoV (GenBank: AY274119) was reported and considered to be associated with attenuation during the early stage of human-to-human transmission [8] . RNA-seq data from a bat was aligned to two genomes of RmYN01 and RmYN02 (GISAID: EPI_ISL_412976 and EPI_ISL_412977). RNA abundance is represented by read counts (y-axis). The RNA abundance of RmYN02 is 9 times that of RmYN01. A. RmYN01 was identified as belonging to the SARS-like CoV cluster and has a type 3 ORF8. B. RmYN02 was identified as belonging to the SARS-CoV-2 cluster but has an type 2 ORF8 (enhanced ORF8). A 382-nt deletion during the early evolution of SARS-CoV-2 (GISAID: EPI_ISL_414378) was also reported [12] , but associated with attenuation without changes in its replication [13] . Although many recombination events in ORF8 of betacoronaviruses have been reported in sequence analysis results, it is difficult to determine whether they were recombination events or small-size mutation (InDel & SNP) accumulation as most of them only occurred over very small genomic regions, excepting a few events (e.g., the 382-nt deletion in ORF8 [12] ). In the present study, the discovery of a recombination event in ORF8 at the whole-gene level led to the determination of three types (see below) of ORF8 genes in the betacoronavirus subgroup B, providing clues to understand the functions of ORF8. Based on conjoint analysis of phylogeny and molecular function that was proposed in our previous study [9] , genes (i.e. ORF1a, S1 and ORF8) containing the recombination regions under high selection pressure must be removed in phylogenetic analysis. Using large segments (Supplementary 1) spanning S2, ORF3a, 3b, envelope (E), M, ORF6, 7a, 7b, N(9a) and ORF10 (Table 1) , phylogenetic tree 1 ( Figure 3A) showed that 19 of 21 betacoronaviruses from subgroup B (Materials and Methods) were classified into two major clades, corresponding to the first and second class classified using the InDels at six sites, respectively: (1) the first major clade, named the SARS-CoV-2 cluster, includes SARS-CoV-2 and all SARS2-like CoVs (from bats and pangolins); and (2) Using only ORF8 (Supplementary 1), phylogenetic tree 2 ( Figure 3B ) also showed that the 19 betacoronaviruses were classified into the SARS-CoV-2, SARS-CoV and SARS-like CoV clusters. However, this tree did not reflect the evolutionary relationship of the three clusters due to the recombination events of ORF8. Using 21 CDSs of nsp12 (RNA-dependent RNA polymerase, RdRP), the rooted phylogenetic tree 3 ( Figure 3C ) was construct to confirm the evolutionary relationship of the three clusters in tree 1 (Figure 3A) .In phylogenetic tree 2, the SARS-CoV-2, SARS-CoV and SARS-like CoV clusters have types 1, 2 and 3 ORF8 genes, respectively. Type 1 ORF8 genes possess low nucleotide identities (below 70%) to type 3 ORF8 genes, while type 2 ORF8 genes are so highly divergent from types 1 and 3 ORF8 genes, they cannot be well aligned to calculate nucleotide identities between type 2 ORF8 genes and types 1 or 3 ORF8 genes. As RmYN02 belongs to the cluster 1 (Figure 3A) but has a type 2 rather than type 1 ORF8 (Figure 3B ), RmYN02 is a recombinant SARS2-like CoV strain. This discovery indicated that recombination occurred across the SARS-CoV-2 and SARS-CoV clusters, which has potential to generate a new strain more dangerous than SARS-CoV-2 and SARS-CoV. Recombination regions (RC1-7) and mutation sites (M1-6) were annotated in the viral genomes of SARS-CoV-2 (GenBank: MN908947) by column 2-4 and RmYN02 (GISAID: EPI_ISL_412977) by column 5-7. # These amino acid sequences are encoded by RC1-7 from SARS-CoV-2. All the insertions and deletions refer to SARS-CoV (GenBank: AY278489). * Since RmYN02 has a recombinant ORF8, it has the same allele at the M5 site as SARS-CoV. Comparing phylogenetic tree 1 (Figure 3A ) using large segments with 2 ( Figure 3B ) using only ORF8 genes, all betacoronaviruses were consistently classified into the same clusters in both trees, except RmYN02 and the SARS-like CoV strain WIV1 (GenBank: KF367457). The accession numbers of the GenBank or GISAID databases were used to represent the viral genomes: WIV1 was classified into cluster 2, but has a type 3 rather than type 2 ORF8, WIV1 is a recombinant SARS-like CoV strain. WIV1, isolated from Chinese horseshoe bats (Rhinolophus sinicus), was considered the most closely related to SARS-CoV but not its immediate ancestor [14] . A previous study predicted the immediate ancestor of SARS-CoV based on the following hypothesis: the ancestor of SARS-like CoVs from civets was a recombinant virus with ORF8 originating from greater horseshoe bats (Rhinolophus ferrumequinum) and other genomic regions originating from different horseshoe bats [7] . However, whether these recombination events occurred in bats or civets remains unclear [7] . Both phylogenetic tree 1 ( Figure 3A ) and 2 ( Figure 3B ) consistently revealed that SARS-CoV-2 is most closely related to the well-known strain RaTG13 (GenBank: MN996532) isolated from intermediate horseshoe bats (Rhinolophus affinis) . However, RaTG13 is unlikely to be the immediate ancestor of SARS-CoV-2 due to lack of the junction FCS Next, we conducted further research on the biological functions of ORF8. RmYN01 and RmYN02 were simultaneously detected in a bat, providing a special opportunity to compare their copy numbers. As RmYN01 and RmYN02 have type 3 and type 2 ORF8 genes, respectively, the difference between the copy numbers of RmYN01 and that of RmYN02 can be estimated by their relative RNA abundances to test a previous hypothesis that type 2 ORF8 genes increase replication efficiency of viruses. Aligning RNA-seq data to the genomes of RmYN01 and RmYN02, our calculation showed that the RmYN01 genome was covered 99.85% of its length with an average depth of 32.89 (Figure 2A) , while the RmYN02 genome was covered 99.89% with an average depth of 298.99 ( Figure 2B) . The RNA abundance of RmYN02 is 9 times that of RmYN01. Based on the "leader-to-body fusion" model explaining the replication and transcription of CoVs [5] , the difference in RNA abundance of the ORF1a and ORF1b genes ( Figure 2AB ) resulted from CoV replication, rather than transcription. Therefore, this result suggests that type 2 ORF8 (named enhanced ORF8) genes increase replication efficiency of RmYN02, ruling out the possibility that transcription contributes to the difference in RNA abundance of the two virus strains. Our study ultimately determined that ORF8 enhances the viral replication. Receptor binding abilities, junction FCSs, strong first RBSs and enhanced ORF8s (see above) are main factors contributing to transmission, virulence and host adaptability of CoVs. By analysis of these main factors in betacoronavirus genomes (Materials and Methods), we concluded: (1) rapid recombination of viral genomes provides CoV the strong ability of cross-species transmission and outbreak; (2) the immediate ancestor of betacoronavirus was most likely to have two junction FCS and a strong first RBS, and it transmitted across species during its outbreak; (3) after a period of adaption in new hosts, betacoronavirus was attenuated to spread widely and persist in the host population by loss of abilities attributed to one or more factors (e.g. junction FCSs); and (4) the strong recombination ability of CoVs integrated these factors to generate multiple recombinant strains, very a few of which evolved into super virus strains (e.g. SARS- CoV and SARS-CoV-2) causing pandemics by nature selection. In the betacoronavirus subgroup C (Figure 4A) , middle east respiratory syndrome coronavirus (MERS-CoV) has two junction FCSs. The first one "RSTR", located at position 694 in the S protein (noted as R694), is nonfunctioning, as a result of attenuation, because there is a disulfide bond cross the junction FCS R694. However, the second junction FCS "RSVR" (R751) is still functional. Originated from the same ancestor of MERS-CoV, MERS-like CoVs (e.g. hedgehog CoV) were further attenuated by loss of two junction FCSs. In the betacoronavirus subgroup B (Figure 4AB) , SARS-CoV-2 (GenBank: MN908947) has the junction FCS "RRAR" (R685) and lost a junction FCS by substituting "KNTQ" (R779) for "RNTR", as a result of attenuation. All SARS-2 like CoVs (from bats or pangolins [3] ) in the SARS-CoV-2 cluster were further attenuated by loss of "RRAR" and substituting "KNTQ" for "RNTR". The immediate ancestor of SARS-CoV was an attenuated variant of SARS-CoV-2 by loss of "RRAR" (e.g R667) and with inaccessible "RNTR" (R761) which has secondary structures in helix rather than coil. All SARS-like CoVs in the SARSlike CoV cluster were further attenuated by loss of "RRAR" and substituting "KNTQ" for "RNTR". In the betacoronavirus subgroup D, HKU9-CoV was attenuated by loss of two junction FCSs but still have a strong first RBS. In the betacoronavirus subgroup A (Figure 4A) , although almost all strains (e.g. HCoV-OC43 and HCoV-HKU1) still have one junction FCS, they do not have strong first RBSs and enhanced ORF8s. These strains were heavily attenuated from the immediate ancestor of betacoronavirus due to complex reasons. Firstly, the average arginine (R) pencentage of S proteins from betacoronaviruses of the subgroup A except mouse hepatitis virus (MHV) 2.63% is significantly lower than those from MHV and betacoronaviruses of the subgroup B, C and D (3.34%, 3.33%, 3.32% and 3.33%). This indicated that accumulated mutations caused attenuation by loss of arginine residues, since arginine residues are indispensable for the protease cleavage sites. Other reasons may include the loss of strong first RBSs and genetic events in the transcription regulatory sequences [5] , an important factor that is not further investigated in the present study, but merit further investigation in the future. and betacoronaviruses from pangolins) in the SARS-CoV-2 cluster are attenuated variants of SARS-CoV-2. As recombinant betacoronavirus, the immediate ancestor of SARS-CoV-2 is characterized by the junction FCS "RRAR", while the immediate ancestor of SARS-CoV is characterized by the enhanced ORF8. Therefore, WIV1 without the enhanced ORF8 and RaTG13 without the junction FCS "RRAR" may contribute to, but are not the immediate ancestors of SARS-CoV and SARS-CoV-2, respectively. The outbreaks of MER-CoV, SARS-CoV and SARS-CoV-2 were triggered by recombination events, not accumulated mutations. So it is not suitable to estimate their divergence time using current theories in evolutionary biology. The origins of the junction FCS "RRAR" and the enhanced ORF8 are still unknown. Therefore, the recombinant strains (e.g. WIV1 and RmYN02) are identified based on the reference strains that were usually reported before the recombinant strains. This reflects the phylogenetic relationship between them, not the actual recombinant events which occurred to generate the recombinant strains. Future investigation need be conducted to search for the betacoronavirus strains that provided the junction FCS "RRAR" and the enhanced ORF8 to SARS-CoV-2 and SARS-CoV, respectively. our studies also suggest rthat the nucleotide sequences of the junction FCS "RRAR" and ORF8 may originate from non-viral species. and (2) SARS2-like CoV with at least one junction FCS will be eventually detected in bats. The software VirusDetect [16] was used to detect viruses in RNA-seq data from [6] . The software Fastq_clean [17] was used for RNA-seq data cleaning and quality control. The genomes of RmYN01 and RmYN02 (GISAID: EPI_ISL_412976 and EPI_ISL_412977) were reassembled by aligning RNA-seq data on two closest reference genomes JX993988 and MN908947. SVDetect v0.8b and SVFilter [18] were used to removed abnormal aligned reads. Several haploid contigs (Supplementary 1) highly similar to the complete RmYN01 genome were also assembled. This suggested that there exists more than one betacoronavirus strain belonging to the SARS-like CoV cluster in the same sample, from which RmYN01 and RmYN02 were detected. 1,265 genome sequences of betacoronaviruses (in group A, B, C and D) were downloaded from the NCBI Virus database (https://www.ncbi.nlm.nih.gov/labs/virus) in our previous study [3] . EPI_ISL_412976 and EPI_ISL_412977), totally 21 complete genomes were used for the phylogenetic analysis applying the neighbour joining (NJ) method. Sequence alignment was performed using the Bowtie v0.12.7 software with paired-end alignment allowing 3 mismatches; mutation detection and other data processing were carried out using Perl scripts; the phylogenetic analysis was performed using MEGA v7.0.26; Statistics and plotting were conducted using the software R v2.15.3 the Bioconductor packages [23] . The structure of S (Supplementary 2) was predicted using trRosetta [24] . Recombination, Reservoirs, and the Modular Spike: Mechanisms of Coronavirus Cross-Species Transmission Bioinformatics Analysis of the 2019 Novel Coronavirus Genome A Furin Cleavage Site Was Discovered in the S Protein of the 2019 Novel Coronavirus Novel Coronavirus Leads to Insights into Its Virulence The Architecture of SARS-CoV-2 Transcriptome A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the spike protein SARS coronavirus ORF8 protein is acquired from SARS-related coronavirus from greater horseshoe bats through recombination Attenuation of replication by a 29 nucleotide deletion in SARScoronavirus acquired during the early stages of human-to-human transmission Genomewide Association Study of Severe Covid-19 with Respiratory Failure Potential impact of SARS-CoV-2 infection on the thymus Discovery of a 382-nt deletion during the early evolution of SARS-CoV-2. bioRxiv Effects of a major deletion in the SARS-CoV-2 genome on the severity of infection and the inflammatory response: an observational cohort study. The Lancet Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins VirusDetect: An automated pipeline for efficient virus discovery using deep sequencing of small RNAs Fastq_clean: An optimized pipeline to clean the Illumina sequencing data with quality control Genomewide analysis of Dongxiang wild rice (Oryza rufipogon Griff.) to investigate lost/acquired genes during rice domestication Analysis of multimerization of the SARS coronavirus nucleocapsid protein. Biochemical and biophysical research communications The E protein is a multifunctional membrane protein of SARS-CoV Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor R language and Bioconductor in bioinformatics applications(Chinese Edition) Author contributions statements SG conceived the project. SG and GD supervised this study. XJ and SC conducted programming. XL, LW, and TY downloaded, managed and processed the data Non-financial competing interests