key: cord-0845699-7t4io1ix authors: Li, Yang; Si, Hao-Rui; Zhu, Yan; Xie, Nan; Li, Bei; Zhang, Xiang-Ping; Han, Jun-Feng; Bao, Hong-Hong; Yang, Yong; Zhao, Kai; Hou, Zi-Yuan; Cheng, Si-Jia; Zhang, Shuan-Hu; Shi, Zheng-Li; Zhou, Peng title: Characteristics of SARS-CoV-2 transmission in a medium-sized city with traditional communities during the early COVID-19 epidemic in China date: 2022-01-25 journal: Virol Sin DOI: 10.1016/j.virs.2022.01.030 sha: df0fc74d124cdde16689b6b684c6361e3b7c1d03 doc_id: 845699 cord_uid: 7t4io1ix The nationwide COVID-19 epidemic ended in 2020, a few months after its outbreak in Wuhan, China at the end of 2019. Most COVID-19 cases occurred in Hubei Province, with a few local outbreaks in other provinces of China. A few studies have reported the early SARS-CoV-2 epidemics in several large cities or provinces of China. However, information regarding the early epidemics in small and medium-sized cities, where there are still traditionally large families and community culture is more strongly maintained and thus, transmission profiles may differ, is limited. In this study, we characterized 60 newly sequenced SARS-CoV-2 genomes from Anyang as a representative of small and medium-sized Chinese cities, compared them with more than 400 reference genomes from the early outbreak, and studied the SARS-CoV-2 transmission profiles. Genomic epidemiology revealed multiple SARS-CoV-2 introductions in Anyang and a large-scale expansion of the epidemic because of the large family size. Moreover, our study revealed two transmission patterns in a single outbreak, which were attributed to different social activities. We observed the complete dynamic process of single-nucleotide polymorphism development during community transmission and found that intrahost variant analysis was an effective approach to study cluster infections. 
In summary, our study provided new SARS-CoV-2 transmission profiles representative of small and medium-sized Chinese cities as well as information on the evolution of SARS-CoV-2 strains during the early COVID-19 epidemic in China.

Since the first case was reported in December 2019, coronavirus disease 2019 (COVID-19) rapidly developed into a global pandemic over several months and became an unprecedented public health disaster in human history (Zhou et al. 2020). As of December 2021, there have been more than 260 million confirmed COVID-19 cases and more than 5 million deaths worldwide (https://covid19.who.int/). A series of variants with increased infectivity and vaccine resistance have successively emerged in different parts of the world (Boehm et al. 2021; England 2020; Faria et al. 2021; Tegally et al. 2021), casting a shadow over the expectation of ending the COVID-19 pandemic in a short time via vaccine herd immunity (Gupta 2021). At present, the global COVID-19 pandemic is still far from over.

Molecular epidemiology is an important scientific approach for studying the epidemics of infectious diseases and has played an unprecedentedly significant role in combating the global COVID-19 pandemic. In the past year and a half, molecular epidemiological studies worldwide have identified numerous newly emerging and potentially threatening SARS-CoV-2 lineages, provided a thorough understanding of the COVID-19 epidemic dynamics in various countries and cities, and determined the SARS-CoV-2 sources in regional outbreaks. In China, some molecular epidemiological studies have focused on the early COVID-19 epidemic in large cities and provinces, including Beijing (Du et al. 2020), Shanghai (Zhang et al. 2020b), and Guangdong (Lu et al. 2020), and mainly analyzed the circulation characteristics of local SARS-CoV-2 strains based on viral genome sequences obtained by next-generation sequencing (NGS).
After the nationwide COVID-19 epidemic ended in early 2020, research attention shifted to regional COVID-19 outbreaks that successively arose in multiple Chinese cities (Cao et al. 2020; Pang et al. 2020; Shiwei et al. 2021; Xiang et al. 2020) and were generally caused by different SARS-CoV-2 variants imported from abroad. These studies aimed to trace virus sources through viral phylogenetic and genome variant analyses; however, the reconstruction of [...]

[...] (v.12.0.3; QIAGEN, Aarhus, Denmark). Referring to previous studies (Lythgoe et al. 2021; Xiao et al. 2020), two rounds of intra-host single nucleotide variation (iSNV) calling were performed. In the first round, iSNVs were called using the following conservative criteria: (1) required significance > 1.0; (2) minimum coverage ≥ 100 reads at the iSNV site; (3) minor allele frequency (MAF) > 5%. After inter-individual iSNVs were identified, the second round of iSNV calling focused on the nucleotide sites of inter-individual iSNVs, using relatively relaxed criteria (MAF > 1%). This iSNV calling strategy filters out false-positive sites caused by sequencing errors while still detecting and retaining iSNVs with lower MAF at sites of inter-individual iSNVs. In this study, SNPs and iSNVs were defined in reference to a previous study (Zhang et al. 2021).

We obtained 60 nearly full-length genomes without 5′ or 3′ flanking regions, accounting for 92.3% of all local COVID-19 cases. Combining the Anyang SARS-CoV-2 genomes and global SARS-CoV-2 reference genomes, we constructed a comprehensive dataset including 492 virus genome sequences representing the SARS-CoV-2 strains circulating between December 26, 2019 and March 31, 2020. We first evaluated the presence of a temporal signal in this dataset using root-to-tip analysis. The correlation coefficient was 0.7641, indicating a good temporal signal (Supplementary Fig. S1).
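As an illustration of the root-to-tip analysis mentioned above, the following minimal sketch regresses root-to-tip genetic distance on sampling time and reports the Pearson correlation coefficient. The function name and the toy data are hypothetical and are not taken from the study; the tool actually used by the authors is not stated in this section.

```python
from math import sqrt

def root_to_tip_regression(sampling_times, distances):
    """Least-squares regression of root-to-tip distance on sampling time.

    Returns (slope, intercept, r), where r is the Pearson correlation.
    """
    n = len(sampling_times)
    if n != len(distances) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_t = sum(sampling_times) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in sampling_times)
    s_dd = sum((d - mean_d) ** 2 for d in distances)
    s_td = sum((t - mean_t) * (d - mean_d)
               for t, d in zip(sampling_times, distances))
    if s_tt == 0 or s_dd == 0:
        raise ValueError("no variation in sampling times or distances")
    slope = s_td / s_tt                  # apparent substitution rate
    intercept = mean_d - slope * mean_t  # distance at time zero
    r = s_td / sqrt(s_tt * s_dd)         # strength of the temporal signal
    return slope, intercept, r

# Hypothetical toy data (NOT from the study): days since the earliest sample
# and distances (substitutions per site) from the root of a rooted tree.
toy_days = [0, 20, 40, 60, 80]
toy_distances = [0.00002, 0.00005, 0.00006, 0.00011, 0.00012]

slope, intercept, r = root_to_tip_regression(toy_days, toy_distances)
print(f"rate ~ {slope:.2e} substitutions/site/day, r = {r:.3f}")
```

A positive slope with a high correlation coefficient suggests that genetic divergence accumulates roughly in proportion to time, which is the precondition for time-scaled Bayesian phylogenetic inference.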
Subsequently, we performed Bayesian inference of phylogeny, and the results were interpreted according to two major SARS-CoV-2 nomenclature systems, namely Phylogenetic Assignment of Named Global Outbreak Lineages (Pango) (Rambaut et al. 2020) and Global Initiative on Sharing All Influenza Data (GISAID) (Tang et al. 2020). We found that three major lineages/clades of SARS-CoV-2 were circulating globally before March 31, 2020, including lineages A (clade S), B (clade L), and B.1-B.1.X (clades G, GR, and GH) (Fig. 1). Specifically, lineage A (clade S) and lineage B (clade L) were the dominant lineages before March 2020, and strains of these lineages were mainly [...]

[...] and many of them were located next to virus genomes from other Chinese cities, indicating that there were multiple geographic and lineage sources of the SARS-CoV-2 strains that caused the epidemic in Anyang. This was in line with the epidemiological statistics of the infected cases, which showed that local cases infected in Anyang, imported cases from Wuhan, and imported cases from other cities accounted for 69.23% (45 cases), 18.46% (12 cases), and 12.31% (8 cases), respectively. Besides Wuhan, other cities related to imported cases in Anyang included Beijing, Hefei (the capital of Anhui Province), Jinan (the capital of Shandong Province), Yichang (a city of Hubei Province), and Zhuzhou and Yueyang (two cities of Hunan Province). Among these imported cases, 8 out of 21 had produced next-generation cases after they arrived in Anyang, and nearly all these infection events occurred in local families and communities (families A-I and communities 1-3).
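The case shares quoted above can be recomputed from the case counts as a quick consistency check. The sketch below uses only the counts given in the text (45, 12, and 8 cases; 60 sequenced genomes); the variable names are illustrative. With these counts, 8 of 65 cases corresponds to 12.31%, and the three shares sum to 100%.

```python
# Case counts as stated in the text.
case_counts = {
    "local (infected in Anyang)": 45,
    "imported from Wuhan": 12,
    "imported from other cities": 8,
}

total_cases = sum(case_counts.values())

# Percentage share of each group, rounded to two decimals as in the text.
shares = {
    group: round(100 * n / total_cases, 2)
    for group, n in case_counts.items()
}

# Share of all cases for which a genome was obtained (60 genomes).
sequenced_share = round(100 * 60 / total_cases, 1)

for group, pct in shares.items():
    print(f"{group}: {pct}%")
print(f"total cases: {total_cases}, sequenced: {sequenced_share}%")
```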
Notably, transmission events in communities 1 and 2 accounted for half of the infected cases in Anyang, and their SARS-CoV-2 genome sequences formed two prominent clusters, which were obviously divergent from other [...]

Referring to the reference genome WIV04, we identified a total of 93 SNPs in the 60 Anyang SARS-CoV-2 genomes, including 36 synonymous variants, 52 non-synonymous variants, 2 other variants (one in the 5- [...]

The transmission event in family B mainly involved three small families. The small families of case 9 (including cases 9, 11, and 18) and case 14 (including cases 14 and 16) lived in a large house together with their mother (case 17). Case 9 was the sister-in-law of case 14, and they often helped each other with household work. Case 20 was the son of case 9. He, his wife (case 21), and their son lived in another house, but they frequently visited his parents (cases 9 and 11) and her father (case 23). Case 22 was a neighbor of case 9, and she and case 9 regularly visited each other. Unlike family A members, family B members did not participate in a large family gathering before their disease onset, but most of them lived close together in one community, and the members of the small families had relatively frequent interpersonal contacts, especially during the Chinese Spring Festival (January 25). Case 9 was the first confirmed case of family B, followed by her daughter, case 11. Between January 30 and February 10, as many as 10 members of family B were confirmed as having SARS-CoV-2 infection (Fig. 2). None of them had a history of traveling to Wuhan.

The epidemiological investigation revealed that 15 members of family A, including cases 1, 2, 3, 5, and 6, had lunch and supper on January 16, 2020 (the first day of the funeral) in a restaurant where case 9 of family B worked.
Investigators had previously suspected an epidemiological link between families A and B, but apart from the contact history, they found no convincing evidence to support this speculation.

To find experimental evidence to reconstruct the two transmission events in families A and B, we took multiple molecular epidemiological approaches. First, the phylogenetic tree in Fig. 1 showed that the 13 virus [...] S2). The close phylogenetic relationships and the shared cluster-specific SNP suggested that family A, family B, and case 24 probably had epidemiological links and that the virus that had caused their infections could be traced to a common source.

Next, using the consensus genomes, we performed a SNP analysis and identified four inter-individual SNPs, including T11418C, T5473C, C25490T, and C28926T (Fig. 3A, Supplementary Table S2). These accumulated SNPs in the virus genome were like "scale marks" on the virus transmission chain. All consensus genomes of the infected cases could be classified into three SNP groups based on these "scale marks" ( [...]

To more closely examine the transmission links among family A, family B, and case 24 in the Anyang epidemic, we constructed a genome haplotype network based on the 60 Anyang SARS-CoV-2 consensus genomes (Fig. 4). The network showed that virus genomes within the same family/community had a short [...]

To reveal more hidden evidence, we conducted iSNV analysis based on the high-throughput sequencing data. We found four iSNVs with low allele frequency, which were either shared by multiple individuals or overlapped with the inter-individual SNPs described above (Fig. 3A, Supplementary Fig. S2). Specifically, T5473C (2.79%) and C25490T (1.60%) were two iSNVs in case 1 of SNP-group A. After inter-individual transmission, they became signature SNPs of SNP-group B cases (Fig. 3A). Similarly, C28926T initially was an iSNV (22.41%) in case 14 of SNP-group B.
After inter-individual transmission, it became a newly added signature SNP of SNP-group C cases (Fig. 3A). A10286G was an inter-individual iSNV found only in SNP-group B cases (no reads covered this site in the NGS data of case 22), but it was lost during transmission (Fig. [...] 9, case 9-case 11, case 9-case 14, and case 9-case 22. Of note, we excluded two other inter-individual iSNVs, C15157A and C241T. C15157A was found only in the amplicon-based sequencing data, and the quality of the NGS reads covering this site was quite poor, suggesting that it might have been a false-positive result caused by sequencing bias or error. C241T was identified as a SNP or iSNV in multiple family/community transmissions and multiple non-clustered cases. According to statistics of the China National Center for Bioinformation-National Genomics Data Center (CNCB-NGDC), C241T rapidly replaced the corresponding wild-type allele in the virus population and has been a feature SNP of clade G since February 2020, and some studies have shown that C241T may confer an advantage to SARS-CoV-2 transmission (Chaudhari et al. 2021; Luo et al. 2021). Considering that C241T was relatively prevalent in the Anyang virus genomes and that a spontaneous origin could not be ruled out, we excluded it from the family/community-specific variants.

The results of the multiple molecular epidemiological approaches, including lineage phylogeny, SNP, genome haplotype network, and iSNV analyses, were highly consistent, providing not only crucial experimental evidence to support the previous speculation that the two families had an epidemiological link, but also an outline of virus transmission. Unexpectedly, we also found the infection source of case 24, which was related to the family B transmission event, especially the cases within SNP-group C.
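The two-round iSNV calling that underlies these results can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: the "required significance" setting of the variant caller is not modeled, "inter-individual" is interpreted here as a site called in at least two samples in round 1, the coverage threshold is assumed to apply in both rounds, and all sample names, sites, and values are hypothetical.

```python
from collections import defaultdict

MIN_COVERAGE = 100   # minimum reads at the site (from the Methods)
ROUND1_MAF = 0.05    # conservative first-round threshold (MAF > 5%)
ROUND2_MAF = 0.01    # relaxed second-round threshold (MAF > 1%)

def call_isnvs(observations, min_samples=2):
    """Two-round iSNV calling on per-sample minor-allele observations.

    observations: iterable of dicts with keys
        'sample', 'site', 'coverage', 'maf' (minor allele frequency, 0-1).
    min_samples: ASSUMPTION - a site counts as inter-individual when it
        passes round 1 in at least this many samples.
    Returns (shared_sites, calls), where calls is a sorted list of
    (sample, site) pairs retained after both rounds.
    """
    observations = list(observations)

    # Round 1: conservative calling in every sample.
    round1 = [o for o in observations
              if o["coverage"] >= MIN_COVERAGE and o["maf"] > ROUND1_MAF]

    # Identify inter-individual sites among the round-1 calls.
    samples_per_site = defaultdict(set)
    for o in round1:
        samples_per_site[o["site"]].add(o["sample"])
    shared_sites = {site for site, samples in samples_per_site.items()
                    if len(samples) >= min_samples}

    # Round 2: relaxed MAF threshold, restricted to the shared sites.
    round2 = [o for o in observations
              if o["site"] in shared_sites
              and o["coverage"] >= MIN_COVERAGE
              and o["maf"] > ROUND2_MAF]

    calls = sorted({(o["sample"], o["site"]) for o in round1 + round2})
    return shared_sites, calls

# Hypothetical toy observations (NOT from the study).
toy = [
    {"sample": "s1", "site": 100, "coverage": 500, "maf": 0.10},
    {"sample": "s2", "site": 100, "coverage": 300, "maf": 0.08},
    {"sample": "s3", "site": 100, "coverage": 400, "maf": 0.02},  # rescued in round 2
    {"sample": "s1", "site": 200, "coverage": 500, "maf": 0.02},  # low MAF, site not shared
    {"sample": "s2", "site": 300, "coverage": 50,  "maf": 0.30},  # fails coverage
]
shared_sites, calls = call_isnvs(toy)
print(shared_sites, calls)
```

Restricting the relaxed threshold to sites already supported in other individuals is what keeps low-frequency variants such as those at 1-3% from being confused with random sequencing errors.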
Based on all these analyses, we reason that the two family transmission events and case 24 can be included in one large community transmission, namely community 1, which was also the largest community transmission event in Anyang, accounting for 35% of the infected cases in the COVID-19 epidemic.

Based on the new evidence obtained in the molecular epidemiological analyses, we conducted a complementary investigation to reconstruct a detailed virus transmission process and compared the different characteristics of the two family transmissions.

We re-evaluated the possibility that case 8 was the infection source of her family. First, the updated epidemiological information did not support that she arrived home carrying the virus (Fig. 5), although she had a history of traveling to Wuhan. Case 8 had prolonged or frequent contact with at least four persons (her boyfriend, a close friend, her grandmother, and a younger female cousin) from January 9 to 14. Her boyfriend traveled back from Wuhan with her, and they sat together on the train for several hours. Her close friend accompanied her shopping and dining after she arrived. However, both persons tested negative by RT-qPCR throughout the outbreak. In contrast, all infected cases were her relatives. Second, as shown in Fig. 5, the infection confirmation times of case 8 and her two closely contacted relatives (grandmother/case 12 and younger female cousin/case 13) were later than those of some other members of family A, which implied that case 8 and her two relatives were probably not among the earliest infected cases in the family. In contrast with case 8, the five persons who had undertaken much of the nursing work in the hospital and the organizational work at the funeral (cases 1-5) were the earliest confirmed cases and were confirmed around the same time (January 25 and 26), followed by other members of family A.
Particularly, case 1 tested [...] which was confirmed by the disease onset record of the epidemiological investigation. Third, the clinical testing results suggested that the infection of case 8 was probably transient and mild. The RT-qPCR test result of case 8 was negative even on January 27, the day before her infection was confirmed. All subsequent viral RNA tests were negative. Moreover, case 8 tested negative for IgM and IgG antibodies throughout the epidemic. Collectively, the epidemiological and clinical data did not support that case 8 was the first member to be infected in her family or that she had been persistently shedding virus in her family. Thus, case 8 was likely a recipient of infection rather than a superspreader.

Although the source of infection was difficult to determine, the process and pattern of SARS-CoV-2 transmission in family A were relatively clear. First, SARS-CoV-2 entered family A before the funeral, as the virus was already being transmitted from family A at the beginning of the funeral. Case 7, a close friend of case 1, attended the funeral only on the first two days (January 16 and 17), consoled case 1 during this period, and was later infected with SARS-CoV-2 (Fig. 3A). Case 9 only had contact with family A members in her restaurant on the first day of the funeral (January 16) and was also later infected (Fig. 3A). We speculated that the first round of family transmission occurred in the hospital rather than at the funeral. Second, the first round of family transmission was probably due to multiple introductions from the same source.
Except for cases 7, 15, and 19, who were infected either at the funeral or after the funeral, the infection confirmation dates of the other family A members were concentrated within one week (January 25 to 31), which suggested that most family A members were probably exposed to the same infection source within a short time. Besides [...] Table S3). The consensus genomes of the presumed earliest infected cases (cases 1, 2, 4, and 5) were identical and carried only three SNPs, including a family-specific SNP (T11418C) and two feature SNPs of lineage A (C8782T and T28144C). Compared with these four cases, the other infected cases of family A carried one to three additional individual-specific SNPs, and none of the new individual-specific SNPs later developed into a family-dominant mutation, which implied that no long virus transmission chain was formed in family A.

Epidemiological and variant analyses showed that SARS-CoV-2 entered family B via case 9, who was infected by a member of family A at her restaurant. Family B members were confirmed as having SARS-CoV-2 infection between January 30 (case 9) and February 11 (case 23), which was a longer period than that in family A, probably because this family had not held large family gatherings. The virus genome variant patterns of family B also supported this transmission character (Supplementary Table S3). The two iSNVs with low allele frequency (T5473C and C25490T) in case 1 of family A were transmitted to case 9 and became two SNPs, which were later transmitted to other cases in family B. Likewise, the iSNV C28926T that arose in case 14 was transmitted to the subsequent cases in family B and became a SNP. At the end of [...] virus transmission, which was very different from the virus transmission pattern in family A.
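The stepwise accumulation of signature SNPs along the transmission chain can be expressed as a simple classification rule. The sketch below is an interpretation of the text, not the authors' code: it assumes that SNP-group A carries the cluster-specific SNP T11418C, that SNP-group B additionally carries T5473C and C25490T, and that SNP-group C additionally carries C28926T. The function name is hypothetical.

```python
# Signature SNP sets, ordered from the most derived group to the least.
# ASSUMPTION: group membership is read from the text (T11418C shared by the
# whole cluster; T5473C and C25490T added in group B; C28926T added in C).
SNP_GROUP_SIGNATURES = [
    ("C", frozenset({"T11418C", "T5473C", "C25490T", "C28926T"})),
    ("B", frozenset({"T11418C", "T5473C", "C25490T"})),
    ("A", frozenset({"T11418C"})),
]

def assign_snp_group(consensus_snps):
    """Return 'A', 'B', or 'C' for a consensus genome, or None.

    consensus_snps: iterable of SNP labels relative to the reference,
    for example {'C8782T', 'T28144C', 'T11418C'}.
    The most derived group whose full signature is present wins; SNPs
    outside the signatures (lineage or individual-specific) are ignored.
    """
    snps = set(consensus_snps)
    for group, signature in SNP_GROUP_SIGNATURES:
        if signature <= snps:
            return group
    return None

# The presumed earliest family A cases carried exactly these three SNPs.
earliest_family_a = {"C8782T", "T28144C", "T11418C"}
print(assign_snp_group(earliest_family_a))
```

Because each group's signature contains the previous one, the groups act like the "scale marks" described above: a genome's group indicates how far along the chain it was sampled.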
A complementary epidemiological investigation showed that case 24 once visited the building where case 21 worked, and they took the same elevator several times. This explains well the close phylogenetic relationship and the shared SARS-CoV-2 genome molecular signature between case 24 and family B. [...]

Like in community 1, the transmission events in communities 2 and 3 in Anyang also occurred in traditional communities (Fig. 1, Supplementary Fig. S5). Clustered infections or outbreaks are evidently more likely in traditional communities, for which it is more difficult to trace the infection source and clarify the transmission [...] prevention of family and community transmission is key to the prevention of SARS-CoV-2 outbreaks in small and medium-sized Chinese cities.

Based on genome consensus sequence alignment, SNPs were identified and used for the molecular epidemiological analysis. Since the onset of the COVID-19 pandemic, researchers have often used SNPs and molecular signatures composed of SNPs to trace outbreaks at different scales. In the Boston epidemic, researchers found that the SARS-CoV-2 genomes related to two superspreading events harbored different SNPs, and on the basis of these molecular signatures, the link between individual clusters and wider community spread was clarified (Lemieux et al. 2020). In China, a significant example is the Beijing Xinfadi market outbreak. Seventy-two virus genomes from this outbreak were assigned to lineage B.1.1 and shared the same molecular signature, which comprised seven SNPs and was mainly carried by European strains (Pang et al. 2020). Based on this evidence, the strain that caused the market outbreak was considered to have been imported from Europe through food cold-chain logistics. Likewise, three molecular signature patterns related to three community transmission events in Anyang were detected in our study.
Like community 1, communities 2 and 3 carried specific SNPs, which constituted their molecular signatures in this study (Supplementary Figs. S4, S5). Although molecular signatures or specific SNPs have been widely used to validate the source strain in successive regional COVID-19 outbreaks, iSNVs have rarely been used to this end. A critical issue in the study of early SARS-CoV-2 strains is that the number of SNPs is very low, which in turn limits the information available for analyses. To address this problem, we combined SNP data and iSNV data. Four significant iSNVs (T5473C, C25490T, C28926T, and A10286G) with low minor allele frequencies not only provided evidence to clarify the modes of SARS-CoV-2 transmission, but also corroborated the SNP data and epidemiological findings. Therefore, our study showed that iSNV analysis was an effective approach to study family/community transmission and early SARS-CoV-2 strains.

Presumed Asymptomatic Carrier Transmission of COVID-19
[...]CoV-2 variants: the pandemics within the pandemic
[...] Xinjiang Uygur Autonomous Region, China
In-Silico analysis reveals lower transcription efficiency of C241T variant of SARS-CoV-2 with host replication factors MADP1 and hnRNP-1
Genomic surveillance of COVID-19 cases in Beijing
Investigation of SARS-CoV-2 variants of concern: technical briefings
Genomics and epidemiology of a novel SARS-CoV-2 lineage in Manaus
Mapping genome variation of SARS-CoV-2 worldwide highlights the impact of COVID-19 super-spreaders
Will SARS-CoV-2 variants of concern affect the promise of vaccines?
[...] Evolution During Human-to-Human Transmission of SARS-CoV-2
Intra-host variation and evolutionary dynamics of SARS-CoV-2 populations in COVID-19 patients
Reemergent Cases of COVID-19 - Dalian City
Multiple approaches for massively parallel sequencing of SARS-CoV-2 genomes directly from clinical samples
Molecular and serological investigation of 2019-nCoV infected patients: implication of multiple shedding routes
Viral and host factors related to the clinical outcome of COVID-19
Dynamics of HIV-1 quasispecies diversity of participants on long-term antiretroviral therapy based on intrahost single-nucleotide variations

Timeline of case 8 and her close contacts (figure text; cf. Fig. 5):
Jan 9-10: back home from Wuhan with boyfriend
Jan 10: met parents (cases 3 and 6) and stayed with grandmother (case 12)
Jan 11-14: shopped with cousin (case 13) and visited a friend several times
Jan 11-15: short visits to grandpa (no test) three times with two aunts (cases 1 and 2) or parents (cases 3 and 6)
Jan 14: one aunt (case 1) developed symptoms
Jan 16-19: attended the funeral with family members for short periods
Jan 19-21: two aunts (cases 1 and 2) lived with her and grandmother
Jan 20-21: another aunt (case 2) and grandmother (case 12) developed symptoms
Jan 21: family gathering and dinner
Jan 23-24: parents (cases 3 and 6) developed symptoms
Jan 25: two aunts (cases 1 and 2) and father (case 3), viral RNA positive, moderate symptoms
Jan 27: the first test, negative; her mother (case 6), positive, moderate symptoms
Jan 28: the second test, positive, asymptomatic
Jan 30: grandmother (case 12), positive, mild symptoms
Jan 31: cousin (case 13), positive, asymptomatic
Home quarantine with grandmother

Phase 1 (no symptoms, no test): Back home. Close contacts and their health status: (1) boyfriend: no symptoms, test negative later; (2) grandmother (case 12): no symptoms; (3) cousin (case 13): no symptoms; (4) friend: no symptoms, test negative later; (5) parents: no symptoms.
Phase 2 (no symptoms, no test): Gathering with family at the funeral and other activities. Close contacts and their health status: (1) one aunt (case 1): developed symptoms; (2) another aunt (case 2) and grandmother (case 12): developed symptoms after case 1; (3) parents (cases 3 and 6): developed symptoms after cases 2 and 12.