key: cord-0888679-0cb2q14u
authors: Novoa, Beatriz; Ríos-Castro, Raquel; Otero-Muras, Irene; Gouveia, Susana; Cabo, Adrián; Saco, Amaro; Rey-Campos, Magalí; Pájaro, Manuel; Fajar, Noelia; Aranguren, Raquel; Romero, Alejandro; Panebianco, Antonella; Valdés, Lorena; Payo, Pedro; Alonso, Antonio A.; Figueras, Antonio; Cameselle, Claudio
title: Wastewater and marine bioindicators surveillance to anticipate COVID-19 prevalence and to explore SARS-CoV-2 diversity by next generation sequencing: One-year study
date: 2022-04-11
journal: Sci Total Environ
DOI: 10.1016/j.scitotenv.2022.155140
sha: 9d49d0602077fe8b43477ebb67877eadb9049385
doc_id: 888679
cord_uid: 0cb2q14u
This study presents the results of SARS-CoV-2 surveillance in sewage water of 11 municipalities and marine bioindicators in Galicia (NW of Spain) from May 2020 to May 2021. An integrated pipeline was developed including sampling, pre-treatment and biomarker quantification, RNA detection, SARS-CoV-2 sequencing, mechanistic mathematical modeling and forecasting. The viral load in the inlet stream to the wastewater treatment plants (WWTP) was used to detect new outbreaks of COVID-19, and the viral load data from the wastewater, in combination with data provided by the health system, were used to predict the evolution of the pandemic in the municipalities under study within a time horizon of 7 days. Moreover, the study shows that the viral load was eliminated from the treated sewage water in the WWTP, mainly in the biological reactors and the disinfection system. As a result, we detected a minor impact of the virus in the marine environment through the analysis of seawater, marine sediments, and wild and aquacultured mussels at the final discharge point of the WWTP.
The world is experiencing a pandemic caused by the virus SARS-CoV-2, first detected in December 2019 in China (WHO, 2020).
The rapid spread of the virus has shown the need for tools that detect the presence of the virus at scale in local communities and that, combined with individual screening methods, contribute to SARS-CoV-2 surveillance. Diagnostic methods based on qRT-PCR assays that amplify the genetic material of the virus were set up to confirm the disease in patients and in the asymptomatic population (Corman et al., 2020). Various studies confirmed the presence of SARS-CoV-2 in feces of patients with COVID-19 (Chen et al., 2020; Wu et al., 2020; Xiao et al., 2020). As a result, the genetic material of the virus can be detected in wastewater in varying concentrations, from 20 to 3×10⁶ copies/L (Foladori et al., 2020; Lastra et al., 2022), depending on the number of infected individuals, the viral load in feces and the dilution in the wastewater. The viral load in sewage water was used to detect COVID-19 outbreaks and track the evolution of the infected population (Larsen & Wigginton, 2020), with the possibility of detecting hot spots by sampling the sewage network or of monitoring specific areas, facilities, or schools (Fielding-Miller et al., 2021; Gibas et al., 2021; Haak et al., 2022), and different studies confirmed wastewater monitoring as a convenient complementary approach to COVID-19 surveillance and testing strategies (Peccia et al., 2020; Foladori et al., 2020; Kitajima et al., 2020; Randazzo et al., 2020). Following growing evidence of the capabilities of sewage surveillance, the European Commission has highlighted the importance of the surveillance of SARS-CoV-2 and its variants in wastewaters as a rapid and cost-effective analysis, as well as a reliable source of information about the transmission of SARS-CoV-2 in the population (European Commission, 2021). Wastewater monitoring should be considered a complementary tool for surveillance of SARS-CoV-2 and other pathogens in the population.
The protocols for detection of the genetic material in wastewater and its quantification have been increasingly optimized (Ahmed et al., 2022) since the beginning of the pandemic. The screening of SARS-CoV-2 variants in complex matrices, such as urban wastewaters, can be very challenging (Martin et al., 2020; Crits-Christoph et al., 2021; Graber et al., 2021; La Rosa et al., 2021). However, multiple efforts were made during 2021 to consolidate the sequencing of SARS-CoV-2 RNA in wastewater samples, and its usefulness for detecting circulating variants has been demonstrated (Islam et al., 2021; Izquierdo-Lara et al., 2021; Hillary et al., 2021; Fontenele et al., 2021). Different studies have applied next-generation sequencing approaches to the viral RNA obtained from sewage samples during wastewater surveillance protocols. Many of these sequencing studies have benefited from the ARTIC Network (https://artic.network/ncov-2019) amplicon library, which amplifies the whole viral genome with 400 bp amplicons as a step prior to sequencing. This approach works correctly with sewage samples if the genetic material is not degraded below the size threshold. Sewage sequencing protocols have allowed the detection of SARS-CoV-2 variants of concern in a given area before these results were obtained in clinical samples (Jahn et al., 2021). Sequencing of sewage samples can also be used to detect novel SNVs (single nucleotide variants) or recent mutations that are not yet widespread and take longer to be detected in clinical samples (Fontenele et al., 2021). The mutations that arise from the analysis of sewage samples can be compared with clinical samples from the same regions in order to reveal whether there are regionally prevalent variants or whether some variants are being imported into a specific region (Martin et al., 2020; Crits-Christoph et al., 2021).
In this paper we present a methodology developed for sewage water surveillance of SARS-CoV-2 in Galicia (NW of Spain) through 11 wastewater treatment plants (WWTPs) located in medium-sized municipalities with no discharge from hospitals. The pipeline of the method, illustrated in the graphical abstract, integrates: wastewater sampling in the WWTPs under study (at the inlet, effluent and final discharge points), sampling of marine sediment and bioindicators, pre-treatment and biomarker quantification, RNA detection by RT-qPCR, sequencing of SARS-CoV-2 in wastewater samples, data management through a digital platform and epidemic forecasting by a predictive mechanistic model. This work provides new evidence that wastewater RT-qPCR analysis is a reliable method for early detection of SARS-CoV-2 outbreaks in the community, as validated in previous studies (Randazzo et al., 2020), and, in combination with the data reported by the health system, for monitoring and predicting the evolution of the pandemic at the municipal level. This work contributes a number of distinguishing features with respect to previously published studies, of which we highlight the following: the evaluation of the fate of the virus in wastewater and marine environments, the assessment of the efficiency of the WWTPs in removing the genetic material of the virus, and the development of a mechanistic model that, when combined with data from the health system, shows predictive capacity in forecasting the evolution of the pandemic at the municipality level. Besides, the integral approach, which includes the detection of variants from wastewater samples, is a significant source of information for monitoring the impact of the pandemic.
The detection of the genetic material of the SARS-CoV-2 virus was carried out in wastewater treatment plants (WWTP) in Galicia (NW Spain) during one year, from May 2020 to May 2021 (Figure 1).
Eleven WWTPs with different treatment technologies were selected from medium-sized municipalities with a connected population between 2000 and 23000 inhabitants (Table 1). The selected plants received no wastewater discharge from hospitals, which could increase the viral load in the wastewater and distort the measured viral load generated by the infected population. Some samples from the Nigrán and Baiona WWTPs (Figure 1), collected in March and April 2020, were also analyzed for the presence of SARS-CoV-2 genetic material. Four additional WWTPs were included in the study in June-August 2020 at the request of regional health authorities to help in the control of COVID-19 outbreaks (these WWTPs are identified as A, B, C and D in Figure 1).
Sampling was carried out in the WWTPs as well as at the discharge point in the marine environment. In the WWTPs, 1 or 2 samples per week were taken from the raw water in the inlet stream to the plant (identified as M1), the effluent of the secondary settling tank (M2) and the final discharge effluent after disinfection (M3). Samples of thickened sludge were also collected (M5) to determine whether the higher solid concentration in the sludge would allow for more sensitive detection of the genetic material of the virus. In the marine environment, samples were taken every two weeks at the final discharge point and identified as follows: seawater (M4), marine sediment (BIOIND-S) and bioindicators: wild mussels (BIOIND) and aquaculture mussels (BIOIND-A). The authors paid special attention to the detection of SARS-CoV-2 in mussels due to their extraordinary capacity for filtering water and the possibility of concentrating viral RNA, which increases the probability of detecting the viral genetic material. Samples M1-M3 were collected in the WWTPs as 24 h-composite samples using an automatic sampling system (Teledyne ISCO, model 3700 full size, USA).
M4 samples were collected at the discharge point of the WWTP, 1 m below the sea surface. M5 samples were collected as grab samples directly from the sludge settling tank. For all the samples (M1-M5), 1 L was collected and stored in amber glass bottles and kept at 4°C during storage and transportation to the laboratory (maximum 24 h). In the laboratory, the samples (M1-M5) were filtered through a 20-25 μm cellulose filter (Whatman®-Grade 4) to remove coarse suspended particles. The filtered sample was divided into two aliquots: one for wastewater concentration, RNA isolation, and detection by qPCR; and the other for biomarker determination by liquid chromatography.
Caffeine (1,3,7-trimethylxanthine) was selected as a biomarker in the sewage water for each WWTP included in this study. The biomarker was used to normalize the viral load variability due to dilution and floating population effects. Caffeine was determined by HPLC using an Agilent 1260 Infinity II (Agilent Technologies, CA, USA) chromatograph equipped with a gradient quaternary pump, oven, autosampler and a UV-vis diode array detector. 10 mL of the filtered wastewater samples were spiked with 4 μg of caffeine and then analyzed by HPLC as described by Moret et al. (2012). The spiked caffeine samples were eluted at 1.0 mL·min⁻¹ and separated on a 4.6 x 100 mm (4 μm) Poroshell 120EC-C18 column at 25°C. The mobile phase consisted of 0.1% phosphoric acid aqueous solution (eluent A) and acetonitrile (eluent B). Elution was held for 6 min in isocratic mode, using 80% of eluent A and 20% of eluent B. An injection volume of 50 μL was used and caffeine was detected in the UV range at 274 nm. All the samples were analyzed in triplicate to ensure the precision and reproducibility of the results, with a coefficient of variation below 10%.
(2019), was selected because of better quality and quantity of RNA obtained.
Briefly, 150 mL of water was transferred into a 200 mL beaker and 75 µL of the rhabdovirus SVCV (10⁵ TCID₅₀/mL), an enveloped RNA virus, was inoculated into each water sample as a concentration control. SVCV was selected because it is an enveloped virus like SARS-CoV-2 and it is frequently used in our research group. The pH of each sample was adjusted to 6.0 and 0.9 N AlCl₃ solution was added to the sample at a 1:100 ratio. The pH was readjusted to 6.0 and samples were mixed at room temperature in an orbital shaker at 150 rpm for 15 min. Then, samples were centrifuged at 1700g for 20 min in a Sorvall ST Plus Series centrifuge (Thermo Scientific, USA) and the pellet was resuspended in 10 mL of 3% beef extract at pH 7.5 and transferred to 15 mL centrifuge tubes (Falcon). Samples were mixed again at room temperature in an orbital shaker at 200 rpm for 10 min and centrifuged at 1900g for 30 min in the Allegra™ X-22R Centrifuge (Beckman Coulter). Finally, the pellet was resuspended in 1 mL 1X PBS and samples were stored at -20°C until RNA isolation.
Three procedures were used to extract RNA from the concentrated sewage water, seawater, marine sediment and bioindicator (mussel) samples. The concentration and purity of the isolated RNA were determined with a NanoDrop™ 1000 spectrophotometer (NanoDrop Technologies, Inc., DE, USA). RNA was kept at -80°C until further use.
Before SARS-CoV-2 detection, the viral concentration methodology was validated by quantitative PCR using the SVCV primers N-SVCV-For and N-SVCV-rev (Table 2). The results of this validation are detailed in the Supplementary material S1. SARS-CoV-2 RNA was detected by TaqMan real-time RT-PCR using the kit GoTaq® Probe 1-Step RT-qPCR System, 12.5 mL (Promega, Wisconsin, US).
Two sets of oligonucleotide primers and probes designed by the US Centers for Disease Control and Prevention (CDC) were used to target two regions of the nucleocapsid gene (N), together with the primers and probe that target the gene E (Table 2). For each target gene, each sample was analysed by qPCR in two technical replicates (6 qPCRs per sample) using a StepOnePlus Real-time PCR System (Applied Biosystems, USA). Each reaction mix (20 µL) contained 10 µL GoTaq® Probe qPCR Master Mix with dUTP (2X), supplemented with CXR Reference Dye, 30 µM, according to the manufacturer's protocol, 0.4 µL GoScript™ RT Mix for 1-Step RT-qPCR, 1 µL of each primer pair (10 µM), 0.5 µL probe (10 µM), 2.1 µL of Ultrapure™ distilled water (Invitrogen) and 5 µL of each RNA sample. The thermal cycling conditions were 45°C for 15 min, followed by preheating at 95°C for 2 min and 45 amplification cycles at 95°C for 3 s and 55°C for 30 s. Standard curves were added on each qPCR plate. They were composed by seven
Five samples of interest were selected for ARTIC amplicon-based sequencing of SARS-CoV-2 in order to identify SARS-CoV-2 variants. Sewage samples were selected based on qPCR Cts for genes N and E below the threshold of 30. The only exception was a sample selected based on its nature (sampled from a mink farm in which the propagation of the virus to the animals was reported). This sample was characterized by lower Cts but it was also included. The second criterion was the sampling time. Samples were selected from the months of December 2020 and January 2021, which were characterized by the maximum levels of viral spread in Galicia (NW Spain). The selected samples belong to the WWTPs in the towns of Baiona, Melide and Noia (Figure 1). Starting from the concentrated samples and the RNA extraction explained in sections "2.4. RNA isolation" and "2.5. 
SARS-CoV-2 detection by qPCR assay", selected samples were subjected to reverse transcription to cDNA using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems) following the manufacturer's instructions. The obtained cDNA was amplified using the ARTIC primer set V.3 (Integrated DNA Technologies, USA), developed by the ARTIC Network, which covers the entire genome of SARS-CoV-2 with amplicons of 400 bp (https://github.com/articnetwork/artic-ncov2019). Two PCR reactions were performed for every sample, each one with one pool of ARTIC primers. PCR reactions consisted of 12 µL of DreamTaq Master Mix, 4 µL of the respective ARTIC primer pool (10 µM), 2.5 µL cDNA and 6.5 µL of nuclease-free water for a total reaction volume of 25 µL. Cycling conditions were 1x (98°C, 30 s), 30x (98°C, 15 s; 65°C, 5 min). After this, PCR products were merged for each sample and purified. Sample quality and concentration were measured and sequencing libraries were prepared using the Nextera XT DNA Library Preparation Kit (Illumina). Finally, amplicon sequencing libraries were sequenced on an Illumina NovaSeq 6000 instrument (Macrogen, Korea).
Reads were quality-checked using the FastQC tool on OmicsBox (Biobam; https://www.biobam.com/omicsbox). Paired reads were trimmed in the QIAGEN CLC Genomics Workbench 20.0 (https://digitalinsights.qiagen.com/). Quality trimming was performed by removing the first 20 nts of each read and applying a quality threshold of 0.01 (Phred=20). Nextera adapter sequences were trimmed as well. Trimmed reads were mapped against a SARS-CoV-2 reference genome (ID: NC_045512). Mapping parameters were set to Match score = 1, Mismatch cost = 2, Insertion cost = 3, Deletion cost = 3, Length Fraction = 0.5, Similarity Fraction = 0.8.
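The quality threshold of 0.01 quoted above is an error probability, and it corresponds to Phred 20 through the standard relation P = 10^(-Q/10). The short sketch below shows that conversion and the fixed removal of the first 20 nts. The function names are illustrative assumptions, and the code does not attempt to reproduce the trimming algorithm of the CLC Genomics Workbench.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def trim_leading(read: str, n: int = 20) -> str:
    """Drop the first n nucleotides of a read (hypothetical helper)."""
    return read[n:]


if __name__ == "__main__":
    print(phred_to_error_prob(20))
    print(trim_leading("A" * 20 + "CGT"))
```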
Average mapping coverage was calculated for each sample based on the sequencing depth of the mapped reads (considering the bases of the mapped reads and the bases that make up the reference genome). Mapped read tracks were generated and analyzed for variant detection using the basic variant detection tool of the CLC Genomics Workbench 20.0.1. The variant analysis parameters were set to Ploidy = 1, Nonspecific read matches ignored, Minimum coverage = 10, Minimum variant read count = 2, Minimum frequency = 5%. Frequency was calculated as the number of reads carrying the mutation relative to all reads covering the position. Only mutations that caused changes in the protein sequence were considered for further analyses. Known mutations that define variants of current concern were retrieved from the CoVariants database (CoVariants.org). Moreover, novel mutations not defined in previously described SARS-CoV-2 variants of concern were retrieved from the sequencing data with stricter filters (Minimum coverage = 100, Quality > 30, Variant reads > 10 and Frequency > 5%).
SARS-CoV-2 genomes sequenced from Galician patients were downloaded from the GISAID database (Elbe and Buckland-Merrett, 2017), from the beginning of the pandemic (December 2020) until the end of this study (27/04/2021). A total of 500 genomes were downloaded and used for further analyses. Mutations were analyzed in comparison to the Wuhan reference using the Nextclade tool (https://clades.nextstrain.org). The frequency of each mutation was calculated as the number of genomes in which the mutation appeared relative to the total number of genomes. A comparative analysis was performed to study which mutations were shared between Galician clinical samples and the studied sewage samples.
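The coverage, frequency and filtering rules described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the CLC Genomics Workbench implementation; the class, the function names and the example values are assumptions introduced here. The reference length of 29,903 nt is that of NC_045512.

```python
from dataclasses import dataclass

REFERENCE_LENGTH = 29_903  # length of NC_045512 in nucleotides


@dataclass
class VariantCall:
    mutation: str         # e.g. "S:D614G"
    coverage: int         # reads covering the position
    variant_reads: int    # reads carrying the mutation
    quality: float = 0.0  # average base quality at the position

    @property
    def frequency(self) -> float:
        """Reads carrying the mutation divided by all reads at the position."""
        return self.variant_reads / self.coverage if self.coverage else 0.0


def average_coverage(mapped_bases: int, ref_len: int = REFERENCE_LENGTH) -> float:
    """Mean depth: total mapped bases over the reference genome length."""
    return mapped_bases / ref_len


def passes_basic(v: VariantCall) -> bool:
    """Basic filter: coverage >= 10, variant reads >= 2, frequency >= 5%."""
    return v.coverage >= 10 and v.variant_reads >= 2 and v.frequency >= 0.05


def passes_strict(v: VariantCall) -> bool:
    """Stricter filter for novel mutations, as listed in the text."""
    return (v.coverage > 0 and v.coverage >= 100 and v.quality > 30
            and v.variant_reads > 10 and v.frequency > 0.05)


def genome_frequency(mutation: str, genomes: list[set[str]]) -> float:
    """Fraction of clinical genomes in which a mutation appears."""
    return sum(mutation in g for g in genomes) / len(genomes)


if __name__ == "__main__":
    call = VariantCall("S:D614G", coverage=200, variant_reads=180, quality=35)
    print(call.frequency, passes_basic(call), passes_strict(call))
```

Keeping the two filters as separate predicates mirrors the text: the basic filter applies to all variant calls, while the stricter one is reserved for mutations not yet tied to a described variant of concern.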
Genomes were also downloaded from GISAID for clinical samples collected in Portugal and the Spanish Autonomous Regions during the months of December 2020 and January 2021 (a total of 865 and 1,862 genomes respectively were considered). The same comparative analysis between clinical and sewage samples in the same winter period was performed. The online tool covidcg (covidcg.org) was used to determine the worldwide distribution of several mutations found in the sewage samples. Specifically, mutations that were not shared with clinical samples, i.e. mutations exclusively identified in sewage and not related to specific variants of interest so far, were analyzed using this tool.
The municipalities included in this study are of small-medium size, between 2000 and 23000 inhabitants (Table 1). For this population range, the effects of stochasticity in the dynamics are significant, and therefore they need to be taken into account. In a stochastic regime, the dynamics corresponding to
Model calibration: The model calibration was performed using two different time course data sets (number of viral copies in wastewater and number of infected persons reported by the health system). Data were partitioned into periods of at most 7 days in which the restriction levels can be considered constant. The model has only two parameters for calibration. We obtained 4 parameter vectors, corresponding to 4 different levels of restrictions. Interestingly, there were no significant variations in the parameters among the different WWTPs.
Process variability monitoring: In order to assess the variability of the process due to the effects of dilution and floating population, we monitor two auxiliary variables: the concentration of caffeine in the sample (previously reported as a reliable biomarker), and the daily cell-phone mobility data for the municipalities in the study (provided by Nommon Solutions and Technologies, Spain).
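To illustrate why stochasticity matters at this population scale, the sketch below simulates a generic two-parameter stochastic SIR model with Gillespie's stochastic simulation algorithm (SSA) and summarizes many realizations by their mean and standard deviation over a 7-day horizon. The equations of Model-1 are not reproduced in this text, so the SIR structure, the parameter names `beta` and `gamma`, and all numerical values are assumptions chosen for illustration only; the published code at the repository cited in the results should be consulted for the actual model.

```python
import random
import statistics


def gillespie_sir(s, i, r, beta, gamma, t_end, rng):
    """One SSA realization of a stochastic SIR model.

    Returns the number of infected individuals at time t_end (days).
    """
    t = 0.0
    n = s + i + r
    while i > 0:
        rate_inf = beta * s * i / n   # propensity of a new infection
        rate_rec = gamma * i          # propensity of a removal
        total = rate_inf + rate_rec
        if total <= 0:
            break                     # no event can fire any more
        t += rng.expovariate(total)   # waiting time to the next event
        if t > t_end:
            break
        if rng.random() * total < rate_inf:
            s, i = s - 1, i + 1
        else:
            i, r = i - 1, r + 1
    return i


def forecast(s, i, r, beta, gamma, t_end=7.0, runs=1000, seed=0):
    """Mean and standard deviation of the infected count at t_end."""
    rng = random.Random(seed)
    finals = [gillespie_sir(s, i, r, beta, gamma, t_end, rng)
              for _ in range(runs)]
    return statistics.mean(finals), statistics.stdev(finals)


if __name__ == "__main__":
    # Hypothetical municipality of about 5000 inhabitants with 20 infected.
    mean, sd = forecast(4980, 20, 0, beta=0.3, gamma=0.1, runs=500)
    print(f"infected after 7 days: {mean:.1f} +/- {sd:.1f}")
```

Because each realization is an independent random path, the spread between realizations grows with the forecast horizon, which is one reason short horizons are preferred for small populations.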
Sampling and analysis of SARS-CoV-2 in 11 WWTPs and their marine environment were carried out from May 2020 to May 2021. In addition, 3 samples from the Nigrán WWTP (M1 on March 12, 2020) and the Baiona WWTP (M1 and M2 samples on April 28, 2020) were also included in this study. Overall, 1342 samples (sewage water, seawater, marine sediment and mussels) were taken and analyzed. It (Table 3).
The viral load decreased during Nov-2020 as the active cases of the 3rd wave decreased in the area. During Dec-2020, the active cases remained constant in the region, but the viral load in M1 (raw water in the WWTP inlet) increased steadily until the end of the month (29 Dec 2020), when it was about 20 times higher than at the beginning of the month. These results anticipated the surge of COVID-19 cases in early January 2021 and the 4th wave.
March-May 2021 (with ~5% to ~30% of fully vaccinated people, respectively) but the decrease in new and active cases did not correspond with the reduction of the viral load in the WWTP raw water.
The presence of the viral RNA in the marine environment was mainly limited to the discharge points of the Cambados and Muros WWTPs (Table 4). The viral RNA was not detected in marine sediment; it was only detected in mussels (bioindicator) and seawater. The presence of the viral RNA in the marine environment is believed to be associated with uncontrolled discharge points and wastewater bypass in the plants. Moreover, the viral RNA was detected in the marine environment mainly in July-November 2020 but not in January 2021, when the highest viral loads (over 10⁶ copies/mL) were measured in the raw water to the WWTP during the 4th COVID-19 wave. As we will discuss later, it is important to remark here that RT-qPCR detection cannot be used to assess infectivity.
The mathematical model developed in this study (Model-1) shows good predictive capacity, allowing us to forecast the evolution of the pandemic (number of infected persons, observed and unobserved by the health system) in all the municipalities within a horizon of 7 days. In Figure 4 we include, as a representative illustration of the model outcome, the predictions for three different municipalities (Ares, Melide, Baiona) at different time periods. The figures, data and software codes for the whole study are available online at https://github.com/DIMCoVAR/Model-1. The model predictions (mean and standard deviations) for the total number of infected (blue) and the observed number of infected (black) are depicted in Figure 4 together with the real data obtained from wastewater samples (blue circles) and the health system (black squares). From the 10⁵ realizations of the SSA algorithm, the mean (solid blue line) and standard deviation (dotted blue line) are computed. Note that, as expected from a stochastic process, the confidence interval increases with time, making longer term predictions impractical, as has been reported in previous studies (Castro et al., 2020).
The sequencing process yielded about 18 million reads per sample. After trimming, more than 17 million reads were retained for further analyses in all the samples. Trimmed reads were mapped to the SARS-CoV-2 reference genome (ID: NC_045512), and a sufficient number of SARS-CoV-2 reads was obtained in 4 out of 5 sequenced sewage samples. It is noteworthy that the sample in which an insufficient coverage of reads was obtained was the worst sample in terms
The analysis of variants revealed a large number of mutations in our samples, ranging from 68 to 100 single amino acid changes (Table 5). Among all these mutations, it is worth mentioning the presence of non-synonymous nucleotide changes that have defined new virus lineages of concern.
These mutations, as well as the lineages that they define, are displayed in Table 5. Specifically, mutations of Spike (A222V) and the N gene (A220V) that are defining mutations of the 20E (EU1) clade, which initially expanded in Spain and spread widely across Europe, were detected at a very high frequency (88%-100%) in the Noia (929) and Melide (963 and 996) samples (Figure 5A). Moreover, mutations defining another clade that spread across Europe, named 20A.EU2, were also found in two samples, Noia (929) and Baiona (989). These defining mutations are located in ORF1b (V767L) and Spike (S477N) and showed a frequency of 10%-20% (Figure 5A). The analysis also revealed other important Spike mutations that determine relevant lineages in terms of virus spread: N501Y (mutation shared by the B.1.1.7, B.1.351 and P.1 lineages, commonly known as the UK, South African and Brazilian variants respectively) and D614G (shared by all the new variants of concern). In fact, this last mutation is one of the most frequently represented and also shows the highest read coverage. These mutations were detected in the 989 sample from Baiona. In addition to the aforementioned mutations shared by several lineages of clinical concern that have appeared throughout the pandemic, two additional mutations characteristic of the UK variant (B.1.1.7) were found in two different samples, 989-Baiona and 996-Melide. More specifically, these were S:A570D and ORF8:Q27*, which are represented in the set of reads with a frequency of 17% and 40% respectively (Figure 5A).
After identifying the known mutations defining lineages of concern that were found in the sewage samples, a comparative analysis was performed with viral genomes from patients. A set of SARS-CoV-2 genomes from 500 Galician patients collected from the beginning of the pandemic until the end of April 2021 was analyzed in order to pool the set of mutations present in the population.
Figure 6B shows the most common mutations (those that appear in at least 10 out of 500 genomes) present in the population that were associated with the aforementioned lineages of concern. Note that some of the most common mutations from sewage are also found in patients (S:A222V, S:N501Y, S:A570D, S:D614G, ORF8:Q27* and N:A220V) (Figure 5A). The information associated with viral variants in the sewage samples uncovers the mutation pool of the overall population, as seen with the matching results.
We were able to find a relatively high number of amino acid variations that until now have not been associated with relevant variants such as those presented above. A total of 87 of these novel mutations were identified in sewage samples with the filtering criteria employed (Figure 6A). Some of these mutations showed remarkably high frequencies and were found in several samples, as is the case of ORF1a:A656V, ORF1b:P314L, ORF3:W45L or N:A308S (Figure 6A). The most relevant case, ORF1b:P314L, was also found in the Galician patients at a very high frequency (Figure 6A), which, together with the previous results, shows a remarkable correlation between sewage samples and patients from the same geographic area. Certainly, many of the mutations appear at low frequencies. In order to determine whether the sample size of 500 Galician genomes was large enough, the same analysis was performed with genomes from the rest of Spain and Portugal. After the comparison, it became clear that the Galician genome database represents the mutation pool of the Iberian Peninsula quite accurately, since the results are practically identical to those already described.
Focusing on the less frequent mutations, Spike mutations not associated with variants of concern that appeared in sewage samples and not in patients were analyzed in detail. The places and dates in which they had been detected were analyzed (Supplementary material S5, Figure S4).
Although these mutations had not been detected in Galicia before the present sewage samples, S:D215H turned out to be a relatively old mutation with a large presence, especially in the UK. The other Spike mutations previously undetected in Galicia are much more recent, and their number of worldwide sequences has only started to rise since the beginning of 2021. Sewage samples can be useful to detect new mutations that may become of great importance in clinical samples in the future.
The SARS-CoV-2 pandemic has spread rapidly in Europe since February 2020, with a significant impact in Spain. The restrictive policies imposed on the population by the government resulted in a minimum impact of COVID-19 during May-June 2020. Therefore, at that moment the detection and control of new outbreaks was of utmost importance for the health authorities. The analysis of sewage water for the virus and its genetic material achieved widespread importance (Ahmed et al., 2020; Randazzo et al., 2020; Cervantes-Avilés et al., 2021). In that context, our results confirmed the capacity of sewage water surveillance to follow the evolution of the pandemic in the studied community, through SARS-CoV-2 sewage monitoring of a number of representative municipalities (medium sized and without discharges from hospitals). In addition, and in contrast to other research conducted on the presence of SARS-CoV-2 in sewage, our study also aimed at exploring the detection of the virus in the marine environment and the capacity of the WWTPs to remove the virus (Galicia is well known for its fish, shellfish and aquaculture activities). Our results confirmed the capacity of the biological reactors and the disinfection system in the WWTPs to eliminate the virus (Balboa et al., 2021; Serra-Compte et al., 2021).
The impact on the marine environment was of minor importance, and the detection of the virus in seawater and in wild and aquacultured mussels can be associated with uncontrolled discharges of wastewater and overflows from the sewage network. In terms of bioindicators, Le Guernic et al. (2022) also recently detected SARS-CoV-2 genetic material in zebra mussels exposed to raw and treated wastewater, confirming these mollusks as good environmental indicators that can be used for active surveillance of pathogenic microorganisms in environmental waters. It is important to remark here that the RT-qPCR analysis in this work targets specific RNA gene sequences and cannot be used to assess infectivity. However, several studies exploring SARS-CoV-2 stability and replication in the environment support that, after wastewater treatment, the release of infective viral particles is unlikely (Rimoldi et al., 2020; Westhaus et al., 2021). We observed that the average viral load signal over all the WWTPs peaked before the health system
To the best of our knowledge, this is the first study in which wastewater samples and data from the health system are combined to successfully predict the number of active cases of COVID-19 using a mechanistic mathematical model. The inclusion of wastewater samples endows the model with higher anticipative capacity. Moreover, the model is shown to be robust, and therefore easily extrapolated to other WWTPs of municipalities in the same population range (the low number of parameters makes it very easy to re-calibrate the model for use in very different locations and/or scenarios).
As key factors for the success (predictive capacity) and robustness of the model, we highlight the following: i) the evolution of the pandemic is evaluated locally at the level of municipalities, ii) we take into account the effects of stochasticity, such that the confidence intervals for the predictions are automatically generated, iii) the predictions are made for time horizons of 7 days.
As the pandemic progressed, the occurrence of mutations and new virus lineages caused much concern due to the increased infectivity of the virus (Korber et al., 2020; Plante et al., 2021; Volz et al., 2021a). Since this conditions the control of the pandemic, being able to find mutations of interest as early as possible using sewage is of clear interest. The most advanced mass sequencing methods have made it possible to obtain the sequence of the virus in several countries in the world (Crits-Christoph et al., 2021; Fontenele et al., 2021; Jahn et al., 2021). Despite the difficulty of sequencing, given the high degradation of the sample and the fragmentation of the viral genome, it is possible to detect specific mutations that provide guidance on the variants that may be present in the population. In the present work, lineage-defining mutations, as well as mutations shared by the most infective lineages acknowledged at the time of study, were identified. The constant emergence of new viral variants casts doubt on whether a mutation will invariably define a single lineage. However, sequencing results allow the identification of specific mutations compatible with emerging lineages of concern at the time the samples were taken. Specifically, some mutations defining the B.1.1.7 lineage at the time of sequencing were identified. This variant, also known as the Alpha variant, spread worldwide by the end of 2020 (Volz et al., 2021b), a time compatible with the detection of some of these mutations in our samples dated December 2020/January 2021.
Moreover, it is important to highlight that two of the most frequent mutations with high coverage in our analysis are ORF1b:P314L and S:D614G, which commonly co-occur and can greatly increase the infectivity of the virus (Ogawa et al., 2020). Remarkably, methods based on RT-qPCR have been developed to detect and monitor specific SARS-CoV-2 variants (see for example Lee et al., 2021; Wurtzer et al., 2022). Furthermore, this work showed a marked agreement between the most frequent mutations detected in sewage and in patients of the same geographical area, which constitutes further evidence of the potential of sewage analysis to represent the evolution of the pandemic. Novel mutations are appearing continually. In sewage samples we detected mutations that had appeared recently and had not yet been detected in the geographical area of study. Novel mutations that the virus is acquiring may end up having great relevance, so early detection in sewage could be very useful.
This study confirmed that the analysis of the genetic material of SARS-CoV-2 in the sewage water in the WWTP inlet stream is a sensitive and practical method to detect new outbreaks of COVID-19 and to evaluate the evolution of the pandemic in the community. As part of the integral approach for SARS-CoV-2 surveillance, this study developed a mechanistic model to predict the number of infected people in municipalities of small-medium size based on the viral load in the sewage water and the data of infections from the public health system. Our results confirm that the model is robust and has predictive capacity, being capable of forecasting the evolution of the pandemic in the municipalities within a time horizon of seven days.
We highlight as key aspects for the predictive capability and robustness of the model that i) it is local (municipality-based) and ii) it takes into account the stochastic nature of the process. Moreover, the analysis of variants showed a marked agreement between the most frequent mutations detected in sewage and in patients of the same geographical area. The treatment of wastewater in biological reactors and the subsequent disinfection favors the elimination of the virus from treated sewage water. The study of the presence of the virus in the marine environment at the discharge point of the WWTP revealed a minor impact of SARS-CoV-2 in seawater, marine sediment, and wild and aquacultured mussels.
The authors gratefully acknowledge the support of this work through the project "DIMCoVAR" funded in the program "Fondo Supera COVID" of the CRUE (Spanish Universities)-Santander Foundation. The authors also thank Aguas de Galicia and Consellería de Sanidade - Xunta de Galicia for their support and funding of this study. The IIM-CSIC group is also funded by projects
References
The turning point and end of an expanding epidemic cannot be precisely forecast
Approaches applied to detect SARS-CoV-2 in wastewater and perspectives post-COVID-19
The presence of SARS-CoV-2 RNA in the feces of COVID-19 patients
Gastrointestinal Manifestations of SARS-CoV-2 Infection and Virus Load in Fecal Samples From a Hong Kong Cohort: Systematic Review and Meta-analysis
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR
Genome Sequencing of Sewage Detects Regionally Prevalent SARS-CoV-2 Variants
Data, disease and diplomacy: GISAID's innovative contribution to global health
HERA Incubator: Anticipating together the threat of COVID-19 variants
Wastewater and surface monitoring to detect COVID-19 in elementary school settings: The Safer at School Early Alert project
SARS-CoV-2 from faeces to wastewater treatment: What do we know? A review
High-throughput sequencing of SARS-CoV-2 in wastewater provides insights into circulating variants
Implementing building-level SARS-CoV-2 wastewater surveillance on a university campus
Stochastic Simulation of Chemical Kinetics
Near real-time determination of B.1.1.7 in proportion to total SARS-CoV-2 viral load in wastewater using an allele-specific primer extension PCR strategy
Spatial and temporal variability and data bias in wastewater surveillance of SARS-CoV-2 in a sewer system
Monitoring SARS-CoV-2 in municipal wastewater to evaluate the success of lockdown measures for controlling COVID-19 in the UK
Molecular Epidemiology of SARS-CoV-2 in Diverse Environmental Samples Globally
Monitoring SARS-CoV-2 Circulation and Diversity through Community Wastewater Sequencing, the Netherlands and Belgium
Detection of SARS-CoV-2 variants in Switzerland by genomic analysis of wastewater samples
SARS-CoV-2 in wastewater: State of the knowledge and research needs
Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
Rapid screening for SARS-CoV-2 variants of concern in clinical and environmental samples using nested RT-PCR assays targeting key mutations of the spike protein
Tracking COVID-19 with wastewater
SARS-CoV-2 detection in wastewater as an early warning indicator for COVID-19 pandemic. Madrid region case study
Evaluating recovery, cost, and throughput of different concentration methods for SARS-CoV-2 wastewater-based epidemiology
Quantitative SARS-CoV-2 alpha variant B.1.1.7 tracking in wastewater by allele-specific RT-qPCR
First evidence of SARS-CoV-2 genome detection in zebra mussel (Dreissena polymorpha)
Zebrafish larvae are unable to mount a protective antiviral response against waterborne infection by spring viremia of carp virus
Tracking SARS-CoV-2 in Sewage: Evidence of Changes in Virus Variant Predominance during COVID-19 Pandemic
Presence of SARS-Coronavirus-2 RNA in Sewage and Correlation with Reported COVID-19 Prevalence in the Early Stage of the Epidemic in The Netherlands
Simple and Fast Methods Based on Solid-Phase Extraction Coupled to Liquid Chromatography with UV Detection for the Monitoring of Caffeine in Natural, and Wastewater as Marker of Anthropogenic Impact
The D614G mutation in the SARS-CoV2 Spike protein increases infectivity in an ACE2 receptor dependent manner
Transient hysteresis and inherent stochasticity in gene regulatory networks
Measurement of SARS-CoV-2 RNA in wastewater tracks community infection dynamics
Detection Of Genomic Variants Of SARS-CoV-2 Circulating In Wastewater By High-Throughput Sequencing
A new mathematical model for relative quantification in real-time RT-PCR
Spike mutation D614G alters SARS-CoV-2 fitness
Making waves: Wastewater-based epidemiology for COVID-19 - approaches and challenges for surveillance and prediction
Interlaboratory Comparative Study to Detect Potentially Infectious Human Enteric Viruses in Influent and Effluent Waters
SARS-CoV-2 RNA in wastewater anticipated COVID-19 occurrence in a low prevalence area
Quantification on the LightCycler
Presence and infectivity of SARS-CoV-2 virus in wastewaters and rivers
Elimination of SARS-CoV-2 along wastewater and sludge treatment processes
Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity
Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England
Detection of SARS-CoV-2 in raw and treated wastewater in Germany - Suitability for COVID-19 surveillance and potential transmission risks
Naming the coronavirus disease (COVID-19) and the virus that causes it
Omicron in wastewater settled solids using mutation-specific assays is associated with regional detection of variants in clinical samples
Prolonged presence of SARS-CoV-2 viral RNA in faecal samples
Evaluation of lockdown effect on SARS-CoV-2 dynamics through viral genome quantification in waste water
SARS-CoV-2 genome quantification in wastewaters at regional and city scale allows precise monitoring of the whole outbreaks dynamics and variants spreading in the population
Infectious SARS-CoV-2 in Feces of Patient with Severe COVID-19
Author Contributions Statement
Conceptualization, Methodology, Resources, Supervision, Writing - Original Draft, Writing - Review & Editing.
Raquel Ríos-Castro: Methodology, Investigation.
Conceptualization, Methodology, Writing - Original Draft, Writing - Review & Editing.
Susana Gouveia: Investigation, Resources.
Adrián Cabo: Investigation, Resources.
Formal analysis, Investigation.
Formal analysis, Investigation.
Methodology, Formal Analysis, Software, Validation, Writing - Review & Editing.
Noelia Fajar: Data curation, Formal Analysis, Investigation, Validation.
Raquel Aranguren: Investigation.
Lorena Valdés: Investigation.
Pedro Payo: Conceptualization, Resources, Funding acquisition.
Alonso: Conceptualization, Methodology, Funding acquisition.
Conceptualization, Formal analysis, Investigation, Funding acquisition, Resources, Supervision, Writing - Original Draft, Writing - Review & Editing.