key: cord-0868426-wczwwbzn
authors: Andriamandimby, Soa Fy; Brook, Cara E.; Razanajatovo, Norosoa; Randriambolamanantsoa, Tsiry H.; Rakotondramanga, Jean-Marius; Rasambainarivo, Fidisoa; Raharimanga, Vaomalala; Razanajatovo, Iony Manitra; Mangahasimbola, Reziky; Razafindratsimandresy, Richter; Randrianarisoa, Santatra; Bernardson, Barivola; Rabarison, Joelinotahiana Hasina; Randrianarisoa, Mirella; Nasolo, Frédéric Stanley; Rabetombosoa, Roger Mario; Ratsimbazafy, Anne-Marie; Raharinosy, Vololoniaina; Rabemananjara, Aina H.; Ranaivoson, Christian H.; Razafimanjato, Helisoa; Randremanana, Rindra; Héraud, Jean-Michel; Dussart, Philippe
title: Cross-sectional cycle threshold values reflect epidemic dynamics of COVID-19 in Madagascar
date: 2021-11-29
journal: Epidemics
DOI: 10.1016/j.epidem.2021.100533
sha: 33c873738ba6e1c530ce23074a170ccafab49e56
doc_id: 868426
cord_uid: wczwwbzn

As the national reference laboratory for febrile illness in Madagascar, we processed samples from the first epidemic wave of COVID-19, between March and September 2020. We fit generalized additive models to cycle threshold (C(t)) value data from our RT-qPCR platform, demonstrating a peak in high viral load, low-C(t) value infections temporally coincident with peak epidemic growth rates estimated in real time from publicly-reported incidence data and retrospectively from our own laboratory testing data across three administrative regions. We additionally demonstrate a statistically significant effect of duration of time since infection onset on C(t) value, suggesting that C(t) value can be used as a biomarker of the stage at which an individual is sampled in the course of an infection trajectory. As an extension, the population-level C(t) distribution at a given timepoint can be used to estimate population-level epidemiological dynamics. We illustrate this concept by adopting a recently-developed, nested modeling approach, embedding a within-host viral kinetics model within a population-level Susceptible-Exposed-Infectious-Recovered (SEIR) framework, to mechanistically estimate epidemic growth rates from cross-sectional C(t) distributions across three regions in Madagascar. We find that C(t)-derived epidemic growth estimates slightly precede those derived from incidence data across the first epidemic wave, suggesting delays in surveillance and case reporting. Our findings indicate that public reporting of C(t) values could offer an important resource for epidemiological inference in low surveillance settings, enabling forecasts of impending incidence peaks in regions with limited case reporting.

As the national reference laboratory for febrile illness in Madagascar, we processed samples from the first epidemic wave of COVID-19, between March and September 2020. We fit generalized additive models to cycle threshold (C t ) value data from our RT-qPCR platform, demonstrating a peak in high viral load, low-C t value infections temporally coincident with peak epidemic growth rates estimated in real time from publicly-reported incidence data and retrospectively from our own laboratory testing data across three administrative regions. We additionally demonstrate a statistically significant effect of duration of time since infection onset on C t value, suggesting that C t value can be used as a biomarker of the stage at which an individual is sampled in the course of an infection trajectory. As an extension, the populationlevel C t distribution at a given timepoint can be used to estimate population-level epidemiological dynamics. We illustrate this concept by adopting a recently-developed, nested modeling approach, embedding a within-host viral kinetics model within a population-level Susceptible-Exposed-Infectious-Recovered (SEIR) framework, to mechanistically estimate epidemic growth rates from cross-sectional C t distributions across three regions in Madagascar.

We find that C t -derived epidemic growth estimates slightly precede those derived from incidence data across the first epidemic wave, suggesting delays in surveillance and case reporting. Our findings indicate that public reporting of C t values could offer an important resource for epidemiological inference in low surveillance settings, enabling forecasts of impending incidence peaks in regions with limited case reporting.

3 international travelers [1] . Subsequent to this introduction, the first wave of the COVID-19 epidemic was geographically staggered, with early cases in May 2020 largely concentrated in the eastern city of Toamasina, part of the Atsinanana administrative region, followed by a more severe outbreak which peaked in July 2020 in the capital city of Antananarivo, part of the Analamanga administrative region. Test positive rates exceeded 50% at the epidemic peak for both regions and at the national level, indicating widespread underreporting [2] , a common feature of COVID-19, for which some 20-40% of infections are thought to be entirely asymptomatic [3] [4] [5] [6] . Early reporting on the first epidemic wave in Madagascar indicated an extremely high (56.6%) proportion of asymptomatic cases, based on targeted surveillance of symptomatic patients and their contacts [1] .

Madagascar closed its borders to international air travel on 20 March 2020 and, subsequent to identification of the first case, implemented several non-pharmaceutical interventions aimed at curbing epidemic spread, including non-essential business closures, curfews, stay-at-home orders, and mandates for social distancing. These restrictions were relaxed after the first epidemic subsided in September 2020 but have since been re-implemented in the face of a second epidemic wave. In other regions of the globe, widespread efforts to estimate the effective reproduction number, R t , for COVID-19 at national, regional, and local levels [7] have been used to inform public health interventions and retrospectively assess their effectiveness [8] :

disease transmission rates are increasing at R t > 1 and decreasing at R t < 1. Estimation of R t , or its related counterpart, r, the epidemic growth rate [9, 10] , from available case count data is challenged by limitations or variability in surveillance, uncertainty surrounding the shape of disease parameter distributions, and delays in reporting [8] . Despite the enormity of these challenges in the limited surveillance settings common to many lower-and middle-income countries (LMICs), real-time estimation of R t from COVID-19 case-counts has been attempted for most regions of the globe [7] and has been implemented locally in Madagascar [11] [12] [13] [14] .

Recent methodological advances have introduced a new resource to the epidemiological toolkit by which to conduct real time estimation of epidemic trajectories [15] , one that leverages the often-discarded cycle threshold, or C t , value, that is returned as an-inverse log-10 measure of viral load from all RT-qPCR-based platforms [16] . After observing that SARS-CoV-2 viral loads-and, as a consequence RT-qPCR C t values-demonstrate a predictable trajectory following the onset of infection [17] [18] [19] , Hay et al. 2021 showed that the C t value can be used as J o u r n a l P r e -p r o o f 4 a biomarker of time since infection and, consequently, be leveraged to back-calculate infection incidence, in a manner analogous to previous work leveraging serological titer information in other systems [20] [21] [22] . Probabilistically, a randomly-selected infection is more likely to be early in its infection trajectory when identified during a growing epidemic and later in its trajectory in a declining epidemic [23, 24] , and as a consequence, the population-level distribution of C t values for any viral infection is expected to shift across the duration of an epidemic. Indeed, low-C thigh-viral-load infections have been observed to coincide with growing COVID-19 epidemics and high-C t -low-viral-load infections with declining epidemics in several settings [18, 25, 26] .

Exploiting this phenomenon, Hay et al. 2021 developed a method that embeds a within-host, viral kinetics model in a population-level disease transmission model to derive epidemic trajectories from cross-sectional C t samples. Because this method depends on quantitative information captured in the biological sample itself, rather than the relationship between case count and reporting date, C t value estimation more accurately predicts true epidemic trajectories than traditional incidence estimation in settings with uneven surveillance [15] .

During the early phase of the COVID-19 epidemic in Madagascar, the Virology Unit laboratory (National Influenza Centre) at the Institut Pasteur of Madagascar (IPM) processed the majority of all SARS-CoV-2 testing samples derived from 114 districts across 6 major provinces in the country. Consistent with findings reported elsewhere [18, 25, 26] , we observed a population-level decline in C t values derived from RT-qPCR-testing in our laboratory, coincident with the epidemic peak across the first wave of COVID-19 in Madagascar. We here adopt the methods presented by Hay et al. 2021 to estimate COVID-19 epidemic growth rates at the national level (2018 population ~26 million [27] ) and in two major administrative regions of Madagascar: Atsinanana (east coast of Madagascar; 2018 population ~1.5 million [27] ) and

Analamanga (including Antananarivo, capital city; 2018 population ~3.6 million [27] ). We evaluate the robustness of this C t -based method in comparison with epidemic growth rates derived from more traditional case-count methods applied to the same regions and at the national level. Atsinanana and Analamanga comprised the geographic epicenter of the first COVID-19 wave in Madagascar but also the source of the majority of samples sent for testing. Given that vast majority of testing resources across the duration of the Madagascar pandemic have been concentrated in Antananarivo [28] , 'national' reported trends may belie temporally lagged disease dynamics in more rural regions of the country. We advocate for more widespread C t J o u r n a l P r e -p r o o f 5 reporting from rural areas, as our analysis indicates that C t -based R t estimation could be a particularly robust method of inferring epidemiological trajectories in low-surveillance settings.

IPM SARS-CoV-2 C t Data.

Methods for collection, transport, and processing of SARS-CoV-2 testing samples at IPM have been previously described [1] . Briefly, nasopharyngeal and oropharyngeal swabs were Madagascar [1] . Despite these efforts, the first infection with SARS-CoV-2 was first confirmed in Madagascar on 19 March 2020, after which testing efforts were refocused to screen contacts of confirmed cases, as well as any patients reporting to clinics or hospitals with symptoms aligned with pre-existing surveillance systems for influenza-like illness (ILI) and severe acute respiratory infection (SARI), following guidelines from WHO [29] . IPM has processed samples from ILI and SARI surveillance platforms in Madagascar for over a decade; in April 2020, we added SARS-CoV-2 to the routine screening of incoming specimens [1, 30, 31] .

Due to a dearth of available reagents in the early stages of the epidemic, our lab used seven different WHO-recommended kits and corresponding protocols [32] to assay infection in these samples [1] : Charity Berlin [33] , Hong Kong University [34] are derived from these positive test results, as C t -values were not reliably recorded following negative results. We further subset our data as appropriate for each analysis of interest.

Estimating growth rates from IPM case data.

We first sought to obtain an estimate of new daily cases reported from our laboratory to the Malagasy government between 18 March and 30 September 2020. To this end, we reduced our dataset to include only sampling from the first reported positive test date for each unique patient; we assumed that reinfection was unlikely within the short duration of our study and that any subsequent positive tests were reflective of longer-duration infections in repeatedly sampled individuals. A patient was considered "positive" for SARS-CoV-2 infection if any test for any SARS-CoV-2 target was positive, and the results of the other samples were not inconsistent with this finding. We then summed cases by date at the national level and for two administrative regions (Atsinanana and Analamanga) that reported the majority of total cases across the study period overall. In total, 5,276 cases were reported from our laboratory across the study period at the national level, 3,505 in Analamanga region and 758 in Atsinanana. Daily cases for the two target regions and for the nation at large are summarized in Fig. 1 . September 2020), across three focal regions. Darker shading shows data derived from the IPM RT-qPCR platform, while lighter shading depicts data nationally reported and consolidated on [11] . Righthand y-axis shows corresponding epidemic growth rate computed from case count data in EpiNow2 [35] , with darker line corresponding to computation from IPM data and lighter line to computation from publicly reported data; background shading around each line depicts the corresponding 50% quantile by EpiNow2 [35] .

We applied the opensource R-package EpiNow2 [35] to the daily incidence data to estimate the epidemic growth rate for COVID-19 across the study period. EpiNow2 builds on previous R t estimation packages [36] , using a non-stationary Gaussian process model to estimate the instantaneous time-varying reproduction number, R t , and the corresponding time-varying epidemic growth rate, r, while incorporating uncertainty in the generation interval. Following best recommended practices [8] , we modeled the SARS-CoV-2 incubation period as a lognormal distribution with a mean of 1.621 days (sd=0.064) and a standard deviation of 0.418 days (sd=0.061) [37] and the generation time interval as a gamma distribution with a mean of 3.635 J o u r n a l P r e -p r o o f 8 (sd=0.71) and a standard deviation of 3.075 (0.77) [38] . Since the IPM testing data reported the actual date of sample collection, no reporting delay was incorporated in our growth rate estimation.

To compare our laboratory-derived epidemic growth estimates with those undertaken in real time in Madagascar, we collaborated with colleagues who recorded data on the number of new PCR-confirmed cases reported daily on national television by the Ministry of Health of the Government of Madagascar across the duration of the first epidemic wave. From these daily case estimates, we used the EpiNow2 package [35] to again estimate the epidemic growth rate across the same study period, assuming the same incubation period and general time interval referenced above [37, 38] . For these estimates, we followed methods outlined in [39] , to additionally model a reporting delay from a log-normal distribution fit to 100 subsamples with 1000 bootstraps from a publicly available linelist that collates data globally for COVID-19 cases for which both infection onset and notification dates are available [40] .

In our next series of analyses, we leveraged information captured in the individual C t value returned from each positive test. To control for extensive variation in qPCR test and target 9 only using the current version of the kit: SarbeCoV TibMolBiol). We fit linear mixed effect regression models in the lme4 [42] package in R to the resulting C t curves returned from each testing platform across the dilution series and used the fitted slope and y-intercept of each regression equation to reproject all C t values in our dataset to correspond to results returned from the TaqPath N gene test. We report, analyze, and visualize these TaqPath N-corrected Ct values in all analyses. We selected the TaqPath N gene C t as the basis for reporting because our laboratory adopted the TaqPath assay for exclusive use after supply chains stabilized nine months into the pandemic; of the three targets returned from the TaqPath assay, the N-gene is the most common target across other SARS-CoV-2 diagnostic platforms [43] .

After observing a population-level dip in the average C t value recovered from our testing platform, roughly coincident with the epidemic peak in the three regions of interest, we asked the broad question, what is the population level time-trend of SARS-CoV-2 C t values across these three regions? To address this question, we compiled all positive tests from the first date of positive testing for each patient, recording the date, region, test, and target that corresponded to each corrected C t value, in addition to the numerical ID and the symptom status (asymptomatic, symptomatic, or unknown) of the patient from which it was derived. Symptom statuses were recorded by medical staff at the timepoint of sampling and merely indicated whether or not the patient presented with symptoms; thus, 'asymptomatic' classification did exclude the possibility that the same patient reported symptoms at later or earlier timepoints across the course of infection. The resulting data consisted of 8,055 discrete C t values, corresponding to 5,280 patients, most of whom were tested using multiple tests and/or gene targets of interest. C t values for these positive test results ranged from 6.36 to 39.91. When reprojected to TaqPath N levels, the range shifted from 7.82 to 39.99, such that 507 C t values classed as "positive" by the cutoff thresholds on other platforms exceeded the C t <=37 threshold for positivity on the TaqPath platform. These samples were nonetheless retained for generalized additive modeling (GAM) of longitudinal C t trends but GAM-projected C t values still exceeding the TaqPath cutoff were later excluded in mechanistic modeling of transmission trends fitted to positive data.

Using the mgcv package [44] in the R statistical program, we next fit a GAM in the gaussian family to the response variable of corrected C t value, incorporating a numerical J o u r n a l P r e -p r o o f thinplate smoothing predictor of date, and random effects on the categorical variables of test (Charity Berlin, Hong Kong, Da An, LightMix SarbeCoV, SarbeCoV TibMolBio, TaqPath, or GeneXpert), target (E,N,Orf1a/b, or S), and individual patient ID. We refit the model to three different subsets of the data, encompassing the Atsinanana and Analamanga regions, as well as the entire National data as a whole. We then used the resulting fitted GAMs to simulate population-level C t distributions for each date in our dataset, excluding the effects of test and target in the predict.gam function from mgcv. This produced a test-and target-controlled average C t estimate for each positive patient at the timepoint of sampling. We used these GAM-simulated C t distributions to carry out mechanistic model fitting in subsequent analyses, as described below, excluding 15 patients with Ct projections >37, which exceeded the positive threshold for the TaqPath N gene assay.

To validate observations from the literature which indicate that the viral load and corresponding C t value follow a predictable trajectory after the onset of SARS-CoV-2 infection [17, 18] within our own study system, we next concentrated analyses on a subset of 4,822 Ct values (corresponding to 2,842 unique samples derived from 2,404 unique patients), for which the timing of symptom onset was also recorded. For each of these samples, we randomly drew a corresponding incubation time from the literature-derived log-normal distribution above [37] to approximate the timing of infection onset. To answer the question, how does C t vary with time since symptom onset?, we fit a GAM in the gaussian family to the resulting data with a response variable of C t and a numerical thinplate smoothing predictor of days since infection onset, as well as random effects of test, target, and patient ID. After fitting, we used the predict.gam function from the mgcv package, excluding the effects of target and test, to produce a distribution of C t values corresponding to times since symptom onset (one per each unique patient ID). We used these C t trajectories to estimate parameters for the within-host viral kinetics model described in final methods section below.

We next asked the question, does C t value vary in symptomatic vs. asymptomatic cases? J o u r n a l P r e -p r o o f 11 Our first investigation of this question required only reconsideration of the individual trajectory GAM described above to include additional predictor variables of age and symptom status, in addition to days since infection onset, target, and test. Since symptom status was recorded only at the first timepoint of sampling for each individual, we limited our individual trajectory dataset to a 4,072 datapoint subset of Ct values from 2,404 discrete patients reporting both date of symptom/infection onset and symptom status at the timepoint of sampling; as mentioned previously, 'asymptomatic' classification in our dataset included patients reporting symptoms from earlier or later timepoints prior to or following the sampling date. Thus, this GAM tested whether symptom status and C t value interacted merely as a function of the timing since symptom onset (e.g. high C t values were recovered from patients either very early or late in their infection trajectory), or whether independent interactions between symptomatic vs.

asymptomatic infections and C t were also present, while also controlling for age.

After observing results, we extended this analysis by applying another GAM in the gaussian family to a 7,937 datapoint subset of the data used to model longitudinal C t trajectories at the National level, which additionally reported symptom status (symptomatic vs. asymptomatic) at the timepoint of first sampling for 5,202 unique patients. Corrected C t values in this data subset ranged between 7.82 and 39.99. This GAM incorporated a response variable of C t and random effects predictor smoothing terms of symptom status, test, target, and patient ID, as well as a numerical smoothing predictor for age of the infected patient.

Finally, following newly-developed methods [15] , we sought to estimate the epidemic growth rate across our three regions of interest using cross-sectional C t distributions and compare these results against estimates derived from the case count methods described above. In their original paper, Hay et al. 2021 [15] applied two population-level models (a Susceptible-Exposed- independent of delays and biases in reporting [15] .

To this end, we adopted the method developed in Hay et al. 2021 [15] , first fitting the within-host viral kinetics model to the test-and target-controlled C t values produced from the above GAM describing C t as a function of time since infection. We used the resulting parameter estimates as informed priors (Table S1) Beyond the viral kinetics parameters, we adopted less-constrained priors from the original paper [15] for other epidemiological parameters included in both population-level models (Table S1) , then re-fit both transmission models in turn to cross-sectional weekly C t distributions derived from the Atsinanana, Analamanga, and National-level datasets. We fit both models to each dataset using an MCMC algorithm derived from lazymcmc R-package [46] , as described in the original paper [15] , applying the default algorithm to the GP fit and a parallel tempering algorithm able to accurately parse multimodal posterior distributions to the SEIR fit. Four 

Epidemic trajectories from case count data.

The first wave of COVID-19 infections in Madagascar, between March and September 2020, was characterized by two subsequent outbreaks: one early, May 2020 peak centered in the eastern port city of Toamasina (region Atsinanana), followed by a second peak in July centered in the capital city of Antananarivo (region Analamanga) (Fig. 1) [1] . Estimation of the epidemic growth rate showed broad agreement in trends at both the national and regional levels, whether computed from IPM testing data assuming perfect reporting of testing date, or from publicly reported national data, including a reporting delay parameterized from a global opensource database ( Fig. 1) [40] . Since IPM data comprised just over 30% of nationally reported data throughout the first six months of the Madagascar epidemic, this concurrence in growth rates was unsurprising but nonetheless validates the applicability of the globally parameterized reporting delay for use in Madagascar. In both datasets, we estimated the national level epidemic growth rate to be increasing in the months preceding the two epidemic sub-peaks (in April and in June) and declining beginning in mid-July after the last peak in national case counts (Fig. 1) .

When IPM data were considered at the regional level, we discovered the April peak to be concentrated in Atsinanana, preceding the Toamasina outbreak and the June peak to be concentrated in Analamanga preceding the Antananarivo outbreak. Growth rate estimation from publicly reported data confirmed this pattern for Analamanga but was not possible for the Atsinanana region due to a lack of clarity in regional reporting; indeed, nearly 70% of nationally reported data in the public dataset were derived from the Analamanga region.

All RT-qPCR platforms used in our laboratory demonstrated increases in C t value corresponding to 10-fold dilutions of RNA extracted from the original virus isolate (Table S2) , though the estimated slope and y-intercept of each regression varied across the tests and targets considered, with the steepest slope recovered from GeneXpert N-gene tests and the shallowest from the Hong Kong ORF1a/b kits (Fig. S1, Table S3 ). We used the corresponding slope and y-J o u r n a l P r e -p r o o f 14 intercept for each test and platform to transform C t values in all subsequent analyses into values predicted for a TaqPath N-gene platform.

We observed a population-level dip in C t values obtained from our SARS-CoV-2 RT-qPCR platform concurrent with the regional peak in cases in May for Atsinanana and June for Analamanga, with both peaks observable in the National data ( Fig. 2A) . GAMs fit to Atsinanana, Analamanga, and National data subsets explained, respectively, 98.8, 98.9, and 98.9% of the deviation in the data (Table S4 ). All three GAMs demonstrated statistically significant effects of date, test, and individual patient ID, which contributed to the total deviance capture by each model. GAMs fit to the Analamanga and National data subsets showed an additional significant effect of target on the C t value. Partial effects plots were computed from the resulting GAMs ( Fig. S2) following methods described in [47] and demonstrated no significant effects of any particular test or target gene. In general, most variation in C t value beyond that of the individual patient was driven by the significant effect of date across all regions (Table S4 ).

Population-level SARS-CoV-2 corrected Ct values from IPM RT-qPCR platform across three Madagascar regions from March-September 2020. C t values are colored by the test and shaped by the target from which they were derived (legend), though note that all C t values were first corrected to TaqPath N gene range. The vertical, black line gives the date of peak case counts per region in the IPM dataset, from which these C t values were derived (May 20, 2020 for Atsinanana and July 22, 2020 for both Analamanga and National). The black, horizontal curve gives the output from a gaussian GAM fit to these data (Table S4) , excluding the effects of target and test, which were also included as predictors in the model; 95% confidence intervals by standard error are shown in translucent shading. Partial effects of each predictor are visualized in Fig. S2 . Righthand plots visualize partial effects of (B.) days since infection, (C.) patient age, and (D.) patient symptom status on C t value from our individual trajectory GAM (Table S5 ). Significant predictors are depicted in light blue and non-significant in gray (Table S5) .

The SARS-CoV-2 C t value also demonstrated a predictable trajectory from the timing of onset of infection. Our GAM fit to data reporting a date of symptom onset (which we converted to a date of infection onset) and incorporating a predictor smoothing term of days since infection onset, and random effects of test, target, and patient ID explained 92.7% of the deviance in the data and demonstrated statistically significant effects of all predictor variables, including days since infection onset (Table S5 ). These findings confirmed that C t value can be used as a biomarker of time since infection, validating the applicability of methods outlined in [15] for our Madagascar data.

As an extension of the individual trajectory analysis, we hypothesized that C t value would likely be linked to symptom status, since many infection trajectories begin with a brief presymptomatic phase, progress to symptom presentation, then become asymptomatic during recovery [17, 18] . The first GAM we employed to address this question considered age and symptom status as additional predictor variables in our individual trajectory analysis. This final GAM explained 98.5% of the deviation in the data and included significant effects of days since infection onset, symptom status, test, target, and patient ID (Table S5 ). Despite the significance of symptom status as a predictor variable in the GAM overall, partial effects plots demonstrated no significant association between asymptomatic status and high C t values or symptomatic status and low C t values, while controlling for age (Fig. 2B, 2C, 2D) . These results suggest that, in our dataset, C t value varies predictably with an individual's infection trajectory regardless of symptom classification or age of the patient, further validating its adoption as a robust biomarker of time since infection (Table S5) .

J o u r n a l P r e -p r o o f 16 We additionally extended this analysis to our National-level C t dataset, including a predictor variable of symptom status, in addition to test, target, patient age, and patient ID in longitudinal GAMs. This model explained 98.9% of the deviation witnessed in the data, including significant effects of test, target, patient ID, and symptom status (Table S6 ). Test and target were here included as control variates only and cannot be considered for prediction, as both co-varied with date, which was not used as a predictor in this model. In this model, partial effects plots indicated a significant association of asymptomatic status with high C t values and symptomatic status with low C t values (Fig. S3) , even when controlling for effects of age; as this larger dataset did not report date of symptom/infection onset, it is likely that this association covaried with the timing of infection onset, suggesting that previous reports of a high proportion of asymptomatic infections in Madagascar [1] could reflect a high proportion of pre-or postsymptomatic infections.

After confirming the predictable pattern of C t value across an individual's infection trajectory, and the predictable decline in population-level C t in conjunction with the epidemic peak, we used our individual trajectory GAM to simulate a distribution of C t values across a 50day duration of infection and fit the within-host viral kinetics model described in [15] to the resulting data (Fig. S4) . The model demonstrated a good fit to the data, and estimated posterior distributions for viral kinetics parameters were largely on par with those used previously in models of SARS-CoV-2 dynamics in Massachusetts, though the modal C t value at peak viral load was slightly lower in our Madagascar dataset ( Fig S4; Table S1 ).

After fitting the within-host model, we next used longitudinal population-level GAMs ( Fig. S2 ) to generate weekly cross-sectional C t distributions, controlled for test and target, across our three regions of interest. As expected, weekly cross-sectional C t distributions demonstrated a shift across the duration of the epidemic wave; with lower C t values temporally correlated with high growth rates estimated from case count data (Fig. 3) .

J o u r n a l P r e -p r o o f Finally, we used the viral kinetics posterior distributions resulting from the within-host viral kinetics model fit as prior inputs into SEIR and GP population-level epidemiological models, which we fit to the weekly cross-sectional C t data. MCMC chains generated in the fitting process demonstrated good convergence (Fig. S5 , Table S7, Table S8 ) and produced posterior distributions for all parameters on par with those estimated in previous work (Table S1 , Fig. S6 ), which effectively recaptured cross-sectional C t value histograms across the target timeseries in all three regions (Fig. 3, Fig. S7-S9 ) [15] . From the resulting fitted models, we simulated epidemic incidence curves, which we used to compute growth rate estimates across the duration of the first epidemic wave in each of the three regions (Fig. 4) . We compared these estimates to growth rates inferred from case count data; patterns from both SEIR and GP models were largely complementary, though, as expected, the more flexible GP model demonstrated less extreme J o u r n a l P r e -p r o o f 18 variation in epidemic growth rate. Both C t -model fits demonstrated similar patterns to epidemic trajectories estimated from incidence data, with increasing growth rates in the months preceding both epidemic sub-peaks (April and June) and decreasing growth rates beginning in July.

Nonetheless, growth rate estimates derived from the C t model slightly preceded those estimated from case count data. The C t model fits further predicted uncertainty in growth rate directionality towards the end of the study period for the Analamanga and National-level data, while incidence estimation projected decreasing cases at this time. This finding suggests that cross-sectional C t distributions indicated a possible epidemic resurgence which was overlooked by growth rates estimated from declining incidence. If incidence declined in part due to declining surveillance, as was the reality at the end of Madagascar's first epidemic wave [1] , only the C t method remained robust to the possibility of epidemic renewal. Comparison of COVID-19 epidemic growth rates from March-September 2020, estimated from IPM (blue) and publicly reported (gray) case count data using EpiNow2 [35] with estimates derived from Gaussian process (GP; red) mechanistic model fit to the time series of C t distributions (Fig. 3A ). Median growth rates are shown as solid lines, with 50% quantile on case-based estimates and 95% quantile of the posterior distributions from C t -based estimates in corresponding sheer shading. (B.) Growth rate estimates from individual SEIR C t -model fits to each C t -distribution shown in Fig. 3A ; median growth rates are given as horizontal dashes, with the 95, 70, 50, and 20% of the posterior distribution indicated by progressively darker coloring. Estimates >0 (indicating growing epidemics) are depicted in gold and <0 (indicating declining epidemics) in purples. (C.) Raw case count data from the time series (dark=IPM data; light=publicly reported data) is shown for reference.

Real-time estimation of epidemiological parameters, including the time-varying effective reproduction number, R t , and the related instantaneous epidemic growth rate, r, has played an important role in guiding public health interventions and policies across many epidemic outbreaks, including COVID-19 [48] [49] [50] . In Madagascar, an opensource platform [11] was developed shortly after the introduction of COVID-19 in March 2020, to collate and visualize publicly reported data and estimate R t using traditional methods applied to daily reported incidence [35, 36] . We here compare the results from this platform applied to the first epidemic wave in Madagascar, with new estimates of the time-varying epidemic growth rate applied to our own laboratory data across the first epidemic wave-including those derived using a novel method based on the cross-sectional C t value distribution at the time of sampling [15] .

We find our new estimates to be largely congruent with those predicted from publicly reported data, demonstrating a pattern of increasing epidemic growth rates prior to a peak in cases, which occurred first in May 2020 in the Atsinanana region, followed by a second outbreak in July 2020 in the Analamanga region. Critically, our growth rate estimates derived using novel methods applied to the C t distribution over time slightly precede those estimated from incidence data. As previous work has demonstrated C t estimation to offer a more robust approximation of true dynamics under limited surveillance scenarios [15] , these findings suggest that incidencebased methods to estimate epidemic trajectories in Madagascar may be underestimating the true pace of the epidemic, likely as a result of underreporting. Additionally, C t -based methods adopted by a single laboratory allow for estimation of epidemic growth rates even in the absence of publicly reported case counts: in October 2020, the Malagasy Ministry of Health shifted its daily COVID-19 case notifications to weekly, interfering with incidence-based approaches to estimate epidemic trajectories [11] . C t -based approaches, instead, should be robust to this variation in reporting, offering a powerful tool for public health efforts in low surveillance settings. Indeed, our analysis demonstrates that C t -based epidemic growth rates show uncertain directionality towards the end of the first wave of COVID-19 in Madagascar, presaging eventual epidemic resurgence, while incidence-based rates categorically declined due to both truly declining cases and declining surveillance. Incidence-based growth rate estimation ceased during the continued limited surveillance period from October 2020 through March 2021 [11] ; had C t -J o u r n a l P r e -p r o o f 20 based methods been available at the time, it is possible that the current second wave could have been predicted and mitigated by earlier rollout of public health interventions.

Statistical analysis of our C t data indicates that C t values vary predictably with days since onset of infection, allowing viral kinetics data to be leveraged for population-level estimation of epidemiological patterns. In our system, this pattern held even after controlling for the effects of age and symptom status on the C t trajectory, further validating the applicability of C t value as an indicator of time since infection. Nonetheless, in future work, it may be possible to fit unique viral kinetics trajectories for different classes of people; for example, older age cohorts or cohorts of people infected with more transmissible variants may be better represented by lower average C t trajectories than the population as a whole [18, 51] . Our application of generalized additive models to both individual infection trajectory and population-level C t distributions offers an effective means by which to control for variation in test and target across diverse RT-qPCR platforms to generate C t values for epidemiological inference which represent a reliable average of population-level patterns overall.

We acknowledge the limitations of our current method, especially as it relates to testing biases. Nearly two-thirds of both case data and C t -value data were derived from Madagascar's Analamanga region, including the capital city of Antananarivo; as such, national trends are strongly influenced by the Antananarivo epidemic and may belie more time-lagged dynamics in more rural, less connected regions of the country. Nonetheless, these biases are unlikely to seriously impact inference from the early stages of the Madagascar epidemic, which began with an introduction event in Antananarivo [1] , and despite the concentration of testing resources in the Analamanga [28] , our C t -based estimation methods applied to the national dataset were nonetheless able to detect the epidemic in the Atsinanana region, too. During the earliest phases of the epidemic in Madagascar, testing resources were limited in our laboratory, which may have additionally biased sample intake towards high-viral-load, low-C t -value cases that could bias epidemiological inference towards increasing growth rates even after the epidemic has, in reality, already begun to decline. As the epidemic ensued, however, the Madagascar Ministry of Health focused sampling on symptomatic patients and their suspected contacts, leading to a high proportion (56.6%) of reported asymptomatic infections in our dataset [1] , which may have instead prematurely biased inference towards a declining epidemic. Nonetheless, our Ct-based J o u r n a l P r e -p r o o f 21 projections of epidemic trajectories do not appear to underestimate realized trends, suggesting that our method was robust to these inconsistencies.

We apply a novel method leveraging within-host viral load data that is currently largely overlooked in the epidemiological literature to describe the dynamics of the first wave of COVID-19 in Madagascar. Our approach validates an important new tool for epidemiological inference of ongoing epidemics, particularly applicable to limited surveillance settings characteristic of many lower-and middle-income countries. We advocate for public release of real time data describing the C t value distribution, in addition to daily case counts, to improve epidemiological inference to guide public health response and intervention.

The COVID-19 Epidemic in Madagascar: clinical description and laboratory results of the first wave

Menzies3, Bayesian nowcasting with adjustment for delayed and incomplete reporting to estimate COVID-19 infections in the United States

Prevalence of asymptomatic SARS-CoV-2 infection: A narrative review

Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship

Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19)

COVID-19: PCR screening of asymptomatic health-care workers at London hospital

Practical considerations for measuring the effective reproductive number

How generation intervals shape the relationship between growth rates and reproductive numbers

Reconciling early-outbreak estimates of the basic reproductive number and its uncertainty : framework and applications to the novel coronavirus (SARS-CoV-2 ) outbreak

Mathematical model and non-pharmaceutical control of the coronavirus 2019 disease in Madagascar

Probabilistic modelling of COVID-19 dynamic in the context of Madagascar

Scrutinizing the spread of COVID-19 in Madagascar

Estimating epidemiologic dynamics from cross-sectional viral load distributions

To interpret the SARS-CoV-2 test, consider the cycle threshold value

SARS-CoV-2: virus dynamics and host response

Viral load of SARS-CoV-2 across patients and compared to other respiratory viruses

Quantifying antibody kinetics and rna detection during early-phase SARS-CoV-2 infection by time since symptom onset

Estimating time of infection using prior serological and individual information can greatly improve incidence estimation of human and wildlife infections

An open source tool to infer epidemiological and immunological dynamics from serological data: Serosolver

Reconstruction of antibody dynamics and infection histories to evaluate dengue risk

How generation intervals shape the relationship between growth rates and reproductive numbers

Using Combined Diagnostic Test Results to Hindcast Trends of Infection from Cross-Sectional Data

Universal admission screening strategy for COVID-19 highlighted the clinical importance of reporting SARS-CoV-2 viral loads

Viral load in community SARS-CoV-2 cases varies widely and temporally

Troisieme Recensement General de la Population et de L'Habitation (RGPH-3) Resultats Provisoires

Integrating health systems and science to respond to COVID-19 in a model district of rural Madagascar

Maintaining surveillance of influenza and monitoring SARS-CoV-2 -adapting Global Influenza surveillance and Response System (GISRS) and sentinel systems during the COVID-19 pandemic: Interim guidance

Evaluation of the influenza sentinel surveillance system in Madagascar

Study on causes of fever in primary healthcare center uncovers pathogens of public health concern in Madagascar

Molecular assays to diagnose COVID-19: Summary table of available protocols 2020

Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR

Molecular diagnosis of a novel coronavirus (2019-nCoV) causing an outbreak of pneumonia

EpiNow2: Estimate real-time case counts and timevarying epidemiological parameters

A new framework and software to estimate time-varying reproduction numbers during epidemics

The incubation period of coronavirus disease 2019 (CoVID-19) from publicly reported confirmed cases: Estimation and application

Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data

Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts

Epidemiological data from the COVID-19 outbreak, real-time case information

Assessment of inactivation procedures for SARS-CoV-2

Fitting linear mixed-effects models using lme4

Molecular and serological tests for COVID-19. A comparative review of SARS-CoV-2 coronavirus laboratory and point-of-care diagnostics

mgcv: GAMs and Generalized Ridge Regression for

Reconciling model predictions with low reported cases of COVID-19 in Sub-Saharan Africa: insights from Madagascar

Viral zoonotic risk is homogenous among taxonomic orders of mammalian and avian reservoir hosts

Real-time estimates in early detection of SARS

Early dynamics of transmission and control of COVID-19: a mathematical modelling study

Association of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China

S-Variant SARS-CoV-2 lineage B1.1.7 is associated with significantly higher viral load in samples tested by TaqPath Polymerase Chain Reaction

CRediT authorship contribution statement

Jean-Marius Rakotondramanga: Data curation, Supervision., Fidisoa Rasambainarivo, Resources., Vaomalala Raharimanga, Data curation, Supervision., Iony Manitra Razanajatovo: Investigation, Validation, Resources, Supervision, Project administration., Reziky Mangahasimbola, Data curation, Supervision