key: cord-0011953-ycmk6a4t
authors: McCarthy, Zachary; Athar, Safia; Alavinejad, Mahnaz; Chow, Christopher; Moyles, Iain; Nah, Kyeongah; Kong, Jude D.; Agrawal, Nishant; Jaber, Ahmed; Keane, Laura; Liu, Sam; Nahirniak, Myles; Jean, Danielle St; Romanescu, Razvan; Stockdale, Jessica; Seet, Bruce T.; Coudeville, Laurent; Thommes, Edward; Taurel, Anne-Frieda; Lee, Jason; Shin, Thomas; Arino, Julien; Heffernan, Jane; Chit, Ayman; Wu, Jianhong
title: Quantifying the annual incidence and underestimation of seasonal influenza: A modelling approach
date: 2020-07-10
journal: Theor Biol Med Model
DOI: 10.1186/s12976-020-00129-4
sha: 1c9a399924214438f496bf8be49816be9b96cec3
doc_id: 11953
cord_uid: ycmk6a4t

BACKGROUND: Seasonal influenza poses a significant public health and economic burden, associated with the outcome of infection and resulting complications. The true burden of the disease is difficult to capture due to the wide range of presentation, from asymptomatic cases to non-respiratory complications such as cardiovascular events, and its seasonal variability. An understanding of the magnitude of the true annual incidence of influenza is important to support prevention and control policy development and to evaluate the impact of preventative measures such as vaccination. METHODS: We use a dynamic disease transmission model, laboratory-confirmed influenza surveillance data, and randomized-controlled trial (RCT) data to quantify the underestimation factor, expansion factor, and symptomatic influenza illnesses in the US and Canada during the 2011-2012 and 2012-2013 influenza seasons. RESULTS: Based on 2 case definitions, we estimate between 0.42−3.2% and 0.33−1.2% of symptomatic influenza illnesses were laboratory-confirmed in Canada during the 2011-2012 and 2012-2013 seasons, respectively. In the US, we estimate between 0.08−0.61% and 0.07−0.33% of symptomatic influenza illnesses were laboratory-confirmed in the 2011-2012 and 2012-2013 seasons, respectively. We estimated the symptomatic influenza illnesses in Canada to be 0.32−2.4 million in 2011-2012 and 1.8−8.2 million in 2012-2013. In the US, we estimate the number of symptomatic influenza illnesses to be 4.4−34 million in 2011-2012 and 23−102 million in 2012-2013. CONCLUSIONS: We illustrate that monitoring a representative group within a population may aid in effectively modelling the transmission of infectious diseases such as influenza. In particular, the utilization of RCTs in models may enhance the accuracy of epidemiological parameter estimation.

The exact number of cases of a disease is complex to capture. Different methods can be used, from epidemiological studies to disease surveillance systems. While data collected routinely for surveillance purposes have the advantage of being readily accessible over a long period of time, they are subject to underestimation. Underestimation is a combination of under-reporting (failure to capture cases that seek care due to underdiagnoses or under-notifications) and under-ascertainment (failure to seek health care) [1] . Symptomatic individuals who seek medical care but are misdiagnosed due to an atypical presentation which does not fit the case definition or to the lack of sensitivity of the laboratory test (under-diagnosis) and/or for which administrative steps may not be taken at the physician's office to report the case contribute to under-reporting [1] . Also, infected individuals may be asymptomatic or with a mild form of the disease and may not seek healthcare, leading to under-ascertainment. Mathematical modelling may play a role in quantifying the effects of factors contributing to underestimation to assess the true number of influenza illnesses, ultimately to assist in policy development and to evaluate the impact of influenza vaccination.

Passive influenza surveillance systems are not designed to capture all illnesses; however, surveillance data have been utilized to assess under-reporting, underestimation and incompleteness. Statistical modelling has been utilized to quantify under-reporting and underestimation of influenza-associated hospitalizations, morbidity and mortality in the United States (US) and Canada [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]. However, assessing the underestimation associated with the true number of influenza illnesses has received less attention. Influenza surveillance data have been utilized to assess underestimation and estimate symptomatic influenza illnesses using multiplier methods [13, 14], Bayesian evidence synthesis [15], and dynamic models using ordinary differential equations [16]. In one study, assessments of underestimation were shown to be unreliable when utilizing only virologic surveillance data [16]. Similarly, a noted contributor of error in symptomatic illness estimates was the quality of sentinel influenza-like illness (ILI) data, which is subject to both overestimation and underestimation [15]. A method to assess underestimation of influenza illnesses utilizing a reliable data source, such as active surveillance, may enhance the ability to quantify the true underestimation and the number of influenza illnesses. In the present study, we develop a framework to utilize data from a closely monitored group whose epidemiological status is subject to minimal uncertainty. In particular, we utilize a randomized-controlled trial (RCT) designed to minimize the effects of underestimation and completely capture participants' illnesses through active surveillance and systematic testing [17].

A recent vaccine RCT, which assessed the efficacy of a high-dose (HD) influenza vaccine compared to a standard-dose (SD) influenza vaccine, presents an opportunity to explore a new method using mathematical modelling to correct for underestimation in national virological influenza surveillance, ultimately to infer the number of symptomatic influenza illnesses. In particular, we establish a parameter estimation technique based on a mechanistic disease transmission model to assess the underestimation and symptomatic illnesses in the US and Canada during the 2011-2012 and 2012-2013 influenza seasons.

The present modelling study integrates multiple sources of data to generate estimates for the underestimation factor, disease transmission rate, expansion factor, and symptomatic influenza illnesses (Fig. 1). The laboratory-confirmed influenza cases during the 2011-2012 and 2012-2013 seasons are included in the final seasonal surveillance reports from FluWatch and FluView in Canada and the US, respectively [18, 19]. Additionally, we utilize the US and Canadian census profiles [20] [21] [22] [23].

A recent RCT assessed the efficacy of an HD influenza vaccine compared to an SD influenza vaccine among 14,500 and 17,489 participants during the 2011-2012 and 2012-2013 seasons, respectively [17]. We utilize the laboratory-confirmed influenza counts among participants receiving SD and HD influenza vaccines in this RCT, which have been provided in the Supplementary Appendix of a prior publication [17]. The number of laboratory-confirmed influenza cases within the RCT is provided for three different case definitions [17]. In the present study, we utilize the laboratory-confirmed influenza cases associated with the least and most restrictive case definitions in the RCT. We utilize data associated with the least restrictive case definition, respiratory illness (RI), to provide estimates representing the true underestimation of symptomatic influenza illnesses. We utilize a more restrictive case definition, modified CDC-defined ILI, which is closer to the case definition used by surveillance systems in the US and Canada, to provide estimates representing underestimation of symptomatic influenza illnesses which were captured by virologic surveillance. RI was defined as the occurrence of one or more of the following: sneezing, nasal congestion or rhinorrhea, sore throat, cough, sputum production, wheezing, or difficulty breathing; modified CDC-defined ILI was defined as RI with cough or sore throat, concurrent with a temperature above 37.2°C [17]. The SD influenza vaccine effectiveness and coverage over the 2011-2012 and 2012-2013 influenza seasons have been established in the US and Canada [24] [25] [26] [27]. Also, HD influenza vaccine coverage estimates have been made available in the US [28]. In Canada, the HD influenza vaccine was not yet licensed for use during the 2011-2012 and 2012-2013 seasons.

We develop two compartmental models consisting of vaccinated and infected individuals for 1) the seniors aged 65+ who participated in the RCT and 2) the general population. Individuals are assumed to be vaccinated by the beginning of the influenza season, so there is no in-flow to the vaccinated compartment during the season. Since we study the dynamics of influenza within one season, we ignore demographic dynamics (e.g. birth and death). As the number of RCT participants is small relative to the total population, we ignore influenza transmission from these participants to the general community; however, we assume the same transmission rate among the RCT participants. Also, we assume that the age groups 0-64 and 65+ are homogeneously mixed. In addition to homogeneous mixing by age group, we assume spatial homogeneity in transmission. We assign units of symptomatic influenza illness to the infected compartments of the following models for dimensional consistency with data.

We develop a compartmental model consisting of vaccinated and infected individuals for the seniors aged 65+ who participated in the RCT [17]. We make use of the following notation for model variables and parameters: the subscript "+" refers to compartments of individuals aged 65+ and the subscript "−" refers to individuals aged 0-64. The vaccination status of the compartment also appears in the subscript as SD (standard-dose) or HD (high-dose), if applicable. Lastly, the superscript "O" denotes individuals outside of the RCT and the superscript "C" denotes participants within the RCT. With these assumptions and notation set, we formulate the following system of ordinary differential equations:

$$\begin{aligned}
\dot{V}^{C}_{+,SD} &= -\epsilon_{+,SD}\,\beta\,V^{C}_{+,SD}\,J, \\
\dot{V}^{C}_{+,HD} &= -\epsilon_{+,HD}\,\beta\,V^{C}_{+,HD}\,J, \\
\dot{I}^{C}_{+,SD} &= \epsilon_{+,SD}\,\beta\,V^{C}_{+,SD}\,J - \gamma\,I^{C}_{+,SD}, \\
\dot{I}^{C}_{+,HD} &= \epsilon_{+,HD}\,\beta\,V^{C}_{+,HD}\,J - \gamma\,I^{C}_{+,HD},
\end{aligned} \qquad (1)$$

where $V^{C}_{+,SD}$ represents RCT participants given the standard-dose vaccine, $V^{C}_{+,HD}$ represents RCT participants given the high-dose vaccine, $I^{C}_{+,SD}$ represents infected RCT participants given the standard-dose vaccine, and $I^{C}_{+,HD}$ represents infected RCT participants given the high-dose vaccine. These model variables are described in Table 1. Submodel (1) is equipped with initial conditions that replicate the RCT conditions preceding each season (Table 3).

Parameters $\epsilon_{+,SD}$ and $\epsilon_{+,HD}$ denote the vaccine-modified susceptibilities corresponding to the standard-dose and high-dose influenza vaccine efficacy among seniors, respectively, β is the transmission rate and γ is the influenza recovery rate. Specifically, β = b/N, where b is the daily effective contact rate and N is the total population size. The vaccine-modified susceptibilities represent a multiplier for the reduced infection rate of vaccinated individuals who are susceptible to influenza infection. Lastly, J represents the total number of infected individuals in the population from model (2).

We subdivide the general community (i.e., the entire US or Canada) into two age groups: seniors aged 65+ and non-seniors aged 0-64, as well as vaccination status (SD or HD). The model variables and notations for model (2) are displayed in Table 2 and described in more detail below.

As in submodel (1), β = b/N, where b is the daily effective contact rate and N is the total population size. Model (2) is equipped with the following initial conditions: the influenza season begins with the initial susceptible and vaccinated populations, which are given in Table 3 and estimated in the "Parameter estimation" section.
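The structure described above can be explored numerically. The following is a minimal sketch, not the authors' implementation: it assumes the SVIR-type form implied by the text (vaccinated participants are infected at a rate proportional to their vaccine-modified susceptibility, β and J, and infected individuals recover at rate γ) and, for brevity, collapses model (2) into a single unvaccinated susceptible group. The population size, cohort sizes, `b` and `eps_hd` are hypothetical placeholders; `eps_sd = 0.71` and the 3.8-day infectious period are values quoted elsewhere in this paper.

```python
import math

def simulate(b=0.32, gamma=1 / 3.8, eps_sd=0.71, eps_hd=0.55,
             N=1_000_000.0, J0=10.0, V_sd0=7000.0, V_hd0=7000.0,
             days=1500.0, dt=0.25):
    """Integrate a simplified community + RCT-cohort model with classical RK4."""
    beta = b / N  # beta = b / N, as in the text

    def rhs(y):
        S, J, V_sd, V_hd, I_sd, I_hd = y
        force = beta * J  # force of infection generated by the community only
        return (
            -force * S,                            # community susceptibles (ASSUMED single group)
            force * S - gamma * J,                 # community infected
            -eps_sd * force * V_sd,                # SD-vaccinated participants
            -eps_hd * force * V_hd,                # HD-vaccinated participants
            eps_sd * force * V_sd - gamma * I_sd,  # infected SD participants
            eps_hd * force * V_hd - gamma * I_hd,  # infected HD participants
        )

    def shifted(y, k, h):
        return tuple(v + h * dv for v, dv in zip(y, k))

    y = (N - J0, J0, V_sd0, V_hd0, 0.0, 0.0)
    for _ in range(int(days / dt)):
        k1 = rhs(y)
        k2 = rhs(shifted(y, k1, dt / 2))
        k3 = rhs(shifted(y, k2, dt / 2))
        k4 = rhs(shifted(y, k3, dt))
        y = tuple(v + dt / 6 * (a + 2 * b2 + 2 * c + d)
                  for v, a, b2, c, d in zip(y, k1, k2, k3, k4))
    return y

S_end, J_end, V_sd_end, V_hd_end, I_sd_end, I_hd_end = simulate()
log_ratio = math.log(V_hd_end / 7000.0) / math.log(V_sd_end / 7000.0)
print(f"SD participants never infected: {V_sd_end:.0f}")
print(f"HD participants never infected: {V_hd_end:.0f}")
print(f"log-ratio of cohort depletion:  {log_ratio:.4f}")
```

Because participant infections do not feed back into J, the cohort acts as a passive probe of the community epidemic. Under these assumptions the printed log-ratio should be close to `eps_hd / eps_sd`, whatever the shape of the epidemic curve.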

Table 3 (excerpt). HD-vaccinated RCT participants who avoided infection [17], reported for the 2011-2012 and 2012-2013 seasons, respectively. Modified CDC-defined ILI case definition: 7243 and 8651. Respiratory illness case definition: 7196 and 8468. Parameter for models (1) and (2): γ, influenza recovery rate.

The approach for quantifying the underestimation factor and transmission rate utilizes the modelling framework developed in the "Mathematical modelling" section. The mathematical details are presented in Appendix A. Since the RCT participants were actively monitored in the community, i.e. instructed to contact their study site if they had any respiratory symptoms and, in addition, contacted weekly or bi-weekly by the site, we assume that there was no underestimation of influenza infection within the RCT. The key idea is captured in the relationship pJ = R: a fraction p (the underestimation factor) of the symptomatic influenza cases J yields the laboratory-confirmed influenza cases R. We use this relationship, the final epidemic size relationships for submodel (1) and model (2), and the structure of the SVIR model to derive a tractable system of nonlinear equations. The representative equation derived from submodel (1) captures information from the RCT, while the representative equation derived from model (2) captures information from the general population. Specifically, we derive a nonlinear system of the form

where

and R is the number of laboratory-confirmed cases in an influenza season captured in national virological surveillance. To solve the system of nonlinear equations (4) for p and β, we use the fsolve function in Matlab R2016a.
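The logic of pairing an RCT-derived equation with a population-level equation can be illustrated with a deliberately reduced analogue. The sketch below is not system (4): it assumes a single unvaccinated community group and one vaccinated cohort, for which the two final size relations can be solved in closed form rather than with a numerical root finder. The function name `solve_toy` and all input values are hypothetical.

```python
import math

def solve_toy(R, S0, V0, V_inf, eps, gamma):
    """Solve a one-group analogue of the (beta, p) problem in closed form.

    Assumed relations, with J_hat the time integral of community infections:
      cohort:     ln(V0 / V_inf) = eps * beta * J_hat
      community:  ln(S0 / S_inf) = beta * J_hat,  S0 - S_inf = gamma * J_hat
      reporting:  p * (S0 - S_inf) = R
    """
    c = math.log(V0 / V_inf) / eps         # beta * J_hat, pinned down by the cohort alone
    illnesses = S0 * (1.0 - math.exp(-c))  # S0 - S_inf implied by the community relation
    p = R / illnesses                      # fraction of illnesses that were lab-confirmed
    J_hat = illnesses / gamma
    beta = c / J_hat
    return beta, p

# HYPOTHETICAL inputs, chosen only to make the example concrete.
R = 10_000         # laboratory-confirmed cases
S0 = 30_000_000    # initially susceptible community members
V0, V_inf = 7000, 6900   # cohort size before and after the season
eps = 0.5          # vaccine-modified susceptibility of the cohort
gamma = 1 / 3.8    # recovery rate (per day)

beta, p = solve_toy(R, S0, V0, V_inf, eps, gamma)
print(f"underestimation factor p = {p:.4%}")
print(f"transmission rate beta   = {beta:.3e}")
```

In this reduced setting the cohort fixes the cumulative force of infection, and the community final size relation then converts that into a number of illnesses and hence p. The full system couples two age groups and several vaccination strata, which is why the paper resorts to a numerical solver.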

The expansion factor, E, is defined in this study as the number of symptomatic influenza illnesses per laboratory-confirmed infection. It is the multiplicative inverse of the underestimation factor, so we compute $E = p^{-1}$. The estimated number of symptomatic influenza illnesses is then $ER$, where R is the total number of laboratory-confirmed influenza cases from national surveillance during an influenza season.
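As a small worked example of this inversion, the sketch below converts the underestimation factor reported in the Discussion for Canada in 2011-2012 under the modified CDC-defined ILI case definition (2.6%, range 2.1-3.2%) into an expansion factor. The laboratory-confirmed count `R_confirmed` is a hypothetical placeholder, not a figure taken from the surveillance reports.

```python
def expansion_factor(p):
    """Return E = 1/p for an underestimation factor p in (0, 1]."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return 1.0 / p

# Reported underestimation factor: baseline 2.6%, range 2.1%-3.2%.
p_baseline, p_low, p_high = 0.026, 0.021, 0.032

E_baseline = expansion_factor(p_baseline)
# A larger p means a smaller E, so the bounds swap when inverted.
E_low, E_high = expansion_factor(p_high), expansion_factor(p_low)

R_confirmed = 10_000  # HYPOTHETICAL laboratory-confirmed count, for illustration only
illnesses = E_baseline * R_confirmed

print(f"E = {E_baseline:.1f} (range {E_low:.1f} to {E_high:.1f})")
print(f"Estimated symptomatic illnesses: {illnesses:,.0f}")
```

Note that the interval for E is obtained by inverting the endpoints of the interval for p in reverse order.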

The basic reproduction number is the average number of secondary cases produced by one infected individual introduced into a population of susceptible individuals. We determine the basic reproduction number of model (2) using the next generation method [35] .

We determine estimates for parameters associated with submodel (1) and model (2) utilizing data in "Data sources" section. We provide an outline of the methods in the main text and more complete calculations and explanations are included in Appendix B.

Estimating vaccine-modified susceptibility $\epsilon_{-,SD}$, $\epsilon_{+,SD}$ and $\epsilon_{+,HD}$: Recall that the vaccine-modified susceptibility captures the protection against influenza infection added by vaccination among vaccinated individuals. Here we outline the method for estimating the parameter values $\epsilon_{+,SD}$, $\epsilon_{+,HD}$ and $\epsilon_{-,SD}$ for the 2011-2012 and 2012-2013 influenza seasons. This process integrates model analysis and prior vaccine effectiveness (VE) studies. We make use of a relationship between vaccine-modified susceptibility and VE [36].

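To infer $\epsilon_{-,SD}$ and $\epsilon_{+,SD}$ we use prior estimates of vaccine effectiveness (VE) against influenza. Specifically, we relate these VE estimates to vaccine-modified susceptibility through the relationship $\epsilon = 1 - VE$ [36]. It remains to find $\epsilon_{+,HD}$, for which we use $\epsilon_{+,SD}$ and the ratio $\epsilon_{+,HD}/\epsilon_{+,SD}$. We use submodel (1) to determine this ratio. Specifically, dividing the first two equations in submodel (1) yields

$$\frac{\dot{V}^{C}_{+,HD}}{\dot{V}^{C}_{+,SD}} = \frac{\epsilon_{+,HD}\,V^{C}_{+,HD}}{\epsilon_{+,SD}\,V^{C}_{+,SD}}.$$

Finally, we use separation of variables to find the ratio in terms of known RCT outcomes [17]:

$$\frac{\epsilon_{+,HD}}{\epsilon_{+,SD}} = \frac{\ln\left(V^{C}_{+,HD}(\infty)/V^{C}_{+,HD}(0)\right)}{\ln\left(V^{C}_{+,SD}(\infty)/V^{C}_{+,SD}(0)\right)}.$$

The quantities $V^{C}_{+,SD}(\infty)$ and $V^{C}_{+,HD}(\infty)$ are the limit values of the state variables $V^{C}_{+,SD}$ and $V^{C}_{+,HD}$, respectively. Finally, we use estimates of the now-known quantities, $\epsilon_{+,SD}$ from VE studies and the ratio $\epsilon_{+,HD}/\epsilon_{+,SD}$ from the RCT, to estimate $\epsilon_{+,HD}$ [17].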
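The separation-of-variables step reduces to a ratio of logarithms of the fractions of each trial arm that remained uninfected. The sketch below shows that calculation under the assumption that both arms are depleted in proportion to their vaccine-modified susceptibility. The arm sizes and end-of-season counts are hypothetical and are not the RCT values; `eps_sd = 0.71` is the value quoted in this paper for US seniors.

```python
import math

def susceptibility_ratio(v_hd_0, v_hd_inf, v_sd_0, v_sd_inf):
    """Return eps_HD / eps_SD from initial and final uninfected counts per arm."""
    for start, end in ((v_hd_0, v_hd_inf), (v_sd_0, v_sd_inf)):
        if not 0 < end < start:
            raise ValueError("each arm needs 0 < final count < initial count")
    return math.log(v_hd_inf / v_hd_0) / math.log(v_sd_inf / v_sd_0)

# HYPOTHETICAL trial arms: 8000 participants each, 100 (HD) and 130 (SD) infections.
ratio = susceptibility_ratio(8000, 7900, 8000, 7870)

eps_sd = 0.71            # from VE studies, as quoted in the text
eps_hd = eps_sd * ratio  # high-dose susceptibility implied by the ratio

print(f"eps_HD / eps_SD = {ratio:.4f}")
print(f"eps_HD          = {eps_hd:.4f}")
```

For small attack rates the log-ratio is close to the ratio of attack rates in the two arms, which is the usual relative-risk calculation; the logarithmic form remains valid when attack rates are not small.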

Initial susceptible and vaccinated populations: To inform the initial conditions for model (2) in the years 2011-2012 and 2012-2013, we use population sizes given by the US and Canadian census programs [20] [21] [22] [23]. The population size and the estimated vaccine coverage in each country then give the susceptible and vaccinated population initial conditions. The values and descriptions of the parameters embedded in the initial conditions are given in Table 3. Similarly, the initial conditions for submodel (1) are given in Table 3.

Influenza recovery rate γ: We inform the recovery rate γ using the infectious period of influenza, which has been estimated to be 3.8 days with a 95% confidence interval (CI) of 3.1-4.6 days [34]. The recovery rate γ is then the inverse of the mean sojourn time in the infectious compartment; hence, we consider $\gamma = 1/3.8$ day$^{-1}$ as a baseline. We utilize the bounds of the CI of the estimated infectious period for sensitivity analysis and vary γ from $1/4.6$ to $1/3.1$ day$^{-1}$.
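For concreteness, the conversion from infectious period to recovery rate can be written out as follows. This is a minimal sketch using only the values quoted above; the variable names are illustrative.

```python
infectious_period_days = 3.8   # mean infectious period [34]
ci_days = (3.1, 4.6)           # 95% CI of the infectious period [34]

gamma_baseline = 1.0 / infectious_period_days
# A longer infectious period means a slower recovery rate, so the bounds swap.
gamma_low, gamma_high = 1.0 / ci_days[1], 1.0 / ci_days[0]

print(f"gamma = {gamma_baseline:.4f} per day "
      f"(range {gamma_low:.4f} to {gamma_high:.4f})")
```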

We utilize the laboratory-confirmed influenza counts reported in the RCT according to two case definitions to inform $V^{C}_{+,HD}(\infty)$ and $V^{C}_{+,SD}(\infty)$. We use 1) laboratory-confirmed cases associated with modified CDC-defined ILI and 2) laboratory-confirmed cases associated with respiratory illness (RI) [17] (Table 3). The results generated under each case definition have a distinct interpretation, which is left for the Discussion.

To ensure that the parameter estimation process for obtaining p and β will yield biologically relevant results, we study the well-posedness of the inverse problem outlined in "Disease transmission and burden estimates" section. The solution p and β to system (4) is unique and positive, i.e. the problem is well-posed. From f (β, p) = 0, we note that β is related to p by the following equation

Substituting this expression for β into g(β, p) yields the following equation of p

where

Let $\Phi(p)$ denote the left hand side of equation (6). Note that $\Phi$ is a monotone-decreasing function of p with $\Phi(0) > 0$ and $\lim_{p\to\infty}\Phi(p) = 0$. On the other hand, the right hand side of the equation is a monotone-decreasing function of p with $\lim_{p\to 0}\gamma R p^{-1} = \infty$ and $\lim_{p\to\infty}\gamma R p^{-1} = 0$. Therefore, there exists a unique solution $p \in (0, 1]$ if $\Phi(1) > \gamma R$.

From the parameter sets estimated in the "Parameter estimation" section, $\Phi(1) > \gamma R$; hence we have a unique solution of system (4) corresponding to each context-specific parameter set. In other words, given the specific US/Canadian demographic information, vaccine-specific parameters, influenza surveillance reports, and RCT study results, there is a single underestimation factor p (which we ensure lies between 0 and 1) and transmission rate β that satisfy system (4) [17] [18] [19].

To quantify uncertainty in the underestimation factor, expansion factor, number of symptomatic influenza illnesses, and basic reproduction number, we utilize variability in parameter estimates in Table 3 . Note that model parameter estimates appear as point values with the exception of the recovery rate γ (Table 3) . We solve system (4) according to each case definition (Modified-CDC and RI), country (US and Canada) and study year (2011-2012 and 2012-2013) with the mean value of γ , lower 95% confidence bound and upper 95% confidence bound on γ . The estimated value of p and β corresponding to the mean value of γ represent baseline results. The ranges of p and β obtained using the 95% confidence bounds of γ represent their sensitivity to an estimated 95% CI of the infectious period of influenza. We obtain baseline estimates of the expansion factor, number of symptomatic influenza illnesses and the basic reproduction number using baseline p and β estimates and methods in "Expansion factor and symptomatic illnesses" section. To retrieve an interval based on variation of recovery rate γ , we propagate the variability in p and β to each of the epidemiological parameters using methods in "Expansion factor and symptomatic illnesses" section.

We have assumed perfect reporting and ascertainment of influenza virus infection within the RCT. Participants were instructed to contact their study site if they had any respiratory symptoms [17]. In addition, participants were contacted by a call center twice weekly (between the beginning of January and the end of February) or weekly until the end of illness surveillance (April 30 each year) [17]. In light of these frequent participant follow-ups in the RCT, we expect this assumption to hold. However, to be exhaustive, we also consider the possibility of underestimation occurring within the RCT [17]. For details regarding the sensitivity of p and β to underestimation in the RCT, see Appendix C.

The estimates for Canada are displayed in Table 4. Using the laboratory-confirmed influenza counts associated with modified CDC-defined ILI within the RCT, we quantified the underestimation factor as p = 2.6%. The estimates for the US are displayed in Table 5. Using the laboratory-confirmed influenza counts associated with modified CDC-defined ILI within the RCT, we estimated p = 0.5%. We estimated the disease transmission rate β to range between 0.25 and 0.39 in each season and country, corresponding to basic reproduction numbers $R_0$ ranging between 1.19 and 1.22 (Tables 4 and 5, RI case definition).

Recall we have assumed perfect reporting and ascertainment of influenza virus infection among the RCT participants and proposed to revisit this assumption with a sensitivity analysis. We have conducted a sensitivity analysis and present the details in Appendix C. Overall, the sensitivity analysis suggests that the results in "Results" section are robust to underestimation within the RCT [17] . We find that our estimates for β are weakly dependent on underestimation within the RCT [17] . Further, the captured fraction of influenza infections by virologic surveillance p is also robust to underestimation within the RCT [17] , maintaining an order of magnitude while varying RCT underestimation over its full range. See Appendix C for a full presentation of this analysis.

This study develops and illustrates a method utilizing a parameter estimation technique based on a mechanistic model and data synthesis to quantify the underestimation factor associated with seasonal influenza and the number of symptomatic illnesses. These estimates take into account mechanistic detail of influenza transmission, vaccine effectiveness, relative vaccine efficacy of SD to HD from the RCT, and vaccine coverage. While this method does utilize RCT outcomes, the remaining data required to generate the estimates become publicly available by the end of the influenza season. This method can be utilized for epidemiological parameter estimation for infectious diseases, provided there is an appropriate form of active surveillance (e.g., clinical trial) data available. A coupled system of differential equations for the actively monitored population and general community can be developed to integrate surveillance epidemiological data to quantify key population-level parameters, including the underestimation factor.

Our estimates generated from the laboratory-confirmed influenza associated with the modified CDC-defined ILI case definition are representative of the underestimation of cases which could be captured by the surveillance system (Tables 4 and 5, modified CDC-defined ILI case definition). In this light, our analysis indicates that the surveillance system in Canada captured 2.6% (2.1 − 3.2%) and 1.2% (0.98 − 1.2%) of symptomatic cases closely associated with modified CDC-defined ILI in the 2011-2012 and 2012-2013 influenza seasons, respectively. In the US, the virologic surveillance system was estimated to capture 0.1% (0.08 − 0.12%) and 0.09% (0.07 − 0.11%) of symptomatic influenza cases closely associated with modified CDC-defined ILI in the 2011-2012 and 2012-2013 influenza seasons, respectively. In each country, the percentage of captured cases decreases with an increase in laboratory-confirmed cases from passive surveillance, which may be due to laboratory testing capacity or changing testing practices (Tables 4 and 5, modified CDC-defined ILI).

The underestimation factors generated from laboratory-confirmed influenza associated with RI, i.e., a broader range of symptoms, more closely represent the true symptomatic influenza underestimation factor and are apt to assess the true number of influenza illnesses (Tables 4 and 5, respiratory illness case definition). When considering a range of symptomatic influenza illness from the estimates using data specified by these two case definitions, we provide symptomatic influenza illness estimates which are consistent with the US CDC's estimates in 2011-2012 and 2012-2013 (Fig. 3b). The US CDC estimate in 2012-2013 is near the low end of the range we estimated, which may be due to differences in methodology. In particular, this difference may be due to the nonlinear (exponential) relationship between the underestimation factor and laboratory-confirmed illnesses in surveillance data. An explanation to support this nonlinear relationship and our results could be that an increased number of influenza illnesses results in limited availability of laboratory tests, so that a smaller proportion of cases was captured in virological surveillance.

We offer two sources of comparison from studies of the 2009 influenza pandemic. In an expert opinion from the 2009 influenza pandemic, the number of expected influenza infections per laboratory-confirmed infection ranged from 10 to 500 among experts [37]. For Canada, our estimates of the expansion factor lie in this range of 10 to 500 (Table 4). For the US, we estimate expansion factors greater than 500 using the respiratory illness case definition (Table 5). Also, estimates of underestimation factors were between 0.1% and 0.7% during the 2009 influenza pandemic in Mexico [16]. Our estimates for the true underestimation in Canada and the US lie in this range (Tables 4 and 5, respiratory illness case definition). In both cases, investigations of the 2009 pandemic influenza in Canada and Mexico align well with our estimates for Canada; however, according to our analysis the underestimation in laboratory-confirmed surveillance in the US is more substantial. This is likely due to differences in surveillance systems, laboratory-testing protocols, or health-seeking behavior. We also note that epidemiological characteristics differ between pandemic and seasonal influenza, and the level of awareness may have impacted health-seeking behavior and diagnosis practices.

Figure caption (partial): In the US, we compare with US CDC's estimates of symptomatic influenza illnesses [14]. For data values and ranges see Tables 4 and 5.

Table 5 Summary of estimates in the US during the 2011-2012 and 2012-2013 influenza seasons. Values reported as estimated baseline value and range from variation of recovery rate γ (see section "Sensitivity analysis" for the details of the sensitivity analysis).

The basic reproduction number ranged from 1.19 to 1.22 for estimates associated with the RI case definition, which more closely represent the true $R_0$. While these basic reproduction numbers align tightly in the US and Canada, our estimates for $R_0$ are higher in Canada in all influenza seasons (Tables 4 and 5).
A recent systematic review found an interquartile range of reproduction numbers for seasonal influenza of 1.19-1.37, with median value 1.27 [38] . Our estimates for R 0 are in line with typical findings representative of seasonal influenza [38] .

We estimate the symptomatic influenza illnesses in the US and Canada; however, the number of asymptomatic influenza illnesses has not been assessed in this work. In a recent review, estimates of the fraction of asymptomatic infections vary widely; however, values from 20% to 50% are typical [39]. These estimates indicate that the number of asymptomatic influenza illnesses may be substantial. Even so, based on our scan of the literature, there is no clear consensus on the contribution of asymptomatic individuals to influenza transmission. Future studies may attempt to estimate the total number of influenza illnesses, symptomatic and asymptomatic, to provide a more accurate assessment of influenza illnesses, influenza underestimation, and the force of infection. As a result of this simplification, we may underestimate $R_0$ and hence the force of infection.
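To give a sense of scale, the sketch below shows how a symptomatic estimate would translate into total infections if a fixed fraction of infections were asymptomatic. This is an illustrative back-of-the-envelope calculation, not an estimate made in this study: it assumes the typical 20-50% range applies uniformly to the seasons considered, and it uses the US 2011-2012 symptomatic range quoted in the abstract (4.4-34 million). The function name is hypothetical.

```python
def total_infections(symptomatic, asymptomatic_fraction):
    """Scale a symptomatic count to all infections for a given asymptomatic fraction."""
    if not 0.0 <= asymptomatic_fraction < 1.0:
        raise ValueError("asymptomatic_fraction must lie in [0, 1)")
    return symptomatic / (1.0 - asymptomatic_fraction)

symptomatic_range = (4.4e6, 34e6)   # US, 2011-2012 (abstract)
fractions = (0.20, 0.50)            # typical range of asymptomatic fractions [39]

for symptomatic in symptomatic_range:
    for fraction in fractions:
        total = total_infections(symptomatic, fraction)
        print(f"symptomatic {symptomatic / 1e6:5.1f}M, "
              f"asymptomatic fraction {fraction:.0%}: total {total / 1e6:5.1f}M")
```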

There are several limitations in the current study. There are differences in symptomatic attack rates, disease presentation, and health-seeking behaviour between age groups. As a result, the symptomatic reporting fraction likely varies from age group to age group. Future study may be extended to address these age group disparities by quantifying the underestimation and expansion factors for each age group. Another limitation to this study is the assumed homogeneous mixing between age groups; however, this may also be addressed in a future study. In fact, the RCT data may allow the contact preference between age groups (i.e. the contact rate between age groups 0-64 and 65+, and vice-versa) to be estimated. In this light, the methods presented herein may be used as a method to validate empirical social contact data obtained from surveys or existing contact mixing patterns. In addition to age-specific heterogeneity, there may be spatial heterogeneity in contact mixing, surveillance systems, and social behavior which impacts case ascertainment and reporting rates. In this light, it may be advantageous to conduct analysis at more granular scales (e.g. city, state, or provincial levels) to more accurately capture transmission and underestimation at these sub-national levels. The RCT was conducted in the US and Canada, and we use aggregate case counts and RCT participant counts from US and Canada. In other words, the RCT is assumed to take place in the US or Canada. The underestimation factor could also be separated into several multipliers, as in prior works [13] . An improvement could be made to account for laboratory test accuracy; laboratory-confirmed cases from the RCT and surveillance reports can be preprocessed for the method to generate more representative estimates. Overall, this work lays a functional framework to expand as the research questions at hand require.

We develop a method for quantifying the underestimation factor and disease transmission rate by integrating several data sources including RCT outcomes. We use our method to assess surveillance system capacity, number of symptomatic influenza illnesses, and R 0 in the US and Canada during 2011-2012 and 2012-2013 seasons. The utilization of outcomes from RCTs (in this case a comparative vaccine RCT) may allow for the extraction of additional information characterizing an epidemic which may not be possible by limiting data usage to national influenza surveillance reports. This work illustrates the point that monitoring a representative group within a population may aid in effectively modelling the transmission of infectious diseases such as influenza. In general, the utilization of available surveillance data may increase the capabilities of disease models and broaden their power to draw inferences. A formal structural and practical identifiability analysis should be carried out to rigorously address these details of parameter identification.

It is key to provide more accurate methods to estimate the annual incidence of influenza to guide evidence-based immunization policy-making. In this light, it is vital to develop more accurate mathematical models of influenza transmission, and to accurately evaluate the impact of influenza vaccination. These methods may contribute to and enable the design of optimal vaccination programs that best reduce the annual incidence of seasonal influenza as well as associated hospitalizations, medical visits and deaths.

Here we present the details for estimating p and β. We also introduce the following notation for convenience and readability; let a hat (ˆ) over a variable denote integration over all nonnegative time. 

Similarly, taking the second equation from submodel (1), we have:

To find Î C +,SD and Î C +,HD , we add the first and third equations, and the second and fourth equations, of submodel (1) and integrate both sides, which gives:

Thus, we define the following functions:

Now, from model (2), we have:

Recall that pJ = R, where p is the underestimation factor. Integrating the equations above yields the respective final size relations:

Further, by summing all the equations of model (2), we obtain the following equality:

Integrating both sides of the above equations, with the assumption that there are no infections at t = 0 and as t → ∞, gives:

With this definition, we obtain the following system of equations, which β and p must satisfy:

f (β, p) = 0, g(β, p) = 0.

Suppose that β and p satisfy f (β, p) = 0; then we have:

Note that for both of these equalities to hold, it is required that

Equation (10) does hold since this is precisely the expression used to estimate +,HD from vaccine RCT information and +,SD . Since all known variables in f and g have been estimated except for β and p, we can obtain estimates for β and p by solving (15) (e.g., numerically).

f (β, p) = 0, g(β, p) = 0.
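As an illustration of what solving such a system numerically can look like, the following is a minimal sketch and not the authors' implementation. The functions `f` and `g` below are hypothetical stand-ins built from a single-group SIR final size relation plus a fully ascertained monitored cohort; they are not the f and g derived from models (1) and (2). All numbers (`N`, `R`, `n`, `c`, `gamma`) and the helper `solve_2x2` are assumptions made for the example.

```python
import math

# Hypothetical toy inputs (not from the paper):
N = 1_000_000      # community size, all initially susceptible
R = 2_000          # reported (laboratory-confirmed) cases, so that p * J = R
n, c = 10_000, 200 # monitored cohort size and its fully ascertained infections
gamma = 0.25       # assumed recovery rate per day

def f(beta, p):
    """Toy community final size relation: ln(N / (N - J)) = beta * J / (gamma * N)."""
    J = R / p  # true number of infections implied by the underestimation factor p
    return -math.log1p(-J / N) - beta * J / (gamma * N)

def g(beta, p):
    """Toy cohort relation: the cohort faces the same cumulative force of infection."""
    J = R / p
    return -math.log1p(-c / n) - beta * J / (gamma * N)

def solve_2x2(f, g, x0, y0, tol=1e-12, max_iter=50):
    """Newton's method for f(x, y) = 0, g(x, y) = 0 with a central-difference Jacobian."""
    x, y = x0, y0
    for _ in range(max_iter):
        fv, gv = f(x, y), g(x, y)
        if max(abs(fv), abs(gv)) < tol:
            return x, y
        hx = 1e-6 * max(abs(x), 1e-8)
        hy = 1e-6 * max(abs(y), 1e-8)
        f_x = (f(x + hx, y) - f(x - hx, y)) / (2 * hx)
        f_y = (f(x, y + hy) - f(x, y - hy)) / (2 * hy)
        g_x = (g(x + hx, y) - g(x - hx, y)) / (2 * hx)
        g_y = (g(x, y + hy) - g(x, y - hy)) / (2 * hy)
        det = f_x * g_y - f_y * g_x
        # Solve the 2x2 linear system J * (dx, dy) = -(f, g) by Cramer's rule.
        dx = (-fv * g_y + gv * f_y) / det
        dy = (fv * g_x - gv * f_x) / det
        x, y = x + dx, y + dy
    raise RuntimeError("Newton iteration did not converge")

# The initial guess for p is deliberately small: here Newton approaches the root from below.
beta_hat, p_hat = solve_2x2(f, g, x0=0.3, y0=0.05)
print(f"beta = {beta_hat:.4f}, p = {p_hat:.4f}")
```

In this toy setting the cohort pins down the true attack rate, which in turn fixes both the transmission rate and the fraction of infections that were reported. Newton's method is sensitive to the starting point, so a bracketing or damped solver may be preferable for the full system.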

Solving System (15) yields estimates for p and β, which we obtain with numerical methods in Section "Disease transmission and burden estimates". Lastly, we note that Equations (9) and (12) remain intact if a latent compartment were added to models (1) and (2) to account for the incubation period of influenza.

Informing +,SD : In the same test-negative case-control study, VE was estimated to be 47% among participants aged 50+. As a result, we estimate +,SD using the approximation that vaccine-modified susceptibility equals 1 − VE, to find +,SD = 1 − 0.47 = 0.53 [36] .

Informing −,SD : First, we inform −,SD using the estimated VE against medically-attended influenza over all ages in the US, which was 49% [27] . We then calculate −,SD = 1 − 0.49 = 0.51 using the approximation −,SD = 1 − VE [36] .

Informing +,SD : In the same US study, VE against medically-attended influenza was estimated to be 29% among those aged 65+ [27] . We then calculate +,SD = 1−0.29 = 0.71 using the approximation +,SD = 1−VE [36] .

Informing +,HD : Now, to estimate the high-dose vaccine-modified susceptibility, we first use Equation (3) in the main text, informed by RCT results, to calculate the ratio +,HD / +,SD [17] . With this ratio and +,SD known, we calculate +,HD = 0.79 × 0.71 = 0.56 for the US.
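The arithmetic for the US 2012-2013 season can be reproduced with a short sketch. The function and variable names are illustrative assumptions; the VE values (49% and 29%) and the high-dose to standard-dose ratio (0.79) are taken from the text above.

```python
def susceptibility_from_ve(ve: float) -> float:
    """Approximate vaccine-modified susceptibility as 1 - VE."""
    if not 0.0 <= ve <= 1.0:
        raise ValueError("VE must be a proportion between 0 and 1")
    return 1.0 - ve

# US, 2012-2013 season: VE values quoted in the text.
sus_all_ages_sd = susceptibility_from_ve(0.49)  # standard dose, all ages
sus_65_sd = susceptibility_from_ve(0.29)        # standard dose, aged 65+

# Ratio of high-dose to standard-dose susceptibility, as used in the text (0.79).
hd_to_sd_ratio = 0.79
sus_65_hd = hd_to_sd_ratio * sus_65_sd          # high dose, aged 65+

print(round(sus_all_ages_sd, 2), round(sus_65_sd, 2), round(sus_65_hd, 2))
```

Rounded to two decimals, the three values match the 0.51, 0.71, and 0.56 reported above.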

In this section, we present the details of the sensitivity analysis exploring the impact of underestimation in the RCT on study results. Specifically, we investigate the sensitivity of p and β to underestimation in the RCT [17] . To consider underestimation in the RCT, we introduce an underestimation factor p 1 in the RCT. As before, p remains the underestimation factor in the general community. We also assume that RCT participants provided with standard-dose and high-dose vaccines are equally underestimated. Lastly, we define δ to be the fraction of true remaining vaccinated participants at the end of the RCT.

Measuring underreporting and under-ascertainment in infectious disease datasets: a comparison of methods

The underrecognized burden of influenza in young children

Estimating the undetected burden of influenza hospitalizations in children

Underestimation of the role of pneumonia and influenza in causing excess mortality

The burden of influenza-associated critical illness hospitalizations

Influenza pneumonia surveillance among hospitalized adults may underestimate the burden of severe influenza disease

Improving the estimation of influenza-related mortality over a seasonal baseline

Estimates of mortality attributable to influenza and RSV in the United States during 1997-2009 by influenza type or subtype, age, cause of death, and risk status. Influenza Other Respir Viruses

Improving accuracy of influenza-associated hospitalization rate estimates

Hospitalization attributable to influenza and other viral respiratory illnesses in Canadian children

Role of influenza and other respiratory viruses in admissions of adults to Canadian hospitals. Influenza Other Respir Viruses

Statistical estimates of respiratory admissions attributable to seasonal and pandemic influenza for Canada. Influenza Other Respir Viruses

Estimates of the prevalence of pandemic (H1N1) 2009, United States

An evidence synthesis approach to estimating the incidence of seasonal influenza in the Netherlands. Influenza Other Respir Viruses

Estimating the incidence reporting rates of new influenza pandemics at an early stage using travel data from the source country

Efficacy of high-dose versus standard-dose influenza vaccine in older adults

Public Health Agency of Canada

Age and Sex Composition in the United States

Age and Sex Composition in the United States

Low 2012-13 influenza vaccine effectiveness associated with mutation in the egg-adapted H3N2 vaccine strain not antigenic drift in circulating viruses

Influenza A/subtype and B/lineage effectiveness estimates for the 2011-2012 trivalent vaccine: cross-season and cross-lineage protection with unchanged vaccine

Influenza vaccine effectiveness in the 2011-2012 season: protection against each circulating virus and the effect of prior vaccination on estimates

Influenza vaccine effectiveness in the United States during 2012-2013: variable protection by age and virus type

Comparative effectiveness of high-dose versus standard dose influenza vaccines in US residents aged 65 years and older from 2012 to 2013 using Medicare data: a retrospective cohort analysis

Vaccine Coverage Amongst Adult Canadians: Results from the 2012 Adult National Immunization Coverage (aNIC) Survey

A Review of the Literature of High Dose Seasonal Influenza Vaccine for Adults 65 Years and Older

Flu Vaccination Coverage, United States

Flu Vaccination Coverage, United States

Seasonal Influenza Vaccine Effectiveness

A Bayesian MCMC approach to study transmission of influenza: application to household longitudinal data

Reproduction numbers of infectious disease models

Study designs for evaluating different efficacy and effectiveness aspects of vaccines

Pandemic influenza: modelling and public health perspectives

Estimates of the reproduction number for seasonal, pandemic, and zoonotic influenza: a systematic review of the literature

The fraction of influenza virus infections that are asymptomatic: a systematic review and meta-analysis


Special thanks to the Fields Institute and the Centre for Quantitative Analysis (CQAM) for hosting the Industrial Problem Solving Workshop from which this research originated.

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.

Ethics approval and consent to participate: Not applicable.


Here we present the details of the calculations of parameter values +,SD , +,HD and −,SD for influenza seasons 2011-2012 and 2012-2013. For this purpose, we utilize an approximation which relates vaccine-modified susceptibility to vaccine effectiveness (VE) [36] . We also leverage influenza VE estimates from several studies in the US and Canada.

Informing −,SD : First, we inform −,SD using VE estimates. In Canada, the VE against Real-Time Polymerase Chain Reaction (Real-time PCR) confirmed influenza in a test-negative case-control study was estimated to be 55% among participants of all ages [25] . We use the approximation that vaccine-modified susceptibility equals 1 − VE to estimate −,SD ; that is, −,SD = 1 − 0.55 = 0.45 [36] .

Informing +,SD : In the same test-negative case-control study, the VE against Real-time PCR confirmed influenza among participants aged 50+ was estimated to be 58% [25] . We use this figure and the same approximation to find +,SD = 1 − 0.58 = 0.42 [36] .

Informing −,SD : In the US, the VE against medically-attended influenza among all ages was estimated to be 47% [26] . We now use the approximation that vaccine-modified susceptibility equals 1 − VE to estimate −,SD as follows: −,SD = 1 − 0.47 = 0.53.

Informing +,SD : The VE against medically-attended influenza among those aged 65+ was estimated to be 43% in the US in 2011-2012 [26] . With the relationship +,SD = 1 − VE we calculate +,SD = 1 − 0.43 = 0.57 [36] .

Informing +,HD : Lastly, to estimate the high-dose vaccine-modified susceptibility we use Equation (3) in the main text, informed by RCT results, to calculate +,HD / +,SD [17] . Note that with the ratio of the two vaccine-modified susceptibilities and +,SD known, we can calculate +,HD . For the US, we then have +,HD = 0.55 × 0.53 = 0.29.

Informing −,SD : We inform −,SD using the VE against PCR-confirmed influenza from a test-negative case-control study in Canada, in which the VE was estimated to be 51% [24] . We then use the approximation that vaccine-modified susceptibility equals 1 − VE to obtain −,SD = 1 − 0.51 = 0.49 [24] .

The total number of observed infections in the RCT can be written as:

The above equation can also be formulated in terms of p 1 and δ, that is:

The relationship between p 1 and δ can be determined by equating (16) and (17), which yields:

Under these conditions, by conducting the same analysis of model (1) as shown in Appendix A, we derive the condition:

Continuing to follow the analysis from Appendix A, we arrive at the following system of equations:

We analyze the robustness of the estimates of β and p by varying δ in the above system of equations. Exploring sensitivity with respect to δ gives insight into how underestimation in the RCT affects our results in Section "Results". Specifically, for a given p 1 , we calculate a corresponding δ and then solve the above system numerically.

The relationship between the RCT underestimation factor, p 1 , and the estimates of β and p is shown in Figs. 4, 5, 6, and 7. We explore the sensitivity of p and β to p 1 for each context of interest, namely the US and Canada during the 2011-12 and 2012-13 influenza seasons. We present the sensitivity analysis using the outcomes in the RCT under the case definition of laboratory-confirmed influenza associated with RI.

Overall, p and β are robust to underestimation in the RCT. The effect of underestimation in the RCT on the transmission rate β is negligible for reasonable values of p 1 . Similarly, p depends only weakly on p 1 : as p 1 varies over its entire range, p remains of the same order of magnitude.