key: cord-0291640-q9wf5thi
authors: Wong, A.; Kramer, S. C.; Piccininni, M.; Rohmann, J. L.; Kurth, T.; Escolano, S.; Grittner, U.; Domenech de Celles, M.
title: Using LASSO regression to estimate the population-level impact of pneumococcal conjugate vaccines
date: 2022-02-15
journal: nan
DOI: 10.1101/2022.02.12.22270888
sha: c9993d6e38bc26c3c30d9389100e4421e6ec30c0
doc_id: 291640
cord_uid: q9wf5thi

The pneumococcal conjugate vaccines (PCVs) protect against diseases caused by Streptococcus pneumoniae, such as meningitis, bacteremia, and pneumonia. It is challenging to estimate their population-level impact due to the lack of a perfect control population and the subtleness of signals when the endpoint - like all-cause pneumonia - is non-specific. Here we present a new approach to estimate PCVs' impact - using LASSO regression to predict the counterfactual outcome for vaccine impact inference. We first used a simulation study to test the performance of LASSO regression and established methods including the synthetic control (SC) approach. We found that LASSO achieved accurate and precise estimation, even in complex simulation scenarios where the association between outcome and all control variables was non-causal. We then applied LASSO to real-world data and found that it yielded estimates of vaccine impact similar to SC. The LASSO method is accurate, easily implementable, and can be applied to study the impact of PCVs and of other vaccines.

at least 18 months in each country. Country-specific data periods are detailed in Table 1. The fitting period for each method is specified in Table 2.

Statistical model - LASSO regression: LASSO is an extension of linear regression that decreases the variance of regression coefficients and the prediction error by adding a term to the log-likelihood to penalize the complexity of the model 27. This leads to a parsimonious model with a subset of control variables that best predicts the outcome.

To estimate the penalty parameter, we first generated a grid of 100 values for the penalty parameter and fitted LASSO regression to the pre-vaccine period data for each value in the grid. Next, we selected the best value for the penalty using either 10-fold cross-validation (CV) or the Akaike Information Criterion (AIC) 27. In 10-fold CV, the pre-vaccine data were randomly divided into 10 groups of equal size, with 9 groups forming the training set and 1 group forming the test set. A model was fitted on the training set and its mean squared error (MSE) was computed on the test set. This was repeated 10 times, so that each group served once as the test set, to yield an average MSE. The procedure was carried out for each of the 100 values in the penalty grid, and the penalty with the lowest average MSE was selected. When using the AIC for penalty selection, we fitted LASSO regression to the pre-vaccine data for each value in the grid, and the penalty with the lowest AIC was selected.
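The cross-validated penalty selection described above can be sketched as follows. This is a minimal illustration and not the authors' implementation (which relied on the R package "glmnet"): it assumes a single predictor, squared-error loss and no intercept, a setting in which the LASSO coefficient has a closed form (soft-thresholding). The data, the penalty grid and the function names (`lasso_1d`, `cv_select_penalty`) are hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the LASSO."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_1d(x, y, lam):
    """Closed-form LASSO coefficient for one predictor and no intercept.

    Minimizes (1 / (2n)) * sum((y - b * x)^2) + lam * |b|.
    """
    n = len(x)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / n
    denom = sum(xi * xi for xi in x) / n
    return soft_threshold(rho, lam) / denom


def cv_select_penalty(x, y, grid, k=10, seed=0):
    """Return (best_penalty, mean_mse_per_penalty) using k-fold CV."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k groups of (near-)equal size
    mean_mse = []
    for lam in grid:
        fold_mse = []
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            b = lasso_1d([x[i] for i in train], [y[i] for i in train], lam)
            mse = sum((y[i] - b * x[i]) ** 2 for i in test) / len(test)
            fold_mse.append(mse)
        mean_mse.append(sum(fold_mse) / k)  # average MSE over the k folds
    best = min(range(len(grid)), key=lambda j: mean_mse[j])
    return grid[best], mean_mse


# Hypothetical pre-vaccine data: 60 months, true slope 0.3 plus noise.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(60)]
y = [0.3 * xi + rng.gauss(0, 0.5) for xi in x]
grid = [0.5 * (0.9 ** j) for j in range(100)]  # 100 penalty values
best_lam, mean_mse = cv_select_penalty(x, y, grid)
```

With many control variables and a Poisson outcome, the same loop structure applies, but the inner fit would be replaced by a full LASSO solver.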

We tested two variants of the LASSO regression model: the first included all seasonal variables by default (season-forced, SF); the second treated seasonal variables as control variables and allowed LASSO regression to select from them (season-unforced, SU). The selected model was re-fitted to the entire pre-vaccine period to predict the counterfactual outcome ($Y_t^{c}$) during the evaluation period, that is, the hospitalization counts that would have occurred in the population if PCV had not been introduced, assuming the distribution and associations of the population features captured in the pre-vaccine period data remained unchanged. With the LASSO-predicted counterfactual under the no-vaccine scenario and the observed outcome ($Y_t$), we calculated the vaccine impact as an incidence rate ratio (IRR) using Equation 1. An IRR less than 1 indicates a reduction in all-cause pneumonia hospitalization due to the vaccination program.

where $T$ is the set of time points during the evaluation period.
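Assuming Equation 1 takes the common ratio-of-sums form (observed counts summed over the evaluation period, divided by the summed counterfactual counts), the IRR calculation can be sketched as below. The exact form of the equation is an assumption inferred from the surrounding definitions, and the function name and the monthly counts are hypothetical.

```python
def incidence_rate_ratio(observed, counterfactual):
    """IRR = sum of observed counts / sum of predicted counterfactual counts.

    Both series must cover the same evaluation-period time points.
    """
    if len(observed) != len(counterfactual):
        raise ValueError("series must cover the same time points")
    total_cf = sum(counterfactual)
    if total_cf <= 0:
        raise ValueError("counterfactual total must be positive")
    return sum(observed) / total_cf


observed = [90, 110, 100, 80]          # hypothetical observed monthly counts
counterfactual = [100, 120, 110, 95]   # hypothetical predicted counts without PCV
irr = incidence_rate_ratio(observed, counterfactual)
percent_reduction = (1 - irr) * 100    # positive when IRR < 1
```

Under this form, an IRR below 1 means that fewer hospitalizations were observed than the model predicted for the no-vaccine scenario.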

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted February 15, 2022; https://doi.org/10.1101/2022.02.12.22270888.

Statistical model - other methods: We compared LASSO regression to three established methods in the field of vaccine impact estimation. The key features of the implementation of LASSO regression and all comparator methods are summarized in Table 2. The three comparator methods are described below:

1. Interrupted Time Series (ITS) is a method that includes an indicator variable for vaccination, secular trends before and after PCV introduction, and background seasonality 16,32-34. Here, $\hat{Y}_t$ is the fitted outcome and $Y_t^{c}$ is the counterfactual outcome during the evaluation period. We used the fitted model to predict the counterfactual outcome $Y_t^{c}$ during the evaluation period in the absence of vaccination (i.e., indicator variable set to "0" for all time points).

2. In accordance with the synthetic control (SC) method 23,25, time series of different control variables were weighted according to their fit to the outcome time series in the pre-vaccine period using Bayesian variable selection. The weighted time series were jointly used to predict the counterfactual outcome $Y_t^{c}$.

The model was adjusted for background seasonality using 11 monthly indicator variables and the logarithm of non-respiratory hospitalization was included as a covariate. The vaccine impact was calculated using Equation 1.

3. For the Seasonal-Trend decomposition using LOESS plus Principal Components Analysis (STL+PCA) method, a smoothed trend for each control variable's time series was extracted with seasonal-trend decomposition using locally estimated scatterplot smoothing (LOESS) 24. A PCA was performed on the extracted smoothed trends, and the first principal component was selected as the composite trend, which

It is made available under a CC-BY-NC-ND 4.0 International license.

Performance assessment with simulated data: To assess the performance of all methods in estimating vaccine impact, we designed a simulation study. We generated the outcome, monthly pneumonia hospitalization ($Y_t$),

The intercept, α, is calculated as the logarithm of the mean ratio of pneumonia hospitalization to all non-respiratory hospitalization. So that the association between any control variable and the outcome would not be unrealistically strong, the coefficients of the included control variables were randomly sampled from a uniform distribution with range -0.3 to 0.3, such that a change of one standard deviation in a control variable, holding the other variables constant, would result in a 0.74- to 1.35-fold change in the outcome (exp(-0.3) ≈ 0.74 and exp(0.3) ≈ 1.35). Seasonality was modelled as a Fourier series of 11 terms that consisted of 6 cosine and 5 sine functions. We assigned a value of 0.5 to the coefficient of the first cosine function such that the outcome peaked in January and oscillated approximately 50% above and below the annual mean to mimic pneumonia seasonality in the real world. The impact of vaccination was modelled by a single parameter. In all simulations, we assumed a vaccine with null impact (parameter equal to 0), corresponding to IRR = 1. The simulated data were screened to be realistic, such that the maximum ratio of annual maximum to annual minimum for the expected count of the outcome in any simulation set would not exceed 10.
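The simulation ingredients described above can be illustrated with a short sketch. It assumes a log-linear model for the expected count with the non-respiratory hospitalizations as an offset, which is an assumption about the generating equation rather than a transcription of it. Only the first cosine term of the seasonal series is included, the control series are white noise, and all numeric inputs other than the 0.5 amplitude, the ±0.3 coefficient range and the screening limit of 10 are hypothetical.

```python
import math
import random

rng = random.Random(42)
n_months = 120
n_controls = 5

# A one-SD change in a control with coefficient in [-0.3, 0.3] multiplies the
# expected outcome by a factor between exp(-0.3) and exp(0.3).
fold_low, fold_high = math.exp(-0.3), math.exp(0.3)

betas = [rng.uniform(-0.3, 0.3) for _ in range(n_controls)]
# Hypothetical standardized control series (white noise, for illustration only).
controls = [[rng.gauss(0, 1) for _ in range(n_months)] for _ in range(n_controls)]


def seasonal_term(month_index, amplitude=0.5):
    """First cosine harmonic only; month_index 0 is January, where it peaks."""
    return amplitude * math.cos(2 * math.pi * month_index / 12)


alpha = math.log(0.05)          # hypothetical pneumonia / non-respiratory ratio
offset = [2000.0] * n_months    # hypothetical non-respiratory hospitalizations
vaccine_effect = 0.0            # null impact, so the true IRR is exp(0) = 1

expected = []
for t in range(n_months):
    linear = alpha + seasonal_term(t % 12) + vaccine_effect
    linear += sum(b * series[t] for b, series in zip(betas, controls))
    expected.append(offset[t] * math.exp(linear))


def passes_screen(mu, limit=10.0):
    """True if, in every year, max / min of the expected count is <= limit."""
    years = [mu[i:i + 12] for i in range(0, len(mu), 12)]
    return all(max(year) / min(year) <= limit for year in years)


keep_simulation = passes_screen(expected)
```

Observed counts would then be drawn from a count distribution (for example, Poisson) with these expected values; that sampling step is omitted here.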

We tested the performance of all methods in four different scenario types. First, we used five randomly selected control variables and a seasonal variable to simulate the outcome. This analysis was performed five times (simulation sets A to E) with re-sampling of the 5 control variables. To illustrate this process using an example, we plotted the first set (set A) of five control variables and their respective assigned values used to generate the outcome time-series in Figure 1A and the generated outcome time series in Figure 1B . Second, we repeated this analysis with 10 control variables (simulation sets F to J). Third, we tested the performance of all methods on sparse data, which may affect the performance of these methods, as has been previously suggested 24 . We generated the sparse data (simulation set K) by taking a 10% binomial sub-sample of the first set (set A) of simulated data.
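The 10% binomial sub-sample used for the sparse scenario can be sketched as binomial thinning, in which each counted hospitalization is kept independently with probability 0.1. Whether the authors thinned the realized counts in exactly this way is an assumption; the function name and the example counts are hypothetical.

```python
import random


def binomial_subsample(counts, p=0.10, seed=0):
    """Keep each counted event independently with probability p."""
    rng = random.Random(seed)
    return [sum(1 for _ in range(c) if rng.random() < p) for c in counts]


dense = [1200, 950, 800, 1010]       # hypothetical monthly counts
sparse = binomial_subsample(dense)   # each entry is Binomial(count, 0.1)
```

Thinning preserves the relative seasonal pattern in expectation while increasing the relative noise, which is why interval widths are expected to grow in this scenario.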

A flow-chart for the outcome simulation procedures can be found in Appendix 2 Figure S1; plots similar to Figures 1A and 1B for simulation sets A to E can be found in Appendix 2 Figure S2. The complete list of control variables used to simulate the outcome can be found in Appendix 1.

Finally, we tested all methods in a fourth scenario type (simulation sets L to P), in which the variables causing the outcome were not available. We used three control variables (C1, C2 and C3) to generate four conditions (Z1 to Z4) and the outcome, such that the conditions Z1 to Z4 and the outcome were not causally related but non-causally associated via common causes. We then removed C1, C2 and C3 together with their associated control variables (that is, diagnoses in the same ICD-10 chapter) from the list of control variables. Such a framework is more realistic because observed associations between different causes of hospitalization are likely due to common causes rather than direct causal influence. The Directed Acyclic Graph (DAG) for the underlying data generation process under this framework can be found in Appendix 2 Figure S3 and the control variables used for outcome simulation can be found in Appendix 1.
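The idea of a non-causal association through a common cause can be illustrated with a small sketch. It uses Gaussian variables rather than hospitalization counts, and the coefficients, sample size and variable names are hypothetical; it only shows that two variables driven by the same unobserved parent are correlated even though neither affects the other, and that the strength of the correlation depends on how strongly each is tied to the parent.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(7)
n = 2000
common = [rng.gauss(0, 1) for _ in range(n)]               # unobserved common cause
z_strong = [0.9 * c + rng.gauss(0, 0.5) for c in common]   # strongly tied to it
z_weak = [0.2 * c + rng.gauss(0, 1.0) for c in common]     # weakly tied to it
outcome = [0.8 * c + rng.gauss(0, 0.5) for c in common]    # depends on it only

r_strong = pearson(z_strong, outcome)
r_weak = pearson(z_weak, outcome)
```

A data-driven selector that only sees `z_strong` and `z_weak` would be expected to favour `z_strong` as a predictor of the outcome once the common cause is removed from the candidate pool.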

We evaluated each method's performance by comparing each method's estimates to the true IRR, which was 1. IRR estimates less than 1 indicated an overestimation of vaccine impact; IRR estimates greater than 1 indicated an underestimation of vaccine impact. For LASSO regression and ITS, we report the uncertainty of estimation as the 95% prediction intervals (95% PI) of the IRR obtained from each simulation, extracted from the 2.5th and 97.5th percentiles of the Poisson distribution of the predicted value. For SC and STL+PCA, we report the 95% credible intervals (95% CI) of the IRR from each simulation, extracted from the 2.5th and 97.5th percentiles of the Bayesian posterior distributions. Currently, the calculation of standard errors remains an unresolved challenge in frequentist LASSO methods, and the usual statistical constructs such as confidence intervals and p-values are not available in its implementation 35,36. Therefore, we reported different uncertainty measures for different methods, and they are not directly comparable. For each set of 100 simulations, we measured the accuracy of each method by calculating the mean IRR with standard deviation (SD). We measured the precision as the width of the 95% uncertainty intervals (95% UI). For each set of 100 simulations, we assessed performance stability by coverage (the proportion of simulations in which the uncertainty interval contained the true value). For each scenario type, we report these performance indicators as a range.
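The three performance indicators defined above (mean IRR with SD, mean width of the 95% UI, and coverage) can be computed from per-simulation results as sketched below. The function name and the four example results are hypothetical; the study used 100 simulations per set.

```python
import statistics


def summarize_performance(estimates, true_irr=1.0):
    """Summarize a set of simulations.

    estimates: list of (irr, lower, upper) tuples, one per simulation,
    where lower and upper bound the 95% uncertainty interval.
    """
    irrs = [irr for irr, _, _ in estimates]
    widths = [upper - lower for _, lower, upper in estimates]
    covered = [lower <= true_irr <= upper for _, lower, upper in estimates]
    return {
        "mean_irr": statistics.mean(irrs),          # accuracy
        "sd_irr": statistics.stdev(irrs),
        "mean_ui_width": statistics.mean(widths),   # precision
        "coverage": sum(covered) / len(covered),    # share of UIs containing truth
    }


# Hypothetical results from four simulations.
results = [(1.01, 0.95, 1.07), (0.98, 0.92, 1.04),
           (1.00, 0.94, 1.06), (1.09, 1.03, 1.15)]
summary = summarize_performance(results)
```

In this toy input, three of the four intervals contain the true IRR of 1, so the coverage is 0.75. Note that wider intervals raise coverage mechanically, which is why precision and coverage are best read together.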

Application to real-world data: In the analysis on the primary endpoint (all-cause pneumonia), the two variants of LASSO regression, LASSO-SF and LASSO-SU, as well as SC, were applied to each age group in each country.

Similar to the procedure used on the simulated data, we fitted models to the pre-vaccine period and used these fitted models to predict the counterfactual outcomes. We performed model selection, the prediction of the counterfactuals, and the calculation of vaccine impact using the same procedures in the performance assessment using the simulated data; we then compared the results from the three methods.

We then applied LASSO-SF, LASSO-SU and SC on the US data using different endpoints (IPD, pneumococcal pneumonia, and two definitions of all-cause pneumonia) and compared the results. We performed a sensitivity analysis by removing "bronchitis and bronchiolitis" from the list of possible control variables for LASSO selection, because this control variable could be affected by PCV and violate the assumption that all control variables are not affected by the health intervention, by definition 23,37 .

Numerical implementation: All analyses were conducted in RStudio with R version 4.1.0 (R) 38 . LASSO regression was implemented using the package "glmnet" version 4.1-2 39 . SC and STL+PCA were implemented using the package "InterventionEvaluatR" version 0.1 40 . The project's R dependencies were recorded by the package "renv" version 0.14.0 41 for reproducibility.


Performance assessment with 5 causal control variables: We present the vaccine impact estimated by each method for each simulation in five scenarios (sets A to E) in Figure 2. In each scenario, a different set of five randomly selected control variables was used to generate the outcome. LASSO-SF and LASSO-SU performed well in all five scenarios; both achieved high coverage (SF: 97-100%; SU: 96-100%) and accurate mean IRR (SF: 1.00-1.02; SU: 1.00-1.03) with good precision (SF and SU: 0.12-0.13), as shown in Figure 2 and Table 3. The estimates obtained by LASSO-SF and LASSO-SU were similar. In general, LASSO regression tended to select the causal variables (Appendix 3 Figure S4). When comparing CV selection and AIC selection, we did not observe a difference in accuracy or precision, but CV selection resulted in models with more variables, while AIC selection led to more parsimonious models (Appendix 3 Figure S5).

Other methods showed variable performance. ITS yielded accurate and precise estimates with high coverage in one scenario (set D), but in the other scenarios (sets A, B, C and E) the estimates were precise yet biased, resulting in variable coverage (0-100%). Similarly, the mean IRR estimated by STL+PCA was biased in some scenarios (sets B and D), causing the coverage to be variable (0-100%). SC showed relatively high coverage (78-94%) and accurate mean IRR (0.99-1.02) with good precision (0.07-0.11). The performance indicators of all methods are summarized in Table 3.

Performance assessment with 10 causal control variables: The performance of LASSO-SF and LASSO-SU remained robust in another five scenarios (sets F to J), in which different sets of ten randomly selected control variables were used to generate the outcome. As the number of causal control variables increased from 5 to 10, the number of control variables that were consistently selected by LASSO-SF and LASSO-SU also increased (Appendix 4 Figure S7). Again, the performance of SC was satisfactory and consistent, while that of the other methods was variable (Appendix 4 Figure S6). The performance is summarized in Table 3.

Performance assessment with sparse data: When the monthly hospitalization counts were reduced to 10% of those in the first simulation scenario (set K), the accuracy of all methods remained consistent, but precision decreased notably: the 95% UIs of the IRR estimated by all methods widened considerably, which in turn increased coverage.

Performance assessment with non-causal control variables: When tested on the data generated from three causal control variables (C1, C2 and C3) that were subsequently removed (sets L to P), the estimation by LASSO-SF and LASSO-SU remained accurate and precise (Table 3, Figure 2). In the absence of C1, C2 and C3, LASSO-SF and LASSO-SU selected the control variables that were associated with the outcome via the common parent, such as Z2, Z3 or Z4. We also observed that LASSO-SF and LASSO-SU preferentially selected the control variable more strongly associated with the common parent, Z2, whereas Z1, the control variable with a weaker association with the common parent, was almost never selected in any of the five scenarios (Appendix 4 Figure S8).

Application to real-world data: The characteristics of the four countries' datasets used in this analysis are summarized in Table 1. The IRRs estimated by LASSO-SF and LASSO-SU were comparable to those estimated by SC for Chile, Ecuador, Mexico and the US. The three methods generally arrived at the same conclusion as to whether there was a significant impact of PCV, except for two instances: (1) the two LASSO methods found a significant impact of PCV in the age group 40 to 64 years in Chile, in age groups 18 to 64 years in Ecuador, and in the age group 40 to 64 years in the US while SC did not, and (2) the SC method detected a significant impact of PCV in the age groups 0 to 1 year in Mexico, which was not detected by the two LASSO methods. Complete results are shown in Figure 3. In general, we found that LASSO delivered comparable estimates to SC, which has been shown to be a reliable method in vaccine impact estimation 23,25.

The results using Ecuador, Mexico, and US data were sensitive to removing "bronchitis and bronchiolitis" from the list of control variables that LASSO regression and SC could choose from. In Ecuador, the reduction in all-cause pneumonia hospitalization was attenuated in the youngest age group, but remained statistically significant in age groups 18 to 64 years. In Mexico, the reduction in the youngest two age groups detected by SC was also attenuated and was no longer statistically significant. In the US, only a marginal reduction was detected by LASSO-SF and LASSO-SU in the age group 40 to 64 years and not in older adults before removing "bronchitis and bronchiolitis", but after doing so, a more pronounced reduction was observed in all the age groups from 18 to 79 years. Full results are shown in Appendix 5 Figure S9.

reduction in IPD hospitalization across all age groups, except for the age group 5 to 17 years. LASSO-SU and SC found a significant reduction in IPD hospitalization in age groups younger than 5 years and older than 64 years.

As the endpoint definition became less specific, the reduction point estimate became smaller in size and the 95% UI more often included 1. The results are shown in Figure 4 .

In this study, we aimed to assess whether LASSO regression models can accurately estimate vaccine impact.

Using a simulation study, we first assessed the performance of LASSO regression compared with other commonly implemented methods, such as ITS, SC 23,25 and STL+PCA 24. We then applied LASSO regression and SC 23,25 to real-world data and compared their results. Overall, we found that LASSO regression allowed for accurate and precise estimation of vaccine impact and performed comparably to established methods, such as SC 23,25.

The results from the simulation study showed that LASSO regression was able to estimate the pre-determined vaccine impact accurately and precisely, and its performance remained stable even under more complex data simulation procedures, such as the one without any causal variables. While ITS was able to estimate the predetermined vaccine impact accurately in some simulation scenarios, its performance was not robust across scenarios. As ITS did not include any control variables but only the offset and seasonal terms, its assumption that the characteristics in the population remained unchanged throughout the study period limited its performance 42 .

In practice, control variables can be included in more advanced ITS models to improve performance 16, 42 ; however, the process of hand-picking control variables is subjective and can introduce biases into the analysis 43 . By design, ITS assumes a linear (or exponential, when a log-link function is used) trend for the continuous effect of an intervention 23, 42, 44 . This assumption can limit the method's validity because the nature of continuous effect of an intervention is often unknown or difficult to ascertain.

We compared LASSO regression with two other methods: the SC approach 23,25 and STL+PCA 24. In contrast to a priori selection of control conditions 21, these methods use data-driven approaches to select various control

The results from the simulation study showed stable performance of LASSO methods and SC, while the performance of STL+PCA was not robust. One possible explanation for the biased estimation by STL+PCA in some of the simulation scenarios is that only one principal component was used for the counterfactual prediction, which may not be sufficient in some of the simulation scenarios. As shown in the simulation study, LASSO regression tended to select the causal variables, or the associated variables when causal variables were not available, which is consistent with its tendency to identify a few predictors with strong associations 27,45.

When we applied LASSO regression to the real-world data, we found that PCV statistically significantly reduced all-cause pneumonia hospitalization in the youngest age groups in Chile and Ecuador, which is consistent with existing literature on PCV impact in Chile 46,47 and Ecuador 37,48, but we did not observe similar results in the youngest age groups in Mexico and in the US. This is in contrast to the established PCV impact estimated among children in Mexico 48 and the US 29. Although we found a significant reduction in pneumonia hospitalization in adult age groups (18 to 64 years) in Chile and Ecuador, we did not observe similar results in Mexico and the US.

Ben-Shimol in a systematic review 49. Part of these discrepancies may be explained by the pragmatic aspects of

When we compared different endpoints using LASSO regression, we found a larger effect size in the reduction in IPD hospitalization than in all-cause pneumonia hospitalization. As expected, using a more specific endpoint gave estimates of larger effect size because a larger fraction of the measured outcome was caused by the pathogen of interest, which is consistent with prior studies, in which different methods were used for the PCV impact

Our results showed that the estimates obtained by applying LASSO regression to pneumonia hospitalization data were sensitive to removing "bronchitis and bronchiolitis" from the pool of control variables subject to selection by LASSO. One possible explanation for this observation is a potential violation of our assumption that bronchitis and bronchiolitis hospitalization was not impacted by PCV. Bruhn et al. 24 and Jimbo Sotomayor et al. 37 highlighted that including bronchitis and bronchiolitis hospitalization can be important for the accurate prediction of pneumonia hospitalization, especially in young children, due to its association with respiratory syncytial virus (RSV) infections. Of note, the fraction of bronchitis and bronchiolitis hospitalization caused by the pneumococcus and the prevalence of RSV differ by age group 53-56; therefore, it is important to consider the potentially different pathogen-pathogen dynamics in different age groups when estimating vaccine impact.

There are a few major limitations in our study that should be considered. First, the simulated data were generated based on the time series of ten or fewer causes of hospitalization, and LASSO tends to perform well in situations where a few variables predict the outcome because of its property of eliminating variables by shrinking their coefficients to zero. Therefore, the simulation scenarios in our study may favor LASSO regression. Nevertheless, it is possible that pneumonia hospitalization can be predicted by a few control conditions given its relatively strong seasonality and well-established etiology. Second, we extracted the 95% PI from the Poisson distribution of the predicted values of the outcome during the evaluation period, and, as a result, only the uncertainty around the Poisson distribution was considered. The narrow 95% PI may therefore be overoptimistic and cannot be compared to the 95% CI in SC 23,25 or STL+PCA 24, which consider the uncertainty of the included parameters. Third, monthly case counts lower than 10 were masked in the US for privacy reasons, and we imputed these masked values by randomly selecting a number between 0 and 9. This is unlikely to pose problems for the primary endpoint analysis because pneumonia hospitalization case counts in all age groups were high (on the scale of 100 to 10,000). In contrast, more specific endpoints such as IPD and pneumococcal pneumonia had lower hospitalization counts in younger age groups, although information from the trend was retained because the masked value had a definite range (less than 10). Lastly, using the predicted counterfactual based on pre-vaccine period data to infer vaccine impact assumes that the relationship between the control conditions and the outcome remained the same before and after PCV introduction. Therefore, if the relationship between the control conditions and the outcome changed near the time point of vaccine introduction, the prediction performance of LASSO would be impacted; however, we believe it is rare for the relationship between all of the control conditions and the outcome to be altered at the same time.

An exception may be the scenario of a vaccine introduced to mitigate the effects of a disease that has a very strong impact on lifestyle, mortality, and the healthcare system capacity, as was seen during the COVID-19 pandemic, for example.
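The imputation of masked US counts mentioned among the limitations can be sketched as follows. The sketch assumes that masked months are marked as missing and that each is replaced by an integer drawn uniformly from 0 to 9; the uniform draw, the function name and the example series are assumptions for illustration.

```python
import random


def impute_masked(counts, seed=0):
    """Replace masked entries (None) with a random integer from 0 to 9."""
    rng = random.Random(seed)
    return [rng.randint(0, 9) if c is None else c for c in counts]


monthly = [14, None, 23, None, 11]   # hypothetical series; None marks a masked count
filled = impute_masked(monthly)
```

Because the imputed values are bounded by the masking threshold, the overall trend of a low-count series is retained, but month-to-month variation below 10 is replaced by noise.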

Despite the aforementioned limitations, our study presents a novel approach for counterfactual prediction to serve the goal of vaccine impact inference. The validation using simulated data under a variety of scenarios and an application using epidemiological data in four countries in this study demonstrated LASSO regression's ability to estimate PCV impact. Given its stable performance shown in this study and ease of implementation, we argue that LASSO regression can be useful to assess the impact of other vaccines and ultimately help process epidemiological data to inform health policy making.

Acknowledgements: We would like to thank Daniel M. Weinberger for making the data available and offering insightful comments, and Iris Artin for offering help with the use of the package "InterventionEvaluatR". We also thank Annette Aigner, Madlen Schranz, Elizabeth Goult, and Laura Barrero for their helpful discussions.


For each country, we reported the data period included in this study, the type of PCV introduced and the date of PCV introduction, as well as how we defined the evaluation period. We also presented the median monthly hospitalization counts with 2.5th and 97.5th percentiles for the outcome, all-cause pneumonia, from all age groups, in the pre-vaccine and post-vaccine periods, to illustrate the scope of the disease burden captured in these datasets.

We applied five methods (ITS, LASSO-SF, LASSO-SU, SC, and STL+PCA) to sixteen sets of 100 simulations. The sixteen sets can be grouped into four types of simulation scenarios: sets A to E, outcome simulated using 5 causal control variables; sets F to J, outcome simulated using 10 causal control variables; set K, same as set A but with outcome counts reduced to 10% to mimic sparse data; and sets L to P, outcome simulated using 3 causal control variables, which were then removed. We summarized the Incidence Rate Ratio (IRR) estimated for each of the 100 simulations using the mean IRR with standard deviation (SD). We summarized the precision of the IRR estimated for each of the 100 simulations using the mean width of 95% uncertainty intervals. We report the range of mean IRR and the range of SD, as well as the range of mean width of 95% uncertainty intervals, for each type of simulation scenario. The exception is set K, for which there was only one set of simulations; there we report the mean IRR with SD, the mean width of 95% uncertainty intervals, and coverage instead of a range.


Figure 3. Age-group-specific incidence rate ratios (IRR) in four countries, estimated by two LASSO methods and SC.

A-D) Each panel shows the age-group-specific incidence rate ratios (IRR) for all-cause pneumonia in a population whose infants were vaccinated with pneumococcal conjugate vaccines (PCV) compared to a counterfactual population in which PCV was never introduced, estimated by LASSO-SF (green), LASSO-SU

The copyright holder for this this version posted February 15, 2022. ; https://doi.org/10.1101/2022.02.12.22270888 doi: medRxiv preprint Figure 4 . Age-group-specific incidence rate ratios (IRR) regarding invasive pneumococcal diseases (IPD) and other disease endpoints in the US, estimated by two LASSO methods and SC.

A-C) Each panel shows the age-group-specific incidence rate ratios (IRR) for four disease endpoints in a population whose infants were vaccinated with pneumococcal conjugate vaccines (PCV) compared to a counterfactual population in which PCV was never introduced, estimated by A) LASSO-SF, B) LASSO-SU, and C) SC. The four endpoints used were invasive pneumococcal diseases (IPD) (red filled triangle), pneumococcal pneumonia (orange filled square), all-cause pneumonia as primary diagnosis or as first diagnosis after sepsis,


Streptococcus pneumoniae: Transmission, colonization and invasion

Estimates of the global, regional, and national morbidity, mortality, and aetiologies of lower respiratory infections in 195 countries, 1990-2016: a systematic analysis for the Global Burden of Disease Study

Global burden of 369 diseases and injuries in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study

A new pneumococcal capsule type, 10D, is the 100th serotype and has a large cps fragment from an oral streptococcus

Pneumococcal serotype evolution in Western Europe

Towards new broader spectrum pneumococcal vaccines: The future of indirect effects

Pneumococcal Conjugate Vaccine and Pneumococcal Common Protein Vaccines. Sixth Edit

Design and Analysis of Vaccine Studies

European Centre for Disease Prevention and Control. Disease factsheet about pneumococcal disease

Burden of invasive pneumococcal disease and serotype distribution among Streptococcus pneumoniae isolates in young children in Europe: impact of the 7-valent pneumococcal conjugate vaccine and considerations for future conjugate vaccines

Effect of ten-valent pneumococcal conjugate vaccine on invasive pneumococcal disease and nasopharyngeal carriage in Kenya: a longitudinal surveillance study

Estimating the population-level impact of vaccines using synthetic controls

Challenges in Estimating the Impact of Vaccination with Sparse Data

Estimated impact of the pneumococcal conjugate vaccine on pneumonia mortality in South Africa, 1999 through 2016: An ecological modelling study

Challenges to estimating vaccine impact using hospitalization data

An Introduction to Statistical Learning

Direct and indirect impact of 10-valent pneumococcal conjugate vaccine introduction on pneumonia hospitalizations and economic burden in all age-groups in Brazil: A time-series analysis

Hospitalizations for Pneumonia after a southern area of

Impact and effectiveness of 10 and 13-valent pneumococcal conjugate vaccines on hospitalization and mortality in children aged less than 5 years in Latin American countries: A systematic review

Declines in Pneumonia Mortality Following the Introduction of Pneumococcal Conjugate Vaccines in Latin American and Caribbean Countries

Indirect (herd) protection, following pneumococcal conjugated vaccines introduction: A systematic review of the literature

Changing trends in serotypes of S. pneumoniae isolates causing invasive and non-invasive diseases in unvaccinated population in Mexico. International Journal of Infectious Diseases

Pan American Health Organization. Data and Statistics (Immunization)

Evaluation of the effectiveness of pneumococcal conjugate vaccine for children in Korea with high vaccine coverage using a propensity score matched national population cohort

Pediatric Bronchitis: Practice Essentials

The management of acute bronchitis in children

pre-vaccine) 5151 (2636, 12209) (post-vaccine) 5036 (2745, 10092) Ecuador

A) Each row shows the estimates using a different method

Each column represents a scenario with outcome simulated with a different set of five causal control variables and one seasonal variable, from left to right: set A: health exams, bronchitis and bronchiolitis, dermatological condition, non-pneumonia infection, and non-pneumococcal septicemia; set B: non-pneumococcal septicemia, urinary tract infection (UTI), diabetes, stroke, and injury; set C: human immunodeficiency virus (HIV) infection, cholelithiasis, dermatological condition, health exams, diabetes, and bronchitis and bronchiolitis