key: cord-0453257-scqtgi5d
authors: Kunzmann, Kevin; Lingjaerde, Camilla; Bird, Sheila; Richardson, Sylvia
title: The `how' matters: A simulation-based assessment of the potential contributions of LFD tests for school reopening in England
date: 2021-03-02
journal: nan
DOI: nan
sha: 3a637fb975ea6a3c46abd1f4e2d0c7a649f84e2b
doc_id: 453257
cord_uid: scqtgi5d

During Covid-19 outbreaks, school closures are employed as part of governments' non-pharmaceutical interventions around the world to reduce the number of contacts and keep the reproduction number below 1. Yet, prolonged school closures have a profound negative impact on the future opportunities of pupils, particularly those from disadvantaged backgrounds, as well as additional economic and social impacts by preventing their parents from returning to work. Data on Covid-19 in children are sparse and policy frameworks are evolving quickly. We compare a set of potential policies to accompany the reopening of schools by means of an agent-based simulation tool. The policies and scenarios we model reflect the public discussion and government guidelines in early March 2021 in England before the planned nationwide reopening of schools on the 8th of March. A point of particular interest is the potential contribution of a more widespread use of screening tests based on lateral flow devices. We compare policies both with respect to their potential to contain new outbreaks of Covid-19 in schools and the proportion of schooldays lost due to isolation of pupils. We find that regular asymptomatic screening of the whole school as an addition to a policy built around isolation of symptomatic pupils and their closest contacts is beneficial across a wide range of scenarios, including when screening tests with relatively low test sensitivity are used. Multiple screening tests per week bring only small additional benefits in some scenarios. These findings remain valid when test compliance is not enforced, although the effectiveness of outbreak control is reduced.

How to control infections in schools while allowing pupils as much in-school contact with teachers as possible is an important question that governments throughout the world have grappled with. Balancing the health risks from infection of children in schools with the risks of loss of skills for the young and increase in inequality, the risks to child and parental mental health, and the economic and social impact of parents not being able to return to work is a challenging conundrum to resolve [The DELVE Initiative, 2020].

Since the start of the pandemic, many countries have incorporated school closures as part of their non-pharmaceutical interventions (NPI) implemented to control disease transmission [Thomas et al., 2021]. A report summarising evidence on schools and transmission from the Children's Task and Finish Group, submitted to SAGE on December 17, stated that accumulating evidence was consistent with increased transmission occurring among school children when schools are open, particularly in children of secondary school age. In addition, multiple data sources showed a reduction in transmission in children following schools' closure for half term [Office for National Statistics, 2020, Children's Task and Finish Group, 2020].

In England, following the end of the first lockdown, schools fully reopened in September and remained open throughout the autumn term. But, in view of the increasing circulation of the Variant of Concern (VOC) B.1.1.7, SAGE told government on December 22 2020 that it was highly unlikely that the stringency of and adherence to the set of NPI measures in place in England from November, which did not include school closures, would be sufficient to maintain the effective reproduction number R below 1 [SAGE, 2020]. In early January, in view of the increased transmission of the VOC, the UK government took the decision to postpone an announced programme of testing in schools, which relied in part on rolling out rapid tests using lateral flow devices (LFD), and to close schools till further notice [NHS Test & Trace, 2020]. During this period, there was intense discussion about which infection-control policies combining rapid testing and isolation would be both beneficial and feasible to implement in schools [Wise, 2020, Deeks et al., 2021] and how to evaluate their effectiveness, including by randomisation [Bird et al., 2005].

It is difficult to disentangle the part played by within-school child-to-child transmission from the knock-on effect of adult-to-child transmission chains and increased social contact when schools are opened. A recent modelling study using social contact matrices from surveys at times when schools were open or closed suggests that opening schools altogether could increase the effective reproduction number from 0.8 to between 1.0 and 1.5 [Munday et al., 2021].

Our work focuses on within-school transmission and directly addresses this important debate at a time when the reopening of schools is planned for March 8. We compare a set of NPI policies that take inspiration from measures that are currently discussed or that have already been implemented, and do so with respect to both outbreak control and their capability to enable face-to-face schooling. To this end, we propose a realistic agent-based model tailored to the school setting. We primarily focus on the bubble-based contact pattern recommended for primary schools in the UK but also consider a scenario where bubbles are not feasible to implement. This latter case is particularly relevant to secondary schools or settings where rooms are too small to implement effective between-bubble isolation. Following concerns about compliance with LFD testing, we also explore a scenario where non-compliance with asymptomatic LFD testing is modelled explicitly.

By its flexibility, the open access agent-based simulation prototype that we have built will extend to a variety of school and small population settings but here we focus on:

i) setting out the framework of our school SARS-CoV-2 agent-based model, which adapts the viral load based model of Larremore et al. [2021] to small-scale school settings;

ii) a range of testing policies including, as reference, the symptomatic Test & Trace recommendations as well as policies making use of rapid lateral flow tests in combination with specific isolation recommendations;

iii) uncovering the influence of key parameters like infectivity and test sensitivity on the effectiveness of the policies in schools;

iv) demonstrating that our tailored agent-based modelling allows relative ranking of policies with regard to offering a good compromise between maintaining infection control and avoiding large numbers of school days lost, thus providing inputs to help design control measures that are more likely to be good candidates for being evaluated in-context by specifically designed studies.

We have constructed our model around the following key assumptions, using current literature on SARS-CoV-2 infection and imperfect knowledge on school policy as both are evolving at pace.

1. The proportion of asymptomatic SARS-CoV-2 infections is believed to be higher in children than in adults [Hippich et al., 2020, He et al., 2021, Wald et al., 2020].

2. Transmission can occur from asymptomatic infections, both pre-symptomatic and never-symptomatic [Arons et al., 2020, Sutton et al., 2020, Oran and Topol, 2020].

3. Transmissibility is related to viral load (VL) [He et al., 2020] .

4. Transmissibility from asymptomatic infections may be lower, and is included in our set-up indirectly by prolonging VL clearance as done in Larremore et al. [2021] .

5. Delay from swab-date to PCR-result-date is seldom less than 24 hours [Fraser, 2021, Larremore et al., 2021].

6. Lateral flow devices give a non-quantitative test-result within 30 minutes and are billed as answering a different question than PCR-testing, namely: is a person likely to be contagious?

7. Innova LFD tests have been used for screening purposes in nursing homes (now only in conjunction with PCR testing), work-places and primary schools but robust published evaluation has been lacking [Department for Education, 2021b, Department of Health and Social Care, 2021, 2020].

8. Initially, PCR-confirmation of any LFD-positive results was intended. This is no longer the case for on-site LFD tests but LFD positive results obtained through home testing still need to be verified with a follow-up PCR test [Department for Education, 2021b, p. 30].

9. Plans were well advanced to evaluate (via cluster randomised trial), as the alternative to 10 days of self-isolation at home, that secondary school-pupils who are a close contact of a confirmed case may remain at school provided that their daily LFD tests are negative [Department for Education, 2021a].

10. The above policy initiative, known as daily-contact-testing, was expected to be trialled in secondary schools which already implement weekly-LFD tests for all pupils, but these plans may be overtaken by a newly reported policy shift for LFD tests to be used at home twice weekly for secondary school pupils [Department for Education, 2021b].

To assess the impact of various policies on the level of individual schools we adopt an agent-based approach where agents correspond to pupils. Contacts involving staff are not modelled explicitly for simplicity, as the policy choice is focused on the pupils. The overall model is composed of independent sub-models for i) the contact structure between individual pupils, ii) viral load and symptom status trajectories during an acute SARS-CoV-2 infection, iii) the infection probability depending on the latent viral load, and iv) the sensitivity of the tests (PCR or LFD) that might be required for a policy.

The time resolution of the overall model is daily, i.e. daily symptom status and viral load are determined at 07:30AM. We further assume that any policy intervention (screening tests, isolation) is executed before individuals have a chance to meet. This is an optimistic assumption but justifiable since a recent announcement by the Department for Education includes the possibility of screening tests being sent home from the 15th of March [Whittaker, 2021] . We consider a time horizon of 6 weeks which roughly corresponds to the length of a half-term.

The average size of a primary school in England was 281 pupils with an average class size of 27 [GOV.UK, 2021, academic year 2019/20]. English primary school education consists of six years. A typical primary school thus offers either one or two classes per year-group. We consider a school with two classes per year-group (12 overall) and 27 pupils per class, i.e. 324 pupils overall. We further assume that each class is subdivided into 3 bubbles of 9 pupils each. Here the term bubble refers to a group of pupils that is isolated as best as possible from other members of the same class or school [Department for Education, 2021c]. Although contact tracing is an effective tool to control an epidemic [Ferretti et al., 2020], social distancing and contact tracing within bubbles are deemed unrealistic for younger pupils. The degree of isolation between bubbles depends, among other factors, on the availability of large enough rooms and sufficient staff.

We represent the school structure as a three-level hierarchical population where each pupil belongs to a bubble nested within a class. The classes, in turn, are nested within a school. For each of these groups we assume a fixed probability of a risk-contact between any pair of members per school day.

Within-Bubble Contacts: The highest intensity contact at the bubble level is treated as reference and we set the daily probability of a risk-contact at the bubble level to $p_{\text{bubble}} = 100\%$. This means that each pair of pupils within a bubble is guaranteed to meet on every single school day unless a pupil is isolated.

Within-Class Contacts: Each pair of pupils within a class has a daily probability of an additional risk-contact of $p_{\text{class}}$.

Within-School Contacts: Each pair of pupils within the school has a daily probability of an additional risk-contact of $p_{\text{school}}$.

The magnitudes of the parameters $p_{\text{class}}$ and $p_{\text{school}}$ in relation to the 100% chance of having a risk-contact on the bubble level thus jointly represent the respective degree of isolation between groups on the different levels of the hierarchy. The contact probabilities on the class or school level also account for factors not explicitly modelled, such as indirect interactions via staff or contacts on the way to or from school.

Figure 1: Diagram of the contact structure between pupils; big black dots represent individual pupils; rounded rectangles represent bubbles (dark gray) or classes (light gray); one representative connection on the bubble-, class-, or school level is drawn as curved line annotated with its respective number of daily expected risk contacts.

To the authors' knowledge, data on the number of per-class or per-school contacts of young children are not available and would depend heavily on the context-specific definition of what is considered a 'risk-contact'. Parameter choices thus have to remain somewhat arbitrary. For our primary analysis, we chose $p_{\text{class}} = 3/(|\text{class}| - 1)$ and $p_{\text{school}} = 1/(|\text{school}| - 1)$. This implies that, on top of the 8 guaranteed contacts within their bubble, each pupil has an expected number of 3 additional daily risk-contacts within their class and one additional daily risk-contact with any pupil in the school (8 + 3 + 1 = 12 in total). The expected number of contacts decreases naturally as pupils start to go into isolation (see Section 2.5). The adjacency matrix of the school structure used for the primary analysis is shown in Figure A.7. We also investigate a scenario where effective between-bubble isolation is impossible and the whole class becomes one bubble (see Section 3.4).
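The arithmetic behind the 12 expected daily risk-contacts can be checked with a minimal Python sketch. It is not taken from the authors' Julia package; the pupil indexing (consecutive integers, bubbles nested in classes by integer division) and all function names are assumptions made for illustration.

```python
import random

# School layout from the text: 12 classes of 27 pupils, 3 bubbles of 9 per class.
N_CLASSES, CLASS_SIZE, BUBBLE_SIZE = 12, 27, 9
SCHOOL_SIZE = N_CLASSES * CLASS_SIZE  # 324 pupils

# Daily pairwise risk-contact probabilities of the primary analysis.
P_BUBBLE = 1.0
P_CLASS = 3 / (CLASS_SIZE - 1)
P_SCHOOL = 1 / (SCHOOL_SIZE - 1)


def pair_probabilities(i, j):
    """Per-layer contact probabilities for two distinct pupils i and j.

    Assumption: pupils are numbered 0..323 so that integer division by the
    group size yields the class and bubble index.
    """
    probs = [P_SCHOOL]                       # every pair shares the school
    if i // CLASS_SIZE == j // CLASS_SIZE:
        probs.append(P_CLASS)                # additional class-level contact
    if i // BUBBLE_SIZE == j // BUBBLE_SIZE:
        probs.append(P_BUBBLE)               # guaranteed bubble-level contact
    return probs


def expected_daily_contacts(i):
    """Expected number of daily risk-contacts of pupil i with nobody isolated."""
    return sum(sum(pair_probabilities(i, j))
               for j in range(SCHOOL_SIZE) if j != i)


def sample_daily_contacts(i, rng):
    """Sample one school day: maps contacted pupil -> number of risk-contacts."""
    contacts = {}
    for j in range(SCHOOL_SIZE):
        if j == i:
            continue
        n = sum(rng.random() < p for p in pair_probabilities(i, j))
        if n > 0:
            contacts[j] = n
    return contacts
```

Calling `expected_daily_contacts(0)` reproduces the 8 + 3 + 1 decomposition up to floating-point error, and every sampled day contains all eight bubble mates because the bubble-level probability is 1.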

Data on the evolution of viral load (VL) in children during an acute infection with SARS-CoV-2 are rare but cross-sectional data suggest that there is no substantial difference between VL of symptomatic children and adults [Baggio et al., 2020, Jones et al., 2020]. We thus build on available evidence for VL-trajectories over time in adults and the model proposed in Larremore et al. [2021]. Here, each individual's VL-trajectory is determined by a set of pivot points with ordinates on the $\log_{10}(\mathrm{VL})$ scale and subsequent linear interpolation of the pivot points. The pivot points are

start of fast exponential growth: $(t_1, \log_{10}(\mathrm{VL}_{\text{start fast growth}}))$

peak log-10 VL: $(t_2, \log_{10}(\mathrm{VL}_{\text{peak}}))$

clearance point: $(t_3, \log_{10}(\mathrm{LLI}))$

where LLI is the viral load at the lower limit of infectivity, a point where the infection probability is zero or close to zero (see Section 2.3). Larremore et al. [2021] used $\mathrm{LLI} = 10^6$ and $\mathrm{VL}_{\text{start fast growth}} = 10^3$. We assess the sensitivity with respect to LLI in Section 3.6. The distribution of the $\log_{10}(\mathrm{VL})$-trajectories is given implicitly by the following sampling procedure. Firstly, it is determined whether the trajectory will ultimately become symptomatic by sampling from a Bernoulli distribution with probability $p_{\text{symptomatic}}$. Secondly, the first pivot time $t_1$ is sampled uniformly between 2.5 and 3.5 days after the infection time $t_0 = 7.5/24$. Here we deviate from Larremore et al. [2021] since they consider a continuous-time model while we discretize all relevant values at 07:30AM. Thirdly, the peak VL-delay with respect to $t_1$ is sampled as $t_2 - t_1 = 0.5 + \min(3, X)$ where $X \sim \text{Gamma}(1.5)$. The corresponding peak log-10 viral load, $\log_{10}(\mathrm{VL}_{\text{peak}})$, is sampled uniformly from $[7, 11]$. The timing of the third pivot $t_3$ is then sampled conditional on whether or not an individual is symptomatic: for asymptomatic cases, $t_3 - t_2 \sim \text{Unif}(4, 9)$. For symptomatic cases, a symptom onset time with delay $t_{\text{symptoms}} - t_2 \sim \text{Unif}(0, 3)$ is sampled and this symptom onset delay is added to $t_3$. The latter implies that symptomatic cases have a slower clearance of their peak VL but the same peak VL. For symptomatic individuals, we assume that the symptomatic period lasts from the sampled onset time until the viral load drops under LLI.

We set the initial viral load $\mathrm{VL}_{7.5/24} = 1$ and extrapolate linearly after $t_3$ until $\mathrm{VL}_t = 1$ again. Outside of this interval, $\mathrm{VL}_t = 0$, i.e. $\log_{10}(\mathrm{VL}_t) = -\infty$ (see Figure 2 for example trajectories). We assume a daily rate of 1% for Covid-like symptoms, such as a dry cough, due to non-Covid-related causes.
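The sampling procedure for a single trajectory can be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' implementation: the function names are hypothetical, Gamma(1.5) is assumed to mean shape 1.5 with unit scale, the infection is placed at $t_0 = 7.5/24$, and for symptomatic cases the clearance delay is taken as the Unif(4, 9) draw plus the onset delay.

```python
import math
import random

T0 = 7.5 / 24            # infection time (07:30AM on day 0)
LOG10_VL_START = 3.0     # log10 VL at start of fast growth (10^3)
LOG10_LLI = 6.0          # log10 of the lower limit of infectivity (10^6)


def sample_trajectory(p_symptomatic, rng):
    """Sample pivot points (time, log10 VL) and the symptom onset time.

    Returns (pivots, onset) where onset is None for asymptomatic cases.
    """
    symptomatic = rng.random() < p_symptomatic
    t1 = T0 + rng.uniform(2.5, 3.5)
    # Assumption: Gamma(1.5) is shape 1.5 with scale 1.
    t2 = t1 + 0.5 + min(3.0, rng.gammavariate(1.5, 1.0))
    peak = rng.uniform(7.0, 11.0)
    t3 = t2 + rng.uniform(4.0, 9.0)
    onset = None
    if symptomatic:
        delay = rng.uniform(0.0, 3.0)
        onset = t2 + delay
        t3 += delay                      # slower clearance, same peak
    # Extrapolate the clearance slope beyond t3 until log10 VL is 0 (VL = 1).
    slope = (LOG10_LLI - peak) / (t3 - t2)   # negative because peak >= 7
    t_end = t3 - LOG10_LLI / slope
    pivots = [(T0, 0.0), (t1, LOG10_VL_START), (t2, peak),
              (t3, LOG10_LLI), (t_end, 0.0)]
    return pivots, onset


def log10_vl(t, pivots):
    """Piecewise-linear log10 VL at time t; -inf outside the infection window."""
    if t < pivots[0][0] or t > pivots[-1][0]:
        return -math.inf
    for (ta, ya), (tb, yb) in zip(pivots, pivots[1:]):
        if ta <= t <= tb:
            return ya + (yb - ya) * (t - ta) / (tb - ta)
    return -math.inf
```

Evaluating `log10_vl` at integer days plus `T0` gives the daily 07:30AM values that the rest of the model would consume.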

Given the short time-horizon of only 6 weeks, we assume that individuals who already went through an infection are no longer susceptible to infection ('short term immunity'). We model the probability to infect a susceptible individual during a risk-contact ('infection probability') as a function $f(\mathrm{VL}_t)$ of the infected individual's latent viral load on the day of the risk-contact $t$. Larremore et al. [2021] conduct sensitivity analyses for different functional forms of $f$ and base their main results on a model where the infection probability is assumed to be proportional to $\log_{10}(\mathrm{VL}_t)$ if a lower limit of infectivity, LLI, is exceeded, i.e.,

$$
f_{\text{Larremore}}(\mathrm{VL}_t) := \min\Bigl(1,\ \max\bigl(0,\ \gamma\,\bigl[\log_{10}(\mathrm{VL}_t) - \log_{10}(\mathrm{LLI})\bigr]\bigr)\Bigr). \tag{1}
$$

Whenever the LLI is fixed externally, infectivity only depends on the choice of γ, referred to henceforth as infectivity parameter. We follow the suggestion of Larremore et al. [2021] to match γ to a target school-level reproduction number R S (see Section A.2.1). Here, the reproduction number is defined as the average number of infections from a given index case in a completely susceptible school population, i.e. no isolation or immunity, followed for 21 days. 
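As a small illustration, equation (1) can be read as $\gamma$ times the excess of $\log_{10}(\mathrm{VL}_t)$ over $\log_{10}(\mathrm{LLI})$, clipped to the unit interval. The Python sketch below implements that reading; the function name is hypothetical and the value of $\gamma$ used in the usage note is purely illustrative, since the paper calibrates $\gamma$ to a target $R_S$ rather than fixing it directly.

```python
import math


def infection_probability(vl, gamma, lli=1e6):
    """Infection probability per risk-contact for viral load `vl`.

    Implements min(1, max(0, gamma * (log10(vl) - log10(lli)))).
    A viral load of zero (log10 = -inf) gives probability zero.
    """
    if vl <= 0:
        return 0.0
    excess = math.log10(vl) - math.log10(lli)
    return min(1.0, max(0.0, gamma * excess))
```

With an illustrative `gamma = 0.1`, a viral load of `1e8` lies two log-units above the default LLI and yields a probability of about 0.2, while any viral load at or below the LLI yields zero.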

Sensitivity of LFD tests has been shown to depend on viral load [University of Liverpool, 2020, Lennard et al., 2021]. This is a crucial feature since a joint dependence of test sensitivity and infection probability on the latent viral load trajectories implies a positive correlation between the two. Following data presented in Lennard et al. [2021], we consider a logistic regression model for the functional form $g(\mathrm{VL})$ of the test sensitivity as a function of viral load (equation (2)).

We calibrate the sensitivity curve by fitting it to cross-sectional data assuming that 50% of individuals are asymptomatic (see Section A.2.2). The specificity of LFD tests can generally be considered fairly high and we assume a fixed value of 0.998 [University of Liverpool, 2020].
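A generic logistic sensitivity curve of this kind can be sketched in Python as below. The exact parameterisation and the fitted coefficients are not given in this part of the text, so the linear predictor in $\log_{10}(\mathrm{VL})$ and the `slope` and `intercept` arguments are assumptions for illustration only; the fixed specificity of 0.998 is taken from the text.

```python
import math


def lfd_sensitivity(vl, slope, intercept):
    """Logistic sensitivity g(VL) with a linear predictor in log10(VL).

    `slope` and `intercept` are hypothetical parameters, not fitted values.
    """
    if vl <= 0:
        return 0.0
    z = slope * math.log10(vl) + intercept
    return 1.0 / (1.0 + math.exp(-z))


def lfd_positive_probability(vl, slope, intercept, specificity=0.998):
    """Probability that one LFD test returns a positive result.

    Uninfected individuals (vl <= 0) test positive at the false-positive
    rate 1 - specificity; infected ones at the VL-dependent sensitivity.
    """
    if vl <= 0:
        return 1.0 - specificity
    return lfd_sensitivity(vl, slope, intercept)
```

For example, with the illustrative choice `slope=1.0, intercept=-5.0` the curve crosses 50% sensitivity at a viral load of `1e5`.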

Concerns have been raised that, due to person specific effects, assuming independence between results of repeated tests is unrealistic (see comments by Jon Deeks, et al. on Kmietowicz [2021]). In our model, there is an implied dependence between subsequent test results of an individual as these are functionally linked to the latent VL. Importantly, within-individual autocorrelation of test results will directly affect the performance of policies which rely on repeated screening tests: if the autocorrelation is high, repeated testing of the same individual has less benefit than under a model with less autocorrelation. With low autocorrelation, even a screening test with low sensitivity might be able to identify pre-symptomatic infections after two or three days of daily testing.

We explore the impact of increased within-subject autocorrelation of test results by imposing an auto-regressive structure on the screening test sensitivity. For each individual and each time point $t$, we first check whether an LFD test has been done within a time-window consisting of the three days previous to $t$. If no testing took place in the window, equation (2) is not modified. If one or several tests were carried out in that window, we amend equation (2) as follows: let $x_{i,t}$ be the most recent LFD test result in the time-window for individual $i$ ($x_{i,t} = 0$ for negative, $x_{i,t} = 1$ for positive). We then define

Here a, 0 ≤ a ≤ 1 is the auto-regression coefficient and a large a implies that the results of repeated tests are heavily biased towards the respective last result. The effect of a on the autocorrelation of repeated test results is visualized in Figure 3 . Note that even for a = 0 the smoothness of the VL-trajectories implies implicit substantial autocorrelation between repeated tests. If a testing scheme only re-tests the same individual after a time gap between individual tests larger than 3 days, the test characteristics remain unchanged. In particular, cross-sectional testing of a population (as done with the Liverpool study) is not affected. Testing policies that rely on repeated testing of individuals within the specified time-window are, however, affected since the chance of repeated false negative findings is increased when the initial test was itself a false negative. This is particularly important when considering policies like test for release (see Section 2.5.5).
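The auto-regressive adjustment can be illustrated with a short Python sketch. The defining equation is not reproduced in this text, so the convex combination used below, $(1 - a)\,g(\mathrm{VL}_t) + a\,x$, is an assumption; it was chosen because it pulls the result towards the last test outcome for large $a$ and reproduces the bound quoted in the results, where $a = 0.75$ caps the probability of a positive test after a recent negative at 25%. The function name is hypothetical.

```python
def ar_adjusted_sensitivity(base_sensitivity, a, last_result=None):
    """Sensitivity after accounting for the most recent LFD result.

    base_sensitivity: g(VL_t) from the unmodified sensitivity model
    a:                auto-regression coefficient, 0 <= a <= 1
    last_result:      0 (negative), 1 (positive) or None if no LFD test
                      was taken in the preceding three-day window

    Assumed form: (1 - a) * g(VL_t) + a * last_result.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    if last_result is None:
        return base_sensitivity      # no recent test: model unchanged
    return (1.0 - a) * base_sensitivity + a * float(last_result)
```

Under this assumed form, `ar_adjusted_sensitivity(1.0, 0.75, last_result=0)` evaluates to 0.25, and setting `a = 0` leaves the sensitivity untouched regardless of earlier results.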

A crucial feature of the overall model is the assumed relation between the test sensitivity and the infection probability: if it can be assumed that an LFD test is highly sensitive while the infection probability is still small, test-based policies for containment are easier to implement. We thus also explore a scenario where the LLI is much lower, $\mathrm{LLI} = 1000$, instead of $\mathrm{LLI} = 10^6$ as suggested by Larremore et al. [2021] (see Figure A.10 and Section A.5).

We compare different test and isolation policies that have been discussed in the context of reopening primary schools in England. For simplicity we do not consider multi-level strategies with policies on the class or school level but only policies that intervene on the bubble level.

In all cases, we assume that the swab for a confirmatory PCR follow-up test is taken on the day of symptom onset or of testing positive with an LFD screening test. Note that a PCR follow-up test is no longer required for on-site LFD testing according to the latest guidance released by the Department for Education [2021b]. We assume that the turnaround time for a PCR test is two days (including the swab-day) [Fraser, 2021]. The required isolation time for PCR confirmed cases is 10 days starting with the swab-day [NHS Test & Trace, 2020].

PCR tests are more sensitive than antigen-based screening tests and we assume a flat sensitivity of 97.5% above a limit of detection of 300 cp / ml and a specificity of 100% (see e.g. FDA [2020] for a detailed listing of different assays' limit of detection). Across all policies we assume that any pupil who becomes symptomatic is immediately isolated at home before school on the day of symptom onset and a swab for a follow-up PCR test is taken. Such a pupil only returns to school after isolating for either 10 days from their swab date (positive result) or 2 days (negative swab test, only isolated during the PCR turnaround time).

The reference policy follows the current Test & Trace recommendations. Its implementation assumes that the close contacts of an index case are the 8 other children in the bubble of the index case. This reference policy does not use LFD tests and solely relies on symptom-driven isolation. If an index case shows symptoms and starts their self-isolation period, the remaining members of the bubble (and class) continue to attend school until the test result of the symptomatic index case becomes available. Only if the index case's PCR test turns out to be positive do the remaining individuals in the bubble isolate for the remaining 8 days. Newly symptomatic cases while in isolation are also checked with PCR tests and newly emerging PCR-positive results reset the isolation clock for the entire bubble.

As a simple-to-implement variant of the reference policy, we consider an extension where the entire school is closed on Thursdays and Fridays, and teaching switched to online. Otherwise the same procedures as under the reference policy apply. This effectively introduces a mini-lockdown of four days over the extended weekend which facilitates the identification of symptomatic cases before they can spread the virus in school.

To assess the added benefit of regular screening tests we consider the reference policy extended by regular rapid LFD screening tests on Mondays before going into class for every pupil in the school (except those already isolating). Since LFD tests are considerably more specific than mere symptoms, we assume that a positive LFD test result for an index case leads to an immediate isolation and return home of the entire bubble of the index case. The bubble (and the index case) return to school either after 2 days if the index case's PCR test turns out to be negative (2 days isolation) or after the full 10 days of isolation if the index case's PCR test turns out to be positive. Note that due to the 7 days gap between the screenings, this policy would not be affected by the introduction of additional retest autocorrelation (see Section 2.4).

Policies with multiple screening tests per week have been discussed. Austria, for instance, has laid out a plan for twice-weekly screening tests at schools [Haseltine, 2021] . We thus also consider a policy that extends the reference policy by twice-weekly testing on Mondays and Wednesdays. In this case, the results of the Wednesday screening will be affected if we include positive autocorrelation (a > 0) between the tests (see Section 2.4).

Finally, we consider a policy that we refer to as 'test for release'. Such an approach was proposed in early 2021 to avoid preemptive bubble isolation in schools [Department for Education, 2021a]. Test for release avoids bubble-isolation completely. Instead, under a test for release policy, members of the bubble around symptomatic or LFD-positive index cases are followed up using daily LFD testing. No preemptive isolation on the bubble level is imposed. Only newly symptomatic or LFD-positive individuals isolate, while the remainder of the bubble attends school. Symptomatic or LFD-positive cases are told to self-isolate immediately and are then followed up with PCR tests as under the reference policy. The bubble-wide LFD testing starts on the day of the index case's triggering event (either symptom onset or a positive LFD test) and continues for up to 7 school days, i.e. neither Saturdays nor Sundays count towards the LFD follow-up days. Daily bubble-contact testing is terminated early if the index case's follow-up PCR test turns out to be negative (after 2 days).

We implemented the individual components of the overall model in a package [Kunzmann et al., 2021a] for the programming language Julia [Bezanson et al., 2017] .

For each scenario, we reran the simulation 250 times to capture the variability of the outcome measures of interest. Each run was conducted by first initialising the individuals and the school structure according to the specified scenario. The start day is 0 and we assume that no pupils are infected at onset. For each day of the simulation (6 weeks, 42 days) we then

1. Randomly sample new school-external infections for each pupil. We use a fixed Binomial probability for each pupil and day of 1/324/7, i.e. 1/(324 × 7), which results in one expected external infection per week.

2. If school day (default: Monday to Friday): Execute the test and isolation policy. This entails checking for symptomatic cases and/or conducting LFD testing if specified. Isolation of individuals or bubbles is then handled according to the respective policy.

3. If school day: Randomly sample risk contacts for pupils not isolating according to the school contact structure, i.e., on the bubble level, the class level, and the school level.
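The three daily steps can be arranged into a loop skeleton such as the following Python sketch. It is only an outline under stated assumptions: day 0 is taken to be a Monday, the policy and contact-sampling steps are passed in as hypothetical callbacks, and within-school transmission itself is left to those callbacks. It is not the authors' Julia implementation.

```python
import random

N_PUPILS = 324
N_DAYS = 42                       # 6 weeks
P_EXTERNAL = 1 / N_PUPILS / 7     # one expected external infection per week


def is_school_day(day):
    """Monday to Friday, assuming day 0 is a Monday."""
    return day % 7 < 5


def run_once(rng, apply_policy, sample_contacts):
    """One simulation run; returns a dict pupil -> day of external infection.

    apply_policy(day, infection_day) and sample_contacts(day, infection_day)
    are hypothetical callbacks standing in for steps 2 and 3.
    """
    infection_day = {}
    for day in range(N_DAYS):
        # Step 1: school-external infections of pupils not yet infected.
        for pupil in range(N_PUPILS):
            if pupil not in infection_day and rng.random() < P_EXTERNAL:
                infection_day[pupil] = day
        if is_school_day(day):
            apply_policy(day, infection_day)      # Step 2: test and isolate
            sample_contacts(day, infection_day)   # Step 3: risk contacts
    return infection_day
```

Repeating `run_once` with independent random number generators, 250 times per scenario in the paper, would give the run-to-run variability of the outcome measures.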

The plots used in this manuscript were generated using a combination of R [R Core Team, 2020] and Julia and the source code is available online [Kunzmann et al., 2021b].

The baseline scenario considered is based on a fraction of 50% asymptomatic cases [Hippich et al., 2020], an expected number of weekly community infections of 1, $\mathrm{LLI} = 10^6$, and no additional within-subject autocorrelation of test results ($a = 0$). We set $R_S = 3$ and fix the mean LFD test sensitivity to be 60%. Recall that $R_S$ has been calibrated specifically for our school-based three-level contact pattern and choice of probability of contacts between pupils, as described in Section A.2.1. We then consider extensive sensitivity analyses around this baseline scenario.

We first look at the relative effectiveness of the different policies in terms of containing the number of infections among pupils and the number of school days lost, the main criteria of interest for comparing policies. Figure 4 shows the proportion of pupils infected over the 6-week time horizon (cumulatively) on the left and the total number of school days lost on the right for the 5 policies.

In terms of containing school outbreaks, the reference policy, which relies on symptomatic testing only, is neither better nor worse than the LFD-based test for release approach. Additional regular weekly asymptomatic testing on Mondays clearly improves outbreak control over the reference policy. A second regular screening on Wednesday improves containment only marginally. The extended weekend scenario gives intermediate results in terms of containment while considerably increasing the number of school days lost. The reference policy leads to a small increase in school days lost (median: 3.7%) relative to the test for release policy (median: 1.8%). Both the reference policy and test for release, however, fail to achieve sufficient containment in terms of cumulative infections (see Figure 4).

Figure 5: Cumulative proportion of infected pupils over a 6-week horizon by $R_S$ and the fraction of asymptomatic cases for all policies for a mean LFD test sensitivity of 60% (panel A), and by mean LFD test sensitivity and AR coefficient for $R_S = 3$, 50% asymptomatic and all policies using LFD tests (panel B).

The differences between policies only start to emerge in the medium infectivity scenario (R S = 3) and are exacerbated in the high infectivity scenario (R S = 6). This latter is unrealistic but useful to draw out clearly the differences. For R S = 1.5, all approaches are equally viable for containment and choices should be made on other considerations.

As expected, an increased proportion of asymptomatics leads to a deterioration of infection containment for all policies. But it is particularly interesting to see that going from 25% to 75% asymptomatics in the pupil population has a fairly similar impact on the reference policy and the test for release one. It is impressive to see that even in the most challenging set-up, with $R_S = 6$, test sensitivity as low as 40% and a proportion of asymptomatics as high as 75%, the additional Monday screening keeps infections low, while neither the reference nor the test for release policy obtains good control, with test for release faring slightly worse.

As already noted in Larremore et al. [2021] , Figure 5 shows that the sensitivity of the LFD test has only a modest impact on the performance of policies involving such tests. The Monday screening policy shows only slightly more variability in performance in the extreme scenario of low sensitivity and high infectivity (top right corner of Figure 5 ).

One unique feature of our agent-based model is how we have allowed for additional autocorrelation between successive test results beyond that implied by the intrinsic dependence on the latent VL. The bottom part of Figure 5 shows results both with (a = 0.75) and without (a = 0) an auto-regressive component.

A value of 0.75 for the auto-regressive component is fairly high and implies that the probability of a positive test result within 3 days of a negative initial result is at most 25% -even if the test characteristics imply a sensitivity of 100%. This relatively extreme scenario was chosen since the intrinsic dependence between repeated tests is already high (see Figure 3) and smaller values of a have little impact on results (data not shown). Despite this, the impact of higher autocorrelation on containment in test-based policies is fairly modest in comparison to the sampling variability even under this extreme scenario. A clear difference is only discernible for 'test for release' in scenarios with relatively bad operating characteristics of the LFD test (mean sensitivity of 40%) and high infectivity (see top right corner in lower part of Figure 5 ).

Additional testing on Wednesdays (not just Mondays) can be even more effective, particularly in the high-infectivity (R_S = 6) scenario, but the difference is small. Merely extending the weekend by two days does not improve containment substantially over the reference policy.

Finally, note that we have chosen to present the cumulative number of infections, but that an alternative metric to evaluate containment would be the mean daily number of infectious and non-isolating pupils. Using the alternative metric made no difference to the comparison of policies (data not shown), so the two metrics can be treated as interchangeable for this purpose.

The health impact associated with Covid-19 is largely determined by age and is much smaller in young children. This implies that a sole focus on the number of infections over the 6-week period that we consider in our simulation study is an insufficient performance measure for policies in a primary school context. The trade-off that each policy makes between schooldays missed and the effectiveness of the containment of new outbreaks is a key performance indicator. The fraction of schooldays missed is plotted against the fraction of ultimately infected individuals in Figure 6.

Since all policies incorporate some form of isolation component once new cases are detected, the proportion of schooldays missed is positively correlated with the cumulative number of infections. Policies clustering above the first bisector favour containment over attendance. Interestingly, the reference policy is dominated by the 'test for release' one when considering the trade-off between attendance and containment, although both fare poorly in terms of their capability to control new outbreaks in high-infectivity scenarios (see also Figure 5).

Figure 6: Fraction of schooldays missed plotted against fraction of ultimately infected pupils; black line is first bisector; baseline case (central plot) with additional infectivity (R_S) scenarios; 50% asymptomatic, and no additional autocorrelation for repeated tests; results for higher/lower mean sensitivity are qualitatively similar (data not shown).

The proposed bubble isolation concept might be infeasible in individual institutions for a number of reasons. In primary schools, there might not be enough room to physically separate groups of young children, or additional staff might be required to enforce effective separation between bubbles during class. Moreover, in secondary schools, the concept of a 'bubble' is not relevant.

As an additional scenario of interest, we consider the case of a single bubble per class. In effect this means that the contact structure of the 27 pupils in the class is now that assumed for a bubble, i.e. that each pair of pupils in the class has one daily risk contact, and that all policies are executed at the class level. The altered class structure leads to an increase in expected daily risk-contacts per pupil as compared to a class with 3 bubbles of 9 pupils each. This, in turn, increases the R_S for any given infectivity constant γ. For the sake of comparability between scenarios, we do not re-calibrate R_S to this new 'one bubble' class structure.
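The increase in expected daily risk-contacts can be made concrete with a small calculation. The sketch below uses the contact probabilities given in the caption of Figure A.7 and the school size of 12 classes of 27 pupils. It assumes that the bubble, class and school layers contribute additively to the expected number of contacts and that the class layer applies to all classmates; the function name is hypothetical.

```python
def expected_daily_contacts(class_size=27, bubbles_per_class=3, n_classes=12):
    """Expected daily risk-contacts of one pupil (sketch, additive layers assumed)."""
    if class_size % bubbles_per_class != 0:
        raise ValueError("class_size must be divisible by bubbles_per_class")
    bubble_size = class_size // bubbles_per_class
    school_size = class_size * n_classes

    # Pair-wise contact probabilities as stated in the Figure A.7 caption.
    p_bubble = 1.0
    p_class = 3.0 / (class_size - 1)
    p_school = 1.0 / (school_size - 1)

    within_bubble = (bubble_size - 1) * p_bubble  # every bubble mate, once a day
    within_class = (class_size - 1) * p_class     # 3 expected class-level contacts
    school_wide = (school_size - 1) * p_school    # 1 expected school-wide contact
    return within_bubble + within_class + school_wide


three_bubbles = expected_daily_contacts(bubbles_per_class=3)
one_bubble = expected_daily_contacts(bubbles_per_class=1)
```

Under these assumptions a pupil has about 12 expected daily risk-contacts with three bubbles of 9 (8 within the bubble) and about 30 with a single bubble per class (26 within the bubble), which is why R_S rises for a fixed γ.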

Our simulations indicate that the increased number of expected daily contacts more than offsets the wider scope of policy execution (i.e. isolation of the whole class if there is a positive case, etc.) and that, in consequence, the containment properties of most policies are worse than under an effective bubble partition of the whole class. Jointly, the increased number of contacts and the wider scope of the respective isolation policies lead to an increased variability of outcomes, but the qualitative results on relative effectiveness of the policies remain the same (see Figure A.11).

All preceding scenarios assumed perfect compliance of individuals with the respective testing schemes (both PCR and LFD). PCR tests are usually conducted as follow-up to either becoming symptomatic or receiving a positive result from a screening test, and it is reasonable to assume a high compliance rate. For asymptomatic LFD tests, this is not necessarily the case, and compliance rates of children and parents as low as 40% cannot be ruled out in practice [Wheale and Adams, 2021]. We explore the impact of non-compliance by assuming that each pupil has a latent 'LFD test compliance probability' of actually carrying out a policy-recommended LFD test. For simplicity, we also assume that failure to comply with an LFD testing request does not affect their compliance with other recommendations such as isolation, and that non-compliant children attend school alongside the compliant children, a worst-case scenario. It is reasonable to assume that the willingness to comply with LFD tests varies between pupils, and we model this by drawing individual compliance probabilities from a U-shaped Beta(a, b) distribution with mean 0.66 (see Figure A.12). Again, we found that increased non-compliance reduces the effectiveness of measures slightly without changing the relative efficiency of different policies (see Section A.4 for results and details of the implementation).
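A minimal sketch of this compliance model is given below, using the Beta(2/15, 1/15) parameters stated in the appendix. The function names are hypothetical and the code only illustrates the sampling step, not the authors' implementation.

```python
import random


def draw_compliance_probabilities(n_pupils, a=2 / 15, b=1 / 15, seed=None):
    """Draw one latent LFD compliance probability per pupil from Beta(a, b)."""
    rng = random.Random(seed)
    return [rng.betavariate(a, b) for _ in range(n_pupils)]


def takes_lfd_test(compliance_probability, rng):
    """Decide whether a pupil carries out one recommended LFD test."""
    return rng.random() < compliance_probability


# Illustration: with both parameters below 1 the density is U-shaped, so
# most pupils either almost always or almost never comply, while the
# population mean stays at a / (a + b) = 2/3.
probabilities = draw_compliance_probabilities(50_000, seed=1)
mean_compliance = sum(probabilities) / len(probabilities)
share_intermediate = sum(0.1 < p < 0.9 for p in probabilities) / len(probabilities)
```

Because each pupil keeps the same latent probability across all testing occasions, an individual's compliance decisions are correlated over time even though each single decision is drawn independently.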

A critical factor determining the effectiveness of LFD-test-based policies is the ratio of test sensitivity relative to the infection probability per risk-contact. If test sensitivity is high before individuals show symptoms or have a substantial probability of infecting others, it is easier to detect asymptomatic cases and contain outbreaks. Vice versa, a larger limit of infectivity or worse operating characteristics of an LFD leads to longer time windows of transmitting the virus during the pre-symptomatic or even asymptomatic phase (see Figure A.10). We investigate the impact of lowering LLI from 10^6 (original value proposed in [Larremore et al., 2021]) to LLI = 10^3. To allow for a fair comparison, we re-calibrate γ to match the target R_S values again (see Figure A.8). This approach allows a more targeted comparison of the relative performance of policies with respect to when infections occur while keeping the overall level of 'infectiousness' at a comparable level. Detailed results for this scenario are shown in Figure A.14. The overall structure and relative performance characteristics remain unchanged although containment of outbreaks is impeded. However, this affects all policies to some extent, irrespective of whether or not they make use of LFD tests. Twice-weekly asymptomatic screening in addition to the reference policy of symptomatic bubble isolation is still able to contain outbreaks effectively.

A first and important step to mitigate the impact of schools on the overall infection rate is to control the child-to-child transmission within the school, and this is the question that we addressed in this paper.

Any model necessarily has to simplify, and the choice of modelling tool is dictated by the focus of the analysis at hand. Other agent-based simulation tools are available and were used to simulate policy impact during Covid-19 outbreaks. However, these models tend to focus on larger-scale settings [Silva et al., 2020, Li and Giabbanelli, 2021] or local geo-spatial aspects of transmission [Vermeulen et al., 2020]. The tool openABM [Hinch et al., 2020, Oxford Big Data Institute: Pathogen Dynamics Group, 2021] allows the evaluation of very flexible NPIs, including delayed reaction to tests, and allows agent-based simulations on a much larger scale than single schools. However, for our application, openABM still lacks the very fine-grained control required to implement the 'test for release' approach and the detailed model for LFD-test sensitivity as a function of viral load. Our agent-based simulation has been set up carefully to capture important features of the SARS-CoV-2 infection process and how they bear on LFD test results. It has been specifically adapted to the contact structure in schools and has considered a range of policies that have been discussed in the UK or abroad. We stress that we have followed Larremore et al. [2021] in our simulation of VL, a model which has been criticised by Deeks et al. [2021] as being unrealistically light tailed. In future work, we will investigate the impact of increasing the variability of VL in the tails.

The recently released school policy [Department for Education, 2021b] recommends repeated testing. We have taken a simple approach to model compliance, allowing for overdispersion. While some data are available, compliance patterns under repeated testing policies are still largely speculative. It will thus be important to track and characterise compliance, so that in the future realistic modelling of compliance can be calibrated against data. We do not distinguish between self-testing at home (as currently planned in the UK) and supervised testing before attending schools.

Further aspects that we did not look into may be of importance when considering the impact of policies in the context of school re-openings. Examples include the reduction in within-household transmission from children being at school, or the adult work-days gained from children being at school. Moreover, we have not considered any potential behavioural impact of a false negative test on the contact pattern of pupils. There has been some discussion of this as a potential issue, but behavioural modelling is beyond the scope of our work.

Despite the limitations posed by a lack of detailed longitudinal data to fit more complex joint models of viral load, infectivity, and test sensitivity, we reach the following conclusions:

1. Policies cannot be judged on either their ability to contain outbreaks or the amount of face-to-face schooling that they enable alone. Performance can only be judged by considering these quantities jointly.

2. Depending on the scenario, the distribution of the outcomes of interest may be heavy tailed and simple mean comparison may fail to capture adequately the risks associated with a particular policy.

3. We found that the relative performance of different policies is qualitatively stable over a wide range of scenarios. In particular, additional autocorrelation between repeated testing, lower LFD-test compliance, or a worse LLI profile for infectivity all impede outbreak control but do not change the relative merits and disadvantages of the policies considered.

4. Containment depends on the fraction of asymptomatic cases: it is harder to control outbreaks in scenarios with fewer symptomatic cases. Policies making use of regular asymptomatic screening tests (Mon or Mon/Wed) are generally less affected by this. 'Test for release', however, still needs a symptomatic index case to trigger dynamic testing within a bubble and thus struggles to contain outbreaks in scenarios with high infectivity and a high fraction of asymptomatic cases. Hence it is a misconception to think that using repeated LFD tests of close contacts as designed in the 'test for release' policy is more effective than the reference symptom-based Test & Trace policy when there is a large fraction of asymptomatics.

5. Additional autoregression of repeated test results impacts frequent testing negatively. In particular, the performance of 'test for release' in conjunction with low or medium sensitivity screening tests deteriorates. Depending on the time window over which repeated test results are assumed to be correlated, in extreme cases, increased autocorrelation can negate the benefits of testing more than once per week. Since no data are available to inform plausible levels of additional autocorrelation, our results remain simply indicative. The additional autocorrelation would, however, have to be fairly strong to negate the added benefit from a second regular screening day per week.

6. The 'test for release' policy consistently achieves similar containment to the reference policy at a smaller loss in schooldays. Both fare badly in terms of their absolute ability to contain outbreaks however.

7. An extended weekend strategy can only be recommended as a last resort if no screening tests are available whatsoever, since even a once-weekly regular screening test clearly dominates it.

8. If no effective between-bubble isolation is possible (one bubble per class), containment is impeded since the higher number of contacts offsets the wider scope of isolation and testing.

9. We conclude that LFD tests are not fit to replace symptomatic isolation of close contacts but that the addition of asymptomatic testing to an existing valid policy shows at least some benefit across all scenarios considered. This finding remains valid even if the test sensitivity is fairly low but the degree of additional benefit scales with the test quality.

We believe that our results have delivered new quantitative understanding of school policy effectiveness for controlling transmission of SARS-CoV-2, and should be used by policy makers to guide the choice of effective policies to be trialled and evaluated, so that schools can stay open for the benefit of our children and their future.

Figure A.7: Adjacency matrix of a typical school with either 12 classes of 3 bubbles of 9 pupils each or only one bubble per class, respectively. Connectivity strength is given in terms of the expected number of daily pair-wise risk-contacts assuming that there is no within-bubble isolation (p_bubble = 1), limited between-bubble isolation (p_class = 3/(|class| − 1)), and each pupil has an expected number of school-wide contacts of 1 (p_school = 1/(|school| − 1)).

The proposed overall model requires calibration with respect to two crucial parameters: We follow Larremore et al. [2021] in matching the infectivity constant γ to the reproduction number R_S. The operating characteristics of the screening test are matched to data presented in University of Liverpool [2020] and Lennard et al. [2021].

We simulate forward for a given model and a given value of R_S under no policy intervention with a single index infection at day 0 and a follow-up of 21 days. For each simulation run, the actual reproduction number is determined as the number of individuals infected by the index case via exact contact tracing. To derive the infectivity constant γ as a function of the target population R_S, we fit a linear regression. We then use numerical root finding to invert the fitted conditional mean and identify the γ giving rise to a particular R_S. The calibration does depend on the fraction of asymptomatic cases since their viral load trajectories are different under the Larremore model. We use a medium value of 50% asymptomatic cases to derive the calibration curve shown in Figure A.8. This then allows us to derive γ(R_S) for the sensitivity parameter R_S. We consider R_S = 1.5, 3, and 6.
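The calibration steps above (simulate, regress, invert by root finding) can be sketched as follows. The simulator here is a deliberately crude stand-in and is an assumption of this sketch: the index case has a fixed number of risk-contacts, each infected independently with probability γ, whereas in the model proper the infection probability depends on the viral load trajectory. All function names and default values are hypothetical.

```python
import random


def simulate_index_case(gamma, rng, daily_contacts=12, infectious_days=5):
    """Toy stand-in for one run: number of pupils infected by the index case."""
    n_contacts = daily_contacts * infectious_days
    return sum(rng.random() < gamma for _ in range(n_contacts))


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bisect_root(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi] by bisection; f must change sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def calibrate_gamma(target_rs, gammas, runs_per_gamma=200, seed=0):
    """Return the gamma whose fitted mean reproduction number equals target_rs."""
    rng = random.Random(seed)
    xs, ys = [], []
    for gamma in gammas:
        for _ in range(runs_per_gamma):
            xs.append(gamma)
            ys.append(simulate_index_case(gamma, rng))
    intercept, slope = fit_line(xs, ys)
    return bisect_root(lambda g: intercept + slope * g - target_rs,
                       min(gammas), max(gammas))


gamma_grid = [0.01 * k for k in range(1, 16)]
calibrated = {rs: calibrate_gamma(rs, gamma_grid) for rs in (1.5, 3.0, 6.0)}
```

For a straight-line fit the inversion has a closed form; the bisection step is kept so that the sketch carries over unchanged if the fitted conditional mean is replaced by a non-linear curve.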

We begin by fitting the logistic regression model (2) to data presented in Lennard et al. [2021] to obtain the shape of the relationship between VL and sensitivity. Since we were unable to obtain the raw data, we fit a logistic curve to a set of control points directly read off the Innova curve in Figure S1 [Lennard et al., 2021]. The fitted model can then be related to data presented by the University of Liverpool [2020]. The Liverpool pilot found that the test sensitivity of the Innova test in a practical setting for pre-symptomatic individuals was 40% (95% confidence interval: 28.5% to 52.4%), which is in line with findings in Dinnes et al. [2020] for other rapid antigen tests. This information can be used to scale the fitted logistic regression model such that the mean sensitivity corresponds to the findings of the Liverpool study. To this end we introduce a scaling factor η to reconcile the shape of the sensitivity curve found in the Oxford data with the mean sensitivity of the real-world experiment from Liverpool by considering the scaled sensitivity

$$\mathrm{sensitivity}_{\eta}(\mathrm{VL}) := \operatorname{logit}^{-1}\bigl(\beta_{\mathrm{VL}} \cdot \log_{10} \mathrm{VL}^{\eta} + c_{\mathrm{test}}\bigr). \tag{4}$$

We simulate 10^5 viral load trajectories (assuming a moderate rate of 50% asymptomatic cases) and randomly select one pre-symptomatic viral load value per trajectory, resulting in a cross-sectional sample VL_i, i = 1, ..., l, l ≤ 10^5, of viral load values mimicking the structure of the Liverpool data set. For any given target mean sensitivity x, the final value of η is then identified by solving

$$\frac{1}{l} \sum_{i=1}^{l} \mathrm{sensitivity}_{\eta}(\mathrm{VL}_i) = x \tag{5}$$

for η. We explore three sensitivity scenarios (x = 0.4, x = 0.6, and x = 0.8) in the main simulation study. A crucial property of the overall model is the implied correlation between the infection probability and the screening test sensitivity by means of their respective dependency on the latent viral load trajectories. Since we consider three scenarios each for infectivity (R_S = 1.5, 3, 6) and test sensitivity (mean sensitivity of 0.4, 0.6, 0.8), this implies 9 scenarios of the dependency between infection probability and test sensitivity. Additionally, we consider a scenario where LLI = 1000 instead of LLI = 10^6 as in [Larremore et al., 2021] (see Figure A.10 and Section A.5 for results).

[Figure caption, beginning not recoverable: ... (equation (4)) for the 9 scenarios defined in terms of infectivity and mean LFD-test sensitivity ('x' in equation (5)).]
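The scaling step can be sketched numerically as below. The sketch reads the scaled sensitivity as logit^-1(β_VL · η · log10(VL) + c_test), which is one interpretation of equation (4). The coefficient values and the uniform sample of log10 viral loads are purely hypothetical placeholders rather than fitted values or simulated trajectories.

```python
import math
import random


def scaled_sensitivity(log10_vl, eta, beta_vl, c_test):
    """Scaled sensitivity, reading equation (4) as logit^-1(beta * eta * log10(VL) + c)."""
    z = beta_vl * eta * log10_vl + c_test
    return 1.0 / (1.0 + math.exp(-z))


def mean_sensitivity(log10_vls, eta, beta_vl, c_test):
    """Average scaled sensitivity over a cross-sectional viral load sample."""
    total = sum(scaled_sensitivity(v, eta, beta_vl, c_test) for v in log10_vls)
    return total / len(log10_vls)


def solve_eta(log10_vls, target, beta_vl, c_test, lo=1e-3, hi=10.0, tol=1e-10):
    """Solve mean_sensitivity(eta) = target by bisection.

    Requires beta_vl > 0 and positive log10 viral loads, so that the mean
    sensitivity is increasing in eta.
    """
    def gap(eta):
        return mean_sensitivity(log10_vls, eta, beta_vl, c_test) - target

    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("target mean sensitivity is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Hypothetical inputs: placeholder coefficients and a stand-in sample of
# pre-symptomatic log10 viral loads.
rng = random.Random(42)
sample = [rng.uniform(3.0, 9.0) for _ in range(2000)]
BETA_VL, C_TEST = 1.0, -6.0
etas = {x: solve_eta(sample, x, BETA_VL, C_TEST) for x in (0.4, 0.6, 0.8)}
```

Each target mean sensitivity x gives its own η, so the three sensitivity scenarios share the shape of the fitted curve and differ only in the scaling.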

We model individual compliance with LFD testing by drawing a per-pupil random effect from a Beta(2/15, 1/15) distribution (see Figure A.12). This implies a population mean compliance of 66.7%. The U-shape was chosen to reflect the assumption that an individual's choice to comply with LFD testing will correlate over time. Compliance with PCR testing is always 100%.

Figure A.14: Panel A: Change in the marginal distribution of the cumulative proportion of infected pupils for LLI = 1e3 instead of LLI = 1e6; Panel B: Scatterplot of the fraction of schooldays missed against the fraction of ultimately infected pupils; R_S was not recalibrated (same γ as under LLI = 1e6); the actual R_S depends on the number of daily contacts and is higher for the scenario with LLI = 1e3 since individuals are infectious earlier.

Presymptomatic sars-cov-2 infections and transmission in a skilled nursing facility

SARS-CoV-2 viral load in the upper respiratory tract of children and adults with early acute COVID-19

Julia: A fresh approach to numerical computing

Performance indicators: good, bad, and ugly

Children's Task and Finish Group: update to 4th Nov 2020 paper on children, schools and transmission

Covid-19 INNOVA testing in schools: don't just test, evaluate

Coronavirus (COVID-19) asymptomatic testing in schools and colleges

Department for Education. Schools coronavirus (COVID-19) operational guidance

Restricting attendance during the national lockdown: schools

Evidence summary for lateral flow devices (lfd) in relation to care homes

Government boost to rapid workplace testing

Rapid, point-of-care antigen and molecular-based tests for diagnosis of SARS-CoV-2 infection

SARS-CoV-2 Reference Panel Comparative Data

Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing

NHS Test and Trace performance tracker

Get a free nhs test to check if you have coronavirus

Self-testing: A route to school re-opening -the Austrian example

Proportion of asymptomatic coronavirus disease 2019: A systematic review and meta-analysis

Temporal dynamics in viral shedding and transmissibility of covid-19

Openabm-covid19-an agent-based model for non-pharmaceutical interventions against covid-19 including contact tracing. medRxiv

A public health antibody screening indicates a 6-fold higher sars-cov-2 exposure rate than reported cases in children

An analysis of SARS-CoV-2 viral load by patient age. medRxiv

Covid-19: Controversial rapid test policy divides doctors and scientists

Supplemental Material: Code for Simulation and Plots

Test sensitivity is secondary to frequency and turnaround time for COVID-19 screening

An observational study of SARS-CoV-2 infectivity by viral load and demographic factors and the utility lateral flow devices to prevent transmission

Returning to a normal life via covid-19 vaccines in the usa: a large-scale agent-based simulation study. medRxiv

Clarifying the evidence on SARS-CoV-2 antigen rapid tests in public health responses to COVID-19. The Lancet, 0(0)

Estimating the impact of reopening schools on the reproduction number of SARS-CoV-2 in England, using weekly contact survey data

NHS Test & Trace. COVID-19 national testing programme: Schools & colleges handbook

COVID-19 schools infection survey round 1, england

Prevalence of asymptomatic sars-cov-2 infection: a narrative review

Oxford Big Data Institute: Pathogen Dynamics Group. BDI-pathogens/OpenABM-Covid19

R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing

Seventy-fourth SAGE meeting on COVID-19

Covid-abs: An agent-based model of covid-19 epidemic to simulate health and economic effects of social distancing interventions

Universal screening for sars-cov-2 in women admitted for delivery

Balancing the risks of pupils returning to schools

Oxford covid-19 government response tracker, 2021

University of Liverpool. Liverpool community testing pilot, interim evaluation

An agent-based policy laboratory for covid-19 containment strategies

A pediatric infectious disease perspective on covid-19

English school leaders despair over new rules on Covid tests and masks. The Guardian

Secondary schools can start testing pupils on-site before march 8, DfE confirms. Schools Week

Covid-19: Lateral flow tests miss over half of cases, Liverpool pilot data show

We thank Professor Jon Deeks for his helpful comments that led to our including the sensitivity analysis with respect to the role of LLI.

Sylvia Richardson's work was funded by the UK Medical Research Council programme MRC_MC_UU_00002/10 and the Alan Turing Institute fellowship TU/B/000092.