key: cord-0472332-4dtlu6on
authors: Vajdi, Aram; Cohnstaedt, Lee W.; Noronhaz, Leela E.; Mitzelz, Dana N.; Wilsonz, William C.; Scoglio, Caterina M.
title: A Non-Markovian Model to Assess Contact Tracing for the Containment of COVID-19
date: 2021-11-05
journal: nan
DOI: nan
sha: b621dee3da32f3f0fbc0c78462e4bd7577823d73
doc_id: 472332
cord_uid: 4dtlu6on

COVID-19 remains a challenging global threat with ongoing waves of infections and clinical disease which have resulted millions of deaths and an enormous strain on health systems worldwide. Effective vaccines have been developed for the SARS-CoV-2 virus and administered to billions of people; however, the virus continues to circulate and evolve into new variants for which vaccines may ultimately be less effective. Non-pharmaceutical interventions, such as social distancing, wearing face coverings, and contact tracing, remain important tools, especially at the onset of an outbreak. In this paper, we assess the effectiveness of contact tracing using a non-Markovian, network-based mathematical model. To improve the reliability of the novel model, empirically determined distributions were incorporated for the transition time of model state pairs, such as from exposed to infected states. The first-order closure approximation was used to derive an expression for the epidemic threshold, which is dependent on the number of close contacts. Using survey contact data collected during the 2020 fall academic semester from a university population, we determined that even four to five contacts are sufficient to maintain viral transmission. Additionally, our model reveals that contact tracing can be an effective outbreak mitigation measure by reducing the epidemic size by more than three-fold. Increasing the reliability of epidemic models is critical for accurate public health planning and use as decision support tools. Moving toward more accurate non-Markovian models built upon empirical data is important.

C ONTACT tracing is a primary public health response to infectious disease outbreaks [1] , [2] . The adoption of contact tracing to contain COVID-19 met considerable challenges, due to the intensive process related to manual contact tracing and the hesitancy for adopting app-based contact tracing tools [3] . Assessing the impact of contact tracing as a control measure remain of great importance.

In [4] , the authors derived contact patterns in the UK using a postal and online cross-sectional survey. They predicted that, under effective contact tracing, fewer than one in six cases would generate any subsequent infections that will not be traced. This comes at a high cost, with an average of 36 individuals traced per infected case. Making the definition of a close contact more stringent can reduce this cost, but with increased risk of untraced cases.

A mathematical model assessing contact tracing and isolation for COVID-19 has been developed in [5] . The study estimated that a high proportion of cases would need to self-isolate and a high proportion of their contacts successfully traced to contain the epidemic. If combined with moderate physical distancing measures, self-isolation and contact tracing would likely achieve control of SARS-CoV-2 transmission.

Another aspect of COVID-19 that can alter the effectiveness of contact tracing is the high transmissibility of SARS-CoV-2 before and immediately after symptom onset [6] . Therefore, finding and isolating symptomatic patients alone may not suffice to interrupt transmission, and that more generalized measures might be required, such as social distancing.

Considering the act of quarantining susceptible individuals (uninfected contacts) as the contact tracing cost, a model developed in [7] has shed light on an interesting phenomenon. The number of quarantined susceptible people increases with the increase of tracing because each confirmed case increases the number of total contacts. However, there is an inflection point after which the number of traces decreases with increased tracing because there are fewer confirmed cases.

Models for assessing contact tracing have been developed for many infectious diseases, before the COVID-19 pandemic. A systematic review of these models can be found in [8] which, reviewed mathematical models for contact tracing and follow-up control measures of SARS and MERS transmission. All models required data to estimate specific model parameter distributions. A major concern identified was whether public health administrators can collect all the required data for building epidemiological models in a short period of time during the early phase of an outbreak where contact tracing is more effective.

The analysis of case data from the Wuhan COVID-19 outbreak highlighted non-exponential distributions for critical transition times during infection, such as the infectious period and the incubation period [9] , [10] . Simulations show that different distributions of the infectious period duration with the same mean values lead to different epidemic curves.

Consequently, it is critical to develop models that can arXiv:2111.03722v1 [stat.AP] 5 Nov 2021 input empirical distributions, often non-exponential, for the transition time between compartments. The majority of network-based individual level pathogen spread models assume exponential distributions for transition times among state pairs. However, these models need adaptation to incorporate general transition time distributions. One possible approach is based on the adoption of phase-type distributions [11] .

A phase-type distribution can mathematically represent the transition between any two states in a pathogen spread model. By expanding the source state of the original model into a series of q states, then any arbitrary distribution can be approximated as a phase-type distribution to represent the time distribution between subsequent states. Techniques exist to estimate the new parameters to realize the desired distribution. When the transition time distribution between the two original states is general, this distribution can be approximated by transitions between subsequent pairs of states among the q states that are exponentially distributed. When the objective is to derive analytical results concerning non-Markovian models, both first-order [12] and second order [13] closure techniques can be used. For efficient methods to simulate non-Markovian models Gillespie based algorithms are available [14] .

The goal of this paper is to develop a non-Markovian model, both exact and first-order closure approximation, appropriate for assessing contact tracing and to determine the epidemic threshold. A new networked compartmental model with compartments able to represent the disease evolution and the contact tracing process is presented in section 2. The model is general to allow arbitrary transition time distributions and the mean field equations for such a non-Markovian model is derived. A mathematical analysis was performed to derive the epidemic threshold, in section 3. The model was applied to the Kansas State University case study reported in section 4. Summarizing, the original contributions of the paper are the following:

• A novel and general network-based model structure for pathogen transmission,

• The determination of the first-order epidemic threshold,

• Numerical results about the effectiveness of contact tracing in the case study of Kansas State University.

We begin by presenting a continuous time non-Markovian transmission model to describe the tractability of COVID-19 spread when contact tracing strategies are implemented in a population. The model assumes individuals (nodes) are in one of ten states. If individuals are susceptible (S), they can become infected but not immediately infectious, i.e, exposed (E), through interaction with infectious individuals that are either symptomatic (Is) or asymptomatic (Ia). The model assumes that the infectious asymptomatic state does not contain the presymptomatic cases that present symptoms at some point during the course of infection. Presymptomatic and symptomatic states are combined into the infectious symptomatic (Is) state. Furthermore, the model assumes the fraction of individuals in the E state that moves to the Is state is p Is , and an asymptomatic infectious (Ia) individual cannot The assumption in this model is some contacts from detected cases get tested, and if they are infected, they move to confirmed traced state (C), and if virus negative, they transition to alert traced (A) state for a period of time. Indeed, testing capacity could be limited, and the contact tracing strategy may only require quarantining of close contacts. In such a case, we can regard the alert state as quarantine or a subset of S but temporarily removed from the susceptible population. Moreover, we can add more states to the model and divide the confirmed (C) state into three sub-states, to differentiate the disease state of the traced contacts. However, to keep the model simple, the model only considers one C state. Since an individual induces contact tracing as long as is in the D state, after a period of time it should moves to the removed (RD) state so that the contact tracing stops. Similarly, the model assumes an individual in the C state initiates contact tracing and later moves to removed state (RC). We refer to the transmission model described above as SAIDR, and Fig. 1 shows the diagram of the transitions that a person may experience in the SAIDR model. Transition from the S state to the E state is induced by contacts with individuals in the Is or Ia states. Transitions shown by dotted arrows are due to contact tracing and are induced by contacts with individuals in the C or D states. If a transition is not induced by any interaction, we call it nodal transition and it is shown by solid arrow.

In the SAIDR model, we assume the transition times are random variables drawn from appropriate distributions for the state. The traditional approach in the analysis of epidemic models assumes the distribution of transition times to be exponential, but this might not be an accurate description of all the processes considered in our spreading model. For instance, real-world data suggest transition times for the E → I process are not distributed exponentially. Some factors that have been suggested to potentially influence these transition times, specifically the incubation period (E → Is), include age, infectious dose, and physiological stressors.

Moreover, allowing non-exponential distributions for auxiliary processes such as D → RD, increases the degrees of freedom in the model. Hence, here we allow the nodal transitions to have non-exponential distributions. An exponential distribution is specified by the rate parameter λ, and the assumption that a transition time is exponentially distributed implies for any infinitesimal period of time dt the transition happens with a constant probability λ dt, regardless of the age of the process. Indeed, such a constant rate of transition is a suitable approximation for the transmission process. Therefore, for the transmission process S → E, the model assumes the transition times are exponentially distributed as long as the infecting individual stays infected. Additionally, if a susceptible person has several infectious contacts, the model assumes the infecting processes are independent. Similarly, in the contact tracing modeling, we use exponential distribution for the transition times of processes that change state of an individual to the C or A states. Finally, to model the interactions that result in virus transmission or lead to contact tracing, we use an undirected network where the nodes represent individuals in the population and the links denote the contacts among them. Consider a network G = G(V, E, W inf , W tr ), with V representing the set of nodes, E a set of links between the nodes and W inf , W tr : E → [0 1] weight functions defined over the links. We use the weight functions W inf , W tr to quantify heterogeneity of the contacts in relation to virus transmission probability or contact tracing probability, respectively. In other words, for each contact, we modify the rate of exponential distributions for the infecting and contact tracing processes multiplying the rates by the contact weights. This allows us to define various types of contacts with different probabilities for virus transmission or contact tracing.

In this section we develop a set differential equations that approximately describes the behavior of the transmission model discussed in section 2. Indeed, there is an extensive body of research concerning Markovian spreading unfolding over networks [15] , [16] , [17] , [18] , [19] , [20] , [21] , and it is shown the exact mathematical treatment of such a system is not tractable because we need to follow the joint state of all the nodes. Therefore, a mean-field type approximation, which assumes statistical independence of neighboring nodes' states, is often employed to study network spreading models. Here, we limit the non-exponential random variables in the spreading model of 2 to the Erlang family of distributions. This enables us to use a characteristic of Erlang distribution to cast our spreading model into a Markovian model, for which we can develop an N-intertwined set of differential equations [15] . The Erlang distribution is a two-parameter family of continuous probability distributions with the density function

where the shape parameter k is a positive integer, and λ > 0 is called the rate parameter. From equation above, we can see that the exponential distribution is a special case of Erlang distribution with k = 1, and the mean and variance of Erlang distribution are k/λ and k/λ 2 , respectively. Since the Erlang distribution has two parameters, we can adjust them to obtain a density function with a desired mean and a variance close to a target value. Thus, we can use the Erlang distribution as an approximation for a large class of distributions.

One important characteristic of the Erlang distribution can be stated as follows: distribution of sum of k independent random variables, each having an exponential distribution with rate λ, is Erlang distribution with the shape parameter k and the rate parameter λ, i.e.

This characteristic implies if the random transition time of a process follows Erlang(k, λ) distribution, the process can be modeled as k successive transitions between k + 1 auxiliary states where each transition time is exponentially distributed with the rate λ. Therefore, if a nodal transition time in the spreading model of Fig. 1 has an Erlang type distribution, it is possible to replace the nodal transition with a set of successive Markovian transitions and obtain an equivalent spreading model. Fig. 2 shows a spreading model where all the transition times are exponentially distributed and it is equivalent of the original model in Fig. 1 . Indeed, even if the distribution of a transition time is not Erlang, we still can use phase-type distribution method to obtain a mixture of Markovian transitions that is approximately equivalent to the original transition [11] . However, we limit the distribution of nodal transition times to Erlang distribution because it incorporates a family of density functions with arbitrary means and variances.

Considering the Markovian spreading model in Fig. 2 , let s i represents state of node i among M possible states in the model, and S = [s 1 , s 2 , · · · , s N ] represents the joint state of all the individuals in the population. Based on the nodal description of the processes, we deduce the joint state is a continuous-time Markov chain over a space consisting of M N possible network states. Therefore, the exact mathematical treatment of the system is not tractable when the number of nodes is large. But if we use mean-field approximation [15] , that assumes statistical independence of neighboring nodes' states, we obtain a set of M × N equations, which we can solve computationally. This approximation assumes at any time, the joint probability of finding nodes i and j in the states s i and s i can be written as the multiplication of marginal probabilities, Pr(s i ) × Pr(s j ).

To formulate the approximate behavior of the Markovian model depicted in Fig. 2 , we use dynamic variables,

, which represent the probabilities of finding node i in the corresponding states. If r i I and r i T denote the rates for the transmission and contact tracing processes exerted on node i, we have where λ tr is the rate for the the contact tracing process, and we have assumed symptomatic and asymptomatic infectious states have different infectiousness which is reflected in the infectious rates parameters, β Is , β Ia . Finally, using the joint state independence approximation, we can write the following set of equations for all the nodes in the network

The system of nonlinear ordinary differential equations (ODEs) above, describes evolution of nodal states' probability. We can use this set of equations to study behavior of the spreading processes or to estimate the model parameters.

In addition, stability analysis of disease-free state of the system leads to derivation of epidemic threshold which establishes a condition that determines whether an initial infected population will vanish or has the possibility to grow.

To derive the epidemic threshold the first step is to linearize the nonlinear system 3 about the disease-free steady state. In the disease-free state the probability of finding nodes in any state other than the susceptible state is zero. After linearization we arrive at an independent subsystem that describes production of new infections. Since the variables in this subsystem drive other variables in the larger system, stability of this subsystem determines stability of the larger system. This subsystem can be written as follows:

Theorem 1. Assuming the infection weight matrix W inf is irreducible, the disease-free steady state is a locally stable fixed point of the dynamical system 3 if the following condition holds:

where ρ(W inf ) denotes the spectral radius of the weight matrix W inf .

To derive the threshold condition in equation 5, we rewrite the linearized subsystem 4 in a matrix form where the infection processes, represented by a matrix ∆, are separated from other transitions,

The square matrix ∆ can be written as the Kronecker product of the network weight matrix W inf and a matrix δ which represents the rates and the driving nodal states in the infection process,

where 0 k1,k2 and J k1,k2 are k 1 × k 2 matrices of zeros and ones, respectively,

. (8) The matrix Σ in equation 6 is a block diagonal matrix which can be written as the Kronecker product of the identity matrix of size N and a matrix σ which represents nodal transitions,

and the matrices H, G used in the definition of σ have the following structures

In the H matrix, the diagonal elements are −1, the lower diagonal elements are 1 and the rest of elements are zero. In the G matrix, the element in the upper-right corner is 1 and the rest of elements are zero. The stability of steady state of the linear system 6 is determined by the spectral bound of the square matrix ∆+Σ, defined as

where η (∆ + Σ) denotes the set of eigenvalues of ∆ + Σ. The linear system is exponentially stable if and only if the real parts of the eigenvalues are negative (i.e., if s (∆ + Σ) < 0). To find the condition that leads to a negative spectral bound we apply theorem A.1 from reference [22] , which, for the sake of completeness, we state as the following theorem. We note that in reference [22] , positive matrices are defined as non-zero matrices with all entries non-negative; and a matrix is defined as positive off-diagonal if all entries are non-negative except possibly those on the diagonal. To apply theorem 2, we first investigate if the matrices ∆ and Σ satisfy the condition stated in the theorem. From the definition of ∆ and Σ in the equations 7,9, we can see ∆ is a positive matrix and Σ is positive off-diagonal. Moreover, since Σ is positive off-diagonal lemma 6.12 in [23] shows that s (Σ) < 0 if and only if Σ is invertible and −Σ −1 is a positive matrix. Indeed, we can directly calculate −Σ −1 as fallows

where the matrices 0 and J are defined in the equation 8, and F k is a lower triangular matrix of size k, with non-zero entries equal 1

It is clear that −Σ −1 is a positive matrix. Therefore we can use theorem 2 to conclude that

Finally to prove theorem 1 we will obtain an expression for ρ −∆Σ −1 . Using the properties of the Kronecker product we can write

where η(.) denotes set of eigenvalues. After calculating the matrix −δσ −1 , we can see only the first row is non-zero

and therefore the eigenvalues of the matrix −δσ −1 are β Is p Is k Is λ −1 Is + β Ia p Ia k Ia λ −1 Ia and 0. In addition, since we assume W inf is an irreducible non-negative weight matrix, Perron-Frobenius theorem shows W inf has a positive eigenvalue, denoted by ρ(W inf ), and absolute value of any other eigenvalue of W inf is strictly smaller than ρ(W inf ).

Combining these results about the eigenvalues of W inf and −δσ −1 with the equation 12, we find that

which leads to the proof of theorem 1.

To numerically investigate the threshold condition, we computed prevalence of the exposed state (E) through time for a spreading unfolding on the largest component of the coauthorship network [24] . We calculated the prevalence of a state as the average of nodes' probabilities. Fig. 3a shows the prevalence of exposed state for different values of ρ −∆Σ −1 when we assumed no contact tracing. Fig. 3b shows similar prevalence when there is contact tracing. In both figures, we can see the exposed state exponentially dies out when the spreading is below the threshold. To compere the size of epidemic for different values of ρ −∆Σ −1 , we have plotted the R state prevalence in Fig. 3c and 3d .

The threshold condition we have derived, only depends on the network structure, infection transmission rates, probability of becoming a symptomatic case, and expected infectious periods for the symptomatic and asymptomatic states. This is clear from equation 13, if we notice that k Is λ −1 Is and k Ia λ −1 Ia are the expected values for the corresponding Erlang distributions. Although the variances for the infectious period distributions do not change the threshold condition, they affect the prevalence of the states as shown in Fig. 4 . To generate this figure, we calculated the prevalence curves for two spreading processes unfolding on the network of the previous example. In the first process, we assumed the symptomatic and asymptomatic infectious periods are distributed exponentially with expected values of four and six, respectively. For the second process, we changed the exponential distributions to Erlang distributions with similar expected values but variance of 2.

In this section the we apply the SAIDR spreading model to COVID transmission among Kansas State University (K-State) students in Manhattan, Kansas. Such a model can be used to evaluate the effectiveness of different strategies for reducing the spread of virus or to predict the size of epidemic. However, this requires information about the network structure and other model parameters. We performed a survey among individuals associated with the university and the survey results were used to build a random contact network. Assuming this contact network for the population, we use weekly positive COVID cases to estimate unknown model parameters such as the transmission rate β Is . Later, we use the estimated parameters to study the effectiveness of contact tracing among the students.

To build a weighted contact network we sent an online survey to all Kansas State University students and staff via the university webmail system in December 2020. This survey asked about participation in social interactions during the 2020 fall semester (August 2020-December2020). During this semester, some of the classes were held in-person and some held online. Students were present in Manhattan, Kansas, during this time and a mask mandate was in effect in all the facilities related to the university and in community public spaces.

We asked questions about housing status, age and role in the university, as well as the number of close contacts such as roommates, family members, or coworkers. In another question we also asked for the number of people they regularly meet in close proximity for a total contact duration of less than four hours per week (such as friends). The responses to this question were used to build a second layer of contacts that can be traced but have lower transmission probability. To find the level of social interaction we asked the respondents to specify average number of visits per week to different public locations such as bars, restaurants, coffee shops and stores. We also asked about the frequency of their participation in social events such as religious or sports events. For each public location type, the respondents also indicated duration of the visits and number of people with whom they interact in close proximity.

After processing the data, based on their housing type and role in K-State, we divided the respondents into six groups. These groups are: (1) graduate students (2) factually and staff, (3) undergraduate students living in off campus apartment or houses, (4) undergraduate students living in fraternities and sorority houses (Greek houses), (5) and (6) undergraduate students using two available on-campus housing options. The rationale for this division is that the respondent age and housing type possibly leads to different levels of social interaction and number of close contacts. Using the survey data, we calculated the following parameters for each group g

1 , average number of close contacts • V g 2 , average number of people met for total duration of less than four hours per week and for the three types of public spaces which are (1) bars, restaurants, and coffee shops, (2) stores and services, and (3) social events • P g l , proportion of individuals in group g who visits public space type l, and H g l , average weekly hours they spent in these locations, and n g l , average number of people they encounter Values of these parameters are presented in the table 1.

Using these parameters, we built three layers of networks for the whole population comprising the groups mentioned before and an additional group which is the rest of the town population. Since we conducted the survey only among the individuals associated with university, we did not have the parameter values for the general public not associated with the university. Hence, we extended the parameters extracted for the group of faculty and staff to that additional group. Population of different groups that we assumed in our calculations are given in the For layer one, L 1 , which is the layer of close contacts, consider clusters of V g 1 + 1 individuals within the main groups. We assume among V g 1 close contacts for each individual, a fraction F of the contacts happens within the cluster the individual belongs to, and the remaining contacts are randomly established using configuration model. While setting up the random links we assumed undergraduate students have only links with other undergraduate students and graduate students with other graduate students.

For layer two, L 2 , we used configuration model where the node degree of an individual in group g was set to V g 2 .We assumed undergraduate students have only links with other undergraduate students and graduate students with other graduate students. To decrease the number of model parameters, we the express daily infection transmission rate in this layer in terms of transmission rate through the close contacts of layer L 1 , which we denote by β. If we assume the effective daily contact duration in L 1 is only eight hours and compare that with weekly duration of links in L 2 , which is around four hours, we expect a daily transmission rate of 4β/(7 × 8) for the L 2 links.

For the layer three, L 3 , which represents interaction through public spaces, we considered a complete graph over the whole population and weighted the link between any two nodes i and j by H g(j) l β/(7×8T l ) gives an estimate of daily infection transmission rate between nodes i and j in the public space l. Here, we use T 1 = 35, T 2 = 70, T 1 = 15.

For some of the parameters in the spreading model described in section 3, we use the values below in our calculations.

Following reference [25] , we assume the proportion of infected individuals that never show symptoms is p Ia = 0.30, and their infectiousness is lower than those who develop symptoms such that β Ia = 0.75β Is .

For transition from the E state to the I states, we use λ −1 E = 1.255 days, and k E = 3. These values lead to a distribution with the mean of 3.76 days and the standard deviation of 2.17 (Fig. 5) . We chose this distribution using the data in reference [26] , where authors report the distributions for the incubation period ( i.e., time from exposure to symptoms onset) and the serial interval which is time between the symptoms onsets of successive cases in a chain of transmission. In fact, the reported negative serial interval implies pre-symptomatic transmission of COVID-19 during the incubation period. To estimate the E state period in our model, we assumed that, in the incubation period, infected individuals go through several stages with the same expected time, and in one of the stages they start transmitting the infection. If this stage is before the onset of symptoms, we will observe negative serial intervals. Hence, we approximated the incubation period with an Erlang distribution with k = 4 and λ −1 = 1.255 days, which has the same median and 95th percentile as the reported distribution in [26] . Next, we simulated a chain of transmission assuming transmission starts at a specific stage of incubation period with a specific transmission rate, and we recorded the distribution of the serial interval. By exploring different values for the stage that the transmission starts, and also the transmission rate, we found that the simulated distribution of the serial interval is closest to the reported one in [26] , if the transmission starts at the stage four. Therefore, we used an Erlang distribution with k = 3 and λ −1 = 1.255 days for the period of E state in our model.

Considering the transition from the Ia state to the RU state, we assumed an Erlang distribution of mean six days and variance of two days. Fig. 5 shows the corresponding density function. This value of variance implies that there is almost no transmission after day nine [27] and there is transmission in the first week when the viral load is high [27] .

Here we use similar distributions for the random times of the of the transitions C→ RC and D→ RD. If we assume k C = k D = 8 and λ −1 C = λ −1 D = 0.25 days, and λ tr = 1.5 day −1 , then the probability that a node in D or C state induces a contact tracing transition in its neighbor is 0.92. We calculate the contact tracing success probability as

We can adjust this probability by changing λ tr or k C , k D , λ C , λ D . For instance, if we change λ tr to only 0.18 day −1 , the probability becomes 0.3. In our calculations, we assume the rate of success for contact tracing through the layer of close contacts, L 1 is high, and through the layer L 2 is lower, therefore, we chose λ tr = 1.5 and 0.18 day −1 for the layers L 1 and L 2 , respectively. We need to note that the duration of staying in the C or D state cannot be long, otherwise these states may induce repetitive contact tracing in the model.

The other transition that we specify its random time distribution is A→ S. Here, we set the k A = 12 and λ −1 A = 0.5833 days. These values lead to a distribution that is concentrated between days 4 and 11 with a mean of 7 days (Fig. 5 ).

Practically, we can solve the network spreading model ODEs of the system 3, even if the population is large. This system of equations is applicable for any network with an arbitrary structure. However, the three-layer network we described in section 4.1, to some extent, is homogeneous. Specifically, the nodes that belong to a same group have similar type and number of links in the layers L 1 and L 2 , and their community contacts through the layer L 3 are identical. Hence, we expect that the probability vectors of the nodes with a similar group assignment will be similar, and we may use one probability vector for all the nodes in a group. Within this approximation, the dimension of the dynamical system that describes the spreading process is smaller, and each group of nodes is represented by only one node. The ODEs for this system are similar to those in the equation 3, except that the superscript i for the variables now enumerate the groups and run from 1 to 6. To have a closed system of ODEs, we also need to approximate the infection and tracing rates, r i I and r i T in term of the probability vectors of the groups. The infection rate of a node in group i can be written as where Ω i,j 1 determines the contribution of group j in the infection rate of a node in group i and β Is , β Ia are infection transmission rates through close contacts of the layer L 1 . Considering definition of the network parameters in section 4.1, Ω i,j 1 is approximated by

where δ i,j is the Kronecker delta function, N i is the population of group i and ζ 1 = 4/(7 × 8) is the ratio of infection transmission rate for the contacts in the layer L 2 to that of the layer L 1 . Furthermore, ω i,j = 1, if existence of links between the nodes in groups i and j is allowed in the network layers L 1 and L 2 , otherwise ω i,j = 0. The first, second and third lines in the equation above give the contribution of the network layers L 1 , L 2 and L 3 in the infection rate, respectively. Also, the contact tracing rate for the nodes in group i, represented by r i T , can be approximated as

where λ tr = 1.5 day −1 is the tracing rate through the close contacts in the layer L 1 and

In the equation above, ζ 2 is the ratio of tracing rate through the layer L 2 to λ tr , which we set at 0.18/1.5.

To compare the approximated spreading model with the high dimension network model, we calculated the total population of students in the D, C, RD, RC states using both models assuming similar initial condition and the contact network parameter F = 0.3. Fig. 6 shows this population for different values of β Is and λ tr . In this figure, the points shown by square markers were calculated using the network spreading model equations and the line plots are the result of the approximate spreading model. We can see, for the type of contact networks we defined in this section, the approximate model generates epidemic curves comparable to those obtained by the network spreading model.

In this section we use the reported positive COVID cases to estimate the effectiveness of contact tracing among the students during the 2020 fall semester. We assume the infection spreading follows our spreading model, and the contact network and the parameters' values are those discussed before in sections 4.1, 4.2. Since we do not know the value of infection transmission rate β Is and the distribution of infectious period for the symptomatic state, Is, we use a Markov chain Monte Carlo (MCMC) scheme to estimate these unknown parameters and eventually obtain an estimation for the effectiveness of the contact tracing. Although there are some published results regarding these parameters, we believe they cannot be extended to all populations. For instance, the distribution of the symptomatic infectious period, by which we mean the period an infectious symptomatic is spreading infection before removal from the population, depends on the level of population vigilance and testing.

Here we use the Metropolis-Hastings algorithm, which is a MCMC method, to estimate the spreading model parameters. In general, this algorithm generates samples of a random variable for which we can calculate the probability ratio between the samples. Later, these samples can be used for calculating numerical approximations of functions of the random variable. To detail the algorithm, consider a k-dimensional random vector X = (X 1 , . . . , X k ), with a probability distribution proportional to f (x 1 , . . . , x k ). The algorithm generates a sequence of M random samples {x t = (x t 1 , . . . , x t k )} M t=1 following the steps below 1) Choose an initial sample x 1 , and set t = 1 2) Generate a proposal sample x = (x 1 , . . . , x k ), such that each x i is normally distributed with a mean of x t i and a predefined standard deviation, i.e.,

f (x t ) 4) Generate a uniformly distributed random number in the interval [0, 1]. 5) If r ≤ ρ accept x as the next sample, x t+1 = x, otherwise set x t+1 = x. 6) Set t = t + 1, and go back to step 2.

In order to adopt this algorithm in our estimation problem, we need to define the function f (x) for our problem. Let {o s } and {y s } denote an observed spreading data series and the corresponding outputs of spreading model. For a vector of spreading model parameters, x, we assume f (x) = S(x) −2 , where S(x) is the sum of absolute deviation of the spreading model outputs, S = s | o s − y s |. Using this function, the fitting algorithm generates a sequence of samples from the parameter space such that the density of samples in a region with the error S is four times the density in a region with the error 2S. In the definition of f (x), we have used the sequences {o s } and {y s }, which we need to specify. Since we had access to the weekly new positive COVID cases among the students, we use the weekly cumulative cases from the start of the fall semester as the sequence of observed spreading data {o s }. The corresponding sequence from the model, {y s }, is the weekly population of students in the states D, DR, C, CR. Regarding the spreading model, we use the approximate model in section 4.3 because it is computationally fast and its outputs are similar to those calculated by the network model ODEs 3. Concerning the open parameters in spreading model, we assumed the parameter vector x consists of the following components:

• Mean and variance of the infectious period for the symptomatic state. Assuming these parameters, we find an Erlang distribution with same mean and a variance closest to the parameter value, and we use this distribution in the model.

Transmission rate β Is , which we allow to change every 15 days. Effectively, through the course of spreading, average β Is in every 15 days is represented by one component of x.

• Timepoint of the first observation in the sequence {o s } with respect to the initial day of the model calculation. We solve the ODEs for the model with the initial E state probability equal to 0.005.

• Parameters F which is defined in section 4.1, and relates to the structure of the close contact network layer.

Following the procedure we explained in this section, we generated sequences with 1 million samples of the open parameters. Fig. 7a shows a histogram of the samples' absolute error S. The output of the spreading model for a sample with the minimum error of 330 is shown in Fig.  7b where we have plotted the population of students in the D, C, RD and RC states and compared it with the reported students' positive cases.

To estimate the effectiveness of contact tracing, we set the contact tracing rate in the model to zero, λ tr = 0, and for each sample of the open parameters we recalculated the total population of infected students at the day of last observation. Next, we calculated the ratio of this population to the same population in the original model where λ tr = 1.5 day −1 , and we use this ratio to quantify effectiveness of contact tracing. Histogram of this effectiveness is shown in Fig. 7c . When we examine the distribution of samples error in Fig. 7a , we notice the distribution has a long tail. This is because, in the sampling algorithm, we chose f (x) = S(x) −2 . In fact, we could instead use an exponential function that would not generate a long tail distribution, but the resulting Markov chain could possibly stay in a local minima for a long time.

To eliminate the effect of samples with large errors, in our estimation of parameters, we only use samples with error S smaller than 5000. This effectively truncates the original distribution. Fig. 7d shows the distribution of contact tracing effectiveness when we only use samples with error less than 5000. The mean for this distribution is 3.35 which is slightly different from that of the distribution in Fig. 7c which is 2.94.

Moreover, from each sample we can extracted k Is for the corresponding Erlang distribution of the infectious period. The histogram of k Is is plotted in the Fig. 8b . This shows the most probable distribution for the duration of the Is state is exponential. We have also shown the distribution of the infectious period mean for the Is state in Fig. 8a . The mean for this distribution is 1.5 days which is significantly shorter than the infectious period for the Ia state. The reason for this might be the vigilance of population and availability of testing. Fig. 8c shows the distribution for the network parameter F, which has mean of 0.6. We have also shown the Interquartile range and the estimated median for β Is in Fig. 8d . In this plot we can see a range of values for β Is , but if we assume β Is = 0.1, the factor β Is p Is k Is λ −1 Is + β Ia p Ia k Ia λ −1 Ia in the threshold equation 13 becomes 0.24. Considering the fact that the largest eigenvalue of a network is larger than its minimum node degree, we can deduce that if the individuals in a population have more than four close contacts the threshold condition for the epidemic die-out will not satisfy. This is especially significant because in the calculation we have assumed the mean of infectious period for the Is state is only 1.5 days which is very small.

In this work, we have developed and investigated a novel non-Markovian network-based model to assess the effectiveness of contact tracing and shed light on conditions to contain a COVID-19 outbreak. In particular, using equation 5 for the epidemic threshold, it is clear the quantitatively relevant impact of the asymptomatic infected individuals in the spreading process. Furthermore, when considering homogeneous contact patterns, even limiting the number of contacts to a very small number (i.e.,4-5) per individual will not be enough to contain the epidemic. We have evaluated contact tracing with a two-layer network, a regular-contact layer and a casual-contact layer. Extensive simulations show that contact tracing can be very effective by reducing the size of the epidemic more than three-fold. This result takes into account that 90% of the regular closed contacts are traced and only 30% of the casual contacts are traced. In our model, the duration of the symptomatic and asymptomatic infection periods are evaluated at 1.5 days mean for symptomatic individuals due to prompt isolation, while at 6 days for asymptomatic not-detected ones. Since the threshold equation is derived using a mean-field first-order closure approximation, future work will explore higher order closure approximations evaluating the trade-off between improved accuracy and increased costs. Overall, we have proposed a non-Markovian model for evaluating SARS-CoV-2 transmission containment strategies. We believe these model types offer more flexibility to incorporate any type of transition time distributions, hence improving the model's reliability for SARS-CoV-2 and other pathogens.

Covid-19 contact tracing: challenges and future directions

Impact of delays on effectiveness of contact tracing strategies for covid-19: a modelling study

Effect of manual and digital contact tracing on covid-19 outbreaks: a study on empirical contact data

Efficacy of contact tracing for the containment of the 2019 novel coronavirus (covid-19)

Effectiveness of isolation, testing, contact tracing, and physical distancing on reducing transmission of sars-cov-2 in different settings: a mathematical modelling study

Contact tracing assessment of covid-19 transmission dynamics in taiwan and risk at different exposure periods before and after symptom onset

Contact tracing evaluation for covid-19 transmission in the different movement levels of a rural college town in the usa

Epidemic models of contact tracing: systematic review of transmission studies of severe acute respiratory syndrome and middle east respiratory syndrome

High contagiousness and rapid spread of severe acute respiratory syndrome coronavirus 2

Incubation period of 2019 novel coronavirus (2019-ncov) infections among travellers from wuhan, china

A general class of spreading processes with non-markovian dynamics

Meanfield models for non-markovian epidemics on networks

Exact and approximate moment closures for non-markovian network epidemics

Simulating non-markovian stochastic processes

The n-intertwined sis epidemic network model

Generalized epidemic mean-field model for spreading processes over multilayer complex networks

Epidemic processes in complex networks

Spread of epidemic disease on networks

Absence of epidemic threshold in scale-free networks with degree correlations

Epidemic thresholds in real networks

Localization and spreading of diseases in complex networks

The construction of next-generation matrices for compartmental epidemic models

Mathematical epidemiology of infectious diseases: model building, analysis and interpretation

Finding community structure in networks using the eigenvectors of matrices

Sars-cov-2 transmission from people without covid-19 symptoms

Evidence for pre-symptomatic transmission of coronavirus disease 2019 (covid-19) in china

Sars-cov-2, sars-cov, and mers-cov viral load dynamics, duration of viral shedding, and infectiousness: a systematic review and meta-analysis

This material is based on work supported by the National Science Foundation under Grant No.2027336 IIS.

In this expression, l enumerates three different public spaces we mentioned before, g(i) represents group assignment of node i and N is the total population. C l is a coefficient that when is multiplied by H g(i) l H g(j) l β, gives an estimate of infection transmission rate between nodes i and j in public space l, assuming β is transmission rate for a close contact link in the layer L 1 . Indeed, if T l is the total hours per week that the public space is active H g(i) l H g(j) l /(7T l ) is daily expected hours that nodes i and j overlap in such a location. Moreover, if we assume the effective daily duration of close contact is eight hours, H g(i) l