key: cord-0547213-73ppdqus
authors: Bu, Fan; Aiello, Allison E.; Volfovsky, Alexander; Xu, Jason
title: Likelihood-based inference for partially observed stochastic epidemics with individual heterogeneity
date: 2021-12-15
journal: nan
DOI: nan
sha: 3b4a82762b9e551f6e78e20848663a128782b4bd
doc_id: 547213
cord_uid: 73ppdqus

We develop a stochastic epidemic model progressing over dynamic networks, where infection rates are heterogeneous and may vary with individual-level covariates. The joint dynamics are modeled as a continuous-time Markov chain such that disease transmission is constrained by the contact network structure, and network evolution is in turn influenced by individual disease statuses. To accommodate partial epidemic observations commonly seen in real-world data, we propose a likelihood-based inference method based on the stochastic EM algorithm, introducing key innovations that include efficient conditional samplers for imputing missing infection and recovery times which respect the dynamic contact network. Experiments on both synthetic and real datasets demonstrate that our inference method can accurately and efficiently recover model parameters and provide valuable insight at the presence of unobserved disease episodes in epidemic data.

Modern epidemiological studies seek to understand disease dynamics, evaluate intervention strategies and differentiate between population level and individual level effects. A traditional approach to the modeling of infectious disease relies on mechanistic compartmental models, where only the summary of disease statuses of individuals in the population plays a role in understanding the disease dynamics. Examples of such mechanistic compartmental models abound in the epidemiology and mathematical biology literature, e.g. the susceptibleinfectious-recovered (SIR) model (Kermack and McKendrick, 1927) . The majority of these simplify disease transmission to a population level event -as such these models are posed to answer population level questions about disease outbreaks (e.g., "will the outbreak end?"), but cannot resolve those at the individual level (e.g.,"what is my risk of infection?"). This is exemplified by the "random mixing" assumption that underpins many of these models. Under this assumption any infectious individual can transmit the disease to any other susceptible individual with equal chances. However, it is clear that the contact network of individuals plays an integral role in disease transmission and that interventions on individual behavior can change the overall dynamics of an outbreak (Eames and Keeling, 2003; Kiss et al., 2006; Lunz et al., 2021) .

The literature on epidemics diffusing through networks has been greatly bolstered during the SARS-COV-2 pandemic, and a myriad of mechanistic models attempting to capture both the network and disease characteristics have been proposed (Ferguson et al., 2020; Cencetti et al., 2020; Nielsen et al., 2021; Small and Cavanagh, 2020; Skums et al., 2020; Lee et al., 2020; Soriano-Arandes et al., 2021) . While a number of these make use of available mobility data collected through powered mobile devices, they still largely operate on the population level with disease dynamics considered at the county or zip-code level. At the same time, highresolution data collected at the individual level are becoming available, requiring new model development. For example, in Figure 1 we plot the transmission and interaction dynamics of individuals on a college campus, a snippet from the eX-FLU data we analyze in detail in Section 6.2.

A number of approaches to individual-level modeling have been introduced in the literature. Several papers have introduced a graph-coupled Hidden Markov Model to account for changes in the infectiousness state of individuals (Dong et al., 2012; Fan et al., 2015 Fan et al., , 2016 , and there have been developments in agent-based disease transmission models that consider covariates associated with infections (Touloupou et al., 2020; Ju et al., 2021) . Such individual-level inference is very computationally challenging and becomes intractable for even moderately sized datasets with few predictors of interest. Bu et al. (2020) introduced an individual level framework for stochastic epidemic processes, where contact information about infections and individual-to-individual interactions are nearly-completely observed. Specifically, the proposed approach introduced an exact sampler when exact recovery times are unobserved that relies on fully observing the infection times, building on earlier versions of individual-level stochastic models that do not model the network (Auranen et al., 2000; Cauchemez et al., 2006; Hoti et al., 2009; Britton, 2010) .

We note that many of these prior approaches assume that the epidemic process is fully and exactly observed, so inference may proceed via the likelihood of the complete data. However, available epidemic data typically only provide a partial view on exposure, infection and recovery times -the exposure time is often "latent" because of the incubation or latency period, and infection and recovery times are often not fully kept track of due to limited resolution of data collection or failure of follow-up. Even in considerably rich and high-throughput modern datasets such as the eX-FLU study that we analyze in this paper (Aiello et al., 2016) , the true exposure times are not observed even though infections can be inferred via daily symptom reporting, and the exact recovery times are not available because only weekly updates were obtained for influenza recoveries. Recent work carries out inference from partial epidemic data using the marginal likelihood under simpler SIR or SIS models Ju et al., 2021) , but the techniques are computationally intensive and difficult to extend to account for the factors (e.g., the contact network structure) we seek to model. Many infectious diseases such as influenza and COVID-19 have a substantial incubation or latency period; our contributions both incorporate a latency period and propose an inference procedure to account for the unknown infection and recovery times through latent variables.

Lastly, contemporary studies collect information on covariates such as hygiene habits, vaccination statuses, and preventative measures and disease control, which go beyond the simple infection states and contact tracing information that were previously available. The model and procedure of Bu et al. (2020) consider the binary contact status between individuals for modeling infection rates but more flexibility is needed to accommodate heterogeneity in epidemic rate parameters as a function of multiple covariates. As a concrete example, consider the role of hygiene habits on disease spread. Previous studies have shown that good hygiene habits such as frequent hand-washing can help reduce the transmission of infectious diseases (Aiello et al., 2010; Hübner et al., 2013; Hovi et al., 2017; Thompson and Rew, 2015) . These studies are mostly randomized trials without closed and interacting populations and have thus been analyzed using simple two-sample t-tests or randomization tests Arbogast et al., 2016; Savolainen-Kopra et al., 2012) . While this allows us to get a sense of whether hygiene is important overall, it does not quantify the effect of individual hygiene behavior on disease transmission within a joint inference procedure.

In our motivating dataset, the eX-FLU study of influenza-like-illnesses on a college campus (Aiello et al., 2016) , raw counts of hand-washers and infection cases suggest that 37% of those who did not optimally * wash their hands experienced flu-like-illnesses, while only 24% of those who did became sick during the study. However, a Fisher's exact test of proportions does not detect this as a significant difference, which is unsurprising as many of the observations are dependent through a network. Inspecting the local network of an infected individual and (manually) tracing the disease transmission over a few weeks (see an illustration in Figure 1 ) suggests that optimal hand-washers are indeed less likely to contract the disease, even after contact with infectious individuals. This exploration illustrates how individual-level information affects the complex dependencies between the epidemic process and the contact network in a way that goes undetected in population-level summaries, motivating a framework that directly models individual covariates into the transmission mechanism.

week 7 Figure 1 : Weekly aggregated contact networks for select participants in the eX-FLU study. Healthy and sick individuals are marked in blue and red respectively. Individuals who wash their hands optimally are marked by squares, while those who don't are marked by circles. Here we present the contact networks centered at person 19 from week 4 to week 7 of the study. In week 4, person 19 is infectious, and by week 5 four of his neighbors are infected; later on person 23 and person 7 keep infecting their neighbors and thus spread the disease onto person 99 and 14 by week 7. It is notable that all subsequent infections after 19 are for those who do not wash their hands optimally. Moreover, it appears that an individual tends to lose previous contacts after getting sick.

The model proposed in Section 2 extends the literature on continuous-time Markov processes to accommodate a contact network, latency period, more missingness in epidemic observations, and individual-level covariates for heterogeneous disease transmission dynamics. We choose to build on the stochastic SEIR model, a variation on the widely used SIR model that explicitly considers the latency period. This flexibility comes at a cost -exact inference becomes intractable because of the complex dependencies between the covariates, disease process and missingness mechanism. To address this, we derive a stochastic expectation-maximization (stEM) algorithm that can exploit the complete data likelihood while augmenting the data through exact conditional sampling (Nielsen et al., 2000) . This approach is made computationally tractable due to three key realizations: many of the required computations can be cast as offset Poisson regressions which can be solved efficiently, a rejection sampler for missing exposure times can be deployed naively in parallel, and an exact sampler for missing recovery times can be adapted from Bu et al. (2020) . We further leverage the theoretical guarantees for stEM to compute conservative asymptotic variance estimates.

The remainder of this paper is organized as follows: In the next section, we introduce our model framework. Sections 3 and 4 discuss our proposed inference procedures for complete data and partial data. We evaluate the performance of the proposed inference methods through simulation experiments in Section 5, and finally apply our model to analyzing the eX-FLU dataset in Section 6.

We adopt a stochastic compartmental model for epidemics, where all members of the target population are divided into non-overlapping subsets related to their disease statuses, and the mechanism of disease spread is described by the transition between disease statuses for each individual. We base our epidemic model on the SEIR model with four disease statuses: S (susceptible), E (exposed), I (infectious), and R (recovered or removed). An S individual may get exposed (and thus become an E person) upon contact with an I individual, and an infectious (I) person will eventually recover and transition to the R status. In this model, the E status resembles a latency period and does not entail any transmissibility, and a recovered person enjoys immunity to the disease and therefore no longer contributes to the contagion process.

These disease spread dynamics evolve as a continuous-time Markov chain (CTMC) defined through exponentially distributed waiting times between consecutive events (Guttorp, 2018) . This implies that the disease process progresses as a series of competing Poisson processes at the individual level. For example, suppose β ij is the rate of exposure between an infectious person i and an susceptible person j who are in contact at time t. Then the probability of j getting exposed (thus becoming an E person) at time t + h for small h > 0 is P r(j gets exposed by

Given that the contact structure of the population is subject to change in time as well, we extend the CTMC model to the dynamics of the contact network. For any pair of individuals i and j, they either share an undirected contact link ("connected") or they do not ("disconnected"); the contact network can thus be represented by a binary symmetric matrix W called the adjacency matrix. Its dynamics are described at the pairwise level, where each entry W ij evolves as a CTMC that takes values in {0, 1}. For example, if individuals i and j are disconnected at time t, with link creation rate α ij , the probability of them engaging in contact by time t + h for small h > 0 is

We will consider heterogeneous exposure rates, allowing individual characteristics and network information to play a substantive role in transmission probabilities. Moreover, we consider two types of infectious individuals -these can be thought of as symptomatic and asymptomatic caseswho exert different transmission forces on the population. This is summarized in Figure 2 . Figure 2 : Diagram of the epidemic process: an extension of the stochastic SEIR model, with heterogeneous exposure rates and two sub-types of infectives. Disease transmission (exposure) is conditioned on pairwise contact status in the dynamic contact network.

Model Specification. Our goal is to construct an individualized framework that can capture the interplay between the epidemic process and the evolution of the contact network. Disease transmission relies on contact between individuals, while the change in contacts, in turn, depends on individual disease statuses. To do so, we model the joint evolution of the epidemic and network processes as the combination of continuous-time Markov chains for individuals (or pairs of individuals) in the population. At any time point t in the process, conditioned on the current status of the process Z t , five types of events may occur for an individual or a pair of individuals: exposure (an S person becomes exposed by an I person), manifestation (an E person becomes infectious after a latency period), recovery (an I person recovers and becomes a R person), link activation (a previously disconnected pair get connected in the network), and link termination (a previously connected pair break off their contact).

We accommodate different types of individual heterogeneity in the disease transmission dynamics: (1) people may exhibit different levels of susceptibility that can be explained by individual characteristics such as health conditions, hygiene habits, and behavioral choices;

(2) those who are infectious might not be equally contagious to the susceptible population (e.g., symptomatic and asymptomatic cases for COVID-19); (3) contact rates in the network may vary in time to reflect phases of social intervention and/or behavioral changes as response to an epidemic (e.g., a pre-and post-lockdown period denoted by T 0 and T 1 , respectively); (4) contact rates in the network may vary between pairs of individuals based on their healthy (denoted by H, the collection of S, E, and R statuses) versus infectious (denoted by I) status.

Combining the above descriptions, we design the model framework as follows. For any members i and j in the target population, given the current system state Z t at time t > 0, one of the following five events may occur after an exponentially distributed waiting time with the associated rate (see Figure 2 for a summary of the epidemic process):

• Exposure. If i is infectious, j is susceptible, and they are in contact at time t, then i exposes j with instantaneous rate β ijt that can be decomposed as

where β is the baseline exposure rate, η i (t) represents i's contagiousness level at time t (details in the next item), and b S are the regression coefficients on j's individual characteristics x j that account for additional heterogeneous effects of susceptibility. (For example, x j can characterize if person j has been vaccinated, washes their hands frequently and/or wears a mask during the flu season, which may reduce j's exposure risk.)

• Manifestation. If i (or j) is exposed, then he or she becomes infectious with rate ϕ.

For additional generality we assume that any infective (status I person) gets assigned to one of two categories with different levels of contagiousness: I s ("symptomatic") or I a ("asymptomatic" or "less symptomatic"). Individuals are assigned to the first category with probability p s and to the second with probability (1 − p s ). † With the two-type infective setup, the η i (t) term in Eq. (3) can be written as

which means that an I s person is on average e η times more infectious than an I a individual.

• Recovery. If i (or j) is infectious, then he or she recovers with rate γ.

• Link activation. If i and j are not in contact, then they get into contact with rate α ijt , where

where A it is the healthy or infectious status of person i at time t and α ABk stands for the activation rate of link type A − B in phase T k (A, B ∈ {H, I} and k ∈ {0, 1}).

• Link termination. If i and j are in contact, then they break off the contact with rate ω ijt , where

and ω ABk stands for the termination rate of link type A − B in phase T k .

Here T 0 and T 1 represent two time phases of different social behaviors that partition the entire observation time window (0, T ) (e.g., T 0 and T 1 can be intermittent lockdown and no-lockdown phases).

The link rates are dependent on the individual disease statuses H or I since we wish to characterize the social adaptation behavior in response of an epidemic -for instance, a healthyinfectious (H − I) pair that are in contact might be more likely to disconnect from each other than a pair of healthy individuals to avoid disease transmission. Further, since we assume the contact network is symmetric and only dependent on healthy or infectious statuses, the link rates satisfy α HI k = α IH k and ω HI k = ω IH k for k = 0, 1.

Note that we can also include individual-level covariates in the link activation and termination rates (α ijt and ω ijt ) to allow for more heterogeneity in the network dynamics, but choose to focus on individual variability in the epidemic process in the main text, and relegate details of heterogeneous network link rates to Section S1 of the Supplementary Material (Bu et al., 2021) .

In this section, we demonstrate how to learn model parameters in the missing data setting. Though our focus will be on inference in the partially observed regime, we begin by describing key inferential terms related to the complete data setting as these quantities will play a role in our stochastic EM algorithm.

A complete dataset refers to the fullly observed event sequence between time 0 and maximum time T (> 0) of one realization from the generative model. In particular, if we were to continuously observe an epidemic as it progresses under this model, we would have access to (1) exact times of exposure (t 

= β n E γ n R ϕ n I p n Is s (1 − p s ) n Ia i:i got exposed Table 1 summarizes the notation used in the expressions above. Since the generative model is a CTMC comprised of individual-level Poisson processes, the above likelihood can be decomposed into epidemic-related components (1st and 3rd lines above) and network-related components (2nd and 4th lines), where each component is simply the product of all the exponential rates for inter-event times. Some of those inter-event rates can be considered at the population level, and so only require bookkeeping of aggregated counts (e.g., n E = total number of exposed cases , E(t) = number of E people at time t, and C HI0 = number of link activation events for H-I pairs in phase T 0 ). Other inter-event rates, however, depend on individual-level information, and so require tracking terms such as the time-varying neighborhood structure and the exposure and manifestation times for each individual. Note that, for consistency, we set individual i's exposure time t (E) i and manifestation time t (I) i to T if i has never been exposed or manifested. n Is , n Ia , n E , n I , n R total number of I s , I a , exposed (E), infectious (I) and recovered (R) cases 

Despite a lengthy likelihood expression, parameter estimation is straightforward when complete data are available. We can obtain closed-form maximum likelihood estimates (MLEs) for most of the parameters, and find the remaining MLEs for parameters β, η and b S through a simple numerical procedure. This suggests that likelihood-based inference given completely observed data is easily implementable through a few lines of code, and can be modularized toward inference when some data are missing (discussed in the next section).

The MLEs of all parameters can be obtained by taking partial derivatives of the log-likelihood (denoted by ) as follows and setting them all to zero:

∂ ∂b S = i:i got exposed

∂ ∂e η = i:i got exposed

The MLEs for parameters ϕ, p s , γ and α ABk ,

, while the MLEs for β, η and b S can be solved from equations (8)-(10) numerically. An efficient iterative procedure is detailed in Section S2 of the Supplementary Material (Bu et al., 2021) ; in particular, steps for solving the MLE of b S can be largely simplified based on the observation that it is equivalent to solving for the linear coefficient of a Poisson regression model with individual offset.

As demonstrated in the previous subsection, parameter estimation is relatively straightforward when the full event sequence is observed. However, as discussed in the introduction, real-world epidemic data rarely rarely include measurements of each epidemic event. In the case of the eX-FLU study, true exposure times are not available even though the data contain daily symptom reports. This is because there is typically an incubation period for people who contract the flu. Similarly, exact recovery times are not available in the eX-FLU study, with recoveries discernible only at a weekly resolution from epidemic surveys. Therefore, we need to consider inference with partially observed epidemic data, in particular with exposure times and recovery times unknown.

To this end, we derive a method based on the stochastic expectation-maximization (stEM) algorithm (Celeux, 1985) . Expectation maximization (EM) offers an approach to efficiently carry out maximum likelihood estimation for continuous-time Markov chain models in missing data settings (Doss et al., 2013; Guttorp, 2018) . Imputing the missing data in the E-step requires access to the conditional expectation, and stEM is a variant that builds an approximation to the conditional expectation using augmented data obtained via conditional simulation. To be more precise, let X denote the observed data and Z be the missing data; a general outline of the stochastic EM algorithm for estimating θ is as follows:

For s = 1 : maxIter, do

• (E-step) draw one sample of missing data, Z (s) from its conditional distribution p(Z | X, θ (s−1) ), and then let Q(θ | θ (s−1) ) = log L(θ; X, Z (s) );

• (M-step) maximize with respect to target function Q(θ | θ (s) ) to update θ:

There are two advantages of this approach. First, in the E-step, integrating to obtain an expected log-likelihood (as in the traditional EM algorithm) is replaced by sampling, which avoids the often intractable marginalization step in the case of complex models (Renshaw, 2015; Stutz et al., 2021) . Second, the M-step simply requires solving for the MLEs given a version of the complete data, which is often straightforward, and has been discussed in the previous subsection for the present setting.

These advantages come at the cost of a potential challenge: we have to conditionally sample the missing data given our observed data and current parameter estimates. In our framework, this is equivalent to sampling event times of a continuous-time Markov chain conditioned on end-points, a notably difficult problem (Hobolth and Stone, 2009; Rao and Teg, 2013) . The following section will focus on the conditional sampling (or "imputing") of missing data, i.e., missing exposure and recovery times.

Let t (E) and t (R) denote all missing exposure times and recovery times, respectively. We assume that all manifestation times {t (I) i } are observed, ‡ and there is no missingness in the contact network events given high-resolution contact-tracing. Thus, our inference procedure with partial epidemic observations can be outlined as follows.

For s = 1 : maxIter, do 1. sample missing exposure times t (E) (s) from their joint conditional distribution p(t (E) | observed events, t (R) (s−1) , Θ (s−1) ); ‡ This assumption is reasonable as manifestation times can be collected via daily symptom monitoring or frequent screening or testing.

3. combine the sampled event times in Steps 1 and 2 with observed data to form an augmented dataset, then solve for the MLEs with the augmented dataset to get updated parameter estimates Θ (s) .

Step 3 is already addressed in Section 3.1, we now address Steps 1 and 2 separately.

Step 1: conditional sampling of missing exposure times

Inspecting the complete data likelihood in Eq. (7) reveals that given all the other event times, person i's exposure time t is independent from other individuals' exposure times, and thus the joint conditional density for t (E) can be factorized into individual density components of exposure. Thus, it suffices to derive the conditional density for i's exposure time t

i , which is assumed to lie within a "plausible" latency interval L i = (t i min , t i max ). Note it is always valid to choose this interval as L i = (0, t (I) i ), meaning that i's exposure time may occur any time after time 0 and before i's manifestation time. In practice, one may alternatively specify a shorter plausible interval based on prior knowledge about latency duration, which can improve computational efficiency. § Then, i's instantaneous risk function of exposure and contracting the disease during L i can be written as (here let

This exposure hazard is a step function with change points occurring when either (1) i activates a link with an I s or I a person, (2) i deactivates a link with an I s or I a person, (3) one of i's contacts enters status I s or I a , or (4) one of i's contacts exits status I s or I a . For person i, denote this set of change points

Thus, the conditional density for i's exposure time t can be expressed as

where the normalizing constant

We derive a rejection sampler for the missing exposure time t

§ For example, if we believe that latency should be longer than 2 days but shorter than 2 weeks, then we may set t i min = max(0, t

. ¶ Here we suppress the notation that tj is associated with individual i to avoid double subscripts, focusing on one individual during exposition.

which is the density function of a truncated inhomogeneous Exponential distribution with rate λ i (t). A rejection sampler for t (E) i runs in two steps:

1. Sample t from q i (t), an inhomogeneous Exponential with rate λ i (t) truncated on L i : (a) sample an interval A j (recall that the risk function is constant on each interval) via

where we denote the length of A j by len(A j ).

2. Compute the acceptance probability for t by (here M > 1 is a constant)

and draw U ∼ U nif (0, 1); accept t as a sample of t

, and otherwise go back to Step 1 and repeat.

A full derivation of the above (importantly showing that M > 1) is provided in Section S3 of the Supplementary Material (Bu et al., 2021) . Note that it is easy to generalize to a setting with other choices of the plausible interval bounds t i min and t i max , which simply entails changing the limits of integration in the above derivations.

Step 1 of the stEM procedure can be carried out by running the rejection sampler above, and we may further speed up computation by running the sampler for each person i in parallel. In the simulations in Section 5.2, the rejection sampler accepts approximately 45% of proposals.

Every infectious individual recovers with rate γ independently of other members in the population, but when conditionally sampling missing recovery times, we have to make sure that the imputed timepoints are compatible with observed data and the sampled exposure times (Cauchemez and Ferguson, 2008; Fintzi et al., 2017) . The conditional samples of missing recovery times should satisfy two conditions: first of all, an individual q cannot recover before a time t if q is known to be still infectious by t; and more importantly, if another individual p gets exposed during his contact with q, then the recovery time for q cannot leave p with no possible infection source.

Sampling missing recovery times, therefore, amounts to conditionally sampling event times with endpoints restricted by count and contact data. This challenging task was addressed by the DARCI algorithm developed in Bu et al. (2020) (Proposition 4.2) for a simpler epidemic model with only one type of infectives. Here, we cannot directly use DARCI because the different transmissibility levels of I a and I s individuals must be taken into account when we consider the possible ranges of recovery times to ensure the existence of viable infection sources for those exposed. Instead, we adapt the DARCI algorithm to accommodate our two types of infectives. Below we discuss the details of the modified DARCI procedure.

Utilizing the Markov property, this algorithm first segments the observation window (0, T ] into contiguous, non-overlapping time intervals. On each time interval (u, v] , the disease statuses of all people are known at endpoints u and v, and thus we know the set of people Q who should recover during (u, v] . ‖ Further, conditioned on the sampled exposure times, we also know the infection/exposure cases and their exposure times during (u, v] and let these individuals be P. We sample the missing recovery times {t (R) q } q∈Q in the following steps: 1. Initialize a vector of "feasible lower-bounds" LB of length |Q| with LB q = u for every q ∈ Q; for any p ∈ P ∩ Q, further set LB p = i p , where i p is p's exposure time;

2. Arrange the set of exposed individuals P in the order of {p 1 , p 2 , . . . , p |P| } such that their exposure times i p 1 < i p 2 < . . . < i p |P| , and for each p ∈ P (chosen in the arranged order), examine p's "potential infectious neighborhood"

where N p (t) is the set of p's neighbors at time t, and I(t) is the set of known infectious individuals at time t . If I p ⊂ Q (i.e., potential infection sources are all members of Q), then select one q ∈ I p , with probability

and set LB q = i p .

3. Draw recovery times t

is a truncated Exponential distribution with rate γ and truncated on the interval (s, t).

Despite the dense notation, the intuition of this sampling algorithm is straightforward: if person q is the only possible infection source of person p, then q should wait until p gets exposed before he or she recovers, so that the resulting augmented data are consistent with the observed data.

We note that the conditional sampling of recovery times is parallelizable as well, since operations on each interval (u, v] are independent and thus can be run in parallel.

While the algorithm proposed above provides a way to estimate Θ, we may further quantify uncertainty in our estimates by leveraging expressions for their asymptotic variances by appealing to results established in Nielsen et al. (2000) .

LetΘ denote the parameter estimates from the stochastic EM algorithm, and Θ 0 be the true parameter values. Then the asymptotic variance matrix ofΘ is

where I(Θ 0 ) is the information matrix of the observed data evaluated at the true parameter values, F (Θ 0 ) is a matrix representing the fraction of missing information due to partial observations, and I p is the p-dimensional identity matrix. We may decompose Eq. (21) into two parts: the first term is the asymptotic variance of the observed data MLE, while the second term accounts for the additional variance arising from stochastic simulations (i.e., the conditional sampling). It can be shown that the latter contribution increases the variance by no more than 50% over the observed data MLE (Proposition 4 in Nielsen et al. (2000)).

Moreover, the first term in the asymptotic variance formula can be regarded as a constant given a model and a fixed sample size, but the second term can be reduced by averaging over either across iterations in one run, or over multiple runs of the algorithm. That is, to improve efficiency, we may adopt two strategies:

1. Take the average of the last m iterations and take the averaged parameter values as the final estimates; denote such estimates byΘ (it) (m), and then its asymptotic variance is (see Proposition 5 in Nielsen et al. (2000) )

2. Take m independent runs of the stochastic EM algorithm and take the average of the m estimates produced by each run; denote such estimates byΘ (ind) (m), and then its asymptotic variance is (see Section 4.2.2 in Nielsen et al. (2000))

For instance, if we run the stochastic-EM procedure m = 10 times in parallel and take the average of the estimates produced by those 10 independent runs, then the asymptotic variance of these estimates is bounded above by (1 + 1 10 × 0.5)I(Θ 0 ) −1 = 1.05I(Θ 0 ) −1 . We can further plug in the estimatesΘ (ind) (10) for Θ 0 to get a variance estimate of 1.05I Θ (ind) (10) −1 .

In the simulations in Section 5.2 these approximations to the variance are conservative, but not for all parameters. While the Wald-type intervals for most parameters cover the truth nearly 100% of the time, the β and η in our simulations are covered 93% and 81% of the time, respectively. Because the marginal likelihood of observed data isn't available, the observed data information matrix I(·) is not immediately obtainable. However, we can nonetheless estimate I(·) using the Louis identity (Louis, 1982 

where obs is the (marginal) log-likelihood of the observed data, and is the log-likelihood of the complete data. Both terms of the right-hand side only involve the complete data likelihood and can be estimated via Monte Carlo approximation using the augmented data samples generated across the iterations of our inference procedure (Diebolt and Ip, 1995) . That is, these samples are already available from the estimation process, and so evaluating Eq. (24) does not require additional sampling.

We now turn to validate the methods of inference via simulation experiments. Synthetic data are simulated using the Gillespie algorithm (Gillespie, 1977) according to the generative process described in Section 2. A generated complete dataset includes the initial contact network G 0 , the initial exposed/infectious individual(s) I 0 , and the full event sequence {e }, where each event e consists of the event time, the identities of the individuals involved in this event, as well as the event type (exposure, manifestation, recovery, link activation and termination).

In addition, we assume that for each person i in the target population, we observe a vector x i of covariates, which in simulation experiments are randomly sampled binary and standard normal variables. The full set of parameters to estimate is Θ = {β, b S , η, ϕ, γ, α, ω}. For simplicity, throughout this section, we use the following ground-truth setting for the link rates: α T = (α HH0 , α HI0 , α II0 , α HH1 , α HI1 , α II1 ) = (6, 6, 6, 6, 2, 6) × 10 −4 ; ω T = (ω HH0 , ω HI0 , ω II0 , ω HH1 , ω HI1 , ω II1 ) = (5, 5, 5, 5, 50, 5) 

That is, we assume that in the second social behavior stage T 1 , α HI (link activation for H − I pair) is reduced and ω HI (link termination for H − I pair) is increased, mimicking a "quarantine" or "lockdown" phase. The initial network G 0 is a random Erdős-Rényi graph with edge density 0.05. Parameters for the epidemic process are chosen as follows:

These values are chosen to ensure a low probability of epidemic extinction at the beginning and so that a fair proportion (at least 50%) of the population has been exposed by the end of the outbreak.

We first validate our iterative inference procedure for complete data, as derived in Section 3.1. Table 2 presents the mean absolute errors (MAEs), variances and mean square errors (MSEs) for all model parameters across 40 simulations on N = 200-sized populations. Here we present results for e η (true value ≈ 1.22) instead of η as e η is the quantity that we directly solve for in inference. In the simulation results presented below, the regression coefficient b S is set as (1, 1) T .

We can see from Table 2 that when the complete event sequence is available, the derived inference method can estimate model parameters quite accurately. The most challenging parameters to estimate are η (shown in terms of e η in the table) and b S , largely because we have to resort to numerical optimization procedures to solve for their MLEs.

Our inference method can accommodate two types of missingness in epidemic observations: (1) exposure times and (2) recovery times. Between these two types of missingness, the former is more difficult to handle as it requires the conditional sampling step derived in Section 4.1.

First, we test out the central part of our inference procedure. To do this, we hold out all the exposure times (i.e., every t (E) i for each person i who ever got infected) from each simulated dataset and treat those time points as unobserved while considering all other information as observed. We then run the proposed stochastic EM algorithm (without the "sampling recovery times" step) on each partial dataset. Columns 2-4 in Table 3 (under "Missing expo. times") summarize the estimation results for some of the model parameters across 40 independent simulations. Same as in Table 2 , we provide the MAE, variance and MSE for the estimates. Next, on the same simulated data, we hold out all the recovery times (i.e., t (R) i for each person i who ever became infectious) and treat both t for each infected person i as unobserved. Now the full stochastic EM inference algorithm is applied to each partial dataset, and the results are summarized in columns 5-7 in Table 3 (under "Missing both") .

As one might expect, compared to inference based on complete data, there is some decrease in accuracy, in particular for e η , b S and β. This decrease in performance is largely due to the variability in the sequential samples of individual local neighborhoods. That is, when the exposure time t (E) i is unknown and has to be conditionally sampled, the local neighborhood structure for each person i (i.e., I a i (t

i )) is also unknown and fluctuates throughout the rest of the sampler. This would greatly impact the numerical optimization procedure for these three parameters, as they either directly depend on I a i (t

which changes whenever t (E) i gets updated. Moreover, when the exposure times are unknown, the accumulated amount of infection forces exerted on each susceptible person is also unavailable, which would make solving for β and b S more challenging (see Section S2 of the Supplementary Material (Bu et al., 2021) for details on the numerical optimization).

We also note that when recovery times are missing in addition to exposure times, there tends to be more variability in the parameter estimates, since more missingness in the data should tend to induce increased uncertainty. Much of the additional uncertainty is reflected in the estimation of exposure-and latency-related parameters (e.g., e η , b S and ϕ), as the conditional samplers for exposure times and recovery times are co-dependent, and the complexity in estimating those parameters is more vulnerable to the loss of more information.

We further investigate the estimation of these more difficult parameters and note that estimation accuracy increases when more data are available. We demonstrate this by conducting simulation experiments for different population sizes (N = 100, 200 and 300). Since the number of epidemic events increases approximately linearly in N , more events can be observed with a larger population size. For each simulated dataset, we take out both the exposure and recovery times and run the inference algorithm, but in Step 3 (solving for the MLEs) we fix all other parameters at the true values and only estimate e η and b S . Moreover, we run a separate set of experiments where we only estimate e η (i.e., fixing b S at the truth as well).

In Figure 3 we present the MSEs of the produced estimates for e η and b S when fixing other parameters (shown as "b_S" and "exp(eta)" in red circles and green triangles), and for e η only while fixing all other parameters (shown as "exp(eta) only" in blue squares). We can see that with increased population size, which yields more observed events, the error in estimating these parameters decreases.

We now return to the study of transmission of influenza-like illnesses among students on a university campus, where high-resolution contact tracing was conducted to track physical proximity between study subjects. This dataset was collected over a 10-week epidemiological study, eX-FLU (Aiello et al., 2016) , where inter-personal physical contacts of study participants were surveyed to investigate the effect of social intervention on respiratory infection transmissions. 590 university students enrolled in the study and were asked to respond to weekly surveys on influenza-like illness symptoms and social interactions; they also completed a comprehensive entry survey about demographic information, lifestyles, immunization history, health-related habits, and tendencies of behavioral changes during a flu season or a hypothetical pandemic. 103 individuals among the study population were further recruited to participate in a sub-study in which each study subject was provided a smartphone equipped with an application, iEpi. This Here, when running Step 3 of the inference algorithm, we fix all other parameters at the true values and only estimate e η and b S (shown in red circles and green triangles); moreover, we run similar estimation procedures but only focus on e η (i.e., b S is fixed at the truth, shown in blue squares). It is clear that when the population size increases and more events are observed, e η and b S are estimated more accurately.

application pairs smartphones with other nearby study devices via Bluetooth and thus can record individual-level contacts (i.e., physical proximity) at five-minute intervals. The iEpi sub-study took place from January 28, 2013 to April 15, 2013 (that is, from week 2 until after week 10 in the main study). Between weeks 6 and 7, there was a one-week spring break (March 1 to March 7), during which epidemic data collection was paused and volume of recorded contacts also dropped considerably. In our application case study, we use data obtained on the N = 103 sub-study population from January 28 to April 4 (week 2 to week 10), and treat the two periods before and after the spring break as two different social behavior phases. That is, we regard weeks 2-6 as T 0 and weeks 7-10 as T 1 in our analysis.

Furthermore, we consider two types of "infectious" (status I) members within the study population: (1) multi-symptomatic and (2) uni-symptomatic. To maintain notation consistency, we label the former by I s and the latter by I a . They are defined as follows.

1. multi-symptomatic (I s ), a case with a cough AND one of these three symptoms: fever or feverishness, chills, or body aches. * * 2. uni-symptomatic (I a ), a case with a cough, which is an important symptom for influenza.

For each infection case, we set the reported symptom onset time as the manifestation time (denoted by t (as dictated by the design of the assumed epidemic mechanism), we set the plausible latency interval as L i = (0, t (I) i ). Using weekly surveys (which asked each participant if they felt sick in the past week), we know that the missing recovery times must lie within a 7-day interval for each individual, where the lower and upper bounds * * This is the definition of "influenza-like-illnesses". are the start and end of a week. Moreover, we assume that all the contact network events are fully observed, as the high-resolution contact tracing can provide timepoints of initiation and termination of all individual-level contacts. † † This suggests that the proposed inference procedure in Section 4 is applicable to this dataset.

In the rest of this section, we first address a realistic concern of possible external infection sources for the sub-study population, and then present details and discuss results of our data analysis.

Since the 103 individuals in the dataset are sub-sampled from the 590 study participants, which are also sub-sampled from the entire university campus population, we have to treat the data as observed in an open population instead of a closed one. Therefore, some slight modifications should be made to the model. Specifically, individuals in our target population may get infected by people who are outside of the N = 103 small population, and we call those people "external infectors".

For simplicity, we represent the joint forces of all external infectors by a single infector that exists outside of the population and exhibits a constant level of transmissibility over time, and this external force of infection is exerted uniformly on all members of the target population.

For each susceptible individual j, let the rate of disease onset (i.e., manifestation) due to external infectors be ξ j , and let this onset rate depend on individual characteristics x j , similar to our treatment of the internal exposure rate β ij :

where ξ denotes the population average external onset rate, and coefficients b E represent the effects of individual characteristics x j on subject j's deviation of susceptibility from the average level.

Here ξ j is the rate of moving from status S directly to either I a or I s , rather than from S to E, and that's why we are naming it the "external onset rate" instead of "external exposure/infection rate". We are not introducing both an exposure rate (like β ij ) and a manifestation rate (like ϕ) for external infection cases because of identifiability concerns: since all susceptible people are exposed to the same external infector with time-invariant transmissibility, the exposure rate and manifestation rate would not be identifiable at the same time when the exposure times are not observed. Thus, to ensure identification, we choose to include only one rate instead of two, and the "onset rate" can be thought of as the rate of any susceptible individual developing contagiousness due to external infection forces. Now the set of parameters is extended toΘ = {β, ϕ, γ, η, b S , ξ, b E , α, ω}, and we can write down a complete data likelihood by slightly modifying Eq. (7), where the term related to the new parameters ξ and b E are separate from the other terms. This means that introducing external cases wouldn't affect parameter estimation of the other parameters at all, and that we can still use the partial data inference procedure detailed in Section 4 to analyze the real data. We include details on complete data inference with external cases in Section S4.1 of the Supplementary Material (Bu et al., 2021 ).

Before applying our model framework to the data, we first discuss how we identify internal and external infection cases and describe the individual characteristics used in the analysis. We adopt the following labeling criteria for internal and external infection cases: if an infected person had any infectious contact (within the 103-person population) up to 2 weeks prior to symptom onset, then we label this case as "internal", and otherwise this case is labelled as "external". This procedure gives us 18 internal cases and 16 external cases in total. Moreover, among all 34 cases, 13 are multi-symptomatic (I s ) and 21 are uni-symptomatic (I a ). We provide a summary of the breakdown of all infection cases in Table 4 . We consider the following four individual-level characteristics (all collected from the entry survey) that have previously been linked to disease transmission risk: ‡ ‡ 1. flushot: a binary indicator of whether or not the study subject has taken a flu shot for this year.

2. wash_opt: a binary indicator of whether or not the study subject's hand-washing habit is considered "optimal", which is derived from survey questions about how long and how frequently one usually washes their hands.

3. change_behavior: a derived score that measures how willingly the study subject would change their lifestyle during a hypothetical pandemic; this is calculated by standardizing the numeric sum of the 0/1 scores of 13 Yes/No questions (Yes=1, No=0) about voluntary behavioral changes that essentially translate to reduced social activities or isolation in a lockdown; a higher score represents more willingness in changing one's lifestyle in response to a pandemic.

4. prevention: a derived score that measures one's belief in the effectiveness of different preventative practices in reducing the risk of catching the flu; this is calculated by standardizing the numeric sum of response scores (ranging from 1 to 5, 1=strongly disagree, 5=strongly agree) to 6 questions regarding potential preventative measures against flu transmission; a high score represents stronger belief in the effectiveness of preventative practices.

We perform 20 independent runs of the stochastic-EM inference procedure on the dataset, each time with a different random initialization and 60 burn-in steps. For each run, we take the average of the last 20 iterations (after burn-in) and then average over the 20 averages (across runs) to produce estimates of the parameters. Asymptotic standard errors are obtained using ‡ ‡ We have included the original survey questions used to calculate the derived covariates "change_behavior" and "prevention" in Sections S4.2 and S4.3 of the Supplementary Material (Bu et al., 2021) . the method described in Section 4.3; here we obtain a conservative estimate of standard errors by setting m = 20 and upper-bounding the asymptotic variance matrix by 1.025I(Θ) −1 , wherễ Θ are the final parameter estimates produced by averaging.

Tables 5 and 6 present estimates of select parameters of interest. Note that here we take one day as 1 unit of time. From the epidemic parameter estimates, we can see that for this population, the baseline exposure rate is quite high, indicating fast disease exposure upon contact (it takes approximately 0.22 days on average for an H − I contact to lead to infection if the susceptible individual is not vaccinated and does not wash hands properly); the latency period lasts slightly less than 5 days on average, while recovery from symptoms and contagiousness takes about 6 days on average. The total external infection force experienced by the entire N = 103-person population is on the scale of 0.00445 × 103 ≈ 0.458, indicating that on average there would be a disease onset due to external sources every other day if nobody in the study population had a flu shot or washed their hands optimally. In terms of the effects of individual-level covariates, we note that the estimates are associated with relatively large standard errors (indicated in the parentheses), and this is potentially due to the small sample size (in particular, the limited number of infection cases). Nevertheless, the effect of hand-washing ("wash_opt") seems to be considerable, given that there is a 11-fold reduction (1/e −2.42 ≈ 11.2) in the exposure risk if one washes their hands optimally compared to suboptimal hand-washing; this effect appears significant, in that the 95% Wald-type confidence interval constructed using the conservative standard error estimate is (−4.054, −0.786), which does not cover zero.

In Table 7 we include estimates of several parameters related to the contact network process. Here we emphasize the difference between the change rates of H − H (healthyhealthy) links and H − I (healthy-ill) links, as well as the difference between the two phases (T 0 before spring break and T 1 after). We can clearly see that the link deletion rates for H − I links are higher than those of H − H links in both phases, suggesting that the duration of contact between a healthy person and an infectious person is on average shorter than the contact between two healthy people, probably because the students were cutting meetings short with peers who seemed sick in order to avoid getting infected during the flu season. Moreover, we can clearly see that the level of network activity is much higher (both in terms of establishing and breaking contact) in T 0 (weeks 2 to 6, before spring break) compared to T 1 (weeks 7 to 10, after spring break) when we compare the rates for phase T 0 and phase T 1 . Such findings are enabled by our model design which allows for different levels of network activities by introducing different time phases. Table 7 : Estimates of link activation and deletion rates for different link types in the two phases ( T 0 spans from week 2 to week 6, and T 1 from week 7 to week 10). Link type T 0 act. T 1 act. T 0 del. Through our data analysis, we have found that proper hand-washing is significantly associated with reduced risk of flu infection, and that there is a considerable external force of infection for the study population. Moreover, study participants exhibit adaptive contact behavior to flu transmission in that contact between healthy and infectious individuals is less frequent and also lasts less time compared to contact between healthy individuals. Our model has also identified differential inter-personal contact patterns in the two observational periods before and after the school break. These findings are consistent with our intuition and are now quantified statistically within a joint inferential framework.

In this paper, we present a continuous-time Markov chain model for infectious diseases that accounts for individual-level heterogeneity and a dynamic contact structure. Our proposed model can capture the interplay between the epidemic process and network changes, and, more importantly, can describe heterogeneous susceptibility and transmissibility through individual covariates. To accommodate unobserved exposure times and recovery times in real epidemic data, we develop a data-augmented inference procedure based on the stochastic EM algorithm so that we can make use of the complete data likelihood. We also design an efficient method to conditionally sample missing exposure times that are compatible with observed data and respect the dynamic contact network structure. Experiments show that the developed inference procedure performs well on partial data and is able to uncover notable phenomena from modern epidemic data with high-resolution contact tracing.

It is important to note that the modeling framework we propose is flexible beyond our choice of underlying compartments. That is, our approach can be easily adapted to incorporate notions of reinfection (by allowing some individuals to reenter the susceptible population) or to distinguish between more than two types of infections. In pursuing generalizations of the methodology, introducing additional parameters requires careful consideration of the uncertainty quantification from the stochastic EM algorithm. Although we are able to show that in our setting, the estimated confidence intervals perform well empirically compared to their nominal coverage, they rely on variance approximation formulas, and it is crucial to conduct similar validations in more complex models.

Finally, our analysis of the iEpi data provides confirmation of the importance of handwashing on the reduction of the spread of influenza-like-illness. Unlike previous claims in this area, we are able to measure the actual effect on the transmission rate of a disease in an active population with dynamically changing contact patterns. We hope that this development encourages greater data collection of observational high-frequency individual-level data in this area to gain better understanding of other pharmaceutical and non-pharmaceutical interventions. For example, future studies will be able to estimate the effectiveness of vaccination in preventing transmission under different social interaction rates and population densities, as well as validate claims about the efficacy of mask-wearing and active social distancing. Importantly, such data can be collected discretely in closed populations and provide invaluable insight into the deployment of public health interventions (Motta et al., 2021) . Supplement to Likelihood-based Inference for partially observed stochastic epidemics with individual heterogeneity S1 Additional details of the model framework As a supplement to the model framework introduced in Section 2, we may consider individuallevel covariates in the change rates of links in the contact network. Specifically, given the status of the process at time t, Z t , if i and j are not in contact, then they initiate contact with rate α ijt , where

where α ABk represents the baseline link activation rate for link type A − B in phase T k , and b α are the coefficients that describe the additional effects of individual characteristics on the pairwise link activation rate. Similarly, we can introduce the same regression structure to link termination rate ω ijt ,

where b ω are the coefficients that describe the effects of individual characteristics on the pairwise link termination rate. With these additional terms added to the link change rates, the complete data inference procedure described in Section 3.1 has to be updated. First of all, the complete data likelihood now becomes L(Θ; complete data) =β n E γ n R ϕ n I p n Is s (1 − p s ) n Ia i:i got exposed

Here N c i and N c j denote the total number of link activation and termination events that i has been involved in, respectively. Moreover, 1 i−j (t) is an indicator of whether or not i and j are connected at time t.

Then we can see that all the inference steps related to the epidemic parameters shall remain unchanged, but we need modified steps to estimate α, b α , ω, and b ω . Take partial derivatives of the log-likelihood with respect to these parameters and set them to zero,

where d

ABk is the total time i and j spend as a disconnected A-B type pair during T k , and c (ij) ABk is the total time i and j spend as a connected A-B type pair during T k .

Then we can employ the following iterative procedure to solve for α and b α : repeat until convergence or maximum number of iterations is reached: 

where N c ij is the total counts of link activation between i and j and x ij = x i + x j , and then this is equivalent to solving for the linear coefficients of a Poisson regression model with individual offset log k=0,1

ABk . Estimation for ω and b ω can be conducted in almost exactly the same manner as in the steps above for α and b α , so we will omit the details here.

Here we provide details on the numerical iterative procedure for solving the MLEs of paramters β, η and b S by setting the partial derivatives in equations (8) -(10) to zero. Since it is equivalent to operate with e η instead of η, we directly estimate e η in this procedure. Let be the total amount of "pathogen exposure" for person i, which is a function of e η . We run the following steps until convergence or the maximum number of iterations is reached:

where in y i = 1(i ever got exposed), and solving for the objective function (S11) is equivalent to solving for the linear coefficients of a Poisson regression model with individual offset log β + log F i (e η );

• update e η by numerically solving § § 0 = i:i got exposed

Given that the risk λ i (t) is a step function, the normalizing constant C i (t i min , t i max ) in Equation (17) can be explicitly evaluated as (note that λ i (t) ≡ λ j on each interval A j = (t j−1 , t j ))

.

(S13)

If we don't have prior knowledge about the latency period, we should perhaps search for possible exposure time between 0 and the manifestation time. That is, we can adopt the natural choice of L i = [0, t (I) i ], and then the density (17) is simplified into

.

(S14) § § Any built-in solver offered by computational softwares (e.g, "optim" function in R) should work.

Let

be the density function of a truncated inhomogeneous Exponential distribution with rate function λ i (t), and then we would have

and it's straightforward to see that M > 1. This suggests that we can sample exposure time t (E) i from p i (t) via rejection sampling with proposal density q i (t), as described in Section 4.1.

In Figure S1 , we provide a diagram of the epidemic model with external onsets. Here external onset is the transition between S and I, with the E status subsumed. For convenience, when working with the real data, we assume that we have knowledge of which cases are internal infection cases and which are external ones. ¶ ¶ Denote the former set of cases by I (int) and the latter by I (ext) . Now the set of parameters are extended toΘ = {β, ϕ, γ, η, b S , ξ, b E , α, ω}, and the complete data likelihood (7) should be modified intõ

In addition to the notation explained in Section 2, here n (int) I , n (ext) I denote the total number of infection (manifestation) cases due to internal and external sources, respectively. Again, similar to previous definitions, for convenience, if an individual i never got infected, then we set t

Examining the above expression, it is obvious that estimation of all previously existing parameters are not really impacted, except that all formulas related to β, b S , η and ϕ should be restricted to internal infection cases instead of all infection cases. Therefore, equations (12) -(15) shall remain completely unchanged, and the rest of the partial derivatives should be ¶ ¶ Details on how we identify internal and external cases are discussed in Section 6.2. modified slightly, on top of adding two more for ξ and b E :

Same as described in Section 3.1, solving for MLEs requires setting all partial derivatives to zero, which leads to an updated inference method that includes all steps in Section 3.1, with an additional iterative procedure that runs until convergence or the maximum number of iterations is reached:

where z i = 1(i ∈ I (ext) ), and (very similar to our treatment to b S in (S11)) this is equivalent to solving for the linear coefficients of a Poisson regression model with individual offset log ξ + log t (I)

i . Therefore, with this updated inference procedure for maximum likelihood estimation from complete data, we can run the inference algorithm stated in Section 4 on the real data, where epidemic observations are incomplete. Note that the addition of the external infection sources doesn't affect conditional sampling of the missing exposure times or recovery times, so Step 1 and Step 2 of the inference scheme remains unchanged, while only Step 3 is modified to the procedure discussed above.

The following prompt is copied verbatim from the original eX-FLU survey:

The next section asks questions about a possible outbreak of pandemic flu in the US-a new type of flu that spreads rapidly among humans and causes severe illness. Imagine that a lot of people were getting very sick from the flu and the flu was spreading rapidly from person to person when you answer the following questions. During an outbreak, would you voluntarily make the following changes to your life? (Select Yes (1) S1. Additional details of the model framework. As a supplement to the model framework introduced in Section 2, we may consider individual-level covariates in the change rates of links in the contact network. Specifically, given the status of the process at time t, Z t , if i and j are not in contact, then they initiate contact with rate α ijt , where (S1)

where α ABk represents the baseline link activation rate for link type A − B in phase T k , and b α are the coefficients that describe the additional effects of individual characteristics on the pairwise link activation rate. Similarly, we can introduce the same regression structure to link termination rate ω ijt ,

where b ω are the coefficients that describe the effects of individual characteristics on the pairwise link termination rate.

With these additional terms added to the link change rates, the complete data inference procedure described in Section 3.1 has to be updated. First of all, the complete data likelihood now becomes L(Θ; complete data) =β nE γ nR ϕ nI p nI s s (1 − p s ) nI a i:i got exposed

Here N c i and N c j denote the total number of link activation and termination events that i has been involved in, respectively. Moreover, Then we can see that all the inference steps related to the epidemic parameters shall remain unchanged, but we need modified steps to estimate α, b α , ω, and b ω . Take partial derivatives of the log-likelihood with respect to these parameters and set them to zero,

ABk e (xi+xj) T bα (for k = 0, 1, (A, B) ∈ S), (S4)

ABk is the total time i and j spend as a disconnected A-B type pair during T k , and c (ij) ABk is the total time i and j spend as a connected A-B type pair during T k .

Then we can employ the following iterative procedure to solve for α and b α : repeat until convergence or maximum number of iterations is reached: 

where N c ij is the total counts of link activation between i and j and x ij = x i + x j , and then this is equivalent to solving for the linear coefficients of a Poisson regression model with individual offset log k=0,1

ABk . Estimation for ω and b ω can be conducted in almost exactly the same manner as in the steps above for α and b α , so we will omit the details here.

S2. Details of complete data inference. Here we provide details on the numerical iterative procedure for solving the MLEs of paramters β, η and b S by setting the partial derivatives in equations (8) -(10) to zero. Since it is equivalent to operate with e η instead of η, we directly estimate e η in this procedure. Let F i (e η ) = T 0 (I a i (t) + I s i (t)e η )1(i susceptible at t)dt be the total amount of "pathogen exposure" for person i, which is a function of e η . We run the following steps until convergence or the maximum number of iterations is reached:

;

• update b S by solving 0 = i:i got infected

where in y i = 1(i ever got exposed), and solving for the objective function (S11) is equivalent to solving for the linear coefficients of a Poisson regression model with individual offset log β + log F i (e η ); • update e η by numerically solving * (S12) 0 = i:i got exposed

S3. Derivations of the rejection sampler for missing exposure times. Given that the risk λ i (t) is a step function, the normalizing constant C i (t i min , t i max ) in Equation (17) can be explicitly evaluated as (note that λ i (t) ≡ λ j on each interval A j = (t j−1 , t j ))

(t j − t j−1 ) 1(ϕ=λj) e (ϕ−λj)tj − e (ϕ−λj)tj−1 ϕ − λ j

. (S13)

If we don't have prior knowledge about the latency period, we should perhaps search for possible exposure time between 0 and the manifestation time. That is, we can adopt the natural choice of L i = [0, t (I) i ], and then the density (17) is simplified into p i (t | t (I) i , β, δ i , η, ϕ, network events)

be the density function of a truncated inhomogeneous Exponential distribution with rate function λ i (t), and then we would have

i ) * Any built-in solver offered by computational softwares (e.g, "optim" function in R) should work. This suggests that we can sample exposure time t (E) i from p i (t) via rejection sampling with proposal density q i (t), as described in Section 4.1.

S4. More details of the real data analysis.

S4.1. Complete data inference with external infection cases. In Figure S1 , we provide a diagram of the epidemic model with external onsets. Here external onset is the transition between S and I, with the E status subsumed. For convenience, when working with the real data, we assume that we have knowledge of which cases are internal infection cases and which are external ones. † Denote the former set of cases by I (int) and the latter by I (ext) . Now the set of parameters are extended toΘ = {β, ϕ, γ, η, b S , ξ, b E , α, ω}, and the complete data likelihood (7) e b T S xi [I a i (t) + I s i (t)e η ] 1(i is susceptible at t) + γ(I a (t) + I s (t)) + ϕE(t) dt (S16)

In addition to the notation explained in Section 2, here n (int) I , n (ext) I denote the total number of infection (manifestation) cases due to internal and external sources, respectively. Again, similar to previous definitions, for convenience, if an individual i never got infected, then we set t

Examining the above expression, it is obvious that estimation of all previously existing parameters are not really impacted, except that all formulas related to β, b S , η and ϕ should be restricted to internal infection cases instead of all infection cases. Therefore, equations (12) -(15) shall remain completely unchanged, and the rest of the partial derivatives should be modified slightly, on top of adding two more for ξ and b E :

Same as described in Section 3.1, solving for MLEs requires setting all partial derivatives to zero, which leads to an updated inference method that includes all steps in Section 3.1, with an additional iterative procedure that runs until convergence or the maximum number of iterations is reached:

where z i = 1(i ∈ I (ext) ), and (very similar to our treatment to b S in (S11)) this is equivalent to solving for the linear coefficients of a Poisson regression model with individual offset log ξ + log t (I) i . Therefore, with this updated inference procedure for maximum likelihood estimation from complete data, we can run the inference algorithm stated in Section 4 on the real data, where epidemic observations are incomplete. Note that the addition of the external infection sources doesn't affect conditional sampling of the missing exposure times or recovery times, so Step 1 and Step 2 of the inference scheme remains unchanged, while only Step 3 is modified to the procedure discussed above.

S4.2. Survey questions used to derive individual characteristic "change_behavior". The following prompt is copied verbatim from the original eX-FLU survey:

The next section asks questions about a possible outbreak of pandemic flu in the US-a new type of flu that spreads rapidly among humans and causes severe illness. Imagine that a lot of people were getting very sick from the flu and the flu was spreading rapidly from person to person when you answer the following questions. During an outbreak, would you voluntarily make the following changes to your life? (Select Yes (1) or No (0). )

Mask use, hand hygiene, and seasonal influenzalike illness among young adults: a randomized intervention trial

Design and methods of a social network isolation study for reducing respiratory infection transmission: The eX-FLU cluster randomized trial

Impact of a comprehensive workplace hand hygiene program on employer health care insurance claims and costs, absenteeism, and employee perceptions and practices

Transmission of pneumococcal carriage in families: a latent Markov process model for binary longitudinal data

Stochastic epidemic models: a survey

Supplement to "likelihood-based inference for partially observed stochastic epidemics with individual heterogeneity

Likelihood-based inference for partially observed epidemics on dynamic networks

Likelihood-based estimation of continuous-time epidemic models from time-series data: application to measles transmission in london

S. pneumoniae transmission according to inclusion in conjugate vaccines: Bayesian analysis of a longitudinal follow-up in schools

The sem algorithm: a probabilistic teacher algorithm derived from the em algorithm for the mixture problem

Using real-world contact networks to quantify the effectiveness of digital contact tracing and isolation strategies for covid-19 pandemic

A stochastic em algorithm for approximating the maximum likelihood estimate

Graph-coupled HMMs for modeling the spread of infection

Fitting birth-death processes to panel data with applications to bacterial dna fingerprinting

Contact tracing and disease control

Hierarchical graph-coupled HMMs for heterogeneous personalized health data

A unifying variational inference framework for hierarchical graph-coupled HMM with an application to influenza infection

Impact of non-pharmaceutical interventions (npis) to reduce covid19 mortality and healthcare demand

Efficient data augmentation for fitting stochastic epidemic models to prevalence data

Exact stochastic simulation of coupled chemical reactions

Stochastic modeling of scientific data

Direct likelihood-based inference for discretely observed stochastic compartmental models of infectious disease

Birth/birthdeath processes and their computable transition probabilities with biological applications

Simulation from endpoint-conditioned, continuous-time Markov chains on a finite state space, with applications to molecular evolution

Outbreaks of streptococcus pneumoniae carriage in day care cohorts in finland-implications for elimination of transmission

Intensified hand-hygiene campaign including soap-and-water wash may prevent acute infections in office workers, as shown by a recognized-exposure-adjusted analysis of a randomized trial

Impact of health campaign on hand hygiene with alcohol-based hand rubs in a non-clinical setting

Sequential monte carlo algorithms for agent-based models of disease transmission

A contribution to the mathematical theory of epidemics

Infectious disease control using contact tracing in random and scale-free networks

The engines of sars-cov-2 spread

Finding the observed information matrix when using the em algorithm

To quarantine, or not to quarantine: A theoretical framework for disease control via contact tracing

Assessment of simulated surveillance testing and quarantine in a sars-cov-2-vaccinated population of students on a university campus

Covid-19 superspreading suggests mitigation by social network modulation

The stochastic em algorithm: estimation and asymptotic results

Fast mcmc sampling for markov jump processes and extensions

Stochastic population processes: analysis, approximations, simulations

Hand washing with soap and water together with behavioural recommendations prevents infections in common work environment: an open cluster-randomized trial

Global transmission network of sars-cov-2: From outbreak to pandemic

Modelling strong control measures for epidemic propagation with networks-a covid-19 case study

Household sars-cov-2 transmission and children: a network prospective study

Hand hygiene performance and beliefs among public university employees

Outcomes of a pilot hand hygiene randomized cluster trial to reduce communicable infections among us office-based employees

Computational tools for assessing gene therapy under branching process models of mutation

The healthy workplace project: results of a hygiene-based approach to employee wellness

Scalable bayesian inference for coupled hidden markov and semi-markov models

Likelihood-based inference for discretely observed birth-death-shift processes, with applications to evolution of mobile genetic elements

Efficient transition probability computation for continuoustime branching processes via compressed sensing

Cancel plans with friends, other students, or family members

Avoid busy public places-e.g., shopping areas, movie theaters, restaurants

Cancel inter-state or international travel plans

Avoid public transportation, including University buses

Stay in your residence hall room if you were feeling sick

Stay in your residence hall room to avoid contact with sick people

Reduce contact with people outside of your residence

Wear a face mask when out in public

Be absent from class

Be absent from work

Assist sick neighbors or friends by bringing them food or supplies

Survey questions used to derive individual characteristic "prevention" The following prompt is copied verbatim from the original eX-FLU survey: Do you believe that the following practices reduce your risk of catching flu?

Reducing the number of people you meet over a day

Avoiding public transportation, including university buses

Cleaning or disinfecting things you might touch

Washing your hands regularly with soap and water

Wearing a face mask when out in public

Avoiding hospitals or doctors' offices

Cancel plans with friends, other students, or family members

Avoid busy public places-e.g., shopping areas, movie theaters, restaurants; 3. Cancel inter-state or international travel plans

Stock up on food and/or other necessities

Avoid public transportation, including University buses

Stay in your residence hall room if you were feeling sick

Stay in your residence hall room to avoid contact with sick people

Reduce contact with people outside of your residence

Wear a face mask when out in public; 10. Be absent from class

Be absent from work

Assist sick neighbors or friends by bringing them food or supplies

Survey questions used to derive individual characteristic "prevention". The following prompt is copied verbatim from the original eX-FLU survey: Do you believe that the following practices reduce your risk of catching flu?

Avoiding public transportation, including university buses

Cleaning or disinfecting things you might touch

Washing your hands regularly with soap and water

Wearing a face mask when out in public

Avoiding hospitals or doctors' offices