key: cord-0972744-zl01i278 authors: Nourinejad, Mehdi; Berman, Oded; Larson, Richard C. title: Placing sensors in sewer networks: A system to pinpoint new cases of coronavirus date: 2021-04-08 journal: PLoS One DOI: 10.1371/journal.pone.0248893 sha: 11abd328163ba297211d1cbf2a3e674f87fc4021 doc_id: 972744 cord_uid: zl01i278 We consider a proposed system that would place sensors in a number of wastewater manholes in a community in order to detect genetic remnants of SARS-Cov-2 found in the excreted stool of infected persons. These sensors would continually monitor the manhole’s wastewater, and whenever virus remnants are detected, transmit an alert signal. In a recent paper, we described two new algorithms, each sequentially opening and testing successive manholes for genetic remnants, each algorithm homing in on a neighborhood where the infected person or persons are located. This paper extends that work in six important ways: (1) we introduce the concept of in-manhole sensors, as these sensors will reduce the number of manholes requiring on-site testing; (2) we present a realistic tree network depicting the topology of the sewer pipeline network; (3) for simulations, we present a method to create random tree networks exhibiting key attributes of a given community; (4) using the simulations, we empirically demonstrate that the mean and median number of manholes to be opened in a search follows a well-known logarithmic function; (5) we develop procedures for determining the number of sensors to deploy; (6) we formulate the sensor location problem as an integer nonlinear optimization and develop heuristics to solve it. Our sensor-manhole system, to be implemented, would require at least three additional steps in R&D: (a) an accurate, inexpensive and fast SARS-Cov-2 genetic-remnants test that can be done at the manhole; (b) design, test and manufacture of the sensors; (c) in-the-field testing and fine tuning of an implemented system. 
The Coronavirus disease (COVID-19, caused by the virus SARS-CoV-2) is highly infectious and dangerous. The best disease control is infection avoidance: having all members of society wear masks, do social distancing, wash hands frequently and undertake other beneficial behavioral and hygienic steps. Despite controls, some people will become infected. A newly infected person may show no symptoms and not even be aware of his/her infection for up to one or even two weeks; during this initial period, the newly infected individual may become highly contagious. Such asymptomatic infectiousness is one of the confounding properties of the Any municipal sewage system is a network of sewage-transporting pipes leading from all the possible originating sources in the community to the final destination, the WTP. This network has no cycles. It is a tree network whose topology is known and that we exploit in our work. We describe this network in detail in Section 2.2. Our dynamic algorithmic work focuses on manual sampling and testing the sewage from manholes, each a "node" in the sewer system tree network. A key observation is this: If the sewage from a given manhole tests positive for SARS-CoV-2, then there are one or more infected individuals upstream from that manhole. If that sewage tests negative, then no individual upstream from that manhole is infected. This is assuming perfect testing, an assumption we use throughout. We start with a community that is known to be devoid of Coronavirus. While there may have been infections in the past, "today"-via recent testing-there are no known infections. Then one day at the WTP, a light flashes red: Remnants of Coronavirus are found today in the community's sewage. Chances are that the person infected is not even aware of it, due to the typical asymptomatic one-week delay from the infection event until the onset of symptoms. Yet the individual, even before symptoms, soon may become highly infectious, endangering all around him/her. 
Just as with student dormitory testing, the challenge is to identify the infected individual as soon as possible and move him/her into supportive isolation, thereby avoiding infection of others. We then follow up with contact tracing for all of those who were in contact with the infected individual. But suppose the community we are speaking of has 35,000 residents! We cannot test them all. We need to narrow down greatly the number of possible persons who could be infected into a small sub-population, so that it is feasible to test each person in the sub-population. We do that by sequentially testing manholes for Coronavirus. Suppose we pick as the first manhole to test one where the average sewage flow in the main pipe is about one half of its average maximum flow, the maximum occurring at the entrance to the WTP. If that selected manhole tests negative, then no one upstream from this 50%-flow point (usually about 50% of residents) is infected, but someone downstream is! If that manhole tests positive, then someone upstream is infected and, assuming only one person in the entire community is infected, we continue our search upstream. This process continues, each time selecting an unopened manhole that is near the "upstream-downstream" 50%-flow point of the remaining suspect manholes, and each time finding the direction of the source of the infection. Because about half of the remaining flow is discarded each time, the search space is approximately cut in half at each iteration. This means that for a community with 128 manholes, each with equal flows, we would need only about seven manhole openings and tests, since $128 = 2^7$. Our work is more general than this simple description, as we allow for spatial heterogeneity in the sources for flows. The algorithm terminates when we have identified the "Source manhole" and know with certainty that the virus is coming from nearby sources feeding directly into this manhole.
The Source manhole is the first in the system to receive sewage from the residence of Newbie. The suspect area is now the immediate sewage "catchment zone" of the Source manhole, usually part of a city block with four to eight homes feeding the sewer system into this most-upstream positive-tested manhole. If we are successful, we have reduced our population to test from 35,000 to the number of people in those homes. The U.S. Census Bureau reports that the average U.S. suburban home has 2.7 occupants [10]. For example, if we have eight homes to test, then we have only about 22 people to test. After testing and identification of the infected person(s) (who will immediately be isolated and treated), it is likely that all other individuals in the residence of the infected person(s) would be quarantined, and contact tracing would commence. We reduce the size of the problem in a community of tens of thousands to the size of a university dormitory problem, and a small dormitory at that! We have just described the basis for the "Tributary Search Algorithm" of our previous PLOS ONE paper, referred to there as "Algorithm 1." The earlier paper also had an "Algorithm 2," used to home in on an infection 'hot spot' in a community that has a significant number of people infected. Some sewer systems are too large to complete all manual manhole tests in one day. According to public works experts with whom we have consulted, assuming that fast on-scene tests are available, each test would require about one hour, start to finish. That means that eight manhole tests in one day is about the maximum. Since we seek speed in identifying the infected individual(s), we want to assure that the testing can be done in one day. In this new paper, we do that by strategically placing semi-permanent sensors in selected manholes, sensors that would in real time relay any "Red Light" reporting coronavirus infection in sewage passing through that manhole.
From the point of view of our sewage testing for COVID-19, each such implanted sensor acts as a local WTP. This second paper addresses that problem: How many sensors do we need so that all manhole tests could be done in one working day? And where do we place such sensors to have maximal beneficial effect? The other reason for a second paper is the need for system realism. If our methods are to be used in practice, our model of the sewer network needs to be much more realistic than the initial prototype described in our earlier paper. The idea of in-place sensors to detect contaminants has a rich history, both in sewage systems and in municipal water distribution systems. Beyond water and sewage systems, the field of Operations Research has a deep research history focused on determining the optimal number and locations of "flow-intercept" facilities such as sensors on networks [11, 12] . Starting with water systems, we find that in-system sensor placement has a literature going back decades. Since water is an essential fluid for sustaining life, the focus of the sensor system is on discovering quickly any unwanted contaminant intrusions into the system. Since water flows at finite speeds, placing sensors strategically within the system reduces time until detection. The mathematical analysis is often made difficult by the need to model the nonlinear system hydraulics within the water system graph. This complication results in need to expand beyond analytical models and algorithms, and to include complex simulations. An extensive and comprehensive recent overview of methods is given in Oluwaseye et al. [13] . The authors describe the approaches used and problems encountered by hundreds of authors, over a span of 50 years. The analytical methods include virtually all the same methods used by operations researchers when they study network-based systems. 
These include optimization algorithms, both exact and heuristic, deterministic and stochastic simulations, graph theory and more. Because of their liquid environment, the great majority of authors also must contend with modeling system hydraulics, and as mentioned above, this hugely complicates analysis and makes difficult the solution to even moderate-sized problems. An important paper not cited by Oluwaseye et al. [13] is one by Giudicianni et al. [14] . In this paper, the authors simplify the problem by not requiring any hydraulic modeling of the water distribution system, focusing only on the topology of the distribution network, mathematically a graph. They look for neighborhoods within the graph for "centrality," surrogates for so-called optimal solutions. Their graph-theoretic intuitive approach has attributes in common with ours and with many operations research papers dealing with locations on graphs. Despite the myriad papers written on the subject of sensors in water distribution networks, significant analysis problems remain for most researchers: excessive computation time for large water-distribution networks, as the problem is described mathematically as NP hard [15] ; no agreement on the best objective function; convergence only to an approximately optimal solution; and lack of significant reported impact in the field. But the field of sensors in water distribution systems is projected to grow significantly in the years ahead, perhaps to over $2 billion by 2030, when added with use of sensors in wastewater networks [16] . Because of increased need, sensor placement in water systems remains a very active research topic, and we expect to see significant advances both in theory and application in the years ahead. 
Switching to sewage system applications, the historical problem is not life-threatening pollution as found with water systems, but rather continual monitoring of system flows and content to assure uninterrupted safe operation of wastewater treatment plants. In combined systems that receive storm drainage as well as usual sewage, the volume of the extra input could exceed the capacity of the wastewater treatment plant, resulting in raw effluents being spilled directly into public waters. In-system sensors can detect differences in usual flow composition and alert authorities to possible downstream complications. Sewage systems can also experience accidental spills as well as deliberate unauthorized discharges (e.g., industrial effluent), and these may also affect negatively the operation of the wastewater treatment plant. Sensors are used to detect such intrusions. Finally, as will be discussed later in this paper, the most recent intense efforts directed towards the monitoring of sewage flows deals with detecting remnants of Coronavirus in the population served by the sewage system. Regarding use of mathematical modeling to assist planners in sensor placement, the recent paper by Banik et al. [17] is representative of the class of papers using a mathematical modeling and optimization approach. The focus is to place sensors in the system to detect unwanted and possibly dangerous liquid intrusions of various types. The authors seek to locate a number of sensors within the sewage system network to achieve a combination of objectives, the main one being "time until detection." The decision variables are N, the number of sensors, and their locations. After developing their heuristic algorithmic approaches, they illustratively apply their model and algorithms to a small town in Italy. No actual implementations are reported. And the resolution of the location of the unwanted intrusion is limited to the number and placements of the sensors. 
There is no additional locational step after identifying the location of the first sensor to flag a new intrusion. In a second paper, B.K. Banik et al. [18] treat various modifications to the above problem, again applying simulation-algorithmic methods to that small town in Italy. The authors admit that computation time can become enormous for all but the smallest of problems. We should also mention a third paper by Banik et al. [19]. There are many additional sensor-focused papers cited in the three publications above, with heuristic problem-solving approaches similar to those described here. Finally, we need to conclude our brief overview with the fact that placing semi-permanent sensors in sewage systems is fraught with practical difficulties [20]. As these considerations may affect the success of implementing our proposed sensor placement methodology, we discuss these issues towards the end of the paper. We now describe our new modeling of the tree network that depicts the connected set of wastewater underground pipes in any community. Fig 1 depicts a portion of the map of the sewer system of the Town of Belmont, Massachusetts [21]. The entire system, for a town of 26,000 residents covering 4.7 square miles, has 77 miles of sewer pipes and about 2,030 manholes. The green line segments in the figure depict underground sewer pipes of the network. The green circles located at various points on the green sewer lines depict manholes, the eventual source of data for our algorithms. The adjacent town of Lexington, Massachusetts (population 34,000, 16.5 square miles) has 4,924 manholes distributed over a 171-mile sewer pipeline network [22]. A typical American suburban town may have 50 to several hundred miles of sewer pipes, all leading to the local WTP, and between 200 and 5,000 manholes. The number of sewer manholes nationwide is about 12 million, averaging approximately 300 feet apart [23].
The sewage tree network has at least three types of pipes, each with an ascending diameter for increasing flow conditions. The three pipe types are branch sewer, main sewer and trunk sewer. The branch sewer, serving a limited number of buildings in a small geographic area, collects sewage from buildings' sewers and conveys it to a main or trunk sewer. A main sewer collects sewage from two or more branch sewers acting as tributaries. The trunk sewer conveys sewage from many main and branch tributary sewers over large areas to the WTP. The map shown in Fig 1 is typical of the sewer system maps we would see for most communities in the U.S. We use the figure to guide us on modeling the tree network of the sewer system. Our modeling involves two highly similar yet different networks: (1) the network of a community's streets and (2) the sewage pipeline tree network, all pipes assumed to be under streets. For the street network, nodes are the street intersections and links are the road segments connecting adjacent street intersections. For the sewage pipeline network, nodes are manholes and links are the sewage pipes connecting adjacent manholes. Since the under-streets sewage pipeline network is a tree network and the road network is not, it should be clear that the total length of links in the sewage pipeline network is less than the total length of roads in the street network. Our modeled sewage system tree network has the following additional four attributes:
1. Every street intersection contains a manhole, a sewer network node.
2. There are two types of street intersections: the three-way or T-intersection and the four-way or Crossroads intersection. With the exception of sewage pipe "dead ends," we assume that the underground sewer pipes follow the street geometries above ground. Underground, the T-intersection has two sewer pipes flowing into a third pipe; the Crossroads intersection has three sewer pipes flowing into a fourth.
3. A dead-end sewer line, possibly but not necessarily under a dead-end street segment, will have a manhole at its termination point.
4. The majority of manholes are not at street intersections or street dead ends, but on the interiors of street segments. For simplicity, we will space them evenly along the interiors of streets, with mean distance between adjacent manholes equal to the community's empirical mean (typically 150 to 300 feet).
A study of the map in Fig 1 verifies that our descriptions above summarize the key network properties of the sewer pipeline system. As always in modeling, there will be some exceptions, such as an intersection with more than one manhole. We seek not 100% accuracy but sufficient detail to capture the key attributes of the system. The catchment zone of a manhole "X" is the small collection of homes, apartments, businesses and other sewage-producing facilities whose sewage encounters manhole X first. Each manhole has its own catchment zone. Over the community being modeled, catchment zones are mutually exclusive and collectively exhaustive. Since the majority of manholes are in the interior of city or town blocks, most catchment zones comprise a small number of contiguous structures in the interior of the block. But catchment zones can be more complex, as shown in Fig 2, with the (red) Source manhole located at a Crossroads intersection. The infected "Newbie" individual could be located in any of the three upstream directions from the Source manhole: North, West or South. In this instance, the catchment zone of the Source manhole is the union of the three upstream small neighborhoods adjacent to the Crossroads intersection. Similar considerations apply to T-intersections, and we treat them in an analogous manner. We wish to model parametrically the random tree structure of municipal sewer networks. There are two motivations for such models.
The first relates to a community that may be considering our approach to homing in on COVID-19 infections. Prior to a detailed analysis of the community's sewer system, local system planners may want to obtain provisional results indicating approximate benefits (and costs) of deploying one or more sensors within the sewer system. They would like to do such an analysis without having to devote considerable time and effort obtaining and loading into a computer database the detailed map of the community's sewer system. Of course, if they already had such a detailed digital map, the modeling step would not be necessary. The second motivation for modeling is our need to test heuristic algorithms for determining the required number of sensors and their optimal or near-optimal locations. There are countless ways to create random tree graphs. We design a new process employing key attributes of the true system map. We describe these attributes probabilistically, as an empirically estimated cumulative distribution function (CDF) for street segment lengths and as an empirically estimated probability mass function (PMF) for end-of-street topologies in the simulation. Our approach is to simulate a community multiple times, then analyze results for their decision implications. We assume that each run of the simulation has a targeted size of the resulting tree graph, where size is given as J, the desired number of manholes in the tree. Recall that manholes will be placed at each street (graph) intersection, at the end point of a sewer pipe dead end, and in the interior of each street segment, spaced evenly apart at a pre-specified distance, D. In practice, it may not be possible to terminate the tree growth precisely at the desired point J, but more likely at a few manholes beyond this point. In building out the tree graph, always moving upstream, away from the WTP, we create a FIFO (First-In, First-Out) ordered list O of new pipeline segments to be created. 
When the simulation commences, O contains one pipeline segment to be created, the one emanating from the WTP. The list O builds and declines over the course of the simulation. If it becomes empty during the simulation and the desired number of manholes J has not been obtained, that simulation run is discarded and a new one is run to replace it. More formally, we seek:
• The CDF $F_L(x)$ of the lengths of above-ground street segments, where $F_L(x) \equiv$ fraction of street segments less than or equal to x in length. The distance units are in feet.
• The PMF $P_0(n)$ of the end-of-sewer-pipe segment tree growth outcomes, where $P_0(n) \equiv$ fraction of end-of-sewer-pipe tree growth outcomes that are outcome n. There are three possible outcomes: (1) this sewer-pipe segment is a dead end, so this branch of the pipeline tree terminates any further growth of the tree from this point; (2) this sewer-pipe segment terminates under a T street intersection, so two new sewer branches will emanate from this point; (3) this sewer-pipe segment terminates under a Crossroads street intersection, so three new sewer pipeline branches will emanate from this point.
Note: Manholes that are to be evenly spaced in the interior of a street segment are placed there as soon as the corresponding sewer-pipe segment outcome becomes known. That is, first build the pipe under the street from street intersection to adjacent street intersection, then populate the street with evenly spaced interior manholes. The simulation logic is shown in Box 1. Returning to Belmont, Massachusetts, an eyeball inspection of the sewer pipelines indicates 12 dead ends, 18 T-intersections and only one Crossroads intersection. This sums to 31 end-of-segment outcomes. Thus, we have $P_0(1) = 12/31 \approx 0.39$, $P_0(2) = 18/31 \approx 0.58$ and $P_0(3) = 1/31 \approx 0.03$.
We describe our generalized "Tributary Search Algorithm" that seeks to find the source manhole.
In contrast to our earlier paper [5], the generalization is required due to the more complex and realistic sewer network tree structure at street intersections, as described earlier. Here we are applying Algorithm 1, seeking a single infected person (Newbie) in an otherwise uninfected community. Algorithm 2, seeking hot spots, has a similar logic. Consider a directed tree network G(M, A) with node set M and link set A in which the nodes are manholes, and the links are the sewage pipelines connecting the |M| manholes. All sewage flows terminate at a single WTP node where sewage is tested and treated. Each manhole $i \in M$ has an associated Bayesian probability $\rho_i$ of being the source manhole, i.e., the closest downstream manhole to Newbie, i = 1, 2, . . ., |M|. In practice, the Bayesian probabilities need to be sensibly estimated, using such inputs as the number of bedrooms in the various residences and/or neighborhood demographic information.

Box 1. Simulation logic.
2. End-of-street growth outcome: Determine the tree growth outcome at the end of the just-created street segment by Monte Carlo sampling from $P_0(n)$. If n = 1, we terminate growth of the tree from this street segment end point and we place a manhole at the dead end; if n = 2 (a T-intersection), we commit to adding two new branches to the end of the current one and we place a manhole at the intersection; if n = 3 (a Crossroads intersection), we commit to adding three new branches to the end of the current one, and we place a manhole at the intersection. Update the sum S of manholes in the tree. The new branches to add are placed at the end of the set O.
3. If $S \ge J$, STOP, simulation complete; else, Continue.
4. Simulation to continue? If we found a dead end above and there are no more new segments to create from O, Terminate this simulation and start over. Else Continue, back to 1.
5. Add next street segment. Select the first entry from O and move to Step 1. Subtract this entry from O.
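The tree-growth procedure of Box 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the segment-creation step is read from the surrounding text (build the pipe, then place evenly spaced interior manholes); `lengths` is a hypothetical list of observed street-segment lengths standing in for sampling from $F_L$; the rule for the number of interior manholes is an assumed rounding convention; and node 0 represents the WTP and is not counted as a manhole. All function and variable names are illustrative.

```python
import random
from collections import deque

BRANCHES = {1: 0, 2: 2, 3: 3}  # outcome n -> number of new upstream segments


def grow_sewer_tree(J, lengths, pmf, D=200.0, rng=None, max_restarts=1000):
    """Grow a random sewer tree with at least J manholes.

    Returns a dict mapping each node to its downstream neighbor.
    Node 0 is the WTP (not counted as a manhole) and maps to None.
    """
    rng = rng or random.Random()
    for _ in range(max_restarts):
        parent = {0: None}
        pending = deque([0])  # FIFO list O: downstream end of each segment to build
        while pending and len(parent) - 1 < J:
            node = pending.popleft()
            length = rng.choice(lengths)              # stand-in for sampling F_L
            interior = max(round(length / D) - 1, 0)  # assumed spacing rule
            for _step in range(interior + 1):         # interior manholes, then end manhole
                new = len(parent)
                parent[new] = node
                node = new
            outcome = rng.choices([1, 2, 3], weights=pmf)[0]  # sample from P_0(n)
            pending.extend([node] * BRANCHES[outcome])
        if len(parent) - 1 >= J:
            return parent
        # O emptied before reaching J manholes: discard this run and start over
    raise RuntimeError("no simulated tree reached the target size")


belmont_pmf = (12 / 31, 18 / 31, 1 / 31)  # dead end, T, Crossroads
tree = grow_sewer_tree(64, [250, 400, 650], belmont_pmf, rng=random.Random(7))
print(len(tree) - 1, "manholes")
```

As in the text, growth usually stops a few manholes beyond J, because a whole street segment is populated before the stopping test is applied. The segment lengths 250, 400 and 650 feet are made-up inputs for the example.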
Whenever the sewage flow of any manhole is sampled, we will know if Newbie is located upstream or downstream of the sampled manhole. If the test is positive, the infection originated upstream, and if the test is negative, the infection originated downstream. The algorithm first computes Bayesian probability flows at each manhole i, reflecting the summed Bayesian probability that the infection is upstream of manhole i. Let δ ij be a binary indicator equal to one if node j is upstream of node i, and zero otherwise. The Bayesian probability flow entering manhole i is ∑ j δ ij ρ j . We seek first to sample that manhole with a Bayesian flow probability closest to 1/2. If the test is negative, we discard the sub-graph upstream of the tested manhole, and if the test is positive, we discard the downstream sub-graph. The discarded sub-graph, upstream or downstream, represents about 50% of the original Bayesian probabilities. We then renormalize the Bayesian probabilities and their flows in the surviving sub-graph. In that sub-graph, we again look for a node to test that has a (revised) Bayesian flow probability closest to 1/2. The algorithm continues in this exponentially convergent way, each time discarding a sub-graph having about 50% of the (updated) Bayesian probabilities. As described earlier, the source manhole is the most upstream manhole exhibiting remnants of coronavirus. The algorithm terminates when there is a single manhole left with an updated Bayesian probability of 1. This manhole is the Source manhole. The algorithm generalization involves manholes at intersections, which will have two or possibly three immediately upstream manholes, one on each of the upstream intersecting streets. For an intersection manhole to be the source manhole, it must test positive and each of the two (or three) immediately upstream manholes must test negative, being clear of coronavirus remnants. We now assess the performance of the Tributary Search Algorithm. 
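The search logic described above can be sketched in a few lines. This is an illustrative reading, with several assumptions: the tree is given as a hypothetical `parent` dictionary (each node mapped to its downstream neighbor, the WTP mapped to `None`); a manhole's own catchment zone is counted in the flow passing through it; tests are perfect; and a manhole whose test could not change the suspect set is never sampled. The names `upstream_sets` and `tributary_search` are not from the paper, and the sample counts of this sketch need not match those reported by the authors.

```python
def upstream_sets(parent):
    """For each node, the set containing the node and everything upstream of it."""
    sub = {v: {v} for v in parent}
    for v in parent:
        a = parent[v]
        while a is not None:  # walk downstream, registering v with each ancestor
            sub[a].add(v)
            a = parent[a]
    return sub


def tributary_search(parent, rho, source):
    """Locate the source manhole; returns (manhole found, number of samples)."""
    sub = upstream_sets(parent)
    candidates = set(rho)  # manholes still under suspicion
    samples = 0
    while len(candidates) > 1:
        total = sum(rho[j] for j in candidates)  # renormalizing constant
        best, best_gap = None, None
        for i in sorted(candidates):
            inside = sub[i] & candidates
            if len(inside) == len(candidates):
                continue  # test would certainly be positive: no information
            flow = sum(rho[j] for j in inside) / total
            gap = abs(flow - 0.5)
            if best is None or gap < best_gap:
                best, best_gap = i, gap
        samples += 1
        if source in sub[best]:       # positive test: keep the upstream sub-graph
            candidates &= sub[best]
        else:                         # negative test: discard the upstream sub-graph
            candidates -= sub[best]
    return candidates.pop(), samples


# Straight-line network: WTP (node 0) followed by manholes 1..8, equal probabilities.
line = {0: None, **{i: i - 1 for i in range(1, 9)}}
rho = {i: 1 / 8 for i in range(1, 9)}
print(tributary_search(line, rho, source=6))  # (6, 3)
```

On the eight-manhole line with equal probabilities, every source is found in three samples, which agrees with the binary-search count $\log_2 8 = 3$.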
We utilize Monte Carlo simulation with random trees of various sizes, with respective numbers of manholes equal to powers of two: 16, 32, 64, 128, 256, and 512. We use the street segment CDF for Belmont, Massachusetts in Fig 3 and consider an average manhole spacing of D = 200 feet. We consider a branching probability of $P_0(1) = 0.61$ for dead ends, $P_0(2) = 0.28$ for T-intersections and $P_0(3) = 0.11$ for Crossroads intersections. To avoid generating trees that do not reach the required number of |M| manholes, we outlaw dead ends until there are at least |M|/4 manholes in the tree generated thus far. To generate the Bayesian probabilities, we assign a random number, uniformly and independently generated, to each catchment zone and then normalize their sum to one. In this process, we treat Crossroads intersections as having three catchment zones and T-intersections as having two catchment zones. We choose the source manhole from a Monte Carlo sampling of the Bayesian probabilities. We then apply the (Generalized) Tributary Search Algorithm and count the number of manhole samples required to find the source manhole. Fig 4 also shows the minimum and maximum number of samples. As an example, for 64 manholes, the minimum and maximum number of samples are 5 and 8, respectively. The red boxes in Fig 4 represent the medians of the number of samples. We see that all medians are exactly equal to $\log_2 |M|$. The 50% cut at each iteration of the Tributary Search Algorithm seems to suggest an exponentially convergent process. According to Fig 4, the average number of samples of our Tributary Search Algorithm is similar to the number of iterations required of the binary search algorithm, which is $\log_2 |M|$ for the case of equal Bayesian probabilities in a straight-line network with |M| consecutive manholes [25]. This is an important finding because it allows us to use a closed-form expression of the number of samples for a given network with |M| manholes.
This expression can be used to find the required number of sensors in the network if one wants to keep the number of samples below a threshold and ultimately enables formulating the sensor location problem as a mathematical model for which we can develop solution algorithms. It also allows analyzing the structural properties of the sensor location problem including robustness to perturbations in sensor locations. Moving forward, for notational convenience, we omit the number (base) 2 in $\log_2(\cdot)$. As mentioned earlier, in Larson et al. [5] we provide two algorithms: one for finding the single source of contamination and one for finding the hot spot (the neighborhood with the highest level of contamination). It was shown in Larson et al. [5] that the second algorithm (finding the hot spot) is a simple extension of the first and the approximation of
The two objectives of our overall problem are to find the appropriate number of sensors and their respective locations in the wastewater sewer network. Avoiding further spreading of the virus requires finding the source manhole quickly, in as few manhole samples as possible, preferably in the course of one day. The samples cannot be taken in parallel because the result of one sample guides the choice of the next manhole to sample; the inherent dependencies between consecutive samples elongate the process. We will require that a maximum expected value of T samples is allowed for avoiding unacceptable delays. Usually, if $T \le 8$, convergence to the source manhole can occur in one working day. While it is intuitive that more sensors will reduce the number of manhole samples, the explicit relationship between the number of sensors and the number of manhole openings requires some analysis. We first introduce the concept of an entry set, which is fundamental to the description of the problem.
A manhole is included in the entry set of a sensor if that sensor is the first to detect the presence of virus originating from that manhole. Analogous to catchment zones, the entry sets are mutually exclusive and collectively exhaustive. The WTP, always equipped with virus detection technology, has its own entry set. Thus, if our algorithms install S sensors, there are a total of S+1 entry sets in the system. We now develop a "back-of-the-envelope" approximation for the performance of the system as a function of the number of sensors deployed. Suppose S sensors are located in a wastewater network with |M| manholes, each with identical Bayesian probability, 1/|M|. Let Q(S) be the expected number of manhole samples required to find the source manhole. Suppose an equal number of manholes |M|/(S+1) are allocated to each entry set. Therefore, the expected number of samples can be written as

$$Q(S) = \log\frac{|M|}{S+1}. \tag{1}$$

Recalling that a maximum of T samples is allowed, we require $\log\frac{|M|}{S+1} \le T$. System planners would prefer to have the smallest possible number of sensors, $S^*$, so that now we require $\left\lceil \log\frac{|M|}{S^*+1} \right\rceil = T$. From this, we have a fundamental relationship,

$$S^* = \left\lceil \frac{|M|}{2^T} - 1 \right\rceil. \tag{2}$$

We see that $S^*$ increases linearly with the size of the network as measured by |M|, and decreases exponentially with the allowable number of samples T. Fig 5 shows this relationship. As an example, for 2,000 manholes and T = 8 manual samples, we need $S^* = \left\lceil \frac{2000}{2^8} - 1 \right\rceil = 7$ sensors. There are sharp diminishing returns to scale in the impact of sensors. By adding S sensors to a system having no sensors, from (1) we reduce the expected number of manhole samples by $\Delta = Q(0) - Q(S) = \log|M| - \log\frac{|M|}{S+1} = \log(S+1)$. Accordingly, if we deploy one sensor, i.e., S = 1, then we reduce the expected number of manhole samples by one, as the sensor acts on behalf of an otherwise required first manhole sample.
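The back-of-the-envelope relationships above are easy to evaluate numerically. The sketch below assumes, as the text does, equal Bayesian probabilities and equal-sized entry sets, and uses base-2 logarithms. The function names are illustrative.

```python
import math


def expected_samples(num_manholes, num_sensors):
    """Q(S): expected manhole samples with S sensors and equal entry sets."""
    return math.log2(num_manholes / (num_sensors + 1))


def min_sensors(num_manholes, max_samples):
    """S*: smallest number of sensors keeping Q(S) at or below T samples."""
    return max(math.ceil(num_manholes / 2 ** max_samples - 1), 0)


print(min_sensors(2000, 8))  # 7
print(expected_samples(2000, 7) <= 8 < expected_samples(2000, 6))  # True
```

For 2,000 manholes and T = 8, seven sensors give $Q(7) = \log_2 250 \approx 7.97$, while six sensors would leave $Q(6) \approx 8.16$, just above the one-day limit.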
If we plan to reduce the expected number of manhole samples again by one (by two in total), we would need two additional sensors, one located within the entry set of the first sensor and the other located in the entry set of the WTP. The number of sensors required to reduce the expected number of manhole samples by $\Delta$ is $S = 2^{\Delta} - 1$, an exponential function, demonstrating that the first few installed sensors have a much greater impact on the expected number of manhole samples than the later ones. We note that the number of sensors given in (2) is based on two strong assumptions. First, it assumes that all manholes have the same Bayesian probability, which is useful for obtaining an approximate number of sensors but may require fine-tuning when the Bayesian probabilities are not equal. Second, it assumes that the expected number of manhole samples can be used, ignoring the probability distribution.

We seek to find the locations of the $S$ sensors in the wastewater network that lead to the best (smallest) expected number of manhole samples. Let $x_i$ be a binary variable equal to one if a sensor is located at manhole $i$, and zero otherwise. Given a set of $S$ locations for the sensors, let $m_i$ be an integer decision variable representing the number of manholes contained in the entry set of the sensor located at node $i$. The following non-linear integer program finds the best set of sensor locations.

Given $S$ sensors to locate, using the $\log(|M|)$ approximation, the minimization of the objective function in (3) seeks the smallest expected number of manhole samples required to find the source manhole. Constraint (4) gives the number of manholes in the entry set of sensor $i$, $\forall i$. Constraint (5) gives $p_i$, the sum of the Bayesian probabilities of the manholes in the entry set of sensor $i$, $\forall i$. Constraint (6) ensures that $S$ sensors are located, and Constraint (7) identifies the decision variables. The mathematical model is a non-linear integer program.
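To make the objective in (3) concrete, the following Python sketch evaluates the sum of $\log_2(m_i)\,p_i$ over the entry sets of a given placement. It does not solve the integer program. The data layout (a dictionary from sensor label to the list of manholes in its entry set) and all names are assumptions made for illustration, and an empty entry set is taken to contribute nothing.

```python
import math


def placement_cost(entry_sets, prob):
    """Expected number of manhole samples for a fixed sensor placement.

    entry_sets: dict mapping a sensor label (including the WTP) to the
                list of manholes in its entry set.
    prob:       dict mapping each manhole to its Bayesian probability.
    """
    total = 0.0
    for manholes in entry_sets.values():
        m_i = len(manholes)
        if m_i == 0:
            continue  # assumption: an empty entry set adds nothing
        p_i = sum(prob[j] for j in manholes)
        total += math.log2(m_i) * p_i
    return total


# Hypothetical 8-manhole network with equal probabilities and one sensor.
prob = {j: 1 / 8 for j in range(1, 9)}
balanced = {"WTP": [1, 2, 3, 4], "s1": [5, 6, 7, 8]}
unbalanced = {"WTP": [1], "s1": [2, 3, 4, 5, 6, 7, 8]}

cost_balanced = placement_cost(balanced, prob)      # 2.0, i.e. log2(8 / 2)
cost_unbalanced = placement_cost(unbalanced, prob)  # larger than 2.0
```

With equal probabilities, the balanced split reproduces the value given by approximation (1), and the unbalanced split costs more.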
The non-linearity arises from the log function in the objective function and from the product of terms in both the objective function and Constraints (4) and (5). The problem cannot be solved efficiently using existing solvers because the formulation is a non-linear integer program with a huge number of decision variables: $2|M|$ integer and binary decision variables and $2|M|+1$ constraints (recall that $|M|$ is typically between 250 and 5000). We later show that the number of samples is insensitive to the exact location of the sensors. This justifies our use of a heuristic algorithm, explained in the next section. In the following heuristic algorithm, we set $0 \times \log 0 = 0$ in the objective function (3), implying that if a manhole $i$ does not have a sensor, the term $\log(m_i)\, p_i\, x_i$ does not appear in the objective function. This convention allows us to circumvent the computational complexities of dealing with $\log(0)$, which tends to negative infinity as its argument approaches zero from the right.

We develop a decomposition heuristic to solve the above non-linear integer program. Definition: A sensor $s_b$ is defined to be "immediately upstream" of sensor $s_a$ if wastewater from $s_b$ flows next into one of the manholes in the entry set of sensor $s_a$. Note that a sensor can have multiple immediately upstream sensors. Using the same convention, if $s_b$ is immediately upstream of $s_a$, then $s_a$ is immediately downstream of $s_b$. Note that each sensor has only a single immediately downstream sensor due to the tree structure of the network. When there is only one sensor in the network, its immediately downstream sensor is the WTP. The heuristic requires several additional definitions. Let $(s_a, s_b)$ be a paired set of two sensors such that $s_b$ is immediately upstream of $s_a$. For $S$ sensors there are always $S+1$ such combinations because the WTP also has its own entry set. Note that the WTP can only be a downstream sensor.
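As an illustration of the definitions of entry sets and immediately downstream sensors, the sketch below derives both from a tree stored as a map from each manhole to the next node downstream. The representation, the label `"WTP"` for the treatment plant, and the small network are hypothetical and chosen only for illustration.

```python
def entry_sets(downstream, sensors, wtp="WTP"):
    """Assign each manhole to the first sensor its wastewater reaches.

    downstream: dict manhole -> next node downstream (wtp for the last one).
    sensors:    set of manholes that hold a sensor.
    """
    sets = {s: [] for s in sensors}
    sets[wtp] = []  # the WTP always has its own entry set
    for manhole in downstream:
        node = manhole
        while node != wtp and node not in sensors:
            node = downstream[node]
        sets[node].append(manhole)
    return sets


def immediately_downstream(downstream, sensors, wtp="WTP"):
    """Map each sensor to its single immediately downstream sensor (or WTP)."""
    result = {}
    for s in sensors:
        node = downstream[s]
        while node != wtp and node not in sensors:
            node = downstream[node]
        result[s] = node
    return result


# Hypothetical network: manholes 3 and 4 drain into 2; 2 and 5 drain into 1.
downstream = {1: "WTP", 2: 1, 3: 2, 4: 2, 5: 1}
sets = entry_sets(downstream, {2})
# sets == {2: [2, 3, 4], "WTP": [1, 5]}
pairs = immediately_downstream(downstream, {2})
# pairs == {2: "WTP"}
```

Every manhole lands in exactly one entry set, which is the mutually exclusive and collectively exhaustive property noted earlier.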
Let $U_i$ be the set of manholes in the entry set of sensor $s_i$ and let $U_{ab} = U_a \cup U_b$ be the union of the sets of manholes in the entry sets of $s_a$ and an immediately upstream sensor $s_b$, as shown in Fig 6A and 6B. In the algorithm below we fix the location of $s_a$ and consider every potential location of $s_b$ in $U_{ab}$, as shown in Fig 6C. Note that $s_b$ always remains upstream of $s_a$. The algorithm for the sensor location problem includes the following steps:

3. Select the largest $Q_{ab}$. In the case of ties, choose arbitrarily.

In Step 1 of the algorithm, a good initial solution is to locate the sensors so that the entry sets hold approximately equal numbers of manholes. For instance, if there is one sensor and nine manholes, one entry set would contain four manholes and the other five. We show in Lemma 1 (in S1 Appendix) and Result 1 below that this solution is optimal if the Bayesian probabilities are all equal. In Steps 2 and 3 of the algorithm we select the largest $Q_{ab}$ not yet considered because this term has the highest potential influence on the objective function value. In Step 4, the algorithm seeks to reduce this term in the objective function by moving the upstream sensor to other possible locations in the union of the two respective entry sets. If a net reduction is found, sensor $s_b$ is moved to its new (better) location, and potentially all system entry sets are redefined. In that case, all current entries of the set $V$ are erased and $V$ is populated only with the new pair $(s_a, s_b)$. All other pairs, even if previously included in $V$, will need to be re-analyzed because of the entry-set reconfiguration caused by relocating $s_b$. If we cannot improve the solution by moving $s_b$ within the union set $U_{ab}$, then the sensor pair $(s_a, s_b)$ is added to the current set $V$.
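The relocation move of Step 4 can be sketched as an exhaustive search over $U_{ab}$. This is a minimal illustration and not the authors' implementation: the pair term is taken to be $Q_{ab} = \log_2(m_a)\,p_a + \log_2(m_b)\,p_b$, the form used in the worked example of Fig 7, the tree is stored as a hypothetical map from each manhole to the next node downstream, and ties are broken by the lowest manhole index.

```python
import math


def subtree_within(candidate, region, downstream):
    """Manholes of `region` whose wastewater reaches `candidate` inside it."""
    members = []
    for manhole in region:
        node = manhole
        while node in region and node != candidate:
            node = downstream.get(node)
        if node == candidate:
            members.append(manhole)
    return members


def pair_cost(set_a, set_b, prob):
    """Q_ab for two entry sets; an empty set contributes zero."""
    cost = 0.0
    for members in (set_a, set_b):
        if members:
            cost += math.log2(len(members)) * sum(prob[j] for j in members)
    return cost


def best_upstream_location(region, s_a, downstream, prob):
    """Try s_b at every manhole of U_ab except s_a; return (location, Q_ab)."""
    region = set(region)
    best = None
    for candidate in sorted(region - {s_a}):
        set_b = subtree_within(candidate, region, downstream)
        set_a = [j for j in region if j not in set_b]
        cost = pair_cost(set_a, set_b, prob)
        if best is None or cost < best[1]:
            best = (candidate, cost)
    return best


# Hypothetical junction-free line of 8 manholes with s_a at manhole 1.
downstream = {1: "WTP", **{k: k - 1 for k in range(2, 9)}}
prob = {k: 1 / 8 for k in range(1, 9)}
location, cost = best_upstream_location(range(1, 9), 1, downstream, prob)
# location == 5 and cost == 2.0: the two entry sets hold four manholes each
```

On this equal-probability line the search settles on the even split, in agreement with the initial solution recommended for Step 1.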
When all sensor pairs have been considered and the solution cannot be improved any further, we terminate this steepest-descent heuristic algorithm, as set $V$ will then include all pairwise sensor combinations.

Consider the initial solution explained above, where we locate sensors so that the entry sets are roughly equal in number of manholes.

Result 1. In a network with $S$ sensors, $|M|$ manholes, and equal Bayesian probabilities of $\rho = 1/|M|$, to minimize the number of samples, the sensors are located so as to have approximately equal numbers of manholes in their respective entry sets.

Result 1 is based on Lemma 1 in the S1 Appendix. In the proof of Lemma 1 given in the S1 Appendix we show that any perturbation to this solution would increase the expected number of manhole samples. In the S1 Appendix, we also present Lemma 2 and its proof, which covers the special case of a wastewater tree with no junctions, for which Step 4 of the algorithm can be accelerated to avoid checking every manhole in $U_{ab}$. Whenever the union set $U_{ab}$ does not include any junction, we can use Lemma 2 and replace Step 4 of the algorithm with the following alternative Step 4', in which the steepest-descent approach is used to find the best location of sensor $s_b$ in fewer algorithmic checks.

Step 4': First, locate $s_b$ in a manhole that has an equal number of $U_{ab}$ manholes upstream and downstream or, if the sum of upstream and downstream manholes is an odd number, locate $s_b$ so that the difference between the numbers of $U_{ab}$ upstream and downstream manholes is one. Check the two manholes adjacent to $s_b$ and relocate the sensor to the manhole with the lower $Q_{ab}$. Continue until the two manholes adjacent to $s_b$ both have a larger $Q_{ab}$.

Consider the simple example of Fig 7. We index manholes by numbers and sensors by letters. All manholes have a Bayesian probability of 0.1 except for manhole 4, which has a Bayesian probability of 0.3.
Without loss of generality, we assume in this example that the WTP ($s_w$) also has an entry set with a Bayesian probability of 0.1 (i.e., the WTP directly collects wastewater from nearby residences). There are two new sensors in the network, represented by $s_a$, located at node 1, and $s_b$, located at node 2, as depicted in Fig 7A. The entry set of the WTP located at $s_w$ contains only manhole 0, the entry set of sensor $s_a$ contains manhole 1, and the entry set of sensor $s_b$ contains manholes 2-7. There are two sensor combinations: $(s_w, s_a)$ and $(s_a, s_b)$. We calculate $Q_{wa}$ and $Q_{ab}$ associated with the two pairs in the objective function: $Q_{wa} = \log(1) \times 0.1 + \log(1) \times 0.1 = 0$ and $Q_{ab} = \log(1) \times 0.1 + \log(6) \times 0.8 = 2.06$. Because $Q_{wa}$