key: cord-0990917-1wdlf0ms authors: de Belleville, L.-M. H. title: Group Testing with Homophily to Curb Epidemics with Asymptomatic Carriers date: 2020-10-13 journal: nan DOI: 10.1101/2020.10.09.20210260 sha: c5a7bfa68f881438360c7930be545ae09bbd4c06 doc_id: 990917 cord_uid: 1wdlf0ms The global fight against COVID-19 is plagued by asymptomatic transmission and false negatives. Group testing is increasingly recognized as necessary to fight this epidemic. I examine the gains from considering heterogeneous interpersonal interactions (homophily), which induce potential contamination, when designing testing pools. Homophily can be identified ex ante at a scale commensurate with pool size, so that the risk of contamination is higher within a well-designed pool than with an outsider. This makes it possible to overcome the usual information-theoretic limits of group testing which rely on an implicit homogeneity assumption. More importantly, group testing with homophily detects asymptomatic carriers that would be missed even by exhaustive individual testing because of false negatives. Such a strategy should be implemented at least at a weekly frequency to fit the time profile of test positivity. It can be used either to avoid unnecessary lockdowns or to make lockdowns more efficient. The present study analyzes the potential gains from taking homophily into account when implementing pool testing to fight epidemics with asymptomatic carriers and false negatives. It shows that designing test pools that encompass potential contamination clusters improves the efficiency of tests significantly, and makes it possible, in combination with more advanced complementary exams, to identify carriers that would be missed even by exhaustive (and unfeasible) individual testing. Various strategies have been implemented to curb the COVID-19 epidemic. 
Trace and track may be efficient (Normile, 2020) but can be thwarted by asocial behaviors or religious beliefs (see patient 31 in South Korea), and tracking teams are overwhelmed when incidence is too high. Lockdowns and quarantine work (World Health Organization, 2020, Kupferschmidt and Cohen, 2020) but are costly (Gourinchas, 2020). The need for testing, which can substantially reduce the need for indiscriminate quarantines, was identified early in the epidemic (Piguillem and Shi, 2020). Massive and timely identification of asymptomatic disease carriers is crucial if human-to-human asymptomatic transmission happens. Clinical diagnosis based on symptoms is inefficient in that case, while Yelin et al. (2020) lament that focusing tests on acutely ill patients leaves potentially infectious carriers undiagnosed in the community 2. Testing of asymptomatic people is also useful if the disease has long-lasting consequences even without symptoms or if subsequent phases induce a higher fatality rate. Chan, Yuan, Kok et al. (2020) and find ground-glass opacities for the vast majority of COVID-19 asymptomatic patients; although further research may be needed on that point, this may signal potential sequelae even for asymptomatic patients. More or less stringent definitions of asymptomatic carriers exist. document the existence of both presymptomatic and truly asymptomatic carriers. In order to tackle the contamination induced by the former, one could consider implementing trace and track, at least if the presymptomatic contagious period is short. However, as noted already, tracking teams may soon be overwhelmed. Thus, massive and timely identification of presymptomatic carriers may be necessary. More generally, I follow Harpedanne (2020) and use "asymptomatic" transmission to cover the transmission of a disease by asymptomatic but also presymptomatic, subclinical, or only mildly sick patients. 
Post-symptomatic patients may also present viral load, but these patients cause fewer problems for disease transmission since they can be isolated easily. First, many studies document the existence of asymptomatic carriers. In a meta-analysis of 66 articles and pre-prints, Koh, Naing, Rozledzana et al. (2020) find an average asymptomatic proportion at diagnosis of 25.9%, including two thirds of presymptomatic and one third of truly asymptomatic carriers. Among other studies, document a low proportion of asymptomatic carriers (4%), probably due to sample selection issues (virologically confirmed COVID-19 patients in Shanghai Public Health Centre). Kimball et al. (2020) find 13% asymptomatic carriers and 43% presymptomatic; Mizumoto et al. (2020) find 18% asymptomatic carriers; Qiu et al. (2020) find 28%, Nishiura, Kobayashi, Suzuki et al. (2020) find 31%, and Day (2020) cites the China National Health Commission pointing to 78% asymptomatic carriers in new cases observed over the 24 hours to April 1, 2020. Second, human-to-human transmission of COVID-19/SARS-CoV-2 was documented early by Xu et al. (2020), Li, Guan et al. (2020), Chan, Yuan, Kok et al. (2020) and Phan et al. (2020). More specifically, biological and epidemiologic evidence for asymptomatic transmission is provided by Bai et al. (2020), Rothe et al. (2020), Zou et al. (2020), Santarpia et al. (2020) 3, while Wong, Aziz, Chaw, Mahamud, Griffith Ying-Ru et al. (2020) strengthen the evidence for both asymptomatic and presymptomatic transmission. Koh, Naing, Rozledzana et al. (2020) find that the risk of transmission is 2.55 times higher when the index case is symptomatic. Li, Pei et al. (2020) find that although the transmission rate of undocumented carriers is only 55% that of documented carriers, the former are responsible for 80% of contaminations, due to their high absolute numbers. Thus, massive testing of asymptomatic people has gained popularity during the epidemic (Allen, Block, Cohen et al., 2020). 
However, individual testing of asymptomatic people is hopeless. For instance, France (the sixth largest world economy, with a population of 67 million) had reached 1.19 million COVID-19 tests per week as of September 2020. Even in the unlikely case in which all these tests were dedicated to detecting asymptomatic carriers, each person would be tested less than once a year. Group testing (batching the samples of different people and implementing a single test on the pooled sample), first proposed by Dorfman (1943), makes asymptomatic testing much more efficient. According to Mutesa et al. (2020), when the prevalence of COVID-19 is 0.1 percent, Dorfman's design decreases 17-fold the number of tests required to identify asymptomatic COVID-19 carriers (0.06 test per person), while the new strategy suggested by Mutesa et al. decreases this number 55-fold (0.018 test per person). Still, the gains of group testing over individual testing are lower for higher prevalence, and there exist information-theoretic limits to the potential improvements allowed by group testing (see Section 3 below). I examine the benefits of taking a priori information on interpersonal relations into account when designing the pools used for group testing. Considering this information (which I label "homophily", see Section 2) makes it possible to push the information-theoretic limits mentioned above and further improve test efficiency. Also, group testing with homophily proves very efficient at tackling false negatives, a major deficiency of the usual COVID-19 RT-PCR tests based on nasopharyngeal swabs. Group testing is used to fight COVID-19 in China, India, Germany, the United States (Mallapaty, 2020) and Rwanda (Mutesa et al., 2020). In the United States, it is authorized for pools of up to four people. The specific literature on group testing and COVID-19 includes Gollier and Gossner (2020), Conger et al. (2020), Eberhardt, Breuckmann and Eberhardt (2020), Mutesa et al. 
(2020), Mallapaty (2020), Lohse et al. (2020) and Yelin et al. (2020). Group testing methods are either adaptive or non-adaptive. A group testing framework is adaptive if the results of a given round of tests influence the design of subsequent rounds. For instance, Dorfman (1943) proposes a two-step procedure in which groups are tested in a first round and, if a group is positive, individual tests are implemented in this group. Mutesa et al. (2020, Section IV and Appendices B and C) discuss the pros and cons of various adaptive and non-adaptive tests to fight COVID-19. Section 2 presents homophily and clustering as well as their links to contamination channels and contamination clusters; it shows that homophily can be identified ex ante, so that the strategy analyzed here, and especially in Section 5, is feasible. Then, Section 3 provides counterexamples to the limits computed by Chan et al. (2011) and Baldassini et al. (2013) and concludes that the homophily structure provides relevant information. Section 4 shows that homophily makes it possible to reduce the effects of the dilution induced by group testing. Finally, Section 5 considers false negatives induced by idiosyncratic noise: I find evidence that an adaptive strategy combining a first-step group test with homophily and a second step based on more advanced individual complementary exams can help identify asymptomatic carriers that could not be detected even by exhaustive individual tests. The appendix analyzes the efficiency of this strategy to identify carriers at reduced cost and minimize the risk of false negatives. Although many examples and references relate to the COVID-19 epidemic, most results are mathematical or logical in nature and may be useful more generally during epidemics in which massive and repeated asymptomatic testing is necessary. 
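The orders of magnitude quoted above for individual testing and for Dorfman's two-step procedure can be checked with a short computation. This is a minimal sketch, assuming independent infections, noiseless tests and no dilution; the function names and the search range for the pool size are illustrative choices, not taken from the cited papers.

```python
def dorfman_tests_per_person(prevalence: float, pool_size: int) -> float:
    """Expected number of tests per person in Dorfman's two-step design.

    Each pool costs one pooled test (1 / pool_size per person); if the pool
    is positive, every member is retested individually (one more test each).
    """
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + p_pool_positive


def best_dorfman_pool(prevalence: float, max_pool_size: int = 100):
    """Pool size (between 2 and max_pool_size) minimizing tests per person."""
    sizes = range(2, max_pool_size + 1)
    best = min(sizes, key=lambda n: dorfman_tests_per_person(prevalence, n))
    return best, dorfman_tests_per_person(prevalence, best)


# Individual testing in France: population / weekly test capacity gives the
# number of weeks needed to test everyone once.
weeks_between_tests = 67e6 / 1.19e6

# Dorfman design at a prevalence of 0.1 percent.
best_size, best_cost = best_dorfman_pool(0.001)

print(f"weeks needed to test everyone once: {weeks_between_tests:.1f}")
print(f"best pool size: {best_size}, tests per person: {best_cost:.4f}")
```

Under these assumptions the individual-testing interval exceeds 52 weeks, and the Dorfman cost lands close to the 0.06 test per person quoted in the text.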
In this Section, I present homophily and clustering, which are concepts used in social sciences, and clusters, which are used in epidemiology. I show that these concepts, which I summarize as "homophily", can be used ex ante to identify potential contamination structures in order to design testing pools accordingly, which is a necessary condition to implement the testing strategy analyzed in the rest of the present paper. Homophily was defined in 1954 by Lazarsfeld and Merton. It "refers to the fact that people are more prone to maintain relationships with people who are similar to themselves" (Jackson, 2008, p. 68). McPherson et al. (2003) document the prevalence of homophily in many social networks. Jackson and Lopez-Pintado (2013) analyze the effects of homophily on contagion. In particular, they show that starting from a small initial seed (a small number of infected people), homophily facilitates diffusion under rather limited conditions. Since homophily is prevalent in network analysis and relates to contagion, it makes sense to consider this phenomenon when designing testing strategies. Surprisingly enough, the literature on group testing has disregarded these aspects until now. A related econometric concept is clustering. Clustering refers to the nondeterministic correlation of outcomes between individuals that are somewhat related. Moulton (1986, 1990) introduced this idea and showed that failing to take it into account induces significant errors when estimating standard errors. Clustering has been popularized by Bertrand et al. (2004), who have shown that the standard errors of difference-in-difference estimates were not properly estimated when neglecting clustering. 
Clustering may be "multi-way" (Cameron, Gelbach and Miller, 2011), meaning for instance that an individual may be correlated with people working in the same firm on the one hand, with people living in the same village on the other hand, but also with people going to the same gym club, those having their children in the same school, etc., without these different clusters being nested. Nowadays, correcting for potential clustering is a sine qua non for scientific work in applied economics. Many results show that overlooking clustering would underestimate standard errors, which means that there exists a positive correlation between outcomes for individuals belonging to groups identified ex ante on rather simple criteria. This pattern is verified for a wide range of outcomes in many settings. In other words, various branches of social sciences converge on both the necessity and the possibility of taking heterogeneous interpersonal interactions into account when analyzing many mechanisms, and contagion especially. The medical literature confirms that contamination occurs through clusters. Han and Yang (2020) cite a Chinese-written article asserting that "In some cities, cases involving cluster transmission accounted for 50% to 80% of all confirmed cases of COVID-19." The strategy analyzed in the present study requires grouping COVID cases together in the same testing pool, or in a small number of pools. Pools may include a few dozen individuals, at most one hundred (see Section 4 on dilution). Thus, potential clusters of limited size must be identified ex ante. Madewell et al. (2020) report that "To better understand clustering within households, it would also be useful for researchers to report the number of infections by household in addition to the total number of infected individuals." Unfortunately, this is rarely done, so that the feasibility of the strategy analyzed here must be evaluated indirectly. 
This can be done for instance by looking at studies on small clusters with a high attack rate (ratio of contaminated people in a given group) or secondary attack rate (SAR is the number of people contaminated by an index case, divided by the number of people in contact with this index case). Koh, Naing, Rozledzana et al. (2020) provide a meta-analysis of 20 studies on secondary attack rate. The household is quite often the place where the SAR is highest (15.4% on average), and Qiu et al. (2020) find that out of 36 children infected in a Chinese city, 32 (89%) were infected through close contact with family members. [medRxiv preprint, doi: https://doi.org/10.1101/2020.10.09.20210260; this version posted October 13, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.] But high SARs have also been observed in a chalet (73.3%), at a choir (53.3%) and at a religious event; high SARs are sometimes observed for travel and for eating with an index case. Koh, Naing, Rozledzana et al. (2020) report other cases of clusters with very high attack rates: "a nursing home in Kings County, Washington (64%) […] a church in Arkansas (38%), a homeless shelter in Boston (36%), a fitness dance class (26.3%) and the Diamond Princess cruise ship in Japan (18.8%)." Park et al. (2020) analyze an outbreak in a Korean building: 94 of 97 cases worked on the same floor (11th). 79 cases worked in the same open space (attack rate of 52%). Many clusters have also been observed in slaughterhouses. Thus, most of these results show high concentrations of cases in groups of limited size that can be tested in one or a few groups. Is it possible to derive from these results ex ante general contamination patterns in small clusters?
Many studies (Li, Zhang, Lu et al., 2020; Madewell et al., 2020; Liu, Lian, Zhong et al., 2020) and the meta-analysis by Koh, Naing, Rozledzana et al. (2020) underline that longer and more intense exposure to infection sources increases the risk of infection. Crowded indoor environments with sustained close contact and conversations are a particularly high-risk setting (Nishiura, Oshitani, Kobayashi, et al., 2020). Interestingly, Park et al. (2020) find a high concentration of cases in open spaces but only one case in small offices. This is consistent with the theoretical analysis in Harpedanne de Belleville (2020), who shows an almost convex effect of the number of room users on contamination. Using these general patterns and theoretical results, it is possible to identify ex ante potential clusters, and to design pools that encompass these clusters. Sections 3 to 5 analyze the gains from this strategy. More generally, Harpedanne (2020) relates contamination channels and contamination probabilities between individuals who have interpersonal interactions. For instance, airborne and droplet contagion increases the probability of contagion between people sharing the same office, open space, corridor, etc. Contagion through fomites increases the probability of contagion between people successively using the same toilet, the same seat in a train coach, etc. Thus, like Harpedanne (2020), the present paper deals with curbing epidemics with asymptomatic contamination and heterogeneous social interactions inducing specific expected contamination patterns. For Harpedanne (2020), asymptomatic carriers are unidentified but organizational measures can affect interpersonal interactions; conversely, the present paper takes interpersonal interactions for granted and proposes to take them into account to better identify asymptomatic carriers. 
In Section 5, I analyze a two-step strategy in which the first step (pool test on nasopharyngeal swabs) draws on homophily to design test pools. From a policy perspective, many patterns of homophily and potential clusters are identified; they can be used to make testing more efficient and therefore reduce the need for unnecessary lockdowns (Piguillem and Shi, 2020). Still, households are documented as frequent clusters (high homophily inside households), and lockdowns may aggravate this fact. Thus, the two-step strategy can also be implemented to identify household contamination and make lockdowns more efficient and shorter. In this section, I show that if the homophily structure is known before implementing a pooled test, it contains information that can make the test more efficient. For that purpose, I show that if homophily is "strong" enough and can be properly identified when designing the pooled tests (more specifically, when designing the pools), it makes it possible to overcome the most recent and tight information-theoretic lower bounds on the efficiency of group testing. These limits have been identified by Chan et al. (2011) who, for the first time in the literature, define limits in terms of actual numbers and not only rate or capacity, and Baldassini et al. (2013), who follow the same path and provide a new and tighter lower bound. Unlike Sections 4 and 5, the present Section focuses on noiseless tests. Thus, a few definitions may be useful here. A group test is noiseless if a negative test outcome is guaranteed when all items in the testing pool are nondefective, and a positive outcome is guaranteed when at least one item in the pool is defective (Aldridge, Johnson and Scarlett, 2019). Otherwise, the test is noisy. Noisy tests are often examined under the assumptions of constant (Chan et al., 2011) or worst-case (Macula, 1997) noise. However, the results by Yelin et al. (2020) point instead to an increasing risk of false negatives when dilution increases, and the results by Yang, Yang, Shen et al. (2020), Wang, Xu, Gao et al. (2020) and Wang, Tan, Wang et al. (2020) point to patient-specific idiosyncratic noise. These two issues are analyzed in Sections 4 and 5, respectively, and I show that taking homophily into account brings specific gains. Conversely, the general-form noise models (such as the symmetric error model, see e.g. Chan et al., 2011, or the additive model, see e.g. Atia and Saligrama, 2012) are not relevant to analyze these forms of noise, so that there is no need to provide a general noisy-model analysis. I consider noiseless tests for two further reasons. First, this makes it clear that the benefits of taking homophily into account in group testing are not limited to noise-related issues. Second, it makes the comparison with the information-theoretic limit of Baldassini et al. (2013) easier. Baldassini et al. (2013, Section III) analyze a noiseless test in a population of size N, with K defectives (K is known for simplicity). They show that if the number of tests is limited to T, the probability P(suc) of correct identification of the set of defectives is bounded by: $$P(\mathrm{suc}) \le \frac{2^{T}}{\binom{N}{K}} \qquad (1)$$ Let N be 64 and K be 8. Using 6 tests only (T=6), one can cut the population into 8 groups of 8 people each and determine which group contains carriers if only one group contains carriers (think of the population of 64 as a 4x4x4 cube and cut the cube in half along each dimension, that is, implement 6 tests over 32 people each). 
According to (1), the probability of correct identification satisfies: $$P(\mathrm{suc}) \le \frac{2^{6}}{\binom{64}{8}} \approx 1.45 \times 10^{-8} \qquad (2)$$ Let us now introduce homophily. Homophily means that there exists high potential for contamination within each group, while the potential for intergroup contamination is low. Let us assume that only one individual has imported the disease into the population of 64: this is a reasonable assumption if the prevalence is low in the general population; this assumption may be verified with probability (1-ε1), and let ε2 be the probability that intergroup contamination has happened. Then with a probability higher than (1-ε1)(1-ε2), all 8 carriers are in the same group. For instance, if ε1=0.2 and ε2=0.5, we get: $$P(\mathrm{suc}) \ge (1-\varepsilon_1)(1-\varepsilon_2) = 0.8 \times 0.5 = 0.4 \qquad (3)$$ which of course contradicts (2). Strong homophily provides information that makes it possible to overcome information-theoretic limits based on the implicit assumption of absence of homophily. The counterexample just provided is extreme and not very useful in practice; the aim of this example is merely to illustrate that the usual information-theoretic limits rely on an implicit homogeneity assumption. By taking homophily into account, we relax this assumption. More realistic adaptive frameworks may provide a rather high probability of success with a limited number of tests. Think of testing the 8 groups independently in a first step and individually testing all people in the first two groups that turn positive in the first step. With decent homophily, this strategy would likely provide a very good probability of success while (1) would give a bound equal to 0.3 %. Actually, (1) would not apply since the strategies examined by Baldassini et al. (2013) are only non-adaptive, but since the authors point to the limited gain in efficiency brought by adaptive designs, it is likely that gains from homophily could be formally proven for adaptive designs. This is beyond the scope of the present paper, and (3) suffices to prove that homophily, if identified ex ante when designing a group test, may provide relevant information.
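The comparison between the counting bound and the homophily counterexample can be reproduced numerically. This is a minimal sketch, assuming that the bound of Baldassini et al. (2013) takes the form 2^T divided by the number of possible defective sets, which is consistent with the value of about 1.45 x 10^-8 quoted above; the function and variable names are illustrative.

```python
from math import comb


def counting_bound(n_items: int, n_defectives: int, n_tests: int) -> float:
    """Upper bound on the success probability of a noiseless group test:
    T binary outcomes can distinguish at most 2**T of the C(N, K) candidate
    defective sets (capped at 1, since it bounds a probability)."""
    return min(1.0, 2 ** n_tests / comb(n_items, n_defectives))


# Homogeneous case: N = 64, K = 8, T = 6.
bound_6_tests = counting_bound(64, 8, 6)

# Homophily case: with probability (1 - eps1) a single person imported the
# disease, and with probability (1 - eps2) no intergroup contamination
# happened, so all 8 carriers sit in one of the 8 groups.
eps1, eps2 = 0.2, 0.5
homophily_floor = (1 - eps1) * (1 - eps2)

print(f"bound without homophily: {bound_6_tests:.3e}")
print(f"success probability with homophily: at least {homophily_floor:.2f}")
```

The gap of several orders of magnitude between the two figures is what the counterexample relies on: the bound counts all defective sets as equally likely, whereas homophily concentrates the probability mass on a handful of them.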
Dilution is a crucial issue for group testing applied to disease detection. If a positive swab is pooled with many negative swabs, it may be difficult to identify traces of the virus. Dilution is dealt with already in the seminal paper on group testing by Dorfman (1943), who tackles syphilis detection among young drafted Americans. Dorfman finds that "diagnostic tests for syphilis are extremely sensitive and will show positive results for even great dilution of antigen". Still, dilution may be more of an issue for other diseases. Dilution has been examined in a group testing framework by Hwang (1976). Warasi et al. (2017) propose a parametric model to tackle dilution. A series of studies have examined dilution for RT-PCR tests targeting genes of SARS-CoV-2. Lohse et al. (2020) target the E gene (envelope gene) and S gene (spike gene) of SARS-CoV-2 and show that it is possible to identify positive samples correctly, even when diluted 30-fold (that is, one positive swab pooled with 29 negative swabs). Yelin et al. (2020) find that group testing can be implemented with up to 32 individuals per group with a false-negative rate of 10 % in the case of COVID-19, which they claim is low when compared to other sources of noise. They also suggest that implementing additional amplification cycles would make it possible to implement group testing with up to 64 individuals per group. Accordingly, Mutesa et al. (2020) show that using a Ct-value of 40 makes it possible to detect positive swabs diluted up to one hundred-fold. 
More precisely, they show that for tests targeting the N gene of SARS-CoV-2, the upper 95% bound is below 40 and for tests targeting the Orf1ab gene, the upper 90 % bound is below 40. Subsequently, they confirm these results with tests targeting the E and RdRp genes. If homophily is taken into account when designing the pools, it is rather likely that carriers will be concentrated in a few pools, and therefore that no single carrier will be isolated in a pool. If many carriers are concentrated in a pool, this counteracts the effects of dilution. For instance, if a pool with 32 swabs contains 2 positive swabs, the dilution is 1/16, which is rather limited. In the present paper, I do not analyze further the interactions between homophily and dilution. Indeed, the available literature points to a limited effect of dilution for COVID-19. Still, this issue remains open to further research. Idiosyncratic noise in tests can occur for many different reasons: contamination of the samples, error or insufficient training of the person in charge of collecting the swabs, etc. I focus here on a type of noise that has been extensively documented by the literature on COVID-19: the swabs used for tests may fail to contain viral load for many disease carriers. To solve this issue, a strategy based on group testing with homophily can identify more asymptomatic carriers than group testing alone, but also more than exhaustive individual testing. Many methods can be used to identify SARS-CoV-2 carriers: clinical diagnosis, chest radiograph and CT-scan, fibrobronchoscope brush biopsy, RT-PCR on bronchoalveolar lavage fluid, sputum, nasal swabs, pharyngeal swabs, feces, etc. By definition, clinical diagnosis does not work for asymptomatic carriers; chest radiograph and CT-scan are not available to implement massive identification of asymptomatic carriers; as underlined by Yang, Yang, Shen et al. 
(2020), collecting lower respiratory samples (bronchoalveolar lavage fluid, fibrobronchoscope brush biopsy) requires specific equipment and skilled operators, and can be painful; among upper respiratory samples, sputum is produced in only 28 % of COVID cases examined by Huang et al. (2020). Thus, only nasal swabs and pharyngeal swabs may be used for large-scale asymptomatic testing. Yang, Yang, Shen et al. (2020) analyze four types of specimens (bronchoalveolar lavage fluid, nasal swabs, pharyngeal swabs and sputum) from 213 confirmed COVID patients and find strong evidence of false negatives for individual tests based on nasal or pharyngeal swabs: the positive rate is only 50% to 73.3 % for nasal swabs; it is higher (72.1 % to 73.3 %) for the swabs collected over the first week after the onset of the disease. The rate is even lower for the pharyngeal swabs: 11 % to 61.3 %; once again higher for the swabs collected over the first week (60 % to 61.3 %). Wang, Xu, Gao et al. (2020) analyze eight types of samples (bronchoalveolar lavage fluid, fibrobronchoscope brush biopsy, sputum, nasal swabs, pharyngeal swabs, feces, blood and urine). They also find a high rate of false negatives for individual nasal swabs and even more for pharyngeal swabs. They also point to the very poor performance of rRT-PCR tests based on blood and urine. Wang, Tan, Wang et al. (2020) compare the detection performance of individual nasopharyngeal and oropharyngeal swabs for 535 patients. 
They also find better results for the former, and find that using both tests increases detection slightly over nasopharyngeal swabs alone, which confirms the existence of false negatives for both. Overall, nasal swabs, which are widely used for rRT-PCR identification of SARS-CoV-2 (Wang, Hu, Hu et al., 2020), display a significant rate of false negatives. Yang, Yang, Shen et al. (2020) underline cases in which all upper respiratory tests (or all upper respiratory tests over a given period) are negative for confirmed COVID-19 patients. This points to the fact that false negative results are related to specific individuals rather than mere technical errors. Group testing with homophily can be very beneficial here. Indeed, even if a carrier is "false negative" (meaning that there is no viral load in the sample collected for this individual), with homophily it is likely that other carriers belong to the same group/pool, so that the pooled test has more chances to turn positive 4. If α is the proportion of false negatives among carriers and false negative individuals are i.i.d. in the population of carriers, the risk of missing the identification of a carrier is α for individual tests and for tests with only one carrier in a pool, but it is $\alpha^{2}$ if there are 2 carriers in the pool, and $\alpha^{n}$ if there are n carriers. Since the literature shows that $0 < \alpha < 1$, we get $\alpha^{n} < \alpha^{n-1} < \dots < \alpha^{2} < \alpha$: more carriers in a group increase the probability of a correct (positive) result at the group level. The multistage testing strategies usually considered in the group testing literature are clearly not optimal in the presence of individual false negatives: the last stage usually implemented is individual testing in positive groups, which would pick only part of the true positives in each group. But if groups are properly defined by taking homophily into account, carriers are likely to be concentrated in the positive groups. 
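The chain of inequalities above can be made concrete with a few numbers. This is a minimal sketch, assuming i.i.d. false negatives among carriers and no dilution; the values of α used here are illustrative and the function name is hypothetical.

```python
def pool_miss_probability(alpha: float, n_carriers: int) -> float:
    """Probability that a pooled test is negative although the pool holds
    n_carriers carriers, each swab being a false negative independently
    with probability alpha."""
    return alpha ** n_carriers


# Illustrative false-negative rates and numbers of carriers per pool.
alphas = (0.25, 0.33, 0.5)
table = {
    alpha: [pool_miss_probability(alpha, n) for n in range(1, 6)]
    for alpha in alphas
}

for alpha, row in table.items():
    cells = "  ".join(f"{value:.4f}" for value in row)
    print(f"alpha={alpha:.2f}: {cells}")
```

Each row decreases geometrically from left to right, so concentrating even two or three carriers in one pool already cuts the pool-level miss probability well below the individual false-negative rate.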
Thus, rather than implementing the same tests as in the first step, it makes sense to isolate and take care of all people in the positive groups and to carry out advanced examinations on them; for COVID-19: clinical diagnosis, chest radiograph or CT-scan, fibrobronchoscope brush biopsy, and rRT-PCR on bronchoalveolar lavage fluid, sputum if available and feces (this choice of samples for rRT-PCR is based on Wang, Xu, Gao et al., 2020). In this multistage strategy, two "costs" depend on homophily. First, even if this strategy is more efficient at tackling false negatives and identifying carriers than existing testing strategies, some carriers are still missed. Second, if carriers are identified but are not concentrated, many pools must undergo the costly second-step process. Graph 1 analyzes the quantitative gains from homophily in the two-step strategy described above, for an absolute number of defectives ranging from 2 to 5. This covers a large range of different situations. For instance, two defectives in a population of two thousand correspond to a rate of 0.1%, while five in a population of 50 correspond to a rate of 10%. The size (and number) of pools do not affect the graphs, so that the graphs also cover a large range of pool sizes. From left to right, each graph provides statistics (described below) for increasing concentration of the defectives in a few testing pools. As in Section 3, K is the number of defectives. α is the probability that a test result is a false negative. According to Wang, Xu, Gao et al. (2020), RT-PCR on nasal swabs identifies 63% of carriers (72% to 74% for swabs collected over the first two weeks after onset for Yang, Yang, Shen et al., 2020). Thus, realistic figures for α range from 0.25 to 0.5, with an average value close to 0.33. I provide results for these three values. 
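The two costs discussed above can be explored by enumerating every way of spreading K defectives over pools. This is a minimal sketch under stated assumptions: false negatives are i.i.d. with probability α, dilution is neglected, and the second-step exams are assumed to identify every carrier in a pool that tested positive. The two quantities computed (expected missed carriers, expected number of pools sent to the second step) are illustrative choices and not necessarily the statistics plotted in Graph 1; all names are hypothetical.

```python
def partitions(k: int, max_part: int = None):
    """Yield the integer partitions of k as non-increasing tuples, from the
    most concentrated, (k,), to the most dispersed, (1, ..., 1)."""
    if max_part is None:
        max_part = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, max_part), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest


def two_step_costs(partition, alpha: float):
    """Return (expected missed carriers, expected pools sent to step two).

    A pool holding n carriers tests negative with probability alpha**n, in
    which case its n carriers are all missed; otherwise it is flagged and
    undergoes the second-step exams.
    """
    expected_missed = sum(n * alpha ** n for n in partition)
    expected_flagged = sum(1 - alpha ** n for n in partition)
    return expected_missed, expected_flagged


for k in range(2, 6):
    for alpha in (0.25, 0.33, 0.5):
        for part in partitions(k):
            missed, flagged = two_step_costs(part, alpha)
            print(f"K={k} alpha={alpha:.2f} pools={part} "
                  f"missed={missed:.3f} flagged={flagged:.3f}")
```

For comparison, exhaustive individual testing misses Kα carriers on average, which coincides with the fully dispersed case (1, ..., 1); every more concentrated allocation misses fewer carriers and flags fewer pools in expectation.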
For instance, in the upper graphs (2 defectives), the left case (1 1) corresponds to one defective in a pool and one in another, while the right case (2) corresponds to the two defectives in the same pool. [Footnote 4: For the sake of simplicity, I neglect dilution. This is valid under mild conditions, including a limited pool size.] Increasing concentration can be denoted by the operator