key: cord- -r ovkbd authors: liu, shuchang; ma, zheng feei; zhang, yutong; zhang, yingfei title: attitudes towards wildlife consumption inside and outside hubei province, china, in relation to the sars and covid- outbreaks date: - - journal: hum ecol interdiscip j doi: . /s - - - sha: doc_id: cord_uid: r ovkbd we designed a self-administered -item questionnaire to determine changes in attitudes towards wildlife consumption in chinese adults during the sars epidemic in – and on-going covid- pandemic that was first identified in december . a total of adults ( males and females) with a mean age of . ± . years participated, the majority ( . %) from hubei. the percentages of participants who had eaten wildlife significantly decreased from . % during sars to . % during covid- (p = . ). the most common reason participants provided for consuming wildlife was to try something novel ( . % during sars and . % during covid- ). more than half of participants (≥ . %) reported that they had stopped eating wildlife meat because most species of wildlife are legally protected. our study results indicate over the period between the sars epidemic to the outbreak of the covid- pandemic, attitudes towards the consumption of wildlife in china have changed significantly. in november , an epidemic of severe acute respiratory syndrome (sars) centred in foshan municipality, guangdong province, was identified, which peaked in february (evans et al. ) . early cases reported that patients positive for sars lived near animal markets, and nearly half of them were food practitioners who had contact with animal products. after years, in december , a novel coronavirus pneumonia outbreak was reported in wuhan, hubei province (fig. ) . the coronavirus disease outbreak was traced to the huanan seafood market, and most of the early diagnosed patients had been to the local fish and wildlife market before the outbreak ). 
the fish and wildlife market also sold live animals such as poultry, bats, marmots, hedgehogs, badgers, birds, and snakes (wu et al. ). since both outbreaks have been linked to wildlife markets (li and davey ; lu et al. ; wu et al. ), it is important to explore how attitudes towards eating wildlife in the general population changed before and after the two outbreaks. both the sars and the on-going covid- outbreaks have had extremely negative impacts worldwide. the world health organization (who) recorded > cases of sars and deaths worldwide (zhong et al. ). the covid- outbreak has also led to serious consequences including unprecedented levels of infection and deaths, decreased quality of life, and increased stress due to strict lockdowns and limits on social interactions (ma et al. ; zhang and ma a,b). in mainland china, the number of diagnosed patients from st january until nd february increased from cases to , cases within two weeks (dong et al. ). hubei province, and especially its capital city wuhan, has been significantly affected since wuhan was the epicentre of the covid- outbreak. the total number of covid- cases in hubei province had reached , as of march th. meanwhile, the total number of covid- cases in other chinese provinces had reached , (maier and brockmann ). the two epidemics began in areas whose populations have a preference for consuming wildlife (sun et al. ), which has been identified as the source of both outbreaks, and covid- has been reported as having a probable origin in bats (zhou et al. ). viruses usually need intermediate hosts to spread from bats to humans (sun et al. ), and some wildlife species such as pangolins are reported to act as intermediate hosts of severe acute respiratory syndrome-coronavirus (sars-cov- ). the virus may pass to humans when they consume wildlife meat, and may subsequently lead to the risk of human-to-human transmission (zhang et al. ).
however, published research related to the attitudes regarding the wildlife consumption during both the ongoing covid- pandemic and the sars outbreak of years ago is very limited, illustrating the general lack of sufficient scientific attention to the safety of and attitudes towards consuming wildlife worldwide (wei ) . therefore, our aim in this study was to determine changes in attitudes towards wildlife consumption in chinese adults in relation to the sars and covid- outbreaks with a particular focus on hubei province. this is because hubei province, especially its capital city wuhan, has been significantly hit by the covid- pandemic. our findings from this study have important implications for public health, especially relating to the current dietary habit of consuming wildlife meat in china and elsewhere, and provide a basis for future studies to develop more effective prevention and treatment strategies. we conducted a cross-sectional study between april and april by using convenience sampling. inclusion criteria included: non-pregnant individuals of chinese nationality aged ≥ years and currently living either in or outside hubei province, china, who were living in same province during both the sars and covid- outbreaks. no financial rewards were given to participants for completing the questionnaire. all participants provided informed consent prior to the study enrolment. the study had obtained the approval from the ethics committee of the jinzhou medical university (ref. no. jydll ). in addition, our study protocol was conducted according to the provisions of the declaration of helsinki (as revised in edinburgh ). the questionnaire comprised a total of related questions including eliciting basic socioeconomic information such as sex, age, education, job type, marital status, religion, and city of residence. we also asked participants if their employment was related to healthcare professions. 
participants were also asked to indicate whether they or their friends/relatives were currently diagnosed with covid- . there were five questions each for sars and covid- . furthermore, participants were asked what they would do if they saw someone hunting illegally. the questionnaire was distributed via wechat, qq, and baidu post bar. in the sars and covid- sections of the questionnaire, we asked participants whether they had ever eaten wildlife such as palm civets, snakes, wild boar, frogs, monkeys, bats, or pangolins during the outbreaks. if they answered yes, they were asked to select their reason for eating wildlife, including "i eat wildlife for nutrients," "i eat wildlife to test something novel," "i eat wildlife because they taste good," or "i eat wildlife because they are expensive, and they signify my social status." if they answered no, they were asked to select their reasons for not eating wildlife, including "i do not eat wildlife because i dislike eating wildlife," "i do not eat wildlife because they are protected by law," "i do not eat wildlife because they are too expensive," and "i do not eat wildlife because it is hard to buy wildlife in the local markets." we then provided four choices for participants reflecting whether or not their opinion had changed about eating wildlife since the sars outbreak: "i eat wildlife whenever i get the chance," "i have stopped eating wildlife meats because wildlife are legally protected," "i will only eat wildlife meats after they are inspected by food inspectors," and "i had another reason," which they were asked to state specifically. we also included questions as to whether participants considered palm civets to be carriers of sars, and bats to be carriers of sars-cov- . statistical analyses were performed using spss ver. (spss, chicago, il). differences were considered statistically significant when the p value was < . . differences in age between the sexes were assessed using an independent t-test.
associations between two categorical variables were analysed with a chi-square test. results were presented as frequency (percentage) or mean ± standard deviation, as appropriate. the online questionnaire was completed by chinese adults and of these, . % ( / ) were from hubei province and . % ( / ) were from wuhan city (table ). the mean age of participants was . ± . , with no difference in mean age between men and women (p = . ), and . % of participants were under years old. the majority of participants ( . %) had a higher education qualification level. about one third of participants ( . %) were married. none of the participants in the study was currently diagnosed with covid- ; only two participants indicated that they had friends who had been diagnosed with covid- ; . % of participants declared they had no religious belief, and . % indicated they were buddhist. in addition, . % were of han ethnicity, while man and hui accounted for . % and . %, respectively. the percentages of participants who had ever eaten wildlife were much lower than those who had not, both during the sars ( . % vs. . %) and covid- ( . % vs. . %) outbreaks (tables and ). however, the percentages of participants who consumed wildlife differed significantly between the two outbreaks (p = . ), as . % of participants reported that they consumed wildlife before sars and only . % had eaten wildlife before covid- . for those who had eaten wildlife, the most common reason was to test something novel, . % during the sars and . % during the covid- outbreaks, respectively. interestingly, no one consumed wildlife because its expense signified their social status. for those who had never eaten wildlife, the two most common reasons were dislike of eating wildlife ( . % during sars and . % during covid- ) and because most species of wildlife are protected by law ( . % during sars and . % during covid- ).
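The chi-square comparison of categorical variables described in the statistical analysis can be illustrated with a short sketch. It is not the SPSS procedure used in the study: it is a plain Pearson chi-square test for a 2 × 2 table without continuity correction, the function name `chi_square_2x2` is made up for illustration, and the counts are hypothetical because the study's own frequencies are not reproduced here.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    table = [[a, b], [c, d]]: rows are groups, columns are outcome yes/no.
    No continuity correction is applied. Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, NOT the study's data:
# rows = male / female, columns = ate wildlife yes / no.
example = [[30, 70], [15, 85]]
chi2, p = chi_square_2x2(example)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

With the paper's threshold, a p value below the stated significance level would be read as a significant association between the two variables.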
education level was significantly associated with wildlife consumption, both during the sars and covid- outbreaks (p = . and p < . , respectively). additionally, only during the sars outbreak, there were significant differences in the percentages of wildlife consumption between males and females ( . % and . %, respectively) (p < . ). however, there was no difference in the percentage of participants living inside or outside hubei who consumed wildlife during the two outbreaks (p = . and p = . , respectively). overall, the majority of participants reported that during the covid- outbreak they stopped eating wildlife and/or did not eat it because they were legally protected species ( . %), followed by "only eat inspected wildlife meat" ( . %) and "eat when got opportunity" ( . %) (tables and ) . similarly, majority of participants reported that during the sars outbreak, they stopped eating wildlife and/or did not eat it because they were legally protected species ( . %), followed by "only eat inspected wildlife meat" ( . %) and "eat when got opportunity" ( . %) (tables and ) . those who chose "other reasons" indicated that their attitudes towards not eating wildlife had never changed ( . % during covid- and . % during sars). there were significant differences in the perceptions of eating wildlife during sars and covid- between participants living in hubei and those living outside hubei (p = . and < . , respectively) (tables and ). participants living in hubei indicated that they changed their opinion during sars mainly because wildlife were legally protected ( . %), followed by "only eat inspected wildlife meat" ( . %), and "eat when got opportunity" ( . %). however, when it came to the covid- outbreak, the percentages changed to . %, . %, and . %, respectively. participants outside hubei changed their opinion to "only eat inspected wildlife meat" ( . %), followed by stop eating wildlife because they were legally protected ( . %), and "eat when got opportunity" ( . 
%) during sars. these percentages changed to . % ("only eat inspected wildlife meat"), . % ("stop eating wildlife were legally protected"), and . % ("eat when got opportunity"), respectively during covid- . there were significant differences in the perceptions of eating wildlife between participants who had higher educational qualifications and participants with secondary education level during sars and covid- . the percentages choosing "stop eating wildlife were legally protected", "only eat inspected wildlife meat" and "eat when got opportunity" were . %, . %, and . % respectively for participants who had higher educational qualifications compared to . %, . %, and . % for participants with secondary education level, respectively, during sars. on the other hand, the percentages of those choosing "stop eating because wildlife were legally protected," "only eat inspected wildlife meat," and "eat when got opportunity" were . %, . % and . % for participants who had higher education compared to . %, . % and . % for participants with secondary education, respectively during covid- (all p < . ). there were no differences between males and females, or between age groups, in the percentages of participants who changed their opinion about eating wildlife during covid- and sars (all p > . ). more than half of the participants ( . %) thought that palm civets were carriers of sars, while only . % indicated they did not think that palm civets were carriers of sars, and about one-third ( . %) indicated they did not know. in addition, nearly half the participants ( . %) agreed that bats were carriers of sars-cov- . furthermore, significantly more female participants agreed that bats were carriers of sars-cov- than male participants ( . % vs. . %) (p < . ). our study results clearly indicate that chinese attitudes towards eating wildlife have changed significantly between the - sars outbreak and the december ongoing covid- outbreak.
the percentages of participants who had eaten wildlife decreased from . % during sars to . % during covid- (p = . ). this showed that the chinese population's attitudes towards eating wildlife have significantly altered over the past years, which may be because the sars outbreak encouraged greater vigilance and reflection on the dangers inherent in wildlife meat consumption. in addition, there are currently many non-governmental organizations organizing activities to further protect wildlife (yuan et al. ). there were significant differences in opinions about eating wildlife during sars and covid- between participants from hubei and participants outside hubei. approximately three-fifths of participants from hubei chose not to eat wildlife because most wildlife species are legally protected. approximately half of participants outside hubei chose only to eat inspected wildlife meat. from the sars outbreak to the covid- outbreak, the change in opinion among participants both from hubei and outside hubei was that those who only consumed inspected wildlife during sars indicated that they stopped eating wildlife during covid- . only . % of participants from hubei agreed that palm civets were carriers of the sars virus, which was lower than among participants outside hubei ( . %) (p = . ). this may be because the main outbreak area of sars was not in hubei, so that outbreak may not have had such a profound impact among participants from hubei (evans et al. ). in addition, our results indicate that education level significantly affected attitudes towards wildlife consumption. interestingly, during sars, . % of participants with higher education qualifications indicated they consumed wildlife, which was more than twice that of participants without higher education ( . %) (p = . ).
the percentages of participants with higher education who thought that palm civets were sars carriers were more than twice as high as those without higher education (p < . ). however, during covid- , the percentages of participants with secondary school education who consumed wildlife were three times that of participants with higher education. at the same time, these two groups also reflected significant changes in perceptions of eating wildlife (p < . ). from sars to covid- , participants with higher education who indicated they chose to "stop eating because wildlife are legally protected" increased from . % to . %. usually, wildlife meat is sold for higher prices because of its scarcity. consumers with higher income and higher education level were reported to have higher consumption rates of wild animals (zhang and yin ). additionally, consumers with higher education levels usually have a higher income. this may explain why higher percentages of participants with higher education levels consumed wild meat than those with secondary education level during sars ( . % vs. . %). however, during covid- , there were fewer participants with higher education who consumed wildlife than those with secondary level education. it is possible that participants with higher education levels might have become more aware of the risks associated with wildlife consumption, especially after sars and covid- . china's per capita consumption of meat quadrupled from to (liu and diamond ). however, meat production cannot keep up with china's growing appetite for animal products (machovina et al. ). eating wildlife may be a way to increase sources of protein (asibey ). the consumption of wildlife is not uncommon in many parts of the world, including america, africa, and asia, and in many cases is a very important part of cultural identity (lindsey et al. ; volpato et al. ).
however, the chinese population currently has abundant choices for sources of protein. in our study, more than half of the participants indicated that they ate wildlife meat because they wanted to try something novel, and the second most common reason was that they liked its taste. only a small number of participants ( . % during sars and . % during covid- ) thought that wildlife meat has special nutritional value. this seems to indicate that wildlife meat, rather than being a necessary source of protein for the chinese population, is nowadays simply a matter of personal choice. it is worth noting that during the two outbreaks, none of the participants reported that they consumed wildlife because the expensive price signified their social status. if laws related to the protection of wildlife are tightened and strictly enforced, cutting off the supply of wildlife in markets, then the cost of eating illegally hunted wildlife will increase. the chinese population will then find it increasingly difficult to find opportunities to consume wildlife. thus, the number of individuals who eat wildlife to satisfy their curiosity would also be greatly reduced. at the same time, the dangers inherent in the consumption of wildlife meat, especially if the source is unknown, should be widely publicized. as has become clear during the ongoing covid- outbreak, some species of wildlife carry viruses that can cross barriers between species and mutate to become dangerous and potentially fatal to humans (volpato et al. ). also noteworthy is that more than half of our study participants ( . %) indicated that they thought that palm civets were carriers of sars. however, fewer than half of the participants ( . %) thought that bats were the carriers of sars-cov- . this may be because, at the time the questionnaire was circulated, the covid- outbreak was so recent. compared with the sars outbreak of years ago, participants may not yet have had enough knowledge of and familiarity with covid- .
however, since the habit of consuming wildlife is acquired over a long period, a gradual approach to improving eating habits should be adopted, since it is neither feasible to force the chinese population to change their dietary habits just after the pandemic outbreak, nor would it likely produce the desired outcomes. in our study, some participants reported that they would continue to consume wildlife meat, which indicates that there is still demand for wildlife meat. after the covid- outbreak, chinese government shut down wet markets (markets for live or freshly slaughtered animals), but this clearly did not eliminate demand, and may in fact lead to the wildlife trade continuing underground (volpato et al. ) . it is more realistic to provide a greater variety of food choices in the markets. for example, most of our study participants ( . % during the sars and . % during the ongoing covid- outbreaks, respectively) consumed wildlife meat because they felt that the wildlife meat was novel and they had the opportunity to acquire it. if qualified enterprises can breed some of these wild species, with the same safety guarantees as currently domesticated farm animals, this could provide an alternative safe option for those who continue to favour wildlife consumption. it might be easier to achieve with better results rather than attempting to enforce a blanket ban on wildlife consumption. furthermore, while protecting the original environment of endangered wildlife species is important, intensive breeding for reintroduction or even meat production is also a useful strategy (leader-williams et al. ) . the chinese government has in fact implemented a series of measures, including amending the wildlife protection law and captive breeding of wildlife, to further enhance wildlife protection (wang et al. ) . it was encouraging that the majority of our study participants ( . 
%) indicated that they would stop or try to stop illegal hunting, with more than a quarter saying they would firmly stop illegal hunters, and only . % indicated they would not take any action. the covid- outbreak has led to lockdown for months, greatly affecting the lives of the whole nation and the whole world (yuan et al. ). it is hoped that the serious consequences of this covid- pandemic will alert the chinese population to the importance of environmental protections. a significant strength of our study is that it is one of the first to investigate the impact of covid- on wildlife consumption and compare the results with the earlier sars outbreak. furthermore, since we especially targeted participants from hubei province, and more than half of the hubei participants were living in wuhan, the epicentre of the covid- outbreak, we had the opportunity to determine whether attitudes towards wildlife consumption differed between residents inside and outside the epicentre of the covid- outbreak. one limitation of our study is potential recall bias, because participants might have had difficulty recalling details from the sars period years ago. another limitation is the use of the convenience sampling method. in addition, the translation of some english words and western understandings such as "wildlife" could be problematic in china because of different historical rationales for eating wildlife in chinese and western conceptions and cultures. therefore, our findings should be interpreted cautiously. in conclusion, in the years from the sars to covid- outbreaks, the proportion of chinese adults consuming wildlife has decreased significantly. at present, chinese populations seem to be in favour of stopping wildlife consumption and fighting against illegal hunting. however, it is likely that some people in china will continue to consume wildlife meat for a number of reasons including believed health benefits.
funding this research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. data availability the datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request. wildlife as a source of protein in africa south of the sahara evaluation of lifestyle, attitude and stressful impact amid the covid- pandemic among adults in an interactive web-based dashboard to track covid- in real time wildlife production systems: economic utilisation of wild ungulates culture, reform politics, and future directions: a review of china's animal protection challenge the bushmeat trade in african savannas: impacts, drivers, and possible solutions china's environment in a globalizing world outbreak of pneumonia of unknown etiology in wuhan, china: the mystery and the miracle increased stressful impact among general population in mainland china amid the covid- pandemic: a nationwide cross-sectional study after wuhan city's travel ban lifted biodiversity conservation: the key is reducing meat consumption effective containment explains sub-exponential growth in confirmed cases of recent covid- outbreak in mainland china potential factors influencing repeated sars outbreaks in china baby pangolins on my plate: possible lessons to learn from the covid- pandemic captive breeding of wildlife resources-china's revised supply-side approach to conservation food safety issues related to wildlife have not been taken seriously from sars to covid- a new coronavirus associated with human respiratory disease in china regulating wildlife conservation and food safety to prevent human exposure to novel virus wildlife consumption and conservation awareness in china: a long way to go impact of the covid- pandemic on mental health and quality of life among local residents in liaoning province, china: a cross-sectional study psychological responses and lifestyle changes among pregnant 
women with respect to the early stages of covid- pandemic willingness of the general population to accept and pay for covid- vaccination during the early stages of covid- pandemic: a nationally representative survey in mainland china epidemiology and cause of severe acute respiratory syndrome (sars) in guangdong, people's republic of china a pneumonia outbreak associated with a new coronavirus of probable bat origin publisher's note springer nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations conflict of interest the authors declare that they have no conflict of interest. the present study was approved by jinzhou medical university research ethics committee involving human beings. (ref. no. jydll ). in addition, our study protocol was conducted according to the provisions of the declaration of helsinki (as revised in edinburgh ). all participants were briefed about the study protocol and informed consent was obtained from them. key: cord- -kjx cvzh authors: zhao, ze-yu; chen, qi; zhao, bin; hannah, mikah ngwanguong; wang, ning; wang, yu-xin; xuan, xian-fa; rui, jia; chu, mei-jie; yu, shan-shan; wang, yao; liu, xing-chun; an, ran; pan, li-li; chiang, yi-chen; su, yan-hua; zhao, ben-hua; chen, tian-mu title: relative transmissibility of shigellosis among male and female individuals: a modeling study in hubei province, china date: - - journal: infect dis poverty doi: . /s - - -x sha: doc_id: cord_uid: kjx cvzh background: developing countries exhibit a high disease burden from shigellosis. owing to the different incidences in males and females, this study aims to analyze the features involved in the transmission of shigellosis among male (subscript m) and female (subscript f) individuals using a newly developed sex-based model. methods: the data of reported shigellosis cases were collected from the china information system for disease control and prevention in hubei province from to . 
a sex-based susceptible–exposed–infectious/asymptomatic–recovered (seiar) model was applied to explore the dataset, and a sex-age-based seiar model was applied in to explore the sex- and age-specific transmissions. results: from to , shigellosis cases (including male and female cases) were reported in hubei province. the seiar model exhibited a significant fitting effect with the shigellosis data (p < . ). the median values of the shigellosis transmission were . × ( ) for sar(mm) (secondary attack rate from male to male), . × ( ) for sar(mf), . × (- ) for sar(fm), and . × (- ) for sar(ff). the top five mean values of the transmission relative rate in (where the subscript was defined as male and age ≤ years, was male and age to years, was male and age ≥ years, was female and age ≤ years, was female and age to years, and was female and age ≥ years) were . × (- ) for β( ), . × (- ) for β( ), . × (- ) for β( ), . × (- ) for β( ), and . × (- ) for β( ). conclusions: the transmissibility of shigellosis differed among male and female individuals. the transmissibility between the genders was higher than that within the genders, particularly female-to-male transmission. the most important route in children (age ≤ years) was transmission from the elderly (age ≥ years). therefore, the greatest interventions should be applied in females and the elderly. old [ ] . according to the chinese center for disease control and prevention (china cdc), approximately to cases were reported annually within the period to [ ]. although there has been an improvement in the quality of water and sanitation, shigellosis remains a major public health problem in several developing countries, including china [ , ]. bacillary dysentery is an infectious intestinal disease that can be transmitted via the consumption of contaminated food or water [ ]. humans are the only natural host for shigella spp.
in recent years, numerous reports have demonstrated that the incidence of shigellosis in males is higher than that in females [ ] [ ] [ ]. the incidence of shigellosis, a water/food-borne disease, is directly related to hygiene behaviours such as regular hand washing [ ]. a study has indicated that the sanitary state in females is always higher than that in males [ ]. does this mean that the transmission features differ between males and females? a study has reported that shigellosis is primarily transmitted from person to person [ ]. thus, the water/food-to-person route has been interrupted. moreover, many studies have indicated different incidences in individuals of various ages [ , , ]. in this study, we aimed to explore the interpersonal transmission further. in model studies of shigellosis, the distribution of time and space has been a greater focus than population-based research [ ] [ ] [ ] [ ] [ ]. a study demonstrated that the susceptible-exposed-infectious/asymptomatic-recovered-water/food (seiarw) model exhibited a significant fitting effect with outbreak data in a school [ ]. however, it did not estimate the transmissibility of bacillary dysentery between males and females. considering that water makes less of a contribution to the transmission, a sex-based susceptible-exposed-infectious/asymptomatic-recovered (seiar) model was applied to explore the dataset from hubei province. the secondary attack rate (sar), which is defined as the probability of an infected person infecting a susceptible person during his or her entire infectious period, was adopted to assess the relative transmissibility of shigellosis between males and females. in this study, shigellosis cases reported in hubei province, china, were collected. the seiar model was applied to fit the data, calculate the related index, and determine the transmissibility of shigellosis between males and females.
with the aim of exploring the transmission features in different gender and age groups, the seiar model was adopted to fit the data of shigellosis cases reported from to in hubei province, china. a mathematical study was implemented using a sex- and age-based model to analyze the transmission characteristics of reported shigellosis cases in hubei province, china, from to . in this study, we divided the research process into three parts (fig. ). first, we developed the model according to the natural history and transmission mechanism in different genders. second, we acquired the model parameters by reference and curve fitting. finally, we adopted indicators to estimate the transmissibility in different genders and to explore the transmission features in different age groups further. the dataset of the shigellosis cases was collected from the china information system for disease control and prevention in hubei province from to . the dataset included gender, age, occupation, address, date of onset, and date of diagnosis. in this study, people were divided into two groups according to gender. population information, such as the birth rate, death rate and total population, was obtained from the hubei statistical yearbook. the seiar model was developed according to the natural history of shigellosis among male and female individuals (fig. ). we used the subscripts m to represent male and f to represent female. the model followed a person-to-person transmission pattern and consisted of susceptible (s m , s f ), exposed (e m , e f ), symptomatic (i m , i f ), asymptomatic (a m , a f ) and recovered (r m , r f ) individuals. definitions of the epidemiological classes are summarized in table . in the model, we assumed that: a) susceptible individuals of different genders become infected by contact with infected/asymptomatic people. b) the relative rate of transmission among male and female individuals is β mm and β ff , respectively.
c) the relative rate of transmission from male to female is β mf and from female to male is β fm . moreover, we assumed that in both males and females: a) the disease does not spread vertically, and individuals born into the various groups are all susceptible. the natural birth rate is br and the natural mortality rate is dr. b) according to a recent review [ ] , the transmission of shigellosis mainly occurs from person to person. meanwhile, our pilot study indicated a minor contribution of water/food (additional file ). therefore, we assumed that the water/food-to-person transmission route had been cut off. c) a proportion ( -p) e ( ≤ p ≤ ) of exposed individuals become symptomatic infected persons i following an incubation period, while the remaining pe exposed individuals become asymptomatic persons a following a latent period (the period during which an exposed individual becomes an asymptomatic person). d) the removal speed from i and a is proportional to the number of people in each group, and the proportionality coefficients are γ and γ', respectively, where /γ and /γ' are the infectious periods of i and a. e) infected persons may die as a result of the disease, and the case fatality rate is f. the model is expressed as follows: the left side of the equation indicates the instantaneous rate of change of s, e, i, a and r at time t. in the model, the sar was calculated as follows: considering that the transmissibility could relate to different ages (we considered three age groups based on the age distribution of the reported shigellosis incidences in the province), we divided individuals into six groups. the subscript was defined as male and age ≤ years, was male and age to years, was male and age ≥ years, was female and age ≤ years, was female and age to years, and was female and age ≥ years. thereafter, we constructed a sex-age-based seiar model.
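the model equations did not survive in this text. as a hedged reconstruction, the block below gives one system of differential equations that is consistent with assumptions a) to e) above for the male compartments; the female equations follow by swapping the subscripts m and f. the symbols ω (rate of leaving e for i), ω' (rate of leaving e for a) and k (taken here to be the relative transmissibility of asymptomatic individuals) and the form of the birth term are assumptions, and the original equations may differ in detail.

```latex
% Hedged reconstruction (assumed form), male compartments only.
% N_m = S_m + E_m + I_m + A_m + R_m; k, \omega, \omega' are assumed symbols.
\begin{aligned}
\frac{dS_m}{dt} &= br\,N_m - \beta_{mm} S_m (I_m + k A_m) - \beta_{fm} S_m (I_f + k A_f) - dr\,S_m \\
\frac{dE_m}{dt} &= \beta_{mm} S_m (I_m + k A_m) + \beta_{fm} S_m (I_f + k A_f) - (1-p)\,\omega E_m - p\,\omega' E_m - dr\,E_m \\
\frac{dI_m}{dt} &= (1-p)\,\omega E_m - (\gamma + dr + f)\, I_m \\
\frac{dA_m}{dt} &= p\,\omega' E_m - (\gamma' + dr)\, A_m \\
\frac{dR_m}{dt} &= \gamma I_m + \gamma' A_m - dr\,R_m
\end{aligned}
```

in this form, β fm multiplies the female infectious pool acting on susceptible males, which matches the description of β fm as the female-to-male transmission rate.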
we calculated the ratios x, y, and z (from the results of sex-based seiar model) in four transmission routes of the different genders to increase the reliability of the estimated parameters. we set β ff as β and the framework is presented in fig. and its equation is provided in additional file . according to the reported incidence of shigellosis from to in hubei province, we selected the year to quantify the transmissibility in the different sex and age groups (fig. a) . meanwhile, we compared wuhan city with yichang city based on the different incidence in both cities of hubei province in (fig. b) . according to the epidemiological characteristics of shigellosis and our previous study [ ] , we set k and γ' as . and . , respectively. the proportions of asymptomatic individuals were reported to range from . to . [ ] [ ] [ ] . we set p = . in the seiar model. the incubation of shigellosis was reported to range from to days [ ] [ ] [ ] . therefore, we set ω as . to . . the symptoms generally last for week, but certain people may experience symptoms for several weeks [ , ] . we assumed the course of the disease was up to weeks. therefore, we set γ as . to . . the fatality rate of the disease reported in a study decreased from . to . from to [ ] . considering that the fatality rate of shigellosis is extremely low [ ] , we set f = . the values of β mm , β ff , β mf and β fm were generated by curve fitting using the seiar model and the reported shigellosis data. the definitions, ranges and sources of the parameters are displayed in table . we performed a "knock-out" simulation to explore the roles of the different β values. 
the theory of the "knock-out" simulation originates from the gene "knock-out" technique (an experimental technique used in genetics in which a normal gene is replaced by a defective gene either at the exact same chromosomal site, hence the normal gene is 'knocked out' by the defective gene, as occurs with the yeast genome, or the deoxyribonucleic acid is inserted at random sites, as occurs in [ ] ). in the model, we estimated the contribution of one parameter at a time by setting it to and calculating the resulting decrease in the number of cases or in the total attack rate. for example, the contribution of the parameter β fm was the decrease in the simulated number of cases when we set it to . therefore, the "knock-out" simulation (interrupting the different shigellosis transmission routes among males and females) was performed in five scenarios in our study: a) β mm = ; b) β mf = ; c) β ff = ; d) β fm = ; and e) control (no intervention). was employed for the model simulation. the simulation methods were as previously described [ , [ ] [ ] [ ] [ ] . according to our previously published studies [ , ] , we assumed that heterogeneity of the transmissibility existed between ascending and descending trends. the annual data were therefore divided into numerous parts and the simulated time step was one day; for example, the data of were divided into parts ( moreover, spss . (ibm corp, armonk, ny, usa) was used to calculate the coefficient of determination (r ) by curve fitting, which was adopted to judge the goodness of fit of the model. because nine parameters, namely k, ω, γ, γ', p, br, dr, f and q, were obtained from references and the hubei statistical yearbook, their uncertainty could influence the model. in our model, the nine parameters were split into values, as indicated in table . considering that the simulation method was the same in each year, we performed the sensitivity analysis in (a year with intermediate reported incidence and case numbers; fig. a ).
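the "knock-out" procedure described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' implementation: the two-group model below omits births, deaths and disease fatality, uses population fractions and a simple euler scheme, and every numerical value (the four β values, k, ω, ω', p, γ, γ') is hypothetical and chosen only so that the example runs.

```python
def simulate(beta, days=365, steps_per_day=10,
             k=0.3, omega=0.5, omega_a=0.5, p=0.1, gamma=0.1, gamma_a=0.03):
    """Euler simulation of a two-group (male/female) SEIAR-type model.

    beta[(src, dst)] is the transmission rate from group src to group dst.
    Returns the cumulative symptomatic incidence per group (as a fraction).
    All default parameter values are hypothetical placeholders.
    """
    groups = ("m", "f")
    state = {g: {"S": 0.999, "E": 0.0, "I": 0.001, "A": 0.0, "R": 0.0}
             for g in groups}
    cumulative = {g: 0.0 for g in groups}
    dt = 1.0 / steps_per_day
    for _ in range(days * steps_per_day):
        new_state = {}
        for dst in groups:
            s = state[dst]
            # Force of infection on dst from both groups (old state only).
            force = sum(beta[(src, dst)] * (state[src]["I"] + k * state[src]["A"])
                        for src in groups)
            infections = force * s["S"]
            to_i = (1.0 - p) * omega * s["E"]   # exposed -> symptomatic
            to_a = p * omega_a * s["E"]         # exposed -> asymptomatic
            new_state[dst] = {
                "S": s["S"] - dt * infections,
                "E": s["E"] + dt * (infections - to_i - to_a),
                "I": s["I"] + dt * (to_i - gamma * s["I"]),
                "A": s["A"] + dt * (to_a - gamma_a * s["A"]),
                "R": s["R"] + dt * (gamma * s["I"] + gamma_a * s["A"]),
            }
            cumulative[dst] += dt * to_i
        state = new_state
    return cumulative


def knock_out(beta, route):
    """Return a copy of beta with one transmission route set to zero."""
    knocked = dict(beta)
    knocked[route] = 0.0
    return knocked


# Hypothetical baseline rates: (source group, destination group) -> beta.
baseline = {("m", "m"): 0.15, ("m", "f"): 0.10,
            ("f", "m"): 0.20, ("f", "f"): 0.05}
control = simulate(baseline)

for route in baseline:
    cases = simulate(knock_out(baseline, route))
    reduction = {g: control[g] - cases[g] for g in control}
    print(route, {g: round(v, 4) for g, v in reduction.items()})
```

the contribution of a route is read off as the reduction in cumulative cases relative to the control run, which mirrors the five scenarios listed above (four knock-outs plus the control).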
the results of the curve fitting indicated that the seiar model fitted the data effectively (fig. ) . the r values of the seiar model for the different genders each year are presented in table . in , the reported data of all individual groups were fitted well by the simulated data in hubei province (fig. ) , wuhan city, and yichang city (fig. ). according to fig. , the results of the "knock-out" simulation demonstrated that the numbers of cases in the different genders under β mm = , β ff = , β mf = and β fm = were lower than that in the control group. when β fm = , the number of cases decreased the most in the different genders. in , a total of cases were reported in hubei province ( cases in yichang city and cases in wuhan city). the "knock-out" simulation demonstrated similar contributions of the four transmission routes in wuhan and yichang city, but different results from those for hubei province (fig. ) . fig. presents the difference between the mean and % confidence interval (ci) from to when using β mm , β ff , β mf and β fm . the mean value was . × - ( % ci: . × - to . × - ) when using β mm , . × - ( % ci: . × - to . × ) when using β ff , . × - ( % ci: . × - to . × - ) when using β fm, and . × - ( % ci: . × - to . × - ) when using β mf . the results of the sar from to are presented in fig. . the median value of sar mm was . ci: . × - to . × - ) and β (mean: . × - , % ci: . × - to . × - ). based on the times that the model ran, the model was not sensitive to the parameters br, dr, f, q and γ'. the simulated numbers of cases were the same for the mean, mean − standard deviation (sd) and mean + sd values (fig. ). our model was slightly sensitive to the parameters ω, k and p (fig. a,b,c) . meanwhile, high sensitivity to parameter γ ( . ) was demonstrated, as illustrated in fig. d .
several mathematical models (such as the time-series susceptible-infectious-recovered and seiarw) have been established to determine the dynamics of shigellosis [ , ] . however, our study is the first to clarify the transmission of shigellosis between both genders globally. in this study, we used the seiar model to study the transmission of the water/food-borne infectious disease and explored the transmission routes in the different sex-age groups further. the results provide guidance for controlling the prevalence of shigellosis. according to r of the linear regression, the seiar model exhibited a high goodness of fit with the reported data in the different genders. moreover, it was consistent with the results of previous research [ ] , suggesting that the model was suitable for this study. according to the results of the sensitivity analysis, the model was more sensitive to parameter γ. therefore, the results would be more reliable if γ was collected from real data, instead of from the literature. in recent years, although the incidence of shigellosis exhibited a decreasing trend in china [ , , ] , relatively high levels still occurred in hubei province from to . different incidences of shigellosis cases in males and females were observed by the descriptive epidemiology [ , ] . however, few clarifications of the causes of this difference and the transmission features have been provided. a study indicated that there were more cases in males than in females (the male-to-female ratio was . : ), which is consistent with our results in the descriptive epidemiology [ ] .
fig. simulated contribution of β to transmission in the different genders. a: male; b: female; β mm = , transmission among males interrupted; β ff = , transmission among females interrupted; β fm = , transmission from female to male interrupted; β mf = , transmission from male to female interrupted; none: control
the transmission pattern of shigellosis has shifted from water/food-to-person to person-to-person, with high risk groups being particularly men who have sex with other men (msm) in developed countries [ ] . meanwhile, numerous studies have reported that the incidence in males is higher than that in females [ ] [ ] [ ] . does this mean that the transmissibility of shigellosis among males is stronger than that among females? the seiar model was developed to verify this hypothesis. we obtained the number of cases in five scenarios using the "knock-out" simulation. when β fm = , the number of cases decreased the most in both genders, which means that female-to-male transmission contributed significantly to the transmission. therefore, it is important to isolate and treat female cases as well as to strengthen personal hygiene. in this study, we modelled the reported data from two cities in hubei province. the results of the "knock-out" simulation demonstrated that the decreasing trend of wuhan city was similar to that of yichang city, but both exhibited a certain disparity compared to the results of hubei province. according to fig. , there were differences in the cases reported from wuhan city and yichang city for . both cities exhibited similar ascending and descending trends during each time for the same gender, but the results differed from those of hubei province. this could be related to the proportion of male and female cases reported daily.
fig. the parameters β mm , β ff , β mf and β fm during the transmission from to in hubei. a: β mm , relative transmission rate among males; b: β ff , relative transmission rate among females; c: β mf , relative transmission rate from male to female; d: β fm , relative transmission rate from female to male
fig. the sar mm , sar mf , sar fm and sar ff estimated by the model from to in hubei. sar: secondary attack rate; subscript mm, among males; mf, from male to female; fm, from female to male; ff, among females
regional differences may not be the main influential factor for the incidences in terms of gender. compared to hiv, which exhibits different transmissibility in different genders, shigellosis is not particularly highly contagious in the different genders [ ] . our results demonstrated that the mean values of the transmission parameters among males, among females, from male to female, and from female to male differed, in the following order: β fm > β mm > β mf > β ff . the median values of the sar exhibited the following order: sar fm > sar mf > sar mm > sar ff . because a model of the total population in hubei was constructed, the value of sar was small and close to zero. however, this did not affect the quantification of the transmissibility of shigellosis. a previous study indicated a high incidence in msm in developed countries owing to unprotected sex and oro-anal contact [ ] . however, the proportion of msm in china is not large. our finding may instead be related to the fact that the rate of contact between males and females, such as kissing, embracing, and shaking hands, is higher than that within genders. the results indicate that the most significant transmission route is from female to male. superior hygiene behaviours may be responsible for the lower incidence in females than in males. the main reason that males are more susceptible than females may be that females have better lifestyle habits, such as hand washing, than males. moreover, females generally carry out more tasks such as cooking in the home. this finding suggests that hand washing before cooking should be emphasized for females. the results of this study are consistent with those of most research [ , ] , which has indicated a heavy disease burden in children under years. there is no doubt that children have a relatively high susceptibility compared to other ages.
furthermore, it is apparent that children often exhibit poor habits such as not washing their hands after using the toilet or before meals. our results demonstrate that the main transmission route is from the elderly to children. there is a custom in china whereby young parents leave their children in the grandparents' care. this suggests that the most important intervention may be to cut off transmission from the elderly. according to the epidemic characteristics of bacterial dysentery, control measures could be implemented in the following aspects: a) focus on females cooking in the home and grandparents caring for grandchildren, for example by advocating hand washing. b) encourage effective hygiene habits to reduce the susceptibility of male individuals and children. c) reduce the frequency of social behaviour such as kissing, embracing and shaking hands.
fig. the relative transmission rate in different age and gender groups in . β : relative transmission rate within females; β ij refers to the relative transmission rate from gender and age group i to group j, where i and j represent subscripts to ; subscript was defined as male and ≤ years old, was male and between to years old, was male and ≥ years old, was female and ≤ years old, was female and between to years old, and was female and ≥ years old; the data of were divided into stages based on the following simulated periods,
limitations
several influential factors contributed to the year being considered for estimating the transmission features in the different age groups. it is possible that the transmission would vary according to changes in human behaviour. thus, further research is required to explore the transmission characteristics of hubei province. numerous studies have indicated that shigella consists of four species, namely dysenteriae, boydii, flexneri, and sonnei, among which the final two are the most common in low- and middle-income countries [ , , ] .
in our study, the dataset was obtained from routine infectious disease surveillance of the cdc in hubei province, with no reported information regarding the shigella species. we believe that it is highly necessary to estimate the transmissibility of the different shigella species. additional data for the different species will need to be collected for analysis. the results may have been affected because we assumed that β w = in the seiar model and ignored environmental factors (such as water and food). moreover, owing to the limited availability of data, sociological components (for example, occupations, and cultural and societal backgrounds) were not considered in the model. additional data relating to sociological factors need to be collected for analysis. finally, the parameters of the seiar model were obtained from relevant references and the hubei statistical yearbook, and not from first-hand data, which may have affected the accuracy of our model. in hubei province, the incidence of shigellosis in males is higher than that in females. the transmissibility between the genders is higher than that within the genders, particularly female-to-male transmission. the main transmission route in children (age ≤ years) is transmission from the elderly (age ≥ years). therefore, the greatest interventions should be applied in females and the elderly. supplementary information accompanies this paper at https://doi.org/ . /s - - -x. additional file the contribution of β w in seiarw model. additional file . sex-age based seiar model.
cdc: center for disease control and prevention; seiarw: susceptible-exposed-infectious/asymptomatic-recovered-water/food; seiar: susceptible-exposed-infectious/asymptomatic-recovered; sar: secondary attack rate; ci: confidence interval; sd: standard deviation; msm: men who have sex with other men environmental drivers and predicted risk of bacillary dysentery in southwest china the global burden of diarrhoeal disease regional disparities in the burden of disease attributable to unsafe water and poor sanitation in china. b world health organ multistate shigellosis outbreak and commercially prepared food, united states spatiotemporal characteristics of bacillary dysentery from to in zhejiang province spatial-temporal pattern and risk factor analysis of bacillary dysentery in the beijing-tianjin-tangshan urban region of china the changing epidemiology of bacillary dysentery and characteristics of antimicrobial resistance of shigella isolated in china from gender and the hygiene hypothesis risk factors for shigellosis in thailand effects of ambient temperature on bacillary dysentery: a multi-city analysis in anhui province impact of meteorological factors on the incidence of bacillary dysentery in beijing, china: a time series analysis meteorological variables and bacillary dysentery cases in changsha city, china patterns of bacillary dysentery in china socio-economic factors of bacillary dysentery based on spatial correlation analysis in guangxi province spatiotemporal risk of bacillary dysentery and sensitivity to meteorological factors in hunan province investigation of key interventions for shigellosis outbreak control in china risk factors for secondary transmission of shigella infection within households: implications for current prevention policy detection of intra-familial transmission of shigella infection using conventional serotyping and pulsed-field gel electrophoresis asymptomatic salmonella, shigella and intestinal parasites among primary school children 
in the eastern province world health organization. foodborne disease outbreaks, guidelines for investigation and control. geneva: who an outbreak of foodborne infection caused by shigella sonnei in west bengal, india a school outbreak of shigella sonnei infection in china: clinical features, antibiotic susceptibility and molecular epidemiology a brief history of shigella prevention cfdca: shigella -shigellosis. trend and disease burden of bacillary dysentery in china global burden of shigella infections: implications for vaccine development and implementation of control strategies gene knockout technique. the effectiveness of age-specific isolation policies on epidemics of influenza a (h n ) in a large city in central south china evidence-based interventions of norovirus outbreaks in china risk of imported ebola virus disease in china simulation of key interventions for seasonal influenza outbreak control at school in changsha estimating the transmissibility of hand, foot, and mouth disease by a dynamic model transmissibility of acute haemorrhagic conjunctivitis in small-scale outbreaks in hunan province dynamics of shigellosis epidemics: estimating individual-level transmission and reporting rates from national epidemiologic data sets an -year study of shigellosis and shigella species in taiyuan, china: active surveillance, epidemic characteristics, and molecular serotyping the epidemiological influence of climatic factors on shigellosis incidence rates in korea spatial-temporal detection of risk factors for bacillary dysentery in beijing identifying high-risk areas of bacillary dysentery and associated meteorological factors in wuhan mathematical models for hiv transmission dynamics: tools for social and behavioral science research burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the global enteric multicenter study, gems): a prospective, case-control study use of quantitative molecular diagnostic methods to 
identify causes of diarrhoea in children: a reanalysis of the gems case-control study shift in serotype distribution of shigella species in china university of texas medical branch at galveston. ; chapter we thank the staff members in the hospitals, local health departments, and local cdcs for their valuable assistance in coordinating data collection. authors' contributions tc, bz, and zz designed the study. qc collected data. tc, zz, qc, bz, nw, yw, xx, jr, sy, mc, yw, xl, ra, lp, and ys and performed the analysis. tc, zz, nw, and mnh wrote the first draft of this paper. all authors contributed to the writing of the manuscript. the author(s) read and approved the final manuscript. availability of data and materials extra data is available by emailing to dr. qi chen ( @qq.com) on reasonable request. this effort of disease control was part of cdc's routine responsibility in hubei province, china. therefore, institutional review and informed consent were not required for this study. all data analysed were anonymized. not applicable. key: cord- - x g r c authors: yao, ye; pan, jinhua; liu, zhixi; meng, xia; wang, weidong; kan, haidong; wang, weibing title: ambient nitrogen dioxide pollution and spread ability of covid- in chinese cities date: - - journal: ecotoxicol environ saf doi: . /j.ecoenv. . sha: doc_id: cord_uid: x g r c this study aims to explore the relationship between ambient no( ) levels and the transmission ability (basic reproductive number, r( )) of covid- in chinese cities. after adjustment for temperature and relative humidity, r( ) was positively associated with no( ) concentration at city level. the temporal analysis within hubei province indicated that all the hubei cities (except xianning city) had significant positive correlations between no( ) concentration (with -day time lag) and r( ) (r> . , p< . ). 
the association between ambient no( ) and r( ) indicates that no( ) may increase the underlying risk of infection in the transmission process of covid- . in addition, because no( ) is also an indicator of traffic-related air pollution, the association between no( ) and covid- 's spread ability suggests that reduced population movement may have reduced the spread of sars-cov- . the covid- pandemic has highlighted the importance of international solidarity and unity in the face of a dire global health and economic crisis. the pandemic, which was first reported in december in wuhan, china, has caused , , confirmed cases worldwide as of jul , , with , cases reported in china (nhc, ) . although massive intervention measures (e.g., shutting down cities, extending holidays, and travel bans) have been implemented in china and many other countries, the spread of the disease is unlikely to be stopped worldwide in the near future. no effective vaccines or antiviral drugs have been clinically approved so far. our current understanding of the factors that impact sars-cov- transmission is still limited. environmental factors are associated with the seasonality of respiratory-borne disease epidemics (sooryanarain and elankumaran, ) . some research has investigated individuals' exposure to both indoor and outdoor environmental nitrogen dioxide (no ) pollution (salonen et al., ) . previous cross-sectional and cohort research has provided evidence that ambient no exposure had longitudinal effects on growth in lung function (molter et al., ) , causing pulmonary insufficiency (e.g., lung volume, expiratory flow). in addition, previous studies have suggested that ambient no exposure may play a role in the phenotypes of respiratory diseases including but not limited to influenza (huang et al., ) , asthma (weinmayr et al., ) , and severe acute respiratory syndrome (kan et al., ) .
for example, no might increase adults' susceptibility to viral infections (goings et al., ) . exposure to high levels of no before the start of a respiratory viral infection is associated with the severity of asthma exacerbation (chauhan et al., ) . recently, a european study found that % of covid- fatalities were located in five regions that showed the highest concentrations of no (ogen, ) . this finding indicates that long-term no exposure may be an important risk factor for covid- fatality. however, contini and costabile (contini and costabile, ) noted that the relationships between atmospheric parameters and covid- prevalence or fatality are influenced by several confounding factors, which makes it difficult to interpret correlations that do not necessarily indicate a cause-effect relationship in descriptive studies. although this is an inevitable limitation of our descriptive study, we aim to explore the influence of no on covid- transmission thoroughly and to obtain more robust results by adjusting for potential confounders. in this study, we aim to assess the associations between ambient no levels and the spread ability of covid- across chinese cities, and we provide information to facilitate the further prevention and control of covid- .
methods
we collected covid- confirmed case information reported by the national health commission of the people's republic of china (who, ) and health commission of hubei province (http://wjw.hubei.gov.cn/bmdt/ztzl/fkxxgzbdgrfyyq/). guidelines on the diagnosis and treatment of patients were defined according to the fourth version of the guidelines (issued on january , ). the clinical criteria for diagnosis were to meet any two of the three remaining clinical criteria (i.e., fever, radiographic findings of pneumonia, and normal or reduced white blood cell count or reduced lymphocyte count in the early stage of illness).
an epidemiological criterion was added (e.g., linkage with a confirmed covid- case) (nhc, ; zhang et al., ) . the population movement in cities outside hubei from the same period was obtained from baidu qianxi data (https://qianxi.baidu.com/ /), and we used migration index and travel intensity to describe the movement. we obtained hourly concentrations of various air pollutants, including sulfur dioxide (so ), no , carbon monoxide (co), ozone (o ), fine particulate matter (pm . ), and inhalable particulate matter (pm ). these data came from cities (cities in china with more than confirmed covid- cases as of february , ) and ranged from january , to february , . the data were acquired from the national urban air quality the reproductive number (r ), the average number of individuals infected by an initial infectious individual in a completely susceptible population, is fundamental to understanding disease transmission. we calculated r for chinese cities with more than cases as of february , (the covid- peak period in china), including and cities inside and outside hubei, respectively. we used the method introduced by aaron et al. to estimate r (aaron a. king et al., ) . first, we constructed a linear regression model to estimate the relevant coefficient. second, we obtained r by combining the coefficients obtained from the previous step with the average incubation and confirmation periods. we assigned the average values of the incubation period and the mean course from case infection to confirmation as and . days, respectively. these values were obtained in previous mathematical research (pan et al., ) . all calculations were completed in r software version . . (r foundation for statistical computing). mediation is a hypothesized causal chain in which one variable affects a second variable that, in turn, affects a third variable (lederer et al., ) . 
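the two-step r estimate described above (a linear regression followed by a combination with the incubation and confirmation periods) can be illustrated with a short python sketch. it is a hedged illustration rather than the authors' code (which was written in r software): it assumes that the regression is an ordinary least-squares fit of log case counts against time, and that the growth rate is converted with the seir-type relation (1 + rate × incubation period) × (1 + rate × confirmation period). the function names and the example period values are hypothetical.

```python
import math


def growth_rate(case_counts):
    """Slope of an ordinary least-squares fit of log(cases) against day index.

    case_counts must be strictly positive; one value per day.
    """
    ys = [math.log(c) for c in case_counts]
    xs = list(range(len(ys)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def r0_from_growth(rate, incubation_days, confirmation_days):
    """Assumed SEIR-type conversion from exponential growth rate to R0."""
    return (1.0 + rate * incubation_days) * (1.0 + rate * confirmation_days)


# Synthetic example: cases growing by about 10% per day (hypothetical data).
cases = [10 * math.exp(0.1 * day) for day in range(15)]
rate = growth_rate(cases)
# The period lengths below are placeholders, not the values used in the study.
print(round(r0_from_growth(rate, incubation_days=5.0, confirmation_days=3.0), 2))
```

fitting on the log scale keeps the first step a plain linear regression, so the only modelling choice in this sketch is the conversion formula in the second step.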
the relationship between no concentration and r of covid- may be mediated by factors related to population density, such as city population and city area, or by other air pollutants. those mediators may indirectly affect the r value of covid- by modulating the no concentration, thus affecting the spread of covid- . in this study, we used mediation analysis to explore whether these factors were mediators of the relationship between no and r of covid- , and we used bootstrapping to estimate standard errors while testing the significance of these mediating effects. we conducted a cross-sectional analysis to examine the associations of no with r of covid- . we also conducted a longitudinal analysis to examine the temporal associations (with daily data points) of no with r in cities inside hubei province since the date when they had enough confirmed cases to acquire stable daily r values. the other covariates, including health policies, were quite similar throughout hubei province. when examining the correlation between no and r of covid- , we estimated the associations of no concentration with r both inside and outside hubei province (r & p) in the same period by using multiple linear regression models after controlling for temperature and relative humidity (as covariates in the regression model) separately. then, we used meta-analysis to pool the estimates of the specific associations of no concentration with r (meta χ & p). we also examined the corresponding temporal associations between no and r of covid- across the different cities inside and outside hubei province using multiple linear regression models after controlling for temperature and relative humidity separately. the change of r per μg/m increase in no pollution was calculated.
given that associations between no and covid- prevalence are influenced by several confounding factors, we further examined the associations of no with the r of covid- with adjustment for density of population, gdp per capita and hospital beds per capita in the main model. among the investigated cities, the mean±standard deviation and range of no concentration and r were ( . ± . μg/m , . - . μg/m ) and ( . ± . , . - . ), respectively. the cities with the three highest r values were wuhan, huanggang, and yichang, which are all in hubei province. the similarity of the spatial distributions between r and no suggests a relationship between r and no concentration (figure ) . both inside and outside hubei province, the daily no concentration trend from january to march was almost the same across - , but the daily no concentration in was clearly lower than that in other years, especially after january , (figure ) , which may be due to the lockdown of wuhan city in hubei. the scatter diagram of r and no distributions (figure ) shows that r tends to increase with no concentration, suggesting a positive correlation between r and no concentration. the cross-sectional analysis indicates that, after adjustment for temperature and relative humidity, r was positively associated with no concentration at city level (meta χ = . , p= . ) (figure ) . additionally, we further examined the associations of no with the r of covid- adjusted for density of population, gdp per capita, and hospital beds per capita separately in the main model, and we found that none of the three covariates affected the significant positive association between no and r . in the following stratified analysis, a significant association was confirmed in cities outside hubei (r= . , p= . ), whereas the trend observed in cities inside hubei was not significant (r= . , p= . ) (figure ) .
for every μg/m increase in no , r increased by . ( . - . ) and . (− . to . ), respectively. we did not find significant associations of temperature or relative humidity with r of covid- (meta χ = . , p= . and meta χ = . , p= . , respectively).
the basic reproductive number r was positively associated with no (meta χ = . , p= . ) in cities outside hubei (blue points, cities, r= . , p= . , solid line) and cities inside hubei (green points, cities, r= . , p= . , dashed line). we controlled the effects from temperature and relative humidity in the multiple linear regression models.
in addition, we found that r was positively associated with the average no value from - (meta χ = . , p= . ; figure a ) with adjustment for temperature and relative humidity. because the average no value from - was significantly associated with that in early (r= . , p< . ), it is difficult to determine which factor is dominant in covid- transmission. moreover, the other investigated air pollutants (so , co, o , pm . , and pm ) had no significant associations with r (meta χ < . , p> . ; figure b-f). furthermore, in order to avoid potential population movement effects in our study, which could decrease both no and r , we collected reduced population movement data from cities outside hubei in the same period. we re-calculated the no -r associations including the population movement as a covariate, and we found that no was still significantly correlated with r of covid- outside hubei (r= . , p= . ).
(a) the basic reproductive number r was positively associated (meta χ = . , p= . ) with the average no value from - . (b)-(f) there were no significant associations between other air pollutants (so , co, o , pm . , and pm ) and r (meta χ < . , p> . ). we controlled the effects from temperature and relative humidity in the multiple linear regression models.
we calculated the daily r values of cities in hubei (except wuhan) from january to february , (there were few covid- confirmed cases in these cities afterwards) and normalized them based on wuhan's daily r value to eliminate the effects of other covariates. we found that hubei cities (except xianning city) had significantly positive correlations between no concentration (with -day time lag) and r (r> . , p< . ), suggesting a positive association between daily no concentration and covid- spread ability on the temporal scale (figure ) . the same conclusion was reached for other time lag settings, but the most significant value was obtained with a delay of days. the results of residual analysis and principal component analysis were shown in figure s and figure s , respectively. temporal correlation between no concentration and r in cities in hubei. except for xianning, all of those cities had significant positive correlations (r> . , p< . ) between no (with -day time lag) and daily r (normalized based on wuhan's daily r ). to eliminate the effects of city population and city area on the relationship between no concentration and r value, we applied a mediation analysis to verify whether more densely populated cities had both greater r and no concentration values. after adjustment for temperature and relative humidity, the mediation analysis found insignificant direct and indirect effects of city population and city area on r (z=− . , p= . & z=− . , p= . and z=− . , p= . & z= . , p= . , respectively) . thus, there were no apparent mediation effects between city population, city area, no , and r . city population and city area did not influence the association between no concentration and r . this study explored the association between environmental factors and covid- transmission. to our knowledge, little research has been done on the relationship between ambient air pollution and covid- transmission. 
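The time-lagged correlation described above can be illustrated with a short sketch. This is not the authors' code: the function names, the ratio-based normalization against a reference city, and the example series are all assumptions made for illustration, and the lag is left as a parameter because the value used in the study is not recoverable from this text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def normalize_by_reference(city_r, reference_r):
    """Divide a city's daily R by the reference city's R on the same day.

    Assumption: normalization is a simple ratio; the source does not say how
    the values were normalized against Wuhan.
    """
    return [c / ref for c, ref in zip(city_r, reference_r)]


def lagged_correlation(no2, r_values, lag_days):
    """Correlate the pollutant on day t with R on day t + lag_days."""
    if len(no2) != len(r_values):
        raise ValueError("series must cover the same days")
    if lag_days < 0:
        raise ValueError("lag_days must be non-negative")
    if lag_days == 0:
        return pearson_r(no2, r_values)
    # Drop the last lag_days pollutant values and the first lag_days R values
    # so that each pair is (pollutant on day t, R on day t + lag_days).
    return pearson_r(no2[:-lag_days], r_values[lag_days:])


# Hypothetical daily series, for illustration only.
no2 = [30.0, 28.0, 35.0, 40.0, 33.0, 25.0, 22.0, 27.0]
city_r = [2.1, 2.0, 1.9, 1.8, 2.2, 2.4, 2.0, 1.6]
wuhan_r = [3.0, 2.9, 2.8, 2.6, 2.5, 2.4, 2.2, 2.0]

normalized = normalize_by_reference(city_r, wuhan_r)
for lag in range(0, 4):
    print(lag, round(lagged_correlation(no2, normalized, lag), 3))
```

Scanning several lags, as in the loop above, mirrors the statement that other lag settings were also tried. A full analysis would also need a significance test for each coefficient, which this sketch omits.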
our results show a significant association between no exposure and r , suggesting that ambient no may contribute to the transmissibility of covid- . to prevent city population and city area from affecting the relationship between no concentration and r level, we applied a mediation analysis to verify whether more densely populated cities have both greater r values and higher no concentrations. the results showed that city population and city area did not influence the association between no concentration and r level. the closures of cities throughout hubei occurred at approximately the same time (the other cities of hubei were locked down no more than - days after wuhan city), but the lockdown was not expected to have the same influence on the association between no and covid- transmission in different cities (e.g., cities with busy traffic vs. small rural cities). multiple factors (the population density of the city, typical road traffic, commercial exchanges, etc.) may still have confounded the association in the current analysis, but we controlled for as many factors as possible to reduce confounding and strengthen our results, including population density, gdp per capita, and hospital beds per capita. previous studies have also suggested that the increased transmissibility resulting from no exposure might be caused by the effects of no on the host defenses that prevent viral spread (becker and soukup, ). chen et al. ( ) found that exposure to no may harm human health by interacting with the immune system, and mills et al. ( ) observed that short-term exposure to no increased hospital admission rates for a range of respiratory diseases in different age groups. therefore, we speculate that no may have the potential to contribute directly to the infection process of covid- .
in addition, no emissions primarily come from burning fossil fuels (diesel, gasoline, coal) and are released in automobile and smokestack exhaust, the latter of which can be produced by electricity generation. therefore, changes in no levels can be used to indicate changes in human activity and population movement due to the lockdown of cities. for example, since january , , after the closure of wuhan, the daily average concentration of no has been clearly lower than in the same period in previous years (figure ). moreover, it is well known that respiratory viruses spread through contact (direct, or indirect via fomites) or through contaminated droplets emitted when infected individuals cough, sneeze, breathe, or speak, both of which are related to human contact, social distance, and population movement. because no is an indicator of traffic-related air pollution, the association between no and the r of covid- may be explained by the relationship between viral spread and population movement. further investigations are warranted to provide additional details and clarify this mechanism. our study has some limitations. first, the averaging of no concentrations across cities likely resulted in an unknown degree of exposure misclassification, given the spatial variability and traffic dependence of no and the potential for indoor exposure. second, r can be highly variable and is influenced by a variety of factors, including not only the previously mentioned mitigation efforts but also the comprehensiveness of case identification. third, because corresponding data for no were lacking, we did not explore the association between the primary pollutant no and the transmissibility of covid- .
given the ecological nature of this study, other city-level factors, such as the implementation ability of covid- control policy, urbanization rate, and availability of medical resources, may affect the transmissibility of covid- and confound our findings. future studies should develop individual-based models with high spatial and temporal resolution to assess the correlations between air pollution and the epidemiologic characteristics of covid- . the mechanisms between no and the transmission of covid- disease still require further research, besides, the spread of covid- could be affected by many factors. we also believe that there is likely to have interaction of environmental factors and npis, which deserves further analysis. introduction to model parameter estimation effect of nitrogen dioxide on respiratory viral infection in airway epithelial cells personal exposure to nitrogen dioxide (no ) and the severity of virus-induced asthma in children outdoor air pollution: nitrogen dioxide, sulfur dioxide, and carbon monoxide health effects effect of nitrogen dioxide exposure on susceptibility to influenza a virus infection in healthy adults acute effects of air pollution on influenza-like illness in nanjing, china: a population-based study relationship between ambient air pollution and daily mortality of sars in beijing control of confounding and reporting of results in causal inference studies. 
guidance for authors from editors of respiratory, sleep, and critical care journals quantitative systematic review of the associations between short-term exposure to nitrogen dioxide and mortality and hospital admissions long-term exposure to pm and no in association with lung volume and airway resistance in the maas birth cohort national health commission of the people's republic of china.diagnosis and treatment guideline on pneumonia infection with novel coronavirus assessing nitrogen dioxide (no ) levels as a contributing factor to coronavirus (covid- ) fatality effectiveness of control strategies for coronavirus disease : a seir dynamic modeling study human exposure to no in school and office indoor environments environmental role in influenza virus outbreaks short-term effects of pm and no on respiratory health among children with asthma or asthma-like symptoms: a systematic review and meta-analysis coronavirus disease (covid- ) situation report - evolving epidemiology and transmission dynamics of coronavirus disease outside hubei province, china: a descriptive and modelling study the funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing the report. the corresponding author had full access to all of the study's data and takes final responsibility for the decision to submit for publication. the authors declare no competing interests. ☒the authors declare the following financial interests/personal relationships which may be considered as potential competing interests: none key: cord- -xyowl authors: wang, lishi; li, jing; guo, sumin; xie, ning; yao, lan; cao, yanhong; day, sara w.; howard, scott c.; graff, j. carolyn; gu, tianshu; ji, jiafu; gu, weikuan; sun, dianjun title: real-time estimation and prediction of mortality caused by covid- with patient information based algorithm date: - - journal: sci total environ doi: . /j.scitotenv. . 
sha: doc_id: cord_uid: xyowl the global covid- outbreak is worrisome both for its high rate of spread, and the high case fatality rate reported by early studies and now in italy. we report a new methodology, the patient information based algorithm (piba), for estimating the death rate of a disease in real-time using publicly available data collected during an outbreak. piba estimated the death rate based on data of the patients in wuhan and then in other cities throughout china. the estimated days from hospital admission to death was (standard deviation (sd), days). the death rates based on piba were used to predict the daily numbers of deaths since the week of february , , in china overall, hubei province, wuhan city, and the rest of the country except hubei province. the death rate of covid- ranges from . % to % and may decrease in the future. the results showed that the real death numbers had fallen into the predicted ranges. in addition, using the preliminary data from china, the piba method was successfully used to estimate the death rate and predict the death numbers of the korean population. in conclusion, piba can be used to efficiently estimate the death rate of a new infectious disease in real-time and to predict future deaths. the spread of -ncov and its case fatality rate may vary in regions with different climates and temperatures from hubei and wuhan. piba model can be built based on known information of early patients in different countries. • the mortality rate determines whether a highly infectious disease becomes a public concern. • summarizing information after the fact does not contribute to real-time readiness to deal with the disease. • the patient information based algorithm (piba) estimates the death rate of a disease in real-time. • piba can be used to estimate the death rate of a new infectious disease in real time and to predict future deaths. 
the mortality rate is the most important factor that determines whether a highly infectious disease becomes a public concern and carries the risk of causing a pandemic. different virus epidemics take place throughout the world every year, but only a few rise to the level of public concern (schlagenhauf and ashra, ; viboud and simonsen, ; who ebola response team, ). severe acute respiratory syndrome (sars), swine influenza a h n virus (h n ), and zaire ebolavirus (ebola) drew public attention because they caused many severe infections and thousands of deaths (dawood et al., ; nicholls et al., ; who ebola response team, ). similarly, covid- , the disease caused by a coronavirus ( -ncov), attracted worldwide attention and caused public panic because many deaths were reported without being put in the context of the many mild infections and the potentially low case fatality rate (chan et al., ; huang et al., ; wang et al., ; wu et al., ). for example, influenza rarely causes public concern even though it is a common infection, because it leads to death in only . % of cases. a variety of reports indicate that -ncov is highly infectious through multiple routes (huang et al., ; wu et al., ). while the high infection rate is certain, the mortality rate of covid- has not been definitively determined. it is reasonable to suspect that the death rate implied by the earliest reports by chinese scholars, six deaths among the first patients ( %) in wuhan (huang et al., ), was inaccurate. when the initial mortality rates were reported, only patients who were critically ill were included. patients with mild symptoms, as well as those with asymptomatic infections, were not analyzed (huang et al., ; wu et al., ). the case-fatality rate reported by huang et al. ( ) was based on a skewed patient sample, since it included only a small number of patients who had been transferred from other hospitals due to their critical condition.
therefore huang et al.'s sample was skewed towards a concentration of severely ill patients, while the general patient population includes more patients with covid- who are asymptomatic or only have mild symptoms and who have not been hospitalized. chen et al. ( ) reported an % death rate, again based on patients with severe conditions. we have estimated the mortality rate using a patient information based algorithm (piba). the piba uses patient data in real-time to build a model that estimates and predicts death rates for the near future. piba uses data of patients identified early in the disease process to calculate the average number of days from hospitalization to death for those hospitalized. another feature is to take into account variations based on mathematical models. the piba calculation method does not divide the total number of patients on a day by the number of deaths on the same day. instead, the piba calculation method divides the number of deaths on that day by the number of possible patients of a day or days when the patients have just begun to develop the disease. thus, piba comprehensively and reasonably estimates the mortality rate based on the actual number of deaths and estimates the number of patients on a specific day. as time goes on, large amounts of data from northern and southern china have been accumulated through continuous reporting, all of which are used by piba, which then becomes more accurate as data accumulates. we conclude that it is time to utilize the accumulated data to estimate the case fatality rate of covid- infection. based on national data from the china national health center, the covid- death rate is much lower than that reported in huang et al. ( ) . holistic data covering all of wuhan, the epicenter city of covid- , also indicates a death rate lower than that reported by huang et al. these data sources cover a larger patient sample, and include patients displaying symptoms with varying levels of severity. 
therefore, the updated estimation of the death rate should reference these larger-scale and more representative data. our study contributes to knowledge of the covid- death rate by building on huang et al.'s ( ) estimation and available data from official websites, and by addressing its limitations with a larger and more representative sample.

. . steps for estimating and predicting mortality using piba

1) collect data from each patient's initial admission to death, striving to cover an adequate number of patients.
2) calculate the average number of days (μ) from hospital admission to death and the numbers of days at one standard deviation (μ ± σ) and two standard deviations (μ ± 2σ) from the mean.
3) use these parameters (μ, μ ± σ, μ ± 2σ) to calculate the daily mortality during the epidemic.
4) predict the future mortality of the infectious disease from the calculated known mortality combined with the number of patients in a region. the predicted numbers are compared with real mortality to test and correct the model.
5) make follow-up modifications to the piba model for different nationalities and regions. in particular, the initial patient data collected may vary significantly from country to country, from one ethnic group to another, and from region to region.

the calculation based on the number of deaths and the number of patients on the same day does not reflect the real death rate because most patients with covid- do not die on the same day that they enter the hospital (chan et al., ; huang et al., ; wang et al., ). with the piba method, we recognize that the patient population size was inaccurate in the early days but trust the published information on patients who died right after the covid- outbreak began. the estimation is built upon data from patients with a normal distribution model.
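Step 2 of the procedure above, deriving the five lag values from observed days to death, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the use of the sample standard deviation, rounding to whole days, and clamping negative lags to zero are all assumptions, and the sample is invented.

```python
from statistics import mean, stdev


def lag_parameters(days_to_death):
    """Return the five lag values (in whole days) used as candidate delays.

    Assumptions: sample standard deviation, rounding to the nearest day,
    and clamping negative lags to 0.
    """
    mu = mean(days_to_death)
    sigma = stdev(days_to_death)
    raw = {
        "mu-2sigma": mu - 2 * sigma,
        "mu-sigma": mu - sigma,
        "mu": mu,
        "mu+sigma": mu + sigma,
        "mu+2sigma": mu + 2 * sigma,
    }
    # A lag cannot be negative, so values below zero are set to zero.
    return {name: max(0, round(value)) for name, value in raw.items()}


# Hypothetical sample of days from admission to death for a few patients.
sample_days = [10, 12, 14, 16, 18]
print(lag_parameters(sample_days))
```

Each of the five returned lags then defines one cohort for the daily death-rate calculation in step 3.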
based on information about patients in wuhan who died during the period between dec , , to jan , (huang et al., ) , two parameters were used to estimate days from onset of symptoms to death and days from admission to the intensive care unit (icu) to death. these two parameters are adopted in the estimation and prediction of covid- death rate. each parameter has five values including the mean, μ, one standard deviation from the mean, μ ± σ, and two standard deviations from the mean, μ ± σ. we collected data from covid- patients in china from three public websites. the data from the whole country are collected and made available on the official website of the health emergency office of the national health commission of the people's republic of china at http://www.nhc.gov.cn/yjb/new_index.shtml. the data from hubei province and wuhan are from the health commission of hubei province at http://wjw.hubei.gov.cn/fbjd/dtyw/. these data include the number of patients with covid- who were confirmed as having the disease, who died from the disease, whose condition was severe, and who were admitted to the hospital or icu. other collected data included daily new cases, new deaths, people who were in close contact with an infection source, and accumulated number of patients. we paid particular attention to data from wuhan, plus two additional cities in hubei province, xiaogan, and huanggang, in which the number of patients was higher than in other cities in hubei province. information from a northern province, heilongjiang province, was collected from the official website of outbreak information of the health commission of heilongjiang province at http://wsjkw.hlj.gov.cn/index. php/home/zwgk/all/typeid/ . data of heilongjiang province and harbin city were included because the province is located in the northern high-altitude zone. these data are used to assess whether the covid- is more, less, or equally likely to spread to an area with a cold climate. 
collected information included the numbers of patients and deaths for each city and for the whole province. for any day with missing data, a formula was used to estimate the value for that day:

n_i = {(n_(i+j) + n_(i−j)) / (j + )} + n_(i−j)

where n_i = the estimated value of the missing data on day i, and j is the number of days of missing data, usually ; in rare cases, data for two consecutive days may be missing. if the data for two days are missing, the first day is treated as day i, and the second day, n_(i+ ), is calculated as n_(i+ ) = n_i + {(n_(i+j) + n_(i−j)) / (j + )}.

based on the days between confirmation of covid- and death in the hospital, calculated from wuhan as described in the methods, and on information from the whole country and hubei province, we tested which number of days from diagnosis to death most likely reflects the actual death rate. the estimated days were used to estimate the death rate using data from hubei province and wuhan city with the five values from above (μ, μ ± σ, μ ± 2σ). to account for the contribution of each of these values to the estimation, we factored the five values (μ, μ ± σ, and μ ± 2σ) into the piba and built the testing model as follows.

1) m_i = (d_i − d_(i− )) / (p_(i−n) − p_(i−n− )) (death rate at increments)
2) m_i = d_i / p_(i−n) (death rate at accumulative numbers)

where m_i = mortality rate, d_i = the cumulative number of deaths on day i, p_i = the cumulative number of patients on day i, i = the current day for calculating the death rate, and n = the number of days from severe infection to death. treating these five values as points on a normal distribution spaced one standard deviation apart, each of the five death rates calculated above for each day has its own weight according to the normal distribution (μ = . %, μ − σ = μ + σ = . %, and μ + 2σ = μ − 2σ = . %).
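The two lagged death-rate definitions and the weighting idea can be sketched in a few lines. The sketch rests on stated assumptions and is not the published implementation: increments are taken as one-day differences (the indices in the extracted formulas are incomplete), the weights are derived from the standard normal density at 0, ±1, and ±2 standard deviations and renormalized (the actual percentages are not recoverable here), and all function names and numbers are hypothetical.

```python
from math import exp


def incremental_rate(deaths, patients, i, n):
    """New deaths on day i divided by new patients on day i - n.

    deaths and patients are cumulative daily counts indexed from day 0.
    Assumption: increments are one-day differences.
    """
    if i - n - 1 < 0:
        return None  # not enough history for this lag
    new_deaths = deaths[i] - deaths[i - 1]
    new_patients = patients[i - n] - patients[i - n - 1]
    if new_patients <= 0:
        return None
    return new_deaths / new_patients


def cumulative_rate(deaths, patients, i, n):
    """Cumulative deaths on day i divided by cumulative patients on day i - n."""
    if i - n < 0 or patients[i - n] <= 0:
        return None
    return deaths[i] / patients[i - n]


def normal_weights(offsets=(-2, -1, 0, 1, 2)):
    """Weights proportional to the standard normal density, summing to 1.

    Assumption: this stands in for the weights reported in the source.
    """
    densities = [exp(-0.5 * k * k) for k in offsets]
    total = sum(densities)
    return [d / total for d in densities]


def weighted_rate(rates, weights):
    """Weighted average of the available lagged rates (None values skipped)."""
    pairs = [(r, w) for r, w in zip(rates, weights) if r is not None]
    if not pairs:
        return None
    total_weight = sum(w for _, w in pairs)
    return sum(r * w for r, w in pairs) / total_weight


# Hypothetical cumulative counts and lags, for illustration only.
deaths = [0, 0, 1, 2, 4, 7, 11, 16, 22, 29]
patients = [10, 20, 40, 70, 100, 130, 150, 165, 175, 180]
lags = [1, 2, 3, 4, 5]  # stand-ins for mu-2sigma ... mu+2sigma

day = 9
rates = [incremental_rate(deaths, patients, day, n) for n in lags]
print(weighted_rate(rates, normal_weights()))
```

Renormalizing over the lags that have enough history is a design choice made here so that early days still yield an estimate; the source does not say how such days were handled.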
from here, we can assign the death rate for each day a single value: the weighted average over all five cohorts of patients, as defined by the time from severe illness to death. the equation is as follows:

d = w_(μ−2σ) m_(μ−2σ) + w_(μ−σ) m_(μ−σ) + w_μ m_μ + w_(μ+σ) m_(μ+σ) + w_(μ+2σ) m_(μ+2σ)

where d = death rate, m_μ = mortality rate with μ days, w_μ = weight with μ days gap, μ = mean in normal distribution, σ = standard deviation.

. . confirmation of the best estimation of the days to calculate the death rate in the other cities

the same formula was then used to estimate the death rate from the other two cities in hubei province, namely xiaogan and huanggang. the piba model was developed using data from hubei province, including wuhan, xiaogan, and huanggang, and was further validated using data from heilongjiang province and harbin city. piba was then used to predict trends in the number of new deaths.

a. distribution of days between disease symptoms and death and between time of icu admission and death. vertical axis: days, horizontal axis: cases. b. estimated days from first symptoms to death and days from icu admission to death. c. lagging days (days from first symptoms to the day of death), μ, μ ± σ and μ ± 2σ, and their weights (in percentages) used for the estimation of the death rate in the broader patient population. note: among the values above, the lagging day μ − σ from symptom confirmation to death in panel b, which equals − , has been set to .

in order to further test the validity of our piba method in predicting actual mortality, we used a combination of the curve trend data and the overall mortality rate of the country, hubei, wuhan, and the rest of the country (china overall except hubei). based on our prediction of the days from actual hospitalization to death, we separately predicted the number of deaths on each day of the coming week.
that is, from the comprehensive information of the number of new patients on the seventh day, the th day, and the th day before the targeted prediction day, we obtained three numbers of deaths for each of the predicted days. then from three of these numbers, the lower and upper values of the number of deaths on that day are used as the minimum and the maximum number of predicted deaths on that day, respectively. also, the same formula was used to predict the death number of a week in south korea. using information published by wuhan, we calculated the days between icu admission and death. we obtained the actual data from patients who died in the hospital in wuhan. the days from onset of symptoms to deaths ranged from to (see fig. a ). from icu intake to death, the shortest number is one day, and the longest is days. we derived two parameters, each from the death cases, i.e., the days from onset of symptoms to death and the days from inpatient admission to death. since there are six patients out of these death cases who have the same date of symptoms' appearance and inpatient, there were values in the dataset related to inpatient and values in another dataset related to symptoms' appearance (fig. a) . the results indicated that the average time from onset of the symptoms to death is days (m = , s.d. = ) (see fig. b ). accordingly, the lagging days from the day of death and their weight in the calculation of death rate were derived based on the new inpatient days (fig. c) . the prediction of death rate is based on data from wuhan city in which patients diagnosed with covid- had been confirmed since january , and where deaths had occurred, which were among the first confirmed cases of coronavirus. . . estimated death rate for the whole country and hubei province using piba formula according to our five estimation parameters, from illness (i.e., symptom appearance) to death, the maximum number of days is days. 
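The prediction step described earlier in this passage, three lagged estimates per target day with the lowest and highest taken as the predicted range, can be sketched as below. This is an illustrative reading, not the authors' code: the function name is hypothetical, only the seven-day lag is stated in the text (the other two lags here are placeholders), and the death rate and case counts are invented.

```python
def predict_death_range(new_patients, target_day, lags, death_rate):
    """Predict a (minimum, maximum) number of deaths for target_day.

    new_patients[d] is the number of new patients reported on day d.
    For each lag, the estimate is new patients on day (target_day - lag)
    multiplied by the assumed death rate.
    """
    estimates = []
    for lag in lags:
        source_day = target_day - lag
        if 0 <= source_day < len(new_patients):
            estimates.append(new_patients[source_day] * death_rate)
    if not estimates:
        raise ValueError("no reported data available for any of the lags")
    return min(estimates), max(estimates)


# Hypothetical daily new-patient counts for days 0..9.
new_patients = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]

# Predict deaths for day 12 (three days beyond the last report).
# Lags other than 7 are placeholders; 0.02 is an assumed death rate.
low, high = predict_death_range(new_patients, target_day=12,
                                lags=(7, 9, 11), death_rate=0.02)
print(low, high)
```

Because every lag is at least seven days, a full week of predictions can be produced from data that have already been reported, which is what makes the forecast usable in real time.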
the earliest reported data in wuhan was published on january , . based on these data, we were able to calculate the mortality rate from february , , to the present. however, on february , the national health committee revised the data again (see appendix table ). because of this amendment, the number of confirmed cases appeared to have changed significantly in only one day. we chose the calculation results from february up to february (appendix table ), considering that the death rates on february and february are likely distorted by this sharp rise within a short term. fig. a through d provide information about the overall death rates in mainland china (hereafter referred to as country), hubei, wuhan, and rest of country (excluding hubei) (appendix table ). we noticed that the death rate at increments based on piba in the whole country (in blue) in fig. a is below %, with most values between . % and % in the last five days. the death rate in hubei province is similar to that of the whole country because % of the patients in the whole country were from hubei province (see appendix table ) (fig. b) . in wuhan, the accumulated death rate was still high, as much as % (fig. d ). when we used the data from the rest of the country to test our piba formula, as expected, the curve is different from the curves from hubei and wuhan. unlike in hubei and wuhan, the death rate of the rest of the country is much lower and stable, mostly lower than % (fig. c) . the predicted death rate will remain between % and % for the near future. xiaogan and huanggang are the two cities in hubei province. the number of patients with covid- in these two cities is higher than in other cities in hubei except wuhan. they also are the cities with the largest number of patients with covid- in china. we, therefore, tested the piba formula using data from these two cities. 
currently, the death rate based on the increment data is around %, lower than that in wuhan but higher than that in the rest of the country. however, according to piba, the rate of deaths may decrease in the near future. heilongjiang province, including its capital city, harbin, is the province outside of hubei with the largest number of diagnosed patients. harbin city is located in the northeast of china and is in the coldest area in china. no patients from harbin city or the heilongjiang province were reported during the sars epidemic period. we used the piba formula to estimate the death rate in both the heilongjiang province (fig. c) and harbin city (fig. d) . the death rate of harbin decreased sharply in the past several days, into %. the low rate of less than % will possibly remain for the future. based on the piba and the death rate of accumulated numbers, the expected final death rate of the whole country, hubei, wuhan, and rest of the country except hubei, is predicted as follows (see table ). the predicted values are from the intersection points between the incremental estimation and net values estimation. we used the predicted death rate to calculate the potential number of deaths per day in the coming week. because our initial estimation on the lagging days between inpatient and death was only based on fig. . death rate estimations of four places. the blue curve represents the mortality calculated by the actual increase in deaths per lagging day divided by the increase in actual patients on the previous corresponding day. the gray curve represents the total number of deaths per lagging day, divided by the total number of identified actual patients on the corresponding previous day. the orange curve shows the number of deaths per day divided by the total number of patients the same day. numbers on the vertical axis represent the death rate; on the horizontal axis is the date. a. the death rate of xiaogan city in hubei province b. 
death rate of huanggang city in hubei province. c. the death rate in heilongjiang province. d. the death rate in harbin city. patients, we, therefore, used the days of average days plus ( days) and minus one standard deviation ( days) as the range of number of deaths on a given day in the coming week (see appendix table . predicted number of deaths in the days of the coming week after february , ). as shown in fig. , the actual number of deaths in the past four days fell into the predicted range. in the country (fig. a) , hubei (fig. b) , and wuhan (fig. c) , the numbers of actual death were near the predicted minimum numbers. while, for the rest of the regions of the country except hubei, the actual death data fluctuates between the predicted maximum and minimum values (fig. c) . due to the number of newly infected patients dropping in the last few days, the total number of patients tends to be constant or even less in the coming days if unexpected events do not occur. the peaks in these figures reflect sudden changes in numbers of patients (see fig. ). we believe that the intersecting point between the trendlines could reasonably be considered one of the rates in its range of the death rate of patients infected in the future. as shown in the data above, the incidence in mainland china's provinces and cities was basically zero in late middle march. because of this, we were not able to prove the feasibility of this method in more regions in mainland china. however, because the environment, medical conditions, and population races in different countries are different, to test the usefulness of the piba model in other countries, we need to get the basic information of the initial population. this information includes the specific number of days from onset to death of a reasonable number of patients in different regions of different countries. at present, we could not access these data accurately. 
the only thing we can do is to test asian countries such as south korea and japan, based on their ethnic similarities with populations in china. taking all aspects into consideration, we believe that south korea's data are more reliable. therefore, we further tested our model using the affected population in south korea. as shown in fig. , the trend of deaths in south korea in recent days is consistent with our prediction.

fig. . comparison between the predicted number of deaths based on piba and the actual number of deaths. the blue line represents the estimated minimum number of deaths. the orange line represents the estimated maximum number of deaths. the gray line represents the actual number of deaths. panels a, b, c, and d show these death numbers in the country, hubei, wuhan, and the rest of the country except hubei.

first, piba is capable of accurately estimating the disease mortality and the number of future deaths. this real-time prediction and estimation of disease mortality provides the public, government, and society with more accurate disease information. based on currently available data that include patients with varying degrees of severity, the estimated mortality rate of covid- is less than %, lower than the prior prediction based on limited available data. this finding may ease public concern and panic. updated scientific findings will be widely disseminated to broaden public awareness and help fight covid- . the medical, clinical, and research community should strive to publish scientifically rigorous findings related to urgent public health issues. publishing findings based on limited data contributes to unnecessary public concern and government action. in this particular case, the first report on the estimation of the coronavirus death rate is an applaudable effort.
however, it also had the limitations of a skewed dataset that focused on patients who were transferred from local hospitals because of their critical condition while excluding patients with less severe symptoms who remained at local hospitals. as soon as more data are available, we should provide updated reports and introduce improved estimation and prediction algorithms. this study indicates that as the number of transmissions of ncov increases among the human population, its lethality will gradually decrease. indeed, the reasons are not necessarily all because of their reduced toxicity. there may also be improvements in treatments and implementation of early detection methods. therefore, a real-time estimate of death rate using patient information such as the piba method would demonstrate an appreciation of the importance of public and societal awareness. a critical issue to consider is that if the mortality rate of the covid- in a certain area is relatively high, the covid- in the area is still spreading and endemic. one of the most obvious questions is why the fig. . test piba model using covid- population from south korea. a. estimation of death rate in the korean population using the piba method. the blue curve represents the mortality calculated by the actual increase in deaths per lagging day divided by the increase in actual patients on the previous corresponding day. the gray curve represents the total number of deaths per lagging day, divided by the total number of identified actual patients on the corresponding previous day. the orange curve shows the number of deaths per day divided by the total number of patients the same day. the number on the vertical bar represents the death rate, number on the horizontal bar shows the date. b. comparison between the predicted number of deaths based on piba and the actual number of deaths. the blue color represents the estimated minimum number of deaths line. 
the orange color represents the estimated maximum number of deaths line. the gray line represents the actual deaths. mortality rate in wuhan is considerably higher than in other places. based on our assessment, wuhan's medical equipment and rescue measures are comparable with other areas in china, and the pathogenicity of the virus is similar. we conclude that there is a large proportion of patients in wuhan who have mild illness and not been hospitalized at all. due to the uncertainty of the movement of infected people in the early stages of the onset, these mildly ill people move around in wuhan unidentified. this problem reminds other parts of the world that if the fatality rate of the covid- is found to be high, a large number of infected people have not been able to be identified or diagnosed. therefore, the work of controlling and isolating this infected group has not been completed, and the disease is still spreading and circulating in the area. the data on heilongjiang province and harbin show that, unlike some experts' predictions (cf. https://news.ifeng.com/c/ uhmhxcfhmq), it will occur more intensely in the high-altitude regions with a cold climate, and the mortality rate will be higher. with the development of the generations of -ncov, its toxicity will gradually weaken, and we expect that the mortality rate in the cold northern regions will not increase, nor will it exceed that in wuhan or hubei province. our research has limitations, mainly due to available data. first, the estimation of number of patients from the date of hospital admission or icu intake to the date of death is based on data from official public websites. information from individuals was estimated. if the information had been available regarding more patients, the initial estimate would have been more accurate. the second aspect is the accuracy of the number of patients diagnosed and the number of hospitalizations per day. 
due to the back and forth revision and correction of the data as announced by the official sources, we are not confident that all the data are error-free; however, we feel that these data as a whole are reliable. the third limitation of the piba method is that it depends on accurate patient information at the beginning of the epidemic. depending on different situations from different countries or regions, this information may or may not be available, or the information may not be accurate. the piba model accurately predicted a case fatality of . % for symptomatic patients in china at a very early stage in the covid- pandemic. the model can be generalized to predict case fatality for any infection (including asymptomat), to predict the rate of severe disease, and to predict the death rate for patients who develop severe disease. these early, accurate predictions inform the public, society, and governments to estimate the extent of the disease's harm and to develop suitable strategies. supplementary data to this article can be found online at https://doi. org/ . /j.scitotenv. . . the authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. 
this work was partially supported by funding from merit grant i bx to wg from the department of veterans affairs and the veterans administration medical center in memphis, tn, usa and grant dduc to cg from u.s. department of health and human services, administration for community living. revise and approve and manuscript: all authors. all the data of patients in this study are from official public websites. key: cord- -mar zt t authors: su, liang; ma, xiang; yu, huafeng; zhang, zhaohua; bian, pengfei; han, yuling; sun, jing; liu, yanqin; yang, chun; geng, jin; zhang, zhongfa; gai, zhongtao title: the different clinical characteristics of corona virus disease cases between children and their families in china – the character of children with covid- date: - - journal: emerg microbes infect doi: . / . . sha: doc_id: cord_uid: mar zt t this study aims to analyze the different clinical characteristics between children and their families infected with severe acute respiratory syndrome coronavirus . 
clinical data from nine children and their families were collected, including general status and clinical, laboratory, and imaging characteristics. all the children tested positive after the onset of illness in their family members. three children had fever ( . %) or cough ( . %), and six ( . %) children had no symptoms. among the adult patients, the major symptoms included fever ( . %), cough ( . %), chest tightness/pain ( . %), fatigue ( . %) and sore throat ( . %). nearly % of the patients had normal ( . %) or decreased ( . %) white blood cell counts, and % ( / ) had lymphocytopenia. there were adults ( . %) who showed abnormal imaging. the main manifestations were pulmonary consolidation ( %), nodular shadow ( %), and ground glass opacity ( %). five discharged children were readmitted because their stool tested positive for sars-cov- by pcr. covid- in children is mainly caused by family transmission; their symptoms are mild and their prognosis is better than that of adults. however, their stool pcr results remained positive for longer than those of their family members. because the clinical course is mild or asymptomatic, early recognition is difficult for pediatricians and public health staff. in late , an outbreak of pneumonia of unknown etiology was found in wuhan, hubei province, china. the pathogen was soon isolated and named the novel coronavirus ( -ncov) on january [ ], and on february the international committee on taxonomy of viruses announced its official classification as severe acute respiratory syndrome coronavirus (sars-cov- ). the virus spread very fast in wuhan. even more unfortunately, as the chinese spring festival was approaching, the movement of large numbers of people caused it to spread quickly across the country and even to more than countries [ ] . current case reports are mainly concentrated in hubei province and in adults, and reports of cases in children outside hubei province are rare. 
meanwhile, the clinical characteristics of cases in hubei province and other provinces were significantly different. here, we report the clinical manifestations, laboratory test results, imaging characteristics, and treatment regimen of nine sars-cov- infected children and their families in jinan, shandong province to increase awareness of this disease, especially in children. a retrospective review was conducted of the clinical, lab tests, and radiologic findings for nine children and their families admitted to the jinan infectious diseases hospital identified to be nucleic acid-positive for sars-cov- from january to february . sample collection and pathogen identification after admission to the hospital, respiratory tract samples including sputum and nasopharyngeal swabs were collected from the patients, which were tested for influenza, avian influenza, respiratory syncytial virus, adenovirus, parainfluenza virus, mycoplasma pneumoniae and chlamydia, along with routine bacterial, fungal, and pathogenic microorganism tests. real-time pcr used the sars-cov- (orf ab/n) nucleic acid detection kit (bio-germ, shanghai, china) and performed refer to previous literature [ ] . all the patients were recorded with basic information and epidemiological histories [ ] including ( ) history of travel or residence in wuhan and surrounding areas or other reported cases within days of onset; ( ) history of contact with new coronavirus infection (nucleic acid-positive) days before onset; ( ) history of contact with patients with fever or respiratory symptoms from wuhan and surrounding areas, or from communities with case reports within days before onset; ( ) cluster onset, along with disease condition changes. laboratory test results were compiled, including standard blood counts, blood biochemistry, c-reactive protein (crp), procalcitonin (pct), erythrocyte sedimentation rate(esr), interleukin- (il- ) and myocardial enzyme spectrum. 
additional data collected included medical imaging, treatment regimens, and prognosis (any severe complications, including death), and recover or discharge date (table ) . this study was conducted in accordance with the declaration of helsinki. informed consent was waived because of the retrospective nature of the study and the analysis used anonymous clinical data. continuous data are expressed as medians and ranges, and categorical data are presented as counts and percentages. there were three boys, six girls and their families admitted to jinan infectious disease hospital of shandong university were investigated in this study. the youngest of the nine children was a pair of elevenmonth-old twins and the oldest is nine years and months old (mean age was . years, median age . years, table ). there were families were infected by sars-cov- , and adults were enrolled in this study (two patients hospitalized in another hospital). the patients consisted of males and females with a mean age of . years (median age, years [range, - years]). all nine pediatric patients came from eight families. as shown in table , six children had no information from the epidemiological data, / ( %) of the adults were infected through household contact, ( . %) was found to be infected after returning from wuhan or hubei in late january and ( . %) patients couldn't find the exact source of infection. as shown in table , / ( . %) children had normal or decreased white blood cell counts, consistent with the main characteristic of viral infection. six children ( . %) showed increased ck-mb. alt, ast and the other index of liver and kidney were all normal. all inflammation indicators, including crp, pct, esr and il- were all within the normal range. two children ( . %) showed bronchitis and one ( . %) showed bronchial pneumonia. one ( . 
%) boy (the older of the twins) showed pulmonary consolidation and ground glass opacity on the first day of admission (figure (a)), which disappeared after five days (figure (b)). five other ( . %) children showed no abnormality on chest radiograph. all the adult patients had normal ( / , . %) or decreased ( / , . %) white blood cell counts, and ( . %) had lymphopenia. there were ( . %) patients who had increased crp, pct, serum amyloid a (saa), d-dimer and il- ; their ct scans also showed larger areas of lung consolidation. in contrast to the children, only two ( . %) adult patients showed increased ck-mb. ferritin levels were higher in the adult patients than in the children, but most of them were normal ( / , . %). chest imaging findings in adults were mixed; the most common were pulmonary consolidation ( %), nodular shadow ( . %), and ground glass opacity (ggo, . %) (figure ) . four ( . %) adults showed normal chest imaging. at present, there are no drugs available that target sars-cov- . therefore, treatment focused on symptomatic and respiratory support. all the children received inhaled interferon, and one of the twins was additionally prescribed ribavirin ( - mg/kg.d). ten ( . %) adults with pneumonia were treated with lopinavir/ritonavir ( / mg, tablets, bid), interferon and chinese medicine. patients with elevated infection indices (such as crp, pct, esr, saa, il- ) were additionally prescribed antibiotics for - days. all the nine children and adult patients recovered in - weeks and were discharged after two negative nucleic acid tests. unfortunately, our follow-up found that five discharged children were readmitted before we submitted this article because their stool tested positive for sars-cov- by pcr. meanwhile, all their family members were negative in all specimens. coronaviruses are a large family of viruses that are known to cause illness ranging from the common cold to more severe diseases. 
as an enveloped rna virus, cov is ubiquitous in humans, other mammals, and birds, and can cause respiratory, digestive, liver and nervous system disorders [ , ] . to date, six covs have been known to cause human infection [ ] . among them, two zoonotic viruses, sars-cov and mers-cov, were responsible for serious outbreaks: in china in - [ , ] of particular concern, our observations found that all the children were diagnosed after their family members, which indicates that they were infected through household contact. however, after an epidemiological investigation, we found that six adults ( . %) had a definite or suspected contact history and six families ( . %) who had contact with them were infected, while the other two patients ( . %) denied any epidemiological history. among them, the father of case had no contact with anyone who came back from wuhan or hubei and also denied contact with any person with respiratory symptoms. at the same time, official investigations did not identify anyone diagnosed with sars-cov- infection on the vehicle he had travelled on who could have transmitted the virus to him. in addition, according to official information, the source of infection cannot be traced for a growing number of patients, and a growing number of cluster outbreaks involve people who reported no contact, no close communication, and who had not even left their homes. we therefore think that these phenomena may suggest that: ( ) the virus spreads very readily, and its transmission may not be limited to contact, droplet and airborne transmission; aerosol transmission may also exist, as was the case for sars [ ] . ( ) the virus may be carried asymptomatically after infection while still being able to infect other people. in china, the sars outbreak of is still vividly remembered, because the - sars outbreak infected individuals, leading to deaths in eight affected areas [ ] . during the sars outbreak, there were fewer pediatric patients, and symptoms were significantly milder in children than in adults [ ] [ ] [ ] [ ] . 
similarly, the official data to date suggest that children infected with the sars-cov- are relatively rare too [ ] , and their overall symptoms are significantly mild. the main reasons for this phenomenon may be: ( ) the range of activities for children is relatively small, they are mainly infected by their adult families. and, as an rna virus, the sars-cov- virus maybe also is prone to mistakes in replication, mutating, and surviving without recognition by the immune system, but can also cause a decline in virulence. so, children are infected with second or third generation or even fourth generation virus and they get milder symptoms; ( ) it may be because of differences in the immune responses of children compared to adults. one hypothesis is that the innate immune response, that is the early response that is aimed broadly at groups of pathogens, tends to be more active in children. the innate immune system is the first line of defense against pathogens. cells in that system respond immediately to foreign invaders. the adaptive immune system, by contrast, learns to recognize specific pathogens, but takes longer to join the battle. if the innate immune response is stronger in children exposed to sars-cov- , they may fight off infection more readily than adults, suffering only mild symptoms. other coronaviruses, including sars and mers, also show this pattern [ ] . ( ) the number or function of ace receptors in children is not as good as in adults. recently, one studies had investigated the role of the ace receptor and found that the sars-cov- uses the sars-coronavirus receptor ace and the cellular protease tmprss for entry into target cells [ ] . as we know, the distribution of ace receptors in different organs and populations is different. therefore, it may be that different receptor levels or functions in children and adults lead to different severity of illness. 
( ) other reasons: such as children have fewer basic diseases, children smoke less, and children have strong self-healing capabilities and so on. ck-mb is an indicator of myocardial injury. in the present study, we found six children and two adults had high ck-mb, which means that sars-cov- can cause heart injury. it is reported that the main mechanisms of sars-cov- -induced myocardial injury may be the direct injury of virus, the inflammatory storm and the distribution of ace receptor [ ] . as human lifestyles change, more and more viruses are spreading across species. current research confirms that sars-cov- are transmitted from animals to humans. like other viruses, the relationship between sars-covs and humans has the following possibilities: ( ) the virus disappears for some unknown reasons, such as sars-cov. ( ) viruses coexist with humans and have seasonal onsets, such as flu influenza viruses. the first is the best outcome of the current situation, but the second possibility is very large. if, as we analyzed above, many people, especially children with mild or no clinical symptoms carry the virus but do not develop the disease, however, the virus spread very strongly, it may lead to the silent spread of the disease and leading to major losses. therefore, the chinese government will face greater risks after school starts and work resumes. and, clinicians, especially pediatricians, need to be vigilant to prevent widespread spread of the disease. children who have infected family members should be monitored or evaluated and family clustering should be reported to ensure a timely diagnosis. in addition, just before we submit, we found that five of six discharged children returned to the hospital because of positive pcr in their stool, however, their families were all negative. one girl (case ) didn't return to the hospital but isolated in home because she had mild mental symptoms after discharge. 
although positive results cannot confirm there were live virus in the stool or not. however, for insurance of public health, they were admitted to the hospital again to get clinical observation. interestingly, their onset was later than their families, but the period of positive pcr was longer than adults. we should pay more attention to this phenomenon and study the possible mechanism. several important limitations of this study should be noted. first, the size was small. second, the retrospective study included only of children who were hospitalized in one hospital. but as one of the rare reports in children out hubei province, it's helpful to improve the ability to recognize patients with mild illness. further studies with large multi-center samples are needed. in conclusion, by analyzing confirmed cases of covid- in jinan, shandong province, this study's findings indicate that new control measures should include rapid medical assessment and removal of the case from the home, as well as increased awareness of the importance of protective measures after symptom onset. public health measures such as home isolation should be aimed at minimizing such risk factors when addressing household transmission of serious infections spread through droplet transmission. geneva: world health organization who. coronavirus disease (covid- ) situation reports clinical characteristics of novel coronavirus cases in tertiary hospitals in hubei province national health commission of people's republic of china e e faf b ecd df .pdf?ich_args = - _ eae af a e d e ae b b _ _ c c fdec f d f a _c eab a ddd cb f f daf coronavirus pathogenesis fatal swine acute diarrhoea syndrome caused by an hku -related coronavirus of bat origin epidemiology, genetic recombination, and pathogenesis of coronaviruses infectious diseases. 
battling sars on the frontlines epidemiology and cause of severe acute respiratory syndrome (sars) in people's republic of china isolation of a novel coronavirus from a man with pneumonia in saudi arabia clinical features and short-term outcomes of patients with sars in the greater toronto area summary of probable sars cases with onset of illness from severe acute respiratory syndrome in children: experience in a regional hospital in hong kong clinical presentations and outcome of severe acute respiratory syndrome in children new and emerging infectious diseases the novel coronavirus pneumonia emergency response epidemiology team. the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) in china the novel coronavirus ( -ncov) uses the sarscoronavirus receptor ace and the cellular protease tmprss for entry into target cells heart injury signs are associated with higher and earlier mortality in coronavirus disease (covid- ) we thank all patients involved in the study. dr zhang and gai had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. no potential conflict of interest was reported by the author(s). this study was funded by the jinan science and technology bureau [grant number ]. the funders had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication. xiang ma http://orcid.org/ - - - key: cord- -ztk l n authors: deng, j; peng, z y; wen, z x; dong, g q; xie, m x; xu, g g title: high covid- mortality in the uk: lessons to be learnt from hubei province – are under-detected “silent hypoxia” and subsequently low admission rate to blame? date: - - journal: qjm doi: . /qjmed/hcaa sha: doc_id: cord_uid: ztk l n nan and hospitals overwhelmed. 
then, fangcangs were built and used to manage about a quarter of all covid patients in the provincial capital city. these enabled the admission of most mildly or moderately infected patients. there, vital signs were regularly checked and, based on the chinese national guideline for covid, nasal o2 was supplied to those whose spo2 became ≤ % [but not severe enough for icu admission]. fangcangs' peak bed usage was over %. with centralised isolation and timely treatment to prevent transmission and deterioration of the infection, and with occasional transfers of patients with worsening symptoms to icu, this drastically decreased the mortality over the entire epidemic in hubei [ table ]. after a -day lockdown was lifted on april and between may and june , wuhan tested , , residents to screen for "hidden" sars-cov- infections. this amounted to virtually the entire population of the city when including those already tested since the outbreak in january and excluding those who had left the city during the spring festival from about january and had not returned since the lockdown on january. as a result, no new cases were found, with only the , confirmed cases in table is a highly reliable reflection of the epidemic in hubei after the initial chaotic statistics in january. in contrast, the uk had more time to prepare, with a medical system that seemingly coped well with the already peaked pandemic. however, by july , it had suffered a population mortality rate considerably higher than that of hubei. we should treat the ratio in table we share the above lessons and experiences learnt from hubei and would like to provoke discussion to address the paradox seen in the uk and other regions. for example, by the end of july, new york had the most covid deaths in the us with , fatalities, yet a "nightingale" hospital costing $ million treated only virus patients. 
without effective and safe drugs and/or vaccines facing imminent covid resurgences, the hubei approaches may be worth considering. the (uk) office for national statistics. comparisons of all-cause mortality between european countries and regions per cent of oxygen-supported beds set aside for coronavirus are empty. mail online national health commission of the people's republic of china. new diagnosis and treatment scheme for novel coronavirus infected pneumonia second wave coronavirus narrative fails to hold clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study critical care crisis and some recommendations during the covid- epidemic in china on behalf of chinese thoracic society and chinese association of chest physician) ct imaging of the covid- this hospital cost $ million. it treated virus patients. the new york times key: cord- -na f yam authors: zeng, yiping; guo, xiaojing; deng, qing; zhang, hui title: forecasting of covid- spread with dynamic transmission rate date: - - journal: nan doi: . /j.jnlssr. . . sha: doc_id: cord_uid: na f yam abstract the covid- was firstly reported in wuhan, hubei province, and it was brought to all over china by people travelling for chinese new year. the pandemic coronavirus with its catastrophic effects is now a global concern. forecasting of covid- spread has attracted a great attention for public health emergency. however, few researchers look into the relationship between dynamic transmission rate and preventable measures by authorities. in this paper, the seir (susceptible exposed infectious recovered) model is employed to investigate the spread of covid- . the epidemic spread is divided into two stages: before and after intervention. before intervention, the transmission rate is assumed to be a constant since individual, community and government response has not taken into place. 
after intervention, the transmission rate is reduced dramatically due to the societal actions or measures to reduce and prevent the spread of disease. the transmission rate is assumed to follow an exponential function, and the removal rate is assumed to follow a power exponent function. the removal rate is increased with the evolution of the time. using the real data, the model and parameters are optimized. the transmission rate without measure is calculated to be . and . for hubei and outside hubei province, respectively. after the model is established, the spread of covid- in hubei province, france and usa is predicted. from results, usa performs the worst according to the dynamic ratio. the model has provided a mathematical method to evaluate the effectiveness of the government response and can be used to forecast the spread of covid- with better performance. on december , , a case of unexplained pneumonia was reported in wuhan. the virus was brought to hubei province and china by people travelling for chinese new year. after the outbreak of new pneumonia, the coronavirus disease was named as covid- by the world health organization on february , [ ] . in order to control and stop the spread of covid- , chinese national health commission took strict measures to lockdown the wuhan city and all transportation was suspended to prevent human-to-human contact on february , [ ] . epidemiological modeling play a vital role in the early warning and prevention of outbreaks, such as severe acute respiratory syndrome (sars-cov) [ ] , middle east respiratory syndrome (mers-cov) [ ] , ebola virus [ , ] , zika virus [ , ] and so on. until now, a lot of researches were performed on covid- [ ] [ ] [ ] . yang et al. [ ] calculated the basic reproduction number and the death rate by analyzing the data from infected people, and they found that the basic reproduction number was about . and the mean incubation period was estimated to be . days. li et al. 
[ ] analyzed the data for the first confirmed cases in wuhan for the purpose of investigating the epidemiologic characteristics of covid- . they found that the mean incubation period was . days ( % confidence interval, . to . ), with the th percentile of the distribution at . days. zhong et al. [ ] collected data from patients confirmed with covid- in china. through data analysis, they found that the median incubation period was days (interquartile range, to ). all studies help us better understand the covid- and find the suitable methods to prevent virus and cure individuals. researchers mentioned above focused on the clinic characteristics of covid- , while others paid attention to modelling and prediction, which could also provide reference for the management of anti-virus. a majority of researchers [ , ] modelled and reproduced the spread process of virus using the original or modified sir and seir models. natsuko et al. [ ] estimated the potential number of novel coronavirus cases in wuhan. from results, there were a total of cases of covid- with onset of symptoms by th january . based on the susceptible-exposed-infected-removed (seir) compartment model, zhou et al. [ ] found that the basic reproduction number ranged from . to . with the help of dataset reported on the people's daily in china. the predicted value fell between . and . . xiong et al. [ ] analyzed the infected population and spread trend of covid- under different policy with the help of seir model, and they found that the epidemic spreading was dominated by the quarantine rate and starting date of intervention. ming et al. [ ] explored the effect of covid- on healthcare system using mathematical modelling, and found that if there was no effective intervention, the healthcare system burden would be increased with the increased confirmed cases. however, few researchers have taken the dynamic transmission rate into consideration because of varied preventable measures by authorities. 
in reality, after actions taken by authorities, cities are in lockdown and individuals need to stay at home, resulting in a decrease of the transmission rate. in this paper, the spread of covid- is divided into two stages: before and after intervention. before intervention, the transmission rate is assumed to be constant since the individual, community and government responses have not yet taken place. after intervention, the transmission rate is reduced dramatically by the societal actions or measures taken to reduce and prevent the spread of disease. the transmission rate is assumed to follow an exponential function. in this paper, the original and modified seir models are briefly introduced in terms of the transmission rate and removal rate in section . in section , based on the least squares method, the improved model is optimized by considering the accumulated number of infected individuals and daily new cases. then we compare and discuss the performance of the original and modified models. using the modified model, the spread of covid- in hubei province, france and usa is predicted and compared. conclusions are drawn in the last section. the original seir (susceptible exposed infectious recovered) model is widely used to predict the spread of epidemic disease. s denotes the susceptible individuals, who have a probability β of entering the exposed class e after close contact with individuals carrying the virus. after some days without any obvious symptoms (the incubation period), exposed individuals e have a probability α of showing symptoms of the disease, that is, of becoming infected individuals i. infected individuals i either recover or die, and in both cases are removed from the system. the removal rate is called γ. the following formulas describe the spread process of epidemic disease: where r is the average number of contacts per person per day. 
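the compartment flows described above can be sketched in code. this is a minimal illustration only, assuming the standard frequency-dependent form in which new exposures per unit time equal r·β·i·s/n; the total population n, the forward-euler time step, the function name and every numerical value are placeholders introduced here and are not taken from the paper.

```python
def simulate_seir(n, e0, i0, r_contacts, beta, alpha, gamma, days, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    Assumed flows (standard textbook form, not quoted from the paper):
      S -> E at rate r_contacts * beta * I * S / n
      E -> I at rate alpha * E
      I -> R at rate gamma * I
    Returns a list of (S, E, I, R) tuples, one per whole day (day 0 included).
    """
    s, e, i, rem = n - e0 - i0, float(e0), float(i0), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, rem)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = r_contacts * beta * i * s / n * dt
            new_infected = alpha * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_removed
            rem += new_removed
        history.append((s, e, i, rem))
    return history


# Illustrative placeholder values only (not fitted to any data).
trajectory = simulate_seir(n=1_000_000, e0=40, i0=10, r_contacts=10,
                           beta=0.05, alpha=0.2, gamma=0.1, days=60)
```

because every flow leaves one compartment and enters another, the four compartments always sum to n, which is a quick sanity check on any implementation.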
in the case of covid- , individuals s enter the exposed class e because of contact with both exposed individuals e and infected individuals i. however, the original model does not consider the fact that exposed individuals are able to infect susceptible individuals, which is one of the most important factors in the spread of the epidemic. in the modified model, susceptible individuals are infected by exposed individuals e with a probability of β and by infected individuals i with a probability of β . where r is the average number of individuals with whom an infected individual comes into close contact. r is the corresponding number for the exposed individuals. in order to prevent the spread of the virus, authorities took various measures, such as locking down cities, controlling traffic, requiring facemasks, educating the public about the disease, requiring potential patients to stay in hospital or at home, and so on. the implemented measures can control and stop the spread of the virus and save lives. in other words, after the measures are taken, the transmission rates β and β begin to decrease. in our model, we assume that the transmission rate follows the following formulation: where τ is the time when the measures are taken to intervene against the virus and k is a constant value that controls the transition rate. the removal rate in the original model is kept constant. however, in reality, as time progresses and medicines and therapies for curing patients are found, the death rate is reduced gradually. the removal rate will increase with the passage of time. at the same time, from the data collected, we find that the recovery rate is larger than the death rate. therefore, the following formulation is proposed to show the relationship between the removal rate γ(t) and time: where a and b are constant values in the model. 
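the two time-varying rates can be illustrated with a short sketch. the exact expressions are not reproduced in the text above, so the forms used here are one plausible reading of it and should be treated as assumptions: a transmission rate that stays at its initial value beta0 until the intervention time τ and then decays exponentially with constant k, and a removal rate of the power form a·t^b. the function names and the symbol beta0 are introduced here for illustration.

```python
import math


def transmission_rate(t, beta0, tau, k):
    """Assumed form: constant beta0 before day tau, exponential decay afterwards."""
    if t < tau:
        return beta0
    return beta0 * math.exp(-k * (t - tau))


def removal_rate(t, a, b):
    """Assumed power-law form a * t**b; increases with t when a > 0 and b > 0."""
    return a * t ** b
```

with this reading, a larger k means the measures take effect faster, while b controls how quickly treatment improves over time.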
the basic reproductive number r measures the average number of secondary infections caused by a primary patient in a pool of mostly susceptible individuals in the absence of control measures [ ] and it is a parameter used to estimate the epidemic spread in a closed group [ ] . for any initial level of epidemic disease, the disease is going to disappear from the population in the infected area when r is smaller than . when r is larger than , the disease is spreading in the population [ ] . there are many ways [ , ] to calculate r in terms of formula derivation and model fitting. in our model, the basic reproduction number r is estimated by the formula r = β/γ for the purpose of simplifying the model. furthermore, because of measures taken by authorities, the basic reproduction number r may vary with the passage of time. by considering the transmission rate β(t) and removal rate γ(t), the effective reproduction number r e (t) is estimated by the following formulation: r e (t) = β(t)/γ(t). in order to simplify the model, some hypotheses are made, as shown in fig. : ) the exposed individuals and infected individuals have the same probability of infecting susceptible individuals, that is β =β ; ) there is no pedestrian flow between hubei and outside hubei, and covid- spreads within the corresponding area; ) individuals removed from the system have no ability to infect others; ) the transmission rate β is assumed to follow an exponential function considering the fact that fewer individuals are infected after measures are in place; ) the removal rate γ is supposed to follow a power exponent function, and the removal rate increases as time progresses due to better treatment; ) the basic reproduction number r changes with time because of measures taken by authorities. the effective reproduction number is calculated by r e (t)= β(t)/γ(t). 
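the ratio r e (t) = β(t)/γ(t) is easy to evaluate once forms for the two rates are chosen. the sketch below assumes an exponentially decaying β(t) after an intervention day tau and a power-law γ(t) = a·t^b; these forms, the helper names and all parameter values are illustrative assumptions, not the fitted values of the paper.

```python
import math


def effective_reproduction_number(t, beta0, tau, k, a, b):
    """R_e(t) = beta(t) / gamma(t) under the assumed time-varying rates.

    t must be > 0 because the assumed gamma(t) = a * t**b is zero at t = 0.
    """
    beta_t = beta0 if t < tau else beta0 * math.exp(-k * (t - tau))
    gamma_t = a * t ** b
    return beta_t / gamma_t


def first_day_below_one(beta0, tau, k, a, b, max_day=365):
    """Return the first whole day (starting at day 1) with R_e(t) < 1, or None."""
    for day in range(1, max_day + 1):
        if effective_reproduction_number(day, beta0, tau, k, a, b) < 1.0:
            return day
    return None


# Placeholder parameters for illustration only.
params = dict(beta0=0.6, tau=10, k=0.15, a=0.05, b=0.5)
control_day = first_day_below_one(**params)
```

the day on which r e (t) first drops below one is a convenient single-number summary of how quickly an intervention brings the outbreak under control.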
the national health commission of the people's republic of china published the cumulative number of confirmed cases and daily new cases on its official website. using an r package [ ] , we acquired the datasets of hubei province for cumulative confirmed cases and daily new cases from january to march , . outside hubei province, the first case was reported on january . the data for outside hubei province in china were obtained from january to march , . we now turn to the possible ranges of the parameters in the model. first of all, the initial number of infected patients needs to be determined. according to datasets from the national health commission of china and reports by the health commission of hubei province, the number of infected people was in hubei province and the first day was set as january . for the data outside hubei province, january is set as the first day. furthermore, the number of infected individuals on the first day is . the initial number of exposed individuals is difficult to estimate because of the limits of medical techniques. some researchers thought that the ratio (infected: exposed) is about %. based on experience, the ratio is set to be %. the initial exposed number of hubei province is therefore set to be and it is equal to for outside hubei province. no individual recovers from the disease on the first day. all initial numbers of removed individuals are therefore set to zero. there is no established method to determine the average number of individuals whom an infected individual meets, so r is set to a value ranging from to according to the value used for sars. because exposed individuals have a greater opportunity to meet others by close contact during the incubation period, r is set to be within [ , ] . finally, the transition rate α from exposed individuals to infected individuals is set from to . . 
in addition, with the help of the least squares method, the other unknown parameters are optimized to fit the real data, as shown in table . because there is a big difference between hubei and outside hubei provinces with respect to control measures, we estimate the accumulated infected individuals and the effective reproduction number r e separately for hubei and outside hubei provinces. based on the dataset, the model is fitted by simulation. fig. and fig. show the fitted and predicted data of hubei and outside hubei provinces, respectively. from fig. , the spread of the virus is controlled at the start of march because of fewer new cases in hubei. it is also found from fig. that on january , there are fewer infected individuals. furthermore, the fitted parameters are obtained and shown in table . firstly, the transmission rate without measures β is equal to . and . for hubei and outside hubei provinces. thus the virus in hubei has a larger probability of infecting other individuals. the same holds for the transition rate α from exposed individuals to infected individuals. fig. shows the effective reproduction number r e over time in hubei and outside hubei provinces, respectively. from fig. , r e decreases slightly before january due to increased treatment. then r e declines suddenly because authorities implement several control measures and individuals are alerted to protect themselves against the virus. measures and treatments result in a decreasing effective reproduction number with the passage of time. fig. shows that the virus is about to fade away in may in hubei province. also from fig. , the virus outside hubei province is going to die out earlier owing to second- or third-generation transmission of the virus. the results from the original and modified models are compared in terms of r e . from fig. , r e from the original model is larger than that from the modified model. 
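the least squares fitting step mentioned above can be sketched as follows. this is an illustration under stated assumptions, not the authors' procedure: the optimizer is not named in the text, so a brute-force grid search is substituted; the model is a simplified daily-step seir variant in which exposed and infected individuals are equally infectious; only two parameters (here called beta0 and k) are fitted; and the "observations" are synthetic values generated from known parameters so that the recovered fit can be checked.

```python
import math


def simulate_cases(beta0, k, tau=10, alpha=0.2, gamma=0.1,
                   n=1_000_000, e0=50.0, i0=10.0, days=40):
    """Daily-step SEIR variant; returns (daily_new_cases, cumulative_cases)."""
    s, e, i = n - e0 - i0, e0, i0
    cumulative = i0
    daily, total = [], []
    for t in range(days):
        beta_t = beta0 if t < tau else beta0 * math.exp(-k * (t - tau))
        new_exposed = beta_t * (e + i) * s / n  # E and I assumed equally infectious
        new_cases = alpha * e
        removed = gamma * i
        s -= new_exposed
        e += new_exposed - new_cases
        i += new_cases - removed
        cumulative += new_cases
        daily.append(new_cases)
        total.append(cumulative)
    return daily, total


def sum_squared_error(params, observed_daily, observed_total):
    """Least-squares objective over daily new cases and cumulative cases."""
    daily, total = simulate_cases(*params, days=len(observed_daily))
    return (sum((m - o) ** 2 for m, o in zip(daily, observed_daily))
            + sum((m - o) ** 2 for m, o in zip(total, observed_total)))


def grid_search(observed_daily, observed_total, beta_grid, k_grid):
    """Return the (beta0, k) pair on the grid with the smallest squared error."""
    candidates = [(b, k) for b in beta_grid for k in k_grid]
    return min(candidates,
               key=lambda p: sum_squared_error(p, observed_daily, observed_total))


# Synthetic observations from known parameters, for illustration only.
true_params = (0.5, 0.2)
obs_daily, obs_total = simulate_cases(*true_params)
beta_grid = [0.3 + 0.05 * j for j in range(9)]  # 0.30 ... 0.70
k_grid = [0.05 * j for j in range(1, 9)]        # 0.05 ... 0.40
best_beta0, best_k = grid_search(obs_daily, obs_total, beta_grid, k_grid)
```

in practice the two error terms differ greatly in magnitude, so a real fit would usually weight or normalize them, and a gradient-based or derivative-free optimizer would replace the grid.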
in the original model, r e decreases only slightly as time progresses, which does not reflect the effect of the preventive measures taken by authorities. the daily new cases are compared in hubei and outside hubei provinces, respectively. from fig. and fig. , it is found that daily new cases from the original model are smaller than those from the modified model in the early period of virus spread. underestimating the virus leads to less attention being paid to it, which can result in large damage and casualties for any country or region. fig. . it means that the virus is under control. in fig. , the label (actual infected population) means the number of cases confirmed with covid- in reality. it is found that, in real life, usa performs the worst. from fig. , the spread of covid- in hubei province is brought under control first. in this paper, a modified model is developed to better predict the spread of covid- considering the dynamic change of the control measures and treatment. in our model, the transmission rate β is assumed to follow an exponential function, considering the fact that fewer individuals are infected after measures are taken to prevent the spread of the virus. the removal rate γ is supposed to follow a power exponent function. the removal rate increases with time because of better treatment of the disease. based on real data, we optimize the model parameters using the least squares method. the transmission rate without measures β is equal to . and . for hubei and outside hubei. the results from the original and modified models are compared in terms of the effective reproduction number r e and daily new cases. it is found that daily new cases obtained by the original model are smaller than those from the modified model at the start of the spread of the virus. fewer predicted infected individuals lead to less attention being paid to the virus, which may result in large damage and casualties. furthermore, the model is used to evaluate the coronavirus spread in hubei province, france and usa. 
usa performs the worst according to the ratios. the model has provided a mathematical method to evaluate the effectiveness of the government response and can be used to forecast the spread of covid- with better performance. nowcasting and forecasting the potential domestic and international spread of the -ncov outbreak originating in wuhan, china: a modelling study a discrete epidemic model for sars transmission and control in china synthesizing data and models for the spread of mers-cov, : key role of index cases and hospital transmission ebola virus disease in west africa-the first months of the epidemic and forward projections ebola control: effect of asymptomatic infection and acquired immunity estimate of the reproduction number of the zika virus outbreak in barranquilla, colombia, and estimation of the relative role of sexual transmission zika virus epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study epidemiological and clinical features of the novel coronavirus outbreak in china, medrxiv epidemiological and clinical features of primary herpes simplex virus ocular infection early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia clinical characteristics of coronavirus disease in china preliminary prediction of the basic reproduction number of the wuhan novel coronavirus -ncov estimating the potential total number of novel coronavirus cases in wuhan city simulating the infected population and spread trend of -ncov under different policy by eir model breaking down of healthcare system: mathematical modelling for controlling the novel coronavirus ( -ncov) outbreak in mathematical models in population biology and epidemiology the basic reproductive number of ebola and the effects of public health measures: the cases of congo and uganda global properties of sir and seir epidemic models with multiple parallel infectious stages estimating the basic reproductive ratio 
for the ebola outbreak in liberia and sierra leone an r package and a website with real-time data on the covid- coronavirus outbreak, medrxiv the authors declared that they have no conflicts of interest to this work. key: cord- -enivzp f authors: lio, chon fu; cheong, hou hon; lei, chin ion; lo, iek long; yao, lan; lam, chong; leong, iek hou title: the common personal behavior and preventive measures among uninfected travelers from the hubei province, china during covid- outbreak: a cross-sectional survey in macao sar, china date: - - journal: peerj doi: . /peerj. sha: doc_id: cord_uid: enivzp f background: the novel coronavirus diseases (covid- ) caused over . million confirmed cases and cumulative mortality up to over , deaths worldwide as of april . a total of macao citizens were obligated to stay in hubei province, china, where the highest covid- prevalence was noted in the country and a “lockdown” policy was implemented for outbreak control for more than one month. they were escorted from wuhan city to macao via a chartered airplane organized by macao sar government and received quarantine for days with none of the individual being diagnosed with covid- by serial rna tests from the nasopharyngeal specimens and sera antibodies. it was crucial to identify common characteristics among these uninfected individuals. methods: a questionnaire survey was conducted to extract information such as behavior, change of habits and preventive measures. results: a total of effective questionnaires were analyzed after exclusion of infants and children with age under fifteen as ineligible for the survey and missing of one questionnaire, with a response rate of . % ( out of ). the proportion of female composed more than % of this group of returners. the main reason for visiting hubei in . % of respondents was to visit relatives. over % of respondents did not participate in high-risk activities due to mobility restriction. 
all ( %) denied contact with suspected or confirmed covid- cases. comparison of personal hygiene habits before and during disease outbreak showed a significant increase in practice including wearing a mask when outdoor ( . % and . %, p < . ) and often wash hands with soap or liquid soap ( . % and %, p = . ). the novel coronavirus diseases caused by the severe acute respiratory syndrome coronavirus (sars-cov- ) caused over . million confirmed cases and cumulative mortality up to over , deaths worldwide as of april (world health organization (who), b), provided with its early transmission dynamic of human-human transmission among close contacts . it is estimated in a model that covid- would have resulted in . billion infections and million deaths globally in in the absence of any intervention (walker et al., ) . wuhan city, the capital of hubei province in china, became the first outbreak center of covid- since december (phelan, katz & gostin, ) . during the chinese new year holidays, chinese people have the traditional habit of traveling to their hometowns for a family reunion and gathering to celebrate the beginning of the lunar new year. hence, many people, including groups of macao citizens, were obligated to stay in hubei province after the announcement of "lockdown"/sanitary cordons by the local government on january , that is, days before chinese new year. it was not until march that the macao sar government escorted a special team to wuhan, china to pick up macao citizens from families, who stayed in different cities in hubei province (macao sar government portal, b) . covid- was ruled out in all of them afterwards. a cross-sectional survey was conducted to have in-depth questionnaire interview of these people who were all uninfected by sars-cov- in a high-risk area, hubei province, china. 
this study aims to identify the common grounds and personal behaviors leading to a zero-infection rate among participants, which might provide crucial hints for global covid- pandemic control. any citizen who presented with a body temperature equal to or greater than . degrees celsius in hubei province was not allowed to board. after arrival in macao, all citizens were sent to the public health clinical center for a -day quarantine. a total of three serial nasopharyngeal swabs were obtained on day , day and day for viral rna detection by real-time rt-pcr, which were all negative ( %) (macao sar government portal, a) . serum antibodies against sars-cov- were tested, with all results negative ( %), on day before the citizens were released from quarantine. none of the citizens complained of any symptoms during the quarantine period. a questionnaire was designed to obtain demographic information, activity in hubei province, contact history, and personal health behaviors such as handwashing habits, mask usage and home cleaning. participants aged or over were eligible for this study. the questionnaire was delivered to the isolation ward and was self-administered. written consent was collected in digital format. infants and children under fifteen years of age were considered ineligible for this survey. this study was approved by the hospital medical ethical committee of centro hospitalar conde de são januário, macao sar, china. descriptive statistics were used to summarize demographic information, high-risk activities and common preventive measures via standard parameters such as percentage, mean and median. we then compared behavior changes before and during the covid- outbreak using the wilcoxon signed rank test for continuous variables and the mcnemar test for dichotomized variables. the statistical significance level was set at α = . . the statistical analysis was conducted using r (version . . , r core team, ). 
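the paired before/during comparison for a yes/no habit can be illustrated with an exact mcnemar test. the study itself used r; the sketch below is a python illustration of the same kind of test using only the standard library, with a hypothetical function name and hypothetical counts that are not taken from the survey. only the discordant pairs (respondents whose answer changed) enter the test.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test on discordant pair counts.

    b: respondents who changed from "no" (before) to "yes" (during)
    c: respondents who changed from "yes" (before) to "no" (during)
    Under the null hypothesis each discordant pair falls either way with
    probability 0.5, so min(b, c) follows a Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for one yes/no habit (not taken from the survey).
p_value = mcnemar_exact(b=10, c=0)
```

respondents who gave the same answer at both time points carry no information about change, which is why a habit that was already nearly universal before the outbreak can show a non-significant result even when its prevalence is high.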
a total of effective questionnaires were included in the final analysis after exclusion of infants and children less than years old and one missing questionnaire (response rate: . %). the demographic information is summarized in table . the majority of the participants were aged between and years old ( . %) and had received secondary education or above ( . %). females made up more than % of this group of returners. the most common comorbid disease was hypertension ( . %), followed by diabetes mellitus ( . %) and hepatitis ( . %). more than half of the respondents were non-smokers ( . %). the main reason for visiting hubei was to visit relatives ( . %). more than percent of participants thought the most important reason for not getting covid- was keeping a distance ("stay away") from crowds and reducing clustering or gatherings, followed by good personal protective measures ( . %). mobility and participation in high-risk activities were restricted for these participants in hubei province according to the emergency response policy, as the respondents reported (table ) : . % of them did not visit crowded places; . % of them did not use any public transportation; . % did not go to any supermarket. about three-quarters of respondents received daily supplies at home via unified delivery. none of them visited or traveled to other provinces or cities ( %). all the participants ( %) denied any contact with suspected or confirmed covid- patients, while . % of the participants stated there were confirmed covid- cases in their local community. a further comparison of personal preventive measures before and during the disease outbreak showed increased alertness and practice of personal protection and hygiene during the spread (table ) , such as wearing a mask when outdoors ( . % and . %, p < . ), wearing a mask every time when in contact or talking with people ( % and %, p < . ), often washing hands with soap/liquid soap ( . % and %, p = . 
), use of alcohol-based hand sanitizers or disinfectant wipes as a substitute if handwashing facilities were not available ( . % and . %, p = . ), cleaning clothes and personal belongings immediately after getting back home ( . % and . %, p < . ), and cleaning the mobile phone regularly ( . % and . %, p = . ). only . % of respondents attended meal gatherings regularly during the spread compared to . % before (p < . ). the increase in personal measures is significant and may reflect the effectiveness of public health interventions. the aim of this research was to investigate the reasons that contributed to the absence of covid- infection in this high-risk population in hubei province. on the one hand, good physical health could be one factor, as the majority of participants were below the age of ( . %), were non-smokers ( %) and had no underlying chronic diseases ( . %). however, further studies are needed to determine the exact effect of physical health on the risk of covid- infection. on the other hand, it was also important to stop the transmission chain via political measures or personal health behaviors. on january ( days before the chinese new year), the chinese government imposed a "lockdown" in wuhan and other cities in hubei to quarantine the center of the outbreak, which is commonly referred to as the "wuhan lockdown" (health-commission, ). all public transport, including buses, railways, flights and ferry services, was suspended, with all stations and airports closed. the residents of wuhan were not allowed to leave the city without permission, which was unprecedented in public health history. in addition, social measures, including a ban on mass gatherings such as concerts or competitions, closure of entertainment venues, public facilities and schools, and mandatory mask wearing in public areas, were applied to mitigate the outbreak by controlling the source of infection and blocking transmission routes (pan et al., ) . 
as a result, the respondents of our study reported highly restricted mobility in wuhan, china. a total of . % of them denied visiting crowded places, which required high self-discipline and the cooperation of other public measures. to achieve this level of mobility restriction, the local authority organized a team of volunteers to facilitate the delivery of food and other supplies to each family in home quarantine (chinanews.com, ) ; . % of participants received essential materials via this method, which decreased the chance of outdoor activity and interaction with others. nonetheless, % of respondents said that "staying away from crowds" was the major reason for not being infected. table : high-risk activities and daily supply conditions among respondents during covid- outbreak in early in hubei, china. moreover, there was emerging evidence suggesting that these "lockdown" measures played a role in decreasing covid- incidence (colbourn, ; gostin & wiley, ; klompas et al., ; phelan, katz & gostin, ; the lancet respiratory medicine, ). it was estimated that the wuhan travel ban delayed the epidemic progression by - days in mainland china (chinazzi et al., ; tian et al., ) while reducing case importations to other countries by nearly % through mid-february (chinazzi et al., ) . furthermore, in a recent investigation (pan et al., ) , the rates of confirmed cases and the effective reproduction number (rt), that is, the mean number of secondary cases generated by a typical primary case at time t in a population, declined from january , and fell below . from february , . although intensive physical distancing and "lockdown" could help "flatten the curve" of covid- and prevent a sharp rise in demand on health system capacity, it is necessary to consider the social and economic effects of "lockdown" and knock-on effects on health such as mental health and interpersonal violence (parmet & sinha, ). yet, our data showed that over half of the participants ( . 
%) felt "calm" during stay in hubei province, which was somehow counterintuitive. we hypothesized that the provision of sufficient logistic support to the isolated families by local authorities and clear information delivery to the public during a "lockdown" will help to ease the stress and minimize subsequent psychological impact (brooks et al., ) . therefore, local governments should be advised to create a comprehensive strategy and to prudentially evaluate the following concerns including racisms, adequate explanations to the public about the rationale and upside, logistic power and resources, and cultural factors which may hinder the compliance before implementing large-scale mobility restrictions (parmet & sinha, ) . the administration of "lockdown" could even lead to precarious situations that could heighten transmission in some countries if corresponding supports are not tailor-made and comprehensive based on their own economic and social conditions, such as workers may be packed in state-run shelter during india "lockdown" (pulla, ) . likewise, the announcement of closing the gambling industry during the first half of february in macau was accompanied with foreseen policies of financial and resources supply could be one of the references of administration of any kind of measures (macao sar government portal, c) . additionally, the significant behavior changes among participants before and during outbreak consisted of more wearing a mask outdoor, wash hands more frequently, clean and disinfect home more frequently, and less meal gatherings. although the transmission of sars-cov- was commonly believed via droplet and contact, no evidence of wearing a surgical mask alone by healthy persons can prevent them from infection with respiratory viruses including covid- currently while inappropriate use/disposal may even increase risk (world health organization (who), a). 
however, none of the participants in our study agreed that it was less important to wash hands after wearing masks, and all of them ( %) believed that the incidence of accidental touching the face or nose after wearing a mask would be reduced. the effectiveness of personal protective measures in preventing pandemic influenza transmission by meta-analysis showed a significant protective effect of hand hygiene but mixed results for mask use and thus wearing mask was suggested to be applied alongside with hands hygiene (saunders-hastings et al., ) . wearing mask might also act as a "symbolism" on increasing individual awareness of good hygiene practice (klompas et al., ) . however, the universal mask-wearing scheme in public should be emphasized on the concurrent hand hygiene practice and social distancing as a bundle, while the allocation and availability of resources should be taken into account first to ensure adequate protection for healthcare workers (emanuel et al., ) . there were some limitations in this study. the e.l.i.z.a kits used for antibody detection were qualitative and not able to provide titers information. although the sample size of this questionnaire was limited and recall bias was inevitable, its implication may indirectly reflect the effectiveness of public health interventions in wuhan, china, including sanitary cordon, traffic restriction, social distancing, home confinement, centralized quarantine and universal symptom survey. such interventions were aimed at preventing individuals from face-to-face interaction and preventing asymptomatic covid- patients from spreading the coronavirus within the community. the lack of infected citizens limits for further comparison of difference of measures or behavior and further studies are warranted to determine the effectiveness of each preventive measure on covid- at the individual level. 
moreover, some of the participants had stayed in their relatives' homes, where cleaning was not their responsibility. hence the answers to the question on home cleaning might partially reflect the attitudes of their relatives/friends. our findings were in line with common preventive measures advised by the world health organization. good personal hygiene and adequate preventive measures such as fewer gatherings and frequent handwashing, in addition to wearing a mask outdoors, were common grounds among uninfected participants during their stay in hubei province under the covid- outbreak. furthermore, the success of the "lockdown" and self-quarantine policy in hubei province could be attributed to the local authority's strong logistical provision and transparency of information about the policy's rationale, which helped maintain better mental health and thus increased compliance with and efficacy of preventive measures. iek hou leong conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. the following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers): the medical ethical committee of centro hospitalar conde de são januário, macau sar, china, granted ethical approval to carry out the study within its facilities. the following information was supplied regarding data availability: the raw measurements are available in the supplemental file. supplemental information for this article can be found online at http://dx.doi.org/ . / peerj. #supplemental-information. 
the psychological impact of quarantine and how to reduce it: rapid review of the evidence unified purchase of daily supplies by regular assessment in wuhan community the effect of travel restrictions on the spread of the novel coronavirus (covid- ) outbreak covid- : extending or relaxing distancing control measures fair allocation of scarce medical resources in the time of covid- governmental public health powers during the covid- pandemic: stay-at-home orders, business closures, and travel restrictions control strategies for novel coronavirus infections in wuhan city universal masking in hospitals in the covid- era early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia macao residents return from hubei released from quarantine macao residents return home safely from hubei the chief executive of macau sar announced: gambling industry will be suspended for half month and people should stay at home to avoid from covid- association of public health interventions with the epidemiology of the covid- outbreak in wuhan covid- -the law and limits of quarantine the novel coronavirus originating in wuhan, china: challenges for global health governance covid- : india imposes lockdown for days and cases rise r: a language and environment for statistical computing. vienna: r foundation for statistical computing effectiveness of personal protective measures in reducing pandemic influenza transmission: a systematic review and meta-analysis covid- : delay, mitigate, and communicate an investigation of transmission control measures during the first days of the covid- epidemic in china the global impact of covid- and strategies for mitigation and suppression the-community-during-home-care-and-in-health-care-settings-in-the-context-of-the-novel covid- ) situation report- we thank dr. tan fong cheong and ms. hong lei lou for their assistance in data collection and coordination. the authors received no funding for this work. 
the authors declare that they have no competing interests. chon fu lio conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. hou hon cheong conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. chin ion lei conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the paper, and approved the final draft. iek long lo conceived and designed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. lan yao conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft. chong lam conceived and designed the experiments, analyzed the data, authored or reviewed drafts of the paper, and approved the final draft. key: cord- - md j e authors: li, xin; lu, peixin; hu, lianting; huang, tianhui; lu, long title: factors associated with mental health results among workers with income losses exposed to covid- in china date: - - journal: int j environ res public health doi: . /ijerph sha: doc_id: cord_uid: md j e the outbreak and worldwide spread of covid- has resulted in a high prevalence of mental health problems in china and other countries. this was a cross-sectional study conducted using an online survey and face-to-face interviews to assess mental health problems and the associated factors among chinese citizens with income losses exposed to covid- . 
the degrees of the depression, anxiety, insomnia, and distress symptoms of our participants were assessed using the chinese versions of the patient health questionnaire- (phq- ), the generalized anxiety disorder- (gad- ), the insomnia severity index- (isi- ), and the revised -item impact of event scale (ies- ) scales, respectively, which found that the prevalence rates of depression, anxiety, insomnia, and distress caused by covid- were . %, . %, . %, and . %, respectively. multivariable logistic regression analysis was performed to identify factors associated with mental health outcomes among workers with income losses during covid- . participants working in hubei province with heavy income losses, especially pregnant women, were found to have a high risk of developing unfavorable mental health symptoms and may need psychological support or interventions. at the end of december , the chinese city of wuhan reported a novel pneumonia caused by coronavirus disease (covid- ) , an infectious disease caused by an acute severe respiratory syndrome coronavirus, which is rapidly spreading both domestically and internationally [ , ] . on january , the world health organization (who) held an emergency meeting and declared the worldwide covid- outbreak a public health emergency of international concern [ ] . the emergence and rapid increase in the number of covid- cases has posed and continues to pose complex challenges for global research, public health, and medical communities [ , ] . as of june , there were more than . million confirmed cases of covid- across more than countries and regions, including more than , deaths. with the rapid spread of covid- , the local government in wuhan immediately adopted a city closure policy, encouraging citizens to work at home and teach online, and shut down non-essential services to mitigate the impact and risks of the disease. 
then, the governments of other provinces with low numbers of infected people in china and many other countries around the world entered states of emergency for the health response and issued a series of policies, including ordering citizens (regardless of having symptoms of infection or not) to self-isolate at home, and maintaining social distance from other people. however, concerns have arisen about the potential psychological impact of these measures [ ] [ ] [ ] . studies proved that covid- has caused a high prevalence of mental health problems in china [ ] [ ] [ ] [ ] [ ] and other countries around the world [ ] [ ] [ ] [ ] . some researchers have attempted to understand the outbreak of this novel coronavirus from a global health perspective [ ] [ ] [ ] . however, most studies focused on the psychological effects of people who were infected with covid- , medical workers, or people in specific regions [ ] [ ] [ ] [ ] [ ] [ ] ] . studies showed that the economic impact caused by severe acute respiratory syndrome (sars) will produce psychological morbidities in individuals who are directly or indirectly exposed to life-threatening situations [ ] . the occurrence of such psychological morbidities among workers can impact their daily functions and lead to immediate economic and physiological consequences, such as lost job productivity, depression, and anxiety [ , ] . to the best of our knowledge, no previous study focused on mental health problems among people with income losses caused by covid- . to address this gap, the aim of our study was to evaluate the mental health of chinese workers with income losses exposed to covid- by quantifying the degrees of depression, anxiety, insomnia, and distress, and analyze the potential risk factors related to these symptoms. 
in this study, besides age, sex and other demographic characteristics, participants from hubei province and outside hubei province were taken as the research objects for comparison of regional differences. the ultimate goal of this study was to assess the mental health burden of people with income losses during covid- and to provide guidance for the promotion of mental well-being among this population. this was a cross-sectional study conducted using an online survey and face-to-face interviews to assess mental health problems and their associations with income losses among chinese citizens who were exposed to coronavirus disease (covid- ) from april to may . eligibility criteria included (i) currently living in china, (ii) aged years or older, and (iii) with income losses caused by covid- . participants were encouraged to participate in online surveys or complete offline questionnaires. a total of of contacted individuals completed the survey for a participation rate of . %, and people with no loss of income were excluded from the study. the final sample included respondents, with a response rate of . %. this study was approved by the ethics committee and institutional review board of wuhan university, wuhan, china (ref: ), and conducted in accordance with the ethical guidelines of the declaration of helsinki of the world medical association. all data were deidentified before being provided to the investigators. consent from each participant was obtained at the beginning of the survey. the questionnaire consisted of factors to record demographic indicators and symptoms of depression, anxiety, insomnia, and distress caused by covid- of the participants (see appendix a). the following demographic data were included in this study: sex (male or female), age ( - , - , - and > years old categories), educational level ( % to %, - %, and > % less than pre-epidemic income, respectively), and place of residence (urban or rural). 
mental disorders, including depression, anxiety, insomnia, and distress, caused by covid- were assessed in our study by chinese versions of validated measurement tools [ ] [ ] [ ] [ ] : the patient health questionnaire- (phq- ; the total score ranged from to ) [ ] , the generalized anxiety disorder- (gad- ; the total score ranged from to ) [ ] , the insomnia severity index- (isi- ; the total score ranged from to ) [ ] , and the revised -item impact of event scale (ies- ; the total score ranged from to ) [ ] . the response options are: = nearly every day, = more than half the days, = several days, and = not at all for phq- and gad- ; = always, = often, = sometimes, = rarely, and = never for isi- and ies- . the total scores of these survey scales are interpreted as follows: phq- , extremely severe ( - ), severe ( - ), moderate ( - ), mild ( - ), and normal ( - ) depression; gad- , severe ( - ), moderate ( - ), mild ( - ), and normal ( - ) anxiety; isi- , severe ( - ), moderate ( - ), subthreshold ( - ), and normal ( - ) insomnia; and ies- , severe ( - ), moderate ( - ), subthreshold ( - ), and normal ( - ) distress. the cutoff scores for detecting possible major symptoms of depression, anxiety, insomnia, and distress caused by covid- are , , , and , respectively. a higher score indicates more severe self-reported symptoms [ ] [ ] [ ] [ ] . the psychometric properties and internal reliabilities of the scales have been previously confirmed in chinese populations [ ] [ ] [ ] [ ] . in [ ] , statistical tests were performed to determine the reliability and validity of phq- . results showed that the internal consistency value of phq- was . and the test-retest reliability value of phq- was .
, indicating that the phq- is a valid and reliable tool to evaluate depression in chinese people. he [ ] tested the reliability and validity of the chinese version of gad- . the results show that the cronbach's α coefficient of gad- is . , and the test-retest reliability coefficient is . , indicating that the chinese version of gad- has good reliability and validity for evaluating anxiety. doris s.f. yu [ ] tested the reliability and validity of the chinese version of isi- , finding that the cronbach's alpha of the chinese version of the isi- was . , with item-to-total correlations in the range of . - . . in [ ] , chan reported that the cronbach's α coefficient of ies-r is . , which indicates that the ies-r is a valid and reliable tool to evaluate distress among chinese people. in our study, the cronbach's alpha coefficient of our questionnaire is . . the cronbach's alpha coefficients of the chinese versions of phq- , gad- , isi- and ies- were . , . , . and . , respectively. first, we used descriptive statistics to describe the socio-demographic characteristics of the participants. second, the prevalence rates of depression (phq- score ≥ ), anxiety (gad- score ≥ ), insomnia (isi- score ≥ ), and distress (ies- score ≥ ) were estimated. finally, multivariable logistic regression models were used to explore factors associated with depression, anxiety, insomnia, and distress among workers with income losses exposed to covid- in china. the associations between risk factors and outcomes are presented as adjusted odds ratios (aors) with a % confidence interval (ci), after adjustment for confounders, including sex, age, marital status, educational level, working position, place of residence, and degree of income loss. data analysis was performed with spss statistical software (version . , ibm corp., armonk, ny, usa). the significance level was set at α = . (p-values < . were considered statistically significant), and all tests were two-tailed. 
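the internal-consistency values quoted above are cronbach's alpha coefficients. as an illustration only, the following python sketch shows how such a coefficient can be computed from item-level responses. it is not the authors' procedure (they report using spss); the function name `cronbach_alpha` and the toy responses are hypothetical and are not study data.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-participant item-score lists.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    All rows must have the same number of items (k >= 2).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Transpose so that each tuple holds every participant's score on one item.
    items = list(zip(*responses))
    item_variance_sum = sum(pvariance(col) for col in items)
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical toy data: 4 participants answering 3 items scored 0-3.
toy = [[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
```

the same variance estimator is used for the items and for the total score, so the ratio (and therefore alpha) does not depend on whether population or sample variances are chosen.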
as shown in table , the proportions of men and women were close, at . % and . %, respectively, and the split by marital status (recoded into married and other, including unmarried, widowed, and divorced) was similar to that of sex, at . % and . %, respectively. we classified income losses caused by covid- as one of the demographic variables. response options were slightly affected (> % to %), moderately affected ( - %), and heavily affected (> %). table shows that the proportions of light, middle, and heavy income loss (> % to %, - %, and > % lower income than pre-epidemic income, respectively) caused by covid- were . %, . %, and . %, respectively. as hubei was the province most severely affected by covid- in china, all participants were grouped by their geographic location. the proportions in hubei province and in places outside hubei province were . % and . %, respectively. most of these participants were aged from to years, lived in urban areas, and had a college degree or above. generally consistent with existing covid- research results [ ] [ ] [ ] , the prevalence rates of symptoms of depression, anxiety, insomnia, and distress caused by covid- among our participants were . %, . %, . %, and . %, respectively. as shown in table , multivariable logistic regression analyses showed that, after controlling for covariates, the adjusted odds of depression, anxiety, insomnia and distress were lower among participants aged under years (e.g., depression among participants aged - years: or = . , % ci: . - . , p < . ; depression among participants aged - years: or = . , % ci: . - . , p < . ) than among those aged over years, and greater among those working in hubei province (e.g., depression: or = . , % ci: . - . , p < . ) than among those working outside hubei province. 
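the odds ratios and confidence intervals reported here come from multivariable logistic regression. as a hedged illustration of how such figures are usually derived, the sketch below converts a fitted log-odds coefficient and its standard error into an odds ratio with a wald-type interval. the function name `odds_ratio_ci`, the default z = 1.96 (a two-sided 95% interval) and the example inputs are assumptions for illustration; they are not taken from the study, which was analysed in spss.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (odds ratio, lower bound, upper bound).

    beta: logistic-regression coefficient (log-odds scale)
    se:   standard error of beta
    z:    normal quantile for the interval (1.96 is assumed here)
    """
    return (
        math.exp(beta),           # point estimate of the odds ratio
        math.exp(beta - z * se),  # lower Wald bound
        math.exp(beta + z * se),  # upper Wald bound
    )


# Hypothetical coefficient: log(2) corresponds to a doubling of the odds.
example_or, example_low, example_high = odds_ratio_ci(math.log(2), 0.25)
```

an interval that excludes 1 corresponds to a coefficient that differs significantly from 0 at the matching significance level.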
participants whose income was heavily affected by covid- were prone to experiencing symptoms of depression, anxiety, and insomnia (e.g., depression among participants with light income losses: or = . , % ci: . - . , p < . ). those from urban areas had lower adjusted odds of depression, anxiety, insomnia and distress than those from rural areas (e.g., depression: or = . , % ci: . - . , p = . ). at the same time, being married (or, . ; % ci, . - . ; p < . ) was associated with a greater risk of depression than being unmarried. alongside the question on sex, we included an additional question (if you are a woman, please indicate whether you are pregnant). as shown in table , multivariable logistic regression analyses showed that, after controlling for covariates, pregnant women with income losses during covid- had a greater risk of depression and anxiety (depression: or = . , % ci: . - . , p = . ; anxiety: or = . , % ci: . - . , p = . ) than non-pregnant women (table ) . table lists the detailed results of phq- from multivariable logistic regression analysis; the results for the other scales are presented in supplementary materials (tables s -s ). abbreviations: na = not available; aor: adjusted odds ratio; ci: confidence interval. phq- : the patient health questionnaire- . according to lai, j et al. [ ] , the cutoff scores for detecting possible major symptoms of depression, anxiety, insomnia, and distress caused by covid- are , , , and , respectively. thus, the prevalence rates of our participants who had severe mental symptoms of depression, anxiety, insomnia, and distress were . %, . %, . %, and . %, respectively. 
similar to the findings regarding the prevalence of mental symptoms, as shown in table , multivariable logistic regression analyses showed that, after controlling for covariates, the adjusted odds of severe symptoms of depression, anxiety, and distress were lower among participants aged - years (e.g., severe depression: or = . , % ci: . - . , p = . ) than among those aged over years, greater among those with heavy income losses than among those with light and middle income losses (e.g., severe depression among participants with light income losses: or = . , % ci: . - . , p < . ), and lower among those from urban areas than among those from rural areas (e.g., severe depression: or = . , % ci: . - . , p < . ). those working in hubei province were more prone to experiencing severe symptoms of anxiety and distress than those working outside hubei province. we enrolled respondents and found a high prevalence of mental health symptoms among workers with income losses caused by covid- in china. the latest national sample indicated that the prevalence rates of any disorder (excluding dementia), anxiety disorders, and depressive disorders were . %, . %, and . % in china, respectively. compared with these national data, we found much higher prevalence rates of symptoms of depression, anxiety, insomnia, and distress caused by covid- , at . %, . %, . %, and . %, respectively. our findings are consistent with those of previous covid- studies, including a study in mainland china that found that the prevalence of depression as measured during the covid- pandemic was . % [ ] and a study in hong kong that found that the prevalence of depression caused by covid- was . % [ ] . mental disorders, including depression, anxiety, insomnia, and distress, caused by covid- were assessed in our study by chinese versions of validated measurement tools [ ] [ ] [ ] [ ] : phq- , gad- , and isi- . in our study, the cronbach's alpha coefficient of our questionnaire is . . 
the cronbach's alpha coefficients of the chinese versions of phq- , gad- , isi- and ies- were . , . , . and . , respectively, indicating that these scales have good reliability for evaluating mental disorders among chinese workers with income losses. by reviewing the literature, we found that these chinese scales are widely used in the study of psychological problems, and recently all four have been used to study covid- . for example, researchers used them to assess the magnitude of mental health outcomes among healthcare workers treating patients exposed to covid- in china [ ] , phq- and gad- were used to evaluate depression and anxiety in hong kong during the covid- pandemic [ ] , and gad- was used to assess the prevalence of mental health problems and examine their association with social media exposure [ ] . in this study, besides age, sex and other demographic characteristics, participants from hubei province and outside hubei province were taken as the research objects for comparison of regional differences. the proportions of respondents from hubei province and places outside hubei province were . % and . %, respectively. the proportions of light, middle, and heavy losses of income (> to %, - %, and > % less income than pre-epidemic levels, respectively) caused by covid- were . %, . %, and . %, respectively. most of these participants were aged from to years, lived in urban areas, and had a college degree or above. we found that workers with heavy income losses caused by covid- reported more symptoms of depression, anxiety, and insomnia. compared with participants outside hubei province, those in hubei province reported higher scores on all four scales. the prevalence rates of our participants who had severe mental symptoms of depression, anxiety, insomnia, and distress were . %, . %, . %, and . %, respectively. 
our findings further indicated that pregnant women scored higher than non-pregnant women on phq- and gad- , which measure symptoms of depression and anxiety. these findings are consistent with previous studies showing that exposure to a public health emergency can cause mental health problems. this study has several limitations. first, it was limited in scope. almost half of the participants ( . %) were from hubei province, limiting the generalization of our findings to less affected regions. the survey was mainly conducted online, so selection bias, such as low participation by older citizens, may have affected the results. second, the survey was conducted over two weeks and lacked longitudinal follow-up. we could not determine whether the mental health symptoms of workers with income losses would become more severe over time, so the long-term psychological implications for this population are worth further investigation. last, although the response rate of this study was . %, response bias may still exist if the non-respondents were either too stressed to respond or not at all stressed and therefore not interested in this survey. in conclusion, our findings showed relatively high prevalence rates of symptoms of depression, anxiety, insomnia, and distress caused by covid- . the prevalence of mental health problems caused by covid- among workers in china is high, especially among those working in hubei province with heavy income losses. in addition, pregnant women with income losses had a greater risk of depression and anxiety than other women, and may need psychological support or interventions. these results further indicate that the long-term psychological implications for this population are worth further investigation. 
supplementary materials: the following are available online at http://www.mdpi.com/ - / / / /s , table s : prevalence of anxiety and associated factors, table s : prevalence of insomnia and associated factors, table s : prevalence of distress and associated factors, table s : prevalence of severe anxiety and associated factors, table s : prevalence of severe insomnia and associated factors, table s : prevalence of severe distress and associated factors. the authors declare no conflicts of interest. the questionnaire consisted of questions to record demographic indicators and symptoms of depression, anxiety, insomnia, and distress of all participants. demographic data the following demographic data were included in this study: sex (male or female), age ( - , - , - , or > years categories), educational level ( to %, - %, and > % less income than the pre-epidemic level, respectively), and place of residence (urban or rural). the english versions of the phq- , gad- , isi- , and ies-r- scales were used in this study to measure the degree of symptoms of depression, anxiety, insomnia, and distress of our participants. early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia a novel coronavirus from patients with pneumonia in china emergency committee regarding the outbreak of novel coronavirus ( -ncov) outbreak of pneumonia of unknown etiology in wuhan, china: the mystery and the miracle characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention risk perception and impact of severe acute respiratory syndrome (sars) on work and personal lives of healthcare workers in singapore: what can we learn? 
curating evidence on mental health during covid- : a living systematic review mental health problems and social media exposure during covid- outbreak depression and anxiety in hong kong during covid- factors associated with mental health outcomes among health care workers exposed to coronavirus disease mental health problems during the covid- pandemics and the mitigation effects of exercise: a longitudinal study of college students in china impact of the covid- pandemic on mental health and quality of life among local residents in liaoning province, china: a cross-sectional study related health factors of psychological distress during the covid- pandemic in spain covid- and the fears of italian senior citizens epidemiological aspects and psychological reactions to covid- of dental practitioners in the northern italy districts of modena and reggio emilia the psychological impact of confinement linked to the coronavirus epidemic covid- in algeria the predictive capacity of air travel patterns during the global spread of the covid- pandemic: risk, uncertainty and randomness spatio-temporal patterns of the -ncov epidemic at the county level in hubei province a comparison of infection venues of covid- case clusters in northeast china quarantine and isolation: how are quarantine and isolation different? 
available online coronavirus is more dangerous for the global economy than sars the economic impact of pandemic influenza in the united states: priorities for intervention the impact of covid- on tourist satisfaction with b&b in zhejiang, china: an importance-performance analysis validity and reliability of patient health questionnaire- and patient health questionnaire- to screen for depression among college students in china reliability and validity of a generalized anxiety scale in general hospital outpatients insomnia severity index: psychometric properties with chinese community-dwelling older people the development of the chinese version of impact of event scale-revised (cies-r) this article is an open access article distributed under the terms and conditions of the creative commons attribution (cc by) license key: cord- -rhybnlw authors: pei, lijun title: prediction of numbers of the accumulative confirmed patients (nacp) and the plateau phase of -ncov in china date: - - journal: cogn neurodyn doi: . /s - - - sha: doc_id: cord_uid: rhybnlw in the present study, i propose a novel fitting method to describe the outbreak of -ncov in china. the fitted data were selected carefully from the non-hubei part and hubei province of china respectively. for the non-hubei part, the time period of data collection corresponds from the beginning of the policy of isolation to present day. but for hubei province, the subjects of wuhan city and hubei province were included from the time of admission to the huoshenshan hospital to present day in order to ensure that all or the majority of the confirmed and suspected patients were collected for diagnosis and treatment. the employed basic functions for fitting are the hyperbolic tangent functions [formula: see text] since in these cases the -ncov is just an epidemic. subsequently, the -ncov will initially expand rapidly and tend to disappear. 
therefore, the numbers of the accumulative confirmed patients in different cities, provinces and geographical regions will initially increase rapidly and subsequently stabilize at a plateau phase. the selection of the basic functions for fitting is crucial. in the present study, i found that the hyperbolic tangent functions [formula: see text] could satisfy the aforementioned properties. by this novel method, i can obtain two significant results, both of which rest on the condition that the rigorous isolation policy continues to be executed. first, i can predict very accurately the numbers of the cumulative confirmed patients in different cities, provinces and parts of china in the short-term period of infection, notably in wuhan city with the smallest relative error estimated at [formula: see text] , in hubei province with the smallest relative error estimated at [formula: see text] and in the non-hubei part of china with the smallest relative error of [formula: see text] . %. second, i may be able to predict the times when the plateau phases will occur in the different regions in the long-term period of infection. generally, for the non-hubei part of china, the plateau phase of the outbreak of the -ncov is expected this march or at the end of this february. in the non-hubei region of china it is expected that the epidemic will cease on the th of march, and following this date no new confirmed patients are expected. predictions of the time of the inflection points and of the maximum nacp for some important regions may also be obtained. a specific plan for the prevention measures of the -ncov outbreak must be implemented. this will involve the present return to work and resumption of production in china. based on the presented results, i suggest that the rigorous isolation policy of the government should continue to be executed during daily life and work duties. 
moreover, as many confirmed and suspected cases as possible should be admitted for diagnosis or treatment. it has been suggested that coronaviruses are threats to human life. this type of virus, which was discovered and characterized in , is broadly distributed in mammals and birds. in humans, the majority of the coronaviruses cause only mild respiratory infections and a limited number, such as the ''severe acute respiratory syndrome'' (sars) in china (liu et al. ) and the ''middle east respiratory syndrome'' (mers) in saudi arabia and south korea, have caused more than , cumulative infected cases in the past decades. although several coronaviruses have been identified and characterized, additional unknown coronaviruses that are potential threats are yet to be discovered. in december , pneumonia cases of unknown cause emerged in wuhan, the capital of hubei province and one of the largest cities in the central part of china. although the majority of the patients were cured, the disease led to respiratory failure and a few fatalities. this outbreak of pneumonia attracted significant attention around the world. the causative agent identified by the chinese authorities was designated novel coronavirus ( -ncov) by the world health organization (who) on january . on january , the chinese government classified the novel -ncov as a class a agent. a series of non-pharmaceutical interventions were implemented, namely, isolation of symptomatic persons, strict restriction of travel in hubei province and shutdown of public transport in various cities. although the number of the accumulative confirmed patients (nacp) in the non-hubei chinese regions has decreased continually for days, the effectiveness and efficiency of these interventions is questionable. in addition, when the viral infection reaches its plateau phase affects several factors, including financial costs and absence from work. therefore, it is crucial to contain the outbreak of the -ncov. 
so far, there are nearly , confirmed cases in wuhan and more than , confirmed cases in china, whereas several exported cases have been confirmed in other countries including japan, south korea, singapore, usa, canada, germany, france, uk and spain. mathematical models were employed to investigate the viral outbreak and interesting results were obtained. the mathematical modeling of the n-cov outbreak has been previously investigated (tang et al. ; rabajante ; imai et al. ; liu et al. ; fanelli and piazza ; peng et al. ; toda ; sameni ) . the cause of the outbreak has been reported in previous studies (chu et al. ; khan et al. a, b) . its clinical characteristics and laboratory test results were also studied (qian et al. ; world health organization ) . its treatment and prognosis were presented in two recent studies (chen and du ; chai et al. ). its containment strategy has also been discussed (bittihn and golestanian ; hu et al. ) . the prediction of the tendency of -ncov, and notably of the nacp and of the plateau phase, is of great importance at present. these goals were achieved by fitting the data of the nacp in these regions. the -ncov outbreak could not be modeled accurately from first principles because knowledge of its causes, transmission mechanisms, the effect of control policies, treatment strategies and damage is limited. the mechanism of -ncov infection remains unclear, although it has been studied by several scientists. however, the data themselves carry much information about the outbreak and can reveal many of its characteristics. therefore, these data were fitted in order to predict the course of the outbreak. two significant features of my novel fitting method are as follows: • the first novel idea is the choice of the data of the nacp in different regions. for the regions of china outside hubei province, i.e., the non-hubei part, the medical conditions are sufficient and the isolation policy is well executed. 
all the confirmed patients can be admitted and receive treatment, and the suspected cases can be admitted for diagnosis and further treatment. therefore, the outbreak of the -ncov is just a general epidemic. for hubei province, which includes wuhan city, several cabin hospitals and the leishenshan hospital were brought into use following the completion of the huoshenshan hospital. the majority of the confirmed patients can be admitted for treatment and most suspected patients can also be admitted for diagnosis and subsequent treatment. therefore, the outbreak of the -ncov in wuhan city and hubei province is considered a general epidemic and can be treated as such. the data were collected from the non-hubei part of china from approximately january , i.e., from the beginning of the policy of isolation to the present day. the data were collected from hubei province and from wuhan city from february , i.e., from the date of the initial establishment of the huoshenshan hospital, to the present date. the data can be fitted to predict the nacp in the short term and to predict the onset of the plateau phases. • the selection of the basic functions for the fitting model is crucial for the success of the prediction. since in both of the above cases the -ncov is just an epidemic, it will initially spread rapidly and subsequently tend to disappear. therefore, the numbers of the cumulative confirmed patients in different cities, provinces and geographical locations will initially increase rapidly and subsequently remain constant on reaching the plateau phase of the viral infection. in the present study, the hyperbolic tangent functions tanh(·) were used because they satisfy the aforementioned conditions. therefore, the hyperbolic tangent function tanh(·) was set as the basic function for fitting. this choice laid the foundation for the success of the fitting model and of the prediction of the -ncov infection. 
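to make the role of the hyperbolic tangent concrete, the following python sketch fits a tanh-shaped curve to a cumulative case series. the exact fitting function used in this article is not recoverable from the text above, so the form n(t) = a·tanh(b·(t − c)) + d, the grid search over b and c, and all names and numbers below are assumptions for illustration only; the data are synthetic, not the reported nacp.

```python
import math


def nacp_model(t, a, b, c, d):
    """Assumed curve: rapid growth near t = c, then a plateau at a + d."""
    return a * math.tanh(b * (t - c)) + d


def fit_tanh(days, counts, b_grid, c_grid):
    """Grid-search b and c; for each pair solve a and d by least squares.

    For fixed (b, c) the model is linear in a and d, so the best a and d
    follow from simple linear regression of counts on tanh(b * (t - c)).
    """
    n = len(days)
    best = None
    for b in b_grid:
        for c in c_grid:
            x = [math.tanh(b * (t - c)) for t in days]
            mx, my = sum(x) / n, sum(counts) / n
            sxx = sum((xi - mx) ** 2 for xi in x)
            if sxx == 0:
                continue  # degenerate regressor, skip this pair
            a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, counts)) / sxx
            d = my - a * mx
            sse = sum((a * xi + d - yi) ** 2 for xi, yi in zip(x, counts))
            if best is None or sse < best[0]:
                best = (sse, a, b, c, d)
    _, a, b, c, d = best
    return a, b, c, d


# Synthetic series generated from the assumed model (not real case counts).
days = list(range(30))
counts = [nacp_model(t, 500.0, 0.2, 10, 500.0) for t in days]
a, b, c, d = fit_tanh(days, counts, b_grid=[0.1, 0.2, 0.3], c_grid=[5, 10, 15])
plateau = a + d       # limiting cumulative count as t grows
inflection_day = c    # day of fastest growth under this model
```

under this assumed form, the plateau and the inflection point can be read directly from the fitted parameters, which mirrors the two quantities the article sets out to predict.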
by this novel method, two significant results were obtained under the condition that the rigorous isolation policy is executed continually. first, the numbers of the accumulative confirmed patients in different cities, provinces and geographical locations in china were predicted very accurately in the short term. second, the times of the plateau phases in different places were determined in the long term. generally, in the non-hubei part of china, the nacp of -ncov will tend to a constant from approximately february and will theoretically reach its maximum by march . following this date, no additional infected patient is expected to be diagnosed. based on the present results, it is suggested that the rigorous isolation policy of the government should be executed continually. the remaining part of this article is organized as follows: in sect. , the novel fitting method for the outbreak of -ncov in china is proposed, and the selection of the data and basic functions is presented. the validation of this novel method on the data derived from the sars infection in in china mainland and hongkong is presented in sect. . the results of the prediction of nacp and of the plateau phase, as well as of the ips of -ncov in china, are presented in sect. . finally, i present some concluding remarks in sect. . first, i will present the novel fitting method for the prediction of nacp and the plateau phase of -ncov in china. the success of the fitting or prediction depends on the selection of the data, the basic functions for this fitting and the fitting method. i will describe all three components in this section. the data of the nacp must correspond to the epidemic characteristics, i.e., they must fit the epidemic pattern. all the confirmed patients must be collected for treatment and the suspected patients can be collected for diagnosis. 
the effective treatment of the infected patients and the efficient diagnosis of the suspected cases should be ensured. this is the basis of the principle to which the government adheres regarding as many as possible the confirmed and suspected should be collected to diagnose and treat. with regard to the chinese regions outside of hubei province, i.e., the non-hubei part of china, the medical conditions are sufficient to treat the infected cases and ultimately contain the spread of the virus. therefore the confirmed patients can be collected for the appropriate treatment and the suspected patients can be collected for diagnosis and further treatment, so that the outbreak of the -ncov in this part will be considered as a general epidemic. therefore the data from the st of january , which was the beginning of the strict isolation policy in the non-hubei region were used for fitting. the data of the nacp in cities and provinces with major viral outbreak are presented in tables and . all data are collected from the official websites of the health commissions in these regions. the hubei province, which includes wuhan city, was thoroughly assessed. following the establishment of the huoshenshan hospital, the cases reported in several cabin hospitals and subsequently in the leishenshan hospital were examined. the majority of the confirmed patients were collected for successful treatment, and most of the suspected patients were collected for diagnosis and further treatment. therefore, the outbreak of the -ncov in wuhan city and even hubei province is considered a general epidemic, suggesting that it should be treated as such. the data from wuhan city and hubei province were collected from the establishment of the huoshenshan hospital to the present date to ensure that all or the majority of the confirmed and suspected cases could be collected for diagnosis and treatment. the data of the nacp in wuhan and hubei province are presented in table . 
all the data were collected from the official websites of the health commissions in hubei province and wuhan city. a notable change occurred on february . from that date, the numbers of the cumulative clinically confirmed patients were added to those of the cumulative confirmed patients in hubei province and in wuhan city. more than , cases of this type were added into the data collected by february the th. therefore, the reported data changed from the restrictive confirmation standard to the relaxed confirmation standard. however, the data used here were collected based on the old restrictive confirmation standard. in future studies, the data should be fitted under the new relaxed confirmation standard of hubei province and wuhan city in order to predict the number of the accumulative confirmed patients and the plateau phase of the -ncov infection. i will fit these data with the following novel basic functions. usually, the power functions 1, x, x^2, x^3, x^4, . . . are used as the basic functions for fitting data. however, the data of infectious diseases require a different set of basic functions. although power functions can fit the data, the fitted curve usually fluctuates, and consequently a tendency such as the plateau phase cannot be predicted. therefore, the selection of the basic functions for this fitting is crucial. infectious diseases are characterized by the numbers of the cumulative confirmed patients, which initially increase rapidly and finally stabilize. at that stage no new confirmed patient appears. following isolation and treatment, the majority of the confirmed patients will recover and a limited number will not survive the infection. at last, the infectious disease is controlled and eradicated. it is well known that the hyperbolic tangent functions tanh(·) exhibit two properties: an initial rapid increase and a final phase in which they are nearly constant. 
based on these two properties, the functions 1, tanh( . x), tanh( . x), tanh( . x), . . . or 1, tanh( . x), tanh( . x), tanh( . x), . . . are used as the basic functions for fitting the data of the nacp in these regions. the accuracy of the fitting by this novel method is excellent: not only can it predict the nacp in these regions on the next day with very small relative errors, but it can also plot the evolution curves of the nacp in the long run and can perhaps be used to estimate the day when the plateau phase begins. first, the novel fitting method was used for estimation of the nacp. the number of infected cases per day can be predicted in some regions by the novel fitting method and the fitting function. a plot can then be constructed and the days required to reach the plateau phase can be estimated. for example, in nanyang city, a city in henan province close to wuhan city with a serious outbreak, the restrictive isolation was executed from january to february (tables , ). the data were fitted using the following basic functions: 1, tanh( . x), tanh( . x), tanh( . x), . . .. the fitting function was obtained as follows: f(x) = . + . tanh( . x) − . tanh( . x) + . tanh( . x) − . tanh( . x). the predicted number of cases on february was f( ) = . ≈ . the actual number was and the relative error was − . %. the fitting results are shown in fig. a . the fitting was excellent. the prediction, which is described by the evolution curve, is displayed in fig. b . evidently the curve will be constant from february . shanghai is a very important international city in china, and its nacp from january to february (tables , ), following execution of the restrictive isolation, could be fitted with the basic functions. the number estimated for february , was f( ) = . ≈ . the actual number was and the relative error was − . %. the fitting result is shown in fig. a . the fitting was excellent. the prediction, i.e., the evolution curve, is displayed in fig. b . 
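the fitting step described above amounts to linear least squares on a constant term plus a few tanh terms with fixed scale factors. the following is a minimal sketch under stated assumptions, not the author's implementation: the scale factors in `SCALES`, the synthetic counts, the function names, and the sign convention of `relative_error` are all hypothetical, since the values used in the paper are not recoverable from this text.

```python
import math

# hypothetical scale factors; the values used in the paper are not recoverable here
SCALES = (0.1, 0.2)


def basis_row(x, scales=SCALES):
    """basis values [1, tanh(s1*x), tanh(s2*x), ...] at day index x."""
    return [1.0] + [math.tanh(s * x) for s in scales]


def solve(a, b):
    """solve the square system a c = b by gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    c = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (m[i][n] - tail) / m[i][i]
    return c


def fit(days, counts, scales=SCALES):
    """least-squares coefficients of counts ~ c0 + sum_k c_k * tanh(scales[k] * day)."""
    rows = [basis_row(x, scales) for x in days]
    p = len(rows[0])
    # normal equations: (A^T A) c = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * y for r, y in zip(rows, counts)) for i in range(p)]
    return solve(ata, aty)


def predict(x, coeffs, scales=SCALES):
    """value of the fitted curve at day index x."""
    return sum(c * b for c, b in zip(coeffs, basis_row(x, scales)))


def relative_error(predicted, actual):
    # sign convention assumed here: (predicted - actual) / actual
    return (predicted - actual) / actual


# synthetic cumulative counts generated from known coefficients (not real data)
true_coeffs = [20.0, 300.0, -80.0]
days = list(range(1, 21))
counts = [predict(x, true_coeffs) for x in days]
coeffs = fit(days, counts)
next_day = predict(21, coeffs)  # one-day-ahead prediction
```

because the scale factors are held fixed, the model is linear in its coefficients, which is why an ordinary least-squares solve suffices; choosing the scale factors themselves is a separate decision that this sketch does not address.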
the nacp in shanghai is expected to tend to be constant from approximately february , . another example is the non-hubei part of china. the nacp in that region was reported from january to february (tables , ), when the strict isolation was executed. the data could be fitted with the basic functions as follows: 1, tanh( . x), tanh( . x), tanh( . x), . . .. the following formula was obtained: f(x) = . + . tanh( . x) − . tanh( . x) + . tanh( . x) − . tanh( . x) + . tanh( . x) − . tanh( . x). the number estimated on february was f( ) = . ≈ , . the actual number was , , and the relative error was . %. the fitting result is shown in fig. a . the fitting was excellent. the prediction, i.e., the evolution curve, is displayed in fig. b . this curve is expected to stabilize from the th of february . all fittings and the evolution curves for the different regions are presented in figs. , , , , , , , and . the prediction for the next day, i.e., the th of february , is obtained similarly. the actual number on the th of february was added to the old data, and the enlarged data set was fitted with the same method and the above basic functions. the fitting function was obtained, and from it the number of infected cases on february . the fitting results and the evolution curves were obtained in the same way. the prediction of nacp on consecutive days is presented in tables and . first, this novel fitting method was applied to the fitting and prediction of the data of sars in china mainland and hongkong in to assess the effectiveness of the method. the results are presented in fig. . the fitting was excellent and could predict the evolution of sars in china mainland and in hongkong in both qualitatively and quantitatively. the data of sars in china mainland were obtained from the official website of who (world health organization) from april to july (https://www.who.int/csr/sars/country/en/). 
the data from the hongkong were obtained from the website from the march to the st of july (https://www.who.int/csr/sars/country/en/). the data implied that the novel fitting method is excellent for sars in and perhaps valid and effective for the -ncov. in the next section, this fitting method was employed to the prediction of nacp and the plateau phase of -ncov in china. in this subsection, the prediction of nacp will be presented in the continuous days at different regions in tables and . all fittings and the evolution curves for different regions are represented in figs. , , , , , , , and . in this subsection, the inflection points (ip) in different chinese regions are presented. ip stands for the maximum of nacp that was achieved. after this point, the infectious disease will be controlled and the majority of the confirmed patients will recover, whereas a low number of cases will succumb to the disease. the prediction of this point is of great importance to the inhibition of the -ncov outbreak. in the present study, the prediction of ips in the -ncov outbreak in different regions of china is presented (figs. , , and tables , , , ) . these regions are usually important cities or the main affected regions around hubei province. it is of great importance to investigate the date of the -ncov termination. it is very helpful for the prevention of -ncov outbreak to construct a plan for their daily life, returning to work and resuming production by the government. it can be deduced that the cities in the mainland china outside of hubei province should be able to control the outbreak of -ncov before march and after this date no confirmed patient should be reported. the nearby regions, especially the severe outbreak regions, such as henan, hunan, jiangxi and anhui provinces, the important industrial and financial regions, such as guangdong, zhejiang, jiangsu provinces and the important international figs. , , and tables , , , . 
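once a tanh-based curve has been fitted, both the limiting level of the nacp and an approximate plateau date can be read off it. the sketch below illustrates one way to do so; the coefficients, scale factors, tolerance and function names are hypothetical, and the rule "all later daily increments below a tolerance" is an assumption of this sketch, not a definition taken from the paper.

```python
import math

# hypothetical coefficients and scale factors, for illustration only
COEFFS = (50.0, 900.0, -150.0)
SCALES = (0.08, 0.16)


def curve(x):
    """fitted curve c0 + c1*tanh(s1*x) + c2*tanh(s2*x)."""
    return COEFFS[0] + sum(c * math.tanh(s * x) for c, s in zip(COEFFS[1:], SCALES))


def limiting_level(coeffs=COEFFS):
    """value approached for large x: every tanh term tends to 1 when its scale is positive."""
    return sum(coeffs)


def plateau_day(f, max_day, tol=0.5):
    """first day from which every daily increment up to max_day stays below tol."""
    day = None
    for d in range(max_day - 1, -1, -1):
        if abs(f(d + 1) - f(d)) < tol:
            day = d
        else:
            break
    return day


day = plateau_day(curve, max_day=120)
level = limiting_level()
```

the limiting level follows directly from the coefficients, so no long-run simulation is needed for it; the plateau day, by contrast, depends on the chosen tolerance and should be read as approximate.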
since the medical conditions and curative efficiency are being improved, the nacp of hubei province and wuhan city cannot represent the real number of the infected subjects. thus, the nacp or ip cannot be predicted effectively. nevertheless, in this paper, i have tried to predict the nacps and ips for wuhan city and hubei province. the results for the nacps are also excellent and are presented in tables , and . the results for ips in these two regions are presented in fig. (y-z) and table . provided that the rigorous isolation policy is executed continually, the results of the present study are very significant. firstly, the cumulative numbers of confirmed patients in different regions of china can be predicted very accurately. notably, hubei province exhibited the smallest relative errors ( . %), followed by wuhan city ( . %). in addition, the non-hubei chinese region exhibited larger relative errors (− . %) in the short-term period of infection. moreover, it is possible to predict the time points at which the plateau phases begin in different regions in the long run. it was generally shown for the non-hubei chinese region that the nacp of the -ncov infection would tend to a constant from february . this evidence is considered very important for the fight against the -ncov outbreak in china. recently, the number of imported confirmed patients has been increasing in international cities such as beijing, shanghai and guangzhou, since more and more chinese people are returning from other countries that suffer from a worse outbreak of -ncov. since the numbers of people returning from these countries each day and the rates of infection in these countries differ, the number of imported confirmed patients is stochastic. this situation is not similar to that of china mainland and is unrelated to the outbreak of -ncov in china mainland. 
i can not predict it accurately by the present method in this paper. i will study it in the separate paper in the future. it is a challenge of the fighting against the -ncov in china. based on these results, it is suggested that not only the rigorous isolation policy by the government should be executed continually, but also the concomitant diagnosis and treatment of-as many as possible-confirmed and suspected cases should be facilitated. this will speed up the conquer of the viral outbreak. in the present study, the novel fitting method was employed to predict the nacp and the plateau phase of the -ncov infection in different regions of china. the data were collected during different time periods of infection occurring in different regions of china to ensure that as many as possible confirmed and suspected cases could be collected for diagnosis or treatment. the hyperbolic tangent functions were used as the basic functions for the fitting method. two significant results were obtained as follows: firstly, the nacp could be predicted very accurately in different regions of china, notably in wuhan city and hubei province with very small relative errors. in the non-hubei region of china larger relative errors were noted in the short-term period of infection. secondly, the time point at which the plateau phases occur can be predicted in different regions in the long-run period of infection. generally for the non-hubei chinese regions, the plateau phase of -ncov was noted at approximately march and after this time period no new confirmed patients were identified. the predictions of the time of inflection points (ips) and maximum nacp for certain important regions were also presented. these measures are very important for the prevention of the -ncov outbreak and for returning to work and resuming production in their daily life and work duties in china. 
based on these results, it is suggested that the rigorous isolation policy should be executed by the containment strategy for an epidemic based on fluctuations in the sir model specific ace expression in cholangiocytes may cause liver damage after -ncov infection a time delay dynamical model for outbreak of -ncov and the parameter identification potential natural compounds for preventing -ncov infection molecular diagnosis of a novel coronavirus ( -ncov) causing an outbreak of pneumonia analysis and forecast of covid- spreading in china forecasting and evaluating intervention of covid- in the world estimating the potential total number of novel coronavirus cases in wuhan city novel coronavirus is putting the whole world on alert studies on mathematical models for sars outbreak prediction and warning predicting the cumulative number of cases for the covid- epidemic in china from early data epidemic analysis of covid- in china by dynamical modeling clinical characteristics of novel infected coronavirus pneumonia: a systemic review and meta-analysis rabajante fj ( ) insights from early mathematical models of -ncov acute respiratory disease (covid- ) dynamics mathematical modeling of epidemic diseases estimation of the transmission risk of the -ncov and its implication for public health interventions susceptible-infected-recovered (sir) dynamics of covid- and economic impact world health organization ( ) laboratory testing of novel coronavirus ( -ncov?) in suspected human cases: interim guidance acknowledgements the author would like to acknowledge the financial support for this research via the nnsf of china under the grant nos. , and . the author also thank prof. rubin wang, editor-in-chief of cognitive neuro-dynamics, for his sufficient discussions, valuable suggestions and kind help. all best wishes for all citizens in hubei province, esp. wuhan city, and the doctors, nurses, scientists and plas there from all over china helping them. 
key: cord- - s wniq authors: lv, boyan; li, zhongyan; chen, yajuan; long, cheng; fu, xinmiao title: global covid- fatality analysis reveals hubei-like countries potentially with severe outbreaks date: - - journal: j infect doi: . /j.jinf. . . sha: doc_id: cord_uid: s wniq • cfr in iran in the early stage of the outbreak is the highest among all the countries. • cfrs in the usa and italy are similar to that of hubei province in the early stage. • cfrs in south korea are similar to those outside hubei, indicating less severity. • our findings highlight the severity of outbreaks globally, particularly in the usa. the outbreak of novel coronavirus disease (covid- ) is ongoing in china, where it appears to have reached a late stage, and is just starting to devastate other countries. as of march , there have been confirmed covid- cases and deaths in china, much higher than those outside china, with confirmed cases and deaths. however, the daily increase in covid- cases outside china has greatly surpassed that inside china (over versus on march), and this has raised deep concerns about the outbreaks outside china. here we attempted to uncover their characteristics by comparative analysis of crude fatality ratios (cfrs). we collected the officially released cumulative numbers of confirmed cases and deaths (from january to march ) for mainland china, the epicenter of the outbreak (i.e., hubei province and wuhan city), outside hubei (in china) and outside wuhan (in hubei), as well as for typical countries reporting a substantial number of deaths, including south korea, japan, iran, italy, usa, france and spain ( fig. ) . cfrs in hubei and wuhan are significantly higher than those outside hubei and outside wuhan, and they are relatively higher in the early stage of outbreaks than in the late stage ( fig. a) , in line with earlier comprehensive reports by china cdc and who. the outbreaks outside china are overall lagging approximately one month behind china ( fig. b vs fig. a) . 
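the crude fatality ratio compared above is the cumulative number of deaths divided by the cumulative number of confirmed cases, evaluated day by day. a minimal sketch follows; the counts and function names are hypothetical and serve only to illustrate the calculation.

```python
from statistics import mean, stdev


def crude_fatality_ratio(deaths, confirmed):
    """cumulative deaths divided by cumulative confirmed cases."""
    if confirmed <= 0:
        raise ValueError("confirmed cases must be positive")
    return deaths / confirmed


def cfr_series(cum_deaths, cum_confirmed):
    """daily cfr values from two aligned cumulative series."""
    return [crude_fatality_ratio(d, c) for d, c in zip(cum_deaths, cum_confirmed)]


# hypothetical cumulative counts over five consecutive days (not real data)
cum_confirmed = [100, 200, 400, 500, 800]
cum_deaths = [2, 5, 12, 20, 40]

series = cfr_series(cum_deaths, cum_confirmed)
summary = (mean(series), stdev(series))  # mean and sample standard deviation over the window
```

because both series are cumulative, a shortfall in testing lowers the denominator and raises the ratio even when the number of deaths is unchanged.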
cfr in iran in the early stage (from february to late march) is extremely high while cfr in korea is low and stable over time. notably, cfr in iran has significantly decreased since march while cfr in italy increased a lot in the past days. cfrs in a period of -day, i.e., from march to february for china and other specific periods for countries outside china (for detail, refer to s .xls file), were plotted as mean ±sd at % confidence intervals (in the black box), with median being shown as short lines. statistics were performed using spss with anova algorithm, and significance levels ( p value) for all the pairs are shown in table s . p values larger than . between wuhan/hubei and other countries are colored in red, indicating no significant difference (i.e., somehow being similar to each other) and the relative severity of the epidemic therein; p value between outside hubei and south korea is . (colored in blue), indicating relatively mild or controllable epidemic in south korea. next, we performed comparative statistical analysis on cfrs in a period of days in the early stage of outbreaks between outside china and china. in particular, two periods were set for iran and italy in order to fully cover their changing trends (for detail, refer to s .xls file). results displayed in fig. c revealed i) cfrs in iran, italy and usa in the past ten days are not significantly different from hubei ( p being . , . and . , respectively); ii) cfr in usa is not significantly different from wuhan to marginal degree ( p being . ); iii) cfr in iran from february to march is significantly different from any regions of china (p < . ; table s ). in view of the detailed p values among all pairs (table s ), we suppose the ranking for the severity of covid- outbreaks in different countries/regions in terms of cfrs as follows: iran > wuhan > hubei ≈usa ≈italy > outside wuhan ≈spain ≈japan ≈france > south korea ≈outside hubei. 
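the pairwise comparisons above rest on an analysis of variance of daily cfr values. the study used spss; the sketch below only shows how the one-way anova f statistic is formed from between-group and within-group variation. the group values and the function name are hypothetical, and the p value, which requires the tail of the f distribution from a statistics package, is not computed here.

```python
def one_way_anova_f(groups):
    """return (f statistic, between-group df, within-group df) for a one-way anova."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # variation of the observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# hypothetical daily cfr values (in percent) for three regions, not real data
region_a = [1.0, 2.0, 3.0]
region_b = [2.0, 3.0, 4.0]
region_c = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([region_a, region_b, region_c])
```

a large f statistic means that the regions differ more from each other than the days within a region differ among themselves; a p value above the chosen threshold is then read as no significant difference between the regions.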
as cfr is defined as the number of deaths (numerator) among the number of confirmed cases (denominator), both an increase of the numerator and a decrease of the denominator lead to a higher cfr. in hubei/wuhan there were neither sufficient covid- test kits for infection identification nor enough beds in hospitals for effective treatment of patients in the early stage of the outbreak. these shortages led to numerous transmissions in households, reduced the apparent number of cumulative confirmed cases and caused mild patients without treatment to become severe/critical ones and even die, as implicated by earlier reports. as such, cfrs in hubei/wuhan were relatively high in the early stage. the similar cfrs of hubei and usa/italy suggest that these countries may at present face situations similar to those hubei experienced before. in support of this, recent news reports show that italy is extremely short of medical resources (beds and acute care equipment) while usa has some problems in covid- testing capacity. in iran, these problems might be even more severe, such that its cfr is extremely high. to fight against the covid- outbreaks in these hubei/wuhan-like countries, governments may need to implement control measures and supply medical resources in a timely manner, as hubei/wuhan had done in the past month. , emergence of a novel coronavirus causing respiratory illness from wuhan who: coronavirus disease (covid- ) outbreak . available from who: coronavirus disease (covid- ) situation reports report of the who-china joint mission on coronavirus disease the novel coronavirus pneumonia emergency response epidemiology team. 
vital surveillances: the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- )-china corona virus disease , a growing threat to children potential association between covid- mortality and healthcare resource availability cnn: the us is starting to look like italy on coronavirus lockdown this work is supported by the national natural science foundation of china (no. and to xf). the authors declare no conflict of interest. supplementary material associated with this article can be found, in the online version, at doi: . /j.jinf. . . . key: cord- -ucwuptgg authors: fang, h.; wang, l.; yang, y. title: human mobility restrictions and the spread of the novel coronavirus ( -ncov) in china date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: ucwuptgg we quantify the causal impact of human mobility restrictions, particularly the lockdown of the city of wuhan on january , , on the containment and delay of the spread of the novel coronavirus ( -ncov). we employ a set of difference-in-differences (did) estimations to disentangle the lockdown effect on human mobility reductions from other confounding effects including panic effect, virus effect, and the spring festival effect. we find that the lockdown of wuhan reduced inflow into wuhan by . %, outflows from wuhan by . %, and within-wuhan movements by . %. we also estimate the dynamic effects of up to lagged population inflows from wuhan and other hubei cities, the epicenter of the -ncov outbreak, on the destination cities' new infection cases. we find, using simulations with these estimates, that the lockdown of the city of wuhan on january , contributed significantly to reducing the total infection cases outside of wuhan, even with the social distancing measures later imposed by other cities. we find that the covid- cases would be . % higher in the chinese cities outside hubei province, and . 
% higher in the non-wuhan cities inside hubei, in the counterfactual world in which the city of wuhan were not locked down from january , . we also find that there were substantial undocumented infection cases in the early days of the -ncov outbreak in wuhan and other cities of hubei province, but over time, the gap between the officially reported cases and our estimated "actual" cases narrows significantly. we also find evidence that enhanced social distancing policies in the chinese cities outside hubei province are effective in reducing the impact of population inflows from the epicenter cities in hubei province on the spread of -ncov virus in the destination cities elsewhere. human mobility contributes to the transmission of infectious diseases that pose serious threats to global health. indeed, in response to pandemic threats many countries consider and impose measures that restrict human mobility flows as one of their response plans (bajardi et al., ; wang and taylor, ; charu et al., ) . however, restrictions on human mobility are controversial not only because of their negative economic impacts, but also because of the uncertainty about their effectiveness in controlling the epidemic. even if restricting human movement could lead to improvements in disease control and reductions in health risks, it is empirically challenging to quantify the impact of human mobility on the spread of infectious diseases, and to understand the detailed spatial patterns of how the infectious disease spreads. both granular disease occurrence data and human mobility data (charu et al., ) are hard to obtain; moreover, it is difficult to disentangle the impact of human mobility from other potential contributing factors in the spread of epidemics hollingsworth et al., ) . 
in this paper, we exploit the exogenous variations in human mobility created by lockdowns of chinese cities during the outbreak of the novel coronavirus ( -ncov), and utilize a variety of high-quality data sets, to study the effectiveness of an unprecedented cordon sanitaire of the epicenter of covid- , and provide a comprehensive analysis on the role of human mobility restrictions in the delaying and the halting of the spread of the covid- pandemic. the fast-moving -ncov that infected , people and claimed , lives as of march , is deteriorating into one of the worst global pandemics. the virus emerged in the city of wuhan in the hubei province of china in early december of , spread mainly through person-to-person contact (chan et al., ) , and rapidly reached more than countries as of march , . currently, there are no licensed vaccines or specific therapeutics to combat covid- . the lockdown of million people in wuhan from january , represents by then the largest quarantine in public health history, and offers us an opportunity to rigorously examine the effects of the city lockdown and understand the relationship between human mobility and virus transmission. specifically, this paper studies five research questions. first, how does the lockdown of the city of wuhan amid the novel coronavirus outbreak affect population movement? second, how do population flows among chinese cities, particularly outflows from wuhan and other cities in hubei province, affect virus infection in the destination cities? third, is there evidence of, and if so, what is the magnitude of, undocumented cases of covid- cases in wuhan and other cities in hubei province during the early stages of the epidemic? and how does the extent of undocumented infection cases evolve over time? fourth, how many covid- cases elsewhere in china were prevented by the unprecedented wuhan lockdown? fifth, are social distancing policies in destination cities effective in reducing the spread of the infections? 
we utilize reliable datasets on population migration among pairs of chinese cities and the within-city population movements of each city at the daily level from baidu migration, and the city-level daily numbers of confirmed covid- cases, recovered patients, and death tolls from the chinese center for disease control and prevention (ccdc) during a sample period of january -february , , covering days before and days after the city lockdown on january , , as well as the matched data from the same lunar calendar period in . we first employ various difference-in-differences (did) estimation strategies to disentangle the effect of the wuhan lockdown on human mobility reductions from other confounding effects including panic effect, virus effect, and the spring festival effect; the spring festival of the chinese new year is on january , (table ) . we find that the lockdown of wuhan reduced inflow into wuhan by . %, outflows from wuhan by . %, and within-wuhan movements by . %. we also estimate the dynamic effects of up to lagged population inflows from wuhan and other hubei cities, the epicenter of the -ncov outbreak, on the destination cities' new infection cases ( figure ). we discover that the estimated effects of the different lags of inflows from wuhan and hubei both show a clear inverted u-shape with respect to the lags, with the largest impact on today's newly confirmed cases coming from the population inflow from wuhan or other cities in hubei about to days earlier. we find, using simulations with these estimates, that the lockdown of the city of wuhan on january , contributed significantly to reducing the total infection cases outside of wuhan, even with the social distancing measures later imposed by other cities. we find that the covid- cases would be . % higher in the chinese cities outside hubei province, and . % higher in non-wuhan cities inside hubei, in the counterfactual world in which the city of wuhan were not locked down from january , . 
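the structure of a distributed lag effect can be illustrated by applying a set of lag coefficients to a daily inflow series: each day's predicted new cases is a weighted sum of the inflows on earlier days. the sketch below is not the estimation used in the study; the lag weights, the inflow series and the function name are hypothetical, and the weights are simply chosen to have an inverted u-shape.

```python
def distributed_lag(inflows, betas):
    """predicted[t] = sum over lag l = 1..len(betas) of betas[l-1] * inflows[t-l]."""
    predicted = []
    for t in range(len(inflows)):
        total = 0.0
        for lag, beta in enumerate(betas, start=1):
            if t - lag >= 0:  # use only lags that fall inside the series
                total += beta * inflows[t - lag]
        predicted.append(total)
    return predicted


# hypothetical inflow index: constant for three days, then cut to zero by a lockdown
inflows = [10.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0]
# hypothetical lag weights with an inverted u-shape (largest effect at lag 2)
betas = [0.1, 0.3, 0.1]

predicted_cases = distributed_lag(inflows, betas)
```

in this toy series the predicted cases keep rising for a day after the inflow stops and only then decline, which is the delayed response that lagged effects imply.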
we also find that there were substantial undocumented infection cases in the early days of the -ncov outbreak in wuhan and other cities of hubei province, but over time, the gap between the officially reported cases and our estimated "actual" cases narrows significantly. we also find evidence that imposing enhanced social distancing policies in the chinese cities outside hubei province is effective in reducing the impact of population inflows from the epicenter cities in hubei province on the spread of -ncov virus in the destination cities elsewhere. by providing a rigorous estimation of the impact of within and cross-city migration on the spread of the -ncov virus in china, our study contributes to fast-growing literature on -ncov infection, mostly in the medical and public health fields. huang et al. ( ) describe high rates of respiratory distress, intensive care admission, and abnormal findings on chest computed tomography (ct) in the first patients hospitalized from december to january in wuhan, as well as a % death rate. chan et al. ( ) investigate a family cluster and confirm the human-to-human transmission of this novel coronavirus within hospitals and families. the basic reproduction number (r ) for covid- in studies has a mean of . and a median of . (liu et al., ) (compared to for sars (wto, ) ). this study is also related to disaster-induced migration, which has often occurred during flooding (gray and mueller, ) , drought (munshi, ) , earthquake (lu et al., ) , and other destructive climatic phenomena. given that the -ncov has rapidly spread worldwide due to human travel and caused severe illness and significant mortality, it is therefore essential to understand the impact of various control measures on human mobility and the virus transmission. qiu et al. 
( ) apply machine learning tools and use exogenous temperature, wind speed, and precipitation in the preceding third and fourth weeks as the instruments to show that the massive lockdown and other control measures significantly reduced the virus transmission. their results highlight that the population outflow from the outbreak source city poses higher risks to the destination cities than other social and economic factors, such as geographic proximity and similarity in economic conditions. our results are also in line with the results of the latest modeling exercises, which mostly rely on model calibrations of various parameters, such as generation time, incubation period, detection rates, and changes in travel flow. using the global epidemic and mobility model, chinazzi et al. ( ) project the impact of travel limitation on the national and international spread of the -ncov. the study finds that only . % of infected cases were reported as of february , and the wuhan lockdown reduced the cases by % in cities outside wuhan by january . the study most closely related to ours is li et al. ( ), which shows that contagious but undocumented covid- cases facilitated the geographic spread of the epidemic in china. using a networked dynamic meta-population model and bayesian inference, li et al. ( ) find that % of all infections were undocumented before the wuhan lockdown and that reported infections would have been reduced by . % in china without transmission from undocumented cases between january and january . lai et al. ( ) build a travel network-based susceptible-exposed-infectious-removed (seir) model to simulate the outbreak across cities in mainland china. they use epidemiological parameters estimated from the early stage of the outbreak in wuhan to parameterize the transmission before the non-pharmaceutical interventions (npi) were implemented.
the npis they consider include travel bans and restrictions, contact reductions and social distancing, and early case identification and isolation. through their simulations, they find that the npis deployed in china appear to be effectively containing the covid- outbreak, but the efficacy of the different interventions varied, with early case detection and contact reduction being the most effective. moreover, deploying the npis early is also important to prevent further spread. relative to our study, it is important to point out that in their simulations, they assumed that the pattern of population movements was the same in years when there were no outbreaks and interventions. to the best of our knowledge, this paper is the first to provide a causal interpretation of the impact of city lockdown on human mobility and the spread of -ncov, and to clearly disentangle the lockdown effects from other potential contributing factors such as the panic and virus effects, as well as the seasonal spring festival effect (see section ). the remainder of the paper is structured as follows. in section , we describe the data sets used in our analysis. in section , we present different difference-in-differences estimation strategies to separate the lockdown effect, panic effect, virus (deterrence) effect, and the spring festival effect on population movements in china. in section , we estimate the distributed lag effects of inflows from the epicenter cities of the -ncov outbreak on destination cities' daily infection cases. in section , we study how the enhanced social distancing policies, or "lockdown" policies, in the destination cities change the effects of the population inflows from the epicenter cities. in section , we conclude.

population migration data. we obtain inter-city population migration data from baidu migration, a travel map offered by the largest chinese search engine, baidu.
the baidu migration data set covers , pairs of cities per day for chinese cities between january and march in , and between january and february in . note that, by the lunar calendar, the data covers the same period of days before and days after the spring festivals, respectively for year and year . the daily inter-city migration data consist of , , city-pair observations each year. in addition, baidu provides the daily within-city mobility data for each city in the sample period, which is a panel consisting of , city-day level observations each year. the baidu migration data is based on real-time location records for every smart phone using the company's mapping app, and thus can precisely reflect the population movements between cities. for each of the cities, baidu migration provides the following information: (1) the top origination cities (oc) for the population moving into the city, and the corresponding percentages of the inflow population that originated from each of the top oc; (2) the top destination cities (dc) for the population moving out of the city, and the corresponding percentages of the outflow population that go into each of the top dc. in the data, the cumulative percentages of the inflow population from the top origination cities, and the cumulative percentages of the outflow population into the top destination cities, reach % per city on average. this ensures that the baidu migration data capture nearly complete inflows and outflows for each of the cities in the data. in addition, the baidu migration data provides three migration intensity indicators: the daily in-migration index (imi) of a city, the daily out-migration index (omi) of a city, and the daily within-city migration index (wcmi). the intensity indicators are consistent across cities and across time.
to convert the index to the number of people, we use the actual number of inflow population into shanghai by airplanes, trains and buses/cars, and the number of within-shanghai trips using subways, buses and expressways, collected by the national earth system science data center for the period of february to february , . using this data, we estimate that one index unit in the imi and omi corresponds to , person movements, and one index unit in wcmi corresponds to , , person movements. we are thus able to calculate the number of daily inflow and outflow migrants in each city-pair, as well as the number of within-city population movements. table presents the summary statistics of the population flows at the city-pair-day level and city-day level. it shows drastic declines in the average inflows, average outflows, and average within-city migration in , compared to a sample period in (matched by the lunar calendar date). the plummeting of the migration statistics due to the wuhan lockdown is also depicted in figure . the three figures on the top show the inflows into wuhan, outflows from wuhan and within-wuhan flows, for year (solid line) and year (dashed line), matched by the lunar calendar; and the bottom shows the corresponding figures for the national city averages. the first vertical line indicates the date of january , when experts confirmed that -ncov could transmit from human to human; and the second vertical line indicates the date of january , when wuhan was locked down. it is clear that while the flows in tracked that of well until january , the level dropped to a fraction of their corresponding levels, particularly after the lockdown of wuhan. 
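The index-to-head-count conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the conversion factor is elided in the text, so `HYPOTHETICAL_PERSONS_PER_UNIT` is a made-up placeholder, and the function names and the assumption that a city-pair flow equals the city's index times the reported percentage share are ours.

```python
def index_to_persons(index_value, persons_per_index_unit):
    """Convert a migration index value into an approximate number of person movements."""
    if persons_per_index_unit <= 0:
        raise ValueError("persons_per_index_unit must be positive")
    return index_value * persons_per_index_unit


def city_pair_flow(out_migration_index, share_percent, persons_per_index_unit):
    """Approximate daily movers from an origin city to one destination.

    Assumes the pair flow is the origin's total outflow (index times the
    conversion factor) multiplied by the destination's percentage share.
    """
    total_outflow = index_to_persons(out_migration_index, persons_per_index_unit)
    return total_outflow * share_percent / 100.0


# Placeholder only: the calibrated factor is not recoverable from the text.
HYPOTHETICAL_PERSONS_PER_UNIT = 50_000

# Toy inputs: an out-migration index of 4.0, with 12.5% of leavers going to one city.
example_flow = city_pair_flow(4.0, 12.5, HYPOTHETICAL_PERSONS_PER_UNIT)
```

Keeping the factor as an explicit argument makes it easy to use one value for the inter-city indices and another for the within-city index, as the text describes.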
[ table about here] [ figure about here] it is important to emphasize here that the mobility data is about the movement of people from one city to another based on geo-location services of the smartphones; as such a person flowing out of city a to city b is not necessarily a resident of city a, but he/she must have been to city a before moving to city b. . cc-by-nc-nd . international license it is made available under a author/funder, who has granted medrxiv a license to display the preprint in perpetuity. is the (which was not peer-reviewed) the copyright holder for this preprint . outbreak data. covid- daily case counts are collected from china cdc, which provides daily updates on confirmed, dead, and recovered covid- cases in each city. from january to february , , the data covers , laboratory-confirmed covid- cases, , death cases, and , recovered cases in cities in china. panel b of table presents the summary statistics of covid- data, and figure plots the trends of daily statistics of covid- separately for the epicenter city of wuhan, for other cities in hubei province, and for cities outside of hubei. there are many possible reasons for us to treat the officially reported numbers of confirmed cases in wuhan and other cities of hubei with more caution, and to treat them differently from the data of cities outside of hubei. as the epicenter of covid- , the health care systems in wuhan and other cities in hubei were overwhelmed by the sheer number of patients who needed laboratory testing, especially in the early phases of the virus outbreak. as such, the over-extended medical system in wuhan and other cities in hubei might have caused delays in the testing of the patients who contracted covid- ; and because of the delay, some patients who contracted covid- might have self-recovered, or might have died, before being officially tested; and some who were infected with the virus might be asymptomatic. 
there is also a possibility that government officials in the epicenter cities had incentives to downplay the severity of the outbreak, at least initially. these considerations impact how we use the outbreak data in section . the incubation period of -ncov is long in comparison with sars; moreover, the virus can transmit while the person is still asymptomatic, which increases the probability that a person with the novel coronavirus will travel and unknowingly spread the virus to others. to suppress the spread of -ncov, the central government of china imposed an unprecedented lockdown in wuhan starting from am of january , , and in other hubei cities several days later. as of february , , cities in provinces issued different levels of lockdown policies. table a in the appendix provides the detailed information about the various forms of population mobility control in different cities. we also plot the geographic distributions of sample cities and cases in figure .

source: http:// ncov.chinacdc.cn/ -ncov/

the spike of confirmed cases observed on february in hubei province is, for the most part, the result of a change in diagnosis classification for which , clinically (rather than laboratory) confirmed cases were all reported as new cases on february , , even though they might have been clinically diagnosed in the preceding weeks. also on february , , a new communist party secretary of hubei province was appointed who started his position on the next day.

the range of the incubation period for -ncov is estimated to be - days, or even as long as days. the median incubation period is about days, see lauer et al. ( ). in contrast, the incubation period for sars is - days. in addition, sars transmits only after showing symptoms.
[ figure about here] there are several confounding factors in our attempt to causally quantify the impact of lockdown on human mobility, and on the spread of infectious viruses. first, the virus outbreak happens right before the spring festival of the chinese lunar new year, which causes the largest annual human migration every year. second, the virus itself, even in the absence of a mandatory lockdown, may lead to curtailed human movement as people attempt to avoid exposure to the virus during journeys and in public spaces. we refer to this deterrence effect as the virus effect. third, for the city of wuhan and other cities in hubei that are close to wuhan, there is also the possibility of a panic effect, in reaction to the virus. the panic effect can lead to an increase in the population outflow from the epicenter of the virus outbreak, and a decrease of the population inflow to the epicenter, particularly the city of wuhan. the panic effect is likely to peak when the government officially confirmed on january , that the novel coronavirus can transmit from person to person. in our analysis below, we create a specific pre-lockdown period before ,t , which includes the three-day period between january and january , , to capture the panic effects. note that while the virus effect applies to movement into and out of all cities, the panic effect is more specific to the cities in the epicenter, especially wuhan, and can have a positive effect on outflows, and a negative effect on inflows.

within hours of the wuhan lockdown, travel restrictions were imposed on the nearby cities of huanggang and ezhou, and were eventually imposed on all other cities in hubei, affecting a total of about million people. on february , the city of wenzhou implemented a partial lockdown in which only one person per household was allowed to exit once every two days, and most of the highway exits were closed.
following wenzhou, another six cities, hangzhou, zhumadian, ningbo, haerbin, fuzhou, and zhengzhou, also launched similar partial lockdowns on february . in another cities, surveillance and tighter controls applied to each neighborhood. in the provinces of liaoning and jiangxi, as well as major cities such as shenzhen, guangzhou, nanjing, ningbo, chengdu and suzhou, checkpoints have been set up to control the inflow population.

the migration across china, which officially begins about two weeks before, and ends about three weeks after, the lunar new year is often referred to as chunyun (meaning spring movement). in , approximately billion trips were made during chunyun, see https://www.cnn.com/travel/article/ lunar-new-year-travel-rush- /index.html.

the large outflow from wuhan is possible because many people in wuhan are migrant workers and college students with hometowns elsewhere.

we first examine the impact of city lockdown on inter-city population mobility, including inflow and outflow, between a city pair (i, j). to disentangle the contributions of these confounding factors on human mobility, we exploit many unique sources of variations in the data, and employ several did estimation strategies by comparing different treatment and control groups. the did specification can be described as follows: where i, j, and t respectively index the destination city, origination city, and date; the dependent variable, ln(flow i,j,t ), is the logarithmic population flows received by city i from city j at date t. the definition of treat varies by specific did designs, and we will be explicit about its definition below.
the city-pair fixed effect µ i,j is included to absorb the city-specific and the city-pair specific heterogeneities that may contaminate the estimation of our coefficient of interest β . we also control for the date-fixed effect θ t to eliminate the time-specific impact, including the spring festival travel effect. the standard errors are clustered at the daily level. in equation ( ), we include two pre-lockdown period indicators: before ,t is a dummy that takes value for the period from january to january , ( to days before the wuhan lockdown), which can be used to examine the parallel trend assumption in the did analysis; before ,t is a dummy that takes value for the period from january to january , , three days before the unprecedented wuhan lockdown, but after the official announcement that the novel coronavirus can spread from person to person. before ,t allows us to capture the panic effect. finally, after t is a dummy that takes value for the sample period after the wuhan lockdown, between january and february , . the omitted benchmark period is from january to january , .

on january , , an expert at china cdc confirmed that the novel coronavirus can spread from human to human. the confirmation highlighted the increasing risk of an epidemic. prior to the january announcement, the experts' assessment of the virus was that it was "preventable and controllable," and "at this time, there is no evidence of person to person transmission."

we also estimate the effect of lockdown on the within-city population movement utilizing the city-level data and a variety of did specifications: where i and t index the city and date. ln(withincityflow i,t ) is the logarithmic within-city population mobility measure for city i at date t.
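For reference, the two specifications described in words above can be written in a form consistent with the stated definitions. This is an assumed reconstruction, not the authors' typeset equations: the coefficient labels $\beta_1$ to $\beta_3$, the intercept $\alpha$, the error terms, and the subscripts 1 and 2 on the two pre-lockdown dummies are our notation.

```latex
\begin{align}
\ln(\mathit{Flow}_{i,j,t}) &= \alpha
  + \beta_1\,\mathit{Treat}\times\mathit{Before}_{1,t}
  + \beta_2\,\mathit{Treat}\times\mathit{Before}_{2,t}
  + \beta_3\,\mathit{Treat}\times\mathit{After}_{t}
  + \mu_{i,j} + \theta_t + \varepsilon_{i,j,t}, \\
\ln(\mathit{WithinCityFlow}_{i,t}) &= \alpha
  + \beta_1\,\mathit{Treat}\times\mathit{Before}_{1,t}
  + \beta_2\,\mathit{Treat}\times\mathit{Before}_{2,t}
  + \beta_3\,\mathit{Treat}\times\mathit{After}_{t}
  + \mu_{i} + \theta_t + \varepsilon_{i,t}.
\end{align}
```

Here $\mathit{Before}_{1,t}$ denotes the earlier pre-lockdown window used to check parallel trends, $\mathit{Before}_{2,t}$ the three-day window that captures the panic effect, and the coefficient on $\mathit{Treat}\times\mathit{After}_t$ is the one whose interpretation changes across the did designs.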
similar to equation ( ), treat will be defined according to the did design. before ,t , before ,t and after t are defined in the same way as in equation ( ). we include the city fixed effects µ i and date fixed effects θ t . the standard errors are clustered at the daily level. [ table about here] in table , we report the results from three sets of regressions specified according to equation ( ) for inter-city inflows (panel a) and outflows (panel b), and according to equation ( ) for within-city movement (panel c). we implement three models that differ in the estimation sample and the definition of the variable treat.

model : wuhan vs. unlocked cities in . the estimation sample used in the regressions reported in column ( ) is the data for wuhan and cities that were never locked down during the coronavirus outbreak. in panel a when we examine the inflow population, treat takes value if the destination city i is wuhan; in panel b when we examine the outflow population, treat takes value if the origination city j is wuhan. the control group consists of cities that are not subject to any mobility restrictions. in panel c when we examine the within-city mobility, treat takes value if city i is wuhan. we interpret the coefficient estimate of treat * before ,t as measuring the panic effect of wuhan relative to the unlocked cities; and the coefficient estimate of treat * after t as measuring the lockdown effect. the coefficient estimate of treat * before ,t allows us to examine whether the parallel trend assumption for did is satisfied. it is important to note that the possibly time-varying spring festival effects and the virus effects are both absorbed in the day fixed effects. based on the coefficient estimate of the term treat * after t , we find that wuhan's lockdown reduces the inflow population to wuhan by . % (= − exp(− . )), the outflow population from wuhan by . % (= − exp(− . )), and within-city population movements in wuhan by . % (= − exp(− . )), relative to all other unlocked cities in the post-lockdown period in . in panel c, we also find that the coefficient of treat * before ,t is significantly negative at - . , suggesting that the official confirmation of the person-to-person transmission reduces the within-city movement in wuhan by . % from jan to jan in . this points to a panic effect for the within-city population flow in wuhan, but we do not observe a significant panic effect for inter-city flows in and out of wuhan in model .

recall that baidu migration data covers chinese cities, cities implemented some level of mobility restrictions (see table a ).

if the virus effect is stronger in wuhan than in the unlocked control cities, then the coefficient estimate of treat * after t also includes the excess virus effect on the population in wuhan over the average virus effect experienced by the population in the control cities.

model : wuhan vs. wuhan . model could be criticized on the ground that wuhan may not be comparable to the unlocked cities; after all, these cities that never imposed any mobility restrictions in the covid- outbreak could be very different from wuhan. in model , we compare the population movements of wuhan in to itself in the same matched lunar calendar period in , during which wuhan is free of lockdown and coronavirus outbreak. thus, the estimation sample in model is the daily inflows into and outflows out of wuhan, as well as the daily within-city movements in wuhan for years and . under model , in panel a when we examine the inflow population, treat takes value if the destination city i is wuhan and year is ; in panel b when we examine the outflow population, treat takes value if the origination city j is wuhan and the year is .
the control group is wuhan . in panel c when we examine the within-city mobility, treat takes value if the year is . under model , the coefficient estimate of treat * before ,t measures the panic effect related to the virus outbreak in wuhan; and the estimate of treat * after t measures the sum of the lockdown effect and the virus effect. the coefficient estimate of treat * before ,t still allows us to examine whether the parallel trend assumption for did is satisfied. the day fixed effects absorb the possibly time-varying spring festival effects. notice that the interpretation of the coefficient estimate of the term treat * after t differs from that in model because of the differences in the treatment and control groups. under model , the estimated coefficients on treat * after t , which, as we explained above, capture both the lockdown and the virus effects, remain negative, and economically and statistically significant in all panels. the estimates suggest that the lockdown of wuhan, together with the deterrence effect of the virus (the virus effect), on average reduces the inflow population into, outflow population from, and within-city movements in wuhan by . % (= − exp(− . )), . % (= − exp(− . )), and . % (= − exp(− . )), respectively, relative to the same lunar calendar days in . we also find that the coefficient on treat * before ,t is significantly positive in panel b and significantly negative in panel c, suggesting that the official confirmation of person-to-person spread of covid- creates a panic effect, causing an increase of outflow from wuhan of . % (= exp( . ) − ), and a decrease of within-city movements in wuhan of . % (= − exp(− . )), during the three days after the announcement but before the city lockdown.
however, we do not observe a statistically significant panic effect for the population inflow into wuhan, suggesting that people in other cities were not yet sufficiently concerned about the virus outbreak in wuhan and did not avoid traveling to wuhan, even after the official confirmation of the person-to-person transmission. finally, we should also point out that the coefficient estimates for treat * before ,t are all statistically insignificant, which suggests that the parallel trend assumption for the dids is plausible.

model : wuhan vs. seven other lockdown cities . in model , the coefficient estimates of treat * after t provide us with an estimate of the sum of the lockdown and the virus effects. in order to isolate the lockdown effect from the virus effect, we consider model , where the estimation sample consists of data of the city of wuhan and seven other cities that went into partial lockdown on february and february , , to days after the lockdown of wuhan, in an effort to curtail the spread of the virus. as we show in table a in the appendix, these seven cities are more comparable to wuhan than other unlocked cities in terms of the epidemic situation and other economic indicators, and thus provide a reasonable control group to partial out the virus effect. in particular, it is much more plausible than in model (where the control cities are cities that were never locked down) to assume that the deterrence effect of the virus on human mobility in the seven cities is similar to that in wuhan. the estimation sample for model consists of data from wuhan and the seven cities for the period between january and february , . note that during this period, none of the seven control cities were locked down yet, even though they were eventually locked down soon afterward. the definitions of the treat variable are as follows.
in panel a, treat takes value if the destination city i is wuhan; in panel b, treat takes value if the origination city j is wuhan; in panel c, treat takes value if city i is wuhan. the control group consists of the seven cities. under model , the coefficient estimate of treat * before ,t measures the panic effect related to the virus outbreak in wuhan relative to the seven control cities; and the estimate of treat * after t measures the lockdown effect only. the coefficient estimate of treat * before ,t still allows us to examine whether the parallel trend assumption for did is satisfied. the possibly time-varying spring festival effects and the virus effects are both absorbed in the day fixed effects. we find that the wuhan lockdown significantly reduces the inflow into, outflow from, and within-city movements in wuhan by . % (= − exp(− . )), . % (= − exp(− . )), and . % (= − exp(− . )), respectively. we interpret these as the pure lockdown effect on population mobility related to wuhan.

these seven cities are: wenzhou, which was partially locked down from february , ; and ningbo, zhumadian, hangzhou, zhengzhou, haerbin, and fuzhou, which were partially locked down on february , . as summarized in table a in the appendix, a partial lockdown includes "closed-off management" on highways, railways and public transport systems, sets up checkpoints to control the inflow population, and implements surveillance and tighter controls in each neighborhood.

summary. based on our preferred estimation models, which are models and , table summarizes our estimates of the panic effect, the virus effect, and the lockdown effect on inflows into, outflows from wuhan, and within-city population movements in wuhan.
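The comparison behind the three models, and the way a log-flow coefficient is turned into the percentage effects quoted above, can be illustrated with a minimal sketch. It is a deliberately stripped-down 2x2 difference-in-differences of mean log flows, which we swap in for the paper's regressions: it omits the two pre-lockdown dummies, the city-pair and date fixed effects, and the clustered standard errors. The function names and the toy flows are hypothetical and are not taken from the paper's data.

```python
import math
from statistics import mean


def did_log_flows(rows):
    """2x2 difference-in-differences on log flows.

    rows: iterable of (flow, treated, after) with treated/after coded 0 or 1.
    Returns (treated after - treated before) - (control after - control before),
    computed on mean log flows.
    """
    cells = {(tr, af): [] for tr in (0, 1) for af in (0, 1)}
    for flow, treated, after in rows:
        cells[(int(treated), int(after))].append(math.log(flow))
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


def pct_change_from_log_coef(beta):
    """Percentage change in the flow implied by a coefficient in a log-flow regression."""
    return (math.exp(beta) - 1.0) * 100.0


# Toy data (made up): the treated city falls from 100 to 25, controls fall from 80 to 40.
toy_rows = [
    (100.0, 1, 0), (100.0, 1, 0),
    (25.0, 1, 1), (25.0, 1, 1),
    (80.0, 0, 0), (80.0, 0, 0),
    (40.0, 0, 1), (40.0, 0, 1),
]

beta_hat = did_log_flows(toy_rows)                # ln(25/100) - ln(40/80) = ln(0.5)
effect_pct = pct_change_from_log_coef(beta_hat)   # about -50 percent
```

In the toy data the treated flow falls by 75 percent and the control flow by 50 percent, yet the implied treatment effect is a 50 percent reduction rather than 25 percent, because effects in a log specification are relative, not additive.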
[ table about here] in table , the lockdown effects are directly calculated from the corresponding coefficient estimates of treat * after from model discussed above; the panic effects are from the coefficient estimates of treat * before in model . for the virus effect, we recognize that the coefficient estimates of treat * after in model incorporate both the lockdown and the virus effects. thus we calculate the virus effect on inflows into wuhan to be exp(− . − (− . )) − = − . %, on the outflows from wuhan to be exp(− . − (− . )) − = − . %, and on the within-city flow in wuhan to be exp(− . − (− . )) − = − . %. because our models assume that the different effects enter exponentially in explaining the flows (recall the natural log specifications in equations ( ) and ( )), when we would like to calculate the impact of two or more effects on the population flows, we should not simply add the individual effects. for example, the joint impact of the panic and virus effects on outflows out of wuhan is ( + . %) * ( − . %) − = . %, instead of the simple sum of . % − . % = . %.

we now examine the impact of human mobility on the transmission of -ncov. considering that almost all the new covid- cases outside the city of wuhan were confirmed after the wuhan lockdown while almost all inter-city population flows occurred prior to the wuhan lockdown (see figures and ), we investigate the imported infections by specifically looking at the impact of population inflows from cities in the epicenter of the novel coronavirus outbreak, namely, wuhan and other cities in hubei province, on the new cases in the destination cities.
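The two calculations used for the summary table, backing out the virus effect from a difference of coefficients and compounding effects rather than adding them, can be checked numerically. The sketch below is only an illustration of that arithmetic: the function names are ours and every coefficient value is a made-up placeholder, since the paper's estimates are not reproduced here.

```python
import math


def effect_from_coef_difference(beta_combined, beta_lockdown_only):
    """Effect implied by the gap between two log-flow coefficients.

    With beta_combined capturing lockdown plus virus and beta_lockdown_only
    capturing lockdown alone, the virus effect is exp(difference) - 1.
    """
    return math.exp(beta_combined - beta_lockdown_only) - 1.0


def compound(*effects):
    """Combine proportional effects multiplicatively: prod(1 + e) - 1."""
    total = 1.0
    for effect in effects:
        total *= 1.0 + effect
    return total - 1.0


# Hypothetical coefficients, for illustration only.
beta_lockdown_plus_virus = -1.0
beta_lockdown = -0.7

virus_effect = effect_from_coef_difference(beta_lockdown_plus_virus, beta_lockdown)
lockdown_effect = math.exp(beta_lockdown) - 1.0

# Compounding the two pieces recovers the combined effect exactly.
combined = compound(lockdown_effect, virus_effect)

# A +50% effect and a -20% effect compound to +20%, not to the naive sum of +30%.
joint = compound(0.5, -0.2)
```

The identity holds because exp(a) * exp(b - a) = exp(b); adding the percentages instead would overstate or understate the joint effect whenever the individual effects are large.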
recognizing that -ncov has a long incubation period, we estimate a dynamic distributed lag regression model taking into account that inflows from wuhan with different lags may have differential impacts on the current new cases in the destination cities. most of the medical literature states that the -ncov virus has a median incubation period of five days, and some can have an incubation period of days or more (see, e.g., lauer et al. ( )). our data allows us to incorporate the possibility that contact with an infected person from wuhan or other cities in hubei can result in confirmed infections in the destination city for up to days. the analysis focuses on the daily new confirmed covid- cases in the post-wuhan lockdown period from january to february , , for cities i that are outside of hubei province. specifically, we run the following regression: where i indexes the cities outside of hubei, and t ∈ { , ..., } indicates the date. κ ∈ { , ..., } indicates the time elapsed from the inflows from wuhan or other hubei cities till the current date t. ln( +newcase i,t ) is the logarithm of the number of new confirmed cases in city i at date t. inflow i,wh,t−κ and Σ_{j≠i, j≠wh, j∈hb} inflow i,j,t−κ are the inflows from wuhan, and the inflows from the other cities in hubei to city i, respectively, κ days prior to the focal date t. we control for destination city fixed effects µ i and date fixed effects θ t .

our log-log specification is based on the classical susceptible-infectious-removed (sir) model in epidemiology.

date t = indicates the date of january , , and t = the date of february , .

note that, in this regression we include only cities outside of hubei province for two reasons.
first, wuhan and other cities in hubei province are the epicenter of the novel coronavirus outbreak, and we are interested in how population outflows from these cities to other cities outside hubei affect the destination cities' covid- cases. second, the confirmed covid- cases in wuhan and other cities in hubei province are likely to be inaccurate for the following reasons. first, as widely reported, the health care systems in wuhan and other cities in hubei were totally overwhelmed by the sheer number of covid- patients. this made it impossible to conduct laboratory tests on all patients, which can lead to delayed confirmation of infected patients. second, during the delay some of the infected may have recovered on their own, or have died before being confirmed. third, local government officials may face strong incentives to under-report the number of infected cases. in section . , we will evaluate the possible downward biases of the officially reported cases in wuhan and other cities in hubei based on our estimates. in contrast, the confirmed cases in other cities are likely to be accurate, as their numbers are not large enough to overwhelm their local health care system; and the incentives to under-report are much weaker in cities outside of hubei. the estimated coefficients β κ and β κ in equation ( ) respectively represent the impact of the inflows from wuhan and other cities in hubei κ ∈ { , ..., } days ago on the destination cities' new cases today. they are respectively plotted in the top and bottom panels of figure . we also fit a spline smoothed curve of the estimated effects of the different lags of inflows from wuhan and hubei, which both show a clear inverted u-shape with respect to the lags. interestingly, both graphs show that the largest impact on the newly confirmed cases today in chinese cities outside hubei comes from the inflow population from wuhan or other cities in hubei about to days ago.
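The distributed lag regression described in words above can be written in a form consistent with the stated definitions. This is an assumed reconstruction, not the authors' typeset equation: the maximum lag is elided in the text and is written here as $K$, the labels $\beta_{1\kappa}$ and $\beta_{2\kappa}$, the intercept $\alpha$, and the error term are our notation, and the logs on the inflow terms follow the text's description of the specification as log-log.

```latex
\begin{equation}
\ln(1 + \mathit{NewCase}_{i,t}) = \alpha
  + \sum_{\kappa=1}^{K} \beta_{1\kappa}\,\ln\!\big(\mathit{Inflow}_{i,\mathrm{WH},t-\kappa}\big)
  + \sum_{\kappa=1}^{K} \beta_{2\kappa}\,\ln\!\Big(\sum_{j \neq i,\; j \neq \mathrm{WH},\; j \in \mathrm{HB}} \mathit{Inflow}_{i,j,t-\kappa}\Big)
  + \mu_i + \theta_t + \varepsilon_{i,t}
\end{equation}
```

Under this form, $\beta_{1\kappa}$ and $\beta_{2\kappa}$ are the lag-specific elasticities whose profile over $\kappa$ traces out the inverted u-shape discussed in the text.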
the pattern exhibited in figure is consistent with the hypothesis that the incubation period of the -ncov is up to to days, but it is also consistent with a shorter incubation period coupled with secondary infections. [ figure about here] anecdotal evidence suggests that covid- cases in wuhan may have been under-reported in the official statistics due to shortages of testing equipment and other medical resources. using the estimated dynamic effects shown in figure , which are estimated under the plausible assumption that the reported cases outside hubei province are reliable, we can estimate the "actual" number of infection cases in wuhan and other cities in hubei. to estimate the "actual" number of infection cases in wuhan using the estimated equation ( ), we technically need to impute a value for inflow wh,wh,t−κ , that is, "inflows from wuhan to wuhan." we proxy these inflows by the daily within-wuhan population movement from january to february , i.e., by withincityflow wh,t−κ . similarly, to estimate the "actual" number of infection cases in other cities in hubei, we replace the inflow from city j to itself by the corresponding daily within-city-j population movement. we also need an additional assumption about the city fixed effects. recall that cities in hubei province were not included in the estimation sample for equation ( ), so no city fixed effects are estimated for cities in hubei. we assume that the city fixed effect for each city in hubei is the average of the city fixed effects of all chinese cities outside hubei.
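the two imputation steps above (a self-inflow proxied by within-city movement, and a hubei fixed effect set to the outside-hubei average) can be written down as follows. this is a sketch under assumptions: the array layout `inflow[i, j, t]` (flow into city i from city j on day t) and the function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def hubei_fixed_effect(outside_city_effects):
    """Proxy for a Hubei city's fixed effect: the mean of the
    fixed effects estimated for cities outside Hubei."""
    return float(np.mean(outside_city_effects))

def with_self_inflow(inflow, within_city_flow):
    """Return a copy of inflow (city x city x day) whose diagonal
    entries [j, j, t] are filled with within-city movement [j, t]."""
    filled = inflow.copy()
    idx = np.arange(inflow.shape[0])
    filled[idx, idx, :] = within_city_flow
    return filled

# Tiny illustration with made-up numbers.
inflow = np.zeros((2, 2, 3))
within = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
filled = with_self_inflow(inflow, within)
mu_hubei = hubei_fixed_effect([1.0, 2.0, 3.0])
```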
according to the world bank's (respectively, the imf's) method of estimating per capita gdp, hubei's per capita gdp in was usd , (respectively, usd , ), and the average per capita gdp of mainland china was usd , (respectively, usd , ). hubei is not only geographically and demographically in the center of mainland china; it is also close to the chinese average economically. thus we believe it is defensible to proxy the fixed effects of hubei cities by the average of the city fixed effects outside of hubei. [ figure about here] in figure , we plot the daily new cases estimated by the above-described method using the estimated equation ( ), as well as the corresponding cumulative cases, for wuhan (panel a) and other cities of hubei (panel b) for the period of january to february , . we also plot the corresponding daily and cumulative officially reported (i.e., documented) cases. we find a persistent gap between the estimated and reported laboratory-confirmed cases in wuhan before february , , just before the announcement of a new party secretary for hubei province on february , . on average, the estimated "actual" number of infection cases is . times the reported cases during the first days after the wuhan lockdown. in particular, we estimate that on january , , the day of the wuhan lockdown, . % of our estimated infections in wuhan were undocumented, in the sense that the number of officially reported cases on that day was only . % of our estimated infection cases. this gap widened over time, possibly as a result of the overwhelmed health care system, and peaked at . % on january . the proportion of undocumented infections started to decline gradually afterward, when more medical support and resources were mobilized across china to support wuhan. as of february , we estimate that there were , total covid- infections in wuhan, which is . % higher than the officially reported total for wuhan of , cases. the .
% discrepancy can plausibly be explained by the unaccounted-for self-healing and deaths that might have occurred during the early period of the outbreak between january and early february. thus, we are led to conclude that almost all infection cases in wuhan were eventually able to be treated as the stress on the health system was relieved and, moreover, that the official statistics were mostly accurate, as can be seen from the left figure on the daily new cases in panel a of figure . it is useful to note that the general pattern of the undocumented covid- infections in wuhan before and after the lockdown based on our estimates is consistent with that reported in li et al. ( ). they use a networked dynamic meta-population model and bayesian inference to estimate the proportion of undocumented infections in the epicenter of the outbreak, as well as the respective infection rates for documented and undocumented cases. they estimate that % of all infection cases as of january , were undocumented. our interpretation of these undocumented cases is broader than asymptomatic cases (i.e., covid- cases that do not show symptoms); we believe that some of the undocumented cases were due to the lack of capacity to test the rapidly increasing infection cases in the early period of the outbreak. most of the undocumented cases became documented over time as the care capacity was strengthened in wuhan. in the bottom panel of figure , we plot our estimated daily new confirmed cases and total infection cases for cities (other than wuhan) in hubei province, together with the officially reported series.
we find that in the cities, infections were more seriously underreported in the first week after the wuhan lockdown, when our estimated infection cases are . times the reported cases. our estimate reveals a very high rate of undocumented infections on the first day of the wuhan lockdown: . %. the gap narrowed gradually as more medical resources were provided and more stringent control measures were implemented in those cities. by the end of our study period on february , , the estimated "actual" number of infections in other cities in hubei is , cases, which is . % higher than the officially reported cumulative cases ( , ). again, the discrepancy between the estimated and officially reported cumulative cases can be at least partially attributed to the unaccounted-for self-healing and deaths that might have occurred during the early period of the outbreak. locking down wuhan, a city of million residents, was an unprecedented measure to contain the spread of the novel coronavirus. an important policy-relevant question is, then, how many covid- cases were actually prevented by the wuhan lockdown in china? to answer this question, we must estimate the counterfactual number of covid- cases that would have occurred in other cities in the absence of the wuhan lockdown, which would, in turn, require a counterfactual estimate of the outflows from wuhan to other chinese cities, had there been no lockdown of wuhan. in section , we provided estimates of how the panic effect and the virus effect separately affect the outflows from wuhan to other cities in china, separate from the lockdown effect. these effects are summarized in table . it suggests that, in the absence of the wuhan lockdown, the virus effect and the panic effect would have led to a .
% decrease and a . % increase in the outflow population from wuhan, respectively. based on these effects, therefore, in the absence of the wuhan lockdown, we would expect the outflows from wuhan in the days after january (the date of the wuhan lockdown) to be ( − . ) * ( + . ) = . ( ) times the normal outflows from wuhan to other cities. we use the daily level of outflows from wuhan to a city in on the same lunar calendar day as a measure of the normal outflow, and multiply that number by . to obtain the daily counterfactual inflows from wuhan to the city, had there been no lockdown of wuhan from january , . using this estimation method, we find that the estimated counterfactual outflows from wuhan to other cities in hubei between january and february , would on average be , , , a level that is . times the inflow population actually observed in the data in (with the lockdown), which is , , ; similarly, the average inflows from wuhan to the cities outside hubei in china would be , , , which is . times the actual inflow population to those cities during the same period in , which is , . we denote the counterfactual inflows from j = wuhan into city i at date s ∈ { , ..., } from the above calculation as inflow i,wh,s . we assume that the wuhan lockdown did not affect population movements within other cities or population flows among other cities. we also assume that all the control measures implemented by other cities after the wuhan lockdown remain in place. thus the parameter estimates of the dynamic lag effects of inflows from wuhan and other cities in hubei, estimated in equation ( ), remain valid as an epidemiological diffusion equation that is not affected by the human mobility restrictions that result from the wuhan lockdown; the lockdown only affected the human flows.
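the scaling used to build the counterfactual inflows is simple arithmetic: the virus effect shrinks the normal outflow and the panic effect inflates it, and the two are multiplied. the sketch below illustrates it with placeholder effect sizes of 20% and 50%; these are hypothetical numbers chosen for readability (the estimated values are not recoverable from this text), and the function names are likewise assumptions.

```python
import math

def counterfactual_multiplier(virus_effect, panic_effect):
    """(1 - virus effect) * (1 + panic effect), with both effects
    expressed as fractions (e.g. 0.2 for a 20% change)."""
    return (1.0 - virus_effect) * (1.0 + panic_effect)

def counterfactual_inflows(normal_inflows, virus_effect, panic_effect):
    """Scale each day's normal (same lunar-calendar day, previous year)
    inflow by the counterfactual multiplier."""
    m = counterfactual_multiplier(virus_effect, panic_effect)
    return [m * x for x in normal_inflows]

# Hypothetical effect sizes: 20% decrease (virus), 50% increase (panic).
m = counterfactual_multiplier(0.20, 0.50)          # 0.8 * 1.5, about 1.2
scaled = counterfactual_inflows([1000.0, 2000.0], 0.20, 0.50)
```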
with these considerations in mind, we simulate the counterfactual number of covid- cases, had there been no wuhan lockdown, on date t ∈ { , ..., } (i.e., from january to february , ) in cities i outside hubei province by the following equation: inflow i,j,t−κ + μ i ( ) where β κ and β κ are coefficient estimates obtained from the regression specified in equation ( ) and reported in figure , and μ i are the estimated city fixed effects from the same regression. recall that outflows from wuhan are not just residents of wuhan; any travelers who entered wuhan for whatever reason and then left would be included in the wuhan outflows measured by the baidu migration data. note that in predicting the counterfactual covid- cases without the wuhan lockdown, we use the counterfactual inflows from wuhan to city i for the days after january , inflow i,wh,s , discussed previously. we are also interested in predicting the counterfactual cases in the non-wuhan cities in hubei province using the same method as described above, with two differences. first, since hubei cities are not included in the estimation of equation ( ), we do not have city fixed effects for the hubei cities. as we argued in section . , it is plausible to assume that the fixed effects of hubei cities are the average of the city fixed effects of the cities outside hubei. we maintain this assumption in this counterfactual exercise. second, following the same strategy as in section . , for a non-wuhan city j in hubei province, we use the within-city-j population movement at date t to proxy for the inflow to city j from city j when we implement equation ( ) for non-wuhan hubei cities.
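the counterfactual prediction step can be illustrated as below. this is a sketch rather than the paper's equation, which is only partly legible here: it assumes the prediction has the same log-log form as the estimating regression, that date effects enter additively (an assumption, set to zero in the demo), and that predicted cases are recovered by inverting the ln(1 + cases) transform. the coefficient values, the multiplier 1.2, and the function name `predict_new_cases` are all hypothetical.

```python
import numpy as np

def predict_new_cases(log_wh, log_hb, beta_wh, beta_hb, city_effect, date_effects):
    """Predict daily new cases for one city from lagged log inflows.

    log_wh, log_hb : 1-D arrays of daily log inflows (length T)
    beta_wh, beta_hb : lag coefficients for lags 1..K (equal lengths)
    Returns predictions for days K..T-1.
    """
    beta_wh = np.asarray(beta_wh, dtype=float)
    beta_hb = np.asarray(beta_hb, dtype=float)
    lags = np.arange(1, len(beta_wh) + 1)
    preds = []
    for t in range(len(beta_wh), len(log_wh)):
        log_y = (beta_wh @ log_wh[t - lags] + beta_hb @ log_hb[t - lags]
                 + city_effect + date_effects[t])
        preds.append(np.expm1(log_y))  # invert ln(1 + cases)
    return np.array(preds)

# Illustrative inputs (all numbers are made up).
rng = np.random.default_rng(1)
T = 12
log_wh = np.log(rng.uniform(50, 100, size=T))
log_hb = np.log(rng.uniform(50, 100, size=T))
beta_wh, beta_hb = [0.10, 0.20, 0.05], [0.05, 0.10, 0.02]
date_effects = np.zeros(T)
multiplier = 1.2  # hypothetical counterfactual scaling of Wuhan inflows

baseline = predict_new_cases(log_wh, log_hb, beta_wh, beta_hb, 0.3, date_effects)
counterfactual = predict_new_cases(log_wh + np.log(multiplier), log_hb,
                                   beta_wh, beta_hb, 0.3, date_effects)
```

because the specification is log-log, scaling every wuhan inflow by a constant multiplies (1 + predicted cases) by that constant raised to the sum of the wuhan lag coefficients, which is why the counterfactual series lies above the baseline whenever the coefficients sum to a positive number.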
in figure , we present the counterfactual estimates of covid- cases had there been no wuhan lockdown (solid curve) and the officially reported cases (dashed curve) for cities outside hubei province (panel a) and non-wuhan cities of hubei (panel b). the left figure in each panel depicts the model's counterfactual prediction and the actual number of daily infection cases, and the right figure depicts the evolution of cumulative cases from january to february , . the gap between the estimated counterfactual number of infection cases (the solid curve) and the officially reported cases (the dashed curve) in the right figure represents the number of covid- cases prevented by the wuhan lockdown. as of february , , the officially reported number of covid- cases in the cities outside hubei province was , , but our counterfactual simulation suggests that there would have been , cases, had there been no wuhan lockdown. similarly, the officially reported number of covid- cases in the non-wuhan cities in hubei was , by february , , but our counterfactual simulation suggests that the number of infection cases would have been , , had there been no wuhan lockdown. that is, the covid- cases would be . % higher in cities outside hubei, and . % higher in other cities in hubei, as of february , in the counterfactual world in which the city of wuhan was not locked down from january , . our findings thus suggest that the lockdown of the city of wuhan from january , played a crucial role in reducing the imported infections in other chinese cities and in halting the spread of the -ncov virus. our model also allows us to project when the new infection cases would have peaked in other cities in the absence of the wuhan lockdown.
as shown by the solid curves in the left figures of each panel, our model projects that in the absence of the wuhan lockdown, the new daily infection cases in the non-wuhan cities in hubei would have peaked on february , the th day after the lockdown of wuhan, at a level of , daily new cases; elsewhere in china, they would have peaked on february , , the th day after the lockdown of wuhan, at a level of , daily new cases. we also find that the estimated daily new cases in the counterfactual world gradually converge to the reported daily cases from february . this suggests that the social distancing measures implemented elsewhere in china would eventually have worked to contain the spread of the -ncov virus even if the city of wuhan had not been locked down on january , but the initial onslaught on the medical system in all cities in china would have been much more severe, and the total number of infection cases elsewhere would have been significantly higher. [ figure about here] as the -ncov virus spreads throughout the world, many countries are also implementing lockdown measures and mandating social distancing as a policy response to contain the spread of the virus. up to now, our analysis has focused on the lockdown of wuhan, the epicenter of the novel coronavirus outbreak. it would also be interesting to study the impact of lockdowns and/or social distancing measures in the destination cities on reducing and containing the spread of the virus. indeed, chinazzi et al. ( ) point out that travel restrictions to and from mainland china affect the global covid- pandemic only if transmission within the community is simultaneously reduced by % or more, which suggests that social distancing in the destination cities is crucial in preventing the possibly asymptomatic transmission from the source city.
moreover, quantifying the effect of social distancing on virus transmission is especially relevant to the stage of pre-epidemic community spread, when a person who is not known to have traveled to affected countries or to have had contact with an infected person becomes infected. as shown in table a in the appendix, within the few weeks after the wuhan and hubei lockdowns, various human mobility restrictions were imposed on other chinese cities outside hubei. as described in table a , the "lockdowns" in destination cities varied in their degree of strictness, from building-entrance checkpoints to quarantine zones, and from public transit shutdowns to strict limits on the inflows into the city, the outflows from the city, and within-city population movements. we interpret the human mobility restrictions in the destination cities as an enhanced social distancing policy, because the "lockdown" rules in the destination cities are not as strict as those implemented in wuhan. in this section, we use the variation in the destination cities' "lockdown" policies to study how the effects of inflows from wuhan and other cities in hubei province on the spread of covid- cases in the destination cities change with the destination cities' lockdown policies, which would drastically affect within-city population movements in the destination cities.
specifically, we estimate the following specification, a modified version of the regression specification described by equation ( ): where the new variable lockdown i,t is a dummy that takes value if time t is a date after destination city i's "lockdown" date, if any, and otherwise; the lockdown dates of the cities outside hubei are listed in table a . if city i never implemented any formal lockdown policy, the dummy is always . therefore, the coefficients β κ and β κ respectively measure the impact of the lagged inflows from wuhan and hubei κ days earlier on the destination cities' current new cases before the city's imposition of its "lockdown," while γ κ and γ κ represent the effect of wuhan and hubei inflows on the destination cities' new cases after the imposition of the city's lockdown. if the enhanced social distancing that comes from the "lockdown" policies imposed in the chinese cities outside hubei is effective in reducing the spread of the virus through population flows from the epicenter, then we expect γ κ and γ κ to be smaller than β κ and β κ , respectively. (see panel c of table for the evidence from the wuhan lockdown on within-wuhan population movement.) in figure (respectively, figure ), we plot the estimated coefficients β κ and γ κ (respectively, β κ and γ κ ) in panel (a), and their differences in panel (b), for the lagged effects of inflows from wuhan (respectively, non-wuhan cities in hubei) on the daily new cases in destination cities outside hubei province.
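one way to read the modified specification is that each lagged inflow regressor is split into a pre-lockdown part (carrying the β coefficients) and a post-lockdown part (carrying the γ coefficients) using the destination city's dummy. the sketch below constructs such regressors. it is an assumption about how the interaction could be coded, not the authors' implementation, and the names `lockdown_dummy` and `split_by_lockdown`, the array layout, and the day indices are hypothetical.

```python
import numpy as np

def lockdown_dummy(lockdown_day, n_day):
    """Dummy of shape (cities, days): 1 on days strictly after a city's
    lockdown day index, 0 otherwise; None means the city never locked down."""
    dummy = np.zeros((len(lockdown_day), n_day))
    for i, day in enumerate(lockdown_day):
        if day is not None:
            dummy[i, day + 1:] = 1.0
    return dummy

def split_by_lockdown(lagged_log_inflow, dummy):
    """Split lagged regressors (cities x days x lags) into pre- and
    post-lockdown parts; the two parts sum to the original array."""
    post = lagged_log_inflow * dummy[:, :, None]
    pre = lagged_log_inflow - post
    return pre, post

# Two cities over four days: city 0 locks down on day 1, city 1 never does.
dummy = lockdown_dummy([1, None], n_day=4)
lagged = np.ones((2, 4, 3))
pre, post = split_by_lockdown(lagged, dummy)
```

regressing the outcome on `pre` and `post` jointly (with the same fixed effects as before) would then give separate lag profiles before and after the destination city's lockdown, and their difference is what panel (b) of the corresponding figures is described as plotting.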
we find that the estimated lagged effects of inflows from wuhan and other cities in hubei before the destination city's lockdown, if any, show little change compared to the coefficients in figure . however, the coefficient estimates of lagged inflows after the destination city's lockdown appear to be statistically insignificant and indistinguishable from zero at almost all lags. in panel (b) of figures - , we plot the differences between the estimated effects before and after the destination cities' lockdown policies. we find that the differences between the estimated coefficients before and after destination city lockdowns are positive and statistically significant at the % level or better for (respectively, ) of the first ten lagged population inflows from wuhan (respectively, other cities of hubei), and the other differences in lagged estimates are statistically insignificant. these results suggest that the enhanced social distancing policies in the destination cities are effective in reducing the impact of population inflows from the source cities of wuhan and other cities in hubei on the spread of the -ncov virus in the destination cities. this in turn implies that population inflows from the epicenter contribute to the spread of infection in the destination cities only before the social distancing measures are applied; it appears that, after implementing their various control measures, cities adopting an extended lockdown can flatten the upward trajectory of the virus. in this paper, we quantify the causal impact of human mobility restrictions, particularly the lockdown of the city of wuhan on january , , on the containment and delay of the spread of the novel coronavirus, and estimate the dynamic effects of up to lagged population inflows from wuhan and other hubei cities, the epicenter of the -ncov outbreak, on the destination cities' new infection cases. we find that the lockdown of wuhan reduced inflows into wuhan by . %, outflows from wuhan by . %, and within-wuhan movements by . %.
using counterfactual simulations with these estimates, we find that the lockdown of the city of wuhan on january , contributed significantly to reducing the total infection cases outside of wuhan, even with the social distancing measures later imposed by other cities. we find that the covid- cases would be . % higher in the chinese cities outside hubei province, and . % higher in non-wuhan cities inside hubei, in the counterfactual world in which the city of wuhan was not locked down from january , . we also find that there were substantial undocumented infection cases in the early days of the -ncov outbreak in wuhan and other cities of hubei province, but over time the gap between the officially reported cases and our estimated "actual" cases narrows significantly. we also find evidence that imposing enhanced social distancing policies in the chinese cities outside hubei province is effective in reducing the impact of population inflows from the epicenter cities in hubei province on the spread of the -ncov virus in the destination cities elsewhere. the results from our analysis provide valuable causal evidence on the role of human mobility restrictions in the containment and delay of the spread of contagious viruses, including the -ncov virus that is now ravaging the world. enhanced social distancing in the destination cities and, if an epicenter can be identified as was the case for the city of wuhan in china, a lockdown can play crucial roles in "flattening" the curve of daily infection cases, giving the stressed medical system a chance to regroup and deal with the onslaught of new infection cases.
although our study focuses exclusively on the effect of human mobility restrictions on the spread of -ncov virus in china, our estimated results can have general implications to other countries in their fight against the novel coronavirus.
the copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license. https://doi.org/ . / . . . doi: medrxiv preprint
( ) and ( ). the control and treatment groups for models - are described in the text. fixed effects of city-pair and daily are included in all columns in panels a and b, and fixed effects of city and daily are included in all columns in panel c. standard errors are clustered at the daily level. *** significant at the percent level. ** significant at the percent level. * significant at the percent level.
( ) and ( ) of table . *** significant at the percent level. ** significant at the percent level. * significant at the percent level.
notes: this figure plots the dynamic effects of lagged inflows from wuhan (left) and other cities in hubei (right) from estimating equation ( ). we add spline smoothing fit curves (in red) using the rcspline function and plot the % confidence intervals (the vertical green whiskers).
( ). we add spline smoothing fit curves using the rcspline function and plot the % confidence intervals (the vertical green whiskers).
( ). we add spline smoothing fit curves using the rcspline function and plot the % confidence intervals (the vertical green whiskers).
notes: this table provides summary statistics on the count of total confirmed cases as of february , , on the daily average population inflow, outflow, and within-city flow between january and march in , and on permanent population and gdp as of december , for cities with different levels of controls.
human mobility networks, travel restrictions, and the global spread of h n pandemic
a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster
human mobility and the spatial transmission of influenza in the united states
the effect of travel restrictions on the spread of the novel coronavirus
strategies for mitigating an influenza pandemic
natural disasters and population mobility in bangladesh
will travel restrictions control the international spread of pandemic influenza
clinical features of patients infected with novel coronavirus in wuhan, china
effect of non-pharmaceutical interventions for containing the covid- outbreak in china
the incubation period of coronavirus disease (covid- ) from publicly reported confirmed cases: estimation and application
substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov )
the reproductive number of covid- is higher compared to sars coronavirus
predictability of population displacement after the haiti earthquake
networks in the modern economy: mexican migrants in the us labor market
impacts of social and economic factors on the transmission of coronavirus disease (covid- ) in china
patterns and limitations of urban human mobility resilience under the influence of multiple types of natural
disaster
consensus document on the epidemiology of severe acute respiratory syndrome (sars)
which means all public transport and private vehicles are banned in the city, all residential buildings are locked down, and residents are not allowed to leave the city. cities in panel b are under partial lockdown: the majority of public transportation has been temporarily shut down, checkpoints have been set up to control the inflow population, and surveillance and tighter controls have been imposed in each neighborhood. cities in panel c set up checkpoints and quarantine zones, and public transport maintains normal operation.
key: cord- -m tapjp authors: peng, liangrong; yang, wuyue; zhang, dongyan; zhuge, changjing; hong, liu title: epidemic analysis of covid- in china by dynamical modeling date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: m tapjp the outbreak of the novel coronavirus ( -ncov) epidemic has attracted worldwide attention. herein, we propose a mathematical model to analyze this epidemic, based on a dynamic mechanism that incorporates the intrinsic impact of hidden latent and infectious cases on the entire process of transmission. meanwhile, this model is validated by data correlation analysis, prediction of recent public data, and backtracking, as well as sensitivity analysis. the dynamical model reveals the impact of various measures on the key parameters of the epidemic. according to the public data of nhcs from / to / , we predict the epidemic peak and possible end time for different regions. the epidemic in beijing and shanghai, and in mainland/hubei and hubei/wuhan, is expected to end before the end of february and before mid-march, respectively. the model indicates that the outbreak in wuhan will end in early april. as a result, more effective policies and more efforts on clinical research are demanded.
moreover, through the backtracking simulation, we infer that the outbreak of the epidemic in mainland/hubei, hubei/wuhan, and wuhan can be dated back to the end of december or the beginning of january . a novel coronavirus, formerly called -ncov and now named sars-cov- by ictv (severe acute respiratory syndrome coronavirus , by the international committee on taxonomy of viruses), caused an outbreak of atypical pneumonia, now officially called covid- by who (coronavirus disease , by the world health organization), first in wuhan, hubei province, in dec., and then rapidly spread throughout china . as of : feb. th, (beijing time), there are over , reported cases (including more than , reported deaths) in china, among which over % are from hubei province and over % from wuhan city, the capital of hubei province , . the central government of china as well as all local governments, including hubei, have tightened preventive measures to curb the spreading of covid- since jan. . many cities in hubei province have been locked down, and many measures have been adopted, such as tracing close contacts, quarantining infected cases, and promoting social consensus on self-protection like wearing face masks in public areas. however, as of the completion of this manuscript, the epidemic is still ongoing and the daily confirmed cases remain at a high level. during this anti-epidemic battle, besides medical and biological research, theoretical studies based on either statistics or mathematical modeling may also play a non-negligible role in understanding the epidemic characteristics of the outbreak, in forecasting the inflection point and ending time, and in deciding the measures to curb the spreading. for this purpose, in the early stage many efforts have been devoted to estimating key epidemic parameters, such as the basic reproduction number, doubling time and serial interval, in which statistical models are mainly used [ ] [ ] [ ] [ ] [ ] [ ] .
due to the limitations of detection methods and restricted diagnostic criteria, asymptomatic or mild patients are possibly excluded from the confirmed cases. to this end, some methods have been proposed to estimate untraced contacts , undetected international cases , or the actual infected cases in wuhan and hubei province based on statistical models , or the epidemic outside hubei province and overseas , [ ] [ ] [ ] . with the improvement of clinical treatment of patients as well as stricter measures stepped up for containing the spread, many researchers investigate the effect of such changes by statistical reasoning , and stochastic simulation , . compared with statistical methods , , mathematical modeling based on dynamical equations , - receives relatively less attention, though it can provide a more detailed mechanism for the epidemic dynamics. among them, the classical susceptible exposed infectious recovered (seir) model is the most widely adopted one for characterizing the covid- outbreak in both china and other countries . based on the seir model, one can also assess the effectiveness of various measures since the outbreak , , [ ] [ ] [ ] , which seems to be a difficult task for general statistical methods. the seir model was also utilized to compare the effects of the lockdown of hubei province on the transmission dynamics in wuhan and beijing . as the dynamical model can reach interpretable conclusions on the outbreak, a cascade of seir models has been developed to simulate the processes of transmission from the infection source, hosts, and reservoir to humans .
the copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. all rights reserved. no reuse allowed without permission. https://doi.org/ . / . . . doi: medrxiv preprint
there are also notable generalizations of the seir model for evaluating the transmission risk and predicting the number of patients, in which each group is divided into two subpopulations, the quarantined and the unquarantined. the extension of the classical seir model with delays is another route to simulate the incubation period and the period before recovery. however, due to the lack of official data and the change of diagnostic criteria in the early stage of the outbreak, most early published models were either too complicated to avoid the overfitting problem, or their parameters were estimated from limited and less accurate data, resulting in questionable predictions. in this work, we carefully collect the epidemic data from the authoritative sources: the [...] such a design aims to minimize the influence of hubei province and wuhan city on the data set due to their extremely large infected populations compared to other regions. unless otherwise stated, these conventions will be adopted throughout the whole paper.
a. generalized seir model
{s(t), p(t), e(t), i(t), q(t), r(t), d(t)} denote, at time t, the respective numbers of susceptible cases, insusceptible cases, exposed cases (infected but not yet infectious, in a latent period), infectious cases (with infectious capacity and not yet quarantined), quarantined cases (confirmed and infected), recovered cases and closed cases (or deaths). the addition of a new quarantined state is driven by data; together with the recovery state, it takes the place of the original r state in the classical seir model. their relations are given in fig. and characterized by a group of ordinary differential equations (or difference equations if we consider discrete time, see si).
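as an illustration only, one plausible way to couple the seven states named above is sketched below in python. the right-hand side is an assumption chosen to be consistent with the verbal description (infection proportional to s·i/n, protection moving s to p, quarantine moving i to q, and q emptying into r and d); it is not taken from the authors' equations. the names `GSEIRParams`, `rhs`, `rk4_step` and `simulate`, as well as all numerical values in the demo, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GSEIRParams:
    alpha: float                         # protection rate (S -> P), assumed constant
    beta: float                          # infection rate
    gamma: float                         # 1 / average latent time (E -> I)
    delta: float                         # 1 / average quarantine time (I -> Q)
    cure: Callable[[float], float]       # lambda(t), time-dependent cure rate
    mortality: Callable[[float], float]  # kappa(t), time-dependent mortality rate


def rhs(t: float, y: List[float], p: GSEIRParams) -> List[float]:
    """Assumed right-hand side for the state vector (S, P, E, I, Q, R, D)."""
    s, _prot, e, i, q, _rec, _dead = y
    n = sum(y)                            # N = S + P + E + I + Q + R + D
    new_infections = p.beta * s * i / n
    lam, kap = p.cure(t), p.mortality(t)
    return [
        -new_infections - p.alpha * s,    # dS/dt
        p.alpha * s,                      # dP/dt
        new_infections - p.gamma * e,     # dE/dt
        p.gamma * e - p.delta * i,        # dI/dt
        p.delta * i - (lam + kap) * q,    # dQ/dt
        lam * q,                          # dR/dt
        kap * q,                          # dD/dt
    ]


def rk4_step(t: float, y: List[float], h: float, p: GSEIRParams) -> List[float]:
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(t, y, p)
    k2 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k1)], p)
    k3 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(y, k2)], p)
    k4 = rhs(t + h, [a + h * b for a, b in zip(y, k3)], p)
    return [a + h / 6 * (b + 2 * c + 2 * d + e)
            for a, b, c, d, e in zip(y, k1, k2, k3, k4)]


def simulate(y0: List[float], p: GSEIRParams, days: int,
             steps_per_day: int = 10) -> List[List[float]]:
    """Return the state at day 0, 1, ..., days."""
    h = 1.0 / steps_per_day
    y, t = list(y0), 0.0
    trajectory = [list(y)]
    for _ in range(days):
        for _ in range(steps_per_day):
            y = rk4_step(t, y, h, p)
            t += h
        trajectory.append(list(y))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the code.
    demo = GSEIRParams(alpha=0.1, beta=1.0, gamma=0.5, delta=0.2,
                       cure=lambda t: 0.02, mortality=lambda t: 0.005)
    final = simulate([1.0e6, 0.0, 500.0, 300.0, 100.0, 0.0, 0.0], demo, days=30)[-1]
    print("quarantined cases after 30 days:", round(final[4]))
```

because every outflow in this sketch reappears as an inflow elsewhere, the seven components always sum to the same total, which gives a quick sanity check on any implementation.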
the constant n = s + p + e + i + q + r + d is the total population in a certain region. the coefficients {α, β, γ^{-1}, δ^{-1}, λ(t), κ(t)} represent the protection rate, infection rate, average latent time, average quarantine time, cure rate, and mortality rate, respectively. in particular, to take improvements in public health into account, such as promoting the wearing of face masks, more effective contact tracing and stricter lock-down of communities, we assume that the susceptible population is steadily decreasing and thus introduce a positive protection rate α into the model. in this case, the basic reproduction number [...]. it is noted that here we assume the cure rate λ and the mortality rate κ are both time dependent. as confirmed in fig. a-d, the cure rate λ(t) gradually increases with time, while the mortality rate κ(t) quickly decreases to less than % and stabilizes after jan. th. this phenomenon is likely caused by the assistance of other emergency medical teams, the application of new drugs, etc. furthermore, the average contact number of an infectious person is calculated in fig. e-f and could provide some clue about the infection rate. it is clearly seen that the average contact number is basically stable over time, but shows a remarkable difference among regions, which could be attributed to different quarantine policies and their implementation inside and outside hubei (or wuhan), since a less severely affected region is more likely to trace the close contacts of a confirmed case. a similar regional difference is observed for the severe condition rate too. in fig. g-h, hubei and wuhan overall show a much higher severe condition rate than shanghai.
although it is generally expected that patients need a period of time to become infectious, to be quarantined, or to recover from illness, we do not find strong evidence for the necessity of including a time delay (see si for more details). as a result, time-delayed equations are not considered in the current work for simplicity. according to the daily official reports of the nhc of china, the cumulative numbers of quarantined cases, recovered cases and closed cases are publicly available. however, since the latter two are directly related to the first one through the time-dependent recovery rate and mortality rate, the number of quarantined cases q(t) plays a key role in our modeling. a similar argument applies to the number of insusceptible cases too. furthermore, as the accurate numbers of exposed cases and infectious cases are very hard to determine, they will be treated as hidden variables during the study. leaving aside the time-dependent parameters λ(t) and κ(t), there are four unknown coefficients {α, β, γ^{-1}, δ^{-1}} and two initial conditions {e_0, i_0} for the hidden variables (other initial conditions are known from the data) that have to be extracted from the time series data {q(t)}. such an optimization problem can be solved automatically by using the simulated annealing algorithm (see si for details). a major difficulty is how to overcome the overfitting problem. to this end, we first fix the latent time γ^{-1}, which is generally estimated to be within several days. then, for each fixed γ^{-1}, we explore its influence on the other parameters (β = nearly unchanged), the initial values, as well as the population dynamics of quarantined cases and infected cases during best fitting. from fig.
a-b, to produce the same outcome, the protection rate α and the reciprocal of the quarantine time δ^{-1} both decrease with the latent time γ^{-1}, which is consistent with the fact that a longer latent time requires a longer quarantine time. meanwhile, the initial values of exposed cases and infectious cases increase with the latent time. since e_0 and i_0 include asymptomatic patients, they should both be larger than the number of quarantined cases. furthermore, as the time period between the starting date of our simulation (jan. th) and the initial outbreak of covid- (generally believed to be earlier than jan. st) is much longer than the latent time ( - days), e_0 and i_0 have to be close to each other, so that only their sum e_0 + i_0 matters during the fitting. an additional important finding is that in all cases β is always very close to , which agrees with the observation that covid- is extremely infectious: nearly every unprotected person will be infected after direct contact with covid- patients. in summary, we conclude that once the latent time γ^{-1} is fixed, the fitting accuracy on the time series data {q(t)} basically depends on the values of α, δ^{-1} and e_0 + i_0. based on a reasonable estimate of the total number of infected cases (see fig. c-d), the latent time is finally determined as days. in order to further evaluate the influence of other fitting parameters on the long-term forecast, we perform a sensitivity analysis on the data of wuhan (results for other regions are similar and not shown) by systematically varying the values of the unknown coefficients. as shown in fig.
e-f, the predicted total infected cases at the end of the epidemic, as well as the inflection point, at which the basic reproduction number drops below 1, both show a positive correlation with the infection rate β and the quarantine time δ^{-1} and a negative correlation with the protection rate α. these facts agree with common sense and highlight the necessity of self-protection (increase α and decrease β), timely disinfection (increase α and decrease β), early quarantine (decrease δ^{-1}), etc. an exception is found for the initial total infected cases. although a larger value of e_0 + i_0 could substantially increase the final total infected cases, it has no impact on the inflection point, which can be seen from the formula of the basic reproduction number. we apply the generalized seir model described above to interpret the public data on the cumulative numbers of quarantined cases, recovered cases and closed cases from jan. th to feb. th, which have been published daily by the nhc of china since jan. th. our preliminary study includes five different regions, i.e. mainland*, hubei*, wuhan, beijing and shanghai. through extensive simulations, the optimal values for the unknown model parameters and initial conditions, which best explain the observed cumulative numbers of quarantined cases, recovered cases and closed cases (see fig. ), are determined and summarized in table . several remarkable facts can be immediately learnt from table . firstly, the protection rate of wuhan is significantly lower than that of other regions, showing that many infected cases may not yet have been well quarantined until feb.
th (the smaller α for wuhan does not necessarily mean that people in wuhan pay less attention to self-protection, but is more likely due to the higher mixing ratio of susceptible cases with infectious cases). similarly, although the average protection rate for hubei* is higher than that of wuhan, it is still significantly lower than that of other regions. secondly, the quarantine times for beijing and shanghai are the shortest, and that for mainland* is in between. again, the quarantine times for wuhan and hubei* are the longest. finally, the estimated numbers of total infected cases on jan. th in the five regions are all significantly larger than one, suggesting that covid- had already spread nationwide at that moment. we will come back to this point in the next part. [...] the initial values for exposed cases and infectious cases separately. the time-dependent cure rate λ(t) and mortality rate κ(t) can be read from fig. and are given in si. most importantly, with the model and parameters in hand, we can carry out simulations over a longer time and forecast the potential tendency of the covid- epidemic. in fig. and fig. a-b, the predicted cumulative number of quarantined cases and the current number of exposed cases plus infectious cases are plotted for the next days as well as for a shorter period of the next days. officially published data by the nhc of china from feb. th to th are marked as red spots and taken as a direct validation. overall, except for wuhan, the validation data show good agreement with our forecast and all fall into the % confidence interval (shaded area). we are also delighted to see that most of them are lower than our predictions, showing that the nationwide anti-epidemic measures in china are taking effect.
for wuhan city (and also hubei province), however, due to the inclusion of suspected cases with clinical diagnosis into confirmed cases ( cases for wuhan and cases for hubei* on feb. th), announced by the nhc of china since feb. th during the preparation of our manuscript, there is a sudden jump in the number of quarantined cases. although it to some extent offsets our original overestimates, it also reveals the current severe situation in wuhan city, which requires much closer attention in the future. regarding the epidemic of covid- , our basic predictions are summarized as follows: based on an optimistic estimate, the epidemic of covid- in beijing and shanghai would end within two weeks (from feb. th), while for most parts of the mainland, success against the epidemic will come no later than the middle of march. the situation in wuhan is still very severe, at least based on public data until feb. th. we expect it to end at the beginning of april. [...] are not included in parameter estimation). by coincidence, on the same day, we witnessed a sudden jump in the number of confirmed cases due to relaxed diagnostic criteria, meaning more suspected cases will receive better medical care and have a much lower chance of spreading the virus. besides, the wuhan local government announced the completion of a community survey on all confirmed cases, suspected cases and close contacts in the whole city. besides the forecast, the early trajectory of the covid- outbreak is also critical for our understanding of the epidemic as well as for future prevention. to this end, by adopting the shooting method, we carry out inverse inference to explore the early epidemic dynamics
of covid- since its onset in mainland*, hubei*, and wuhan (beijing and shanghai are not considered because their numbers of infected cases on jan. th are too small). with respect to the parameters and initial conditions listed in table , we make the astonishing finding that, for all three cases, the outbreaks of covid- all point to - days before jan. th (the starting date for public data and our modeling). this means the epidemic of covid- in these regions began no later than jan. st (see fig. d), in agreement with reports by li et al. in this stage (from jan. st to jan. th), the number of total infected cases follows a clean exponential curve with a doubling time of around days. this in some way explains why statistical studies with either exponential functions or logistic models could work very well on the limited early data points. furthermore, we notice that the number of infected cases based on inverse inference is much larger than the reported confirmed cases in wuhan city before jan. th. in this study, we propose a generalized seir model to analyze the epidemic of covid- , which was first reported in wuhan last december and then quickly spread nationwide in china. our model properly incorporates the intrinsic impact of hidden exposed and infectious cases on the entire course of the epidemic, which is difficult for traditional statistical analysis. a new quarantined state, together with the recovery state, takes the place of the original r state in the classical seir model and correctly accounts for the daily reported confirmed infected cases and recovered cases. based on detailed analysis of the public data of the nhc of china from jan. th to feb.
th, we estimate several key parameters for covid- , like the latent time, the quarantine time and the basic reproduction number in a relatively reliable way, and predict the inflection point, possible ending time and final total infected cases for hubei, wuhan, beijing, shanghai, etc. overall, the epidemic situations for beijing and shanghai are optimistic; the epidemics there are expected to end within two weeks (from feb. th, ). meanwhile, for most parts of the mainland, including the majority of cities in hubei province, the end will come no later than the middle of march. we should also point out that the situation in wuhan city is still very severe. more effective policies and more efforts on medical care and clinical research are urgently needed. we expect the final success against the epidemic to be reached at the beginning of this april. furthermore, by inverse inference, we find that the outbreak of this epidemic in mainland, hubei, and wuhan can all be dated back to - days before jan. th, in other words the end of dec. , which is consistent with public reports. although we lack knowledge of the first infected case, our inverse inference may still be helpful for understanding the epidemic of covid- and preventing similar viruses in the future.
the authors declare no conflict of interest.
epidemic doubling time of the novel coronavirus outbreak by province in mainland china. medrxiv
epidemiological and clinical features of the novel coronavirus outbreak in china
preliminary estimation of the basic reproduction number of novel coronavirus ( -ncov) in china
the novel coronavirus, -ncov, is highly contagious and more infectious than initially estimated.
medrxiv
serial interval of novel coronavirus ( -ncov) infections. medrxiv
assessing spread risk of wuhan novel coronavirus within and beyond china
using predicted imports of -ncov cases to determine locations that may not be identifying all imported cases. medrxiv
epidemic size of novel coronavirus-infected pneumonia in the epicenter wuhan: using data of five-countries' evacuation action. medrxiv
estimating the daily trend in the size of covid- infected population in wuhan. medrxiv
estimation of the asymptomatic ratio of novel coronavirus ( -ncov) infections among passengers on evacuation flights
early dynamics of transmission and control of -ncov: a mathematical modelling study. medrxiv
the effect of travel restrictions on the spread of the novel coronavirus ( -ncov) outbreak. medrxiv
the impact of traffic isolation in wuhan on the spread of -ncov. medrxiv
feasibility of controlling -ncov outbreaks by isolation of cases and contacts
effectiveness of airport screening at detecting travellers infected with -ncov. medrxiv
predictions of -ncov transmission ending via comprehensive methods
a data driven time-dependent transmission rate for tracking an epidemic: a case study of -ncov
novel coronavirus -ncov: early estimation of epidemiological parameters and epidemic predictions. medrxiv
estimation of the transmission risk of the -ncov and its implication for public health interventions
an updated estimation of the risk of transmission of the novel coronavirus ( -ncov).
infectious disease modelling
transmission dynamics of -ncov in malaysia. medrxiv
lockdown may partially halt the spread of novel coronavirus in hubei province
interventions targeting air travellers early in the pandemic may delay local outbreaks of sars-cov- . medrxiv
simulating the infected population and spread trend of -ncov under different policy by eir model. medrxiv
the lockdown of hubei province causing different transmission dynamics of the novel coronavirus ( -ncov) in wuhan and beijing. medrxiv
jing-an cui, and ling yin. a mathematical model for simulating the transmission of wuhan novel coronavirus. biorxiv
a time delay dynamical model for outbreak of -ncov and the parameter identification
modeling and prediction for the trend of outbreak of ncp based on a time-delay dynamic system
partial equilibrium approximations in apoptosis. ii. the death-inducing signaling complex subsystem
chiu fan lee, and ya jing huang. statistical mechanics and kinetics of amyloid fibrillation
we acknowledge the financial support from the national natural science foundation
key: cord- -fsjze t authors: aslan, ibrahim halil; demir, mahir; wise, michael morgan; lenhart, suzanne title: modeling covid- : forecasting and analyzing the dynamics of the outbreak in hubei and turkey date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: fsjze t
as the pandemic of coronavirus disease (covid- ) rages throughout the world, accurate modeling of the dynamics thereof is essential. however, since the availability and quality of data varies dramatically from region to region, accurate modeling directly from a global perspective is difficult, if not altogether impossible. nevertheless, via local data collected by certain regions, it is possible to develop accurate local prediction tools, which may be coupled to develop global models. in this study, we analyze the dynamics of local outbreaks of covid- via a coupled system of ordinary differential equations (odes). utilizing the large amount of data available from the ebbing outbreak in hubei, china as a testbed, we estimate the basic reproductive number, r_0, of covid- and predict the total cases, total deaths, and other features of the hubei outbreak with a high level of accuracy. through numerical experiments, we observe the effects of quarantine, social distancing, and covid- testing on the dynamics of the outbreak. using knowledge gleaned from the hubei outbreak, we apply our model to analyze the dynamics of the outbreak in turkey. we provide forecasts for the peak of the outbreak and the total number of cases/deaths in turkey, for varying levels of social distancing, quarantine, and covid- testing.
in late , the city of wuhan in the province of hubei, china experienced an outbreak of coronavirus disease , the disease caused by the novel coronavirus sars coronavirus (sars-cov- ). this outbreak quickly spread to all provinces of china and across the globe, being declared a pandemic by the world health organization (who) on march . the authorities imposed a strict lock-down on the city of wuhan and other cities of hubei province on january (world health organization, b). in the face of over sixty-seven thousand cases and over three thousand deaths, the authorities continued strict enforcement of these measures (chinese physicians, ; coronavirus covid- global cases by johns hopkins csse, ). finally, on march , hubei reached a significant milestone as the province's health commission reported no new cases for seven consecutive days (world health organization, b; the new york times, ). shortly thereafter, after over two months of severe restrictions on the movements of the hubei population, the "hubei lockdowns" were relaxed as the hubei outbreak began to wane, inspiring hope that the global pandemic might be brought under control. previous studies of covid- provided evidence of human-to-human transmission and revealed its similarities to and differences from sars (chan et al., ; huang et al., ; xu et al., ). however, data-driven simulation-based studies are needed to understand the dynamics of the ongoing outbreak. indeed, it is of the utmost importance to use these tools to investigate the effectiveness of public health strategies, such as the number of covid- tests carried out to detect the infected, the level of quarantine/social distancing, and its efficiency in the transmission of [...] studies investigate the dynamics of this pandemic from a global perspective (see, e.g., imai, dorigatti, cori, riley, & ferguson, ; read, bridgen, cummings, ho, & jewell, ; riou & althaus, ; shen, peng, xiao, & zhang, ; zhao et al., ; cao et al., ).
nevertheless, the large variations in both quality and availability of data from region to region make direct global modeling of the dynamics of this pandemic exceedingly difficult. as a result, in this study, we develop a model for the dynamics of the pandemic from a local perspective. the many "hotspots" of covid- combined with the many travel restrictions in place throughout the world further suggest that local models might provide more practical insights into the dynamics than their global counterparts. indeed, it stands to reason that accurate models for local regions can be coupled to develop reasonable models for larger regions. as of march , around one-quarter of the global covid- cases and consequent deaths had occurred in hubei. the large proportion of data available from hubei combined with the region's recent achievements toward managing its local outbreak suggests that the data from this region present an excellent picture of the lifetime of an outbreak of covid- . indeed, as countries worldwide close their borders, cities and regions, and impose their own "shelter-in-place," quarantine, or lockdown orders in the face of the pandemic, the large amount of data available from hubei provides an excellent testbed for modeling the dynamics of a local outbreak of covid- . in this study, we start by developing a seiqr-type deterministic model, which uses a system of ordinary differential equations to analyze the dynamics of the outbreak, in particular highlighting the effect of testing and the effects of quarantine and social distancing in hubei. we present estimates of the basic reproductive number r_0 of covid- in hubei and perform a sensitivity analysis to deduce which parameters play significant roles in the transmission and control of the outbreak in hubei.
in addition, we provide -day forecasts of the fatality rate of the outbreak, the number of cases, and the number of deaths based on the data (chinese physicians, ; coronavirus covid- global cases by johns hopkins csse, ; world health organization, b) and the outputs of our seiqr model. finally, building on knowledge obtained from the hubei outbreak, we apply our model to the outbreak in turkey. we forecast the peak of the outbreak and the total number of cases/deaths in turkey, utilizing the extant covid- data from turkey (ministry of health (turkey), ). a deterministic compartmental model has been developed using ordinary differential equations (odes) to understand the dynamics of covid- in hubei, china (chubb & jacobsen, ; keeling & rohani, ; kot, ). in the model, the total population n(t) at time t is divided into the following six compartments: susceptible s(t), susceptible in quarantine (isolated class) s_q(t), exposed e(t), infected (asymptomatic or having mild symptoms) i(t), reported (infected) cases (hospitalized if they develop severe symptoms or quarantined if they have mild symptoms) i_q(t), and recovered r(t). note that all individuals who test positive are immediately isolated. the transition flows among compartments are given in figure . the rate of reported cases i_q denotes the number of individuals who transition from the infected class i to the reported class i_q per day; it is also directly related to the daily number of covid- tests carried out during the outbreak.
figure : flow diagram illustrating the disease transitions among the compartments.
susceptible individuals make the transition to the s_q(t) compartment with a rate of [...] note that the main indicator of quarantine is the number of reported cases i_q. when the number of reported cases increases in a state or country, quarantine is imposed or naturally taken as an option.
thus, when the number of reported cases increases, the percentage or number of people quarantined will increase. if the number of reported cases falls to zero, the transition rate from s to s_q is zero and that from s_q to s is q_s. the individuals in the s and s_q compartments transition to compartment e (exposed) with a force of infection given by [...] and disease transmission rate β. note that, since the individuals in s_q transition to the e compartment less frequently, a reduction factor r is taken into account in the model. after an incubation period of 1/α, the individuals in the e compartment transition to the i compartment (infected) with rate α. the individuals in the i compartment will either transition to the r compartment (recovered) with a rate of γ_i, or to the i_q compartment with a rate of i_q, or die due to the disease with a rate of µ_i. the individuals in the i_q compartment (reported (infected) individuals, who are hospitalized or quarantined) either transition to the r compartment with a rate of γ_q or die due to the disease with a rate of µ_q. the following system of odes represents the dynamical behavior of the system. [...] the left-hand side of the system represents the rate of change per day. in the system, we have [...] hence, the feasible region of the system is given by [...] this implies that all the compartments stay non-negative. the parameter values used in the model are given in table with their descriptions and units.
it is made available under a cc-by-nd . international license. author/funder, who has granted medrxiv a license to display the preprint in perpetuity, is the copyright holder for this preprint (which was not peer-reviewed). https://doi.org/ . / . . .
disease free equilibrium and stability analysis
one of the major concepts in an outbreak is the disease free equilibrium (dfe), where the entire population is susceptible (keeling & rohani, ; diekmann, heesterbeek, & roberts, ). for the system, the dfe can be denoted [...] to be able to get the dfe for the system, we set the right-hand side of the system to zero and substitute the dfe into the system. hence, the dfe is found as [...] we then analyze whether the dfe is stable or not. the next-generation matrix (ngm) is used (van den driessche & watmough, ; diekmann et al., ) to determine the stability of the dfe. we rewrite our system as [...] where x = (s*, s_q*, e*, i*, i_q*, r*) and i = , ..., and hence, f and v are calculated for the system: [...] notice that individuals who transition to the e compartment are the only newly infected cases. therefore, the jacobians at the dfe for the infected classes (first three components, x) are [...] and then we compute the next generation matrix (ngm) as [...] the spectral radius of the ngm is the basic reproduction number r_0, defined as the average number of secondary cases arising from an average primary infected case in an entirely susceptible population. the dfe is locally stable if r_0 < 1 (van den driessche & watmough, ; diekmann et al., ). the spectral radius of the ngm [...] since we do not consider the individuals in the s_q compartment as a part of the dfe, we do not see any effect of quarantine on r_0. however, s_q indirectly changes the other parameter values such as β and i_q; therefore, the value of r_0 changes with s_q indirectly.
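the next-generation-matrix recipe described above (linearize the new-infection terms f and the transfer terms v at the dfe, then take the spectral radius of f v^{-1}) can be sketched numerically as follows. this is a hedged illustration, not the authors' matrices: it assumes that only class i transmits, that s = n at the dfe, and that the natural death rate d acts on every infected class, so the jacobians `F` and `V` below are assumptions, as are the function names and the sample parameter values. under these assumptions the result reduces to β α / ((α + d)(i_q + γ_i + µ_i + d)), which agrees with the qualitative dependence of r_0 on α, β, i_q, γ_i, µ_i and d stated in the text.

```python
def invert(m):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(m)
    a = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def spectral_radius(m, iters=200):
    """Power iteration; adequate for a non-negative matrix with a dominant eigenvalue."""
    v = [1.0] * len(m)
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam


def basic_reproduction_number(beta, alpha, i_q, gamma_i, mu_i, gamma_q, mu_q, d):
    """R_0 for the infected classes (E, I, I_q) under the assumptions stated above."""
    out_of_i = i_q + gamma_i + mu_i + d      # total exit rate from class I
    out_of_iq = gamma_q + mu_q + d           # total exit rate from class I_q
    # Assumed Jacobian of new infections: only I infects, and only E receives them.
    F = [[0.0, beta, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
    # Assumed Jacobian of the remaining transfers between infected classes.
    V = [[alpha + d, 0.0, 0.0],
         [-alpha, out_of_i, 0.0],
         [0.0, -i_q, out_of_iq]]
    return spectral_radius(matmul(F, invert(V)))


if __name__ == "__main__":
    # Hypothetical rates per day, for illustration only.
    print(basic_reproduction_number(beta=0.5, alpha=0.2, i_q=0.1, gamma_i=0.05,
                                    mu_i=0.01, gamma_q=0.07, mu_q=0.02, d=0.00003))
```

raising the reporting rate `i_q` in this sketch lowers the computed value, which mirrors the argument that faster detection of infected individuals helps to control the outbreak.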
note that α and β are positively correlated with r 0 , and i q , γ i , µ i , and d are negatively correlated with r 0 for the system ( ). note that we might control the disease by increasing the quarantine rate of infected individuals, i q . thus, if then the dfe is locally stable and the disease dies out when sufficiently close to the dfe. biologically, if the infected individuals can be detected in a sufficiently short time (in other words, if the number of tests performed to detect cases increases), then the disease can be controlled. in this part, we estimate the parameters in the system ( ) by fitting our model to the daily reported cumulative number of cases and deaths, which are provided by (world health organization, b) and (chinese physicians, ). we use the ordinary least squares (ols) method and minimize the sum of the squares of differences between the daily reported data and those predicted by our model. the goodness of the fit is measured by computing the associated relative error of the fit using the formula where c i and ĉ i are the exact and estimated cumulative (infected) cases, and d i and d̂ i are the exact and estimated cumulative deaths. to estimate the number of covid- deaths, we sum the number of deaths coming from the infected class i and the reported (infected) class i q . note that the natural deaths in the infected class i and the reported (infected) class i q are also included in the total number of deaths. we used an ode solver with fmincon from the optimization toolbox of matlab. by using the initial conditions: s( ) = , , , s q ( ) = , i q ( ) = , and r( ) = , we estimate all the parameters of the model together with the initial number of exposed e and infected i, except the natural death rate d, recruitment rate Π, and incubation rate α. we used . days for the average incubation period
that is provided by (nishiura et al., ; guan et al., ; sanche et al., ). the natural death and recruitment rates are provided by (world health organization, a). the simulation results obtained for the cumulative number of (infected) cases c and cumulative deaths d by fitting the model with the data from january , to march , are depicted in figure . these figures show a reasonably good fit with the total relative error . ( %). most of the error comes from the fit of cumulative cases, especially around february , . in february, china began to report clinically diagnosed cases in addition to laboratory-confirmed cases, and on february , , , clinically (rather than laboratory) diagnosed cases were reported even though they were diagnosed in the preceding days and weeks. due to the very small number of cases reported after march , , we chose to fit the model using only data from before this date (table ). in this section, we discuss -day forecasting of the outbreak, the effect of quarantine, and the effect of testing in hubei. we also conduct a sensitivity analysis to see which parameters play an important role in the dynamics of the outbreak. when we look at the change in the quarantine class, nearly the entire province of hubei was quarantined by february , (see figure ; february corresponds to day in the figure). the percentage of the population transitioning from the susceptible class to the quarantine class attains its maximum level between january , and february , , and by february , almost all of the population was in quarantine. this result makes sense since the government imposed a quarantine in the province on january , , initially recommending quarantine and finally forcing people into quarantine to guarantee social distancing. this action seems to have worked to great effect, reducing the contact rate by about .
% (see table for the parameter r, the reduction rate due to the quarantine). when we reduce the quarantine rate s q from . to . ( % reduction) and do not change the remaining parameters, the number of cases and deaths would be about and , respectively. similarly, when we increase the quarantine rate s q from . to . ( % increase), the number of cases and deaths would be about and , respectively. thus, any change in the quarantine rate makes a very significant change in the total number of cases and deaths. furthermore, as shown in the sensitivity analysis section below, the quarantine rate is a significant parameter in the dynamics of the outbreak, and so is its efficiency, which is captured by the parameter r. we used the parameters in table for -day forecasting. figure shows the estimated number of infected cases for days. the plot on the left depicts the estimated number of exposed individuals and the right plot depicts the estimated number of reported (infected) cases i q . as can be seen, the number of individuals in each of these classes tends to zero, which implies that the outbreak is almost over, and so new cases may not be recorded in hubei. as can be seen from the change in the infected class, the outbreak reaches its peak about february , . the infected class i also shows how many people were out with no symptoms or mild symptoms during the outbreak.
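the scenario comparison above (changing the quarantine rate while holding the other parameters fixed) can be sketched with a small simulation. this is a minimal sketch, not the authors' matlab code: the right-hand side is an assumed form consistent with the verbal description of the compartments (the s to s q flow is taken proportional to the reported fraction i q /n, quarantined susceptibles are infected at a rate reduced by the factor 1 - r, and recruitment and natural death are omitted), the integrator is a plain fixed-step runge-kutta scheme, and every name and parameter value is hypothetical.

```python
import numpy as np

def rhs(y, p):
    """Assumed right-hand side; states are S, Sq, E, I, Iq, R, D (disease deaths)."""
    S, Sq, E, I, Iq, R, D = y
    N = S + Sq + E + I + Iq + R              # living population
    lam = p["beta"] * I / N                  # assumed force of infection
    to_q = p["s_q"] * (Iq / N) * S           # assumed: uptake grows with reported cases
    from_q = p["q_s"] * Sq                   # release from quarantine
    inf_s = lam * S
    inf_sq = (1.0 - p["r"]) * lam * Sq       # contact reduced by r while quarantined
    return np.array([
        -to_q + from_q - inf_s,
        to_q - from_q - inf_sq,
        inf_s + inf_sq - p["alpha"] * E,
        p["alpha"] * E - (p["gamma_i"] + p["i_q"] + p["mu_i"]) * I,
        p["i_q"] * I - (p["gamma_q"] + p["mu_q"]) * Iq,
        p["gamma_i"] * I + p["gamma_q"] * Iq,
        p["mu_i"] * I + p["mu_q"] * Iq,
    ])

def simulate(y0, p, days, dt=0.1):
    """Integrate with classical fourth-order Runge-Kutta and return the final state."""
    y = np.array(y0, dtype=float)
    for _ in range(int(round(days / dt))):
        k1 = rhs(y, p)
        k2 = rhs(y + 0.5 * dt * k1, p)
        k3 = rhs(y + 0.5 * dt * k2, p)
        k4 = rhs(y + dt * k3, p)
        y = y + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return y

# Hypothetical parameters (per day) and initial state, for illustration only.
BASE = dict(beta=0.8, alpha=0.2, gamma_i=0.1, i_q=0.1, mu_i=0.005,
            gamma_q=0.1, mu_q=0.01, r=0.95, q_s=0.001, s_q=50.0)
Y0 = [1e6, 0.0, 20.0, 10.0, 1.0, 0.0, 0.0]

no_quarantine = simulate(Y0, dict(BASE, s_q=0.0), days=200)
with_quarantine = simulate(Y0, BASE, days=200)
print("deaths without quarantine:", no_quarantine[6])
print("deaths with quarantine:   ", with_quarantine[6])
```

because the assumed flows only move individuals between compartments, the seven states always sum to the initial population, which gives a quick sanity check on any modified right-hand side.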
figure : the plot on the left depicts the number of exposed cases and the plot on the right depicts the number of infected cases with initial conditions s( ) = , , , s q ( ) = , e( ) = , i( ) = , i q ( ) = , r( ) = for days. in figure , the plot on the left shows the estimated number of cumulative reported (infected) cases and the right plot shows the estimated number of cumulative deaths. as of march , there were no reported cases in hubei in the past week and the total number of cases and total number of deaths were and , respectively. our model ( ) predicts the number of cases and deaths with high accuracy, with percent relative error. we estimated the fatality rate of the outbreak in hubei as approximately . % with the estimated number of cases, about , and deaths, about . several parameters play important roles in the model ( ). these parameters were estimated with existing data as of (coronavirus covid- global cases by johns hopkins csse, ). in order to determine the set of parameters that are statistically significant regarding the number of cumulative infected cases, we conduct a sensitivity analysis of the model. we utilized latin hypercube sampling (lhs) and the partial rank correlation coefficient (prcc) method (marino, hogue, ray, & kirschner, ). we use the ranges given in table to sample parameters from a uniform distribution, then use these samples as input variables when we run the system ( ) with initial conditions s( ) = , s q ( ) = , e( ) = , i( ) = , i q ( ) = , r( ) = for days. the number of cumulative infected cases is the output variable in the sensitivity analysis. table shows the prcc values, p-values, and the range for each parameter.
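the lhs/prcc procedure described above can be sketched as follows. this is a minimal numpy sketch under stated assumptions: the parameter ranges are hypothetical, a toy monotone response stands in for the cumulative-case output of the ode system, a third "dummy" input is included as a negative control, and p-values are omitted.

```python
import numpy as np

def latin_hypercube(n, bounds, rng):
    """Draw n samples; each parameter's range is split into n strata, one draw per stratum."""
    d = len(bounds)
    strata = np.array([rng.permutation(n) for _ in range(d)]).T   # shape (n, d)
    u = (strata + rng.random((n, d))) / n                         # uniform on [0, 1)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    return lo + u * (hi - lo)

def ranks(a):
    """Rank-transform along axis 0 (inputs are continuous, so ties are not handled)."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)

def prcc(X, y):
    """Partial rank correlation of each column of X with y, controlling for the others."""
    Xr, yr = ranks(X), ranks(y)
    n, d = X.shape
    out = np.empty(d)
    for j in range(d):
        others = np.column_stack([np.ones(n), np.delete(Xr, j, axis=1)])
        # Residuals after removing the linear effect of the other ranked inputs.
        res_x = Xr[:, j] - others @ np.linalg.lstsq(others, Xr[:, j], rcond=None)[0]
        res_y = yr - others @ np.linalg.lstsq(others, yr, rcond=None)[0]
        out[j] = np.corrcoef(res_x, res_y)[0, 1]
    return out

rng = np.random.default_rng(0)
names = ["beta", "i_q", "dummy"]
bounds = [(0.1, 1.0), (0.05, 1.0), (0.0, 1.0)]    # hypothetical ranges
X = latin_hypercube(500, bounds, rng)
y = X[:, 0] / (X[:, 1] + 0.1)                     # toy stand-in for cumulative cases
values = prcc(X, y)
for name, value in zip(names, values):
    print(f"{name}: {value:+.3f}")
```

in a full analysis the toy response would be replaced by a run of the ode system for each sampled parameter set.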
the sensitivity analysis indicates that β, s q , i q , and r are the statistically more significant parameters in the dynamics of the outbreak, based on their high prcc values. therefore, it is of interest to study how the number of cumulative infected cases changes when s q , i q , r, and β are varied while the other parameters are held the same as in table and the initial conditions are the same as before. figure shows the results of these experiments: how the number of cumulative cases changes for different values of s q , i q , r, and β. it is also important to analyze how the value of r 0 varies with β and i q . thus, we vary β in the range [ , ] and i q in the range [ . , ] while keeping all other parameters the same as in table . figure shows the boxplot of β and i q . we observe that i q affects r 0 over a wider range compared to β. thus, the range of r 0 will change roughly between and . in addition, the value of r 0 drops below when i q is above . . the rate of reported (infected) cases, i q , is related to the number of tests given during the outbreak to identify the infected people. thus, increasing the number of tests will increase the rate of case reporting i q . this will reduce the number of cases (see figure ) and, consequently, the number of deaths due to the outbreak. when we increase the rate of reported (infected) cases i q by about %, the number of cases and number of deaths are estimated to be and , respectively.
decreasing the rate by about %, the number of cases and number of deaths are estimated to be and , respectively. in this part, we fit the model ( ) with available covid- data from turkey (ministry of health (turkey), ). we fit the model ( ) with turkish data from march , to april , , and get about . % relative error in the fit by using the equation ( ). we estimate the four parameters i q , s q , β, and r, which are not only the most significant parameters in the dynamics of the outbreak, but are also specific to each country since they are related to the number of covid- tests administered (i q ), the number of individuals in quarantine (s q ), the contact rate of individuals (β), and the efficiency of quarantine (r) in each country. therefore, by using the initial conditions: s( ) = , , , s q ( ) = , i q ( ) = , and r( ) = , we estimate these four parameters together with the initial number of exposed and infected individuals. we do not estimate the rest of the parameters, employing the parameters in table . therefore, our results in this section will depend on the observed dynamics of the outbreak in hubei as well as the available turkish data (ministry of health (turkey), ). note that the quarantine rate and the rate of reported cases (which, we stress, is related to the number of covid- tests) can be increased, and the increase may still have a significant effect on reducing the number of cases (see figure , sensitivity analysis), but increasing the reduction rate r does not make very significant changes in the number of cases in turkey since it is very close to its maximum level (see figure , sensitivity analysis part). thus, we will vary only the quarantine rate s q and the rate of reported cases i q in forecasting the peak of the outbreak and the number of cases/deaths in turkey.
the rate of reported cases is about . ; this rate is larger than what we observed in hubei. this implies that in terms of numbers of covid- tests conducted per day, turkey is now doing a better job than hubei, china at a comparable time in hubei's outbreak. the efficiency of quarantine also seems to be very good in turkey, given the approximately % reduction in the contact rate of covid- obtained by our parameter estimation. on the other hand, the quarantine rate is about . , which is small when compared with the quarantine rate in hubei (the rate was . in hubei). in hubei, the population transitioned to the quarantine class very quickly (in about two weeks), but in turkey the movement to quarantine has been very slow in comparison (see figure ), which may explain why the contact rate is higher in turkey than in hubei (see table and for these rates). it is still possible to increase the quarantine rate (the rate, per day, of transition to the quarantine class) and the number of covid- tests given each day in turkey to reduce the number of cases and deaths (see figures and ). in figures and , the red curves are obtained using the base parameters from table and , and the other curves are obtained by varying the quarantine rate s q and the rate of reported cases i q . when we use the base parameter values obtained from our fitting, turkey will have about , cases and , deaths. if turkey can increase the number of individuals in quarantine and the number of daily covid- tests, then, depending on the magnitude of the increases, the number of cases and deaths can decrease significantly (see figures and ). when we look at the trajectories of cumulative cases and deaths in figures and , in the worst-case scenario (the black curves) of the study, turkey will have about , cases and , deaths.
these projections decrease to , cases and , deaths if turkey can increase the number of individuals in quarantine and the number of covid- tests. figure : cumulative number of (infected) cases depending on different quarantine rate s q and rate of reported cases i q . left graph shows the cumulative number of cases between day to day , and right plot shows the cumulative number of cases between day to day in the outbreak in turkey. figure : cumulative number of deaths depending on different quarantine rate s q and rate of reported cases i q . left graph shows the cumulative number of deaths between day to day , and right plot shows the cumulative number of deaths between day to day in the outbreak in turkey. the peak of the outbreak in turkey is also very sensitive to the quarantine rate s q and the rate of reported cases i q . depending on the change in the quarantine rate and the rate of reported cases i q , the peak of the outbreak in turkey can be seen between day (april , ) and day (april , ), and the outbreak will almost die out by day (at the end of july , see figure ). figure : projected (simulated) peak of outbreak in turkey depending on different quarantine rate s q and rate of reported cases, i q . our analysis suggests that quarantine greatly reduced the number of cases and deaths seen in hubei's covid- outbreak.
in addition, while quarantine does not appear in the expression for r 0 , it still indirectly reduces r 0 . we also saw that the dynamics of the outbreak are very sensitive to the quarantine rate s q and contact rate β, as indicated by our sensitivity analysis. the basic reproductive number is estimated as . and the study shows that any change in β or i q directly affects the basic reproductive number. the quarantine decidedly reduces the number of cases and deaths. decreasing (or increasing) the speed of movement from the susceptible class to the quarantine class by about % would double (or halve) the number of cases and deaths due to the outbreak (this speed of movement is controlled by the rate s q ). of course, the efficiency of the quarantine is also very important. in our model, the efficiency of the quarantine is measured by the reduction rate, r. the reduction rate shows how much the contact rate of covid- is reduced thanks to the quarantine. based on our sensitivity analysis, this parameter is very important (see figure ). our model shows that the quarantine in hubei was almost perfect since it caused about . percent reduction in the contact rate of covid- . another important parameter that plays a crucial role in the dynamics of the outbreak is the rate of reported (infected) cases i q , which is directly related to the number of tests given to detect infected individuals. similar to the quarantine rate s q , the number of cases could double (or halve) when we have a % reduction (or increase) in the rate of reported (infected) cases i q . as of march , there were no reported cases in hubei in the past week and the total number of cases and total number of deaths were and , respectively. based on our -day forecasting, the number of cases in hubei was projected to be about and the number of deaths was projected to be about . thus, we estimate the fatality rate of the outbreak to be about . % in hubei.
our model gives about % relative error and we are confident that using the model will be helpful for forecasting local outbreaks of the pandemic in other regions. from existing covid- data from turkey and the dynamics of our model understood from the hubei analysis, the outbreak in turkey is expected to reach its peak between april and april depending on the number of individuals in quarantine and the number of covid- tests carried out each day in turkey. the daily number of tests given in turkey is large when compared to the rates of reported cases in hubei. as we showed in the sensitivity analysis, increasing the number of covid- tests and the number of individuals in quarantine will significantly reduce the number of cases (and deaths). based on our forecasting, the number of cases will be about , with the range , and , , and the number of deaths will be about , with the range , and , depending on the quarantine rate s q and the rate of reported cases i q in turkey. thus, in any of the cases shown in figures and , the fatality rate of covid- will be about . % in turkey. small changes in the quarantine rate make significant changes in the total number of cases and deaths in turkey. the efficiency of the quarantine in turkey is about percent, meaning that it causes a percent reduction in the contact rate of covid- . thus, the quarantine rate s q and its efficiency are very important for containing the outbreak (see table for the reduction rate r and figure for the effect of the reduction rate on the total number of cases). as of april , , the number of covid- tests given each day in turkey had increased to , (ministry of health (turkey), ).
if the number is increased further, then it also will decrease the total number of cases and deaths in turkey (see figure and ). indeed estimating the effective reproduction number of the -ncov in china a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster. the lancet. chinese physicians. ( ). ncov.dxy.cn mathematical modeling and the epidemiological research process coronavirus covid- global cases by johns hopkins csse. ( ). as the construction of next-generation matrices for compartmental epidemic models others ( ). clinical characteristics of novel coronavirus infection in china clinical features of patients infected with novel coronavirus in wuhan, china estimating the potential total number of novel coronavirus cases in wuhan city modeling infectious diseases in humans and animals elements of mathematical ecology. cambridge a methodology for performing global uncertainty and sensitivity analysis in systems biology coronavirus disease (covid- ) daily data serial interval of novel coronavirus ( -ncov) infections. 
medrxiv novel coronavirus -ncov: early estimation of epidemiological parameters and epidemic predictions pattern of early human-to-human transmission of wuhan -ncov the novel coronavirus, -ncov, is highly contagious and more infectious than initially estimated modelling the epidemic trend of the novel coronavirus outbreak in china reproduction numbers and sub-threshold endemic equilibria for compartmental models of disease transmission further notes on the basic reproduction number crude birth and death rate data by country novel coronavirus (covid- ) situation reports evolution of the novel coronavirus from the ongoing wuhan outbreak and modeling of its spike protein for risk of human transmission preliminary estimation of the basic reproduction number of novel coronavirus ( -ncov) in china, from to : a data-driven analysis in the early phase of the outbreak the authors would like to acknowledge the generous support of the turkish ministry of national education in the study. the authors declared no competing interests.authors' contributions all authors contributed equally to this work. key: cord- -q ah w authors: li, yang; duan, guangfeng; xiong, linping title: characteristics of covid- near china's epidemic center date: - - journal: am j infect control doi: . /j.ajic. . . sha: doc_id: cord_uid: q ah w background: this study described and analyzed the age, gender, infection sources, and timing characteristics of the confirmed cases in two cities near the center of china's covid- outbreak. methods: this study used publicly available data to examine gender, age, source of infection, date returned from hubei, date of disease onset, date of first medical visit, date of final diagnosis, and date of recovery of covid- cases. results: public-use data revealed similar risks of infection by age and that the numbers of new and final diagnoses of confirmed cases first increased, peaked at about two weeks, and then gradually decreased. 
the main sources of infection were firsthand or secondhand exposure in hubei province and contact with confirmed cases, which mostly involved contact with infected household members. the mean periods from disease onset to first medical visit, first visit to final diagnosis, and final diagnosis to recovery were . , . , and . days, respectively. conclusions: the results suggest that the measures taken to control the rate of infection were effective. prevention and control efforts should respond as quickly as possible, isolate and control activities of individuals leaving infected areas, and restrict household contact transmission. the first novel coronavirus pneumonia (covid- ) case was identified in wuhan, hubei province, china, on december , , after which the disease gradually spread. the emergence of the covid- epidemic coincided with the traditional chinese spring festival when most migrant workers return to their hometowns to celebrate. the novel covid- infection presented few obvious upper respiratory symptoms (such as nasal discharge, sneezing, or sore throat), indicating that the virus was mainly infecting the lower respiratory tract, and most patients' first symptom was fever. the mode of transmission might have been by droplets, close contact, aerosol, mother-infant, or fecal-oral transfer. during the incubation period, patients could transmit the virus to other humans. as of february , , countries had reported confirmed cases of covid- , of which china reported , confirmed cases, , recovered cases, and , deaths. according to the research reports, covid- is highly infectious, and the large-scale population migration associated with the spring festival exacerbated the spread of the disease to outlying areas. xinyang city is in southern henan province, china, on the northern border of hubei province, and fuyang city is in northwest anhui province, adjacent to xinyang city.
xinyang and fuyang are typical labor exporting cities near the epidemic's center. , this study investigated aspects of the covid- transmission regarding xinyang and fuyang, described its characteristics, and evaluated the prevention and control measures. china's data on covid- are gathered based on its classification as a class b infectious disease. class b infectious diseases are considered mandatory notifiable diseases; all new cases must immediately be reported using the national infectious diseases monitoring information system database. to prevent rapid spread of the disease, the municipal health departments publicized information about the confirmed cases on the governments' websites, including personal information, personnel exposure, and the disease trajectory. we downloaded the case information from the target cities' health commission websites and transformed it into numerical data. the variables used in the analysis were: gender, age, source of infection, date returned from hubei, date of disease onset, date of first medical visit, date of final diagnosis, and date of recovery. as of february , , cases of effective data were collected in the two cities , : cases in xinyang and cases in fuyang. the sources of infection were: ( ) firsthand or secondhand contact with hubei ("hubei exposure"), ( ) "confirmed case contact," ( ) "non-hubei returnee exposure," and ( ) "others." "hubei exposure" comprised confirmed cases of individuals who had recently left hubei province or had not recently left hubei but had been in contact with asymptomatic individuals who had been in hubei province. "confirmed case contact" refers to infected individuals who had not left their residential areas and they had been in close contact with individuals who were confirmed cases. "non-hubei returnee exposure" refers to individuals who had recently returned to xinyang or fuyang from non-hubei provinces. 
in this study, a "returnee" was an individual who had returned to xinyang or fuyang from some other location, and "non-returnee" referred to an individual who had not left xinyang or fuyang. the age, gender, trajectory, and rates of infection distributions of the confirmed cases in xinyang and fuyang were analyzed. the distribution of confirmed cases in households was analyzed to describe the extent of covid- clustered within household units. using the data on timing of disease onset and final diagnosis, the covid- development over time was investigated. regarding the disease trajectory (confirmed cases), four temporal stages were identified: ( ) arrival → disease onset, ( ) disease onset → first medical visit, ( ) first medical visit → final diagnosis, and ( ) final diagnosis → recovery. the mean periods of each stage were described and analyzed. ibm spss . was used for data analysis. as of february , , cases had been confirmed in xinyang and fuyang. in xinyang, of the cases were in recovery ( . %), and two had died ( . % fatality rate); in fuyang, of the cases were in recovery ( . %), and no deaths were reported. thus, on that date, there were and ongoing cases in xinyang and fuyang, respectively. however, cases were not included due to incomplete information. among the confirmed cases with complete effective data (n= ), were male ( . %) and were female ( . %). the proportional age distribution was zero- years old ( . %), - years old ( . %), and years or older ( . %). the proportion of confirmed cases aged - years in the returnee group was higher than among the non-returnees, and the proportions of confirmed cases aged zero- and aged or older among the non-returnees were higher than among the returnees. 
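the four temporal stages defined in the methods (arrival → disease onset, disease onset → first medical visit, first medical visit → final diagnosis, final diagnosis → recovery) can be computed from per-case dates as sketched below. this is only an illustration in python, not the spss workflow the authors used: the field names and the two case records are hypothetical, and cases with a missing date are simply skipped for the affected stage. a negative value for the first stage corresponds to a case whose symptoms began before arrival.

```python
from datetime import date

# Stage boundaries, in the order given in the methods.
STAGES = [("arrival", "onset"), ("onset", "first_visit"),
          ("first_visit", "diagnosis"), ("diagnosis", "recovery")]

def mean_stage_days(cases):
    """Mean length in days of each stage, skipping cases with a missing date."""
    means = {}
    for start, end in STAGES:
        spans = [(c[end] - c[start]).days for c in cases
                 if c.get(start) is not None and c.get(end) is not None]
        means[f"{start}->{end}"] = sum(spans) / len(spans) if spans else None
    return means

# Two hypothetical case records (not data from the study).
cases = [
    {"arrival": date(2020, 1, 20), "onset": date(2020, 1, 24),
     "first_visit": date(2020, 1, 26), "diagnosis": date(2020, 1, 29),
     "recovery": date(2020, 2, 10)},
    {"arrival": date(2020, 1, 22), "onset": date(2020, 1, 21),   # onset before arrival
     "first_visit": date(2020, 1, 25), "diagnosis": date(2020, 1, 27),
     "recovery": None},                                          # still in treatment
]
result = mean_stage_days(cases)
for stage, mean_days in result.items():
    print(stage, mean_days)
```

non-returnees have no arrival date, so leaving that field out (or setting it to None) excludes them from the first stage only.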
table presents the distributions regarding age, gender, source of infection, and within-household transmission and figure illustrates the disease trajectory between onset and final diagnosis from january , , through february , , and the disease trajectory from january , , through february , , by source of infection. figure a illustrates that the first day of disease onset in the two cities was january , , after which the number of confirmed cases gradually increased. the disease onset peak was january through january and then the number of newly confirmed cases gradually decreased. the first final diagnosis was on january , , the numbers of final diagnoses gradually increased, they peaked january through february , and they gradually decreased from that date. the peak of the final diagnoses was about six days after the peak of disease onset. hubei exposure was the source of ( . %) confirmed cases, ( . %) cases were confirmed case contacts, non-hubei returnee exposure accounted for ( . %) cases, and there were ( . %) cases due to other sources. figure b shows that the main source of infection before february was hubei exposure, and, after february , the main source of infection was confirmed case contact. regarding within-household infection, households (with confirmed cases) experienced within-household transmission based on multiple infected household members (table ) . of them, households had two, households had three, two households had four, one household had five, and two households had six infected household members ( . %, . %, . %, . %, and . % of the households with more than one infected household member, respectively). the mean number of people infected in the households with more than one infected member was . . returnees. 
the period between date of first visit and date of final diagnosis was slightly longer for non-returnees than returnees, and the period between date of final diagnosis and date of recovery was slightly longer for returnees than non-returnees. numbers less than zero mean that some confirmed cases did not present symptoms at the time of first medical visit. among the returnees, the proportion of confirmed cases aged - years was . %, whereas the proportion of those aged years or older and - years was just . %. in comparison to the returnees, the proportion of non-returnees aged - years was . % higher and the proportion of those aged or older was . % lower. the returnees' male to female sex ratio was . : , and the male to female sex ratio among non-returnees was . : . these results might reflect the fact that migrant workers are most likely to be males aged years, which means that there were relatively higher proportional representations of females and older people among the non-returnees. moreover, this finding indicates that people of all ages are susceptible to covid- . the numbers of final diagnoses of confirmed cases peaked within days of onset and then gradually decreased until february when just one case was diagnosed. this finding demonstrates that the spread of the virus had effectively been controlled through various measures, such as isolating exposed people, reducing public gatherings, increasing screenings for fever, and widespread public dissemination of prevention and control information. this study's analysis revealed that the main source of confirmed cases was hubei exposure or confirmed case contact. during the first half of the outbreak period, hubei exposure was the likeliest source and, during the second half of the outbreak period, confirmed case contact was the likeliest source of infection. previous studies have found that close contact with infected individuals tended to carry a high risk of infection. 
the present study found that, of the confirmed cases whose source of infection was "confirmed case contact," . % were via household contact with one or more confirmed cases. studies have shown that the number of infected cases was significantly reduced by controlling the city's traffic, closing entertainment venues, and banning public gatherings. implementing these measures can limit the progression of the epidemic. by further controlling within-household contact with infected people and the size of public gatherings, incidence might be further decreased. the key to controlling infectious diseases is early detection, reporting, isolation, and treatment. we found that the mean period between date of return to the study area and date of disease onset was . days (range - - ). twenty confirmed cases among the returnees ( . %) had symptoms before they arrived in xinyang or fuyang, suggesting that one of the first steps to take should be to assertively control workers' ability to return home which, in the early stage, might slow the rate of infection. the mean period from date of disease onset to first medical visit was about . days (range: - - ). two cases did not have symptoms at the time of first medical treatment (screening). li et al. found that the mean interval between date of disease onset and date of first visit was . days (cases with onset before january , ) or . days (onset from january through january ). we found a slightly shorter period, implying that public awareness of covid- and medical treatment had gradually improved and people were increasingly likely to seek treatment. (numbers less than zero for the period from return to disease onset indicate that some confirmed cases had symptoms reflecting disease onset before they returned home; negative numbers for the period from disease onset to first medical visit indicate that some cases did not present symptoms at the time of first medical treatment.) moreover, the mean period between date of disease onset and date of first visit
among non-returnees was slightly longer than among returnees, indicating that quarantine and isolation measures were slightly stronger for returnees than non-returnees. we found that the mean period between date of first visit and date of final diagnosis was . days, suggesting that the efficiency of early detection measures needed improvement. in addition, the mean period between date of final diagnosis and date of recovery was about . days (range: - ). the mean hospital stay was days in a previous study, but it was slightly longer in our study. effective responses to covid- for prevention and control required implementation of governmental measures, which apparently controlled the rate of infection in xinyang and fuyang, which are cities with significant flows of migrant workers to and from hubei province. the key to controlling the rate of infection via returnees is to act as quickly as possible, focus on isolating and controlling returnees' mobility, and decreasing close within-household contact between infected and non-infected household members. if these measures were implemented as a preemptive first step, the rate of infection would further be reduced.

the funder had no role in the study's design, data collection, analysis, the decision to publish

references:
epidemiology working group for ncip epidemic response. an update on the epidemiological characteristics of novel coronavirus pneumonia (covid- )
clinical features of patients infected with novel coronavirus in wuhan
a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster
first case of novel coronavirus in the united states
national health commission of the people's republic of china. what is fecal-oral transmission?
national health commission of the people's republic of china
the state council information office of the people's republic of china. press conference of the joint prevention and control of the state council
a -hour old infant in wuhan diagnosed and mother-to-child infection suspected
clinical analysis of neonates born to mothers with -ncov pneumonia
world health organization
epidemiology working group for ncip epidemic response. the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) in china
the investigation and research on fuyang's off farm workers
national health commission, ministry of human resources and social security, ministry of finance. measures to improve working conditions of and care for physical and mental health of healthcare workers
clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china
henan provincial people's government
a new coronavirus associated with human respiratory disease in china
the novel coronavirus originating in wuhan, china: challenges for global health governance
reduce large-scale gathering activities in wuhan
an investigation of transmission control measures during the first days of the covid- epidemic in china
early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia. the new england journal of medicine
the state council information office of the people's republic of china. press conference of the joint prevention and control of the state council

we would like to thank editage (www.editage.cn) for english language editing. the authors declare that they have no competing interests.

key: cord- -iqr fp authors: fan, changyu; liu, linping; guo, wei; yang, anuo; ye, chenchen; jilili, maitixirepu; ren, meina; xu, peng; long, hexing; wang, yufan title: prediction of epidemic spread of the novel coronavirus driven by spring festival transportation in china: a population-based study date: - - journal: int j environ res public health doi: .
/ijerph sha: doc_id: cord_uid: iqr fp after the novel coronavirus ( -ncov) outbreak, we estimated the distribution and scale of more than million migrants residing in wuhan after they returned to their hometown communities in hubei province or other provinces at the end of by using the data from the – china migrants dynamic survey (cmds). we found that the distribution of wuhan’s migrants is centred in hubei province (approximately %) at a provincial level, gradually decreasing in the surrounding provinces in layers, with obvious spatial characteristics of circle layers and echelons. the scale of wuhan’s migrants, whose origins in hubei province give rise to a gradient reduction from east to west within the province, and account for % of wuhan’s total migrants, are from the surrounding prefectural-level cities of wuhan. the distribution comprises districts and counties in hubei province, and the cumulative percentage of the top districts and counties exceeds %. wuhan’s migrants have a large proportion of middle-aged and high-risk individuals. their social characteristics include nuclear family migration ( %), migration with families of – members ( %), a rural household registration ( %), and working or doing business ( %) as the main reason for migration. using a quasi-experimental analysis framework, we found that the size of wuhan’s migrants was highly correlated with the daily number of confirmed cases. furthermore, we compared the epidemic situation in different regions and found that the number of confirmed cases in some provinces and cities in hubei province may be underestimated, while the epidemic situation in some regions has increased rapidly. the results are conducive to monitoring the epidemic prevention and control in various regions. the outbreak of a new coronavirus ( -ncov) has spread internationally since the initial report of cases by wuhan municipal health commission, china on december [ ] [ ] [ ] [ ] . 
on january , who announced that there is a high risk of a -ncov epidemic in china and at a global level [ ] . the analysis object of this study was the floating population who have lived in wuhan for more than one month. short-term migrants and students were not included. distinct from other models of the dynamics of this epidemic, we used the information of the respondents from the cmds and their family members to further explore the origins of wuhan's migrant population, such as their returning destination, population characteristics, family structures and other metrics. this approach can provide practical solutions to prepare prevention strategies, and approaches to assess resources for treatment and containment of the epidemic. the national health commission of china released a report on january that stated that -ncov could be transmitted not only via respiratory droplets, but also via direct contact. -ncov has now spread nationally and worldwide, and due to the lack of data on the size and origins of the floating population of wuhan, it has been difficult for the chinese government to arrange real-time medical resources and implement effective public health interventions. we used data from the wuhan floating population monitoring survey to estimate the size and origins of the migrant population in wuhan. we also described the socio-demographic characteristics of this population, and compared confirmed cases from different regions to estimate the epidemic with modelling techniques. we found that three-quarters of wuhan's floating population are from hubei province, and that nearly % migrated with nuclear families. the number of members per family is to , and most individuals are migrant workers from rural areas with low education levels. 
by comparing the predicted and actual values obtained from the model, we analysed the profile of the epidemic in various regions since january , and found that the spread of the -ncov has varied greatly between regions, and that the epidemic in some regions may be underestimated. there may also be unknowns, such as structural factors in some regions, that deserve further attention. the majority of the floating population left wuhan before the city was "closed off" by authorities, so our analysis will be useful for estimating the key geographic areas for prevention and control. the results indicate that the floating population of wuhan is centred in hubei province and the surrounding provinces, so local government must quickly and effectively take steps to prevent further spread of -ncov. higher-level governments must also strengthen the assistance they are providing, such as sending medical workers and medical supplies to these areas to avoid -ncov becoming a new pandemic. at the same time, it is important to increase surveillance in areas where the epidemic may be underestimated, and promptly identify prevention and control loopholes to reduce the burden of a new round of transmission. china has been deeply involved in the globalisation process, and even china's central and western regions have become important links in the global production and trade chain. therefore, while our research is aimed at china in the current era of migration, this research has practical implications for global public health and disease control, as floating populations are increasing in size all over the world and relationships between countries are becoming increasingly close. thus, other countries should pay attention to the epidemic situation in specific geographic areas of china to prevent secondary and international transmission of the -ncov. 
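a minimal sketch of the predicted-versus-actual comparison described above, assuming that a simple least-squares line through the origin relates confirmed cases to the size of the returning floating population. the regression form, the region labels, the numbers, and the 0.5 flagging threshold are all illustrative assumptions, not the study's model or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_through_origin(xs, ys):
    """Least-squares slope b for the model y = b * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def flag_possible_underreporting(migrants, cases, ratio_threshold=0.5):
    """Return regions whose reported cases fall well below the fitted value.

    migrants: {region: estimated returning floating population}
    cases:    {region: reported confirmed cases}
    ratio_threshold is a hypothetical cut-off, not taken from the source.
    """
    regions = list(migrants)
    slope = fit_through_origin(
        [migrants[r] for r in regions], [cases[r] for r in regions]
    )
    flagged = {}
    for region in regions:
        predicted = slope * migrants[region]
        if cases[region] < ratio_threshold * predicted:
            flagged[region] = predicted
    return flagged


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    example_migrants = {"region_a": 100, "region_b": 200,
                        "region_c": 300, "region_d": 400}
    example_cases = {"region_a": 10, "region_b": 20,
                     "region_c": 30, "region_d": 5}
    print(pearson_r(list(example_migrants.values()),
                    list(example_cases.values())))
    print(flag_possible_underreporting(example_migrants, example_cases))
```

a region flagged this way is only a candidate for closer surveillance; a low case count can also reflect effective local control rather than underestimation.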
the data used in this study are based on the - china migrants dynamic survey (cmds), and the tabulated data of million migrants in wuhan recently released by the wuhan municipal government. this survey was carried out via a multi-stage stratified sampling method, and collected data with structured questionnaires. as survey data is limited to the mainland provinces, municipalities, autonomous regions, and hainan province, the population analysed in this study excluded the populations of hong kong and macao. a total of , samples of the resident floating population in wuhan from to were extracted from the survey dataset. according to the survey design, wuhan's floating population was defined as the population from other cities and districts, aged and over, residing for more than one month in wuhan, and not registered in wuhan. table presents the sample distribution of the resident floating population in wuhan over time. the sample size was in the year , and for other years. the outbreak is considered to have originated in the huanan seafood market, near hankou railway station in the jianghan district, wuhan, china. the severely infected area was concentrated in the urban areas of hankou. the sample distribution in table shows that the floating population of wuhan is likewise concentrated in the urban area of hankou. the results indicate that the sample proportion of jianghan district and nearby jiang'an district, qiaokou district, and dongxihu district is approximately . %. therefore, the samples used in this study are suitably representative and thus acceptable for assessing the spread of the -ncov outbreak among the floating population.

table (sample distribution of wuhan's resident floating population by district and year; values not recoverable from the extracted text). rows: total; hankou zone: jiang'an, qiaokou, jianghan, dongxihu, huangpi, xinzhou; wuchang zone: hongshan, wuchang, qingshan, jiangxia; hanyang zone: hanyang, hannan, caidian; all three zones.

we obtained data from the "dingxiangyuan" national real-time epidemic website on confirmed cases from january to january [ ] .
these data are compiled from open data released by authorities such as the national health commission, and the provincial and municipal health commissions. to ensure data comparability, we collected the data daily from : to : every day. using the information of the floating population and their family members in wuhan, we analysed their return destinations and their structural characteristics by descriptive statistical methods. in table , the distribution of the origins of wuhan's floating population is presented, at the provincial level, over the past several years. the sample size is quite stable for each province over time. the province of origin for % of the floating population was hubei province, which contains the city of wuhan, and approximately % of the population originated in other provinces. the location information published in other historical survey data is limited to the province where household registration is located, due to a lack of data for . the data provided by the hubei provincial health commission in includes more detailed information of prefectures, cities, districts and counties. the analysis was therefore divided into two parts; the first part comprised an analysis of the origin of wuhan's floating population at the provincial level using historical data. the second part comprised an analysis of the floating population within hubei province based on data. when analysing the floating population at the provincial level, we used all samples from the previous years-i.e., the mean of years of data collection-to ensure the robustness of the results, in view of the stability of sample distributions in each province over time. 
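as a rough illustration of the provincial-level step described above, the sketch below pools the samples from several survey years and computes each province's share of the pooled sample. pooling all samples is one reading of "used all samples from the previous years"; the function name, the year labels, and the toy records are hypothetical and are not taken from the cmds.

```python
from collections import Counter


def pooled_origin_shares(samples_by_year):
    """Pool samples across survey years and return each origin's share.

    samples_by_year: {year_label: [province_of_origin, ...]}
    Pooling weights each year by its sample size; averaging the yearly
    shares instead would weight every year equally.
    """
    counts = Counter()
    for provinces in samples_by_year.values():
        counts.update(provinces)
    total = sum(counts.values())
    return {province: n / total for province, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    toy_samples = {
        "year_1": ["hubei", "hubei", "henan", "anhui"],
        "year_2": ["hubei", "henan", "hubei", "hubei"],
    }
    for province, share in sorted(pooled_origin_shares(toy_samples).items()):
        print(province, round(share, 3))
```

when yearly sample sizes are nearly equal and the provincial distribution is stable over time, as the text reports, pooled shares and the mean of yearly shares should be close.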
table (origins of wuhan's floating population by province; values not recoverable from the extracted text). rows include: total, hubei, henan, anhui, hunan, jiangxi, chongqing, zhejiang, sichuan, fujian, jiangsu, shandong, guangdong, hebei, tianjin, shanghai, inner mongolia, xizang, ningxia.

given the current understanding of how -ncov spreads, namely that middle-aged and elderly people have a high risk of infection and that transmission can occur between individuals, families and communities, we assessed several main variables. these comprised age group, educational level, pattern of migration, number of migrating family members per household, type of household registration, and reasons for migration. we defined these variables in the following ways: ( ) age group was classified as under , - , - , - , - , and over ; ( ) educational level was divided into junior high school and below, high school/secondary school, and college and above; ( ) pattern of migration was divided into independent migration, nuclear family migration, and extended family migration; ( ) number of migrating family members per household was classified as , , , , and or more; ( ) type of household registration was divided into rural and urban household registration; ( ) reasons for migration were working and doing business, family relocation, or other reasons. the analyses assume a theoretical model of -ncov transmission. we considered a floating population of million in wuhan, who returned to their hometowns from january , as potentially infected persons. moreover, we added demographic characteristics, the state of medical diagnosis, government prevention and control, the number of confirmed cases, and undisclosed data to our statistical model to estimate the dynamics of the epidemic. after controlling for certain factors, we analysed the factors that were not controlled, such as government intervention and the number of statistical reports. specifically, we first analysed the correlation between the size of the floating population in wuhan and the number of confirmed cases per day.
then, we examined the differences among regions and proposed a transmission rate as a reference for comparing them. in the comparative analysis, we focused on the probably underestimated number of cases and the virus transmission rate to determine the likelihood of epidemics existing in different regions. finally, we predicted the floating population of wuhan using statistical methods and compared it with the number of -ncov confirmed cases in each region, to identify regional differences in -ncov infection. furthermore, we predicted the forthcoming epidemic trend at the prefecture- and province-level based on the proportion of wuhan's floating population represented by people from these areas. human-to-human transmission of the -ncov has been confirmed. four sets of factors that may influence regional differences appear to be involved: ( ) demographic factors, such as short-term business travellers between wuhan and other regions, college students in wuhan returning to their homes in other regions, spring festival tourists from wuhan to other regions, and trans-regional floating populations travelling from or across wuhan for spring festival family reunions; ( ) intervention factors, such as medical treatments and governmental preventative measures; ( ) information disclosure and the information release system; and ( ) other unknown factors. we considered all these factors, and made the following assumptions about the social environment of -ncov transmission. first, although the government had taken the unprecedented measure of sealing off wuhan city on january , we assumed that, at that time, the entire floating population of wuhan, all short-term business travellers to wuhan and all college students in wuhan had returned to their hometowns throughout china, because january was the spring festival's eve (normally, the spring festival holiday runs from january to ).
moreover, this spring festival vacation period started at least a week before this date time, leaving plenty of time for these people to leave the city. however, the number of people in wuhan that travelled to reunite with their families in other cities during the spring festival vacation may be negligible, for the sealing-off of the city and other preventive measures taken across the country may have prevented their travelling. second, the influence of the college students in wuhan was an invariant factor, as college students are young and healthy, have fixed travelling routes, come from different regions evenly scattered across the country, and travelled to return home on or around january ; we would assume their influence on virus transmission to different regions to be the same. third, the medical treatment ability of regional medical centres of hubei province would also be the same, as the breakout emerged so fast that these regional medical centres would have had the same level of emergency-preparedness. finally, the above factors will not change dramatically until the mass return of wuhan's floating population after the conclusion of the spring festival vacation. to estimate the floating population in the cities of hubei province and across the country, we must determine the floating population residing in wuhan in . as the statistics compiled by the wuhan city government from have not been released, the data from previous years was used for this prediction. the prediction of floating population in wuhan based on the statistics from previous years is presented in table , demonstrating that there were approximately . million migrants living in wuhan for more than six months in . however, if the predictions of the statistical data were combined with survey data, which was used in this study to estimate the origin of wuhan's floating population that return to their hometowns, there would have been a problem with inconsistent statistical strength. 
this would have resulted from the fact that the floating population measured by the government statistics department reflects those who have lived in wuhan for more than months, but the respondents in the survey have lived in wuhan for over one month. a shorter defined residence time would have therefore produced a larger estimate of the population, and thus the total floating population in wuhan, as determined from the cmds data, was larger than the population as determined by the government statistics department. on january , xinhua news agency (an official government media source) interviewed the mayor of wuhan and reported that more than million members of the floating population had returned to their hometowns before the spring festival holiday. this number stated (over million) was more than twice the predicted value in this study ( . million), indicating that the statistical strength of the news report was based on a shorter period of residence, and this was consistent with the data we used to determine the floating population residing in wuhan for over one month. thus, in the absence of more rigorous and authoritative total data, we used million people as wuhan's floating population, from which to estimate the scale and distribution of those members of this population who returned to their hometown during the festival. based on sample survey data, in table , the proportional estimation of the origins of wuhan's floating population at a provincial level is presented, as well as the results of statistical analysis based on a floating population of million. estimation of population size is based on the total number of floating population in wuhan (about million); ci = confidence interval. the national distribution of the migrants presents obvious spatial characteristics of circle layers and echelons at provincial level (table and figure ). 
( ) hubei province is the central area of origin of wuhan's floating population, accounting for % of the population, with a % confidence interval of ( . , . ). based on a total population of million people, wuhan's floating population with household registration in hubei province is approximately . million, with a % confidence interval of ( , , to , , ). ( ) henan, anhui, jiangxi and hunan provinces belong to the first circle layer. henan province, home to a floating population of , , had the highest proportion with respect to its total population, equating to approximately . % and a % confidence interval of ( , to , ). based on the analysis of city data in , xinyang, zhumadian, shangqiu, and nanyang cities in henan province accounted for . %, . %, . %, and . % respectively, of the floating population from henan living in wuhan, accounting for approximately % of the total. the floating population proportions of anhui, hunan, and jiangxi provinces were . %, . %, and . %, respectively, with corresponding floating populations in wuhan of , , , , and , respectively. ( ) chongqing, zhejiang, sichuan, fujian and jiangsu provinces are at the second circle layer, with . %, . %, . %, . % and . % floating populations, respectively, with corresponding populations of approximately , , , , , , , and , respectively. ( ) shandong, guangdong, hebei, gansu, guangxi, heilongjiang, shaanxi, shanxi and guizhou provinces are at the third circle layer, with a proportion of . % to . % and a corresponding population of , to , . ( ) some provinces and municipalities, including qinghai, liaoning, yunnan, jilin and beijing, are located in the fourth circle layer, accounting for . - . % of the floating population, equating to - people. ( ) the remaining provinces and municipalities, such as hainan, xinjiang, tianjin, shanghai, inner mongolia, tibet and ningxia, are at the fifth circle layer, with a floating population proportion of less than . %, corresponding to ≤ people. 
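the population sizes and intervals listed above follow from scaling a sample proportion, and the ends of its confidence interval, by the assumed total floating population. a minimal sketch of that arithmetic is given below, assuming a normal-approximation (wald) interval; the interval method, the z value of 1.96, and the example counts are assumptions for illustration, since the source does not state how its intervals were computed.

```python
from math import sqrt


def scaled_estimate(sample_count, sample_size, population_total, z=1.96):
    """Scale a sample proportion and its Wald interval to a population total.

    Returns (point, lower, upper) in numbers of people.
    z=1.96 corresponds to a 95% interval and is an assumed default.
    """
    p = sample_count / sample_size
    standard_error = sqrt(p * (1 - p) / sample_size)
    lower = max(0.0, p - z * standard_error)
    upper = min(1.0, p + z * standard_error)
    return (p * population_total,
            lower * population_total,
            upper * population_total)


if __name__ == "__main__":
    # Hypothetical counts: 750 of 1000 sampled migrants from one province,
    # scaled to a hypothetical total of 1,000,000 migrants.
    point, low, high = scaled_estimate(750, 1000, 1_000_000)
    print(round(point), round(low), round(high))
```

because the total is treated as a fixed constant, the width of the scaled interval reflects only sampling error in the proportion, not uncertainty in the total itself.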
as presented in the table above, this population is mainly - years old, but the scale of the susceptible, high-risk and over years old population is also very large. the distribution is as follows: ( ) the susceptible and high-risk population is concentrated in hubei province. the size of the - age group is more than , , that of the age group of - is , , and the number of people over is , . ( ) henan and anhui provinces have larger susceptible and high-risk populations, of more than , and nearly , , respectively. ( ) six provinces and municipalities, namely hunan, jiangxi, chongqing, zhejiang, sichuan and jiangsu, have a high-risk population of to years old, comprising , - , people. ( ) in provinces, namely fujian, shandong, guangdong, hebei, gansu, guangxi, shanxi, guizhou, qinghai and xinjiang, the susceptible and high-risk populations are also concentrated in the - age group, with a population of approximately - . ( ) the three provinces of northeast china, namely heilongjiang, jilin and liaoning, have large susceptible and high-risk populations, equating to approximately in heilongjiang and approximately in jilin and liaoning. infection of family members is a main means of transmission, and the distribution of the characteristics of floating population family migration at the provincial level are detailed in table . the vast majority of the floating population migrates to wuhan in the form of nuclear families ( . %), and most families comprise - members ( . %). the distribution is as follows: ( ) the number of nuclear family households in the wuhan floating population that originates from hubei province is . million, accounting for . % of the total floating population of wuhan, and households with - family members number , , , accounting for . % of the total. the high risk of -ncov transmission within and by this population is self-evident. 
( ) families from henan, anhui, hunan and jiangxi provinces comprise a large proportion of those in the floating population of wuhan. those from henan total nearly , households, and the number of these households with - family members is more than , . approximately , families from the remaining provinces are part of the floating population of wuhan, including nearly , - family-member households from anhui and more than , from hunan and jiangxi. ( ) the number of families in the floating population of wuhan from chongqing, zhejiang, sichuan, fujian and jiangsu municipalities and provinces is , - , , and the number of households with - family members is , - , . ( ) the number of families in the floating population of wuhan that originate from other provinces, namely shandong, guangdong, hebei, gansu, guangxi, heilongjiang and shaanxi, is , - , households, and the number of households with - family members is approximately , . the remaining provinces comprise fewer than , households. certain conditions in rural areas can make it easy for the virus to spread from homes to communities, such as a lack of medical resources and investment, weak health prevention and control, low awareness of health, and insufficient awareness of infectious diseases. as shown in table , the floating population in wuhan is dominated by rural households ( . %), and working or doing business is the main reason for their having travelled to wuhan ( . %). therefore, epidemic prevention and control in rural areas is of critical importance. the distribution is as follows: ( ) the joint distribution of the origins of wuhan's floating population within hubei province is . %, equating to a population of , , , and . % of these are migrant workers, equating to , , people. ( ) henan, anhui, hunan, and jiangxi provinces each have more than , households with rural household registers and migrant workers in wuhan, and the population of those from henan in wuhan's floating population is approximately , .
( ) chongqing, zhejiang, sichuan, fujian, jiangsu, shandong, guangdong, hebei, gansu, and guangxi provinces together have a population of , - , households with rural household registers in wuhan, and there are fewer than , in the remaining provinces of china. notably, guangdong, gansu, heilongjiang and liaoning have a larger proportion of the population with urban household registers, and this is greater than the number of rural household registers in guangdong province.

virus transmission is related to individual health awareness, which is affected by an individual's educational level, so we also examined the educational level of the floating population in wuhan. as shown in table , % of the population had been educated to junior middle school level or below, % had senior high school or technical secondary school education, and % had college education or above, indicating that the overall education level of this population was low. specifically: ( ) in the provinces of qinghai, chongqing, jiangxi, anhui, henan, yunnan, guangxi and xinjiang, % or more of the population was educated to junior high school level or below. ( ) approximately - % of the population of the provinces of hubei, sichuan, hebei, fujian, jiangsu, hunan, guizhou, shandong, shanxi, tibet and gansu was educated to junior high school level or below. ( ) the populations from three municipalities, namely beijing, tianjin and shanghai, have a high level of education, with over % having received tertiary education. the population of the remaining provinces had a medium-to-high educational level. overall, these data indicate that there is a large middle-aged and older high-risk floating population in wuhan. their social characteristics include having travelled to wuhan in a nuclear family of - members, being on a rural household register, and often having a lower educational level.
these characteristics are consistent with conditions favouring the wide spread of -ncov. according to the foregoing analysis, % of wuhan's floating population have registered households in hubei province, equating to approximately . million people. that such a large proportion of the floating population of wuhan originates from elsewhere in hubei province has reduced the possibility of the epidemic spreading across the country, but all regions in hubei province are facing tremendous pressure from the spread of the epidemic. therefore, we used the cmds data to analyse the distribution of the floating population in regions within hubei province. table and figure present the distribution of the origins of wuhan's floating population within hubei province. the proportion of the floating population gradually decreases from east to west across hubei province, and there are great differences between cities. the distribution is as follows: ( ) xiaogan, wuhan, and huanggang are in the first echelon. the proportion of the floating population who originate from these cities is high, accounting for . %, . %, and % of the total, respectively. first, migrants from within wuhan itself form a cross-regional floating population of , , with a % confidence interval of ( . , . ). the analysis of districts and counties indicates that the members of the floating population who originate from the outskirts of huangpi district and xinzhou district flow into the main urban area of hankou, so the epidemic situation in the outskirts of huangpi district and xinzhou district needs special attention. secondly, the members of wuhan's floating population who originate from xiaogan comprise the largest proportion, equating to approximately , people, with a % confidence interval of ( . , . ). members of wuhan's floating population who originate from huanggang comprise the third-largest proportion, equating to approximately , people, with a % confidence interval of ( . , . ).
( ) the three directly managed by province (dmp) cities (xiantao, qianjiang, and tianmen) and jingzhou belong to the second echelon, each comprising approximately , people, and each accounting for approximately % of the floating population of wuhan, with a % confidence interval of ( , ). ( ) jingmen, suizhou, xianning, and huangshi belong to the third echelon, accounting for - % of the floating population of wuhan, equating to , - , people. ( ) xiangyang, ezhou, yichang, enshi, and shiyan belong to the fourth echelon, accounting for less than % of the floating population of wuhan, equating to fewer than , people. overall, the suburbs of wuhan surrounding xiaogan, huanggang, and the three dmp cities are the origins of the largest proportion ( %) of the floating population of wuhan, equating to approximately . million people. we used district-and county-level variables to estimate the floating population within hubei province, and the results are presented in table . the survey covered districts and counties, including huangpi, xinzhou, jiangxia, caidian, and hannan in wuhan, as well as cross-region active migrants in some major urban areas. the top districts and counties of hubei province in terms of floating population are huangpi, hanchuan, xiantao, xinzhou, hong'an, yunmeng, honghu, macheng, xiaonan, and xiaochang. that is, ≥ , people from each of these districts and counties are part of the floating population of wuhan, with the top districts and counties, huangpi, hanchuan and xiantao, having ≥ , people in wuhan's floating population. these top district and counties of hubei province are followed by jingshan, yingcheng, dawu, guangshui, tianmen, lishui, jianli, anlu, jiangxia and caidian, which each have , - , people in wuhan's floating population. 
the third tier is huangmei, yangxin, daye, gongan, tongshan, jiayu, zhongxiang, qianjiang, songzi, huarong, zengdu, enshi, liangzihu, zaoyang, dongxihu, wuxue, huangzhou, hannan, xian'an, xiangzhou, zhijiang, echeng, luotian, badong, chibi, chongyang, hongshan, shayang, shishou, suixian, tuanfeng, gucheng and xiangcheng. these districts and counties each have , - , people in wuhan's floating population. the remaining districts and counties have fewer than , people in wuhan's floating population. in general, these members of wuhan's floating population originate from certain districts and counties of hubei province. the cumulative percentage of the top districts and counties exceeds % of these areas' total population, showing a clear exponential distribution trend. we then analysed the social characteristics of the migrants in hubei province by age, type of migration, number of migrants, type of household registration, and reasons for traveling to wuhan to become part of its floating population. from table (please see the last page), we observe that in terms of susceptible and high-risk groups over years old, there are approximately , people in xiaogan, approximately , people in wuhan (cross-region migration), and approximately , people in huanggang. there are also approximately , people in the dmp cities and jingzhou respectively, and , - , people in jingmen, suizhou, xianning, and huangshi. fewer than , people from each of xiangyang, ezhou, yichang, enshi and shiyan have travelled to wuhan. the migration characteristics of the floating population of wuhan from hubei province are detailed in table . migration with a nuclear family is the main pattern, accounting for nearly % of the total, or . million households. the proportion of households with - family members (i.e., nuclear families) is approximately %, or . million households. 
specifically, , nuclear families originate from xiaogan, , - , nuclear families originate from the inner suburbs of wuhan and huanggang, and approximately , nuclear families originate from the dmp cities and jingzhou. more than , nuclear families originate from jingmen, suizhou, xianning, and huangshi, while fewer than , nuclear families originate from xiangyang, ezhou, yichang, enshi and shiyan. the distribution of households with - members is similar to that of nuclear families. it also presents the distribution of the origins of wuhan's floating population who originate from within hubei province. according to the statistical results, rural household registers account for %, equating to a population of approximately . million. the proportion of the group who was working and doing business in urban areas is %, and the population is . million. the size of the population distribution in each city is similar to the aforementioned migration types and other variables, and is not reported here. in table , the overall educational level of those members of wuhan's floating population who originate from hubei province is higher than the national level, with approximately % having been educated to junior high school level and below, approximately % to high school/secondary school level and below, and approximately % to college and above. however, in those members of wuhan's floating population who originate from the surrounding cities of wuhan, which contribute a large number of people to the floating population of wuhan, namely xiaogan, huanggang, huangshi, suizhou, dmp cities, xianning, and ezhou, > % of people have an educational level of junior high school and below, with this being > % in xiaogan. this means that the awareness of health protection and timely treatment may be low in this section of the floating population of wuhan, which will heighten the risk of large-scale transmission of -ncov. 
the size of a region's floating population in wuhan serves as a sound predictor of the trend of the -ncov outbreak in that region. pearson's correlation coefficient between the proportion of the floating population in wuhan who originate from a certain region of hubei and the number of confirmed -ncov cases in each region increased from . on january to . on january (table ). this indicates that the more people a region contributes to the floating population of wuhan, the more confirmed cases tend to emerge in that region. notes to table : ratio = confirmed cases (on / / )/floating population from wuhan (unit: , people); dmp (directly managed by the province) cities include xiantao, qianjiang and tianmen; pearson's correlation coefficient is calculated from the size of each region's floating population in wuhan and its number of confirmed cases per day. we assumed that the effect of the floating population on the transmission of -ncov is consistent across hubei province, and selected three prefectures that contribute the greatest number of people to the floating population of wuhan (xiaogan, huanggang and jingmeng) as the reference prefectures to predict the epidemic trend of -ncov at prefecture level. those prefectures can be divided into three groups since january ( ). the part of wuhan's floating population that originates from outside hubei province may also have promoted the spread of -ncov. table compares the number of individuals travelling from wuhan to other provinces and the daily number of confirmed cases for those other provinces. the analysis revealed that the correlation coefficient at the provincial level was lower than at the prefecture level within hubei province, but the correlation coefficient increased from . on january to . on january . table also shows the ratio of confirmed cases in each province to the proportion of people in the floating population in wuhan who originate from each of these provinces, on january .
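the correlation analysis described above pairs, for each region, the size of its floating population in wuhan with its confirmed case count on a given day. a minimal sketch of that calculation follows. the function name `pearson_r` and the four-region figures are hypothetical placeholders for illustration, not values from the tables.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squares (the 1/n factors cancel).
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical illustration only: one entry per region.
floating_population = [800.0, 650.0, 300.0, 120.0]  # thousands of people
confirmed_cases = [120, 110, 40, 15]                # cases on a single day
r = pearson_r(floating_population, confirmed_cases)
print(f"pearson r = {r:.3f}")
```

repeating the calculation with each day's case counts would trace how the coefficient changes over time, which is the comparison the text reports.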
we divide provinces into two categories based on their short-term travel populations in wuhan, and wuhan's travelling population to other provinces during the spring festival holiday. the first category comprises those provinces that have large-scale short-term business trips or tourist populations in wuhan during the spring festival holiday, namely beijing, shanghai, tianjin, and hainan. such a high level of inter-provincial population mobility may exacerbate the spread of -ncov. for example, the high ratio of confirmed cases in guangdong province may be due to the large short-term travel populations moving between shenzhen, guangzhou, and wuhan, while the high ratio of confirmed cases in hainan province may result from the outbound tourist population from wuhan to hainan during the spring festival holiday. the second category comprises the other provinces, which have small short-term business trip groups or tourist populations in wuhan during the spring festival holiday. in table , the results are divided into two parts: the correlation coefficient of the first category of provinces, which reaches a maximum of . , and the correlation coefficient of the second category of provinces, which increased from . to . . we assumed that the effect of the floating population on the spread of -ncov was consistent across the country. the other provinces are divided into three groups since january ( figure ) : ( ) provinces with a rapid increase in the number of confirmed cases, namely zhejiang, shandong, guangxi, shaanxi, liaoning, and yunnan; ( ) provinces with a moderate increase in the number of confirmed cases, namely hunan, chongqing, sichuan, fujian, jiangsu, hebei, gansu, heilongjiang, shaanxi, guizhou, qinghai, jilin, xinjiang, inner mongolia, tibet, and ningxia; and ( ) provinces with a small increase in the number of confirmed cases, namely henan, anhui, and jiangxi.
in table , if we exclude the data of henan province and zhejiang province from the second category, we find that the correlation coefficient on january is . . we selected four provinces (henan, hunan, sichuan, and zhejiang) as the reference provinces to predict the epidemic trend of -ncov in each province. we found that: ( ) the epidemic growth model of henan province does not fit the situation in most other provinces. that is, except in anhui and jiangxi, the actual number of confirmed cases in other provinces was higher than that predicted by the henan model. as these provinces have large floating populations in wuhan, the small increase in the number of confirmed cases in henan, anhui and jiangxi may result from effective measures that have been taken to control the spread of -ncov, or from a lack of sufficient diagnostic capabilities to detect suspected cases. ( ) the epidemic growth model for hunan and sichuan provinces predicts a rapid increase in the number of confirmed cases in henan, anhui and jiangxi provinces. thus, if the epidemic pattern in hunan and sichuan follows a typical evolutionary pattern, the current numbers of confirmed cases in the three provinces of henan, anhui, and jiangxi are greatly underestimated. for example, the number of confirmed cases in henan on january would be between and , but the number in official announcements was only . in contrast, the numbers of confirmed cases in zhejiang, shandong, guangxi, shaanxi, liaoning, and yunnan provinces were higher than the predicted numbers, which may reflect uncontrollable local factors that need further investigation. ( ) the epidemic growth model for zhejiang province predicts a rapid increase in the number of confirmed cases in most provinces, especially jiangsu and fujian provinces, which are adjacent to zhejiang.
it is important to investigate why there were so many confirmed cases in zhejiang, and whether the outbreak in jiangsu and fujian province was not detected in a timely manner, or whether all possible cases have not yet occurred. overall, the predicted epidemic pattern for hunan and sichuan provinces fits best to the actual epidemic trend of the -ncov outbreak. however, the current number of confirmed cases in henan, anhui, and jiangxi provinces is likely to be underestimated, especially given that these contain extensive rural areas with large populations and limited medical resources. the higher actual number of confirmed cases in zhejiang, shandong, guangxi, shaanxi, liaoning, and yunnan provinces may be affected by other unknown factors or uncontrollable random factors that need further investigation. to prevent or mitigate the spread of an emerging infectious disease and its negative effects, public health interventions mainly aim at three types of population, namely the population in the source area, the floating population leaving the source area, and the population travelling from the infected area to other areas. the spring festival in is much earlier than in previous years. at this time, the possibility of human-to-human transmission of a new coronavirus had just been discovered. when the wuhan municipal government decided on january to "close the city" to control the outflow of population, more than million people had already left wuhan on the spring festival holiday, and it was too late to control the entire potentially infected population in the epidemic area. at present, china's high-speed railway and expressway transportation network has experienced great development. this fast and convenient transportation has led to a floating population that can leave the source area to quickly reach every part of the country, which makes it very difficult to quarantine the floating population leaving the source area through transportation stations. 
in addition, there is an incubation period after human infection, which further increases the difficulty of quarantine at traffic stations and is also an important reason for the implementation of "city closure" control policies in many cities across the country. after -ncov was confirmed as being capable of transmitting from human to human, the chinese government implemented a top-down national mobilisation. it fully investigated and isolated the population of wuhan, publicised the severity of the epidemic, and increased awareness of the prevention of infectious diseases and raised people's vigilance through messages on television, mobile communications and the internet. in addition, according to the latest epidemic surveillance, the incubation period of the coronavirus is to days, with an upper limit of days. for this reason, the central government has issued an executive order to extend the spring festival holiday from january to february . many provinces are even requiring firms not to restart work until february, except those necessary for social operations related to the national economy and people's livelihood. extending the holiday is needed to prevent people from leaving home early to return to work, so as to minimise the risk of the epidemic spreading again due to population movements. there are limitations to this study. first, our analysis did not include other large-scale populations. one such group is college students, because wuhan is the city with the largest number (> million) of college students in china and the world. other groups include short-term business travellers, transit passengers and tourists. official media reported that the size of these populations during the spring festival holiday would reach more than million. their importance can be confirmed from the daily confirmed cases of -ncov infection.
although there is a small permanent population in wuhan whose household register belongs to provinces and cities such as beijing, shanghai, tianjin, hainan, and guangdong (in fact, shenzhen and guangzhou are two megacities), these provinces and cities still have large-scale temporary floating populations from and to wuhan because of the large population and well-developed economy. therefore, the number of confirmed cases of -ncov infection in these areas is far ahead of that in most other provinces that have a large floating population in wuhan. second, our sample has a certain deviation. the data on the origin of wuhan's floating population does not include hong kong, macao, or international migrants, which makes our research unable to estimate the population size of these regions. at present, some cases have been confirmed in surrounding asian countries, europe, north america and australia. third, limited to interdisciplinary research capabilities, our model does not include infectious disease analysis models such as sir to further analyse the potential and scale of -ncov spread, which may reduce the value of this research in the prevention and control of -ncov infections. finally, the results of the study are mainly applicable to the end of the spring festival holiday, and after the large-scale population comes back to work or study, the spread of the epidemic will be more complicated. we believe that the abovementioned limitations can be overcome. using big data such as location information of transportation and mobile internet, short-term floating populations can be included in the study to maximise the estimated population flotation and scale in wuhan. unfortunately, thus far we have not seen a rigorous study using big data to analyse the outflow of populations in the epicentre of an epidemic. this means that there is still a long way to go for the research and application of big data in the field of national and global public health. 
at the time of writing this paper ( january ), all provinces in china have reported confirmed or suspected cases of -ncov, every prefecture and city in hubei province has confirmed cases of -ncov, and transmission of -ncov has spread from imported to inter-regional. due to the fact that million migrants had left wuhan before the "closure of the city", our research reveals a high correlation between the number of wuhan's floating population and the number of confirmed cases. fortunately, the origin of wuhan's floating population is highly concentrated in hubei province and its surrounding provinces, of which the migrants with hubei household registers account for %, and more than % of the population is concentrated in the top districts and counties. this means that some areas will face a very high risk of epidemic outbreaks, but it is also conducive to centralised resources enabling prevention and control of the epidemic to avoid large-scale spread in other regions. more than million of wuhan's floating population have returned to their hometowns as potential carriers of the virus and may become carriers of the virus's re-transmission. due to china's urban and rural dualistic structure, most of these people are rural migrant workers with low levels of education. the results find that % of the migrants have rural household registers. these people, who frequently work outdoors or work overtime are more likely to be susceptible because of their poor diet and nutrition. at the same time, most of these people travel with - family members, and the susceptible and high-risk population over years old accounts for a large proportion of this floating population, which provides ideal conditions for the transmission of -ncov within families. to make matters worse, the rural areas where these people return to have very limited medical and public health services, and gatherings during the spring festival aggravate the risk of virus transmission in the community. 
so far, confirmed cases of -ncov continue to increase every day across china. the results of our model analysis indicate that, on the one hand, the correlation between the size of a region's floating population in wuhan and its number of confirmed cases has continued to increase over time, and by january, the correlation coefficient of these factors in hubei province had reached . , which means that the size of the floating population in wuhan is an important parameter for predicting the epidemic. on the other hand, we also found that the effect of the size of the floating population in wuhan is heterogeneous across regions. some areas have a large floating population in wuhan, including henan, anhui, and jiangxi provinces, and xiaogan city, jingzhou city, and the three county-level cities directly under the provincial government, and yet their number of confirmed cases of -ncov is apparently relatively small. however, we believe that the epidemic situation in these areas may be underestimated. considering the serious consequences of delays in diagnosis and loopholes in infection control for suspected or confirmed cases during the sars epidemic in , it is necessary to strengthen surveillance in these areas to determine why they have fewer confirmed cases of -ncov. author contributions: l.l. conceived and proposed research ideas, c.f. and c.y. collected the data, c.f. undertook the main research work such as research methods, data analysis, and manuscript writing. c.f., l.l., w.g., a.y., c.y., m.j., m.r., p.x., h.l. and y.w. participated in draft review, contributed to data interpretation, and approved final manuscript. all authors have read and agreed to the published version of the manuscript.
acknowledgments: thanks to the national health committee for the migrant population data; the youth project of national social science foundation and the youth teacher project of central china normal university for support on previous construction of floating population database; binbin tang and junyue qian for their selfless help in the process of thesis writing. the authors declare no conflict of interest. key: cord- - sweo l authors: qiu, yun; chen, xi; shi, wei title: impacts of social and economic factors on the transmission of coronavirus disease (covid- ) in china date: - - journal: j popul econ doi: . /s - - - sha: doc_id: cord_uid: sweo l this study models local and cross-city transmissions of the novel coronavirus in china between january and february , . we examine the role of various socioeconomic mediating factors, including public health measures that encourage social distancing in local communities. weather characteristics weeks prior are used as instrumental variables for causal inference. stringent quarantines, city lockdowns, and local public health measures imposed in late january significantly decreased the virus transmission rate. the virus spread was contained by the middle of february. population outflow from the outbreak source region posed a higher risk to the destination regions than other factors, including geographic proximity and similarity in economic conditions. we quantify the effects of different public health measures in reducing the number of infections through counterfactual analyses. over . million infections and , deaths may have been avoided as a result of the national and provincial public health measures imposed in late january in china. the first pneumonia case of unknown cause was found close to a seafood market in wuhan, the capital city of hubei province, china, on december , .
several clusters of patients with similar pneumonia were reported through late december . the pneumonia was later identified to be caused by a new coronavirus (severe acute respiratory syndrome coronavirus , or sars-cov- ) (zhu et al. ), later named coronavirus disease by the world health organization (who). while the seafood market was closed on january , , a massive outflow of travelers during the chinese spring festival travel rush (chunyun) in mid-january led to the rapid spread of covid- throughout china and to other countries. the first confirmed case outside wuhan in china was reported in shenzhen on january ). as of april , over . million confirmed cases were reported in at least countries or territories. two fundamental strategies have been taken globally, one focused on mitigating but not necessarily stopping the virus spread and the other relying on more stringent measures to suppress and reverse the growth trajectories. while most western countries initially implemented the former strategy, more and more of them (including most european countries and the usa) have since shifted towards the more stringent suppression strategy, and some other countries such as china, singapore, and south korea have adopted the latter strategy from the beginning. in particular, china has rolled out one of the most stringent public health strategies. that strategy involves city lockdowns and mandatory quarantines to ban or restrict traffic since january , social distance-encouraging strategies since january , and a centralized treatment and isolation strategy since february . this study estimates how the number of daily newly confirmed covid- cases in a city is influenced by the number of new covid- cases in the same city, nearby cities, and wuhan during the preceding weeks using the data on confirmed covid- case counts in china from january to february . 
by comparing the estimates before and after february , we examine whether the comprehensive set of policies at the national scale delays the spread of covid- . besides, we estimate the impacts of social distancing measures in reducing the transmission rate utilizing the closed management of communities and family outdoor restrictions policies that were gradually rolled out across different cities. as covid- evolves into a global pandemic and mitigating strategies are faced with growing pressure to flatten the curve of virus transmissions, more and more nations are considering implementing stringent suppression measures. therefore, examining the factors that influence the transmission of covid- and the effectiveness of the large-scale mandatory quarantine and social distancing measures in china not only adds to our understanding of the containment of covid- but also provides insights into future prevention work against similar infectious diseases. in a linear equation of the current number of new cases on the number of new cases in the past, the unobserved determinants of new infections may be serially correlated for two reasons. first, the number of people infected by a disease usually first increases, reaches a peak, and then drops. second, there are persistent, unobservable variables, such as clusters that generate large numbers of infections, people's living habits, and government policies. serial correlations in errors give rise to correlations between the lagged numbers of cases and the error term, rendering the ordinary least square (ols) estimator biased. combining insights in adda ( ) , the existing knowledge of the incubation period of covid- (world health organization b) , and data on weather conditions that affect the transmission rates of covid- (lowen and steel ; wang et al. b ), we construct instrumental variables for the number of new covid- cases during the preceding weeks. 
weather characteristics in the previous third and fourth weeks do not directly affect the number of new covid- cases after controlling for the number of new covid- cases and weather conditions in the preceding first and second weeks. therefore, our estimated impacts have causal interpretations and reflect population transmission rates. meanwhile, we estimate the mediating effects of socioeconomic factors on the transmission of covid- in china. these factors include population flow out of wuhan, the distance between cities, gdp per capita, the number of doctors, and contemporaneous weather conditions. we examine whether population flows from the origin of the covid- outbreak, which is a major city and an important transportation hub in central china, can explain the spread of the virus using data on real-time travel intensity between cities that have recently become available for research. realizing the urgency of forestalling widespread community transmissions in areas that had not seen many infections, in late january, many chinese cities implemented public health measures that encourage social distancing. we also examine the impacts of these measures on curtailing the spread of the virus. we find that transmission rates were lower in february than in january, and cities outside hubei province had lower transmission rates. preventing the transmission rates in non-hubei cities from increasing to the level observed in late january in hubei caused the largest reduction in the number of infections. apart from the policies implemented nationwide, the additional social distancing policies imposed in some cities in late january further helped reduce the number of infections. by mid february, the spread of the virus was contained in china. while many socioeconomic factors moderated the spread of the virus, the actual population flow from the source posed a higher risk to destinations than other factors such as geographic proximity and similarity in economic conditions. 
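the instrumental-variable argument above can be illustrated with a small simulation. this is a deliberately simplified, single-regressor sketch under stated assumptions, not the authors' panel specification: the variable names, the data-generating process, and the true effect of 0.5 are all hypothetical. a persistent unobserved factor raises both lagged and current cases, so ols is biased upward, while an instrument that shifts lagged cases but is unrelated to the unobserved factor recovers the effect.

```python
import random


def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def simulate(n=20000, true_effect=0.5, seed=0):
    """Return (ols_estimate, iv_estimate) on simulated data."""
    rng = random.Random(seed)
    instrument, lagged_cases, new_cases = [], [], []
    for _ in range(n):
        unobserved = rng.gauss(0, 1)  # persistent factor hitting both periods
        weather = rng.gauss(0, 1)     # instrument, independent of `unobserved`
        lagged = weather + unobserved + rng.gauss(0, 1)
        current = true_effect * lagged + unobserved + rng.gauss(0, 1)
        instrument.append(weather)
        lagged_cases.append(lagged)
        new_cases.append(current)
    # OLS slope: cov(x, y) / var(x); biased because x is correlated with the error.
    ols = covariance(lagged_cases, new_cases) / covariance(lagged_cases, lagged_cases)
    # Single-instrument IV (Wald) estimator: cov(z, y) / cov(z, x).
    iv = covariance(instrument, new_cases) / covariance(instrument, lagged_cases)
    return ols, iv


ols_estimate, iv_estimate = simulate()
print(f"ols = {ols_estimate:.3f}, iv = {iv_estimate:.3f}, true effect = 0.5")
```

with several lags and several weather instruments, the same idea is usually implemented as two-stage least squares rather than the single-ratio estimator used here.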
our analysis contributes to the existing literature in three aspects. first, our analysis is connected to the economics and epidemiological literature on the determinants of the spread of infectious diseases and prevention of such spread. existing studies find that reductions in population flow (zhang et al. ; fang et al. ) and interpersonal contact from holiday school closings (adda ) , reactive school closures (litvinova et al. ) , public transportation strikes (godzinski and suarez castillo ), strategic targeting of travelers from high-incidence locations (milusheva ) , and paid sick leave to keep contagious workers at home (barmby and larguem ; pichler and ziebarth ) can mitigate the prevalence of disease transmissions. in addition, studies show that viruses spread faster during economic booms (adda ) , that increases in employment are associated with increased incidence of influenza (markowitz et al. ) , and that growth in trade can significantly increase the spread of influenza (adda ) and hiv (oster ) . vaccination (maurer ; white ) and sunlight exposure (slusky and zeckhauser ) are also found to be effective in reducing the spread of influenza. second, our paper adds to the epidemiological studies on the basic reproduction number ($R_0$) of covid- , i.e., the average number of cases directly generated by one case in a population where all individuals are susceptible to infection. given the short time period since the beginning of the covid- outbreak, research is urgently needed to assess the dynamics of transmissions and the implications for how the covid- outbreak will evolve (wu et al. b, c) . liu et al. ( ) identify studies that estimated the basic reproduction number in the wide range of . to . (with a mean of . and a median of . ) for wuhan, hubei, china, or overseas during january through january , .
our $R_0$ estimate relies on spatially disaggregated data during an extended period (until february , ) to mitigate potential biases, and the instrumental variable approach we use isolates the causal effect of virus transmissions and imposes fewer restrictions on the relationship between the unobserved determinants of new cases and the number of cases in the past. simultaneously considering a more comprehensive set of factors in the model that may influence virus spread, we find that one case generates . more cases within weeks ( . if cities in hubei province are excluded) in the sub-sample from january to february . in the sub-sample from february to february , the transmission rates fall to . ( . excluding hubei province). our estimate of $R_0$ for the period in late january that overlaps with existing studies falls well within the range of the estimated $R_0$ in the emerging covid- literature. third, our study contributes to the assessments of public health measures aiming at reducing virus transmissions and mortality. through a set of policy simulations, we report initial evidence on the number of avoided infections through the end of february for cities outside hubei province. specifically, the stringent health policies at the national and provincial levels reduced the transmission rate and resulted in , , ( % ci, , to , , ) fewer infections and potentially , fewer deaths. in contrast, the effects of the wuhan lockdown and local non-pharmaceutical interventions (npis) are considerably smaller. as a result of the wuhan lockdown, closed management of communities, and family outdoor restrictions, , ( % ci, to , ), ( % ci, to ), and ( % ci, to ) cases were avoided, respectively. these three policies may have respectively avoided , deaths, deaths, and deaths. with some additional assumptions, such as the value of a statistical life and lost productive time, these estimates may provide the basis for more rigorous cost-benefit analysis of the relevant public health measures.
this paper is organized as follows. section introduces the empirical model. section discusses our data and the construction of key variables. section presents the results. section documents the public health measures implemented in china, whose impacts are quantified in a series of counterfactual exercises. section concludes. the appendix contains additional details on the instrumental variables, data quality, and the computation of counterfactuals. our analysis sample includes prefecture-level cities in china. we exclude wuhan, the capital city of hubei province, from our analysis for two reasons. first, the epidemic patterns in wuhan are significantly different from those in other cities. some confirmed cases in wuhan contracted the virus through direct exposure to huanan seafood wholesale market, which is the most probable origin of the virus . in other cities, infections arise from human-to-human transmissions. second, covid- cases were still pneumonia of previously unknown virus infections in people's perception until early january so that wuhan's health care system became overwhelmed as the number of new confirmed cases increased exponentially since mid-january. this may have caused severe delay and measurement errors in the number of cases reported in wuhan, and to a lesser extent, in other cities in hubei province. to alleviate this concern, we also conduct analyses excluding all cities in hubei province from our sample. we assume a case fatality rate of %, the same as china's current average level. of course, the eventual case fatality rate may be different from the current value, and it depends on many key factors, such as the preparedness of health care systems and the demographic structure of the population outside hubei province in comparison to china as a whole. 
also importantly, among patients who have died from covid- , the time from symptom onset to outcome ranges from to weeks (world health organization b), which is partially beyond the time window of this analysis. therefore, we defer more rigorous estimates about avoided fatality to future studies. document the exposure history of the first cases. it is suspected that the initial cases were linked to the huanan seafood wholesale market in wuhan. to model the spread of the virus, we consider within-city spread and between-city transmissions simultaneously (adda ):

$$y_{ct} = \sum_{s=1}^{14} \alpha_{\text{within},s}\, y_{c,t-s} + \sum_{s=1}^{14} \alpha_{\text{between},s} \sum_{r \neq c} d_{cr}^{-1}\, y_{r,t-s} + \sum_{s=1}^{14} \rho_{s}\, z_{t-s} + x_{ct}\beta + \varepsilon_{ct}, \qquad (1)$$

where $c$ is a city other than wuhan, and $y_{ct}$ is the number of new confirmed cases of covid- in city $c$ on date $t$. regarding between-city transmissions, $d_{cr}$ is the log of the distance between cities $c$ and $r$, and $\sum_{r \neq c} d_{cr}^{-1} y_{rt}$ is the inverse distance weighted sum of new infections in other cities. considering that the covid- epidemic originated from one city (wuhan) and that most of the early cases outside wuhan can be traced to previous contacts with persons in wuhan, we also include the number of new confirmed cases in wuhan ($z_t$) to model how the virus spreads to other cities from its source. we include lagged $y_{ct}$, $y_{rt}$, and $z_t$ up to 14 days based on the estimates of the durations of the infectious period and the incubation period in the literature . $x_{ct}$ includes contemporaneous weather controls and city and day fixed effects . $\varepsilon_{ct}$ is the error term. standard errors are clustered by province. to make it easier to interpret the coefficients, we assume that the transmission dynamics ($\alpha_{\text{within},s}$, $\alpha_{\text{between},s}$, $\rho_s$) are the same within $s = 1, \dots, 7$ and within $s = 8, \dots, 14$, respectively, but can be different across weeks. specifically, we take averages of lagged $y_{ct}$, $y_{rt}$, and $z_t$ by week, as $\bar{y}^{\tau}_{ct} = \frac{1}{7}\sum_{s=1}^{7} y_{c,t-7(\tau-1)-s}$, $\bar{y}^{\tau}_{rt} = \frac{1}{7}\sum_{s=1}^{7} y_{r,t-7(\tau-1)-s}$, and $\bar{z}^{\tau}_{t} = \frac{1}{7}\sum_{s=1}^{7} z_{t-7(\tau-1)-s}$, in which $\tau$ denotes the preceding first or second week.
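The weekly lag averages and the inverse log-distance weighted sum defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a week is taken to be 7 days, daily counts are stored in a plain list indexed by day, and the function names, the dictionary layout, and the use of kilometres for distances are hypothetical choices that do not come from the paper.

```python
import math


def weekly_lag_average(series, t, tau):
    """Average of `series` over the tau-th week preceding day t.

    Computes (1/7) * sum_{s=1..7} series[t - 7*(tau-1) - s],
    so tau=1 covers days t-1..t-7 and tau=2 covers days t-8..t-14.
    """
    start = t - 7 * (tau - 1)
    if start - 7 < 0:
        raise ValueError("not enough history for the requested lag")
    window = [series[start - s] for s in range(1, 8)]
    return sum(window) / 7.0


def inverse_log_distance_sum(city, values, distance_km):
    """Sum over other cities r of values[r] / log(distance between city and r).

    `values` maps city name -> number (e.g. a weekly lag average);
    `distance_km` maps (city, other) -> distance in km (assumed > 1 km,
    so that the log distance is positive).
    """
    total = 0.0
    for other, value in values.items():
        if other == city:
            continue
        total += value / math.log(distance_km[(city, other)])
    return total
```

For a city with daily counts `y`, `weekly_lag_average(y, t, 1)` and `weekly_lag_average(y, t, 2)` give the two within-city regressors for day `t`; applying `inverse_log_distance_sum` to the corresponding averages of the other cities gives the between-city regressors.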
our main model is

$$y_{ct} = \sum_{\tau=1}^{2} \alpha_{\text{within},\tau}\, \bar{y}^{\tau}_{ct} + \sum_{\tau=1}^{2} \alpha_{\text{between},\tau} \sum_{r \neq c} d_{cr}^{-1}\, \bar{y}^{\tau}_{rt} + \sum_{\tau=1}^{2} \rho_{\tau}\, \bar{z}^{\tau}_{t} + x_{ct}\beta + \varepsilon_{ct}. \qquad (2)$$

we also consider more parsimonious model specifications, such as the model that only considers within-city transmissions,

$$y_{ct} = \sum_{\tau=1}^{2} \alpha_{\text{within},\tau}\, \bar{y}^{\tau}_{ct} + x_{ct}\beta + \varepsilon_{ct}. \qquad (3)$$

the covid- epidemic is still ongoing at the time of writing, and the estimates are revised from time to time in the literature as new data become available. the current estimates include the following. the incubation period is estimated to be between and days (world health organization a), . days , or days (median, guan et al. ( ) ). the average infectious period is estimated to be . days (wu et al. a).

there are several reasons that $\bar{y}^{\tau}_{ct}$, $\bar{y}^{\tau}_{rt}$, and $\bar{z}^{\tau}_{t}$ may be correlated with the error term $\varepsilon_{ct}$. the unobserved determinants of new infections such as local residents' and government's preparedness are likely correlated over time, which causes correlations between the error term and the lagged dependent variables. as noted by the world health organization ( b), most cases that were locally generated outside hubei occurred in households or clusters. the fact that big clusters give rise to a large number of cases within a short period of time may still be compatible with a general low rate of community transmissions, especially when measures such as social distancing are implemented. therefore, the coefficients are estimated by two-stage least squares in order to obtain consistent estimates of the transmission rates in the population. in eq. , the instrumental variables include averages of daily maximum temperature, total precipitation, average wind speed, and the interaction between precipitation and wind speed, for city c in the preceding third and fourth weeks. detailed discussion of the selection of weather characteristics as instruments is in section . . the timeline of key variables is displayed in fig. .
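To illustrate why two-stage least squares is used here, the following sketch compares OLS and 2SLS on simulated data in the simplest case of one endogenous regressor and one instrument. It is not the paper's estimation code: the data-generating process, the coefficient value 0.8, and all function names are assumptions made for illustration, and the paper's actual specification has several endogenous variables, several weather instruments, and fixed effects.

```python
import random


def ols_intercept_slope(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(y, x_endog, z):
    """2SLS with a single endogenous regressor and a single instrument."""
    # Stage 1: project the endogenous regressor on the instrument.
    a1, b1 = ols_intercept_slope(z, x_endog)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted values from stage 1.
    _, beta = ols_intercept_slope(x_hat, y)
    return beta


def simulate(n=5000, true_beta=0.8, seed=0):
    """Simulated data in which an unobserved factor u drives both x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)            # instrument (stands in for lagged weather)
        ui = rng.gauss(0, 1)            # unobserved determinant (e.g. preparedness)
        xi = zi + ui + rng.gauss(0, 1)  # endogenous regressor (past cases)
        yi = true_beta * xi + ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y
```

In this simulation the OLS slope is pushed away from the true coefficient because `u` enters both `x` and `y`, whereas the 2SLS estimate uses only the variation in `x` that is driven by the instrument.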
the primary assumption on the instrumental variables is that weather conditions before weeks do not affect the likelihood that a person susceptible to the virus contracts the disease, conditional on weather conditions and the number of infectious people within the -week window. on the other hand, they affect the number of other persons who have become infectious within the -week window, because they may have contracted the virus earlier than weeks. these weather variables are exogenous to the error term. another objective of this paper is to quantify the effect of various socioeconomic factors in mediating the transmission rates of the virus, which may identify potential behavioral and socioeconomic risk factors for infections. for within-city transmissions, we consider the effects of local public health measures (see section for details) and the mediating effects of population density, level of economic development, number of doctors, and environmental factors such as temperature, wind, and precipitation. for between-city transmissions, apart from proximity measures based on geographic distance, we also consider similarity in population density and the level of economic development. to measure the spread of the virus from wuhan, we also include the number of people traveling from wuhan. the full empirical model interacts the lagged numbers of new cases with these factors: $\bar{h}^{k\tau}_{ct}$ includes dummies for local public health measures and the mediating factors for local transmissions, and $\bar{m}^{k\tau}_{crt}$ and $\bar{m}^{k\tau}_{c,\text{wuhan},t}$ are the mediating factors for between-city transmissions and imported cases from wuhan. january , , is the first day that covid- cases were reported outside of wuhan, so we collect the daily number of new cases of covid- for cities from january to february . all these data are reported by provincial-level health commissions in china. figure shows the time patterns of daily confirmed new cases in wuhan, in hubei province outside wuhan, and in non-hubei provinces of mainland china.
because hubei province started to include clinically diagnosed cases into new confirmed cases on february , we notice a spike in the number of new cases in wuhan and other cities in hubei province on this day (fig. ). the common effects of such changes in case definitions on other cities can be absorbed by time fixed effects. as robustness checks, we re-estimate models a and b without the cities in hubei province. in addition, since the number of clinically diagnosed cases at the city level was reported for the days of february , , and , we recalculate the daily number of new cases for the days by removing the clinically diagnosed cases from our data and re-estimate models a and b. our main findings still hold (appendix b). regarding the explanatory variables, we calculate the number of new cases of covid- in the preceding first and second weeks for each city on each day. to estimate the impacts of new covid- cases in other cities, we first calculate the geographic distance between a city and all other cities using the latitudes and longitudes of the centroids of each city and then calculate the weighted sum of the number of new cases in other cities, using the inverse of log distance as the weight. since the covid- outbreak started from wuhan, we also calculate the weighted number of covid- new cases in wuhan using the inverse of log distance as the weight. furthermore, to explore the mediating impact of population flow from wuhan, we collect the daily population flow index from baidu that proxies for the total intensity of migration from wuhan to other cities . figure plots the baidu index of population flow out of wuhan and compares its values this year with those in . we then interact the flow index with the share that a destination city takes (fig. ) to construct a measure of the population flow from wuhan to a destination city. other mediating variables include population density, gdp per capita, and the number of doctors at the city level, which we collect from the most recent china city statistical yearbook.
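The distance and population-flow constructions described above can be sketched as follows. This is an illustrative reading rather than the authors' code: the haversine formula and the Earth radius of 6371 km are assumptions about how centroid distances might be computed, the destination share is taken as a fraction between 0 and 1 rather than a percentage, and both function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two centroids given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flow_to_city(outflow_index, destination_share):
    """Population-flow measure for one destination city on one day.

    outflow_index: daily migration index for flow out of the source city.
    destination_share: fraction (0..1) of that outflow going to the destination.
    """
    return outflow_index * destination_share
```

The log of `great_circle_km(...)` between two city centroids would then play the role of the log distance used in the inverse log-distance weights.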
table presents the summary statistics of these variables. on average, gdp per capita and population density are larger in cities outside hubei province than those in hubei. compared with cities in hubei province, cities outside hubei have more doctors. we rely on meteorological data to construct instrumental variables for the endogenous variables. the national oceanic and atmospheric administration (noaa) provides average, maximum, and minimum temperatures, air pressure, average and maximum wind speeds, precipitation, snowfall amount, and dew point for weather stations at the daily level in china. to merge the meteorological variables with the number of new cases of covid- , we first calculate daily weather variables for each city on each day from december to february from station-level weather records following the inverse distance weighting method. specifically, for each city, we draw a circle of km from the city's centroid and calculate the weighted average daily weather variables using stations within the -km circle . we use the inverse of the distance between the city's centroid and each station as the weight. second, we match the daily weather variables to the number of new cases of covid- based on city name and date. the transmission rate of covid- may be affected by many environmental factors. human-to-human transmission of covid- is mostly through droplets and contacts (national health commission of the prc ). weather conditions such as rainfall, wind speed, and temperature may shape infections via their influences on social activities and virus transmissions. for instance, increased precipitation baidu migration (https://qianxi.baidu.com). the -km circle is consistent with the existing literature. most studies on the socioeconomic impacts of climate change have found that estimation results are insensitive to the choice of the cutoff distance (zhang et al. ) . transmissions. 
in addition, increased rainfall and lower temperature may also reduce social activities. newly confirmed covid- cases typically arise from contracting the virus within weeks in the past (e.g., world health organization b). the extent of human-to-human transmission is determined by the number of people who have already contracted the virus and the environmental conditions within the next weeks. conditional on the number of people who are infectious and environmental conditions in the previous first and second weeks, it is plausible that weather conditions further in the past, i.e., in the previous third and fourth weeks, should not directly affect the number of current new cases. based on the existing literature, we select weather characteristics as the instrumental variables, which include daily maximum temperature, precipitation, wind speed, and the interaction between precipitation and wind speed. we then regress the endogenous variables on the instrumental variables, contemporaneous weather controls, city, date, and city by week fixed effects. table shows that f-tests on the coefficients of the instrumental variables all reject joint insignificance, which confirms that overall the selected instrumental variables are not weak. the coefficients of the first stage regressions are reported in table in the appendix. this table reports the f -tests on the joint significance of the coefficients on the instrumental variables (iv) that are excluded from the estimation equations. our iv include weekly averages of daily maximum temperature, precipitation, wind speed, and the interaction between precipitation and wind speed, during the preceding third and fourth weeks, and the averages of these variables in other cities weighted by the inverse of log distance. for each f statistic, the variable in the corresponding row is the dependent variable, and the time window in the corresponding column indicates the time span of the sample. 
each regression also includes -and -week lags of these weather variables, weekly averages of new infections in the preceding first and second weeks in wuhan which are interacted with the inverse log distance or the population flow, and city, date and city by week fixed effects. coefficients on the instrumental variables for the full sample are reported in table in the appendix.

we also need additional weather variables to instrument the adoption of public health measures at the city level. since there is no theoretical guidance from the existing literature, we implement the cluster-lasso method of belloni et al. ( ) and ahrens et al. ( ) to select weather characteristics that have good predictive power. details are displayed in appendix a. our sample starts from january , when the first covid- case was reported outside wuhan. the sample spans weeks in total and ends on february . we divide the whole sample into two sub-samples (january to february , and february to february ) and estimate the model using the whole sample and two sub-samples, respectively. in the first weeks, covid- infections quickly spread throughout china with every province reporting at least one confirmed case, and the number of cases also increased at an increasing speed (fig. ). it is also during these weeks that the chinese government acted swiftly to curtail the virus transmission. on january , covid- was classified as a class b statutory infectious disease and treated as a class a statutory infectious disease. the city of wuhan was placed under lockdown on january ; roads were closed, and residents were not allowed to leave the city. many other cities also imposed public policies ranging from canceling public events and stopping public transportation to limiting how often residents could leave home. by comparing the dynamics of virus transmissions in these two sub-samples, we can infer the effectiveness of these public health measures.
in this section, we will mostly rely on model a to interpret the results, which estimates the effects of the average number of new cases in the preceding first and second week, respectively, and therefore enables us to examine the transmission dynamics at different time lags. as a robustness check, we also consider a simpler lag structure to describe the transmission dynamics. in model b, we estimate the effects of the average number of new cases in the past days instead of using two separate lag variables. table reports the estimation results of the ols and iv regressions of eq. , in which only within-city transmission is considered. after controlling for time-invariant city fixed effects and time effects that are common to all cities, on average, one new infection leads to . more cases in the next week, but . fewer cases week later. the negative effect can be attributed to the fact that both local authorities and residents would have taken more protective measures in response to a higher perceived risk of contracting the virus given more time. information disclosure on newly confirmed cases at the daily level by official media and information dissemination on social media throughout china may have promoted more timely actions by the public, resulting in slower virus transmissions. we then compare the transmission rates in different time windows. in the first sub-sample, one new infection leads to . more cases within a week, implying a fast growth in the number of cases. however, in the second sub-sample, the effect decreases to . , suggesting that public health measures imposed in late january were effective in limiting a further spread of the virus. similar patterns are also observed in model b. many cases were also reported in other cities in hubei province apart from wuhan, where six of them reported over cumulative cases by february . their overstretched health care system exacerbates the concern over delayed reporting of confirmed cases in these cities. to mitigate the effect of such potential measurement errors on our estimates, we re-estimate ( ) excluding all cities in hubei province. the bottom panel of table reports these estimates. comparing the iv estimates in columns ( ) and ( ) between the upper and lower panels, we find that the transmission rates are lower in cities outside hubei. in the january -february sub-sample, one new case leads to . more cases in the following week, and this is reduced to . in the february -february sub-sample. we also find a similar pattern when comparing the estimates from model b.

table notes: the dependent variable is the number of daily new cases. the endogenous explanatory variables include the average numbers of new confirmed cases in the own city in the preceding first and second weeks (model a) and the average number in the preceding days (model b). weekly averages of daily maximum temperature, precipitation, wind speed, the interaction between precipitation and wind speed, and the inverse log distance weighted sum of each of these variables in other cities, during the preceding third and fourth weeks, are used as instrumental variables in the iv regressions. weather controls include contemporaneous weather variables in the preceding first and second weeks. standard errors in parentheses are clustered by provinces. *** p < .

people may contract the virus from interaction with the infected people who live in the same city or other cities. in eq. , we consider the effects of the number of new infections in other cities and in the epicenter of the epidemic (wuhan), respectively, using inverse log distance as weights.
in addition, geographic proximity may not fully describe the level of social interactions between residents in wuhan and other cities since the lockdown in wuhan on january significantly reduced the population flow from wuhan to other cities. to alleviate this concern, we also use a measure of the size of population flow from wuhan to a destination city, which is constructed by multiplying the daily migration index on the population flow out of wuhan (fig. ) with the share of the flow that a destination city receives provided by baidu (fig. ). for days before january , we use the average destination shares between january and january . for days on or after january , we use the average destination shares between january and february . table reports the estimates from iv regressions of eq. , and table reports the results from the same regressions excluding hubei province. column ( ) of table indicates that in the first sub-sample, one new case leads to . more cases within week, and the effect is not statistically significant between and weeks. column ( ) suggests that in the second sub-sample, one new case leads to . more cases within week, and the effect is not statistically significant between and weeks. the comparison of the coefficients on own city between different sub-samples indicates that the responses of the government and the public have effectively decreased the risk of additional infections. comparing table with table , we find that although the coefficient on the number of new cases in the preceding second week turns insignificant and smaller in magnitude, coefficients on the number of new cases in the preceding first week are not sensitive to the inclusion of terms on between-city transmissions. (figure note: each estimation sample contains days, with the starting date indicated on the horizontal axis.) as a robustness test, table reports the estimation results excluding the cities in hubei province.
column ( ) of table indicates that in the first sub-sample, one new case leads to . more cases within a week, while in the second sub-sample, one new case only leads to . more cases within a week. besides, in the second subsample, one new case results in . fewer new infections between and weeks, which is larger in magnitude and more significant than the estimate (− . ) when cities in hubei province are included for estimation (column ( ) of table ). the time varying patterns in local transmissions are evident using the rolling window analysis (fig. ) . the upper left panel displays the estimated coefficients on local transmissions for various -day sub-samples with the starting date labelled on the horizontal axis. after a slight increase in the local transmission rates, one case generally leads to fewer and fewer additional cases a few days after january . besides, the transmission rate displays a slight increase beginning around february , which corresponds to the return travels and work resumption after chinese spring festival, but eventually decreases at around february . such decrease may be partly attributed to the social distancing strategies at the city level, so we examine the impacts of relevant policies in section . moreover, the transmission rates in cities outside hubei province have been kept at low levels throughout the whole sample period (columns ( ) and ( ) of table ). these results suggest that the policies adopted at the national and provincial levels soon after january prevented cities outside hubei from becoming new hotspots of infections. overall, the spread of the virus has been effectively contained by mid february, particularly for cities outside hubei province. in the epidemiology literature, the estimates on the basic reproduction number of covid- are approximately within the wide range of . ∼ . ). 
its value depends on the estimation method used, underlying assumptions of modeling, time period covered, geographic regions (with varying preparedness of health care systems), and factors considered in the models that affect disease transmissions (such as the behavior of the susceptible and infected population). intuitively, it can be interpreted as measuring the expected number of new cases that are generated by one existing case. it is of interest to note that our estimates are within this range. based on the results from model b in tables and , one case leads to . more cases in the same city in the next days ( . if cities in hubei province are excluded). in the second sub-sample (february -february ), these numbers are reduced to . and . , respectively, suggesting that factors such as public health measures and people's behavior may play an important role in containing the transmission of covid- . while our basic reproduction number estimate (r ) is within the range of estimates in the literature and is close to its median, five features may distinguish our estimates from some of the existing epidemiological estimates. 
first, our instrumental variable approach helps isolate the causal effect of virus transmissions from other confounded factors; second, our estimate is based on an extended time period of the covid- pandemic (until the end of february ) that may mitigate potential biases in the literature that relies on a shorter sampling period within - january ; third, our modeling makes minimum assumptions of virus transmissions, such as imposing fewer restrictions on the relationship between the unobserved determinants of new cases and the number of cases in the past; fourth, our model simultaneously considers comprehensive factors that may affect virus transmissions, including multiple policy instruments (such as closed management of communities and shelter-at-home order), population flow, within-and between-city transmissions, economic and demographic conditions, weather patterns, and preparedness of health care system. fifth, our study uses spatially disaggregated data that cover china (except its hubei province), while some other studies examine wuhan city, hubei province, china as a whole, or overseas. regarding the between-city transmission from wuhan, we observe that the population flow better explains the contagion effect than geographic proximity (table ). in the first sub-sample, one new case in wuhan leads to more cases in other cities receiving more population flows from wuhan within week. interestingly, in the second sub-sample, population flow from wuhan significantly decreases the transmission rate within week, suggesting that people have been taking more cautious measures from high covid- risk areas; however, more arrivals from wuhan in the preceding second week can still be a risk. a back of the envelope calculation indicates that one new case in wuhan leads to . ( . ) more cases in the destination city per , travelers from wuhan within ( ) week between january and february (february and february ) . 
note that while the effect is statistically significant, it should be interpreted in context. it was estimated that , , people would travel out of wuhan during the lunar new year holiday . if all had gone to one city, this would have directly generated about cases within weeks. the risk of infection is likely very low for most travelers except for few who have previous contacts with sources of infection, and person-specific history of past contacts may be an essential predictor for infection risk, in addition to the total number of population flows . a city may also be affected by infections in nearby cities apart from spillovers from wuhan. we find that the coefficients that represent the infectious effects from nearby cities are generally small and not statistically significant (table ), implying that few cities outside wuhan are themselves exporting infections. this is consistent with the findings in the world health organization ( b) that other than cases that are imported from hubei, additional human-to-human transmissions are limited for cities outside hubei. restricting to cities outside hubei province, the results are similar (table ), except that the transmission from wuhan is not significant in the first half sample. we also investigate the mediating impacts of some socioeconomic and environmental characteristics on the transmission rates ( ). to ease the comparison between different moderators, we consider the mediating impacts on the influence of the average number of new cases in the past weeks. regarding own-city transmissions, we examine the mediating effects of population density, gdp per capita, number of doctors, and average temperature, wind speed, precipitation, and a dummy variable of adverse weather conditions. 
regarding between-city transmissions, we consider the mediating effects of distance, difference in population density, and difference in gdp per capita since cities that are similar in density or economic development level may be more closely linked. we also include a measure of population flows from wuhan. table reports the estimation results of the iv regressions. to ease the comparison across various moderators, for the mediating variables of within-city transmissions that are significant at %, we compute the changes in the variables so that the effect of new confirmed infections in the past days on current new confirmed cases is reduced by (columns ( ) and ( )). it is estimated that , , people traveled out of wuhan in during the lunar new year holiday (http://www.whtv.com.cn/p/ .html). the sum of baidu's migration index for population flow out of wuhan during the days around the lunar new year is . , which means one index unit represents . travelers. the destination share is in percentage. with one more case in wuhan, the effect on a city receiving , travelers from wuhan is . × . × × = . . http://www.whtv.com.cn/p/ .html from mid february, individual specific health codes such as alipay health code and wechat health code are being used in many cities to aid quarantine efforts. the dependent variable is the number of daily new confirmed cases. the sample excludes cities in hubei province. columns ( ) and ( ) report the changes in the mediating variables that are needed to reduce the impact of new confirmed cases in the preceding weeks by , using estimates with significance levels of at least . in columns ( ) and ( ), respectively. the endogenous variables include the average numbers of new cases in the own city and nearby cities in the preceding days and their interactions with the mediating variables. 
weekly averages of daily maximum temperature, precipitation, wind speed, the interaction between precipitation and wind speed, and the inverse log distance weighted sum of these variables in neighboring cities, during the preceding third and fourth weeks, are used as instrumental variables in the iv regressions. additional instrumental variables are constructed by interacting them with the mediating variables. weather controls include these variables in the preceding first and second weeks. standard errors in parentheses are clustered by provinces. *** p < . , ** p < . , * p < .

in the early phase of the epidemic (january to february ), cities with more medical resources, which are measured by the number of doctors, have lower transmission rates. a one standard deviation increase in the number of doctors reduces the transmission rate by . . cities with higher gdp per capita have higher transmission rates, which can be ascribed to the increased social interactions as economic activities increase . in the second sub-sample, these effects become insignificant, probably because public health measures and inter-city resource sharing take effect. in fact, cities with higher population density have lower transmission rates in the second sub-sample. regarding the environmental factors, we notice different significant mediating variables across the first and second sub-samples. the transmission rates are lower with adverse weather conditions, lower temperature, or less rain. further research is needed to identify clear mechanisms. in addition, population flow from wuhan still poses a risk of new infections for other cities even after we account for the above mediating effects on own-city transmission. this effect is robust to the inclusion of the proximity measures based on economic similarity and geographic proximity between wuhan and other cities. nevertheless, we do not find much evidence on between-city transmissions among cities other than wuhan.
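The table columns that report how much a mediating variable must change to lower the effect of past cases by a given amount follow from simple arithmetic on an interaction term. The sketch below assumes a linear interaction in which the effect of past cases equals a base coefficient plus an interaction coefficient times the mediating variable. The function name and the example numbers are hypothetical and are not taken from the paper's tables.

```python
def required_change(interaction_coef, target_reduction):
    """Change in a mediating variable h that lowers the effect of past cases.

    Assumes effect(h) = base_coef + interaction_coef * h, so a change dh
    shifts the effect by interaction_coef * dh. Solving
    interaction_coef * dh = -target_reduction gives the value returned here.
    """
    if interaction_coef == 0:
        raise ValueError("a zero interaction coefficient cannot change the effect")
    return -target_reduction / interaction_coef


# Hypothetical example: with an interaction coefficient of -0.5, the mediating
# variable has to rise by 0.2 units to lower the effect by 0.1.
example_change = required_change(-0.5, 0.1)
```

A negative interaction coefficient therefore implies that the mediating variable must increase to reduce transmission, while a positive coefficient implies that it must decrease.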
as the - sars outbreak has shown, non-pharmaceutical interventions (npis) or public health measures may decrease or effectively stop the transmission of covid- even without vaccines. although the effectiveness of a single intervention strategy can be limited, multiple interventions together may generate substantial impacts on containing the spread of the virus. figure depicts the timeline for a series of policies enacted at the national, provincial, and city levels in china since january . after the official confirmation of human-to-human transmission by the chinese authorities on january , china has adopted a variety of npis to contain the covid- outbreak. at the national level, covid- was classified as a statutory class b infectious disease on january , and prevention and control measures for class a infectious diseases have been taken. government agencies across the country were mobilized. the joint prevention and control mechanism of the state council was established on january , and the central leadership group for epidemic response was established on january . on january , national healthcare security administration announced that expenses related to covid- treatments would be covered by the medical insurance and the government if necessary, in order that all covid- cases could be hospitalized . at the provincial level, provinces declared level i responses to major public health emergencies from january to , and all provinces had declared level i responses by january . level i responses in china are designed for the highest state of emergencies. measures taken include enhanced isolation and contact tracing of cases, suspension of public transport, cancelling public events, closing schools and entertainment venues, and establishment of health checkpoints (tian et al. ) . these policies together represent population-wide social distancing and case isolation (ferguson et al. ) . 
early detection of covid- importation and prevention of onward transmission are crucial to all areas at risk of importation from areas with active transmissions (gilbert et al. ) . to contain the virus at the epicenter, wuhan was placed under lockdown with a traffic ban for all residents starting on january . the lockdown is not expected to be lifted until april . local buses, subways, and ferries ceased operation. ride-hailing services were prohibited, and only a limited number of taxis were allowed on the road by january . residents were not permitted to leave the city. departure flights and trains were canceled at the city airport and train stations. checkpoints were set up at highway entrances to prevent cars from leaving the city. from january , it became mandatory to wear masks at work or in public places. there was insufficient hospital capacity in hubei (and wuhan in particular) in late january. most patients in wuhan were hospitalized and isolated around mid-february with the completion of new hospitals, makeshift health facilities, and increased testing capacity. see section . for details.

we should note that the summary of china's policy responses here is not a comprehensive list. other entities have also made efforts to help curtail the spread of covid- . for example, on january , the state grid corporation of china declared that it would continue supplying electricity to resident users even if payment was not received on time. schools and universities were already closed because of the lunar new year holidays. in addition, all cities in hubei province implemented the lockdown policy, and most hubei cities had also adopted measures commensurate with class a infectious diseases by january . residents in those areas were strongly encouraged to stay at home and not to attend any activity involving public gathering.
health facilities in wuhan had been extremely overstretched with shortage in medical supplies and high rates of nosocomial infections until february when ( ) two new hospitals, i.e., huoshenshan and leishenshan, were built to treat patients of covid- with severe symptoms, and ( ) makeshift health facilities were converted to isolate patients with mild symptoms and to quarantine people suspected of contracting covid- , patients with fever symptoms, and close contacts of confirmed patients. this centralized treatment and isolation strategy since february has substantially reduced transmission and incident cases. however, stringent public health measures within hubei province enforced after the massive lockdown may have little to do with virus transmissions out of hubei province due to the complete travel ban since january . quarantine measures have been implemented in other provinces that aim at restricting population mobility across cities and reducing the risk of importing infections . seven cities in zhejiang, henan, heilongjiang, and fujian provinces had adopted the partial shutdown strategy by february (fang et al. ) . in wenzhou, most public transportation was shut down, and traffic leaving the city was banned temporarily. on january , the ministry of transport of china launched level responses to emergencies in order to cooperate with the national health commission in preventing the virus spread. on january , the ministry of transport of china, civil aviation administration of china, and china state railway group company, ltd. (csrgc) declared to waive the change fees for flight, train, bus, and ferry tickets that were bought before january . later, the csrgc extended the fee waiver policy to train tickets that were bought before february . by february , all railway stations in china had started to monitor body temperature of travelers when they enter and exit the station. 
across the whole country, transportation departments set up , health checkpoints at bus and ferry terminals, at service centers and toll gates on highways, monitoring the body temperature of passengers and controlling the inflow of population (world health organization b). recent visitors to high covid- risk areas are required to self-quarantine for days at home or in designated facilities. on february , china's exit and entry administration temporarily suspended the approval and issuance of the travel permits to hong kong and macau. on january , wuhan municipal administration of culture and tourism ordered all tour groups to cancel travels to wuhan. on january , the ministry of education of china postponed start of the spring semester in , and on february , it further announced that students were not allowed to return to school campus without approvals from school. recent studies suggest that there is a large proportion of asymptomatic or mildsymptomatic cases, who can also spread the virus (dong et al. ; mizumoto et al. ; nishiura et al. ; wang et al. a ). thus, maintaining social distance is of crucial importance in order to curtail the local transmission of the virus. the period from january to , , is the traditional chinese spring festival holiday, when families are supposed to get together so that inter-city travel is usually much less. people were frequently reminded by official media (via tv news and phone messages) and social media to stay at home and avoid gathering activities. on january , china state council extended this holiday to february to delay people's return travel and curtail the virus spread. nevertheless, economic activities are still supposed to resume after the spring festival, bringing people back to workplaces, which may increase the risk of virus spread. 
to help local residents keep social distance and decrease the risk of virus transmissions, many cities started to implement the "closed management of communities" and "family outdoor restrictions" policies since late january (table ) , encouraging residents to restrict nonessential travels. from january to february , more than prefecture-level cities in china implemented "closed management of communities," which typically includes ( ) keeping only one entrance for each community, ( ) allowing only community residents to enter and exit the community, ( ) checking body temperature for each entrant, ( ) testing and quarantining cases that exhibit fever immediately, and ( ) tracing and quarantining close contacts of suspicious cases. meanwhile, residents who had symptoms of fever or dry cough were required to report to the community and were quarantined and treated in special medical facilities. furthermore, local governments of cities also imposed more stringent "family outdoor restrictions"-residents are confined or strongly encouraged to stay at home with limited exceptions, e.g., only one person in each family may go out for shopping for necessities once every days . exit permits were usually distributed to each family in advance and recollected when residents reenter the community. contacts of those patients were also traced and quarantined. table summarizes the number of cities that had imposed "closed management of communities" or "family outdoor restrictions" by different dates in february. in order to help inform evidence-based covid- control measures, we examine the effect of these local quarantine measures in reducing the virus transmission rates. dummy variables for the presence of closed management of communities or family outdoor restrictions are created, and they are interacted with the number of infections in the preceding weeks. several factors may contribute to the containment of the epidemic. 
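the construction described above, in which policy dummies are interacted with the number of infections in the preceding weeks, can be sketched as follows. this is a minimal illustration under stated assumptions: the lagged regressors are averages of daily new cases over days t-1 to t-7 and t-8 to t-14, days before the sample start count as zero cases, and the dummy is evaluated on day t. the function and key names are hypothetical.

```python
import numpy as np

def lagged_weekly_average(new_cases, week):
    """Average daily new cases in the preceding first (week=1) or second (week=2) week.

    new_cases : (T, n) array of daily new confirmed cases for n cities.
    Days before the start of the sample are treated as zero (an assumption).
    """
    if week not in (1, 2):
        raise ValueError("week must be 1 or 2")
    new_cases = np.asarray(new_cases, dtype=float)
    T, n = new_cases.shape
    padded = np.vstack([np.zeros((14, n)), new_cases])  # 14 zero days before day 0
    out = np.empty((T, n))
    for t in range(T):
        start = t + 14 - 7 * week                       # first row of the 7-day window
        out[t] = padded[start:start + 7].mean(axis=0)
    return out

def policy_interactions(new_cases, policy):
    """Lagged infection averages and their interactions with a 0/1 policy dummy.

    policy : (T, n) array, 1 if the measure is in place in that city on that day.
    """
    policy = np.asarray(policy, dtype=float)
    lag1 = lagged_weekly_average(new_cases, week=1)
    lag2 = lagged_weekly_average(new_cases, week=2)
    return {
        "lag1": lag1,
        "lag2": lag2,
        "policy_x_lag1": policy * lag1,
        "policy_x_lag2": policy * lag2,
    }

# Hypothetical single-city example: cases 1..15, policy adopted from day 10 onward.
demo_cases = np.arange(1, 16, dtype=float).reshape(15, 1)
demo_policy = (np.arange(15) >= 10).astype(float).reshape(15, 1)
demo_features = policy_interactions(demo_cases, demo_policy)
```

the coefficient on an interaction term then measures how much the policy changes the number of new cases generated per past infection, which is how the text reads the estimates.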
the transmission dynamics may change during the course of this epidemic because of improved medical treatments, more effective case isolation and contact tracing, increased public awareness, etc. therefore, we have split the sample into two sub-samples, and the estimated coefficients can be different across the sub-samples (section ). npis such as closed management of communities, city lockdowns, and restrictions on population flow out of areas with high infection risks may also directly affect the transmission rates. while many public health measures are implemented nationwide, spatial variations exist in the adoption of two types of measures: closed management of communities (denoted by closed management) and family outdoor restrictions (denoted by stay at home), which allow us to quantify the effect of these npis on the transmission dynamics. because most of these local npis are adopted in february and our earlier results indicate that the transmission of covid- declines during late january, we restrict the analysis sample to february -february . we also exclude cities in hubei province, which modified the case definition related to clinically diagnosed cases on february and changed the case definition related to reduced backlogs from increased capacity of molecular diagnostic tests on february . these modifications coincide with the adoption of local npis and can significantly affect the observed dynamics of confirmed cases. the adoption of closed management or stay at home is likely affected by the severity of the epidemic and correlated with the unobservables. additional weather controls that have good predictive power for these npis are selected as the instrumental variables based on the method of belloni et al. ( ) . details are displayed in appendix a. the estimation results of ols and iv regressions are reported in table .

effects of local non-pharmaceutical interventions. the sample is from february to february , excluding cities in hubei province.
the dependent variable is the number of daily new confirmed cases. the instrumental variables include weekly averages of daily maximum temperature, wind speed, precipitation, and the interaction between wind speed and precipitation, in the preceding third and fourth weeks, and the inverse log distance weighted averages of these variables in other cities. additional instrumental variables are constructed by interacting these excluded instruments with variables that predict the adoption of closed management of communities or family outdoor restrictions (table ). the weather controls include weather characteristics in the preceding first and second weeks. standard errors in parentheses are clustered by provinces. *** p < . , ** p < .

we find that closed management and stay at home significantly decrease the transmission rates. as a result of closed management of communities, one infection will generate . ( % ci, − . ∼ − . ) fewer new infections in the first week. the effect in the second week is also negative though not statistically significant. family outdoor restrictions (stay at home) are more restrictive than closing communities to visitors and reduce additional infections from one infection by . ( % ci, − . ∼ − . ) in the first week. the effect in the second week is not statistically significant. to interpret the magnitude of the effect, note that the reproduction number of sars-cov- is estimated to be around . ∼ . as of january , . many cities implement both policies. however, we cannot conclusively ascertain the effect of further imposing family outdoor restrictions in cities that have adopted closed management of communities. when both policies are included in the model, the ols coefficients (column ( )) indicate that closed management reduces the transmission rate by . ( % ci, − . ∼ − . ) in the first week, and by . ( % ci, − . ∼ − .
) in the second week, while the additional benefit from stay at home is marginally significant in the second week (− . , % ci, − . ∼ . ). the iv estimates indicate that closed management reduces the transmission rate in the first week by . ( % ci, − . ∼ . ), while the effect in the second week and the effects of stay at home are not statistically significant. additional research that examines the decision process of health authorities or documents the local differences in the actual implementation of the policies may offer insights into the relative merits of the policies.

we further assess the effects of npis by conducting a series of counterfactual exercises. after estimating ( ) by sls, we obtain the residuals. then, the changes in $y_{ct}$ are predicted for counterfactual changes in the transmission dynamics (i.e., coefficients $\alpha^{k}_{within,\tau}$) and the impositions of npis (i.e., $h^{k\tau}_{ct}$, and the lockdown of wuhan $m^{k\tau}_{c,wuhan,t}$). in scenario a, no cities adopted family outdoor restrictions (stay at home). similarly, in scenario b, no cities implemented closed management of communities. we use the estimates in columns ( ) and ( ) of table to conduct the counterfactual analyses for scenarios a and b, respectively. in scenario c, we assume that the index of population flows out of wuhan after the wuhan lockdown (january ) took the value that was observed in for the same lunar calendar date (fig. ), which would be plausible had there been no lockdown around wuhan. it is also likely that in the absence of lockdown but with the epidemic, more people would leave wuhan compared with last year (fang et al. ), and the effect would then be larger. in scenario d, we assume that the within-city transmission dynamics were the same as those observed between january and february , i.e., the coefficient of -week lag own-city infections was . and the coefficient of -week lag own-city infections was − .
(column ( ) of table ), which may happen if the transmission rates in cities outside hubei increased in the same way as those observed for cities in hubei. appendix c contains the technical details on the computation of counterfactuals. in fig. , we report the differences between the predicted number of daily new cases in the counterfactual scenarios and the actual data, for cities outside hubei province. we also report the predicted cumulative effect in each scenario at the bottom of the corresponding panel in fig. . had the transmission rates in cities outside hubei province increased to the level observed in late january, by february , there would be , , ( % ci, , ∼ , , ) more cases (scenario d). assuming a fatality rate of %, there would be , more deaths. the magnitude of the effect from the wuhan lockdown and local npis is considerably smaller. as a result of the wuhan lockdown, , ( % ci, ∼ , ) fewer cases would be reported for cities outside hubei by february (scenario c). closed management of communities and family outdoor restrictions would reduce the number of cases by ( % ci, ∼ ; or . per city with the policy) and ( % ci, ∼ ; or . per city with the policy), respectively. these estimates, combined with additional assumptions on the value of statistical life, lost time from work, etc., may contribute to cost-benefit analyses of relevant public health measures. our counterfactual simulations indicate that suppressing local virus transmissions so that transmission rates are kept well below those observed in hubei in late january is crucial in forestalling large numbers of infections for cities outside hubei.

fig. counterfactual policy simulations. this figure displays the daily differences between the total predicted number and the actual number of daily new covid- cases for each of the four counterfactual scenarios for cities outside hubei province in mainland china. the spike on february in scenario c is due to a sharp increase in daily case counts in wuhan resulting from changes in case definitions in hubei province (see appendix b for details).

our retrospective analysis of the data from china complements the simulation study of ferguson et al. ( ) . our estimates indicate that suppressing local transmission rates at low levels might have avoided one million or more infections in china. chinazzi et al. ( ) also find that reducing local transmission rates is necessary for effective containment of covid- . the public health policies announced by the national and provincial authorities in the last weeks in january may have played a determinant role (tian et al. ) in keeping local transmission rates in cities outside hubei at low levels throughout january and february. among the measures implemented following provincial level i responses, shen et al. ( ) highlight the importance of contact tracing and isolation of close contacts before onset of symptoms in preventing a resurgence of infections once the covid- suppression measures are relaxed. we also find that travel restrictions on high-risk areas (the lockdown in wuhan), and to a lesser extent, closed management of communities and family outdoor restrictions, further reduce the number of cases. it should be noted that these factors may overlap in the real world. in the absence of the lockdown in wuhan, the health care systems in cities outside hubei could face much more pressure, and local transmissions may have been much higher. in china, the arrival of the covid- epidemic coincided with the lunar new year for many cities. had the outbreak started at a different time, the effects and costs of these policies would likely be different.

this paper examines the transmission dynamics of the coronavirus disease in china, considering both within- and between-city transmissions.
our sample is from january to february and covers key episodes such as the initial spread of the virus across china, the peak of infections in terms of domestic case counts, and the gradual containment of the virus in china. changes in weather conditions induce exogenous variations in past infection rates, which allow us to identify the causal impact of past infections on new cases. the estimates suggest that the infectious effect of the existing cases is mostly observed within week and people's responses can break the chain of infections. comparing estimates in two sub-samples, we observe that the spread of covid- has been effectively contained by mid february, especially for cities outside hubei province. data on real-time population flows between cities have become available in recent years. we show that this new source of data is valuable in explaining between-city transmissions of covid- , even after controlling for traditional measures of geographic and economic proximity. by april of , covid- infections have been reported in more than countries or territories and more than , people have died. behind the grim statistics, more and more national and local governments are implementing countermeasures. cross border travel restrictions are imposed in order to reduce the risk of case importation. in areas with risks of community transmissions, public health measures such as social distancing, mandatory quarantine, and city lockdown are implemented. in a series of counterfactual simulations, we find that based on the experience in china, preventing sustained community transmissions from taking hold in the first place has the largest impact, followed by restricting population flows from areas with high risks of infections. local public health measures such as closed management of communities and family outdoor restrictions can further reduce the number of infections. 
a key limitation of the paper is that we are not able to disentangle the effects from each of the stringent measures taken, as within this -week sampling period, china enforced such a large number of densely timed policies to contain the spread of the virus, often simultaneously in many cities. a second limitation is that shortly after the starting date of the official data release for confirmed infected cases throughout china, i.e., january , , many stringent measures were implemented, which prevents researchers from comparing the post-treatment sub-sample with a pre-treatment sub-sample during which no strict policies were enforced. key knowledge gaps remain in the understanding of the epidemiological characteristics of covid- , such as individual risk factors for contracting the virus and infections from asymptomatic cases. data on the demographics and exposure history for those who have shown symptoms as well as those who have not will help facilitate this research.

adjustment in covid- case definitions in hubei province in february. section c contains details on the computation of the counterfactuals.

weather conditions affect disease transmissions either directly if the virus can more easily survive and spread in certain environments, or indirectly by changing human behavior. table reports results of the first stage of the iv regressions (table ) using the full sample. in columns ( ) and ( ), the dependent variables are the numbers of newly confirmed covid- cases in the own city in the preceding first and second weeks, respectively. in columns ( ) and ( ), the dependent variables are the sum of inverse log distance weighted numbers of newly confirmed covid- cases in other cities in the preceding first and second weeks, respectively. these are the endogenous variables in the iv regressions. the weather variables in the preceding first and second weeks are included in the control variables.
the weather variables in the preceding third and fourth weeks are the excluded instruments, and their coefficients are reported in the table. because the variables are averages in -day moving windows, the error term may be serially correlated, and we include city by week fixed effects. also included in the control variables are the average numbers of new cases in wuhan in the preceding first and second weeks, interacted with the inverse log distance or the population flow. because the spread of the virus depends on both the number of infectious people and the weather conditions, the coefficients in the first stage regressions do not have structural interpretations. the wald tests on the joint significance of the excluded instruments are conducted and their f statistics are reported. the excluded instruments have good predictive power.

the implementation of local public health measures is likely correlated with the extent of the virus spread, so weather conditions that affect virus transmissions may also affect the likelihood that the policy is adopted. the influence of weather conditions on policy adoption may be complicated, so we use the cluster-lasso method of belloni et al. ( ) to select the weather variables that have good predictive power on the adoption of closed management of communities or family outdoor restrictions. let $d_{ct}$ be the dummy variable of the adoption of the local public health measure, i.e., $d_{ct} = 1$ if the policy is in place in city $c$ on day $t$. $q_{ct}$ is a vector of candidate weather variables, including weekly averages of daily mean temperature, maximum temperature, minimum temperature, dew point, station-level pressure, sea-level pressure, visibility, wind speed, maximum wind speed, snow depth, precipitation, dummy for adverse weather conditions, squared terms of these variables, and interactions among them. first, city and day fixed effects are removed:

$$\ddot{d}_{ct} = d_{ct} - \frac{1}{n}\sum_{c} d_{ct} - \frac{1}{T}\sum_{t} d_{ct} + \frac{1}{nT}\sum_{ct} d_{ct},$$

and $\ddot{q}_{ct}$ is defined similarly.
the cluster-lasso method solves the following minimization problem:

$$\min_{b}\ \frac{1}{nT}\sum_{ct}\left(\ddot{d}_{ct} - \ddot{q}_{ct}' b\right)^{2} + \frac{\lambda}{nT}\sum_{k}\phi_{k}\,|b_{k}|.$$

λ and φ are penalty parameters. a larger penalty value forces more coefficients to zero. the penalty parameters are picked using the theoretical result of belloni et al. ( ) . the estimation uses the stata package by ahrens et al. ( ) . table lists the selected weather variables, which are used as the instruments in table .

(ahrens et al. ) , which implements the cluster-lasso method of belloni et al. ( ) . city and date fixed effects are included. candidate variables include weekly averages of daily mean temperature, maximum temperature, minimum temperature, dew point, station-level pressure, sea-level pressure, visibility, wind speed, maximum wind speed, snow depth, precipitation, dummy for adverse weather conditions, squared terms of these variables, and interactions among them.

the dependent variable is the number of daily new cases. the endogenous explanatory variables include the average numbers of new confirmed cases in the own city and nearby cities in the preceding first and second weeks (model a) and averages in the preceding days (model b). weekly averages of daily maximum temperature, precipitation, wind speed, the interaction between precipitation and wind speed, and the inverse log distance weighted sum of these variables in other cities, during the preceding third and fourth weeks, are used as instrumental variables in the iv regressions. weather controls include contemporaneous weather variables in the preceding first and second weeks. standard errors in parentheses are clustered by provinces. *** p < . , ** p < . , * p < .
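the two steps described above, removing city and day fixed effects and then solving a penalized least squares problem with variable-specific penalty loadings, can be sketched in plain numpy. this is only an illustration: the paper's estimation uses the stata package of ahrens et al., whereas the coordinate descent solver below is a generic substitute, and the penalty loadings phi are taken as given rather than computed by the cluster-robust rule of belloni et al. all function names are hypothetical.

```python
import numpy as np

def two_way_demean(a):
    """Remove city and day fixed effects from a balanced panel.

    a : (n, T) array, or (n, T, p) for p variables; axis 0 is city, axis 1 is day.
    Subtracts city means and day means and adds back the grand mean.
    """
    a = np.asarray(a, dtype=float)
    return (a
            - a.mean(axis=0, keepdims=True)        # mean over cities, per day
            - a.mean(axis=1, keepdims=True)        # mean over days, per city
            + a.mean(axis=(0, 1), keepdims=True))  # grand mean

def weighted_lasso(X, y, lam, phi, n_iter=500):
    """Minimise (1/N) * ||y - X b||^2 + (lam/N) * sum_k phi[k] * |b[k]|.

    Plain cyclic coordinate descent with soft-thresholding.
    X : (N, p) regressors, y : (N,) outcome, phi : (p,) penalty loadings.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    resid = y.copy()                               # equals y - X b when b = 0
    for _ in range(n_iter):
        for k in range(p):
            if col_ss[k] == 0.0:
                continue
            resid += X[:, k] * b[k]                # partial residual without variable k
            z = X[:, k] @ resid
            shrunk = max(abs(z) - lam * phi[k] / 2.0, 0.0)
            b[k] = np.sign(z) * shrunk / col_ss[k]
            resid -= X[:, k] * b[k]
    return b

# Hypothetical check 1: a panel that is purely additive in city and day effects demeans to zero.
city_effect = 10.0 * np.arange(3).reshape(3, 1)
day_effect = np.arange(4).reshape(1, 4)
demo_demeaned = two_way_demean(city_effect + day_effect)

# Hypothetical check 2: orthogonal design, so each coefficient is a soft-thresholded OLS value.
demo_X = 2.0 * np.eye(4)[:, :2]
demo_y = np.array([4.0, 0.2, 0.0, 0.0])
demo_b = weighted_lasso(demo_X, demo_y, lam=2.0, phi=np.array([1.0, 1.0]))
```

in this toy example the weakly related second variable is dropped (its coefficient is exactly zero), which mirrors how the selection step keeps only weather variables with predictive power for policy adoption.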
economic activity and the spread of viral diseases: evidence from high frequency data lassopack: model selection and prediction with regularized regression in stata coughs and sneezes spread diseases: an empirical study of absenteeism and infectious illness inference in high-dimensional panel models with an application to gun control disease risk and fertility: evidence from the hiv/aids pandemic the effect of travel restrictions on the spread of the epidemiological characteristics of pediatric patients with coronavirus disease in china, pediatrics impacts of social and economic factors on the transmission uncovering the impact of the hiv epidemic on fertility in sub-saharan africa: the case of malawi human mobility restrictions and the spread of the novel coronavirus ( -ncov) in china impacts of nonpharmaceutical interventions (npis) to reduce covid- mortality and healthcare demand fogli a, veldkamp l (forthcoming) germs, social networks and growth reactive school closure weakens the network of social interactions and reduces the spread of influenza the reproductive number of covid- is higher compared to sars coronavirus the effects of employment on influenza rates who has a clue to preventing the flu? 
unravelling supply and demand effects on the take-up of influenza vaccinations estimating the asymptomatic proportion of coronavirus disease (covid- ) cases on board the diamond princess cruise ship national health commission of the prc ( ) novel coronavirus pneumonia diagnosis and treatment plan estimation of the asymptomatic ratio of novel coronavirus infections (covid- ) routes of infection: exports and hiv incidence in sub-saharan africa the pros and cons of sick pay schemes: testing for contagious presenteeism and noncontagious absenteeism behavior france and germany exceed italy, south korea and japan in temperature-adjusted corona proliferation: a quick and dirty sunday morning analysis review of ferguson et al "impact of non-pharmaceutical interventions sunlight and protection against influenza nber working paper evolving epidemiology and impact of non-pharmaceutical interventions on the outbreak of coronavirus disease temperature significant change covid- transmission in cities measuring social and externality benefits of influenza vaccination journal of human resources world health organization ( a) novel coronavirus situation report report of the who-china joint mission on coronavirus disease nowcasting and forecasting the potential domestic and international spread of the -ncov outbreak originating in wuhan china: a modelling study utilize state transition matrix model to predict the novel corona virus infection peak and patient distribution characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention modelling and prediction of the coronavirus disease spreading in china incorporating human migration data economic impacts of climate change on agriculture: the importance of additional climatic variables other than temperature and precipitation impact of population movement on the spread of preliminary estimation of the basic 
reproduction number of novel coronavirus ( -ncov) in china, from to : a data-driven analysis in the early phase of the outbreak

we are grateful to editor klaus zimmermann and three anonymous referees for valuable comments and suggestions which have helped greatly improve the paper. we received helpful comments and suggestions from hanming fang and seminar participants at institute for economic and social research of jinan university and voxchina covid- public health and public policy virtual forum. pei yu and wenjie wu provided excellent research assistance. all errors are our own.

conflict of interests the authors declare that they have no conflict of interest.

open access this article is licensed under a creative commons attribution . international license, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the creative commons licence, and indicate if changes were made. the images or other third party material in this article are included in the article's creative commons licence, unless indicated otherwise in a credit line to the material. if material is not included in the article's creative commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. to view a copy of this licence, visit http://creativecommons.org/licenses/by/ . /.

the appendix consists of three sections. section a provides details on the first stage of the iv regressions and the selection of the instrumental variables for the local public health policies. section b shows that our main findings are not sensitive to the changes in covid- case definitions in hubei province.

covid- case definitions were changed in hubei province on february and february .
starting on february , covid- cases could also be confirmed based on clinical diagnosis in hubei province, in addition to molecular diagnostic tests. this resulted in a sharp increase in the number of daily new cases reported in hubei, and in particular wuhan (fig. ) . the use of clinical diagnosis in confirming cases ended on february . the numbers of cases that are confirmed based on clinical diagnosis for february , , and are reported by the health commission of hubei province and are displayed in table . as a robustness check, we re-estimate the model after removing these cases from the daily case counts (fig. ). our main findings still hold (table ). the transmission rates are significantly lower in february compared with january. population flow from the epidemic source increases the infections in destinations, and this effect is slightly delayed in february.

it is convenient to write it in vector form, where y nt = y t · · · y nt and ε nt are n × vectors. assuming that y ns = if s ≤ , because our sample starts on january , and no laboratory confirmed case was reported before january in cities outside wuhan. x nt = x t · · · x nt is an n × k matrix of the control variables. h nt,s (α within ) is an n × n diagonal matrix corresponding to the s-day time lag, with parameters α within = {α k within,τ } k= ,··· ,k within ,τ = , . for example, for s = , · · · , , the ith diagonal element of h nt,s (α within ) is k within k= α k within, h k ct,i , and for s = , · · · , , the ith diagonal element of h nt,s (α within ) is k within k= α k within, h k ct,i . m nt,s (α between ) is constructed similarly. for example, for s = , · · · , and i ≠ j , the ij th element of m nt,s (α between ) is k between k= α k between, m k ij t . z τ nt is an n × k wuhan matrix corresponding to the transmission from wuhan. for example, the ikth element of z nt is m k i,wuhan,t z t . we first estimate the parameters in eq. by sls and obtain the residuals ε̂ n , · · · , ε̂ nt .
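a model in this vector form, where lagged case counts enter through one matrix per daily lag, can be rolled forward day by day once coefficients and residuals are available. the sketch below is a simplified illustration, not the paper's implementation: each lag matrix stands for the sum of the within-city and between-city matrices at that lag, the control and wuhan terms are collapsed into a single array `exog`, and pre-sample case counts are set to zero. all names are hypothetical.

```python
import numpy as np

def simulate(lag_mats, exog, resid):
    """Roll the model forward: Y_t = sum_s A_s @ Y_{t-s} + exog_t + resid_t.

    lag_mats : list of (n, n) arrays A_1, ..., A_S, one per daily lag.
    exog     : (T, n) array with the combined contribution of controls.
    resid    : (T, n) array of residuals from the estimated model.
    Case counts before the first sample day are treated as zero.
    """
    exog = np.asarray(exog, dtype=float)
    resid = np.asarray(resid, dtype=float)
    T, n = resid.shape
    Y = np.zeros((T, n))
    for t in range(T):
        y_t = exog[t] + resid[t]
        for s, A in enumerate(lag_mats, start=1):
            if t - s >= 0:
                y_t = y_t + np.asarray(A, dtype=float) @ Y[t - s]
        Y[t] = y_t
    return Y

def counterfactual_change(lag_mats_est, lag_mats_cf, exog, resid):
    """Difference between the counterfactual path and the fitted path, day by day."""
    return simulate(lag_mats_cf, exog, resid) - simulate(lag_mats_est, exog, resid)

# Hypothetical two-city example: estimated one-day lag coefficient 0.5, counterfactual 0.
demo_resid = np.array([[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]])
demo_exog = np.zeros((3, 2))
demo_fitted = simulate([0.5 * np.eye(2)], demo_exog, demo_resid)
demo_change = counterfactual_change([0.5 * np.eye(2)], [np.zeros((2, 2))],
                                    demo_exog, demo_resid)
```

because the residuals are held fixed, the fitted path reproduces the data used to compute them, and any difference in the counterfactual path comes only from the altered lag matrices.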
let ˆ· denote the estimated value of parameters and ˜· denote the counterfactual changes. the counterfactual value of y nt is computed recursively, and the counterfactual change for date t is Δy nt = Ỹ nt − y nt . the standard error of Δy nt is obtained from bootstrap iterations. in each bootstrap iteration, cities are sampled with replacement and the model is estimated to obtain the parameters. the counterfactual predictions are obtained using the above equations with the estimated parameters and the counterfactual scenario (e.g., no cities adopted lockdown).

key: cord- -m un y authors: hu, bisong; qiu, jingyu; chen, haiying; tao, vincent; wang, jinfeng; lin, hui title: first, second and potential third generation spreads of the covid- epidemic in mainland china: an early exploratory study incorporating location-based service data of mobile devices date: - - journal: int j infect dis doi: . /j.ijid. . . sha: doc_id: cord_uid: m un y

abstract objectives the outbreak of atypical pneumonia caused by the novel coronavirus (covid- ) has become a global concern. the generations of the epidemic spread are not well known, yet these are critical parameters to facilitate an understanding of the epidemic. a seafood wholesale market and wuhan city, china, were recognized as the primary and secondary epidemic sources. human movements nationwide from the two epidemic sources revealed the characteristics of the first-generation and second-generation spreads of the covid- epidemic, as well as the potential third-generation spread.

methods we used spatiotemporal data of covid- cases in mainland china and two categories of location-based service (lbs) data of mobile devices from the primary and secondary epidemic sources to calculate pearson correlation coefficient, r, and spatial stratified heterogeneity, q, statistics.
results two categories of device trajectories had generally significant correlations and determinant powers of the epidemic spread. both r and q statistics decreased with distance from the epidemic sources and their associations changed with time. at the beginning of the epidemic, the mixed first-generation and second-generation spreads appeared in most cities with confirmed cases. they strongly interacted to enhance the epidemic in hubei province and the trend was also significant in the provinces adjacent to hubei. the third-generation spread started in wuhan from january to , , and in hubei from january to . no obvious third-generation spread was detected outside hubei. conclusions the findings provide important foundations to quantify the effect of human movement on epidemic spread and inform ongoing control strategies. the spatiotemporal association between the epidemic spread and human movements from the primary and secondary epidemic sources indicates a transfer from second to third generations of the infection. urgent control measures include preventing the potential third-generation spread in mainland china, eliminating it in hubei, and reducing the interaction influence of first-generation and second-generation spreads.

an outbreak of atypical pneumonia caused by the novel coronavirus (covid- ) was recognized from mid-january, , in wuhan city, china. the novel coronavirus that infects humans was first reported in wuhan, hubei province, china, on december , (zhu et al. ). early confirmed cases were mainly linked to a seafood wholesale market in wuhan (li et al. a; zhu et al. ). epidemiological studies indicate that the covid- epidemic has a basic reproductive number between and (li et al. a; wu et al. ), which is lower than that of severe acute respiratory syndrome (sars) (lipsitch ; riley et al. ).
wuhan is a main transportation hub in central china, and several million travelers ventured outward from the epidemic outbreak source in the first half of january, , due to annual chinese (lunar) new year holiday migrations. the large-scale outbreak started on january (the first confirmed case reported outside hubei province). although strict transportation screening measures were activated by many cities in the next - days, the epidemic rapidly spread nationwide in a week. moreover, covid- infections have been identified in other countries and the current epidemic has become a global concern (cohen and normile ; holshue et al. ; rothe et al. ). the world health organization (who) declared the covid- outbreak a public health emergency of international concern (pheic) on january (who b). there is evidence that the epidemic outbreak in china and elsewhere spread along the paths of travel from wuhan (li et al. b), and local outbreaks could appear in other major cities of china with time lags (wu et al. ). massive human movements via railways and domestic/international airlines from wuhan, and the timing of chinese new year, have enabled the virus to spread nationwide and worldwide (peeri et al. ). control measures (e.g., travel quarantine and restrictions) in wuhan were effective in delaying the overall epidemic progression in mainland china and reducing international case importations (chinazzi et al. ). the huanan seafood wholesale market and wuhan were recognized as the primary and secondary epidemic centers, respectively, and therefore, the movements of populations from the two sources influenced the generations of the covid- epidemic in mainland china, especially during the very early epidemic stage before transportation control measures were activated by wuhan and other cities.
the first-generation (primary) spread of the epidemic was in part reflected by the human movement from the primary source (i.e., the seafood market), and the second-generation (secondary) spread was reflected by that from the secondary center (i.e., wuhan city). they varied and interacted by region and time during the early epidemic progression, and offer potential clues for identifying the third-generation spreads in various regions, which are mainly caused by local cases instead of imported ones. here, using location-based service (lbs) data of mobile devices, we analyzed the spatiotemporal association of the confirmed covid- cases and human movements from the sources of the epidemic outbreak, and revealed the first, second and potential third generation spreads of the covid- epidemic in mainland china.

we collected spatiotemporal data of covid- cases in mainland china from the daily bulletins of the national health commission of the people's republic of china (nhc) and various provincial/municipal health commissions. some publicly available news and media were utilized as supplemental data. the final epidemic dataset was comparatively verified through the public platform of the -ncov-infected pneumonia epidemic from the chinese center for disease control and prevention (china cdc a). the dataset of the covid- cases includes the following fields: date (starting from january , ), province code/name, city code/name, and numbers of daily new suspected/confirmed cases. from the above dataset, we can generate the cumulative number of daily confirmed cases at a specific city s and until a given end date t, which is denoted by $y_{s,t}$. the human movements of populations from two epidemic sources (the huanan seafood wholesale market and wuhan) were considered to be associated with the spatiotemporal epidemic spread. the datasets of lbs requests from mobile devices were provided by wayz inc., shanghai, china.
the device trace datasets cover over % of the mobile devices supported by the three telecommunication operators in china. the lbs-requesting statistics are implemented every two hours with high-resolution location information. the raw data indicate the individual trajectories of numerous mobile devices with high-resolution spatiotemporal information, and can be easily aggregated at a specific spatial scale and within a given time step. for a subpopulation from the epidemic center, we can aggregate the device trace data from the start date to a given end date t, and the corresponding cumulative number at a specific city s is denoted by $x_{s,t}$. multiple lbs requests from the same device within a time step are counted only once. private individual information was deleted from the raw data of the mobile devices, and in this study, the device trace data was aggregated to the administrative cities and the epidemic date, i.e., the mobile device traces were associated with the epidemic dataset according to date and location. these aggregated statistics of mobile device traces are expected to be representative of the human migrations from the epidemic sources. two epidemic sources were considered, including the seafood wholesale market and wuhan city. the devices which activated their lbs requests in the market in november indicated the potential first-generation cases of the covid- epidemic, and the potential second-generation cases were those which were activated in wuhan in december and then traveled to other regions in january . $x_{s,t}^{(m)}$ and $x_{s,t}^{(w)}$ are used to denote the spatiotemporal trajectories of the above two subpopulations of mobile devices, respectively. all the processing and aggregation of mobile device trace data were implemented by the provider. the final datasets include the daily counts of two categories of trajectories in all the administrative cities in mainland china.
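the aggregation rule described above (count a device at most once per city and time step, then accumulate up to an end date t) can be sketched as follows. this is only an illustration under stated assumptions: the processing in the study was done by the data provider, and the record layout `(device_id, city, date)`, the function name, and the toy dates and city names are all hypothetical.

```python
from collections import defaultdict


def cumulative_device_counts(records, end_date):
    """Return {city: cumulative count} of device traces up to end_date.

    records: iterable of (device_id, city, date) tuples, where date is an
    ISO "YYYY-MM-DD" string (so string comparison orders dates correctly).
    A device is counted at most once per city and date (the time step here).
    """
    seen = set()
    counts = defaultdict(int)
    for device_id, city, date in records:
        if date > end_date:
            continue  # outside the accumulation window
        key = (device_id, city, date)
        if key in seen:
            continue  # repeated request by the same device in this time step
        seen.add(key)
        counts[city] += 1
    return dict(counts)


# Toy records (entirely made up for illustration).
records = [
    ("d1", "city_a", "2020-01-10"),
    ("d1", "city_a", "2020-01-10"),  # duplicate request, counted once
    ("d1", "city_a", "2020-01-11"),
    ("d2", "city_a", "2020-01-11"),
    ("d2", "city_b", "2020-01-12"),
    ("d3", "city_b", "2020-01-20"),  # after the end date, ignored
]
x = cumulative_device_counts(records, "2020-01-15")
print(x)
```

a daily time step is assumed here for simplicity; a finer step (for example the two-hour request interval) would only change what goes into the deduplication key.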
the cumulatively summed device traces had a spatially distributed consistency with the population distribution in mainland china (figure ). two categories of trajectories mainly spread to the provinces adjacent to hubei and several developed areas a longer distance from hubei, such as guangdong province, zhejiang province and beijing. we considered the spread of the epidemic from the source in various space and time domains, and the corresponding associations with human movements were analyzed in several temporal divisions and spatial scales. seven areas were delineated, including i) wuhan city, ii) hubei province excluding wuhan, iii) hubei province, iv) hubei's adjacent provinces (anhui, chongqing city, henan, hunan, jiangxi and shaanxi), v) mainland china excluding hubei, vi) mainland china excluding wuhan, and vii) mainland china. date periods were generated using three key date stamps, including january , (when the first confirmed cases were reported in wuhan), january (when the large-scale outbreak started) and january (the end of the first week of the large-scale outbreak). based on the above datasets of covid- cases in mainland china and two categories of location-based service data of mobile devices from the epidemic sources, we calculated their pearson correlation coefficient, r, and spatial stratified heterogeneity (ssh), q, statistics. pearson correlation is usually used to evaluate the linear association between two variables and calculated as follows:

$$r_{xy} = \frac{\sum_{s=1}^{n} (x_{s,t} - \bar{x})(y_{s,t} - \bar{y})}{\sqrt{\sum_{s=1}^{n} (x_{s,t} - \bar{x})^{2}}\,\sqrt{\sum_{s=1}^{n} (y_{s,t} - \bar{y})^{2}}}$$

where $r_{xy}$ denotes the correlation coefficient of covid- spatiotemporal spread and human migrations from the epidemic source, within the period from the start date to a given end date t. $y_{s,t}$ is the cumulative number of daily confirmed cases at city s and $x_{s,t}$ is the cumulative number of device trajectories from the epidemic source, with the mean values of $\bar{y}$ and $\bar{x}$, respectively. n is the number of the administrative cities in mainland china.
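as a minimal sketch of the pearson correlation defined above, the coefficient can be computed across cities from two equal-length lists, one holding the cumulative trajectories and one the cumulative cases for a fixed end date t. the function name and the toy numbers are hypothetical, and the significance test behind the reported p values is not included.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences.

    x: cumulative device trajectories per city (x_{s,t})
    y: cumulative confirmed cases per city (y_{s,t})
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


# Toy values for five hypothetical cities.
trajectories = [120, 45, 300, 10, 80]
cases = [30, 12, 75, 2, 25]
print(round(pearson_r(trajectories, cases), 3))
```

repeating the call for each end date t yields a temporal correlation curve of the kind the results section discusses.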
in this study, we calculated two pearson correlations with the spatiotemporal data of two categories of trajectories, $x_{s,t}^{(m)}$ and $x_{s,t}^{(w)}$, to explore the associations between the epidemic spread and the human migrations from the seafood market and wuhan, respectively. the geodetector q statistic is generally applied to quantitatively evaluate the ssh of an explained variable (wang et al. , ), and assess the determinant power of explanatory variables and their interaction, without linear assumptions (yin et al. ). the fundamental formula of the q statistic is given by:

$$q = 1 - \frac{1}{N\sigma^{2}} \sum_{h=1}^{L} N_{h}\sigma_{h}^{2}$$

where q is the determinant power of the factor to the objective. $N$ is the number of objective variable observations and $\sigma^{2}$ indicates the variance of all the observations. the objective is stratified into $L$ strata, denoted by $h = 1, 2, \ldots, L$, which is determined by the determinant factor. $N_h$ is the number of observations and $\sigma_h^{2}$ is the corresponding variance within stratum h. the value of q ranges from 0 to 1. we calculate the q statistic to assess the determinant power of human migrations from the epidemic source to covid- spatiotemporal spread. similarly, the spatiotemporal data of two categories of trajectories can be applied to calculate two q statistics for the two epidemic sources. within the period from the start date to a given end date t, we implemented the stratification by the equal-interval division after ordering the trajectory data, $x_{s,t}$, and divided all the observations into strata to calculate the q statistic of the cumulative trajectories, $x_{s,t}$, to the cumulative cases, $y_{s,t}$. this is a common way of stratifying numerical independent variables (yin et al. ), which can reduce the subjective influence of various stratifications on q statistics. moreover, for two or more determinant factors, an interaction q statistic can be calculated to measure their interaction influences (e.g., are they independent, or do they weaken/enhance each other?)
(wang et al. ). in this study, two categories of trajectories, $x_{s,t}^{(m)}$ and $x_{s,t}^{(w)}$, were used to implement the stratifications and the corresponding q statistics were calculated, respectively, which are denoted by $q^{(m)}$ and $q^{(w)}$. when the stratification is generated by the intersection between the above two individual stratifications, an interaction q statistic, $q^{(m \cap w)}$, can be calculated, where the symbol "∩" denotes the intersection between two strata layers. various interaction types can be defined according to the comparison between $q^{(m)}$, $q^{(w)}$ and $q^{(m \cap w)}$ (wang et al. ). for instance, "$q^{(m \cap w)} > q^{(m)}$ and $q^{(w)}$" indicates a bi-enhancement interaction between two categories of trajectories in facilitating the spread of the epidemic (see wang et al. for more details about the interaction q statistic). analyses in this study were performed with the use of the r software package (r foundation for statistical computing) and thematic mapping was implemented in the arcgis platform (esri).

similar to the spatial distributions of the mobile device traces (figure ), the pearson correlations r and q statistics between the cumulatively summed cases and two categories of trajectories up to january , had a spatially distributed consistency with the population distribution among the administrative cities in mainland china (figure ). two categories of trajectories had generally significant correlations and determinant powers of the epidemic spread, and both r and q decreased with distance from the epidemic sources. the first-generation and second-generation transmissions of the infection simultaneously appeared in many cities at the early stage of the outbreak. specifically, devices activated in the market displayed higher values of r and q in several small and medium cities than devices activated in wuhan city (figures a and c).
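the q statistic and its interaction variant described in the methods can be sketched in a few functions. this is an illustration rather than the study's implementation (the authors report using r): the function names, the toy data and the number of strata are hypothetical, population variance is assumed, and the labels for interaction types other than bi-enhancement follow the usual geodetector taxonomy as an assumption. significance testing is omitted.

```python
def variance(values):
    """Population variance (divides by the number of observations)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def equal_interval_strata(x, n_strata):
    """Assign each value of x to one of n_strata equal-width bins."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return [0] * len(x)
    width = (hi - lo) / n_strata
    # The maximum lies on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_strata - 1) for v in x]


def q_statistic(y, strata):
    """q = 1 - sum_h(N_h * var_h) / (N * var), with one label per observation."""
    total = len(y) * variance(y)
    if total == 0:
        raise ValueError("y has zero variance; q is undefined")
    groups = {}
    for label, value in zip(strata, y):
        groups.setdefault(label, []).append(value)
    within = sum(len(g) * variance(g) for g in groups.values())
    return 1.0 - within / total


def interaction_q(y, strata_a, strata_b):
    """q statistic over the intersection of two stratifications."""
    return q_statistic(y, list(zip(strata_a, strata_b)))


def interaction_type(q_a, q_b, q_ab, tol=1e-9):
    """Classify the interaction by comparing q_ab with the single q values."""
    if q_ab > q_a + q_b + tol:
        return "nonlinear-enhance"
    if abs(q_ab - (q_a + q_b)) <= tol:
        return "independent"
    if q_ab > max(q_a, q_b) + tol:
        return "bi-enhance"  # above each single q, below their sum
    if q_ab < min(q_a, q_b) - tol:
        return "nonlinear-weaken"
    return "uni-weaken"


# Toy data for six hypothetical cities.
x_market = [1, 2, 8, 9, 3, 7]
x_wuhan = [10, 40, 20, 50, 30, 60]
cases = [2, 5, 9, 14, 4, 12]
strata_m = equal_interval_strata(x_market, 2)
strata_w = equal_interval_strata(x_wuhan, 2)
q_m = q_statistic(cases, strata_m)
q_w = q_statistic(cases, strata_w)
q_mw = interaction_q(cases, strata_m, strata_w)
print(q_m, q_w, q_mw, interaction_type(q_m, q_w, q_mw))
```

because the intersected stratification is at least as fine as either single stratification, the interaction q can never fall below the larger single q for the same data; the weakening branches are kept only so the classifier covers the full taxonomy.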
it is clear that many cities executed a quick response and activated transportation control measures, which helped control the first-generation epidemic spreads. the r and q statistics of the devices activated in wuhan, however, indicate that the second-generation spread still influenced many cities in the first week of the outbreak ( figures b, d and table ). the market trajectories received a much higher pearson correlation value to confirmed cases in wuhan (r= . , p< . ) than hubei province excluding wuhan (r= . , p< . ) and mainland china excluding hubei (r= . , p< . ) . the correlations of wuhan trajectories were . , . and . in the above three areas, respectively. the temporal correlation curves of both market and wuhan trajectories have obvious decreasing trends from january to , in wuhan ( figure a) , which indicates the potential start date of the third-generation epidemic spread. one week after this, market trajectories had higher pearson correlation values than wuhan trajectories, and the first-generation spread still had a serious influence in wuhan ( figure a) . similarly, in hubei province excluding wuhan, the potential start date of the third-generation spread was from january to ( figure b) . moreover, the second-generation spread played a dominant role in the areas outside wuhan, especially in hubei province excluding wuhan and the provinces adjacent to hubei, since wuhan trajectories had much higher values of correlations ( figures b and c ). we found no obvious turning dates in the areas outside hubei ( figures c and d) , and the potential third-generation spread remains to be determined. the curves have remained stationary since january in mainland china excluding hubei ( figure d ). the transportation control measures activated by many cities since january appeared to have been successful in partially controlling the first-generation and second-generation epidemic spreads outside hubei province. 
we focused on the first week of the large-scale outbreak and calculated the q statistics of the two device-activation categories in introducing cumulative confirmed cases in various areas (table ) . the determinant powers of both categories were extremely high and consistent in wuhan (q= . , p< . ). their temporal curves had the obvious decreasing trends from january to ( figure a ), which validated the start date of the third-generation spread in wuhan. similar validation was observed in hubei province excluding wuhan ( figure b ). two categories of trajectories can explain nearly % ssh of the epidemic spread in wuhan before the large-scale outbreak and the ssh increased constantly since the third-generation spread stage ( figure a ). the market and wuhan trajectories had close determinant powers in introducing the epidemic spread in hubei province (q= . , q= . , respectively, and p< . ). the q statistics reported that these two categories explained . % and . % ssh of the confirmed cases in hubei. the determinant powers of the epidemic spread in hubei province excluding wuhan were . (p< . ) and . (p< . ), respectively. the q statistic values decreased in distance outside wuhan or hubei and showed that the determinant powers in mainland china excluding hubei were . (p< . ) and . (p< . ), respectively. in the first week of the outbreak, wuhan trajectories received higher values of q statistics than market trajectories in hubei province excluding wuhan and in provinces bordering hubei ( figures b and c) . the second-generation spread contributed more influence in the areas surrounding the epidemic source. however, both two categories had close q statistic values in mainland china excluding hubei ( figure d ). the epidemic outside hubei province appeared as a balanced pattern of mixed first-generation and second-generation spreads. 
furthermore, the q statistics increased constantly outside hubei province, indicating the increasing ssh of the epidemic spread ( figures c and d ). more attention should be given to control of the trend of second-generation spread and to eliminate potential third-generation spread. taking into consideration of the interaction influences of two categories of trajectories, the interaction q statistics were calculated in various areas (table ) . all the interaction types were bienhancement which indicates that two determinant factors (i.e., two categories of trajectories originated from two epidemic sources) enhance each other (the interaction q statistic is higher than each single q statistic but lower than the sum of two single q statistics). the determinant powers and interactions of two categories of trajectories in introducing the epidemic spread decreased in distance from the source to the rest of the nation. the interaction q statistic was . (compared to the single q statistics of . and . ) in mainland china excluding hubei. the interaction q statistic was . (compared to the single q statistics of . and . ) in mainland china. although the interaction strength was weak, the combination of both trajectory categories still carried more information about the spread of the epidemic throughout the country. the interaction q statistic of two categories of trajectories in hubei province excluding wuhan was . , which was close to the sum of two single q statistics ( . and . ) and much higher than each one individually. this interaction indicates strong bi-enhancement in facilitating the spread of the epidemic. two categories of trajectories could significantly enhance each other to explain the ssh of the epidemic spread from wuhan to other areas in hubei province. the majority of the earliest cases of the covid- atypical pneumonia were linked to the seafood wholesale market in wuhan, which is the most severely-affected city of the covid- outbreak. 
the movements of populations from these two epidemic sources provided potential first-generation and second-generation spreads nationwide and worldwide. here, based on lbs-requesting mobile device traces and spatiotemporal confirmed covid- case data, we applied pearson correlation and geodetector q statistics to analyze the spatiotemporal association between the confirmed cases' dynamic and human movements. our findings provide important foundations to quantify the effect of human movement on the epidemic spread, to judge the epidemic generations, and to inform ongoing and future control strategies. we concentrated on two datasets of lbs-requesting mobile devices associated with two sources linked to the first-generation and second-generation spreads provincewide and nationwide. their traces were aggregated by date in administrative cities and linked to the spatiotemporal confirmed cases. it is notable that the covid- outbreak had a strong consistency with human migrations from the epidemic sources. the confirmed cases had a clear linear correlation with two categories of trajectories from the sources to the rest of the nation. moreover, both trajectory categories could generally indicate the epidemic spread in hubei province and explain to a certain extent the ssh of the spread from wuhan to the rest of hubei province and throughout the rest of china. our analyses provide a new perspective to explore the spread of the epidemics linked to human movement. during the first week of the large-scale outbreak, the epidemic spread showed a spatially distributed consistency with the population distribution in mainland china. the majority of cities with confirmed cases had a mixed pattern of first-generation and second-generation spreads at the very beginning of the outbreak. many cities activated quick response within - days and achieved efficient results in inhibiting the first-generation spread outside hubei province. 
however, it still had a significant impact in hubei province, especially playing the dominant role inside wuhan city. furthermore, among the other cities in hubei province, the first-generation and second-generation spreads enhanced each other with a much higher interaction q statistic. this might be another signal to identify the potential start date of the third-generation spread in a specific area. due to the quick response and strict control measures in many cities, the interaction enhancement of the first-generation and second-generation spreads was weak outside hubei province. there is no evidence that any third-generation spread appeared outside hubei in mainland china in the first week of the outbreak. nevertheless, hubei's adjacent provinces require more effective control measures, since the first-generation and second-generation spreads had an increasing trend. our analyses determined an appropriate approach to explore the spatiotemporal association between the epidemic transmission and human movement. two categories of lbs-requesting mobile devices were used in this study to identify the potential close contacts to the primary and secondary epidemic sources. the datasets covered most devices with lbs requests in the given region and time period. however, the linkage between mobile devices and populations could be subject to information loss (e.g., users may replace their mobile devices with new ones). it is also extremely difficult to cover % of potential close contacts in our datasets. the close contacts of these two populations while traveling before/after the outbreak were not collected, and therefore we cannot estimate the potential third-generation cases and their movements. addressing this limitation is left to future work with more universal-source data and high-performance computing capabilities.
the covid- epidemic data were collected through publicly available sources, and we processed the data of confirmed cases and device traces at the spatial scale of cities. small-scale analyses could be more helpful to construct epidemic control programs in counties or communities within a city. the spatiotemporal association between the spread of the epidemic and human movements indicates a transfer from second to third generations of the infection. this approach has made it possible to assess the start date of the third-generation spreads of the covid- epidemic and the interactions between first-generation and second-generation spreads across various regions all over the country. the proposed technique incorporating location-based service data of mobile devices can help identify the spatiotemporal generations at the early stage of the covid- epidemic. it can be easily implemented and extended to the early exploratory study of other epidemics similar to covid- . the results indicate the spatiotemporal characteristics of the epidemic spread associated with human movements from epidemic sources and the potential spatiotemporal risks at the early stage of the outbreak. control measures varying by location and time could be executed at different levels for various regions. for instance, cities with obvious third-generation spread require the strictest controls on both exportations and internal quarantine; cities where the first-generation and second-generation spreads show strong interactive enhancement should pay more attention to importations and internal quarantine; and other cities need to focus on the control of importations.
in conclusion, we found that the third-generation spread of the covid- outbreak probably started during january to , in wuhan, the potential start date of the third-generation spread in hubei province excluding wuhan was from january to , and the mixed first-generation and second-generation spreads strongly interacted to enhance the epidemic. the trend of the interactions between the first-generation and second-generation spreads was significant in the provinces adjacent to hubei. the associations between the epidemic spread and human movements decreased with distance from the epidemic sources and had different temporal patterns, implying the potential epidemic generation-to-generation evolution on regional spatial scales. at the very beginning of the outbreak, the mixed first-generation and second-generation spreads appeared in most cities with confirmed cases. no obvious third-generation spread was detected outside hubei province. the strict transportation measures implemented in many cities appeared to have been effective in preventing any third-generation spread nationwide. the urgent control measures in hubei province include weakening the third-generation spread and the interaction influence of the first-generation and second-generation spreads. even with strict control strategies, effective measures to reduce transmission in the community are still required (li et al. a). a large increase in migration due to people returning from travel after the new year holiday also introduces challenges to epidemic control. we recommend the urgent control measures of preventing potential third-generation spread in mainland china, eliminating it in hubei, and reducing the interaction influence of first-generation and second-generation spreads. no individual data was collected and the ethical approval or individual consent was not applicable.
the lbs-requesting mobile device data were provided by wayz inc., shanghai, china and are not available for distribution due to the constraint in the consent. the dataset of the covid- cases is available from multiple public sources. this work was supported by the national natural science foundation of china ( ) , the national science and technology major project of china ( yfc ) and the science and technology major project of jiangxi province, china ( ybbgw ). the funders had no role in study design and conduct; data collection, management, analysis and interpretation; manuscript preparation, writing and review; decision to submit the manuscript for publication. conceptualization we declare no competing interests. public platform of the -ncov-infected pneumonia epidemic the effect of travel restrictions on the spread of the novel coronavirus (covid- ) outbreak new sars-like virus in china triggers alarm first case of novel coronavirus in the united states early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia potential of large 'first generation' human-to-human transmission of -ncov transmission dynamics and control of severe acute respiratory syndrome the sars, mers and novel coronavirus (covid- ) epidemics, the newest and biggest global health threats: what lessons have we learned? transmission dynamics of the etiological agent of sars in hong kong: impact of public health interventions transmission of -ncov infection from an asymptomatic contact in germany a novel coronavirus outbreak of global health concern. the lancet what to do next to control the -ncov epidemic? 
the lancet geographical detectors-based health risk assessment and its application in the neural tube defects study of the heshun region, china a measure of spatial stratified heterogeneity emergency committee regarding the outbreak of novel coronavirus ( -ncov) nowcasting and forecasting the potential domestic and international spread of the -ncov outbreak originating in wuhan, china: a modelling study. the lancet mapping the increased minimum mortality temperatures in the context of global climate change a novel coronavirus from patients with pneumonia in china

we thank dr. adam thomas devlin at the school of geography and environment, jiangxi normal university for the assistance in the proofreading work for the manuscript.

key: cord- - ra r v authors: liu, fenglin; wang, jie; liu, jiawen; li, yue; liu, dagong; tong, junliang; li, zhuoqun; yu, dan; fan, yifan; bi, xiaohui; zhang, xueting; mo, steven title: predicting and analyzing the covid- epidemic in china: based on seird, lstm and gwr models date: - - journal: plos one doi: . /journal.pone. sha: doc_id: cord_uid: ra r v in december , the novel coronavirus pneumonia (covid- ) occurred in wuhan, hubei province, china. the epidemic quickly broke out and spread throughout the country. it has now become a pandemic that affects the whole world. in this study, three models were used to fit and predict the epidemic situation in china: a modified seird (susceptible-exposed-infected-recovered-dead) dynamic model, a neural network method lstm (long short-term memory), and a gwr (geographically weighted regression) model reflecting spatial heterogeneity. overall, all three models performed well with great accuracy. the dynamic seird prediction ape (absolute percent error) of china had been ≤ . % since mid-february. the lstm model showed comparable accuracy. the gwr model took into account the influence of geographical differences, with $r^2$ = . % in fitting and . % in prediction.
the wilcoxon test showed that none of the three models outperformed the other two at the significance level of . . the parametric analysis of the infectious rate and recovery rate demonstrated that china's national policies had effectively slowed down the spread of the epidemic. furthermore, the models in this study provided a wide range of implications for other countries to predict the short-term and long-term trend of covid- , and to evaluate the intensity and effect of their interventions.

novel coronavirus pneumonia (coronavirus disease , covid- ) first broke out in wuhan, hubei province, china in december , then the epidemic became prevalent in the rest of the world. in the research on covid- so far, through the comparison of the gene sequence of the virus with that of the mammalian coronavirus, some studies found that its source may be related to bats, snakes, minks, malayan pangolins, turtles and other wild animals [ ] [ ] [ ] [ ] . covid- can also cause severe respiratory symptoms such as fever and cough [ ] , and there is a possibility of transmission after symptoms of lower respiratory diseases [ ] . however, unlike sars-cov and mers-cov, covid- is separated from airway epithelial cells of patients [ ] , yet the mechanism of receptor recognition is not consistent with sars [ ] . therefore, the pathogenicity of covid- is less than that of sars [ ] , and its transmissibility is higher than that of sars [ ] . in addition, this new coronavirus presents human-to-human transmission [ ] , and close contact could lead to group outbreaks [ ] . as of july th, , , confirmed cases and , deaths had been reported in china [ ] . in addition to china, there are over countries and regions in the world with a total of , , confirmed cases and , deaths [ ] . the outbreak of covid- happened right before the lunar new year, which is the typical chinese spring festival travel period.
with a population of over million, wuhan is one of the major transportation hubs in china as well as a core city of the yangtze river economic belt. the time and location of the outbreak further led to the rapid spread of the epidemic in china [ ] . since there is still no vaccine or antiviral drug specifically for covid- , the government's policies or actions play an important role in flatting the epidemic curve [ ] . from the perspective of public health, the interventions of wuhan government have achieved the purpose of reducing the flow of people and the risk of exposure to the diagnosed patients, and also effectively slowed down the spread of the epidemic [ ] . nevertheless, covid- can be transmitted by asymptomatic carriers [ ] , and some of the recovered patients may still be virus carriers [ ] . in order to implement non-pharmaceutical interventions more effectively, we used a combination of epidemiological methods, mathematical or statistical modeling tools to provide valuable insights and predictions as benchmarks. for the study of infectious diseases like covid- , sars, and ebola, most of the literature used descriptive research or model methods to assess indicators and analyze the effect of interventions, such as combining migration data to evaluate the potential infection rate [ , ] , understanding the impact of factors like environmental temperature and vaccines that might be potentially linked to the diseases [ , ] , using basic and time-varying reproduction number (r & r t ) to estimate changeable transmission dynamics of epidemic conditions [ ] [ ] [ ] [ ] [ ] [ ] , calculating and predicting the fatal risk to display any stage of outbreak [ ] [ ] [ ] , or providing suggestions and interventions from risk management and other related aspects based on the results of modeling tools or historical lessons [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . some literature only used one kind of model to simulate and predict the course of diseases. 
for instance, some used relatively common epidemiological dynamics models such as seir or sird to forecast epidemic trends and peaks in certain provinces or even worldwide [ , [ ] [ ] [ ] [ ] [ ] ; some applied other types of statistical models, such as logistic growth models or time series approaches, to analyze the epidemic situation [ , ] ; and others developed new models to support more complex epidemic trajectories or to predict the number of confirmed cases and the spatial progression of outbreaks [ ] [ ] [ ] . several studies extended the basic epidemic dynamics models. examples include combining a border protection mechanism with the seir model to better identify high-risk groups and infected cases [ ] ; adding the effect of media or awareness to basic models to assess whether these outside influences could change the transmission mode of infectious diseases [ , ] ; and, building on the transmission routes contained in dynamic models, using a multiplex network model or transmission network topology to analyze the outbreak scale and epidemic spread more accurately [ , ] . a small number of studies combined the analytical capabilities of two types of models, such as the seir model and a recurrent neural network (rnn) model, to determine whether certain interventions could affect the results of outbreak control [ ] . however, our literature search did not find any covid- study that used geographically weighted regression (gwr). there is also a lack of understanding of how well different algorithms predict the epidemic curve. in this study, seird, an extension of the seir model, was used to simulate the epidemic situation in china and to predict the number of confirmed and cured cases in each province and in several major chinese cities. an lstm model combined with traffic data and a gwr model were used to predict the number of confirmed patients.
specifically, the gwr model, which captures geographical differences, was used to predict the development of the epidemic and to analyze the impact of geographical factors. this paper also compares the characteristics and prediction ability of these models. in the absence of vaccines and drugs for covid- , it makes sense to use multiple models to simulate outbreaks and to show the intensity of the non-pharmaceutical interventions needed to control them. daily updated covid- epidemiological data used in this study were retrieved from the national health commission of china [ ] and accessed via https://github.com/wybert/openwuhan-ncov-illness-data. the daily outbound flow from wuhan city and the relevant migration indices from january to march were collected from an online platform called baidu qianxi [ ] . the demographic data and medical resources data were taken from the china urban statistical yearbook published by the national bureau of statistics, as shown in s table. this study used the seird model; the changes in the status of the susceptible (s), exposed (e), infected (i), recovered (r) and dead (d) populations within the total population (n) are shown in fig . according to the medical characteristics and clinical trials of covid- , both confirmed patients and asymptomatic carriers can transmit the virus. therefore, susceptible people have a certain chance of becoming infected after they come into contact with exposed or infected individuals [ ] . carriers in the exposed status may develop obvious symptoms after the incubation period and be diagnosed, or they may recover. the final status of individuals falls into two categories: recovery through the combined effects of hospital treatment and the patient's own immune response, or death in the absence of effective treatment. in the model formula, the infectious rate β needs to be adjusted over time to follow the trend of disease development.
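the compartment flows described above can be sketched in code. this is a minimal illustration under stated assumptions, not the equations of this study: the right-hand sides are one plausible reading of the text (susceptible individuals are infected through contact with both exposed and infected individuals; the exposed either become infected or recover; the infected either recover or die). the function names, the parameter names gamma_e and gamma_i, the euler integration scheme and all numerical values are hypothetical. the infectious rate β is passed in as a function of time because it has to be adjusted over the course of the epidemic; no particular form of that adjustment is assumed.

```python
from typing import Callable, List, Tuple

# (S, E, I, R, D): susceptible, exposed, infected, recovered, dead
State = Tuple[float, float, float, float, float]


def seird_derivatives(state: State, beta: float, alpha: float,
                      gamma_e: float, gamma_i: float, k: float) -> State:
    """Assumed SEIRD right-hand side (an illustration, not the paper's equations)."""
    s, e, i, r, d = state
    n = s + e + i + r + d                  # total population N, assumed constant
    new_exposed = beta * s * (e + i) / n   # contact with exposed OR infected
    return (
        -new_exposed,                          # dS/dt
        new_exposed - (alpha + gamma_e) * e,   # dE/dt: E -> I or E -> R
        alpha * e - (gamma_i + k) * i,         # dI/dt: I -> R or I -> D
        gamma_e * e + gamma_i * i,             # dR/dt
        k * i,                                 # dD/dt
    )


def simulate_seird(initial: State, beta_fn: Callable[[float], float],
                   alpha: float, gamma_e: float, gamma_i: float, k: float,
                   days: int, steps_per_day: int = 10) -> List[State]:
    """Integrate with explicit Euler steps; return one state per day (day 0 included)."""
    dt = 1.0 / steps_per_day
    state = initial
    daily = [state]
    for day in range(days):
        for step in range(steps_per_day):
            t = day + step * dt
            deriv = seird_derivatives(state, beta_fn(t), alpha, gamma_e, gamma_i, k)
            state = tuple(x + dt * dx for x, dx in zip(state, deriv))
        daily.append(state)
    return daily


if __name__ == "__main__":
    # Illustrative values only; none of them come from the study.
    trajectory = simulate_seird(
        initial=(9990.0, 0.0, 10.0, 0.0, 0.0),
        beta_fn=lambda t: 0.4,   # constant beta for the demo
        alpha=0.2, gamma_e=0.05, gamma_i=0.1, k=0.01, days=60,
    )
    print("state on the final day (S, E, I, R, D):", trajectory[-1])
```

passing β as a callable keeps the compartment logic separate from whatever time dependence is chosen for the infectious rate, so a different schedule for β can be swapped in without touching the derivatives.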
in the middle and late stages of the epidemic, the number of daily new cases decreased significantly due to the positive influence of government policies. thus, to better fit the model, we added an attenuation factor desc to β. based on the basic seird model formulas [ , ] , our modified model is given by eqs ( ) ( ) ( ) ( ) ( ) ( ) . here, the parameter t denotes the time; β is the infectious rate; α is the rate at which the exposed become infected; γ is the recovery rate for the exposed; γ is the recovery rate for the infected; k is the mortality rate; "desc" is the attenuation factor for β, so that β decays exponentially when