key: cord-0819041-zyn8p6w9 authors: Pirouz, B.; Javadi Nejad, H.; Violini, G. title: Swab Tests and COVID 19, Italy case studied using Artificial Intelligence, Statistical Analysis and MLR date: 2020-06-04 journal: nan DOI: 10.1101/2020.06.02.20120394 sha: d96e9462ae7395570b922193e79cadbba008571d doc_id: 819041 cord_uid: zyn8p6w9 Abstract: Objectives: The outbreak of the new Coronavirus (COVID-19) pandemic has prompted investigations on various aspects. This research aims to study the possible correlation between the numbers of swab tests and confirmed cases of infection, with special attention to the sickness level. Study Design: The study is carried out with reference to the Italian case, but the result is of more general importance, in particular for countries with limited availability of ICUs (intensive care units). Methods: The statistical analysis shows correlation between the number of swab tests and those of daily positive cases, mild cases admitted to hospital, intensive care cases, recovery, and death rate, and provides a basis to carry on an AI study. The results were validated using a multivariate linear regression (MLR) approach. Results: Our main result is the identification of a significant statistical effect of reduction of the pressure on the Health system as result of the increase of the tests. The relevance of this result is not confined to the COVID-19 outbreak, because the high demand of hospitalizations and ICU treatments due to this pandemic has an indirect effect on the possibility of guaranteeing an adequate treatment for other high-fatality disease, such as e.g. cardiological, and oncological. Conclusions: Our results show that swab testing may play a major role to decrease the stress on the Health system of a country. Therefore, this case study is relevant in particular for the planning of the control of the pandemic in countries with a limited capacity of admission to ICU s units. The pandemic COVID-19 disease has been extensively studied considering several different aspects. Virologic studies [1] have investigated virus genoma, proving its natural origin and studying its mutations. Sociological and economical studies have considered the changes in lifestyle and the economic and sustainable development effects [2] [3] [4] [5] correlated with non-pharmacological actions aimed to control the pandemic spread., such as lockdown [6, 7] , border closure, social distancing. Other studies focused on the mechanisms and times of incubation [8] , resistance and dynamics of transmission [9] [10] [11] , environmental factors and conditions, like the role of UV and climate [12] [13] [14] [15] . A large number of studies have been dedicated to statistical assessment methods and modeling of the outbreak, usually considering the number of positive cases [16] [17] [18] [19] , of active cases or of the deaths and intensive care hospitalizations [20, 21] and the health risks [22, 23] . A problem that has been object of much attention is the identification of infected people. It covers two different aspects, namely the physical identification of infected and contagious asymptomatic people, and the identification of people who have been infected and are expected to be no longer susceptible of infection. Swab testing is the usual procedure to assess experimentally the former one which otherwise could only be inferred through statistical studies [16] . In the last month in Italy, country that in the early days of the pandemic was second only to South Korea for these tests, there has been an acceleration on the use of these tests (although it should be noticed that one third of the tests was applied to people already tested at least once) and there was a significant reduction of the active cases [24] . The use of swab testing is growing in many countries as a method to fight the increasing rate of the pandemic [25, 26] . One element which is relevant for this purpose is the criterion and aim used to decide whether and to whom apply the test. Besides being a verification of recovery, this testing may be used for detection of infected people and in this case the test of random samples is opportune [17] or for the early identification of mildly infected people, which, to some extent, was actually dominant in Italy. A broad literature exists about the possibility that these tests provide a wrong answer. Gray et al. (2020) discussed the impact of this and concluded that no test would be better than a bad test [27] . In this context it must be observed that there is an important statistical difference between wrong positive and wrong negative tests [28] . Eberhardt et al. (2020) analyzed a multi-stage group testing method for testing large populations and showed that such a method would be more efficient than individual testing [29] . George and Wang (2020) , analyzed the variations in testing and reporting of confirmed cases of COVID-19. Their results show the advantages of accurate scale mass and systematic tests [30] . Corman et al. (2020) determined how the tests could affect the treatment and the benefits of mild case detection [31] and to some extent our study goes in the same direction. These studies show the importance and interplay, in COVID-19 investigation, of several variables, among which in particular swab testing. This research studies the statistical correlation between the number of swab tests and those of daily positive cases, mild cases admitted to hospital, intensive care cases, recovery, and death rate, in order to assess the role of such a systematic test for controlling the COVID-19 epidemic and reduce the pressure on the health system. The result of this research allowed us to identify certain output variables for which would be convenient to use an AI approach to study their correlation. Finally, the results of these analysis were validated using a MLR technique. We use the available COVID-19 data in Italy and six regions, to explore the possible existence of correlations between the daily number of swabs (Swabs), and those of home isolated patients, of patients admitted to hospital but in mild condition (Mild hosp), of patients treated in the intensive care units of an hospital (Int. care hosp) and of daily deaths and daily new cases. The approach of the study is based on the p-value, Beta coefficient, and T-test. First, in order to analyze the possible impact of daily swabs on the other variables, the datasets were divided into two periods. The statistical analysis was done for each of them and the results of the two analyses compared with each other. For each case study, in order to select the day of splitting the whole period in two periods, we first calculated the total average number of daily swabs, and then looked for the date when the number of daily swabs exceeded that average at least in three consecutive days. That date was selected for the splitting. As next step, Machine Learning through ANN (Artificial Neural Network) has been used to investigate by Artificial Intelligence the correlations between the output variables of the statistical analysis. For Machine Learning, the variables were selected on the basis of the results of the statistical analysis, considering those that have the most significant relationship with the number of daily tests (Swab Tests). For the validation of the ANN model results, MLR (Multivariate Linear Regression) with the same variables of the ANN model has been used and a prediction equation provided. The flowchart of our analysis is presented in Figure 1 . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120394 doi: medRxiv preprint In this study, we analyze the relevant datasets of six regions in Italy. Moreover, the analysis has also been carried out for the whole of Italy. The location of the six regions is shown in Figure 2 and the details of the selected case studies are presented in Appendix A. Mathematical modeling and statistical analysis are a tool widely used in various fields, e.g. natural and human hazards [35] [36] [37] and for many phenomena, including hazards and optimization [38, 39] . In all these approaches it is essential to express the problem in form of an equation or a model [40] [41] [42] [43] [44] [45] [46] . Due to the complexity of the natural phenomena, several criteria can be used to obtain the best mathematical modeling for a given problem [47] [48] [49] [50] [51] . Our purpose is to study the effect of an independent variable on one or more dependent variables, and to assess the possible existence of a relationship between them. An efficient statistical technique to achieve this, is to evaluate a coefficient, the beta coefficient, that measures the degree of change in a dependent variable (outcome variable) in correspondence with a 1-unit change in the independent variable (predictor variable). Of course, the Beta coefficient value can be negative or positive. This quantitative evaluation makes sense only if the possible existence of a significant relation between the two variables has been previously assessed. This is established by evaluating another parameter, the p-value, whose value, if smaller than 0.05 (or confidence level of 95%), indicates the existence of a significant correlation between the considered variables [52] . The set-up of the SPSS Model is presented in Appendix B. Our first analysis refers to the global country dataset. The results of the correlations between swabs number and the output variables in the two periods are presented in Table 1 . and As shown in Table 1 , in Italy the average number of the independent variable (swabs) increased significantly by 3.12 times. This is compared with the variation in the same period of the outcome variables. One sees that this increase was accompanied by a significant increase of the average number of home-isolated patients (4.57 times), Mild Hosp cases (2.00 times), and Int Care Hosp cases (1.49 times). Instead, as indicated by the corresponding p-values, the increase by a factor 1.29 in the average number of Daily Deaths, and the decrease by a factor 0.98 in the average number of Daily New Cases were not significant. For what concerns the relationship between the daily swabs and the outcome variables, Table 2 , shows that during the first period (1 March to 31 March) it was significantly positive for each of the five variables. During the second period (1 April to 30 April), the correlation with Home isolation remained significantly positive, those with Mild hosp, Int. care hosp, and Daily deaths, became significantly negative, and that with Daily new cases, although turned into negative, was not significant. The analysis at regional level presented in Appendix C. As the results of Table 2 . show, the number of total home isolation, total mild hospital, and intensive care in the hospital changed significantly with the number of swabs, whereas for the Daily new cases this happens only for the first period. On the other hand, increasing the number of swabs can only be related to new home isolation cases and mild hospital, without connection with that of ICU cases. Therefore, the variables we selected for this analysis were Daily new cases, Total home isolation, and Total mild hospital, and the dependent variable was total Intensive cases in Hospital. One can suspect that there exists a correlation between them and their daily variation. In this section, we shall investigate this possible correlation using the ANN method to calculate the Intensive care number. The results of the model including train and test of the ANN algorithm are presented in Table 3 . and the comparison . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 4, 2020. . between the prediction values and real data in Figure 5 . It shows that AI is a powerful tool for the study of our problem and this is confirmed by the value of the final R 2 . Additionally, it confirms the results of the previous statistical analysis on swab-output variables correlations. The impact of each variable on Intensive case in the hospital is shown in Figure 6 . According to Figure 6 , the dominance of the role of Mild hospital over home isolation for the impact on the number of intensive case in the hospital is evident. In this section we summarize the results of the previous section. The first part of our statistical analysis was based on T-test. In Italy, during the period we analyzed (March-April) there was a , - . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 4, 2020. . significant change on the number of swab tests. At country level, one can say that there was a first period (March) with a country average of about 15000 tests per day, and that this average grew more than three times in the following month. Analogous increases in the number of tests are observed at regional level, although they are probably also sensitive to the actual level of development of the pandemic in the specific region. This preliminary analysis confirms that there was a significant change in the application of these tests. This supports the hypothesis that such a change may be reflected on the indicators that describe the level of development of the pandemic. To test this hypothesis, we carried out a statistical analysis of the possible correlation between the number of tests and that of contagious people, the recovered ones and the deaths. At national level, the hypothesis was confirmed with the exception of the daily new cases. These correlations generally are observed also at regional level, although minor deviations may occur. In particular these deviations seem to be sizeable in Emilia-Romagna and Veneto and this may find an explanation in the improvement of the situation in these two regions during the month of April [19] . This also happened with respect to the number of deaths. As it might have been expected, we found that with the increase of the number of tests in the second period the correlation between the death rate and the number of swab tests, considerably diminished with respect to the first period. Instead, significant correlations were generally found for the three variables of the outmost importance for what concerns the health system stresses related to the total number of cases, and to a less advanced level of development of the sickness. The evidence is stronger in the regions where the pandemic spread more. Of course, this discussion must take into account the multiple reasons that lead to apply the test. This means that in any case only a fraction of those actually applied can be related to the variables we studied, but this can be estimated to be only a second order effect. The last part of our analysis had a different purpose. Artificial Intelligence are a powerful tool to analyze a large number of data to detect possible correlations between them. We applied this technique to study one of the most critical aspects of the pandemic, the admission to ICUs. We find a very good agreement with the observed number of ICU admission. This aspect must not be underestimated, when one takes into account that the stress on ICU is probably one of the most critical factors to take into account when planning the response to the pandemic. In the specific case of Italy, when the peak shown in Figure 5 . was reached, the value of ICU hospitalizations was close to the capacity of response of the Health system, although it was planned for a couple of weeks later a 40% increase of the available ICU units [53] . We recall that Italian ICU capacity was then in the order of 120 ICUs per million population, and it must be underscored that this is a high performance, in compare with different countries as presented in Appendix D. A validation of our results on the role of swabs is provided by the determination of a prediction equation obtained using MLR. We already pointed out the usefulness of the swabs to identify infected people without symptom who then would be home isolated. The variables for our MLR analysis are the total number of Home isolation (x1), that of mild condition admissions to hospital (x2) and that of Daily new cases (x3). These variables are used as independent variables to determine as dependent variable, y, the total number of patients admitted in the ICUs of a hospital. The prediction equation we obtain is , with R 2 = 0.995 Table 4 . provides the details of the analysis and the values of the beta coefficients show that the highest impact variable is x2 followed by the variable x1. We notice that the increase in the number of home isolation cases that we found to be correlated to that of swab tests is associated to a decrease of the total number of patients admitted to hospital intensive care units. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 4, 2020. . We studied the statistic correlation between the number of applied swab tests and the most important indicators of the level of development of the COVID-19 pandemic, in Italy and in six Italian regions, four chosen on the basis of the diffusion of the pandemic and the remaining two on the basis of the size of their population. This analysis was accompanied by one using AI techniques and validated through an MLR. Our results show that swab testing may play a major role to decrease the stress on the Health system of a country. Therefore, this case study is relevant in particular for the planning of the control of the pandemic in countries whose capacity of admission to ICU´s units is limited. We are grateful to Luigi Grasso for an interesting discussion that stimulated this research, and to Professor Manlio Gaudioso and Dr. Agostino Gnasso for their critical reading of the manuscript. Funding: This research received no external funding. Ethical approval: No ethical approval was required for this study. All of the used data is available in the cited websites with free access for public and all of the articles in literature reviews are cited properly. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120394 doi: medRxiv preprint Appendix A: The details of the selected case studies Our analysis covers the period from beginning of March to the end of April. Four out of the six regions we analyzed, namely Lombardy, Piedmont, Emilia-Romagna and Veneto, are those where the outbreak has been stronger. All of them are located in Northern Italy. The remaining two, Campania and Sicily, belong to the Southern part of Italy, and have been selected because the size of their population is similar to that of three of the four Northern regions. • Number of Daily Deaths; The analysis method proceeded as discussed in the previous section. • Use of the Independent-Samples T-Test: The Independent-Samples T-Test is used to test the existence of significant differences (with a confidence level of 95%), between the averages of two groups of data. We recall that our dataset was splitted in two subsets on the basis of the comparison of the daily number of tests with its average over the full period. The T-test was applied both to the independent variable (Swabs) and to the dependent variables (Home isolation, Mild hosp, Int. care hosp, daily deaths, and daily new cases) in order to determine whether the comparison between the two periods of the corresponding data presents a significant difference. • Evaluation of the p-value: If the result of p-value coming from the comparison of the independent variable (Swabs) and each dependent variable (Home isolation, Mild hosp, Int. care hosp,) is less than 0.05, then the relationship between the two variables is considered to be significant and we proceed to the calculation of the Beta coefficient. If instead the p-value is larger than 0.05, the significance of the relationship is excluded. • Evaluation of the Beta coefficient: A positive (negative) value of this coefficient means that for every 1-unit increase (decrease) in the predictor variable (swabs number in our case), the outcome variable increases (decreases) by an amount equal to the value of the beta coefficient. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120394 doi: medRxiv preprint Appendix C: The correlations between swabs and other variables at regional level The global analysis at country level makes interesting to check whether the global trends we found are shared also at local level in the six regions we are considering. We recall that together these six regions account for 57.3% of the country population. We present below the results of the regional analysis of the correlations between the number of swabs and those of the outcome variables, in two different periods determined through the procedure described in Section 2. Tables C1 and C2 refer to Lombardy, Veneto and Piedmont for which data are available from March 1, and Tables C3 and C4 to Emilia-Romagna, Campania and Sicily, for which the first day for which data are available was different. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 4, 2020. . Table C1 , we can say that: • In the three regions, there has been a remarkable increase in the average number of daily tests. In Lombardy by a factor 2.48, in Veneto by a factor 2.64, and in Piedmont by a factor 3.69 Against this increase: • The average number of home isolated and Mild Hosp increased significantly in all of them. • The average number of Int Care Hosp also increased everywhere, although in Piedmont no significantly. • Daily New Cases exhibit a general decrease that is not significant • Daily Deaths decreased (not significantly) in Lombardy and significantly increased in Piedmont and Veneto. . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 4, 2020. . Table C2 , can be summarized as follows: • In the first period, in each of the three regions the global Italian result was confirmed, and one finds a significantly positive correlation between the number of swabs and that of each outcome variable. • In the second period, whereas for Home isolation there is a positive correlation, significant only in Lombardy, all the other correlations turn to negative, generally in a significant way, the only exceptions being Daily new cases in all of the three regions and Daily deaths in Veneto and Piedmont. To summarize the T-test results of Table C3 , we can say that: • In the three regions there has been a remarkable increase in the average number of daily tests. In Emilia-Romagna by a factor 2.18, and of the order of a factor 3.5 in Campania and . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted June 4, 2020. . Sicily, although one may observe that in the last two regions the average in the first period was quite low. Against this increase: • The average number of home isolated and Mild Hosp increased significantly in all of them. • The average number of Int Care Hosp also increased everywhere, but the increase was significant only in Emilia-Romagna. • For what concerns Daily New Cases no significant variation was observed in Campania, differently from Emilia-Romagna and Sicily where there was a significant decrease. • The variations in Daily Deaths were not significant The correlation analysis of Table C4 . can be summarized as follows: • In the first period, in each region the global Italian result was confirmed with a significantly positive correlation between the number of swabs and that of each outcome variable. Sicily to . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 4, 2020. . some extent is the only partial exception, since the relationship for Int.care hosp, although positive, did not reach the level of being significant. • In the second period, in the case of Emilia-Romagna there was a general turn from positive to negative relationship, but without arriving at the level of being significant. In Campania, Home isolation was positively, but not significantly correlated, whereas all the other variables exhibited a significantly negative correlation. In Sicily we found two cases of significant correlation (Home isolation, positive, and Int. care hosp, negative) and a no significant indication of negative correlation for the remaining three variables. The data for other countries can be found in the studies of Prine and Wunsch (2012), and Rhodes and Moreno (2012) [57, 58] The proximal origin of SARS-CoV-2 Phylogenetic network analysis of SARS-CoV-2 genomes Development of an assessment method for evaluation of sustainable factories A new multi-objective dynamic model to close the gaps in sustainable development of industrial sector The Role of the Extensive Green Roofs on Decreasing Building Energy Consumption in the Mediterranean Climate The Role of Power Consumption and Type of Air Conditioner in Direct and Indirect Water Consumption Lockdown may partially halt the spread of 2019 novel coronavirus in Hubei province, China., medRxiv preprint The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science. 2020. 1-7 Persistence of coronaviruses on inanimate surfaces and its inactivation with biocidal agents Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia Importation and human-to-human transmission of a novel coronavirus in Vietnam Relationship between average daily temperature and average cumulative daily rate of confirmed cases of COVID-19, medRixv The possible roles of solar ultraviolet-B radiation and vitamin D in reducing case-fatality rates from the 1918-1919 influenza pandemic in the United States Investigating a serious challenge in the sustainable development process: analysis of confirmed cases of covid-19 (new type of coronavirus) through a binary classification using artificial intelligence and regression analysis Prioritizing and Analyzing the Role of Climate and Urban Parameters in the Confirmed Cases of COVID-19 Based on Artificial Intelligence Applications A Time-dependent SIR model for COVID-19 A novel approach to evaluate the asymptomatic COVID19 contagions Artificial intelligence forecasting of covid-19 in china Development of an Assessment Method for Investigating the Impact of Climate and Urban Parameters in Confirmed Cases of COVID-19: A New Challenge in Sustainable Development Differenze nazionali e regionali riguardo il COVID-19 Does COVID-19 have an internal clock? Or is Pandemic development anthropic? CIF preprint Effects of misleading media coverage on public health crisis: a case of the 2019 novel coronavirus outbreak in China Looming threat of COVID-19 infection in Africa: act collectively, and fast. The Lancet Coronavirus: qual è il numero (vero) di persone sottoposte ai tamponi in Italia? Jefferson Parish testing sites expand to anyone showing symptoms of COVID-19 Coronavirus testing: How it is done, when you should get one and how long results take No test is better than a bad test: Impact of diagnostic uncertainty in mass testing on the spread of Covid-19, medRxiv preprint Multi-Stage Group Testing Improves Efficiency of Large-Scale COVID-19 Screening, medRxiv preprint The proportion testing positive for SARS-COV-2 among the tested population in the U.S.: Benefits of the positive test ratio under scaled testing scenarios, medRxiv preprint Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR Application of fuzzy logic for performance evaluation of academic students A computational approach based on the -constraint method in multi-objective optimization problems A Computational Algorithm Based on Normalization for Constructing the Pareto Front of Multiobjective Optimization Problems Optimizing Rainwater Harvesting Systems for Non-potable Water Uses and Surface Runoff Mitigation New Mathematical Optimization Approaches for LID Systems Test Case Optimization Using Artificial Bee Colony Algorithm. Advances in Computing and Communications A formal Model for Component-Based Software Gbest-guided artificial bee colony algorithm for numerical function optimization Flexible job shop scheduling problems by a hybrid artificial bee colony algorithm Applying particle swarm optimization to software testing A Greedy Particle Swarm Optimization Strategy for T-way Software Testing A survey of swarm algorithms applied to discrete optimization problems. Swarm Intelligence and Bio-inspired Computation: Theory and Applications Charged-particle therapy in cancer: clinical uses and future perspectives Integrated beam orientation and scanningspot optimization in intensity-modulated proton therapy for brain and unilateral head and neck tumors Combinatorial optimization in radiotherapy treatment planning Current and future accelerator technologies for charged particle therapy A stochastic approach to full inverse treatment planning for charged-particle therapy Research Design and Statistical Analysis: Published by Routledge La situazione dei posti di terapia intensiva in Italia AdminStat, Maps, analysis and statistics about the resident population Available online: Available online The Countries Critical with the most critical care beds per capital International comparisons of intensive care: informing outcomes and improving standards Intensive care provision: A global problem, Rev Bras Ter Intensiva