key: cord-0050835-hjin7xoc authors: Kumar, Sanjay title: Use of cluster analysis to monitor novel coronavirus-19 infections in Maharashtra, India date: 2020-08-21 journal: Indian J Med Sci DOI: 10.25259/ijms_68_2020 sha: 17aa1f1b22b920aee91e4d02a45cccf4fe66a664 doc_id: 50835 cord_uid: hjin7xoc OBJECTIVES: The novel coronavirus disease (COVID-19) has been spreading continuously in almost all the districts of the state of Maharashtra, India. As a part of healthcare management development, it is very important to monitor the districts affected by COVID-19. The main objective of this study was to identify and classify the affected districts into clusters on the basis of similarities within a cluster and dissimilarities among different clusters, so that government policies, decisions, and medical facilities (ventilators, testing kits, masks, treatment, etc.) could be improved to reduce the number of infected and deceased persons and hence increase the number of cured cases. MATERIAL AND METHODS: In the study, we focused on the COVID-19 affected districts of the state of Maharashtra, India. We applied agglomerative hierarchical cluster analysis, a data mining technique, to fulfill the objective. The Elbow method was used to obtain an optimum number of clusters for further analysis. The variation among clusters for each of the variables was studied using box plots. RESULTS: The Elbow method suggested an optimum of three clusters for each of the variables. For confirmed and cured cases, cluster I corresponded to the districts BI, GO, ND, PA, SI, WS, JN, CH, OS, HI, NB, JG, RT, LA, KO, AM, ST, BU, DH, AK, YTL, SN, AH, SO, AU, RG, NG, NS and PL. Cluster II corresponded to the districts TH and PU and cluster III corresponded to the district MC. 
For the death cases, cluster I corresponded to the districts BI, GO, ND, PA, SI, WS, JN, CH, OS, HI, NB, JG, RT, LA, KO, AM, ST, BU, DH, AK, YTL, SN, AH, SO, AU, RG, NG, NS, PL and TH. Cluster II corresponded to the district PU and cluster III corresponded to the district MC. CONCLUSIONS: The study showed that the district MC under cluster III was severely affected by COVID-19 and had a high number of confirmed cases. A good percentage of cured cases was found in some of the districts under cluster I, where six districts (GO, SI, CH, OS, SN) had a 100% success rate in curing patients. It was observed that the districts TH, PU and MC under clusters II and III had severe conditions, which call for optimization of medical facilities and of monitoring measures such as screening, closedown, curfews, lockdown, evacuations, legal actions, etc.

On March 9, 2020, the first case was confirmed in the state of Maharashtra (MH), and on March 13, 2020, the state government declared an epidemic in five cities and ordered the closure of commercial and educational establishments. The government banned public gatherings and events on March 14, 2020. Due to the severity of the cases in MH, the government imposed section 144 and a lockdown on March 23, 2020, and further sealed off the borders of all the districts. The Indian government declared this outbreak an epidemic in all the states and union territories (UTs). All educational institutions and commercial offices were shut down. On March 22, 2020, India announced a 14-hour public curfew. Further, on March 24, 2020, the Indian government ordered a nationwide lockdown for 21 days (until April 14, 2020), and after the completion of this period, the central government extended the lockdown up to May 3, 2020. Several types of actions were taken by the state and UT governments to control the spread of COVID-19. 
[1] The main objective of this study is to help optimize screening, closedown, curfews, lockdown, evacuations, legal actions, etc., in affected districts or areas. This will be beneficial in understanding the seriousness of the spread of COVID-19, so that the state government, local governments, doctors, the police, and others involved can improve their policies, decisions, and medical facilities such as ventilators, testing kits, and masks to reduce hot spots and the number of infected and deceased persons. MH is a state situated in the western region of India. It is the third largest state by area and the second most populous state of India. It is also the most industrialized state in India. It has a tropical climate, with a hot season during March-May. It is bordered by the states of Madhya Pradesh and Gujarat to the north, the states of Karnataka and Goa to the south, the state of Chhattisgarh to the east, the Arabian Sea to the west, the union territory of Dadra and Nagar Haveli and

We divide the whole study into three parts. Part I consists of data collection and exploratory analysis; part II consists of statistical analysis of the COVID-19 data set using cluster analysis; and part III examines deviations within clusters for each of the cases using box plots. We collected data related to COVID-19 in MH from March 9, 2020, to April 24, 2020, from the website of the "COVID-19 Monitoring Dashboard by Public Health Department, Government of MH; https://phdmah.maps.arcgis.com." [2] Some related information is also supported by https://en.wikipedia.org. The data consist of three variables: the total number of confirmed cases, the total number of cured/discharged cases, and the total number of death cases. The total numbers of confirmed, cured, and death cases during the period mentioned above were 6792, 840, and 299, respectively. However, no confirmed cases were found in the four districts MU, BH, GA, and WR. 
An exploratory analysis of the three variables is given in Table 1, which summarizes basic statistics for the variables mentioned above. We did not exclude extreme values from the sample observations because these values can indicate severe situations from a health and health management point of view. We also represent the characteristics of the three variables for the 32 districts of MH using box plots in Figure 1 and bar diagrams for each of the variables in Figure 2. Cluster analysis (CS) is a data mining technique that groups the sample observations into classes based on the essential similarities within a class and dissimilarities among different classes found in the data set. [4] [5] [6] Ward [7] suggested agglomerative hierarchical cluster analysis, which is based on a squared Euclidean distance. The Ward method is the simplest and the most commonly used method; it requires no prior assumption and uses the analysis of variance to calculate distances among clusters. [8] In this study, we used the R software (version R i386 3.6.3) to perform the cluster analysis. We scaled the data set before carrying out the cluster analysis. The Elbow method, run in R, was used to obtain an optimum number of clusters for each of the variables; the results are given in Figure 3a-c. To measure the deviation within clusters for each of the variables, we analyzed it statistically using R and used box plots to represent the deviation in each of the cases. The observations related to the variables are skewed, as shown by the histograms in Figure 1, so the median is more appropriate to use. [9] It is well known that the box plot is the most powerful tool for showing the median, the range, and the shape of the underlying distribution of the data. From Table 1 and Figure 2, it was seen that there was a great difference between the minimum and maximum number of observations for all of the variables. 
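The clustering steps described above (scaling, Ward's agglomerative merging, and the Elbow method) can be sketched in a few lines. This is a minimal illustration in Python, not the R code used in the study, which the article does not give. The case counts are made-up placeholders rather than the district data, the function names are hypothetical, and the sketch assumes that each variable is clustered on its own, as the per-variable Elbow plots suggest.

```python
from statistics import mean, stdev

def scale(values):
    """Standardize to z-scores (sample standard deviation, like R's scale())."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def within_ss(cluster):
    """Within-cluster sum of squared deviations from the cluster mean."""
    m = mean(cluster)
    return sum((x - m) ** 2 for x in cluster)

def ward_merge_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * (mean(a) - mean(b)) ** 2

def ward_clusters(values, k):
    """Start from singletons and merge the cheapest pair until k clusters remain."""
    clusters = [[v] for v in values]
    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda p: ward_merge_cost(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def elbow_curve(values, max_k):
    """Total within-cluster sum of squares for k = 1..max_k clusters."""
    return {k: sum(within_ss(c) for c in ward_clusters(values, k))
            for k in range(1, max_k + 1)}

# Hypothetical confirmed-case counts for eight districts (NOT the study data).
counts = [2, 3, 5, 8, 4, 120, 135, 900]
scaled = scale(counts)

clusters = ward_clusters(scaled, 3)
curve = elbow_curve(scaled, 5)

print(sorted(len(c) for c in clusters))
for k, wss in curve.items():
    print(k, round(wss, 4))
```

In an Elbow plot, the total within-cluster sum of squares is drawn against the number of clusters, and the chosen number is the point after which adding another cluster reduces it only slightly.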
Further, from Figure 1, it was observed that the data related to each of the variables were skewed. Extreme observations were also present in the data set. The box plots [Figure 7] were constructed to judge the variation in COVID-19 severity of each case by clusters I-III. Here we

Patient's consent not required as there are no patients in this study.
Nil.
There are no conflicts of interest.

India and COVID-19 pandemic - standing at crossroad! Indian
Using cluster analysis for medical resource decision making
Cluster analysis and related techniques in medical research
Cluster Analysis for Researchers. Belmont: Lifetime Learning Publications
Hierarchical grouping to optimize an objective function
Applied Multivariate Analysis. 5th ed

How to cite this article: Kumar S. Use of cluster analysis to monitor novel coronavirus-19 infections in Maharashtra