key: cord-0848236-wq94exb0
authors: Biswas, R.
title: Outlining Big Data Analytics in Health Sector with Special Reference to Covid-19
date: 2021-12-01
journal: Wirel Pers Commun
DOI: 10.1007/s11277-021-09446-4
sha: be27038a48accc40924254acaf617d48621eeb6a
doc_id: 848236
cord_uid: wq94exb0

With the assistance of the Internet of Things (IoT), Big Data analytics has evolved tremendously. The capability of high-performance computing systems to process very large volumes of data has led to a surge in applications of Big Data analytics in fields spanning healthcare, automobiles, computing, climatology, and space communications. The healthcare sector has benefitted considerably in recent years. Driven by the compounding growth and impact of Big Data analytics, we endeavor to map out the areas of the health sector where Big Data analytics has been influential and where it has the potential for ground-breaking applications. This work starts with the fundamentals of IoT-driven Big Data Analytics (BDA) and its key constituent elements, followed by an overview of applications in the healthcare sector with an emphasis on future expectations. In addition, real-time applications of BDA with special reference to Covid-19 are highlighted with recent examples. It is envisioned that the work will serve as a basic reference for IoT-driven BDA in healthcare.

In general, big data refers to huge volumes of information. When the term is applied to healthcare, a set of diverse definitions arises. A suitable definition is: the enormous data entailing biological, clinical, environmental, and lifestyle information concerning a large number of individuals, corresponding to their health and wellbeing within a fixed time span [1] [2] [3] [4] [5].
In layman's terms, it refers to a dataset whose size is too large for a single database software to capture, store, and subsequently analyze. Accordingly, the emphasis falls on size and volume together with three V's, namely variety, veracity, and velocity. Variety relates to diverse types, sources, and formats. Veracity refers to quality and validity, whereas velocity captures availability in time. Moreover, reliability, data protection, and privacy are also key points that influence big data. The Internet of Things (IoT) has emerged as a vital source that gives further impetus to the operation of Big Data [6] [7] [8] [9] [10]. Of late, IoT has become a common technological term for big enterprises. With the number of smart objects growing at a rate higher than the world's population, IoT has become an indispensable part of the modern era. However, it comes with a caution regarding acceptability followed by adaptability [11] [12] [13] [14] [15] [16]. In short, Big Data Analytics refers to the effective integration and efficient analysis of various forms of data over a period, which can help address impending problems.

Although there is an abundance of literature dealing with BDA, very few works address IoT-driven BDA in healthcare. Moreover, in most cases only proposed frameworks for BDA are highlighted, without mention of any real-time application. Motivated by these gaps, we examine the areas of healthcare where Big Data analytics (BDA) has been influential and where it has the potential for ground-breaking applications. Starting from the basics of IoT-driven Big Data analytics, its applications in the healthcare sector are outlined. Further, this work presents a comprehensive analysis of very recent applications in this sector with special reference to Covid-19.
Finally, a visionary approach for IoT and BDA is proposed for optimal efficacy in the healthcare sector.

The Internet of Things (IoT) has come a long way since its inception. It is a common thread that connects assorted devices in a synergistic way. For instance, modern houses are mostly IoT enabled: household appliances ranging from thermistors, water heaters, and fridges to smart lighting systems are connected to each other through IoT. With the advent of RFID technology [10, 11], IoT has evolved dramatically, engaging several aspects and stakeholders spanning academia and industry. The whole concept can be generalized as a fusion of three perspectives, namely orientation towards things, orientation towards the internet, and orientation towards semantics. Each perspective has its own scheme. Meanwhile, the fast-growing IoT engages a wide range of very large data, which stem from strategically distributed sensors and smart objects. This, in turn, gives rise to assorted analytics. However, data analytics emanating from IoT may become ineffective as well as expensive if all data are cumulatively transferred to, and subsequently handled in, a central storage system. This issue can be handled through a microservice-oriented platform meant for decentralization of the data tree. The platform is driven by software-defined infrastructure (SDI), which breaks up the monolithic environment of IoBDA (IoT-oriented Big Data Analytics) implementation. SDI comprises a software-defined network (SDN) and software-defined storage (SDS) [11] [12] [13] [14] [15] [16] [17] [18] [19]. SDN enables the dissociation of data transmission from IoT nodes such as switches and routers. This is further accompanied by SDS, which helps decouple management of the data store from the entire unit.
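The cost argument against shipping every raw reading to a central store can be illustrated with a minimal sketch. It assumes hypothetical sensor readings and two hypothetical helpers, `summarize_at_edge` and `combine_summaries`; neither is taken from the cited platform. Each node reduces its own readings to a compact summary, so only the summaries need to cross the network.

```python
from statistics import mean


def summarize_at_edge(readings):
    """Reduce one node's raw readings to a small summary record."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "min": min(readings),
        "max": max(readings),
    }


def combine_summaries(summaries):
    """Merge per-node summaries centrally without access to the raw data."""
    total = sum(s["count"] for s in summaries)
    # Weight each node's mean by how many readings it represents.
    weighted_mean = sum(s["mean"] * s["count"] for s in summaries) / total
    return {
        "count": total,
        "mean": weighted_mean,
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
    }


# Hypothetical temperature readings from two sensor nodes (assumed values).
node_a = [36.5, 36.7, 36.6, 36.8]
node_b = [37.1, 37.3]

summaries = [summarize_at_edge(node_a), summarize_at_edge(node_b)]
overall = combine_summaries(summaries)
```

In this sketch the central side receives two small records instead of six raw values; with realistic sensor rates the difference in transferred volume would be far larger, at the price of losing any statistic that cannot be rebuilt from the summaries.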
As a result, SDN and SDS emerge as integral components in meeting workload demands via OPI on heterogeneous hardware. Thus, the microservice-oriented platform proves to be a boon in segregating domain-specific business logic from resource control and management [11, 14, 16, 17, 18, 19].

The objective of Big Data analytics in healthcare is to develop new technologies, such as IoT-oriented capturing devices, sensors, and mobile applications, along the following lines.
a. Collection of genomic information in an economical way.
b. Enhancement of digital social communication on the part of patients.
c. Accumulation of more medical knowledge through more discoveries.

The modus operandi of BDA is illustrated in Fig. 1. As can be seen, the components span from lab to pharmacy and involve the patient, the physician, and research & development. Social networking data of the patient are also included in order to build a harmonized effort in resource optimization. The idea of BDA is to make healthcare accessible to all with optimal output while saving the patient's time and expenditure. In terms of data analysis, IoTBDA uses Monte Carlo or Convergence Analytics as per exigency [10, 11].

BDA in healthcare depends on several key elements. Patient data is a vital one; more technically, we can term it electronic health records. This raises the question of whether the data related to patients are structured or unstructured. Accordingly, there are two sorts of database, namely Structured Health Records and Unstructured Health Records. Additionally, a supporting system is needed for handling these voluminous data.
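The distinction between structured and unstructured health records can be made concrete with a minimal sketch. Everything in it is an assumption for illustration: the `StructuredRecord` fields, the sample clinical note, and the `extract_blood_pressure` helper are hypothetical and do not come from any specific health-record system. A structured record can be queried directly, whereas a free-text note must first be parsed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredRecord:
    """A structured record: every value sits in a named, typed field."""
    patient_id: str
    systolic_mmhg: int
    diastolic_mmhg: int


# Hypothetical unstructured record: the same facts buried in free text.
note = "Patient P-001 seen today. BP 128/82, reports mild headache."

# Matches readings written as "BP <systolic>/<diastolic>".
BP_PATTERN = re.compile(r"\bBP\s+(\d{2,3})/(\d{2,3})\b")


def extract_blood_pressure(patient_id: str, text: str) -> Optional[StructuredRecord]:
    """Turn a free-text note into a structured record, or None if no reading is found."""
    match = BP_PATTERN.search(text)
    if match is None:
        return None
    return StructuredRecord(patient_id, int(match.group(1)), int(match.group(2)))


record = extract_blood_pressure("P-001", note)
```

A single regular expression only covers one way of writing a reading; real clinical text varies far more, which is part of why unstructured records are harder to analyze at scale.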
Although clinical and medicinal data are prima facie the key components of BDA, another essential addition is the social networking data of stakeholders.

One of the beneficial effects of Big Data on healthcare is its ability to provide valuable tools for behavioral change. Mobile health (mHealth) is one example. This tool takes lifestyle data linked to the user's nutrition, physical activity, and sleeping habits, and integrates them with large factual reference data. It then personalizes interventions for the user. It also furnishes valuable information enabling the identification and prescription of proper medication, thus giving a detailed overview of progress and setbacks in therapy. In addition, BDA assists in the early diagnosis of triggers of chronic diseases. It can direct ongoing research towards a better understanding of the relations between social and other parameters. These comprise physical behaviors, nutrition, genetic factors, environmental factors, and the development of mental/physical diseases. In general, the interaction between the different systems controlling the spread of disease still baffles researchers. In such circumstances, BDA can offer a solution by building an integrated view of health encompassing various biomarkers (i.e., omics, quantified self-data). These will eventually help in improving early detection of diseases and long-term management of adverse health factors, thereby reducing costs.

Public health policy is influenced by region and socio-economic status. These in turn define societal action whose main goal is to improve health outcomes. Through specific interventions, BDA can act as guidance for policies pertinent to a certain population. These policies in turn depend on the quality of research and interventions.
Meanwhile, there is a dearth of methods for validating certain interventions, as in the mental health domain. Although BDA supports health policies, several bottlenecks remain. For instance, privacy and data protection hinder analysis through a combined approach of healthcare provider and service. Likewise, unstructured health records are a problem for BDA. Scalable, methodologically sound, and privacy-friendly outcomes assisted by advanced statistical methods will pave the way for the development of precise and effective interventions.

With advances in technology, one can combine data from the healthcare environment with information from society while maintaining synergy. Social networks, forums, blogs, etc. can benefit the health environment by providing a wealth of data that can be used directly for the benefit of public health. Combining information from informal sources with data emerging from diagnosis and surveillance makes early detection of disease outbreaks and transmission possible. One example is the ARGO model, which is essentially a forecasting model. It was accomplished by combining disease tracking, spread dynamics, and surveillance through social networking means like Twitter. By analyzing these data carefully and incorporating parameters such as travel, trade, and climate change, it is possible to attain a predictive model for population-based interventions as well as improved treatment of individual patients. Furthermore, with early detection of a disease outbreak, government experts can better coordinate important strategies such as quarantine and vaccination.

Value-based healthcare is now a buzzword. It can be treated as a guiding principle for sustainable health. It comprises elements such as patient-reported outcomes, the cost incurred along the care path, and the eventual decision of payment.
In this case, healthcare is incentivized when it surpasses some performance index. This is not to be confused with the treatment of the patient; rather, it is linked to patient-related outcomes. For proper implementation of value-based healthcare, there must be a streamlined flow of data collection, analysis, and aggregation that includes the total care path, cost, etc. While doing so, patient-linked health outcomes require monitoring in three stages: before, during, and after treatment. Certain challenges arise here, such as the lack of an updated administrative care system that can cater to the associated specific care paths and produce an accurate estimate of the expenditure incurred. If care processes and care paths are concisely connected with the assistance of a large database, empirical evidence-based decisions for specific therapies can be made. To do that, standardized and authenticated methods are urgently needed.

In industrial sectors, most things are predictable. Hence, objectives are well defined and prioritized. In the healthcare sector, the opposite holds: it is quite a volatile system. Influenced by patients and their needs along with service providers, productivity becomes much more challenging unless the stakeholders are well apprised of the functionalities of the healthcare domain. This calls for tools that pave the way for an integrated multi-stream flow of data encompassing electronic health records, patient monitoring data, laboratory data, nursing operation data, etc., to ensure smoother functioning along with optimal utilization of resources.

These three words are very much essential in defining data. With the growth of data services, everyone has access to data from multiple sources and is free to combine them. This has led to misuse. Accordingly, questions arise regarding the destination of data, user identification, and the motive behind the use of data.
Consequently, there is an utmost need for regulation. The good news is that an updated General Data Protection Regulation (GDPR) has replaced the old version. Under it, national legislation is no longer required. It caters to both the public and private sectors, bringing all organizational sectors under its domain.

In the following, technical challenges and opportunities regarding the application of Big Data technologies in healthcare are discussed. Data quality is vital. Because of the expensive processes involved in the medical and pharmaceutical sectors, reliability and reproducibility are two stringent measures. Hence, care is taken over how data are generated, handled, and transformed before they are readied for storage. As analytical methods are upgraded and operations grow more complex, the source of data becomes extremely important, as it can significantly affect the conclusions. Equally important is data quantity. As stated earlier, BDA is driven by vast datasets spanning clinical, genetic, behavioral, environmental, financial, and operational data. This necessitates an effective mechanism to handle such a wealth of information, which in turn yields valuable insights for improving healthcare in terms of quality and efficiency. Owing to such characteristics, not only optimization of existing products and services but also propositions of new rules can be ensured.

Since the outbreak of Covid-19 in Wuhan, China, the pandemic, as declared by WHO, has shattered the global scenario. Most of the world has been under lockdown. The SARS-CoV-2 virus has caused about 430 K fatalities worldwide, and the number is increasing each day. Apart from that, most countries have also been undergoing economic turmoil.
Major organizations such as the FDA and CDC in the USA are hard pressed to find the best possible vaccine as well as effective medicine to combat the Covid-19 pandemic. In this context, IoT-driven BDA has been widely used by health professionals to find the best remedies in the fight against Covid-19. Accordingly, we have compiled two very recent applications of IoT-enabled BDA [20] [21] [22].

We are all aware of the much-hyped IBM Blue Gene supercomputer. Such was its computational power that it surpassed the petascale barrier around sixteen years ago. On that basis, this supercomputer played a crucial role in analyzing the sequencing of the human genome, thereby paving the way for designing novel drugs and treatments. It has also successfully simulated one percent of the most complex machine on earth, the human brain. Such a supercomputer is ideally suited to complex computing processes of this kind. Meanwhile, Covid-19 has been spreading very fast since its outbreak and has infected two million people globally. As a result, the pandemic must be tackled efficiently. Accordingly, the Department of Energy of the United States of America, a country severely affected by Covid-19, deployed a powerful ally, the IBM-built Summit supercomputer, in combating COVID-19 [20]. In general, a virus infects host cells by using a spike to inject its genetic material into them. Practitioners in wet labs have patiently inspected how the micro-organism reacts to new compounds. Although such practices bear fruit, they can be cumbersome in terms of time and handling. In such cases, computer simulations not only save ample time but also provide unique solutions. On the cautionary side, because these simulations analyze the reactions of different viruses to several variables, they handle terabytes of data pertaining to each variable.
Consequently, running multiple simulations becomes a very tedious process in terms of both time and the hardware involved. To integrate all the computing hardware, IoT-enabled BDA appears to be the solution. Accordingly, through an efficient IoT-enabled BDA scheme, Summit has helped researchers simulate approximately 8000 compounds in a short span of time. The main goal was to identify the optimum model for constraining SARS-CoV-2 infections. Seventy-seven small-molecule compounds, such as medications and natural compounds, have been identified that show the ability to weaken the virus's capacity to attack and infect host cells. Although these findings are not a direct means to cure the viral disease, the results will benefit future studies and provide a basis on which experimentalists can test these compounds and find the most suitable one [20] as a potential tool for mitigating COVID-19. The protein modeled by Summit is illustrated in Fig. 2.

BDA has achieved another feat during the Covid-19 pandemic. Throughout the world, researchers are digging through all articles related to Covid-19 with the objective of gaining ample information about the virus's activity and eventual curative measures. Accordingly, they accrued 50 K articles, which is a mammoth dataset. To make optimum use of it, an enterprise named Verizon Media has accomplished a praiseworthy task. Using Vespa, an open-source big data processing program, they developed a BDA-based coronavirus academic research search engine. This engine makes use of the COVID-19 Open Research Dataset (CORD-19) with the sole goal of assisting medical researchers, who are expected to gain better insight in combating SARS-CoV-2. The repository grows as new peer-reviewed research publications appear.
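A search engine over a corpus such as CORD-19 has to rank articles against a query. One common way to do this is to compare vector representations of the query and of each article by cosine similarity. The sketch below illustrates only that idea; it is not Vespa's API or the engine's actual ranking code. The three-dimensional vectors, the article titles, and the `rank_articles` helper are hypothetical, and in practice the vectors would come from a trained language model rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(query_vec, article_vecs):
    """Return (title, score) pairs sorted from most to least similar to the query."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in article_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-written embeddings (assumed values, not model output).
articles = {
    "spike protein binding": [0.9, 0.1, 0.0],
    "hospital staffing": [0.0, 0.2, 0.9],
    "antiviral compound screening": [0.7, 0.6, 0.1],
}
query = [1.0, 0.2, 0.0]

ranking = rank_articles(query, articles)
```

With these toy vectors the article about spike protein binding ranks first and the one about hospital staffing last, because the query vector points in nearly the same direction as the former and almost orthogonally to the latter.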
Apart from this, archival platforms such as arXiv, bioRxiv, printxiv, and medRxiv further enrich the repository. Additionally, the repository contains documents linked to publication databases such as PubMed, SAGE, Microsoft Academic, and the WHO COVID-19 database of publications [21] [22] [23]. The core of this search engine is the use of semantic similarity through the scibert-nli model. This model is described as a pre-trained language model capable of searching scientific text efficiently. Specifically, Vespa exploits text search and structured search together. In addition, it is equipped with article recommendation, user personalization, and ad targeting, accompanied by an application programming interface (API) for advanced users. Because of these features, it is proving very beneficial to researchers, saving ample time in searching the vast volume of Covid-19 articles [21] [22] [23]. Besides these, some notable implementations of BDA are listed in Table 1. It is evident that BDA has been used effectively in dealing with the current pandemic. With a huge population in the grip of COVID-19, it is a herculean task to keep records as well as offer the best possible remedies. Through a synergistic combination of IoT and BDA, it is possible to tackle effectively the complications that may arise from the ongoing pandemic. Table 1 and the aforementioned descriptions of BDA are testimony to the fact that it can act as a savior, provided the processed data are protected by secure encryption.

In conclusion, the characteristics of IoT-oriented BDA in healthcare have been comprehensively highlighted. BDA helps in rendering far-reaching, targeted, and cost-effective healthcare. BDA cannot be fully exploited unless there is a targeted research endeavor.
Table 1
Prediction of pandemic using neural networks and comparison of the results with other machine learning algorithms [24] | Yes | -
Diagnosis and treatment of COPD using Big Data methodology on the Savana Manager 2.1 clinical platform [25] | Yes | Yes
Analysis and prediction of severe emergency cases [26] | Yes | Yes
Improving performance and quality of auxiliary power-unit services of health monitoring; accompanied by direct implementation in aviation and aerospace [27] | Yes | NA
Real-time exploration of data for datasets of healthcare through in-memory databases [28] | Yes | Yes
Demarcation of COVID-19 cough sound from Crowdsourced Respiratory Sound Data [29] | Yes | Yes
Evaluating impact of COVID-19 on mental health and well being [30] | Yes | Yes
Designing mitigation strategies and resource allocation via estimation of disease incidence rate [31] | Yes | Yes
Recognize the provincial dispersal of COVID-19 and consequent disbursal of healthcare [32] | Yes | Yes
Exploring the plausibility of spread of COVID-19 virus via indirect contact [33]

Proper access to and quality of big data are some impending challenges. There is a need to explore appropriate and effective ways, in harmony with privacy and ethical principles, to monitor big data so that one can gain deeper insight into the objectives of implementation and quality, which will eventually lead to developed and optimized care processes, early diagnosis, etc. Figure 3 shows a visionary approach for IoT, BDA, and domain expertise. As can be seen, it is imperative to have a multidisciplinary field positioned at the intersection of the internet of things (sensors and networks), big data technology, and domain-specific analytics. This will eventually lead to a combination of tools and methods aiding transformational solutions for industries, government, and society. There is a need for secure encryption of data, which will prevent their mishandling or hacking.
With a synergistic amalgamation of secured IoT-driven BDA and domain expertise, there lies huge potential for smart healthcare, which is going to be the emerging trend in future.

Data Availability statement Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Content-based image retrieval in radiology: Current status and future directions
Algorithmic prediction of health-care costs
Physicians' resistance toward healthcare information technology: A theoretical model and empirical test
Healthcare information systems research, revelations and visions
A review of content-based image retrieval systems in medical applications - clinical benefits and future directions
Predictive analytics in information systems research
Biomedical informatics and translational medicine
Advantages and disadvantages of realtime continuous glucose monitoring in people with type 2 diabetes
Data and industrial internet of things for the maritime industry in Northwestern Norway
Microservice-oriented platform for internet of big data analytics: a proof of concept
SDStorage: a software defined storage experimental framework
Big data collection in large-scale wireless sensor networks
Software-defined infrastructure and the future central office
Software-defined infrastructure and the SAVI testbed
Software defined environments: An introduction
A survey of software-defined networking: Past, present, and future of programmable networks
Osmotic computing: A new paradigm for edge/cloud integration
A cloud-based distributed big data analytics framework for the internet of things. Software: Practice and Experience
com/ US-Dept-of-Energy-Brings-the-Worlds-Most-Power ful-Super compu terthe-IBM-POWER9-based-Summit-Into-the-Fight-Again st-COVID. Accessed on
An application overview of IoT enabled-big data analytics in health sector with special reference to Covid-19
BIG data analytics: A boon for SMART healthcare
A framework for pandemic prediction using big data analytics
Clinical management of COPD in a real-world setting. A big data analysis
Big data analytics: Solution to healthcare
Big data analytics framework for system health monitoring
Work in progress-in-memory analysis for healthcare big data
Exploring automatic diagnosis of COVID-19 from Crowdsourced respiratory sound data
The Impact of COVID-19 pandemic on mental health and wellbeing among home-quarantined Bangladeshi students: A cross-sectional pilot study
Rapid implementation of mobile technology for realtime epidemiology of COVID-19
Using eHealth to support COVID-19 education, self-assessment, and symptom monitoring in the Netherlands: observational study
The evidence of indirect transmission of SARS-CoV-2 reported in

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.