key: cord-0892584-ptnbtftg
authors: Hameed, B. M. Zeeshan; S. Dhavileswarapu, Aiswarya V. L.; Naik, Nithesh; Karimi, Hadis; Hegde, Padmaraj; Rai, Bhavan Prasad; Somani, Bhaskar K.
title: Big Data Analytics in urology: the story so far and the road ahead
date: 2021-03-05
journal: Ther Adv Urol
DOI: 10.1177/1756287221998134
sha: d1938f6406402cda6bd63d1360ad613f9997ee30
doc_id: 892584
cord_uid: ptnbtftg

Artificial intelligence (AI) has a proven record of application in the field of medicine and is used across urological subfields such as oncology, urolithiasis, paediatric urology, urogynaecology, infertility and reconstruction. Data is the driving force of AI, and the past decades have undoubtedly witnessed an upsurge in healthcare data. Urology is a specialty that has always been at the forefront of innovation and research and has rapidly embraced technologies to improve patient outcomes and experience. Advancements in Big Data Analytics have raised expectations about the future of urology. This review investigates the role of big data, combined with AI, and the trends in its use in urology. We explore the different sources of big data in urology and explain their current and future applications. The advent and implementation of AI in urology, using data available from several databases, shows a positive trend. The extensive use of big data for the diagnosis and treatment of urological disorders is still at an early stage and under validation. In the future, however, big data will no doubt play a major role in the management of urological conditions.

Big data has recently garnered considerable interest among clinicians. Almost every sector in the world today is driven by data, and the healthcare industry is no exception. Advances in medical technology, electronic medical databases and computational capacity are generating big data in the field of medicine. 
1 This massive quantity of data also includes information from devices as small as ingestible sensors, smartphones and watches, along with a variety of electronic health data sets. The electronic health record (EHR) enables big data in the health industry because patient care is routinely documented in EHRs. The data is then fed to large repositories, which grow in size and scope and become big data resources. 2 Analysing this data may reveal associations, patterns and trends that can improve patient care and reduce costs. 3 As data accumulates at an exponential rate, the complexity of handling and using it increases, which makes it difficult to offer personalized treatment plans. 4 To offer a precise and transversal view of a clinical scenario, artificial intelligence (AI), including machine learning (ML) algorithms and artificial neural networks (ANNs), was adopted. It soon found wide and promising application, and urology is one area where AI is being widely adopted. 5 Urology is a specialty that has always been at the forefront of innovation and research, where technologies have been rapidly embraced, and this has helped achieve better patient outcomes. 6 It is one of the most rapidly expanding surgical super-specialties, and AI paired with big data plays an important role in its rapid growth. Scientific breakthroughs over the past 20 years have certainly helped, and AI has been extensively applied to the diagnosis, 7 management 8 and outcome prediction 8, 9 of urological diseases and conditions (Figure 1). AI systems draw on a large amount of information to assist clinical decision making through both predictive and prescriptive analysis. It is therefore of utmost importance to understand what exactly big data is and how it is used in these AI applications for urology. 
In this review, we explore the major sources of big data used to advance urology and explain their current and future applications.

Inclusion criteria
1. Articles on Big Data Analytics, urology and AI;
2. Full-text original articles on all aspects of diagnosis, treatment and outcomes of urological disorders.

Exclusion criteria
1. Commentaries, reviews, book chapters and articles with no full text;
2. Animal, laboratory or cadaveric studies.

The literature review was performed as described above. Titles and abstracts were screened, and the full text was evaluated for the articles that satisfied the inclusion criteria. Furthermore, the authors manually reviewed the reference lists of the selected articles to screen for any additional work of interest. Disagreements about eligibility were resolved by consensus after discussion among the authors.

Digitization of healthcare in recent times has led to the generation of large amounts of health data on a day-to-day basis. The data produced exceeds what traditional software and hardware can manage in terms of storage, processing and analysis, hence the name 'big data'. 10, 11 In simple words, big data in healthcare corresponds to the digitally collected patient data amassed from numerous sources, including EHRs, medical imaging and genomic sequencing, to name a few. The difficulty in harnessing big data is a result of its characteristics: volume (amount of data), variety (type: structured or unstructured; format: images, text, video, audio) and velocity (the increasing rate of data accumulation). 2 To understand where exactly the data is acquired from and how it contributes to urology in particular, it is essential to discuss the different sources of big data in urology and their respective applications.

EHRs and electronic medical records
EHRs are considered to be the most appropriate form of clinical data available. 
They comprise the patient's medical history, diagnosis, medications and treatment plans, allergies, imaging data, laboratory reports, test results and clinical outcomes. In short, they are a comprehensive report of a patient's entire health information that can be accessed by authorized users at any time and from anywhere in the world. Their relevance compared with any other source of big data in healthcare comes from the fact that they are patient-centred and are created by authorized professionals with the sole purpose of supporting interoperability between health organizations. An ideal EHR system is one that improves aggregation, analysis and communication of patient information. 11 Often confused with EHRs, electronic medical records (EMRs) are digitized patient charts that are limited to a single practice. They contain the medical and treatment history of a patient within that practice alone. They are used by the provider for early diagnosis and treatment, unlike EHRs, which are widely used for decision-making. The main aim of EMR systems is to enhance the quality of care by using their information for various tasks, from scheduling patient appointments to monitoring vital parameters. 12 When choosing an EMR system for their practices, providers must check the system's features. An efficient EHR system is expected to meet several requirements, the most important being the privacy of patient data. Figure 2 depicts nine crucial features to look for in the right EHR/EMR. Traditionally, EMR vendors focused on delivering general-purpose systems that could be used across different specialties. This left several gaps in the collected data, and the systems failed to capture precise data related to a particular disease state. 
Though such limitations, namely missing features and inefficient system design, initially hindered the use of EMRs and EHRs, various add-on data analytics platforms were introduced to mitigate these difficulties. 11, 12 Along with enabling patient identification and population management, incorporating data analytics into EMR systems provided visibility into clinical data such as symptom scores and medication utilization. 12 Consequently, workflows could be created that target the highest-priority patients first and deliver appropriate care to them promptly and more efficiently. Today, many urology-based EHR systems are available that primarily focus on gathering disease-specific information from the patient. With the existence of urology-specific EHR templates for conditions such as recurrent urinary tract infections, benign prostate disorders, urolithiasis, uro-oncology and many more, extracting relevant information for studies and research has become easier. Focusing on patient-centred outcomes, Tina et al. 13 used an EHR system to detect urinary incontinence following prostatectomy, highlighting how the data captured in EHRs can be used to assess disease treatment. Other similar studies that made use of existing hospital EHR and EMR systems are discussed in Table 1. The studies shown in Table 1 emphasize how data from EHRs and EMRs, known as big data, contribute to deriving significant insights related to patients and urological diseases. With frequent upgrades in technology, wider adoption of certified urology EMR and EHR systems offers practices several advantages and helps them remain financially competitive in a healthcare setting.

Administrative data (also known as routinely collected data) is another source of big data in healthcare that is widely employed to inform clinical research. 
33 Unlike EMRs or EHRs, administrative data (AD) is primarily collected for reasons other than research (the financial aspects of healthcare) and usually consists of enrolment data, hospital in-patient and out-patient data, health insurance claims and pharmacy data. 33, 34 Typically, AD is used to determine and analyse national healthcare utilization trends, access, charges, quality and outcomes. 34 Though EMRs offer an advantage over AD in that they hold more detailed patient information, such as primary care process quality measures, laboratory test ordering and prescriptions, using them for secondary purposes is not advisable. 35, 36 Therefore, secondary data analysis is most commonly applied to AD. Secondary data analysis uses, for research, data that was collected by someone other than the investigator. 37 In the urological literature especially, there has been a dramatic increase in the use of secondary data analysis for clinical research. 37 NIS (National Inpatient Sample) and KID (Kids' Inpatient Database), derived from samples of the SID (State Inpatient Databases), are some examples of nationally representative discharge data sets that employ secondary data analysis. 34 National Surgical Quality Improvement Data (NSQID) are some popular administrative databases used in urology. The latter comprises more than 100 data points and is widely used in urological studies. Various other AD sources and the urology studies based on them are discussed in Table 1. Gathered to analyse and deal with the financial burden of diseases, AD has certain limitations, including difficulty of access and use, incomplete information on diagnosis and uncertainty regarding its generalizability. Using EMR (clinical) data as a reference standard for AD (financial) could help provide a comprehensive picture of patient health information that can be used to assess outcomes more accurately. 
38 Similarly, having a clear goal for the study, choosing an appropriate data set and avoiding ill-fitted statistical analysis could resolve the issues that arise when applying secondary data analysis to data sets. 37

The dawn of the genomic medicine era brought unprecedented insight into the genetic variations that drive tumour development and progression. 39 The emergence of modern bioinformatics in biomedical research opened up tremendous opportunities to derive powerful insights from clinically constructed genetic databases. Today an individual's entire genome sequence can be shared securely over the web. Databases known as genome browsers offer a way of sharing genome information in an accessible format after it is sequenced, assembled and annotated. 40 Some examples of genome browsers include Ensembl, a joint project between the European Bioinformatics Institute (EBI), part of the European Molecular Biology Laboratory (EMBL), and the Wellcome Trust Sanger Institute (in the UK); the UCSC Genome Browser (from the University of California, Santa Cruz); and NCBI (National Center for Biotechnology Information). 40 Some of the prominent genome databases that make up big data in healthcare dealing with biological information are shown in Table 2. The genome is the complete set of information in an organism's DNA. 41 Though the basic concepts involved in genome medicine, such as DNA, microRNA, biomarkers and others, are challenging to comprehend, grasping them can shed light on various diseases. 42 This greatly helps in providing optimum care to patients by identifying individual risk factors and recommending strategies to counter them: in short, personalized treatment. Major advances in genome medicine aim to deliver precision medicine, gene therapy and genetic therapy and contribute to the field of 'omics'. 41, 42 Identification of the genetic alterations that drive the progression of malignant diseases such as prostate cancer opens up the possibility of personalized medicine. 
In urology, genome data was initially used mainly for cancer therapy, but in recent years it has had a significant influence on non-cancerous diseases such as erectile dysfunction (ED). 43 Studies shown in Table 1 give an idea of the range of urologic diseases for which bioinformatics and genome databases are used to advance disease identification and treatment. The global availability of DNA-sequencing data, combined with its ease of access, has given genetics research unparalleled potential to understand diseases and their complex traits. 42 Inappropriate use of genomic data poses particular risks since it can be used to identify an individual. 44 Provided such risks can be avoided, or at least reduced, large-scale sharing of genome information could extend biomedical research and help tackle certain diseases.

The main aim of Specialty Pharmacies (SPs) is to provide expert clinical care to people suffering from serious illnesses such as cancer. They are equipped to handle complicated conditions and have access to advanced medications compared with traditional pharmacies. The high-touch services associated with high-cost, highly complex specialty pharmaceuticals create data opportunities (clinical and financial) that hold exceptional value for their stakeholders. SPs need to collect and aggregate data for efficient patient management and overall success. The patient data stored at SPs is gathered by direct interaction with a patient through utilization reviews, patient counselling and follow-up care. 45 This makes data from SPs highly valuable for pharmaceutical companies, which use it to enhance their drugs' efficacy. 44, 45 Strengthening the therapeutic value of a drug not only increases its efficacy but also ensures a better patient experience and improves the health of the population. 
45 Urologic oncology is one sub-specialty of urology to which SPs contribute greatly in terms of therapy. For effective treatment of patients with conditions such as prostate cancer, bladder cancer, kidney cancer and other urologic diseases, specialty pharmaceuticals are prescribed by urologists. 46 There is also scope for additional SP treatment options in the future. The impact of SPs on urology is expected to keep growing as the data they generate continues to benefit the treatment of various life-threatening diseases, both in urology and in other fields of medicine. 45, 46

Clinical or condition-specific registries
Condition-specific registries are a type of clinical registry; other examples include population registries, specialty registries, medical device registries and payer registries. Each registry typically focuses on collecting information on a particular aspect. 46, 47 The medical device registry gathers information suited to answering questions concerning the effectiveness, value and safety of medical devices. Similarly, specialty registries are a type of clinical registry that holds information similar to that of SPs. While SPs focus on advancing care for patients with complex diseases, specialty registries concentrate on doing the same within a medical specialty or sub-specialty (such as surgery or pathology). 37, 47 The classification of clinical registries used as healthcare data is shown in Figure 3. Condition-based registries are large data sets produced from clinical data of patients with a specific type of disease or disorder. 37 Unlike administrative data or claims data, condition-specific registries are generated to study and analyse a particular disease condition. Apart from being a primary source of study, they are also often used by urologic investigators for secondary data analysis. 
37 Some examples of registries that are mostly used for secondary data analysis are the SEER (Surveillance, Epidemiology, and End Results), CaPSURE (Cancer of the Prostate Strategic Urologic Research Endeavor) and NTDB (National Trauma Data Bank) data sets. 37 A few urology studies based on these clinical registries are illustrated in Table 1. Though registries solve some of the issues encountered when using AD, their core limitation, cost, restricts their scalability. Both automated and manual (by paid registrars) data abstraction are costly, and the former is susceptible to inaccuracies as well. 48 Nonetheless, clinical registries, though recent developments, are likely to play crucial roles in quality improvement and to yield studies that will make up a large share of the urologic literature, given their advantages over AD. 48

The five sources described above account for a significant share of big data sources in the healthcare industry, especially in urology. We have discussed the critical applications of each source for urology and briefly corroborated them with the studies listed in their respective tables. [14] [15] [16] [17] [18] [19] [26] [27] [28] [29] [30] [31] [32] The impact of Big Data Analytics and secondary data analysis on the collected data is evident. Both processes uncover associations and hidden patterns in the collected data that can help prevent epidemics, cure diseases and improve patient quality of life. [5] [6] [7] [8] [9] [10] Though the real-life implementation of AI remains limited, it has the potential to change the way urology is and will be practised. It enables faster diagnosis and a reduction of unnecessary costs in the medical field. Furthermore, AI models are extensively used to enhance treatment efficiency by enabling faster diagnosis, predictive analysis and precision medicine. 
9 The application of novel AI technology in urology has been regarded as a promising step towards improving diagnostic capability and the prediction of disease recurrence. 49 Highly predictive and accurate AI algorithms make improved diagnosis of male infertility, urinary tract infections and paediatric malformations possible. 50 Advancements in technology, aided by virtual or augmented reality, bring greater potential for AI-assisted surgery and improved patient care. While AI is hailed for all these accomplishments, it would not have reached that status without big data. AI and big data are equally important and equally responsible for the advancements made in urology. Therefore, to offer a complete and clear perspective on the future of urology, this review discusses the prominent big data sources in urology in detail. 10, 11 Genome data and SP data can deliver groundbreaking results in uro-oncology. While the former can help explain the cause of the disease, the latter offers a chance to deliver improved medication and patient care for advanced conditions. Both of these sources are of the utmost significance in providing precision medicine. 12 Along with assisting quality enhancement, condition-specific registries provide highly relevant clinical data that is valuable for urological research. EHRs and AD together can provide a broader view of many aspects of the healthcare industry. Compared with traditional statistical models, AI models are considered superior by the majority of surveyed studies. As the construction and management of big data resources develop along with more reliable and efficient AI techniques, we believe that there will be a transformation in the way urological diseases are diagnosed and treated. [15] [16] [17] [18] With the onset of the COVID pandemic, big data is also being used to tackle the disease and to prioritize mass vaccination programmes. 
51, 52 While data from Internet of Things (IoT) devices is also considered a major contribution to healthcare data, IoT is still at a very early stage, especially in urology. For the sake of brevity, survey and research data, which plays a less significant role than the other sources, is not discussed in this review. We did not carry out a 'risk of bias' assessment in our study; future studies should include one.

The use of Big Data Analytics in urology has grown sharply over the last decade. The emergence of AI and its application in urology using the data available from various databases shows a promising trend. The generalized use of big data for the diagnosis and treatment of urological conditions is still at an incipient stage and under validation. However, in the future big data will no doubt play a paramount role in the treatment of various urological conditions.

Big data and machine learning in health care
Harnessing big data for health care and research: are urologists ready?
The inevitable application of big data to health care
Current status of artificial intelligence applications in urology and their potential to influence clinical practice
Artificial intelligence and neural networks in urology: current clinical applications
Urovision 2020: the future of urology
Evaluation of artificial intelligence-based grading of diabetic retinopathy in primary care
A systematic review of the tools available for predicting survival and managing patients with urothelial carcinomas of the bladder and of the upper tract in a curative setting
Automated performance metrics and machine learning algorithms to measure surgeon performance and anticipate clinical outcomes in robotic surgery
Putting the data before the algorithm in big data addressing personalized healthcare
Big data analytics in healthcare: promise and potential
Data analytics in the large urology practice: patient identification, population management, and protocol adherence
New paradigms for patient-centered outcomes research in electronic medical records: an example of detecting urinary incontinence following prostatectomy
Clinical documentation to predict factors associated with urinary incontinence following prostatectomy for prostate cancer
Using electronic health record data to identify prostate cancer patients that may qualify for active surveillance
Can we predict a national profile of non-attendance pediatric urology patients: a multi-institutional electronic health record study
A new era: artificial intelligence and machine learning in prostate cancer
Leveraging the electronic medical record improves prostate cancer clinical staging in a community urology practice
Thirty-day hospital revisits after prostate brachytherapy: who is at risk?
Pediatric urinary stone disease in the United States: the urologic diseases in America project
Rates and risk factors for opioid dependence and overdose after urological surgery
Patient experience and quality of urologic cancer surgery in US hospitals
Comparison and trend of perioperative outcomes between robot-assisted radical prostatectomy and open radical prostatectomy: nationwide inpatient sample 2009-2014
Robots drive the German radical prostatectomy market: a total population analysis from
National practice patterns and outcomes of pediatric nephrectomy: comparison between urology and general surgery
Simple operating room bundle reduces superficial surgical site infections after major urologic surgery
The evolving role of genetic tests in reproductive medicine
Derivation and validation of genome wide polygenic score for urinary tract stone diagnosis
Metastatic risk stratification of clear cell renal cell carcinoma patients based on genomic aberrations
Artificial intelligence-based personalized and risk-adapted surveillance management for urologic cancer: a SEER-based study
Robust health utility assessment among long-term survivors of prostate cancer: results from the cancer of the prostate strategic urologic research endeavor registry
Risk of infectious complications in pelvic fracture urethral injury patients managed with internal fixation and suprapubic catheter placement
An introduction to health care administrative data
The healthcare cost and utilization project: an overview
The challenge of measuring quality of care from the electronic health record
Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: the SHARPn project
Secondary data analysis of large data sets in urology: successes and errors to avoid
Evaluation of Electronic Medical Record Administrative data Linked Database (EMRALD)
POINT-Prostate cancer genomic analysis: routine or research only?
What is genomic medicine
Unraveling the genetics of vesicoureteric reflux: a common familial disorder
A review of genome wide association studies for erectile dysfunction
Benefits and risks of sharing genomic information
How Specialty Pharmacy Data Can Boost Your Drug's Success
mHealth in urology: a review of experts' involvement in app development
What is a clinical data registry and why is it important
Improving quality through clinical registries in urology
Artificial Intelligence (AI) in urology - current use and future directions: an iTRUE study
Applications of neural networks in urology: a systematic review
What do urologists need to know: diagnosis, treatment, and follow-up during COVID-19 pandemic
Intersection of big data analytics

The authors declare that there is no conflict of interest. This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

journals.sagepub.com/home/tau

ORCID iDs
Nithesh Naik https://orcid.org/0000-0003-0356-7697
Bhaskar K Somani https://orcid.org/0000-0002-6248-6478