authors: Liptak, P.; Banovcin, P.; Rosolanka, R.; Prokopic, M.; Kocan, I.; Ziacikova, I.; Uhrik, P.; Grendar, M.; Hyrdel, R.
title: A machine learning approach for identification of gastrointestinal predictors for the risk of COVID-19 related hospitalization
date: 2021-08-30
DOI: 10.1101/2021.08.27.21262728
Background and aim: COVID-19 can present with various gastrointestinal symptoms. Shortly after the outbreak of the pandemic, several machine learning algorithms were implemented to assess new diagnostic and therapeutic methods for this disease. The aim of this study is to assess gastrointestinal and liver-related predictive factors for the risk of SARS-CoV-2 associated hospitalization.
Methods: Data collection was based on a questionnaire from the COVID-19 outpatient test center and from the emergency department of the University Hospital, combined with data from the internal hospital information system and from the mobile application used for telemedicine follow-up of patients. For statistical analysis, SARS-CoV-2 negative patients were considered as controls for three different SARS-CoV-2 positive patient groups (divided based on the severity of the disease).
Results: A total of 710 patients were enrolled in the study. The presence of diarrhea and nausea was significantly higher in the emergency department group than in the COVID-19 outpatient test center group. Among liver enzymes, only aspartate transaminase (AST) was significantly elevated in the hospitalized group compared to patients discharged home. Based on the random forest algorithm, AST was identified as the most important predictor, followed by age and diabetes mellitus. Diarrhea and bloating also have predictive importance, although much lower than AST.
Conclusion: SARS-CoV-2 positivity is connected with isolated AST elevation, and the AST level is linked with the severity of the disease.
Furthermore, using the machine learning random forest algorithm, we have identified elevated AST as the most important predictor of COVID-19 related hospitalization.
Acute SARS-CoV-2 infection presents with variable symptoms involving various organ systems. Typical symptoms of COVID-19 are fever and cough; in a more severe course, dyspnea with respiratory insufficiency occurs [1]. In addition, COVID-19 may present with gastrointestinal symptoms, predominantly nausea, vomiting, diarrhea, anorexia and abdominal pain, with a relatively wide range of prevalence across published studies [2]-[8]. Since the COVID-19 pandemic has caused an immense global health crisis, new diagnostic and therapeutic methods are rapidly emerging [9], and the use of artificial intelligence is one of them. Shortly after the COVID-19 outbreak, various machine learning algorithms were implemented [10]-[13]. Machine learning helps to quickly identify patterns and trends in large volumes of data that are difficult for humans to recognize [14]. Objective stratification tools that rapidly assess a patient's status and prognosis are of great use to frontline health-care providers [15].
The primary aim of this study is to assess possible predictive factors for SARS-CoV-2 outcome based on gastrointestinal symptoms and liver-related laboratory results, using the random forest machine learning algorithm [16], [17]. The secondary aim is to determine the prevalence of gastrointestinal symptoms among patients with COVID-19 in groups defined by the severity of the disease.
The study was performed from February through May 2021. Only persons aged 18 years or older were included. All enrolled patients signed informed consent. The study was approved by the Ethics Committee of the University Hospital in Martin, decision number 14/2021.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. This version posted August 30, 2021; doi: https://doi.org/10.1101/2021.08.27.21262728 (medRxiv preprint).
Two distinct populations were considered for this study. The first group consisted of people who underwent a nasopharyngeal swab at the hospital's outpatient COVID-19 testing center to determine whether they were SARS-CoV-2 positive. SARS-CoV-2 was detected from the nasopharyngeal swab by PCR (polymerase chain reaction). This group was then subdivided by test result, and the negative subgroup was used as the control group for this study. The second group consisted of patients who attended the COVID-19 emergency department (ED) of the University Hospital. These patients were confirmed positive from a nasopharyngeal swab by either PCR or antigen testing. Only patients with typical COVID-19 symptoms (fever, cough, dyspnea) were included. Patients who were SARS-CoV-2 positive but did not present with typical COVID-19 symptoms (e.g. patients who came to the emergency room because of other diagnoses but were simultaneously SARS-CoV-2 positive) were excluded. Therefore, this group comprises only patients who both tested positive and had at least one typical COVID-19 symptom. The second group was then divided according to further evaluation and the course of the disease. The first subgroup consisted of patients who did not require hospital admission and were referred to outpatient care. The second subgroup was admitted to the hospital; these patients were followed until the end of hospitalization, either death or resolution of the disease. For analysis, this subgroup was further divided into patients who required only standard hospital care and those who needed the intensive care unit (ICU).
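The inclusion and grouping rules described above can be summarized as a small decision function. This is a hypothetical illustration, not the study's code: the `Patient` fields and the group labels are invented names, and the sketch assumes each record already carries the test result, symptom status and admission outcome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    age: int
    site: str                  # "outpatient" (test center) or "ed" (emergency department)
    sars_cov_2_positive: bool  # PCR (outpatient) or PCR/antigen (ED) result
    typical_symptoms: bool     # at least one of fever, cough, dyspnea
    admitted: bool = False
    icu: bool = False


def assign_group(p: Patient) -> Optional[str]:
    """Return the analysis group described in the text, or None if excluded."""
    if p.age < 18:
        return None  # only adults were enrolled
    if p.site == "outpatient":
        # Outpatient test center: negatives serve as controls
        return "outpatient_positive" if p.sars_cov_2_positive else "control_negative"
    if p.site == "ed":
        # ED patients must be positive AND have at least one typical symptom
        if not (p.sars_cov_2_positive and p.typical_symptoms):
            return None
        if not p.admitted:
            return "ed_discharged_home"
        return "hospitalized_icu" if p.icu else "hospitalized_standard"
    return None  # unknown site


print(assign_group(Patient(age=65, site="ed", sars_cov_2_positive=True,
                           typical_symptoms=True, admitted=True)))  # prints hospitalized_standard
```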
Data in the COVID-19 outpatient test center group were collected with a questionnaire. Data from the emergency department were obtained with the same questionnaire, combined with information from the medical examination by the attending physician and from the mobile application MEDAsistent, a telemedicine follow-up tool developed at the Clinic of Pneumology and Phthisiology of the University Hospital in Martin. Further information about hospitalized patients (including laboratory results, chest X-ray, etc.) was obtained from the hospital information system. The questionnaire consists of questions about present health complaints typical for COVID-19 and about the spectrum of the most common gastrointestinal symptoms that had occurred in the 5-7 days before examination. Patients could also write down other symptoms that were not in the original list. To include only new or worsened gastrointestinal symptoms, the questionnaire also asked about chronic gastrointestinal symptoms and their possible worsening in the 5-7 days before examination. The data were visualized and analyzed in R [18], version 4.0.5, with the aid of the libraries gtsummary [19], rstatix [20], DescTools [21], randomForestSRC [22], PCAmixdata [23] and ggpubr [24]. The sample median and the lower and upper quartiles were used to summarize continuous variables (e.g., age); counts and percentages were used to summarize factors (e.g., gender). The Chi-squared or Fisher test was applied to test the null hypothesis of independence between factors, [...] Hochberg correction of p-values.
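The multiple-testing step can be illustrated with a short sketch, assuming that Hochberg's step-up procedure is meant (the rule implemented by R's `p.adjust(method = "hochberg")`). If the Benjamini-Hochberg false discovery rate procedure was intended instead, the multiplier `rank` below would become `n / (n - rank + 1)`. The p-values in the example are hypothetical.

```python
from typing import List


def hochberg_adjust(pvalues: List[float]) -> List[float]:
    """Hochberg step-up adjusted p-values (assumed variant; mirrors R's p.adjust "hochberg")."""
    n = len(pvalues)
    # Visit p-values from the largest to the smallest
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted p-values are capped at 1
    for rank, idx in enumerate(order, start=1):
        # The rank-th largest p-value is multiplied by rank (largest x 1, smallest x n);
        # the running minimum keeps the adjusted values monotone
        running_min = min(running_min, rank * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from three symptom comparisons
print(hochberg_adjust([0.01, 0.04, 0.03]))  # approximately [0.03, 0.04, 0.04]
```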
Two-way ANOVA was used to model the association between AST and group (discharged home, admitted to hospital) in interaction with recent antibiotic (ATB) use and with chronic liver disease (yes, no). AST values were log-transformed to bring the data closer to normality. Normality of residuals was assessed with a quantile-quantile plot with a 95% confidence band constructed by bootstrap. The assumption of homogeneity of variance was tested with the Levene test. To assess the power of the gastrointestinal parameters and other measured variables to predict the outcome of the patient group, the random forest machine learning algorithm was trained on the data. Predictive ability was quantified by the ROC curve constructed from the out-of-bag data. The importance of the predictors was measured by variable importance. A two-dimensional representation of the data was obtained by principal component analysis for mixed-type data. Findings with a p-value below 0.05 were considered statistically significant.
A total of 710 patients were enrolled in the study; 30 patients were excluded from further analysis after primary screening. The 352 participants from the outpatient center who tested PCR negative for SARS-CoV-2 were considered the control group. The SARS-CoV-2 positive group from the outpatient center had 166 participants. From the emergency department, 162 patients were enrolled: 78 were discharged home, 57 were admitted to hospital with standard care until discharge, and 27 required the intensive care unit. The two outpatient center groups had similar median ages of 42 and 41 years, respectively. Hospitalized patients were significantly older, as shown in table 1. Typical COVID-19 symptoms such as fever and cough were significantly more frequent in the hospitalized groups than in outpatient participants. There were no significant differences by sex.
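The variance check from the statistical methods can be sketched as follows: AST is log-transformed and a median-centred Levene statistic (the Brown-Forsythe variant, which is the default centring in R's `car::leveneTest`) is computed across groups. This is an illustrative sketch with hypothetical AST values, not the study data; it returns only the W statistic, which would then be compared with an F(k - 1, N - k) distribution to obtain a p-value.

```python
import math
from statistics import mean, median
from typing import Dict, List


def levene_statistic(groups: Dict[str, List[float]]) -> float:
    """Median-centred Levene (Brown-Forsythe) W; compare with F(k - 1, N - k)."""
    # Absolute deviations of each value from its own group's median
    z = {g: [abs(v - median(vals)) for v in vals] for g, vals in groups.items()}
    k = len(z)
    n_total = sum(len(d) for d in z.values())
    grand_mean = mean([x for d in z.values() for x in d])
    # One-way ANOVA on the deviations: between-group vs within-group sums of squares
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in z.values())
    within = sum((x - mean(d)) ** 2 for d in z.values() for x in d)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical AST values, log-transformed before testing as described in the text
ast = {
    "discharged_home": [22.0, 25.0, 30.0, 28.0, 35.0],
    "admitted": [38.0, 55.0, 47.0, 80.0, 62.0],
}
log_ast = {g: [math.log(v) for v in vals] for g, vals in ast.items()}
print(round(levene_statistic(log_ast), 3))
```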
The presence of diarrhea, constipation, bloating, nausea, heartburn and abdominal pain was considered in this study. Diarrhea and nausea were significantly more frequent in the emergency department group than in the COVID-19 outpatient test center group. Comparing SARS-CoV-2 negative and SARS-CoV-2 positive participants, these symptoms were more than three times as frequent in the positive group as in the negative one. This trend continues among ED patients with increasing severity of disease. Among gastrointestinal symptoms, diarrhea and bloating were significantly more frequent in patients admitted to the hospital than in those discharged home (40% for diarrhea and 14% for bloating vs 18% and 2.6%, respectively). Other symptoms such as abdominal pain, heartburn, nausea, vomiting, anorexia and constipation did not differ significantly between these groups. C-reactive protein was also significantly higher in the hospitalized group. Of alanine transaminase (ALT), aspartate transaminase (AST) and bilirubin as markers of possible liver damage, only AST (figure 1) was significantly higher in the hospitalized group, and this difference is substantial. There was no statistically significant difference in the levels of ALT (figure 2) or bilirubin. Based on the random forest algorithm trained on the demographic characteristics, symptoms and gastrointestinal-related laboratory findings of hospitalized and discharged patients, several predictors of the risk of hospitalization were identified. AST was identified as the most important predictor, followed by age and diabetes mellitus.
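Comparisons such as diarrhea in hospitalized versus discharged patients are tests of independence on a 2x2 table. The sketch below computes Pearson's chi-squared statistic with an optional Yates continuity correction (R's `chisq.test` applies it to 2x2 tables by default) and obtains the df = 1 p-value from the complementary error function. The counts in the example are hypothetical, not the study's; when expected counts are small, Fisher's exact test, also named in the methods, is the usual alternative.

```python
import math
from typing import List, Tuple


def chi_squared_2x2(table: List[List[int]], yates: bool = True) -> Tuple[float, float]:
    """Chi-squared test of independence for a 2x2 contingency table (df = 1)."""
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff -= min(0.5, diff)  # continuity correction, capped as in R
            stat += diff ** 2 / expected
    # With 1 degree of freedom, P(X^2 > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows = admitted / discharged home, columns = diarrhea yes / no
stat, p = chi_squared_2x2([[20, 30], [12, 58]])
print(f"X^2 = {stat:.2f}, p = {p:.4f}")
```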
Diarrhea and bloating also have positive importance, although much lower than AST. Gastrointestinal symptoms such as nausea, abdominal pain or anorexia have zero or negative predictive importance. The ROC curve for the combined factors is shown in figure 3, with an AUC of 0.76. When using only liver enzymes (AST, ALT), gastrointestinal symptoms (diarrhea and bloating), chronic liver disease, age and diabetes mellitus, the ROC curve (figure 4) for this combination of factors attained an AUC of 0.799, with AST as the strongest predictor of hospitalization (table 3). Principal component analysis was used to obtain a two-dimensional visualization of the data for patients discharged home after ED examination and patients admitted to hospital. The data used for the analysis are those from table 2, i.e. a combination of general patient characteristics, typical COVID-19 symptoms, gastrointestinal symptoms and liver-related laboratory results. The PCA plot (figure 5) shows two distinct clusters that partially overlap but tend to shift apart.
Several studies and meta-analyses have pointed out gastrointestinal involvement in SARS-CoV-2 infection [3], [4], [6], [8], [25]-[27]. The pooled prevalence of gastrointestinal symptoms varies considerably between studies, from 10.5% to 53% [3], [4], [25], [28]. Based on the comprehensive meta-analysis by Sultan et al. [4], the pooled prevalence of diarrhea is 7.7%, of nausea and vomiting 7.8%, and of abdominal pain 3.6%. In the present study, we focused on the presence of diarrhea, constipation, bloating, nausea, heartburn and abdominal pain.
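The principal component analysis for mixed-type data behind figure 5 first puts numeric and categorical variables on a common scale. The sketch below shows only that encoding step, assuming the standard PCAmix/FAMD scaling: numeric columns are standardized, and each category's indicator column is centred and divided by the square root of the category's proportion. PCAmixdata's internal implementation may differ in details; the 2D coordinates would then come from the first two principal components of the encoded matrix (for example via an SVD). The variable names in the example are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def mixed_encode(numeric: Dict[str, Sequence[float]],
                 categorical: Dict[str, Sequence[str]]) -> List[List[float]]:
    """Encode mixed data for PCA (assumed PCAmix/FAMD-style scaling); returns n rows x m columns."""
    all_columns = list(numeric.values()) + list(categorical.values())
    n = len(all_columns[0])
    columns: List[List[float]] = []
    for values in numeric.values():
        mu = sum(values) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in values) / n)  # population SD
        columns.append([(v - mu) / sd for v in values])  # assumes sd > 0
    for values in categorical.values():
        for level in sorted(set(values)):
            indicator = [1.0 if v == level else 0.0 for v in values]
            p = sum(indicator) / n  # proportion of patients in this category
            # Centred indicator scaled by 1/sqrt(p): a variable with L levels contributes L - 1 variance
            columns.append([(x - p) / math.sqrt(p) for x in indicator])
    return [list(row) for row in zip(*columns)]


rows = mixed_encode(
    {"age": [34.0, 71.0, 58.0, 45.0]},
    {"diarrhea": ["no", "yes", "yes", "no"], "diabetes": ["no", "yes", "no", "no"]},
)
for r in rows:
    print([round(x, 2) for x in r])
```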
Statistically significant differences were found for diarrhea and nausea when comparing SARS-CoV-2 negative and positive patients. In the group of hospitalized patients with standard care, diarrhea was present in 40% of patients and nausea in 21%, which is higher than in some of the meta-analyses mentioned, but consistent with data on the general presence of gastrointestinal symptoms and gut involvement. Within the emergency department group, bloating was significantly more frequent in hospitalized patients than in those discharged home. Interestingly, bloating had a lower prevalence in ICU patients than in those with standard care. This could be explained by the high subjectivity of, and interpersonal differences in, reporting a symptom such as bloating: patients with a more severe course of disease give less weight to less bothersome symptoms such as heartburn, bloating and nausea than to more pronounced symptoms such as diarrhea, abdominal pain or vomiting.
Focusing on liver enzymes as markers of possible liver impairment resulting from SARS-CoV-2 infection, AST, ALT and bilirubin were evaluated. The median levels of liver enzymes were not elevated in the discharged group. Bilirubin and ALT were also within the normal range in the hospitalized group, with no statistically significant differences between the two groups. Only AST was elevated above the upper limit of normal in the hospitalized group, with progressively higher values in patients who required ICU. The difference between hospitalized and discharged patients is substantial and statistically significant.
Several previously published studies have shown elevations of both transaminases and bilirubin to different extents, ranging from 1% to 53% (mainly ALT and AST, accompanied by slightly increased bilirubin concentrations) [3]. In most published data, severe liver alterations were uncommon [29], and according to the meta-analysis by Mao et al. [3], the pooled prevalence of liver injury with regard to severity was 12%. More severe liver injury was also associated with worse outcomes, including intensive care unit admission and mortality [30]. The pathophysiology of liver involvement in COVID-19 is still not completely understood. Direct viral infection of liver cells has been proposed as one cause of liver injury, but comprehensive studies are scarce. A study with pathological analysis of liver tissue from patients who died of COVID-19 showed no viral inclusions in hepatocytes [31]. Another repeatedly proposed and generally accepted mechanism of liver impairment could be drug toxicity [3]. To determine the possible influence of recent antibiotic (ATB) use on the elevated AST observed in this study, a two-way analysis of variance (two-way ANOVA) was performed. There were no significant differences between groups with and without recent antibiotic use; therefore, we concluded that ATB use has no relevant influence on the elevated AST levels. A two-way ANOVA was also performed to assess the relationship between chronic liver disease and AST. There was no statistically significant difference in AST levels between hospitalized patients with and without chronic liver disease. Another possible explanation of elevated transaminases is systemic inflammation. ALT is an enzyme found most commonly in the liver, with small amounts in striated muscle and myocardium.
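The two-way ANOVA used to check whether recent antibiotic use modifies the AST difference between groups can be illustrated for the simplest case. This sketch assumes a balanced design (the same number of patients in every group-by-antibiotic cell), which real clinical data rarely are; for unbalanced data the choice of sums of squares (type I, II or III) matters, and dedicated routines such as R's `aov`/`anova` or `rstatix::anova_test` should be used. It returns F statistics only, and the cell values are hypothetical log(AST) values.

```python
import math
from itertools import product
from statistics import mean
from typing import Dict, List, Tuple

Cells = Dict[Tuple[str, str], List[float]]  # (level of A, level of B) -> observations


def two_way_anova_balanced(cells: Cells) -> Dict[str, float]:
    """F statistics for factor A, factor B and the A:B interaction (balanced design only)."""
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    r = len(next(iter(cells.values())))  # replicates per cell
    if any(len(cells.get(key, [])) != r for key in product(levels_a, levels_b)):
        raise ValueError("sketch assumes every cell has the same number of observations")
    na, nb = len(levels_a), len(levels_b)
    grand = mean(v for vals in cells.values() for v in vals)
    cell_mean = {key: mean(vals) for key, vals in cells.items()}
    mean_a = {a: mean(cell_mean[(a, b)] for b in levels_b) for a in levels_a}
    mean_b = {b: mean(cell_mean[(a, b)] for a in levels_a) for b in levels_b}
    # Sums of squares for main effects, interaction and residual error
    ss_a = nb * r * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = na * r * sum((m - grand) ** 2 for m in mean_b.values())
    ss_ab = r * sum((cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
                    for a, b in product(levels_a, levels_b))
    ss_e = sum((v - cell_mean[key]) ** 2 for key, vals in cells.items() for v in vals)
    ms_e = ss_e / (na * nb * (r - 1))
    return {
        "F_A": ss_a / (na - 1) / ms_e,
        "F_B": ss_b / (nb - 1) / ms_e,
        "F_AB": ss_ab / ((na - 1) * (nb - 1)) / ms_e,
    }


# Hypothetical AST values; A = group, B = recent antibiotic use
raw = {
    ("admitted", "no"): [45.0, 60.0, 52.0],
    ("admitted", "yes"): [50.0, 58.0, 66.0],
    ("discharged", "no"): [24.0, 30.0, 27.0],
    ("discharged", "yes"): [26.0, 33.0, 29.0],
}
log_cells = {key: [math.log(v) for v in vals] for key, vals in raw.items()}
print(two_way_anova_balanced(log_cells))
```

A non-significant interaction F (F_AB) would be consistent with the text's conclusion that antibiotic use does not explain the AST difference.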
AST, on the other hand, is found not only in the liver but also in striated and myocardial muscle, kidneys, brain and red blood cells. AST was used as a marker of myocardial infarction for a long time before more sensitive markers were identified and implemented in routine clinical practice [32].
Different types of machine learning are increasingly used to determine predictors of outcome in various areas of clinical practice, from traumatic brain injury [33] and radiology [34] to oncology [35] and dermatology [36]. Since the COVID-19 pandemic has been affecting the global population for more than a year and a half and has caused an immense health crisis in most countries of the world, new diagnostic tools (machine learning being one of them) and therapeutic methods are rapidly emerging [9]. Shortly after the COVID-19 outbreak, various machine learning techniques were used, including taxonomic classification of COVID-19 genomes [10], determination of predictors of severe COVID-19 [11] and the search for new potential drug candidates against SARS-CoV-2 infection [12]. Another example of successful implementation of artificial intelligence in COVID-19 diagnosis is the evaluation of CT scans to detect SARS-CoV-2 associated pneumonia and differentiate it from community-acquired pneumonia and other similar conditions, with specificity and sensitivity higher than 90% [13]. Several studies have so far used the random forest algorithm to identify predictors of COVID-19 outcome from a wide variety of symptoms, socioeconomic factors [37] and laboratory results, with varying results [38], [39].
To our knowledge, no studies to date have focused specifically on gastrointestinal symptoms and gut-related laboratory findings. To assess the power of gastrointestinal parameters and other measured variables to predict the need for hospitalization, the random forest machine learning algorithm was trained on the data from our study. Results were plotted as the ROC curve obtained from the out-of-bag data. When considering general COVID-19 symptoms, gastrointestinal symptoms, age, sex, duration of symptoms and comorbidities (diabetes mellitus, arterial hypertension and chronic liver disease), the AUC is 0.76. By variable importance, the most important predictor is AST, followed by age and diabetes mellitus, which are substantially less important. When using only liver enzymes (AST, ALT), gastrointestinal symptoms (diarrhea and bloating), age, chronic liver disease and diabetes mellitus, the AUC is 0.799, with AST as the strongest predictor of hospitalization. Previously published studies, which mostly used methods of classical statistics, have identified the presence of gastrointestinal symptoms [40], predominantly diarrhea [41], [42], and elevated liver enzymes [2] as predictors of COVID-19 associated hospitalization. In our data, aspartate transaminase (AST) stands out: it is not only the only liver enzyme significantly elevated in patients requiring hospitalization, but the random forest algorithm also identified it as the most important predictor of hospitalization.
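The ROC/AUC evaluation and variable importance described above can be sketched in a model-agnostic way. Below, the AUC is computed as the probability that a randomly chosen hospitalized patient receives a higher predicted score than a randomly chosen discharged one (ties count one half), and a predictor's importance is the mean drop in AUC after randomly permuting that predictor's column. This is a simplified stand-in: randomForestSRC computes its variable importance (VIMP) on out-of-bag cases tree by tree and offers several variants, and the `predict` function and data here are hypothetical. Permutation can also raise the AUC by chance, which is one way negative importance values, like those reported for nausea or abdominal pain, can arise.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney formulation; labels are 1 (hospitalized) or 0 (discharged)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict: Callable[[Row], float], rows: List[Row],
                           labels: Sequence[int], feature: int,
                           n_repeats: int = 20, seed: int = 0) -> float:
    """Mean decrease in AUC after shuffling one feature column (higher = more important)."""
    rng = random.Random(seed)
    baseline = roc_auc([predict(r) for r in rows], labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - roc_auc([predict(r) for r in permuted], labels))
    return sum(drops) / n_repeats


# Hypothetical scorer that uses only column 0 (e.g. a log(AST) value) and ignores column 1
def predict(row: Row) -> float:
    return row[0]


rows = [[3.1, 0.2], [3.3, 0.9], [3.9, 0.4], [4.2, 0.7], [3.0, 0.5], [4.5, 0.1]]
labels = [0, 0, 1, 1, 0, 1]
print(roc_auc([predict(r) for r in rows], labels))               # 1.0
print(permutation_importance(predict, rows, labels, feature=1))  # 0.0
```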
Finally, we performed principal component analysis for mixed-type data to obtain a two-dimensional representation of the data on patients who were discharged home and those who were admitted to the hospital. As can be seen in figure 5, these two groups partially overlap but show a clear tendency to shift apart, which is in accordance with the predictive performance of the studied variables in the random forest algorithm.
Using the machine learning random forest algorithm, this study identified elevated AST as the most important predictor of COVID-19 related hospitalization. We have also shown that SARS-CoV-2 positivity is connected with isolated AST elevation and that the AST level is linked with the severity of the disease. Furthermore, the prevalence of diarrhea and nausea among SARS-CoV-2 positive patients is significantly higher than among SARS-CoV-2 negative controls. Bloating occurs significantly more frequently in COVID-19 patients who require hospitalization than in those who can be discharged to outpatient care.
The authors report no conflict of interest.
AST: aspartate transaminase; p<0.0001
Figure 3. The ROC curve for general COVID-19 symptoms, gastrointestinal symptoms, age, sex, duration of symptoms and comorbidities (diabetes mellitus, arterial hypertension and chronic liver disease).
Figure 4. The ROC curve for liver enzymes (AST, ALT), gastrointestinal symptoms (diarrhea and bloating), chronic liver disease, age and diabetes mellitus.
Figure 5. Principal component analysis for mixed-type data: two-dimensional representation of the data on patients who were discharged home (black dots) and those who were admitted to the hospital (red dots).
[1] Clinical Characteristics of Coronavirus Disease 2019 in China
[2] Gastrointestinal predictors of severe COVID-19: systematic review and meta-analysis
[3] Manifestations and prognosis of gastrointestinal and liver involvement in patients with COVID-19: a systematic review and meta-analysis
[4] AGA Institute Rapid Review of the Gastrointestinal and Liver Manifestations of COVID-19, Meta-Analysis of International Data, and Recommendations for the Consultative Management of Patients with COVID-19
[5] Gastrointestinal, hepatobiliary, and pancreatic manifestations of COVID-19
[6] Diarrhea During COVID-19 Infection: Pathogenesis, Epidemiology, Prevention, and Management
[7] Epidemiological, clinical and virological characteristics of 74 cases of coronavirus-infected disease 2019 (COVID-19) with gastrointestinal symptoms
[8] Evidence for gastrointestinal infection of SARS-CoV-2
[9] Artificial intelligence and machine learning to fight COVID-19
[10] Machine learning using intrinsic genomic signatures for rapid classification of novel pathogens: COVID-19 case study
[11] A machine learning-based model for survival prediction in patients with severe COVID-19 infection
[12] An integrative drug repositioning framework discovered a potential therapeutic agent targeting COVID-19
[13] Using Artificial Intelligence to Detect COVID-19 and Community-acquired Pneumonia Based on Pulmonary CT: Evaluation of the Diagnostic Accuracy
[14] Significant Applications of Machine Learning for COVID-19
[15] Machine learning for COVID-19-asking the right questions
[16] Bagging predictors
[17] Random Forests
[18] A language and environment for statistical computing. R Foundation for Statistical Computing
[19] gtsummary: Presentation-Ready Data Summary and Analytic Result Tables
[20] rstatix: Pipe-Friendly Framework for Basic Statistical Tests
[21] DescTools: Tools for descriptive statistics. R package version 0.99.41
[22] Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC), R package version 2.11.0
[23] PCAmixdata: Multivariate Analysis of Mixed Data. R package version 3.1
[24] ggpubr: 'ggplot2' Based Publication Ready Plots
[25] Clinical characteristics of COVID-19 patients with digestive symptoms in Hubei, China: a descriptive, cross-sectional, multicenter study
[26] Gastrointestinal symptoms associated with COVID-19: impact on the gut microbiome
[27] COVID-19 pandemic: Pathophysiology and manifestations from the gastrointestinal tract
[28] COVID-19 in Latin America: Symptoms, Morbidities, and Gastrointestinal Manifestations
[29] Implications of SARS-CoV-2 infection for neurogastroenterology
[30] Acute Liver Injury in COVID-19: Prevalence and Association with Clinical Outcomes in a Large
[31] Liver injury in COVID-19: management and challenges
[32] Aspartate aminotransferase and cardiovascular disease-a narrative review
[33] Random Forest-Based Prediction of Outcome and Mortality in Patients with Traumatic Brain Injury Undergoing Primary Decompressive Craniectomy
[34] Current Applications and Future Impact of Machine Learning in Radiology
[35] Applications of Machine Learning in Cancer Prediction and Prognosis
[36] Machine Learning in Medicine
[37] Physiological and socioeconomic characteristics predict COVID-19 mortality and resource utilization in Brazil
[38] COVID-19 Patient Health Prediction Using Boosted Random Forest Algorithm
[39] A descriptive study of random forest algorithm for predicting COVID-19 patients outcome
[40] Epidemiological and Clinical Predictors of COVID-19
[41] Gastrointestinal manifestation as clinical predictor of severe COVID-19: A retrospective experience and literature review of COVID-19 in Association of Southeast Asian Nations (ASEAN)
[42] The Spectrum of Gastrointestinal Symptoms in Patients With Coronavirus Disease-19: Predictors, Relationship With Disease Severity, and Outcome