key: cord-0028339-m718p0md authors: Scala, Arianna; Trunfio, Teresa Angela; De Coppi, Lucia; Rossi, Giovanni; Borrelli, Anna; Triassi, Maria; Improta, Giovanni title: Regression Models to Study the Total LOS Related to Valvuloplasty date: 2022-03-07 journal: Int J Environ Res Public Health DOI: 10.3390/ijerph19053117 sha: 29a08fd7ff34ad4b5b94a465530ca35238fceb33 doc_id: 28339 cord_uid: m718p0md Background: Valvular heart diseases are diseases that affect the valves by altering the normal circulation of blood within the heart. In recent years, the use of valvuloplasty has become recurrent due to the increase in calcific valve disease, which usually occurs in the elderly, and mitral valve regurgitation. For this reason, it is critical to be able to best manage the patient undergoing this surgery. To accomplish this, the length of stay (LOS) is used as a quality indicator. Methods: A multiple linear regression model and four other regression algorithms were used to study the total LOS function of a set of independent variables related to the clinical and demographic characteristics of patients. The study was conducted at the University Hospital “San Giovanni di Dio e Ruggi d’Aragona” of Salerno (Italy) in the years 2010–2020. Results: Overall, the MLR model proved to be the best, with an R(2) value of 0.720. Among the independent variables, age, pre-operative LOS, congestive heart failure, and peripheral vascular disease were those that mainly influenced the output value. Conclusions: LOS proves, once again, to be a strategic indicator for hospital resource management, and simple linear regression models have shown excellent results to analyze it. The present research paper is an extension of a previous paper that the same authors presented at a conference [1] . In fact, the dataset considered is much larger, both in terms of number of records and the variables considered. 
Moreover, in order to further improve the regression model, a comparison with other algorithms was made. Valvular heart diseases (VHD) are diseases that affect the valves by altering the normal circulation of blood within the heart, with repercussions on the general health of the subject. Knowledge of the natural history of the most common valvular heart diseases is important because the onset of symptoms is often the point at which intervention becomes necessary. Most valvular heart diseases are amenable to surgical intervention, which can afford a symptom-free and relatively normal life span [2]. The prevalence of valvular disease increases sharply with age, owing to the predominance of degenerative etiologies. The burden of heart valve disease in the elderly has an important impact on patient management, given the high frequency of comorbidity and the increased risk associated with intervention in this age group [3]. For each subject, it is fundamental to evaluate the severity of valvular disease, given that the surgical risk is proportional to the degree of valvular disease; in the elderly specifically, pre-operative evaluation and preparation are especially important for the successful outcome of any type of surgery [4]. A prospective survey of patients with valvular heart disease in Europe showed that, of patients with severe, symptomatic, single VHD, 31.8% did not undergo intervention, most frequently because of comorbidities [5]. The etiology, approach to treatment, and expected outcomes of VHD are different in the elderly compared with younger patients. Both stenotic and regurgitant lesions are associated with unfavorable outcomes if left untreated. Surgical mortality remains high due to multiple comorbidities, and the long-term survival benefit is dependent on many variables, including valvular pathology. Quality of life is an important consideration in treatment decisions in this age group.
Increasingly, octogenarian patients are receiving transcatheter therapies, with transcatheter aortic valve replacement having the greatest momentum [6]. When surgery is not possible, or when the risks outweigh the benefits, percutaneous treatment options may offer effective alternatives. However, procedures may not always go as planned, and frail patients or those whose symptoms are caused by other comorbidities may not benefit from valve intervention at all. Significant effort should be made to assess frailty, comorbidities, and patient goals prior to intervention [7]. In the current guidelines of the European Society of Cardiology, published in 2021 [8], surgical treatment remains the standard of care for most forms of severe valvular heart disease; however, the presence of chronic kidney disease impairs clinical outcomes and is associated with higher mortality rates when compared to patients with preserved renal function [9]. These valvular abnormalities are likely to increase further as the average age of the population increases [10]. For this reason, it is critical to be able to best manage the patient undergoing this surgery. The length of hospital stay (LOS) is considered an excellent indicator of quality in care processes [11,12]. In fact, many studies have focused on how to reduce patients' LOS by optimizing care processes. For example, Scala et al. and Improta et al. demonstrated how the introduction of a diagnostic therapeutic pathway for femur fracture and diabetic patient management, respectively, reduced LOS with consequent benefits for both patients and hospitals [13,14]. Biomedical data analytics is key to improving processes, reducing costs, and giving clinicians new tools to manage different patients.
There are many approaches used in the literature for data analysis, including lean six sigma [15-19], health technology assessment [20-23], machine learning algorithms [24-26], and mathematical modelling [27-29]. The latter approach was chosen for this study. In the literature, there are several applications: Tesfahun et al., in order to optimize medical waste management processes, developed a model capable of predicting the production rate of this waste [30]; mathematical models have also been applied to the healthcare impact of the COVID-19 epidemic in India [31] and to the factors influencing the duration of hospital stay in adult patients with peritonsillar abscess [32]; and Kadam et al. employed both artificial neural networks and a multiple linear regression to predict the potability of water in an Indian river [33]. Therefore, the aim of this work is to use mathematical modelling and, in particular, several regression algorithms to obtain a model that can help clinicians in the assessment of the LOS of patients undergoing mitral valve repair surgery. As already mentioned, the present work aims to be an extension and improvement of the previous one presented at a conference [1]. In fact, the final model is more complex since the individual comorbidities were considered and, therefore, it allows the clinician to take into account more aspects that characterize the particular patient.
The research was carried out at the Complex Operative Unit (C.O.U.) of the Cardiology unit at the University Hospital "San Giovanni di Dio e Ruggi d'Aragona" of Salerno (Italy). The dataset was obtained from the hospital's information system, QuaniSDO, and included all patients who underwent open-heart mitral valve repair surgery without replacement from 2010 to 2020. It comprises 379 records and contains information including the dates of admission, discharge, and procedure. From this information, variables were obtained that were then used in the multiple linear regression.
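As an illustration of how LOS variables can be derived from the three dates, the following is a minimal Python sketch and not the authors' implementation. It assumes that the total LOS is counted from admission to discharge and that the pre-operative LOS is counted from admission to the procedure; the function name `los_variables` and the example dates are hypothetical.

```python
from datetime import date


def los_variables(admission: date, procedure: date, discharge: date) -> dict:
    """Return total and pre-operative LOS (in days) for one hospital stay.

    Assumption: total LOS = discharge - admission,
                pre-operative LOS = procedure - admission.
    """
    if not (admission <= procedure <= discharge):
        raise ValueError("expected admission <= procedure <= discharge")
    return {
        "total_los": (discharge - admission).days,
        "preop_los": (procedure - admission).days,
    }


# Made-up record, not taken from the study dataset.
example = los_variables(date(2015, 3, 2), date(2015, 3, 5), date(2015, 3, 14))
print(example)
```

The date check is a design choice of this sketch: it rejects records whose dates are out of order rather than silently producing negative durations.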
In particular, the dependent variable (i.e., the output of the model), the total LOS, was obtained as the difference between the date of discharge and the date of admission; the pre-operative LOS, an independent variable, was obtained as the difference between the date of the procedure and the date of admission. By analyzing the procedures, it was possible to determine the number of cardiac procedures carried out in addition to mitral valve repair surgery. For example, interventions for bypass, pacemaker implantation, cardioversions, or other interventions on valves were considered. The other independent variables were gender, age, acute myocardial infarction (AMI), congestive heart failure (CHF), cerebrovascular disease (CeVD), peripheral vascular disease (PVD), chronic obstructive pulmonary disease (COPD), diabetes, renal disease (RD), and the number of procedures (two, three, or four). Table 1 shows the distribution of the features in the sample.
IBM SPSS Statistics [34] was used to build an MLR model to predict the total LOS. Before its implementation, the following six conditions must be verified: 1. Linear relationship between the independent and dependent variables; 2. Absence of collinearity; 3. Independence of the residuals; 4. Constant variance of the residuals; 5. Normal distribution of residuals; 6. Absence of outliers. With MATLAB version R2020a, other regression algorithms (linear support vector machine, LSVM; narrow neural network, NNN; rational quadratic Gaussian process regression, GPR; and random forest, RF) were implemented. In particular, SVM can also be used as a regression method, keeping at its base the same main idea as the classifier, namely minimizing the error by identifying a hyperplane in an N-dimensional space, where N depends on the number of variables, and considering a margin of tolerance that is not part of the classification process. Neural network models can be considered valid alternatives to classical regression models. In fact, they have the property of learning from a set of data without the need for a complete specification of the decision model. They automatically provide all necessary data transformations and are able to see through noise and distortion.
Gaussian processes (GP) are a supervised learning method used for regression and probabilistic classification problems. They are versatile, since different kernels can be specified, and the prediction is probabilistic (Gaussian). Lastly, RF is a supervised learning algorithm in which multiple decision trees are combined to make a more accurate prediction. The model is powerful and accurate, but overfitting can easily occur. Before performing the analyses, the dataset was divided into a training set (80%) and a test set (20%). The R² parameter was used to evaluate the accuracy of the models.
Before implementing the MLR model, the six hypotheses were tested. To verify the linearity assumption, partial scatter plots were created to show the trend of the dependent variable LOS as a function of the selected independent variables. Figure 1 shows what was obtained for the pre-operative LOS. Consistent with the definition of total LOS, a linear relationship between the variables was deduced. The limitation of this type of representation is that the combined effect of several independent variables is not considered.
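The 80/20 split and the R² evaluation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPSS or MATLAB workflow used in the study: the data are synthetic, the two predictors and their coefficients are invented, and the helper names `fit_mlr`, `predict_mlr`, and `r_squared` are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    """Predict with the coefficients returned by fit_mlr."""
    return beta[0] + X @ beta[1:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for the dataset (379 rows, invented coefficients).
rng = np.random.default_rng(0)
n = 379
age = rng.uniform(40, 85, n)
preop_los = rng.integers(0, 10, n).astype(float)
X = np.column_stack([age, preop_los])
y = 5.0 + 0.05 * age + 1.0 * preop_los + rng.normal(0.0, 2.0, n)

# Random 80/20 split into training and test indices.
order = rng.permutation(n)
cut = int(0.8 * n)
train_idx, test_idx = order[:cut], order[cut:]

beta = fit_mlr(X[train_idx], y[train_idx])
r2_test = r_squared(y[test_idx], predict_mlr(beta, X[test_idx]))
print(f"train={len(train_idx)} test={len(test_idx)} R2(test)={r2_test:.3f}")
```

Computing R² on the held-out 20% rather than on the training rows gives a less optimistic view of how the model generalizes, which is the purpose of the split.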
The absence of multicollinearity was assessed through Pearson's correlation, tolerance, and the variance inflation factor (VIF). All of these measures depend on the correlation between the i-th independent variable and the others. Table 2 shows the results of the Pearson correlation and the statistical significance. Table 3, instead, shows the values of VIF and tolerance that were obtained for each independent variable. With the exception of the pre-operative LOS, the Pearson correlation value was always less than 0.7. In addition, the VIF values were always less than 10 and the tolerance values were always greater than 0.2, so the absence of multicollinearity was verified. The Durbin-Watson statistical test was used to test the independence of the residuals. The statistic always lies between 0 and 4, and the midpoint value of 2 indicates that no autocorrelation is detected in the sample. In this case, the result was equal to 1.517 and, therefore, was within the acceptability range of (1.5; 2.5).
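The collinearity and autocorrelation checks described above can be sketched in a few lines. The following is a hedged illustration with NumPy on synthetic data, not the SPSS procedure used in the study; the function names `tolerance_and_vif` and `durbin_watson` are hypothetical. Tolerance is computed as 1 - R² from regressing each predictor on the others, VIF as its reciprocal, and the Durbin-Watson statistic as the sum of squared successive residual differences divided by the sum of squared residuals.

```python
import numpy as np


def tolerance_and_vif(X):
    """Tolerance (1 - R_j^2) and VIF (1 / tolerance) for each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tol = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on all other columns plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        tol[j] = 1.0 - r2
    return tol, 1.0 / tol


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest no autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


# Synthetic, mutually independent predictors (not the study data).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 3))
tol_demo, vif_demo = tolerance_and_vif(X_demo)
dw_demo = durbin_watson(rng.normal(size=300))

# Thresholds quoted in the text: VIF < 10, tolerance > 0.2, DW in (1.5, 2.5).
print("VIF below 10:", bool(np.all(vif_demo < 10)))
print("tolerance above 0.2:", bool(np.all(tol_demo > 0.2)))
print("Durbin-Watson within (1.5, 2.5):", 1.5 < dw_demo < 2.5)
```

In practice the residuals passed to `durbin_watson` would be those of the fitted MLR model, taken in the order of the observations.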
To evaluate the variance of the residuals, a scatter plot of the standardized predicted values (x-axis) against the standardized residuals (y-axis) was created. Figure 2 shows the obtained result. The scatter plot (Figure 2) shows that the data are randomly distributed around zero, so the homoscedasticity hypothesis is not violated and is therefore verified. The P-P plot (Figure 3) shows how well the available dataset fits the specific probability distribution. With this tool, the empirical cumulative distribution of the data is compared with the assumed theoretical cumulative distribution function. Although the curve did not exactly retrace the ideal line, the slight variation did not affect the good performance of the model. The last hypothesis to be verified was the absence of outliers that affect the estimate of the parameters βi. To accomplish this, Cook's distance was calculated for each observation. Figure 4 shows the obtained result. For each observation, Cook's distance was less than 1. Therefore, there were no outliers that caused bias.
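The outlier check with Cook's distance can be illustrated with the sketch below. It is a minimal NumPy version under stated assumptions, not the SPSS output used in the study: the function name `cooks_distance` is hypothetical, and the data are a toy example in which one deliberately extreme point is appended to an otherwise exactly linear series. The distance is computed as D_i = e_i² h_i / (p · MSE · (1 - h_i)²), where e_i is the residual, h_i the leverage, and p the number of fitted coefficients including the intercept.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance of every observation for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    p = A.shape[1]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Leverages: diagonal of the hat matrix H = A @ pinv(A).
    leverage = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    mse = (resid @ resid) / (n - p)
    return resid ** 2 / (p * mse) * leverage / (1.0 - leverage) ** 2


# Toy data: ten points on a line plus one extreme, high-leverage point.
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
x_out = np.append(x, 30.0)
y_out = np.append(y, 0.0)

d = cooks_distance(x_out, y_out)
flagged = np.flatnonzero(d > 1.0)  # threshold of 1, as used in the text
print("observations with Cook's distance > 1:", flagged)
```

With the threshold of 1 quoted in the text, an empty `flagged` array would correspond to the situation reported for the study data.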
After this verification phase, the MLR model was implemented. Table 4 shows the goodness of fit of the model. The R² value was greater than the set threshold value of 0.5. The model was well suited to the problem under consideration and could be a valid preliminary tool. Table 5 shows the model coefficients and the t-test results at a significance level of 0.05. The test showed that, of the selected independent variables, age, pre-operative LOS, CHF, and PVD significantly influenced LOS. For all of them, the coefficient was positive, and the highest was the one associated with PVD. After analyzing the results of the MLR model, further regression algorithms were implemented (Table 6). Among these, the best was the rational quadratic GPR, but its R² value was still lower than that obtained with the MLR model. The diagrams of the predictions made, with the relative errors for each algorithm, are shown in Figures 5-8.
In this study, the data provided by the C.O.U. of the Cardiology unit at the University Hospital "San Giovanni di Dio e Ruggi d'Aragona" of Salerno (Italy) were analyzed. Specifically, the information was related to the flow of patients who underwent open-heart mitral valve repair surgery without replacement from 2010 to 2020, for a total of 379 records. Starting from the extraction of a limited set of information from hospital discharge forms, a group of independent variables was obtained (gender, age, pre-operative LOS, acute myocardial infarction (AMI), congestive heart failure (CHF), cerebrovascular disease (CeVD), peripheral vascular disease (PVD), chronic obstructive pulmonary disease (COPD), diabetes, renal disease (RD), 2 procedures, 3 procedures, and 4 procedures) and used to predict the total LOS. As in the previous study [1], an MLR model was implemented. The obtained MLR model had an R² value equal to 0.720, and among the variables, those that most influenced the LOS were age, pre-operative LOS, CHF, and PVD.
Compared with the short paper, where the model was built on 70 of the 379 records used here and reached R² = 0.864, the goodness of fit of the present model is lower. The earlier model, however, showed no significant influence of any variable other than the pre-operative LOS, which is linked to the LOS by definition. Undergoing multiple heart surgeries was not significantly correlated with LOS. In this case, the greater number of records made it possible to identify the classes of patients for which greater organizational effort is required. In addition to the MLR model, further regression algorithms were tested (linear support vector machine, LSVM; narrow neural network, NNN; rational quadratic Gaussian process regression, GPR; and random forest, RF). Of these, the best was rational quadratic GPR, with R² = 0.690. Its performance, however, was lower than that of the MLR model, which ultimately remains the best model. The main limitation of this work is that it does not consider the impact on LOS of specific cardiac procedures with the same complexity as the one under examination, such as coronary revascularization and tricuspid annuloplasty. Future developments will include overcoming this limitation and validating the models, both by updating the dataset with the data for the year 2021 and by analyzing data from different populations. In addition, further regression and classification models may be implemented.
In this work, the dataset consisting of 379 patients who underwent open-heart mitral valve repair surgery without replacement from 2010 to 2020 at the C.O.U. of the Cardiology unit at the University Hospital "San Giovanni di Dio e Ruggi d'Aragona" of Salerno (Italy) was analyzed through five different regression models/algorithms. An MLR model, linear support vector machine, narrow neural network, rational quadratic Gaussian process regression, and random forest were implemented.
Among these, the best was the MLR model, with R² = 0.720. Finally, the statistical analysis showed that the variables that significantly affected the total LOS were age, pre-operative LOS, congestive heart failure, and peripheral vascular disease.
The datasets generated and/or analyzed during the current study are not publicly available for privacy reasons but are available from the corresponding author on reasonable request.
The authors declare that they have no competing interests.
[1] Multiple Regression and Machine Learning to investigate factors influencing the length of hospital stay after valvuloplasty
[2] Valvular Heart Disease
[3] Epidemiology of valvular heart disease in the adult
[4] The preoperative assessment of patients with valvular heart disease as a comorbidity
[5] A prospective survey of patients with valvular heart disease in Europe: The Euro Heart Survey on Valvular Heart Disease
[6] Valvular Heart Disease in Patients ≥80 Years of Age
[7] Palliative care in end-stage valvular heart disease
[8] 2021 ESC/EACTS Guidelines for the management of valvular heart disease: Developed by the Task Force for the management of valvular heart disease of the European Society of Cardiology (ESC) and the European Association for Cardio-Thoracic Surgery (EACTS)
[9] Valvular heart disease in patients with chronic kidney disease
[10] Writing Committee Members Focused Update Incorporated Into the ACC/AHA 2006 Guidelines for the Management of Patients with Valvular Heart Disease
[11] Linking Health Outcomes and Resource Efficiency for Hospitalized Patients: Do Physicians with Low Mortality and Morbidity Rates Also Have Low Resource Expenditures?
[12] Variation in length of stay as a measure of efficiency in Manitoba hospitals
[13] Lean Six Sigma Approach for Reducing Length of Hospital Stay for Patients with Femur Fracture in a University Hospital
[14] Management of the Diabetic Patient in the Diagnostic Care Pathway
[15] Six Sigma Approach for a First Evaluation of a Pharmacological Therapy in Tongue Cancer
[16] Implementing fast track surgery in hip and knee arthroplasty using the lean Six Sigma methodology
[17] Application of the Lean Six Sigma approach to the study of the LOS of patients who undergo laparoscopic cholecystectomy at the San Giovanni di Dio and Ruggi d'Aragona University Hospital
[18] DMAIC Approach for the Reduction of Healthcare-Associated Infections in the Neonatal Intensive Care Unit of the University Hospital of Naples 'Federico II'
[19] Application of DMAIC Cycle and Modeling as Tools for Health Technology Assessment in a University Hospital
[20] Health technology assessment (HTA) of optoelectronic biosensors for oncology by analytic hierarchy process (AHP) and Likert scale
[21] HTA (Health Technology Assessment): A Means to Reach Governance Goals and to Guide Health Politics on the Topic of Clinical Risk Management
[22] Analytic Hierarchy Process (AHP) in Dynamic Configuration as a Tool for Health Technology Assessment (HTA): The Case of Biosensing Optoelectronics in Oncology
[23] An Innovative Contribution to Health Technology Assessment
[24] Assessment of proteinuria level in nephrology patients using a machine learning approach
[25] Comparison of machine learning algorithms to predict length of hospital stay in patients undergoing heart bypass surgery
[26] A comparison of different Machine Learning algorithms for predicting the length of hospital stay for patients undergoing cataract surgery
[27] Modelling the hospital length of stay for patients undergoing laparoscopic cholecystectomy through a multiple regression model
[28] Application of Supply Chain Management at Drugs Flow in an Italian Hospital District
[29] An Innovative Business Model for a Multi-echelon Supply Chain Inventory Management Pattern
[30] Developing models for the prediction of hospital healthcare waste generation rate
[31] Healthcare impact of COVID-19 epidemic in India: A stochastic mathematical model
[32] Initial Factors Influencing Duration of Hospital Stay in Adult Patients with Peritonsillar Abscess
[33] Prediction of water quality index using artificial neural network and multiple linear regression modelling approach in Shivganga River basin
[34] IBM SPSS Statistics for Windows