key: cord-0896476-u40u75f3 authors: Liuzzi, Piergiuseppe; Campagnini, Silvia; Fanciullacci, Chiara; Arienti, Chiara; Patrini, Michele; Carrozza, Maria Chiara; Mannini, Andrea title: Predicting SARS-CoV-2 infection duration at hospital admission: a deep learning solution date: 2022-01-07 journal: Med Biol Eng Comput DOI: 10.1007/s11517-021-02479-8 sha: 00c9034840a4b8afa9ed92fd6b2fd4d5d2ba3ec0 doc_id: 896476 cord_uid: u40u75f3

COVID-19 cases are increasing around the globe, with almost 5 million deaths. We propose here a deep learning model capable of predicting the duration of the infection from information available at hospital admission. A total of 222 patients were enrolled in our observational study. Demographic and anamnestic data, COVID-19 signs and symptoms, COVID-19 therapy, hematochemical test results, and prior therapies administered to patients are used as predictors. A set of 55 features, all of which can be taken in the first hours of the patient's hospitalization, was considered. Different solutions were compared; the best performance was achieved by a sequential convolutional neural network-based model merged in an ensemble with two different meta-learners linked in cascade. We obtained a median absolute error of 2.7 days (IQR = 3.0) in predicting the duration of the infection; the error was evenly distributed across the infection duration range. This tool could preemptively give an outlook of the COVID-19 patients' expected path and the associated hospitalization effort. The proposed solution could be viable in tackling the huge burden and the logistic complexity faced by hospitals and rehabilitation centers during the pandemic waves. With data taken at admission and passed through a PCA-based feature selection, a k-fold cross-validated CNN-based model was implemented.
After external testing, a median absolute error of 2.7 days [IQR = 3 days] was obtained.

Since October 2020, almost 300 million people have been infected by SARS-CoV-2, with more than 5 million deaths (World Health Organization, WHO, reports). It is well known that severe cases require treatment in intensive care units. This can lead to overcrowding of hospitals and rehabilitation settings [1], which is currently posing a global burden on healthcare systems [2, 3]. As already investigated for other pathological conditions [4, 5], artificial intelligence (AI) is being applied to extract predictive information, with the potential to revolutionize the approach to tackling COVID-19 [6, 7]. Prediction models capable of relating patients' characteristics to the evolution of the disease and to patients' possible responses to it can provide helpful support to the decision-making process in clinical environments [8-11]. As for COVID-19 prognostic models, the literature mainly focuses on mortality risk, assessing it at admission [11] or after a week [12], or on predicting the discharge setting [13]. To the best of our knowledge, only two articles attempted to find a relation between length of stay and predictive features [14, 15]. Wang et al. [14] showed that patients at high risk and at low risk (identified using the features with the most predictive power in their diagnostic model) differed significantly in length of stay. Qi et al. [15] instead targeted short-term hospital stay (< 10 days) and long-term hospital stay (> 10 days), obtaining a binary classification. In many pathological contexts, the length of stay is often addressed as an outcome, considered both an indirect index of the severity of the disease and essential information for hospital administration.
However, especially during pandemic outbreaks, length of stay appears to be highly affected by external factors such as personnel/bed availability and local differences in hospital management rules. Focusing on infection duration appears to be a promising solution in this regard, since it overcomes the limitations of length of stay while keeping the aforementioned dual role. To our knowledge, and according to updated systematic reviews [16], no existing research addresses the specific problem of infection duration by means of data-driven regression models. In our view, this knowledge could lead to an innovative tool to be implemented in the electronic health record (EHR), giving a significant advantage in managing both the clinical and the administrative aspects of COVID-19 patients' treatment. Indeed, it could help practitioners deliver personalized care to patients and support the management of beds, intensive care units, and ventilation units. To fill this gap, starting from a dataset of 222 COVID-19 patients treated in the Fondazione Don Gnocchi hospital network, we compared different machine learning solutions with the aim of predicting the duration of the infection. The best-performing solution was based on a convolutional neural network (CNN) model (namely, CNN-core) and was obtained in four steps: (1) training of the CNN-core, (2) combining the cores in an ensemble, (3) adding two separate meta-learners (logistic regression and fully connected neural network), and lastly, (4) voting among the meta-learners' predictions. The achieved accuracy (median infection duration absolute error of 2.7 days, IQR = 3.0 days) looks promising for the implementation of a decision support tool to be integrated with the EHR of the hospital network.

An observational study was performed including 518 patients who were discharged from 16 Fondazione Don Gnocchi centers involved in the care of COVID-19 patients.
Inclusion criteria were based on current or previous infection by SARS-CoV-2 at hospital admission; thus, all patients positive to COVID-19 (age ≥ 18 years) were enrolled in the study. All patients were diagnosed with COVID-19 strictly following WHO guidelines [17]. Positive cases were verified via molecular tests at most every 10 days. Given the wide spectrum of cases, patients in the database were first classified as follows: type 1, already positive to SARS-CoV-2 before admission; type 2, turned positive during their stay; type 3, hospitalized after the infection for rehabilitation purposes. Given that the target of this study was the estimation of the duration of the infection from hospital admission data, only type 1 patients were retained for further analyses (222 patients). These data refer to the first pandemic wave in Italy and were retrospectively acquired from April to September 2020. Especially during the first pandemic wave, the emergency scenario and the lack of treatment protocols for an unknown disease contributed to increasing the heterogeneity among patients' characteristics. For instance, an aspect of interest for our study was the time difference between the admission to the IRU and the first negative test with no subsequent positive ones (median 12 days, IQR = 20.5). These numbers further supported targeting the infection duration as the outcome, considering it a more reliable and less region-dependent proxy of hospitalization than the length of hospital stay. The infection duration, measured in days, was calculated as the difference between the date of the first positive molecular test and the date of the first following negative one without subsequent positives. The infection duration was calculated only once at least two negative results had been collected. Indeed, this variable carries more general information than the length of stay and can be more versatile.
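The definition of the outcome given above can be illustrated with a minimal sketch. The function name, the `SwabResult` alias, and the example dates are hypothetical, and the requirement of "at least two negative results" is interpreted here, as an assumption, as two negative tests collected after the last positive one; the source does not spell this out.

```python
from datetime import date
from typing import List, Optional, Tuple

# One molecular test: (date of the swab, True if the result was positive).
SwabResult = Tuple[date, bool]


def infection_duration_days(
    tests: List[SwabResult], min_negatives: int = 2
) -> Optional[int]:
    """Days from the first positive test to the first negative test with no later positive.

    Returns None when the duration cannot be computed: no positive test, or fewer
    than `min_negatives` negative tests after the last positive (assumed reading).
    """
    ordered = sorted(tests, key=lambda t: t[0])
    positive_dates = [d for d, is_positive in ordered if is_positive]
    if not positive_dates:
        return None
    first_positive, last_positive = positive_dates[0], positive_dates[-1]
    # Only negatives after the last positive have "no subsequent positives".
    closing_negatives = [
        d for d, is_positive in ordered if not is_positive and d > last_positive
    ]
    if len(closing_negatives) < min_negatives:
        return None
    return (closing_negatives[0] - first_positive).days


example_tests = [
    (date(2020, 4, 1), True),
    (date(2020, 4, 10), False),  # followed by a positive, so it does not count
    (date(2020, 4, 18), True),
    (date(2020, 4, 28), False),
    (date(2020, 5, 5), False),
]
print(infection_duration_days(example_tests))  # 27
```

In this made-up example the negative test of 10 April is ignored because a positive test follows it, so the outcome is counted from 1 April to 28 April, giving an infection duration of 27 days.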
In fact, the infection duration can be applied independently of the differences in healthcare organization across regions/countries and independently of the specific emergency status of the healthcare system at the time of the recovery. The study protocol was approved by the Ethical Committee of the IRCCS Fondazione Don Carlo Gnocchi on 16/04/2020.

A structured data collection was designed on REDCap (Research Electronic Data Capture, Vanderbilt University, 2021 West End Avenue, Nashville, TN 37235, USA), an online software for database development. The database was structured so as to collect each evaluation or assessment in four distinct events: admission, during the recovery, discharge, and, only in the case of type 3 subjects, the acute phase of the disease. However, for specific data groups, such as the results of the diagnostic tools, provision was made to collect information about any repeated test, independently of the planned events. More than 800 features were collected, and the complete dataset includes demographic data, symptoms and vital signs, hematochemical and hemogasanalysis values, instrumental data (RX, CT, EEG, etc.), multiple assessments (cognitive, psychological, functional, and nutritional), and prior clinical data [18]. The median age of the 222 patients included in the study was 76 (IQR = 19), and 46% of the patients were male. The median infection duration was 31 days (IQR = 26), and the values fell in a range between 11 and 97 days (Fig. 1, panel A). An almost linear relationship between infection duration and number of molecular tests can be observed (Fig. 1, panel B).

Preliminary statistical analyses were carried out in SPSS (v. 26, SPSS Inc., Chicago). They consisted of univariate analyses to understand the influence of each selected predictor on the infection duration.
In particular, Pearson correlations were applied to numerical variables, while the non-parametric Mann-Whitney test was applied to dichotomous variables. The data preprocessing and the machine learning (ML) models were implemented in Matlab (R2019b, The MathWorks, Inc., Natick, MA, USA). The deep learning (DL) models were written in Python 3.0 (Python Software Foundation) using the TensorFlow library. Pseudonymized data can be made available upon request to researchers to validate and reproduce results. For the comparison of the different machine learning methods, we used linear regression, a simple and interpretable model, as the benchmark. The median absolute error of each solution was compared with the linear regression error on the same population by means of Wilcoxon signed-rank tests. Moreover, an effect size for this comparison was calculated as the ratio between the number of negative differences between the measurements and the total sample size.

As already pointed out, the initial dataset was composed of 829 features. First, its dimensionality was reduced by keeping the features taken in the first 8 h from admission and those with a fill percentage higher than 30% of the column length (total subjects). Second, via a literature search of relevant correlates, we included supports, signs and symptoms, and clinical and hematochemical data. Furthermore, pharmacological therapies (both COVID related and non-COVID related) were included in the feature set, given their availability in the dataset at the time of admission. Missing data in the training, validation, and test sets were replaced by the mean (for numerical data) or the mode (for categorical data) of the corresponding variable in the training set, reaching a full set of 55 features (Table 1). A further reduction of data dimensionality was then achieved by principal component analysis (PCA).
Five principal components were retained, accounting for > 99% of the variance. The same PCA transform was then applied to the test set.

In order to test different approaches to the problem, ML (linear regressions, random forests) and DL (convolutional neural network) models were compared. We tackled the problem with an approach of growing complexity. Regularized linear regression and random forests were considered because of their simpler interpretability, which is a non-negligible aspect in clinical practice. More complex architectures (CNN) were subsequently developed to increase the accuracy and reliability of the tool. The random forest model is an ensemble learning method using a regression tree as template learner [19, 20]. In such a model, a set of binary decision trees is merged into a single ensemble, and the input features of each tree are subsamples of the available features. To define the model, both the minimum number of leaf node observations in each tree and the number of predictors to sample ($n_{PTS}$) at each node need to be selected. CNNs are a type of neural network capable of hierarchically assembling more structured patterns from simpler ones [21]. The CNN-core model we chose was composed of sequential layers of different types (Fig. 2a). Together with the 1-D convolutional layers, 1-D max pooling layers were implemented. The aim was to reduce data dimensionality by combining the outputs of neuron clusters at the prior layer into a single neuron in the subsequent layer. It has been previously demonstrated that such a convolution-pooling, fully connected structure can successfully process both images [22] and one-dimensional data [23, 24].

Each model was trained using the PCA-transformed data as input and the infection duration as target. A train-validation-test split was done to validate the model over different subsets of subjects and parameters. The validation strategy adopted was K-fold validation with $N_{folds} = 5$.
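As a minimal sketch of such a split, the snippet below holds out a test portion and cuts the remaining subjects into five folds. It assumes the 222 subjects and the 15% hold-out reported in this section; the function name, the seed, and the striding scheme used to build the folds are illustrative choices, not the authors' implementation.

```python
import random
from typing import List, Tuple


def split_indices(
    n_subjects: int, test_fraction: float, n_folds: int, seed: int
) -> Tuple[List[int], List[List[int]]]:
    """Shuffle subject indices, hold out a test set, and cut the rest into folds."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_subjects))
    rng.shuffle(indices)
    n_test = round(n_subjects * test_fraction)
    held_out, rest = indices[:n_test], indices[n_test:]
    # Striding by n_folds gives folds whose sizes differ by at most one subject.
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return held_out, folds


held_out, folds = split_indices(n_subjects=222, test_fraction=0.15, n_folds=5, seed=0)
print(len(held_out), sorted(len(fold) for fold in folds))  # 33 [37, 38, 38, 38, 38]
```

Changing the seed changes which subjects end up in each portion but not the sizes, which is why repeating the whole procedure with different randomizations is a natural way to average out the effect of one particular split.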
The held-out test portion corresponds to 15% of the samples (33 subjects); hence, each fold of the cross-validation was composed of 37 or 38 subjects. Additionally, the whole process was repeated 10 times, and the aggregated results were reported to reduce the effect of the randomized parameter initialization (CNN) and of the randomized test split. In the ML models, the regularization parameter $\lambda$ for the linear regression, as well as the complexity (depth) of the trees and the number of predictors to sample at each node for the random forest, were optimized by grid search. For the CNN, given the large number of hyper-parameters to be chosen, a more complex grid search was conducted. The parameters involved were the adaptive moment estimation (Adam) optimizer learning rate $l_r$, the number of neurons in the first and second fully connected layers $n_{neurons,1}$ and $n_{neurons,2}$, the number of filters in the two convolutional layers $n_{filters,1}$ and $n_{filters,2}$, and the number of training epochs $n_{epochs}$. The neuron activation functions were chosen to be Rectified Linear Units (ReLUs), with $f_{ReLU}(a) = \max(0, a)$. The third and last fully connected layer (FCL) uses a linear activation function $f_{Lin}(a) = a$. The range of each variable is shown in Table 2. For each permutation ($N_{perm} = 4 \times 5 \times 4 \times 3 \times 3 \times 4 = 2880$ different model configurations), the optimization process was run 5 times and the results were aggregated. The final configuration was chosen as that of the model with the minimum validation error.

In order to improve the performance of the CNN-core model, a stacked ensemble learning approach was implemented. This was done by concatenating the predictions of the individual CNN-core models ($N_{CoreCNN} = 5$) into a second feature vector (Fig. 2a). The individual CNN-core models differed only in their starting weight initialization and in the random number generator seed of the Adam stochastic optimization.
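The concatenation step can be sketched as follows. The helper name and the prediction values are hypothetical; the only elements taken from the text are the five core models and the idea that their predictions become the columns of a new feature matrix, with one row per patient.

```python
from typing import List, Sequence


def concatenate_core_predictions(
    core_predictions: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Stack per-model prediction vectors column-wise (one row per patient)."""
    n_patients = len(core_predictions[0])
    if any(len(pred) != n_patients for pred in core_predictions):
        raise ValueError("every core model must predict the same patients")
    # zip(*...) transposes "one list per model" into "one list per patient".
    return [list(row) for row in zip(*core_predictions)]


# Made-up infection durations (days) predicted by 5 core models for 3 patients.
core_predictions = [
    [30.0, 12.0, 45.0],
    [32.0, 11.0, 47.0],
    [29.0, 14.0, 44.0],
    [31.0, 13.0, 46.0],
    [33.0, 12.0, 45.0],
]
x_cat_1 = concatenate_core_predictions(core_predictions)
print(x_cat_1[0])  # [30.0, 32.0, 29.0, 31.0, 33.0]
```

Each row of `x_cat_1` then holds the five core estimates for one patient, which is the input that a second-stage learner can combine.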
This vector, $X_{cat,1}$, is fed to learning stage 2, also called the meta-learner, in order to reduce the inductive training bias and the effect of the random weight initialization of the single sub-models on test predictions. Two different meta-learners were implemented: (i) a logistic regression (CNN-LR) and (ii) a fully connected neural network (CNN-MLP). The meta-learner (stage 2) training followed the training of each of the models in learning stage 1. For the logistic regression (LR), the learning stage 1 weights were kept fixed at the final weights of their own training phase while the meta-learner was trained, since no back-propagation training is required for logistic regressions. Conversely, since the MLP does require back-propagation, the learning stage 1 weights were re-trained together with it, starting from the final weights of their previous training phase (Fig. 2b).

Unlike meta-learning, during a voting process each model output is considered with the same weight. In regression tasks, the performance of the overall model can be increased by averaging the predictions of the single sub-models, which balances their offsets [25]. Hence, CNN-MLP and CNN-LR predictions were averaged to obtain the final result via

$$y_{pred} = \frac{y_{pred,CNN-MLP} + y_{pred,CNN-LR}}{2}.$$

Furthermore, to improve the re-training and the voting process, and hence reduce the bias of each CNN-core model, we removed from each of the 5 core test predictions the median prediction error of its training set. The corrected predictions were then fed again to the meta-learning stage 2. This resulted in an improved approach, allowing the subsequent procedures (ensembling, meta-learning, and voting) to yield more accurate estimates.
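The detrending and voting steps can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function names and numbers are made up, and the "median prediction error" is taken as the signed difference between prediction and target on the training set, which the text does not state explicitly.

```python
from statistics import median
from typing import List, Sequence


def detrend(
    test_pred: Sequence[float],
    train_pred: Sequence[float],
    train_true: Sequence[float],
) -> List[float]:
    """Subtract the median signed training error (prediction - target) from test predictions."""
    offset = median(p - t for p, t in zip(train_pred, train_true))
    return [p - offset for p in test_pred]


def vote(pred_mlp: Sequence[float], pred_lr: Sequence[float]) -> List[float]:
    """Average the two meta-learner outputs with equal weight."""
    return [(a + b) / 2 for a, b in zip(pred_mlp, pred_lr)]


# Made-up values in days: this core model over-predicts by a median of 2 days on training data.
print(detrend([33.0, 50.0], train_pred=[30.0, 42.0, 25.0], train_true=[28.0, 38.0, 24.0]))
# [31.0, 48.0]
print(vote([30.0, 40.0], [34.0, 44.0]))  # [32.0, 42.0]
```

Using the median rather than the mean of the training errors keeps the correction robust to a few patients with very long infections.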
From the preliminary biostatistical analyses, Pearson correlations with the infection duration were statistically significant for the Cumulative Illness Rating Scale (CIRS) [26], expressed as severity and comorbidity indexes (p-values of 0.001 and 0.003, respectively) (Table 1, Fig. 3). As for the features related to therapies, patients with an ongoing therapy with tocilizumab (p = 0.033), vitamins (p = 0.02), anticoagulants (p = 0.044), calcium channel blockers (p = 0.019), and anxiolytics-antidepressants (weak, p = 0.054) showed a statistically significant longer duration of the infection. Finally, concerning vital support aids, only the presence of the tracheal cannula showed a weak association, with a p-value of 0.053 (Fig. 3).

As for the automatic prediction of the outcome, after optimizing the hyper-parameters of all the tested methodologies (Table 2), the linear regression resulted in a median absolute error of 13.23 days (IQR = 10.19), while the random forest, with 15.39 days (IQR = 13.95), performed slightly worse (Fig. 4). The grid search for the CNN-core hyper-parameter optimization resulted in a best configuration with train, validation (fivefold), and test median AE of 11.12, 11.35, and 9.63 days, respectively. After combining 5 CNN-core models in a stacked ensemble, adding two meta-learners (LR and MLP), and voting among the two models, the median test error was 4.67 days (IQR = 5.25). However, the predictions in this case were skewed with respect to the ideal output (Fig. 5, orange markers). Detrending the CNN-core test predictions before ensembling and voting lowered the median test error of a single run to 2.5 days (IQR = 1.9). The solution based on "CNN ensemble + voting" and the one including "CNN detrended + ensemble + voting" showed significantly improved accuracies with respect to the linear regression, as confirmed by the Wilcoxon signed-rank test (Fig. 5). In the comparison between the random forest and the CNN-core model, no significant differences in performance were obtained.
Moreover, the effect sizes of the CNN model after voting (0.78) and of the detrended CNN model after voting (0.94) confirm the improvement of our model with respect to the linear regression (p < 0.001). The results aggregated over multiple repetitions of the procedure with the same hyper-parameters ($N_{run} = 10$) are summarized in Fig. 5. The median error of 2.7 days is very similar to the one obtained with only one run (2.5 days), but the IQR is higher (3.0 days for 10 runs compared to 1.9 days for 1 run). The coefficient of determination ($R^2$), calculated between real and predicted values, was positively affected by both the ensembling/voting and the detrending procedures, reaching $R^2 = 0.91$ for the final solution.

Table 2 Grid values for the optimization of the ridge linear regression (A), random forest (B), and convolutional NN (C). Subscripts refer respectively to the FCL layers (for the number of neurons) and to the convolutional layers (for the number of filters). The output FCL layer is a single-output neuron, this being a single-output regression.

In this study, a predictive model for the duration of SARS-CoV-2 infection in hospitalized patients was investigated and validated on data from 222 patients. Classical machine learning algorithms, such as optimized linear regressions and random forests, did not reach fully satisfactory performance on this problem. However, non-linear models significantly improved the prediction accuracy. Indeed, on our dataset, a model of increased complexity is needed for an accurate prediction of the clinical outcome, at the expense of reduced interpretability.
Our cross-validation results confirm that, by means of data taken in the first 8 h from patients' admission, an accurate prediction

At a time in which a complex pandemic still seems to heavily affect healthcare services, a prognostic prediction tool can support clinical decisions in hospitals or healthcare facilities by providing data-driven elements for better time planning and hospital organization [27-29]. As already stated, we focused on predicting the infection duration. To our knowledge, no previous study involves data-driven regression models targeting the infection duration for COVID-19 or any other illness. Some similar solutions reported in the literature concern the estimation of the length of stay (LoS) (Table 3). Nemati et al. [30], by means of survival analysis, targeted the in-hospital length of stay of COVID-19 patients, showing that the discharge probability reaches 1 after ~ 27 days. Qi et al. [15], instead, focused on a binary output, short- versus long-term hospital stay (area under the curve, AUC = 0.97). By translating our best-performing regression solution into a similar binary classifier using the target median (31 days) as the threshold, we achieved an AUC of 0.98. Ebinger et al. [31] similarly classified patients according to a LoS threshold of 8 days, obtaining an AUC of 0.819. Lastly, Chiari et al., starting from more than 1000 patients and multimodal sources (blood exams and clinical variables), obtained a mean absolute error of 4.11 days in predicting LoS [32]. The latter manuscript presents an internally validated model, trained on a dataset with a median LoS of 14 days, using data acquired up to the first 8 days after admission. Finally, Setti et al. developed a linear-kernel support vector regression model targeting post-COVID rehabilitation LoS [27].
The model, trained on data from the first pandemic wave, was tested with data from the second pandemic wave, achieving a median absolute prediction error of ~ 7 days. Regression models targeting length of stay in specific wards (MAE ~ 1 day, range: 2-7 days [33]) and in emergency units (RMSE = 13.35 days [34]) show the complexity of this prediction by means of regression methods. Even if the comparison is not entirely fair, since our patient spectrum is narrower (only COVID-19, as opposed to the heterogeneity of patients in an emergency unit), we achieved a significant decrease in the prediction error.

Some relevant limitations of our work need further discussion. In addition to the low interpretability of the model, another limitation is that the infection duration could be altered by the advent of new therapies and treatments or by the spread of SARS-CoV-2 variants. As soon as such information becomes available, a redefinition of the solutions will be necessary. Another limitation is that our dataset was acquired in hospitals, involving symptomatic patients only. In this regard, given the simple nature of the input features, it is reasonable to assume that a general solution could be achieved by extending the pool of available data to the overall population. Still, the cohort heterogeneity in terms of duration of infection (from ~ 10 days up to ~ 80 days) is a point in favor of the generalizability of the results, which could be improved by further patients' stratification on a larger

Indeed, a strength of this model is that it is developed on very simple and accessible data, mostly available in the clinical routine and easily collectable in digital form. The integration of such a model into the clinical workflow can be straightforward, through a simple graphical user interface. Even non-medical personnel can transfer the requested data into the tool right after admission and obtain an estimate of the duration of the infection.
This allows us to consider our tool "low cost" for the hospital, while offering a clinically relevant level of accuracy in the estimation of infection duration.

In conclusion, we reported the development and validation of a predictive model based on data collected from Fondazione Don Gnocchi centers (Italy) during the first COVID-19 pandemic wave. This work confirms that deep learning and machine learning can be viable tools for predicting clinical outcomes in order to support clinical decision-making processes. Given the simple measurement of the input data, the model is easily translatable into clinical practice. Further work will aim to perform an external prospective validation and a sensitivity analysis of the prediction with respect to COVID-19 therapies and SARS-CoV-2 variants. To bring the findings of the study into clinical practice, user-friendly software is currently under development for future integration into daily clinical practice.

Funding The study was supported by the Department of Excellence in Robotics & AI, Scuola Superiore Sant'Anna, and by the Italian neuroscience and neurorehabilitation research hospitals network ("Rete IRCCS delle Neuroscienze e della Neuroriabilitazione"), which funded the study jointly with the "Ricerca corrente RC2020 program" and the 5 × 1000 funds AF2018: "Data Science in Rehabilitation Medicine" and AF2019: "Study and development of biomedical data science and machine learning methods to support the appropriateness and the decision-making process in rehabilitation medicine" by the Italian Ministry of Health.
ten Hove e R (2020) The COVID-19 rehabilitation pandemic Critical care crisis and some recommendations during the COVID-19 epidemic in China COVID-19: a novel coronavirus and a novel challenge for critical care A review of tree-based prognostic models Stroke prognostic scores and data-driven prediction of clinical outcomes after acute ischemic stroke Prediction models for COVID-19 clinical decision making Leveraging data science to combat COVID-19: a comprehensive review Deepr: a convolutional net for medical records Risk prediction with electronic health records: a deep learning approach Feng e M (2020) Reinforcement learning for clinical decision support in critical care: comprehensive review A web visualization tool using T cell subsets as the predictor to evaluate COVID-19 patient's severity Prognostic modeling of COVID-19 using artificial intelligence in the United Kingdom: model development and validation Combined use of the neutrophil-to-lymphocyte ratio and CRP to predict 7-day disease severity in 84 hospitalized patients with COVID-19 pneumonia: a retrospective cohort study A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis Machine learning-based CT radiomics model for predicting hospital stay in patients with pneumonia associated with SARS-CoV-2 Infection: a multicenter study Role of machine learning techniques to tackle the COVID-19 crisis: systematic review Clinical management of severe acute respiratory infection when novel coronavirus (2019-nCov) infection is suspected: interim guidance The methodology of a "living" COVID-19 registry development in a clinical context Arcing classifier Random forests Neocognitron: a hierarchical neural network capable of visual pattern recognition Qureshi e AS (2020) A survey of the recent architectures of deep convolutional neural networks One-dimensional convolutional neural networks for spectroscopic signal regression: feature extraction based on 1D-CNN is proposed and validated 
Speech emotion recognition using deep 1D & 2D CNN LSTM networks On meta-learning for dynamic ensemble selection Cumulative Illness Rating Scale Predicting post COVID-19 rehabilitation duration with linear kernel SVR Data-driven prediction of decannulation probability and timing in patients with severe acquired brain injuries Machine learning for clinical outcome prediction Machine-learning approaches in COVID-19 survival analysis and dischargetime likelihood prediction using clinical data A machine learning algorithm predicts duration of hospitalization in COVID-19 patients Length of stay prediction for Northern Italy COVID-19 patients based on lab tests and X-ray data Length of hospital stay prediction at the admission stage for cardiology patients using artificial neural network Predicting hospital length of stay for accident and emergency admissions Control, he works on cerebellar networks and ML for clinical outcome prediction 36. SC is a PhD student at Fondaz. Don Gnocchi and Scuola Sant'Anna. She had her bachelor in Mechatronics Eng. and master in Bionics Eng Clinical Trials Unit coordinator at IRCCS Fondaz. Don Gnocchi Florence, she monitors experimental protocols in central-southern area Coordinator of Cochrane Rehabilitation and of the Clinical Trials Unit at IRCCS Fondazione Don Gnocchi-North Area. She works on CTs methodology in rehabilitation research 39. MP is a researcher at Fondazione Don Carlo Gnocchi and a General Practitioner. Since 2018, he has been part of Cochrane Rehabilitation Headquarters where he followed several projects 40. MCC is Prof. of Industrial Bioeng. at Scuola Sant'Anna coordinating the NeuroRobotics Area and President of CNR AM is Research Engineer at IRCCS Fondaz. Don Gnocchi and affiliate with Scuola Sant'Anna. 
His interests cover machine learning methods for signal processing and clinical outcome prediction.