key: cord-0955568-3y88tl92
authors: Sangla, F.; Marchi, E.; Assouline, B.; Le Terrier, C.; Sgardello, S.; Pugin, J.; Criton, G.; Legouis, D.
title: Unsupervised clustering reveals phenotypes of AKI in ICU Covid19 patients
date: 2022-03-13
journal: nan
DOI: 10.1101/2022.03.11.22272259
sha: ae9671e94c1a4921dbe1eb92a371cab3fc6b4f3e
doc_id: 955568
cord_uid: 3y88tl92

Background: Acute Kidney Injury (AKI) is a very frequent condition, occurring in about one in three patients admitted to an intensive care unit (ICU). AKI is a syndrome defined as a sudden decrease in glomerular filtration rate. However, this unified definition does not reflect the various mechanisms involved in AKI pathophysiology, each with its own characteristics and sensitivity to therapy. In this study, we aimed to develop an innovative machine learning-based method able to subphenotype AKI according to its pattern of risk factors.

Methods: We adopted a three-step pipeline of analyses. Firstly, we looked for factors associated with AKI using a generalized additive model. Secondly, we calculated, at the single-patient level, the importance of each identified AKI-related factor in the estimated AKI risk, in order to find each patient's main risk factors for AKI. Lastly, we clustered AKI patients according to their profile of risk factors and compared the clinical characteristics and outcomes of the clusters. We applied this method to a cohort of severe Covid19 patients hospitalized in the ICU of Geneva University Hospitals.

Results: Among the 250 patients analyzed, we found ten factors associated with AKI development. Using the individual expression of these factors, we identified three groups of AKI patients, based on the use of Lopinavir/Ritonavir, a prior history of diabetes mellitus, and baseline eGFR and ventilation. The three clusters expressed distinct characteristics in terms of AKI severity and recovery, metabolic patterns and ICU mortality.
Conclusion: We propose here a new method to phenotype AKI patients according to their most important individual risk factors for AKI development. When applied to an ICU cohort of Covid19 patients, it differentiated three groups of patients. Each expressed specific AKI characteristics and outcomes, which probably reflects a distinct pathophysiology.

Acute Kidney Injury (AKI) is a common condition in the critical care setting [1, 2]. Despite decades of research, AKI is still associated with high mortality and morbidity, even when renal function is substituted by Renal Replacement Therapy (RRT) [3-6]. AKI is defined as a sudden decrease in glomerular filtration rate, demonstrated by an increase in serum creatinine [7]. This unified definition has improved the recognition of AKI and has simplified research, healthcare management and comparisons across cohorts and centers. However, AKI is not a single clinical entity but an overarching clinical syndrome, and its definition therefore encompasses many underlying conditions and etiologies. Additionally, the high degree of heterogeneity of the Intensive Care Unit (ICU) population, which includes patients with different risk profiles, adds further complexity when considering AKI outcomes [8]. In this respect, recognizing meaningful subgroups of AKI patients may provide deeper insight into AKI pathophysiology and may also help to identify groups with differing prognoses and sensitivity to therapy [9].

From a data-driven perspective, patient sub-phenotyping is essentially a clustering problem [10, 11]. Clustering algorithms are a type of unsupervised machine learning algorithm in which no labels are known a priori; labels are instead assigned based on inherent similarities between points. A critical step in clustering is the representation of the data, i.e. the construction of the dataset to which the clustering is applied.
Previous studies on AKI sub-phenotyping have defined patients according to diagnostic codes [12], trajectories of serum creatinine [13], patterns of AKI reversal [14] or clinical and biological data recorded at ICU admission [15] or during AKI [16, 17]. However, those strategies do not allow for the formulation of any hypothesis based on the pathophysiological mechanisms involved in different AKI phenotypes. In addition, the high number of features used to classify patients makes it difficult to recognize these phenotypes at the bedside in current practice. In this study, we aimed to develop an innovative pipeline of analyses to identify, in an unsupervised manner, distinct phenotypes of AKI in ICU Covid19 patients, based on their pattern of AKI-associated factors.

(TGO), troponin levels, serum creatinine and eGFR), severity scores (APACHE, SAPS, SOFA) and the FiO2. Once patients were intubated, we recorded the initial respiratory parameters (PaO2/FiO2 ratio, PEEP and plateau pressure levels, compliance, tidal volume, duration from symptoms or hospitalisation to intubation, respiratory rate before intubation). Finally, we screened the following variables for the entire ICU stay: the need for invasive mechanical ventilation, neuromuscular blocking agents (NMBA), extracorporeal membrane oxygenation (ECMO), norepinephrine, antibiotics and their total duration, the need for prone positioning and the number of prone sessions, and the use of Lopinavir/Ritonavir (LPV/r), hydroxychloroquine, azithromycin, remdesivir, anakinra, dexamethasone and inhaled nitric oxide. At the renal level, we collected all the serum creatinine values recorded during the hospital stay, as well as the need for renal replacement therapy. We also recorded the time between symptom onset and hospital admission, ICU admission and intubation, as well as the duration between hospital admission and ICU admission and intubation. Glucose and lactate levels measured during the ICU stay were also collected.
Baseline characteristics were expressed as mean (standard deviation) or median (25th-75th percentiles) if numerical, and as absolute and relative (%) frequency if categorical. They were compared using Mann-Whitney or chi-square tests depending on their class. A p-value of less than 0.05 was considered significant.

Data pre-processing

Numerical data were centred, scaled and underwent Yeo-Johnson transformation. Missing data were then imputed using bagged tree imputation [18]. This step was completed using the caret package.

To identify factors associated with AKI development, we first fitted, for each recorded variable, a univariable logistic regression modelling the logit of AKI. Variables displaying a p-value below 0.2 were considered for the multivariable analyses. These were conducted using a generalized additive model, which allows nonlinear relationships; the smooth terms were fitted using thin plate regression splines from the mgcv package. Variable selection was then performed using a supervised stepwise approach, as previously described, to keep only predictors with a p-value lower than 0.05 [19, 20]. The nonlinear fitting was validated by building a second generalized additive model in which, instead of regression splines, local regression (locally estimated scatterplot smoothing) was used, as supported by the gam package. The two nonlinear fits were compared visually by displaying the partial dependence plots of each model. Discrimination and calibration of the final model were assessed visually through the receiver operating characteristic (ROC) curve and a calibration plot, and numerically through the area under the ROC curve and the Hosmer-Lemeshow test.
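As a brief illustration of the discrimination metric mentioned above, the area under the ROC curve can be read as the probability that a randomly chosen AKI patient receives a higher predicted risk than a randomly chosen non-AKI patient, with ties counted as one half. The following is a minimal sketch in plain Python, not the authors' R code; the function name `roc_auc` and the toy labels and scores are assumptions made only for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as a pairwise win rate.

    labels: 1 for the event (e.g. AKI), 0 otherwise.
    scores: predicted risks, higher meaning more likely to have the event.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three events and three non-events.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy example, eight of the nine event/non-event pairs are ranked correctly. The quadratic loop is fine for a cohort of a few hundred patients; a rank-based formula would be preferable for much larger datasets.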
To validate the supervised variable selection performed in the generalized additive model, we used an automated approach based on 4 machine learning methods that integrate native feature selection: multivariate adaptive regression splines (MARS), multistep adaptive MCPnet, lasso regression and regularized random forest (RRF). These four algorithms were applied to the whole dataset, and their hyperparameters were tuned using 5 repetitions of 10-fold cross-validation. The out-of-bag area under the ROC curve was recorded at each iteration.

Shapley Additive Explanation (SHAP) values were calculated with the shapr package, using the empirical approach. The matrix of SHAP values was used as input for Uniform Manifold Approximation and Projection (UMAP), using a Euclidean metric, a minimal distance of 0.1 and 15 neighbors. Patients projected on this UMAP were then clustered using an unsupervised method, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm, through the dbscan package. The radius of the epsilon neighborhood was set to 1.

The resulting clustering was validated by linear support vector machines (SVM), applied to each cluster against the others. For this purpose, the SHAP matrix was first split into a train and a test dataset using a 0.8:0.2 ratio. Three SVMs were first trained on the train dataset, in order to tune their hyperparameters to maximize the area under the ROC curve. We used 3 repetitions of 10-fold cross-validation, with each SVM predicting one cluster against the 2 others. The optimal SVM models were then applied to the 2000-fold bootstrapped test datasets.

For each patient, we calculated the relative time spent in each of the five metabolic patterns previously described [21, 22], i.e. the total duration spent in each of the five profiles divided by the total duration of ICU stay. For each of the three clusters, we used a Naïve Bayes algorithm to calculate the a posteriori probability of each outcome (i.e.
ICU mortality, RRT weaning, renal recovery and relative time spent in each metabolic pattern) within each cluster. We used resampling by bootstrap (n=2000) to estimate confidence intervals and p-values.

From March to December 2020, 256 Covid-19 patients were admitted to the ICU of the Geneva University Hospitals. Among them, 6 were not included for the following reasons: one patient developed AKI prior to ICU admission and 5 patients were on chronic dialysis. A total of 250 patients were analyzed, of whom 104 (42%) experienced AKI during their ICU stay. Most of them developed KDIGO stage 1 AKI (68%), while 14 (13%) received Renal Replacement Therapy (RRT). Compared to those who did not develop AKI, AKI patients more frequently reported a history of diabetes, hypertension and hypercholesterolemia. They had a lower estimated Glomerular Filtration Rate (eGFR) at hospital entry, were older and were mostly male. Furthermore, they had higher APACHE and SOFA scores as well as higher troponin, C-reactive protein and procalcitonin levels, but lower lactate levels, at ICU admission. During their ICU stay, AKI patients were more likely to receive norepinephrine, Lopinavir/Ritonavir (LPV/r), hydroxychloroquine, azithromycin and neuromuscular blocking agents (NMBA), but not dexamethasone. Finally, AKI patients more frequently required invasive mechanical ventilation and prone positioning, received higher tidal volumes, spent more time on mechanical ventilation and had longer ICU and hospital lengths of stay. The time between onset of symptoms and intubation was also longer in AKI patients. However, mortality was not different between AKI and non-AKI patients. Table 1 shows all the compared characteristics between these two groups.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

To identify subgroups of AKI patients, we based our approach on unsupervised clustering.
However, unlike in previous studies, we did not apply a clustering algorithm to the raw dataset but rather designed a three-step pipeline of analyses. Firstly, we built a nonlinear statistical model to identify factors significantly associated with AKI development in ICU patients and calculated the importance of each predictor for AKI risk at the single-patient level. Secondly, we reduced the dimension of the resulting matrix of predictor importances and applied an unsupervised clustering algorithm in order to define AKI patient subgroups. Thirdly, we compared the clinical outcomes among those clusters of AKI patients (Figure 1). These three steps are detailed in the following paragraphs.

We first aimed to identify factors associated with AKI development in Covid-19 patients admitted to the ICU. We began by preprocessing the data in three steps. Firstly, numerical variables were centered, scaled and normalized through a Yeo-Johnson transformation, because independent variables were on very different scales and also to enhance the robustness of variable selection [23]. Additional File 1 shows the distribution of the numerical variables before and after this transformation. Secondly, we imputed missing data using bagged tree imputation [18] to improve the accuracy of downstream predictions [24]. Missing data and their distribution for each variable before and after imputation are presented in Additional File 2. Thirdly, we calculated a correlation matrix to identify collinear variables, and removed or merged those with a correlation coefficient above 0.8 (Additional File 3).

Using these pre-processed data, we fitted the relation between AKI and each potential predictor in a univariable fashion, using logistic regression. As nonlinear relations between AKI and risk factors have often been reported [25-30], we also fitted a logistic regression using natural restricted cubic splines with two degrees of freedom.
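To make the first pre-processing step described above more concrete, the sketch below applies a Yeo-Johnson transformation to a right-skewed variable and then centers and scales the result. It is only an illustration in plain Python: the study used the caret package in R, which estimates the transformation parameter from the data, whereas the value `LAMBDA = 0.0` and the toy measurements here are assumptions chosen for the example.

```python
import math
from statistics import mean, stdev


def yeo_johnson(y, lam):
    """Yeo-Johnson power transformation of a single value y with parameter lam."""
    if y >= 0:
        if lam == 0:
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-y)
    return -(((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam))


def standardize(values):
    """Center to mean 0 and scale to unit (sample) standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical right-skewed laboratory values and a fixed, assumed lambda.
LAMBDA = 0.0
raw = [0.1, 0.2, 0.3, 0.5, 1.0, 8.0]
transformed = [yeo_johnson(v, LAMBDA) for v in raw]
scaled = standardize(transformed)
print([round(v, 2) for v in scaled])
```

Unlike the Box-Cox transformation, Yeo-Johnson is defined for zero and negative values, which is convenient for variables that have already been centered. With a parameter of 1 the transformation leaves the data unchanged.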
We thus performed multivariable analyses using a generalized additive model with a logit link function to allow nonlinear modelling via regression splines. Variables with a p-value below 0.2 for univariable association with AKI were initially retained, and feature selection was then achieved using a supervised stepwise approach as previously described [19, 20]. The final model identified 10 variables that were significantly associated with AKI development in the ICU (Additional Table 1): use of LPV/r, NMBA and norepinephrine, as well as diabetes mellitus and PCT levels, were all positively associated with AKI, while administration of dexamethasone was protective. Time between symptom onset and orotracheal intubation, eGFR at hospital entrance, tidal volume and FiO2 at ICU admission displayed a nonlinear association with AKI.

To estimate the relative contribution of each factor to the predicted probability of AKI, we calculated the Shapley Additive Explanation (SHAP) values. A SHAP value quantifies how much a feature shifts the model output for a given patient. In our study, 10 SHAP values (negative or positive) were calculated per patient, one for each factor. For a given patient, the SHAP values sum to the difference between the model output for that patient and the average model output, so that together they account for the patient's predicted AKI risk. Figure 1a displays the SHAP value (x-axis) for each predictor and each patient, while the color of each dot refers to the original value taken by the variable for that patient. Using this strategy, we were able to rank the predictors according to their importance in predicting AKI. Because the relationships between AKI probability and numerical variables were nonlinear, their marginal effects are shown in Figure 1b. Altogether, the final generalized additive model discriminated well between AKI and non-AKI patients, with an area under the ROC curve of 0.88 (95% confidence interval [0.84-0.92]), and was well calibrated (p-value of the Hosmer-Lemeshow test equal to 0.97), Figure 1c.
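The additive property of SHAP values described above can be checked on a toy example. The sketch below computes exact Shapley values for one patient by enumerating all feature subsets. Several simplifications are assumptions of this example and not the study's method: the risk model `predict` is a made-up three-feature logistic model rather than the fitted generalized additive model, absent features are replaced by values drawn from a small hypothetical background sample (a marginal expectation, whereas the shapr empirical approach estimates conditional expectations), and the computation is done in Python rather than R.

```python
import math
from itertools import combinations


def predict(x):
    """Hypothetical risk model: logistic function of three features."""
    z = -1.0 + 1.2 * x[0] + 0.8 * x[1] - 0.9 * x[2]
    return 1.0 / (1.0 + math.exp(-z))


def value(subset, x, background):
    """Expected prediction when features in `subset` are fixed to the patient's
    values and the remaining ones are taken from each background row."""
    total = 0.0
    for b in background:
        mixed = [x[j] if j in subset else b[j] for j in range(len(x))]
        total += predict(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley values by enumeration (only feasible for few features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition joined by feature i
            weight = (math.factorial(k) * math.factorial(n - k - 1)
                      / math.factorial(n))
            for combo in combinations(others, k):
                s = set(combo)
                gain = value(s | {i}, x, background) - value(s, x, background)
                phi[i] += weight * gain
    return phi


# Hypothetical background sample and patient of interest.
background = [[0, 0, 0], [1, 0, 1], [0, 1, 1], [1, 1, 0]]
patient = [1.0, 1.0, 0.0]

phi = shapley_values(patient, background)
baseline = value(set(), patient, background)  # average prediction
print([round(p, 3) for p in phi])
print(round(baseline + sum(phi), 6), round(predict(patient), 6))
```

The two numbers on the last printed line agree: the baseline plus the sum of the per-feature contributions reproduces the patient's prediction. Each row of the SHAP matrix used later for clustering is, in the same sense, a decomposition of one patient's predicted risk.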
In order to validate the nonlinear relationship between the risk of AKI and baseline eGFR, tidal volume, FiO2 at ICU admission and time between symptom onset and orotracheal intubation, we first built a second generalized additive model using local regression (locally estimated scatterplot smoothing) instead of regression splines. The resulting partial dependence plots (Additional File 4a), showing the marginal effect of each predictor on the risk of AKI, displayed a similar shape to those obtained using regression splines.

Secondly, to ensure the robustness of the variable selection, we applied machine learning (ML) algorithms with native feature selection to predict AKI. We ran multivariate adaptive regression splines (MARS), multistep adaptive MCPnet, LASSO regression and regularized random forest (RRF) on the entire dataset. A hyperparameter grid was used to tune each model, whose performance was iteratively assessed by the area under the ROC curve through a repeated cross-validation procedure. The optimal model was the one that maximized the area under the ROC curve. Additional File 4b shows the distribution of the out-of-bag area under the ROC curve for each predictive model, ranging from 0.73±0.1 for MCPnet to 0.77±0.1 for LASSO, without significant differences among models. The features selected by each ML algorithm, in order of importance in AKI prediction, are displayed in Additional File 4c.
Use of dexamethasone, LPV/r, norepinephrine, eGFR at ICU admission and prior history of diabetes were selected by every method, while tidal volume, duration between symptom onset and intubation, PCT level, FiO2 at admission and BMI were only captured by a nonlinear method (RRF). Altogether, this sensitivity analysis supports both the use of nonlinear fitting between numerical predictors and risk of AKI and the choice of predictors.

In this second part, we aimed to define clusters of patients according to the pattern of risk factors expressed by each patient. To achieve this, we first selected AKI patients and reduced the dimension of the matrix of SHAP values previously calculated (104 patients x 10 predictors), using the Uniform Manifold Approximation and Projection (UMAP) method. This preliminary step has been shown to improve downstream clustering [31]. The resulting two-dimensional projection was then clustered by the unsupervised Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. Among the 104 AKI patients, we were able to identify three clusters, each expressing a specific pattern of AKI-related factors (Figure 2a). For each cluster, we characterized the pattern of AKI-related factors using the SHAP values. Figure 2b shows the predictors in order of importance for each cluster. Cluster 1 was characterized by AKI associated with the use of LPV/r; cluster 2 involved diabetic patients who did not receive dexamethasone; cluster 3 included patients with low baseline eGFR, a long time between symptom onset and orotracheal intubation, high tidal volume and use of norepinephrine.

To validate this clustering, we defined three subspaces, each involving two clusters: the cluster of interest and all the others merged. We then applied a support vector machine (SVM) algorithm to those three subspaces and assessed its ability to separate the cluster of interest from the others by a hyperplane.
For this purpose, we first randomly split the original cohort into a train and a test dataset, in a 0.8:0.2 ratio. The model was evaluated via repeated k-fold cross-validation to find the optimal hyperparameters, i.e. those maximizing the area under the ROC curve. The best model was finally applied to both the train and the test datasets, and standard deviations were estimated by bootstrapping. SVM was able to separate each of the three clusters from the others, with areas under the ROC curve in the test dataset equal to 1.0±0 for each cluster.

Subsequently, we aimed to compare these three clusters from a clinical perspective. To achieve this objective, we used a naïve Bayes classifier in the three subspaces described in the previous paragraph to calculate the a posteriori probability of clinical endpoints, which were bootstrapped 2000 times. As outcomes, we investigated AKI severity and recovery, ICU mortality and the relative time spent in each of the following metabolic profiles: baseline (normal glucose/normal lactate levels), isolated hyperglycaemia (high glucose but normal lactate levels), isolated hypoglycaemia (low glucose but normal lactate levels), stress response (high glucose/high lactate) and impaired metabolism (low to normal glucose level with high lactate level), as previously described [21]. These five metabolic profiles are shown in Additional File 5.
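The five metabolic profiles listed above, and the relative time a patient spends in each, can be sketched as follows. This is a hedged illustration only: the text does not state the glucose and lactate cut-offs, so the thresholds below are hypothetical placeholders, and the rule that each measurement's profile holds until the next measurement (or until ICU discharge) is likewise an assumption of this example.

```python
# Hypothetical cut-offs in mmol/L; the study's actual thresholds are not given here.
GLUCOSE_LOW = 4.0
GLUCOSE_HIGH = 10.0
LACTATE_HIGH = 2.0

PROFILES = (
    "baseline",
    "isolated hyperglycaemia",
    "isolated hypoglycaemia",
    "stress response",
    "impaired metabolism",
)


def classify(glucose, lactate):
    """Map one paired glucose/lactate measurement to a metabolic profile."""
    if lactate > LACTATE_HIGH:
        # High lactate: high glucose is a stress response, otherwise impaired.
        return "stress response" if glucose > GLUCOSE_HIGH else "impaired metabolism"
    if glucose > GLUCOSE_HIGH:
        return "isolated hyperglycaemia"
    if glucose < GLUCOSE_LOW:
        return "isolated hypoglycaemia"
    return "baseline"


def relative_time(measurements, discharge_hour):
    """Fraction of the ICU stay spent in each profile.

    measurements: list of (hour, glucose, lactate) tuples.
    Assumes a profile persists until the next measurement or discharge.
    """
    ordered = sorted(measurements)
    total = discharge_hour - ordered[0][0]
    fractions = {name: 0.0 for name in PROFILES}
    for idx, (hour, glucose, lactate) in enumerate(ordered):
        end = ordered[idx + 1][0] if idx + 1 < len(ordered) else discharge_hour
        fractions[classify(glucose, lactate)] += (end - hour) / total
    return fractions


# Toy patient: four measurements over a 24-hour stay.
stay = [(0, 6.0, 1.0), (6, 12.0, 1.5), (12, 11.0, 3.0), (18, 5.0, 4.0)]
fractions = relative_time(stay, discharge_hour=24)
for name in PROFILES:
    print(name, fractions[name])
```

The fractions sum to one for each patient, which makes them comparable across patients with different lengths of stay.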
Patients from clusters 1 and 2 developed more severe AKI than patients from cluster 3 (34% versus 7% [0-12] of KDIGO3 AKI, p=0.02) and more frequently received RRT (25% versus 4%

Patients from clusters 1 and 2 also showed a distinct metabolic profile compared to cluster 3, expressing the impaired metabolism profile at a lower rate (22% [19-24] versus 29% [23-33], p=0.005, Figure 3d). Between clusters 1 and 2, the expression of isolated hyperglycaemia was higher in cluster 2 (13% [10-16] versus 25%, p=0.009, Figure 3d). Finally, mortality varied between the 3 clusters (18%, 30% and 40% respectively,

Altogether, this analytic procedure allowed us to identify 3 clusters of AKI patients, each expressing a specific pattern of factors associated with AKI. These patients also displayed different clinical characteristics, including different AKI severity, mortality and recovery.

The current definition of AKI is limited as it provides no information on AKI etiology, prognosis, molecular pathways, or responses to treatment [32]. Here we identified phenotypes of AKI patients based on their pattern of AKI-associated factors, with distinct characteristics and outcomes. We first identified factors associated with AKI development. When considering Covid19-specific therapy, we found LPV/r and dexamethasone to be respectively positively and negatively related to AKI development, in accordance with other groups [33-37]. We also found well-described AKI risk factors, such as diabetes mellitus, use of norepinephrine and baseline eGFR [38, 39].
Interestingly, the relation between AKI risk and the baseline eGFR was nonlinear: higher eGFRs were associated with a reduced probability of AKI up to a threshold, beyond which the risk increased. We also found a linear relation between AKI development and procalcitonin (PCT) levels at ICU admission. Several studies have reported an increased PCT level in patients with reduced GFR [40-42], suggesting that PCT may be partially removed via the kidneys. More recently, PCT has also been shown to predict AKI [43-45]. Finally, we identified four factors related to mechanical ventilation: the time between symptom onset and intubation, the tidal volume, the FiO2 at ICU admission and the use of NMBA. While very few studies have assessed the impact of these factors on AKI development, several of them confirm the association between mechanical ventilation requirement and AKI occurrence in Covid-19 patients [46, 47].

In our cohort of AKI Covid19 patients, our pipeline was able to identify three clusters of patients. At the renal level, while all patients met the criteria for AKI, each cluster displayed a distinct phenotype in terms of severity, ICU recovery and mid-term recovery. In particular, cluster 1, involving patients receiving LPV/r, was characterized by severe AKI, with 24% of patients requiring renal replacement therapy. At ICU discharge, only 50% of them had recovered their renal function but, paradoxically, they experienced better mid-term recovery and the lowest ICU mortality rate. At the metabolic level, we found that patients from cluster 3 had the highest ICU mortality and were mostly in the impaired metabolism profile, as previously shown by our group [21, 22]. Similarly, diabetic patients in cluster 2 displayed a higher rate of the isolated hyperglycaemia pattern. Altogether, these three phenotypes may reflect distinct pathophysiological mechanisms of AKI development.

Beyond these results, this study introduces a pipeline of analyses that is able to phenotype AKI patients according to their pattern of risk factors, with several innovative features. Firstly, while most studies identified AKI risk factors through logistic regression [47, 48], we used a generalized additive model with regression splines to capture nonlinear associations between AKI and potential risk factors. This method allowed us to identify factors that would otherwise have remained unnoticed with the traditional approach (i.e. baseline eGFR, tidal volume, time between symptom onset and intubation and FiO2 at ICU admission). Furthermore, the absolute importance of each risk factor in estimating the probability of AKI was calculated for each patient. We thus obtained a pattern of risk factors for each patient that may reflect a specific pathophysiological mechanism. Existing studies on AKI phenotyping have either used supervised clustering, mostly on clinical traits [13, 14], or unsupervised clustering based on recorded clinical or biological data [15-17]. Finally, we did not apply the clustering algorithm to the raw dataset, as other groups did [15-17], but rather to a dimensionally reduced space, a strategy that has been shown to improve clustering performance [31].

Our study has some limitations. Firstly, the study was single-centred, which limits the generalizability of our results. Secondly, being a retrospective study, procedures and therapeutic strategies may have changed during the study period.
Nonetheless, we feel we have provided a generalizable pipeline that may be applied to various datasets to identify patients with different outcomes and therapeutic sensitivity.

We have developed a new pipeline of analyses to identify the phenotype of AKI patients based on their pattern of AKI risk factors. When applying this method to a Covid19 ICU patient dataset, we identified 3 patient subgroups with distinct renal features and outcomes that may be related to specific pathophysiological mechanisms.

The study was approved by the local ethical committee for human studies of Geneva, Switzerland (CCER 2020-00917, Commission Cantonale d'Ethique de la Recherche) and performed according to the Declaration of Helsinki principles.

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

The authors declare that they have no competing interests.
[1] Epidemiology of acute kidney injury in critically ill patients: the multinational AKI-EPI study
[2] Incidence, risk factors and 90-day mortality of patients with acute kidney injury in Finnish intensive care units: the FINNAKI study
[3] Initiation Strategies for Renal-Replacement Therapy in the Intensive Care Unit
[4] Comparison of two delayed strategies for renal replacement therapy initiation for severe acute kidney injury (AKIKI 2): a multicentre, open-label, randomised, controlled trial. The Lancet
[5] Timing of Initiation of Renal-Replacement Therapy in Acute Kidney Injury
[6] Timing of Renal-Replacement Therapy in Patients with Acute Kidney Injury and Sepsis
[7] KDIGO Clinical Practice Guideline for Acute Kidney Injury
[8] The use of clustering algorithms in critical care research to unravel patient heterogeneity
[9] Identification of acute kidney injury subphenotypes
[10] Clinical criteria for subtyping Parkinson's disease: biomarkers and longitudinal progression
[11] Data-driven subtyping of Parkinson's disease using longitudinal clinical records: a cohort study. Scientific Reports
[12] The Diagnosis-Wide Landscape of Hospital-Acquired AKI
[13] Acute kidney injury subphenotypes based on creatinine trajectory identifies patients at increased risk of death
[14] Recovery after Acute Kidney Injury
[15] Two subphenotypes of septic acute kidney injury are associated with different 90-day mortality and renal recovery
[16] Identification of Acute Kidney Injury Subphenotypes with Differing Molecular Signatures and Responses to Vasopressin Therapy
[17] Utilization of Deep Learning for Subphenotype Identification in Sepsis-Associated Acute Kidney Injury
[18] A Benchmark for Data Imputation Methods
[19] Purposeful selection of variables in logistic regression
[20] Development of a practical prediction score for chronic kidney disease after cardiac surgery
[21] Altered proximal tubular cell glucose metabolism during acute kidney injury is associated with mortality
[22] Decreased Renal Gluconeogenesis Is a Hallmark of Chronic Kidney Disease
[23] Finding Optimal Normalizing Transformations via bestNormalize
[24] Improving Outcome Predictions for Patients Receiving Mechanical Circulatory Support by Optimizing Imputation of Missing Values
[25] Anemia Is a Risk Factor for Acute Kidney Injury and Long-Term Mortality in Critically Ill Patients
[26] Improved predictive models for acute kidney injury with IDEA: Intraoperative Data Embedded Analytics
[27] Development and Validation of a Model for Predicting the Risk of Acute Kidney Injury Associated With Contrast Volume Levels During Percutaneous Coronary Intervention
[28] Association of overweight with postoperative acute kidney injury among patients receiving orthotopic liver transplantation: an observational cohort study
[29] Impact of admission serum ionized calcium levels on risk of acute kidney injury in hospitalized patients
[30] Association Between Base Excess and Mortality Among Patients in ICU With Acute Kidney Injury
[31] Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study
[32] Imperfect gold standards for kidney injury biomarker evaluation
[33] Different incidences of acute kidney injury (AKI) and outcomes in COVID-19 patients with and without non-azithromycin antibiotics: A retrospective study
[34] Acute Kidney Injury Associated With Lopinavir/Ritonavir Combined Therapy in Patients With COVID-19
[35] Therapy with lopinavir/ritonavir and hydroxychloroquine is associated with acute kidney injury in COVID-19 patients
[36] Characteristics and outcomes of acute respiratory distress syndrome related to COVID-19 in Belgian and French intensive care units according to antiviral strategies: the COVADIS multicentre observational study
[37] Impact of dexamethasone use to prevent from severe COVID-19-induced acute kidney injury
[38] COVID-19 and the Kidney: A Worrisome Scenario of Acute and Chronic Consequences
[39] Latent variable modeling improves AKI risk factor identification and AKI prediction compared to traditional methods
[40] Potential use of procalcitonin as biomarker for bacterial sepsis in patients with or without acute kidney injury
[41] Procalcitonin in patients with acute and chronic renal insufficiency
[42] Clinical relevance of procalcitonin and C-reactive protein as infection markers in renal impairment: a cross-sectional study
[43] Association between acute kidney injury

Additional File 1 data transformation: distribution of the numerical variables before (a) and after (b)