key: cord-0025770-7zh1k3hj
authors: Shaka, Hafeez; Edigin, Ehizogie
title: A Revised Comorbidity Model for Administrative Databases Using Clinical Classifications Software Refined Variables
date: 2021-12-14
journal: nan
DOI: 10.7759/cureus.20407
sha: 1f048e6300737252e4b08b7cfac6cf5053fcf301
doc_id: 25770
cord_uid: 7zh1k3hj

Background and objective Database research has shaped policies, identified trends, and informed healthcare guidelines for numerous disease conditions. However, despite their many uses and vast potential, administrative databases have several limitations. Because these data are non-randomized, outcomes often need to be adjusted for comorbidities. We sought to develop a comorbidity adjustment model based on Clinical Classifications Software Refined (CCSR) variables and compare it with current models. Our aim was to provide a simple, adaptable, and accurate comorbidity measure for the Agency for Healthcare Research and Quality (AHRQ) databases that would strengthen the validity of outcomes. Methods The Nationwide Inpatient Sample (NIS) database for 2018 was the data source. We obtained the mortality rate among all included hospitalizations in the dataset. A model based on CCSR categories was mapped from the disease groups in Sundararajan's adaptation of Deyo's modified Charlson Comorbidity Index (CCI). We used logistic regression, with CCSR variables entered as binary variables, to obtain the final model, and tested it on the 10 most common reasons for hospitalization. Results The model had a higher area under the curve (AUC) than all three CCI variants studied in every category, and a higher AUC than the Elixhauser model in 8 of 10 categories. However, it did not have a higher AUC than a model derived by stepwise backward regression from the original 21-variable model. Conclusion We developed a 15-CCSR-variable model that showed good discrimination for inpatient mortality compared with prior models.

Database research has been instrumental in shaping policies, identifying trends, and informing healthcare guidelines for numerous disease conditions [1] [2] [3] [4] [5] [6]. Most databases, including those of the Agency for Healthcare Research and Quality (AHRQ), are coded using International Classification of Diseases (ICD) codes. Despite their many uses and vast potential, administrative databases have several limitations, including coding inaccuracies, missing data, and inadequate classification. Translating findings from retrospective, non-randomized databases into clinical practice is a particular challenge [7] [8] [9] [10].

Adjusting outcomes for comorbidities is often needed during database analysis to compensate for the lack of randomization. Approaches in the literature include index scores and individual comorbidities [11, 12]. The Charlson Comorbidity Index (CCI) and the Elixhauser Comorbidity Index (ECI) are the indices most commonly used as comorbidity measures in administrative databases [13, 14]. Both have undergone various modifications and adaptations to suit successive ICD revisions and specific medical conditions [15] [16] [17]. Because the diagnostic codes attributed to comorbidities are not uniform across studies, researchers have often had to develop their own comorbidity adjustment methods, which makes reproducibility very challenging. How well the CCI and ECI adjust for individual conditions is the subject of ongoing debate, given the substantial improvements in healthcare since they were first developed. For example, the CCI assigns HIV infection a weight six times that of heart failure. The current version of the ECI contains 39 variables and requires specialized software to implement, and such a large number of variables increases the risk of overfitting.

The latest AHRQ databases incorporate the Clinical Classifications Software Refined (CCSR) categories into datasets. This aids in the standardized mapping of diseases into clinically relevant categories. In this study, we sought to obtain a model for comorbidity adjustment based on the incorporated CCSR variables and compare this with current models. Our objective was to provide a simplified, adaptable, and accurate measure for comorbidities in AHRQ databases, which would strengthen the validity of outcomes.

The Nationwide Inpatient Sample (NIS) database for 2018 was the data source. The NIS is developed by the Healthcare Cost and Utilization Project (HCUP), a federal-state-industry partnership sponsored by the AHRQ. It is a database of hospital inpatient stays derived from billing data submitted by hospitals to statewide data organizations across the US, covering more than 97% of the US population [18]. The 2018 database was coded using the ICD, Tenth Revision, Clinical Modification/Procedure Coding System (ICD-10-CM/PCS). In the NIS, diagnoses are divided into a principal diagnosis and secondary diagnoses: the principal diagnosis is the main ICD-10 code for the hospitalization, and secondary diagnoses are all other ICD-10 codes on the record. Since 2018, HCUP databases have included the Diagnosis and Procedure Groups (DPG) file, which contains data elements derived from the CCSR for ICD-10-CM [18]. The CCSR for ICD-10-CM diagnoses aggregates more than 70,000 ICD-10-CM diagnosis codes into over 530 clinically meaningful categories, providing a means to identify specific clinical conditions from ICD-10-CM diagnosis codes [19]. The 2018 database contains over seven million unweighted hospital stays. We excluded hospitalizations of patients younger than 18 years and those with missing values for age, sex, or disposition.
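The exclusion criteria above can be sketched as a filter over hospitalization records. This is a minimal illustration, assuming each stay is a plain dictionary with hypothetical keys `age`, `sex`, and `disposition`, and with `None` marking a missing value; the actual NIS files use their own variable names and missing-value codes, which are documented by HCUP.

```python
from typing import Any

# Hypothetical keys; the real NIS variable names and missing-value codes differ.
REQUIRED_FIELDS = ("age", "sex", "disposition")
MIN_ADULT_AGE = 18


def apply_exclusions(stays: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return adult stays that have non-missing age, sex, and disposition."""
    kept = []
    for stay in stays:
        # Exclude records where any required field is missing (None here).
        if any(stay.get(field) is None for field in REQUIRED_FIELDS):
            continue
        # Exclude hospitalizations of patients younger than 18 years.
        if stay["age"] < MIN_ADULT_AGE:
            continue
        kept.append(stay)
    return kept


stays = [
    {"age": 67, "sex": "F", "disposition": "routine", "died": 0},
    {"age": 16, "sex": "M", "disposition": "routine", "died": 0},
    {"age": 54, "sex": None, "disposition": "transfer", "died": 1},
]
cohort = apply_exclusions(stays)
print(len(cohort))  # 1
```

The missing-value check runs before the age check so that a record with no age is never compared numerically.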

We obtained the mortality rate of all included hospitalizations in the dataset. A model based on the CCSR categories in the DPG file was mapped from the disease groups in Sundararajan's adaptation of Deyo's modified CCI [15]. We also included smoking history, obesity, malnutrition, and anemia, which have been associated with mortality in prior HCUP studies [20] [21] [22]. Mortality is a common outcome in administrative database analyses and has shown high coding reliability [23]. Table 1 shows the 21 CCSR variables included in the initial model. The variables were coded as binary indicators for each hospitalization. Each data element DXCCSR_AAAnnn indicates whether the CCSR category was triggered by a diagnosis code on the record: AAA indicates the body system, and nnn indicates the specific category within that body system. For each included CCSR variable, a recorded value of 3 means the category was triggered only by secondary diagnosis code(s) on the input record [24]; this value was used to determine the comorbidity burden of each hospitalization. HCUP also provides the exact ICD-10 mapping of these categories, which ensures uniformity during data analysis.
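The binary coding described above can be illustrated with a short sketch. The column names below are hypothetical placeholders that follow the DXCCSR_AAAnnn pattern; they are not the study's 21 variables. The sketch also assumes that a column absent from a record means the category was not triggered.

```python
# Placeholder column names in the DXCCSR_AAAnnn pattern; not the study's 21 variables.
CCSR_COLUMNS = ("DXCCSR_AAA001", "DXCCSR_BBB002", "DXCCSR_CCC003")

# Per the text, a value of 3 means the category was triggered only by
# secondary diagnosis code(s) on the record.
SECONDARY_ONLY = 3


def comorbidity_flags(record: dict[str, int], columns=CCSR_COLUMNS) -> dict[str, int]:
    """Map each CCSR column to 1 if triggered only by secondary diagnoses, else 0."""
    # Assumption: a missing column is treated as "not triggered" (0).
    return {col: int(record.get(col, 0) == SECONDARY_ONLY) for col in columns}


record = {"DXCCSR_AAA001": 3, "DXCCSR_BBB002": 1, "DXCCSR_CCC003": 0}
flags = comorbidity_flags(record)
print(flags)  # {'DXCCSR_AAA001': 1, 'DXCCSR_BBB002': 0, 'DXCCSR_CCC003': 0}
```

Under this rule, a category that the principal diagnosis also triggers is not flagged as a comorbidity, because its value is not 3.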

We used logistic regression, with each CCSR variable entered as a binary predictor, to obtain the final model.
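For reference, the model class can be written as follows. This is a generic statement of logistic regression with binary predictors, not a reproduction of the fitted model. The symbols are introduced here for illustration only: $Y_i = 1$ if hospitalization $i$ ends in in-hospital death, $x_{ij} = 1$ if CCSR variable $j$ is flagged for that hospitalization, and $k$ is the number of variables (21 in the initial model and 15 in the final one).

```latex
\operatorname{logit}\,\Pr(Y_i = 1 \mid x_i)
  = \ln\frac{\Pr(Y_i = 1 \mid x_i)}{1 - \Pr(Y_i = 1 \mid x_i)}
  = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij},
  \qquad x_{ij} \in \{0, 1\}

\mathrm{aOR}_j = e^{\beta_j}
```

Because each $x_{ij}$ is binary, $e^{\beta_j}$ is the adjusted odds ratio for death when comorbidity $j$ is present versus absent, holding the other variables fixed. For example, a hypothetical $\beta_j = 0.693$ corresponds to an aOR of about 2.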

Because the dataset contains over six million hospitalizations, we performed stepwise backward regression on 100 bootstrap replications of a 5% sample for mortality, following the approach of Moore et al. [25]. This prevented the analysis from being overpowered, in which variables attain statistical significance while only marginally changing the outcome. We then included variables with p-values <0.01 in the final model. We assessed collinearity among the included variables using the estimated variance-covariance matrix. We assessed the predictive power of the model using the c-statistic, expressed as the area under the receiver operating characteristic curve (AUC).
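The resampling and discrimination steps can be sketched in plain Python under two stated assumptions. First, each replicate draws 5% of rows with replacement; the text says "bootstrapped" but does not give the exact sampling scheme. Second, model fitting, stepwise elimination, and the p<0.01 retention rule happen elsewhere and are not shown. The c-statistic below uses the rank-sum (Mann-Whitney) identity. This equals the probability that a randomly chosen death receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half.

```python
import random
from typing import Sequence


def bootstrap_subsamples(
    n_rows: int, n_reps: int = 100, frac: float = 0.05, seed: int = 0
) -> list[list[int]]:
    """Row indices for n_reps resamples, each of size frac * n_rows, drawn with replacement."""
    rng = random.Random(seed)
    size = max(1, round(frac * n_rows))
    return [[rng.randrange(n_rows) for _ in range(size)] for _ in range(n_reps)]


def c_statistic(y: Sequence[int], score: Sequence[float]) -> float:
    """AUC via the rank-sum identity; y is 1 for death, 0 otherwise; ties get average ranks."""
    n = len(y)
    n_pos = sum(1 for v in y if v == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one death and one survivor")
    order = sorted(range(n), key=lambda i: score[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the block over all observations sharing the same score.
        while end + 1 < n and score[order[end + 1]] == score[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    rank_sum_pos = sum(r for r, v in zip(ranks, y) if v == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


replicates = bootstrap_subsamples(n_rows=1000)
print(len(replicates), len(replicates[0]))  # 100 50
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The rank-based form runs in O(n log n), which matters at NIS scale. Comparing every death with every survivor gives the same value, but that approach is quadratic.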

We tested the final model on the 10 most common reasons for hospitalization, as analyzed by Moore et al. [25]. These conditions were identified from the CCSR category of the principal diagnosis. For each CCSR-mapped principal diagnosis, we compared the c-statistic of the final 15-variable model against those of a condition-specific stepwise backward regression using the initial 21 CCSR variables, individual CCI weights, total CCI, grouped CCI, and the Elixhauser model for mortality. The CCI was grouped into 0, 1, 2, and ≥3. The c-statistics of the final model and of a model with grouped age were compared with that of the Elixhauser model. The backward stepwise selection removed terms with p≥0.2 and added those with p<0.1. All analyses were performed using the unweighted dataset.
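Total and grouped CCI, two of the comparators above, can be computed as follows. This sketch assumes the 17 condition weights of Deyo's adaptation of the Charlson index as commonly reported (for example, 1 for congestive heart failure and 6 for AIDS). The condition keys are hypothetical labels, and the sketch does not reproduce the ICD-10 mapping from Sundararajan's adaptation [15]. It also omits the hierarchy rules that some implementations apply, such as counting only the more severe form of liver disease or diabetes.

```python
# Assumed Charlson weights (Deyo adaptation); keys are hypothetical labels.
# Hierarchy rules (e.g., mild vs. severe liver disease) are not applied here.
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "rheumatologic_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes": 1,
    "diabetes_with_complications": 2,
    "hemiplegia_or_paraplegia": 2,
    "renal_disease": 2,
    "any_malignancy": 2,
    "moderate_or_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}


def cci_total(conditions: set[str]) -> int:
    """Sum the weights of the Charlson conditions present on a record."""
    return sum(CHARLSON_WEIGHTS[c] for c in conditions)


def cci_group(total: int) -> str:
    """Group a total CCI score into the categories 0, 1, 2, and >=3."""
    if total < 0:
        raise ValueError("CCI total cannot be negative")
    return str(total) if total < 3 else ">=3"


print(cci_group(cci_total({"congestive_heart_failure"})))  # 1
print(cci_group(cci_total({"aids", "renal_disease"})))  # >=3
```

Grouping collapses every score of 3 or more into one category. That loses resolution at the high end, which may partly explain why grouped CCI discriminates less well than individual weights.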

Because the NIS database lacks patient-level identifiers, this study did not require institutional review board approval.

The NIS is a large, publicly available, all-payer inpatient care database in the US, containing data on more than seven million hospital stays yearly. Its large sample size makes it ideal for developing national and regional estimates and enables analyses of rare conditions, uncommon treatments, and special populations. 

Our study demonstrated that the 15-CCSR-variable model for comorbidity adjustment outperforms the current CCI-based models and the ECI in most of the conditions analyzed, while being simpler to implement. Ease of reproducibility is another advantage of our model. However, model performance varied considerably across individual conditions, with c-statistics ranging from 0.560 for congestive heart failure to 0.759 for acute and unspecified renal failure. This corresponds to fair to very good discrimination as a predictive model.

We also found that applying stepwise backward regression to the original 21-CCSR-variable model within each condition outperformed the 15-CCSR-variable model. This approach allows comorbidities to be weighted individually for a particular condition. Although research by Austin et al. [26] suggests that indexing works, disease-specific models continue to demonstrate superior discrimination as predictive models. Our results are consistent with this: total and grouped CCI performed relatively worse than individual CCI weights, and all CCI models were mostly less discriminating than the CCSR models. This is likely due to outdated weights and the inclusion of variables that no longer have the same impact on mortality. For instance, antiretroviral therapy has revolutionized HIV management and reduced the incidence of AIDS.

We noticed that the adjusted odds ratio (aOR) for mortality associated with individual comorbidities varied from one condition to another, and that stepwise backward regression excluded different comorbidities for different conditions. Hence, a model that assigns fixed weights to comorbid variables would not adequately account for this variation. Consequently, the 15 CCSR variables were retained as individual variables and not converted into a weighted index. To our knowledge, this is the first study to model comorbidity adjustment based on CCSR variables, which are newly included in HCUP databases.

Adding biodemographic data such as age, sex, race, household income, and primary payer, along with hospital characteristics such as location and size, is expected to improve the CCSR-based model, as observed in prior studies [14] [15] [16].

Our study has some limitations. First, some CCSR variables that affect the primary outcome may have been omitted from the initial 21-variable model, because our literature review of variables affecting inpatient mortality was not exhaustive. Second, the study shares the limitations of administrative databases, such as non-randomization, under-coding, and poor classification of disease severity, any of which may affect mortality estimates. Third, comorbidities were identified without present-on-admission indicators, so comorbid conditions could not be separated from complications of care that developed during the hospital stay. Fourth, the ECI model used for comparison was adopted from a study of a 2011 database, whose patient characteristics likely differ. Finally, compared with an indexed model, a 15-variable model may still be subject to overfitting in conditions with small populations or very low inpatient mortality.

Administrative databases continue to be an important part of healthcare research. The inclusion of CCSR variables in AHRQ databases provides an opportunity to develop a standardized and reproducible measure of comorbidity for various disease conditions. We developed a 15-CCSR-variable model that showed good discrimination for inpatient mortality compared with prior models. However, disease-specific models continue to perform better in outcomes-based research.

Human subjects: All authors have confirmed that this study did not involve human participants or tissue. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue.

In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.

AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines and the Heart Rhythm Society

Publicly available data: crowd sourcing to identify and reduce disparities

Predictors and costs of 30-day readmissions after index hospitalizations for alcohol-related disorders in U.S. adults

Opioid overdose hospitalization trajectories in States with and without opioid-dosing guidelines

The epidemiology of inpatient pediatric trauma in United States hospitals 2000 to 2011

Readmission rates for chronic obstructive pulmonary disease under the hospital readmissions reduction program: an interrupted time series analysis

With great power comes great responsibility: big data research from the National Inpatient Sample

Adherence to methodological standards in research using the National Inpatient Sample

Most orthopaedic studies using the National Inpatient Sample fail to adhere to recommended research practices: a systematic review

Opportunities and limitations of risk adjustment of quality indicators based on inpatient administrative health data: a workshop report (Article in German)

A comparison of Charlson and Elixhauser comorbidity measures to predict colorectal cancer survival using administrative health data

Performance of comorbidity measures for predicting outcomes in population-based osteoporosis cohorts

A new method of classifying prognostic comorbidity in longitudinal studies: development and validation

Comorbidity measures for use with administrative data. Med Care

Cross-national comparative performance of three versions of the ICD-10 Charlson index. Med Care

Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases

The update on instruments used for evaluation of comorbidities in total hip arthroplasty

Introduction to the HCUP National Inpatient Sample (NIS)

National (Nationwide) Inpatient Sample database documentation

HCUP Clinical Classifications Software Refined (CCSR) for ICD-10-PCS procedures

Predicting COVID-19 using retrospective data: impact of obesity on outcomes of adult patients with viral pneumonia

In-patient outcomes of patients with diabetic ketoacidosis and concurrent protein energy malnutrition: a national database study from 2016 to 2017

Rate and predictors of 30-day readmission following diabetic ketoacidosis in type 1 diabetes mellitus: a US analysis

Agency for Healthcare Research and Quality. Inpatient quality indicators overview

HCUP Clinical Classifications Software Refined (CCSR) for ICD-10-PCS procedures, v2021.1. Healthcare Cost and Utilization Project (HCUP)

Identifying increased risk of readmission and inhospital mortality using hospital administrative data: the AHRQ Elixhauser Comorbidity Index

Why summary comorbidity measures such as the Charlson Comorbidity Index and Elixhauser score work