INTRODUCTION
The mortality rate is a key indicator of intensive care unit (ICU) quality. However, patients’ disease severity, comorbidities, and demographics significantly impact mortality [1]. Therefore, comparing mortality rates among ICUs without considering severity may provide an incorrect assessment of ICU quality.
A variety of outcome prediction scoring systems have been developed to provide an indication of the risk of death of groups of ICU patients. In 1981, the original Acute Physiology and Chronic Health Evaluation (APACHE) model was published [2], and there have been three subsequent revisions [3–5]. The Simplified Acute Physiology Score (SAPS) was devised as a simplification of APACHE II and has been revised twice since then [6, 7]. Furthermore, because organ dysfunction is associated with high rates of ICU morbidity and mortality, organ failure scores such as the Sequential Organ Failure Assessment (SOFA) score have been developed [8]. The Mortality Probability Model (MPM), another validated ICU mortality prediction model, has also been developed and updated [9, 10].
The severity scores predict in-hospital and ICU mortality based on the severity of the patients’ conditions. These scoring systems can be used in clinical trials for case-mix comparisons and for the assessment and comparison of ICU quality and performance. However, they were not designed for individual prognostication and the scoring system used by each hospital is different. Furthermore, not all hospitals use a prognostic scoring system. Therefore, there are limitations in predicting prognosis using the existing scoring systems.
Despite the availability of public databases such as the Health Insurance Review and Assessment Service (HIRA) claims data, the development of a prognostic model for predicting mortality has not been reported in Asia, including South Korea. Therefore, we aimed to develop and validate a novel prognostic model for predicting mortality in Korean ICUs, using national insurance claims data.
METHODS
Study populations and data
This study used the database of the third HIRA quality evaluation of critical care. Data were obtained from the health insurance claims database maintained by the HIRA of South Korea, the sole nationwide governmental agency that operates a fee-for-service reimbursement system. All Koreans are required to subscribe to the National Health Insurance, a single medical insurance system. The insurance qualifications, treatment details, and medical institution information are stored in the HIRA database. Because the HIRA claims data do not include ICU admission and discharge dates, a severity correction model for ICU mortality cannot be created from the claims data alone. However, HIRA periodically evaluates the adequacy of ICUs, which have a high cost of medical care, by checking data that it requests medical institutions to report.
ICU mortality can be calculated because the ICU adequacy evaluation has data on ICU admission and discharge dates. The evaluation includes all ICUs in Korea. The HIRA third ICU adequacy evaluation data were used to develop a severity correction model for in-hospital and ICU mortality. The third ICU adequacy evaluation was conducted from May to July 2019 for institutions (including all general hospitals and tertiary hospitals) providing inpatient care in the ICU. Patients aged 18 years or older admitted to the ICU were included. Patients who were admitted to the ICU for less than 48 hours, those admitted to neonatal or pediatric ICUs, and burn patients were excluded. Among the 56,926 patients included in the third ICU adequacy evaluation, the following cases were excluded: 11,507 not linked to claim data, 12 with claim data recorded after the date of death, and 2,918 with duplicate claims. The remaining 42,489 patients, who were accurately identified and whose data were linked to health insurance claim data and date of death, were randomly divided into the derivation and validation cohorts in a ratio of 7:3 (Fig. 1). In the derivation cohort, a model for calculating the predicted mortality was developed based on the factors influencing death using multiple logistic regression analysis. Using the model developed in the derivation cohort, we analyzed whether it accurately predicted death in the validation cohort. In addition, the model was verified using data from one general and two tertiary hospitals.
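The cohort construction described above can be sketched in a few lines of Python. This is an illustration only: the function name `split_cohort`, the use of integer patient indices, and the random seed are assumptions, and the source does not state how the randomization was actually implemented.

```python
import random

def split_cohort(n_patients, seed=0):
    """Randomly split patient indices 0..n-1 into derivation (7) and validation (3) parts."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)   # seed is a hypothetical choice for reproducibility
    n_derivation = n_patients * 7 // 10    # 7:3 ratio, rounded down
    return indices[:n_derivation], indices[n_derivation:]

# Counts reported in the text.
evaluated = 56_926
excluded = {
    "not linked to claim data": 11_507,
    "claim data recorded after date of death": 12,
    "duplicate claims": 2_918,
}
analysed = evaluated - sum(excluded.values())

derivation, validation = split_cohort(analysed)
print(analysed, len(derivation), len(validation))  # 42489 29742 12747
```

The resulting cohort sizes match the 29,742 derivation and 12,747 validation patients reported in the Results.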
The Institutional Review Board of Yonsei University Health System determined that this study qualified for exempt status (IRB permit number: 4-2021-0212). The requirement for informed consent was waived because the study was a retrospective analysis of claims data from the HIRA. All patients in the HIRA dataset were anonymously recorded.
Selection of the variables for prognostic model
Variables that prior research has recognized as impacting mortality and that are integrated into established prognostic scoring systems were chosen from the health insurance claim data [3, 6–8, 11]. The duration of hospitalization, age, sex, and chronic diseases were investigated to identify patient-related demographic factors. Age was divided into eight categories (18–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80–89, and 90–99 yr). To evaluate chronic diseases, variables were generated by calculating the Charlson comorbidity index (CCI) from the claim codes (International Classification of Diseases, 10th Revision) for medical use, according to the analysis guide provided by the HIRA. The use of ventilators, hemodialysis or continuous renal replacement therapy (CRRT), and vasopressor drugs (norepinephrine, dopamine, or vasopressin) was identified using the claims codes. These variables are factors associated with individual patients.
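To make the variable definitions concrete, the following Python sketch shows one way the patient-related factors could be turned into categorical indicators. The field names, the single CCI threshold of 3 (the only CCI grouping mentioned in the text), and the coding of sex are assumptions for illustration; the actual coding scheme used with the HIRA data is not given in the source.

```python
AGE_BANDS = [(18, 29), (30, 39), (40, 49), (50, 59),
             (60, 69), (70, 79), (80, 89), (90, 99)]

def age_band(age):
    """Return the label of the age category (in years) that contains `age`."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError(f"age {age} is outside the eight study categories")

def encode_patient(age, sex, cci, ventilator, rrt, vasopressor):
    """Build 0/1 indicators for one patient (hypothetical field names)."""
    band = age_band(age)
    row = {f"age_{low}-{high}": int(band == f"{low}-{high}") for low, high in AGE_BANDS}
    row["male"] = int(sex == "M")
    row["cci_ge_3"] = int(cci >= 3)        # assumed grouping: CCI of 3 or above
    row["ventilator"] = int(ventilator)
    row["hd_or_crrt"] = int(rrt)           # hemodialysis or CRRT
    row["vasopressor"] = int(vasopressor)  # norepinephrine, dopamine, or vasopressin
    return row

example = encode_patient(age=74, sex="M", cci=3,
                         ventilator=False, rrt=False, vasopressor=False)
```

Exactly one age indicator is set to 1 for each patient, which is the form a logistic regression with categorical predictors expects.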
In addition, given the potential impact of a lack of ICU human resources, the presence of dedicated ICU specialists [12] and bed-to-nurse grades [13] were also incorporated into the model as factors for predicting ICU and in-hospital mortality.
Statistical analysis
All categorical data are presented as frequency and proportion. Categorical data were analyzed using the chi-square test. The distribution of variables affecting in-hospital and ICU mortality was investigated in the derivation cohort. Bivariate analysis was performed to test the variable significance for in-hospital and ICU mortality. Based on the analysis results, significant variables were selected. In addition, multiple logistic regression analysis was performed to confirm that these factors affect mortality. The choice of variables was informed by the advisory board and clinical expert hearings.
Calibration was evaluated using the Hosmer–Lemeshow goodness-of-fit test (chi-square H). In this test, a large p value (> 0.05) indicates that the model is performing well, i.e., that there is no large discrepancy between observed and expected mortality. The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate how well the model discriminated between patients who lived and patients who died. AUC values (which can range from 0 to 1) greater than 0.7 are generally considered evidence of good model discrimination. The Youden index was used to determine the optimal cut-off point of predicted mortality for classifying patients as non-survivors. Sensitivity, specificity, and accuracy were then calculated from a 2 × 2 contingency table based on this cut-off point. Sensitivity measures how well the model predicts 1 (death) when the actual value is 1, and specificity measures how well it predicts 0 (survival) when the actual value is 0. Accuracy measures how well the model predicts 0 when the observed value is 0 and 1 when the observed value is 1.
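The discrimination measures described above can be illustrated with a short Python sketch. It is not the code used in the study (the analyses were run in SAS and R); the function names and the toy data are hypothetical, and the convention that a patient is classified as a non-survivor when the predicted probability is greater than or equal to the cut-off is an assumption.

```python
def auc(labels, probs):
    """AUC as the probability that a non-survivor is ranked above a survivor (ties count 0.5)."""
    died = [p for y, p in zip(labels, probs) if y == 1]
    lived = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(1.0 if d > s else 0.5 if d == s else 0.0 for d in died for s in lived)
    return wins / (len(died) * len(lived))

def metrics_at(labels, probs, cutoff):
    """Sensitivity, specificity, and accuracy from the 2 x 2 table at a given cut-off."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(labels)

def youden_cutoff(labels, probs):
    """Cut-off that maximizes the Youden index J = sensitivity + specificity - 1."""
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(probs)):
        sens, spec, _ = metrics_at(labels, probs, cutoff)
        j = sens + spec - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff

# Toy data (hypothetical): 1 = died, 0 = survived.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
probs = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]
cutoff = youden_cutoff(labels, probs)
sens, spec, acc = metrics_at(labels, probs, cutoff)
```

For these toy data the Youden-optimal cut-off is 0.35, which classifies all three non-survivors and four of the five survivors correctly.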
Using the developed model, the regression coefficients for each variable can be estimated, and the predicted mortality value of a patient can be calculated. Furthermore, using the model, data on newly admitted patients to the ICU could be used to calculate the predicted mortality through the following steps.
1) The logit g(x) was calculated:
$$g(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$$
where β0 is the constant, βi is the estimated coefficient for the i-th variable xi, i has a value from 1 to k, and k is the number of variables in the model. Since all variables used in the model are categorical, dummy variables were created: 1 was entered if a category applied to the patient, and 0 was entered if it did not. The logit was calculated by multiplying each input value by the corresponding coefficient and adding the products to the constant.
2) The logit value was then converted into a probability (death rate) according to the following equation:
$$P(\text{death}) = \frac{e^{g(x)}}{1 + e^{g(x)}}$$
Supplementary Table 1 describes the calculation method for predicting mortality using the model (model 1 in Supplementary Table 1). The information recorded in the table is based on one patient randomly selected from the health insurance claim data. The male patient was aged 70–79 years, with a CCI score of ≥ 3. He did not require ventilation, CRRT, or dialysis during hospitalization. This patient’s g(x) was calculated as −2.233, and the ICU mortality calculated using model 1 was 0.097. Since the Youden index cutoff of ICU model 1 was 0.117, the patient was predicted to be a survivor; the actual claim data confirmed the prediction.
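The worked example above can be checked with a short Python sketch that applies the standard logistic transformation to the reported logit. The function names are hypothetical, the coefficients passed to `logit` in the test are made up (the fitted coefficients are in the tables, which are not reproduced here), and classifying a patient as a non-survivor when the probability is at or above the cut-off is an assumption.

```python
import math

def logit(beta0, betas, x):
    """g(x) = beta0 + sum_i beta_i * x_i for 0/1 dummy inputs x."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))

def mortality_probability(g):
    """Convert a logit into a probability: e^g / (1 + e^g)."""
    return 1.0 / (1.0 + math.exp(-g))

def classify(prob, cutoff):
    """Label a patient using a probability cut-off (e.g., from the Youden index)."""
    return "non-survivor" if prob >= cutoff else "survivor"

g = -2.233                      # logit reported for the example patient
p = mortality_probability(g)
print(round(p, 3), classify(p, cutoff=0.117))  # 0.097 survivor
```

The computed probability of 0.097 is below the cut-off of 0.117, which reproduces the predicted survival of the example patient.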
To compare the discriminatory power of the developed model with that of APACHE, SAPS, and SOFA, the tools currently in use, the AUCs were compared using DeLong’s test.
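For readers unfamiliar with DeLong’s test, the following Python sketch shows the usual calculation for two scores measured on the same patients. It is a minimal, unoptimized illustration with hypothetical function names and toy data, not the implementation used in the study, and it assumes at least two survivors and two non-survivors.

```python
import math

def _psi(x, y):
    return 1.0 if x > y else 0.5 if x == y else 0.0

def _placements(died, lived):
    """Structural components: one value per non-survivor (v10) and per survivor (v01)."""
    v10 = [sum(_psi(x, y) for y in lived) / len(lived) for x in died]
    v01 = [sum(_psi(x, y) for x in died) / len(died) for y in lived]
    return v10, v01

def _cov(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)

def delong_test(labels, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for two correlated AUCs."""
    died = [i for i, y in enumerate(labels) if y == 1]
    lived = [i for i, y in enumerate(labels) if y == 0]
    v10_a, v01_a = _placements([scores_a[i] for i in died], [scores_a[i] for i in lived])
    v10_b, v01_b = _placements([scores_b[i] for i in died], [scores_b[i] for i in lived])
    auc_a = sum(v10_a) / len(v10_a)
    auc_b = sum(v10_b) / len(v10_b)
    # Variance of the AUC difference from the two covariance matrices.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / len(died)
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / len(lived))
    diff = auc_a - auc_b
    if var <= 0.0:  # degenerate case: no estimated variability
        if diff == 0:
            return auc_a, auc_b, 0.0, 1.0
        return auc_a, auc_b, math.copysign(math.inf, diff), 0.0
    z = diff / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p

# Toy data (hypothetical): 1 = died, 0 = survived.
labels = [1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
scores_b = [0.6, 0.3, 0.5, 0.4, 0.7, 0.2, 0.1]
auc_a, auc_b, z, p = delong_test(labels, scores_a, scores_b)
```

Because both scores are evaluated on the same patients, the covariance terms account for their correlation, which is why the AUCs are not compared with an unpaired test.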
All statistical analyses were conducted using SAS Enterprise Guide 7.1 and R version 3.5.1. Two-sided p values < 0.05 were considered statistically significant.
RESULTS
Construction of data for the development of a severity adjustment model
Data from 42,489 cases were randomly divided into the derivation and validation cohorts (7:3). Of the 29,742 patients in the derivation cohort, 3,479 patients died in the ICU. The baseline demographics of ICU survivors and non-survivors in the derivation cohort are shown in Table 1. The proportions of male patients, patients aged more than 70 years, and patients with a CCI of 3 or above were significantly higher among non-survivors than among survivors. Ventilator use, CRRT or dialysis, and use of vasopressor drugs were significantly more common in the non-survivor group. In the non-survivor group, the proportion of high bed-to-nurse grade was significantly higher and the proportion with ICU specialists present was lower than in the survivor group. Extracorporeal membrane oxygenation and high-flow nasal cannula use were more common in the non-survivor group; however, these parameters were not included in the model because their application frequency was relatively low and the data were not available for all ICUs.
Derivation and validation of the severity correction model for prediction of in-hospital and ICU mortality
As described in the methods, age, sex, CCI, ventilator use, hemodialysis or CRRT, and vasopressor use were selected to create model 1. Furthermore, the presence or absence of ICU specialists and bed-to-nurse grades were added as variables to create model 2, with the aim of predicting in-hospital and ICU mortality more accurately using the billing codes.
Multivariate logistic regression analysis was performed on the derivation cohort to determine whether these variables were related to the mortality rate. The variables selected for the multivariate analysis were reviewed by a committee of nine intensivists (physicians specializing in critical care medicine).
Finally, two severity correction models were developed in the derivation cohort (models 1 and 2) by applying variables selected after multiple logistic regression analysis and clinical consideration.
Model 1 included six patient-related categorical variables (age, sex, CCI, ventilator use, hemodialysis or CRRT, and vasopressor use). Model 2 included the presence or absence of ICU specialists and nursing grades as correction variables, aiming to improve accuracy when predicting ICU and in-hospital mortality.
The criteria for evaluating whether a model composed of selected variables explains the observed outcome well are discrimination, calibration, and overall model performance.
Tables 2 and 3 show the goodness-of-fit evaluations of the models based on 12,747 patients in the validation cohort, demonstrating the degree of agreement between the observed and predicted mortality based on the model probability. For model 1, the Hosmer–Lemeshow goodness-of-fit statistic for in-hospital and ICU mortality was 20.572 and 34.423, respectively, and the corresponding p value was 0.008 and < 0.01, respectively. For model 2, the goodness-of-fit statistic for in-hospital and ICU mortality was 11.353 and 18.234, respectively, and the corresponding p value was 0.183 and 0.020, respectively (Tables 2 and 3). The performance of the models was evaluated using AUCs. For models 1 and 2, the AUCs for the in-hospital mortality rate were 0.802 and 0.811, respectively, confirming that both models had excellent discrimination power of ≥ 0.8. In addition, the AUCs of models 1 and 2 for ICU mortality were 0.812 and 0.825, respectively, again demonstrating excellent discrimination. Both severity correction models performed better for ICU mortality than for in-hospital mortality. The cutoff value is the optimal decision reference point calculated using the Youden index, and sensitivity, specificity, and accuracy were calculated using the cutoff value. The cutoff values for models 1 and 2, for both in-hospital and ICU mortality, were ≥ 0.7, which indicates the models are suitable.
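The relationship between the reported Hosmer–Lemeshow statistics and their p values can be illustrated with a short Python sketch. It assumes 8 degrees of freedom (10 groups minus 2), which is the usual convention but is not stated in the source; under that assumption the reported p values are reproduced to within rounding. The function name is hypothetical, and the closed-form expression used is valid only for an even number of degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with even df.

    For df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Hosmer-Lemeshow statistics reported in the text.
reported = {
    "model 1, in-hospital": 20.572,
    "model 1, ICU": 34.423,
    "model 2, in-hospital": 11.353,
    "model 2, ICU": 18.234,
}
p_values = {name: chi2_sf_even_df(stat, df=8) for name, stat in reported.items()}
```

With the 0.05 threshold described in the methods, this calculation gives a p value above 0.05 only for model 2 and in-hospital mortality.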
Using the developed severity correction models, the regression coefficients for each variable can be estimated, and the predicted mortality value for an individual patient can be calculated. Using the model 1 coefficients in Table 4, data on patients newly admitted to the ICU could be used to calculate the predicted mortality.
Calibration plots of the validation cohort for models 1 and 2, for predicting in-hospital mortality and ICU mortality, are presented in Figures 2 and 3. The 12,747 patients in the validation cohort were divided into 10 equal-sized groups using the deciles of the predicted mortality. The calibration curve represents the relationship between the predicted mortality and the observed mortality. The diagonal dotted line represents perfect agreement between observed and expected mortality estimates.
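The decile grouping behind the calibration plots, and the Hosmer–Lemeshow statistic computed from the same groups, can be sketched in Python as follows. The function name and the handling of group boundaries are assumptions for illustration; statistical packages may split ties at the decile boundaries differently.

```python
def hosmer_lemeshow(labels, probs, groups=10):
    """Return (statistic, table), where table holds (size, observed, expected) per group.

    Patients are sorted by predicted mortality and cut into `groups` nearly equal parts.
    """
    pairs = sorted(zip(probs, labels), key=lambda t: t[0])
    n = len(pairs)
    table, statistic = [], 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        if size == 0:
            continue
        observed = sum(y for _, y in chunk)   # observed deaths in the group
        expected = sum(p for p, _ in chunk)   # sum of predicted probabilities
        table.append((size, observed, expected))
        denom = expected * (1.0 - expected / size)
        if denom > 0:
            statistic += (observed - expected) ** 2 / denom
    return statistic, table
```

Plotting observed / size against expected / size for each group gives the points of a calibration curve; points on the diagonal correspond to groups in which observed and predicted mortality agree.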
Comparison of the developed model and existing mortality prediction models
The external validity of models 1 and 2 was examined by evaluating their performance when applied to other patient populations. In addition, using data collected from three hospitals, we examined whether the performance of the models was comparable to that of the two measurement tools currently in use (APACHE II and SAPS III) (Table 4, Fig. 4). AUCs were compared using DeLong’s test. Using data from 1,000 and 404 ICU patients from tertiary hospitals 1 and 2, respectively, the degree of prediction of in-hospital and ICU mortality was determined for models 1 and 2. In the comparison of predictive power for in-hospital and ICU mortality, the performance of models 1 and 2 was not inferior to that of the existing models (APACHE II and SAPS III). When comparing the predictive power for ICU mortality in tertiary hospital 1, the AUCs of models 1 and 2 were significantly higher than that of APACHE II (model 1, p = 0.007; model 2, p < 0.001; Table 4, Fig. 4). When models 1 and 2 were applied to data from 897 patients in a general hospital, there was no significant difference in predicting in-hospital and ICU mortality compared to the APACHE II score based on health insurance claims, and the predicted AUC was > 0.8 (Table 4, Fig. 4). Calibration plots of the external validation data for models 1 and 2, for predicting in-hospital mortality and ICU mortality, are presented in Supplementary Figures 1–5. The diagonal dotted line represents perfect agreement between observed and expected mortality estimates.
DISCUSSION
In this study, we developed models that could predict in-hospital and ICU mortality based on the HIRA claims data. Model 1 included six patient-related categorical variables (age, sex, CCI, ventilator use, hemodialysis or CRRT, and vasopressor use). In addition, the presence or absence of ICU specialists and nursing grades were added as correction variables in model 2. Both models showed results in predicting in-hospital and ICU mortality comparable to those of the existing scoring systems, both in the goodness-of-fit verification in the validation cohort and in the external validity assessment using data from two tertiary hospitals and one general hospital. This study is significant in that it developed a model for predicting ICU mortality that is more convenient than existing mortality prediction models, through the analysis of data from all ICU patients in Korea.
Researchers have previously studied and developed prognostic scoring systems to predict the mortality rate of critically ill patients. APACHE is the most commonly used scoring system. In 1985, Knaus et al. [3] reduced the 34 items of the original APACHE to 12 items and published the results of a validation in 5,815 patients admitted to ICUs in 13 hospitals in the USA. The analysis showed that an increase in score was closely related to an increase in hospital death [3]. According to Zimmerman et al. [11], among 110,558 critically ill patients admitted to 104 ICUs in 45 hospitals in the USA, the new scoring system (APACHE IV) predicted in-hospital mortality with good calibration and discrimination (AUC: 0.880). Separately, Le Gall et al. [6] proposed the SAPS II model in a study of 13,152 patients admitted to 137 medical and/or surgical ICUs in 12 countries between September 1991 and February 1992. The SAPS II model predicted in-hospital mortality with AUCs of 0.88 and 0.86 in the developmental and validation samples, respectively [6]. Moreno et al. [7] presented SAPS III, a new model for predicting in-hospital mortality, through a multicenter, multinational cohort study of patients admitted to 303 ICUs from October to December 2002 (AUC: 0.848).
Studies predicting mortality using the SOFA score have also been reported [14]. Moreno et al. [14] suggested the usefulness of the total maximum SOFA score in predicting ICU mortality based on a multicenter, multinational cohort study of 1,449 patients admitted to 75 ICUs in 16 countries in March 1995 (AUC: 0.772).
Several countries have also reported studies on mortality prediction models based on public databases. In the USA, the Mortality Probability Admission Model (MPM0)-III and APACHE-IV prognostic scoring systems are commonly used in ICUs. Efforts have been made at the California Healthcare Foundation and the National Quality Forum (NQF) to develop prognostic scoring systems for quality measurement, which led to the development of the ICU Outcomes Model (a modified and recalibrated version of the MPM0-III model) [15]. Data from 55,304 patients aged ≥ 18 years admitted to 55 ICUs at USA hospitals from January 2008 to December 2012 showed that APACHE IV was the most accurate for predicting in-hospital mortality when compared with the ICU Outcomes Model/NQF and the MPM0-III [15]. In the UK, Harrison et al. [16] developed and published their own Intensive Care National Audit & Research Centre (ICNARC) model, based on the database of the Case Mix Program. The ICNARC model was developed based on the data of 216,626 patients admitted to 163 critical care units in England, Wales, and Northern Ireland from 1995 to 2003, and showed better discrimination and overall fit than pre-existing risk-prediction models for in-hospital mortality [16].
Several studies conducted in ICUs in Korea to predict mortality have been reported. In a study that retrospectively reviewed 1,314 patients admitted to the surgical ICU of a university hospital from March 2011 to February 2012, the overall discrimination and calibration of APACHE IV were similar to those of APACHE II, SAPS 3, and Korean SAPS 3 [17]. Another Korean study investigated the predictive power of APACHE II for in-hospital mortality in ICU patients [18]. In that study, data from the Fever and Antipyretics in Critical Illness Evaluation cohort were collected prospectively between September 1 and November 30, 2019, from adult patients aged ≥ 18 years who were admitted to ICUs in 25 hospitals (15 in Japan and 10 in Korea). The analyses showed that APACHE II predicted in-hospital mortality with poor calibration and modest discrimination. In the study of Lim et al. [19], mortality rates predicted using the general SAPS 3 and its customized equation (Australasia SAPS 3) exhibited good calibration and modest discrimination. However, the Australasia SAPS 3 did not improve the mortality prediction. A prospective multicenter cohort study involving 22 ICUs from 15 centers throughout Korea validated the SAPS 3 and customized it for Korean ICUs [20]. The new equation for Korean ICU patients was tested in a validation cohort and demonstrated both good discrimination and good calibration. In particular, that study has clinical significance in that it presented a new equation that can be applied to Korean ICUs.
However, all of these previous scoring systems require many variables to be measured in order to predict mortality, which is difficult in practice. Furthermore, comparison between hospitals is difficult because not all ICUs use the same scoring system. Therefore, we aimed to develop and validate a novel prognostic model for predicting mortality, using national insurance claims data.
This study is meaningful in that it is the first study in Asia to develop and apply a model to predict in-hospital and ICU mortality using a large-scale national-level database that included data from all ICUs in Korea. This study’s data were based on the HIRA of South Korea, the sole nationwide governmental agency that operates a fee-for-service reimbursement system. Therefore, the database is systematic and comprehensive. Moreover, this study distinguishes itself by offering a model that predicts in-hospital and ICU mortality rates more conveniently than the existing scoring systems.
This study has several limitations. First, while the study was based on a large database released by the South Korean government, our findings may not be generalizable to other countries, which may have different patterns of ICU use. Second, because of data limitations, in-hospital mortality could not be analyzed for one of the tertiary hospitals; this limited our testing of external validity. Third, the period of study was spring and summer, and as hospital admissions are affected by the season, our findings may not be applicable to other times of the year. Finally, because of the limited information available, it was difficult to compare and validate the newly developed scoring system with SAPS III and SOFA scores. Therefore, a large-scale systematic study is needed.
In this study, we developed a model that can predict in-hospital and ICU mortality based on the HIRA claims data released by the South Korean government. The novel and simple models were not inferior in predicting in-hospital and ICU mortality compared to the pre-existing scoring systems.