knitr::opts_chunk$set(echo = FALSE) devtools::load_all() library(tidyverse) library(patchwork) theme_set(cowplot::theme_cowplot()) data <- read_data_for_analysis()
at_admission <- data$patient_data %>% mutate(alive_12 = case_when( last_record >= 12 ~ TRUE, outcome == "Discharged" ~ TRUE, outcome == "Death" ~ FALSE, TRUE ~ NA), alive_30 = case_when( last_record >= 30 ~ TRUE, outcome == "Discharged" ~ TRUE, outcome == "Death" ~ FALSE, TRUE ~ NA) )
In all of the following we focus on the area under curve (AUC), also often called C-statistic. We compute AUC using the pROC
package [@http://zotero.org/users/5567156/items/UN6BIC67], which also reports bootstrapped confidence intervals.
For all risk scores we look at 12-day mortality and 30-day mortality - simply because those are the time horizons reported in the studies we looked at. When needed or when it is uncertain how the outcome was coded in the original study, we can also look at "indefinite horizon" where we only include patients that were not censored and had a known outcome.
The ACP index [@http://zotero.org/users/5567156/items/EN4GICJ6] is a simple score that categorizes the patients into 3 categories based on age and C-reactive protein (CRP): grade 1 (age $<$ 60 years and CRP $<$ 34 mg/L), grade 2 (age $\geq$ 60 years and CRP $<$ 34 mg/L or age $<$ 60 years and CRP $\geq$ 34 mg/L), grade 3 (age $\geq$ 60 years and CRP $\geq$ 34 mg/L) and was evaluated primarily on 12-day mortality. Despite the score being developed on patients in Wuhan, China, the 12-day mortality in our dataset is in rough agreement with the values in the original manuscript as shown in Table \@ref(tab:acpresults).
acp_data <- data$marker_data_wide %>% group_by(patient_id) %>% filter(any(!is.na(CRP))) %>% summarise( first_CRP_day = min(day[!is.na(CRP)]), first_CRP = CRP[day == first_CRP_day], worst_breathing_12 = max(breathing[day >= first_CRP_day & day <= first_CRP_day + 12]), .groups = "drop") %>% inner_join(data$patient_data, by = c("patient_id")) %>% filter(outcome %in% c("Death","Discharged") | last_record - first_CRP_day >= 12) %>% mutate( alive_12 = case_when(last_record >= first_CRP_day + 12 | outcome == "Discharged" ~ TRUE, outcome == "Death" ~ FALSE, TRUE ~ NA), alive_30 = case_when(last_record >= first_CRP_day + 30 | outcome == "Discharged" ~ TRUE, outcome == "Death" ~ FALSE, TRUE ~ NA), ACP_grade = case_when( age < 60 & first_CRP < 34 ~ 1, age >= 60 & first_CRP >= 34 ~ 3, TRUE ~ 2 ) ) acp_data %>% filter(!is.na(alive_12)) %>% group_by(ACP_grade) %>% summarise(`patients` = n(), `12-day mortality` = sum_percent(!alive_12), .groups = "drop") %>% arrange(ACP_grade) %>% mutate(`Li et al. 95% CI` = c("0%", "0 - 11.3%", "19.8 - 44.3%")) %>% rename(`ACP grade` = ACP_grade) %>% knitr::kable(booktabs = TRUE, caption = "Comparing the 12-day mortality based on ACP score (where available) with the 95\\% confidence intervals (CI) for 12-day mortality reported by Lu et al.") # acp_data %>% # group_by(ACP_grade, worst_breathing_12) %>% # summarise(mean(alive_12))
The original authors do not report AUC (C-statistic), but they report the overall counts of alive and death patients, letting us compute it. The counts are:
auc_data_from_counts <- function(score_values, positive_counts, negative_counts) { res <- list() for(i in 1:length(score_values)) { res[[i]] <- rbind( data.frame(score = rep(score_values[i], positive_counts[i]), outcome = rep(TRUE, positive_counts[i])), data.frame(score = rep(score_values[i], negative_counts[i]), outcome = rep(FALSE, negative_counts[i])) ) } do.call(rbind, res) } #Grade 1, 0% mortality,Grade 2, 5.6% mortality, Grade 3, 33.2% mortality li_et_al_data <- auc_data_from_counts(c(1,2,3), positive_counts = c(194, 145-8, 95-32), negative_counts = c(0, 8, 32)) %>% transmute(alive_12 = outcome, ACP_grade = score) li_et_al_data %>% group_by(ACP_grade, alive_12) %>% summarise(N = n(), .groups = "drop")
Giving us this AUC:
acp_auc_orig <- my_auc_summary(alive_12 ~ ACP_grade, li_et_al_data) acp_auc_orig %>% transmute(AUC = format(auc, digits = 3), `95% CI` = paste0("(",format(auc_low, digits = 3), ", ", format(auc_high, digits = 3), ")")) %>% knitr::kable(booktabs = TRUE)
So we can now compare AUC computed on our data for both 12-day mortality (as in the original study) and 30-day mortality.
acp_auc <- rbind( my_auc_summary(alive_12 ~ ACP_grade, acp_data), my_auc_summary(alive_30 ~ ACP_grade, acp_data) ) compare_auc_to_orig(acp_auc_orig, acp_auc)
We see that the score noticeably underperforms in the present dataset.
Chen and Liu [@http://zotero.org/users/5567156/items/4DDUVEKS] propose a score based on a logistic regression model as follows.
$$ Score = -4.211+0.013 \times CRP+0.059 \times Age +0.112 \times DD -1.984 \times LYM $$
Where CRP is C-reactive protein (mg/l) DD is D-dimer (the paper does not have units, but the associated web app is clear that it expects D-dimer as ug/ml FEU) and LYM is the Lympohocyte count (10^9 / L). The paper states:
patients who were with >10% missing values, stayed in the hospital < 7 days, afflicted by a severe disease before admission (e.g., cancer, aplastic anemia, and uremia), were unconscious at admission, and were directly admitted to the intensive care unit (ICU) were excluded.
Non-normally distributed continuous variables were transformed using a Box-Cox transformation.
However, reverse engineering the asociated web app (https://phenomics.fudan.edu.cn/risk_scores/), no transform appears to have been done to the variables that were finally selected. It is also slightly unclear how they computed the outcome (i.e. how they treated censoring).
The units in our dataset are:
data$marker_data %>% filter(marker %in% c("CRP", "d_dimer", "lymphocyte_count")) %>% select(marker, unit) %>% distinct()
Which matches the model except for D-dimer (we use ng/ml DDU, FEU value is double the DDU).
# More precise coefficient values and the CI range determined by reverse engineering the webapp chen_liu_data <- data$marker_data_wide %>% mutate(has_all_values = !is.na(CRP) & !is.na(d_dimer) & !is.na(lymphocyte_count), d_dimer_converted = d_dimer * 2 / 1000) %>% group_by(patient_id) %>% filter(any(has_all_values)) %>% summarise( first_day = min(day[has_all_values]), first_CRP = CRP[day == first_day], first_d_dimer_converted = d_dimer_converted[day == first_day], first_lymphocyte_count = lymphocyte_count[day == first_day], breathing_first = breathing[day == first_day], .groups = "drop") %>% inner_join(at_admission, by = c("patient_id")) %>% mutate( inclusion_criterium = breathing_first < "NIPPV" & (outcome %in% c("Discharged", "Death") | last_record >= first_day + 7), chen_liu = -4.21167 + 0.058699 * age + 0.013391 * first_CRP + 0.112055 * first_d_dimer_converted - 1.983733 * first_lymphocyte_count, risk = 1 / (1 + exp(-chen_liu)), risk_lower = 1 / (1 + exp(-chen_liu + 1.96*0.1721)), risk_upper = 1 / (1 + exp(-chen_liu - 1.96*0.1721)), alive_12_data = case_when(last_record >= first_day + 12 | outcome == "Discharged" ~ TRUE, outcome == "Death" ~ FALSE, TRUE ~ NA), alive_30_data = case_when(last_record >= first_day + 30 | outcome == "Discharged" ~ TRUE, outcome == "Death" ~ FALSE, TRUE ~ NA), alive_all = case_when(outcome == "Discharged" ~ TRUE, outcome == "Death" ~ FALSE, TRUE ~ NA) ) %>% select(age, starts_with("first"), chen_liu, starts_with("risk"), breathing_first, starts_with("alive"), inclusion_criterium)
A plot of the computed values for our dataset.
chen_liu_data %>% ggplot(aes(x = chen_liu, y = alive_30)) + geom_point()
We look at survival since first day when all of the values required for the score are available, or we ignore when the values were computed and look at survival since admission. Finally we also tried emulating the inclusion criteria from Chen & Lie
Chen & Liu report AUC = 0.881 for the validation dataset and don't provide enough information to let us recompute it with credible intervals.
chen_liu_data_inclusion <- chen_liu_data %>% filter(inclusion_criterium) chen_liu_auc <- rbind( my_auc_summary(alive_12 ~ chen_liu, chen_liu_data), my_auc_summary(alive_30 ~ chen_liu, chen_liu_data), my_auc_summary(alive_12_data ~ chen_liu, chen_liu_data), my_auc_summary(alive_30_data ~ chen_liu, chen_liu_data), my_auc_summary(alive_all ~ chen_liu, chen_liu_data), my_auc_summary(alive_12 ~ chen_liu, chen_liu_data_inclusion, "incl"), my_auc_summary(alive_30 ~ chen_liu, chen_liu_data_inclusion, "incl"), my_auc_summary(alive_12_data ~ chen_liu, chen_liu_data_inclusion, "incl"), my_auc_summary(alive_30_data ~ chen_liu, chen_liu_data_inclusion, "incl"), my_auc_summary(alive_all ~ chen_liu, chen_liu_data_inclusion, "incl") ) #plot_all_auc(chen_liu_auc)
chen_liu_auc_orig <- my_auc_summary_from_estimate(0.881, "alive", "chen_liu", "incl") compare_auc_to_orig(chen_liu_auc_orig, chen_liu_auc)
Shi et al. [@http://zotero.org/users/5567156/items/X4GWJIP5] report a simple score: 1 point for Age >= 50, Male and Hypertension each. They try to predict "severe" illness, but it is not clear what "severe" means. Here, we test both "not ventilated" (or worse, including death) and "not needing supplementary oxygen" as proxy for "severe". They also create a risk score for both "severe" on admission and "severe" during hospitalization. As for other scores we also look at 12- and 30- day mortality.
shi_et_al_data <- data$marker_data_wide %>% group_by(patient_id) %>% summarise( breathing_admission = max(breathing[!is.na(breathing) & day < 2]), not_ventilated_admission = breathing_admission < "NIPPV", not_oxygen_admission = breathing_admission < "Oxygen", .groups = "drop") %>% inner_join(at_admission, by = c("patient_id")) %>% mutate(shi = (age >= 50) + (sex == "M") + has_hypertension_drugs, not_ventilated_all = case_when( outcome == "Death" ~ FALSE, outcome != "Discharged" ~ NA, worst_breathing < "NIPPV" ~ TRUE ), not_oxygen_all = case_when( outcome == "Death" ~ FALSE, outcome != "Discharged" ~ NA, worst_breathing < "Oxygen" ~ TRUE ) ) shi_et_al_auc <- rbind( my_auc_summary(alive_12 ~ shi, shi_et_al_data), my_auc_summary(alive_30 ~ shi, shi_et_al_data), my_auc_summary(not_ventilated_all ~ shi, shi_et_al_data), my_auc_summary(not_oxygen_all ~ shi, shi_et_al_data), my_auc_summary(not_ventilated_admission ~ shi, shi_et_al_data, subgroup = " (admission)"), my_auc_summary(not_oxygen_admission ~ shi, shi_et_al_data, subgroup = " (admission)") ) #plot_all_auc(shi_et_al_auc)
We extract the original data from Fig1 B,C using WebPlotDigitizer.
shi_et_al_reported <- data.frame( score = c(0,1,2,3), shi_severe_admission = c(0,0.057,0.19,0.4), shi_severe_all = c(0.083, 0.138,0.389,0.429), mild_cases_admission = c(118, 199, 98, 21), all_cases_admission = c(118, 211, 121, 35), mild_cases_hospitalization = c(11, 25, 11, 4), all_cases_hospitalization = c(12, 29, 18, 7) ) shi_et_al_reported %>% select(-starts_with("shi_")) %>% knitr::kable(booktabs = TRUE)
# Check that the proportions match # shi_et_al_reported %>% mutate(sev2 = (all_cases_admission - mild_cases_admission)/all_cases_admission) %>% # select(score, shi_severe_admission,sev2) # # shi_et_al_reported %>% mutate(sev2 = (all_cases_hospitalization - mild_cases_hospitalization)/all_cases_hospitalization) %>% # select(score, shi_severe_all,sev2) shie_et_al_auc_data_admission <- auc_data_from_counts(score_values = shi_et_al_reported$score, positive_counts = shi_et_al_reported$mild_cases_admission, negative_counts = shi_et_al_reported$all_cases_admission - shi_et_al_reported$mild_cases_admission) %>% transmute(shi = score, severe_admission = outcome) shi_et_al_auc_reported_admission <- my_auc_summary(severe_admission ~ shi, shie_et_al_auc_data_admission, subgroup = " (admission)") #shi_et_al_auc_reported_admission$summary shie_et_al_auc_data_all <- auc_data_from_counts(score_values = shi_et_al_reported$score, positive_counts = shi_et_al_reported$mild_cases_hospitalization, negative_counts = shi_et_al_reported$all_cases_hospitalization - shi_et_al_reported$mild_cases_hospitalization) %>% transmute(shi = score, severe_hospitalized = outcome) shi_et_al_auc_reported_all <- my_auc_summary(severe_hospitalized ~ shi, shie_et_al_auc_data_all)
Here is the comparison to original data.
compare_auc_to_orig(rbind(shi_et_al_auc_reported_admission, shi_et_al_auc_reported_all), shi_et_al_auc, nrow = 1)
We see that the score consistently underperforms, although some of the CIs are quite wide.
We can also compare the ratios of "severe" cases reported by Shi et al. to ratios in our dataset.
shi_et_al_compare <- shi_et_al_data %>% group_by(shi) %>% summarise( across( one_of(c("not_ventilated_admission", "not_ventilated_all", "not_oxygen_admission", "not_oxygen_all")), ~ mean(.x, na.rm = TRUE), .names = "our_{col}"), n_our_admission = sum(!is.na(not_ventilated_admission)), n_our_all = sum(!is.na(not_ventilated_all)), .groups = "drop") %>% inner_join(shi_et_al_reported, by = c("shi" = "score")) shi_et_al_compare %>% transmute(score = shi, `Shi et al. (admission)` = shi_severe_admission, `Ventilated (admission)` = 1 - our_not_ventilated_admission, `Oxygen (admission)` = 1 - our_not_oxygen_admission, N = n_our_admission) %>% knitr::kable(booktabs = TRUE) shi_et_al_compare %>% transmute(score = shi, `Shi et al. (hospitalization)` = shi_severe_all, `Ventilated (hospitalization)` = 1 - our_not_ventilated_all, `Oxygen (hospitalization)` = 1 - our_not_oxygen_all, N = n_our_all) %>% knitr::kable(booktabs = TRUE)
Caramelo et al. [@http://zotero.org/users/5567156/items/3A2RN4ZS] uses simulations to get patient-level characterstics from aggregate data and then fit a logistic regression model. We use the coefficients they report for the model (the model can't be used directly, because the intercept is not reported). The covariates in the model are age (by decades), sex, presence of hypertension, presence of diabetes, presence of cardiac disease, presence of chronic respiratory disease and presence of cancer. It is unclear whether NYHA > 1 would be considered "cardiac disease" and similarly whether asthma is chronic respiratory disease, so we compute two versions of the score. Our dataset also does not contain cancer, so we ignore it (cancer has one of the lowest reported OR in the model).
Caramelo et al. have not computed AUC in their paper and do not give us any mean to reconstruct it, so we only present our results for the risk score.
caramelo_data <- at_admission %>% mutate( caramelo_base = 0 + case_when(age < 20 ~ 0, age < 30 ~ 0.2017, age < 40 ~ 0.3271, age < 50 ~ 5.6030, age < 60 ~ 6.7626, age < 70 ~ 18.8161, age < 80 ~ 43.7291, TRUE ~ 86.8680) + if_else(sex == "M", 1.8518, 0) + if_else(has_hypertension_drugs, 7.4191, 0) + if_else(diabetes, 9.0329, 0), caramelo1 = caramelo_base + if_else(heart_failure | ischemic_heart_disease | NYHA > 1, 12.8328, 0) + if_else(COPD | asthma, 7.7925, 0), caramelo2 = caramelo_base + if_else(heart_failure | ischemic_heart_disease, 12.8328, 0) + if_else(COPD, 7.7925, 0) ) caramelo_auc <- rbind( my_auc_summary(alive_12 ~ caramelo1, caramelo_data), my_auc_summary(alive_30 ~ caramelo1, caramelo_data), my_auc_summary(alive_12 ~ caramelo2, caramelo_data, note = "v2"), my_auc_summary(alive_30 ~ caramelo2, caramelo_data, note = "v2"), my_auc_summary(alive_12 ~ caramelo_base, caramelo_data, note = "base"), my_auc_summary(alive_30 ~ caramelo_base, caramelo_data, note = "base") ) compare_auc_to_orig(tibble(), caramelo_auc)
The results look somewhat encouraging for the score - the AUC values float around 0.8.
The index given in [@http://zotero.org/users/5567156/items/INZVABQI] is built from following elements:
bello_chavolla_table <- rbind( data.frame(factor = "Age >= 65 years", score = 3), data.frame(factor = "Diabetes", score = 1), data.frame(factor = "Diabetes*Age < 40 years", score = 5), data.frame(factor = "Age < 40 years", score = -6), data.frame(factor = "Obesity", score = 1), data.frame(factor = "Pneumonia", score = 7), data.frame(factor = "Chronic Kidney Disease", score = 3), data.frame(factor = "COPD", score = 1), data.frame(factor = "Immunosuppression", score = 1) ) bello_chavolla_table
Of those, our data does not contain Immunosuppresion and Pneumonia. We can however use taking antibiotics within 1 day of hospitalization as an imperfect proxy for pneumonia. We will ignore immunosuppresion in our test. Both of those choices could reduce the performance of the score.
The authors do not define obesity, so we will use BMI > 30 as threshold for obesity. Our data contains an indicator for "Renal disease" which might be classified slightly differently than "Chronic Kidney Disease".
The authors report C-statistic = 0.830 for mortality on validation dataset - it is unclear how they treat censoring, but they mention 30-day mortality in other part of the paper, so we will focus on 30-day mortality.
atb_first_day <- data$marker_data_wide %>% select(patient_id, day, took_antibiotics) %>% group_by(patient_id) %>% summarise(atb0 = any(took_antibiotics[day <= 0]), atb1 = any(took_antibiotics[day <= 1]), .groups = "drop") bello_chavolla_data <- at_admission %>% inner_join(atb_first_day, by = "patient_id") %>% mutate(bc_base = 3 * (age >= 65) + diabetes + 5 * (diabetes & age < 40) + (-6) * (age < 40) + is_obese + 3 * renal_disease + COPD, bc0 = bc_base + 7 * (atb0 & admitted_for_covid), bc1 = bc_base + 7 * (atb1 & admitted_for_covid) )
# bello_chavolla_data %>% ggplot(aes(x = bc0, y = as.integer(alive_30))) + # geom_smooth() + geom_jitter(width = 0.3, height = 0.3) # bello_chavolla_data %>% ggplot(aes(x = bc_base, y = as.integer(alive_30))) + # geom_smooth() + geom_jitter(width = 0.3, height = 0.3)
bc_auc <- rbind( my_auc_summary(alive_30 ~ bc_base, bello_chavolla_data, note = "base"), my_auc_summary(alive_12 ~ bc_base, bello_chavolla_data, note = "base"), my_auc_summary(alive_30 ~ bc0, bello_chavolla_data, note = "day0"), my_auc_summary(alive_12 ~ bc0, bello_chavolla_data, note = "day0"), my_auc_summary(alive_30 ~ bc1, bello_chavolla_data, note = "day1"), my_auc_summary(alive_12 ~ bc1, bello_chavolla_data, note = "day1") ) bc_auc_orig <- my_auc_summary_from_estimate(0.830, "alive_30", "bc0") compare_auc_to_orig(bc_auc_orig, bc_auc)
The main surprise here is that not inferring pneumonia (shown as "base" in the figure) works best. This indicates that our data may not be well suited to the use of the score.
As a simple baseline we use age in years and the decade of age as the sole predictor and compute AUC for this.
age_only_data <- at_admission %>% mutate(age_decade = floor(age / 10)) age_auc <-rbind( my_auc_summary(alive_30 ~ age, age_only_data), my_auc_summary(alive_12 ~ age, age_only_data), my_auc_summary(alive_30 ~ age_decade, age_only_data, note = "decade"), my_auc_summary(alive_12 ~ age_decade, age_only_data, note = "decade") ) age_auc %>% knitr::kable(booktabs = TRUE) # plot_my_auc(alive_12 ~ age, at_admission)
plot_my_auc(alive_30 ~ age, at_admission)
all_auc <- rbind(acp_auc, chen_liu_auc, shi_et_al_auc, caramelo_auc, bc_auc, age_auc) all_auc_orig <- rbind(acp_auc_orig, chen_liu_auc_orig, shi_et_al_auc_reported_admission, shi_et_al_auc_reported_all, bc_auc_orig) write_csv(all_auc, here::here("manuscript", "all_auc.csv"), na = "") write_csv(all_auc_orig, here::here("manuscript", "all_auc_orig.csv"), na = "") compare_auc_to_orig(all_auc_orig, all_auc, show_outcome = FALSE)
We see that the "age only" baseline performs similarly to the best performing score (Caramelo et al.) and likely better than all of the other scores - there is however some uncertainty prohibiting us to be very sure about the ordering. This holds even if we only take the decade of age. Note that the score by (Caramelo et al.) is actually heavily based on the decade of age. Table of the results is shown below as the small differences are not well visible in the plot.
all_auc %>% mutate(score = score_caption[score]) %>% group_by(score, subgroup, note) %>% summarise(best_auc = max(auc), .groups = "drop") %>% arrange(desc(best_auc)) %>% knitr::kable(booktabs = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.