In dshenker/RFSLAM: What the Package Does (One Line, Title Case)

knitr::opts_chunk$set(echo = TRUE)

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

library(tidyverse)
library(rms)
library(mice)
library(ggRandomForests)
library(survival)
library(missRanger)
library(survminer)
library(glmnet)
library(pROC)
library(rfSLAM)
library(grid)
library(gridExtra)
library(ggsci)
library(pROC)
library(splines)
library(viridis)
library(rpart)
library(compareGroups)
library(tidyverse)
library(lubridate)
library(kableExtra)
library(rpart.plot)
setwd("~/workspace/SAFE/code/SLAM_ML/dshenker/RFSLAM_PACKAGE/RFSLAM")
devtools::load_all()

Load in our example dataset which is in the correct final format

df <- readRDS("~/workspace/SAFE/code/SLAM_ML/dshenker/sample_modeling_df/rf_df_1_withvariantsplusimputed.rda")
drop <- c("rt_1", "rt_7", "rt_7p", "i.sev_died.7")
analysis_vec <- readRDS("~/workspace/SAFE/data/curated_data/SLAM_ML/for_templates/analysis_vector_Sep_22.rds")
analysis_vars <- c(analysis_vec, "dominant_clade", "vaccinated", "remdesivir", "remdesivir_24", "remdesivir_since_admit", "systemic_steroid", "systemic_steroid_24", "systemic_steroid_since_admit", "tocilizumab", "tocilizumab_admit", "variant")
all_vars_for_model <- c(analysis_vars, "rt_1", "rt_7", "rt_1p", "rt_7p", "i.sev_died.1", "i.sev_died.7", "pid", "int.n")


var_key <- read_csv("~/workspace/SAFE/code/SLAM_ML/dshenker/var_key.csv") #example variable key

The initial step before using RF-SLAM once we have our data ready is to create our “risk time” information. These risk times tell RF-SLaM for what proportion of the interval of interest is the patient actually at risk. Here is a guide to the function calls: - First variable is the data frame being operated on. - Second variable is the column containing the time of the FINAL available data for the patient (could be the time of treatment end, death, loss to followup, etc.). - The third variable is the column name for where the currrent time is stored. - Fourth variable is the length of the time intervals. - Fifth variable is the value you want to normalize the times by. - Finally, the last variable is what you would like the new column to be named. Note that there will also be a second new column that is the adjusted version.

df <- as.data.frame(calc_risk_times(df, "t.sd", "q6", 6, 24, "rt_1")) %>% select(all_vars_for_model[all_vars_for_model %in% names(.)]) %>% mutate_if(sapply(.,is.character), as.factor)

We need a variable to use to split for the cross validation folds. Here, we will use a binary variable which tells us if a patient was EVER in a severe condition. Let’s create that column here.

severe_ever <- df %>% group_by(pid) %>% slice(which.max(i.sev_died.1)) %>% select(pid, i.sev_died.1)
colnames(severe_ever) <- c("pid", "sev_ever")
df <- df %>% left_join(severe_ever, by = "pid")

The bulk of the RF-SLAM code is strategically hidden in the “rf_functions.R” file. The user of the package themselves must make the following function calls. The purpose of each one is described here briefly: - create_model: runs the RFSLAM algorithm on the given dataframe - feature_importance_plot: plots the variable importance in the RF-SLAM model along with the variable information provided in the corresponding key - calibrate.model: calibrates the model predictions to align with the dataset - show_aucs: calculate AUC values at pre-specified time values (currently these are preset in “rf_functions” but will be soon customizable) - analysis_plots: create the calibration plots and rpart visualizations of the model - save_risk_vals: save the dataframe along with the corresponding predictions to an RDS file. ## Run RF-SLAM and the corresponding analysis

df$i.sev_died.1_final <- relevel(df$i.sev_died.1, "0")
df$i.sev_died.1 <- NULL
target <- "i.sev_died.1_final"
drop <- c(drop, "i.sev_died.1", "sev_ever")

best_params <- tune_rf_params(df, target, id_col = "pid", risk_time_col = "rt_1p", patient_count_col = "int.n", time_col = "q6", drop, ntree_trys = c(100, 200), nodedepth_trys = c("NULL", 3), nsplit_trys = c(5, 10), n.folds = 3, folds_stratifier = "sev_ever")

We can now look at the table saved in best_params to determine what parameters we’d like to use for modeling.

best_params

We can see that the maximum average AUC across folds comes wen we use 100 trees, a nodedepth of 3, and nsplit equal to 5. Thus, let’s train our final model using those parameter settings.

final_model <- create_model(df[,!(names(df) %in% drop[drop != sym(target)])], target, id_col = "pid", risk_time_col = "rt_1p", patient_count_col = "int.n", "q6", ntree = 100, nodedepth = 3, nsplit = 5)

mymodel.1.full_1day <- final_model$model
p.cpiu.be.ppl_1day <- final_model$preds

feature_importance_plot(mymodel.1.full_1day, var_key) #plot feature importance

df$p.hat <- calibrate.model(p.cpiu.be.ppl_1day, df, target, "q6")
analysis_plots(df, "i.sev_died.1_final", "pid", "p.hat", "q6", analysis_vars)

Finally we can save the risk values.

save_risk_vals(df, "~/workspace/SAFE/code/SLAM_ML/dshenker/Risk_DFs_ForTrees/df_withall_generalframeworkexperiment.Rds")

dshenker/RFSLAM documentation built on Sept. 18, 2022, 3:26 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dshenker/RFSLAM
What the Package Does (One Line, Title Case)

In dshenker/RFSLAM: What the Package Does (One Line, Title Case)

R Markdown

R Package Documentation

Browse R Packages

We want your feedback!

dshenker/RFSLAM What the Package Does (One Line, Title Case)

In dshenker/RFSLAM: What the Package Does (One Line, Title Case)

R Markdown

R Package Documentation

Browse R Packages

We want your feedback!

dshenker/RFSLAM
What the Package Does (One Line, Title Case)