The package `gbmSpm` can be used to reproduce the analysis pipeline from the paper:
Smedley, Nova F., Benjamin M. Ellingson, Timothy F. Cloughesy, and William Hsu. "Longitudinal Patterns in Clinical and Imaging Measurements Predict Residual Survival in Glioblastoma Patients." Scientific Reports 8, no. 1 (2018): 14429.
See also Supplemental Materials.
This vignette shows how temporal features and patient covariates are used to predict residual survival. It depends on the `glmnet` and `caret` R packages, uses dummy patient data, and follows the other vignette, "Generate sequential patterns."
Set the cross-validation and logistic regression parameters
```r
library(gbmSpm)

seed <- 9
nFolds <- 2
cvReps <- 2
metric <- 'rocAUC'
lasso <- TRUE
llength <- 100 # lambda length
lmax <- 0.2    # lambda max
```
Set the parameters used to generate features
Previously, we found temporal features via sequential pattern mining (SPM) with `arulesSequences` and placed the patterns in "~/gbm_spm_example". The logit results will go there as well.
```r
tType <- 'rate'
maxgap <- 60
maxlength <- 2
outdir <- '~/gbm_spm_example'
dataFolder <- 'sup0.4g60l2z2' # previously created spm patterns
dataFile <- file.path(dataFolder, 'featureVectors_rateChange.rds') # data input file

prefix <- 'logits'
logitFolder <- file.path(outdir, prefix, dataFolder)
# create the output folder if it does not exist yet
if (!dir.exists(logitFolder)) dir.create(logitFolder, recursive = TRUE)

dataPartitions <- file.path(outdir, 'train_test_partitions.rds')
dataPath <- file.path(outdir, 'spm', dataFile)
```
Log the experiment:
```r
# open a dated log file in the results folder
logfn <- file(file.path(logitFolder,
                        paste0(format(Sys.Date(), format = "%Y.%m.%d"),
                               'logit', getVolTypeName(tType), '.log')),
              open = 'wt')
# send printed output to the log and to the console
sink(logfn, type = 'output', split = TRUE)

cat(format(Sys.Date(), format = "%Y.%m.%d"), '\n')
cat('pattern dataset:', dataFile, '\n')
cat('tumor vol variable type: ', getVolTypeName(tType), '\n')
cat('seed: ', seed, '\n')
cat('nFolds: ', nFolds, '\n')
cat('cvReps: ', cvReps, '\n')

if (lasso) {
  cat('lambda length: ', llength, '\n\n')
  cat('lambda max: ', lmax, '\n\n')
}
```
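The logging chunk above opens a sink but does not show it being closed. As a minimal base-R sketch of the same pattern, assuming a temporary file in place of the log file under `logitFolder` and made-up values for the logged parameters, the sink and the connection can be released once logging is finished:

```r
# Sketch only: a temporary file stands in for the log file under logitFolder.
logPath <- tempfile(fileext = '.log')
logCon <- file(logPath, open = 'wt')

# Divert printed output to the log; split = TRUE also echoes it to the console.
sink(logCon, type = 'output', split = TRUE)
cat('seed: ', 9, '\n')
cat('nFolds: ', 2, '\n')

# Restore normal output and release the connection.
sink()
close(logCon)

# Read the log back to check what was recorded.
logLines <- readLines(logPath)
print(logLines)
```

Calling `sink()` with no arguments removes the most recent diversion, so later output goes only to the console again.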
Create partitions on cleaned data
Data partitioning depends on the fully cleaned dataset and ensures that the same clinical visits remain in the training or testing set, regardless of any later feature engineering.
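The idea of fixing the partition on visit IDs before any feature engineering can be sketched in base R. This is an illustration under stated assumptions, not the implementation of `getTrainTestPartition()`: the toy table, the patient-level split, and the 80/20 ratio are all hypothetical.

```r
# Toy visit table; 'iois' (patient) and 'eventID' (visit) mirror the vignette's
# column names, but the values are made up for illustration.
visits <- data.frame(
  iois = rep(1:10, each = 3),
  eventID = rep(1:3, times = 10)
)
visits$id <- paste0(visits$iois, '.', visits$eventID)

# Assumption: split at the patient level with an 80/20 ratio so that all visits
# of one patient land on the same side. getTrainTestPartition() may differ.
set.seed(9)
patients <- unique(visits$iois)
trainPatients <- sample(patients, size = floor(0.8 * length(patients)))

partition <- list(
  train = visits$id[visits$iois %in% trainPatients],
  test = visits$id[!(visits$iois %in% trainPatients)]
)

# Any later feature table only needs an 'id' column to be split the same way.
features <- data.frame(id = visits$id, x = seq_len(nrow(visits)))
trainFeatures <- features[features$id %in% partition$train, ]
lengths(partition)
```

Because the partition stores IDs rather than rows, the same split can be reapplied to every feature set derived from these visits.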
Load original data:
```r
data("fake_data")
names(fake_data)
colnames(fake_data$person)
colnames(fake_data$demo)
colnames(fake_data$events)

survData <- fake_data$person
fake_demo <- fake_data$demo
```
Get the pool of visits for partitioning:
```r
fake_data$events <- cleanData(fake_data$events, tType = 'c')
fake_data <- merge(fake_data$events, fake_data$person, by='iois', all.x=T)
fake_data <- prepDemographics(fake_data, fake_demo)
fake_data <- prepSurvivalLabels(fake_data)
fake_data$id <- paste0(fake_data$iois,'.',fake_data$eventID)
part.ids <- getTrainTestPartition(data = fake_data,
                                  database = NULL,
                                  personTable = NULL,
                                  survData = survData,
                                  seed = seed,
                                  verbose = TRUE)
names(part.ids)
```
Read in features for prediction task
Get the features generated from SPM:
```r
data <- readRDS(dataPath)
names <- colnames(data)
# move the row names into an explicit id column
data <- data.frame(id = row.names(data), data)
colnames(data) <- c('id', names)
cat('...overall samples: ', nrow(data), '\n')
```
Convert tumor laterality and tumor location to dummy variables:
```r
data <- prepLaterality(data)
data <- prepLocation(data)
```
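For readers unfamiliar with dummy coding, here is a small base-R sketch of what converting a categorical column into indicator columns looks like. The column name `laterality`, its levels, and the resulting column names are hypothetical; `prepLaterality()` and `prepLocation()` may name or encode their outputs differently.

```r
# Toy data; the column name 'laterality' and its levels are hypothetical.
toy <- data.frame(
  id = c('1.1', '1.2', '2.1'),
  laterality = factor(c('left', 'right', 'left'))
)

# One indicator column per level; '- 1' drops the intercept so no level is omitted.
dummies <- model.matrix(~ laterality - 1, data = toy)

# Replace the categorical column with its indicator columns.
toy <- cbind(toy[, 'id', drop = FALSE], as.data.frame(dummies))
toy
```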
Remove clinical visits that have no history of length `maxgap`:
```r
data <- removeVisits(data,
                     maxgap = maxgap,
                     maxlength = maxlength,
                     tType = tType,
                     save = FALSE,
                     outDir = logitFolder)
```
Remove unwanted features and labels that should not be in the training data for `glmnet`:
```r
labels <- getClassLabels()
needToRemove <- c('id', 'iois', 'eventID', # remove ids
                  labels,                  # remove labels
                  'IDH1')                  # not interested
```
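The effect of such an exclusion list can be sketched in base R. How `runCV()` applies `needToRemove` internally is not shown in this vignette, so the snippet below is only an assumption about the intent; the label name `survivalIn60` and the columns `age` and `pattern1` are made up.

```r
# Toy feature table; 'survivalIn60', 'age' and 'pattern1' are hypothetical names.
toy <- data.frame(
  id = c('1.1', '2.1'), iois = c(1, 2), eventID = c(1, 1),
  survivalIn60 = c(0, 1), IDH1 = c(1, 0), age = c(50, 61), pattern1 = c(1, 0)
)
labels <- 'survivalIn60' # hypothetical stand-in for getClassLabels()
needToRemove <- c('id', 'iois', 'eventID', labels, 'IDH1')

# Predictors are whatever is left after excluding identifiers, labels and IDH1.
predictors <- setdiff(colnames(toy), needToRemove)
x <- toy[, predictors, drop = FALSE]
colnames(x)
```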
Logistic regression
Set labels and formula:
```r
ln <- labels[1] # just do one ln
form <- as.formula(paste0(ln, '~.'))
form
```
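The `.` on the right-hand side of the formula stands for every column in the data other than the response. A short base-R sketch with a made-up data frame (the label `survivalIn60` and the covariates `age` and `kps` are hypothetical) shows how it expands:

```r
# Toy data; the label and covariate names are hypothetical.
toy <- data.frame(
  survivalIn60 = factor(c(0, 1, 0, 1)),
  age = c(50, 61, 47, 70),
  kps = c(90, 70, 80, 60)
)

ln <- 'survivalIn60'
form <- as.formula(paste0(ln, '~.'))

# Expanding the formula against the data lists the predictors behind '.'.
predictors <- attr(terms(form, data = toy), 'term.labels')
predictors
```

This is why identifier and label columns have to be excluded before fitting: anything left in the data becomes a predictor.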
Set the training data and run cross-validation, then refit with the best-performing lambda:
```r
# keep only the visits assigned to the training partition
data$id <- as.character(data$id)
data <- data[data$id %in% part.ids$train,]

aucs <- runCV(data = data,
              formula = form,
              lasso = TRUE,
              llength = llength,
              lmax = lmax,
              labelName = ln,
              needToRemove = needToRemove,
              metric = metric,
              seed = seed,
              folds = nFolds,
              cvReps = cvReps,
              createModelMatrix = FALSE)
```
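To make the "cross-validate, pick lambda, refit" step concrete, here is a rough sketch that calls `glmnet` directly on simulated data. It is not the implementation of `runCV()`: the simulated features, the log-spaced lambda grid built from `llength` and `lmax`, and the use of `cv.glmnet` are assumptions for illustration. Five folds are used because `cv.glmnet` requires at least three.

```r
library(glmnet)

# Simulated data: five features, binary outcome driven by the first two only.
set.seed(9)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n, dimnames = list(NULL, paste0('f', 1:5)))
y <- rbinom(n, size = 1, prob = plogis(1.5 * x[, 1] - 1 * x[, 2]))

# Assumption: a log-spaced, decreasing grid of llength values capped at lmax.
llength <- 100
lmax <- 0.2
lambdaGrid <- exp(seq(log(lmax), log(lmax * 1e-3), length.out = llength))

# Lasso (alpha = 1) logistic regression, scored by cross-validated AUC.
cvFit <- cv.glmnet(x, y, family = 'binomial', alpha = 1,
                   lambda = lambdaGrid, nfolds = 5, type.measure = 'auc')
bestLambda <- cvFit$lambda.min

# Refit on all training data at the selected lambda.
finalFit <- glmnet(x, y, family = 'binomial', alpha = 1, lambda = bestLambda)
coef(finalFit)
```

With `type.measure = 'auc'`, `lambda.min` holds the lambda with the highest cross-validated AUC, despite the name.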