In novasmedley/gbmSpm: Find predictive sequential patterns for clinical glioblastoma patients

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

The package gbmSPM can be used to do the exact analysis pipeline used in the paper,

Smedley, Nova F., Benjamin M. Ellingson, Timothy F. Cloughesy, and William Hsu. "Longitudinal Patterns in Clinical and Imaging Measurements Predict Residual Survival in Glioblastoma Patients." Scientific reports 8, no. 1 (2018): 14429.

Vignette Info

This vignette shows how temporal features and patient covariates are used in predicing residual survival. This depends on the glmnet and caret R packages, uses dummy patient data, and follows the other vignette: "Generate sequential patterns."

Example

Set the cross-validation and logistic regression parameters
- seed - for data partitioning
- nFolds - number of cross-validation folds
- cvReps - number of times to repeat cross-validation
- metric - for selecting hyperparameters
- lasso - whether to use lasso regularization (lamba)
- llength - number of lambas to consider
- lmax - the max. lamba to consider
```r library(gbmSpm)

seed <- 9 nFolds <- 2 cvReps <- 2 metric <- 'rocAUC' lasso <- TRUE llength <- 100 lmax <- 0.2 ```
Set the parameters used to generate features

Previously, we found temporal features via sequential pattern mining (SPM) with arulesSequences and placed the patterns in "~/gbm_spm_example". We will also put logit results there as well.

```r

previously created spm patterns parameters

tType <- 'rate' maxgap <- 60 maxlength <- 2 outdir <- '~/gbm_spm_example' dataFolder <- 'sup0.4g60l2z2' # previously created spm patterns dataFile <- file.path(dataFolder, 'featureVectors_rateChange.rds') # data input file

output dirs for logit models

prefix <- 'logits' logitFolder <- file.path(outdir, prefix, dataFolder) ifelse(!dir.exists(logitFolder), dir.create(logitFolder, recursive = T), F)

expected ids of train and test, if doesn't exist, will create it

dataPartitions <- file.path(outdir, 'train_test_partitions.rds')

input path, aka spm data folder

dataPath <- file.path(outdir,'spm',dataFile)

```

Log the experiment:

```r logfn <- file(file.path(logitFolder, paste0(format(Sys.Date(), format="%Y.%m.%d"), 'logit',getVolTypeName(tType),'.log')), open='wt')

sink(logfn, type='output', split=T)

cat(format(Sys.Date(), format="%Y.%m.%d"), '\n') cat('pattern dataset:', dataFile, '\n') cat('tumor vol variable type: ', getVolTypeName(tType), '\n') cat('seed: ', seed, '\n') cat('nFolds: ', nFolds, '\n') cat('cvReps: ', cvReps, '\n')

if (lasso) { cat('lambda length: ', llength, '\n\n') cat('lambda max: ', lmax, '\n\n') } ```
Create partitions on cleaned data

Data partitioning depends on the fully cleaned dataset and ensures the exact same clinical visits are maintained in training or testing, regardless of any further feature engineering methods.

Load original data: ```r data("fake_data") names(fake_data) colnames(fake_data$person) colnames(fake_data$demo) colnames(fake_data$events)

survData <- fake_data$person fake_demo <- fake_data$demo ```

Get the pool of visits for partitioning:

```r

'c' tType does not remove initial imaging volumes

(e.g., rate change needs two visits, the very first one is ignores)

fake_data$events <- cleanData(fake_data$events, tType = 'c')

fake_data <- merge(fake_data$events, fake_data$person, by='iois', all.x=T)

need gender info

fake_data <- prepDemographics(fake_data, fake_demo)

get survival labels, these also change

fake_data <- prepSurvivalLabels(fake_data)

clinical ids

fake_data$id <- paste0(fake_data$iois,'.',fake_data$eventID)

create train and test partitions

and attempt to maintain gender and survival balance in both parts

part.ids <- getTrainTestPartition(data = fake_data, database = NULL, personTable = NULL, survData = survData, seed = seed, verbose = T) names(part.ids) ```
Read in features for prediction task

Get the features generated from SPM:

```r

features and labels for each clinical visit

data <- readRDS(dataPath) names <- colnames(data) data <- data.frame(id=row.names(data),data) colnames(data) <- c('id',names) cat('...overall samples: ', nrow(data), '\n') ```

Convert tumor laterality and tumor location to dummy variables: r data <- prepLaterality(data) data <- prepLocation(data)

Remove clinical visits with no history of length max. gap: r data <- removeVisits(data, maxgap = maxgap, maxlength = maxlength, tType = tType, save = F, outDir = logitFolder)

Remove unwanted features and labels that should not be in training data for glmnet:

r labels <- getClassLabels() needToRemove <- c('id','iois','eventID', # remove ids labels, # remove labels 'IDH1') # not interested
Logistic regression

Set labels and formula: ```r

ln <- labels[1] # just do one ln

form <- as.formula(paste0(ln,'~.')) form ```

Set training data and do cross-validation with refitting with the best performing lambda:

```r data$id <- as.character(data$id) data <- data[data$id %in% part.ids$train,]

aucs <- runCV(data = data, formula = form, lasso = TRUE, llength = llength, lmax = lmax, labelName = ln, needToRemove = needToRemove, metric = metric, seed = seed, folds = nFolds, cvReps = cvReps, createModelMatrix = FALSE) ```

novasmedley/gbmSpm documentation built on May 17, 2019, 10:39 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

novasmedley/gbmSpm
Find predictive sequential patterns for clinical glioblastoma patients

In novasmedley/gbmSpm: Find predictive sequential patterns for clinical glioblastoma patients

Vignette Info

Example

previously created spm patterns parameters

output dirs for logit models

expected ids of train and test, if doesn't exist, will create it

input path, aka spm data folder

'c' tType does not remove initial imaging volumes

(e.g., rate change needs two visits, the very first one is ignores)

need gender info

get survival labels, these also change

clinical ids

create train and test partitions

and attempt to maintain gender and survival balance in both parts

features and labels for each clinical visit

R Package Documentation

Browse R Packages

We want your feedback!

novasmedley/gbmSpm Find predictive sequential patterns for clinical glioblastoma patients

In novasmedley/gbmSpm: Find predictive sequential patterns for clinical glioblastoma patients

Vignette Info

Example

previously created spm patterns parameters

output dirs for logit models

expected ids of train and test, if doesn't exist, will create it

input path, aka spm data folder

'c' tType does not remove initial imaging volumes

(e.g., rate change needs two visits, the very first one is ignores)

need gender info

get survival labels, these also change

clinical ids

create train and test partitions

and attempt to maintain gender and survival balance in both parts

features and labels for each clinical visit

R Package Documentation

Browse R Packages

We want your feedback!

novasmedley/gbmSpm
Find predictive sequential patterns for clinical glioblastoma patients