prepLogitData: Create data for logit model.

Description

Prepares a data frame for fitting a logit (logistic regression) model.

Usage

prepLogitData(data, formula, labelName, predictors = NULL,
  needToRemove = NULL, createModelMatrix = FALSE)

Arguments

data

data frame; rows are samples, columns are features plus metadata columns that are not meant for modeling and will be removed

formula

character string or formula object specifying the model

labelName

character; name of the column holding the binary label

predictors

character vector; names of columns in data that should be kept in the logit fit data

needToRemove

character vector; names of columns in data that should be excluded from the logit fit data

createModelMatrix

logical; if TRUE, call model.matrix on the prepared data

Details

Removes columns that are neither features nor the label, and removes rows that contain any NA values. The result is a data frame ready to serve as input to a logit fit; see model.matrix.
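The steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the gbmSpm source: the function name sketchPrepLogitData and the toy data frame are hypothetical, and the sketch assumes that the label column is kept even when it is listed in needToRemove (as it is in the example on this page, where all class labels are passed for removal).

```r
# Hypothetical stand-in for the documented behaviour; not the package code.
sketchPrepLogitData <- function(data, formula, labelName, predictors = NULL,
                                needToRemove = NULL, createModelMatrix = FALSE) {
  # Assumption: the label column survives even if listed in needToRemove.
  dropCols <- setdiff(needToRemove, labelName)
  keep <- setdiff(colnames(data), dropCols)
  if (!is.null(predictors)) {
    keep <- intersect(keep, c(predictors, labelName))
  }
  out <- data[, keep, drop = FALSE]

  # Remove rows with any NA value.
  out <- out[stats::complete.cases(out), , drop = FALSE]

  if (createModelMatrix) {
    # Expand factors into dummy columns and drop the intercept column.
    mm <- stats::model.matrix(stats::as.formula(formula), data = out)
    out <- data.frame(out[labelName], mm[, -1, drop = FALSE],
                      check.names = FALSE)
  }
  out
}

# Toy input: one id column, one binary label, two features (one with an NA).
toy <- data.frame(
  id = c("a", "b", "c", "d"),
  survivalIn60 = c(1, 0, 1, 0),
  age = c(50, 61, NA, 47),
  site = factor(c("left", "right", "left", "right"))
)

prepared <- sketchPrepLogitData(toy, "survivalIn60 ~ .", "survivalIn60",
                                needToRemove = "id")
expanded <- sketchPrepLogitData(toy, "survivalIn60 ~ .", "survivalIn60",
                                needToRemove = "id", createModelMatrix = TRUE)
```

Here prepared keeps the label and the two features for the three complete rows, while expanded replaces the factor column with the dummy columns produced by model.matrix.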

Value

data frame ready for logit fitting
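As a rough illustration of what "ready for logit fitting" means, a data frame of this shape (one binary label column, remaining columns numeric or factor features, no NA values) can be passed directly to a logistic regression. The sketch below assumes stats::glm as the fitting function, which this page does not state, and uses simulated data with hypothetical column names x1 and x2.

```r
# Simulated stand-in for a prepared data frame; values are not real data.
set.seed(1)
n <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
ready <- data.frame(
  survivalIn60 = rbinom(n, size = 1, prob = plogis(0.5 * x1)),
  x1 = x1,
  x2 = x2
)

# Logit fit: binomial family with the default logit link.
fit <- glm(survivalIn60 ~ ., data = ready, family = binomial())

# One coefficient per feature plus the intercept.
coefNames <- names(coef(fit))
```

The formula 'survivalIn60 ~ .' works here because every non-label column left in the data frame is a feature, which is the reason for removing ids and unused labels beforehand.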

Examples

# use training partition to create folds for CV
# features and labels for each clinical visit; the code below expects this to load an object named 'feats'
data("features_ratechange_sup0.4g60l2z2")
t <- 'rate'
maxgap <- 60
maxlen <- 2

# format: add an 'id' column from the row names, keeping the original column names
names <- colnames(feats)
feats <- data.frame(id = row.names(feats), feats)
colnames(feats) <- c('id', names)
feats <- prepLaterality(feats)
feats <- prepLocation(feats)
feats <- removeVisits(feats,
                      maxgap = maxgap,
                      maxlength = maxlen,
                      tType = t,
                      save = FALSE,
                      outDir = NA)
labels <- getClassLabels()
needToRemove <- c('id', 'iois', 'eventID', # remove ids
                  labels,                  # remove labels
                  'IDH1')                  # not interested

# data partitions
train.ids <- sample(feats$id, size = floor(0.80 * nrow(feats)), replace = FALSE) # random 80% split
feats <- feats[feats$id %in% train.ids, ] # training data
ind <- getTrainingFolds(trainEvents = feats,
                        folds = 3,
                        seed = 1,
                        verbose = TRUE)
feats <- prepLogitData(data = feats,
                       formula = 'survivalIn60 ~ .',
                       labelName = 'survivalIn60',
                       needToRemove = needToRemove)

novasmedley/gbmSpm documentation built on May 17, 2019, 10:39 a.m.