setDataCols: Remove unwanted columns from, or keep columns of interest in, logit data


Description

Remove unwanted columns from, or keep columns of interest in, data prepared for logistic regression (logit) modeling.

Usage

setDataCols(data, labelName, predictors = NULL, needToRemove = NULL)

Arguments

data

data frame; rows are samples and columns are features, plus some metadata columns that are not meant for modeling and will be removed

labelName

character, column name of the binary label

predictors

character vector, names of columns in data that should be kept in the logit fit data

needToRemove

character vector, names of columns in data that should be removed from the logit fit data

Value

data frame with the unwanted columns removed, ready for logit modeling
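To make the two modes concrete, here is a minimal sketch of the column selection in base R. It is a hypothetical re-implementation (`setDataColsSketch`), not the gbmSpm source, and it rests on two assumptions: when `predictors` is given, only the label plus those predictors are kept; otherwise the `needToRemove` columns are dropped, but the label column itself is never dropped (the package example passes the class labels in `needToRemove` as well). The toy data frame uses made-up values.

```r
# Hypothetical sketch for illustration only; NOT the gbmSpm implementation.
# Assumption: `predictors` selects a "keep" mode, otherwise `needToRemove`
# selects a "remove" mode, and the label column is always retained.
setDataColsSketch <- function(data, labelName, predictors = NULL, needToRemove = NULL) {
  stopifnot(is.data.frame(data), labelName %in% colnames(data))
  if (!is.null(predictors)) {
    # keep mode: the label column followed by the requested predictors
    keep <- unique(c(labelName, predictors))
    absent <- setdiff(keep, colnames(data))
    if (length(absent) > 0) stop("unknown columns: ", paste(absent, collapse = ", "))
  } else {
    # remove mode: drop the listed columns, except the label itself
    keep <- setdiff(colnames(data), setdiff(needToRemove, labelName))
  }
  data[, keep, drop = FALSE]  # drop = FALSE keeps a data frame even for one column
}

# Made-up toy data: ids/metadata, two features, and a binary label
toy <- data.frame(id = 1:4, eventID = c(10, 11, 12, 13),
                  age = c(50, 61, 45, 70), volume = c(1.2, 3.4, 2.2, 0.8),
                  survivalIn60 = c(1, 0, 1, 0))

colnames(setDataColsSketch(toy, "survivalIn60",
                           needToRemove = c("id", "eventID", "survivalIn60")))
# c("age", "volume", "survivalIn60")
colnames(setDataColsSketch(toy, "survivalIn60", predictors = "age"))
# c("survivalIn60", "age")
```

Protecting the label in remove mode is a design choice of this sketch: it lets a caller pass a broad list of metadata and label columns without accidentally removing the response the logit model needs.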

Examples

library(gbmSpm)
data("features_ratechange_sup0.4g60l2z2") # features and labels for each clinical visit
tType <- 'rate'
maxgap <- 60
maxlen <- 2

# format: add the row names as an explicit 'id' column
featNames <- colnames(feats)
feats <- data.frame(id=row.names(feats), feats)
# restore the original column names, which data.frame() may alter via check.names
colnames(feats) <- c('id', featNames)
feats <- prepLaterality(feats)
feats <- prepLocation(feats)
feats <- removeVisits(feats,
                     maxgap=maxgap,
                     maxlength=maxlen,
                     tType=tType,
                     save=FALSE,
                     outDir=NA)
labels <- getClassLabels()
needToRemove <- c('id','iois','eventID', # remove ids
                  labels, # remove labels
                  'IDH1') # not of interest

# data partitions
train.ids <- sample(feats$id, size=floor(0.80*nrow(feats)), replace=FALSE) # random 80% of visits
feats <- feats[feats$id %in% train.ids,] # keep training data only
ind <- getTrainingFolds(trainEvents=feats,
                        folds=3,
                        seed=1,
                        verbose=TRUE)
feats <- setDataCols(data=feats, 
                    labelName='survivalIn60', 
                    needToRemove=needToRemove)
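The example keeps only the randomly sampled 80% training rows and does not fix the random seed before `sample()`, so the partition changes between runs. The following is a minimal sketch, on a made-up toy data frame (`toyFeats`, a hypothetical stand-in for `feats`), of how the same id-based split could be made reproducible and how the held-out 20% could be recovered. It assumes ids are unique per row, which holds in the example because they are taken from the row names.

```r
# Toy stand-in for `feats`: one row per clinical visit, identified by `id`.
# The ids and values are made up for illustration.
toyFeats <- data.frame(id = paste0("visit", 1:10), x = seq(0.1, 1, by = 0.1))

set.seed(1)  # fix the RNG so the partition is the same on every run
trainIds <- sample(toyFeats$id, size = floor(0.80 * nrow(toyFeats)), replace = FALSE)
trainSet <- toyFeats[toyFeats$id %in% trainIds, ]     # 80% of visits for training
heldOut  <- toyFeats[!(toyFeats$id %in% trainIds), ]  # remaining 20% held out
```

Selecting rows by id membership rather than by sampled row positions keeps the split valid even if the rows are later reordered.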
                    

novasmedley/gbmSpm documentation built on May 17, 2019, 10:39 a.m.