getTrainTestPartition: Partition patients into testing and training datasets based...

Description

Partition patients into testing and training datasets based on mean survival and gender.

Usage

getTrainTestPartition(data, database = NULL, personTable = NULL,
  verbose = TRUE, seed, survData = NULL)

Arguments

data

a dataframe holding the event data ready for logit modeling, where each row is an event/clinical visit and the columns contain features of the event and the labels. It must at least contain a sample id column in the form "patientID.eventID".

database

a MySQL database, see getData

personTable

a patient-based table in database, see getData

verbose

logical; TRUE to print progress messages, FALSE to run silently

seed

integer; random seed for the split, so that the partition is reproducible

survData

the survival data to use when database is not queried; see getSurvData
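The data argument expects a sample id column of the form "patientID.eventID". As a minimal sketch of that format (the id values below are made up, and the sketch assumes patient IDs contain no period), the two parts can be separated and rebuilt in base R:

```r
# Hypothetical sample ids in the "patientID.eventID" form
ids <- c("101.1", "101.2", "205.7")

# Split on the literal period (fixed = TRUE avoids regex interpretation)
parts <- strsplit(ids, ".", fixed = TRUE)
patientID <- vapply(parts, `[`, character(1), 1)
eventID   <- vapply(parts, `[`, character(1), 2)

# Rebuild the ids the same way the Examples section does, with paste0
rebuilt <- paste0(patientID, ".", eventID)
```

Keeping the patient part recoverable matters because the partition is made per patient, while the rows of data are per event.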

Details

Patients are separated into two groups: those with overall survival below the mean, and those with overall survival greater than or equal to the mean. Each group is partitioned into 75% training and 25% testing, stratified by gender, so that the training and testing partitions have similar gender distributions within each group.

The training partition from the low-survival group is combined with the training partition from the high-survival group. The testing partitions are combined in the same way. No patient in training appears in testing.
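The procedure described above can be sketched in base R. This is an illustration only, not the package's implementation: the patients table, its column names, and the rounding of the 75% share are all assumptions made for the example.

```r
# Hypothetical patient-level table; column names are assumptions
set.seed(1)
patients <- data.frame(
  patientID = 1:40,
  gender    = rep(c("F", "M"), times = 20),
  survival  = round(rexp(40, rate = 1 / 400))  # overall survival in days
)

# Two groups: below mean survival vs. at or above the mean
patients$group <- ifelse(patients$survival < mean(patients$survival),
                         "low", "high")

# One stratum per (survival group, gender) combination
strata <- split(patients$patientID,
                list(patients$group, patients$gender), drop = TRUE)

# Draw about 75% of each stratum for training
train_patients <- unlist(lapply(strata, function(ids) {
  n_train <- round(0.75 * length(ids))
  ids[sample.int(length(ids), n_train)]
}), use.names = FALSE)

# Everyone else goes to testing, so the two sets cannot overlap
test_patients <- setdiff(patients$patientID, train_patients)
```

Sampling within each stratum, rather than from the whole cohort, is what keeps the gender mix similar between training and testing inside each survival group.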

Note: this function calls getSurvData to obtain overall survival days since the parameter data only contains labels and features for classification.

Ratios between folds work better when the number of folds is higher (e.g., 10 versus 2).

Value

The eventIDs of the samples to place in the training partition.
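A minimal sketch of how the returned ids might be used to split an event table. The events data frame and the train.ids vector are made up for illustration, and the sketch assumes the returned values match the id column of data; check the actual return value before relying on this.

```r
# Hypothetical event table with ids in the "patientID.eventID" form
events <- data.frame(
  id    = c("1.10", "1.11", "2.20", "3.30", "3.31"),
  label = c(0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Stand-in for the value returned by getTrainTestPartition
# (assumed here to match the id column)
train.ids <- c("1.10", "1.11", "3.30", "3.31")

# Rows whose id was returned go to training; the rest go to testing
train <- events[events$id %in% train.ids, ]
test  <- events[!events$id %in% train.ids, ]
```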

Examples

library(gbmSpm) # package that provides fake_data and the helper functions used below

data("fake_data")
seed <- 1
fake_demo <- fake_data$demo
survData <- fake_data$person # keep the survival data before fake_data is overwritten
fake_data$events <- cleanData(fake_data$events, tType = 'c') # pool of visits considered

fake_data <- merge(fake_data$events, fake_data$person, by = 'iois', all.x = TRUE)
fake_data <- prepDemographics(fake_data, fake_demo) # need gender info
fake_data <- prepSurvivalLabels(fake_data) # get survival labels, these also change
fake_data$id <- paste0(fake_data$iois, '.', fake_data$eventID) # clinical ids

train.ids <- getTrainTestPartition(data = fake_data,
                                   database = NULL,
                                   personTable = NULL,
                                   survData = survData,
                                   seed = seed,
                                   verbose = TRUE)

novasmedley/gbmSpm documentation built on May 17, 2019, 10:39 a.m.