View source: R/DataSplitting.R
splitData | R Documentation
Split the plpData into test/train sets using split settings of class splitSettings.
splitData(
  plpData = plpData,
  population = population,
  splitSettings = createDefaultSplitSetting(splitSeed = 42)
)
plpData | An object of type plpData
population | The population created using createStudyPopulation
splitSettings | An object of type splitSettings
Returns a list containing the training data (Train) and optionally the test data (Test).
Train is an Andromeda object containing:
  covariates: a table (rowId, covariateId, covariateValue) containing the covariates for each data point in the train data
  covariateRef: a table with the covariate information
  labels: a table (rowId, outcomeCount, ...) for each data point in the train data (outcomeCount is the class label)
  folds: a table (rowId, index) specifying which training fold each data point is in
Test is an Andromeda object containing:
  covariates: a table (rowId, covariateId, covariateValue) containing the covariates for each data point in the test data
  covariateRef: a table with the covariate information
  labels: a table (rowId, outcomeCount, ...) for each data point in the test data (outcomeCount is the class label)
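To make the shape of the labels and folds tables concrete, here is a minimal base R sketch of a train/test split with fold assignment. It is not the splitData implementation. The helper assignIndex, the toy labels table, the use of index -1 to mark test rows, and the stratification by outcomeCount are all assumptions made for this illustration, and the covariate tables are left out.

```r
# Hypothetical illustration in base R; not the package's implementation.
set.seed(42)

# Toy labels table: 1000 rows, roughly 10% with the outcome.
labels <- data.frame(
  rowId = 1:1000,
  outcomeCount = rbinom(1000, size = 1, prob = 0.1)
)

testFraction <- 0.25
nfold <- 3

# Shuffle the row ids of one outcome class, send the first testFraction
# of them to the test set (index -1, an assumption of this sketch) and
# deal the rest round-robin into folds 1..nfold.
assignIndex <- function(rowIds, testFraction, nfold) {
  n <- length(rowIds)
  nTest <- round(n * testFraction)
  shuffled <- rowIds[sample.int(n)]
  index <- c(rep(-1L, nTest), rep_len(seq_len(nfold), n - nTest))
  data.frame(rowId = shuffled, index = index)
}

# Apply the assignment separately within each outcome class (stratified).
assignment <- do.call(rbind, lapply(
  split(labels$rowId, labels$outcomeCount),
  assignIndex, testFraction = testFraction, nfold = nfold
))

train <- merge(labels, assignment[assignment$index > 0, ], by = "rowId")
test <- merge(labels, assignment[assignment$index < 0, ], by = "rowId")

# Mirror the Train/Test layout described above (labels and folds only).
result <- list(
  Train = list(
    labels = train[, c("rowId", "outcomeCount")],
    folds = train[, c("rowId", "index")]
  ),
  Test = list(labels = test[, c("rowId", "outcomeCount")])
)

nrow(result$Test$labels)                  # about 250 rows
length(unique(result$Train$folds$index))  # one value per training fold
```

Every rowId lands in exactly one of the two sets, and the folds table only covers rows in the training set, which matches the description of folds above.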
library(PatientLevelPrediction)
data("simulationProfile")
plpData <- simulatePlpData(simulationProfile, n = 1000)
population <- createStudyPopulation(plpData)
splitSettings <- createDefaultSplitSetting(
  testFraction = 0.50,
  trainFraction = 0.50,
  nfold = 5
)
data <- splitData(plpData, population, splitSettings)
# test data should be ~500 rows (the exact number varies with the study population)
nrow(data$Test$labels)
# train data should be ~500 rows
nrow(data$Train$labels)
# there should be five folds in the train data
length(unique(data$Train$folds$index))