splitData: Split the plpData into test/train sets using a splitting...

View source: R/DataSplitting.R

splitData    R Documentation

Split the plpData into test/train sets using splitting settings of class splitSettings

Description

Split the plpData into test/train sets using splitting settings of class splitSettings.


Usage

splitData(
  plpData = plpData,
  population = population,
  splitSettings = createDefaultSplitSetting(splitSeed = 42)
)



Arguments

plpData
An object of type plpData: the patient-level prediction data extracted from the CDM.

population
The population created using createStudyPopulation, which defines who will be used to develop the model.

splitSettings
An object of type splitSettings specifying the split. The default can be created using createDefaultSplitSetting.


Value

Returns a list containing the training data (Train) and optionally the test data (Test). Train is an Andromeda object containing

  • covariates: a table (rowId, covariateId, covariateValue) containing the covariates for each data point in the train data

  • covariateRef: a table with the covariate information

  • labels: a table (rowId, outcomeCount, ...) for each data point in the train data (outcomeCount is the class label)

  • folds: a table (rowId, index) specifying which training fold each data point is in.

Test is an Andromeda object containing

  • covariates: a table (rowId, covariateId, covariateValue) containing the covariates for each data point in the test data

  • covariateRef: a table with the covariate information

  • labels: a table (rowId, outcomeCount, ...) for each data point in the test data (outcomeCount is the class label)
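The shape of the returned list can be illustrated with a small base R sketch. This is not the package's implementation: it uses plain data frames instead of Andromeda objects, omits the covariate tables, and the toy population, the `assignIndex` helper, and the convention of marking test rows with index -1 are all assumptions made for illustration only.

```r
# Hypothetical toy population: 1000 rows, roughly 10% with the outcome.
set.seed(42)
population <- data.frame(
  rowId = 1:1000,
  outcomeCount = rbinom(1000, size = 1, prob = 0.1)
)

testFraction <- 0.25  # assumed fraction of rows held out for testing
nfold <- 3            # assumed number of training folds

# Hypothetical helper: for n rows, mark a test share with -1 and spread
# the remaining rows evenly over folds 1..nfold, in random order.
assignIndex <- function(n) {
  nTest <- round(n * testFraction)
  index <- c(rep(-1L, nTest), rep_len(seq_len(nfold), n - nTest))
  index[sample.int(length(index))]
}

# Apply the helper within each outcome class so that both Train and Test
# keep a similar outcome rate (a stratified split).
population$index <- ave(
  population$outcomeCount,
  population$outcomeCount,
  FUN = function(x) assignIndex(length(x))
)

train <- population[population$index > 0, ]
test <- population[population$index < 0, ]

# Mirror the documented structure: labels and folds for Train, labels for Test.
splitResult <- list(
  Train = list(
    labels = train[, c("rowId", "outcomeCount")],
    folds = train[, c("rowId", "index")]
  ),
  Test = list(
    labels = test[, c("rowId", "outcomeCount")]
  )
)

# Rows per training fold, and the outcome rate in each part.
table(splitResult$Train$folds$index)
mean(splitResult$Train$labels$outcomeCount)
mean(splitResult$Test$labels$outcomeCount)
```

Every rowId appears in exactly one of Train or Test, and only the training rows carry a fold index, which matches the description of the folds table above.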


Examples

library(PatientLevelPrediction)
# load the simulation profile shipped with the package
data("simulationProfile")
plpData <- simulatePlpData(simulationProfile, n = 1000)
population <- createStudyPopulation(plpData)
splitSettings <- createDefaultSplitSetting(testFraction = 0.50,
                                           trainFraction = 0.50, nfold = 5)
data <- splitData(plpData, population, splitSettings)
# test data should be ~500 rows (varies because of the study population)
# train data should be ~500 rows
# the train data should be divided into five folds

OHDSI/PatientLevelPrediction documentation built on Feb. 14, 2025, 9:44 a.m.