runPlp | R Documentation |
This provides a general framework for training patient level prediction models. The user can select from various default feature selection methods or incorporate their own. The user can also select from a range of default classifiers or incorporate their own. There are three types of evaluation for the model: patient (randomly splits people into train/validation sets), year (randomly splits data into train/validation sets based on index year, with older data in training and newer data in validation), or both (same as year splitting, but checks that no patients overlap between the training set and the validation set; any overlaps are removed from the validation set).
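The three evaluation types described above can be illustrated on a toy cohort. This is a minimal base R sketch, not the package's implementation: the data frame, the column names `subjectId` and `indexYear`, the 25% validation fraction, and the cutoff year are all assumptions made for illustration.

```r
# Toy cohort: one row per index event; a patient may appear in several years.
set.seed(123)
cohort <- data.frame(
  rowId     = 1:12,
  subjectId = c(1, 1, 2, 3, 4, 4, 5, 6, 7, 8, 8, 9),
  indexYear = c(2015, 2019, 2015, 2016, 2016, 2019,
                2017, 2017, 2018, 2018, 2019, 2019)
)

# "patient": randomly assign whole patients to train or validation.
ids <- unique(cohort$subjectId)
validationIds <- sample(ids, size = round(0.25 * length(ids)))
patientSplit <- ifelse(cohort$subjectId %in% validationIds, "validation", "train")

# "year": older index years go to train, newer ones to validation.
cutoffYear <- 2018  # hypothetical cutoff
yearSplit <- ifelse(cohort$indexYear >= cutoffYear, "validation", "train")

# "both": as "year", then drop validation rows whose patient is also in train.
trainSubjects <- unique(cohort$subjectId[yearSplit == "train"])
overlap <- yearSplit == "validation" & cohort$subjectId %in% trainSubjects
bothSplit <- yearSplit
bothSplit[overlap] <- NA  # NA marks rows removed from the validation set

table(patientSplit)
table(yearSplit)
table(bothSplit, useNA = "ifany")
```

In the "both" split, the removed rows belong to patients who also have an earlier index event in the training years, so no patient contributes to both sets.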
runPlp(
plpData,
outcomeId = plpData$metaData$call$outcomeIds[1],
analysisId = paste(Sys.Date(), plpData$metaData$call$outcomeIds[1], sep = "-"),
analysisName = "Study details",
populationSettings = createStudyPopulationSettings(),
splitSettings = createDefaultSplitSetting(type = "stratified", testFraction = 0.25,
trainFraction = 0.75, splitSeed = 123, nfold = 3),
sampleSettings = createSampleSettings(type = "none"),
featureEngineeringSettings = createFeatureEngineeringSettings(type = "none"),
preprocessSettings = createPreprocessSettings(minFraction = 0.001, normalize = T),
modelSettings = setLassoLogisticRegression(),
logSettings = createLogSettings(verbosity = "DEBUG", timeStamp = T, logName =
"runPlp Log"),
executeSettings = createDefaultExecuteSettings(),
saveDirectory = getwd()
)
plpData |
An object of type |
outcomeId |
(integer) The ID of the outcome. |
analysisId |
(integer) Identifier for the analysis. It is used to create, e.g., the result folder. Default is the current date joined with the first outcome ID in plpData. |
analysisName |
(character) Name for the analysis |
populationSettings |
An object of type |
splitSettings |
An object of type |
sampleSettings |
An object of type |
featureEngineeringSettings |
An object of |
preprocessSettings |
An object of |
modelSettings |
An object of class
|
logSettings |
An object of |
executeSettings |
An object of |
saveDirectory |
The path to the directory where the results will be saved (if NULL uses working directory) |
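As a rough illustration of how the default analysisId from the Usage section evaluates, the base R sketch below builds the identifier by hand. The outcome ID 2042 is borrowed from the Examples, and the assumption that results land in a subfolder of saveDirectory named after analysisId is ours, not something this page states.

```r
# Outcome ID borrowed from the Examples section, for illustration only.
outcomeId <- 2042

# Same expression as the Usage default, with the outcome ID filled in.
analysisId <- paste(Sys.Date(), outcomeId, sep = "-")

# Usage default for saveDirectory.
saveDirectory <- getwd()

# Assumption: the result folder is a subfolder named after analysisId.
resultFolder <- file.path(saveDirectory, analysisId)

print(analysisId)
print(resultFolder)
```

Because the default depends on the current date, two runs for the same outcome on the same day would share an identifier; passing an explicit analysisId, as the Examples do, avoids that.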
This function takes as input the plpData extracted from an OMOP CDM database and follows the specified settings to develop and internally validate a model for the specified outcomeId.
An object containing the following:
model The developed model of class plpModel
executionSummary A list containing the hardware details, R package details and execution time
performanceEvaluation Various internal performance metrics in sparse format
prediction The plpData cohort table with the predicted risks added as a column (named value)
covariateSummary A characterization of the features for patients with and without the outcome during the time at risk
analysisRef A list with details about the analysis
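The sketch below shows one way the returned elements might be inspected. It uses a hand-built stand-in list rather than a real runPlp result, so it runs without the package: only the element names and the `value` column come from the list above, while `rowId`, `outcomeCount`, and all numbers are hypothetical.

```r
# Hypothetical stand-in for a runPlp() result; only the element names and the
# "value" column are taken from the documentation above.
result <- list(
  model = structure(list(), class = "plpModel"),
  executionSummary = list(),
  performanceEvaluation = list(),
  prediction = data.frame(
    rowId = 1:5,
    outcomeCount = c(0, 0, 1, 0, 1),  # assumed label column
    value = c(0.05, 0.10, 0.70, 0.20, 0.55)
  ),
  covariateSummary = data.frame(),
  analysisRef = list()
)

names(result)
class(result$model)

# Mean predicted risk among patients without (0) and with (1) the outcome.
meanRisk <- tapply(result$prediction$value, result$prediction$outcomeCount, mean)
meanRisk
```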
## Not run:
#******** EXAMPLE 1 *********
#load plpData:
plpData <- loadPlpData(file.path('C:','User','home','data'))
# specify the outcome to predict (the plpData can have multiple outcomes)
outcomeId <- 2042
# specify a unique identifier for the analysis
analysisId <- 'lrModel'
# create population settings (this defines the labels in the data)
#create study population to develop model on
#require minimum of 365 days observation prior to at risk start
#no prior outcome and person must be observed for 365 after index (minTimeAtRisk)
#with risk window from 0 to 365 days after index
populationSettings <- createStudyPopulationSettings(plpData,
firstExposureOnly = FALSE,
washoutPeriod = 365,
removeSubjectsWithPriorOutcome = TRUE,
priorOutcomeLookback = 99999,
requireTimeAtRisk = TRUE,
minTimeAtRisk=365,
riskWindowStart = 0,
addExposureDaysToStart = FALSE,
riskWindowEnd = 365,
addExposureDaysToEnd = FALSE)
# create the split setting by specifying how you want to
# partition the data into development (train/validation) and evaluation (test or CV)
splitSettings <- createDefaultSplitSetting(testFraction = 0.25,
trainFraction = 0.75,
splitSeed = sample(100000,1),
nfold=3,
type = 'stratified')
# create the settings specifying any under/over sampling
# in this example we do not do any
sampleSettings <- createSampleSettings(type = 'none')
# specify any feature engineering that will be applied to the train data
# in this example we do not do any
featureEngineeringSettings <- createFeatureEngineeringSettings(type = 'none')
# specify whether to use normalization and removal of rare features
# preprocessSettings <- ...
# lasso logistic regression predicting outcome 2042
# using no feature selection with a stratified split: 25% in the test set and
# 75% in the train set, where the model hyper-parameters are selected using
# 3-fold cross validation
modelSettingsLR <- setLassoLogisticRegression()
# results are saved to file.path('C:','User','home')
saveDirectory <- file.path('C:','User','home')
# specify how you want the logging for the analysis
# generally this is saved in a file with the results
# but you can define the level of logging
logSettings <- createLogSettings(verbosity = 'DEBUG',
timeStamp = T,
logName = 'runPlp LR Log')
# specify what parts of the analysis to run:
# in this example we run everything
executeSettings <- createExecuteSettings(runSplitData = T,
runSampleData = T,
runfeatureEngineering = T,
runProcessData = T,
runModelDevelopment = T,
runCovariateSummary = T)
lrModel <- runPlp(plpData = plpData,
outcomeId = outcomeId,
analysisId = analysisId,
populationSettings = populationSettings,
splitSettings = splitSettings,
sampleSettings = sampleSettings,
featureEngineeringSettings = featureEngineeringSettings,
preprocessSettings = preprocessSettings,
modelSettings = modelSettingsLR,
logSettings = logSettings,
executeSettings = executeSettings,
saveDirectory = saveDirectory
)
#******** EXAMPLE 2 *********
# Gradient boosting machine with a grid search to select hyper parameters
# using the test/train/folds created for the lasso logistic regression above
modelSettingsGBM <- gradientBoostingMachine.set(rsampRate=c(0.5,0.9,1),csampRate=1,
ntrees=c(10,100), bal=c(F,T),
max_depth=c(4,5), learn_rate=c(0.1,0.01))
analysisId <- 'gbmModel'
gbmModel <- runPlp(plpData = plpData,
outcomeId = outcomeId,
analysisId = analysisId,
populationSettings = populationSettings,
splitSettings = splitSettings,
sampleSettings = sampleSettings,
featureEngineeringSettings = featureEngineeringSettings,
preprocessSettings = preprocessSettings,
modelSettings = modelSettingsGBM,
logSettings = logSettings,
executeSettings = executeSettings,
saveDirectory = saveDirectory
)
## End(Not run)
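The examples pass preprocessSettings to runPlp without defining it. One option, shown here as a sketch, is to reuse the default from the Usage section; the values below are copied from that default and are not a recommendation for any particular study. The block assumes the PatientLevelPrediction package is installed.

```r
library(PatientLevelPrediction)

# Same call as the Usage default: drop features present in fewer than 0.1%
# of patients and normalize the remaining features.
preprocessSettings <- createPreprocessSettings(minFraction = 0.001,
                                               normalize = TRUE)
```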