runPlp | R Documentation |
This provides a general framework for training patient-level prediction models. The user can select from various default feature selection methods or incorporate their own. The user can also select from a range of default classifiers or incorporate their own. There are three types of evaluation for the model: patient (randomly splits people into train/validation sets), year (randomly splits data into train/validation sets based on index year, with older data in training and newer data in validation), or both (same as year splitting, but checks that no patients overlap between the training set and the validation set; any overlaps are removed from the validation set).
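The three evaluation types described above can be illustrated with a minimal base R sketch. It does not use the package's own split functions; the toy `cohort` data frame, its column names and the `cutoffYear` value are hypothetical and chosen only for illustration.

```r
# Toy cohort: hypothetical columns, not the real plpData structure
set.seed(123)
cohort <- data.frame(
  rowId = 1:12,
  subjectId = c(1, 2, 3, 4, 5, 6, 1, 2, 7, 8, 9, 10),
  indexYear = c(rep(2015, 6), rep(2018, 6))
)

# "patient": randomly assign people (not rows) to train or validation
subjects <- unique(cohort$subjectId)
testSubjects <- sample(subjects, size = round(0.25 * length(subjects)))
patientSplit <- ifelse(cohort$subjectId %in% testSubjects, "validation", "train")

# "year": older index years go to training, newer ones to validation
cutoffYear <- 2017  # hypothetical cutoff
yearSplit <- ifelse(cohort$indexYear < cutoffYear, "train", "validation")

# "both": as "year", then drop validation rows whose subject is also in training
trainSubjects <- unique(cohort$subjectId[yearSplit == "train"])
overlap <- yearSplit == "validation" & cohort$subjectId %in% trainSubjects
bothSplit <- yearSplit
bothSplit[overlap] <- NA  # NA marks rows removed from the validation set
```

In this toy data, subjects 1 and 2 appear in both the older and the newer period, so their newer rows are the ones dropped under the "both" scheme.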
runPlp( plpData, outcomeId = plpData$metaData$call$outcomeIds[1], analysisId = paste(Sys.Date(), plpData$metaData$call$outcomeIds[1], sep = "-"), analysisName = "Study details", populationSettings = createStudyPopulationSettings(), splitSettings = createDefaultSplitSetting(type = "stratified", testFraction = 0.25, trainFraction = 0.75, splitSeed = 123, nfold = 3), sampleSettings = createSampleSettings(type = "none"), featureEngineeringSettings = createFeatureEngineeringSettings(type = "none"), preprocessSettings = createPreprocessSettings(minFraction = 0.001, normalize = T), modelSettings = setLassoLogisticRegression(), logSettings = createLogSettings(verbosity = "DEBUG", timeStamp = T, logName = "runPlp Log"), executeSettings = createDefaultExecuteSettings(), saveDirectory = getwd() )
plpData |
An object of type |
outcomeId |
(integer) The ID of the outcome. |
analysisId |
(character) Identifier for the analysis. It is used to create, e.g., the result folder. Default is the current date and the first outcome ID, joined by a hyphen. |
analysisName |
(character) Name for the analysis |
populationSettings |
An object of type |
splitSettings |
An object of type |
sampleSettings |
An object of type |
featureEngineeringSettings |
An object of |
preprocessSettings |
An object of |
modelSettings |
An object of class
|
logSettings |
An object of |
executeSettings |
An object of |
saveDirectory |
The path to the directory where the results will be saved (if NULL uses working directory) |
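As a small illustration of the default `analysisId` expression shown in the usage, the sketch below evaluates it for an example outcome ID. The `resultFolder` line is an assumption about how the identifier and `saveDirectory` might combine into a result folder; it is not confirmed by this page.

```r
# Example outcome ID (taken from the examples on this page)
outcomeId <- 2042

# Default analysisId expression from the usage: current date and outcome ID
analysisId <- paste(Sys.Date(), outcomeId, sep = "-")

# Assumed layout: one folder per analysis inside saveDirectory
saveDirectory <- tempdir()
resultFolder <- file.path(saveDirectory, analysisId)
```

Passing an explicit `analysisId` such as `'lrModel'` avoids two runs on the same day for the same outcome sharing an identifier.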
This function takes as input the plpData extracted from an OMOP CDM database and follows the specified settings to develop and internally validate a model for the specified outcomeId.
An object containing the following:
inputSettings A list containing all the settings used to develop the model
model The developed model of class plpModel
executionSummary A list containing the hardware details, R package details and execution time
performanceEvaluation Various internal performance metrics in sparse format
prediction The plpData cohort table with the predicted risks added as a column (named value)
covariateSummary A characterization of the features for patients with and without the outcome during the time at risk
analysisRef A list with details about the analysis
## Not run:
#******** EXAMPLE 1 *********
# load plpData:
plpData <- loadPlpData(file.path('C:','User','home','data'))

# specify the outcome to predict (the plpData can have multiple outcomes)
outcomeId <- 2042

# specify a unique identifier for the analysis
analysisId <- 'lrModel'

# create population settings (this defines the labels in the data)
# create study population to develop model on
# require minimum of 365 days observation prior to at risk start
# no prior outcome and person must be observed for 365 after index (minTimeAtRisk)
# with risk window from 0 to 365 days after index
populationSettings <- createStudyPopulationSettings(plpData,
                                                    firstExposureOnly = FALSE,
                                                    washoutPeriod = 365,
                                                    removeSubjectsWithPriorOutcome = TRUE,
                                                    priorOutcomeLookback = 99999,
                                                    requireTimeAtRisk = TRUE,
                                                    minTimeAtRisk = 365,
                                                    riskWindowStart = 0,
                                                    addExposureDaysToStart = FALSE,
                                                    riskWindowEnd = 365,
                                                    addExposureDaysToEnd = FALSE)

# create the split setting by specifying how you want to
# partition the data into development (train/validation) and evaluation (test or CV)
splitSettings <- createDefaultSplitSetting(testFraction = 0.25,
                                           trainFraction = 0.75,
                                           splitSeed = sample(100000, 1),
                                           nfold = 3,
                                           type = 'stratified')

# create the settings specifying any under/over sampling
# in this example we do not do any
sampleSettings <- createSampleSettings(type = 'none')

# specify any feature engineering that will be applied to the train data
# in this example we do not do any
featureEngineeringSettings <- createFeatureEngineeringSettings(type = 'none')

# specify whether to use normalization and removal of rare features
# preprocessSettings <- ...

# lasso logistic regression predicting outcome 2042
# using no feature engineering with a stratified split evaluation with 25% in test set
# 75% in train set where the model hyper-parameters are selected using 3-fold cross validation
# and results are saved to file.path('C:','User','home')
modelSettingsLR <- setLassoLogisticRegression()
saveDirectory <- file.path('C:','User','home')

# specify how you want the logging for the analysis
# generally this is saved in a file with the results
# but you can define the level of logging
logSettings <- createLogSettings(verbosity = 'DEBUG',
                                 timeStamp = TRUE,
                                 logName = 'runPlp LR Log')

# specify what parts of the analysis to run:
# in this example we run everything
executeSettings <- createExecuteSettings(runSplitData = TRUE,
                                         runSampleData = TRUE,
                                         runfeatureEngineering = TRUE,
                                         runProcessData = TRUE,
                                         runModelDevelopment = TRUE,
                                         runCovariateSummary = TRUE)

lrModel <- runPlp(plpData = plpData,
                  outcomeId = outcomeId,
                  analysisId = analysisId,
                  populationSettings = populationSettings,
                  splitSettings = splitSettings,
                  sampleSettings = sampleSettings,
                  featureEngineeringSettings = featureEngineeringSettings,
                  preprocessSettings = preprocessSettings,
                  modelSettings = modelSettingsLR,
                  logSettings = logSettings,
                  executeSettings = executeSettings,
                  saveDirectory = saveDirectory)

#******** EXAMPLE 2 *********
# Gradient boosting machine with a grid search to select hyper parameters
# using the test/train/folds created for the lasso logistic regression above
modelSettingsGBM <- gradientBoostingMachine.set(rsampRate = c(0.5, 0.9, 1),
                                                csampRate = 1,
                                                ntrees = c(10, 100),
                                                bal = c(FALSE, TRUE),
                                                max_depth = c(4, 5),
                                                learn_rate = c(0.1, 0.01))
analysisId <- 'gbmModel'

gbmModel <- runPlp(plpData = plpData,
                   outcomeId = outcomeId,
                   analysisId = analysisId,
                   populationSettings = populationSettings,
                   splitSettings = splitSettings,
                   sampleSettings = sampleSettings,
                   featureEngineeringSettings = featureEngineeringSettings,
                   preprocessSettings = preprocessSettings,
                   modelSettings = modelSettingsGBM,
                   logSettings = logSettings,
                   executeSettings = executeSettings,
                   saveDirectory = saveDirectory)

## End(Not run)