runPlp: runPlp - Develop and internally evaluate a model using...
In quinterpriest/PatientLevelPrediction: Developing patient level prediction using data in the OMOP Common Data Model

runPlp

R Documentation

runPlp - Develop and internally evaluate a model using specified settings

Description

This provides a general framework for training patient level prediction models. The user can select from various default feature selection methods or incorporate their own. The user can also select from a range of default classifiers or incorporate their own. There are three types of evaluation for the model: patient (randomly splits people into train/validation sets), year (splits data into train/validation sets based on index year, with older data in training and newer data in validation), or both (same as the year split, but checks that no patients overlap between the training set and the validation set; any overlaps are removed from the validation set).
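The default in Usage passes type = "stratified" with testFraction = 0.25 and nfold = 3. As a rough illustration of what such a split can look like, here is a minimal base R sketch, assuming "stratified" means stratified by outcome label. The function stratifiedSplit and the toy labels are hypothetical and are not the package's implementation.

```r
# Illustrative only: a stratified train/test split with CV folds in base R.
# Not the package's implementation; all names here are hypothetical.
stratifiedSplit <- function(outcome, testFraction = 0.25, nfold = 3, seed = 123) {
  set.seed(seed)
  index <- integer(length(outcome))  # -1 = test set, 1..nfold = training fold
  for (label in unique(outcome)) {
    rows <- which(outcome == label)
    rows <- rows[sample.int(length(rows))]  # shuffle within the class
    nTest <- round(length(rows) * testFraction)
    testRows <- rows[seq_len(nTest)]
    trainRows <- rows[seq_len(length(rows) - nTest) + nTest]
    index[testRows] <- -1L
    # deal the remaining rows round-robin into the folds
    index[trainRows] <- rep_len(seq_len(nfold), length(trainRows))
  }
  index
}

outcome <- c(rep(1, 40), rep(0, 360))  # toy labels with a 10% outcome rate
index <- stratifiedSplit(outcome)
table(index, outcome)                  # outcome rate is the same in every partition
```

Splitting within each label keeps rare outcomes represented in both the test set and every training fold, which a plain random split does not guarantee.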

Usage

runPlp(
  plpData,
  outcomeId = plpData$metaData$call$outcomeIds[1],
  analysisId = paste(Sys.Date(), plpData$metaData$call$outcomeIds[1], sep = "-"),
  analysisName = "Study details",
  populationSettings = createStudyPopulationSettings(),
  splitSettings = createDefaultSplitSetting(type = "stratified", testFraction = 0.25,
    trainFraction = 0.75, splitSeed = 123, nfold = 3),
  sampleSettings = createSampleSettings(type = "none"),
  featureEngineeringSettings = createFeatureEngineeringSettings(type = "none"),
  preprocessSettings = createPreprocessSettings(minFraction = 0.001, normalize = T),
  modelSettings = setLassoLogisticRegression(),
  logSettings = createLogSettings(verbosity = "DEBUG", timeStamp = T, logName =
    "runPlp Log"),
  executeSettings = createDefaultExecuteSettings(),
  saveDirectory = getwd()
)

Arguments

`plpData`	An object of type `plpData` - the patient level prediction data extracted from the CDM.
`outcomeId`	(integer) The ID of the outcome.
`analysisId`	(character or integer) Identifier for the analysis. It is used to create, e.g., the result folder. Default is the current date followed by the first outcome ID.
`analysisName`	(character) Name for the analysis
`populationSettings`	An object of type `populationSettings` created using `createStudyPopulationSettings` that specifies how the data class labels are defined and, additionally, any exclusions to apply to the plpData cohort
`splitSettings`	An object of type `splitSettings` that specifies how to split the data into train/validation/test. The default settings can be created using `createDefaultSplitSetting`.
`sampleSettings`	An object of type `sampleSettings` that specifies any under/over sampling to be done. The default is none.
`featureEngineeringSettings`	An object of type `featureEngineeringSettings` specifying any feature engineering to be learned (using the train data)
`preprocessSettings`	An object of type `preprocessSettings`. This setting specifies the minimum fraction of the target population who must have a covariate for it to be included in the model training, and whether to normalise the covariates before training
`modelSettings`	An object of class `modelSettings` created using one of the functions: `setLassoLogisticRegression()` (a lasso logistic regression model); `setGradientBoostingMachine()` (a gradient boosting machine); `setAdaBoost()` (an ada boost model); `setRandomForest()` (a random forest model); `setDecisionTree()` (a decision tree model); `setCovNN()` (a convolutional neural network model); `setCIReNN()` (a recurrent neural network model); `setMLP()` (a neural network model); `setDeepNN()` (a deep neural network model); `setKNN()` (a KNN model)
`logSettings`	An object of type `logSettings` created using `createLogSettings` specifying how the logging is done
`executeSettings`	An object of type `executeSettings`, created using `createExecuteSettings` or `createDefaultExecuteSettings`, specifying which parts of the analysis to run
`saveDirectory`	The path to the directory where the results will be saved (if NULL uses working directory)
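To make the two `preprocessSettings` options concrete, the following base R sketch drops covariates present in fewer than minFraction of the rows and then rescales the rest. It is a sketch under stated assumptions: preprocessSketch and the toy matrix are hypothetical, and dividing by the column maximum is only one possible normalisation, which may differ from what the package does.

```r
# Illustrative only: rare-covariate removal followed by scaling, on a toy matrix.
# Assumes max-scaling as the normalisation; the package may do this differently.
preprocessSketch <- function(x, minFraction = 0.001, normalize = TRUE) {
  # fraction of rows (patients) with a non-zero value for each covariate
  fraction <- colMeans(x != 0)
  x <- x[, fraction >= minFraction, drop = FALSE]
  if (normalize && ncol(x) > 0) {
    colMax <- apply(abs(x), 2, max)
    colMax[colMax == 0] <- 1  # avoid dividing an all-zero column by zero
    x <- sweep(x, 2, colMax, "/")
  }
  x
}

x <- cbind(age        = c(40, 60, 80, 20),
           rareDrug   = c(0, 0, 0, 0),
           commonDrug = c(1, 0, 1, 1))
res <- preprocessSketch(x, minFraction = 0.25)
res  # rareDrug is dropped; the remaining columns are scaled to at most 1
```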

Details

This function takes as input the plpData extracted from an OMOP CDM database and follows the specified settings to develop and internally validate a model for the specified outcomeId.

Value

An object containing the following:

inputSettings A list containing all the settings used to develop the model
model The developed model of class plpModel
executionSummary A list containing the hardware details, R package details and execution time
performanceEvaluation Various internal performance metrics in sparse format
prediction The plpData cohort table with the predicted risks added as a column (named value)
covariateSummary A characterization of the features for patients with and without the outcome during the time at risk
analysisRef A list with details about the analysis
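As an illustration of how the `prediction` table relates to a discrimination metric, here is a base R sketch that computes a rank-based AUC from a prediction-like data frame. The column `value` is the risk column named above; the label column `outcomeCount`, the function aucSketch and the toy data are assumptions made for this example and are not taken from the package.

```r
# Illustrative only: a rank-based AUC computed from a prediction-like table.
# 'value' is the risk column named in the text; 'outcomeCount' is an assumed label column.
aucSketch <- function(prediction) {
  pos <- prediction$outcomeCount > 0
  r <- rank(prediction$value)  # average ranks handle tied risks
  nPos <- sum(pos)
  nNeg <- sum(!pos)
  # Mann-Whitney U statistic divided by the number of positive/negative pairs
  (sum(r[pos]) - nPos * (nPos + 1) / 2) / (nPos * nNeg)
}

# toy data, not real results
toyPrediction <- data.frame(
  value        = c(0.9, 0.8, 0.3, 0.2, 0.1),
  outcomeCount = c(1, 0, 1, 0, 0)
)
auc <- aucSketch(toyPrediction)
auc
```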

Examples

## Not run: 
#******** EXAMPLE 1 ********* 
#load plpData:
plpData <- loadPlpData(file.path('C:','User','home','data'))

# specify the outcome to predict (the plpData can have multiple outcomes)
outcomeId <- 2042

# specify a unique identifier for the analysis
analysisId <- 'lrModel'

# create population settings (this defines the labels in the data)
#create study population to develop model on
#require minimum of 365 days observation prior to at risk start
#no prior outcome and person must be observed for 365 after index (minTimeAtRisk)
#with risk window from 0 to 365 days after index
populationSettings <- createStudyPopulationSettings(plpData,
                                    firstExposureOnly = FALSE,
                                    washoutPeriod = 365,
                                    removeSubjectsWithPriorOutcome = TRUE,
                                    priorOutcomeLookback = 99999,
                                    requireTimeAtRisk = TRUE,
                                    minTimeAtRisk=365,
                                    riskWindowStart = 0,
                                    addExposureDaysToStart = FALSE,
                                    riskWindowEnd = 365,
                                    addExposureDaysToEnd = FALSE)
                                    
# create the split setting by specifying how you want to
# partition the data into development (train/validation) and evaluation (test or CV)
splitSettings <- createDefaultSplitSetting(testFraction = 0.25, 
                                           trainFraction = 0.75, 
                                           splitSeed = sample(100000,1), 
                                           nfold=3,
                                           type = 'stratified')                                   
                                    
                                    
# create the settings specifying any under/over sampling 
# in this example we do not do any
sampleSettings <- createSampleSettings(type = 'none')  

# specify any feature engineering that will be applied to the train data
# in this example we do not do any
featureEngineeringSettings <- createFeatureEngineeringSettings(type = 'none')   

# specify whether to use normalization and removal of rare features
# (these values match the defaults shown in Usage)
preprocessSettings <- createPreprocessSettings(minFraction = 0.001,
                                               normalize = T)

# specify where the results will be saved
saveDirectory <- file.path('C:','User','home')

# lasso logistic regression predicting outcome 2042
# using no feature selection with a stratified split: 25% in the test set and
# 75% in the train set, where the model hyper-parameters are selected using
# 3-fold cross validation, and results are saved to saveDirectory
modelSettingsLR <- setLassoLogisticRegression()

# specify how you want the logging for the analysis
# generally this is saved in a file with the results 
# but you can define the level of logging 
logSettings <- createLogSettings(verbosity = 'DEBUG',
                                 timeStamp = T,
                                 logName = 'runPlp LR Log')
                                 
# specify what parts of the analysis to run:
# in this example we run everything
executeSettings <- createExecuteSettings(runSplitData = T,
                                         runSampleData = T,
                                         runfeatureEngineering = T,
                                         runProcessData = T,
                                         runModelDevelopment = T,
                                         runCovariateSummary = T)                                        

lrModel <- runPlp(plpData = plpData,
                  outcomeId = outcomeId, 
                  analysisId = analysisId,
                  populationSettings = populationSettings,
                  splitSettings = splitSettings,
                  sampleSettings = sampleSettings,
                  featureEngineeringSettings = featureEngineeringSettings,
                  preprocessSettings = preprocessSettings,
                  modelSettings = modelSettingsLR,
                  logSettings = logSettings,
                  executeSettings = executeSettings,
                  saveDirectory = saveDirectory
                  )
 
#******** EXAMPLE 2 *********                                               
# Gradient boosting machine with a grid search to select hyper parameters  
# using the test/train/folds created for the lasso logistic regression above                       
modelSettingsGBM <- gradientBoostingMachine.set(rsampRate=c(0.5,0.9,1),csampRate=1, 
                           ntrees=c(10,100), bal=c(F,T),
                           max_depth=c(4,5), learn_rate=c(0.1,0.01))
                           
analysisId <- 'gbmModel'

gbmModel <- runPlp(plpData = plpData,
                  outcomeId = outcomeId, 
                  analysisId = analysisId,
                  populationSettings = populationSettings,
                  splitSettings = splitSettings,
                  sampleSettings = sampleSettings,
                  featureEngineeringSettings = featureEngineeringSettings,
                  preprocessSettings = preprocessSettings,
                  modelSettings = modelSettingsGBM,
                  logSettings = logSettings,
                  executeSettings = executeSettings,
                  saveDirectory = saveDirectory
                  )

## End(Not run)

quinterpriest/PatientLevelPrediction documentation built on April 20, 2022, 12:50 a.m.
