This provides a general framework for training patient-level prediction models. The user can select from various default feature selection methods or incorporate their own. The user can also select from a range of default classifiers or incorporate their own. There are three types of evaluation for the model: patient (randomly splits people into train/validation sets), year (splits data into train/validation sets based on index year, with older data in training and newer data in validation), or both (the same as year splitting, but checks that no patients overlap between the training set and the validation set; any overlaps are removed from the validation set).
runPlp(
population,
plpData,
minCovariateFraction = 0.001,
normalizeData = T,
modelSettings,
testSplit = "stratified",
testFraction = 0.25,
trainFraction = NULL,
splitSeed = NULL,
nfold = 3,
indexes = NULL,
saveDirectory = NULL,
savePlpData = T,
savePlpResult = T,
savePlpPlots = T,
saveEvaluation = T,
verbosity = "INFO",
timeStamp = FALSE,
analysisId = NULL,
runCovariateSummary = T,
save = NULL
)
|
population |
The population, created using createStudyPopulation(), that will be used to develop the model |
plpData |
An object of type |
minCovariateFraction |
The minimum fraction of target population who must have a covariate for it to be included in the model training |
normalizeData |
Whether to normalise the covariates before training (Default: TRUE) |
modelSettings |
An object of class
|
testSplit |
Either 'stratified', 'subject' or 'time', specifying the type of evaluation used. 'time' finds the date such that testFraction of patients have an index after that date, then assigns patients with an index before the date to the training set and patients with an index after the date to the test set. 'stratified' splits the data into test (testFraction of the data) and train (trainFraction of the data) sets; the split is stratified by the class label. 'subject' is useful when a subject is in the data multiple times and you want all rows for the same subject in either the test or the train set but not in both. |
testFraction |
The fraction of the data to be used as the test set in the patient split evaluation. |
trainFraction |
A real number between 0 and 1 indicating the train set fraction of the data. If not set, trainFraction is equal to 1 - testFraction |
splitSeed |
The seed used to split the test/train set when using a person type testSplit |
nfold |
The number of folds used in the cross validation (default 3) |
indexes |
A dataframe containing a rowId and index column where the index value of -1 means in the test set, and positive integer represents the cross validation fold (default is NULL) |
saveDirectory |
The path to the directory where the results will be saved (if NULL uses working directory) |
savePlpData |
Binary indicating whether to save the plpData object (default is T) |
savePlpResult |
Binary indicating whether to save the object returned by runPlp (default is T) |
savePlpPlots |
Binary indicating whether to save the performance plots as pdf files (default is T) |
saveEvaluation |
Binary indicating whether to save the performance as csv files (default is T) |
verbosity |
Sets the level of the verbosity. If the log level is at or higher in priority than the logger threshold, a message will print. The levels are:
|
timeStamp |
If TRUE a timestamp will be added to each logging statement. Automatically switched on for TRACE level. |
analysisId |
Identifier for the analysis. It is used to create, e.g., the result folder. Default is a timestamp. |
runCovariateSummary |
Whether to calculate the mean and sd for each covariate |
save |
Old input - please now use saveDirectory |
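The testSplit, testFraction, nfold, splitSeed and indexes arguments interact, so a small base-R sketch may help. It builds a data frame in the layout described for indexes (-1 for the test set, a positive integer for the cross-validation fold) using a split stratified by class label. The helper name makeStratifiedIndexes, the toy population and the assumption that the class label is held in an outcomeCount column are illustrative only; this is not the package's own implementation.

```r
# Illustrative sketch only: a stratified test/train split with CV folds,
# encoded in the layout described for the `indexes` argument.
# `makeStratifiedIndexes` is a hypothetical helper, not a package function.
makeStratifiedIndexes <- function(population, testFraction = 0.25,
                                  nfold = 3, splitSeed = NULL) {
  if (!is.null(splitSeed)) set.seed(splitSeed)
  index <- integer(nrow(population))
  # Split each class separately so test and train keep a similar outcome rate
  for (label in unique(population$outcomeCount)) {
    rows <- which(population$outcomeCount == label)
    rows <- rows[sample.int(length(rows))]          # shuffle within the class
    nTest <- round(length(rows) * testFraction)
    testRows <- rows[seq_len(nTest)]
    trainRows <- setdiff(rows, testRows)
    index[testRows] <- -1L                          # -1 marks the test set
    index[trainRows] <- rep_len(seq_len(nfold), length(trainRows))  # fold ids
  }
  data.frame(rowId = population$rowId, index = index)
}

# Toy population: 100 rows, 20 of which have the outcome (assumed column names)
population <- data.frame(rowId = 1:100,
                         outcomeCount = rep(c(0, 1), times = c(80, 20)))
indexes <- makeStratifiedIndexes(population, testFraction = 0.25,
                                 nfold = 3, splitSeed = 42)
table(indexes$index)
```

A data frame of this shape is what the indexes argument is described as accepting, which allows the same test/train/fold assignment to be reused across several models.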
Users can define a risk period of interest for the prediction of the outcome relative to index or use the cohort dates. The user can then specify whether they wish to exclude patients who are not observed during the whole risk period or cohort period, or who experienced the outcome prior to the risk period.
An object containing the model or the location where the model is saved, the data selection settings, the preprocessing and training settings, as well as various performance measures obtained by the model.
predict |
A function that can be applied to new data to apply the trained model and make predictions |
model |
A list of class |
prediction |
A dataframe containing the prediction for each person in the test set |
evalType |
The type of evaluation that was performed ('person' or 'time') |
performanceTest |
A list detailing the size of the test sets |
performanceTrain |
A list detailing the size of the train sets |
time |
The total time taken to run the model framework |
## Not run:
#******** EXAMPLE 1 *********
#load plpData:
plpData <- loadPlpData(file.path('C:','User','home','data'))
#create study population to develop model on
#require minimum of 365 days observation prior to at risk start
#no prior outcome and person must be observed for 365 after index (minTimeAtRisk)
#with risk window from 0 to 365 days after index
population <- createStudyPopulation(plpData, outcomeId = 2042,
                                    firstExposureOnly = FALSE,
                                    washoutPeriod = 365,
                                    removeSubjectsWithPriorOutcome = TRUE,
                                    priorOutcomeLookback = 99999,
                                    requireTimeAtRisk = TRUE,
                                    minTimeAtRisk = 365,
                                    riskWindowStart = 0,
                                    addExposureDaysToStart = FALSE,
                                    riskWindowEnd = 365,
                                    addExposureDaysToEnd = FALSE)
#lasso logistic regression predicting outcome 2042 in the study population,
#using no feature selection with a time split evaluation with 30% in test set
#and 70% in train set, where the model hyper-parameters are selected using
#3-fold cross validation, and results are saved to
#file.path('C:','User','myPredictionName')
model.lr <- lassoLogisticRegression.set()
mod.lr <- runPlp(population = population,
                 plpData = plpData, minCovariateFraction = 0.001,
                 modelSettings = model.lr,
                 testSplit = 'time', testFraction = 0.3,
                 nfold = 3, indexes = NULL,
                 saveDirectory = file.path('C:','User','myPredictionName'),
                 verbosity = 'INFO')
#******** EXAMPLE 2 *********
# Gradient boosting machine with a grid search to select hyper-parameters
# using the test/train/folds created for the lasso logistic regression above
model.gbm <- gradientBoostingMachine.set(rsampRate = c(0.5, 0.9, 1), csampRate = 1,
                                         ntrees = c(10, 100), bal = c(FALSE, TRUE),
                                         max_depth = c(4, 5), learn_rate = c(0.1, 0.01))
mod.gbm <- runPlp(population = population,
                  plpData = plpData,
                  modelSettings = model.gbm,
                  testSplit = 'time', testFraction = 0.3,
                  nfold = 3, indexes = mod.lr$indexes,
                  saveDirectory = file.path('C:','User','myPredictionName2'))
## End(Not run)
|
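Both examples above use testSplit = 'time'. As a rough base-R sketch of the idea described for that option (find a cutoff date such that about testFraction of index dates fall after it, then train on earlier rows and test on later ones), the following may be useful. The helper name makeTimeSplit, the toy data and the assumption that index dates live in a cohortStartDate column are illustrative; the package's actual splitting code may differ, for example in how tied dates are handled.

```r
# Illustrative sketch only: a date-based test/train split.
# `makeTimeSplit` is a hypothetical helper, not a package function.
makeTimeSplit <- function(population, testFraction = 0.3) {
  dates <- as.Date(population$cohortStartDate)
  n <- length(dates)
  nTest <- round(n * testFraction)
  stopifnot(nTest > 0, nTest < n)
  # Latest index date that still belongs to the training set
  cutoff <- sort(dates)[n - nTest]
  # Rows indexed after the cutoff go to the test set, the rest to train
  ifelse(dates > cutoff, "test", "train")
}

# Toy population: 10 rows with distinct index dates in shuffled order
set.seed(1)
population <- data.frame(
  rowId = 1:10,
  cohortStartDate = sample(as.Date("2010-01-01") + (0:9) * 365)
)
split <- makeTimeSplit(population, testFraction = 0.3)
table(split)
```

With tied index dates the test set can end up smaller than testFraction, because every row sharing the cutoff date stays in the training set in this sketch.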