createLearningCurve: createLearningCurve

View source: R/LearningCurve.R

Description

Creates a learning curve object, which can be plotted using the plotLearningCurve() function.

Usage

createLearningCurve(
  population,
  plpData,
  modelSettings,
  testSplit = "person",
  testFraction = 0.25,
  trainFractions = c(0.25, 0.5, 0.75),
  trainEvents = NULL,
  splitSeed = NULL,
  nfold = 3,
  indexes = NULL,
  verbosity = "TRACE",
  clearffTemp = FALSE,
  minCovariateFraction = 0.001,
  normalizeData = T,
  saveDirectory = getwd(),
  savePlpData = F,
  savePlpResult = F,
  savePlpPlots = F,
  saveEvaluation = F,
  timeStamp = FALSE,
  analysisId = NULL
)

Arguments

population

The population created using createStudyPopulation() that will be used to develop the model.

plpData

An object of type plpData - the patient level prediction data extracted from the CDM.

modelSettings

An object of class modelSettings created using one of the following functions:

  • setLassoLogisticRegression - a lasso logistic regression model

  • setGradientBoostingMachine - a gradient boosting machine

  • setRandomForest - a random forest model

  • setKNN - a k-nearest neighbour model

testSplit

Specifies the type of evaluation used. Can be either 'person' or 'time'. The value 'time' finds the date that splits the population into the testing and training fractions provided: patients with an index date after this date are assigned to the test set, and patients with an index date before it are assigned to the training set. The value 'person' splits the data randomly into testing and training sets according to the fractions provided; this split is stratified by the class label.
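
The two split types can be illustrated with a minimal base R sketch. This is not the package's internal implementation; the data frame pop and its columns rowId, outcomeCount, and indexDate are hypothetical stand-ins for a study population.

```r
# Hypothetical toy population; the column names are assumptions for illustration.
set.seed(42)
n <- 200
pop <- data.frame(
  rowId = seq_len(n),
  outcomeCount = rbinom(n, size = 1, prob = 0.2),
  indexDate = as.Date("2015-01-01") + sample(0:999, n, replace = TRUE)
)
testFraction <- 0.25

# 'person' split: sample the test rows at random within each outcome class,
# so the test and training sets keep roughly the same outcome rate.
personTest <- rep(FALSE, n)
for (label in unique(pop$outcomeCount)) {
  rows <- which(pop$outcomeCount == label)
  nTest <- round(length(rows) * testFraction)
  personTest[rows[sample.int(length(rows), nTest)]] <- TRUE
}

# 'time' split: find the date that leaves about testFraction of the
# population after it; rows with a later index date form the test set.
splitDate <- sort(pop$indexDate)[floor(n * (1 - testFraction))]
timeTest <- pop$indexDate > splitDate
```

In the 'time' case, tied index dates can make the test set slightly smaller than the requested fraction.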

testFraction

The fraction of the data to be used as the testing set in the patient split evaluation.

trainFractions

A list of training set fractions for which models are created. Note that providing trainEvents overrides trainFractions.

trainEvents

The number of events has been shown to be a determinant of model performance. It is therefore recommended to provide trainEvents rather than trainFractions. Note that providing trainEvents overrides trainFractions. The format should be as follows:

  • c(500, 1000, 1500) - a list of training events
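
As a rough illustration of how event counts relate to training fractions, the sketch below converts requested event counts into the fraction of the population expected to contain them. The numbers are hypothetical, and the conversion assumes events are spread evenly across the data; it is not taken from the package source.

```r
# Hypothetical numbers, chosen only for illustration.
totalEvents <- 4000      # outcomes in the whole population
testFraction <- 0.25
trainEvents <- c(500, 1000, 1500, 3500)

# Fraction of the population expected to contain each number of events,
# assuming the outcome rate is the same in every subset.
impliedFractions <- trainEvents / totalEvents

# A request that needs more data than the non-test part holds cannot be met.
feasible <- impliedFractions <= 1 - testFraction
impliedFractions[feasible]
```

Here the request for 3500 events is dropped, because it would need more than the 75% of the data left for training.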

splitSeed

The seed used to split the data into testing and training sets when using a 'person' type split.

nfold

The number of folds used in the cross validation (default = 3).

indexes

A data frame containing a rowId column and an index column, where an index value of -1 places the row in the test set and a positive integer gives the cross-validation fold (default is NULL).
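
A data frame of this shape could be built by hand as in the following sketch. The row identifiers are hypothetical, and the fold assignment shown here is only one simple way to fill the index column.

```r
# Hypothetical row identifiers; in practice these come from the population.
set.seed(1)
rowIds <- 1:20
nfold <- 3
testFraction <- 0.25

# Pick the test rows at random and mark them with -1.
testRows <- sample(rowIds, size = round(length(rowIds) * testFraction))
index <- rep(NA_integer_, length(rowIds))
index[rowIds %in% testRows] <- -1L

# Deal the remaining rows into folds 1..nfold in turn.
trainPos <- which(is.na(index))
index[trainPos] <- rep_len(seq_len(nfold), length(trainPos))

indexes <- data.frame(rowId = rowIds, index = index)
table(indexes$index)
```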

verbosity

Sets the level of the verbosity. If the log level is at or higher in priority than the logger threshold, a message will print. The levels are:

  • DEBUG - highest verbosity showing all debug statements

  • TRACE - showing information about start and end of steps (default)

  • INFO - show informative messages

  • WARN - show warning messages

  • ERROR - show error messages

  • FATAL - be silent except for fatal errors

clearffTemp

Clears the temporary ff-directory after each iteration. This can be useful if the fitted models are large.

minCovariateFraction

Minimum covariate prevalence in the population to avoid removal during preprocessing.

normalizeData

Whether to normalise the data

saveDirectory

Location to save log and results

savePlpData

Whether to save the plpData

savePlpResult

Whether to save the plpResult

savePlpPlots

Whether to save the plp plots

saveEvaluation

Whether to save the plp performance csv files

timeStamp

Include a timestamp in the log

analysisId

The analysis unique identifier

Value

A learning curve object containing the various performance measures obtained by the model for each training set fraction. It can be plotted using plotLearningCurve.

Examples

## Not run: 
# define model
modelSettings <- PatientLevelPrediction::setLassoLogisticRegression()

# create learning curve
learningCurve <- PatientLevelPrediction::createLearningCurve(population,
                                                             plpData,
                                                             modelSettings)
# plot learning curve
PatientLevelPrediction::plotLearningCurve(learningCurve)

## End(Not run)

hxia/plp-git-demo documentation built on March 19, 2021, 1:54 a.m.