library(PatientLevelPrediction)
This vignette describes how you can add your own custom function for feature engineering in the Observational Health Data Sciences and Informatics (OHDSI) PatientLevelPrediction package. It assumes you have read and are comfortable with building single patient-level prediction models as described in the BuildingPredictiveModels vignette.
We invite you to share your new feature engineering functions with the OHDSI community through our GitHub repository.
To make a custom feature engineering function that can be used within PatientLevelPrediction you need to write two functions: the 'create' function and the 'implement' function.
The 'create' function, e.g., create\<FeatureEngineeringFunctionName>, takes the parameters of the feature engineering 'implement' function as input, checks these are valid and outputs these as a list of class 'featureEngineeringSettings' with the 'fun' attribute specifying the 'implement' function to call.
The 'implement' function, e.g., implement\<FeatureEngineeringFunctionName>, must take as input:
* trainData - a list containing:
  - covariateData: the plpData$covariateData restricted to the training patients
  - labels: a data frame that contains rowId (patient identifier) and outcomeCount (the class labels)
  - folds: a data frame that contains rowId (patient identifier) and index (the cross validation fold)
* featureEngineeringSettings - the output of your create\<FeatureEngineeringFunctionName>
The 'implement' function can then do any manipulation of the trainData (adding new features or removing features) but must output a trainData object containing the new covariateData, labels and folds for the training data patients.
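The contract between the two functions can be sketched with a toy example. Everything below is illustrative: createDropCovariate and implementDropCovariate are hypothetical names, plain data frames stand in for the Andromeda-backed covariateData, and the do.call() line only mimics how a function named in the 'fun' attribute could be looked up and called (the package's actual dispatch code is not shown in this vignette).

```r
# Toy stand-in for trainData. In PatientLevelPrediction the covariateData is an
# Andromeda object; plain data frames are used here purely for illustration.
trainData <- list(
  covariateData = list(
    covariates = data.frame(
      rowId = c(1, 1, 2, 3),
      covariateId = c(1001, 1002, 1001, 1002),
      covariateValue = c(1, 1, 1, 1)
    )
  ),
  labels = data.frame(rowId = 1:3, outcomeCount = c(0, 1, 0)),
  folds = data.frame(rowId = 1:3, index = c(1, 2, 3))
)

# Hypothetical 'create' function: validate the inputs, store them in a list,
# and record the name of the 'implement' function in the 'fun' attribute.
createDropCovariate <- function(covariateId) {
  stopifnot(is.numeric(covariateId), length(covariateId) == 1)
  settings <- list(covariateId = covariateId)
  attr(settings, "fun") <- "implementDropCovariate"
  class(settings) <- "featureEngineeringSettings"
  settings
}

# Hypothetical 'implement' function: takes trainData and the settings, and
# returns trainData with the modified covariateData plus unchanged labels/folds.
implementDropCovariate <- function(trainData, featureEngineeringSettings) {
  covs <- trainData$covariateData$covariates
  keep <- covs$covariateId != featureEngineeringSettings$covariateId
  trainData$covariateData$covariates <- covs[keep, , drop = FALSE]
  trainData
}

settings <- createDropCovariate(covariateId = 1002)

# do.call() accepts a function name given as a character string
result <- do.call(
  attr(settings, "fun"),
  list(trainData = trainData, featureEngineeringSettings = settings)
)
```

The returned object keeps the same three elements (covariateData, labels and folds), which is the property any 'implement' function needs to preserve.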
Let's consider the situation where we wish to create an age spline feature. To make this custom feature engineering function we need to write the 'create' and 'implement' R functions.
Our age spline feature function will create a new feature using the plpData$cohorts ageYear column. We will implement a restricted cubic spline that requires specifying the number of knots. Therefore, the inputs for this are:
* knots - an integer/double specifying the number of knots
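As a rough illustration of how a number of knots could be turned into knot locations, the sketch below places the knots at evenly spaced quantiles of age. The simulated ageYear values and the choice of the 5th to 95th percentile range are assumptions made for this example only; they are not the package's or the rms package's defaults.

```r
# Simulated ages, used only so the example is self-contained
set.seed(42)
ageYear <- sample(18:90, size = 200, replace = TRUE)

# The single input to the feature engineering function
knots <- 5

# Assumed placement rule: evenly spaced quantiles between the 5th and 95th percentile
probs <- seq(0.05, 0.95, length.out = knots)
knotLocations <- stats::quantile(ageYear, probs = probs, names = FALSE)

print(knotLocations)
```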
createAgeSpine <- function(
  knots = 5
  ){

  # add input checks
  checkIsClass(knots, c('numeric','integer'))
  checkHigher(knots,0)

  # create list of inputs to implement function
  featureEngineeringSettings <- list(
    knots = knots
    )

  # specify the function that will implement the feature engineering
  attr(featureEngineeringSettings, "fun") <- "implementAgeSpine"

  # make sure the object returned is of class "featureEngineeringSettings"
  class(featureEngineeringSettings) <- "featureEngineeringSettings"
  return(featureEngineeringSettings)

}
We now need to create the 'implement' function, implementAgeSpine().
All 'implement' functions must take as input the trainData and the featureEngineeringSettings (this is the output of the 'create' function). They must return a trainData object containing the new covariateData, labels and folds.
In our example, createAgeSpine() will return a list with 'knots'. The featureEngineeringSettings therefore contains this.
implementAgeSpine <- function(trainData, featureEngineeringSettings){

  # number of knots (currently not used: the knot quantiles below are fixed)
  knots <- featureEngineeringSettings$knots

  # age is in trainData$labels as ageYear
  ageData <- trainData$labels

  # now implement the code to do your desired feature engineering
  data <- Matrix::sparseMatrix(
    i = seq_along(ageData$rowId),
    j = rep(1, length(ageData$rowId)),
    x = ageData$ageYear,
    dims = c(length(ageData$rowId), 1)
  )

  data <- as.matrix(data)
  x <- data[,1]
  y <- ageData$outcomeCount

  # fit a restricted cubic spline of age against the outcome
  mRCS <- rms::ols(
    y ~ rms::rcs(
      x,
      stats::quantile(
        x,
        c(0, .05, .275, .5, .775, .95, 1),
        include.lowest = TRUE
      )
    )
  )

  # the fitted values become the new covariate
  newData <- data.frame(
    rowId = ageData$rowId,
    covariateId = 2002,
    covariateValue = mRCS$fitted.values
  )

  # add new data
  Andromeda::appendToTable(tbl = trainData$covariateData$covariates,
                           data = newData)

  # return the updated trainData
  return(trainData)
}
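The spline-fitting step can also be sketched without the rms or Andromeda dependencies. The example below swaps in splines::ns (a natural cubic spline, which like a restricted cubic spline is linear beyond the boundary knots) and stats::lm in place of rms::rcs and rms::ols, and it derives the knot locations from the 'knots' setting. The simulated labels table and the quantile-based knot placement are assumptions for illustration, not the vignette's implementation.

```r
# Simulated stand-in for trainData$labels
set.seed(1)
n <- 300
labels <- data.frame(
  rowId = seq_len(n),
  ageYear = sample(18:90, size = n, replace = TRUE)
)
labels$outcomeCount <- rbinom(n, size = 1, prob = plogis(-3 + 0.03 * labels$ageYear))

# Number of knots, as it would arrive in featureEngineeringSettings$knots
knots <- 5

# Assumed knot placement: evenly spaced quantiles of age
knotLocations <- stats::quantile(
  labels$ageYear,
  probs = seq(0.05, 0.95, length.out = knots),
  names = FALSE
)

# The outermost knots act as boundary knots, the rest as interior knots
splineBasis <- splines::ns(
  labels$ageYear,
  knots = knotLocations[2:(knots - 1)],
  Boundary.knots = knotLocations[c(1, knots)]
)

# Regress the outcome on the spline basis, as the vignette does with rms::ols
fit <- stats::lm(labels$outcomeCount ~ splineBasis)

# Rows in the long format used by the covariates table
newData <- data.frame(
  rowId = labels$rowId,
  covariateId = 2002,
  covariateValue = unname(stats::fitted(fit))
)

head(newData)
```

Each patient receives one new row with covariateId 2002, matching the shape of the rows appended to trainData$covariateData$covariates in implementAgeSpine().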
Considerable work has been dedicated to provide the PatientLevelPrediction package.
citation("PatientLevelPrediction")
Please reference this paper if you use the PLP Package in your work:
This work is supported in part through the National Science Foundation grant IIS 1251151.