library(PheValuator)


Introduction

The PheValuator package enables evaluating the performance characteristics of phenotype algorithms (PAs) using data from databases that have been converted to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM).

This vignette describes how to run the PheValuator process from start to end using the PheValuator package.

Overview of Process

There are several steps in performing a PA evaluation:
1. Creating the extremely specific (xSpec), extremely sensitive (xSens), and prevalence cohorts
2. Creating the diagnostic predictive model and the evaluation cohort using the PatientLevelPrediction (PLP) package
3. Evaluating the PAs
4. Examining the results of the evaluation

Each of these steps is described in detail below. For this vignette we will describe the evaluation of PAs for diabetes mellitus (DM).

Creating the Extremely Specific (xSpec), Extremely Sensitive (xSens), and Prevalence Cohorts

The extremely specific (xSpec), extremely sensitive (xSens), and prevalence cohorts are developed using the ATLAS tool. The xSpec cohort contains subjects who have a very high probability of being positive for the health outcome of interest (HOI). One way to achieve this is to require that subjects have multiple condition codes for the HOI in their patient record. An example of this for DM is included in the OHDSI ATLAS repository. In this example, each subject has an initial condition code for DM. The cohort definition further specifies that each subject also has a second code for DM between 1 and 30 days after the initial DM code and 10 additional DM codes in the rest of the patient record. This very specific algorithm ensures that the subjects in this cohort have a very high probability of having DM. This PA also requires that subjects have at least 365 days of observation in their patient record.
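The xSpec logic above is defined and executed in ATLAS, not in R. Purely as an illustration, the rule can be sketched as an R function over one subject's DM code dates and observation period. The function name meetsXSpecRule and its inputs are hypothetical. Counting every code after the confirming second code as "additional" and measuring the full observation period are simplifying assumptions.

```r
# Hypothetical illustration only: the real xSpec cohort is an ATLAS cohort
# definition run as SQL against the CDM, not this R function.
meetsXSpecRule <- function(codeDates, obsStart, obsEnd,
                           minObsDays = 365, extraCodes = 10) {
  codeDates <- sort(as.Date(codeDates))
  if (length(codeDates) == 0) return(FALSE)
  index <- codeDates[1]                       # initial DM code
  gaps <- as.numeric(codeDates[-1] - index)   # days after the initial code
  hasSecond <- any(gaps >= 1 & gaps <= 30)    # second code within 1-30 days
  # assumption: every later code besides the confirming one is "additional"
  nAdditional <- length(gaps) - as.integer(hasSecond)
  # assumption: length of the whole observation period, in days
  obsDays <- as.numeric(as.Date(obsEnd) - as.Date(obsStart))
  hasSecond && nAdditional >= extraCodes && obsDays >= minObsDays
}

# initial code, a second code 14 days later, then 10 monthly codes
dates <- c(as.Date("2015-01-01"), as.Date("2015-01-15"),
           seq(as.Date("2015-03-01"), by = "month", length.out = 10))
meetsXSpecRule(dates, "2014-01-01", "2016-12-31")       # TRUE
meetsXSpecRule(dates[1:5], "2014-01-01", "2016-12-31")  # FALSE: too few codes
```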

An xSens cohort is created by developing a PA that is very sensitive for the HOI. The system uses the xSens cohort to create a set of "noisy" negative subjects: subjects who are not in the xSens cohort and therefore have a high likelihood of not having the HOI. This group of subjects is used in the model building process described below. An example of an xSens cohort for DM is also in the OHDSI ATLAS repository.
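The following is a minimal sketch of how xSpec subjects (likely positives) and noisy negatives (everyone in the population outside the xSens cohort) could be combined into a labeled training set. It uses plain vectors of person IDs in place of the real cohort and CDM tables. buildTrainingLabels is a hypothetical helper, not part of the PheValuator API.

```r
# Hypothetical helper: in PheValuator the IDs come from the cohort table
# and the CDM; here they are plain integer vectors.
buildTrainingLabels <- function(populationIds, xSpecIds, xSensIds) {
  # noisy negatives: subjects NOT in the xSens cohort, so unlikely to have the HOI
  noisyNegatives <- setdiff(populationIds, xSensIds)
  data.frame(
    personId = c(xSpecIds, noisyNegatives),
    outcome  = c(rep(1L, length(xSpecIds)),        # likely positives
                 rep(0L, length(noisyNegatives)))  # noisy negatives
  )
}

labels <- buildTrainingLabels(populationIds = 1:10,
                              xSpecIds = c(1L, 2L),
                              xSensIds = 1:4)
table(labels$outcome)  # 2 positives, 6 noisy negatives
```

Subjects who are in xSens but not in xSpec (persons 3 and 4 here) receive no label. They are neither confidently positive nor confidently negative.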

The system uses the prevalence cohort to provide a reasonable approximation of the prevalence of the HOI in the population, which improves the calibration of the predictive model. If a prevalence cohort is not specified, the system uses the xSens cohort by default. An example of a prevalence cohort for DM is also in the OHDSI ATLAS repository.
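This vignette does not detail exactly how PheValuator uses the prevalence internally. The sketch below is an assumption made for illustration. It shows one plausible mechanism by which a prevalence estimate can shape the class balance of the training data, and that balance in turn affects calibration. Both helper functions are hypothetical.

```r
# Hypothetical helpers, not the PheValuator API.
estimatePrevalence <- function(nPrevalenceCohort, nPopulation) {
  nPrevalenceCohort / nPopulation
}

# Choose how many negatives to pair with the positives so that the
# training data has the target prevalence:
#   nPositives / (nPositives + nNegatives) = prevalence
negativesForPrevalence <- function(nPositives, prevalence) {
  stopifnot(prevalence > 0, prevalence < 1)
  round(nPositives * (1 - prevalence) / prevalence)
}

p <- estimatePrevalence(5000, 100000)  # 0.05
negativesForPrevalence(1000, p)        # 19000
```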

Evaluating phenotype algorithms for health conditions

The function createEvaluationCohort creates a diagnostic predictive model and an evaluation cohort, which together allow the user to determine the performance characteristics of one or more phenotype algorithms (cohort definitions) for a health condition. This function carries out the first two steps in PheValuator, namely:
1) Develop a diagnostic predictive model for the health condition.
2) Select a large, random set of subjects from the dataset and use the model to determine the probability of each of the subjects having the health condition.
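The two steps above can be illustrated with a toy example on synthetic data. PheValuator itself builds the model with the PatientLevelPrediction package on covariates extracted from the CDM. The plain logistic regression (glm) used here is a stand-in chosen only to show the shape of the process. The covariates, labels, and sample sizes are all invented for illustration.

```r
# Toy illustration only: not how PheValuator fits its model.
set.seed(42)

# hypothetical covariates: age and number of diabetes-drug exposures
makeSubjects <- function(n) {
  data.frame(age = runif(n, 18, 90), nDmDrugs = rpois(n, 1))
}

# Step 1: fit a diagnostic model on labeled training subjects
# (synthetic labels stand in for xSpec = 1 and noisy negatives = 0)
train <- makeSubjects(2000)
train$outcome <- rbinom(nrow(train), 1,
                        plogis(-4 + 0.03 * train$age + 1.2 * train$nDmDrugs))
model <- glm(outcome ~ age + nDmDrugs, data = train, family = binomial())

# Step 2: draw a random evaluation sample from the wider population
# and estimate each subject's probability of having the condition
population <- makeSubjects(10000)
evaluation <- population[sample(nrow(population), 500), ]
evaluation$probability <- predict(model, newdata = evaluation, type = "response")
summary(evaluation$probability)
```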

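createEvaluationCohort takes as inputs the database connection details, the IDs of the xSpec, xSens, and prevalence cohorts, the locations of the CDM, cohort, and work schemas, the covariate settings, and the criteria for selecting subjects (sample size, age range, gender, and date range), as shown in the example below.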

The createEvaluationCohort function will produce the following artifacts:
1) A Patient Level Prediction file (in .rds format) containing the information from the model building process
2) A Patient Level Prediction file (in .rds format) containing the information from applying the model to the evaluation cohort

For example:

library(DatabaseConnector) # provides createConnectionDetails
options(fftempdir = "c:/temp/ff") # place to store large temporary files

connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                              server = "localhost/ohdsi",
                                              user = "joe",
                                              password = "supersecret")

phenoTest <- createEvaluationCohort(connectionDetails = connectionDetails,
                                   xSpecCohortId = 1769699,
                                   xSensCohortId = 1770120,
                                   prevalenceCohortId = 1770119,
                                   cdmDatabaseSchema = "my_cdm_data",
                                   cohortDatabaseSchema = "my_results",
                                   cohortTable  = "cohort",
                                   workDatabaseSchema = "scratch.dbo",
                                   covariateSettings = 
                                    createDefaultChronicCovariateSettings(
                                     excludedCovariateConceptIds = c(201826),
                                     addDescendantsToExclude = TRUE),
                                   baseSampleSize = 2000000,
                                   lowerAgeLimit = 18,
                                   upperAgeLimit = 90,
                                   gender = c(8507, 8532),
                                   startDate = "20101010",
                                   endDate = "21000101",
                                   cdmVersion = "5",
                                   outFolder = "c:/phenotyping",
                                   evaluationCohortId = "diabetes",
                                   removeSubjectsWithFutureDates = TRUE,
                                   saveEvaluationCohortPlpData = FALSE,
                                   modelType = "chronic")

In this example, we use the cohorts stored in the "my_results" schema, specifying the location of the cohort table (cohortDatabaseSchema and cohortTable, i.e., "my_results.cohort") and the location of the conditions, drug exposures, and other data that inform the model (cdmDatabaseSchema, i.e., "my_cdm_data"). The subjects included in the model will be those whose first visit in the CDM falls between startDate and endDate (here, October 10, 2010 and January 1, 2100), whose age at the time of first visit is between 18 and 90, and whose gender is male (8507) or female (8532). We also exclude concept ID 201826, "Type 2 diabetes mellitus", which was used to create the xSpec cohort, along with all of its descendants, so that the model does not simply relearn the codes that define the xSpec cohort.
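The subject selection criteria above can be sketched as a simple filter. This minimal sketch assumes a hypothetical data frame of first-visit records, which PheValuator would actually derive from the CDM. It also assumes that the date and age bounds are inclusive. selectEligible and its column names are hypothetical. Note that startDate and endDate are "YYYYMMDD" strings, as in the example call.

```r
# Hypothetical sketch; the real selection happens in SQL against the CDM.
selectEligible <- function(firstVisits, startDate, endDate,
                           lowerAgeLimit, upperAgeLimit, gender) {
  start <- as.Date(startDate, format = "%Y%m%d")
  end   <- as.Date(endDate,   format = "%Y%m%d")
  keep <- firstVisits$firstVisitDate >= start &
          firstVisits$firstVisitDate <= end &
          firstVisits$ageAtFirstVisit >= lowerAgeLimit &
          firstVisits$ageAtFirstVisit <= upperAgeLimit &
          firstVisits$genderConceptId %in% gender  # 8507 male, 8532 female
  firstVisits[keep, ]
}

firstVisits <- data.frame(
  personId = 1:4,
  firstVisitDate = as.Date(c("2009-05-01", "2012-03-15",
                             "2015-07-01", "2016-01-01")),
  ageAtFirstVisit = c(40, 17, 55, 70),
  genderConceptId = c(8507, 8532, 8532, 8507)
)
eligible <- selectEligible(firstVisits, "20101010", "21000101",
                           18, 90, c(8507, 8532))
eligible$personId  # 3 4 (person 1 visits too early, person 2 is under 18)
```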

In this example, the function will create the model file:
"c:/phenotyping/model_diabetes.rds"

and the evaluation cohort file:
"c:/phenotyping/evaluationCohort_diabetes.rds"

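The two output paths follow a naming pattern that can be inferred from the example: outFolder, then "model_" or "evaluationCohort_" followed by evaluationCohortId, then ".rds". Treat this pattern as an assumption. The helper below is hypothetical. It builds both paths and, if the files exist, prints the top level of each one with readRDS, without assuming anything about their internal structure.

```r
# Hypothetical helper; the naming pattern is inferred from the example above.
phenoValuatorOutputFiles <- function(outFolder, evaluationCohortId) {
  c(model = file.path(outFolder,
                      paste0("model_", evaluationCohortId, ".rds")),
    evaluationCohort = file.path(outFolder,
                                 paste0("evaluationCohort_", evaluationCohortId, ".rds")))
}

files <- phenoValuatorOutputFiles("c:/phenotyping", "diabetes")
files[["model"]]  # "c:/phenotyping/model_diabetes.rds"

# inspect whichever outputs are present
for (f in files[file.exists(files)]) str(readRDS(f), max.level = 1)
```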
Evaluating the phenotype algorithms to be used in studies

The function testPhenotypeAlgorithm allows the user to determine the performance characteristics of phenotype algorithms (cohort definitions) to be used in studies. It uses the evaluation cohort developed in the previous step. The same evaluation cohort may be used to test any number of phenotype algorithms for the same health condition.

testPhenotypeAlgorithm takes parameters such as those in the following example:

library(DatabaseConnector) # provides createConnectionDetails
options(fftempdir = "c:/temp/ff") # place to store large temporary files

connectionDetails <- createConnectionDetails(dbms = "postgresql",
                                              server = "localhost/ohdsi",
                                              user = "joe",
                                              password = "supersecret")

phenotypeResults <- testPhenotypeAlgorithm(connectionDetails,
                                   cutPoints = c("EV"),
                                   outFolder = "c:/phenotyping",
                                   evaluationCohortId = "diabetes",
                                   phenotypeCohortId = 7142,
                                   cdmDatabaseSchema = "my_cdm_data",
                                   cohortDatabaseSchema = "my_results",
                                   cohortTable  = "cohort",
                                   washoutPeriod = 365)

In this example, we are using only the expected value ("EV") cut point. Given that parameter setting, the output from this step will provide performance characteristics (e.g., sensitivity and specificity) at each prediction threshold as well as those using the expected value calculations as described in the Step 2 diagram. The evaluation uses the predictions for the evaluation cohort developed in the prior step. The function returns a data frame with the performance characteristics of the tested phenotype algorithm, which the user can write to a CSV file using code such as:

      write.csv(phenotypeResults, "c:/phenotyping/diabetes_results.csv", row.names = FALSE)
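The idea behind the "EV" cut point can be sketched on a toy evaluation cohort. This minimal sketch assumes the usual expected-value approach: each subject counts as a fractional case equal to its predicted probability, rather than being assigned to the true-positive, false-positive, false-negative, or true-negative cell outright. For a numeric cut point, subjects at or above the threshold are treated as cases. phenotypePerformance is a hypothetical function, not PheValuator's implementation, and the probabilities are made up.

```r
# Hypothetical sketch. 'probability' is the model-predicted probability of the
# HOI; 'inPhenotype' marks subjects included by the phenotype algorithm.
phenotypePerformance <- function(probability, inPhenotype, cutPoint = "EV") {
  if (identical(cutPoint, "EV")) {
    truth <- probability                            # fractional case counts
  } else {
    truth <- as.numeric(probability >= cutPoint)    # hard threshold
  }
  tp <- sum(truth[inPhenotype]);  fp <- sum(1 - truth[inPhenotype])
  fn <- sum(truth[!inPhenotype]); tn <- sum(1 - truth[!inPhenotype])
  data.frame(cutPoint    = as.character(cutPoint),
             sensitivity = tp / (tp + fn),
             specificity = tn / (tn + fp),
             ppv         = tp / (tp + fp),
             npv         = tn / (tn + fn))
}

probability <- c(0.9, 0.8, 0.3, 0.6, 0.1, 0.05)
inPhenotype <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
rbind(phenotypePerformance(probability, inPhenotype, "EV"),
      phenotypePerformance(probability, inPhenotype, 0.5))
```

With "EV", the three subjects in the phenotype contribute 2.0 expected true positives and 1.0 expected false positive. The three subjects outside it contribute 0.75 expected false negatives and 2.25 expected true negatives.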


OHDSI/PheValuator documentation built on Jan. 28, 2024, 4:05 a.m.