knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(dplyr) library(ggplot2) set.seed(1234)
This document accompanies the "A method to estimate probability of disease and vaccine efficacy from clinical trial immunogenicity data." publication. It describes the application of PoDBAY package on the PoDBAY efficacy estimation examples using data from clinical trial(s).
The goal of PoDBAY efficacy estimation analysis is to:
We describe two scenarios of application in PoDBAY efficacy estimation:
PoDBAY efficacy is estimated in two subsequent steps as described in the publication, section Methods.
Notes:
PoD curve is estimated (point estimate together with confidence intervals) in three steps - further details can be found in the publication, section Methods.
Titers of all diseased and all non-diseased subjects are used for estimation of PoD curve parameters. Parameter estimates $p_{max}^$, $et_{50}^
$ and $\gamma^`$ are obtained.
Titers of all diseased and all non-diseased subjects are put together and bootstrapped. For each individual titer a probability of disease is calculated using the PoD curve with parameter values $p_{max}^$, $et_{50}^
$ and $\gamma^`$. New disease status is assigned to each titer based on the probability of disease.
Titers of all new diseased and all new non-diseased subjects are used for re-estimation of PoD curve parameters. Parameter estimates $p_{max}^{}$, $et_{50}^{
}$ and $\gamma^{``}$ are obtained.
Diseased and non-diseased subject level data are required. We'll use PoDBAY::diseased
and PoDBAY::nondiseased
mock-up data. Both datasets contain population summary statistics (N, mean, sd) and individual subject level data (log2 titers, disease status (DS))
Only the individual subject level data (log2 titers, DS) are used for the PoD curve estimation as described above.
library(PoDBAY) data(diseased) data(nondiseased) str(diseased) str(nondiseased)
Note: To convert your data in to the population
class object use generatePopulation()
function from PoDBAY package. See vignette vignette("population", package = "PoDBAY")
for further details.
Once we have our data prepared function PoDParamEstimation
is used to estimate PoD curve parameters in three steps as described above. For more details about the usage of the function see examples in ?PoDParamEstimation()
.
estimatedParameters <- PoDParamEstimation(diseasedTiters = diseased$titers, nondiseasedTiters = nondiseased$titers, nondiseasedGenerationCount = nondiseased$N, repeatCount = 50)
Step 1: $p_{max}^$, $et_{50}^
$ and $\gamma^`$
Results corresponding to the first step of estimation of PoD-titer relationship can be obtained via estimatedParameters$resultsPriorReset
.
as_tibble(estimatedParameters$resultsPriorReset)
Note that parameter estimates are the same for every repeatCount
iteration. This is according to our expectations as the same diseased and non-diseased cases are used in every iteration in step 1
of this example.
Step 2: Bootstrap and re-assignment of disease status
Titers of all diseased and all non-diseased subjects are put together and bootstrapped. For each individual titer a probability of disease is calculated using the PoD curve with parameter values $p_{max}^$, $et_{50}^
$ and $\gamma^`$. New disease status is assigned to each titer based on the probability of disease.
Step 3: $p^{}_{max}$, $et^{
}_{50}$ and $\gamma^{``}$
Results corresponding to the third step of Estimation of PoD-titer relationship can be obtained via estimatedParameters$results
.
as_tibble(estimatedParameters$results)
Non-parametric bootstrap described in step 2
is applied inside the function. Therefore, the estimated PoD curve parameters differ in this case.
Parameters of PoD curve point estimate representing the PoD-titer relationship are estimated using results from 'step 1' - estimatedParameters$resultsPriorReset
.
PoDParamsPointEst <- PoDParamPointEstimation(estimatedParameters$resultsPriorReset) PoDParamsPointEst
Confidence intervals (95% level of significance) of PoD curve parameters are calculated using results from 'step 3' - estimatedParameters$results
.
PoDParametersCI <- PoDParamsCI(estimatedParameters$results, ci = 0.95) unlist(PoDParametersCI)
Using PoD curve point estimate and results from step 3
- estimatedParameters
PoD curve can be plotted.
titers <- seq(from = 0, to = 15, by = 0.01) PoDCurve <- PoDCurvePlot(titers, estimatedParameters, ci = 0.95) PoDCurve
PoDBAY Efficacy (point estimate together with confidence intervals) is estimated - further details can be found in the publication, section Methods.
PoDParamsPointEst
from step 1 PoD-titer relationship estimation - Trial A
}$, $et_{50}^{
}$ and $\gamma^{`}$ from
estimatedParameters$resultsfrom step 1
PoD-titer relationship estimation - Trial A`step 2
) and standard deviations`}$ from
step 3` Vaccinated and control population summary statistics (N, mean, sd) are required. We'll use PoDBAY::vaccinated
and PoDBAY::control
mock-up data. Both datasets contain population summary statistics (N, mean, sd) and individual subject level log2 titers.
Only the population summary statistics (N, mean, sd) data are used for the PoDBAY efficacy estimation as described above.
data(vaccinated) data(control) str(vaccinated) str(control)
Note: To convert your data in to the population
class object use generatePopulation()
function from PoDBAY package. See vignette vignette("population", package = "PoDBAY")
for further details.
Once we have our data prepared function efficacyComputation
is used to estimate Efficacy point estimate as described above in step 1
.
means <- list("vaccinated" = vaccinated$mean, "control" = control$mean) standardDeviations <- list("vaccinated" = vaccinated$stdDev, "control" = control$stdDev) EfficacyPointEst <- efficacyComputation(PoDParamsPointEst, means, standardDeviations) EfficacyPointEst
Jittering of population mean from step 2
by drawing from sampling distribution is done inside of PoDBAYEfficacy
function. Efficacy set is estimated as described above in step 3
efficacySet <- PoDBAYEfficacy(estimatedParameters$results, vaccinated, control)
Confidence intervals (95% level of significance) of PoDBAY efficacy are obtained as described above in step 4
.
CI <- EfficacyCI(efficacySet, ci = 0.95) unlist(CI)
Analysis provides following results:
result <- list( EfficacyPointEst = EfficacyPointEst, efficacyCI = unlist(CI), PoDParamsPointEst = PoDParamsPointEst, PoDParametersCI = unlist(PoDParametersCI), PoDCurve = PoDCurve ) result
In a frequent case when serum samples at baseline and after vaccination are collected and assayed only in a subset of subjects (“immunogenicity sample/ subset”) and the assay value of titer is obtained also for all disease cases at the same time points, the general method for PoD curve estimation described above can be extended. Further details can be found in the publication Appendix A.
PoD curve is estimated (point estimate together with confidence intervals) in three steps.
Titers of all non-diseased subjects are generated by random sampling with replacement from immunogenicity subset.
Titers of all diseased and all generated non-diseased subjects (generated in step 1
) are used for estimation of PoD curve parameters. Parameter estimates $p_{max}^$, $et_{50}^
$ and $\gamma^`$ are obtained.
Titers of all diseased and all non-diseased subjects are put together and bootstrapped. For each individual titer a probability of disease is calculated using the PoD curve with parameter values $p_{max}^$, $et_{50}^
$ and $\gamma^`$. New disease status is assigned to each titer based on the probability of disease.
New immunogenicity subset is selected from all new non-diseased, such that the ratio of all diseased versus non-diseased in immunogenicity subset in new data match the ratio in original data.
Titers of all new non-diseased subjects are generated by random sampling with replacement from new immunogenicity subset.
Titers of all new diseased and all new generated non-diseased subjects (generated in step 5
) are used for re-estimation of PoD curve parameters. Parameter estimates $p_{max}^{}$, $et_{50}^{
}$ and $\gamma^{``}$ are obtained.
Assume hypothetical case where we have clinical trial data of 2,000 subjects from which only 200 subjects' plasma samples are collected and examined in the immunogenicity study. Further, out of these 2,000 we identify 35 disease cases to which we measure titers from the same time point. In the end we have titer information about 200 subjects from the immunogenicity study and 35 diseased subjects.
| Population | # subjects (N) | |:-----------------------------------|:--------------:| | Whole Trial | | | All subjects | 2,000 | | Diseased | 35 | | Non-diseased | 1,965 | | | | | Measured titers | | | Diseased | 35 | | Immunogenicity sample | 200 |
Note that in the immunogenicity sample the disease status is unknown as the sample is created before the clinical study. However, vaccination status is known.
In our example the steps would be following:
Titers of all non-diseased subjects (N = 1,965) are generated by random sampling with replacement from immunogenicity subset (N = 200).
Titers of all diseased (N = 35) and all generated non-diseased (N = 1,965) subjects are used for estimation of PoD curve parameters.
Titers of all diseased (N = 35) and all generated non-diseased (N = 1,965) subjects are put together and bootstrapped (N = 2,000).
New immunogenicity subset is selected from all new non-diseased ($N^$ = 2000 - X), such that the ratio of all new diseased ($N^
$ = X) versus new non-diseased in immunogenicity subset in new data match the ratio in original data (ratio = 200:35).
| Population | # subjects ($N^`$) | |:-----------------------------------|:---------------------| | New diseased | $X$ | | New non-diseased | $2000 - X$ | | New Immunogenicity sample | $X * \frac{200}{35}$ |
Titers of all new non-diseased subjects are generated by random sampling with replacement from new immunogenicity subset ($N^` = X * \frac{200}{35}$)
Titers of all new diseased ($N^= X$) and all new **generated** non-diseased subjects ($N^
= 2000 - X$) are used for second estimation of PoD curve parameters.
Diseased and non-diseased subject level data are required. We'll use PoDBAY::diseased
and PoDBAY::nondiseased
mock-up data. Both datasets contain population summary statistics (N, mean, sd) and individual subject level data (log2 titers, diseases status (DS))
Only the individual subject level data (log2 titers, DS) are used for the PoD curve estimation as described above.
We create the immunogenicity sample from our mock-up data as described above - We start with the titer information about 200 subjects from the immunogenicity study and 35 diseased subjects.
data(diseased) data(nondiseased) # Immunogenicity sample created ImmunogenicitySample <- BlindSampling(diseased, nondiseased, method = list(name = "Fixed", value = 200)) nondiseasedImmunogenicitySample <- ImmunogenicitySample$ImmunogenicityNondiseased str(diseased) str(nondiseasedImmunogenicitySample)
Note: From now on the analysis and used functions are the same as in general case. Only the input variable change from unifected
to NondiseasedImmunogenicitySample
. The nondiseasedGenerationCount
remains the same as the total number of nondiseased remains the same in the whole trial.
Once we have our data prepared, function PoDParamEstimation
is used to estimate PoD curve parameters in six steps as described above. For more details about the usage of the function see examples in ?PoDParamEstimation()
.
estimatedParametersAP <- PoDParamEstimation(diseasedTiters = diseased$titers, nondiseasedTiters = nondiseasedImmunogenicitySample$titers, nondiseasedGenerationCount = nondiseased$N, repeatCount = 50)
Step 1: $p_{max}^$, $et_{50}^
$ and $\gamma^`$
Results corresponding to the first step of Estimation of PoD-titer relationship can be obtained via estimatedParametersAP$resultsPriorReset
.
as_tibble(estimatedParametersAP$resultsPriorReset)
Note that parameter estimates are now different for each repeatCount
iteration. This is according to our expectations as titers of all non-diseased subjects are generated by random sampling with replacement from immunogenicity subset in every iteration in step 1
of this example.
Step 2: Data generation and re-assignment of disease status
Step 3: $p^{}_{max}$, $et^{
}_{50}$ and $\gamma^{``}$
Results corresponding to the sixth step of estimation of PoD-titer relationship can be obtained via estimatedParametersAP$results
.
as_tibble(estimatedParametersAP$results)
Non-parametric bootstrap described in step 3
together with creation of new immunogenicity sample in step 4-5
is applied inside the function.
Parameters of PoD curve point estimate representing the PoD-titer relationship are estimated using results from 'step 1' - estimatedParametersAP$resultsPriorReset
.
PoDParamsPointEst <- PoDParamPointEstimation(estimatedParametersAP$resultsPriorReset) PoDParamsPointEst
Confidence intervals (80%, 90% and 95% level of significance) of PoD curve parameters are calculated using results from 'step 6' - estimatedParametersAP$results
.
PoDParametersCI <- PoDParamsCI(estimatedParametersAP$results) unlist(PoDParametersCI)
Using PoD curve point estimate and results from step 6
- estimatedParametersAP
PoD curve can be plotted.
titers <- seq(from = 0, to = 15, by = 0.01) PoDCurve <- PoDCurvePlot(titers, estimatedParametersAP, ci = 0.95) PoDCurve
There are two possible situations:
We will describe the approach in the situation where Trial A = Trial B.
As stated above the only difference is in the data availability. The fact that vaccinated and control population summary statistics (N, mean, sd) are required remains the same. Therefore, we calculate summary statistics for both populations using immunogenicity subset data - created in the PoD-titer relationship estimation step.
# Immunogenicity sample - vaccinated str(ImmunogenicitySample$ImmunogenicityVaccinated) # Immunogenicity sample - control str(ImmunogenicitySample$ImmunogenicityControl)
means <- list("vaccinated" = ImmunogenicitySample$ImmunogenicityVaccinated$mean, "control" = ImmunogenicitySample$ImmunogenicityControl$mean) standardDeviations <- list("vaccinated" = ImmunogenicitySample$ImmunogenicityVaccinated$stdDev, "control" = ImmunogenicitySample$ImmunogenicityControl$stdDev) EfficacyPointEst <- efficacyComputation(PoDParamsPointEst, means, standardDeviations) EfficacyPointEst
efficacySet <- PoDBAYEfficacy(estimatedParametersAP$results, ImmunogenicitySample$ImmunogenicityVaccinated, ImmunogenicitySample$ImmunogenicityControl)
CI <- EfficacyCICoverage(efficacySet) unlist(CI)
Analysis provides following results:
result <- list( EfficacyPointEst = EfficacyPointEst, efficacyCI = unlist(CI), PoDParamsPointEst = PoDParamsPointEst, PoDParametersCI = unlist(PoDParametersCI), PoDCurve = PoDCurve ) result
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.