bootf2: Estimate 90% Confidence Intervals of f2 with Bootstrap...



Description

Main function to estimate 90% confidence intervals of f2 using bootstrap methodology.

Usage

bootf2(test, ref, path.in, file.in, path.out, file.out,
       n.boots = 10000L, seed = 306L, digits = 2L, alpha = 0.05,
       regulation = c("EMA", "FDA", "WHO","Canada", "ANVISA"),
       min.points = 1L, both.TR.85 = FALSE, print.report = TRUE,
       report.style = c("concise", "intermediate", "detailed"),
       f2.type = c("all", "est.f2", "exp.f2", "bc.f2",
                   "vc.exp.f2", "vc.bc.f2"),
       ci.type = c("all", "normal", "basic", "percentile",
                   "bca.jackknife", "bca.boot"),
       quantile.type = c("all", as.character(1:9), "boot"),
       jackknife.type = c("all", "nt+nr", "nt*nr", "nt=nr"),
       time.unit = c("min", "h"), output.to.screen = FALSE,
       sim.data.out = FALSE)

Arguments

test, ref

Data frames of the dissolution profiles of the test and reference products if path.in and file.in are not specified; otherwise, character strings giving the names of the worksheets in the Excel file where the dissolution data are saved. See Input/Output in Details.

path.in, file.in, path.out, file.out

Character strings of input and output directories and file names. See Input/Output in Details.

n.boots

An integer indicating the number of bootstrap samples.

seed

Integer seed value for reproducibility. If missing, a random seed will be generated for reproducibility purposes.

digits

An integer indicating the number of decimal places for the output.

alpha

A numeric value between 0 and 1 used to estimate the (1 - 2*alpha)*100% confidence interval; the default alpha = 0.05 gives a 90% interval.

regulation

Character strings indicating the regulatory guidelines. See calcf2() for details on regulation rules.

min.points

An integer indicating the minimum number of time points to be used to calculate f2. For the conventional f2 calculation, the default is 3; for bootstrap f2, however, the value should be lower, as fewer time points may be available in certain bootstrap samples. The default is 1. See calcf2().

both.TR.85

Logical. If TRUE and regulation = "FDA", all measurements up to the time point at which both the test and reference products dissolve more than 85% will be used to calculate f2. This is the conventional, but incorrect, interpretation of the US FDA rule. Therefore, the argument should only be set to TRUE for validation purposes, such as comparing results with older literature that used the incorrect interpretation to calculate f2. See calcf2() for details on regulation rules.

print.report

Logical. If TRUE, a plain text report will be produced. See Input/Output in Details.

report.style

"concise" style produces the estimators and their confidence intervals; "intermediate" style adds a list of the individual f2s of all bootstrap samples at the end of the "concise" report; "detailed" style further adds the individual bootstrap samples along with their f2s at the end of the "intermediate" report. See Input/Output in Details.

f2.type

Character strings indicating which type of f2 estimator should be calculated. See Types of estimators in Details.

ci.type

Character strings indicating which type of confidence interval should be estimated. See Types of confidence intervals in Details.

quantile.type

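Character strings indicating the type of quantile used for the percentile intervals: "1" to "9" correspond to the nine quantile types of R's quantile function, and "boot" to the interpolation used by R's boot package. See Types of Confidence Intervals in Details.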

jackknife.type

Character strings indicating the type of jackknife method. See Details.

time.unit

A character string indicating the unit of time: either "min" for minutes or "h" for hours. It is mainly used for checking the CV rules and making plots. See calcf2().

output.to.screen

Logical. If TRUE, a "concise" style summary report will be printed on screen. See Input/Output in Details.

sim.data.out

Logical. If TRUE, all individual bootstrap data sets will be included in the output.

Details

Minimum required arguments that must be provided by the user

Arguments test and ref must be provided by the user. They should be R data frames with time as the first column and all individual profiles as the remaining columns. The actual names of the columns do not matter, since they will be renamed internally.
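As an illustration of this layout, the sketch below builds a small reference data frame with made-up values and checks its shape with is_profile_layout(), a hypothetical helper that is not part of the bootf2 package. It checks only what is stated above (time first, profiles in the remaining columns, names ignored) plus an assumed requirement that all columns are numeric.

```r
# Hypothetical helper (not part of bootf2): TRUE if df has time in the
# first column and at least one profile column, all numeric (assumed).
# Column names are not checked, since bootf2() renames them internally.
is_profile_layout <- function(df) {
  is.data.frame(df) && ncol(df) >= 2 &&
    all(vapply(df, is.numeric, logical(1)))
}

# Made-up illustration data: 3 time points, 2 units, arbitrary names.
ref <- data.frame(time = c(10, 20, 30),
                  a    = c(25, 55, 85),
                  b    = c(27, 57, 87))
is_profile_layout(ref)

# With the bootf2 package installed, such data frames are passed directly:
# d <- bootf2(test, ref, n.boots = 100, print.report = FALSE)
```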

Input/Output

The dissolution data of the test and reference products can either be provided as data frames for test and ref, as explained above, or be read from an Excel file with the test and reference data stored in separate worksheets. In the latter case, the arguments path.in, the directory where the Excel file is located, and file.in, the name of the Excel file including the file extension .xls or .xlsx, must be provided. In that case, arguments test and ref must be the names of the worksheets in quotation marks. The first column of each worksheet must be time, and the remaining columns are individual dissolution profiles. The first row should contain column names, such as time, unit01, unit02, ... The actual names of the columns do not matter, as they will be renamed internally.

Arguments path.out and file.out are the names of the output directory and file. If they are not provided but argument print.report is TRUE, a plain text report will be generated automatically in the current working directory with the file name test_vs_ref_TZ_YYYY-MM-DD_HHMMSS.txt, where test and ref are the data set names of the test and reference, TZ is the time zone (e.g., CEST), YYYY-MM-DD is the numeric date, and HHMMSS is the numeric time in hours, minutes, and seconds.
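The default file name pattern can be reproduced with base R's format() for date-times, as in the sketch below. report_file_name() is a hypothetical helper, and bootf2 may construct the name differently internally; the %Z conversion gives the time-zone abbreviation of the supplied time.

```r
# Hypothetical helper: builds a name following the documented pattern
# test_vs_ref_TZ_YYYY-MM-DD_HHMMSS.txt (not bootf2's own code).
report_file_name <- function(test_name, ref_name, time = Sys.time()) {
  paste0(test_name, "_vs_", ref_name, "_",
         format(time, "%Z_%Y-%m-%d_%H%M%S"), ".txt")
}

# Fixed time so the result is reproducible; omit `time` to use the clock.
report_file_name("dt", "dr", as.POSIXct("2021-08-25 17:07:00", tz = "UTC"))
```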

For a quick check, set output.to.screen = TRUE; a summary report very similar to the concise style report will be printed on screen.

Types of Estimators

According to Shah et al., the population f2 for the inference is

$$f_2 = 100 - 25\log_{10}\left(1 + \frac{1}{P}\sum_{i=1}^{P}\left(\mu_{T_i} - \mu_{R_i}\right)^2\right),$$

where $P$ is the number of time points; $\mu_{T_i}$ and $\mu_{R_i}$ are the population means of the test and reference products at the $i$th time point, respectively; and $\sum$ is the summation from $i = 1$ to $P$.
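A quick numeric check of the formula: the hypothetical helper f2_from_diff() below takes the differences between test and reference means at the $P$ time points and uses the base-10 logarithm (R's log10(), not log()). Assuming the same 10% difference at every time point, f2 comes out close to 50; identical means give 100.

```r
# Hypothetical helper: f2 from mean differences d (test minus reference)
# at P time points; mean(d^2) is (1/P) * sum of squared differences.
f2_from_diff <- function(d) 100 - 25 * log10(1 + mean(d^2))

f2_from_diff(rep(10, 7))   # constant 10% difference: about 49.89
f2_from_diff(rep(0, 7))    # identical profiles: 100
```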

Five estimators for f2 are included in the function:

  1. The estimated f2, denoted by est.f2, is the one described in various regulatory guidelines. It is expressed differently there, but mathematically equivalently, as

    $$\text{est.f2} = 100 - 25\log_{10}\left(1 + \frac{1}{P}\sum_{i=1}^{P}\left(\bar{X}_{T_i} - \bar{X}_{R_i}\right)^2\right),$$

    where $P$ is the number of time points; $\bar{X}_{T_i}$ and $\bar{X}_{R_i}$ are the mean dissolution data at the $i$th time point of random samples chosen from the test and reference populations, respectively. Compared with the population f2 above, the only difference is that est.f2 replaces the population means with the sample means of the dissolution profiles. In other words, a point estimate is used for the statistical inference in practice.

  2. The bias-corrected f2, denoted by bc.f2, is described by Shah et al. as

    $$\text{bc.f2} = 100 - 25\log_{10}\left(1 + \frac{1}{P}\left(\sum_{i=1}^{P}\left(\bar{X}_{T_i} - \bar{X}_{R_i}\right)^2 - \frac{1}{n}\sum_{i=1}^{P}\left(S_{T_i}^2 + S_{R_i}^2\right)\right)\right),$$

    where $S_{T_i}^2$ and $S_{R_i}^2$ are the unbiased estimates of variance at the $i$th time point of random samples chosen from the test and reference populations, respectively, and $n$ is the sample size.

  3. The variance- and bias-corrected f2, denoted by vc.bc.f2, does not assume equal weights for the variances of test and reference, as bc.f2 does:

    $$\text{vc.bc.f2} = 100 - 25\log_{10}\left(1 + \frac{1}{P}\left(\sum_{i=1}^{P}\left(\bar{X}_{T_i} - \bar{X}_{R_i}\right)^2 - \frac{1}{n}\sum_{i=1}^{P}\left(w_{T_i}S_{T_i}^2 + w_{R_i}S_{R_i}^2\right)\right)\right),$$

    where $w_{T_i}$ and $w_{R_i}$ are weighting factors for the variances of the test and reference products, respectively, calculated as

    $$w_{T_i} = 0.5 + \frac{S_{T_i}^2}{S_{T_i}^2 + S_{R_i}^2}$$

    and

    $$w_{R_i} = 0.5 + \frac{S_{R_i}^2}{S_{T_i}^2 + S_{R_i}^2}.$$

  4. The expected f2, denoted by exp.f2, is calculated based on the mathematical expectation of the estimated f2,

    $$\text{exp.f2} = 100 - 25\log_{10}\left(1 + \frac{1}{P}\left(\sum_{i=1}^{P}\left(\bar{X}_{T_i} - \bar{X}_{R_i}\right)^2 + \frac{1}{n}\sum_{i=1}^{P}\left(S_{T_i}^2 + S_{R_i}^2\right)\right)\right),$$

    using the sample mean dissolution profiles and variances as approximations of the population values.

  5. The variance-corrected expected f2, denoted by vc.exp.f2, is calculated as

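    $$\text{vc.exp.f2} = 100 - 25\log_{10}\left(1 + \frac{1}{P}\left(\sum_{i=1}^{P}\left(\bar{X}_{T_i} - \bar{X}_{R_i}\right)^2 + \frac{1}{n}\sum_{i=1}^{P}\left(w_{T_i}S_{T_i}^2 + w_{R_i}S_{R_i}^2\right)\right)\right).$$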
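The five estimators can be computed side by side from data frames in the layout described under Details. The sketch below is an illustration, not the bootf2 implementation: f2_estimators() is a hypothetical name, it assumes equal sample sizes (nt = nr = n), it uses every time point (no regulation-specific time-point selection as in calcf2()), and it returns NaN when the bracketed term is not positive or when both variances are zero at a time point.

```r
# Hypothetical sketch of est.f2, exp.f2, bc.f2, vc.exp.f2 and vc.bc.f2.
# test, ref: data frames, time in column 1, one column per unit.
f2_estimators <- function(test, ref) {
  xt <- as.matrix(test[, -1, drop = FALSE])   # drop the time column
  xr <- as.matrix(ref[, -1, drop = FALSE])
  stopifnot(nrow(xt) == nrow(xr), ncol(xt) == ncol(xr))
  P   <- nrow(xt)                             # number of time points
  n   <- ncol(xt)                             # sample size (nt == nr assumed)
  d2  <- sum((rowMeans(xt) - rowMeans(xr))^2) # sum of squared mean differences
  st2 <- apply(xt, 1, var)                    # unbiased variance per time point
  sr2 <- apply(xr, 1, var)
  wt  <- 0.5 + st2 / (st2 + sr2)              # variance weights w(Ti), w(Ri)
  wr  <- 0.5 + sr2 / (st2 + sr2)
  f2  <- function(s) 100 - 25 * log10(1 + s / P)
  c(est.f2    = f2(d2),
    exp.f2    = f2(d2 + sum(st2 + sr2) / n),
    bc.f2     = f2(d2 - sum(st2 + sr2) / n),
    vc.exp.f2 = f2(d2 + sum(wt * st2 + wr * sr2) / n),
    vc.bc.f2  = f2(d2 - sum(wt * st2 + wr * sr2) / n))
}

# Made-up data: test is 10% higher than reference at every time point.
ref  <- data.frame(time = c(10, 20, 30),
                   u1 = c(20, 50, 80), u2 = c(22, 52, 82), u3 = c(24, 54, 84))
test <- ref
test[, -1] <- test[, -1] + 10
round(f2_estimators(test, ref), 2)
```

Because the two products have equal variances in this example, both weights equal 1, so vc.exp.f2 and vc.bc.f2 coincide with exp.f2 and bc.f2. Adding the variance term lowers f2 (exp.f2 < est.f2) and subtracting it raises f2 (bc.f2 > est.f2).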

Types of Confidence Intervals

The following confidence intervals are included:

  1. The normal interval with bias correction, denoted by normal in the function, is estimated according to Davison and Hinkley,

    $$f_{2,L},\ f_{2,U} = f_{2,S} - E_B \mp z_{1-\alpha}\sqrt{V_B},$$

    where $f_{2,L}$ and $f_{2,U}$ are the lower and upper limits of the confidence interval estimated from bootstrap samples; $f_{2,S}$ denotes any of the estimators described above; $z_{1-\alpha} = \Phi^{-1}(1-\alpha)$, with $\Phi^{-1}$ the inverse of the standard normal cumulative distribution function and $\alpha$ the type I error; and $E_B$ and $V_B$ are the resampling estimates of bias and variance, calculated as

    $$E_B = \frac{1}{B}\sum_{b=1}^{B}f_{2,b} - f_{2,S} = \bar{f}_{2,b} - f_{2,S}$$

    and

    $$V_B = \frac{1}{B-1}\sum_{b=1}^{B}\left(f_{2,b} - \bar{f}_{2,b}\right)^2,$$

    where $B$ is the number of bootstrap samples, $f_{2,b}$ is the f2 estimate from the $b$th bootstrap sample, and $\bar{f}_{2,b}$ is the mean of the $B$ bootstrap estimates.

  2. The basic interval, denoted by basic in the function, is estimated according to Davison and Hinkley,

    $$f_{2,L} = 2f_{2,S} - f_{2,((B+1)(1-\alpha))}$$

    and

    $$f_{2,U} = 2f_{2,S} - f_{2,((B+1)\alpha)},$$

    where $f_{2,((B+1)\alpha)}$ and $f_{2,((B+1)(1-\alpha))}$ are the $(B+1)\alpha$th and $(B+1)(1-\alpha)$th ordered resampling estimates of f2, respectively. When $(B+1)\alpha$ is not an integer, the following equation is used for interpolation:

    $$f_{2,((B+1)\alpha)} = f_{2,(k)} + \frac{\Phi^{-1}(\alpha) - \Phi^{-1}\left(\frac{k}{B+1}\right)}{\Phi^{-1}\left(\frac{k+1}{B+1}\right) - \Phi^{-1}\left(\frac{k}{B+1}\right)}\left(f_{2,(k+1)} - f_{2,(k)}\right),$$

    where $k$ is the integer part of $(B+1)\alpha$, and $f_{2,(k+1)}$ and $f_{2,(k)}$ are the $(k+1)$th and $k$th ordered resampling estimates of f2, respectively.

  3. The percentile intervals, denoted by percentile in the function, are estimated using nine different types of quantiles, Type 1 to Type 9, as summarized in Hyndman and Fan's article and implemented in R's quantile function. Program bootf2BCA, which uses R's boot package, outputs a percentile interval that applies the interpolation equation above. To allow comparison of results among different programs, this interval, denoted by Percentile (boot), is also included in the function.

  4. The bias-corrected and accelerated (BCa) intervals are estimated according to Efron and Tibshirani,

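    $$f_{2,L} = f_{2,(\alpha_1)},$$

    $$f_{2,U} = f_{2,(\alpha_2)},$$

    where $f_{2,(\alpha_1)}$ and $f_{2,(\alpha_2)}$ are the $100\alpha_1$th and $100\alpha_2$th percentiles of the resampling estimates of f2, respectively. The type I errors $\alpha_1$ and $\alpha_2$ are obtained as

    $$\alpha_1 = \Phi\left(z_0 + \frac{z_0 + z_{\alpha}}{1 - a\left(z_0 + z_{\alpha}\right)}\right)$$

    and

    $$\alpha_2 = \Phi\left(z_0 + \frac{z_0 + z_{1-\alpha}}{1 - a\left(z_0 + z_{1-\alpha}\right)}\right),$$

    where $z_0$ and $a$ are called the bias correction and the acceleration, respectively.

    There are different methods to estimate $z_0$ and $a$. Shah et al. used the jackknife technique, denoted by bca.jackknife in the function,

    $$z_0 = \Phi^{-1}\left(\frac{\#\{f_{2,b} < f_{2,S}\}}{B}\right)$$

    and

    $$a = \frac{\sum_{i=1}^{n}\left(\bar{f}_{2,J} - f_{2,J_i}\right)^3}{6\left(\sum_{i=1}^{n}\left(\bar{f}_{2,J} - f_{2,J_i}\right)^2\right)^{3/2}},$$

    where $\#\{f_{2,b} < f_{2,S}\}$ denotes the number of bootstrap estimates smaller than $f_{2,S}$, $f_{2,J_i}$ is the $i$th jackknife statistic, $\bar{f}_{2,J}$ is the mean of the jackknife statistics, and $\sum$ is the summation from 1 to the sample size $n$.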

    Program bootf2BCA, which uses R's boot package, gives a slightly different BCa interval. This approach, denoted by bca.boot in the function, is also implemented for estimating the interval.
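The interval formulas above can be sketched in base R as follows. This is a minimal sketch, not the bootf2 implementation: all function names are hypothetical; t0, tb and tj stand for $f_{2,S}$, the bootstrap estimates $f_{2,b}$ and the jackknife statistics; and the quantile type used to read off the BCa percentiles is an assumption, since the text does not specify it. The bca.boot variant is not reproduced.

```r
# Hypothetical sketches. t0 = f2(S); tb = B bootstrap estimates f2(b);
# tj = jackknife statistics; alpha = one-sided type I error.

ci_normal <- function(t0, tb, alpha = 0.05) {
  bias <- mean(tb) - t0                # E(B)
  se   <- sd(tb)                       # sqrt(V(B)); sd() divides by B - 1
  z    <- qnorm(1 - alpha)             # z(1 - alpha)
  c(lower = t0 - bias - z * se, upper = t0 - bias + z * se)
}

# (B + 1) * p th ordered estimate, interpolated on the normal quantile
# scale when (B + 1) * p is not an integer.
order_stat <- function(tb, p) {
  srt <- sort(tb)
  B   <- length(srt)
  r   <- (B + 1) * p
  if (abs(r - round(r)) < 1e-8) return(srt[round(r)])
  k   <- floor(r)
  stopifnot(k >= 1, k < B)             # interpolation needs k and k + 1
  w   <- (qnorm(p) - qnorm(k / (B + 1))) /
         (qnorm((k + 1) / (B + 1)) - qnorm(k / (B + 1)))
  srt[k] + w * (srt[k + 1] - srt[k])
}

ci_basic <- function(t0, tb, alpha = 0.05) {
  c(lower = 2 * t0 - order_stat(tb, 1 - alpha),
    upper = 2 * t0 - order_stat(tb, alpha))
}

# Percentile interval using one of R's nine quantile types.
ci_percentile <- function(tb, alpha = 0.05, type = 7) {
  q <- quantile(tb, c(alpha, 1 - alpha), type = type, names = FALSE)
  c(lower = q[1], upper = q[2])
}

# BCa with jackknife acceleration; `type` for the percentiles is assumed.
ci_bca_jackknife <- function(t0, tb, tj, alpha = 0.05, type = 7) {
  z0  <- qnorm(mean(tb < t0))          # infinite if all tb lie on one side
  dj  <- mean(tj) - tj                 # mean jackknife minus each statistic
  a   <- sum(dj^3) / (6 * sum(dj^2)^1.5)
  adj <- function(z) pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  q   <- quantile(tb, c(adj(qnorm(alpha)), adj(qnorm(1 - alpha))),
                  type = type, names = FALSE)
  c(lower = q[1], upper = q[2])
}

set.seed(306)
tb <- rnorm(999, mean = 55, sd = 2)    # stand-in bootstrap f2 estimates
tj <- rnorm(12, mean = 55, sd = 0.5)   # stand-in jackknife statistics
rbind(normal     = ci_normal(55, tb),
      basic      = ci_basic(55, tb),
      percentile = ci_percentile(tb),
      bca        = ci_bca_jackknife(55, tb, tj))
```

With B = 999 and alpha = 0.05, (B + 1)alpha = 50 is an integer, so the basic interval uses the 50th and 950th ordered estimates directly. When $z_0 = 0$ and $a = 0$, $\alpha_1 = \alpha$ and $\alpha_2 = 1 - \alpha$, so the BCa interval reduces to the percentile interval.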

Notes on the argument jackknife.type

For any sample of size n, the jackknife estimator is obtained by estimating the parameter for each subsample that omits the ith observation. However, when two samples (e.g., test and reference) are involved, there are several possible ways to do this. Assuming the sample sizes of the test and reference are nt and nr, respectively, three possibilities are considered, corresponding to the values "nt+nr", "nt*nr", and "nt=nr" of jackknife.type.
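The definitions of the three schemes are not spelled out here, so the sketch below uses one plausible reading that is an assumption based only on the option names: "nt+nr" drops one test unit or one reference unit at a time, "nt*nr" drops one test unit and one reference unit for every pair, and "nt=nr" (equal sample sizes) drops the ith unit from both. jackknife_subsamples() is hypothetical; each returned element lists the unit indices kept, to which an f2 estimator would then be applied.

```r
# Hypothetical: index sets kept in each jackknife subsample under an
# ASSUMED reading of the jackknife.type option names.
jackknife_subsamples <- function(nt, nr, type = c("nt+nr", "nt*nr", "nt=nr")) {
  type  <- match.arg(type)
  all_t <- seq_len(nt)
  all_r <- seq_len(nr)
  switch(type,
    "nt+nr" = c(lapply(all_t, function(i) list(t = all_t[-i], r = all_r)),
                lapply(all_r, function(j) list(t = all_t, r = all_r[-j]))),
    "nt*nr" = {
      grid <- expand.grid(i = all_t, j = all_r)
      Map(function(i, j) list(t = all_t[-i], r = all_r[-j]), grid$i, grid$j)
    },
    "nt=nr" = {
      stopifnot(nt == nr)              # only defined for equal sample sizes
      lapply(all_t, function(i) list(t = all_t[-i], r = all_r[-i]))
    })
}

length(jackknife_subsamples(12, 12, "nt+nr"))   # 24
length(jackknife_subsamples(12, 12, "nt*nr"))   # 144
length(jackknife_subsamples(12, 12, "nt=nr"))   # 12
```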

Value

A list of 3 or 5 components.

References

Shah, V. P.; Tsong, Y.; Sathe, P.; Liu, J.-P. In Vitro Dissolution Profile Comparison—Statistics and Analysis of the Similarity Factor, f2. Pharmaceutical Research 1998, 15 (6), 889–896. DOI: 10.1023/A:1011976615750.

Davison, A. C.; Hinkley, D. V. Bootstrap Methods and Their Application. Cambridge University Press, 1997.

Hyndman, R. J.; Fan, Y. Sample Quantiles in Statistical Packages. The American Statistician 1996, 50 (4), 361–365. DOI: 10.1080/00031305.1996.10473566.

Efron, B.; Tibshirani, R. An Introduction to the Bootstrap. Chapman & Hall, 1993.

Examples

library(bootf2)
# time points
tp <- c(5, 10, 15, 20, 30, 45, 60)
# model.par for reference with low variability
par.r <- list(fmax = 100, fmax.cv = 3, mdt = 15, mdt.cv = 14,
              tlag = 0, tlag.cv = 0, beta = 1.5, beta.cv = 8)
# simulate reference data
dr <- sim.dp(tp, model.par = par.r, seed = 100, plot = FALSE)
# model.par for test
par.t <- list(fmax = 100, fmax.cv = 3, mdt = 12.29, mdt.cv = 12,
              tlag = 0, tlag.cv = 0, beta = 1.727, beta.cv = 9)
# simulate test data with low variability
dt <- sim.dp(tp, model.par = par.t, seed = 100, plot = FALSE)

# Bootstrap. To reduce run time, n.boots = 100 is used in this example.
# In practice, n.boots of 5000 to 10000 is recommended.
# Set `output.to.screen = TRUE` to view the result on screen
d <- bootf2(dt$sim.disso, dr$sim.disso, n.boots = 100, print.report = FALSE)

bootf2 documentation built on Aug. 25, 2021, 5:07 p.m.