knitr::opts_chunk$set( collapse = TRUE, comment = "#" )
txt <- paste0("Version ", packageVersion("PowerTOST"), " built ", packageDate("PowerTOST", date.fields = "Built"), " with R ", substr(packageDescription("PowerTOST", fields = "Built"), 3, 7)) # date.fields = "Date/Publication") is available if package is already on CRAN! # But we want submit it to CRAN as stable. Thus the next doesn't do the job. # if (grepl("900", as.character(packageVersion("PowerTOST"))) | # is.na(packageDate("PowerTOST", date.fields = "Date/Publication"))) { # txt <- paste(txt, "\n(development version not on CRAN).") # } else { # txt <- paste0(txt, "\n(stable release on CRAN ", # packageDate("PowerTOST", date.fields = "Date/Publication"), ").") # } cat(txt)
The package contains functions to calculate power and estimate sample size for various study designs used in (not only bio-) equivalence studies.
library(PowerTOST) designs <- known.designs() print(designs[, c(2, 9, 3)], row.names = FALSE)
Codes of designs follow this pattern: treatments x sequences x periods
.
Although some replicate designs are more ‘popular’ than others, sample size estimations are valid for all of the following designs:
|design|type|sequences||periods|
|:----:|:----:|:-:|---------|:-:|
|2x2x4
|full|2|TRTR\|RTRT|4|
|2x2x4
|full|2|TRRT\|RTTR|4|
|2x2x4
|full|2|TTRR\|RRTT|4|
|2x4x4
|full|4|TRTR\|RTRT\|TRRT\|RTTR|4|
|2x4x4
|full|4|TRRT\|RTTR\|TTRR\|RRTT|4|
|2x2x3
|full|2|TRT\|RTR|3|
|2x2x3
|full|2|TRR\|RTT|3|
|2x4x2
|full|4|TR\|RT\|TT\|RR|2|
|2x3x3
|partial|3|TRR\|RTR\|RRT|3|
|2x2x3
|partial|2|TRR\|RTR|3|
Balaam’s design TR|RT|TT|RR should be avoided due to its poor power characteristics. The three period partial replicate design with two sequences TRR|RTR (a.k.a. extra-reference design) should be avoided because it is biased in the presence of period effects.
For various methods power can be calculated based on
For all methods the sample size can be estimated based on
Power covers balanced as well as unbalanced sequences in crossover or replicate designs and equal/unequal group sizes in two-group parallel designs. Sample sizes are always rounded up to achieve balanced sequences or equal group sizes.
Design "2x2"
(TR|RT), exact method (Owen’s Q).
Design "2x2x4"
(TRTR|RTRT), upper limit of the confidence interval of σ~wT~/σ~wR~ ≤2.5, approximation by the non-central t-distribution, 100,000 simulations.
Point estimate constraints (0.80, 1.25), homoscedasticity (CV~wT~ = CV~wR~), scaling is based on CV~wR~, design "2x3x3"
(TRR|RTR|RRT), approximation by the non-central t-distribution, 100,000 simulations.
θ~0~ 0.90.1
Regulatory constant 0.760
, upper cap of scaling at CV~wR~ 50\%, evaluation by ANOVA.
Regulatory constant 0.760
, upper cap of scaling at CV~wR~ ~57.4\%, evaluation by intra-subject contrasts.
Regulatory constant log(1/0.75)/sqrt(log(0.3^2+1))
, widened limits 75.00--133.33\% if CV~wR~ >30\%, no upper cap of scaling, evaluation by ANOVA.
Regulatory constant log(1.25)/0.25
, no upper cap of scaling, evaluation by linearized scaled ABE (Howe’s approximation).
θ~0~ 0.975, regulatory constant log(1.11111)/0.1
, implicit upper cap of scaling at CV~wR~ ~21.4\%, design "2x2x4"
(TRTR|RTRT), evaluation by linearized scaled ABE (Howe’s approximation), upper limit of the confidence interval of σ~wT~/σ~wR~ ≤2.5.
β~0~ (slope) 1+log(0.95)/log(rd)
where rd
is the ratio of the highest and lowest dose, target power 0.80, crossover design, details of the sample size search suppressed.
Minimum acceptable power 0.70. θ~0~; design, conditions, and sample size method depend on defaults of the respective approaches (ABE, ABEL, RSABE, NTID, HVNTID).
Before running the examples attach the library.
library(PowerTOST)
If not noted otherwise, the functions’ defaults are employed.
Power for total CV 0.35 (35\%), group sizes 52 and 49.
power.TOST(CV = 0.35, n = c(52, 49), design = "parallel")
Sample size for assumed within- (intra-) subject CV 0.20 (20\%).
sampleN.TOST(CV = 0.20)
Sample size for assumed within- (intra-) subject CV 0.40 (40\%), θ~0~ 0.90, four period full replicate study (any of TRTR|RTRT, TRRT|RTTR, TTRR|RRTT). Wider acceptance range for C~max~ (South Africa).
sampleN.TOST(CV = 0.40, theta0 = 0.90, theta1 = 0.75, design = "2x2x4")
Sample size for assumed within- (intra-) subject CV 0.125 (12.5\%), θ~0~ 0.975. Narrower acceptance range for NTIDs (most jurisdictions).
sampleN.TOST(CV = 0.125, theta0 = 0.975, theta1 = 0.90)
Sample size for equivalence of the ratio of two means with normality on the original scale based on Fieller’s (‘fiducial’) confidence interval.2 Within- (intra-) subject CV~w~ 0.20 (20\%), between- (inter-) subject CV~b~ 0.40 (40\%).
Note the default α 0.025 (95\% CI) of this function because it is intended for studies with clinical endpoints.
sampleN.RatioF(CV = 0.20, CVb = 0.40)
Sample size for assumed within- (intra-) subject CV 0.45 (45\%), θ~0~ 0.90, three period full replicate study (TRT|RTR or TRR|RTT).
sampleN.TOST(CV = 0.45, theta0 = 0.90, design = "2x2x3")
Note that the conventional model assumes homoscedasticity (equal variances of treatments). For heteroscedasticity we can ‘switch off’ all conditions of one of the methods for reference-scaled ABE. We assume a σ^2^-ratio of ⅔ (i.e., the test has a lower variability than the reference). Only relevant columns of the data frame shown.
reg <- reg_const("USER", r_const = NA, CVswitch = Inf, CVcap = Inf, pe_constr = FALSE) CV <- CVp2CV(CV = 0.45, ratio = 2/3) res <- sampleN.scABEL(CV=CV, design = "2x2x3", regulator = reg, details = FALSE, print = FALSE) print(res[c(3:4, 8:9)], digits = 5, row.names = FALSE)
Similar sample size because the pooled CV~w~ is still 0.45.
Sample size assuming heteroscedasticity (CV~w~ 0.45, variance-ratio 2.5, i.e., the test treatment has a substantially higher variability than the reference). TRTR|RTRT according to the FDA’s guidances.3,4,5 Assess additionally which one of the components (ABE, s~wT~/s~wR~-ratio) drives the sample size.
CV <- signif(CVp2CV(CV = 0.45, ratio = 2.5), 4) n <- sampleN.HVNTID(CV = CV, details = FALSE)[["Sample size"]] suppressMessages(power.HVNTID(CV = CV, n = n, details = TRUE))
The ABE component shows a lower probability to demonstrate BE than the s~wT~/s~wR~ component and hence, drives the sample size.
Sample size assuming homoscedasticity (CV~wT~ = CV~wR~ = 0.45).
sampleN.scABEL(CV = 0.45)
Iteratively adjust α to control the Type I Error.6 Heteroscedasticity (CV~wT~ 0.30, CV~wR~ 0.40, i.e., variance-ratio ~0.58), four period full replicate study (any of TRTR|RTRT, TRRT|RTTR, TTRR|RRTT), 24 subjects, balanced sequences.
scABEL.ad(CV = c(0.30, 0.40), design = "2x2x4", n = 24)
With the nominal α 0.05 the Type I Error will be inflated (0.05953). With the adjusted α 0.03997 (i.e., a ~92\% CI) the TIE will be controlled, although with a slight loss in power (decreases from 0.805 to 0.778).
Consider sampleN.scABEL.ad(CV = c(0.30, 0.35), design = "2x2x4")
to estimate the sample size preserving both the TIE and target power. In this example 26 subjects would be required.
ABEL cannot be applied for AUC (except for the WHO). Hence, in many cases ABE drives the sample size. Four period full replicate study (any of TRTR|RTRT, TRRT|RTTR, TTRR|RRTT).
PK <- c("Cmax", "AUC") CV <- c(0.45, 0.30) # extract sample sizes and power r1 <- sampleN.scABEL(CV = CV[1], design = "2x2x4", print = FALSE, details = FALSE)[8:9] r2 <- sampleN.TOST(CV = CV[2], theta0 = 0.90, design = "2x2x4", print = FALSE, details = FALSE)[7:8] n <- as.numeric(c(r1[1], r2[1])) pwr <- signif(as.numeric(c(r1[2], r2[2])), 5) # compile results res <- data.frame(PK = PK, method = c("ABEL", "ABE"), n = n, power = pwr) print(res, row.names = FALSE)
AUC drives the sample size.
For Health Canada it is the opposite (ABE for C~max~ and ABEL for AUC).
PK <- c("Cmax", "AUC") CV <- c(0.45, 0.30) # extract sample sizes and power r1 <- sampleN.TOST(CV = CV[1], theta0 = 0.90, design = "2x2x4", print = FALSE, details = FALSE)[7:8] r2 <- sampleN.scABEL(CV = CV[2], design = "2x2x4", print = FALSE, details = FALSE)[8:9] n <- as.numeric(c(r1[1], r2[1])) pwr <- signif(as.numeric(c(r1[2], r2[2])), 5) # compile results res <- data.frame(PK = PK, method = c("ABE", "ABEL"), n = n, power = pwr) print(res, row.names = FALSE)
Here C~max~ drives the sample size.
Sample size assuming homoscedasticity (CV~wT~ = CV~wR~ = 0.45) for the widened limits of the Gulf Cooperation Council.
sampleN.scABEL(CV = 0.45, regulator = "GCC", details = FALSE)
Sample size for a four period full replicate study (any of TRTR|RTRT, TRRT|RTTR, TTRR|RRTT) assuming heteroscedasticity (CV~wT~ 0.40, CV~wR~ 0.50, i.e., variance-ratio ~0.67). Details of the sample size search suppressed.
sampleN.RSABE(CV = c(0.40, 0.50), design = "2x2x4", details = FALSE)
Sample size assuming heteroscedasticity (CV~w~ 0.10, variance-ratio 2.5, i.e., the test treatment has a substantially higher variability than the reference). TRTR|RTRT according to the FDA’s guidance.7 Assess additionally which one of the three components (scaled ABE, conventional ABE, s~wT~/s~wR~-ratio) drives the sample size.
CV <- signif(CVp2CV(CV = 0.10, ratio = 2.5), 4) n <- sampleN.NTID(CV = CV)[["Sample size"]] suppressMessages(power.NTID(CV = CV, n = n, details = TRUE))
The s~wT~/s~wR~ component shows the lowest probability to demonstrate BE and hence, drives the sample size.
Compare that with homoscedasticity (CV~wT~ = CV~wR~ = 0.10):
CV <- 0.10 n <- sampleN.NTID(CV = CV, details = FALSE)[["Sample size"]] suppressMessages(power.NTID(CV = CV, n = n, details = TRUE))
Here the scaled ABE component shows the lowest probability to demonstrate BE and drives the sample size – which is much lower than in the previous example.
Comparison with fixed narrower limits applicable in other jurisdictions. Note that a replicate design is not mandatory -- reducing the chance of dropouts and requiring less administrations
CV <- 0.10 # extract sample sizes and power r1 <- sampleN.NTID(CV = CV, print = FALSE, details = FALSE)[8:9] r2 <- sampleN.TOST(CV = CV, theta0 = 0.975, theta1 = 0.90, design = "2x2x4", print = FALSE, details = FALSE)[7:8] r3 <- sampleN.TOST(CV = CV, theta0 = 0.975, theta1 = 0.90, design = "2x2x3", print = FALSE, details = FALSE)[7:8] r4 <- sampleN.TOST(CV = CV, theta0 = 0.975, theta1 = 0.90, print = FALSE, details = FALSE)[7:8] n <- as.numeric(c(r1[1], r2[1], r3[1], r4[1])) pwr <- signif(as.numeric(c(r1[2], r2[2], r3[2], r4[2])), 5) # compile results res <- data.frame(method = c("FDA/CDE", rep ("fixed narrow", 3)), design = c(rep("2x2x4", 2), "2x2x3", "2x2x2"), n = n, power = pwr, a = n * c(4, 4, 3, 2)) names(res)[5] <- "adm. #" # number of administrations print(res, row.names = FALSE)
CV 0.20 (20\%), doses 1, 2, and 8 units, assumed slope β~0~ 1, target power 0.90.
sampleN.dp(CV = 0.20, doses = c(1, 2, 8), beta0 = 1, targetpower = 0.90)
Note that the acceptance range of the slope depends on the ratio of the highest and lowest doses (i.e., it gets tighter for wider dose ranges and therefore, higher sample sizes will be required).
In an exploratory setting wider equivalence margins {θ~1~, θ~2~} (0.50, 2.00) were proposed,8 translating in this example to an acceptance range of 0.66667 ... 1.3333
and a sample size of only six subjects.
Explore impact of deviations from assumptions (higher CV, higher deviation of θ~0~ from 1, dropouts) on power. Assumed within-subject CV 0.20 (20\%), target power 0.90. Plot suppressed.
res <- pa.ABE(CV = 0.20, targetpower = 0.90) print(res, plotit = FALSE)
If the study starts with 26 subjects (power \~0.92), the CV can increase to \~0.27 or θ~0~ decrease to \~0.90 or the sample size decrease to 10 whilst power will still be ≥0.70.
However, this is not a substitute for the ‘Sensitivity Analysis’ recommended in ICH-E9,9 since in a real study a combination of all effects occurs simultaneously. It is up to you to decide on reasonable combinations and analyze their respective power.
Performed on a Xeon E3-1245v3 3.4 GHz, 8 MB cache, 16 GB RAM, R r getRversion()
64 bit on Windows 7.
2×2 crossover design, CV 0.17. Sample sizes and achieved power for the supported methods (the 1st one is the default).
method n power time (s) owenq 14 0.80568 0.00128 mvt 14 0.80569 0.11778 noncentral 14 0.80568 0.00100 shifted 16 0.85230 0.00096
The 2nd exact method is substantially slower than the 1st. The approximation based on the noncentral t-distribution is slightly faster but matches the 1st exact method closely. Though the approximation based on the shifted central t-distribution is the fastest, it might estimate a larger than necessary sample size. Hence, it should be used only for comparative purposes.
Four period full replicate study (any of TRTR|RTRT, TRRT|RTTR, TTRR|RRTT), homogenicity (CV~wT~ = CV~wR~ 0.45). Sample sizes and achieved power for the supported methods.
function method n power time (s) sampleN.scABEL ‘key’ statistics 28 0.81116 0.1348 sampleN.scABEL.sdsims subject simulations 28 0.81196 2.5377
Simulating via the ‘key’ statistics is the method of choice for speed reasons.
However, subject simulations are recommended if
You can install the released version of PowerTOST from CRAN with
package <- "PowerTOST" inst <- package %in% installed.packages() if (length(package[!inst]) > 0) install.packages(package[!inst])
… and the development version from GitHub with
# install.packages("remotes") remotes::install_github("Detlew/PowerTOST")
Skips installation from a github remote if the SHA-1 has not changed since last install. Use force = TRUE
to force installation.
Inspect this information for reproducibility. Of particular importance are the versions of R and the packages used to create this workflow. It is considered good practice to record this information with every analysis.\
Version r packageVersion("PowerTOST")
built r packageDate("PowerTOST", date.fields = "Built")
with R r substr(packageDescription("PowerTOST", fields = "Built"), 3, 7)
.
options(width = 66) sessionInfo()
1. Tóthfalusi L, Endrényi L. Sample Sizes for Designing Bioequivalence Studies for Highly Variable Drugs. J Pharm Pharmacol Sci. 2012; 15(1): 73--84. doi:10.18433/j3z88f. Open access. ↩\ 2. Fieller EC. Some Problems In Interval Estimation. J Royal Stat Soc B. 1954; 16(2): 175–85. JSTOR:2984043. ↩\ 3. U.S. Food and Drug Administration, Office of Generic Drugs. Draft Guidance on Dabigatran Etexilate Mesylate. Recommended Jun 2012; Revised Sep 2015, Jul 2017. Online. ↩\ 4. U.S. Food and Drug Administration, Office of Generic Drugs. Draft Guidance on Rivaroxaban. Recommended Sep 2015. Online. ↩\ 5. U.S. Food and Drug Administration, Office of Generic Drugs. Draft Guidance on Edoxaban Tosylate. Recommended May 2017; Revised Mar 2020. Online. ↩\ 6. Labes D, Schütz H. Inflation of Type I Error in the Evaluation of Scaled Average Bioequivalence, and a Method for its Control. Pharm Res. 2016; 33(11): 2805--14. doi:10.1007/s11095-016-2006-1. ↩\ 7. U.S. Food and Drug Administration, Center for Drug Evaluation and Research. Draft Guidance for Industry. Bioequivalence Studies with Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. August 2021. Online. ↩\ 8. Hummel J, McKendrick S, Brindley C, French R. Exploratory assessment of dose proportionality: review of current approaches and proposal for a practical criterion. Pharm. Stat. 2009; 8(1): 38--49. doi:10.1002/pst.326. ↩\ 9. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. ICH Harmonised Tripartite Guideline. E9. Statistical Principles for Clinical Trials. 5 February 1998. Online. ↩
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.