fPASS: An R package for Power and Sample Size analysis (PASS) for Projection-based Two-Sample test for functional data.

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval=TRUE
)
#options(rmarkdown.html_vignette.check_title = FALSE)

This document will present a elaborate understanding of the fPASS R package and its capabilities. First we need to load the current development version of the package using the following code. The package is designed to conduct for Power and Sample Size analysis (PASS) for Projection-based Two-Sample test for functional data. Please see Wang (2021) and Koner and Luo (2023) for the details of the testing procedure and its advantage over the current state-of-the-art testing procedures used for longitudinal data such as mixed model and GEE.

Installation

# Install development version from GitHub
devtools::install_github("SalilKoner/fPASS", build_vignettes = FALSE)

Usage

library(fPASS)
library(nlme)
library(face)
st <- proc.time()

The key function to use for finding the power and sample size (PASS) for the test is fPASS::PASS_Proj_Test_ufDA(). In the following, we will see how the function can be used to conduct PASS for the Projection-based test for longitudinal and functional data under different covariance structure in order to design a randomized clinical trials (RCT). We will present the examples of using the function with different case studies.

Case study: Calculating PASS analysis for designing RCT with Longitudinal data with a specific visit times.

Suppose that we want to conduct a clinical trial to test the efficacy of a treatment effect against placebo, where the visit times for each subjects are at baseline, 3 months, 6 months, and every 6 months thereafter for 24 months. Assume that for each visit (except for the baseline), there is an window of 15 days around the stipulated visit time when the patients can appear to the clinic. This means that, a patient can come to the clinic for third visit around $5.5 - 6.5$ months from baseline. Assuming that the higher the response, the better is the treatment, it is hypothesized that the effect size (mean to variance ratio) between the treatment and placebo will be $1$ unit each year, and the treatment effect follows a linear trend. Further, it is assumed that the longitudinal response trajectory for each subject has a marginal standard deviation about 5 units at each visit time and assumes a dying correlation which can be characterized a conditional auto regressive process with correlation parameter $0.5$. Suppose that after collection of data, we want to use the Two-sample projection-based test by Wang (2021) to assess the treatment efficacy. Assuming that we can have equal number of subjects in each of the treatment and the placebo group, we want to find the minimum sample size required to test the treatment efficacy with a power of $80\%$ at a significance level of $5\%$, using the routine fPASS::PASS_Proj_Test_ufDA().

Argument specifiction for fPASS::PASS_Proj_Test_ufDA()

The argument of the function can be obtained by running ?fPASS::PASS_Proj_Test_ufDA() in R. The key arguments of are mean_diff_fnm denoting the group difference function, obs.design specifying the observation design and cov.par specifying the covariance structure of the latent trajectory. We will demonstrate how to supply each of the parameters in the function from the statement of the problem.

obs.design <- list("design" = "longitudinal",
                  "visit.schedule" = c(3,6,12,18,24), # pre-determined visit times, other than baseline
                  "visit.window" = 0.5) # visit window
nobs_per_subj <- length(obs.design$visit.schedule) + 1 # number of observation per subject is 6.
# The function will internally create a data set
# and a factor variable Subject denoting subject id.
# You can change time and Subject by any other name as well.
cor.str <- nlme::corCAR1(0.5, form = ~ time | Subject);
# The marginal sd at each time-point is assumed to be 5. 
cov.par  <- list("var" = 25, "cor" = cor.str) 
# We must set cov.type = "ST" for any covariance 
# structure belonging to nlme::corClasses. 
cov.type <- "ST"
mean_diff_fn  <- function(t){2*t}
mean_diff_fnm <- "mean_diff_fn" # the name of mean difference function needs to
                                # be specified with the argument. 
missing_type <- "nomiss"
missing_percent <- 0 # Has be to a number between 0 and 0.8
alloc.ratio <- c(1,1)
target.power <- 0.8
sig.level  <- 0.05
eval_SS <- 5000
fpca_method <- "fpca.sc"
sigma2.e <- 0.001
fpca_optns  <- list("pve" = 0.95) 
mean_diff_add_args  <- list() 
nWgrid <- 201
Distribution of minimum sample size using fPASS::fpca.sc() function:

Now that we have explained how to specify each and every arguments of fPASS::PASS_Proj_Test_ufDA() function, we will compute the minimum sample size required. However, as the testing procedure internally computes the eigenfunctions based on large number of samples and then uses the estimated eigenfunctions to compute the PASS, the computed power and the minimum sample size will be different due to the fact that the estimated eigenfunctions are different at each run because of sampling variation. Therefore, it is important to run the function fPASS::PASS_Proj_Test_ufDA() for few iterations to see how the sampling variation affects the computed power or sample size. In this example, we run the same function in parallel for about 10 times, and present the percentiles of the estimated sample size. It is recommended to run the the function at least about 50 to 100 times and use the median of these computed sample sizes obtained after a reasonable number of iteration, as a point estimate of the minimum sample size and the interquartile range (IQR) as a valid measure of range of the sample size that could be considered. From a practical perspective, depending on the budget of the trial, one can choose the minimum number of subjects that will be enrolled for the trial based on the value of IQR.

library(foreach)
required_sample_size <- foreach(i=1:10, .combine='c', .packages = "fPASS") %do% {
        mean_diff_fn <- mean_diff_fn
        PASS_Proj_Test_ufDA(sample_size = NULL, target.power = target.power,
                            sig.level = sig.level, nobs_per_subj = nobs_per_subj,
                            obs.design = obs.design, mean_diff_fnm = "mean_diff_fn",
                            cov.type = cov.type, cov.par = cov.par,
                            sigma2.e = sigma2.e, missing_type = missing_type,
                            missing_percent = missing_percent, eval_SS = eval_SS,
                            alloc.ratio = alloc.ratio, fpca_method = fpca_method,
                            mean_diff_add_args = mean_diff_add_args,
                            fpca_optns = fpca_optns, npc_to_use = NULL,
                            nsim = 1e3)$required_SS
}
quantile(required_sample_size, probs = c(0.25,0.5,0.75), names = TRUE)

We have intentionally left out the discussion of npc_to_use argument for the fPASS::PASS_Proj_Test_ufDA() function. The number of eigenfunction to use to compute the PASS is not before hand. Therefore, for a specified sampling design and covariance structure of the response trajectory, it is not possible to to provide an appropriate value unless the user has an substantial information about the eigencomponents before. This is where the function fPASS::Extract_Eigencomp_fDA() can assist. If we look at the help page of that function, you can notice that except for one argument, all of arguments of that function matches that of fPASS::PASS_Proj_Test_ufDA(). Therefore, for a specified sampling design and covariance parameter we can take a look the estimated eigenfunctions and eigenvalues to have a more informed decision how many eigencomponents to use to in the main function to conduct the PASS analysis. Here is an example of concering to in our case study.

Estimate eigenfunctions using fPASS::fpca.sc() function
fpca_method <- "fpca.sc"

library(foreach)
eigenlist <- foreach(i=1:3, .packages = "fPASS") %do% {
  mean_diff_fn <- mean_diff_fn
eigencomp <- Extract_Eigencomp_fDA(nobs_per_subj = nobs_per_subj,
                      obs.design = obs.design, mean_diff_fnm = mean_diff_fnm,
                      cov.type = cov.type, cov.par = cov.par,
                      sigma2.e = sigma2.e, missing_type = missing_type,
                      missing_percent = missing_percent, eval_SS = eval_SS,
                      alloc.ratio = alloc.ratio, fpca_method = fpca_method,
                      mean_diff_add_args = mean_diff_add_args,
                      fpca_optns = fpca_optns,
                      data.driven.scores = FALSE) 
 eigencomp[c("working.grid", "est_eigenfun", "est_eigenval")]
}
print(lapply(eigenlist, function(x) x$est_eigenval))

For a specified PVE of $95\%$, fPASS::fpca.sc() function detects two leading eigenfunction with non-zero eigenvalues for the covariance structure in most of the cases. Next, we see what happens by using face::face.sparse() function for the same PVE The face::face.sparse() function is much slower than the fPASS::fpca.sc(), so we have kept eval_SS argument at $500$ to not significantly increase the run time of the vignette. The user are advised to use a large value of eval_SS in order to get high enough precision for the estimated eigenfunction.

Estimate eigenfunctions using face::face.sparse() function
fpca_method <- "face"
# Setting eval_SS to 500, specifically for 
# the face::face.sparse() function
# to ensure that the vignette builds within
# real time. The user must set a much larger value
# of eval_SS e.g. ranging between 2000-5000, to ensure
# enough precision accuracy for the eigenfunctions. 
eval_SS  <- 500 # increase it to 2000 for practical case

library(foreach)
# library(doParallel)
# cl <- makeCluster(detectCores()-1)
# registerDoParallel(cl)
eigenlist <- foreach(i=1:3, .packages = "fPASS") %do% {
  mean_diff_fn <- mean_diff_fn
eigencomp <- Extract_Eigencomp_fDA(nobs_per_subj = nobs_per_subj,
                      obs.design = obs.design, mean_diff_fnm = mean_diff_fnm,
                      cov.type = cov.type, cov.par = cov.par,
                      sigma2.e = sigma2.e, missing_type = missing_type,
                      missing_percent = missing_percent, eval_SS = eval_SS,
                      alloc.ratio = alloc.ratio, fpca_method = fpca_method,
                      mean_diff_add_args = mean_diff_add_args,
                      fpca_optns = fpca_optns,
                      data.driven.scores = FALSE) 
 eigencomp[c("working.grid", "est_eigenfun", "est_eigenval")]
}
# stopCluster(cl)
# matplot(eigencomp$working.grid, eigencomp$est_eigenfun, type="l", 
#         xlab = "timepoints (scaled)", ylab = "eigenfunctions")
print(lapply(eigenlist, function(x) x$est_eigenval))

We notice that face::face.sparse() function is detecting three eigenfunction in most of the cases, for the same setup. However, the eigenvalue corresponding to the last eigenfunction seems to be small, and that is possibly the reason it was not detected by fPASS::fpca.sc() function. This provides us a more informed decision about the number of eigenfunctions to use. So, technically we should discard the last eigenfunction to compute the projection of mean difference onto the eigenfunctions, and use npc_to_use = 2 in the fPASS::PASS_Proj_Test_ufDA function, so that the power function is not overinflated by detection of extra eigenfunctions. However, for better understanding of the reader, we will still use npc_to_use = 3 for fpca_method == 'face' to find out how the distribution of minimum sample size required is allowing an extra eigenfunction (with possibly zero eigenvalues) for fpca_method == 'face'. The number of replication is kept at 10, to ensure that the vignette runs within a reasonable amount of time, please increase it for more comprehensive understanding of the effect of estimating eigenfunctions on range of the minimum sample size required.

Distribution of minimum sample size using face::face.sparse() function with pre-specified number of eigenfunction.
npc_to_use <- 3
fpca_method <- "face"

library(foreach)
required_sample_size <- foreach(i=1:5, .combine='c', .packages = "fPASS") %do% {
        mean_diff_fn <- mean_diff_fn
        PASS_Proj_Test_ufDA(sample_size = NULL, target.power = target.power,
                            sig.level = sig.level, nobs_per_subj = nobs_per_subj,
                            obs.design = obs.design, mean_diff_fnm = mean_diff_fnm,
                            cov.type = cov.type, cov.par = cov.par,
                            sigma2.e = sigma2.e, missing_type = missing_type,
                            missing_percent = missing_percent, eval_SS = 500,
                            alloc.ratio = alloc.ratio, fpca_method = fpca_method,
                            mean_diff_add_args = mean_diff_add_args,
                            fpca_optns = fpca_optns, npc_to_use = npc_to_use,
                            nsim = 1e3)$required_SS
}
quantile(required_sample_size, probs = c(0.25,0.5,0.75), names = TRUE)

As expected, using a higher value of npc_to_use leads to over inflation of power, and consequently computed sample size is significantly less than which we obtained for fpca_method = 'fpca.sc'. But, this comes at cost of over inflation of type I error of the test. Therefore, for a specified correlation structure of the response, we must take extra care to NOT use unnecessary eigenfunctions, to avoid the risk of inflation of inflated type I error.

Missing data

Under the same setup, if We want to compute the minimum sample size required when the percentage of missing observation at each time point is around $25\%$, then we can see the distribution of sample size required after running the function about 100 times as follows.

missing_type <- "constant"
missing_percent <- 0.25
fpca_method <- "fpca.sc"
eval_SS  <- 5000

library(foreach)
required_sample_size <- foreach(i=1:10, .combine='c', .packages = "fPASS") %do% {
        mean_diff_fn <- mean_diff_fn
        PASS_Proj_Test_ufDA(sample_size = NULL, target.power = target.power,
                            sig.level = sig.level, nobs_per_subj = nobs_per_subj,
                            obs.design = obs.design, mean_diff_fnm = "mean_diff_fn",
                            cov.type = cov.type, cov.par = cov.par,
                            sigma2.e = sigma2.e, missing_type = missing_type,
                            missing_percent = missing_percent, eval_SS = eval_SS,
                            alloc.ratio = alloc.ratio, fpca_method = fpca_method,
                            mean_diff_add_args = mean_diff_add_args,
                            fpca_optns = fpca_optns, npc_to_use = 2, nsim = 1e3)$required_SS
}
quantile(required_sample_size, probs = c(0.25,0.5,0.75), names = TRUE)

The IQR of the distribution of the minimum sample sizes when the percentage of missing is as significant as $25\%$ is still about the same as that when there is no missing, i.e. missing_type == 'nomiss' and missing_percent = 0. This reflects one of the biggest advantage of the projection-based test compared to traditional mixed model approaches, that as long as the missing percentage do not affect the estimation of the eigencomponents, the power of the test does not degrade with the increasing percentage of missing responses.

Do try it out the same example in NCSS PASS software and check out the minimum sample size required for the test to achieve a power of $80\%$, and how does the minimum sample size increases when you increase the missing percentage to $25\%$, to get a fair comparison of the effectiveness of our procedure compared to traditional GEE type tests. To do this, go to NCSS PASS (2023) software, GEE -> GEE Tests for the Slope of Two groups in a Repeated Measures Design (Continuous Outcome) -> Choose AR(1) proportional correlation structure and Number of measurement times as 6.

We report the total computation time it needed create the entire vignette is approximately r round((proc.time()[3] - st[3])/60) minutes.



Try the fPASS package in your browser

Any scripts or data that you put into this service are public.

fPASS documentation built on July 26, 2023, 5:08 p.m.