reconsi: Perform simultaneous inference through collapsed resampling...
In reconsi: Resampling Collapsed Null Distributions for Simultaneous Inference

Description Usage Arguments Details Value Note Examples

Perform simultaneous inference through collapsed resampling null distributions

reconsi(
  Y,
  x = NULL,
  B = 1000L,
  test = "wilcox.test",
  argList = list(),
  distFun = "pnorm",
  zValues = TRUE,
  testPargs = list(),
  z0Quant = pnorm(c(-1, 1)),
  gridsize = 801L,
  maxIter = 1000L,
  tol = 1e-08,
  center = FALSE,
  replace = is.null(x),
  zVals = NULL,
  estP0args = list(z0quantRange = seq(0.05, 0.45, 0.0125), smooth.df = 3),
  resamZvals = FALSE,
  smoothObs = TRUE,
  scale = FALSE,
  tieBreakRan = FALSE,
  pi0 = NULL
)

`Y`	the matrix of sequencing counts
`x`	a grouping factor. If provided, this grouping factor is permuted. Otherwise a bootstrap procedure is performed
`B`	the number of resampling instances
`test`	Character string, giving the name of the function to test for differential absolute abundance. Must accept the formula interface. Features with tests resulting in observed NA test statistics will be discarded
`argList`	A list of arguments, passed on to the testing function
`distFun`	the distribution function of the test statistic, or its name. Must at least accept an argument named 'q', 'p' and 'x' respectively.
`zValues`	A boolean, should test statistics be converted to z-values. See details
`testPargs`	A list of arguments passed on to distFun
`z0Quant`	A vector of length 2 of quantiles of the null distribution, in between which only null values are expected
`gridsize`	The number of bins for the kernel density estimates
`maxIter`	An integer, the maximum number of iterations in the estimation of the null distribution
`tol`	The tolerance for the infinity norm of the central borders in the iterative procedure
`center`	A boolean, should observations be centered in each group prior to permuations? See details.
`replace`	A boolean. Should resampling occur with replacement (boostrap) or without replacement (permutation)
`zVals`	An optional list of observed (statObs) and resampling (statsPerm) z-values. If supplied, the calculation of the observed and resampling test statistics is skipped and the algorithm proceeds with calculation of the consensus null distribution
`estP0args`	A list of arguments passed on to the estP0 function
`resamZvals`	A boolean, should resampling rather than theoretical null distributions be used?
`smoothObs`	A boolean, should the fitted rather than estimated observed distribution be used in the Fdr calculation?
`scale`	a boolean, should data be scaled prior to resampling
`tieBreakRan`	A boolean, should ties of resampling test statistics be broken randomly? If not, midranks are used
`pi0`	A known fraction of true null hypotheses. If provided, the fraction of true null hypotheses will not be estimated. Mainly for oracle purposes.

Efron (2007) centers the observations in each group prior to permutation. As permutations will remove any genuine group differences anyway, we skip this step by default. If zValues = FALSE, the density is fitted on the original test statistics rather than converted to z-values. This unlocks the procedure for test statistics with unknown distributions, but may be numerically less stable.

A list with entries

`statsPerm`	Resampling Z-values
`statObs`	Observed Z-values
`distFun`	Density, distribution and quantile function as given
`testPargs`	Same as given
`zValues`	A boolean, were z-values used?
`resamZvals`	A boolean, were the resampling null distribution used?
`cdfValObs`	Cumulative distribution function evaluation of observed test statistics
`p0estimated`	A boolean, was the fraction of true null hypotheses estimated from the data?
`Fdr,fdr`	Estimates of tail-area and local false discovery rates
`p0`	Estimated or supplied fraction of true null hypotheses
`iter`	Number of iterations executed
`fitAll`	Mean and standard deviation estimated collapsed null
`PermDensFits`	Mean and standard deviations of resamples
`convergence`	A boolean, did the iterative algorithm converge?
`zSeq`	Basis for the evaluation of the densities
`weights`	weights of the resampling distributions
`zValsDensObs`	Estimated overall densities, evaluated in zSeq

Ideally, it would be better to only use unique resamples, to avoid unnecesarry replicated calculations of the same test statistics. Yet this issue is almost alwyas ignored in practice; as the sample size grows it also becomes irrelevant. Notice also that this would require to place weights in case of the bootstrap, as some bootstrap samples are more likely than others.

#Important notice: low number of resamples B necessary to keep
# computation time down, but not recommended. Pray set B at 200 or higher.
p = 50; n = 20; B = 5e1
x = rep(c(0,1), each = n/2)
mat = cbind(
matrix(rnorm(n*p/10, mean = 5+x), n, p/10), #DA
matrix(rnorm(n*p*9/10, mean = 5), n, p*9/10) #Non DA
)
fdrRes = reconsi(mat, x, B = B)
fdrRes$p0
#Indeed close to 0.9
estFdr = fdrRes$Fdr
#The estimated tail-area false discovery rates.

#With another type of test. Need to supply quantile function in this case
fdrResLm = reconsi(mat, x, B = B,
test = function(x, y){
fit = lm(y~x)
c(summary(fit)$coef["x","t value"], fit$df.residual)},
distFun = function(q){pt(q = q[1], df = q[2])})

#With a test statistic without known null distribution(for small samples)
fdrResKruskal = reconsi(mat, x, B = B,
test = function(x, y){
kruskal.test(y~x)$statistic}, zValues = FALSE)

#Provide an additional covariate through the 'argList' argument
z = rpois(n , lambda = 2)
fdrResLmZ = reconsi(mat, x, B = B,
test = function(x, y, z){
fit = lm(y~x+z)
c(summary(fit)$coef["x","t value"], fit$df.residual)},
distFun = function(q){pt(q = q[1], df = q[2])},
argList = list(z = z))

#When nog grouping variable is provided, a bootstrap is performed
matBoot = cbind(
matrix(rnorm(n*p/10, mean = 1), n, p/10), #DA
matrix(rnorm(n*p*9/10, mean = 0), n, p*9/10) #Non DA
)
fdrResBoot = reconsi(matBoot, B = B,
test = function(y, x){testRes = t.test(y, mu = 0, var.equal = TRUE);
c(testRes$statistic, testRes$parameter)},
distFun = function(q){pt(q = q[1], df = q[2])},
center = TRUE, replace = TRUE)