reconsi: Perform simultaneous inference through collapsed resampling...

View source: R/F_reconsi.R

reconsiR Documentation

Perform simultaneous inference through collapsed resampling null distributions

Description

Perform simultaneous inference through collapsed resampling null distributions

Usage

reconsi(
  Y,
  x = NULL,
  B = 1000L,
  test = "wilcox.test",
  argList = list(),
  distFun = "pnorm",
  zValues = TRUE,
  testPargs = list(),
  z0Quant = 0.25,
  gridsize = 801L,
  maxIter = 100L,
  tol = 1e-06,
  zVals = NULL,
  center = FALSE,
  replace = is.null(x),
  assumeNormal = TRUE,
  estP0args = list(z0quantRange = seq(0.05, 0.45, 0.0125), smooth.df = 3, evalVal = 0.05),
  resamZvals = FALSE,
  smoothObs = TRUE,
  scale = FALSE,
  tieBreakRan = FALSE,
  pi0 = NULL,
  resamAssumeNormal = TRUE
)

Arguments

Y

the matrix of sequencing counts

x

a grouping factor. If provided, this grouping factor is permuted. Otherwise a bootstrap procedure is performed

B

the number of resampling instances

test

Character string, giving the name of the function to test for differential absolute abundance. Must accept the formula interface. Features with tests resulting in observed NA test statistics will be discarded

argList

A list of arguments, passed on to the testing function

distFun

the distribution function of the test statistic, or its name. Must at least accept an argument named 'q', 'p' and 'x' respectively.

zValues

A boolean, should test statistics be converted to z-values. See details

testPargs

A list of arguments passed on to distFun

z0Quant

A vector of length 2 of quantiles of the null distribution, in between which only null values are expected

gridsize

The number of bins for the kernel density estimates

maxIter

An integer, the maximum number of iterations in the estimation of the null distribution

tol

The tolerance for the infinity norm of the central borders in the iterative procedure

zVals

An optional list of observed (statObs) and resampling (statsPerm) z-values. If supplied, the calculation of the observed and resampling test statistics is skipped and the algorithm proceeds with calculation of the consensus null distribution

center

A boolean, should observations be centered in each group prior to permuations? See details.

replace

A boolean. Should resampling occur with replacement (boostrap) or without replacement (permutation)

assumeNormal

A boolean, should normality be assumed for the null distribution?

estP0args

A list of arguments passed on to the estP0 function

resamZvals

A boolean, should resampling rather than theoretical null distributions be used?

smoothObs

A boolean, should the fitted rather than estimated observed distribution be used in the Fdr calculation?

scale

a boolean, should data be scaled prior to resampling

tieBreakRan

A boolean, should ties of resampling test statistics be broken randomly? If not, midranks are used

pi0

A known fraction of true null hypotheses. If provided, the fraction of true null hypotheses will not be estimated. Mainly for oracle purposes.

resamAssumeNormal

A boolean, should normality be assumed for resampling dists

Details

Efron (2007) centers the observations in each group prior to permutation. As permutations will remove any genuine group differences anyway, we skip this step by default. If zValues = FALSE, the density is fitted on the original test statistics rather than converted to z-values. This unlocks the procedure for test statistics with unknown distributions, but may be numerically less stable.

Value

A list with entries

statsPerm

Resampling Z-values

statObs

Observed Z-values

distFun

Density, distribution and quantile function as given

testPargs

Same as given

zValues

A boolean, were z-values used?

resamZvals

A boolean, were the resampling null distribution used?

cdfValObs

Cumulative distribution function evaluation of observed test statistics

p0estimated

A boolean, was the fraction of true null hypotheses estimated from the data?

Fdr,fdr

Estimates of tail-area and local false discovery rates

p0

Estimated or supplied fraction of true null hypotheses

iter

Number of iterations executed

fitAll

Mean and standard deviation estimated collapsed null

PermDensFits

Mean and standard deviations of resamples

convergence

A boolean, did the iterative algorithm converge?

zSeq

Basis for the evaluation of the densities

weights

weights of the resampling distributions

zValsDensObs

Estimated overall densities, evaluated in zSeq

Note

Ideally, it would be better to only use unique resamples, to avoid unnecesarry replicated calculations of the same test statistics. Yet this issue is almost alwyas ignored in practice; as the sample size grows it also becomes irrelevant. Notice also that this would require to place weights in case of the bootstrap, as some bootstrap samples are more likely than others.

Examples

#Important notice: low number of resamples B necessary to keep
# computation time down, but not recommended. Pray set B at 200 or higher.
p = 60; n = 20; B = 5e1
x = rep(c(0,1), each = n/2)
mat = cbind(
matrix(rnorm(n*p/10, mean = 5+x), n, p/10), #DA
matrix(rnorm(n*p*9/10, mean = 5), n, p*9/10) #Non DA
)
fdrRes = reconsi(mat, x, B = B)
fdrRes$p0
#Indeed close to 0.9
estFdr = fdrRes$Fdr
#The estimated tail-area false discovery rates.

#With another type of test. Need to supply quantile function in this case
fdrResLm = reconsi(mat, x, B = B,
test = function(x, y){
fit = lm(y~x)
c(summary(fit)$coef["x","t value"], fit$df.residual)},
distFun = function(q){pt(q = q[1], df = q[2])})

#With a test statistic without known null distribution(for small samples)
fdrResKruskal = reconsi(mat, x, B = B,
test = function(x, y){
kruskal.test(y~x)$statistic}, zValues = FALSE)

#Provide an additional covariate through the 'argList' argument
z = rpois(n , lambda = 2)
fdrResLmZ = reconsi(mat, x, B = B,
test = function(x, y, z){
fit = lm(y~x+z)
c(summary(fit)$coef["x","t value"], fit$df.residual)},
distFun = function(q){pt(q = q[1], df = q[2])},
argList = list(z = z))

#When nog grouping variable is provided, a bootstrap is performed
matBoot = cbind(
matrix(rnorm(n*p/10, mean = 1), n, p/10), #DA
matrix(rnorm(n*p*9/10, mean = 0), n, p*9/10) #Non DA
)
fdrResBoot = reconsi(matBoot, B = B,
test = function(y, x){testRes = t.test(y, mu = 0, var.equal = TRUE);
c(testRes$statistic, testRes$parameter)},
distFun = function(q){pt(q = q[1], df = q[2])},
center = TRUE, replace = TRUE)

CenterForStatistics-UGent/rransi documentation built on Nov. 13, 2023, 2:07 a.m.