StructFDR: False Discovery Rate (FDR) Control Integrating a General...
In StructFDR: False Discovery Control Procedure Integrating the Prior Structure Information

StructFDR

R Documentation

False Discovery Rate (FDR) Control Integrating a General Prior Structure

Description

The procedure is based on an empirical Bayes hierarchical model, where a structure-based prior distribution is designed to utilize the prior structure information. A moderated statistic based on posterior mean is used for permutation-based FDR control. By borrowing information from neighboring features defined based a distance metric, it is able to improve the statistical power of detecting associated features while controlling the FDR at desired levels.

Usage

StructFDR(X, Y, D, test.func, perm.func, eff.sign = TRUE, B = 20, q.cutoff = 0.5,
          alpha = 1, adaptive = c('Fisher', 'Overlap'), alt.FDR = c('BH', 'Permutation'),
           ...)

Arguments

`X`	a data matrix, rows are the features and columns are the samples.
`Y`	a vector of the phenotypic values, where association tests are being assessed
`D`	a matrix of pairwise distances between features. It determines how the features are related and the neighborhood for information borrowing.
`test.func`	a function that performs the actual tests. It takes `X`, `Y` and ... as the inputs, and returns a list with two slots `p.value` and `e.sign`, which are vectors of p-values and signs of the effects.
`perm.func`	a function that performs the permutation tests. It takes `X`, `Y` and ... as the inputs, and returns a list with two slots `X` and `Y`, which contain the permuted data.
`eff.sign`	a logical value indicating whether the direction of the effects should be considered. If it is true (default), negative and positive effects provide conflicting information.
`B`	the number of permutations. The default is 20. If computation time is not a big concern, `B=100` is suggested to achieve excellent reproducibility between different runs.
`q.cutoff`	the quantile cutoff to determine the feature sets to estimate the number of false positives under the null. This cutoff is to protect the signal part of the distributions. The default is 0.5.
`alpha`	the exponent applied to the distance matrix. Large values have more smoothing effects for closely related species. The default is 1. If the underlying structure assumption is considered to be very strong, robustness can be improved by decreasing the value to 0.5.
`adaptive`	the proposed procedure is most powerful when the signal structure conforms to the prior structure. When this assumption is seriously violated, it loses power. We provide two heuristic adaptive approaches to compensate the power loss in such situations. 'Fisher' approach compares the number of hits by our method to the alternative FDR approach at an FDR of 20% and uses the alternative FDR approach if the number of hits is significantly less based on Fisher's exact test; 'Overlap' method selects the alternative approach when it fails to identify half of the hits from the alternative approach at an FDR of 20%. The default is 'Fisher'.
`alt.FDR`	the alternative FDR control used when the proposed approach is powerless. The default is 'BH' procedure and another option is the permutation-based FDR control ('Permutation')
`...`	further arguments such as covariates to be passed to `test.func`

Value

`p.adj`	StructFDR adjusted p-values.
`p.unadj`	raw p-values.
`z.adj`	moderated z-values. The scale may be different from the raw z-values.
`z.unadj`	raw z-values.
`k`, `rho`	the estimates of the hyperparameters. The values indicate the informativeness of the prior structure.

Author(s)

Jun Chen

References

Xiao, J., Cao, H., Chen, J. (2017). False discovery rate control incorporating phylogenetic tree increases detection power in microbiome-wide multiple testing. Bioinformatics, 33(18), 2873-2881.

Examples

require(ape) 
require(nlme)
require(cluster)
require(StructFDR)

# Generate a caelescence tree and partition into 10 clusters
set.seed(1234)
n <- 20
p <- 200
tree <- rcoal(p)
# Pairwise distance matrix. 
D <- cophenetic(tree)
clustering <- pam(D, k=10)$clustering

# Simulate case-control data, assuming cluster 2 is differential  
X.control <- matrix(rnorm(n*p), p, n)
X.case <- matrix(rnorm(n*p), p, n)
eff.size <- rnorm(sum(clustering == 2), 0.5, 0.2)     
X.case[clustering == 2, ] <- X.case[clustering == 2, ] + eff.size
X <- cbind(X.control, X.case)
Y <- gl(2, n) 

# Define testing and permutation function
test.func <- function (X, Y) {
	obj <- apply(X, 1, function(x) {
				ttest.obj <- t.test(x ~ Y)
				c(ttest.obj$p.value, sign(ttest.obj$statistic))
			})
    return(list(p.value=obj[1, ], e.sign=obj[2, ]))
}

perm.func <- function (X, Y) {
	return(list(X=X, Y=sample(Y)))
}

# Call StructFDR
tree.fdr.obj <- StructFDR(X, Y, D, test.func, perm.func)

# Compare StructFDR and BH
tree.fdr.obj$p.adj
tree.fdr.obj$p.adj[clustering == 2]
BH.p.adj <- p.adjust(tree.fdr.obj$p.unadj, 'fdr')
BH.p.adj[clustering == 2]

# Adjusted statistics vs clustering
par(mfrow=c(1, 2))
plot(clustering, tree.fdr.obj$z.unadj)
plot(clustering, tree.fdr.obj$z.adj)

StructFDR documentation built on May 29, 2024, 4:46 a.m.