power.estimate: Power Estimation by Generalized Model-Free Knockoffs Filter

Description

This function estimates the power and false discovery rate (FDR) of the knockoffs filter prior to data collection. Given the dimensions of the data (sample size and number of covariates) and the expected data structure and association type, it simulates data repeatedly and returns the expected values of power and FDR.

Usage

power.estimate(n, p, X.dist = c("Gaussian", "Binary", "Exponential"),
               X.mu = rep(0, p), X.cov = diag(p),
               beta = NULL, numTrue = NULL, percentTrue = NULL, amplitude = 1,
               association = c("linear", "power", "exponential", "cosine"),
               power.degree = 2,
               link = c("identity", "logit", "survival"), family = NULL,
               surv.lambdaT = .002, surv.lambdaC = .004, surv.shape = 1,
               nIterations = 10, ...)

Arguments

n

sample size

p

number of covariates, including null variables

X.dist

distribution of design matrix. Either "Gaussian", "Binary" or "Exponential"

X.mu

expected values for X, a vector of length p (default: zero vector of length p)

X.cov

variance-covariance matrix (p by p) for X (default: identity)

beta

coefficients for p variables if known, a vector of length p

numTrue

number of true signals among p variables

percentTrue

percentage of true signals among p variables

amplitude

signal amplitude

association

association between predictors and response (on the scale of the linear predictor). The linear predictor is X*beta when the argument is "linear", X^power.degree*beta when "power", exp(X)*beta when "exponential", and cos(X)*beta when "cosine".

power.degree

power degree when the "power" association is selected (default: 2)

link

link function between the linear predictor and the response: "identity" for the identity link and "logit" for the logit link. If "survival" is selected, a survival response is generated using the hazard function of the Cox model.

family

an mboost family object, for example Binomial(), Binomial(link = "logit", type = "glm"), Gaussian(), Poisson(), CoxPH(), Cindex(), GammaReg(), NBinomial(), Weibull(), Loglog(), Lognormal(), etc. See the mboost documentation for details.

surv.lambdaT

baseline hazard in survival response, default: 0.002

surv.lambdaC

hazard of censoring in survival response, default: 0.004

surv.shape

shape parameter of the Weibull distribution, default: 1

nIterations

number of runs to get the means / distributions of estimated power and FDR

...

further arguments passed to function selection
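
As a rough illustration of how the association and link arguments might combine, the base-R sketch below builds the linear predictor for each association type and then draws a response under each link. It is not the package source: the helpers make_eta and make_response are hypothetical names, the identity link assumes unit-variance Gaussian noise, and the survival branch assumes Weibull event times under a proportional-hazards model with exponential censoring, which is only one plausible reading of surv.lambdaT, surv.lambdaC and surv.shape.

```r
# Hypothetical helpers; a sketch under stated assumptions, not the varsel code.
make_eta <- function(X, beta, association = "linear", power.degree = 2) {
  # Transform the design matrix according to the association type
  Z <- switch(association,
              linear      = X,
              power       = X^power.degree,
              exponential = exp(X),
              cosine      = cos(X),
              stop("unknown association: ", association))
  drop(Z %*% beta)
}

make_response <- function(eta, link = "identity",
                          surv.lambdaT = 0.002, surv.lambdaC = 0.004,
                          surv.shape = 1) {
  n <- length(eta)
  switch(link,
         identity = eta + rnorm(n),   # assumption: unit-variance Gaussian noise
         logit    = rbinom(n, size = 1, prob = plogis(eta)),
         survival = {
           # Inverse-transform sampling for a Weibull proportional-hazards model:
           # S(t | eta) = exp(-lambdaT * t^shape * exp(eta))
           event  <- (-log(runif(n)) / (surv.lambdaT * exp(eta)))^(1 / surv.shape)
           censor <- rexp(n, rate = surv.lambdaC)  # assumption: exponential censoring
           data.frame(time   = pmin(event, censor),
                      status = as.integer(event <= censor))
         },
         stop("unknown link: ", link))
}

set.seed(1)
n <- 200; p <- 10
X    <- matrix(rnorm(n * p), n, p)    # "Gaussian" design: zero mean, identity covariance
beta <- c(rep(1, 3), rep(0, p - 3))   # three true signals with amplitude 1

eta_cos <- make_eta(X, beta, association = "cosine")
y_bin   <- make_response(make_eta(X, beta), link = "logit")
y_surv  <- make_response(make_eta(X, beta), link = "survival")
```

Here y_bin is a 0/1 vector and y_surv is a data frame of observed times with an event indicator (1 = event, 0 = censored).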

Details

At least one of the three arguments beta, numTrue, and percentTrue must be specified; otherwise an error is raised. For now, the signal amplitude is identical for all true signals. Generalizations could be made in the future.
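
The rule above can be sketched as a small helper that derives a coefficient vector from whichever argument is supplied. resolve_beta is a hypothetical name, and the precedence order (beta, then numTrue, then percentTrue), the random placement of the signals, and the treatment of percentTrue as a fraction between 0 and 1 are all assumptions rather than documented behaviour.

```r
# Hypothetical helper; a sketch of the argument check, not the varsel code.
resolve_beta <- function(p, beta = NULL, numTrue = NULL,
                         percentTrue = NULL, amplitude = 1) {
  if (!is.null(beta)) {
    stopifnot(length(beta) == p)
    return(beta)
  }
  if (is.null(numTrue)) {
    if (is.null(percentTrue))
      stop("specify at least one of beta, numTrue, percentTrue")
    numTrue <- round(p * percentTrue)   # assumption: percentTrue is a fraction
  }
  out <- numeric(p)
  out[sample.int(p, numTrue)] <- amplitude   # same amplitude for every true signal
  out
}

set.seed(2)
b1 <- resolve_beta(p = 20, numTrue = 4, amplitude = 1.5)
b2 <- resolve_beta(p = 20, percentTrue = 0.1)
```

Calling resolve_beta(p = 20) with none of the three arguments stops with an error, mirroring the requirement stated above.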

Value

A list containing the expected value of power, a list of power values from all experiments, the standard deviation of power, and the mean FDR achieved (expected to be close to the target value).
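
To make these quantities concrete, the sketch below computes power and the false discovery proportion for one set of selected variables and then aggregates over several runs. Everything here is illustrative: power_fdp and the element names of result are hypothetical, and the random "selection" is a stand-in with made-up probabilities, whereas in practice the selected set would come from the knockoffs filter.

```r
# Power = fraction of true signals selected;
# FDP   = fraction of selected variables that are null.
power_fdp <- function(selected, beta) {
  truth <- which(beta != 0)
  c(power = length(intersect(selected, truth)) / max(length(truth), 1),
    fdp   = length(setdiff(selected, truth)) / max(length(selected), 1))
}

set.seed(3)
beta <- c(rep(1, 5), rep(0, 15))
nIterations <- 10

# Stand-in for a selection procedure: keeps each true signal with
# probability 0.8 and each null with probability 0.05 (made-up numbers).
runs <- t(replicate(nIterations, {
  keep <- runif(length(beta)) < ifelse(beta != 0, 0.8, 0.05)
  power_fdp(which(keep), beta)
}))

# Hypothetical element names, mirroring the description of the return value
result <- list(power.mean = mean(runs[, "power"]),
               power.list = runs[, "power"],
               power.sd   = sd(runs[, "power"]),
               fdr.mean   = mean(runs[, "fdp"]))
```

Averaging the per-run FDP over iterations gives an empirical estimate of the FDR, which is why the mean is expected to sit near the target level.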

References

Candes et al., Panning for Gold: Model-free Knockoffs for High-dimensional Controlled Variable Selection, arXiv:1610.02351 (2016). https://statweb.stanford.edu/~candes/MF_Knockoffs/index.html

Barber and Candes, Controlling the false discovery rate via knockoffs. Ann. Statist. 43 (2015), no. 5, 2055–2085. https://projecteuclid.org/euclid.aos/1438606853

Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid (2014). Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29, 3–35. http://dx.doi.org/10.1007/s00180-012-0382-5 Available as vignette via: vignette(package = "mboost", "mboost_tutorial")


hanfu-bios/varsel documentation built on March 19, 2018, 10:08 a.m.