generate_null: Generate empirical null distribution using permuted data.

Description Usage Arguments Value See Also Examples

View source: R/generate_null.R

Description

This function runs locus on data with permuted response sample labels. A common use is for estimatimation of data-driven false discovery rates of given thresholds on the posterior probabilities of inclusion.

Usage

1
2
3
4
generate_null(n_perm, Y, X, p0_av, Z = NULL, link = "identity",
  ind_bin = NULL, list_hyper = NULL, list_init = NULL,
  list_blocks = NULL, user_seed = NULL, tol = 0.001, maxit = 1000,
  verbose = TRUE, results_dir = NULL, n_cpus = 1)

Arguments

n_perm

Number of permuted datasets on which locus is run.

Y

Response data matrix (without permuted sample indices) of size n x d, where n is the number of observations and d is the number of response variables.

X

Input matrix of size n x p, where p is the number of candidate predictors. X cannot contain NAs. No intercept must be supplied.

p0_av

Prior average number of predictors expected to be included in the model (if list_blocks is TRUE, average number per predictor blocks). Must be NULL if list_init and list_hyper are both non-NULL or if list_cv is non-NULL. If list_blocks is NULL, can also be a vector of length p with entry s corresponding to the prior probability that candidate predictor s is associated with at least one response.

Z

Covariate matrix of size n x q, where q is the number of covariates. NULL if no covariate. Factor covariates must be supplied after transformation to dummy coding. No intercept must be supplied.

link

Response link. Must be "identity" for linear regression, "logit" for logistic regression, "probit" for probit regression, or "mix" for a mix of identity and probit link functions (in this case, the indices of the binary responses must be gathered in argument ind_bin, see below).

ind_bin

If link = "mix", vector of indices corresponding to the binary variables in Y. Must be NULL if link != "mix".

list_hyper

An object of class "hyper" containing the model hyperparameters. Must be filled using the set_hyper function or must be NULL for default hyperparameters.

list_init

An object of class "init" containing the initial variational parameters. Must be filled using the set_init function or be NULL for a default initialization.

list_blocks

An object of class "blocks" containing setting for parallel inference on a partitioned predictor space. Must be filled using the set_blocks function or must be NULL for no partitioning.

user_seed

Seed set for reproducible default choices of hyperparameters (if list_hyper is NULL) and inital variational parameters (if list_init is NULL). Default is NULL, no seed set.

tol

Tolerance for the stopping criterion.

maxit

Maximum number of iterations allowed.

verbose

If TRUE, messages are displayed during execution.

results_dir

Path where the output of each of the n_perm runs will be saved. Default is NULL, the output is not saved to files and a list of objects of class "perm" is returned instead.

n_cpus

Number of CPUs to be used. If large, one should ensure that enough RAM will be available for parallel execution. Set to 1 for serial execution.

Value

If results_dir is NULL, list of length n_perm with each element corresponding to the output of locus on one permuted dataset. Each contains:

ind_perm

Vector of length n containing the permuted response sample labels.

gam_vb

Posterior inclusion probability matrix of size p x d. Entry (s, t) corresponds to the posterior probability of association between predictor s and response t.

om_vb

Vector of length p containing the posterior mean of omega. Entry s controls the proportion of responses associated with candidate predictor s.

See Also

locus

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
seed <- 123; set.seed(seed)

###################
## Simulate data ##
###################

## Example using small problem sizes:
##
n <- 200; p <- 250; p0 <- 20; d <- 25; d0 <- 20

## Candidate predictors (subject to selection)
##
# Here we simulate common genetic variants (but any type of candidate
# predictors can be supplied).
# 0 = homozygous, major allele, 1 = heterozygous, 2 = homozygous, minor allele
#
X_act <- matrix(rbinom(n * p0, size = 2, p = 0.25), nrow = n)
X_inact <- matrix(rbinom(n * (p - p0), size = 2, p = 0.25), nrow = n)

shuff_x_ind <- sample(p)
X <- cbind(X_act, X_inact)[, shuff_x_ind]

bool_x_act <- shuff_x_ind <= p0

pat_act <- beta <- matrix(0, nrow = p0, ncol = d0)
pat_act[sample(p0*d0, floor(p0*d0/5))] <- 1
beta[as.logical(pat_act)] <-  rnorm(sum(pat_act))

## Gaussian responses
##
Y_act <- matrix(rnorm(n * d0, mean = X_act %*% beta, sd = 0.5), nrow = n)
Y_inact <- matrix(rnorm(n * (d - d0), sd = 0.5), nrow = n)
shuff_y_ind <- sample(d)
Y <- cbind(Y_act, Y_inact)[, shuff_y_ind]

########################
## Infer associations ##
########################

list_perm <- generate_null(n_perm = 10, Y, X, p0_av = p0, link = "identity",
                           user_seed = seed, verbose = FALSE)

hruffieux/locus documentation built on Oct. 22, 2018, 6:54 a.m.