succotash: Surrogate and Confounder Correction Occurring Together with...


Description

This function implements the full SUCCOTASH method. First, it rotates the response and explanatory variables into a part that we use to estimate the confounding variables and the variances, and a part that we use to estimate the coefficients of the observed covariates. This function will implement a factor analysis for the first part then run succotash_given_alpha for the second part.

Usage

succotash(Y, X, k = NULL, sig_reg = 0.01, num_em_runs = 2,
  z_start_sd = 1, two_step = TRUE, fa_method = c("pca", "reg_mle",
  "quasi_mle", "homoPCA", "pca_shrinkvar", "mod_fa", "flash_hetero", "non_homo",
  "non_hetero", "non_shrinkvar"), lambda_type = c("zero_conc", "ones"),
  mix_type = c("normal", "uniform"), likelihood = c("normal", "t"),
  lambda0 = 10, tau_seq = NULL, em_pi_init = NULL,
  plot_new_ests = FALSE, em_itermax = 200, var_scale = TRUE,
  inflate_var = 1, optmethod = c("coord", "em"), use_ols_se = FALSE,
  z_init_type = c("null_mle", "random"), var_scale_init_type = c("null_mle",
  "one", "random"))
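
A minimal sketch of a call, assuming the package is installed under the name succotashr and exports succotash. The simulated data, the sizes, and the choice k = 1 are illustrative assumptions, not recommended settings; k is set explicitly so that sva is not needed.

```r
# Simulated inputs; all sizes and values here are illustrative assumptions.
set.seed(1)
n <- 20
p <- 100
X <- cbind(1, rep(c(0, 1), length.out = n))  # intercept, then the variable of interest (last column)
Y <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Only attempt the fit when the package is available.
if (requireNamespace("succotashr", quietly = TRUE)) {
  fit <- succotashr::succotash(Y = Y, X = X, k = 1, fa_method = "pca")
  str(fit$betahat)  # posterior means of the coefficients of interest
  str(fit$lfdr)     # local false discovery rates
}
```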

Arguments

Y

An n by p matrix of response variables.

X

An n by q matrix of covariates. Only the variable in the last column is of interest.

k

An integer. The number of hidden confounders. If NULL and sva is installed, this will be estimated by the num.sv function in the sva package, available on Bioconductor.

sig_reg

A numeric. If fa_method is "reg_mle", then this is the value of the regularization parameter.

num_em_runs

An integer. The number of times we should run the EM algorithm.

z_start_sd

A positive numeric. At the beginning of each EM algorithm, Z is initiated with independent mean zero normals with standard deviation z_start_sd.

two_step

A logical. Should we run the two-step SUCCOTASH procedure of inflating the variance (TRUE) or not (FALSE)? Defaults to TRUE.

fa_method

Which factor analysis method should we use? The options are the regularized MLE implemented in factor_mle ("reg_mle"); two methods from the package cate, the quasi-MLE ("quasi_mle") from Bai and Li (2012) and naive PCA ("pca"); FLASH ("flash_hetero"); homoscedastic PCA ("homoPCA"); PCA followed by shrinking the variances using limma ("pca_shrinkvar"); and moderated factor analysis ("mod_fa"). Three methods for no confounder adjustment are also available: "non_homo", "non_shrinkvar", and "non_hetero".

lambda_type

See succotash_given_alpha for options on the regularization parameter of the mixing proportions.

mix_type

Should the prior be a mixture of normals ("normal") or a mixture of uniforms ("uniform")?

likelihood

Which likelihood should we use? Normal ("normal") or t ("t")?

lambda0

If lambda_type = "zero_conc", then lambda0 is the amount to penalize pi0.

tau_seq

A vector of length M containing the standard deviations (not variances) of the mixing distributions.

em_pi_init

A vector of length M containing the starting values of π. If NULL, then one of three options is used to calculate pi_init, based on the value of pi_init_type. Only available for normal mixtures for now.

plot_new_ests

A logical. Should we plot the mixing proportions at each iteration of the EM algorithm?

em_itermax

A positive numeric. The maximum number of iterations to run during the EM algorithm.

var_scale

A logical. Should we update the scaling on the variances (TRUE) or not (FALSE)? Only works for the normal mixtures case right now. Defaults to TRUE.

inflate_var

A positive numeric. The multiplicative amount to inflate the variance estimates by. There is no theoretical justification for it to be anything but 1, but I have it in here to play around with it.

optmethod

Either coordinate ascent ("coord") or an EM algorithm ("em"). Coordinate ascent is currently only implemented in the uniform mixtures case, for which it is the default.

use_ols_se

A logical. Should we use the standard formulas for OLS of Y on X to get the estimates of the variances (TRUE) or not (FALSE)?

z_init_type

How should we initiate the confounders? At the all-null MLE ("null_mle") or from iid standard normals ("random")?

var_scale_init_type

If var_scale = TRUE, how should we initiate the variance inflation parameter? From the all-null MLE ("null_mle"), at no inflation ("one"), or from a chi-squared distribution with one degree of freedom ("random")?

Details

The assumed model is

Y = Xβ + Zα + E.

Y is an n by p matrix of response variables. For example, each row might be an array of log-transformed and quantile-normalized gene-expression data. X is an n by q matrix of observed covariates. All but the last column of X are assumed to be nuisance covariates. For example, the first column might be a vector of ones to include an intercept. β is a q by p matrix of corresponding coefficients. Z is an n by k matrix of confounder variables. α is the corresponding k by p matrix of coefficients for the unobserved confounders. E is an n by p matrix of error terms. E is assumed to be matrix normal with identity row covariance and diagonal column covariance Σ. That is, the columns are heteroscedastic while the rows are homoscedastic and independent.
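
To make the dimensions concrete, data can be simulated from the model in base R. This is only a sketch: the sizes, the sparsity pattern in the last row of beta, and the chi-squared draw for the column variances are assumptions chosen for illustration.

```r
set.seed(42)
n <- 30  # samples
p <- 50  # response variables (e.g. genes)
q <- 2   # observed covariates
k <- 3   # hidden confounders

X <- cbind(1, rnorm(n))                # intercept (nuisance) + covariate of interest
beta <- matrix(0, q, p)
beta[1, ] <- rnorm(p)                  # nuisance coefficients
beta[q, 1:10] <- rnorm(10, sd = 2)     # assumed: only 10 non-null effects of interest
Z <- matrix(rnorm(n * k), n, k)        # unobserved confounders
alpha <- matrix(rnorm(k * p), k, p)    # confounder coefficients
sigma2 <- rchisq(p, df = 5) / 5        # diagonal of the column covariance (assumed form)

# Rows of E are independent; column j has variance sigma2[j].
E <- sweep(matrix(rnorm(n * p), n, p), 2, sqrt(sigma2), `*`)

Y <- X %*% beta + Z %*% alpha + E
```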

This function will first rotate Y and X using the QR decomposition. This separates the model into three parts. The first part only contains nuisance parameters, the second part contains the coefficients of interest, and the third part contains the confounders. succotash applies a factor analysis to the third part to estimate the confounding factors, then runs an EM algorithm on the second part to estimate the coefficients of interest.
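
The rotation can be sketched with base R's qr functions. This is an illustration of the idea rather than the package's internal code; the names Y1, Y2, and Y3 are hypothetical labels for the three parts.

```r
set.seed(7)
n <- 12
p <- 5
q <- 3
X <- cbind(1, matrix(rnorm(n * (q - 1)), n, q - 1))
Y <- matrix(rnorm(n * p), n, p)

qr_x <- qr(X)
R <- qr.R(qr_x)          # q by q upper-triangular factor
QtY <- qr.qty(qr_x, Y)   # t(Q) %*% Y with the full n by n Q
QtX <- qr.qty(qr_x, X)   # equals R stacked on top of zeros

Y1 <- QtY[seq_len(q - 1), , drop = FALSE]  # part 1: absorbed by the nuisance parameters
Y2 <- QtY[q, , drop = FALSE]               # part 2: carries the coefficients of interest
Y3 <- QtY[(q + 1):n, , drop = FALSE]       # part 3: confounders and noise only

# Dividing part 2 by R[q, q] recovers the OLS estimates for the last covariate.
ols_last <- as.numeric(Y2) / R[q, q]
```

Because the rows of t(Q) %*% X below row q are zero, Y3 does not depend on β, which is why it can be used to estimate the confounders and the variances.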

Many forms of factor analysis are available. The default is PCA with the column-wise residual mean-squares as the estimates of the column-wise variances.
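
A rough sketch of the PCA idea using a truncated SVD in base R. The scaling of the factors and the m - k denominator for the residual mean squares are assumptions made for illustration and may differ from what the package does; Y3 here is simulated and stands in for the rotated part used to estimate the confounders.

```r
set.seed(3)
m <- 20  # rows left after the rotation (plays the role of n - q)
p <- 8
k <- 2
Y3 <- matrix(rnorm(m * k), m, k) %*% matrix(rnorm(k * p), k, p) +
  matrix(rnorm(m * p, sd = 0.5), m, p)

sv <- svd(Y3, nu = k, nv = k)
d_k <- diag(sv$d[seq_len(k)], k, k)

Z_hat <- sqrt(m) * sv$u                   # estimated factors (assumed scaling)
alpha_hat <- d_k %*% t(sv$v) / sqrt(m)    # estimated k by p loadings
low_rank <- sv$u %*% d_k %*% t(sv$v)      # rank-k fit to Y3

# Column-wise residual mean squares as variance estimates (assumed denominator).
resid <- Y3 - low_rank
sig_diag <- colSums(resid^2) / (m - k)
```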

Value

See succotash_given_alpha for details of output.

Y1_scaled The OLS estimates.

sig_diag_scaled The estimated standard errors of the estimated effects (calculated from the factor analysis step) times scale_val.

sig_diag The estimates of the gene-wise variances (but not times scale_val).

pi0 A non-negative numeric. The marginal probability of zero.

alpha_scaled The scaled version of the estimated coefficients of the hidden confounders.

Z A vector of numerics. Estimated rotated confounder in second step of succotash.

pi_vals A vector of numerics between 0 and 1. The mixing proportions.

tau_seq A vector of non-negative numerics. The mixing standard deviations (not variances).

lfdr A vector of numerics between 0 and 1. The local false discovery rate. I.e. the posterior probability of a coefficient being zero.

lfsr A vector of numerics between 0 and 1. The local false sign rate. I.e. the posterior probability of making a sign error if one chose the most probable sign.

qvals A vector of numerics between 0 and 1. The q-values. The average error rate if we reject all hypotheses that have smaller q-value.

betahat A vector of numerics. The posterior mean of the coefficients.
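
One common way to turn local false discovery rates into q-values is to average the lfdr over all hypotheses with an lfdr at least as small. The sketch below assumes that construction; it is not confirmed to be the exact computation used by succotash, and the lfdr values are made up.

```r
lfdr <- c(0.5, 0.1, 0.3)  # made-up local false discovery rates

ord <- order(lfdr)
qvals <- numeric(length(lfdr))
# Running mean of the sorted lfdr values, mapped back to the original order.
qvals[ord] <- cumsum(lfdr[ord]) / seq_along(lfdr)

qvals
# 0.3 0.1 0.2
```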

See Also

succotash_given_alpha, factor_mle, succotash_summaries.


dcgerard/succotashr documentation built on May 15, 2019, 1:25 a.m.