ash_ruv4: Use control genes to estimate hidden confounders and variance...


View source: R/ash_wrap.R

Description

This function performs a variant of Removing Unwanted Variation 4-step (RUV4) (Gagnon-Bartsch et al., 2013) in which the control genes are used not only to estimate the hidden confounders but also to estimate a variance inflation parameter. This variance inflation step is akin to the "empirical null" approach of Efron (2004). After this procedure, Adaptive SHrinkage (ASH) (Stephens, 2016) is applied to the coefficient estimates and the inflated standard errors.

Usage

ash_ruv4(
  Y,
  X,
  ctl = NULL,
  k = NULL,
  cov_of_interest = ncol(X),
  likelihood = c("t", "normal"),
  ash_args = list(),
  limmashrink = TRUE,
  degrees_freedom = NULL,
  include_intercept = TRUE,
  gls = TRUE,
  fa_func = pca_naive,
  fa_args = list(),
  scale_var = TRUE
)

Arguments

Y

A matrix of numerics. These are the response variables where each column has its own variance. In a gene expression study, the rows are the individuals and the columns are the genes.

X

A matrix of numerics. The covariates of interest.

ctl

A vector of logicals of length ncol(Y). If position i is TRUE then position i is considered a negative control.
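
As a minimal sketch of how such a vector might be built, assume the columns of Y are named by gene and a set of negative control gene names is available. Both the gene names and the housekeeping list below are hypothetical and only serve the illustration.

```r
# Hypothetical gene names and a hypothetical list of negative controls.
genes <- c("geneA", "geneB", "geneC", "geneD", "geneE")
housekeeping <- c("geneB", "geneE")

# Toy response matrix: 4 individuals (rows) by 5 genes (columns).
Y <- matrix(0, nrow = 4, ncol = length(genes),
            dimnames = list(NULL, genes))

# Logical vector of length ncol(Y); TRUE marks a negative control.
ctl <- colnames(Y) %in% housekeeping
```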

k

A non-negative integer. The number of unobserved confounders. If not specified and the R package sva is installed, then this function will estimate the number of hidden confounders using the methods of Buja and Eyuboglu (1992).

cov_of_interest

A positive integer. The column number of the covariate in X whose coefficients you are interested in. The rest are considered nuisance parameters and are regressed out by OLS. ash_ruv4 only works with one covariate of interest right now.

likelihood

Either "normal" or "t". If likelihood = "t", then the user may provide the degrees of freedom via degrees_freedom.

ash_args

A list of arguments to pass to ash. See ash.workhorse for details.

limmashrink

A logical. Should we apply hierarchical shrinkage to the variances (TRUE) or not (FALSE)? If degrees_freedom = NULL and limmashrink = TRUE and likelihood = "t", then the degrees of freedom returned by limma are also used.

degrees_freedom

If likelihood = "t", then this is the user-defined degrees of freedom for that distribution. If degrees_freedom is NULL, then the degrees of freedom will be the sample size minus the number of covariates minus k.

include_intercept

A logical. If TRUE, then it will check X to see if it has an intercept term. If not, then it will add an intercept term. If FALSE, then X will be unchanged.
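
The check described here can be sketched in base R as follows. The helper name add_intercept_if_missing is hypothetical, and appending the column of ones on the right is an assumption made for the illustration; the package's own check may differ in detail.

```r
# Hypothetical helper: append a column of ones unless X already
# contains a non-zero constant column (an intercept term).
add_intercept_if_missing <- function(X) {
  X <- as.matrix(X)
  # A column is constant if every entry equals its first entry.
  is_const <- apply(X, 2, function(x) all(abs(x - x[1]) < 1e-12))
  has_intercept <- any(is_const & X[1, ] != 0)
  if (!has_intercept) {
    X <- cbind(X, 1)  # assumption: the intercept is appended on the right
  }
  X
}

group <- c(0, 1, 0, 1)
X1 <- add_intercept_if_missing(cbind(group))     # no intercept: one is added
X2 <- add_intercept_if_missing(cbind(1, group))  # intercept present: unchanged
```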

gls

A logical. Should we use generalized least squares (TRUE) or ordinary least squares (FALSE) for estimating the confounders? The OLS version is equivalent to using RUV4 to estimate the confounders.

fa_func

A factor analysis function. The function must have as inputs a numeric matrix Y and a rank (numeric scalar) r. It must output numeric matrices alpha and Z and a numeric vector sig_diag. alpha is the estimate of the coefficients of the unobserved confounders, so it must be an r by ncol(Y) matrix. Z must be an r by nrow(Y) matrix. sig_diag is the estimate of the column-wise variances so it must be of length ncol(Y). The default is the function pca_naive that just uses the first r singular vectors as the estimate of alpha. The estimated variances are just the column-wise mean square.
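
To illustrate the interface, here is a minimal sketch of a function with the inputs and outputs described above, based on a truncated singular value decomposition. The name pca_sketch is hypothetical and this is not the source of pca_naive. The output orientation follows the dimensions stated above, so it is worth comparing against the output of pca_naive, and transposing if they differ, before passing such a function as fa_func. The sketch assumes r is at least 1.

```r
# Hypothetical factor analysis function: inputs Y (numeric matrix) and
# r (rank); outputs alpha, Z, and sig_diag as described above.
pca_sketch <- function(Y, r) {
  stopifnot(is.matrix(Y), r >= 1, r <= min(dim(Y)))
  n <- nrow(Y)
  sv <- svd(Y)
  idx <- seq_len(r)
  # alpha: r by ncol(Y), built from the first r right singular vectors.
  alpha <- diag(sv$d[idx], r, r) %*% t(sv$v[, idx, drop = FALSE]) / sqrt(n)
  # Z: r by nrow(Y), built from the first r left singular vectors.
  Z <- sqrt(n) * t(sv$u[, idx, drop = FALSE])
  # Column-wise mean square of what the rank-r fit leaves unexplained.
  resid <- Y - t(Z) %*% alpha
  sig_diag <- colMeans(resid^2)
  list(alpha = alpha, Z = Z, sig_diag = sig_diag)
}

set.seed(1)
Y_demo <- matrix(rnorm(10 * 6), nrow = 10)
fa_out <- pca_sketch(Y_demo, r = 2)
```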

fa_args

A list. Additional arguments you want to pass to fa_func.

scale_var

A logical. Should we use the variance inflation parameter in the estimated standard errors that are passed to ash.workhorse (TRUE) or not (FALSE)?

Details

The model is

Y = XB + ZA + E,

where Y is a matrix of responses (e.g. log-transformed gene expression levels), X is a matrix of covariates, B is a matrix of coefficients, Z is a matrix of unobserved confounders, A is a matrix of unobserved coefficients of the unobserved confounders, and E is the noise matrix, whose elements are independent Gaussian with a common variance within each column. The rows of Y are the observations (e.g. individuals) and the columns of Y are the response variables (e.g. genes).
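
To make the dimensions concrete, the following sketch simulates data from this model in base R. The sizes n, p, q, and k and all distributional choices are arbitrary assumptions made for the illustration.

```r
set.seed(1)
n <- 20   # observations (individuals)
p <- 50   # response variables (genes)
q <- 2    # observed covariates
k <- 3    # unobserved confounders

X <- cbind(1, rbinom(n, 1, 0.5))        # n x q: intercept and a group indicator
B <- matrix(rnorm(q * p), nrow = q)     # q x p coefficients
Z <- matrix(rnorm(n * k), nrow = n)     # n x k unobserved confounders
A <- matrix(rnorm(k * p), nrow = k)     # k x p confounder coefficients
sigma2 <- rchisq(p, df = 5) / 5         # one noise variance per column of Y

# Independent Gaussian noise; column j is scaled to have variance sigma2[j].
E <- sweep(matrix(rnorm(n * p), n, p), 2, sqrt(sigma2), "*")

Y <- X %*% B + Z %*% A + E              # n x p response matrix
```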

This model is fit using a two-step approach proposed in Gagnon-Bartsch et al (2013) and described in Wang et al (2015), modified to include estimating a variance inflation parameter. Rather than use OLS in the second step of this two-step procedure, we estimate the coefficients using Adaptive SHrinkage (ASH) (Stephens, 2016). In the current implementation, only the coefficients of one covariate can be estimated using ASH. The rest are regressed out using OLS.
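
The two-step idea can be sketched in base R for a single covariate with no nuisance covariates. This is a simplified illustration under stated assumptions, not the package's implementation: it uses a truncated SVD for the factor analysis, OLS rather than GLS in the control-gene regression, a plain average of squared standardized control residuals as the variance inflation estimate, and it stops before the ASH step. All variable names are hypothetical.

```r
set.seed(2)
n <- 30; p <- 200; k <- 2
x <- rnorm(n)                                    # covariate of interest
ctl <- rep(c(TRUE, FALSE), times = c(50, 150))   # first 50 genes are controls
beta <- ifelse(ctl, 0, rnorm(p))                 # controls have zero effect
Z <- cbind(x + rnorm(n), rnorm(n))               # confounders, one correlated with x
A <- matrix(rnorm(k * p), nrow = k)
Y <- outer(x, beta) + Z %*% A + matrix(rnorm(n * p), n, p)

# Rotate by the QR decomposition of x: the first row of Q'Y carries the
# effect of x, the remaining rows do not.
qrx <- qr(cbind(x))
QtY <- qr.qty(qrx, Y)
R11 <- qr.R(qrx)[1, 1]
y1 <- QtY[1, ]
Y2 <- QtY[-1, , drop = FALSE]

# Step 1: factor analysis on the rows free of x (truncated SVD here).
m <- nrow(Y2)
sv <- svd(Y2)
alpha <- diag(sv$d[1:k], k, k) %*% t(sv$v[, 1:k, drop = FALSE]) / sqrt(m)  # k x p
Z2 <- sqrt(m) * sv$u[, 1:k, drop = FALSE]                                  # m x k
sig2 <- colSums((Y2 - Z2 %*% alpha)^2) / (m - k)

# Step 2: for control genes y1 = z1 %*% alpha + noise, so regress the
# control entries of y1 on the control columns of alpha (OLS).
z1 <- lm.fit(x = t(alpha[, ctl, drop = FALSE]), y = y1[ctl])$coefficients
resid1 <- y1 - as.vector(z1 %*% alpha)
beta_hat <- resid1 / R11

# Variance inflation from the controls, then inflated standard errors.
xi <- mean(resid1[ctl]^2 / sig2[ctl])
se_hat <- sqrt(xi * sig2) / abs(R11)
```

In the procedure described above, beta_hat and se_hat would then be passed on to ASH.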

Value

Except for the list ruv4, the values returned are the same as those returned by ash.workhorse. See that function for more details. The elements of the ruv4 list are the same as those returned by vruv4.
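
A minimal usage sketch on simulated noise, using only the arguments listed under Usage. The data, the choice k = 1, and the set of control genes are arbitrary assumptions; the call is guarded so that it only runs when the vicar package is installed.

```r
set.seed(3)
n <- 20; p <- 100
X <- cbind(1, rep(c(0, 1), each = n / 2))      # intercept plus a two-group indicator
Y <- matrix(rnorm(n * p), nrow = n)            # individuals in rows, genes in columns
ctl <- rep(c(TRUE, FALSE), times = c(30, 70))  # treat the first 30 genes as controls

if (requireNamespace("vicar", quietly = TRUE)) {
  fit <- vicar::ash_ruv4(Y = Y, X = X, ctl = ctl, k = 1,
                         cov_of_interest = 2, likelihood = "normal")
  names(fit)   # ash.workhorse output plus the ruv4 list
}
```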

Author(s)

David Gerard

References


dcgerard/vicar documentation built on July 7, 2021, 1:08 p.m.