vruv2: Calibrated RUV2.


View source: R/ruv2.R

Description

This function performs a variant of Removing Unwanted Variation 2-step (RUV2) (Gagnon-Bartsch et al., 2013) that includes a variance inflation parameter in the factor analysis.

Usage

vruv2(
  Y,
  X,
  ctl,
  k = NULL,
  cov_of_interest = ncol(X),
  likelihood = c("t", "normal"),
  limmashrink = TRUE,
  degrees_freedom = NULL,
  include_intercept = TRUE,
  gls = TRUE,
  fa_func = pca_2step,
  fa_args = list(),
  use_factor = FALSE,
  force_check = TRUE,
  fa_limmashrink = TRUE
)

Arguments

Y

A matrix of numerics. These are the response variables where each column has its own variance. In a gene expression study, the rows are the individuals and the columns are the genes.

X

A matrix of numerics. The covariates of interest.

ctl

A vector of logicals of length ncol(Y). If position i is TRUE then position i is considered a negative control.

k

A non-negative integer. The number of unobserved confounders. If not specified and the R package sva is installed, then this function will estimate the number of hidden confounders using the methods of Buja and Eyuboglu (1992).

cov_of_interest

A vector of positive integers. The column numbers of the covariates in X whose coefficients you are interested in. The rest are considered nuisance parameters and are regressed out by OLS.

likelihood

Either "normal" or "t". If likelihood = "t", then the user may provide the degrees of freedom via degrees_freedom.

limmashrink

A logical. Should we apply hierarchical shrinkage to the variances (TRUE) or not (FALSE)? If degrees_freedom = NULL, limmashrink = TRUE, and likelihood = "t", then we'll also use the degrees of freedom returned by limma.

degrees_freedom

If likelihood = "t", then this is the user-defined degrees of freedom for that distribution. If degrees_freedom is NULL, then the degrees of freedom will be the sample size minus the number of covariates minus k.

include_intercept

A logical. If TRUE, then it will check X to see if it has an intercept term. If not, then it will add an intercept term. If FALSE, then X will be unchanged.

gls

A logical. Should we estimate the part of the confounders associated with the nuisance parameters with GLS (TRUE) or with OLS (FALSE)?

fa_func

A factor analysis function. It must take parameters: Y a data matrix of numerics, r a positive integer for the rank, and vr a positive integer for the number of the first rows that have a different variance than the last rows. It must return: alpha a matrix for the factor loadings, Z a matrix for the factors, sig_diag a vector of the column-wise variances, and lambda a numeric for the variance inflation of the first vr rows of Y. The default function is pca_2step, which is the main difference between RUV2 and this version.

fa_args

A list. Additional arguments you want to pass to fa_func.

use_factor

A logical. Should we use the estimates of alpha and sig_diag from the factor analysis (TRUE), or re-estimate these using OLS as RUV2 does it (FALSE)? Right now it's probably a bad idea to have the settings use_factor = TRUE, fa_limmashrink = TRUE, limmashrink = TRUE since then the variance estimates of the control genes are being shrunk twice.

force_check

A logical. Are you REALLY sure you want to use another fa_func (FALSE) or should I ask you again (TRUE)?

fa_limmashrink

A logical. Should we shrink the variances during the factor analysis step (TRUE) or not (FALSE)?
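As a rough illustration of the arguments above, the sketch below simulates a small data set and calls vruv2 on it. The dimensions, the seed, and the object names (n, p, beta, Z_true, alpha_true, fit) are assumptions made for this example only. The call is skipped unless vicar is installed, along with limma, which the default limmashrink = TRUE presumably relies on.

```r
set.seed(1)
n <- 20    # individuals (rows of Y)
p <- 100   # genes (columns of Y)
k <- 2     # number of hidden confounders

# Design matrix: intercept plus one covariate of interest in the last column.
X <- cbind(intercept = 1, treatment = rep(c(0, 1), each = n / 2))

# Logical vector of length ncol(Y): TRUE marks a negative control.
ctl <- rep(c(TRUE, FALSE), times = c(30, p - 30))

# True effects are zero for the control genes.
beta <- matrix(0, nrow = ncol(X), ncol = p)
beta[2, !ctl] <- rnorm(sum(!ctl))

# Simulated hidden confounders and their loadings.
Z_true <- matrix(rnorm(n * k), nrow = n)
alpha_true <- matrix(rnorm(k * p), nrow = k)

Y <- X %*% beta + Z_true %*% alpha_true + matrix(rnorm(n * p), nrow = n)

# Only attempt the fit when the packages are available.
if (requireNamespace("vicar", quietly = TRUE) &&
    requireNamespace("limma", quietly = TRUE)) {
  fit <- vicar::vruv2(Y = Y, X = X, ctl = ctl, k = k,
                      cov_of_interest = ncol(X))
  str(fit$betahat)
}
```

Here the intercept column is treated as a nuisance parameter because cov_of_interest points only at the last column of X.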

Details

See vruv4 for a description of the model.

You can provide your own factor analysis, but it must include an estimate for the variance inflation parameter. This turns out to be pretty hard. The way I do it now seems to work OK.
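To make the interface described for fa_func concrete, here is a minimal sketch of a function with the required inputs (Y, r, vr) and outputs (alpha, Z, sig_diag, lambda). The name naive_fa is hypothetical, the orientation of alpha (columns of Y by r) is an assumption, and the crude ratio used for lambda is a placeholder, not the estimator used by pca_2step.

```r
# Hypothetical placeholder factor analysis; NOT the method used by pca_2step.
naive_fa <- function(Y, r, vr) {
  n <- nrow(Y)
  stopifnot(is.matrix(Y), r >= 1, r < min(dim(Y)), vr >= 1, vr < n)

  # Rank-r truncated SVD, so that Y is approximated by Z %*% t(alpha).
  sv <- svd(Y, nu = r, nv = r)
  Z <- sv$u * sqrt(n)                                       # n x r factors
  alpha <- sv$v %*% diag(sv$d[seq_len(r)], r, r) / sqrt(n)  # p x r loadings
  resid <- Y - Z %*% t(alpha)

  first <- seq_len(vr)
  # Column-wise variances estimated from the last n - vr rows only.
  sig_diag <- colMeans(resid[-first, , drop = FALSE]^2)
  # Crude inflation: mean scaled squared residual in the first vr rows.
  lambda <- mean(sweep(resid[first, , drop = FALSE]^2, 2, sig_diag, "/"))

  list(alpha = alpha, Z = Z, sig_diag = sig_diag, lambda = lambda)
}

set.seed(2)
out <- naive_fa(matrix(rnorm(20 * 50), nrow = 20), r = 2, vr = 5)
str(out)
```

A function like this would be supplied through fa_func, with force_check = FALSE to skip the confirmation; whether the shapes returned here match what vruv2 expects internally has not been verified.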

Value

A list whose elements are:

betahat A matrix of numerics. The ordinary least squares estimates of the coefficients of the covariate of interest WHEN YOU ALSO INCLUDE THE ESTIMATES OF THE UNOBSERVED CONFOUNDERS.

sebetahat A matrix of positive numerics. These are the post-inflation adjusted standard errors for betahat.

tstats A vector of numerics. The t-statistics for testing against the null hypothesis of the coefficient of the covariate of interest being zero. This is after estimating the variance inflation parameter.

pvalues A vector of numerics. The p-values of said test above.

alphahat A matrix of numerics. The estimates of the coefficients of the hidden confounders. Only identified up to a rotation on the rowspace.

Zhat A matrix of numerics. The estimates of the confounding variables. Only identified up to a rotation on the columnspace.

sigma2 A vector of positive numerics. The estimates of the variances PRIOR to inflation.

sigma2_adjusted A vector of positive numerics. The estimates of the variances AFTER inflation. This is equal to sigma2 * multiplier.

multiplier A numeric. The estimated variance inflation parameter.

mult_matrix A matrix of numerics. Equal to solve(t(cbind(X, Zhat)) %*% cbind(X, Zhat)). One multiplies sigma2 or sigma2_adjusted by the diagonal elements of mult_matrix to get the standard errors of betahat.

degrees_freedom The degrees of freedom of the t-statistics.
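The relationship between sigma2, multiplier, mult_matrix, and the test statistics can be sketched in base R with stand-in values. Everything simulated below (Zhat, sigma2, multiplier, betahat) is made up for illustration. Taking the square root of the product is an assumption based on the product being on the variance scale, and the two-sided t-test is likewise assumed, so this may not match the internals of vruv2 exactly. The degrees of freedom follow the default described under degrees_freedom.

```r
set.seed(3)
n <- 20; p <- 5; k <- 2
X <- cbind(1, rep(c(0, 1), each = n / 2))
Zhat <- matrix(rnorm(n * k), nrow = n)   # stand-in for estimated confounders
sigma2 <- rchisq(p, df = 5) / 5          # stand-in variance estimates
multiplier <- 1.3                        # stand-in inflation estimate
betahat <- matrix(rnorm(p), nrow = 1)    # stand-in coefficient estimates
cov_of_interest <- ncol(X)

XZ <- cbind(X, Zhat)
mult_matrix <- solve(t(XZ) %*% XZ)
sigma2_adjusted <- sigma2 * multiplier

# Diagonal entry times adjusted variance is a variance; sqrt gives the SE.
sebetahat <- sqrt(outer(diag(mult_matrix)[cov_of_interest], sigma2_adjusted))

# Default degrees of freedom: sample size minus covariates minus k.
degrees_freedom <- n - ncol(X) - k
tstats <- betahat / sebetahat
pvalues <- 2 * pt(-abs(tstats), df = degrees_freedom)
```

Replacing sigma2_adjusted with sigma2 in the same expression gives the standard errors before inflation.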

Author(s)

David Gerard

References

See Also

pca_2step for the special factor analysis that results in variance inflation in RUV2.


dcgerard/vicar documentation built on July 7, 2021, 1:08 p.m.