test.vc.twosigma: Convenient wrapper function for performing (joint) likelihood...
In edvanburen/twosigma: DE Analysis for Single-Cell RNA-Sequencing Data

View source: R/test.vc.twosigma.R

test.vc.twosigma

R Documentation

Convenient wrapper function for performing (joint) likelihood ratio tests of variance components using the TWO-SIGMA model.

Description

Convenient wrapper function for performing (joint) likelihood ratio tests of variance components using the TWO-SIGMA model.

Usage

test.vc.twosigma(
  count_matrix,
  mean_covar,
  zi_covar,
  mean_re = TRUE,
  zi_re = TRUE,
  id,
  return_full_fits = TRUE,
  adhoc = FALSE,
  adhoc_thresh = 0.1,
  silent = FALSE,
  disp_covar = NULL,
  weights = rep(1, ncol(count_matrix)),
  control = glmmTMBControl(),
  ncores = 1,
  cluster_type = "Fork",
  chunk_size = 1,
  lb = FALSE
)

Arguments

`count_matrix`	Matrix of non-negative integer read counts, with rows corresponding to genes and columns corresponding to cells. It is recommended to make the rownames the gene names for better output.
`mean_covar`	Covariates for the (conditional) mean model. Must be a matrix (without an intercept column) or a vector if a single covariate is being tested.
`zi_covar`	Covariates for the zero-inflation model. Must be a matrix (without an intercept column) or a vector if a single covariate is being tested.
`mean_re`	Should random intercepts be tested in the (conditional) mean model?
`zi_re`	Should random intercepts be tested in the zero-inflation model?
`id`	Vector of individual-level ID's. Used for random effect prediction and the adhoc method but required regardless.
`return_full_fits`	If TRUE, fit objects of class glmmTMB are returned. If FALSE, only objects of class summary.glmmTMB are returned. The latter require a much larger amount of memory to store.
`adhoc`	Should the adhoc method be used by default to judge if random effects are needed?
`adhoc_thresh`	Value below which the adhoc p-value is deemed significant (and thus RE are deemed necessary). Only used if adhoc==TRUE.
`silent`	If TRUE, progress is not printed.
`disp_covar`	Covariates for a log-linear model for the dispersion. Either a matrix or = 1 to indicate an intercept only model.
`weights`	weights, as in glm. Defaults to 1 for all observations and no scaling or centering of weights is performed. See `?glmmTMBControl`.
`control`	Control parameters for optimization in glmmTMB.
`ncores`	Number of cores used for parallelization. Defaults to 1, meaning no parallelization of any kind is done.
`cluster_type`	Whether to use a "cluster of type "Fork" or "Sock". On Unix systems, "Fork" will likely improve performance. On Windows, only "Sock" will actually result in parallelized computing.
`chunk_size`	Number of genes to be sent to each parallel environment. Parallelization is more efficient, particularly with a large count matrix, when the count matrix is 'chunked' into some common size (e.g. 10, 50, 200). Defaults to 10.
`lb`	Should load balancing be used for parallelization? Users will likely want to set to FALSE for improved performance.

Value

A list with the following elements:

fit_null: Model fits under the null hypothesis. If return_summary_fits=TRUE, returns a list of objects of class summary.glmmTMB. If return_summary_fits=FALSE, returns a list of model fit objects of class glmmTMB. In either case, the order matches the row order of count_matrix, and the names of the list elements are taken as the rownames of count_matrix.
fit_alt: Model fits under the alt hypothesis of the same format as fit_null.
LR_stat: Vector of Likelihood Ratio statistics. A value of 'NA' implies a convergence issue or other model fit problem.
LR_p.val: Vector of Likelihood Ratio p-values. A value of 'NA' implies a convergence issue or other model fit problem.

Details

If either model fails to converge, or the LR statistic is negative, both the statistic and p-value are assigned as NA.

Examples

# Set Parameters to Simulate Some Data

nind<-10;ncellsper<-rep(50,nind)
sigma.a<-.5;sigma.b<-.5;phi<-.1
alpha<-c(1,0,-.5,-2);beta<-c(2,0,-.1,.6)
beta2<-c(2,1,-.1,.6)
id.levels<-1:nind;nind<-length(id.levels)
id<-rep(id.levels,times=ncellsper)
sim.seed<-1234

# Simulate individual level covariates

t2d_sim<-rep(rbinom(nind,1,p=.4),times=ncellsper)
cdr_sim<-rbeta(sum(ncellsper),3,6)
age_sim<-rep(sample(c(20:60),size=nind,replace = TRUE),times=ncellsper)

# Construct design matrices

Z<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(Z)<-c("t2d_sim","age_sim","cdr_sim")
X<-cbind(scale(t2d_sim),scale(age_sim),scale(cdr_sim))
colnames(X)<-c("t2d_sim","age_sim","cdr_sim")

# Simulate Data

sim_dat<-matrix(nrow=2,ncol=sum(ncellsper))
for(i in 1:nrow(sim_dat)){
   sim_dat[i,]<-simulate_zero_inflated_nb_random_effect_data(ncellsper,X,Z,alpha,beta2
   ,phi,sigma.a,sigma.b,id.levels=NULL)$Y
}
rownames(sim_dat)<-paste("Gene",1:2)

# Run test.vc.twosigma

test.vc.twosigma(sim_dat[1,,drop=FALSE],mean_covar = X,zi_covar=Z
,mean_re = TRUE,zi_re=FALSE,id = id)

edvanburen/twosigma documentation built on July 3, 2023, 3:39 p.m.