run_CARseq: Conduct a CARseq test


View source: R/loglik.R

Description

Conduct a CARseq test

Usage

run_CARseq(
  count_matrix,
  cellular_proportions,
  groups,
  formula = NULL,
  data = NULL,
  read_depth = 1,
  shrunken_lfc = TRUE,
  cores = 1,
  fix_overdispersion = FALSE,
  useSocket = TRUE
)

Arguments

count_matrix

A matrix of G x n total read counts observed.

cellular_proportions

A matrix of n x H of cellular proportions.

groups

A vector of length n indicating the groups that we would like to test. It will be coerced to a factor.

formula

A formula of an intercept term "1" and other cell type-independent variables. Can be NULL.

data

A data frame containing the cell type-independent variables specified in formula. Can be NULL.

read_depth

A vector of n sample-specific read depths. It is used as an offset term in the regression model. Alternatively, it can be 1, NA or NULL so that the model has no offset term, in which case log(read_depth) can be included as one of the cell type-independent variables.
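The difference between an offset and a covariate can be illustrated with an ordinary GLM. The following is a minimal sketch only: it uses a Poisson model from base R as a stand-in rather than the CARseq model, and the simulated variables d, x and y are hypothetical.

```r
# Sketch: read depth as an offset versus as a covariate in a generic GLM.
# This is NOT the CARseq model; d, x and y are simulated for illustration.
set.seed(1)
n <- 50
d <- runif(n, 0.5, 2)                          # hypothetical read depths
x <- rnorm(n)                                  # hypothetical covariate
y <- rpois(n, lambda = d * exp(1 + 0.3 * x))   # counts scale with depth

# Offset: the coefficient of log(d) is fixed at 1 and is not estimated.
fit_offset <- glm(y ~ x + offset(log(d)), family = poisson())

# Covariate: the coefficient of log(d) is estimated from the data.
fit_covariate <- glm(y ~ x + log(d), family = poisson())

names(coef(fit_offset))
names(coef(fit_covariate))
```

With an offset, the model has one fewer free parameter; with a covariate, the effect of read depth is allowed to differ from proportional scaling.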

shrunken_lfc

Logical. If TRUE (default), provide shrunken log fold change for cell type-specific variables.

cores

Numeric. Number of cores to use in parallel::makePSOCKcluster. Note that for faster execution of the package, an OPENBLAS or MKL library is recommended. When OPENBLAS is used, set the environment variable with Sys.setenv(OPENBLAS_NUM_THREADS=1). When MKL is used, set the environment variables with Sys.setenv(MKL_NUM_THREADS=1) and Sys.setenv(MKL_THREADING_LAYER="GNU").
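As a sketch, the environment variables named above could be set at the start of the R session, before run_CARseq is called. Setting all three at once is an assumption made here for brevity; only the variables matching the BLAS library actually in use are needed. The likely purpose is to keep each worker process from starting its own BLAS threads, though this page does not state the reason.

```r
# Sketch: limit BLAS threading before starting any parallel work.
# Only the variables for the BLAS library that R is linked against matter.
Sys.setenv(OPENBLAS_NUM_THREADS = 1)     # when OPENBLAS is used
Sys.setenv(MKL_NUM_THREADS = 1)          # when MKL is used
Sys.setenv(MKL_THREADING_LAYER = "GNU")  # when MKL is used

# Confirm the values seen by the current session.
blas_env <- Sys.getenv(c("OPENBLAS_NUM_THREADS",
                         "MKL_NUM_THREADS",
                         "MKL_THREADING_LAYER"))
print(blas_env)
```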

fix_overdispersion

Logical or numeric. In general, when the sample size is sufficiently large (for example, 10 samples per degree of freedom), fix_overdispersion should be FALSE so that the overdispersion parameter is re-estimated in the reduced model. However, when the sample size is smaller, the overdispersion parameter is hard to estimate, and it might be advantageous to always use the overdispersion parameter estimated under the full model, which is similar to how the DESeq2 LRT works. Alternatively, for experimental purposes only, a list of overdispersion parameters equal to the number of samples, parametrized so that the variance of the negative binomial distribution is mean + mean^2/overdispersion, can be provided so that pre-computed overdispersion parameters are used.

useSocket

If TRUE (default), use sockets for parallel computation, which works on Windows as well as on POSIX systems (Mac, Linux, Unix, BSD) most of the time. If FALSE, use forking for parallel computation, which works on POSIX systems but not on Windows.
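The two modes can be compared with the parallel package directly. This is a minimal sketch: the function square is hypothetical, and mclapply is used here as a stand-in for fork-based parallelism, since this page does not say which forking function the package calls internally.

```r
# Sketch: socket-based versus fork-based parallelism in base R's parallel package.
library(parallel)

square <- function(i) i^2   # hypothetical per-task function

# Socket cluster: works on Windows and POSIX systems.
cl <- makePSOCKcluster(2)
res_socket <- parLapply(cl, 1:4, square)
stopCluster(cl)

# Forking: available on POSIX systems only, so skip it on Windows.
if (.Platform$OS.type != "windows") {
  res_fork <- mclapply(1:4, square, mc.cores = 2)
}

unlist(res_socket)
```

Socket workers start as fresh R processes, so they are more portable but need objects and packages sent to them; forked workers share the parent's memory at the time of the fork.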

Value

Returns a list consisting mostly of matrices. Note that the matrices with "shrunken" in their names are only available when shrunken_lfc is TRUE:

p

A matrix of G x H p-values

padj

A matrix of G x H p-values adjusted using Benjamini & Hochberg (1995).
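For reference, the Benjamini & Hochberg adjustment is available in base R as p.adjust with method = "BH". The sketch below applies it to a simulated G x H matrix. Adjusting within each cell type (column) is an assumption made for illustration; this page does not state whether the package adjusts per column or across the whole matrix.

```r
# Sketch: Benjamini-Hochberg adjustment of a G x H matrix of p-values.
# The p-values are simulated; column-wise adjustment is an assumption.
set.seed(1)
G <- 10
H <- 3
p <- matrix(runif(G * H), nrow = G, ncol = H,
            dimnames = list(paste0("gene", 1:G), paste0("celltype", 1:H)))

# Adjust the p-values of each cell type separately.
padj_by_column <- apply(p, 2, p.adjust, method = "BH")

dim(padj_by_column)
```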

shrunken_lfc

A matrix of shrunken log fold change between cell type-specific effects.

shrunken_lfcSE

A matrix of standard errors of shrunken log fold change of cell type-specific effects between different groups.

shrunken_coefficients

A matrix of shrunken coefficient estimates.

shrunken_coefficientsSE

A matrix of standard errors of shrunken coefficient estimates.

lfc

A matrix of log fold change between cell type-specific effects.

lfcSE

A matrix of standard errors of log fold change of cell type-specific effects between different groups.

coefficients

A matrix of MLE coefficient estimates.

coefficientsSE

A matrix of standard errors of coefficient estimates.

overdispersion

A matrix of overdispersion parameters. The overdispersion parameter is parametrized so that the variance of the negative binomial distribution is mean + mean^2/overdispersion.
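This parametrization matches the size argument of R's negative binomial functions, where the variance is mu + mu^2/size. The following sketch checks this by simulation; the values of mu and overdispersion are hypothetical.

```r
# Sketch: variance of the negative binomial under the stated parametrization.
# In rnbinom(), `size` plays the role of the overdispersion parameter.
set.seed(1)
mu <- 100             # hypothetical mean
overdispersion <- 5   # hypothetical overdispersion parameter

y <- rnbinom(1e5, size = overdispersion, mu = mu)

theoretical_var <- mu + mu^2 / overdispersion   # 100 + 10000 / 5 = 2100
empirical_var <- var(y)

c(empirical = empirical_var, theoretical = theoretical_var)
```

Under this parametrization, a larger overdispersion value means less extra-Poisson variation: as overdispersion grows, the variance approaches the mean.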

lambda

The inverse of the variance in a zero-centered multivariate normal distribution that is used as the prior of effects when shrinkage is requested. The entries in λ stand for K cell type-independent effects and H(M+1) cell type-specific effects. Originally, in a normal design matrix, there are HM cell type-specific effects. In an expanded design matrix, however, for each cell type h, there are M contrasts γ_{jhm} and one group mean γ_{jh0}, so there are H(M+1) cell type-specific effects altogether.
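The counting argument above can be worked through with small numbers. The values of K, H and M below are hypothetical and chosen only to make the arithmetic concrete.

```r
# Sketch: number of effects in the normal versus the expanded design matrix.
K <- 2   # hypothetical number of cell type-independent effects
H <- 3   # hypothetical number of cell types
M <- 1   # hypothetical number of contrasts per cell type

cell_type_specific_normal <- H * M          # 3 * 1 = 3
cell_type_specific_expanded <- H * (M + 1)  # 3 * 2 = 6

# Number of entries described for lambda: K + H(M+1).
n_lambda <- K + cell_type_specific_expanded  # 2 + 6 = 8
n_lambda
```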

elapsed_time

The time elapsed.

Examples

data("n50DE221rep1")
# As an example, run only the first 10 of the 10,000 genes to save time:
res = run_CARseq(count_matrix = n50DE221rep1$observed_read_count[1:10,],
                 cellular_proportions = n50DE221rep1$rho,
                 groups = gl(2, 25),
                 formula = ~ RIN,
                 data = n50DE221rep1$clinical_variables,
                 read_depth = n50DE221rep1$d,
                 shrunken_lfc = TRUE,
                 cores = 1,
                 fix_overdispersion = FALSE,
                 useSocket = TRUE
)

Sun-lab/CARseq documentation built on Oct. 7, 2021, 1:52 p.m.