nebula: Association analysis of a multi-subject single-cell data set...


Description

Association analysis of a multi-subject single-cell data set using a fast negative binomial mixed model

Usage

nebula(
  count,
  id,
  pred = NULL,
  offset = NULL,
  min = c(1e-04, 1e-04),
  max = c(10, 1000),
  model = "NBGMM",
  method = "LN",
  cutoff_cell = 20,
  kappa = 800,
  opt = "lbfgs",
  verbose = TRUE,
  cpc = 0.005,
  covariance = FALSE
)

Arguments

count

A raw count matrix of the single-cell data. The rows are the genes, and the columns are the cells. The matrix can be a matrix object or a sparse dgCMatrix object.

id

A vector of subject IDs. The length should be the same as the number of columns of the count matrix.

pred

A design matrix of the predictors. The rows are the cells and the columns are the predictors. If not specified, an intercept column will be generated by default.
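
As a minimal sketch of how such a design matrix might be built, the base R function model.matrix() can be applied to cell-level metadata. The data frame meta and its columns treatment and age are hypothetical and exist only for illustration.

```r
# Hypothetical cell-level metadata: one row per cell (names are illustrative).
meta <- data.frame(
  treatment = factor(c("ctrl", "ctrl", "trt", "trt", "ctrl", "trt")),
  age       = c(61, 61, 70, 70, 58, 66)
)

# model.matrix() adds the intercept column automatically and dummy-codes
# the factor, giving one row per cell and one column per predictor.
pred <- model.matrix(~ treatment + age, data = meta)

dim(pred)        # rows = cells, columns = predictors (including the intercept)
colnames(pred)
```

The resulting matrix could then be passed as the pred argument, provided its rows are in the same order as the columns of count.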

offset

A vector of scaling factors. The values must be strictly positive. If not specified, a vector of all ones will be generated by default.
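
One common choice of scaling factor is the total count of each cell. The sketch below assumes that choice and uses a toy matrix. The names count and lib_size are illustrative, and nothing here is prescribed by the package.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns), filled column by column.
count <- matrix(c(0, 2, 1,
                  3, 0, 4,
                  1, 1, 0,
                  5, 2, 2),
                nrow = 3,
                dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Assumed scaling factor: the total count (library size) of each cell.
lib_size <- colSums(count)

# The offset must be strictly positive, so check before passing it on.
stopifnot(length(lib_size) == ncol(count), all(lib_size > 0))
```

For a sparse dgCMatrix, Matrix::colSums() plays the same role as colSums() does here.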

min

Minimum values for the overdispersion parameters σ^2 and φ. Must be positive. The default is c(1e-4,1e-4).

max

Maximum values for the overdispersion parameters σ^2 and φ. Must be positive. The default is c(10,1000).

model

'NBGMM', 'PMM' or 'NBLMM'. 'NBGMM' fits a negative binomial gamma mixed model. 'PMM' fits a Poisson gamma mixed model. 'NBLMM' fits a negative binomial lognormal mixed model (the same model as that in the lme4 package). The default is 'NBGMM'.

method

'LN' or 'HL'. 'LN' uses NEBULA-LN and 'HL' uses NEBULA-HL. The default is 'LN'.

cutoff_cell

The data will be refit using NEBULA-HL to estimate both overdispersions if the product of the cells per subject and the estimated cell-level overdispersion parameter φ is smaller than cutoff_cell. The default is 20.

kappa

Please see the vignettes for more details. The default is 800.

opt

'lbfgs' or 'trust'. Specifies the optimization algorithm used in NEBULA-LN. The default is 'lbfgs'. If it is 'trust', a trust region algorithm based on the Hessian matrix will be used for optimization.

verbose

An optional logical scalar indicating whether to print additional messages. The default is TRUE.

cpc

A non-negative threshold for filtering low-expressed genes. Genes with counts per cell smaller than the specified value will not be analyzed. The default is 0.005.
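
To illustrate the filter, the sketch below assumes that counts per cell means a gene's total count divided by the number of cells. That definition, the toy matrix, and the threshold of 0.5 are assumptions made for this example only.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns).
count <- matrix(c(0, 0, 1, 0,    # gene1
                  3, 1, 4, 2,    # gene2
                  0, 0, 0, 0),   # gene3
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("gene", 1:3), NULL))

cpc_threshold <- 0.5  # illustrative value, much larger than the default

# Assumed definition: total count of the gene divided by the number of cells.
counts_per_cell <- rowSums(count) / ncol(count)

# Genes below the threshold would be skipped; the rest would be analyzed.
keep <- counts_per_cell >= cpc_threshold
names(keep)[keep]
```

Running a check like this beforehand gives a rough idea of how many genes a given cpc value would exclude.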

covariance

If TRUE, nebula will output the covariance matrix for the estimated log(FC), which can be used for testing contrasts. The default is FALSE.
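
As a sketch of what testing a contrast involves, the code below runs a generic Wald test for a single gene. The estimates beta and the covariance matrix V are made-up numbers. How nebula stores the covariance in its output is not described here, so extracting V from a real result is left to the package vignettes.

```r
# Hypothetical log(FC) estimates for one gene: intercept and two predictors.
beta <- c(intercept = 0.20, X1 = 0.50, X2 = 0.10)

# Hypothetical symmetric covariance matrix of those estimates.
V <- matrix(c(0.040, 0.002, 0.001,
              0.002, 0.010, 0.003,
              0.001, 0.003, 0.020),
            nrow = 3, byrow = TRUE)

# Contrast of interest: does the X1 effect differ from the X2 effect?
cvec <- c(0, 1, -1)

est <- sum(cvec * beta)                     # contrast estimate
se  <- sqrt(drop(t(cvec) %*% V %*% cvec))   # standard error of the contrast
p   <- pchisq((est / se)^2, df = 1, lower.tail = FALSE)  # Wald chi-square test
```

The same three steps apply to any contrast vector whose length matches the number of columns of the design matrix.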

Value

summary: The estimated coefficient, standard error and p-value for each predictor.

overdispersion: The estimated subject-level and cell-level overdispersions σ^2 and φ^{-1}, respectively.

convergence: More information about the convergence of the algorithm for each gene. A value of -20 or -30 indicates a potential failure of the convergence.
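
A possible way to act on these codes is to drop the flagged genes before interpreting the results. The sketch below uses a mock list in place of real nebula() output. It assumes one convergence code per gene in the same order as the rows of summary, and its column names are hypothetical.

```r
# Mock object standing in for the list returned by nebula(); the real
# element layout may differ, so adapt the indexing to your own output.
re <- list(
  summary = data.frame(gene     = c("A", "B", "C", "D"),
                       logFC_X1 = c(0.3, -1.2, 0.8, 0.1)),
  convergence = c(1, -20, 1, -30)
)

# Flag genes whose code indicates a potential convergence failure.
failed <- re$convergence %in% c(-20, -30)

# Keep only the genes that were not flagged.
reliable <- re$summary[!failed, ]
```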

algorithm: The algorithm used for analyzing the gene. More information can be found in the vignettes.

Examples

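library(nebula)
data(sample_data)
# Build the cell-level design matrix (intercept plus X1, X2 and cc)
pred <- model.matrix(~ X1 + X2 + cc, data = sample_data$pred)
# Fit the model with default settings, using sample_data$sid as the subject IDs
re <- nebula(count = sample_data$count, id = sample_data$sid, pred = pred)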

nebula documentation built on Sept. 24, 2021, 3 a.m.