zlm: Zero-inflated regression for SingleCellAssay

Description Usage Arguments Value Empirical Bayes variance regularization See Also Examples

View source: R/zeroinf.R

Description

For each gene in sca, fits the hurdle model in formula (linear for et>0), logistic for et==0 vs et>0. Return an object of class ZlmFit containing slots giving the coefficients, variance-covariance matrices, etc. After each gene, optionally run the function on the fit named by 'hook'

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
zlm(
  formula,
  sca,
  method = "bayesglm",
  silent = TRUE,
  ebayes = TRUE,
  ebayesControl = NULL,
  force = FALSE,
  hook = NULL,
  parallel = TRUE,
  LMlike,
  onlyCoef = FALSE,
  exprs_values = assay_idx(sca)$aidx,
  ...
)

Arguments

formula

a formula with the measurement variable on the LHS and predictors present in colData on the RHS

sca

SingleCellAssay object

method

character vector, either 'glm', 'glmer' or 'bayesglm'

silent

Silence common problems with fitting some genes

ebayes

if TRUE, regularize variance using empirical bayes method

ebayesControl

list with parameters for empirical bayes procedure. See ebayes.

force

Should we continue testing genes even after many errors have occurred?

hook

a function called on the fit after each gene.

parallel

If TRUE and option(mc.cores)>1 then multiple cores will be used in fitting.

LMlike

if provided, then the model defined in this object will be used, rather than following the formulas. This is intended for internal use.

onlyCoef

If TRUE then only an array of model coefficients will be returned (probably only useful for bootstrapping).

exprs_values

character or integer passed to 'assay' specifying which assay to use for testing

...

arguments passed to the S4 model object upon construction. For example, fitArgsC and fitArgsD, or coefPrior.

Value

a object of class ZlmFit with methods to extract coefficients, etc. OR, if data is a data.frame just a list of the discrete and continuous fits.

Empirical Bayes variance regularization

The empirical bayes regularization of the gene variance assumes that the precision (1/variance) is drawn from a gamma distribution with unknown parameters. These parameters are estimated by considering the distribution of sample variances over all genes. The procedure used for this is determined from ebayesControl, a named list with components 'method' (one of 'MOM' or 'MLE') and 'model' (one of 'H0' or 'H1') method MOM uses a method-of-moments estimator, while MLE using the marginal likelihood. H0 model estimates the precisions using the intercept alone in each gene, while H1 fits the full model specified by formula

See Also

ZlmFit-class, ebayes, GLMlike-class, BayesGLMlike-class

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
data(vbetaFA)
zlmVbeta <- zlm(~ Stim.Condition, subset(vbetaFA, ncells==1)[1:10,])
slotNames(zlmVbeta)
#A matrix of coefficients
coef(zlmVbeta, 'D')['CCL2',]
#An array of covariance matrices
vcov(zlmVbeta, 'D')[,,'CCL2']
waldTest(zlmVbeta, CoefficientHypothesis('Stim.ConditionUnstim'))

## Can also provide just a \code{data.frame} instead
data<- data.frame(x=rnorm(500), z=rbinom(500, 1, .3))
logit.y <- with(data, x*2 + z*2); mu.y <- with(data, 10+10*x+10*z + rnorm(500))
y <- (runif(500)<exp(logit.y)/(1+exp(logit.y)))*1
y[y>0] <- mu.y[y>0]
data$y <- y
fit <- zlm(y ~ x+z, data)
summary.glm(fit$disc)
summary.glm(fit$cont)

Example output

Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, basename, cbind, colMeans, colSums, colnames,
    dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
    rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DelayedArray
Loading required package: matrixStats

Attaching package: 'matrixStats'

The following objects are masked from 'package:Biobase':

    anyMissing, rowMedians


Attaching package: 'DelayedArray'

The following objects are masked from 'package:matrixStats':

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges

The following object is masked from 'package:base':

    apply


Attaching package: 'MAST'

The following object is masked from 'package:stats':

    filter

Warning messages:
1: no function found corresponding to methods exports from 'DelayedArray' for: 'acbind', 'arbind' 
2: no function found corresponding to methods exports from 'SummarizedExperiment' for: 'acbind', 'arbind' 

Done!
 [1] "coefC"              "coefD"              "vcovC"             
 [4] "vcovD"              "LMlike"             "sca"               
 [7] "deviance"           "loglik"             "df.null"           
[10] "df.resid"           "dispersion"         "dispersionNoshrink"
[13] "priorDOF"           "priorVar"           "converged"         
[16] "hookOut"           
         (Intercept) Stim.ConditionUnstim 
          -3.8329217           -0.5108005 
                     (Intercept) Stim.ConditionUnstim
(Intercept)            0.1439196           -0.1254838
Stim.ConditionUnstim  -0.1254838            0.9182853
, , metric = lambda

        test.type
primerid         cont        disc    hurdle
  B3GAT1 0.9617702324 0.038250068  1.000020
  BAX    7.2211565188 3.645901878 10.867058
  BCL2   0.3766814067 2.202291748  2.578973
  CCL2   0.8414775522 0.284135226  1.125613
  CCL3             NA 3.548463195        NA
  CCL4             NA 2.012308210        NA
  CCL5   0.1746468538 0.862093478  1.036740
  CCR2   5.3383734489 2.308408187  7.646782
  CCR4   2.0437612666 0.003737042  2.047498
  CCR5   0.0005534473 2.811952306  2.812506

, , metric = df

        test.type
primerid cont disc hurdle
  B3GAT1    1    1      2
  BAX       1    1      2
  BCL2      1    1      2
  CCL2      1    1      2
  CCL3      1    1      2
  CCL4      1    1      2
  CCL5      1    1      2
  CCR2      1    1      2
  CCR4      1    1      2
  CCR5      1    1      2

, , metric = Pr(>Chisq)

        test.type
primerid        cont       disc      hurdle
  B3GAT1 0.326741272 0.84494185 0.606524503
  BAX    0.007204927 0.05620735 0.004367654
  BCL2   0.539384664 0.13780573 0.275412150
  CCL2   0.358974532 0.59400356 0.569608276
  CCL3            NA 0.05960063          NA
  CCL4            NA 0.15602777          NA
  CCL5   0.676014596 0.35315351 0.595490308
  CCR2   0.020860929 0.12867576 0.021853574
  CCR4   0.152831343 0.95125460 0.359245545
  CCR5   0.981231129 0.09356445 0.245059834


Call:
NULL

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-35.152   -1.172    1.019    1.201   10.449  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.2448     0.1436  -1.705   0.0881 .  
x             2.1279     0.1902  11.190  < 2e-16 ***
z             2.6130     0.3515   7.434 1.05e-13 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 680.93  on 499  degrees of freedom
Residual deviance: 383.28  on 208  degrees of freedom
AIC: 389.28

Number of Fisher Scoring iterations: 6


Call:
NULL

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.7737  -0.6076   0.0398   0.6539   3.2698  

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.02431    0.09429  106.31   <2e-16 ***
x           10.02232    0.07557  132.63   <2e-16 ***
z            9.96644    0.12435   80.15   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.017034)

    Null deviance: 20909.09  on 288  degrees of freedom
Residual deviance:   290.87  on 286  degrees of freedom
AIC: 830.01

Number of Fisher Scoring iterations: 2

MAST documentation built on Nov. 8, 2020, 8:19 p.m.