knitr::opts_chunk$set( echo = TRUE, warning = FALSE, message = FALSE, error = FALSE, tidy = FALSE, dev = c("png"), cache = TRUE )
dreamlet()
evaluates precision-weighted linear (mixed) models on each gene that passes standard filters. The linear mixed model used by dream()
can be a little fragile for small sample sizes and correlated covariates. dreamlet()
runs variancePartition::dream()
in the backend for each cell cluster. dream()
reports model failures for each cell cluster and dreamlet()
reports these failures to the user. dreamlet()
returns all successful model fits to be used for downstream analysis.
See details from variancePartition
error page.
Due to a recent bug in the dependency Matrix
package, all random effects models may fail for technical reasons. If your random effects analysis is failing for all genes in cases with no good explanation, this bug may be responsible. This case can be detected and resolved as follows:
library(lme4) # Fit simple mixed model lmer(Reaction ~ (1 | Subject), sleepstudy) # Error in initializePtr() : # function 'chm_factor_ldetL2' not provided by package 'Matrix'
This error indicates incompatible installs of Matrix
and lme4
. This can be solved with
r
install.packages("lme4", type = "source")
followed by restarting R.
The most common issue is when dreamlet()
analysis succeeds for most genes, but a handful of genes fail in each cell cluster. These genes can fail if the iterative process of fitting the linear mixed model does not converge, or if the estimated covariance matrix that is supposed be positive definite has an eigen-value that is negative or too close to zero due to rounding errors in floating point arithmetic.
In these cases, dreamlet()
stores a summary of these failures for all cell clusters that is accessible with details()
. Specific failure messages for each cell cluster and gene can be extracted using seeErrors()
Here we demonstrate how dreamlet()
handles model failures:
library(dreamlet) library(muscat) library(SingleCellExperiment) data(example_sce) # create pseudobulk for each sample and cell cluster pb <- aggregateToPseudoBulk(example_sce, assay = "counts", cluster_id = "cluster_id", sample_id = "sample_id", verbose = FALSE ) # voom-style normalization for each cell cluster res.proc <- processAssays( pb[1:300, ], ~group_id ) # Redundant formula # This example is an extreme example of redundancy # but more subtle cases often show up in real data form <- ~ group_id + (1 | group_id) # fit dreamlet model res.dl <- dreamlet(res.proc, form) ## B cells...7.9 secs ## CD14+ Monocytes...10 secs ## CD4 T cells...9 secs ## CD8 T cells...4.4 secs ## FCGR3A+ Monocytes...11 secs ## ## Of 1,062 models fit across all assays, 96.2% failed # summary of models res.dl ## class: dreamletResult ## assays(5): B cells CD14+ Monocytes CD4 T cells CD8 T cells FCGR3A+ Monocytes ## Genes: ## min: 3 ## max: 11 ## details(7): assay n_retain ... n_errors error_initial ## coefNames(2): (Intercept) group_idstim ## ## Of 1,062 models fit across all assays, 96.2% failed # summary of models for each cell cluster details(res.dl) ## assay n_retain formula formDropsTerms n_genes n_errors error_initial ## 1 B cells 4 ~group_id + (1 | group_id) FALSE 201 190 FALSE ## 2 CD14+ Monocytes 4 ~group_id + (1 | group_id) FALSE 269 263 FALSE ## 3 CD4 T cells 4 ~group_id + (1 | group_id) FALSE 216 207 FALSE ## 4 CD8 T cells 4 ~group_id + (1 | group_id) FALSE 118 115 FALSE ## 5 FCGR3A+ Monocytes 4 ~group_id + (1 | group_id) FALSE 258 247 FALSE
assay
: cell type n_retain
: number of samples retainedformula
: regression formula used after variable filtering formDropsTerms
: whether a variable was dropped from the formula for having zero variance following filteringn_genes
: number of genes analyzedn_errors
: number of genes with errorserror_initial
: indicator for assay-level errorBefore the full dataset is analyzed, dreamlet()
runs a test for each assay to see if the model succeeds. If the model fails, its does not continue analysis for that assay. These assay-level errors are reported above in the error_initial
column, and details are returned here.
# Extract errors as a tibble res.err = seeErrors(res.dl) ## Assay-level errors: 0 ## Gene-level errors: 1038 # No errors at the assay level res.err$assayLevel # the most common error is: "Some predictor variables are on very different scales: consider rescaling"
This indicates that the scale of the predictor variables are very different and can affect the numerical stability of the iterative algorithm. This can be solved by running scale()
on each variable in the formula:
form = ~ scale(x) + scale(y) + ...
A model can fail for a single gene if covariates are too correlated, or for other numerical issues. Failed models are reported here and are not included in downstream analysis.
# See gene-level errors for each assay res.err$geneLevel[1:2,] ## # A tibble: 2 × 3 ## assay feature errorText ## <chr> <chr> <chr> ## B cells ISG15 "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol):… ## B cells AURKAIP1 "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol):… # See full error message text res.err$geneLevel$errorText[1] "Error in lmerTest:::as_lmerModLT(model, devfun, tol = tol): (converted from warning) Model may not have converged with 1 eigenvalue close to zero: 1.4e-09\n"
This message indicates that the model was numerically unstable likely because the variables are closely correlated.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.