scglmmr: Sample-level Single-cell Generalized Linear Multilevel Models in R

Single cell gene and module differential expression with nested group repeated measures experiment designs ================

Single cell gene module level testing.
Load and normalize single cell data
Define modules to fit
Format cell metadata
Quality control cell types to be fit
Calculate module scores
Fit modules
Examine results
a custom function for automatically extracting effects from a 2 group repeated measures treatment design

library(scglmmr)

Compare single cell level module score responses across groups. As in the pseudobulk vignette, fit models within each cell type adjusting for covariates using a random intercept term for each donor. Functions used in this workflow include:

WeightedCellModuleScore - score each module activity for each cell; any method can be used here. Below we discuss relative vs absolute scaling across cells vs within cell types for effect size interpretation.

FitLmer - fits a linear model using lme4 notation to the module scores. Returns a nested list of model results that can be queried to extract effects of interest.

FitLmerContrast - as above, but specifically designed for 2 group 2 timepoint repeated measures perturbation designs. Automatically calculates baseline differences, treatment effect across subjects and difference in treatment effects between groups using the emmeans package.

A plot as shown below is generated for each module x cell type.

suppressMessages(library(tidyverse))
suppressMessages(library(Seurat))
suppressMessages(library(here))
suppressMessages(library(scglmmr))

# demonstrate using Seurat object
s = readRDS(here("singlecellobject.rds"))

# create a vector of celltypes 
ct = unique(s@meta.data$celltype)

# normalize with log normalization
s = NormalizeData(s,normalization.method = 'LogNormalize',assay = 'RNA')

These functions can be useful to further interrogate the pathways identified in the pseudobulk pipeline or for unbiased analysis. Here for demonstration purposes, a subset of the hallmark pathways are used.

sigs_test = hallmark[c(22, 23, 24,25,26,27)]

# format cell metadata as factors group_id is order leveled for: 
md = s@meta.data %>% 
  filter(timepoint %in% c(0,1)) %>% 
  mutate(group_id = paste(group, timepoint, sep = '_')) %>% 
  mutate(group_id = factor(group_id,  levels = c('0_0', '0_1', '1_0', '1_1'))) %>%
  select(celltype, subjectid, sex , age, timepoint, group_id) %>% 
  mutate(age = as.numeric(age)) %>% 
  droplevels()

If a cell type is very donor specific such that there are only cells for a couple of subjects we cant interpret the group level effects so we remove these cell types.

# qc data to remove celltypes with no cells for some subhects at both timepoints 
# keeps MLE more stable for the estimates of random intercept 
ct.si = apply(table(md$celltype, md$subjectid) , 1, min) 
c.keep = names(ct.si[ct.si > 7])
md = md[md$celltype %in% c.keep, ]

This can be done in multiple ways. The function below calculates the simple average. Note that for interpretability of the efects within each cluster, the score is being applied and scaled within each cell type.


# add single cell weighted module scores
# split to standardize within cell type 
ct.md = split(md, f = md$celltype)

# get umi data 
umi = s@assays$RNA@data[ ,rownames(md)]

# fit module scores 
mod_scores = lapply(ct.md, function(x){ 
  scglmmr::WeightedCellModuleScore(gene_matrix = umi[ ,rownames(x)], 
                                   module_list = sigs_test, 
                                   threshold = 0,
                                   # standardize within protein celltype
                                   cellwise_scaling = TRUE, 
                                   return_weighted = FALSE )
  })
ms = bind_rows(mod_scores)

# correctly order rows after the split. 
ms = ms[match(x = rownames(md), table = rownames(ms)), ]
stopifnot(all.equal(rownames(ms), rownames(md)))

Specify a model using lme4 notation and fit models. Note that the LHS outcome “modulescore” should not be changed as this is adaptively applied to each module internally.



# specify model 
f1 = 'modulescore ~ 0 + timepoint + age + sex + (1|subjectid)'

# specify a path to save plots 
plot_savepath = here('/mypath/')


# fit sc mod mixed model on ewighted module scores. 
mmres = FitLmer(module_data_frame = ms,
                celltype_column = 'celltype',
                metadata = md, 
                lme4.formula = f1)

Results are formatted as a list indexed by cell type. If we want to find the effects of time on expression we can use the following code to extract out this effect as shown below using the emmeans package. This allows maximum flexibility since any effect can be extracted and custom contrast matrices can be utilized – see emmeans documentation.


# write wrapper to extract out time effect.
extract.time.effect = function(fit.list.celltype){ 
  c_res = list()
  for(u in length(fit.list.celltype)){ 
    emm1 = tryCatch(
        emmeans::emmeans(object = m1, specs = ~ timepoint , data = res, lmer.df = "asymptotic"),
        error = function(e) return(NA)
        )
    c_res[[u]] = emm1
  }
}

# now we can apply this to each cell type. 
time.result = lapply(mmres , extract.time.effect)

This function was written for a 2 group design (as shown in the main readme) where subjects each have a pre and post perturbation measurement. WE wantto compare the pertuebation effects across subjects, and between groups and test the baseline differences. All of these effects are extracted automatically into a simple output, and plots are created as above. For this function, on the RHS, + (1\|subjectid) should also not be changed. One can also specify fixed effects covariates as a vector.


f1 = 'modulescore ~ 0 + groupid + age + sex + (1|subjectid)'

# fit sc mod mixed model on ewighted module scores. 
mm_res.m1 = FitLmerContrast(module_data_frame = ms, 
                            celltype_column = 'celltype', 
                            metadata = md, 
                            lmer_formula = f1, 
                            plotdatqc = TRUE, 
                            fixed_effects = NULL,
                            figpath = plot_savepath)
# saveRDS(mm_res.m1,file = paste0(datapath, "mm_res.m1.rds"))

Note that the effect size of specific module scores may be compared using the code below to determine which celltype has the greatest absolute effect size (epression) of a given module. Both the between and within cell type expression group level differences are likely of interest for interpretation.

ms.full = scglmmr::WeightedCellModuleScore(
  gene_matrix = umi, 
  module_list = sigs_test, 
  threshold = 0,
  # standardize within protein celltype
  cellwise_scaling = TRUE, 
  return_weighted = FALSE 
  )

d = cbind(ms.full, md)

# absolute effect size for hallmark 
f1 = as.formula(HALLMARK_HYPOXIA ~ 0 + celltype + age + sex + (1|subjecctID))
m1 = lme4::lmer(f1, data = bl)
emm1 = emmeans::emmeans(m1, specs = ~celltype)
plot(emm1)

MattPM/scglmmr documentation built on April 26, 2024, 12:55 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MattPM/scglmmr
Sample-level Single-cell Generalized Linear Multilevel Models in R

vignettes/scglmmr_singlecell.md
In MattPM/scglmmr: Sample-level Single-cell Generalized Linear Multilevel Models in R

Single cell gene module level testing.

Load and normalize single cell data

Define modules to fit

Format cell metadata

Quality control cell types to be fit

Calculate module scores

Fit modules

Examine results

a custom function for automatically extracting effects from a 2 group repeated measures treatment design

R Package Documentation

Browse R Packages

We want your feedback!

MattPM/scglmmr Sample-level Single-cell Generalized Linear Multilevel Models in R

vignettes/scglmmr_singlecell.md In MattPM/scglmmr: Sample-level Single-cell Generalized Linear Multilevel Models in R

Single cell gene module level testing.

Load and normalize single cell data

Define modules to fit

Format cell metadata

Quality control cell types to be fit

Calculate module scores

Fit modules

Examine results

a custom function for automatically extracting effects from a 2 group repeated measures treatment design

R Package Documentation

Browse R Packages

We want your feedback!

MattPM/scglmmr
Sample-level Single-cell Generalized Linear Multilevel Models in R

vignettes/scglmmr_singlecell.md
In MattPM/scglmmr: Sample-level Single-cell Generalized Linear Multilevel Models in R