library(MAGMA.Celltyping) 
library(dplyr)

Intro

MAGMA.Celltyping is a software package that facilitates conducting cell-type-specific enrichment tests on GWAS summary statistics.

The following vignette shows the step-by-step version of the MAGMA.Celltyping pipeline. For the newer streamlined version of the pipeline (that wraps many of these functions), please see the Getting Started vignette.

Prepare data

Import GWAS

gwas_sumstats_path <- MAGMA.Celltyping::get_example_gwas(
  trait = "fluid_intelligence",
  munged = FALSE)

Munge GWAS

Our lab have created a robust Bioconductor package for formatting multiple types of summary statistics files: MungeSumstats (please cite the associated publication if you use this package):

Murphy, Schilder, & Skene, MungeSumstats: a Bioconductor package for the standardization and quality control of many GWAS summary statistics, Bioinformatics, Volume 37, Issue 23, 1 December 2021, Pages 4593–4596, https://doi.org/10.1093/bioinformatics/btab665

Genome builds:

path_formatted <- MungeSumstats::format_sumstats(path=gwas_sumstats_path,
                                                 save_path = tempfile(fileext = ".formatted.tsv.gz"),
                                                 ref_genome ="GRCh37")

For the sake of this example, we also provide a pre-munged version of the above file. If you use this, you can skip the previous two steps.

path_formatted_intelligence <- MAGMA.Celltyping::get_example_gwas(
    trait = "fluid_intelligence", 
    munged = TRUE)

Map SNPs to Genes

Next, we must convert the GWAS summary statistics file to a gene-level signature so that we can compare it to a gene-level transcriptomic cell-type reference.

Note you can input the genome build of your summary statistics for this step or it can be inferred if left NULL:

genesOutPath_intelligence <- MAGMA.Celltyping::map_snps_to_genes(
    path_formatted = gwas_sumstats_path,
    genome_build = "GRCh37")

CellTypeDataset

Import CTD

The second step in MAGMA.Celltyping is to prepare your Cell Type Dataset (CTD) reference, from which we will get gene signatures for each cell-type in the reference.

The ewceData package comes with a CTD which we use as an example.

If you want to import your own single cell RNA-seq dataset, then this needs converting into CTD format; please see the EWCE vingette for explanation of how to do this.

ctd <- ewceData::ctd()

Note that the cell type dataset loaded in the code above is the Karolinksa cortex/hippocampus data only. For the full Karolinska dataset with hypothalamus and midbrain instead use the following:

ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")

You can find a list of other datasets provided by this package using ?MAGMA.Celltyping::get_ctd:

ctd <- get_ctd("ctd_Tasic")
ctd <- get_ctd("ctd_DivSeq")
ctd <- get_ctd("ctd_AIBS")
ctd <- get_ctd("ctd_DRONC_human")
ctd <- get_ctd("ctd_DRONC_mouse")
ctd <- get_ctd("ctd_BlueLake2018_FrontalCortexOnly")
ctd <- get_ctd("ctd_BlueLake2018_VisualCortexOnly")
ctd <- get_ctd("ctd_Saunders")
ctd <- get_ctd("ctd_Zeisel2018")

Prepare quantiles

Preparing quantiles in the CTD is now something that is automatically handled by by MAGMA.Celltyping so the user doesn't need to do this themselves as a pre-step anymore.

However, we still export the function prepare_quantile_groups in case users want to have more control.

prepare_quantile_groups calculates the quantile groups for each celltype within the single-cell dataset. If your single-cell dataset is not from mouse, then change the ctd_species argument.

ctd <- MAGMA.Celltyping::prepare_quantile_groups(ctd = ctd,
                                                 ctd_species="mouse")
# Examine how the quantile groups look
print(ctd[[1]]$specificity_quantiles[c("Gfap","Dlg4","Aif1"),])
print(table(ctd[[1]]$specificity_quantiles[,1]))

Run the main cell type association analysis

The analyses can be run in either linear or top10% enrichment modes. Each have their own advantages/disadvantages.

Let's start with linear:

ctAssocsLinear <- MAGMA.Celltyping::calculate_celltype_associations(
  ctd = ctd,
  gwas_sumstats_path = path_formatted_intelligence, 
  ctd_species = "mouse")

FigsLinear <- MAGMA.Celltyping::plot_celltype_associations(
  ctAssocs = ctAssocsLinear,
  ctd = ctd)

Now let's add the top 10% mode

ctAssocsTop <- MAGMA.Celltyping::calculate_celltype_associations(
  ctd = ctd,
  gwas_sumstats_path = path_formatted_intelligence, 
  EnrichmentMode = "Top 10%")

FigsTopDecile <- MAGMA.Celltyping::plot_celltype_associations(
  ctAssocs = ctAssocsTop,
  ctd = ctd)

Then plot linear together with the top 10% mode.

ctAssocMerged <- MAGMA.Celltyping::merge_magma_results(
  ctAssoc1 = ctAssocsLinear,
  ctAssoc2 = ctAssocsTop)

FigsMerged <- MAGMA.Celltyping::plot_celltype_associations(
  ctAssocs = ctAssocMerged,
  ctd = ctd)

Conditional analysis (linear mode)

Sometimes a single cell-type can dominate the signature in a given GWAS (e.g. pyramidal cells in Schizophrenia). This makes it difficult to disentangle whether other cell-types are truly associated with the GWAS trait, or whether the are simply enriched because they have an overlapping signature with the top cell-type. Conditional mode allows you to control for one (or several) cell-type(s) and see whether other cell-types still remain significantly enriched.

By default, it is assumed that you want to run the linear enrichment analysis. There are two modes for conditional analyses, you can either control for the top N cell types from the baseline analysis (in which case, set controlTopNcells) or control for specific specified cell types (in which case, set controlledCTs).

ctCondAssocs <- MAGMA.Celltyping::calculate_conditional_celltype_associations(
  ctd = ctd,
  gwas_sumstats_path = path_formatted_intelligence, 
  analysis_name = "Conditional",
  controlTopNcells= 2)

if(!is.null(ctCondAssocs)){ 
    MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctCondAssocs,
                                                 ctd = ctd)
}

Let's try as an alternative to control for expression of 3 neuronal subtypes at the same time:

ctCondAssocs <- MAGMA.Celltyping::calculate_conditional_celltype_associations(
  ctd = ctd,
  gwas_sumstats_path = path_formatted_intelligence, 
  analysis_name = "Conditional",
  controlledCTs = c("pyramidal CA1","pyramidal SS","interneurons"),
  controlledAnnotLevel = 1)

MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctCondAssocs,
                                             ctd=ctd)

Note that Periventricular Microglia (PVM) go from totally non-significant to significant once the neurons are controlled for. Test if this change is significant as follows:

magma1 = ctCondAssocs[[2]]$results[ctCondAssocs[[2]]$results$CONTROL=="BASELINE",]
magma2 = ctCondAssocs[[2]]$results[ctCondAssocs[[2]]$results$CONTROL=="interneurons,pyramidal_CA1,pyramidal_SS",]

resCompared <- MAGMA.Celltyping:::compare_trait_enrichments(magma1=magma1,
                                                            magma2=magma2,
                                                            annotLevel=2,
                                                            ctd=ctd)
resCompared[1:3,]

Using this approach we can see that the increased enrichment is microglia in the controlled analysis is almost significantly increased relative to the baseline analysis.

Conditional analyses (top 10% mode)

Conditional analyses can also be performed with top 10% mode (although the conditioning is done in linear mode).

ctCondAssocsTopTen = MAGMA.Celltyping::calculate_conditional_celltype_associations(
  ctd = ctd,
  gwas_sumstats_path = path_formatted_intelligence, 
  analysis_name = "Conditional",
  controlledCTs=c("pyramidal CA1","pyramidal SS","interneurons"),
  controlledAnnotLevel=1,
  EnrichmentMode = "Top 10%")

MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctCondAssocsTopTen,
                                             ctd = ctd)

Controlling for a second GWAS

We now want to test enrichments that remain in a GWAS after we control for a second GWAS. So let's download a second GWAS sumstats file and prepare it for analysis.

20016.assoc.tsv is the sumstats file for 'Fluid Intelligence Score' from the UK Biobank.

So let's subtract genes associated with prospective memory from those involved in fluid intelligence.

Download and prepare the 'Prospective memory' GWAS summary statistics

#### Download pre-munged GWAS summary stats ####
gwas_sumstats_path_memory <- MAGMA.Celltyping::get_example_gwas(
    trait = "prospective_memory",
    munged = FALSE)
#### Format summary stats ####
## and infer the genome build from the data whilst doing so (see output messages)
path_formatted_memory <- MungeSumstats::format_sumstats(
    path = gwas_sumstats_path_memory)

#### Can also skip ahead by importing pre-munged summary stats ####
# path_formatted_memory <- MAGMA.Celltyping::get_example_gwas(
#     trait = "prospective_memory",
#     munged = TRUE)

#### Map SNPs to genes ####
genesOutPath_memory <- MAGMA.Celltyping::map_snps_to_genes(
    path_formatted = path_formatted_memory,
    genome_build = "GRCH37")

Check which cell types this GWAS is associated with at baseline

ctAssocsLinear_memory <- MAGMA.Celltyping::calculate_celltype_associations(
  ctd = ctd,
  gwas_sumstats_path = path_formatted_memory, 
  ctd_species = "mouse")

ctAssocsLinear_intelligence <- MAGMA.Celltyping::calculate_celltype_associations(
  ctd = ctd,
  gwas_sumstats_path = path_formatted_intelligence, 
  ctd_species = "mouse")

MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctAssocsLinear_memory,
                                             ctd = ctd)

Compare enrichments in the two GWAS using a tile plot

ctAssocMerged_MemInt <- MAGMA.Celltyping::merge_magma_results(
  ctAssoc1 = ctAssocsLinear_memory,
  ctAssoc2 = ctAssocsLinear_intelligence)

FigsMerged_MemInt <- lapply(seq_len(length(ctd)), 
                            function(lvl){
    MAGMA.Celltyping::magma_tileplot(ctd=ctd,
                          results=ctAssocMerged_MemInt[[lvl]]$results,
                          annotLevel=lvl,
                          fileTag=paste0("Merged_MemInt_lvl",lvl))
}) 

Check which cell types 'Fluid Intelligence' is associated with after controlling for 'Prospective memory'

# Set paths for GWAS sum stats + .genes.out file (with the z-scores)  
ctAssocsLinear <- MAGMA.Celltyping::calculate_celltype_associations(
  ctd = ctd,
  ctd_species = "mouse", 
  gwas_sumstats_path = path_formatted_intelligence, 
  genesOutCOND = genesOutPath_memory,
  analysis_name = "ControllingForPropMemory")

FigsLinear <- MAGMA.Celltyping::plot_celltype_associations(
    ctAssocs = ctAssocsLinear,
    ctd = ctd,
    fileTag = "ControllingForPropMemory")

We find that after controlling for prospective memory, there is no significant enrichment left associated with fluid intelligence.

Calculate cell type enrichments directly (using linear model)

Instead of relying on the tool MAGMA (which requires an extra installation step), we can also compute cell-type enrichment scores in an R-native re-implementation of the MAGMA enrichment analyses using the following functions:

ctd <- ewceData::ctd()
magma_GenesOut_file <- MAGMA.Celltyping::import_magma_files(
    file_types = "genes.out",
    return_dir = FALSE)

magmaAdjZ <- MAGMA.Celltyping:: adjust_zstat_in_genesOut(
  ctd = ctd,
  ctd_species = "mouse",
  magma_GenesOut_file = magma_GenesOut_file)

output <- MAGMA.Celltyping:: calculate_celltype_enrichment_limma(
  magmaAdjZ = magmaAdjZ,
  ctd = ctd,
  thresh = 0.0001,
  ctd_species = "mouse",
  annotLevel = 2)

We can then get the probability of the celltype being enriched as follows

print(sort(output))

The results should closely resemble those obtained using MAGMA

Gene set enrichments

To test whether a gene set is enriched using MAGMA the following commands can be used:

data("rbfox_binding")

geneset_res <- MAGMA.Celltyping::calculate_geneset_enrichment(
  geneset = rbfox_binding,
  gwas_sumstats_path = path_formatted_intelligence,
  analysis_name = "Rbfox_20016",
  geneset_species = "mouse")
print(geneset_res)

We can then test whether the gene set is still enriched after controlling for celltype enrichment:

ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")

analysis_name <- "Rbfox_16_pyrSS"
controlledCTs <- c("pyramidal SS")

cond_geneset_res_pyrSS = MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
  geneset = geneset,
  ctd = ctd,
  controlledAnnotLevel = 1,
  controlledCTs = controlledCTs,
  gwas_sumstats_path = gwas_sumstats_path,
  analysis_name = analysis_name, 
  ctd_species = "mouse")

controlledCTs <- c("pyramidal CA1")
cond_geneset_res_pyrCA1 <- MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
  geneset = geneset,
  ctd = ctd,
  controlledAnnotLevel = 1,
  controlledCTs = controlledCTs,
  gwas_sumstats_path = gwas_sumstats_path,
  analysis_name = analysis_name, 
  ctd_species = "mouse")

controlledCTs <- c("pyramidal CA1","pyramidal SS")
cond_geneset_res_pyr = MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
  geneset = geneset,
  ctd = ctd,
  controlledAnnotLevel = 1,
  controlledCTs = controlledCTs,
  gwas_sumstats_path = gwas_sumstats_path,
  analysis_name = analysis_name, 
  ctd_species = "mouse")

controlledCTs <- c("Medium Spiny Neuron")
cond_geneset_res_MSN = MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
  geneset = geneset,
  ctd = ctd,
  controlledAnnotLevel = 1,
  controlledCTs = controlledCTs,
  gwas_sumstats_path = gwas_sumstats_path,
  analysis_name = analysis_name, 
  ctd_species = "mouse")

controlledCTs <- c("Medium Spiny Neuron","pyramidal CA1","pyramidal SS","interneurons")
cond_geneset_res = MAGMA.Celltyping::calculate_conditional_geneset_enrichment(
  geneset = geneset,
  ctd = ctd,
  controlledAnnotLevel = 1,
  controlledCTs = controlledCTs,
  gwas_sumstats_path = gwas_sumstats_path,
  analysis_name = analysis_name, 
  ctd_species = "mouse")

Session Info

utils::sessionInfo()



NathanSkene/MAGMA_Celltyping documentation built on Oct. 11, 2024, 7:19 a.m.