library(MAGMA.Celltyping) library(dplyr)
MAGMA.Celltyping
is a software package that facilitates conducting cell-type-specific enrichment tests on GWAS summary statistics.
The following vignette shows the step-by-step version of the MAGMA.Celltyping
pipeline. For the newer streamlined version of the pipeline (that wraps many of these functions),
please see the Getting Started vignette.
get_example_gwas()
.gwas_sumstats_path <- MAGMA.Celltyping::get_example_gwas( trait = "fluid_intelligence", munged = FALSE)
Our lab have created a robust Bioconductor package for formatting multiple types
of summary statistics files: MungeSumstats
(please cite the associated publication if you use this package):
MungeSumstats
here to reformat the summary statistics for Fluid Intelligence.
The minimum info needed after munging is: Genome builds:
MungeSumstats
will infer this information by default (simply set ref_genome=NULL
).path_formatted <- MungeSumstats::format_sumstats(path=gwas_sumstats_path, save_path = tempfile(fileext = ".formatted.tsv.gz"), ref_genome ="GRCh37")
For the sake of this example, we also provide a pre-munged version of the above file. If you use this, you can skip the previous two steps.
path_formatted_intelligence <- MAGMA.Celltyping::get_example_gwas( trait = "fluid_intelligence", munged = TRUE)
Next, we must convert the GWAS summary statistics file to a gene-level signature so that we can compare it to a gene-level transcriptomic cell-type reference.
Note you can input the genome build of your summary statistics for this step or
it can be inferred if left NULL
:
genesOutPath_intelligence <- MAGMA.Celltyping::map_snps_to_genes( path_formatted = gwas_sumstats_path, genome_build = "GRCh37")
The second step in MAGMA.Celltyping
is to prepare your Cell Type Dataset (CTD) reference, from which we will get gene signatures for each cell-type in the reference.
The ewceData
package comes with a CTD which we use as an example.
If you want to import your own single cell RNA-seq dataset, then this needs converting into CTD format; please see the EWCE
vingette for explanation of how to do this.
ctd <- ewceData::ctd()
Note that the cell type dataset loaded in the code above is the Karolinksa cortex/hippocampus data only. For the full Karolinska dataset with hypothalamus and midbrain instead use the following:
ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI")
You can find a list of other datasets provided by this package using ?MAGMA.Celltyping::get_ctd
:
ctd <- get_ctd("ctd_Tasic") ctd <- get_ctd("ctd_DivSeq") ctd <- get_ctd("ctd_AIBS") ctd <- get_ctd("ctd_DRONC_human") ctd <- get_ctd("ctd_DRONC_mouse") ctd <- get_ctd("ctd_BlueLake2018_FrontalCortexOnly") ctd <- get_ctd("ctd_BlueLake2018_VisualCortexOnly") ctd <- get_ctd("ctd_Saunders") ctd <- get_ctd("ctd_Zeisel2018")
Preparing quantiles in the CTD is now something that is automatically handled by by MAGMA.Celltyping
so the user doesn't need to do this themselves as a pre-step anymore.
However, we still export the function prepare_quantile_groups
in case users want to have more control.
prepare_quantile_groups
calculates the quantile groups for each celltype within the single-cell dataset. If your single-cell dataset is not from mouse, then change the ctd_species
argument.
ctd <- MAGMA.Celltyping::prepare_quantile_groups(ctd = ctd, ctd_species="mouse") # Examine how the quantile groups look print(ctd[[1]]$specificity_quantiles[c("Gfap","Dlg4","Aif1"),]) print(table(ctd[[1]]$specificity_quantiles[,1]))
The analyses can be run in either linear or top10% enrichment modes. Each have their own advantages/disadvantages.
Let's start with linear:
ctAssocsLinear <- MAGMA.Celltyping::calculate_celltype_associations( ctd = ctd, gwas_sumstats_path = path_formatted_intelligence, ctd_species = "mouse") FigsLinear <- MAGMA.Celltyping::plot_celltype_associations( ctAssocs = ctAssocsLinear, ctd = ctd)
Now let's add the top 10% mode
ctAssocsTop <- MAGMA.Celltyping::calculate_celltype_associations( ctd = ctd, gwas_sumstats_path = path_formatted_intelligence, EnrichmentMode = "Top 10%") FigsTopDecile <- MAGMA.Celltyping::plot_celltype_associations( ctAssocs = ctAssocsTop, ctd = ctd)
Then plot linear together with the top 10% mode.
ctAssocMerged <- MAGMA.Celltyping::merge_magma_results( ctAssoc1 = ctAssocsLinear, ctAssoc2 = ctAssocsTop) FigsMerged <- MAGMA.Celltyping::plot_celltype_associations( ctAssocs = ctAssocMerged, ctd = ctd)
Sometimes a single cell-type can dominate the signature in a given GWAS (e.g. pyramidal cells in Schizophrenia). This makes it difficult to disentangle whether other cell-types are truly associated with the GWAS trait, or whether the are simply enriched because they have an overlapping signature with the top cell-type. Conditional mode allows you to control for one (or several) cell-type(s) and see whether other cell-types still remain significantly enriched.
By default, it is assumed that you want to run the linear enrichment analysis. There are two modes for conditional analyses, you can either control for the top N cell types from the baseline analysis (in which case, set controlTopNcells
) or control for specific specified cell types (in which case, set controlledCTs
).
ctCondAssocs <- MAGMA.Celltyping::calculate_conditional_celltype_associations( ctd = ctd, gwas_sumstats_path = path_formatted_intelligence, analysis_name = "Conditional", controlTopNcells= 2) if(!is.null(ctCondAssocs)){ MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctCondAssocs, ctd = ctd) }
Let's try as an alternative to control for expression of 3 neuronal subtypes at the same time:
ctCondAssocs <- MAGMA.Celltyping::calculate_conditional_celltype_associations( ctd = ctd, gwas_sumstats_path = path_formatted_intelligence, analysis_name = "Conditional", controlledCTs = c("pyramidal CA1","pyramidal SS","interneurons"), controlledAnnotLevel = 1) MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctCondAssocs, ctd=ctd)
Note that Periventricular Microglia (PVM) go from totally non-significant to significant once the neurons are controlled for. Test if this change is significant as follows:
magma1 = ctCondAssocs[[2]]$results[ctCondAssocs[[2]]$results$CONTROL=="BASELINE",] magma2 = ctCondAssocs[[2]]$results[ctCondAssocs[[2]]$results$CONTROL=="interneurons,pyramidal_CA1,pyramidal_SS",] resCompared <- MAGMA.Celltyping:::compare_trait_enrichments(magma1=magma1, magma2=magma2, annotLevel=2, ctd=ctd) resCompared[1:3,]
Using this approach we can see that the increased enrichment is microglia in the controlled analysis is almost significantly increased relative to the baseline analysis.
Conditional analyses can also be performed with top 10% mode (although the conditioning is done in linear mode).
ctCondAssocsTopTen = MAGMA.Celltyping::calculate_conditional_celltype_associations( ctd = ctd, gwas_sumstats_path = path_formatted_intelligence, analysis_name = "Conditional", controlledCTs=c("pyramidal CA1","pyramidal SS","interneurons"), controlledAnnotLevel=1, EnrichmentMode = "Top 10%") MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctCondAssocsTopTen, ctd = ctd)
We now want to test enrichments that remain in a GWAS after we control for a second GWAS. So let's download a second GWAS sumstats file and prepare it for analysis.
20016.assoc.tsv is the sumstats file for 'Fluid Intelligence Score' from the UK Biobank.
So let's subtract genes associated with prospective memory from those involved in fluid intelligence.
#### Download pre-munged GWAS summary stats #### gwas_sumstats_path_memory <- MAGMA.Celltyping::get_example_gwas( trait = "prospective_memory", munged = FALSE) #### Format summary stats #### ## and infer the genome build from the data whilst doing so (see output messages) path_formatted_memory <- MungeSumstats::format_sumstats( path = gwas_sumstats_path_memory) #### Can also skip ahead by importing pre-munged summary stats #### # path_formatted_memory <- MAGMA.Celltyping::get_example_gwas( # trait = "prospective_memory", # munged = TRUE) #### Map SNPs to genes #### genesOutPath_memory <- MAGMA.Celltyping::map_snps_to_genes( path_formatted = path_formatted_memory, genome_build = "GRCH37")
ctAssocsLinear_memory <- MAGMA.Celltyping::calculate_celltype_associations( ctd = ctd, gwas_sumstats_path = path_formatted_memory, ctd_species = "mouse") ctAssocsLinear_intelligence <- MAGMA.Celltyping::calculate_celltype_associations( ctd = ctd, gwas_sumstats_path = path_formatted_intelligence, ctd_species = "mouse") MAGMA.Celltyping::plot_celltype_associations(ctAssocs = ctAssocsLinear_memory, ctd = ctd)
ctAssocMerged_MemInt <- MAGMA.Celltyping::merge_magma_results( ctAssoc1 = ctAssocsLinear_memory, ctAssoc2 = ctAssocsLinear_intelligence) FigsMerged_MemInt <- lapply(seq_len(length(ctd)), function(lvl){ MAGMA.Celltyping::magma_tileplot(ctd=ctd, results=ctAssocMerged_MemInt[[lvl]]$results, annotLevel=lvl, fileTag=paste0("Merged_MemInt_lvl",lvl)) })
# Set paths for GWAS sum stats + .genes.out file (with the z-scores) ctAssocsLinear <- MAGMA.Celltyping::calculate_celltype_associations( ctd = ctd, ctd_species = "mouse", gwas_sumstats_path = path_formatted_intelligence, genesOutCOND = genesOutPath_memory, analysis_name = "ControllingForPropMemory") FigsLinear <- MAGMA.Celltyping::plot_celltype_associations( ctAssocs = ctAssocsLinear, ctd = ctd, fileTag = "ControllingForPropMemory")
We find that after controlling for prospective memory, there is no significant enrichment left associated with fluid intelligence.
Instead of relying on the tool MAGMA (which requires an extra installation step), we can also compute cell-type enrichment scores in an R-native re-implementation of the MAGMA enrichment analyses using the following functions:
ctd <- ewceData::ctd() magma_GenesOut_file <- MAGMA.Celltyping::import_magma_files( file_types = "genes.out", return_dir = FALSE) magmaAdjZ <- MAGMA.Celltyping:: adjust_zstat_in_genesOut( ctd = ctd, ctd_species = "mouse", magma_GenesOut_file = magma_GenesOut_file) output <- MAGMA.Celltyping:: calculate_celltype_enrichment_limma( magmaAdjZ = magmaAdjZ, ctd = ctd, thresh = 0.0001, ctd_species = "mouse", annotLevel = 2)
We can then get the probability of the celltype being enriched as follows
print(sort(output))
The results should closely resemble those obtained using MAGMA
To test whether a gene set is enriched using MAGMA the following commands can be used:
data("rbfox_binding") geneset_res <- MAGMA.Celltyping::calculate_geneset_enrichment( geneset = rbfox_binding, gwas_sumstats_path = path_formatted_intelligence, analysis_name = "Rbfox_20016", geneset_species = "mouse") print(geneset_res)
We can then test whether the gene set is still enriched after controlling for celltype enrichment:
ctd <- MAGMA.Celltyping::get_ctd("ctd_allKI") analysis_name <- "Rbfox_16_pyrSS" controlledCTs <- c("pyramidal SS") cond_geneset_res_pyrSS = MAGMA.Celltyping::calculate_conditional_geneset_enrichment( geneset = geneset, ctd = ctd, controlledAnnotLevel = 1, controlledCTs = controlledCTs, gwas_sumstats_path = gwas_sumstats_path, analysis_name = analysis_name, ctd_species = "mouse") controlledCTs <- c("pyramidal CA1") cond_geneset_res_pyrCA1 <- MAGMA.Celltyping::calculate_conditional_geneset_enrichment( geneset = geneset, ctd = ctd, controlledAnnotLevel = 1, controlledCTs = controlledCTs, gwas_sumstats_path = gwas_sumstats_path, analysis_name = analysis_name, ctd_species = "mouse") controlledCTs <- c("pyramidal CA1","pyramidal SS") cond_geneset_res_pyr = MAGMA.Celltyping::calculate_conditional_geneset_enrichment( geneset = geneset, ctd = ctd, controlledAnnotLevel = 1, controlledCTs = controlledCTs, gwas_sumstats_path = gwas_sumstats_path, analysis_name = analysis_name, ctd_species = "mouse") controlledCTs <- c("Medium Spiny Neuron") cond_geneset_res_MSN = MAGMA.Celltyping::calculate_conditional_geneset_enrichment( geneset = geneset, ctd = ctd, controlledAnnotLevel = 1, controlledCTs = controlledCTs, gwas_sumstats_path = gwas_sumstats_path, analysis_name = analysis_name, ctd_species = "mouse") controlledCTs <- c("Medium Spiny Neuron","pyramidal CA1","pyramidal SS","interneurons") cond_geneset_res = MAGMA.Celltyping::calculate_conditional_geneset_enrichment( geneset = geneset, ctd = ctd, controlledAnnotLevel = 1, controlledCTs = controlledCTs, gwas_sumstats_path = gwas_sumstats_path, analysis_name = analysis_name, ctd_species = "mouse")
utils::sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.