library(knitr)
knitr::opts_chunk$set(
 collapse = TRUE,
 comment = "#>",
  fig.path = "figures/vignette-",
 out.width = "60%"
)

1.Utilizing Real-World Examples in scPagwas for Comprehensive Testing

To showcase the visualization effects accurately, we have opted to use real-world examples from the referenced article instead of the software's provided demo dataset. In this scenario, we will be utilizing the BMMC scRNA-seq data in conjunction with the GWAS data for monocyte count. It is important to note that due to the large size of the data, the computation time for this analysis may take several minutes. Hence, we recommend directly downloading the result data for convenient examination and exploration. The processed monocytecount gwas data can be download from: - Onedrive: https://1drv.ms/t/s!As-aKqXDnDUHi6sx7Hqblj2Sgl7P8w?e=pwSq1Q

BMMC example scRNA-seq data can be obtained from: - Ondrive: https://1drv.ms/u/s!As-aKqXDnDUHjfcMjN4VGtAw1Utm0w?e=bOOBJl

We can download the result files from: - Ondrive: https://1drv.ms/u/s!As-aKqXDnDUHjLNvq64d2GAK_pIGVA?e=bZSawC

Code: 1234

If you want to run the deno data, please go to Conventional Usage Instructions for scPagwas with Demo Example Data

2. Computing the singlecell and celltype result for Monocytecount trait

If we running scPagwas in multi-core in server environment, there may cause an error: Error: Two levels of parallelism are used. See ?assert_cores\` add this code before call in R environment:

export OPENBLAS_NUM_THREADS=1

There is no need to run this code in window system.

In this example, we run the scPagwas for usual step, both running singlecell and celltype process.

library(scPagwas)
Pagwas<-scPagwas_main(Pagwas =NULL,
                     gwas_data="./monocytecount_prune_gwas_data.txt",
                     Single_data ="./Seu_Hema_data.rds",
                     output.prefix="mono",
                     output.dirs="monocytecount_bmmc",
                     Pathway_list=Genes_by_pathway_kegg,
                     n.cores=2,
                     assay="RNA",
                     singlecell=T, 
                     iters_singlecell = 100,
                     celltype=T,
                     block_annotation = block_annotation,
                     chrom_ld = chrom_ld)
save(Pagwas,file="./BMMC_scPagwas.RData")

Sometimes, we need to remove the objects in cache folder:

if(length(SOAR::Objects())>0){
 SOAR::Remove(SOAR::Objects()) 
}

3. Result interpretation

Data Results and Output in scPagwas:

3.1 Pagwas

> print(Pagwas)
An object of class Seurat 
16478 features across 35582 samples within 3 assays 
Active assay: RNA (15860 features, 5000 variable features)
 2 other assays present: scPagwasPaHeritability, scPagwasPaPca
 3 dimensional reductions calculated: pca, tsne, umap

> names(Pagwas@misc)
[1] "Pathway_list"                  "pca_cell_df"                   "lm_results"                   
[4] "bootstrap_results"             "scPathways_rankPvalue"         "GeneticExpressionIndex"
[7] "Merged_celltype_pvalue"
  1. In this Seruat result, two new assays were added:

    • scPagwasPaPca: An assay for S4 type of data; the svd score result for pathways in each cells;

    • scPagwasPaHeritability: An assay for S4 type of data; the gPas score matrix for pathways in each cells;

These two files are intermediate files generated by scPagwas.

  1. In meta.data, four columns were added:

  2. scPagwas.TRS.Score1: inheritance related score, enrichment socre for genetics top genes;

  3. scPagwas.upTRS.Score: Trait relevant score based on genetic relevant genes

  4. scPagwas.downTRS.Score: Trait relevant score based on anti-genetic relevant genes

  5. scPagwas.gPAS.score: Inheritance pathway regression effects score for each cells;

  6. Random_Correct_BG_p: CellpValue for each cells;

  7. Random_Correct_BG_adjp: fdr for each cells, adjust p value.

  8. Random_Correct_BG_z:z score for eahc cells.

  1. A new element names misc is added in scPagwas's result, Pagwas@misc is a list including:

    • Pathway_list: filtered pathway gene list;

    • pca_cell_df: a pahtway and celltype data frame for pathway svd(1'st pca) result for each celltype;

    • lm_results: the regression result for each celltype;

    • PCC: pearson correlation coefficients,heritability correlation value for each gene;

    • PCC_up: pearson correlation coefficients, high heritability correlation value for high gPas score of cells

    • PCC_down: pearson correlation coefficients, Low heritability correlation value for Low gPas score of cells

    • bootstrap_results: The bootstrap data frame results for celltypes including bootstrap pvalue and confidence interval.

    • Merged_celltype_pvalue: The resulting Merged_celltype_pvalue represents p-values at the cell type level, which differ from the bootstrap_results. The bootstrap_results are based on merging cell-level expression data into cell type-level expression data and performing calculations. In contrast, the Merged_celltype_pvalue is derived from the Random_Correct_BG_p results, combining the p-values of all cells within a specific cell type into a single p-value result for that cell type.

    Note: Although both the bootstrap and merged strategies yield cell type-level results, they may not be identical. This is because the bootstrap approach calculates p-values based on pseudo-bulk data, which can differ from the results obtained from single-cell calculations. In contrast, the merged strategy directly integrates the results from single-cell calculations, resulting in greater consistency with single-cell results. The primary focus in scPagwas paper is on the bootstrap_results; however, during the later stages of development, we discovered that discrepancies between cell type-level and single-cell-level calculation strategies posed some challenges in downstream analyses. Hence, in the new version development, we directly utilize the integration of single-cell results to obtain cell type-level p-values. Additionally, the major advantage of the bootstrap_results is its speed, as the calculation of Random_Correct_BG_p for single cells requires at least 100 iterations, making it computationally demanding. To avoid calculating Random_Correct_BG_p, setting the parameter iters_singlecell=0 is a good option.

3.2 Pagwas : output files result

image-20230828102322990

In monocytecount_bmmc folder, There is several result files :

scPagwas_cache is a temporary folder to save the SOAR data, you can remove it when finish the scPagwas.

4. Result Visualization

4.1 Visualize the scPagwas Score of single cell data in UMAP or TSNE plot.

1) DimPlot of singlecell data.

 require("RColorBrewer")
 require("Seurat")
 require("SeuratObject")
 require("ggsci")
 require("dplyr")
 require("ggplot2")
 require("ggpubr")
 #check the objects
color26 <- c("#D9DD6B","#ECEFA4","#D54C4C","#8D2828","#FDD2BF","#E98580","#DF5E5E","#492F10","#334257","#476072","#548CA8","#00A19D","#ECD662","#5D8233","#284E78","#3E215D","#835151","#F08FC0","#C6B4CE","#BB8760","#FFDADA","#3C5186","#558776","#E99497","#FFBD9B","#0A1D37")

Seurat::DimPlot(Pagwas,pt.size=1,reduction="tsne",label = T, repel=TRUE)+
 scale_colour_manual(name = "celltypes", values = color26)+
 umap_theme()+ggtitle("Monocyte BMMC")+labs(x="TSNE",y="")+theme(aspect.ratio=1)

scPagwas.TRS.Score1 and positive(p<0.05) cells showing in dimplot.

 scPagwas_Visualization(Single_data=Pagwas,
                        p_thre = 0.05,
                        FigureType = "tsne",
                        width = 7,
                        height = 7,
                        lowColor = "white", 
                        highColor = "red",
                        output.dirs="figure",
                        size = 0.5,
                        do_plot = T)

2) scPagwas.gPAS.score dimplot.

3) scPagwas.TRS.Score dimplot.

4)Random_Correct_BG_adjp dimplot.

Positive cells(q value<0.05): red dot; Negative cells: other dot.

4.2 Plot the barplot of the proportion of positive Cell.

Plot the barplot of the proportion of positive Cells in celltypes:

plot_bar_positie_nagtive(seurat_obj=Pagwas,
                         var_ident="celltypes",
                         var_group="positiveCells",
                         vec_group_colors=c("#E8D0B3","#7EB5A6"),
                         do_plot = T)

Plot the barplot of the proportion of celltypes in positive and negative Cells:

plot_bar_positie_nagtive(seurat_obj=Pagwas,
                              var_ident="positiveCells",
                              var_group="celltypes",
                              p_thre = 0.01,
                              vec_group_colors=NULL,
                              f_color=colorRampPalette(brewer.pal(n=10, name="RdYlBu")),
                              do_plot = T)

4.3 Plot the heritability correlated genes

heritability_cor_scatterplot(gene_heri_cor=Pagwas@misc$PCC,
                             topn_genes_label=10,
                             color_low="#035397",
                             color_high ="#F32424",
                             color_mid = "white",
                             text_size=2,
                             do_plot=T,
                             max.overlaps =20,
                             width = 7,
                             height = 7)

4.5 celltypes bootstrap_results reuslt

Barplot for celltypes

Bootstrap_P_Barplot(p_results=Pagwas@misc$bootstrap_results$bp_value[-1],
                    p_names=rownames(Pagwas@misc$bootstrap_results)[-1],
                    figurenames = "Bootstrap_P_Barplot.pdf",
                    width = 5,
                    height = 7,
                    do_plot=T,
                    title = "monocytecount_bmmc")

Forest plot for estimate value.

Bootstrap_estimate_Plot(bootstrap_results=Pagwas@misc$bootstrap_results,
                        figurenames = "estimateplot.pdf",
                        width = 9,
                        height = 7,
                        do_plot=T)

4.6 Visualize the heritability correlated Pathways

Plot the specific genetics pathway for each celltypes

library(tidyverse)
library("rhdf5")
library(ggplot2)
library(grDevices)
library(stats)
library(FactoMineR)
library(scales)
library(reshape2)
library(ggdendro)
library(grImport2)
library(gridExtra)
library(grid)
library(sisal)

source(system.file("extdata", "plot_scpathway_contri_dot.R", package = "scPagwas"))
plot_scpathway_dot(Pagwas=Pagwas,
                   celltypes=c("01_HSC","02_Early.Eryth","05_CMP.LMPP","11_CD14.Mono.1","12_CD14.Mono.2","13_CD16.Mono","15_CLP.2","19_CD8.N","20_CD4.N1","21_CD4.N2"),
                   topn_path_celltype=5,
                   filter_p=0.05,
                   max_logp=15,
                   display_max_sizes=F,
                   size_var ="logrankPvalue" ,
                   col_var="proportion",
                   shape.scale = 8,
                   cols.use=c("lightgrey", "#E45826"),
                   dend_x_var = "logrankPvalue",
                   dist_method="euclidean",
                   hclust_method="ward.D",
                   do_plot = T,
                   width = 7,
                   height = 7)

sessionInfo()


dengchunyu/scPagwas documentation built on Nov. 29, 2024, 2:53 p.m.