knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

1.Paramers

Parameters and their Descriptions in scPagwas:

Note that during the computation, the filtering criteria ensure that the number of expressed genes in a pathway is not less than 5 and not more than 300. This helps prevent abnormal pathway calculations. However, it means that for different single-cell datasets with the same pathway data, the pathways included in the analysis may vary. The filtered pathway data can be found in the "misc" section of the scPagwas results.

Please note that the above descriptions are for the provided parameters in scPagwas.

2.scPagwas Calculation Processes Based on Singlecell and Celltype Parameters

There are some different situations for running scPagwas!

2.1. Run both singlecell and celltypes functions.

library(scPagwas)
Pagwas<-scPagwas_main(Pagwas =NULL,
                     gwas_data =system.file("extdata", "GWAS_summ_example.txt", package = "scPagwas"),
                     Single_data =system.file("extdata", "scRNAexample.rds", package = "scPagwas"),
                     output.prefix="Test",
                     output.dirs="Test",
                     Pathway_list=Genes_by_pathway_kegg,
                     assay="RNA",
                     singlecell=T, 
                     iters_singlecell = 100,
                     celltype=T,
                     block_annotation = block_annotation,
                     chrom_ld = chrom_ld)

Interpretation of Results: Conventional Result and Visualization Instructions with Real-World Examples.

2.2. Only run celltypes functions.

The executive programs for celltypes and single cell are independent, If you only want to know the celtypes, set the celltype=T and singlecell=F. The advantages is save a lot of times, for celltype only can omit many running processes.

Pagwas_celltypes<-scPagwas_main(Pagwas =NULL,
                     gwas_data =system.file("extdata", "GWAS_summ_example.txt", package = "scPagwas"),
                     Single_data =system.file("extdata", "scRNAexample.rds", package = "scPagwas"),
                     output.prefix="Test",
                     output.dirs="Test",
                     Pathway_list=Genes_by_pathway_kegg,
                     assay="RNA",
                     singlecell=F, 
                     celltype=T,
                     block_annotation = block_annotation,
                     chrom_ld = chrom_ld)

The reuslt of celltypes is list format(not seurat format).

names(Pagwas_celltypes)
 [1] "Celltype_anno"     "data_mat"          "VariableFeatures"  "merge_scexpr"     
 [5] "rawPathway_list"   "Pathway_list"      "pca_scCell_mat"    "pca_cell_df"      
 [9] "snp_gene_df"       "lm_results"        "bootstrap_results"

Pagwas_celltypes: A Result List with Distinct Characteristics and Inheritable Intermediate Data.

2.3. Only run singlecell functions

Pagwas_singlecell<-scPagwas_main(Pagwas =NULL, 
                                 gwas_data =system.file("extdata", "GWAS_summ_example.txt", package = "scPagwas"),
                                 Single_data =system.file("extdata", "scRNAexample.rds", package = "scPagwas"),
                                 output.prefix="Test",
                                 output.dirs="Test",
                                 Pathway_list=Genes_by_pathway_kegg,
                                 assay="RNA",
                                 singlecell=T, 
                                 celltype=F,
                                 block_annotation = block_annotation,
                                 chrom_ld = chrom_ld)

Because the parameters and input data are the same with celltypes function, we also can take the celltypes result as the input data for single cell function. The advantage is there is no need to run the svd code block, save a lot of time.

Pagwas_singlecell<-scPagwas_main(Pagwas =Pagwas_celltypes, 
                                 gwas_data =system.file("extdata", "GWAS_summ_example.txt", package = "scPagwas"),
                                 Single_data =system.file("extdata", "scRNAexample.rds", package = "scPagwas"),
                                 output.prefix="Test",
                                 output.dirs="Test",
                                 Pathway_list=Genes_by_pathway_kegg,
                                 assay="RNA",
                                 singlecell=T, 
                                 celltype=F,
                                 block_annotation = block_annotation,
                                 chrom_ld = chrom_ld)

The result is seurat format.

3. Running scPagwas step by step

The main function, scPagwas_main, is actually a package of multiple sub-functions designed to simplify the process. However, in practical calculations, a more flexible approach may be required. Here, we will introduce each step one by one to better understand the entire computational workflow.

We use an example provided by scPagwas package.

3.1 Single data input

The first step involves the reading and preprocessing of single cells, primarily aimed at filtering out clusters with very few cells and obtaining the intersection of genes between Genes_by_pathway_kegg and single-cell genes.

library(scPagwas)
Pagwas <- list()
Single_data <- readRDS(system.file("extdata", "scRNAexample.rds", package = "scPagwas"))
Pagwas <- Single_data_input(
      Pagwas = Pagwas,
      assay = "RNA",
      Single_data = Single_data,
      Pathway_list = Genes_by_pathway_kegg
    )
Single_data <- Single_data[, colnames(Pagwas$data_mat)]
names(Pagwas)
#[1] "Celltype_anno"    "data_mat"         "VariableFeatures" "merge_scexpr"

3.2 Run pathway pca score

The SVD method is used to compute the SVD results for single cells and cell types.

Pagwas <- Pathway_pcascore_run(
        Pagwas = Pagwas,
        Pathway_list = Genes_by_pathway_kegg
      )
names(Pagwas)
#[1] "Celltype_anno"    "data_mat"         "VariableFeatures" "merge_scexpr"     "rawPathway_list" 
#[6] "Pathway_list"     "pca_scCell_mat"   "pca_cell_df"

3.3 GWAS summary data input

Read the GWAS summary data, remove the MHC and sex chromosome and filtered the maf of SNP.

gwas_data <- bigreadr::fread2(system.file("extdata", "GWAS_summ_example.txt", package = "scPagwas"))
Pagwas <- GWAS_summary_input(
    Pagwas = Pagwas,
    gwas_data = gwas_data,
    maf_filter = 0.1
  )
names(Pagwas)
#[1] "Celltype_anno"    "data_mat"         "VariableFeatures" "merge_scexpr"     "rawPathway_list" 
#[6] "Pathway_list"     "pca_scCell_mat"   "pca_cell_df"      "gwas_data"

3.4 Mapping Snps to Genes

We set the marg is 10KB, means the position for SNP is less 10KB distance to TSS of gene.

Pagwas$snp_gene_df <- SnpToGene(
        gwas_data = Pagwas$gwas_data,
        block_annotation = block_annotation,
        marg = 10000
      )

3.5 Pathway-SNP annotation

Mapping SNPs to pathways and getting block data.

Pagwas <- Pathway_annotation_input(
      Pagwas = Pagwas,
      block_annotation = block_annotation
    )
names(Pagwas)
#[1] "Celltype_anno"    "data_mat"         "VariableFeatures" "merge_scexpr"     "rawPathway_list" 
# [6] "Pathway_list"     "pca_scCell_mat"   "pca_cell_df"      "gwas_data"        "snp_gene_df"     
#[11] "pathway_blocks" 

3.6 Link the pathway blocks to pca score

The regression analysis step requires a relatively long computation time.

Pagwas <- Link_pathway_blocks_gwas(
      Pagwas = Pagwas,
      chrom_ld = chrom_ld,
      singlecell = T,
      celltype = T,
      backingpath="./temp")
names(Pagwas)
 #[1] "Celltype_anno"        "data_mat"             "VariableFeatures"     "merge_scexpr"        
 #[5] "rawPathway_list"      "Pathway_list"         "pca_scCell_mat"       "pca_cell_df"         
 #[9] "snp_gene_df"          "Pathway_sclm_results" "Pathway_ld_gwas_data"

In this step, a temporary file will be generated in the result folder to store intermediate data during the regression analysis, aiming to improve efficiency. Although the contents of this file will be deleted during the execution, sometimes it may not be completely removed. It is recommended to manually delete it after the program finishes running.

3.7 Perform regression for celltypes

Run the regression function for celltypes.

Pagwas$lm_results <- Pagwas_perform_regression(Pathway_ld_gwas_data = Pagwas$Pathway_ld_gwas_data)
Pagwas <- Boot_evaluate(Pagwas, bootstrap_iters = 200, part = 0.5)
names(Pagwas)
# [1] "Celltype_anno"        "data_mat"             "VariableFeatures"     "merge_scexpr"        
# [5] "rawPathway_list"      "Pathway_list"         "pca_scCell_mat"       "pca_cell_df"         
# [9] "snp_gene_df"          "Pathway_sclm_results" "Pathway_ld_gwas_data" "lm_results"          
#[13] "bootstrap_results"
#remove the Pathway_ld_gwas_data, it takes a lot of memory.
Pagwas$Pathway_ld_gwas_data <- NULL

3.8 Construct the scPagwas score

The gPAS scPagwas score mainly to deal with the single-cell regression result.

Pagwas <- scPagwas_perform_score(
      Pagwas = Pagwas,
      remove_outlier = TRUE
    )
names(Pagwas)
# [1] "Celltype_anno"          "data_mat"               "VariableFeatures"      
# [4] "merge_scexpr"           "rawPathway_list"        "Pathway_list"          
# [7] "pca_scCell_mat"         "pca_cell_df"            "snp_gene_df"           
#[10] "Pathway_sclm_results"   "Pathway_ld_gwas_data"   "lm_results"            
#[13] "bootstrap_results"      "Pathway_single_results" "scPathways_rankPvalue" 
#[16] "scPagwas.gPAS.score" 

3.9 Get the gene heritability correlation

Run heritability correlation for all genes.

#pcc gene for all gPas score!
Pagwas$PCC <- scPagwas::scGet_PCC(scPagwas.gPAS.score=Pagwas$scPagwas.gPAS.score,
                                    data_mat=Pagwas$data_mat)

3.10 Calculate the TRS score for top genes.

Calculate the TRS score for top genes by AddModuleScore and running the p value for each cell by Get_CorrectBg_p.

#Obtain the top 500 genes with the highest PCC.
n_topgenes=500
scPagwas_topgenes <- rownames(Pagwas$PCC)[order(Pagwas$PCC, decreasing = T)[1:n_topgenes]]
scPagwas_downgenes <- rownames(Pagwas$PCC)[order(Pagwas$PCC, decreasing =F)[1:n_topgenes]]

#Single_data refers to the single-cell data initially inputted.
Single_data <- Seurat::AddModuleScore(Single_data, assay = "RNA", list(scPagwas_topgenes,scPagwas_downgenes), name = c("scPagwas.TRS.Score","scPagwas.downTRS.Score"))

#Calculate the p-values for scPagwas.TRS.Score of single cells after background correction.
correct_pdf<-Get_CorrectBg_p(Single_data=Single_data,
                             scPagwas.TRS.Score=Single_data$scPagwas.TRS.Score1,
                             iters_singlecell=100,
                             n_topgenes=1000,
                             assay="RNA",
                             scPagwas_topgenes=scPagwas_topgenes)
Pagwas$Random_Correct_BG_pdf <- correct_pdf

#Merge the p-values of cells belonging to the same cell type into a single p-value for each cell type.
Pagwas$Merged_celltype_pvalue<-Merge_celltype_p(single_p=correct_pdf$pooled_p,celltype=Pagwas$Celltype_anno$annotation)

#Integrate and output the results of single-cell analysis.
a <- data.frame(
     scPagwas.TRS.Score = Single_data$scPagwas.TRS.Score1,
    scPagwas.downTRS.Score = Single_data$scPagwas.downTRS.Score2,
     scPagwas.gPAS.score = Pagwas$scPagwas.gPAS.score,
     Random_Correct_BG_p = correct_pdf$pooled_p,
     Random_Correct_BG_adjp = correct_pdf$adj_p,
     Random_Correct_BG_z = correct_pdf$pooled_z)
utils::write.csv(a,file = "_singlecell_scPagwas_score_pvalue.Result.csv",quote = F)

Note: Although our article primarily focuses on the TRS scores, it does not imply that reverse "downTRS" are always devoid of significance. During our following research, we observed that in certain phenotypes, such as some cancer traits.When observing the density distribution plot of gPas scores, it becomes evident that it exhibits a high and sharp peak. This indicates that the majority of cells have minor effects. However, if the distribution shows a more pronounced long-tail behavior, the direction of this long tail can influence the impact of PCC gene pairs on the phenotype.

image-20230823150635803

In other words, specific genotypes may be more closely related to a reduced risk of the phenotype. However, It requires a case-by-case analysis. The question then arises: do GWAS genetic effect mainly have a positive or negative impact on phenotypes? What factors determine this impact? This is a highly intriguing question that we intend to explore further in subsequent research. We encourage researchers to consider the distribution of scPagwas.gPAS.score as an adjunctive factor for assessing directionality, though it is not absolute and should be analyzed on a case-by-case basis.

All these sub-functions for scPagwas can running dependently, but need to run orderly. The scPagwas_main function is a wrapper functions for these sub-functions.



dengchunyu/scPagwas documentation built on Nov. 29, 2024, 2:53 p.m.