scmageck_eff_estimate: Detect heterogenous perturbation responses from Perturb-seq...

View source: R/scmageck_eff_estimate.R

scmageck_eff_estimateR Documentation

Detect heterogenous perturbation responses from Perturb-seq using perturbation-response score (PS)

Description

"This function uses constrained linear least squares to calculate perturbation-rsponse score (PS), which measures heterogenous perturbation effects from single-cell CRISPR screens (e.g., Perturb-seq, CROP-seq)."

Usage

scmageck_eff_estimate(
  rds_object, 
  bc_frame, 
  perturb_gene, 
  non_target_ctrl, 
  perturb_target_gene = NULL, 
  scale_factor = 3, 
  target_gene_min = 10, 
  target_gene_max = 500,
  assay_for_cor = 'MAGIC_RNA', 
  subset_rds = TRUE, 
  scale_score = TRUE,
  perturb_gene_exp_id_list = NULL,
  lambda = 0,
  background_correction=FALSE,
  use_perturb_exp=TRUE,
  logfc.threshold=0.1
)
 

Arguments

rds_object

A Seurat object or local RDS file path that contains the scRNA-seq dataset; or a path to RDS file. Note that the dataset has to be normalized and scaled.

bc_frame

A txt file to include cell identity information, generated from the cell identity collection step; or a corresponding data.frame.

perturb_gene

The list of perturbed genes. By default, all genes in the table are subject to regression.

non_target_control

The list of genes (separated by ",") served as negative controls.

perturb_target_gene

The list of target genes for modeling. If null, will automatic search and identify the target genes.

scale_factor

The upper bound of the constraints. Must be a positive value. Default 3. Assign a higher value for a more continuous distribution of the signa scores.

target_gene_min

The minimum number of genes selected for target genes. If DEG analysis does not provide enough number of genes that reach this number, the algorithm will iteratively (at most 3 times) decrease LFC threshold to select more genes.

target_gene_max

The maximum number of genes selected for target genes.

assay_for_cor

The assays used for estimating correlation. Default: MAGIC_RNA, which is generated by MAGIC.

subset_rds

Whether to return an R object that only contains cells that express guides targeting perturbed genes (or negative control genes). If TRUE, a gene column in metadata will be added (or updated) that assigns perturbed genes to single cells. Default: TRUE

scale_score

Whether to scale the scores for each gene to 1. Default: TRUE

perturb_gene_exp_id_list

If the perturbed_gene id is different from expression feature id, use this parameter to provide the corresponding expression features. Must be the same length as perturb_gene. Default: NULL

lambda

Sparse penalty (similar with the lambda value in LASSO reguession). Must be non-negative. Default: 0

background_correction

Whether to extract background gene expression, which is estimated from negative control cells. Turn this option on will reduce false positives in datasets containing multiple cell types, where gene expressions may be largely different from different cell types. Default: False

use_perturb_exp

Whether to use the expression of perturbed gene in the estimation of efficiency score. If False, the perturbed gene (or the corresponding id in perturb_gene_exp_id_list) will be removed from estimating the score. Default: True

logfc.threshold

logfc threshold when determining the downstream targets of a perturbed gene. Default: 0.1

Value

Returns a list of several items: eff_matrix: the PS score matrix containing the PS scores of each cells for each perturbed gene rds: the R object if subset_rds is set as TRUE optimization_matrix: the matrix used for actually performing the constrained optimizaiton target_gene_search_result: the results of target gene search for each perturbed gene

Examples


    library(Seurat)
    # set the BARCODE and RDS file path 
    # if you have a guide matrix, use guidematrix_to_triplet() function in scMAGeCK to convert guide matrix to BARCODE file
    BARCODE = system.file("extdata","barcode_rec.txt",package = "scMAGeCK")
    bc_frame=read.table(BARCODE,header = T,as.is = T)
    
    # needs clean later, but cell identity will need to be fixed
    bc_frame$cell=sub('-1','',bc_frame$cell)
    
    
    ## RDS can be a Seurat object or local RDS file path that contains the scRNA-seq dataset
    RDS = system.file("extdata","singles_dox_mki67_v3.RDS",package = "scMAGeCK")
    rds_object=readRDS(RDS)
    
    # Run scmageck_eff_estimate function
    # By default, the result will be saved to the current working directory. 
    ## rds_object<-assign_cell_identity(bc_frame,rds_object)
    
    eff_object <- scmageck_eff_estimate(rds_object, bc_frame, perturb_gene='TP53', 
                                        non_target_ctrl = 'NonTargetingControlGuideForHuman',assay_for_cor='RNA')
    
    eff_estimat=eff_object$eff_matrix
    rds_subset=eff_object$rds
    
    # TP53 scores clearly show the pattern of clustering
    FeaturePlot(rds_subset,features='TP53_eff',reduction = 'tsne')
    
    # whereas TP53 gene expression did not have this pattern
    FeaturePlot(rds_subset,features='TP53',reduction = 'tsne')




weililab/scMAGeCK documentation built on April 21, 2024, 10:36 a.m.