SCDV is an R package for analyzing differentially variable genes in single-cell RNA-seq data.
To install SCDV via Github, run the following code in R:
if (!require("devtools"))
install.packages("devtools")
devtools::install_github("WeiqiangZhou/SCDV")
05/06/2018: Updated differential mean test and differential var test by using the log-transformed data in the tests. 05/16/2018: Fix a bug in the calculation of two-sided p-values in differential var test.
There are three major steps for using SCDV
library(SCDV)
##input your count data as data_treatment_count and data_control_count, rownames of the input matrix should be ensembl id
data_treatment_count <- input_data_treatment
data_control_count <- input_data_control
##get gene information
gene_info <- annotation_data[["gene_len_GRCh37"]]
##change to "gene_info <- annotation_data[["gene_len_GRCm38"]]" for mouse data
match_gene_idx <- match(rownames(data_treatment_count),gene_info[,1])
gene_len <- gene_info[match_gene_idx,2]
names(gene_len) <- gene_info[match_gene_idx,1]
match_gene_name <- gene_info[match_gene_idx,-2]
##get neighboring cells
neighbor_result_treatment <- get_expected_cell(data_treatment_count,gene_len)
neighbor_result_control <- get_expected_cell(data_control_count,gene_len)
##estimate dropout probability, set multi-core using ncore, if not using multi-core, set ncore=1
output_treatment <- estimate_dropout_main(data_treatment_count,neighbor_result_treatment$data_expect,gene_len,ncore=6)
output_control <- estimate_dropout_main(data_control_count,neighbor_result_control$data_expect,gene_len,ncore=6)
save(output_treatment,file="output_treatment.rda")
save(output_control,file="output_control.rda")
hp_gene <- annotation_data[["LV_gene_human"]][,1]
##change to "hp_gene <- annotation_data[["LV_gene_mouse"]][,1]" for mouse data
gene_names <- sapply(rownames(data_treatment_count),function(x) sub("\\..*","",x))
library_scale <- adjust_library_size(c(output_treatment,output_control),hp_gene,gene_names)
treatment_data_true <- sapply(output_treatment,function(x) x$data_true)
treatment_data_weight <- 1 - sapply(output_treatment,function(x) x$post_weight)
control_data_true <- sapply(output_control,function(x) x$data_true)
control_data_weight <- 1 - sapply(output_control,function(x) x$post_weight)
treatment_data_adjust <- t(t(treatment_data_true)*library_scale[1:length(output_treatment)])
control_data_adjust <- t(t(control_data_true)*library_scale[-c(1:length(output_treatment))])
SCDV supports three types of tests:
##set multi-core using ncore, if not using multi-core, set ncore=1
diff_disper <- scdv_main(treatment_data_adjust,treatment_data_weight,control_data_adjust,control_data_weight,num_permute=10000,span_param=0.5,ncore=6)
write.csv(cbind(match_gene_name,diff_disper),file="diff_hypervar.csv",row.names=FALSE)
##set multi-core using ncore, if not using multi-core, set ncore=1
diff_var <- test_var_main(treatment_data_adjust,treatment_data_weight,control_data_adjust,control_data_weight,num_permute=10000,ncore=6,log_transform=TRUE)
write.csv(data.frame(match_gene_name,diff_var),file="diff_var.csv",row.names=FALSE)
##set multi-core using ncore, if not using multi-core, set ncore=1
diff_expr <- test_mean_main(treatment_data_adjust,treatment_data_weight,control_data_adjust,control_data_weight,num_permute=10000,ncore=6,log_transform=TRUE)
write.csv(data.frame(match_gene_name,diff_expr),file="diff_expr.csv",row.names=FALSE)
All the tests in SCDV are permutation-based tests. These are non-parametric tests that do not rely on strong parametric assumptions.
The major difference is they test different types of variance.
The variance of a gene in single-cell RNA-seq data has been showed to be highly correlated with the mean expression of the gene. To remove such "mean effects", SCDV calculates the hyper-variability statistics which is a ratio of the observed variance of a gene to the expected variance at the same mean expression level.
Use ?scdv_main
, ?test_var_main
, and ?test_mean_main
in R to check the description of the return values in the help page of the package.
The "house keeping genes" are genes that show low variability but consistently expressed in different cell types which are obtained using recount2 (https://jhubiostatistics.shinyapps.io/recount/).
Use citation("SCDV")
in R to get the citation information.
Please contact Weiqiang Zhou: wzhou14@jhu.edu for questions and suggestions.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.