Yipeng Gao, Wei Li 2020-08-22
2020-08-22: Version 0.0.1 is released!
scDaPars is a bioinformatics algorithm to accurately quantify Alternative Polyadenylation (APA) events at both single-cell and single-gene resolution using standard scRNA-seq data.
Step.1 scDaPars first takes scRNA-seq genome coverage data (bedGraph format) as input and fits a linear regression model to jointly infer the exact location of proximal poly(A) sites. (The current version of scDaPars does not support this function; raw PDUI values, i.e. Percentage of Distal poly(A) site Usage Index values, are calculated separately using DaPars2.) DaPars2 can be downloaded, and instructions for running it can be found in the DaPars2 Instruction.
Step.2 scDaPars constructs a nearest neighbor graph based on the sparse APA matrix generated in step.1 to identify a pool of candidate neighboring cells that have similar APA profiles.
Step.3 scDaPars uses a non-negative least squares (NNLS) regression model to refine the neighboring cells and impute the PDUIs of dropout genes in each cell.
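Steps 2 and 3 can be illustrated with a small base-R sketch on simulated data. This is not the scDaPars implementation: the toy matrix, the distance measure (root-mean-square difference over co-observed genes), the value of `k`, the helper names `pairwise_dist` and `nnls_fit`, and the rescaling of weights when a neighbor is itself missing are all assumptions made for illustration. The NNLS fit is approximated here with box-constrained `optim()` rather than a dedicated NNLS solver.

```r
set.seed(1)

# Toy PDUI matrix (assumed data): 30 genes x 8 cells, values clamped to [0, 1].
n_genes <- 30
n_cells <- 8
base_profile <- runif(n_genes)
pdui <- sapply(seq_len(n_cells), function(j) {
  pmin(pmax(base_profile + rnorm(n_genes, sd = 0.05), 0), 1)
})
dimnames(pdui) <- list(paste0("gene", seq_len(n_genes)),
                       paste0("cell", seq_len(n_cells)))

# Simulate dropout by masking 20% of the entries.
observed <- pdui
observed[sample(length(observed), round(0.2 * length(observed)))] <- NA

# Step 2 (sketch): distance between cells over genes observed in both,
# then the k closest cells form the candidate pool for the target cell.
pairwise_dist <- function(m) {
  n <- ncol(m)
  d <- matrix(0, n, n, dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) {
    ok <- !is.na(m[, i]) & !is.na(m[, j])
    d[i, j] <- sqrt(mean((m[ok, i] - m[ok, j])^2))
  }
  d
}
d <- pairwise_dist(observed)
target <- "cell1"
k <- 4
neighbors <- names(sort(d[target, colnames(observed) != target]))[seq_len(k)]

# Step 3 (sketch): least squares with weights constrained to be >= 0.
nnls_fit <- function(X, y) {
  loss <- function(w) sum((y - X %*% w)^2)
  grad <- function(w) as.vector(-2 * t(X) %*% (y - X %*% w))
  optim(rep(1 / ncol(X), ncol(X)), loss, grad,
        method = "L-BFGS-B", lower = 0)$par
}
X_all <- observed[, neighbors, drop = FALSE]
train <- !is.na(observed[, target]) & complete.cases(X_all)
w <- nnls_fit(X_all[train, , drop = FALSE], observed[train, target])

# Impute the target cell's dropout genes from the weighted neighbors.
missing <- which(is.na(observed[, target]))
imputed <- observed[, target]
for (g in missing) {
  avail <- !is.na(X_all[g, ])
  if (sum(w[avail]) > 0) {
    # Assumption: rescale so the available neighbors carry the full weight mass.
    imputed[g] <- sum(w[avail] * X_all[g, avail]) * sum(w) / sum(w[avail])
  }
}
imputed <- pmin(pmax(imputed, 0), 1)

round(w, 3)                                                   # non-negative weights
cbind(truth = pdui[missing, target], imputed = imputed[missing])
```

Neighbors that receive a weight of zero are effectively dropped from the pool, which is one way to read "refine neighboring cells" in Step.3.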
We welcome any suggestions on the package. For technical problems, please report to Issues. For suggestions and comments, please contact Yipeng (yipeng.gao@bcm.edu) or Dr. Wei Li (wei.li@uci.edu).
The package is not on CRAN yet. To install it, use the following code in R:
```r
install.packages("devtools")
library(devtools)
install_github("YiPeng-Gao/scDaPars")
```
The imputation steps of scDaPars take the APA matrix from step.1 as input. In the simplest case, the imputation can be done with a single function, scDaPars:
```r
scDaPars(raw_PDUI_file,    # full path of the raw PDUI matrix generated by step.1 of scDaPars
         out_dir,          # full path of the output directory
         filter_gene_thre, # fraction of cells in which a gene's APA must be detected in step.1
         filter_cell_thre) # fraction of genes whose APA must be detected in a cell in step.1
```
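How the two thresholds might act on a sparse PDUI matrix can be sketched in base R. The sketch assumes, going only by the argument descriptions, that a gene is kept when its APA is detected in at least `filter_gene_thre` of cells and a cell is kept when at least `filter_cell_thre` of the remaining genes are detected in it. The helper `filter_pdui`, the toy matrix, and the order of the two filters are hypothetical and are not taken from the package source.

```r
# Hypothetical helper: filter a genes x cells PDUI matrix by detection rate.
filter_pdui <- function(pdui, filter_gene_thre, filter_cell_thre) {
  # Fraction of cells in which each gene has a non-missing PDUI.
  keep_genes <- rowMeans(!is.na(pdui)) >= filter_gene_thre
  pdui <- pdui[keep_genes, , drop = FALSE]
  # Fraction of the remaining genes detected in each cell.
  keep_cells <- colMeans(!is.na(pdui)) >= filter_cell_thre
  pdui[, keep_cells, drop = FALSE]
}

# Toy matrix (made-up values): NA marks a gene whose APA was not detected.
toy <- matrix(c(0.2, NA, 0.4, 0.9,
                NA,  NA, NA,  0.5,
                0.7, NA, 0.6, 0.8),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

filtered <- filter_pdui(toy, filter_gene_thre = 0.5, filter_cell_thre = 0.5)
filtered
```

With these thresholds, `gene2` (detected in 1 of 4 cells) and `cell2` (no genes detected) are removed, leaving a 2 x 3 matrix.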
The dataset used in this example is a time-course scRNA-seq dataset from Chu et al. containing 758 cells sequenced at 0, 12, 24, 36, 72 and 96 h of differentiation during human definitive endoderm (DE) emergence, available under GEO accession code GSE75748. After quality control, 739 cells remained for analysis.
>1. Generating raw PDUI matrix
To generate the raw PDUI matrix in step 1, we assume that the scRNA-seq data have been preprocessed so that there is one wiggle file per cell. The raw PDUI files are then generated by DaPars2. The raw PDUI matrix for this example, "Dapars_hESC_combined_all_chromosome.txt", is included in the example folder.
>2. Install and load the scDaPars R package
```r
if (!require(devtools)) install.packages("devtools")
library(devtools)
## Loading required package: devtools
## Loading required package: usethis
devtools::install_github("YiPeng-Gao/scDaPars")
library(scDaPars)
## Loading required package: penalized
```
>3. Run scDaPars
```r
scDaPars.res = scDaPars(raw_PDUI_file = "Dapars_hESC_combined_all_chromosome.txt",
                        out_dir = "./scDaPars_result",
                        filter_gene_thre = 0.2,
                        filter_cell_thre = 0.1)
head(scDaPars.res)[, 1:6]
```
>4. Visualize scDaPars' results
```r
# Load the SRA run table and align its rows with the columns of scDaPars.res
hESC_SRA = read.table("SraRunTable.txt", header = TRUE, sep = ",", stringsAsFactors = FALSE)
cell_type = hESC_SRA[which(hESC_SRA$Run %in% colnames(scDaPars.res)), ]
cell_type = cell_type[match(colnames(scDaPars.res), cell_type$Run), ]
head(cell_type)
```
Perform UMAP analysis
```r
library(umap)
# Cells are columns of scDaPars.res, so transpose before embedding
scDaPars.res.umap = umap(t(scDaPars.res))
scDaPars.res.umap.data = data.frame(scDaPars.res.umap$layout)
colnames(scDaPars.res.umap.data) = c("Dim1", "Dim2")
scDaPars.res.umap.data$cellType = cell_type$source_name
```
Generate scatter plot
```r
library(ggplot2)
ggplot(scDaPars.res.umap.data, aes(x = Dim1, y = Dim2, color = cellType)) +
  scale_color_manual(values = c("#7DD2D9", "#FFA500", "#e55b54", "#ad6c58", "#989797", "#166FD5")) +
  geom_point(size = 0.5) +
  theme_bw(base_size = 14) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  theme(legend.position = "right", legend.title = element_blank())
```
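The metadata alignment in the visualization step relies on `%in%` followed by `match()` to put the annotation rows in the same order as the columns of the result matrix. A toy sketch of that idiom is shown here with made-up run IDs and labels (`res`, `meta`, and their values are illustrative only). The explicit check at the end is an added safeguard, not part of the original workflow.

```r
# Made-up result matrix: genes in rows, cells (run IDs) in columns.
res <- matrix(seq(0.1, 0.6, by = 0.1), nrow = 2,
              dimnames = list(c("geneA", "geneB"), c("run3", "run1", "run2")))

# Made-up run table with one extra run that is absent from the matrix.
meta <- data.frame(Run = c("run1", "run2", "run3", "run4"),
                   source_name = c("0h", "12h", "24h", "36h"),
                   stringsAsFactors = FALSE)

# Keep only runs present in the matrix, then reorder to the column order.
meta <- meta[meta$Run %in% colnames(res), ]
meta <- meta[match(colnames(res), meta$Run), ]

# Guard against silent misalignment before coloring points by cell type.
stopifnot(identical(meta$Run, colnames(res)))
meta$source_name
```

If a column of the result matrix had no row in the run table, `match()` would return `NA` for it and the check would fail, which is preferable to plotting cells with the wrong labels.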