Home

/

GitHub

/

YiPeng-Gao/scDaPars

/

README.md

README.md
In YiPeng-Gao/scDaPars: Dynamic Analysis of Alternative Polyadenylation from Single-Cell RNA-Seq

scDaPars: Dynamic Analysis of Alternative Polyadenylation from Single-Cell RNA-Seq

Yipeng Gao, Wei Li 2020-08-22

2020-08-22: - Version 0.0.1 is released!

scDaPars is a bioinformatics algorithm to accurately quantify Alternative Polyadenylation (APA) events at both single-cell and single-gene resolution using standard scRNA-seq data.

Step.1 scDaPars first takes scRNA-seq genome coverage data (bedgraph format) as input and forms a linear regression model to jointly infer the exact location of proximal poly(A) sites (Current Version of scDaPars do not support this function, raw PDUI values are calulated seperately using DaPars2). DaPars2 can be downloaded and the instructions to run DaPars2 can be found in DaPars2 Instruction.

Step.2 scDaPars constructs a nearest neighbor graph based on the sparse APA matrix generated in step.1 to identify a pool of candidate neighboring cells that have similar APA profiles.

Step.3 scDaPars uses a non-negative least square (NNLS) regression model to refine neighboring cells and impute PDUIs of dropout genes in each cell.

We welcome any suggestions on the package. For technical problems, please report to Issues. For suggestions and comments, please contact Yipeng (yipeng.gao@bcm.edu) or Dr. Wei Li (wei.li@uci.edu).

The package is not on CRAN yet. For installation please use the following codes in R

install.packages("devtools")
library(devtools)

install_github("YiPeng-Gao/scDaPars")

The imputation steps of scDaPars takes the APA matrix from step.1 as input and in the simplest case, the imputation can be done with one single function scDaPars:

scDaPars(raw_PDUI_file,         # full path of the raw PDUI matrix generated by step1 of scDaPars
         out_dir,               # full path of the output directory
         filter_gene_thre,      # the percent of cells a gene's APA must be detected in step.1
         filter_cell_thre)      # the percent of gene APA a cell must be detected in step.1

The dataset used in this example is a time-course scRNA-seq dataset containing 758 cells sequenced at 0, 12, 24, 36, 72 and 96 h of differentiation during human definitive endoderm (DE) emergence from Chu et al. under GEO accession code GSE75748. After quality control, there is 739 cells remained for analysis.

Genearting raw PDUI matrix

For generating raw PDUI matrix in step 1, we assume that the scRNA-seq data has been preprocessed, so that we have one wiggle file per cell. The raw PDUI files are then generated by DaPars2. The raw PDUI matrix for this example "Dapars_hESC_combined_all_chromosome.txt" is included in the example folder.

install and load scDaPars R package

if(!require(devtools)) install.packages("devtools")
library(devtools)

## Loading required package: devtools

## Loading required package: usethis

devtools::install_github("YiPeng-Gao/scDaPars")
library(scDaPars)

```## Loading required package: penalized

>3. Run scDaPars

scDaPars.res = scDaPars(raw_PDUI_file = "Dapars_hESC_combined_all_chromosome.txt", out_dir = "./scDaPars_result", filter_gene_thre = 0.2, filter_cell_thre = 0.1)

head(scDaPars.res)[,1:6]

>4. Visualize scDaPars' results

Load cell type information for the example data

hESC_SRA = read.table("SraRunTable.txt", header = T, sep = ",", stringsAsFactors = F) cell_type = hESC_SRA[which(hESC_SRA$Run %in% colnames(scDaPars.res)),] cell_type = cell_type[match(colnames(scDaPars.res), cell_type$Run),] head(cell_type)

Perform UMAP analysis

scDaPars.res.umap = umap(t(scDaPars.res)) scDaPars.res.umap.data = data.frame(scDaPars.res.umap$layout) colnames(scDaPars.res.umap.data) = c("Dim1", "Dim2") scDaPars.res.umap.data$cellType = cell_type$source_name

Generate Scatter plot

ggplot(scDaPars.res.umap.data, aes(x=Dim1, y=Dim2, color = cellType)) + scale_color_manual(values = c("#7DD2D9", "#FFA500", "#e55b54", "#ad6c58", "#989797", "#166FD5")) + geom_point(size = 0.5) + theme_bw(base_size = 14) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(legend.position = "right", legend.title = element_blank()) ```

YiPeng-Gao/scDaPars documentation built on May 29, 2021, 3:18 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

YiPeng-Gao/scDaPars Dynamic Analysis of Alternative Polyadenylation from Single-Cell RNA-Seq

README.md In YiPeng-Gao/scDaPars: Dynamic Analysis of Alternative Polyadenylation from Single-Cell RNA-Seq

scDaPars: Dynamic Analysis of Alternative Polyadenylation from Single-Cell RNA-Seq

Latest News

Introduction

scDaPars Pipeline

Installation

Quick Start

Example

Loading required package: survival

Welcome to penalized. For extended examples, see vignette("penalized").

Loading required package: foreach

Loading required package: RANN

Loading required package: igraph

Attaching package: 'igraph'

The following objects are masked from 'package:stats':

decompose, spectrum

The following object is masked from 'package:base':

union

[1] "Reading in raw PDUI matrix ..."

[1] "number of genes in raw count matrix 13029"

[1] "number of cells in raw count matrix 743"

[1] "Reading finished!"

[1] "Start Processing raw PDUI matrix"

[1] "Pre-Processing finished"

[1] "Start Imputation Steps ..."

[1] "Find potential Neighboring Cells ..."

[1] "Number of neighbors(clusters) is 4"

[1] "Outliers is/are "

[1] "Imputation with neighboring cells ..."

[1] 100

[1] 100

[1] 100

[1] 200

[1] 300

[1] "Imputation Steps Finished!"

[1] "Writing imputed PDUI matrix ..."

SRR2978558 SRR2978559 SRR2978560 SRR2978561 SRR2978562

ENST00000000412.3 1.00000000 0.804251535 0.857067085 1.000000000 0.706309880

ENST00000002165.10_1 0.01711468 0.002728381 0.001751477 0.001465125 0.001576918

ENST00000003100.13_3 0.05000000 0.607145404 0.496367748 0.733390448 0.611959391

ENST00000005082.13_1 0.75459201 1.000000000 0.788302984 0.110000000 0.672660621

ENST00000005257.7_2 0.15491832 0.000000000 0.289365021 0.285070704 0.224915218

ENST00000005259.8_2 0.05000000 0.046576290 0.036624434 0.050765637 0.070246239

SRR2978564

ENST00000000412.3 0.710688649

ENST00000002165.10_1 0.002705341

ENST00000003100.13_3 0.523058864

ENST00000005082.13_1 0.655225597

ENST00000005257.7_2 1.000000000

ENST00000005259.8_2 0.071559391

Load cell type information for the example data

Run Assay.Type AvgSpotLen BioProject BioSample Center.Name

933 SRR2978558 RNA-Seq 51 PRJNA305280 SAMN04322527 GEO

934 SRR2978559 RNA-Seq 51 PRJNA305280 SAMN04322528 GEO

935 SRR2978560 RNA-Seq 51 PRJNA305280 SAMN04322529 GEO

936 SRR2978561 RNA-Seq 51 PRJNA305280 SAMN04322530 GEO

937 SRR2978562 RNA-Seq 51 PRJNA305280 SAMN04322531 GEO

939 SRR2978564 RNA-Seq 51 PRJNA305280 SAMN04322533 GEO

Consent DATASTORE.filetype DATASTORE.provider

933 public sra gs,ncbi,s3

934 public sra gs,ncbi,s3

935 public sra gs,ncbi,s3

936 public sra gs,ncbi,s3

937 public sra gs,ncbi,s3

939 public sra gs,ncbi,s3

DATASTORE.region Experiment facs_sorting GEO_Accession

933 gs.US,ncbi.public,s3.us-east-1 SRX1468145 not sorted GSM1965870

934 gs.US,ncbi.public,s3.us-east-1 SRX1468146 not sorted GSM1965871

935 gs.US,ncbi.public,s3.us-east-1 SRX1468147 not sorted GSM1965872

936 gs.US,ncbi.public,s3.us-east-1 SRX1468148 not sorted GSM1965873

937 gs.US,ncbi.public,s3.us-east-1 SRX1468149 not sorted GSM1965874

939 gs.US,ncbi.public,s3.us-east-1 SRX1468151 not sorted GSM1965876

Instrument LibraryLayout LibrarySelection LibrarySource MBases

933 Illumina HiSeq 2500 SINGLE cDNA TRANSCRIPTOMIC 105

934 Illumina HiSeq 2500 SINGLE cDNA TRANSCRIPTOMIC 116

935 Illumina HiSeq 2500 SINGLE cDNA TRANSCRIPTOMIC 141

936 Illumina HiSeq 2500 SINGLE cDNA TRANSCRIPTOMIC 99

937 Illumina HiSeq 2500 SINGLE cDNA TRANSCRIPTOMIC 90

939 Illumina HiSeq 2500 SINGLE cDNA TRANSCRIPTOMIC 103

YiPeng-Gao/scDaPars
Dynamic Analysis of Alternative Polyadenylation from Single-Cell RNA-Seq

README.md
In YiPeng-Gao/scDaPars: Dynamic Analysis of Alternative Polyadenylation from Single-Cell RNA-Seq