README.md

SEAA(Splicing Efficiency Analysis and Annotation)

SEAA provides convenient and rapid splicing efficiency calculation and splicing sites annotation function using next generation sequencing data. Aligned .bam files and a processed splicing sites .saf file are needed. The task can be finished in several minutes multi-core computing with the assitance of 'Rsubread'. Plots of splicing status type and Cumulative Distribution Function (CDF) and annotated vaild splicing efficiency can be exported.

Installation

You can install the released version of SEAA from github with:

install.packages("devtools")
library(devtools)
install_github("PrinceWang2018/SEAA")

You can also install the released version of SEAA from CRAN with:

install.packages("SEAA")

Data preparation

Aligning your sequencing files with hisat2 is strongly recommended. Genome_trans index of hisat2 can be downloaded from http://daehwankimlab.github.io/hisat2/download/. It should be noticed that ensembl version reference files with chromosome number "1" instead of "chr1" were used. Or you will need to edit the saf file manually.

hisat2 -x /indexpath/hisat2/grch37_tran/genome_tran -1 /fastqpath/NC_1.fastq -2 /fastqpath/NC_2.fastq --min-intronlen 20 --max-intronlen 10000 --threads 12 --rna-strandness F | samtools sort -o /outputpath/NC.bam - 

We have already prepared the saf files consisting the splicing sites of human (hg19) and mouse (mm10) which can be downloaded from our github repository https://github.com/PrinceWang2018/SEAA_reference.

Example

This is a basic workflow which shows you how to use this software:

library("SEAA")
#set you work path
setwd("/home/username/workpath/")

Step1: Caculating splicing efficiency.

#Please locate your aligned bam files.
BAM_files <- c("./NC_1.bam","./shUSP39_1.bam")
#Please locate your .saf annotation files.
Anno_SAF<- "/home/wzx/project3tB/SEAA_project/reference/hg19/hg19_NCBI_splicing_sites_20210705.saf"
SEresultlist<-SEcalculation(BAM_files,Anno_SAF,paired = TRUE ,thread = 8,strand = 1)

Step2: Caculating splicing efficiency with featureCounts.

SEtyperesult<-SEtypeplot(SEresultlist,"horizontal")

Step3: Filtering invaild splicing efficiency and reads counts less than min_counts.

efficiency_5ss_3ss_nona_inf_reduct<-SEfilter(SEresultlist,min_counts = 5)

Step4: Cumulative Distribution Function plotting based on filtered splicing efficiency.

CDFplot(SEresultlist,efficiency_5ss_3ss_nona_inf_reduct,zoom.x= c(5,6))

Step5: Annotation of Splicing Sites Acquiring filtered Splicing Efficiency.

SEannotaionresult<-SEsiteanno(SEresultlist, efficiency_5ss_3ss_nona_inf_reduct, species = "hs")

Step6: Labeling the target gene in filtered splicing efficiency list.

targetlabeling(SEresultlist,target_site = "27830321",target_label = "RPL21",xlim.max = 1000, ylim.max = 1000)

Citation

Wang et al., (2021). SEAA: Splicing Efficiency Analysis and Annotation (to be published)

Contact

If you have any question, please contact Zixiang Wang at wangzixiang@mail.sdu.edu.cn or wangzixiang@live.com.



Try the SEAA package in your browser

Any scripts or data that you put into this service are public.

SEAA documentation built on Sept. 19, 2021, 9:07 a.m.