vasp_test: Quantification and Visulization of Variations of Splicing in Population

vasp/VaSP: Quantification and Visulization of Variations of Splicing in Population

VaSP is an R package for discovery of genome-wide variable splicing events from short-read RNA-seq data. Based on R package Ballgown, VaSP calculates Single Splicing Strength (3S) score of any intron by the junction count normalized by the gene-level average read coverage (score=junction count/gene-level average read coverage). The 3S scores can be used for further analysis, such as differential splicing analysis between two sample groups and sQTL (splicing Quantitative Trait Locus) identification in a large population. The VaSP package provides a function to find large-effect differential splicing events without the need of genotypic information in an inbred plant population, so called genotype-specific splicing (GSS). Integrated with functions from R package Sushi, VaSP package also provides function to visualization of gene splicing information for publication-quality multi-panel figures.

Start R and run:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("vasp")

Users need to follow the manual of R package Ballgown (https://github.com/alyssafrazee/ballgown) to creat a ballgown object as an input for the VaSP package. See ?ballgown for detailed information on creating Ballgown objects. The object can be stored in a .RDate file by save() .

library(vasp)
?ballgown
path<-system.file('extdata', package='vasp')
rice.bg<-ballgown(samples = list.dirs(path = path,recursive = F) )

Calculate 3S (Single Splicing Strength) scores, find GSS (genotype-specific splicing) events and display the splicing information.

Calculating 3S scores:

library(vasp)
#> Loading required package: ballgown
#> 
#> Attaching package: 'ballgown'
#> The following object is masked from 'package:base':
#> 
#>     structure
data(rice.bg)
?rice.bg
rice.bg
#> ballgown instance with 33 transcripts and 6 samples
score<-spliceGene(rice.bg, gene="MSTRG.183", junc.type = "score")
tail(round(score,2),2)
#>    Sample_027 Sample_042 Sample_102 Sample_137 Sample_237 Sample_272
#> 58          0       0.02       0.22       0.23       0.23       0.23
#> 59          0       0.00       0.00       0.12       0.12       0.12

Discovering GSS:

gss <- BMfinder(score, cores = 1) 
#> Warning in BMfinder(score, cores = 1): sample size < 30, the result should be
#> further tested!
#> ---BMfinder: 16 features * 6 samples
#> ---Completed:  a total of 4 bimodal distrubition features found!
gss
#>    Sample_027 Sample_042 Sample_102 Sample_137 Sample_237 Sample_272
#> 55          2          2          1          1          1          1
#> 57          1          1          2          2          2          2
#> 58          1          1          2          2          2          2
#> 59          1          1          1          2          2          2

Extracing intron information

gss_intron<-structure(rice.bg)$intron
(gss_intron<-gss_intron[gss_intron$id%in%rownames(gss)])
#> GRanges object with 4 ranges and 2 metadata columns:
#>       seqnames          ranges strand |        id transcripts
#>          <Rle>       <IRanges>  <Rle> | <integer> <character>
#>   [1]     Chr1 1179011-1179226      - |        55       15:16
#>   [2]     Chr1 1179011-1179134      - |        57          17
#>   [3]     Chr1 1179011-1179110      - |        58          18
#>   [4]     Chr1 1179011-1179106      - |        59          19
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
range(gss_intron)
#> GRanges object with 1 range and 0 metadata columns:
#>       seqnames          ranges strand
#>          <Rle>       <IRanges>  <Rle>
#>   [1]     Chr1 1179011-1179226      -
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Showing the splicing information

splicePlot(rice.bg,gene='MSTRG.183',samples = sampleNames(rice.bg)[c(1,3,5)],
           start = 1179000, end = 1179300)
#> [1] "yes"

Currently, there are 6 functions in VaSP: getDepth: Get read depth from a BAM file (in bedgraph format) getGeneinfo: Get gene informaton from a ballgown object spliceGene: Calculate 3S scores for one gene spliceGenome: Calculate genome-wide splicing scores BMfinder: Discover bimodal distrubition features splicePlot: Visualization of read coverage, splicing information and gene information in a gene region

Get read depth from a BAM file (in bedgraph format) and return a data.frame in bedgraph file format which can be used as input for plotBedgraph in the SuShi package.

path <- system.file("extdata", package = "vasp")
bam_files <- list.files(path, "*.bam$")
bam_files
#> [1] "Sample_027.bam" "Sample_042.bam" "Sample_102.bam" "Sample_137.bam"
#> [5] "Sample_237.bam" "Sample_272.bam"

depth <- getDepth(file.path(path, bam_files[1]), "Chr1", start = 1171800, end = 1179400)
head(depth)
#>   chrom   start    stop value
#> 1  Chr1 1171799 1171859     0
#> 2  Chr1 1171859 1171899     1
#> 3  Chr1 1171899 1171902     2
#> 4  Chr1 1171902 1171903     4
#> 5  Chr1 1171903 1171909     5
#> 6  Chr1 1171909 1171911     6

library(Sushi)
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
#> Loading required package: biomaRt
par(mar=c(3,5,1,1))
plotBedgraph(depth, "Chr1", chromstart = 1171800, chromend = 1179400, yaxt = "s")
mtext("Depth", side = 2, line = 2.5, cex = 1.2, font = 2)
labelgenome("Chr1", 1171800, 1179400, side = 1, scipen = 20, n = 5, scale = "Kb")

Get gene informaton from a ballgown object by genes or by genomic regions and return a data.frame in bed-like file format that can be used as input for plotGenes in the SuShi package

data(rice.bg)
unique(geneIDs(rice.bg))
#>  [1] "MSTRG.177" "MSTRG.178" "MSTRG.179" "MSTRG.180" "MSTRG.181" "MSTRG.182"
#>  [7] "MSTRG.183" "MSTRG.184" "MSTRG.185" "MSTRG.186"

gene_id <- c("MSTRG.181", "MSTRG.182", "MSTRG.183")
geneinfo <- getGeneinfo(genes = gene_id, rice.bg)
trans <- table(geneinfo$name)  # show how many exons each transcript has
trans
#> 
#> LOC_Os01g03050.1 LOC_Os01g03060.1 LOC_Os01g03060.2 LOC_Os01g03060.3 
#>                5                2                3                3 
#> LOC_Os01g03070.1 LOC_Os01g03070.2      MSTRG.181.1      MSTRG.182.2 
#>               14               14                6                3 
#>      MSTRG.183.3      MSTRG.183.4      MSTRG.183.5 
#>               14               14               14

library(Sushi)

chrom            = geneinfo$chrom[1]
chromstart       = min(geneinfo$start) - 1e3
chromend         = max(geneinfo$stop) + 1e3
color            = rep(SushiColors(2)(length(trans)), trans)

par(mar=c(3,1,1,1))
p<-plotGenes(geneinfo, chrom, chromstart, chromend, col = color, bheight = 0.2, 
          bentline = FALSE, plotgenetype = "arrow", labeloffset = 0.5)
#> [1] "yes"
labelgenome(chrom, chromstart , chromend, side = 1, n = 5, scale = "Kb")

Calculate 3S Scores from ballgown object for a given gene. This function can only calculate one gene. Please use function spliceGenome to obtain genome-wide 3S scores.

data(rice.bg)
rice.bg
#> ballgown instance with 33 transcripts and 6 samples
head(geneIDs(rice.bg))
#>           1           2           3           4           5           6 
#> "MSTRG.177" "MSTRG.177" "MSTRG.177" "MSTRG.178" "MSTRG.178" "MSTRG.179"

score <- spliceGene(rice.bg, "MSTRG.183", junc.type = "score")
count <- spliceGene(rice.bg, "MSTRG.183", junc.type = "count")

## compare
tail(score)
#>    Sample_027 Sample_042 Sample_102 Sample_137 Sample_237 Sample_272
#> 54   1.073545  1.1034944  0.9799037  1.0581254  1.0581254  1.0581254
#> 55   1.073545  1.1034944  0.1224880  0.1594436  0.1594436  0.1594436
#> 56   0.788727  1.3148018  1.1803386  1.3190330  1.3190330  1.3190330
#> 57   0.000000  0.0000000  0.5790340  0.4058563  0.4058563  0.4058563
#> 58   0.000000  0.0234786  0.2227054  0.2319179  0.2319179  0.2319179
#> 59   0.000000  0.0000000  0.0000000  0.1159589  0.1159589  0.1159589
tail(count)
#>    Sample_027 Sample_042 Sample_102 Sample_137 Sample_237 Sample_272
#> 54         49         47         88         73         73         73
#> 55         49         47         11         11         11         11
#> 56         36         56        106         91         91         91
#> 57          0          0         52         28         28         28
#> 58          0          1         20         16         16         16
#> 59          0          0          0          8          8          8

## get intron structrue
intron <- structure(rice.bg)$intron
intron[intron$id %in% rownames(score)]
#> GRanges object with 16 ranges and 2 metadata columns:
#>        seqnames          ranges strand |        id transcripts
#>           <Rle>       <IRanges>  <Rle> | <integer> <character>
#>    [1]     Chr1 1172688-1173213      - |        43       15:19
#>    [2]     Chr1 1173416-1173795      - |        44       15:19
#>    [3]     Chr1 1173877-1173965      - |        45       15:19
#>    [4]     Chr1 1174170-1174670      - |        46       15:19
#>    [5]     Chr1 1175175-1176065      - |        48       15:19
#>    ...      ...             ...    ... .       ...         ...
#>   [12]     Chr1 1179011-1179226      - |        55       15:16
#>   [13]     Chr1 1174881-1174952      - |        56       16:19
#>   [14]     Chr1 1179011-1179134      - |        57          17
#>   [15]     Chr1 1179011-1179110      - |        58          18
#>   [16]     Chr1 1179011-1179106      - |        59          19
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Calculate 3S scores from ballgown objects for all genes and return a list of two elelments: "score' is matrix of intron 3S scores with intron rows and sample columns and "intron" is a GRanges object of intron structure.

data(rice.bg)
rice.bg
#> ballgown instance with 33 transcripts and 6 samples

splice <- spliceGenome(rice.bg, gene.select = NA, intron.select = NA)
#> ---Calculate gene-level read coverage:
#>   10 genes selected.
#> ---Extract intron-level read ucount:
#>   81 introns in 9 genes selected.
names(splice)
#> [1] "score"  "intron"

head(splice$score)
#>   Sample_027 Sample_042 Sample_102 Sample_137 Sample_237 Sample_272
#> 1  1.4852698  1.6132865   1.934796   1.834662   1.834662   1.834662
#> 2  1.1580070  1.5164893   1.924559   1.726741   1.726741   1.726741
#> 3  1.2838773  1.7423494   2.129300   2.309516   2.309516   2.309516
#> 4  1.8377067  1.3551606   1.904085   2.050505   2.050505   2.050505
#> 5  0.9817885  0.9679719   1.289864   1.446146   1.446146   1.446146
#> 6  1.1076589  1.1615663   1.637923   1.661988   1.661988   1.661988
splice$intron
#> GRanges object with 81 ranges and 3 metadata columns:
#>      seqnames          ranges strand |        id   transcripts     gene_id
#>         <Rle>       <IRanges>  <Rle> | <integer>   <character> <character>
#>    1     Chr1 1146321-1146479      - |         1           1:2   MSTRG.177
#>    2     Chr1 1146612-1147484      - |         2           1:2   MSTRG.177
#>    3     Chr1 1147563-1148021      - |         3           1:3   MSTRG.177
#>    4     Chr1 1148200-1148268      - |         4           1:3   MSTRG.177
#>    5     Chr1 1148442-1148530      - |         5           1:3   MSTRG.177
#>   ..      ...             ...    ... .       ...           ...         ...
#>   77     Chr1 1187243-1187436      - |        77 c(28, 29, 32)   MSTRG.186
#>   78     Chr1 1189347-1190773      - |        78         29:30   MSTRG.186
#>   79     Chr1 1190863-1190978      - |        79         29:30   MSTRG.186
#>   80     Chr1 1189347-1189914      - |        80         31:33   MSTRG.186
#>   81     Chr1 1189961-1190037      - |        81            33   MSTRG.186
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths

Find bimodal distrubition features and divide the samples into 2 groups by k-means clustering and return a matrix with feature rows and sample columns.

data(rice.bg)
score <- spliceGene(rice.bg, "MSTRG.183", junc.type = "score")
score <- round(score, 2)
as <- BMfinder(score, cores = 1)  # 4 bimodal distrubition features found
#> Warning in BMfinder(score, cores = 1): sample size < 30, the result should be
#> further tested!
#> ---BMfinder: 16 features * 6 samples
#> ---Completed:  a total of 4 bimodal distrubition features found!

## compare
as
#>    Sample_027 Sample_042 Sample_102 Sample_137 Sample_237 Sample_272
#> 55          2          2          1          1          1          1
#> 57          1          1          2          2          2          2
#> 58          1          1          2          2          2          2
#> 59          1          1          1          2          2          2
score[rownames(score) %in% rownames(as), ]
#>    Sample_027 Sample_042 Sample_102 Sample_137 Sample_237 Sample_272
#> 55       1.07       1.10       0.12       0.16       0.16       0.16
#> 57       0.00       0.00       0.58       0.41       0.41       0.41
#> 58       0.00       0.02       0.22       0.23       0.23       0.23
#> 59       0.00       0.00       0.00       0.12       0.12       0.12

Visualization of read coverage, splicing information and gene information in a gene region. This function is a wrapper of getDepth, getGeneinfo, spliceGene, plotBedgraph and plotGenes.

data(rice.bg)
rice.bg
#> ballgown instance with 33 transcripts and 6 samples
samples <- paste("Sample", c("027", "102", "237"), sep = "_")
bam.dir <- system.file("extdata", package = "vasp")

## plot the whole gene region without junction lables
splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", junc.text = FALSE, bheight = 0.2)
#> [1] "yes"

## plot the alternative splicing region with junction splicing scores
splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", start = 1179000)
#> [1] "yes"

If the bam files are provided (bam.dir is not NA), the read depth for each sample is plotted. Otherwise (bam.dir=NA), the coserved exons of the samples are displayed by rectangles (an example is the figure in 4. Quick start). And by default (junc.type = 'score', junc.text = TRUE), the junctions (represented by arcs) are labeled with splicing scores. You can change the argument junc.text = FALSE to unlabel the junctions or change the argument junc.type = 'count' to label with junction read counts.

splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", junc.type = 'count', start = 1179000)
#> [1] "yes"

There are other more options to modify the plot, please see the function ?splicePlot for details.

sessionInfo()
#> R version 3.6.1 (2019-07-05)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] Sushi_1.24.0    biomaRt_2.42.0  zoo_1.8-7       vasp_2.0.1     
#> [5] ballgown_2.18.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Biobase_2.46.0              httr_1.4.1                 
#>  [3] bit64_0.9-7                 splines_3.6.1              
#>  [5] assertthat_0.2.1            askpass_1.1                
#>  [7] stats4_3.6.1                BiocFileCache_1.10.2       
#>  [9] blob_1.2.1                  GenomeInfoDbData_1.2.2     
#> [11] Rsamtools_2.2.2             yaml_2.2.1                 
#> [13] progress_1.2.2              pillar_1.4.3               
#> [15] RSQLite_2.2.0               lattice_0.20-38            
#> [17] glue_1.3.1                  limma_3.42.2               
#> [19] digest_0.6.24               GenomicRanges_1.38.0       
#> [21] RColorBrewer_1.1-2          XVector_0.26.0             
#> [23] htmltools_0.4.0             Matrix_1.2-18              
#> [25] XML_3.99-0.3                pkgconfig_2.0.3            
#> [27] genefilter_1.68.0           zlibbioc_1.32.0            
#> [29] purrr_0.3.3                 xtable_1.8-4               
#> [31] BiocParallel_1.20.1         tibble_2.1.3               
#> [33] openssl_1.4.1               annotate_1.64.0            
#> [35] mgcv_1.8-31                 IRanges_2.20.2             
#> [37] SummarizedExperiment_1.16.1 BiocGenerics_0.32.0        
#> [39] survival_3.1-8              magrittr_1.5               
#> [41] crayon_1.3.4                memoise_1.1.0              
#> [43] evaluate_0.14               nlme_3.1-143               
#> [45] tools_3.6.1                 prettyunits_1.1.1          
#> [47] hms_0.5.3                   matrixStats_0.55.0         
#> [49] stringr_1.4.0               S4Vectors_0.24.3           
#> [51] cluster_2.1.0               DelayedArray_0.12.2        
#> [53] AnnotationDbi_1.48.0        Biostrings_2.54.0          
#> [55] compiler_3.6.1              GenomeInfoDb_1.22.0        
#> [57] rlang_0.4.4                 grid_3.6.1                 
#> [59] RCurl_1.98-1.1              rappdirs_0.3.1             
#> [61] bitops_1.0-6                rmarkdown_2.1              
#> [63] curl_4.3                    DBI_1.1.0                  
#> [65] R6_2.4.1                    GenomicAlignments_1.22.1   
#> [67] knitr_1.28                  dplyr_0.8.4                
#> [69] rtracklayer_1.46.0          bit_1.1-15.2               
#> [71] stringi_1.4.6               parallel_3.6.1             
#> [73] sva_3.34.0                  Rcpp_1.0.3                 
#> [75] vctrs_0.2.3                 tidyselect_1.0.0           
#> [77] dbplyr_1.4.2                xfun_0.12

yuhuihui2011/vasp_test documentation built on March 5, 2020, 12:50 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

yuhuihui2011/vasp_test
Quantification and Visulization of Variations of Splicing in Population

vignettes/vasp.md
In yuhuihui2011/vasp_test: Quantification and Visulization of Variations of Splicing in Population

vasp/VaSP: Quantification and Visulization of Variations of Splicing in Population

1. Introduction

2. Installation

3. Data input

4. Quick start

5. Functions

5.1 getDepth

5.2 getGeneinfo

5.3 spliceGene

5.4 spliceGenome

5.5 BMfinder

5.6 spliceplot

6. Session Information

R Package Documentation

Browse R Packages

We want your feedback!

yuhuihui2011/vasp_test Quantification and Visulization of Variations of Splicing in Population

vignettes/vasp.md In yuhuihui2011/vasp_test: Quantification and Visulization of Variations of Splicing in Population

vasp/VaSP: Quantification and Visulization of Variations of Splicing in Population

1. Introduction

2. Installation

3. Data input

4. Quick start

5. Functions

5.1 getDepth

5.2 getGeneinfo

5.3 spliceGene

5.4 spliceGenome

5.5 BMfinder

5.6 spliceplot

6. Session Information

R Package Documentation

Browse R Packages

We want your feedback!

yuhuihui2011/vasp_test
Quantification and Visulization of Variations of Splicing in Population

vignettes/vasp.md
In yuhuihui2011/vasp_test: Quantification and Visulization of Variations of Splicing in Population