options(tinytex.verbose = TRUE)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

1. Introduction

VaSP is an R package for discovery of genome-wide variable splicing events from short-read RNA-seq data. Based on R package Ballgown, VaSP calculates Single Splicing Strength (3S) score of any intron by the junction count normalized by the gene-level average read coverage (score=junction count/gene-level average read coverage). The 3S scores can be used for further analysis, such as differential splicing analysis between two sample groups and sQTL (splicing Quantitative Trait Locus) identification in a large population. The VaSP package provides a function to find large-effect differential splicing events without the need of genotypic information in an inbred plant population, so called genotype-specific splicing (GSS). Integrated with functions from R package Sushi, VaSP package also provides function to visualization of gene splicing information for publication-quality multi-panel figures.

2. Installation

Start R and run:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("vasp")

3. Data input

Users need to follow the manual of R package Ballgown (https://github.com/alyssafrazee/ballgown) to creat a ballgown object as an input for the VaSP package. See ?ballgown for detailed information on creating Ballgown objects. The object can be stored in a .RDate file by save() .

library(ballgown)
?ballgown
path<-system.file('extdata', package='vasp')
rice.bg<-ballgown(samples = list.dirs(path = path,recursive = F) )

4. Quick start

Calculate 3S (Single Splicing Strength) scores, find GSS (genotype-specific splicing) events and display the splicing information.

library(vasp)
data(rice.bg)
?rice.bg
rice.bg
score<-spliceGene(rice.bg, gene="MSTRG.183", junc.type = "score")
tail(round(score,2),2)
gss <- BMfinder(score, cores = 1) 
gss
gss_intron<-structure(rice.bg)$intron
(gss_intron<-gss_intron[gss_intron$id%in%rownames(gss)])
range(gss_intron)
splicePlot(rice.bg,gene='MSTRG.183',samples = sampleNames(rice.bg)[c(1,3,5)],
           start = 1179000, end = 1179300)

5. Functions

Currently, there are 6 functions in VaSP:
getDepth: Get read depth from a BAM file (in bedgraph format)
getGeneinfo: Get gene informaton from a ballgown object
spliceGene: Calculate 3S scores for one gene
spliceGenome: Calculate genome-wide splicing scores
BMfinder: Discover bimodal distrubition features
splicePlot: Visualization of read coverage, splicing information and gene information in a gene region

5.1 getDepth

Get read depth from a BAM file (in bedgraph format) and return a data.frame in bedgraph file format which can be used as input for plotBedgraph in the SuShi package.

path <- system.file("extdata", package = "vasp")
bam_files <- list.files(path, "*.bam$")
bam_files

depth <- getDepth(file.path(path, bam_files[1]), "Chr1", start = 1171800, end = 1179400)
head(depth)

library(Sushi)
par(mar=c(3,5,1,1))
plotBedgraph(depth, "Chr1", chromstart = 1171800, chromend = 1179400, yaxt = "s")
mtext("Depth", side = 2, line = 2.5, cex = 1.2, font = 2)
labelgenome("Chr1", 1171800, 1179400, side = 1, scipen = 20, n = 5, scale = "Kb")

5.2 getGeneinfo

Get gene informaton from a ballgown object by genes or by genomic regions and return a data.frame in bed-like file format that can be used as input for plotGenes in the SuShi package

data(rice.bg)
unique(geneIDs(rice.bg))

gene_id <- c("MSTRG.181", "MSTRG.182", "MSTRG.183")
geneinfo <- getGeneinfo(genes = gene_id, rice.bg)
trans <- table(geneinfo$name)  # show how many exons each transcript has
trans

library(Sushi)

chrom            = geneinfo$chrom[1]
chromstart       = min(geneinfo$start) - 1e3
chromend         = max(geneinfo$stop) + 1e3
color            = rep(SushiColors(2)(length(trans)), trans)

par(mar=c(3,1,1,1))
p<-plotGenes(geneinfo, chrom, chromstart, chromend, col = color, bheight = 0.2, 
          bentline = FALSE, plotgenetype = "arrow", labeloffset = 0.5)
labelgenome(chrom, chromstart , chromend, side = 1, n = 5, scale = "Kb")

5.3 spliceGene

Calculate 3S Scores from ballgown object for a given gene. This function can only calculate one gene. Please use function spliceGenome to obtain genome-wide 3S scores.

data(rice.bg)
rice.bg
head(geneIDs(rice.bg))

score <- spliceGene(rice.bg, "MSTRG.183", junc.type = "score")
count <- spliceGene(rice.bg, "MSTRG.183", junc.type = "count")

## compare
tail(score)
tail(count)

## get intron structrue
intron <- structure(rice.bg)$intron
intron[intron$id %in% rownames(score)]

5.4 spliceGenome

Calculate 3S scores from ballgown objects for all genes and return a list of two elelments: "score' is matrix of intron 3S scores with intron rows and sample columns and "intron" is a GRanges object of intron structure.

data(rice.bg)
rice.bg

splice <- spliceGenome(rice.bg, gene.select = NA, intron.select = NA)
names(splice)

head(splice$score)
splice$intron

5.5 BMfinder

Find bimodal distrubition features and divide the samples into 2 groups by k-means clustering and return a matrix with feature rows and sample columns.

data(rice.bg)
score <- spliceGene(rice.bg, "MSTRG.183", junc.type = "score")
score <- round(score, 2)
as <- BMfinder(score, cores = 1)  # 4 bimodal distrubition features found

## compare
as
score[rownames(score) %in% rownames(as), ]

5.6 spliceplot

Visualization of read coverage, splicing information and gene information in a gene region. This function is a wrapper of getDepth, getGeneinfo, spliceGene, plotBedgraph and plotGenes.

data(rice.bg)
rice.bg
samples <- paste("Sample", c("027", "102", "237"), sep = "_")
bam.dir <- system.file("extdata", package = "vasp")

## plot the whole gene region without junction lables
splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", junc.text = FALSE, bheight = 0.2)

## plot the alternative splicing region with junction splicing scores
splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", start = 1179000)

If the bam files are provided (bam.dir is not NA), the read depth for each sample is plotted. Otherwise (bam.dir=NA), the coserved exons of the samples are displayed by rectangles (an example is the figure in 4. Quick start). And by default (junc.type = 'score', junc.text = TRUE), the junctions (represented by arcs) are labeled with splicing scores. You can change the argument junc.text = FALSE to unlabel the junctions or change the argument junc.type = 'count' to label with junction read counts.

splicePlot(rice.bg, samples, bam.dir, gene = "MSTRG.183", junc.type = 'count', start = 1179000)

There are other more options to modify the plot, please see the function ?splicePlot for details.

6. Session Information

sessionInfo()


yuhuihui2011/vasp_test documentation built on March 5, 2020, 12:50 a.m.