InPAS Vignette
In InPAS: InPAS: a bioconductor package for the identification of novel alternative PolyAdenylation Sites (PAS) using RNA-seq data

suppressPackageStartupMessages({
    library(InPAS)
    library(BSgenome.Mmusculus.UCSC.mm10)
    library(TxDb.Mmusculus.UCSC.mm10.knownGene)
    library(org.Hs.eg.db)
    library(cleanUpdTSeq)
})

Introduction

Alternative polyadenylation (APA) is one of the most important post-transcriptional regulation mechanisms which is prevalent in human genes. Like alternative splicing, APA can create proteome diversity. In addition, it defines 3' UTR and results in altered expression of the gene. It is a tightly controlled process and mis-regulation of APA can lead to pathological effects to the cells, such as uncontrolled cell cycle and growth. Although several high throughput sequencing methods have been developed, there are still limited data available.

RNA-seq has become one of the most commonly used methods for quantifying genome-wide gene expression. There are massive RNA-seq datasets available in public databases such as GEO and TCGA. For this reason, we develop the InPAS algorithm for identifying APA from conventional RNA-seq data.

The workflow for InPAS is:

Calculate coverage from BEDGraph files
Identify putative pA sites based on the coverage changes
Filter putative pA sites using cleanUpdTseq to remove false sites due to internal-priming.
Estimate alternative pA usage using the coverage for the predicted 3' UTR
Identify differential usage of APA in different samples leveraging Limma

How to

First, load the required libraries InPAS, species specific BSgenome, and TxDb as follows.

library(InPAS)
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
path <- file.path(find.package("InPAS"), "extdata")

Next, prepare annotation using the function utr3Annotation with a TxDb and org annotation. Please note that the 3' UTR annotation prepared by utr3Annotation includes the gaps to the next annotated region.

library(org.Hs.eg.db)
samplefile <- system.file("extdata", "hg19_knownGene_sample.sqlite",
            package="GenomicFeatures")
txdb <- loadDb(samplefile)
utr3.sample.anno <- utr3Annotation(txdb=txdb, 
                   orgDbSYMBOL="org.Hs.egSYMBOL")
utr3.sample.anno

Users can also directly load the 3' UTR annotation included in this package for mm10 and hg19. Here we show how to load the pre-built mm10 3' UTR annotation file.

##step1 annotation
# load from dataset
data(utr3.mm10)

For coverage calculation, alignment files in BEDGraph format are required, which can be generated from BAM files using the genomecov tool in bedtools with parameter: -bg -split. Potential pA sites identified from the coverage data can be filtered/adjusted using the classifier provided by cleanUpdTseq. The following scripts illustrate the function calls needed to perform the complete analysis using InPAS.

load(file.path(path, "polyA.rds"))
library(cleanUpdTSeq)
data(classifier)
bedgraphs <- c(file.path(path, "Baf3.extract.bedgraph"), 
               file.path(path, "UM15.extract.bedgraph"))
hugeData <- FALSE
##step1 Calculate coverage
coverage <- coverageFromBedGraph(bedgraphs, 
                                 tags=c("Baf3", "UM15"),
                                 genome=BSgenome.Mmusculus.UCSC.mm10,
                                 hugeData=hugeData)

## we hope the coverage rate of should be greater than 0.75 in the expressed gene.
## which is used because the demo data is a subset of genome.
coverageRate(coverage=coverage,
             txdb=TxDb.Mmusculus.UCSC.mm10.knownGene,
             genome=BSgenome.Mmusculus.UCSC.mm10,
             which = GRanges("chr6", ranges=IRanges(98013000, 140678000)))

##step2 Predict cleavage sites
CPs <- CPsites(coverage=coverage, 
               genome=BSgenome.Mmusculus.UCSC.mm10,
               utr3=utr3.mm10, 
               search_point_START=200, 
               cutEnd=.2, 
               long_coverage_threshold=3,
               background="10K",
               txdb=TxDb.Mmusculus.UCSC.mm10.knownGene,
               PolyA_PWM=pwm, 
               classifier=classifier,
               shift_range=50,
               step=10)
head(CPs)
##step3 Estimate 3UTR usage
res <- testUsage(CPsites=CPs, 
                 coverage=coverage, 
                 genome=BSgenome.Mmusculus.UCSC.mm10,
                 utr3=utr3.mm10, 
                 method="fisher.exact",
                 gp1="Baf3", gp2="UM15")

##step4 view the results
as(res, "GRanges") 

filterRes(res, gp1="Baf3", gp2="UM15",
          background_coverage_threshold=3,
          adj.P.Val_cutoff=0.05, 
          dPDUI_cutoff=0.3, 
          PDUI_logFC_cutoff=0.59)

The steps described above can be done in one function call.

if(interactive()){
    res <- inPAS(bedgraphs=bedgraphs, tags=c("Baf3", "UM15"), 
              genome=BSgenome.Mmusculus.UCSC.mm10, 
              utr3=utr3.mm10, gp1="Baf3", gp2="UM15",
              txdb=TxDb.Mmusculus.UCSC.mm10.knownGene,
              search_point_START=200,
              short_coverage_threshold=15,
              long_coverage_threshold=3, 
              cutStart=0, cutEnd=.2, 
              hugeData=FALSE,
              shift_range=50,
              PolyA_PWM=pwm, classifier=classifier,
              method="fisher.exact",
              adj.P.Val_cutoff=0.05,
              dPDUI_cutoff=0.3, 
              PDUI_logFC_cutoff=0.59)
}

InPAS can also handle single group data.

inPAS(bedgraphs=bedgraphs[1], tags=c("Baf3"), 
      genome=BSgenome.Mmusculus.UCSC.mm10, 
      utr3=utr3.mm10, gp1="Baf3", gp2=NULL,
      txdb=TxDb.Mmusculus.UCSC.mm10.knownGene,
      search_point_START=200,
      short_coverage_threshold=15,
      long_coverage_threshold=3, 
      cutStart=0, cutEnd=.2, 
      hugeData=FALSE, 
      PolyA_PWM=pwm, classifier=classifier,
      method="singleSample",
      adj.P.Val_cutoff=0.05,
      step=10)

Session Info

sessionInfo()

Any scripts or data that you put into this service are public.

InPAS documentation built on Nov. 8, 2020, 5:03 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

InPAS
InPAS: a bioconductor package for the identification of novel alternative PolyAdenylation Sites (PAS) using RNA-seq data

InPAS Vignette
In InPAS: InPAS: a bioconductor package for the identification of novel alternative PolyAdenylation Sites (PAS) using RNA-seq data

Introduction

How to

Session Info

Try the InPAS package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

InPAS InPAS: a bioconductor package for the identification of novel alternative PolyAdenylation Sites (PAS) using RNA-seq data

InPAS Vignette In InPAS: InPAS: a bioconductor package for the identification of novel alternative PolyAdenylation Sites (PAS) using RNA-seq data

Introduction

How to

Session Info

Try the InPAS package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

InPAS
InPAS: a bioconductor package for the identification of novel alternative PolyAdenylation Sites (PAS) using RNA-seq data

InPAS Vignette
In InPAS: InPAS: a bioconductor package for the identification of novel alternative PolyAdenylation Sites (PAS) using RNA-seq data