annotateByPAS: Annotate a PACdataset with polyA signals
In BMILAB/movAPA: movAPA: Modeling and Visualization of Dynamics of Alternative PolyAdenylation

annotateByPAS

R Documentation

Annotate a PACdataset with polyA signals

Description

annotateByPAS returns the distance from each PAC to the given pattern(s) found within a given range around the PAC.

Usage

annotateByPAS(
  pacds,
  bsgenome,
  grams,
  from,
  to,
  priority = NULL,
  label = NULL,
  chrCheck = TRUE
)

Arguments

`pacds`	a query PACdataset.
`bsgenome`	chromosome fasta files, an object of BSgenome or FaFile, see faFromPACds().
`grams`	a character vector specifying a single gram like AATAAA, or v1 (AATAAA's 1-nt variants), or multiple grams. The grams need not be of equal length, e.g., c('AATAAA','AAATTT','CCCT').
`from`	the start of the range near PACs, where the PAC is position 0. For example, from=-50, to=-1 subsets the 50 nt upstream of the PAC (the PAC is position 0, i.e., the 51st position counting from -50), see faFromPACds().
`to`	the end of the range near PACs, specified in the same way as from.
`priority`	a numeric vector setting the priority and subgroups of grams if grams has multiple elements; default is NULL. For example, if grams=c('AATAAA','ATTAAA','AAAAAA','TTTAT') and priority=c(1,2,3,3), then AATAAA is searched for first; if it is not found, ATTAAA is searched for; if that is not found either, the remaining AAAAAA/TTTAT are searched for. If priority=NULL, all elements in grams are treated as the same group (no priority).
`label`	a character string specifying the output column name. One or two columns (label_gram, label_dist) are added to pacds. If grams has only one element, label may be NULL, in which case label is set to the gram and only label_dist is output. If grams has multiple elements, label should be specified, and two columns (label_gram, label_dist) are added. The _gram column gives the gram that is closest to the PAC. The _dist column gives the distance from the start position of the gram to the PAC.
`chrCheck`	if TRUE, all chromosomes in PACds must be present in bsgenome; otherwise, rows of PACds whose chromosomes are not found in bsgenome are ignored.
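
The priority rule described for the `priority` argument can be sketched in base R. This is a minimal illustration under stated assumptions, not movAPA's implementation: `pickByPriority` is a hypothetical helper, the distances are made-up values, and breaking ties by taking the first closest gram is an assumption.

```r
# Hypothetical helper (not part of movAPA): choose one gram for a single PAC.
# dists:    named numeric vector, one distance per gram, NA where no hit exists.
# priority: numeric vector parallel to dists; lower values are searched first.
pickByPriority <- function(dists, priority = NULL) {
  # priority = NULL: treat all grams as one group
  if (is.null(priority)) priority <- rep(1, length(dists))
  for (p in sort(unique(priority))) {
    inGroup <- dists[priority == p]
    inGroup <- inGroup[!is.na(inGroup)]
    if (length(inGroup) > 0) {
      # within a group, keep the gram closest to the PAC
      best <- which.min(inGroup)
      return(list(gram = names(inGroup)[best], dist = inGroup[[best]]))
    }
  }
  # no gram found in any group
  list(gram = NA_character_, dist = NA_integer_)
}

# Made-up distances for one PAC: AATAAA has no hit in the window.
dists <- c(AATAAA = NA, ATTAAA = 20, AAAAAA = 5, TTTAT = 12)

withPriority <- pickByPriority(dists, priority = c(1, 2, 3, 3))
withPriority$gram   # "ATTAAA": group 1 is empty, so group 2 is used

noPriority <- pickByPriority(dists)
noPriority$gram     # "AAAAAA": the closest gram over all grams
```

With a priority vector, a farther but higher-priority gram (ATTAAA at 20) is reported instead of a closer, lower-priority one (AAAAAA at 5).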

Details

This function finds polyA signals around PACs and calculates the minimum distance between a signal and the PAC.

Value

A PACdataset with the columns label_gram (only if multiple grams are given) and label_dist added.

The _gram column is NA (no signal) or the gram closest to the PAC.

The _dist column is NA (no signal) or an integer denoting the minimum distance between a PAC and the grams.

For example, if * marks the PAC, then ...AATAAA* gives dist=6 and ...*AATAAA gives dist=1.
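
The distance convention above can be checked with a small base R sketch. It is an illustration only, not the package's code: `gramDist` is a hypothetical helper, and it assumes that the window is a plain character string whose first character sits at position `from` relative to the PAC (position 0).

```r
# Hypothetical helper (not part of movAPA): minimum distance from the PAC
# to the start of any occurrence of `gram` inside one window sequence.
# windowSeq: sequence of the window; its first character is at position `from`.
gramDist <- function(windowSeq, gram, from) {
  # A zero-width lookahead reports every start position, including overlaps.
  hits <- gregexpr(paste0("(?=", gram, ")"), windowSeq, perl = TRUE)[[1]]
  if (hits[1] == -1) return(NA_integer_)   # no signal in the window
  # Convert 1-based string positions to positions relative to the PAC.
  rel <- as.integer(from) + as.integer(hits) - 1L
  min(abs(rel))
}

# ...AATAAA* : the gram occupies positions -6..-1, so its start is 6 nt upstream.
upstream <- gramDist("CCCCAATAAA", "AATAAA", from = -10)    # 6

# ...*AATAAA : the gram starts at position +1, right after the PAC.
downstream <- gramDist("AATAAACCCC", "AATAAA", from = 1)    # 1

# No occurrence in the window gives NA.
none <- gramDist("CCCCCCCCCC", "AATAAA", from = -10)        # NA
```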

Examples

## Not run: 
## First, load the reference genome sequences that are already represented as a BSgenome object.
library("BSgenome.Oryza.ENSEMBL.IRGSP1")
bsgenome <- BSgenome.Oryza.ENSEMBL.IRGSP1
data(PACds)
pacds <- PACds
## scan AATAAA upstream 50bp of PACs
test <- annotateByPAS(pacds, bsgenome, grams='AATAAA', from=-50, to=-1, label=NULL)
summary(test@anno$AATAAA_dist)
## scan AATAAA's 1nt variants
test <- annotateByPAS(pacds, bsgenome, grams='V1', from=-50, to=-1, label=NULL)
table(test@anno$V1_gram)
## scan custom grams
test <- annotateByPAS(pacds, bsgenome, grams=c('AATAAA','ATTAAA','GATAAA','AAAA'),
                      from=-50, to=-1, label='GRAM')
table(test@anno$GRAM_gram)
## scan with priority, scan AATAAA first,
## if no, then scan ATTAAA, if no, then scan the remaining grams.
test2 <- annotateByPAS(pacds, bsgenome, grams=c('AATAAA','ATTAAA','GATAAA','AAAA'),
                       priority=c(1,2,3,3), from=-50, to=-1, label='GRAM')
table(test2@anno$GRAM_gram)
## only scan AATAAA, the number will be the same as test2's AATAAA
test3 <- annotateByPAS(pacds, bsgenome, grams=c('AATAAA'),
                       priority=NULL, from=-50, to=-1, label='GRAM')
sum(!is.na(test3@anno$GRAM_dist))

## End(Not run)

BMILAB/movAPA documentation built on Jan. 3, 2024, 11:09 p.m.