annotateByPAS | R Documentation |
annotateByPAS returns the distance to the PAC of the given pattern(s) found within a given range around each PAC.
annotateByPAS(
pacds,
bsgenome,
grams,
from,
to,
priority = NULL,
label = NULL,
chrCheck = TRUE
)
pacds |
a query PACdataset. |
bsgenome |
chromosome FASTA sequences, given as a BSgenome or FaFile object; see faFromPACds(). |
grams |
a character vector specifying a single gram like AATAAA, or v1 (AATAAA's variants), or multiple grams. The grams need not be of equal length, e.g., c('AATAAA','AAATTT','CCCT'). |
from |
the start of the range to scan around each PAC, where the PAC is position 0. For example, from=-50, to=-1 selects the 50 nt upstream of the PAC (the PAC itself is position 0, i.e., it would be the 51st position); see faFromPACds(). |
to |
the end of the range to scan, specified in the same way as from. |
priority |
a numeric vector setting the priority and subgroups of grams when grams has multiple elements; default is NULL. For example, if grams=c('AATAAA','ATTAAA','AAAAAA','TTTAT') and priority=c(1,2,3,3), the function first searches for AATAAA; if it is not found, it searches for ATTAAA, and then for the remaining AAAAAA/TTTAT. If priority=NULL, all elements in grams are treated as one group (no priority). |
label |
a character string specifying the prefix of the output column names. One or two columns (label_gram, label_dist) are added to pacds. If grams has only one element, label can be NULL, in which case the gram itself is used as the label and only label_dist is output. If grams has multiple elements, label should be specified, and two columns (label_gram, label_dist) are added. The _gram column gives the gram that is closest to the PAC; the _dist column gives the position of the start of that gram relative to the PAC. |
chrCheck |
if TRUE, all chromosomes in pacds must be present in bsgenome; otherwise, rows of pacds whose chromosome is not found in bsgenome are ignored. |
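The priority rule described for the priority argument can be illustrated with a small base R sketch. This is not the package's implementation: pick_by_priority() is a hypothetical helper, and it assumes that within one priority group the gram closest to the PAC is the one reported.

```r
# Hypothetical helper (not part of movAPA) illustrating priority groups.
# dists:    named integer vector, one distance per gram; NA means "gram not found".
# priority: numeric vector of the same length; a smaller value is searched first.
pick_by_priority <- function(dists, priority = NULL) {
  # With no priority, every gram belongs to the same group
  if (is.null(priority)) priority <- rep(1, length(dists))
  stopifnot(length(priority) == length(dists))
  found <- !is.na(dists)
  if (!any(found)) return(list(gram = NA_character_, dist = NA_integer_))
  # Highest-priority group that has at least one hit
  best_group <- min(priority[found])
  in_group <- found & priority == best_group
  # Assumption: within the group, report the gram closest to the PAC
  i <- which(in_group)[which.min(dists[in_group])]
  list(gram = names(dists)[i], dist = unname(dists[i]))
}

d <- c(AATAAA = NA, ATTAAA = 20L, AAAAAA = 5L, TTTAT = 3L)
pick_by_priority(d, priority = c(1, 2, 3, 3))
pick_by_priority(d, priority = NULL)
```

With priority=c(1,2,3,3) the sketch selects ATTAAA even though AAAAAA and TTTAT lie closer to the PAC; with priority=NULL all grams compete in one group and the closest one (TTTAT) is selected.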
This function finds poly(A) signals around PACs and calculates the minimum distance between each signal and its PAC.
A PACdataset with columns (label_gram (if multiple grams), _dist) added.
The _gram column is NA (no signal) or the gram closest to the PAC.
The _dist column is NA (no signal) or an integer denoting the minimum distance between a PAC and the grams.
For example, given * is the PAC, then ...AATAAA*, dist=6; ...*AATAAA, dist=1.
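The distance convention in the two examples above can be reproduced with a short base R sketch. gram_dist() is a hypothetical helper, not a movAPA function; it assumes that the window string covers positions from..to relative to the PAC (position 0) and that the reported value is the absolute position of the gram's first base.

```r
# Hypothetical helper (not part of movAPA) mirroring the dist convention above.
# window: sequence whose first base is at position `from` relative to the PAC (0)
# gram:   plain DNA string such as "AATAAA" (no regex metacharacters assumed)
gram_dist <- function(window, from, gram) {
  # A zero-width lookahead reports overlapping matches as well
  hits <- gregexpr(paste0("(?=", gram, ")"), window, perl = TRUE)[[1]]
  if (hits[1] == -1) return(NA_integer_)  # no signal in the window
  # String index s corresponds to position from + s - 1 relative to the PAC
  rel <- from + as.integer(hits) - 1L
  min(abs(rel))
}

gram_dist("GGGGAATAAA", from = -10L, gram = "AATAAA")  # ...AATAAA*
gram_dist("AATAAA", from = 1L, gram = "AATAAA")        # ...*AATAAA
gram_dist("GGGGGGGGGG", from = -10L, gram = "AATAAA")  # no signal
```

The three calls give 6, 1, and NA, matching the convention that a gram ending immediately upstream of the PAC has dist=6 and a gram starting immediately downstream has dist=1.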
Other PACdataset functions: PACdataset-class, PACds, annotatePAC(), createPACdataset(), get3UTRAPAds(), get3UTRAPApd(), length(), makeExamplePACds(), mergePACds(), normalizePACds(), plotPACdsStat(), rbind(), readPACds(), removePACdsIP(), scPACds, subscript_operator, summary(), writePACds()
Other APA signal functions: faFromPACds(), getVarGrams(), kcount(), plotATCGforFAfile(), plotSeqLogo()
## Not run:
## First, load the reference genome sequences that are already represented as a BSgenome object.
library("BSgenome.Oryza.ENSEMBL.IRGSP1")
bsgenome <- BSgenome.Oryza.ENSEMBL.IRGSP1
data(PACds); pacds=PACds
## scan AATAAA upstream 50bp of PACs
test=annotateByPAS(pacds, bsgenome, grams='AATAAA', from=-50, to=-1, label=NULL)
summary(test@anno$AATAAA_dist)
## scan AATAAA's 1nt variants
test=annotateByPAS(pacds, bsgenome, grams='V1', from=-50, to=-1, label=NULL)
table(test@anno$V1_gram)
## scan custom grams
test=annotateByPAS(pacds, bsgenome, grams=c('AATAAA','ATTAAA','GATAAA','AAAA'),
from=-50, to=-1, label='GRAM')
table(test@anno$GRAM_gram)
## scan with priority: scan for AATAAA first;
## if not found, scan for ATTAAA; if still not found, scan for the remaining grams.
test2=annotateByPAS(pacds, bsgenome, grams=c('AATAAA','ATTAAA','GATAAA','AAAA'),
priority=c(1,2,3,3), from=-50, to=-1, label='GRAM')
table(test2@anno$GRAM_gram)
## only scan AATAAA, the number will be the same as test2's AATAAA
test3=annotateByPAS(pacds, bsgenome, grams=c('AATAAA'),
priority=NULL, from=-50, to=-1, label='GRAM')
sum(!is.na(test3@anno$GRAM_dist))
## End(Not run)