annotateU12    R Documentation
Receives the coordinates of references (e.g. introns), a reference genome and PWMs of the splice sites of U12- and U2-type introns, and returns a data.frame. The first column shows whether each corresponding sequence matches the U12, the U2 or both (U12/U2) consensus sequences, based on its scores when fitting the PWMs. The second column shows whether the PWMs match on the positive or the negative strand.
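The strand information can be pictured with a minimal base-R sketch: score a fixed-width segment against a PWM, and if that fails, score its reverse complement. The helpers `revComp`, `scorePwm` and `strandMatch` are hypothetical illustrations, not IntEREst functions. The row order A, C, G, T, the toy `pwm` and the absolute `threshold` argument are assumptions, and annotateU12 itself may extract and score sequences differently.

```r
# Hypothetical helpers (not part of IntEREst): score a DNA segment against a
# PWM on the + strand and on its reverse complement (- strand).
# Assumes the PWM has rows "A", "C", "G", "T" and one column per position.
revComp <- function(s) {
  comp <- chartr("ACGT", "TGCA", toupper(s))          # complement each base
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")  # then reverse the order
}

scorePwm <- function(pwm, s) {
  bases <- strsplit(toupper(s), "")[[1]]
  stopifnot(length(bases) == ncol(pwm))
  # pick the weight of the observed base at each position and sum them
  sum(pwm[cbind(match(bases, rownames(pwm)), seq_along(bases))])
}

# Returns "+", "-" or NA; the + strand is checked first
strandMatch <- function(pwm, s, threshold) {
  if (scorePwm(pwm, s) >= threshold) return("+")
  if (scorePwm(pwm, revComp(s)) >= threshold) return("-")
  NA_character_
}

# Toy PWM favouring the segment "GTA" (columns are filled one at a time)
pwm <- matrix(c(0.1, 0.1, 0.7, 0.1,   # position 1 favours G
                0.1, 0.1, 0.1, 0.7,   # position 2 favours T
                0.7, 0.1, 0.1, 0.1),  # position 3 favours A
              nrow = 4, dimnames = list(c("A", "C", "G", "T"), NULL))

strandMatch(pwm, "GTA", threshold = 1.68)  # "+"
strandMatch(pwm, "TAC", threshold = 1.68)  # "-" (reverse complement is GTA)
strandMatch(pwm, "CCC", threshold = 1.68)  # NA
```

Checking the + strand before the - strand is an arbitrary choice of this sketch.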
annotateU12(pwmU12U2=c(), pwmSsIndex=c(), referenceChr, referenceBegin,
referenceEnd, referenceIntronExon, intronExon='intron',
matchWindowRelativeUpstreamPos=c(), matchWindowRelativeDownstreamPos=c(),
minMatchScore='80%', refGenome='', setNaAs='U2', annotateU12Subtype=TRUE,
includeMatchScores=FALSE, ignoreHybrid=TRUE, filterReference)
pwmU12U2: A list containing position weight matrices of (in order): donor site, branch point and acceptor site of U12-type introns, and donor site and acceptor site of U2-type introns. If not provided, the information related to …
pwmSsIndex: A list (or vector) that contains the column number in each element of …
referenceChr: Chromosome names of the references (e.g. introns).
referenceBegin: A vector of the begin coordinates of the references (e.g. introns).
referenceEnd: A vector of the end coordinates of the references (e.g. introns).
referenceIntronExon: A vector with the same size as the …
intronExon: Should be assigned either …
matchWindowRelativeUpstreamPos: A vector the same size as the …
matchWindowRelativeDownstreamPos: A vector the same size as the …
minMatchScore: Minimum percentage match score used when scoring the match of a sequence to …
refGenome: The reference genome; an object of class BSgenome. Use …
setNaAs: Defines what the function returns for a reference (e.g. an intron) that matches neither the U12-type nor the U2-type PWMs. Such an intron can reasonably be considered U2-type, since U12-type introns make up only about 1% of the introns in the human genome and their splice sites are much more conserved than those of U2-type introns; hence the default is 'U2'. It can also be set to NA, NaN or 'U12/U2'.
annotateU12Subtype: Whether to annotate the subtypes of the U12-type introns. The value is TRUE by default.
includeMatchScores: If TRUE, the returned data frame also includes the PWM match scores (FALSE by default).
ignoreHybrid: Whether to ignore the hybrid U12 subtypes, i.e. GT-AC and AT-AG (TRUE by default).
filterReference: Optional; either a GRanges or a SummarizedExperiment object. In the latter case, the first 3 columns of the rowData must be the chromosome name, the start and the end coordinates. If provided, the intron/exon coordinates are mapped against it, and the intron type of every reference that does not match is set to NA.
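The example at the end of this page subsets the PWMs by column (e.g. `pwmU12db[[1]][,11:17]`), so each column corresponds to a position in the splice-site sequence. A minimal sketch of building such a matrix from aligned sequences follows. `buildPpm` is a hypothetical helper, not the package's `buildSsTypePwms`; it assumes rows ordered A, C, G, T holding plain base frequencies, with no pseudocounts or log-odds transformation.

```r
# Hypothetical helper (not IntEREst's buildSsTypePwms): build a position
# probability matrix with rows A, C, G, T and one column per position
# from equal-length, aligned splice-site sequences.
buildPpm <- function(seqs, bases = c("A", "C", "G", "T")) {
  chars <- do.call(rbind, strsplit(toupper(seqs), ""))  # one row per sequence
  ppm <- vapply(seq_len(ncol(chars)), function(j) {
    counts <- table(factor(chars[, j], levels = bases))
    as.numeric(counts) / nrow(chars)                    # base frequencies
  }, numeric(length(bases)))
  rownames(ppm) <- bases
  ppm
}

ppm <- buildPpm(c("GTAAGT", "GTAAGA", "GTGAGT", "GTAAGT"))
ppm[, 3]  # A = 0.75, C = 0, G = 0.25, T = 0
```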
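minMatchScore values such as '80%' are percentages rather than absolute scores. A minimal sketch of turning them into score cut-offs, assuming the percentage refers to the highest achievable score of each PWM (the convention Biostrings::matchPWM uses for its min.score argument); whether annotateU12 applies exactly this rule is an assumption. `pctToThreshold` and the toy `donor` and `acceptor` matrices are hypothetical.

```r
# Hypothetical helper (not an IntEREst function): convert "80%" into an
# absolute cut-off, assuming the percentage is taken of the maximum PWM score.
pctToThreshold <- function(minMatchScore, pwm) {
  pct <- as.numeric(sub("%$", "", minMatchScore))  # "80%" -> 80
  maxScore <- sum(apply(pwm, 2, max))              # best base at every position
  maxScore * pct / 100
}

bases <- c("A", "C", "G", "T")
donor <- matrix(c(0.7, 0.1, 0.1, 0.1,
                  0.1, 0.1, 0.1, 0.7), nrow = 4, dimnames = list(bases, NULL))
acceptor <- matrix(0.25, nrow = 4, ncol = 3, dimnames = list(bases, NULL))

# One percentage per PWM, mirroring the minMatchScore vector in the example
thresholds <- mapply(pctToThreshold, c("80%", "60%"), list(donor, acceptor))
thresholds  # 1.12 for the donor, 0.45 for the acceptor
```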
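The setNaAs fallback can be sketched as a small decision rule over two logical vectors that say whether each reference passed the U12 and the U2 thresholds. `callIntronType` and its inputs are hypothetical; the sketch only mirrors the labels described above (U12, U2, U12/U2, and the fallback for references matching neither).

```r
# Hypothetical sketch (not IntEREst code): combine U12 and U2 PWM calls and
# apply the setNaAs fallback to references that match neither type.
callIntronType <- function(isU12, isU2, setNaAs = "U2") {
  type <- ifelse(isU12 & isU2, "U12/U2",
          ifelse(isU12, "U12",
          ifelse(isU2, "U2", NA_character_)))
  type[is.na(type)] <- setNaAs   # fallback for references matching neither
  type
}

isU12 <- c(TRUE, FALSE, TRUE, FALSE)
isU2  <- c(FALSE, TRUE, TRUE, FALSE)
callIntronType(isU12, isU2)                # "U12" "U2" "U12/U2" "U2"
callIntronType(isU12, isU2, setNaAs = NA)  # "U12" "U2" "U12/U2" NA
```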
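U12 subtypes are named after the terminal dinucleotides of the intron (GT-AG, AT-AC, and the hybrids GT-AC and AT-AG). A minimal sketch of deriving them from intron sequences; `u12Subtype` is hypothetical, and treating "ignore hybrids" as reporting NA for GT-AC and AT-AG ends is an interpretation, not documented behaviour.

```r
# Hypothetical sketch (not IntEREst code): derive the U12 subtype from the
# first two and last two bases of each intron sequence.
u12Subtype <- function(intronSeq, ignoreHybrid = TRUE) {
  s <- toupper(intronSeq)
  n <- nchar(s)
  ends <- paste(substr(s, 1, 2), substr(s, n - 1, n), sep = "-")  # e.g. "GT-AG"
  allowed <- c("GT-AG", "AT-AC")                             # canonical subtypes
  if (!ignoreHybrid) allowed <- c(allowed, "GT-AC", "AT-AG") # hybrid subtypes
  ifelse(ends %in% allowed, ends, NA_character_)
}

seqs <- c("GTATCCTTTAG", "ATATCCTTAC", "GTAAAAAC")
u12Subtype(seqs)                        # "GT-AG" "AT-AC" NA
u12Subtype(seqs, ignoreHybrid = FALSE)  # "GT-AG" "AT-AC" "GT-AC"
```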
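The effect of filterReference can be illustrated with a base-R sketch that keeps a reference's type only when its chromosome, start and end exactly match a row of the filter. This exact-match rule is an assumption; the package may instead use overlap logic from GenomicRanges. `applyFilter` and the `chr`/`start`/`end` column names of the toy filter data frame are hypothetical.

```r
# Hypothetical sketch (not IntEREst code): set the type of references that
# are absent from the filter to NA, matching on identical chr/start/end.
applyFilter <- function(types, chr, begin, end, filter) {
  key       <- paste(chr, as.integer(begin), as.integer(end), sep = ":")
  filterKey <- paste(filter$chr, as.integer(filter$start),
                     as.integer(filter$end), sep = ":")
  types[!(key %in% filterKey)] <- NA   # unmatched references lose their type
  types
}

filter <- data.frame(chr = c("chr1", "chr2"), start = c(100, 50), end = c(200, 80))
filtered <- applyFilter(types = c("U12", "U2", "U2"),
                        chr = c("chr1", "chr1", "chr2"),
                        begin = c(100, 500, 50), end = c(200, 900, 80),
                        filter = filter)
filtered  # "U12" NA "U2"
```

Converting coordinates with `as.integer` keeps large positions from being pasted in scientific notation, which would break the string keys.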
A data frame with 3 columns representing (in order): the intron type (U12, U2 or none); the strand match, indicating whether the PWMs match the sequence (+ strand), its reverse complement (- strand) or neither (NA); and the U12 subtype (GT-AG or AT-AC). If includeMatchScores is TRUE, additional columns containing the PWM match scores are also included.
Author: Ali Oghabian
See also: buildSsTypePwms.
# Load the package (provides annotateU12 and the pwmU12db and u12 example data)
library(IntEREst)
# Import the hg19 reference genome (requires BSgenome.Hsapiens.UCSC.hg19)
BSgenome.Hsapiens.UCSC.hg19 <-
    BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
# Choose a subset of rows of the u12 example data
ind <- 69:94
# Annotate U12 introns with strong U12 donor site, branch point
# and acceptor site, using the u12 data in the package
annoU12 <- annotateU12(
    # PWMs in order: U12 donor, U12 branch point, U12 acceptor,
    # U2 donor, U2 acceptor
    pwmU12U2 = list(pwmU12db[[1]][, 11:17], pwmU12db[[2]],
                    pwmU12db[[3]][, 38:40], pwmU12db[[4]][, 11:17],
                    pwmU12db[[5]][, 38:40]),
    pwmSsIndex = list(indexDonU12 = 1, indexBpU12 = 1, indexAccU12 = 3,
                      indexDonU2 = 1, indexAccU2 = 3),
    referenceChr = u12[ind, 'chr'],
    referenceBegin = u12[ind, 'begin'],
    referenceEnd = u12[ind, 'end'],
    referenceIntronExon = u12[ind, "int_ex"],
    intronExon = "intron",
    # A match window is given only for the U12 branch point (2nd PWM)
    matchWindowRelativeUpstreamPos = c(NA, -29, NA, NA, NA),
    matchWindowRelativeDownstreamPos = c(NA, -9, NA, NA, NA),
    # Minimum match scores, one per PWM in pwmU12U2
    minMatchScore = c("80%", "80%", "60%", "80%", "60%"),
    refGenome = BSgenome.Hsapiens.UCSC.hg19,
    setNaAs = "U2",
    annotateU12Subtype = TRUE)
# How many U12 and U2 type introns with strong U12 donor sites,
# acceptor sites (and branch points for U12-type) are there?
table(annoU12[,1])