annotatePAC: Annotate PACs

annotatePAC    R Documentation

Annotate PACs


annotatePAC annotates PACs with a given GFF annotation.


annotatePAC(pac, aGFF, verbose = FALSE)



pac: a data frame with chr/coord/strand columns, or a PACdataset.


aGFF: the genome annotation to use, see parseGenomeAnnotation().


verbose: TRUE to show messages.


If the number of PACs changes after annotation, the function raises a warning rather than an error. In that case, you may need to check aGFF. PAs in a gene named 'character(0)' are also removed.


A PACdataset with annotation if pac is a PACdataset, or a data frame if pac is a data frame. If a PACdataset is returned, the original @anno columns in pac are retained (duplicated annotation columns are removed); duplicated rows with the same chr/strand/coord are removed; and @supp$stopCodon is added to store the stop codons of all transcripts, which can be used by ext3UTRPACds().
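To illustrate what removing duplicated rows with the same chr/strand/coord means, here is a minimal base R sketch on a toy data frame. It is not the code annotatePAC runs internally; the toy values and the variable names `pac` and `pac_unique` are assumptions made only for this illustration.

```r
# Toy PAC table with the three key columns (values are made up).
pac <- data.frame(
  chr    = c("chr1", "chr1", "chr2", "chr1"),
  strand = c("+", "+", "-", "+"),
  coord  = c(1000L, 1000L, 500L, 2000L),
  stringsAsFactors = FALSE
)

# Keep only the first row of each chr/strand/coord combination.
key <- pac[, c("chr", "strand", "coord")]
pac_unique <- pac[!duplicated(key), , drop = FALSE]

nrow(pac)         # 4 rows before
nrow(pac_unique)  # 3 rows after; the repeated chr1/+/1000 row is dropped
```

Comparing the row counts before and after, as above, is also a quick way to see why the number of PACs may differ after annotation.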

The following columns are added for annotation: "ftr", "ftr_start", "ftr_end", "gene", "biotype", "gene_start", "gene_end", "gene_stop_codon", "upstream_id", "upstream_start", "upstream_end", "downstream_id", "downstream_start", "downstream_end", "three_UTR_length", "three_extend".

  • ftr: the type of feature in which the PAC coordinate falls, one of 3UTR, 5UTR, CDS, intron, exon, and intergenic;

  • ftr_start: start position of the "ftr";

  • ftr_end: end position of the "ftr";

  • gene: the gene name of the feature;

  • gene_start: start position of the gene;

  • gene_end: end position of the gene;

  • gene_type: the classification of the gene, such as protein_coding, long non-coding RNA (lncRNA), non-coding RNA (ncRNA), tRNA and so on;

  • gene_stop_codon: the end position of the stop codon for a protein-coding gene or transcript; for ncRNA, the end position of the gene;

  • upstream_id: the name of the upstream gene, for poly(A) sites located in intergenic regions;

  • upstream_start: the start position of 'upstream_id';

  • upstream_end: the end position of 'upstream_id';

  • downstream_id: same as upstream_id, except for the downstream gene;

  • downstream_start: see upstream_start;

  • downstream_end: see upstream_end;

  • three_UTR_length: the length of the 3'UTR, equal to the poly(A) site position minus the end position of the gene's stop codon. Only defined for poly(A) sites located in a 3'UTR or in an intergenic region;

  • three_extend: used to identify poly(A) sites in a 3'UTR extension, equal to the poly(A) site position minus the end position of its upstream gene. Only defined for poly(A) sites located in an intergenic region.
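The two length columns can be reproduced by hand from the other annotation columns. The sketch below applies the definitions above to a toy data frame in base R. It assumes plus-strand sites only, since strand handling is not described here, and it is not movAPA's own implementation; the table `anno` and its values are made up for illustration.

```r
# Toy annotated PAC table (plus strand assumed; values are made up).
anno <- data.frame(
  coord           = c(1500L, 2300L, 900L),
  ftr             = c("3UTR", "intergenic", "CDS"),
  gene_stop_codon = c(1200L, 1200L, 1200L),
  upstream_end    = c(NA, 2000L, NA),
  stringsAsFactors = FALSE
)

# three_UTR_length: coord minus stop codon end, for 3UTR or intergenic sites only.
in_utr_or_ig <- anno$ftr %in% c("3UTR", "intergenic")
anno$three_UTR_length <- ifelse(in_utr_or_ig,
                                anno$coord - anno$gene_stop_codon,
                                NA_integer_)

# three_extend: coord minus upstream gene end, for intergenic sites only.
anno$three_extend <- ifelse(anno$ftr == "intergenic",
                            anno$coord - anno$upstream_end,
                            NA_integer_)

anno$three_UTR_length  # 300 1100 NA
anno$three_extend      # NA 300 NA
```

In this toy case the intergenic site lies 300 bases past the end of its upstream gene, which is the kind of value three_extend is meant to capture.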

See Also

parseGenomeAnnotation to get a GFF annotation object.

Other PACdataset functions: PACdataset-class, PACds, annotateByPAS(), createPACdataset(), get3UTRAPAds(), get3UTRAPApd(), length(), makeExamplePACds(), mergePACds(), normalizePACds(), plotPACdsStat(), rbind(), readPACds(), removePACdsIP(), scPACds, subscript_operator, summary(), writePACds()


## Not run: 
## The demo data already contain annotation, so the annotation
## columns should be removed from PACds1 before re-annotating.
## PACds1 and gff are assumed to be loaded already.
PACds1 = annotatePAC(PACds1, gff)
## Annotate PAC data with a gff3 file
newpac = annotatePAC(pac, aGFF = 'mm10.gff3')
## Annotate a PACdataset with a TxDb object
newpac = annotatePAC(pacds, aGFF = TxDb.Mmusculus.UCSC.mm10.ensGene)
## Annotate a PAC data frame with an existing gff.rda from parseGenomeAnnotation()
newpac = annotatePAC(pac, 'mm10.rda')

## End(Not run)

BMILAB/movAPA documentation built on Jan. 3, 2024, 11:09 p.m.