assignTSSByCage: Input a txdb and add a 5' leader for each transcript, that...

View source: R/cage_annotations.R

assignTSSByCage R Documentation

Input a txdb and add a 5' leader for each transcript that does not have one.

Description

For every cds in the txdb that does not have a 5' leader: start 1 base upstream of the cds and use CAGE to assign the leader start. All these leaders will consist of a single exon; if you need exon splicing, use exon prediction tools or run sequencing experiments.

Usage

assignTSSByCage(
  txdb,
  cage,
  extension = 1000,
  filterValue = 1,
  restrictUpstreamToTx = FALSE,
  removeUnused = FALSE,
  preCleanup = TRUE,
  pseudoLength = 1
)

Arguments

txdb

a TxDb file, a path to one of: (.gtf, .gff, .gff2, .db or .sqlite) or an ORFik experiment

cage

Either a file path to the CageSeq file as .bed, .bam or .wig, with possible compressions (".gzip", ".gz", ".bgz"), or already loaded CageSeq peak data as GRanges or GAlignments. NOTE: If it is a .bam file, a score column is added by running: convertToOneBasedRanges(cage, method = "5prime", addScoreColumn = TRUE). The score column is then the number of replicates of each read. If the score column holds something else, such as read length, set it to NULL first.

extension

The maximum number of bases upstream of the TSS to search for a CageSeq peak.

filterValue

The minimum number of reads on a cage position for it to be counted as a possible new TSS (represented by the score column in the CageSeq data). If you already filtered, set it to 0.

restrictUpstreamToTx

a logical (default FALSE). If TRUE, restrict leaders so that they do not extend closer than 5 bases to the closest upstream leader.

removeUnused

a logical (default FALSE). If FALSE, leaders without any cage support are set to the original annotation. If TRUE, leaders without any cage support are removed.

preCleanup

a logical (default TRUE). If TRUE, remove all reads in the region (-5:-1, 1:5) around every original TSS in the leaders. This keeps the original TSS when the best cage peak lies only +/- 5 bases from it.
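
To illustrate the preCleanup region, here is a minimal base R sketch that drops cage positions at offsets -5:-1 and 1:5 from any original TSS while keeping a read exactly on the TSS. The helper name preCleanupSketch and its flank argument are hypothetical and not part of ORFik; strand and chromosome are ignored, so this only shows the idea, not how ORFik implements it.

```r
# Hypothetical helper (not an ORFik function): mimic the preCleanup idea
# on plain integer coordinates of a single chromosome and strand.
preCleanupSketch <- function(cagePos, tssPos, flank = 5) {
  # Flag each cage position that sits 1..flank bases away from any TSS
  nearTss <- vapply(cagePos, function(p) {
    d <- abs(p - tssPos)
    any(d >= 1 & d <= flank)
  }, logical(1))
  # Positions exactly on a TSS (distance 0) or further away are kept
  cagePos[!nearTss]
}

cleaned <- preCleanupSketch(c(95, 97, 100, 104, 106, 250), tssPos = 100)
print(cleaned)
```

In this toy input the positions 95, 97 and 104 fall inside the flank and are dropped, so a nearby peak can no longer move the TSS by only a few bases.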

pseudoLength

a numeric, default 1. Add a pseudo length for all the UTRs. A leader will not be extended if that would take it outside the defined seqlengths of the chromosome (for non-circular chromosomes), or closer than 50 nucleotides to the upstream cds. This length is therefore not guaranteed for every leader.

Details

Given a TxDb object, reassign the start site per transcript using max peaks from CageSeq data. A max peak is accepted as the new TSS if it lies within the 5' leader search range, set by 'extension' in bp. A max peak must also be higher than the minimum CageSeq peak cutoff set by 'filterValue'. The new TSS is then the position of the cage read with the highest read count in the interval. If no CAGE supports a leader, the width will be set to 1 base.
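
The selection rule described here can be sketched in base R for a single plus-strand transcript. This is only an illustration under stated assumptions, not ORFik's implementation: the helper name pickTssSketch is hypothetical, the search window is assumed to be [tss - extension, tss], and the cutoff is taken as strictly greater than filterValue, following the "higher than" wording.

```r
# Hypothetical helper (not an ORFik function): choose a new TSS on the
# plus strand from cage positions and their read counts.
pickTssSketch <- function(tss, cagePos, cageScore,
                          extension = 1000, filterValue = 1) {
  # Assumed search window: from 'extension' bases upstream up to the TSS
  inWindow <- cagePos >= tss - extension & cagePos <= tss
  # Assumed cutoff: the peak must be strictly higher than filterValue
  keep <- inWindow & cageScore > filterValue
  # No cage support: fall back to the input TSS
  if (!any(keep)) return(tss)
  # Otherwise take the position with the highest read count
  cagePos[keep][which.max(cageScore[keep])]
}

pos   <- c(200, 600, 990, 1200)
score <- c(50, 8, 3, 99)
wide   <- pickTssSketch(1000, pos, score)                    # window 0..1000
narrow <- pickTssSketch(1000, pos, score, extension = 500)   # window 500..1000
strict <- pickTssSketch(1000, pos, score, extension = 500,
                        filterValue = 10)                    # nothing passes
```

The peak at 1200 is never chosen despite its high score, because it lies downstream of the TSS and therefore outside the assumed window.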

Value

a TxDb object of reassigned transcripts

See Also

Other CAGE: reassignTSSbyCage(), reassignTxDbByCage()

Examples

txdbFile <- system.file("extdata", "hg19_knownGene_sample.sqlite",
 package = "GenomicFeatures")
cagePath <- system.file("extdata", "cage-seq-heart.bed.bgz",
 package = "ORFik")

## Not run: 
  assignTSSByCage(txdbFile, cagePath)
  # Minimum 20 cage tags for new TSS
  assignTSSByCage(txdbFile, cagePath, filterValue = 20)
  # Create pseudo leaders for the ones without hits
  assignTSSByCage(txdbFile, cagePath, pseudoLength = 100)
  # Create only pseudo leaders (in example 2 leaders are added)
  assignTSSByCage(txdbFile, cage = NULL, pseudoLength = 100)

## End(Not run)

Roleren/ORFik documentation built on Dec. 18, 2024, 11:39 p.m.