computeFeaturesCage: Get all possible features in ORFik
In ORFik: Open Reading Frames in Genomics

Description Usage Arguments Details Value See Also Examples

If you have a txdb with correctly reassigned transcripts, use: [computeFeatures()]

computeFeaturesCage(
  grl,
  RFP,
  RNA = NULL,
  Gtf = NULL,
  tx = NULL,
  fiveUTRs = NULL,
  cds = NULL,
  threeUTRs = NULL,
  faFile = NULL,
  riboStart = 26,
  riboStop = 34,
  sequenceFeatures = TRUE,
  uorfFeatures = TRUE,
  grl.is.sorted = FALSE,
  weight.RFP = 1L,
  weight.RNA = 1L
)

`grl`	a `GRangesList` object with usually ORFs, but can also be either leaders, cds', 3' utrs, etc. This is the regions you want to score.
`RFP`	RiboSeq reads as `GAlignments` , `GRanges` or `GRangesList` object
`RNA`	RnaSeq reads as `GAlignments` , `GRanges` or `GRangesList` object
`Gtf`	a TxDb object of a gtf file or path to gtf, gff .sqlite etc.
`tx`	a GrangesList of transcripts, normally called from: exonsBy(Gtf, by = "tx", use.names = T) only add this if you are not including Gtf file If you are using CAGE, you do not need to reassign these to the cage peaks, it will do it for you.
`fiveUTRs`	fiveUTRs as GRangesList, if you used cage-data to extend 5' utrs, remember to input CAGE assigned version and not original!
`cds`	a GRangesList of coding sequences
`threeUTRs`	a GrangesList of transcript 3' utrs, normally called from: threeUTRsByTranscript(Gtf, use.names = T)
`faFile`	a path to fasta indexed genome, an open `FaFile`, a BSgenome, or path to ORFik `experiment` with valid genome.
`riboStart`	usually 26, the start of the floss interval, see ?floss
`riboStop`	usually 34, the end of the floss interval
`sequenceFeatures`	a logical, default TRUE, include all sequence features, that is: Kozak, fractionLengths, distORFCDS, isInFrame, isOverlapping and rankInTx. uorfFeatures = FALSE will remove the 4 last.
`uorfFeatures`	a logical, default TRUE, include all uORF sequence features, that is: distORFCDS, isInFrame, isOverlapping and rankInTx
`grl.is.sorted`	logical (F), a speed up if you know argument grl is sorted, set this to TRUE.
`weight.RFP`	a vector (default: 1L). Can also be character name of column in RFP. As in translationalEff(weight = "score") for: GRanges("chr1", 1, "+", score = 5), would mean score column tells that this alignment region was found 5 times.
`weight.RNA`	Same as weightRFP but for RNA weights. (default: 1L)

A specialized version if you don't have a correct txdb, for example with CAGE reassigned leaders while txdb is not updated. It is 2x faster for tested data. The point of this function is to give you the ability to input transcript etc directly into the function, and not load them from txdb. Each feature have a link to an article describing feature, try ?floss

a data.table with scores, each column is one score type, name of columns are the names of the scores, i.g [floss()] or [fpkm()]

Other features: computeFeatures(), countOverlapsW(), disengagementScore(), distToCds(), distToTSS(), entropy(), floss(), fpkm_calc(), fpkm(), fractionLength(), initiationScore(), insideOutsideORF(), isInFrame(), isOverlapping(), kozakSequenceScore(), orfScore(), rankOrder(), ribosomeReleaseScore(), ribosomeStallingScore(), startRegionCoverage(), startRegion(), stopRegion(), subsetCoverage(), translationalEff()

 # a small example without cage-seq data:
 # we will find ORFs in the 5' utrs
 # and then calculate features on them
 
 if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19")) {
  library(GenomicFeatures)
  # Get the gtf txdb file
  txdbFile <- system.file("extdata", "hg19_knownGene_sample.sqlite",
  package = "GenomicFeatures")
  txdb <- loadDb(txdbFile)

  # Extract sequences of fiveUTRs.
  fiveUTRs <- fiveUTRsByTranscript(txdb, use.names = TRUE)[1:10]
  faFile <- BSgenome.Hsapiens.UCSC.hg19::Hsapiens
  tx_seqs <- extractTranscriptSeqs(faFile, fiveUTRs)

  # Find all ORFs on those transcripts and get their genomic coordinates
  fiveUTR_ORFs <- findMapORFs(fiveUTRs, tx_seqs)
  unlistedORFs <- unlistGrl(fiveUTR_ORFs)
  # group GRanges by ORFs instead of Transcripts
  fiveUTR_ORFs <- groupGRangesBy(unlistedORFs, unlistedORFs$names)

  # make some toy ribo seq and rna seq data
  starts <- unlistGrl(ORFik:::firstExonPerGroup(fiveUTR_ORFs))
  RFP <- promoters(starts, upstream = 0, downstream = 1)
  score(RFP) <- rep(29, length(RFP)) # the original read widths

  # set RNA seq to duplicate transcripts
  RNA <- unlistGrl(exonsBy(txdb, by = "tx", use.names = TRUE))

  #ORFik:::computeFeaturesCage(grl = fiveUTR_ORFs, RFP = RFP,
  #  RNA = RNA, Gtf = txdb, faFile = faFile)

}
# See vignettes for more examples