orfScore: Get ORFscore for a GRangesList of ORFs

orfScoreR Documentation

Get ORFscore for a GRangesList of ORFs

Description

ORFscore tries to check whether the first frame of the 3 possible frames in an ORF has more reads than second and third frame. IMPORTANT: Only use p-shifted libraries, see (detectRibosomeShifts). Else this score makes no sense.

Usage

orfScore(
  grl,
  RFP,
  is.sorted = FALSE,
  weight = "score",
  overlapGrl = NULL,
  coverage = NULL,
  stop3 = TRUE
)

Arguments

grl

a GRangesList of 5' utrs, CDS, transcripts, etc.

RFP

ribosomal footprints, given as GAlignments or GRanges object, must be already shifted and resized to the p-site. Requires a $size column with original read lengths.

is.sorted

logical (FALSE), is grl sorted. That is + strand groups in increasing ranges (1,2,3), and - strand groups in decreasing ranges (3,2,1)

weight

(default: 'score'), if defined a character name of valid meta column in subject. GRanges("chr1", 1, "+", score = 5), would mean score column tells that this alignment region was found 5 times. ORFik ofst, bedoc and .bedo files contains a score column like this. As do CAGEr CAGE files and many other package formats. You can also assign a score column manually.

overlapGrl

an integer, (default: NULL), if defined must be countOverlaps(grl, RFP), added for speed if you already have it

coverage

a data.table from coveragePerTiling of length same as 'grl' argument. Save time if you have already computed it.

stop3

logical, default TRUE. Stop if any input is of width < 3.

Details

Pseudocode: assume rff - is reads fraction in specific frame

ORFScore = log(rff1 + rff2 + rff3)

If rff2 or rff3 is bigger than rff1, negate the resulting value.

ORFScore[rff1Smaller] <- ORFScore[rff1Smaller] * -1

As result there is one value per ORF: - Positive values say that the first frame have the most reads, - zero values means it is uniform: (ORFscore between -2.5 and 2.5 can be considered close to uniform), - negative values say that the first frame does not have the most reads. NOTE non-pshifted reads: If reads are not of width 1, then a read from 1-4 on range of 1-4, will get scores frame1 = 2, frame2 = 1, frame3 = 1. What could be logical is that only the 5' end is important, so that only frame1 = 1, to get this, you first resize reads to 5'end only.

General NOTES: 1. p shifting is not exact, so some functional ORFs will get a bad ORF score.
2. If a score column is defined, it will use it as weights, set to weight = 1L if you don't have weight, and score column is something else. 3. If needed a test for significance and critical values, use chi-squared. There are 3 degrees of freedom (3 frames), so critical 0.05 (3-1 degrees of freedm = 2), value is: log2(6) = 2.58 see getWeights

Value

a data.table with 4 columns, the orfscore (ORFScores) and score of each of the 3 tiles (frame_zero_RP, frame_one_RP, frame_two_RP)

References

doi: 10.1002/embj.201488411

See Also

Other features: computeFeatures(), computeFeaturesCage(), countOverlapsW(), disengagementScore(), distToCds(), distToTSS(), entropy(), floss(), fpkm(), fpkm_calc(), fractionLength(), initiationScore(), insideOutsideORF(), isInFrame(), isOverlapping(), kozakSequenceScore(), rankOrder(), ribosomeReleaseScore(), ribosomeStallingScore(), startRegion(), startRegionCoverage(), stopRegion(), subsetCoverage(), translationalEff()

Examples

ORF <- GRanges(seqnames = "1",
               ranges = IRanges(start = c(1, 10, 20), end = c(5, 15, 25)),
               strand = "+")
names(ORF) <- c("tx1", "tx1", "tx1")
grl <- GRangesList(tx1_1 = ORF)
RFP <- GRanges("1", IRanges(25, 25), "+") # 1 width position based
score(RFP) <- 28 # original width
orfScore(grl, RFP) # negative because more hits on frames 1,2 than 0.

# example with positive result, more hits on frame 0 (in frame of ORF)
RFP <- GRanges("1", IRanges(c(1, 1, 1, 25), width = 1), "+")
score(RFP) <- c(28, 29, 31, 28) # original width
orfScore(grl, RFP)


Roleren/ORFik documentation built on Nov. 13, 2024, 10 p.m.