calcORFScore | R Documentation |
ORFScore was first defined in Bazzini et al., 2014 (PMID: 24705786) and is used to discover novel open reading frames (ORFs) or to rank ORFs showing active translation. Given an ORF, read counts are calculated for each of the three frames. A chi-squared test statistic is then computed by comparing the read counts with an equal null distribution, p = c(1/3, 1/3, 1/3). The ORFScore is log2(1 + test statistic). Its sign is positive if the target frame (frame 1 by default) has more reads than the other two frames, and negative otherwise.
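The calculation described above can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: orf_score_sketch is a hypothetical helper, and the sign rule assumes that "larger than the other two frames" means larger than each of them individually.

```r
# Hypothetical helper (not part of the package): ORFScore from three
# per-frame read counts, following the description above.
orf_score_sketch <- function(counts, targetFrame = 1,
                             probNULL = c(1/3, 1/3, 1/3)) {
  stopifnot(length(counts) == 3, length(probNULL) == 3)
  total <- sum(counts)
  # No reads in any frame: the score is undefined
  if (total == 0) return(NA_real_)
  # Chi-squared statistic against the null frame distribution
  expected <- total * probNULL
  stat <- sum((counts - expected)^2 / expected)
  # Assumption: positive only if the target frame beats each other frame
  sgn <- if (all(counts[targetFrame] > counts[-targetFrame])) 1 else -1
  sgn * log2(1 + stat)
}

orf_score_sketch(c(90, 5, 5))  # reads concentrated in frame 1: positive
orf_score_sketch(c(5, 90, 5))  # reads concentrated in frame 2: negative
```

With the default null distribution, the statistic computed here should match the one reported by chisq.test(counts, p = probNULL).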
calcORFScore( bam, orfGRL, frameOrder = c(1, 2, 3), targetFrame = 1, ignoreStrand = TRUE, probNULL = c(1/3, 1/3, 1/3) )
bam |
A |
orfGRL |
A |
frameOrder |
A numeric vector of length 3 giving the frame assigned to each position in an ORF. By default, the first position in each ORF is frame 1, the second is frame 2, and the third is frame 3. The pattern then repeats along the ORF (e.g. the 4th position is frame 1, the 5th is frame 2, the 6th is frame 3, and so on). (Default: c(1, 2, 3)). |
targetFrame |
A numeric value indicating which frame is expected to have the highest read count. By default, frame 1 is expected to have more reads than frames 2 and 3. (Default: 1). |
ignoreStrand |
A logical value indicating whether to ignore strand when counting, i.e. whether to drop the requirement that reads and ORFs be on the same strand. (Default: TRUE). |
probNULL |
A numeric vector of length 3 showing the null distribution of the read counts in the three frames of an ORF. Must be non-negative and sum up to 1. By default, an equal null distribution is used in the chi-squared test. (Default: c(1/3, 1/3, 1/3)). |
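To make the frameOrder convention concrete, the following base R sketch labels each position of a short ORF with its frame and sums reads per frame. The ORF length and the per-position counts are made-up values for illustration. How calcORFScore derives per-position counts from the BAM file is not shown here.

```r
# Default frame pattern, repeated along a hypothetical 9-position ORF
frameOrder <- c(1, 2, 3)
orfLength <- 9
frames <- rep(frameOrder, length.out = orfLength)

# Made-up per-position read counts (illustration only)
posCounts <- c(4, 0, 1, 6, 0, 0, 5, 1, 0)

# Sum the read counts within each frame
frameCounts <- tapply(posCounts, frames, sum)
frameCounts
```

Changing frameOrder to, say, c(2, 3, 1) would label the first position as frame 2, which shifts which positions contribute to each frame's count.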
A data.frame with 9 columns, sorted by ORFScore in descending order:
1. Column 1 is the ORF ID (orfId), either user-specified in orfGRL or internally generated.
2. Columns 2 to 4 are the read counts for the three frames, in the order specified by frameOrder (e.g. frame1Count, frame2Count, and frame3Count).
3. Columns 5 to 7 are the proportions of positions with a positive read count for the three frames, in the order specified by frameOrder (e.g. frame1PosPct, frame2PosPct, and frame3PosPct). For example, if an ORF has 30 positions (10 for each frame), of which 8 frame 1 positions, 1 frame 2 position, and 0 frame 3 positions have reads, then columns 5 to 7 are 0.8, 0.1, and 0. These columns help filter out ORFs that have a high ORFScore but whose reads fall on very few positions in the target frame. Consider an ORF with 300 positions where frame 1 has 100 reads, frames 2 and 3 have none, and all 100 frame 1 reads are located at the same position. The ORFScore will be large (if frame 1 is the target frame), but frame1PosPct is small (only 0.01). Such an ORF is more likely to be a false positive.
4. Columns 8 and 9 are the raw ORFScore (rawORFScore, the signed test statistic) and the final ORFScore (ORFScore, log2(1 + test statistic) with the same sign). If the read counts for all three frames are zero, both the raw and final ORFScore are set to NA.
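The 300-position example above can be reproduced with a short base R sketch. The per-position counts are constructed by hand to match the example, and the mock result table and the 0.1 cutoff are hypothetical; only the column names orfId, frame1PosPct, and ORFScore come from the description above.

```r
# 300-position ORF with the default frame pattern
frames <- rep(c(1, 2, 3), length.out = 300)

# All 100 reads pile up on a single frame 1 position
posCounts <- numeric(300)
posCounts[1] <- 100

# Proportion of positions with at least one read, per frame
posPct <- tapply(posCounts > 0, frames, mean)
posPct

# Hypothetical result table and cutoff, for illustration only
res <- data.frame(orfId = c("orfA", "orfB"),
                  frame1PosPct = c(0.01, 0.8),
                  ORFScore = c(7.6, 6.2))
kept <- res[res$ORFScore > 0 & res$frame1PosPct >= 0.1, ]
kept$orfId
```

A suitable cutoff depends on sequencing depth and ORF length, so the value used here should not be read as a recommendation.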