computeFeatures: Get all possible features in ORFik

View source: R/compute_Features.R

Description

Use this function to get all the NGS and/or sequence features in one call. Each feature has a link to an article describing how it was created and the idea behind it. Look at the functions in the feature family to see all of them. For example, if you want to know what the "te" column is, check out: ?translationalEff.

If you used CageSeq to reannotate your leaders, your TxDb object must contain the reassigned leaders. Use [reassignTxDbByCage()] to get the TxDb.

Usage

computeFeatures(
  grl,
  RFP,
  RNA = NULL,
  Gtf,
  faFile = NULL,
  riboStart = 26,
  riboStop = 34,
  sequenceFeatures = TRUE,
  uorfFeatures = TRUE,
  grl.is.sorted = FALSE,
  weight.RFP = 1L,
  weight.RNA = 1L
)

Arguments

grl

a GRangesList object, usually of ORFs, but it can also be leaders, CDSs, 3' UTRs, etc. These are the regions you want to score.

RFP

Ribo-seq reads as a GAlignments, GRanges or GRangesList object

RNA

RNA-seq reads as a GAlignments, GRanges or GRangesList object

Gtf

a TxDb object made from a gtf file, or a path to a gtf, gff, .sqlite file, etc.

faFile

a path to an indexed fasta genome, an open FaFile, a BSgenome, or a path to an ORFik experiment with a valid genome.

riboStart

usually 26, the start of the floss interval, see ?floss

riboStop

usually 34, the end of the floss interval

sequenceFeatures

a logical, default TRUE. Include all sequence features, that is: Kozak, fractionLengths, distORFCDS, isInFrame, isOverlapping and rankInTx. Setting uorfFeatures = FALSE removes the last 4.

uorfFeatures

a logical, default TRUE. Include all uORF sequence features, that is: distORFCDS, isInFrame, isOverlapping and rankInTx

grl.is.sorted

a logical, default FALSE. A speed-up: set this to TRUE if you know the grl argument is sorted.

weight.RFP

a vector (default: 1L), or the character name of a column in RFP. As in translationalEff(weight = "score"): for GRanges("chr1", 1, "+", score = 5), the score column says that this alignment region was found 5 times.

weight.RNA

Same as weight.RFP, but for RNA weights. (default: 1L)

Details

Note that the library is reduced to only the reads overlapping 'tx', so the library size used in the fpkm calculation is the size of this subset. This helps remove rRNA and other contaminants.
Also, if you have only unique reads with a weight column giving the number of duplicated reads, set the weights to make the calculations correct. See getWeights.
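
The two points above can be illustrated with a minimal base-R sketch. This is not ORFik's internal code: it assumes the standard FPKM definition (counts * 1e9 / (region length * library size)), and the helper fpkm_sketch, the region names and all numbers are hypothetical.

```r
# Hypothetical helper: standard FPKM definition, not ORFik's implementation.
fpkm_sketch <- function(counts, widths, lib_size) {
  counts * 1e9 / (widths * lib_size)
}

# Made-up collapsed library: each row is one unique read position,
# and 'score' is the number of identical reads collapsed into it.
reads <- data.frame(
  region = c("orf1", "orf1", "orf2", "rRNA"),
  score  = c(5, 3, 2, 90)
)
on_tx <- reads$region != "rRNA"  # reads that overlap a transcript

# Weighted counts per region: sum the weights instead of counting rows.
counts <- tapply(reads$score[on_tx], reads$region[on_tx], sum)
widths <- c(orf1 = 300, orf2 = 150)[names(counts)]  # assumed region lengths

# Library size from transcript-overlapping reads only vs. from all reads.
lib_tx  <- sum(reads$score[on_tx])  # 10
lib_all <- sum(reads$score)         # 100

fpkm_tx  <- fpkm_sketch(counts, widths, lib_tx)
fpkm_all <- fpkm_sketch(counts, widths, lib_all)
```

In this toy library the contaminant reads make up 90% of the total, so including them in the library size would shrink every fpkm value tenfold. Ignoring the score column would likewise undercount orf1 as 2 reads instead of 8.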

Value

a data.table with scores. Each column is one score type, and the column names are the names of the scores, e.g. [floss()] or [fpkm()]

See Also

Other features: computeFeaturesCage(), countOverlapsW(), disengagementScore(), distToCds(), distToTSS(), entropy(), floss(), fpkm_calc(), fpkm(), fractionLength(), initiationScore(), insideOutsideORF(), isInFrame(), isOverlapping(), kozakSequenceScore(), orfScore(), rankOrder(), ribosomeReleaseScore(), ribosomeStallingScore(), startRegionCoverage(), startRegion(), stopRegion(), subsetCoverage(), translationalEff()

Examples

# Here we make an example from scratch
# Usually the ORFs are found in ORFik, which makes names for you etc.
gtf <- system.file("extdata", "annotations.gtf",
                   package = "ORFik") ## location of the gtf file
suppressWarnings(txdb <-
                  GenomicFeatures::makeTxDbFromGFF(gtf, format = "gtf"))
# use cds' as ORFs for this example
ORFs <- GenomicFeatures::cdsBy(txdb, by = "tx", use.names = TRUE)
ORFs <- makeORFNames(ORFs) # need ORF names
# make Ribo-seq data,
RFP <- unlistGrl(firstExonPerGroup(ORFs))
suppressWarnings(computeFeatures(ORFs, RFP, Gtf = txdb))
# For more details see vignettes.

ORFik documentation built on March 27, 2021, 6 p.m.