calcRPKM: Calculate RPKM


calcRPKM    R Documentation

Calculate RPKM

Description

Calculate reads per kilobase per million reads (RPKM) for each ORF.

Usage

calcRPKM(
  bam,
  orfGRL,
  libSize = length(bam),
  trimStart = 6,
  trimEnd = 6,
  ignoreStrand = TRUE
)

Arguments

bam

A GRanges or GAlignments object of reads. For Ribo-seq data, the reads should already be size-selected and shifted; see the shiftReads function for how to shift reads. For RNA-seq data, there is no need to shift or size-select reads. For each read, only the 5'-most position is used. (Required).
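To illustrate what "only the 5'-most position is used" means, here is a minimal base R sketch. It assumes reads are stored in a plain data.frame with hypothetical start, end, and strand columns rather than in a GRanges or GAlignments object, and it is not the package's own code.

```r
# Hypothetical reads as a plain data.frame (calcRPKM itself takes GRanges/GAlignments).
reads <- data.frame(
  start  = c(100, 200, 300),
  end    = c(129, 228, 330),
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# The 5' end of a read is its start on the "+" strand and its end on the "-" strand.
fivePrime <- ifelse(reads$strand == "-", reads$end, reads$start)
fivePrime
```

Each read is thereby reduced to a single genomic position before it is assigned to an ORF.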

orfGRL

A GRangesList object of ORFs. We recommend assigning a unique name to each ORF using names(orfGRL). The following modifications are applied to orfGRL internally:
1. If names(orfGRL) is NULL, the elements are named "orf_1", "orf_2", etc.
2. Strands marked as "*" are replaced with "+".
3. Elements that span multiple chromosomes or strands are removed.
4. Elements whose ORF length is not divisible by 3 are removed.
5. Most importantly, the ranges within each ORF are sorted: if an ORF is on the positive strand, by coordinates (seqnames, start, end) in ascending order; otherwise, by coordinates (seqnames, end, start) in descending order. This matches the behavior of the cdsBy function in the GenomicFeatures package.
(Required).
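The clean-up applied to orfGRL can be sketched in base R as follows. This is only an illustration under stated assumptions: the ORFs are represented as a hypothetical exon table (one row per exon) instead of a GRangesList, normalizeOrf is a made-up helper name, and the renaming step for NULL names is omitted because the table already carries IDs.

```r
# Hypothetical exon table: one row per exon, grouped by orfId.
exons <- data.frame(
  orfId    = c("a", "a", "b", "b", "c", "d", "d"),
  seqnames = c("chr1", "chr1", "chr1", "chr1", "chr2", "chr1", "chr2"),
  start    = c(201, 101, 301, 401, 11, 1, 1),
  end      = c(206, 103, 303, 406, 14, 3, 3),
  strand   = c("+", "+", "-", "-", "*", "+", "+"),
  stringsAsFactors = FALSE
)

# Unstranded ("*") ranges are treated as positive strand.
exons$strand[exons$strand == "*"] <- "+"

# Hypothetical helper: returns the sorted exons of one ORF, or NULL if it is dropped.
normalizeOrf <- function(orf) {
  # Drop ORFs that span several chromosomes or strands.
  if (length(unique(orf$seqnames)) > 1 || length(unique(orf$strand)) > 1) {
    return(NULL)
  }
  # Drop ORFs whose total length is not a multiple of 3.
  if (sum(orf$end - orf$start + 1) %% 3 != 0) {
    return(NULL)
  }
  # Ascending order on "+", descending order on "-".
  if (orf$strand[1] == "+") {
    orf[order(orf$start, orf$end), ]
  } else {
    orf[order(orf$end, orf$start, decreasing = TRUE), ]
  }
}

cleaned <- Filter(Negate(is.null), lapply(split(exons, exons$orfId), normalizeOrf))
names(cleaned)
```

In this toy input, ORF "c" is dropped because its length (4) is not divisible by 3, and ORF "d" is dropped because it sits on two chromosomes. The exons of "b" end up in descending order because it is on the negative strand.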

libSize

A positive numeric variable indicating the library size of the reads. By default, the number of reads in the bam object is used. (Default: length(bam)).

trimStart

A non-negative numeric variable indicating how many bases to trim from the ORF start. (Default: 6).

trimEnd

A non-negative numeric variable indicating how many bases to trim from the ORF end. (Default: 6).

ignoreStrand

A logical variable indicating whether to ignore strand when assigning reads to ORFs. If FALSE, reads and ORFs must be on the same strand. (Default: TRUE).
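The effect of ignoreStrand can be pictured with a small base R sketch. It assumes a single-exon ORF and reads already reduced to one position each; countReads and the data layout are hypothetical and only mimic the idea, not the package's overlap logic.

```r
# Hypothetical 5' positions of four reads and one single-exon ORF on the "+" strand.
reads <- data.frame(
  pos    = c(105, 120, 150, 400),
  strand = c("+", "-", "+", "+"),
  stringsAsFactors = FALSE
)
orf <- list(start = 100, end = 200, strand = "+")

# Hypothetical helper: count read positions falling inside the ORF.
countReads <- function(reads, orf, ignoreStrand = TRUE) {
  inRegion <- reads$pos >= orf$start & reads$pos <= orf$end
  if (!ignoreStrand) {
    # Strand-aware counting keeps only reads on the ORF's strand.
    inRegion <- inRegion & reads$strand == orf$strand
  }
  sum(inRegion)
}

countReads(reads, orf, ignoreStrand = TRUE)   # 3 reads fall inside the ORF
countReads(reads, orf, ignoreStrand = FALSE)  # 2 of them are on the "+" strand
```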

Value

A data.frame with 4 columns:
1. Column 1 is the ORF ID (orfId), either user-specified in orfGRL or internally generated.
2. Column 2 is the trimmed ORF length (orfLenTrimmed).
3. Column 3 is the read count (countORF) in the trimmed ORF region.
4. Column 4 is the RPKM value (rpkmORF).
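As a worked example of how the four columns relate, the sketch below applies the standard RPKM definition in base R. It assumes that the trimmed length is the full ORF length minus trimStart and trimEnd, and that the trimmed length is the one used in the denominator; the page does not spell this out, and all numbers are made up.

```r
# Hypothetical per-ORF inputs; none of these numbers come from real data.
orfLen    <- c(orf_1 = 312, orf_2 = 1512)  # full ORF lengths in bases
countORF  <- c(orf_1 = 30, orf_2 = 150)    # reads counted in the trimmed region
libSize   <- 2e6                           # total number of reads in the library
trimStart <- 6
trimEnd   <- 6

# Assumed: trimmed length = full length minus the bases trimmed at both ends.
orfLenTrimmed <- orfLen - trimStart - trimEnd

# Standard RPKM: reads per kilobase of region per million reads in the library.
rpkmORF <- countORF / (orfLenTrimmed / 1e3) / (libSize / 1e6)

result <- data.frame(
  orfId = names(orfLen),
  orfLenTrimmed,
  countORF,
  rpkmORF,
  row.names = NULL,
  stringsAsFactors = FALSE
)
result
```

Both toy ORFs have the same read density (30 reads over 300 bases and 150 reads over 1500 bases), so they get the same RPKM of about 50 despite their different raw counts.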


nzhang89/RiboSeeker documentation built on April 15, 2022, 10:18 a.m.