calcMetagene: Metagene matrix calculation

View source: R/metagene.R

calcMetagene    R Documentation

Metagene matrix calculation

Description

Two modes are provided for metagene matrix calculation. If regionGR is specified, the metagene is calculated for the regions given in regionGR. Otherwise, txdb must be specified, and the metagene is calculated in the CDS start and end regions. In that mode, if txList is specified, the transcripts it lists are used; if txList is not specified, the nTx transcripts with the most reads in the CDS start and end regions are selected. By default (fiveEndOnly = TRUE), only the 5'-end position of each read is kept.

Usage

calcMetagene(
  bam,
  regionGR = NULL,
  txdb = NULL,
  txList = NULL,
  readLen = NULL,
  fiveEndOnly = TRUE,
  relFreq = TRUE,
  nTx = 2000,
  cdsStartUpstream = 50,
  cdsStartDownstream = 50,
  cdsEndUpstream = 50,
  cdsEndDownstream = 50
)

Arguments

bam

A GAlignments object of aligned reads. (Required).

regionGR

A GRanges object of the target regions for which the metagene is calculated. Note that all regions must have equal width. If regionGR is set, txdb will be ignored. If NULL, txdb must be set. (Default: NULL).

txdb

A TxDb object of genome annotation. See GenomicFeatures package for more details. If NULL, regionGR must be set. (Default: NULL).

txList

A character vector of transcript IDs. Note that the transcript IDs set here should also be found in the txdb. (Default: NULL).

readLen

A vector of positive read lengths to keep. If NULL, reads of all lengths will be kept. (Default: NULL).

fiveEndOnly

A logical value indicating whether to keep only the 5'-ends of reads. This option can be used to increase the resolution of Ribo-seq reads. (Default: TRUE).

relFreq

A logical value indicating whether the transcript-wise relative frequency (normalized by the total read count in each metagene region) should be returned instead of raw read counts. Note that if txdb is set but regionGR is NULL, the metagene is calculated for both the CDS start and end regions. In this case, if relFreq is TRUE, the metagene is normalized by the total read count in both the CDS start and end regions for each transcript. (Default: TRUE).

nTx

A numeric value giving the number of transcripts to keep when txdb is set but txList is NULL. The top nTx transcripts with the most reads in the CDS start and end regions will be kept. (Default: 2000).

cdsStartUpstream

A numeric variable indicating the width to use for the upstream region of CDS start site (not including CDS start site) if txdb is set but regionGR is NULL. (Default: 50).

cdsStartDownstream

A numeric variable indicating the width to use for the downstream region of CDS start site (including CDS start site) if txdb is set but regionGR is NULL. (Default: 50).

cdsEndUpstream

A numeric variable indicating the width to use for the upstream region of CDS end site (including CDS end site) if txdb is set but regionGR is NULL. (Default: 50).

cdsEndDownstream

A numeric variable indicating the width to use for the downstream region of CDS end site (not including CDS end site) if txdb is set but regionGR is NULL. (Default: 50).
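The window arguments and relFreq can be illustrated with a small base R sketch. It does not call RiboSeeker and is not the package's implementation. The coordinate convention (0 marks the CDS start or end site itself) and the toy count matrices are assumptions made for illustration only.

```r
# Window widths, using the documented defaults.
cdsStartUpstream   <- 50  # positions before the CDS start site (site excluded)
cdsStartDownstream <- 50  # positions from the CDS start site onward (site included)
cdsEndUpstream     <- 50  # positions up to the CDS end site (site included)
cdsEndDownstream   <- 50  # positions after the CDS end site (site excluded)

# Assumed convention: position 0 is the CDS start (or end) site itself.
startPos <- seq(-cdsStartUpstream, cdsStartDownstream - 1)  # -50, ..., 49
endPos   <- seq(-(cdsEndUpstream - 1), cdsEndDownstream)    # -49, ..., 50

# Toy raw counts (rows are transcripts, columns are positions); three
# columns per region are used here only to keep the example readable.
startCounts <- matrix(c(2, 0, 2,
                        1, 3, 0),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("tx1", "tx2"), NULL))
endCounts <- matrix(c(0, 4, 0,
                      2, 1, 1),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("tx1", "tx2"), NULL))

# Transcript-wise normalization over both regions, as described for relFreq
# when txdb is set and regionGR is NULL.
total     <- rowSums(startCounts) + rowSums(endCounts)
startFreq <- startCounts / total  # the length-2 vector recycles down each column, i.e. per row
endFreq   <- endCounts / total
```

With these defaults each window spans 100 positions, and the start and end frequencies of each transcript sum to 1 together. A transcript with no reads in either region would yield NaN under this sketch; how calcMetagene treats that case is not stated here.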

Value

A list containing three elements. The first element is the metagene: either one matrix if regionGR is set, or a list of two matrices for the CDS start and end regions (named start and end). Each row is a transcript, each column is a position, and each value is the read count or relative frequency for a transcript at a position. The second element is the GRanges for the metagene: either one range if regionGR is set, or a list of two ranges representing the CDS start and end regions. This list also contains information on the selected transcripts and the flanking sequence lengths for the CDS start and end. The third element is an internal variable indicating whether regionGR was specified (1 means regionGR is set and 2 means it is not set).
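
Examples

A minimal sketch of the txdb mode, assuming the GenomicAlignments, GenomicFeatures and RiboSeeker packages are installed. The helper summarizeMetagene and its file path arguments are hypothetical and not part of RiboSeeker. Result elements are accessed by position because only the start and end names of the matrix list are documented above.

```r
# Hypothetical helper (not part of RiboSeeker): runs calcMetagene in txdb
# mode and returns the mean profile around the CDS start and end.
summarizeMetagene <- function(bamFile, gtfFile, nTx = 2000) {
  # Read the alignments into a GAlignments object.
  bam <- GenomicAlignments::readGAlignments(bamFile)

  # Build a TxDb from a GTF/GFF annotation file. In recent Bioconductor
  # releases this function is provided by the txdbmaker package instead.
  txdb <- GenomicFeatures::makeTxDbFromGFF(gtfFile)

  # regionGR and txList are left NULL, so the top nTx transcripts are used.
  res <- RiboSeeker::calcMetagene(bam, txdb = txdb, nTx = nTx, relFreq = TRUE)

  # First element: list of two matrices named start and end.
  mats <- res[[1]]
  list(
    start = colMeans(mats$start, na.rm = TRUE),
    end   = colMeans(mats$end, na.rm = TRUE)
  )
}

# Usage, with placeholder paths:
# profile <- summarizeMetagene("sample.bam", "annotation.gtf")
# plot(profile$start, type = "h")
```

Averaging over rows gives one value per position, which is a common way to display a metagene profile; other summaries (for example column sums of raw counts with relFreq = FALSE) would work equally well.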


nzhang89/RiboSeeker documentation built on April 15, 2022, 10:18 a.m.