get_region_counts: Retrieves the region counts from a .ribo file

View source: R/region_count_functions.R

Description

get_region_counts returns the region counts for any subset of regions in a given set of experiments.

Usage

get_region_counts(
  ribo.object,
  range.lower = length_min(ribo.object),
  range.upper = length_max(ribo.object),
  length = TRUE,
  transcript = TRUE,
  tidy = TRUE,
  alias = FALSE,
  normalize = FALSE,
  region = c("UTR5", "UTR5J", "CDS", "UTR3J", "UTR3"),
  compact = TRUE,
  experiment = experiments(ribo.object)
)

Arguments

ribo.object

A 'Ribo' object

range.lower

Lower bound of the read length, inclusive

range.upper

Upper bound of the read length, inclusive

length

Logical value that denotes if the region count information should be summed across read lengths

transcript

Logical value that denotes if the region count information should be summed across transcripts

tidy

Option to return the data frame in a tidy format

alias

Option to report the transcripts as aliases/nicknames

normalize

Option to normalize the counts as counts per million reads

region

Specific region(s) of interest: any subset of "UTR5", "UTR5J", "CDS", "UTR3J", and "UTR3"

compact

Option to return a DataFrame with Rle and factor as opposed to a raw data.frame

experiment

List of experiment names
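
The 'normalize' option reports counts per million reads. As a rough illustration of that scaling only, the sketch below applies it to made-up numbers in base R. The variable names, and the choice of the experiment's total read count as the denominator, are assumptions for this example and are not taken from the ribor source.

```r
# Toy numbers (not real data): raw region counts for one experiment
raw.counts <- c(UTR5 = 150, CDS = 4200, UTR3 = 650)

# Assumed denominator: total number of reads in the experiment
total.reads <- 2e6

# Counts per million: rescale each count to a library of one million reads
cpm <- raw.counts / total.reads * 1e6
print(cpm)
```

Scaling this way makes counts comparable between experiments that were sequenced to different depths.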

Details

This function returns a data frame of the counts at each specified region for each specified experiment. The region options are "UTR5", "UTR5J", "CDS", "UTR3J", and "UTR3". The user can specify any subset of regions in the form of a vector, a list, or a single string if only one region is desired.

The dimensions of the returned DataFrame depend on the parameters range.lower, range.upper, length, and transcript.

The param 'length' condenses the read lengths together. When length is TRUE and transcript is FALSE, the data frame presents information for each transcript across all of the read lengths. That is, each transcript has a value that is the sum of all of the counts across every read length. As a result, information about the transcript at each specific read length is lost.

The param 'transcript' condenses the transcripts together. When transcript is TRUE and length is FALSE, the data frame presents information at each read length between range.lower and range.upper, inclusive. That is, each separate read length denotes the sum of counts from every transcript. As a result, information about the counts of each individual transcript is lost.
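
The two kinds of condensing described above can be pictured with a toy table in base R. This is only a sketch of the summation, not the ribor implementation; the table, its column names, and the transcript names tx_A and tx_B are invented for illustration.

```r
# Toy counts (made up): 2 transcripts x 2 read lengths, one experiment, CDS only
toy <- data.frame(
  experiment = "Hela_1",
  transcript = rep(c("tx_A", "tx_B"), each = 2),
  length     = rep(c(28, 29), times = 2),
  CDS        = c(10, 20, 5, 15)
)

# Analogue of length = TRUE, transcript = FALSE:
# sum over read lengths, keep one row per transcript
by.transcript <- aggregate(CDS ~ experiment + transcript, data = toy, FUN = sum)

# Analogue of length = FALSE, transcript = TRUE:
# sum over transcripts, keep one row per read length
by.length <- aggregate(CDS ~ experiment + length, data = toy, FUN = sum)

print(by.transcript)
print(by.length)
```

In the first result the length column disappears, and in the second the transcript column disappears, which mirrors the information loss noted in each paragraph.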

When 'transcript' is set to FALSE, the 'alias' parameter specifies whether or not the returned DataFrame should present each transcript as an alias instead of the original name. If 'alias' is set to TRUE, then the column of the transcript names will contain the aliases rather than the original reference names of the .ribo file.

If both 'length' and 'transcript' are TRUE, then the resulting DataFrame contains one row for each experiment. This provides the metagene information across all transcripts and all read lengths in a given experiment.

If both 'length' and 'transcript' are FALSE, no calculations are done on the data, and all information is preserved for both the read length and the transcript. The DataFrame presents the entire stored raw data from the read length 'range.lower' to the read length 'range.upper', which in most cases results in a slow run time and a massive DataFrame.
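
To get a feel for the size difference between these two extremes, the following back-of-the-envelope sketch counts rows. All sizes are hypothetical, and it assumes a layout with one row per experiment, transcript, and read length combination and one column per region.

```r
# Hypothetical sizes, chosen only for illustration
n.experiments <- 3
n.transcripts <- 20000
range.lower   <- 15
range.upper   <- 40

# Both bounds are inclusive
n.lengths <- range.upper - range.lower + 1

# length = FALSE, transcript = FALSE: nothing is summed
rows.full <- n.experiments * n.transcripts * n.lengths

# length = TRUE, transcript = TRUE: everything is summed within an experiment
rows.collapsed <- n.experiments

print(c(full = rows.full, collapsed = rows.collapsed))
```

Narrowing the read length range or the set of experiments is the simplest way to keep the unsummed result manageable.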

Value

An annotated DataFrame, or a data.frame if the 'compact' parameter is set to FALSE, of the region count information for the regions specified in the 'region' parameter. The returned data frame has a length column when the 'length' parameter is FALSE, meaning that the counts are not summed across the provided range of read lengths. Similarly, it has a transcript column when the 'transcript' parameter is FALSE, meaning that the counts are not summed across the transcripts. When 'transcript' is FALSE and 'alias' is TRUE, the transcripts are reported under the aliases specified at the creation of the ribo object.
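
The 'tidy' argument listed under Arguments switches between a wide and a tidy (long) layout. The base R sketch below shows the general difference on made-up numbers. The column names "experiment", "region", and "count" are assumptions for illustration and may not match the columns ribor actually returns.

```r
# Toy wide table (made up): one column per region
wide <- data.frame(
  experiment = c("Hela_1", "Hela_2"),
  UTR5 = c(10, 12),
  CDS  = c(100, 90),
  UTR3 = c(5, 7)
)
region.cols <- c("UTR5", "CDS", "UTR3")

# Stack the region columns into (experiment, region, count) rows
long <- do.call(rbind, lapply(region.cols, function(r) {
  data.frame(experiment = wide$experiment, region = r, count = wide[[r]])
}))
print(long)
```

The tidy layout has one row per experiment and region pair, which is usually the more convenient shape for plotting and grouped summaries.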

Examples

#load the package and generate the ribo object
library(ribor)
file.path <- system.file("extdata", "sample.ribo", package = "ribor")
sample <- Ribo(file.path)

#specify the regions and experiments of interest
regions <- c("UTR5", "UTR5J", "CDS", "UTR3J", "UTR3")
experiments <- c("Hela_1", "Hela_2", "WT_1")

#obtains the region counts at each individual read length, summed across every transcript
region.counts <- get_region_counts(ribo.object = sample,
                                   region = regions,
                                   range.lower = 2,
                                   range.upper = 5,
                                   length = FALSE,
                                   transcript = TRUE,
                                   tidy = FALSE,
                                   alias = FALSE,
                                   experiment = experiments)

mjgeng/ribor documentation built on Dec. 21, 2021, 7:03 p.m.