codon_usage: Codon usage

codon_usage R Documentation

Codon usage

Description

Analyse the coverage per amino acid (AA) / codon and return a multitude of features, for both A-sites and P-sites (input reads must be P-sites for now). This function takes inspiration from the codonDT paper and returns, among others, the negative binomial estimates, along with many other features.

Usage

codon_usage(
  reads,
  cds,
  mrna,
  faFile,
  filter_table,
  filter_cds_mod3 = TRUE,
  min_counts_cds_filter = max(min(quantile(filter_table, 0.5), 1000), 1000),
  with_A_sites = TRUE,
  aligned_position = "center",
  code = GENETIC_CODE
)

Arguments

reads

either a single library (GRanges, GAlignments, GAlignmentPairs), or a list of libraries returned from outputLibs(df) with p-sites. If a list, it must have names corresponding to the library names.

cds

a GRangesList

mrna

a GRangesList

faFile

a FaFile from genome

filter_table

a matrix or vector of counts, with length (or number of rows) equal to the length of cds

filter_cds_mod3

logical, default TRUE. Remove all ORFs whose length is not a multiple of 3 (not mod3). This speeds up the computation considerably and usually removes malformed ORFs you would not want anyway.
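As an illustration of what a mod3 filter does, here is a minimal sketch on a toy GRangesList. The transcripts `tx1` and `tx2` are made up, and the subsetting shown is an assumption about the kind of check involved, not the function's actual internals.

```r
library(GenomicRanges)

# Toy CDS set (hypothetical coordinates, for illustration only)
cds <- GRangesList(
  tx1 = GRanges("chr1", IRanges(c(1, 20), c(6, 25)), "+"),  # 6 + 6 = 12 nt
  tx2 = GRanges("chr1", IRanges(100, 109), "+")             # 10 nt
)

# Total CDS length per transcript, then keep only multiples of 3
cds_lengths <- sum(width(cds))
is_mod3 <- cds_lengths %% 3 == 0
cds_mod3 <- cds[is_mod3]
names(cds_mod3)
```

Only `tx1` survives here, because a 10 nt CDS cannot be split into complete codons.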

min_counts_cds_filter

numeric, default: max(min(quantile(filter_table, 0.50), 1000), 1000). Minimum number of counts a CDS must have in the 'filter_table' argument to be kept.
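To see how the default expression from Usage behaves, the sketch below evaluates it on made-up counts. The `filter_table` values are hypothetical, and the `>=` comparison used to apply a threshold is an assumption about how the filter works.

```r
# Hypothetical per-CDS counts, for illustration only
filter_table <- c(5, 20, 150, 3000, 12000)

# Default expression as written in Usage
default_threshold <- max(min(quantile(filter_table, 0.5), 1000), 1000)
default_threshold

# Applying an explicit, lower threshold (as done in Examples).
# Assumption: a CDS is kept when its count is >= the threshold.
keep <- filter_table >= 10
sum(keep)
```

As written, the inner min() can never exceed 1000, so the outer max() returns 1000 for any non-missing input. For small test data sets it may therefore be necessary to pass a lower value explicitly, as the Examples section does with min_counts_cds_filter = 10.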

with_A_sites

logical, default TRUE. Not used yet, will also return A site scores.

aligned_position

character, which positions are used to calculate per-codon coverage. Default: "center", meaning that positions -1, 0 and 1 are used. Alternative: "left", meaning that positions 0, 1 and 2 are used.
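The difference between the two settings can be sketched with a toy per-nucleotide coverage vector. The helper `window_offsets()` and the counts are hypothetical and not part of ORFik; the source does not state which nucleotide position 0 refers to, so `ref` below is only an arbitrary reference index.

```r
# Hypothetical helper: offsets summed for each aligned_position setting
window_offsets <- function(aligned_position = c("center", "left")) {
  aligned_position <- match.arg(aligned_position)
  switch(aligned_position, center = -1:1, left = 0:2)
}

coverage <- c(0, 2, 5, 1, 0, 3, 4, 0, 1)  # made-up counts per nucleotide
ref <- 4                                  # arbitrary reference index (position 0)

center_sum <- sum(coverage[ref + window_offsets("center")])  # indices 3, 4, 5
left_sum   <- sum(coverage[ref + window_offsets("left")])    # indices 4, 5, 6
c(center = center_sum, left = left_sum)
```

The same reads can thus contribute to different codon totals depending on the chosen window.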

code

a named character vector of size 64. Default: GENETIC_CODE. Change if organism does not use the standard code.
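One way to obtain a non-standard code of the required shape is Biostrings::getGeneticCode(). The sketch below assumes the vertebrate mitochondrial code (id "2") is the one wanted; substitute the id or name appropriate for your organism.

```r
library(Biostrings)

# Vertebrate mitochondrial code, as a named character vector
mito_code <- getGeneticCode("2")
length(mito_code)

# Compare how the codon TGA is translated under each code
GENETIC_CODE[["TGA"]]
mito_code[["TGA"]]

# The vector could then be passed on, e.g.:
# codon_usage(reads, cds, mrna, faFile, filter_table, code = mito_code)
```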

Details

The primary column to use is "mean_txNorm"; this is the fair, normalized score.

Value

a data.table with one row per codon / AA. All values are given per library and per site (A or P), sorted by the mean_txNorm_percentage column of the first library in the set. The columns are:

  • variable (character): Library name

  • seq (character): Amino acid:codon

  • sum (integer): Total counts per seq

  • sum_txNorm (integer): Total counts per seq, normalized per tx

  • var (numeric): Variance of total counts per seq

  • N (integer): Total number of codons of that type

  • mean_txNorm (numeric): The default output to use; the fair codon usage, normalized at both gene and genome level for codon and read counts

  • ...

  • alpha (numeric): Dirichlet alpha method-of-moments (MOM) estimator. It summarizes the mean and variance of the probability in one value: the lower the value, the higher the variance; the mean is determined by the relative value between samples

  • relative_to_max_score (integer): Percentage use of codon

  • type (factor(character)): Either "P" or "A"
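A short sketch of how such a result might be inspected with data.table. The table below is a hand-made stand-in with only a few of the documented columns, and all values are invented. It is not real codon_usage() output.

```r
library(data.table)

# Made-up stand-in for a codon_usage() result (values are not real)
res <- data.table(
  variable    = rep(c("lib1", "lib2"), each = 4),
  seq         = rep(c("K:AAA", "K:AAG"), times = 4),
  type        = factor(rep(rep(c("P", "A"), each = 2), times = 2)),
  mean_txNorm = c(1.2, 0.8, 1.1, 0.9, 0.7, 1.3, 1.0, 1.0)
)

# Keep P-site rows and rank codons by mean_txNorm within each library
p_sites <- res[type == "P"][order(variable, -mean_txNorm)]

# Highest-scoring codon per library
top_per_lib <- p_sites[, .SD[1], by = variable]
top_per_lib
```

Filtering on type first avoids mixing A-site and P-site scores in the same ranking.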

References

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7196831/

See Also

Other codon: codon_usage_exp(), codon_usage_plot()

Examples

library(ORFik)

df <- ORFik.template.experiment()[9:10,] # Subset to 2 Ribo-seq libs

## For single library
# P-shifted reads of the first library
reads <- fimport(filepath(df[1,], "pshifted"))
# CDS and mRNA regions for the filtered transcripts
cds <- loadRegion(df, "cds", filterTranscripts(df))
mrna <- loadRegion(df, "mrna", names(cds))
# Per-CDS counts used by the minimum-count filter
filter_table <- assay(countTable(df, type = "summarized")[names(cds)])
faFile <- findFa(df)
res <- codon_usage(reads, cds, mrna, faFile = faFile,
                   filter_table = filter_table, min_counts_cds_filter = 10)

Roleren/ORFik documentation built on Nov. 13, 2024, 10 p.m.