codon_usage | R Documentation |
Analyses coverage per amino acid (AA) / codon and returns a range of features, for both A-sites and P-sites (input reads must be P-sites for now). This function takes inspiration from the codonDT paper: among other things it returns the negative binomial estimates, along with many additional features.
codon_usage(
reads,
cds,
mrna,
faFile,
filter_table,
filter_cds_mod3 = TRUE,
min_counts_cds_filter = max(min(quantile(filter_table, 0.5), 1000), 1000),
with_A_sites = TRUE,
aligned_position = "center",
code = GENETIC_CODE
)
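The default for min_counts_cds_filter is an expression over filter_table, and it can help to evaluate it by hand. The sketch below uses base R only; the two count vectors are hypothetical stand-ins for filter_table, and default_filter is an illustrative wrapper, not an ORFik function. As the expression is written in the usage above, the inner min() caps the median at 1000 and the outer max() raises anything below 1000 back up, so both inputs give 1000.

```r
# Hypothetical per-CDS count vectors standing in for `filter_table`
low_counts  <- c(0, 3, 12, 40, 75)               # median 12
high_counts <- c(500, 2000, 8000, 15000, 60000)  # median 8000

# Illustrative wrapper around the default expression from the usage signature
default_filter <- function(filter_table) {
  max(min(quantile(filter_table, 0.5), 1000), 1000)
}

default_filter(low_counts)   # min(12, 1000) = 12, then max(12, 1000) = 1000
default_filter(high_counts)  # min(8000, 1000) = 1000, then max(1000, 1000) = 1000
```

For small data sets it may therefore be worth passing min_counts_cds_filter explicitly, as the example at the end of this page does.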
reads |
either a single library (GRanges, GAlignment, GAlignmentPairs),
or a list of libraries returned from |
cds |
a GRangesList |
mrna |
a GRangesList |
faFile |
a FaFile from genome |
filter_table |
a matrix / vector of length equal to cds |
filter_cds_mod3 |
logical, default TRUE. Remove all ORFs whose length is not a multiple of 3 (not mod3). This speeds up the computation considerably and usually removes malformed ORFs you would not want anyway. |
min_counts_cds_filter |
numeric, default: max(min(quantile(filter_table, 0.5), 1000), 1000) |
with_A_sites |
logical, default TRUE. Not used yet; intended to also return A-site scores. |
aligned_position |
which positions are used to calculate per-codon coverage. Default: "center", meaning that positions -1, 0, 1 are taken. Alternative: "left", in which case positions 0, 1, 2 are taken. |
code |
a named character vector of size 64. Default: GENETIC_CODE. Change if organism does not use the standard code. |
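As a rough illustration of two of the arguments above, the base R sketch below shows which offsets aligned_position selects around a site, and what a mod3 check on ORF lengths looks like. Both codon_window and the orf_lengths vector are hypothetical and only mirror the descriptions given here; they are not ORFik internals.

```r
# Hypothetical helper: positions taken around a site for each alignment mode
codon_window <- function(position, aligned_position = c("center", "left")) {
  aligned_position <- match.arg(aligned_position)
  offsets <- if (aligned_position == "center") -1:1 else 0:2
  position + offsets
}

codon_window(10)          # "center": 9 10 11
codon_window(10, "left")  # "left":   10 11 12

# Hypothetical ORF lengths (nt); keep only those that are a multiple of 3
orf_lengths <- c(orf1 = 300, orf2 = 301, orf3 = 99)
is_mod3 <- orf_lengths %% 3 == 0
names(orf_lengths)[is_mod3]  # "orf1" "orf3"
```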
The primary column to use is "mean_txNorm"; this is the fair normalized score.
A data.table with one row per codon / AA. All values are given per library and per site (A or P), sorted by the mean_txNorm_percentage column of the first library in the set. The columns are:
variable (character) Library name
seq (character) Amino acid:codon
sum (integer) total counts per seq
sum_txNorm (integer) total counts per seq normalized per tx
var (numeric) variance of total counts per seq
N (integer) total number of codons of that type
mean_txNorm (numeric) The default output to use: the fair codon usage, normalized at both gene and genome level for codon and read counts
...
alpha (numeric) Dirichlet alpha method-of-moments (MOM) estimator. Think of it as the mean and variance of a probability summarized in one value: the lower the value, the higher the variance, while the mean is determined by the relative value between samples.
relative_to_max_score (integer) Percentage use of codon
type (factor(character)) Either "P" or "A"
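To show how the returned table might be used, the sketch below builds a toy data.frame with a few of the columns listed above and ranks P-site codons by mean_txNorm. All values are made up for illustration, the "K:AAA" style of the seq column is an assumption based on the "Amino acid:codon" description, and a real result is a data.table with more columns.

```r
# Toy stand-in for a codon_usage() result (values are invented)
toy <- data.frame(
  variable    = "lib1",
  seq         = c("K:AAA", "K:AAG", "E:GAA", "K:AAA", "K:AAG", "E:GAA"),
  mean_txNorm = c(1.8, 0.9, 1.2, 1.1, 1.4, 0.7),
  type        = factor(c("P", "P", "P", "A", "A", "A")),
  stringsAsFactors = FALSE
)

# Keep P-site rows and sort by the primary score, highest first
p_sites  <- toy[toy$type == "P", ]
p_ranked <- p_sites[order(p_sites$mean_txNorm, decreasing = TRUE), ]
p_ranked$seq  # "K:AAA" "E:GAA" "K:AAG"
```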
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7196831/
Other codon: codon_usage_exp(), codon_usage_plot()
library(ORFik)
df <- ORFik.template.experiment()[9:10,] # Subset to 2 Ribo-seq libs
## For single library
reads <- fimport(filepath(df[1,], "pshifted")) # P-shifted reads
cds <- loadRegion(df, "cds", filterTranscripts(df))
mrna <- loadRegion(df, "mrna", names(cds))
# Counts per CDS, used to filter out low-coverage CDSs
filter_table <- assay(countTable(df, type = "summarized")[names(cds)])
faFile <- findFa(df)
res <- codon_usage(reads, cds, mrna, faFile = faFile,
                   filter_table = filter_table, min_counts_cds_filter = 10)