codon_usage_exp | R Documentation |
Analyses coverage per amino acid (AA) / codon and returns a multitude of features, for both A-sites and P-sites (input reads must be P-sites for now). The function takes inspiration from the codonDT paper: among other things it returns the negative binomial estimates, along with many additional features.
codon_usage_exp(
df,
reads,
cds = loadRegion(df, "cds", filterTranscripts(df)),
mrna = loadRegion(df, "mrna", names(cds)),
filter_cds_mod3 = TRUE,
filter_table = assay(countTable(df, type = "summarized")[names(cds)]),
faFile = df@fafile,
min_counts_cds_filter = max(min(quantile(filter_table, 0.5), 1000), 1000),
with_A_sites = TRUE,
code = GENETIC_CODE,
aligned_position = "center"
)
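To see what the default `min_counts_cds_filter` expression evaluates to, it can be run on a toy count matrix. This is a minimal sketch: the matrix values are invented, `default_min_counts` is a hypothetical helper (not an ORFik function), and the row-sum filter at the end is an assumed reading of how the threshold is applied.

```r
# Toy count matrix: rows are transcripts, columns are libraries (values invented).
filter_table <- matrix(c(5, 120, 3000, 40, 900, 15000),
                       ncol = 2,
                       dimnames = list(c("tx1", "tx2", "tx3"), c("lib1", "lib2")))

# Hypothetical helper mirroring the default expression from the usage section.
default_min_counts <- function(tab) {
  max(min(quantile(tab, 0.5), 1000), 1000)
}

threshold <- default_min_counts(filter_table)

# Assumed filter (not taken from ORFik's code):
# keep transcripts whose summed counts reach the threshold.
keep <- rowSums(filter_table) >= threshold
kept_tx <- names(keep)[keep]
```

Because the inner `min(..., 1000)` can never exceed 1000, the outer `max(..., 1000)` returns 1000 for any table without missing values. Pass `min_counts_cds_filter` explicitly (as the examples do with 10) when a different threshold is wanted.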
df |
an ORFik experiment |
reads |
either a single library (GRanges, GAlignments, GAlignmentPairs),
or a list of libraries, e.g. as returned from outputLibs(df, type = "pshifted", output.mode = "list") |
cds |
a GRangesList, the coding sequences, default:
loadRegion(df, "cds", filterTranscripts(df)) |
mrna |
a GRangesList, the full mRNA sequences (matched by name to
the cds sequences), default:
loadRegion(df, "mrna", names(cds)) |
filter_cds_mod3 |
logical, default TRUE. Remove all ORFs whose length is not a multiple of 3 (not mod3). This speeds up the computation considerably and usually removes malformed ORFs you would not want anyway. |
filter_table |
a numeric (integer) matrix whose rownames are the names of the full set of mRNA transcripts. It is subsetted to the cds subset you use; CDSs are then filtered from this table by the 'min_counts_cds_filter' argument. |
faFile |
default: df@fafile |
min_counts_cds_filter |
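numeric, the count threshold used to filter CDSs from 'filter_table', default:
max(min(quantile(filter_table, 0.5), 1000), 1000) |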
with_A_sites |
logical, default TRUE. Not used yet, will also return A site scores. |
code |
a named character vector of size 64. Default: GENETIC_CODE. Change if organism does not use the standard code. |
aligned_position |
which positions are used to calculate per-codon coverage. Default: "center", meaning that positions -1, 0, 1 are taken. Alternative: "left", in which case positions 0, 1, 2 are taken. |
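The two `aligned_position` settings can be illustrated with a small base R sketch. The helpers `codon_window` and `window_sum` and the coverage vector are hypothetical and only mirror the description above; they are not how ORFik computes coverage internally.

```r
# Hypothetical helper: relative positions taken around an anchor position,
# following the description of `aligned_position`.
codon_window <- function(aligned_position = c("center", "left")) {
  aligned_position <- match.arg(aligned_position)
  switch(aligned_position,
         center = -1:1,  # positions -1, 0, 1
         left   = 0:2)   # positions 0, 1, 2
}

# Toy per-nucleotide coverage along one CDS (values invented), 9 nt = 3 codons.
coverage <- c(0, 4, 1, 2, 7, 3, 0, 5, 1)

# Sum coverage in the window around an anchor index,
# dropping positions that fall outside the vector.
window_sum <- function(coverage, anchor, aligned_position = "center") {
  idx <- anchor + codon_window(aligned_position)
  idx <- idx[idx >= 1 & idx <= length(coverage)]
  sum(coverage[idx])
}

# The second codon occupies nucleotides 4..6.
center_score <- window_sum(coverage, anchor = 5, aligned_position = "center")
left_score   <- window_sum(coverage, anchor = 4, aligned_position = "left")
```

In this sketch both calls cover nucleotides 4 to 6. The two settings differ only in whether the anchor is the middle or the first position of the window.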
The primary column to use is "mean_txNorm"; this is the fair normalized score.
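The idea behind a transcript-normalized score can be sketched on toy data. This is only one plausible reading of "normalized per tx", not ORFik's actual computation: the data frame, both steps, and all values are assumptions made for illustration.

```r
# Toy counts (invented): one row per codon occurrence in a transcript.
toy <- data.frame(
  tx    = c("tx1", "tx1", "tx1", "tx2", "tx2", "tx2"),
  seq   = c("K:AAA", "K:AAG", "K:AAA", "K:AAA", "K:AAG", "K:AAG"),
  count = c(10, 30, 20, 100, 300, 200)
)

# Step 1 (assumed): scale each count by its transcript's mean count per codon,
# so that highly expressed transcripts do not dominate.
toy$txNorm <- toy$count / ave(toy$count, toy$tx, FUN = mean)

# Step 2 (assumed): average the normalized values over all occurrences of a codon.
mean_txNorm <- tapply(toy$txNorm, toy$seq, mean)

# Raw sums for comparison; these are dominated by the highly expressed tx2.
raw_sum <- tapply(toy$count, toy$seq, sum)
```

Here tx2 has ten times the coverage of tx1, yet both contribute equally to `mean_txNorm` after step 1. That is the sense in which a per-transcript normalized mean is "fairer" than a raw sum.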
a data.table with one row per codon / AA. All values are given per library and per site (A or P). Rows are sorted by the mean_txNorm_percentage column of the first library in the set. The columns are:
variable (character) Library name
seq (character) Amino acid:codon
sum (integer) total counts per seq
sum_txNorm (integer) total counts per seq normalized per tx
var (numeric) variance of total counts per seq
N (integer) total number of codons of that type
mean_txNorm (numeric) The recommended default output: the fair codon usage, normalized at both gene and genome level for codon and read counts
...
alpha (numeric) Dirichlet alpha, method-of-moments (MOM) estimator. It summarizes the mean and variance of a probability in one value: the lower the value, the higher the variance, while the mean is decided by the relative value between samples
relative_to_max_score (integer) Percentage use of codon
type (factor(character)) Either "P" or "A"
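The intuition for the alpha column can be shown with the textbook method-of-moments estimator for a Dirichlet distribution. This is a minimal sketch on invented proportions: `dirichlet_mom` is a hypothetical helper, and the exact estimator ORFik uses may differ.

```r
# Textbook Dirichlet MOM: with column means m and variances v of the
# proportions, the precision is alpha0 = m1 * (1 - m1) / v1 - 1 and
# alpha_j = alpha0 * m_j. Here the first column is used to estimate alpha0.
dirichlet_mom <- function(p) {
  m <- colMeans(p)
  v <- apply(p, 2, var)
  alpha0 <- m[1] * (1 - m[1]) / v[1] - 1
  unname(alpha0) * m
}

# Rows are samples, columns are codons; each row sums to 1 (values invented).
p_tight <- rbind(c(0.50, 0.30, 0.20),
                 c(0.40, 0.35, 0.25),
                 c(0.60, 0.25, 0.15))

# Same means, but twice the spread between samples.
p_wide <- rbind(c(0.50, 0.30, 0.20),
                c(0.30, 0.40, 0.30),
                c(0.70, 0.20, 0.10))

alpha_tight <- dirichlet_mom(p_tight)
alpha_wide  <- dirichlet_mom(p_wide)
```

Both toy sets have the same mean proportions, so the alphas keep the same ratios. The noisier set yields smaller alphas, which matches "the lower the value, the higher the variance".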
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7196831/
Other codon:
codon_usage(), codon_usage_plot()
df <- ORFik.template.experiment()[9:10,] # Subset to 2 Ribo-seq libs
## For a single library
res <- codon_usage_exp(df, fimport(filepath(df[1,], "pshifted")),
                       min_counts_cds_filter = 10)
# mean_txNorm is the advised scoring column
# codon_usage_plot(res, res$mean_txNorm)
# Default for the plot function is the percentage-scaled version of mean_txNorm
# codon_usage_plot(res) # This gives a check error
## For multiple libraries
res2 <- codon_usage_exp(df, outputLibs(df, type = "pshifted", output.mode = "list"),
                        min_counts_cds_filter = 10)
# codon_usage_plot(res2)