CELP_bias: CELP_bias

CELP_bias    R Documentation

CELP_bias

Description

Computes codon-level stalling bias coefficients and bias-corrected read counts using the CELP (Consistent Excess of Loess Preds) method.

Usage

CELP_bias(tr_codon_read_count_list, codon_radius = 5,
  loess_method = "interpolate", gini_moderation = FALSE)

Arguments

tr_codon_read_count_list

A list generated by psite_to_codon_count containing observed read counts per codon per transcript per sample.

loess_method

Determines whether the fitted surface should be computed exactly ("direct") or via interpolation from a kd tree ("interpolate"). Default: "interpolate".
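
The two option names match the surface setting of stats::loess.control. Assuming that is what loess_method is passed to (an assumption, not confirmed here), the difference can be sketched on simulated counts. The data, span and variable names below are made up for illustration and are not the package's internal call.

```r
set.seed(1)
codon_number <- 1:200
# Simulated per-codon read counts (hypothetical data)
observed_count <- rpois(200, lambda = 10)

# Interpolated surface: fitted values come from a kd tree
fit_interp <- loess(observed_count ~ codon_number, span = 0.1,
                    control = loess.control(surface = "interpolate"))

# Exact surface: a local regression is computed at every point
fit_direct <- loess(observed_count ~ codon_number, span = 0.1,
                    control = loess.control(surface = "direct"))

# Largest disagreement between the two sets of fitted values
max_abs_diff <- max(abs(fitted(fit_interp) - fitted(fit_direct)))
```

Comparing the two fits on a few transcripts is one way to judge whether the extra run time of "direct" is worth it for a given data set.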

gini_moderation

Logical argument. If set to TRUE, (bias_coefficient)^(gini_index) is used as the correction factor. If set to FALSE, bias_coefficient is used as the correction factor. Default: FALSE.
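
A small sketch of the two correction factors may help. The gini_index helper below uses a standard Gini formula for non-negative values. Which estimator the package uses is not stated here, so the helper and the example coefficients are assumptions.

```r
# Gini index of a non-negative numeric vector (0 = perfectly even).
# Hypothetical helper, not taken from the package.
gini_index <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

bias_coefficient <- c(0.5, 1, 1, 4)  # made-up coefficients for one transcript
g <- gini_index(bias_coefficient)

# gini_moderation = FALSE: the bias coefficient is the correction factor
factor_plain <- bias_coefficient

# gini_moderation = TRUE: the coefficient is raised to the power of the Gini index
factor_moderated <- bias_coefficient^g
```

Because the Gini index lies between 0 and 1, the moderated factor is pulled toward 1, so transcripts with an even read distribution receive a weaker correction.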

codon_radius

Number of codons on either side of each codon influencing the loess prediction for the middle codon. Default: 5.
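
The exact conversion from codon radius to loess span is not given on this page. One plausible reading, shown only as an assumption, is that the smoothing window of 2 * codon_radius + 1 codons is expressed as a fraction of the CDS length. The helper name span_from_radius is hypothetical.

```r
# Hypothetical helper: window size as a fraction of CDS length, capped at 1.
# The formula actually used by CELP_bias may differ.
span_from_radius <- function(codon_radius, cds_length_codons) {
  window <- 2 * codon_radius + 1
  min(1, window / cds_length_codons)
}

span_long  <- span_from_radius(5, 300)  # long CDS: small span
span_short <- span_from_radius(5, 8)    # CDS shorter than the window: capped
```

Under this reading, a fixed radius gives every transcript the same window in codons, while the span itself shrinks as the CDS gets longer.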

Details

This function is the heart of the CELP method for stalling bias detection and correction. It first fits a loess curve to per-codon read counts along each transcript, borrowing information from neighboring codons to mitigate the uncertainty of p-site offset assignment and experimental stochasticity. The loess span parameter is calculated from the user-defined codon radius and the CDS length. Next, a bias coefficient is calculated for each codon by integrating, across all samples, information on the excess of loess-predicted read counts at that codon compared to the transcript's background. Finally, loess-predicted read counts are divided by the bias coefficients to obtain bias-corrected counts. The "direct" fitting method takes longer to complete but does not run into kd-tree-related memory issues. The Gini index for each transcript is calculated from the bias coefficients of all of its codons. Gini moderation ensures that the strength of bias correction is proportional to the original level of heterogeneity in read distribution along the transcript.
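
The three steps can be illustrated on simulated data for a single transcript. This is a rough sketch under stated assumptions, not the package's algorithm: the span formula, the use of the median as the transcript background, the geometric mean across samples, and the floor of 0.1 are all stand-ins chosen for illustration.

```r
set.seed(42)
n_codons <- 150
codon_number <- seq_len(n_codons)

# Two simulated samples sharing a stalling peak around codon 60 (made-up data)
rate <- rep(10, n_codons)
rate[58:62] <- 60
samples <- list(S1 = rpois(n_codons, rate), S2 = rpois(n_codons, rate))

# Step 1: smooth each sample with loess (span assumed to be window / CDS length)
span <- (2 * 5 + 1) / n_codons
smoothed <- lapply(samples, function(y) {
  fit <- loess(y ~ codon_number, span = span, degree = 1)
  pmax(fitted(fit), 0.1)  # arbitrary floor so that ratios stay finite
})

# Step 2: excess over the transcript background (assumed: the median),
# combined across samples with a geometric mean (assumed)
excess <- sapply(smoothed, function(p) p / median(p))
bias_coefficient <- exp(rowMeans(log(excess)))

# Step 3: divide loess-predicted counts by the bias coefficients
corrected <- lapply(smoothed, function(p) p / bias_coefficient)
```

In this toy example the largest bias coefficient falls near the simulated peak, and dividing by it flattens the peak in both samples.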

Value

A list composed of two lists:
1. bias coefficients
2. bias-corrected read counts

The bias coefficient list has the following structure: list$<transcript.ID> data.frame: [1] codon_number [2] codon_type [3] aa_type [4] bias_coefficient.

The bias-corrected read count list has the following structure: list$<sample.name>$<transcript.ID> data.frame: [1] codon_number [2] codon_type [3] aa_type [4] observed_count [5] bias_coefficient [6] corrected_count.

Gini moderation ensures that the strength of correction is proportional to the original level of heterogeneity in read distribution along the transcript.
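
To show how the returned object might be navigated, the sketch below builds a mock list with the documented layout and pulls values out of it. The transcript ID, sample name and all numbers are made up. The two sub-lists are accessed by position because their names are not given here.

```r
# Mock data frames with the documented columns (values are made up)
coeff_df <- data.frame(codon_number = 1:3,
                       codon_type = c("ATG", "GCC", "AAA"),
                       aa_type = c("M", "A", "K"),
                       bias_coefficient = c(1.0, 0.8, 2.5))
count_df <- data.frame(coeff_df[, 1:3],
                       observed_count = c(12, 7, 40),
                       bias_coefficient = coeff_df$bias_coefficient,
                       corrected_count = c(11, 9, 15))

# Mock return value: [[1]] bias coefficients, [[2]] corrected counts
result <- list(
  list(tr_1 = coeff_df),
  list(sample_A = list(tr_1 = count_df))
)

bias_list <- result[[1]]
corrected_list <- result[[2]]

# Bias coefficients of one transcript
tr1_bias <- bias_list[["tr_1"]]$bias_coefficient

# Corrected counts of one transcript in one sample
tr1_corrected <- corrected_list[["sample_A"]][["tr_1"]]$corrected_count

# Stack all transcripts of one sample into a single data frame
one_sample <- do.call(rbind, Map(function(df, id) cbind(transcript = id, df),
                                 corrected_list$sample_A,
                                 names(corrected_list$sample_A)))
```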

Examples

tr_codon_bias_coeff_corrected_count_LMCN <- CELP_bias(tr_codon_read_count_LMCN)
tr_codon_bias_coeff_corrected_count_LMCN_gini_moderated <- CELP_bias(tr_codon_read_count_LMCN, gini_moderation = TRUE)
tr_codon_bias_coeff_corrected_count_LMCN_direct_fit <- CELP_bias(tr_codon_read_count_LMCN, loess_method = "direct")

goodarzilab/Ribolog documentation built on Oct. 7, 2022, 10:14 p.m.