Description Usage Arguments Details Value Note Examples
View source: R/fitting_functions.R
This function fits a negative-binomial based Bayesian model to MPRA data. Optional annotations can be included to allow for more informative conditional priors.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | fit_mpra_model(
mpra_data,
annotations = NULL,
group_df = NULL,
out_dir = NULL,
save_nonfunctional = FALSE,
priors = NULL,
n_cores = 1,
n_chains = 4,
tot_samp = 2000,
n_warmup = 200,
vb_pass = TRUE,
vb_prob = 0.8,
ts_hdi_prob = 0.95,
ts_rope = c(-0.405, 0.405),
rep_cutoff = 0.15,
adaptive_precision = TRUE,
verbose = TRUE
)
|
mpra_data |
a data frame of MPRA data with 1 column called variant_id, an allele column, a barcode column, and additional columns per sequencing sample. Each row is for a single barcode. |
annotations |
an optional data frame of annotations with identical variant_ids and an arbitrary number of functional annotations. If omitted, the prior for a given variant is influenced by all other variants in the assay equally. |
group_df |
an optional data frame giving group identity by variant_id in mpra_data |
out_dir |
path to output directory |
save_nonfunctional |
logical indicating whether or not to save the sampler results for variants identified as non-functional |
priors |
optional objects provided by either fit_marg_prior() or fit_cond_prior. |
n_cores |
number of cores across which to parallelize variant MPRA samplers |
n_chains |
number of MCMC chains to run in each sampler |
tot_samp |
total number of MCMC draws to take, spread evenly across chains |
n_warmup |
total number of warmup draws to take from each MCMC chain |
vb_pass |
logical indicating whether to use a variational first pass |
vb_prob |
numeric 0 - 1 indicating probability mass to use as a TS HDI for identifying "promising" candidates for MCMC followup |
ts_hdi_prob |
probability mass to include in the highest density interval on transcription shift to call MPRA-functional variants. |
ts_rope |
length 2 numeric vector describing the boundaries of the transcription shift region of practical equivalence (ROPE), defaulting to +/- log(3/2) |
rep_cutoff |
a representation cutoff quantile (0 to 1) |
adaptive_precision |
logical indicating whether to adaptively adjust the length of the posterior MCMC chain for borderline functional variants |
verbose |
logical indicating whether to print messages |
mpra_data
must contain the following groups of columns:
variant_id
allele - either 'ref' or 'alt'
barcode - a unique index sequence for that row (ideally the same barcode used in the assay)
at least one column of MPRA counts whose column name(s) matches 'DNA'
at least one column of MPRA counts whose column name(s) matches 'RNA'
annotations
must contain the same variant_id's used in mpra_data.
Additional columns are used as informative predictors: when estimating the
priors for one variant, other variants with similar annotations will be
upweighted in the prior-fitting process.
If priors
is provided, any annotations input will be ignored. This
can be useful when you want to fit models again without having to spend
time re-fitting the priors.
Sampler results will be saved to out_dir. By default, only the sampler
results for variants identified as MPRA-functional will be saved. This
behavior can be changed by setting save_nonfunctional
to TRUE.
We've set the sampler parameters (n_chains to n_warmup) to values that work reasonably well at reasonable speeds for typical MPRA data on typical hardware. Final analyses and/or models fit to larger MPRA experiments will likely want to increase n_chains and tot_samp considerably to ensure precise convergence.
vb_pass
indicates whether to use a first pass variational check to
see if a given variant is worth running the MCMC sampler. It does this by
checking if a 40
excludes 0. This speeds up posterior evaluation considerably, but gives
approximate results. If vb_pass
is set to FALSE, all variants get
MCMC.
ts_rope
can be used to define a "Region Of Practical Equivalence"
for transcription shift. This is some small-ish region around 0 where
observed posterior samples are "practically equivalent" to 0. The output
column ts_rope_mass
returns the fraction of transcription shift
posterior samples that fall within the defined ROPE along with the usual
model outputs. If this fraction is small, one can say that there is very
little posterior belief that the variant's transcription shift is
practically equivalent to 0. The user must be cognizant of defining the
region in accordance with observed noise and effect size levels. Note that
the output ROPE fractions are NOT p-values.
Barcodes below the rep_cutoff
quantile of representation in the DNA
pools are discarded.
adaptive_precision
indicates whether to adaptively increase the
length of the MCMC chains for borderline functional variants. The edges of
a 95
one edge of a variant's HDI interval is close to 0, this setting will tell
the sampler to double the length of the MCMC chain. Using the default 95
HDI, this means that if a 92.5
the sampler will double the length of the MCMC chain. This argument should
be turned off if tot_samp is already set to a high value.
a data frame with a row for each variant_id that specifies the
posterior mean TS, upper and lower HDI bounds, a binary call of functional
or non-functional, and other appropriate outputs. The output column
is_functional
is defined by the TS HDI excluding 0.
Sampler results for individual variants will be saved to the specified out_dir as they can be several megabytes each. The table of summary statistics that this function returns will also be saved into this directory into an object called "analysis_res.RData".
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | # This example fits the malacoda model on 3 variants with too-short MCMC chains
example_variants = c("1_205247315_2-3", "10_101274365_1-3", "10_45966422_2-3")
examples_to_evaluate = umpra_example[umpra_example$variant_id %in% example_variants,]
# tot_samp should be set to >50,000 to ensure the posterior chains converge
example_result = fit_mpra_model(mpra_data = examples_to_evaluate,
priors = marg_prior_example,
vb_pass = FALSE,
tot_samp = 20,
n_warmup = 10,
adaptive_precision = FALSE)
print(example_result)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.