fit_cond_prior: Fit an informative conditional prior
In andrewGhazi/malacoda: Bayesian Analysis of High-Throughput Genomic Assays

Use informative annotations to bias prior estimation towards alleles that show similar annotations in the provided annotation space.

fit_cond_prior(
  mpra_data,
  annotations,
  n_cores = 1,
  plot_rep_cutoff = TRUE,
  rep_cutoff = 0.15,
  min_neighbors = 100,
  kernel_fold_increase = 1.4142,
  verbose = TRUE
)

`mpra_data`	a data frame of MPRA data
`annotations`	a data frame of annotations for the same variants in mpra_data
`n_cores`	number of cores to parallelize across
`plot_rep_cutoff`	logical indicating whether to plot the representation cutoff used
`rep_cutoff`	fraction indicating the depth-adjusted DNA count quantile to use as the cutoff
`min_neighbors`	The minimum number of neighbors in annotation space that must contribute to prior estimation
`kernel_fold_increase`	The amount to iteratively increase kernel width by when estimating conditional priors. Smaller values (closer to 1) will yield more refined priors but take longer.
`verbose`	logical indicating whether to print messages
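
To make the meaning of `rep_cutoff` concrete, the following is a minimal sketch of a quantile-based representation cutoff. It is not malacoda's internal code: the simulated counts, the depth adjustment by relative column sums, and all variable names are assumptions made for illustration, and how the package treats barcodes below the cutoff is not stated here.

```r
set.seed(42)

# Hypothetical DNA counts: 50 barcodes (rows) x 3 DNA samples (columns)
dna_counts <- matrix(rpois(150, lambda = 80), nrow = 50)

# Assumed depth adjustment: divide each sample by its relative sequencing depth
depth_factors <- colSums(dna_counts) / mean(colSums(dna_counts))
depth_adjusted <- sweep(dna_counts, 2, depth_factors, "/")

# Average depth-adjusted DNA count per barcode
mean_adjusted <- rowMeans(depth_adjusted)

# rep_cutoff = 0.15 picks the 15th percentile of those averages as the cutoff
rep_cutoff <- 0.15
cutoff_value <- quantile(mean_adjusted, probs = rep_cutoff)

# Barcodes at or above the cutoff count as well represented in this sketch
well_represented <- mean_adjusted >= cutoff_value
```

With `rep_cutoff = 0.15`, roughly the lowest 15% of barcodes by depth-adjusted DNA count fall below the cutoff.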

The empirical prior returned by this function is "conditional" in the sense that the prior estimation weights are conditional on the annotations.

The DNA prior is still estimated marginally because the annotations should not be able to provide any information on the DNA inputs (which are presumably only affected by the preparation of the oligonucleotide library at the vendor).

The RNA prior is estimated from the RNA observations of other variants in the assay that are nearby in annotation space. A multivariate t distribution centered on the variant in question is used to weight all other variants in the assay. The kernel is initialized with a very small width, and if fewer than min_neighbors variants provide substantial input to the prior, the width is iteratively increased by a factor of kernel_fold_increase until that condition is satisfied. This prevents the prior estimation for variants in sparse regions of annotation space from being influenced too heavily by their nearest neighbors.
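
The widening procedure described above can be sketched as follows. This is an illustrative approximation rather than the package's implementation: the helper names (`t_kernel_weights`, `widen_until_enough`), the isotropic kernel scale, the degrees of freedom, the initial width, and the `weight_floor` threshold used to decide what counts as "substantial input" are all assumptions.

```r
# Unnormalized multivariate t kernel, equal to 1 at the center.
# Assumptions: isotropic scale `width` and df = 1.
t_kernel_weights <- function(annotations, center, width, df = 1) {
  p <- ncol(annotations)
  diffs <- sweep(annotations, 2, center)      # subtract center from every row
  sq_dist <- rowSums((diffs / width)^2)       # scaled squared distance
  (1 + sq_dist / df)^(-(df + p) / 2)
}

# Widen the kernel around variant i until enough neighbors contribute.
widen_until_enough <- function(annotations, i,
                               min_neighbors = 100,
                               kernel_fold_increase = 1.4142,
                               init_width = 1e-3,    # assumed starting width
                               weight_floor = 0.01)  # assumed "substantial" threshold
{
  others <- annotations[-i, , drop = FALSE]
  center <- annotations[i, ]
  # Every weight tends to 1 as the width grows, so the loop ends if this holds
  stopifnot(min_neighbors <= nrow(others))

  width <- init_width
  repeat {
    w <- t_kernel_weights(others, center, width)
    n_contributing <- sum(w > weight_floor)
    if (n_contributing >= min_neighbors) break
    width <- width * kernel_fold_increase
  }
  list(width = width, weights = w / sum(w), n_contributing = n_contributing)
}

# Toy annotation matrix: 200 variants in a 3-dimensional annotation space
set.seed(1)
ann <- matrix(rnorm(200 * 3), ncol = 3)
res <- widen_until_enough(ann, i = 1, min_neighbors = 5)
```

In this sketch a `kernel_fold_increase` closer to 1 stops at a width nearer the smallest one that satisfies `min_neighbors`, at the cost of more iterations, which mirrors the trade-off noted in the argument description.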

A list of two data frames. The first contains the DNA prior and the second contains the by-variant RNA priors.

cond_prior = fit_cond_prior(mpra_data = umpra_example,
                            annotations = u_deepsea,
                            n_cores = 1,
                            rep_cutoff = .15,
                            plot_rep_cutoff = TRUE,
                            min_neighbors = 5)