fit_cond_prior: Fit an informative conditional prior

View source: R/prior_fitting.R

Description

Use informative annotations to bias prior estimation towards alleles that show similar annotations in the provided annotation space.

Usage

fit_cond_prior(
  mpra_data,
  annotations,
  n_cores = 1,
  plot_rep_cutoff = TRUE,
  rep_cutoff = 0.15,
  min_neighbors = 100,
  kernel_fold_increase = 1.4142,
  verbose = TRUE
)

Arguments

mpra_data

a data frame of MPRA data

annotations

a data frame of annotations for the same variants in mpra_data

n_cores

number of cores to parallelize across

plot_rep_cutoff

logical indicating whether to plot the representation cutoff used

rep_cutoff

fraction indicating the depth-adjusted DNA count quantile to use as the cutoff

min_neighbors

The minimum number of neighbors in annotation space that must contribute to prior estimation

kernel_fold_increase

The amount to iteratively increase kernel width by when estimating conditional priors. Smaller values (closer to 1) will yield more refined priors but take longer.

verbose

logical indicating whether to print messages
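
As a rough illustration of what rep_cutoff selects, the sketch below computes a quantile cutoff on made-up DNA counts. It is not the package's code: the depth adjustment (scaling each sample to counts per million), the averaging across samples, and the names dna_counts, depth_adj, mean_adj, cutoff_value and well_represented are all assumptions made for this example.

```r
# Toy DNA counts: rows are alleles, columns are DNA samples (made-up data)
set.seed(42)
dna_counts <- matrix(rpois(50 * 3, lambda = 80), ncol = 3)

# Assumed depth adjustment: scale each sample to counts per million
depth_adj <- sweep(dna_counts, 2, colSums(dna_counts), "/") * 1e6

# Mean depth-adjusted count per allele, then the rep_cutoff quantile of those means
rep_cutoff <- 0.15
mean_adj <- rowMeans(depth_adj)
cutoff_value <- quantile(mean_adj, probs = rep_cutoff)

# In this sketch, alleles at or above the cutoff count as well represented
well_represented <- mean_adj >= cutoff_value
```

With rep_cutoff = 0.15, roughly the lowest 15% of alleles by depth-adjusted DNA count fall below the cutoff.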

Details

The empirical prior returned by this function is "conditional" in the sense that the prior estimation weights are conditional on the annotations.

The DNA prior is still estimated marginally because the annotations should not be able to provide any information on the DNA inputs (which are presumably only affected by the preparation of the oligonucleotide library at the vendor).

The RNA prior is estimated from the RNA observations of other variants in the assay that are nearby in annotation space. A multivariate t distribution centered on the variant in question is used to weight all other variants in the assay. The kernel is initialized with a very small width; if fewer than min_neighbors variants provide substantial input to the prior, the width is iteratively multiplied by kernel_fold_increase until that condition is satisfied. This prevents the prior estimation for variants in sparse regions of annotation space from being influenced too heavily by their nearest neighbors.
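
The widening procedure described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: widen_kernel is a hypothetical helper, the kernel is taken to be isotropic, and the starting width (init_width), degrees of freedom (df), and the threshold for a "substantial" contribution (weight_floor) are invented for the example.

```r
# Hypothetical helper, not part of malacoda: iterative kernel widening
widen_kernel <- function(annotations, i,
                         min_neighbors = 100,
                         kernel_fold_increase = 1.4142,
                         init_width = 1e-3,     # assumed starting width
                         df = 1,                # assumed t degrees of freedom
                         weight_floor = 0.01) { # assumed "substantial" threshold
  x <- as.matrix(annotations)
  p <- ncol(x)
  # The loop can only terminate if enough other variants exist
  stopifnot(min_neighbors <= nrow(x) - 1, kernel_fold_increase > 1)

  # Squared Euclidean distance from variant i to every other variant
  d2 <- rowSums(sweep(x[-i, , drop = FALSE], 2, x[i, ])^2)

  width <- init_width
  repeat {
    # Unnormalized isotropic multivariate t kernel; equals 1 at distance 0
    w <- (1 + d2 / (df * width^2))^(-(df + p) / 2)
    if (sum(w > weight_floor) >= min_neighbors) break
    width <- width * kernel_fold_increase
  }
  list(width = width, weights = w / sum(w))
}

# Toy annotation space: 200 variants, 2 annotation dimensions (made-up data)
set.seed(1)
toy_annotations <- matrix(rnorm(200 * 2), ncol = 2)
fit <- widen_kernel(toy_annotations, i = 1, min_neighbors = 5)
```

Here fit$width is the first width at which at least min_neighbors other variants exceed the assumed threshold, and fit$weights are the normalized weights over those other variants. A kernel_fold_increase closer to 1 takes more iterations but stops nearer the smallest width that satisfies the condition.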

Value

A list of two data frames. The first holds the DNA prior and the second holds the by-variant RNA priors.

Examples

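library(malacoda)

# Fit a conditional prior for the example MPRA data using its annotations.
# min_neighbors is set well below the default of 100 here.
cond_prior = fit_cond_prior(mpra_data = umpra_example,
                            annotations = u_deepsea,
                            n_cores = 1,
                            rep_cutoff = .15,
                            plot_rep_cutoff = TRUE,
                            min_neighbors = 5)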

andrewGhazi/malacoda documentation built on Aug. 2, 2020, 12:54 a.m.