fit_grouped_prior: Fit priors by group
In andrewGhazi/malacoda: Bayesian Analysis of High-Throughput Genomic Assays


If you have several distinct categories of variants, you may want to fit a separate prior for each. Categories could be defined by genomic region (5'UTR vs. intronic vs. 3'UTR vs. upstream vs. downstream) or by the output of some prediction method (up vs. down vs. no effect).

This yields a pseudo-hierarchical model without the computational problems associated with fitting a joint model on thousands of variants at once.

fit_grouped_prior(
  mpra_data,
  group_df,
  n_cores,
  plot_rep_cutoff = TRUE,
  rep_cutoff = 0.15,
  verbose = TRUE
)

`mpra_data`	a data frame of MPRA data
`group_df`	a data frame giving the group identity of each variant_id in mpra_data
`n_cores`	number of cores to parallelize across
`plot_rep_cutoff`	logical indicating whether to plot the representation cutoff used
`rep_cutoff`	fraction indicating the depth-adjusted DNA count quantile to use as the cutoff
`verbose`	logical indicating whether to print messages

group_df should have two columns: variant_id and group_id. The function checks that every group contains more than 100 variants and that there are no more than 20 groups. These thresholds are somewhat arbitrary, but having many tiny groups is a recipe for over-fitting.
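
The requirements on group_df can be checked before starting a long fit. The following is a minimal sketch in base R, not the package's own validation code: `check_group_df` is a hypothetical helper, and the variant IDs and group labels are made-up example values. It builds a two-column group_df and applies the two documented thresholds (more than 100 variants per group, at most 20 groups).

```r
# Made-up example data: three groups of 150 variants each.
group_df <- data.frame(
  variant_id = paste0("var_", seq_len(450)),
  group_id   = rep(c("5UTR", "intronic", "3UTR"), each = 150),
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT part of malacoda) that mirrors the documented checks.
check_group_df <- function(group_df, min_variants = 100, max_groups = 20) {
  # Both required columns must be present.
  stopifnot(all(c("variant_id", "group_id") %in% names(group_df)))

  # Count the variants assigned to each group.
  group_sizes <- table(group_df$group_id)

  list(
    group_sizes       = group_sizes,
    enough_variants   = all(group_sizes > min_variants),   # every group has > 100
    few_enough_groups = length(group_sizes) <= max_groups  # no more than 20 groups
  )
}

checks <- check_group_df(group_df)

# If both checks pass, the call would look like this
# (mpra_data is not constructed in this sketch):
# grouped_prior <- fit_grouped_prior(mpra_data, group_df, n_cores = 2)
```

Running the check first gives a quick answer about whether the grouping is too fine-grained, without waiting for the parallel fit to start.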

a grouped prior list

andrewGhazi/malacoda documentation built on Aug. 2, 2020, 12:54 a.m.