fit_grouped_prior: Fit priors by group


View source: R/prior_fitting.R

Description

If you have several distinct categories of variants, you may want to fit priors for them separately. Categories could be genomic regions (5'UTR vs intronic vs 3'UTR vs upstream vs downstream), or they could come from the outputs of some prediction tool (predicted up vs down vs no effect).

This yields a pseudo-hierarchical model without the computational problems associated with fitting a joint model on thousands of variants at once.
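As a rough illustration of the second kind of grouping, the sketch below discretizes a continuous predicted effect into "down", "no_effect" and "up" groups and keeps the two columns (variant_id, group_id) described under Details. The predictions table, its predicted_effect column, and the +/- 0.5 thresholds are hypothetical stand-ins for whatever predictor you actually use.

```r
# Hypothetical predictions: one row per variant with a continuous predicted
# effect. The column name and the +/- 0.5 thresholds are assumptions.
set.seed(1)
predictions <- data.frame(
  variant_id = paste0("var_", 1:600),
  predicted_effect = rnorm(600)
)

# Discretize the prediction into three categories:
# (-Inf, -0.5] -> down, (-0.5, 0.5] -> no_effect, (0.5, Inf) -> up
predictions$group_id <- cut(predictions$predicted_effect,
                            breaks = c(-Inf, -0.5, 0.5, Inf),
                            labels = c("down", "no_effect", "up"))

# Keep only the two columns group_df needs, with group_id as character
group_df <- predictions[, c("variant_id", "group_id")]
group_df$group_id <- as.character(group_df$group_id)

# Inspect group sizes before fitting
table(group_df$group_id)
```

The same pattern works for genomic regions: replace the cut() call with the region annotation of each variant.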

Usage

fit_grouped_prior(
  mpra_data,
  group_df,
  n_cores,
  plot_rep_cutoff = TRUE,
  rep_cutoff = 0.15,
  verbose = TRUE
)

Arguments

mpra_data

a data frame of MPRA data

group_df

a data frame giving the group identity of each variant_id in mpra_data

n_cores

number of cores to parallelize across

plot_rep_cutoff

logical indicating whether to plot the representation cutoff used

rep_cutoff

fraction indicating the depth-adjusted DNA count quantile to use as the cutoff

verbose

logical indicating whether to print messages
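To make rep_cutoff more concrete, here is a minimal sketch of a quantile-based representation cutoff on toy DNA counts. It assumes depth adjustment means scaling each DNA replicate to counts per million and that the cutoff is applied to each barcode's mean depth-adjusted count; both are illustrative assumptions, and malacoda's internal normalization may differ.

```r
# Toy DNA counts: 200 barcodes (rows) x 3 DNA replicates (columns).
# Per-barcode abundances vary; the same barcode shares a rate across replicates.
set.seed(2)
barcode_rate <- rgamma(200, shape = 5, rate = 0.1)
dna_counts <- matrix(rpois(200 * 3, lambda = rep(barcode_rate, times = 3)),
                     ncol = 3, dimnames = list(NULL, paste0("DNA", 1:3)))

# Depth-adjust each replicate to counts per million (an assumption)
depth_adjusted <- sweep(dna_counts, 2, colSums(dna_counts), "/") * 1e6

# Summarize each barcode by its mean depth-adjusted count
barcode_depth <- rowMeans(depth_adjusted)

# rep_cutoff = 0.15 means the 15th percentile is the threshold
rep_cutoff <- 0.15
cutoff_value <- quantile(barcode_depth, probs = rep_cutoff)

# Barcodes at or above the threshold count as well represented
well_represented <- barcode_depth >= cutoff_value
mean(well_represented)
```

With rep_cutoff = 0.15, roughly the bottom 15% of barcodes by depth-adjusted DNA count fall below the threshold.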

Details

group_df should have two columns: variant_id and group_id. This function checks that there are more than 100 variants per group and no more than 20 groups. These are somewhat arbitrary magic numbers, but having many tiny groups is a recipe for overfitting.
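The checks described above can be sketched as a small validation helper. The function name check_group_df, its arguments, and its error messages are hypothetical and are not malacoda's own code; only the two required columns and the limits (more than 100 variants per group, at most 20 groups) come from this page.

```r
# Hypothetical helper mirroring the documented group_df checks.
check_group_df <- function(group_df, min_per_group = 100, max_groups = 20) {
  # Both documented columns must be present
  required <- c("variant_id", "group_id")
  missing_cols <- setdiff(required, names(group_df))
  if (length(missing_cols) > 0) {
    stop("group_df is missing column(s): ", paste(missing_cols, collapse = ", "))
  }

  # Count distinct variants per group (character avoids empty factor levels)
  group_sizes <- tapply(group_df$variant_id, as.character(group_df$group_id),
                        function(ids) length(unique(ids)))

  # No more than max_groups groups
  if (length(group_sizes) > max_groups) {
    stop("group_df has ", length(group_sizes), " groups; at most ",
         max_groups, " are allowed")
  }

  # Every group needs strictly more than min_per_group variants
  too_small <- names(group_sizes)[group_sizes <= min_per_group]
  if (length(too_small) > 0) {
    stop("groups with ", min_per_group, " or fewer variants: ",
         paste(too_small, collapse = ", "))
  }
  invisible(TRUE)
}

# Passes: two groups of 150 variants each
ok_df <- data.frame(variant_id = paste0("v", 1:300),
                    group_id = rep(c("5'UTR", "intronic"), each = 150))
check_group_df(ok_df)

# Fails: the "down" group has only 30 variants
small_df <- data.frame(variant_id = paste0("v", 1:150),
                       group_id = rep(c("up", "down"), times = c(120, 30)))
try(check_group_df(small_df))
```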

Value

a grouped prior list


andrewGhazi/malacoda documentation built on Aug. 2, 2020, 12:54 a.m.