model_proteins_separately: Run the basehit model one protein at a time

model_proteins_separately    R Documentation

Run the basehit model one protein at a time

Description

Run the basehit model one protein at a time

Usage

model_proteins_separately(
  count_path,
  out_dir = "outputs/bh_out/",
  cache_dir = "outputs/bh_cache/",
  id_order = c("strain", "repl", "plate"),
  split_data_dir = NULL,
  ixn_prior_width = 0.15,
  algorithm = "variational",
  iter_sampling = 5000,
  iter_warmup = 1000,
  save_split = TRUE,
  save_fits = FALSE,
  save_summaries = TRUE,
  bead_binding_threshold = 1,
  save_bead_binders = TRUE,
  pre_count_threshold = 4,
  min_n_nz = 3,
  min_frac_nz = 0.5,
  weak_score_threshold = 0.5,
  strong_score_threshold = 1,
  weak_concordance_threshold = 0.75,
  strong_concordance_threshold = 0.95,
  verbose = TRUE,
  seed = 1234
)

Arguments

count_path

path to a directory of mapped_bcs.csv files

cache_dir

path to use as a cache directory (will be created if non-existent)

id_order

character vector giving the order of dash separated identifiers in the sample_id column

split_data_dir

path to a directory for data split by protein (will be created if non-existent)

ixn_prior_width

standard deviation of zero-centered normal prior on interaction effects

algorithm

Stan algorithm to use for posterior evaluation. Any setting other than "variational" uses Stan's adaptive HMC sampler.

iter_sampling

number of post-warmup samples to draw per chain

iter_warmup

number of warmup samples to draw per chain

save_split

logical indicating whether to keep the split data directory intact

save_fits

logical indicating whether to save the posterior fit objects (will use a lot more space if TRUE)

bead_binding_threshold

proteins with enrichment in the beads above this threshold get noted in the output

save_bead_binders

logical indicating whether to save information on bead binders to a separate file

pre_count_threshold

barcodes with counts at or below this value in the Pre ("input") sample are dropped.

min_n_nz

minimum number of non-zero counts required for an interaction to not be entirely discarded

min_frac_nz

minimum proportion of non-zero counts required for an interaction to not be entirely discarded

weak_score_threshold

lower threshold of interaction score to call weak hits

strong_score_threshold

lower threshold of interaction score to call strong hits

weak_concordance_threshold

lower threshold of interaction concordance to call weak hits

strong_concordance_threshold

lower threshold of interaction concordance to call strong hits

verbose

logical indicating whether to print informative progress messages
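
As a rough illustration of how the four score and concordance thresholds above might combine, here is a minimal sketch in base R. It assumes a hit must exceed both the score threshold and the concordance threshold at a given level, and that the comparison is strict; neither detail is confirmed by this page. The score and concordance values are hypothetical, not real output.

```r
# Default thresholds from the Usage section.
weak_score_threshold <- 0.5
strong_score_threshold <- 1
weak_concordance_threshold <- 0.75
strong_concordance_threshold <- 0.95

# Hypothetical per-interaction summaries (illustrative values only).
score <- c(0.2, 0.7, 1.4, 1.4)
concordance <- c(0.99, 0.80, 0.97, 0.80)

# Assumption: a call requires BOTH score and concordance to exceed
# the corresponding thresholds.
weak <- score > weak_score_threshold &
  concordance > weak_concordance_threshold
strong <- score > strong_score_threshold &
  concordance > strong_concordance_threshold

# Strong takes precedence over weak; everything else is not a hit.
hit <- ifelse(strong, "strong", ifelse(weak, "weak", "none"))
```

Under these assumptions the fourth interaction is only a weak hit: its score clears the strong threshold but its concordance does not.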

Details

The count file should have the first row specifying proteins, the second row specifying barcodes, and every row after that giving one strain's output counts for each barcode (i.e. wide format, strain by barcode).
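
A toy illustration of the wide layout may help. Everything in it is hypothetical: the protein, barcode, and strain names, the counts, and in particular the contents of the first column in the two header rows, which this page does not specify.

```r
# Hypothetical count file: row 1 = proteins, row 2 = barcodes,
# remaining rows = one sample per row, one barcode per column.
csv_lines <- c(
  "sample_id,protA,protA,protB",
  "sample_id,bc1,bc2,bc3",
  "strainX-1-1,10,7,12",
  "strainY-1-1,0,3,25"
)

# Read without a header so both header rows are kept as data.
raw <- read.csv(text = csv_lines, header = FALSE, stringsAsFactors = FALSE)

# Pull the two header rows apart from the count rows.
proteins <- unlist(raw[1, -1], use.names = FALSE)
barcodes <- unlist(raw[2, -1], use.names = FALSE)

# Build an integer matrix of counts, samples by barcodes.
counts <- as.matrix(raw[-(1:2), -1])
storage.mode(counts) <- "integer"
dimnames(counts) <- list(raw[-(1:2), 1], barcodes)
```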

Each file name in the input directory must contain a unique number, which serves as the required run identifier; for example, "Mapped_bcs1.csv" is run 1.
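
Before running the model, it can be worth checking that every file name yields a distinct run number. The sketch below uses base R and takes the first run of digits in each name; this is an assumption about how the number is read, not necessarily what the package does internally. The file names are hypothetical.

```r
# Hypothetical file names from the count directory.
files <- c("Mapped_bcs1.csv", "Mapped_bcs2.csv", "Mapped_bcs10.csv")

# Extract the first run of digits from each file name.
run_id <- as.integer(regmatches(files, regexpr("[0-9]+", files)))

# Every file needs a number, and no two files may share one.
stopifnot(length(run_id) == length(files), !anyDuplicated(run_id))
```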

The sample_id column in the input MUST have exactly three components separated by dashes. The default order of the three components is strain-repl-plate, but you can change it with the id_order argument if needed. If the strain part of the ID needs an extra separator to carry more information, use underscores.
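
The following base R sketch shows how dash-separated IDs map onto id_order. The sample IDs are hypothetical, and the code only illustrates the naming convention; it is not the package's own parsing code.

```r
# Hypothetical sample IDs in the default strain-repl-plate order.
# Underscores inside the strain part are not treated as separators.
sample_id <- c("Bfrag_01-1-3", "Ecoli_K12-2-3")
id_order <- c("strain", "repl", "plate")

# Split on dashes; every ID must yield exactly three pieces.
pieces <- strsplit(sample_id, "-", fixed = TRUE)
stopifnot(all(lengths(pieces) == 3))

# Bind into a data frame with columns named by id_order.
ids <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
names(ids) <- id_order
```

If your IDs were written as plate-strain-repl instead, you would pass id_order = c("plate", "strain", "repl").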

The function is implemented with furrr, so to run in parallel, first call future::plan() with a strategy appropriate for your system.
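
A minimal sketch of a parallel run follows. It assumes the package is installed under the name basehitmodel, that the function is exported, and that "data/mapped_bcs/" is a directory of count files; the path and the worker count are hypothetical. The guard simply skips the run when those assumptions do not hold.

```r
# Hypothetical input directory; replace with your own.
count_path <- "data/mapped_bcs/"

# Only attempt the run if the packages and the data are available.
can_run <- requireNamespace("future", quietly = TRUE) &&
  requireNamespace("basehitmodel", quietly = TRUE) &&
  dir.exists(count_path)

if (can_run) {
  # Use four background R sessions; adjust workers to your machine.
  future::plan(future::multisession, workers = 4)

  # All other arguments keep the defaults listed under Usage.
  basehitmodel::model_proteins_separately(count_path = count_path)

  # Return to sequential execution afterwards.
  future::plan(future::sequential)
}
```

multisession works on all platforms; on Linux or macOS outside RStudio, future::multicore is an alternative that forks the current session instead of starting new ones.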


andrewGhazi/basehitmodel documentation built on Oct. 22, 2023, 9:21 p.m.