model_proteins_separately: Run the basehit model one protein at a time
In andrewGhazi/basehitmodel: Runs the basehit ZINB model

Description

Run the basehit model one protein at a time

Usage

model_proteins_separately(
  count_path,
  out_dir = "outputs/bh_out/",
  cache_dir = "outputs/bh_cache/",
  id_order = c("strain", "repl", "plate"),
  split_data_dir = NULL,
  ixn_prior_width = 0.15,
  algorithm = "variational",
  iter_sampling = 5000,
  iter_warmup = 1000,
  save_split = TRUE,
  save_fits = FALSE,
  save_summaries = TRUE,
  bead_binding_threshold = 1,
  save_bead_binders = TRUE,
  pre_count_threshold = 4,
  min_n_nz = 3,
  min_frac_nz = 0.5,
  weak_score_threshold = 0.5,
  strong_score_threshold = 1,
  weak_concordance_threshold = 0.75,
  strong_concordance_threshold = 0.95,
  verbose = TRUE,
  seed = 1234
)

Arguments

`count_path`	path to a directory of mapped_bcs.csv files
`cache_dir`	path to use as a cache directory (will be created if non-existent)
`id_order`	character vector giving the order of the dash-separated identifiers in the sample_id column
`split_data_dir`	path to a directory for data split by protein (will be created if non-existent)
`ixn_prior_width`	standard deviation of zero-centered normal prior on interaction effects
`algorithm`	Stan algorithm to use for posterior evaluation. Any setting other than "variational" uses Stan's adaptive HMC sampler.
`iter_sampling`	number of post-warmup samples to draw per chain
`iter_warmup`	number of warmup samples to draw per chain
`save_split`	logical indicating whether to keep the split data directory intact
`save_fits`	logical indicating whether to save the posterior fit objects (will use a lot more space if TRUE)
`bead_binding_threshold`	proteins with enrichment in the beads above this threshold get noted in the output
`save_bead_binders`	logical indicating whether to save information on bead binders to a separate file
`pre_count_threshold`	barcodes with counts at or below this value in the Pre ("input") sample are dropped.
`min_n_nz`	minimum number of non-zero counts required for an interaction to not be entirely discarded
`min_frac_nz`	minimum proportion of non-zero counts required for an interaction to not be entirely discarded
`weak_score_threshold`	lower threshold of interaction score to call weak hits
`strong_score_threshold`	lower threshold of interaction score to call strong hits
`weak_concordance_threshold`	lower threshold of interaction concordance to call weak hits
`strong_concordance_threshold`	lower threshold of interaction concordance to call strong hits
`verbose`	logical indicating whether to print informative progress messages
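
As an illustration of how the four hit-calling thresholds could fit together, here is a minimal sketch. It assumes that a hit must clear both the score threshold and the concordance threshold of its tier, and that the comparisons are strict; neither detail is stated on this page. `call_hits()` is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: classify interactions from a score and a concordance.
# Assumption: both thresholds of a tier must be exceeded (strictly).
call_hits <- function(score, concordance,
                      weak_score_threshold = 0.5,
                      strong_score_threshold = 1,
                      weak_concordance_threshold = 0.75,
                      strong_concordance_threshold = 0.95) {
  strong <- score > strong_score_threshold &
    concordance > strong_concordance_threshold
  weak <- score > weak_score_threshold &
    concordance > weak_concordance_threshold
  # Strong hits take precedence over weak hits
  ifelse(strong, "strong", ifelse(weak, "weak", "none"))
}

# Made-up scores and concordances, for illustration only
call_hits(score = c(1.4, 0.7, 0.2), concordance = c(0.99, 0.80, 0.99))
```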

Details

The count file should have the first row specifying proteins, the second row specifying barcodes, and every row after that giving one strain's output counts for each barcode (i.e. wide format, strain by barcode).

The file name of each file in the input directory must contain a unique number. This number is the required run identifier, e.g. "Mapped_bcs1.csv" is run 1.
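
The file naming rule can be checked before a run with a few lines of base R. This is a sketch, not the package's own parsing code: `extract_run_id()` is a hypothetical helper, and it assumes each file name contains exactly one run of digits.

```r
# Hypothetical helper: pull the run identifier out of each file name and
# check that the identifiers are unique.
# Assumption: each file name contains exactly one run of digits.
extract_run_id <- function(file_names) {
  base <- basename(file_names)
  stem <- tools::file_path_sans_ext(base)  # drop the ".csv" extension
  digits <- regmatches(stem, gregexpr("[0-9]+", stem))
  if (any(lengths(digits) != 1)) {
    stop("Each file name must contain exactly one number: ",
         paste(base[lengths(digits) != 1], collapse = ", "))
  }
  run_id <- as.integer(unlist(digits))
  if (anyDuplicated(run_id) > 0) {
    stop("Run identifiers are not unique: ", paste(run_id, collapse = ", "))
  }
  setNames(run_id, base)
}

extract_run_id(c("Mapped_bcs1.csv", "Mapped_bcs2.csv"))
```

In practice the argument could come from `list.files(count_path)`.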

The sample_id column in the input MUST have exactly three components separated by dashes. The default order of the three components is strain-repl-plate, but you can change the order with the id_order argument if needed. If you need an additional separator to carry more information in the strain part of the id, use underscores.
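
The sample_id rule can be sketched in base R as follows. `parse_sample_id()` is a hypothetical helper written for illustration and the strain names are made up; the package's internal parsing may differ.

```r
# Hypothetical helper: split sample ids on dashes and label the pieces
# according to id_order. Ids without exactly three components are rejected.
parse_sample_id <- function(sample_id,
                            id_order = c("strain", "repl", "plate")) {
  parts <- strsplit(sample_id, "-", fixed = TRUE)
  bad <- lengths(parts) != length(id_order)
  if (any(bad)) {
    stop("Malformed sample_id: ", paste(sample_id[bad], collapse = ", "))
  }
  mat <- do.call(rbind, parts)
  colnames(mat) <- id_order
  # Return the columns in a fixed order regardless of id_order
  as.data.frame(mat, stringsAsFactors = FALSE)[, c("strain", "repl", "plate")]
}

# Default order: strain-repl-plate, with an underscore inside the strain part
parse_sample_id(c("Ecoli_K12-1-2", "Bfrag_638R-2-1"))

# Same information with a different component order
parse_sample_id("2-Ecoli_K12-1", id_order = c("plate", "strain", "repl"))
```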

The function is implemented with furrr, so call future::plan() with a strategy appropriate for your system before running it in order to parallelize.
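
A minimal sketch of a parallel run, assuming the future and basehitmodel packages are installed. `run_basehit_parallel()` is a hypothetical wrapper, and the multisession strategy and worker count are illustrative choices, not recommendations from the package.

```r
# Hypothetical wrapper: set a parallel plan, run the model, then restore
# the previous plan when the function exits.
run_basehit_parallel <- function(count_path, workers = 2, ...) {
  # plan() returns the previous plan, which is restored on exit
  old_plan <- future::plan(future::multisession, workers = workers)
  on.exit(future::plan(old_plan), add = TRUE)

  basehitmodel::model_proteins_separately(count_path = count_path, ...)
}

# Example call (the path is a placeholder, so it is left commented out):
# run_basehit_parallel("path/to/count_files/", workers = 4)
```

On a machine where forking is available, another strategy such as multicore may be a better fit; the choice is left to the user, as the text above says.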

andrewGhazi/basehitmodel documentation built on Oct. 22, 2023, 9:21 p.m.