analyzeLegacyTileseqCounts: analyze tileseq counts from legacy pipeline
In jweile/tileseqMave: TileSeq MAVE Analysis pipeline

analyzeLegacyTileseqCounts

R Documentation

analyze tileseq counts from legacy pipeline

Description

This analysis function performs the following steps for each mutagenesis region:
1. Construction of HGVS variant descriptor strings.
2. Collapsing of equivalent codons into amino acid change counts.
3. Error regularization at the level of pre- and post-selection counts.
4. Quality-based filtering based on "Song's rule".
5. Fitness score calculation and error propagation.
6. Secondary error regularization at the level of fitness scores.
7. Determination of synonymous and nonsense medians and rescaling of fitness scores.
8. Flooring of negative scores and adjustment of their associated error.
9. Output in MaveDB format.
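Several of the numbered steps above (2, 3/6, 5, 7 and 8) can be illustrated with the minimal base-R sketch below. The function names (`baldiLongSD`, `logRatio`, `rescale`, `floorScores`) and the toy data are hypothetical and are not part of the tileseqMave API. The regularization follows the Baldi & Long (2001) variance formula, with `pseudo_n` playing the role of `pseudoObservations`. The enrichment model (a log ratio of post- to pre-selection frequencies with first-order error propagation) and the rule for adjusting the error of floored scores are assumptions chosen for illustration; the package's own implementation may differ.

```r
# Hypothetical sketch; none of these names are exported by tileseqMave.

# Step 2: collapse codon-level variants that encode the same amino acid change
# into one count per HGVS protein descriptor (toy data).
codons <- data.frame(
  hgvs = c("p.Ala12Val", "p.Ala12Val", "p.Ala12=", "p.Arg13Ter"),
  pre  = c(120, 80, 300, 150),
  post = c(40, 20, 310, 5)
)
aa <- aggregate(cbind(pre, post) ~ hgvs, data = codons, FUN = sum)

# Steps 3 and 6: Baldi & Long regularization. The empirical SD from n
# replicates is shrunk towards a prior SD, weighted by pseudo_n
# pseudo-observations.
baldiLongSD <- function(pseudo_n, n, prior_sd, empirical_sd) {
  sqrt((pseudo_n * prior_sd^2 + (n - 1) * empirical_sd^2) / (pseudo_n + n - 2))
}

# Step 5 (assumed model): log ratio of post- to pre-selection frequencies.
# The SD of log(post/pre) is propagated to first order (delta method),
# treating pre and post as independent.
logRatio <- function(pre, post, sd_pre, sd_post) {
  list(
    score = log(post / pre),
    sd = sqrt((sd_pre / pre)^2 + (sd_post / post)^2)
  )
}

# Step 7: linear rescaling so the nonsense median maps to 0 and the
# synonymous median maps to 1; the SD is scaled by the same factor.
rescale <- function(score, sd, med_syn, med_stop) {
  d <- med_syn - med_stop
  list(score = (score - med_stop) / d, sd = sd / abs(d))
}

# Step 8 (assumed adjustment rule): floor negative scores at 0 and add the
# distance each score was moved to its SD in quadrature.
floorScores <- function(score, sd) {
  floored <- pmax(score, 0)
  list(score = floored, sd = sqrt(sd^2 + (score - floored)^2))
}
```

Applying `rescale()` before `floorScores()` mirrors the order of steps 7 and 8: flooring at 0 is only meaningful once 0 corresponds to the nonsense median.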

Usage

analyzeLegacyTileseqCounts(
  countfile,
  regionfile,
  outdir,
  logger = NULL,
  inverseAssay = FALSE,
  pseudoObservations = 2,
  conservativeMode = TRUE
)

Arguments

`countfile`	the path to the "rawData.txt" file produced by the legacy pipeline.
`regionfile`	the path to a tab-delimited file describing the mutagenesis regions. Must contain the columns 'region', 'start', 'end', 'syn' and 'stop', i.e. the region ID, the start position, the end position, and optional overrides for the synonymous and stop (nonsense) means.
`outdir`	the path to the desired output directory.
`logger`	a yogilogger object to be used for logging (or NULL for simple printing).
`inverseAssay`	a boolean flag indicating that the experiment was done with an inverse assay, i.e. one in which protein function leads to decreased fitness. Defaults to FALSE.
`pseudoObservations`	the number of pseudo-observations to use for the Baldi & Long regularization. Defaults to 2.
`conservativeMode`	a boolean flag. When turned on, pseudo-observations are not counted towards the standard error, and the first round of regularization uses pessimistic error estimates. Defaults to TRUE.
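A regionfile with the required columns can be written from R as sketched below. The region IDs and boundaries are placeholder values, and leaving `syn` and `stop` as `NA` assumes that this is how the package reads an absent override; the file name `regions.tsv` is likewise hypothetical.

```r
# Placeholder regions; syn and stop are NA on the assumption that NA means
# "no override of the synonymous/stop mean".
regions <- data.frame(
  region = 1:2,
  start  = c(2, 100),
  end    = c(99, 198),
  syn    = c(NA, NA),
  stop   = c(NA, NA)
)

# Write a tab-delimited file with a header row and no row names or quotes.
regionfile <- file.path(tempdir(), "regions.tsv")
write.table(regions, regionfile, sep = "\t", quote = FALSE, row.names = FALSE)

# Reading it back as tab-delimited text recovers the same columns; the
# literal "NA" strings are parsed back to missing values.
check <- read.delim(regionfile)
```

The resulting `regionfile` path is what would be passed as the `regionfile` argument.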