R package to facilitate microbial fitness inference (using Generalized Linear Models) from barcode counts based on the Negative Binomial (NB) distribution.
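As a rough illustration of the underlying idea (not this package's implementation), the sketch below fits a Negative Binomial GLM to simulated barcode counts with glm.nb from the MASS package, which is used here as a stand-in. The data, the variable names (treated, counts) and the effect size are all made up for the example.

```r
library(MASS)  # provides glm.nb; MASS ships with standard R installations

set.seed(1)
n <- 200

# Hypothetical layout: first half of the wells untreated, second half treated
treated <- rep(c(0, 1), each = n / 2)

# Simulated counts: baseline mean of 100, treatment halves the mean,
# so the true log fold change is log(0.5)
mu <- 100 * exp(log(0.5) * treated)
counts <- rnbinom(n, mu = mu, size = 5)

# NB regression with the default log link
fit <- glm.nb(counts ~ treated)

log_fold_change <- coef(fit)["treated"]  # estimate of log(0.5)
dispersion_size <- fit$theta             # estimate of the NB size parameter
```

The coefficient on treated is the estimated log fold change in abundance, which is the kind of quantity a fitness estimate is built from.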
Designed for ultra-high throughput screening with (more than) tens of millions of observations. This means avoiding glm objects, which cause memory explosions at this scale.
Not yet on CRAN or Bioconductor, so you need to install it using devtools.
In an R session:
devtools::install_github('eachanjohnson/concensusGLM')
This will install the library in your default path. Check .libPaths() to see what that is.
The command-line script exec/concensusGLM.R covers many anticipated use cases.
If you use the --sge option, you need to provide the path on your computer to the exec/sge-template.sh template submission script, which you may need to edit for your particular setup.
Usage:
  concensusGLM.R --help
  concensusGLM.R --data <data-file> --neg <negative-control> [--meta <experiment-metadata> --pos <positive-control> --no-count-threshold --checkpoint --parallel --sge <template-script>]
Options:
  -h --help                     Show this message and exit
  --data <data-file>            Input count table CSV, one line per lane per strain per well
  --meta <experiment-metadata>  CSV of handling records
  --neg <negative-control>      Name of negative control compound, e.g. DMSO or none or untreated
  --pos <positive-control>      Name of positive control compound, e.g. rifampin or BRD-K01507359-001-19-5 or BRD-K01507359
  --checkpoint                  Save checkpoints to allow restart in case of failure
  --parallel                    Analyze strains in parallel (needs multiple cores)
  --sge <template-script>       Use GridEngine cluster with template to analyze strains in parallel
  --no-count-threshold          Don't discard plates or strains based on low counts
This package provides the objects:
concensusDataSet, a container for ConcensusGLM analysis
concensusWorkflow, a wrapper to the workflow object from workflows, allowing scale-up to multi-core on a laptop or a GridEngine style cluster
Both objects, concensusDataSet and concensusWorkflow, have methods to carry out ConcensusGLM analysis. Using ?methodName in the R console will give documentation.
getRoughDispersions
getBatchEffects
getFinalDispersions
getFinalModel
write_concensusDataSet
Application of these methods in an order other than listed above is not fully supported and may not work.
Applying a method to a concensusDataSet executes it immediately. Applying it to a concensusWorkflow adds it to a list of commands, which can be executed later by calling execute on the concensusWorkflow object.
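The difference between immediate and deferred execution can be illustrated with a toy example in base R. This is only a sketch of the general pattern, not the package's code: new_toy_workflow, add_step and run_steps are hypothetical helpers invented for this example.

```r
# A toy "workflow": a data object plus a queue of pending steps.
# All function names below are hypothetical, for illustration only.
new_toy_workflow <- function(data) {
  list(data = data, steps = list())
}

add_step <- function(wf, f) {
  wf$steps <- c(wf$steps, list(f))  # queue the step; nothing is computed yet
  wf
}

run_steps <- function(wf) {
  # Apply the queued steps to the data in the order they were added
  Reduce(function(d, f) f(d), wf$steps, wf$data)
}

wf <- new_toy_workflow(c(4, 9, 16))
wf <- add_step(wf, sqrt)  # queued
wf <- add_step(wf, sum)   # queued

n_queued <- length(wf$steps)  # two steps waiting, none run so far
result <- run_steps(wf)       # now both steps run: sum(sqrt(c(4, 9, 16)))
```

Queuing steps this way is what makes it possible to decide later where and how they run, for example on several cores or on a cluster.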
You can also scatter a concensusWorkflow based on categorical columns in the input data set to allow chunking for embarrassingly parallel computation when execute is called. This can be done on the local machine using multiple cores, or using a GridEngine style cluster like SGE or UGE, in which case the submission script exec/sge-template.sh may need to be edited for your specific cluster.
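The idea of scattering on a categorical column can be sketched with base R and the parallel package. This is a generic illustration under assumed data, not the scatter method of this package: the counts table and its strain and count columns are hypothetical.

```r
library(parallel)  # part of base R distributions

# Hypothetical count table with one categorical column to chunk on
counts <- data.frame(
  strain = rep(c("A", "B", "C"), each = 4),
  count  = c(10, 12, 9, 11, 50, 48, 52, 51, 3, 5, 4, 2)
)

# "Scatter": one chunk per level of the categorical column
chunks <- split(counts, counts$strain)

# Each chunk is processed independently of the others.
# mc.cores = 1 keeps the example portable (forking is unavailable on Windows);
# raise it on Linux or macOS to use several cores.
chunk_means <- mclapply(chunks, function(chunk) mean(chunk$count), mc.cores = 1)

# "Gather": combine the per-chunk results into one named vector
gathered <- unlist(chunk_means)
```

Because no chunk depends on another, the same split could equally be handed to separate cluster jobs instead of local cores.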
Eachan O. Johnson et al., Large-scale chemical-genetics yields new M. tuberculosis inhibitor classes, Nature, July 2019, doi: 10.1038/s41586-019-1315-z
Dependencies:
Similar approaches include but are not limited to: