sortingsim | R Documentation |
sortingsim simulates a selection-based CRISPR-Cas9 pooled screen with
user-defined parameters. In a way, this is a simplified version of
CRISPRsim, because there is no simulation of growth and
passaging. It is about entering into an assay with a number of cells, and
collecting the selected and unselected cells for sequencing. This was mainly
created with FACS-based screens in mind, but I suppose one can think of other
selections as well. Drug treatment is still an option! Again, "infected"
cells either have a successful knockout (with associated effect) or not,
based on guide efficacy. Cells are then sorted, which in this simulator means
that all cells with a certain knockout have a base probability to be
positively selected, modified by a gene-specific score. Note that this
function takes the number of sorted cells as input, of which only a fraction
is selected. This fraction is roughly equal to the baseprob. The
output of this function is a data frame that contains the guide-relevant
parameters and the sequencing coverage per guide for the selected and
unselected arms. Simulated screens will aid researchers with their
experimental setup. Furthermore, this offers a unique platform for the
evaluation of analysis methods for sorting-based pooled gene knockout
screens.
sortingsim(
  genes, guides, g, f, d, e,
  baseprob = 0.1,
  hitfraction = 1/200, hitsup, hitfactor = 10,
  efraction, eup, efactor,
  sortedcells, seqdepth,
  offtargets = FALSE,
  allseed = NULL, gseed, fseed, dseed, eseed, oseed, t0seed, repseed,
  perfectsampling = FALSE, perfectseq,
  returnall = FALSE,
  outputfile
)
genes |
Single integer or character vector. Specify how many or which genes to include in the experiment respectively. Not required when a full list of guides is given. |
guides |
Single integer, integer vector or character vector. In case of single integer, specify by how many guides each gene is represented. In case of an integer vector, specify per gene by how many guides it is represented. In case of a character vector, guides are assumed to contain a gene name, followed by an underscore, followed by an identifier within that gene (e.g. a number or a nucleotide sequence). |
g |
Numeric vector. Specify guide efficacies per guide (the chance of a successful knockout, so values between 0 and 1). If omitted, guide efficacies will be sampled from a representative distribution. |
f |
Integer vector. Specify guide abundance at time of infection per guide. If omitted, guide abundance will be sampled from a representative distribution. |
d |
Integer vector. Specify gene-specific modifier of probability of selection. This is a modifier of the odds. If omitted, effect of gene knockout will be sampled from three distributions, depending on base probability and hit factor. If the length of the vector does not match the number of genes, values will be randomly sampled from the specified distribution! |
e |
Integer vector. Specify treatment-specific selection effect per gene. If omitted, effects will be sampled from a representative distribution. If the length of the vector does not match the number of genes, values will be randomly sampled from the specified distribution! |
baseprob |
Numeric. Baseline probability of selection. Needs to be larger than 0 and smaller than 1. Default = 0.1 |
hitfraction |
Numeric. Fraction of genes that affect selection significantly. Default = 1/200 |
hitsup |
Numeric. Fraction of hits of which the selection probability is
multiplied by |
hitfactor |
Numeric. Multiplication factor with which hits affect selection probability on average. Default = 10 |
efraction |
Numeric. Fraction of genes that affects selection
specifically in this treatment arm. Defaults to |
eup |
Numeric. Same as hitsup, but now relating to treatment effects. Defaults to 0.5 |
efactor |
Numeric. Multiplication factor for treatment-specific effects.
Defaults to |
sortedcells |
Integer. Number of cells put through the simulated selection. Note that this is the sum of the selected and unselected cells! |
seqdepth |
Integer. Specify the number of sequencing reads devoted to each experimental arm. If omitted, depth will default to 500 times the number of guides. |
offtargets |
Logical or numeric. Specify the fraction of off-targets. If TRUE, 1 in 1000 guides (0.001) will target a different gene. Default = FALSE |
allseed |
Integer. All unspecified seeds default to this plus an increment of 1 for each different seed. If NULL, the unspecified seeds are randomly generated. Default = NULL |
gseed |
Integer. Specify seed for guide efficacy assignment |
fseed |
Integer. Specify seed for infectious units assignment, which dictates a guide's abundance at the start of the experiment |
dseed |
Integer. Specify seed for straight lethality assignment of genes |
eseed |
Integer. Specify seed for sensitizer assignment of genes |
oseed |
Integer. Specify seed for off-target selection |
t0seed |
Integer. Specify seed for t0, which encompasses assignment of successful knockout cells versus no knockout cells for each guide |
repseed |
Integer. Specify the seed after t0 |
perfectsampling |
Logical. If TRUE, all sampling steps are replaced by simple equations to calculate representation of guides. Useful as null control to isolate the effect of sampling. Default = FALSE |
perfectseq |
Logical. If TRUE, sequencing results are a perfect
representation (though still rounded) of guides in the harvested cells.
Applicable to speed up simulations, assuming sequencing is sufficiently
deep. Defaults to |
returnall |
Logical. If TRUE, function returns a list with the simulated data in the guidesdf, summary per gene in the genesdf, and parameters. Default = FALSE |
outputfile |
Character string. When used, returned data frame will be saved as a tab-delimited text to the specified file path |
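For the character-vector form of guides, each name is a gene name, an underscore, and an identifier within that gene. A minimal sketch of building such a vector in base R follows; the gene names and the numeric identifiers are made up for illustration and are not taken from the package.

```r
# Hypothetical gene names and guide count, chosen only for this example.
genes  <- c("GENEA", "GENEB", "GENEC")
nguide <- 4

# Build names of the form <gene>_<identifier>, e.g. "GENEA_1".
guides <- paste0(rep(genes, each = nguide), "_", seq_len(nguide))
head(guides)
```

A vector like this could then be passed as guides, in which case genes is not required.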
sortingsim performs a genome-wide (or subsetted) pooled CRISPR
knockout screen ending in a binary selection. Perhaps even more so than
growth-based screens, the outcome of such screens can be a massive black
box. As of yet, no specific analysis methods have been published for this
kind of screen, but this simulator can help assess them. The
parameters are highly customizable, so I sincerely recommend reading the
documentation for all the options. And it is always possible to provide
your own gene-specific selection modifiers or guide efficacies if you are
not happy with the provided distributions. Seeds are relevant if you want
to create replicate screens. You can easily "practice" by simulating some
small experiments (i.e. limit the number of genes). The basis of the
simulation is as follows. Cells have an a priori probability baseprob
to be selected. The corresponding odds baseprob/(1-baseprob)
are multiplied by the gene-specific modifier d
(and optionally the gene-specific modifier for treatment e). These odds are
converted to the modified probability mod_prob, which is used to determine
how many cells with a specific knockout are selected. Each guide has an
efficacy, which is the chance to create a successful knockout. Only in case
of successful knockout are the modifiers applied. Selected and not-selected
cells are separately sequenced, both to the indicated sequencing depth.
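The odds arithmetic described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the package's internal code: the helper modified_prob, the cell count, and the efficacy value are invented for the example, and the binomial draws are one plausible way to model the sampling of knockout and selected cells.

```r
# Hypothetical helper (not part of the package): convert the base probability
# to odds, multiply by the modifier(s), and convert back to a probability.
modified_prob <- function(baseprob, d, e = 1) {
  odds <- baseprob / (1 - baseprob) * d * e
  odds / (1 + odds)
}

baseprob <- 0.1
mod_prob <- modified_prob(baseprob, d = 10)  # odds 1/9 become 10/9, so p = 10/19

# Illustrative single guide: 1000 sorted cells with efficacy 0.8 (assumed values).
set.seed(1)
ncells   <- 1000
efficacy <- 0.8
ko_cells <- rbinom(1, ncells, efficacy)      # cells with a successful knockout
wt_cells <- ncells - ko_cells                # cells without a knockout

# The modifier applies to knockout cells only; the rest keep baseprob.
selected <- rbinom(1, ko_cells, mod_prob) +
  rbinom(1, wt_cells, baseprob)
notselected <- ncells - selected
```

With a modifier of 1 the helper returns baseprob unchanged, which matches the statement that only successful knockouts shift the selection probability.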
Returns a data frame with every row representing a single guide. Contains the pertinent parameters of each guide and the number of sequencing reads of selected and not selected cells. If the argument returnall is set to TRUE, the function also returns a data frame with the true values for the genes, and lists all parameters as well.
If you specify an inverted hitfactor (e.g. 0.1 instead of 10), your hits are turned around.
While it also makes sense to be able to specify how many cells are
positively selected (this could be your FACS setup of course), this is not
directly compatible with this simulator. Instead, you can divide the number
of cells you want by baseprob and use that as input for
sortedcells. If that does not come close (some wonky parameters
perhaps), or you want to be more precise, you can do a test run with
argument returnall. One of the returned values is selectedcells, which
corresponds to the number of cells used as input for sequencing of the
selected arm. It follows that notselectedcells equals sortedcells minus
selectedcells.
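As a quick worked example of this workaround, suppose you want roughly 2 million positively selected cells at the default baseprob. Both numbers are illustrative assumptions, and the result is only a starting estimate because the realised fraction also depends on the gene-specific modifiers.

```r
wanted_selected <- 2e6   # assumed target number of selected cells
baseprob        <- 0.1   # default base probability of selection

# Approximate number of cells to put through the simulated sort.
sortedcells <- round(wanted_selected / baseprob)
sortedcells

# This value could then be passed on, for example:
# sortdf <- sortingsim(18000, 4, sortedcells = sortedcells)
```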
Jos B. Poell
oddscores, CRISPRsim
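sortdf <- sortingsim(18000, 4, e = TRUE, perfectsampling = TRUE)

# run-length encode to collapse the repeated per-guide values
d <- rle(sortdf$d)$values
lod <- log(d)
e <- rle(sortdf$e)$values
loe <- log(e)
plot(lod, loe, main = "log odds of selection")

# log enrichment of selected over not-selected reads, with a pseudocount of 1
enrichment <- log(sortdf$selected+1)-log(sortdf$notselected+1)
# log odds of selection for cells with a successful knockout
kocell_logodds <- log(sortdf$mod_prob)-log(1-sortdf$mod_prob)
# colour points by guide efficacy g
plot(kocell_logodds, enrichment, pch = 16, cex = 0.75, col = rgb(sortdf$g, 0, 1-sortdf$g))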