CRISPRsim: Simulate a CRISPR-Cas9 pooled screen

View source: R/CSSA.R

CRISPRsim    R Documentation

Simulate a CRISPR-Cas9 pooled screen

Description

CRISPRsim simulates a CRISPR-Cas9 pooled screen with user-defined parameters, including drug treatment screens. Each "infected" cell expands over time according to the effect of its gene knockout. Other parameters include the abundance of each guide at the start of the experiment, the efficacy of each guide (the chance that it results in a successful gene knockout), and the frequency and depth of sampling. In a drug treatment screen, genes are also assigned a treatment-specific growth modifier. The result is a data frame that contains the guide-relevant parameters and the sequencing coverage per guide at the specified time points. Simulated screens can help researchers plan their experimental setup, and they provide a platform for evaluating analysis methods for pooled gene knockout screens.

Usage

CRISPRsim(
  genes,
  guides,
  a,
  g,
  f,
  d,
  e,
  seededcells,
  harvestedcells,
  harvestall = TRUE,
  cellreplace = FALSE,
  treatmentdelay = 0,
  seqdepth,
  offtargets = FALSE,
  allseed = NULL,
  gseed,
  fseed,
  dseed,
  eseed,
  oseed,
  t0seed,
  repseed,
  grm = 1,
  em = 1,
  perfectsampling = FALSE,
  perfectseq = FALSE,
  returnall = FALSE,
  outputfile
)

Arguments

genes

Single integer or character vector. Specify how many or which genes to include in the experiment respectively. Not required when a full list of guides is given.

guides

Single integer, integer vector or character vector. In case of single integer, specify by how many guides each gene is represented. In case of an integer vector, specify per gene by how many guides it is represented. In case of a character vector, guides are assumed to contain a gene name, followed by an underscore, followed by an identifier within that gene (e.g. a number or a nucleotide sequence)
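
As a rough illustration of the naming convention for character-vector guides, the base R sketch below builds guide names and recovers the gene part. The gene names, the numeric identifiers, and the choice to split at the last underscore are assumptions made for this example; they are not taken from the CRISPRsim source.

```r
# Hypothetical gene names, used only for illustration
genes <- c("TP53", "KRAS", "MYC")
guides_per_gene <- 4

# Build names of the form <gene>_<identifier>
guides <- paste0(rep(genes, each = guides_per_gene), "_",
                 rep(seq_len(guides_per_gene), times = length(genes)))

# Recover the gene by dropping everything from the last underscore onward
# (assumption: the identifier itself contains no underscore)
gene_of_guide <- sub("_[^_]*$", "", guides)

table(gene_of_guide)
```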

a

Numeric. Specify the number of doublings between each "passaging". For example, in case of an experiment that ends after 12 doublings and was passaged 3 times, specify a = c(4,4,4)
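
To make the meaning of a concrete, the short base R sketch below shows the cumulative number of doublings at each passage for the example above, and the fold expansion an unperturbed population (growth rate 1, no gene or treatment effect) would undergo within each step. The variable names are illustrative only.

```r
# Doublings between consecutive passages, as in the example above
a <- c(4, 4, 4)

# Cumulative doublings at each passage after t0
cumulative_doublings <- cumsum(a)
cumulative_doublings

# Fold expansion of an unperturbed population within each step
fold_per_step <- 2^a
fold_per_step
```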

g

Integer vector. Specify guide efficacies per guide. If omitted, guide efficacies will be sampled from a representative distribution

f

Integer vector. Specify guide abundance at time of infection per guide. If omitted, guide abundance will be sampled from a representative distribution

d

Integer vector. Specify gene-specific growth effect. If omitted, effect of gene knockout on growth will be sampled from a representative distribution. If the length of the vector does not match the number of genes, the supplied values are treated as a distribution from which per-gene values are randomly sampled.

e

Integer vector. Specify treatment-specific growth effect per gene. If omitted, effects will be sampled from a representative distribution. If the length of the vector does not match the number of genes, the supplied values are treated as a distribution from which per-gene values are randomly sampled.

seededcells

Integer. Number of cells seeded at the start of each experimental step. If fewer values are given than there are seedings, all unspecified steps are assumed equal to the last specified value. Defaults to 200 times the number of guides
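
The default and the padding rule described above can be sketched in base R as follows. The helper pad_with_last, the guide count, and the number of seedings are hypothetical and serve only to illustrate the documented behavior; CRISPRsim's internal implementation may differ.

```r
n_guides <- 72000   # e.g. 18000 genes x 4 guides
n_seedings <- 3     # hypothetical number of seeding steps

# Documented default: 200 cells per guide
default_seeded <- 200 * n_guides

# Hypothetical helper: pad a short vector by repeating its last value
pad_with_last <- function(x, n) {
  if (length(x) >= n) return(x[seq_len(n)])
  c(x, rep(x[length(x)], n - length(x)))
}

# Two values given for three seedings: the last one is reused
seededcells <- pad_with_last(c(2e7, 1e7), n_seedings)
seededcells
```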

harvestedcells

Integer. Number of cells from which to sample for subsequent sequencing. This argument is especially useful to restrict the expected DNA copies present in the PCR reaction. If you wish to do so, make sure to set harvestall to FALSE. Defaults to be equal to seededcells

harvestall

Logical. If TRUE, all cells are collected and used for sampling in subsequent sequencing step. Applies to all experimental time points beyond t0. Default = TRUE

cellreplace

Logical. If FALSE, cells are sampled from the total pool of cells without replacement. Note that this is the most realistic simulation of a screen, but then you should also keep realistic passaging times! It is recommended to keep the total number of cells in the experiment below 200 million. Setting this to TRUE can dramatically speed up simulations. Default = FALSE

treatmentdelay

Integer. In case of a treatment experiment, specify when treatment starts. It is currently only possible to start treatment on one of the experimental time points. Default = 0

seqdepth

Integer. Specify the amount of sequencing reads devoted to each experimental arm. If omitted, depth will default to 500 times the number of guides

offtargets

Logical or numeric. Specify the fraction of off-targets. If TRUE, 1 in 1000 guides (0.001) will target a different gene. Default = FALSE

allseed

Integer. All unspecified seeds default to this plus an increment of 1 for each different seed. Defaults to NULL, in which case the unspecified seeds are randomly generated. Default = NULL

gseed

Integer. Specify seed for guide efficiency assignment

fseed

Integer. Specify seed for infectious units assignment, which dictates a guide's abundance at the start of the experiment

dseed

Integer. Specify seed for straight lethality assignment of genes

eseed

Integer. Specify seed for sensitizer assignment of genes

oseed

Integer. Specify seed for off-target selection

t0seed

Integer. Specify seed for t0, which encompasses sampling of the first seeding and the assignment of successful knockout cells versus no knockout cells for each guide

repseed

Integer. Specify the seed after t0

grm

Numeric. Growth rate modifier. Specify adjusted growth rate under treatment conditions. Default = 1

em

Numeric. Effect modifier. Specify how effective treatment is. This can be used as a proxy for drug concentration. All individual e-values and grm are modified by this multiplier. Default = 1

perfectsampling

Logical. If TRUE, all sampling steps are replaced by simple equations to calculate representation of guides. Useful as null control to isolate the effect of sampling. Default = FALSE

perfectseq

Logical. If TRUE, sequencing results are a perfect representation (though still rounded) of guides in the harvested cells. Applicable to speed up simulations, assuming sequencing is sufficiently deep. Default = FALSE
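
The difference between a perfect (rounded) representation and sampled sequencing can be illustrated with a small base R sketch. The cell counts and depth are made up, and modeling sampled sequencing as a single multinomial draw is an assumption of this example, not a statement about how CRISPRsim samples reads.

```r
set.seed(42)  # for a reproducible illustration only

# Hypothetical cell counts per guide in the harvested sample
cells <- c(g1 = 5000, g2 = 3000, g3 = 1500, g4 = 500)
seqdepth <- 2000

# "Perfect" sequencing: reads proportional to cell counts, then rounded
perfect_reads <- round(seqdepth * cells / sum(cells))

# Sampled sequencing, modeled here as one multinomial draw (assumption)
sampled_reads <- rmultinom(1, size = seqdepth, prob = cells / sum(cells))[, 1]

rbind(perfect = perfect_reads, sampled = sampled_reads)
```

The sampled counts scatter around the perfect ones, and the relative scatter shrinks as depth increases, which is why the shortcut is reasonable only when sequencing is deep.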

returnall

Logical. If TRUE, function returns a list with the simulated data in the guidesdf, summary per gene in the genesdf, and parameters. Default = FALSE

outputfile

Character string. If given, the returned data frame is also saved as tab-delimited text to the specified file path

Details

CRISPRsim performs a genome-wide (or subsetted) pooled CRISPR knockout screen for you without having to go to the lab and spend incredible amounts of time and money. This can be a tremendous help if you want to design an experiment and answer questions such as: how many replicates do I need, how much coverage do I need, will I pick up genes with effect x, et cetera. You can give it a spin, but I highly recommend checking out the documentation for the available parameters. Seeds in particular can be relevant for a proper simulation. You can easily "practice" by simulating some small experiments.

The basis of the simulation is as follows. Between time points, cells with a certain knockout grow according to the formula

cellsout = cellsin * 2^((grm + d + e) * a)

Each guide has an efficacy, which is the chance to create a successful knockout. The cellsin is determined at t0 and depends on guide efficacy and guide abundance. If there is no successful knockout, d and e are 0. Cells with and without successful knockout are followed separately throughout the experiment, but the pairs are pooled in terms of sequencing reads. grm is the growth rate modifier and is generally 1, but it can be lowered to more properly simulate resistance screens.
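
The growth rule can be tried out in a few lines of base R. This is a minimal sketch under stated assumptions: the helper grow and all numeric values are hypothetical, and the split into knockout and non-knockout cells uses the expected fraction, whereas the simulation itself assigns knockouts by random sampling at t0.

```r
# Growth between two time points, following the formula in Details:
# cellsout = cellsin * 2^((grm + d + e) * a)
grow <- function(cellsin, a, d = 0, e = 0, grm = 1) {
  cellsin * 2^((grm + d + e) * a)
}

# Hypothetical guide: 200 infected cells, efficacy 0.8, gene effect d = -0.5
cells_t0 <- 200
efficacy <- 0.8
ko_cells <- cells_t0 * efficacy   # expected cells with a successful knockout
wt_cells <- cells_t0 - ko_cells   # cells without knockout: d and e are 0

a <- 3                                 # doublings in this step
ko_out <- grow(ko_cells, a, d = -0.5)  # knockout cells grow more slowly
wt_out <- grow(wt_cells, a)            # unaffected cells expand 2^3 = 8-fold

# The two subpopulations are tracked separately but pooled for read counts
total_out <- ko_out + wt_out
c(knockout = ko_out, no_knockout = wt_out, pooled = total_out)
```

Because the cells without a knockout keep growing at the unperturbed rate, a guide with low efficacy dilutes the apparent effect of its gene in the pooled counts.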

Value

Returns a data frame with every row representing a single guide. Contains the pertinent parameters of each guide and the number of sequencing reads on t0 and all other sampling time points. If the argument returnall is set to TRUE, the function also returns a data frame with the true values for the genes, and lists all parameters as well.

Note

Seeds are set using with_seed from the withr package, thus leaving any pre-existing seed intact. Avoid using the same seeds for different arguments. If dseed and eseed are identical, the resulting values for d and e will have a distinct pattern of correlation. In this case, CRISPRsim will throw a warning. Given or generated seeds are returned when the option returnall is set to TRUE.
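
Why identical seeds produce correlated parameters can be seen in a small base R sketch. The two distributions below are arbitrary stand-ins rather than the ones CRISPRsim samples from, and set.seed is used here for simplicity; unlike with_seed, it does change the global random number state.

```r
# Two parameter sets drawn from different distributions with the SAME seed.
# The distributions are arbitrary stand-ins, not those used by CRISPRsim.
set.seed(1)
d_same <- qnorm(runif(1000), mean = -0.1, sd = 0.2)
set.seed(1)
e_same <- qexp(runif(1000), rate = 5)

# Same underlying uniform draws, so the rank correlation is perfect
cor_same <- cor(d_same, e_same, method = "spearman")

# With a different seed the two sets are essentially uncorrelated
set.seed(2)
e_diff <- qexp(runif(1000), rate = 5)
cor_diff <- cor(d_same, e_diff, method = "spearman")

c(same_seed = cor_same, different_seed = cor_diff)
```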

Author(s)

Jos B. Poell

See Also

sortingsim, radjust, rrep, jar, nestedradjust, doublejar

Examples

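library(CSSA)  # package that provides CRISPRsim

# 18000 genes, 4 guides per gene, two steps of 3 doublings each
simdf <- CRISPRsim(18000, 4, a = c(3,3), e = TRUE, perfectsampling = TRUE)
hist(simdf$g, breaks = 100, main = "distribution of guide efficiencies")
# rle() collapses runs of identical consecutive values, giving one value per
# gene when the guides of a gene occupy adjacent rows
d <- rle(simdf$d)$values
e <- rle(simdf$e)$values
plot(d, e, main = "straight lethality and sensitization")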


tgac-vumc/CSSA documentation built on Oct. 10, 2022, 7:27 p.m.